@vstinner
Member

@vstinner vstinner commented Oct 15, 2020

Prepare unicodedata to add a state per module: start with a global
"module" state, pass it to subfunctions which access &UCD_Type. This
change also prepares the conversion of the UCD_Type static type to a
heap type.

https://bugs.python.org/issue1635741

@vstinner
Member Author

With this PR, I understood that my main concern comes from the PyCapsule API: unicodedata.ucnhash_CAPI.

_PyUnicode_DecodeUnicodeEscape() uses it like this:

ucnhash_CAPI->getcode(NULL, start, (int)namelen, &ch, 0)

No state is passed to _getcode() and so currently it can only access global variables.

I looked at how to create a C function that wraps _getcode() and automatically passes a state. Problem: the only portable way is something like func(closure, ...), where you must pass the state to the wrapper as an argument. There is a non-portable way to create a real closure in C, but it requires libffi, which sounds like a heavy solution.

Hopefully we don't need to go that far. _PyUnicode_Name_CAPI is private and excluded from the limited C API. We can move it to the internal C API and introduce an incompatible change, such as requiring the caller to pass a state. For example, we can add a state to the _PyUnicode_Name_CAPI structure and require the caller to pass it:

ucnhash_CAPI->getcode(ucnhash_CAPI->state, NULL, start, (int)namelen, &ch, 0)
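To make the idea concrete, here is a minimal, standalone sketch of a capsule-like structure that stores a state pointer next to its function pointer. Every name in it (demo_state, demo_name_capi, demo_getcode, demo_lookup) is hypothetical, and the signature is deliberately simplified; it is not the real _PyUnicode_Name_CAPI layout, only an illustration of the func(state, ...) pattern described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-module state. The real unicodedata state would hold
   module data such as the UCD type; a counter is enough for a demo. */
typedef struct {
    int lookups;
} demo_state;

/* Hypothetical, simplified stand-in for _PyUnicode_Name_CAPI:
   the structure carries the state alongside the function pointer. */
typedef struct {
    void *state;
    int (*getcode)(void *state, const char *name, int namelen,
                   unsigned int *code);
} demo_name_capi;

/* Toy lookup that only knows "SPACE".
   Returns 1 on success and 0 on failure. */
static int
demo_getcode(void *state, const char *name, int namelen, unsigned int *code)
{
    demo_state *st = (demo_state *)state;
    st->lookups++;  /* the state is reachable without any global lookup */
    if (namelen == 5 && memcmp(name, "SPACE", 5) == 0) {
        *code = 0x20;
        return 1;
    }
    return 0;
}

/* A single global state for now, exposed only through the structure. */
static demo_state demo_global_state = {0};
static demo_name_capi demo_capi = {&demo_global_state, demo_getcode};

/* Caller side: hand the state stored in the structure back to getcode. */
static int
demo_lookup(const demo_name_capi *capi, const char *name, unsigned int *code)
{
    return capi->getcode(capi->state, name, (int)strlen(name), code);
}
```

With this shape, the callee never has to name a global variable: swapping the global state for a real per-module state later only changes what is stored in the state field, not the callers.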

@vstinner vstinner changed the title from "[WIP] bpo-1635741: Add a global module state to unicodedata" to "bpo-1635741: Add a global module state to unicodedata" Oct 15, 2020
@vstinner vstinner merged commit e6b8c52 into python:master Oct 15, 2020
@vstinner vstinner deleted the unicodedata_state branch October 15, 2020 14:22
xzy3 pushed a commit to xzy3/cpython that referenced this pull request Oct 18, 2020
adorilson pushed a commit to adorilson/cpython that referenced this pull request Mar 13, 2021