This proposal continues the work of PEP 489 and later PEPs, which aim to let C-API modules and classes behave more like their Python equivalents. I would appreciate any feedback. Thanks.
Abstract
In multi-phase init extensions, superclass objects are currently hard to access because access depends on per-module state, which affects common subclass checks such as Py*_Check(). In some cases they are unreachable.
This proposal adds a new Py_tp_token type slot ID that stores a key in heap types, so that a desired superclass can be found through the given type’s tp_mro or tp_bases slot.
Background
Superclasses are frequently referenced in slot methods (e.g. nb_add) for subclass checking. When implementing a multi-phase init module, a superclass heap type object is supposed to be stored in the module state, which is not always suitable for the check:
- Unsafe during module finalization.
  When a module is finalized, its state can be cleared by the GC before the associated objects and heap types are freed. Even if the module outlives the heap types, the types’ MRO and module reference can also be cleared (gh-115874). Currently, there is no safe way to get a particular superclass during this final phase.
- Possibly redundant, which can affect at least microbenchmarks.
module = PyType_GetModuleByDef(type, module_def); // 1st MRO walk
module_state = PyModule_GetState(module);
PyType_IsSubtype(type, module_state->My_Super_Type); // 2nd MRO walk
Proposal
Heap types will contain an additional pointer member that serves as a token, provided the module author guarantees that:
- The pointer outlives the class.
- It is “owned” by the module where the class lives, so it stays alive and won’t clash with other modules.
The new Py_tp_token type slot stores it, as shown below:
PyType_Slot foo_slots[] = {
{Py_tp_token, (pointer)},
...
};
PyType_Spec foo_spec = { ..., .slots = foo_slots};
PyType_Spec bar_spec = { ..., .slots = foo_slots};
- {Py_tp_token, NULL}:
  Equivalent to the absence of the slot.
- {Py_tp_token, foo_slots}:
  The token will be a pointer not to foo_slots but to the associated type spec. The spec can be used at least to identify the memory layout of the given type.
- {Py_tp_token, &pointee_in_the_module}:
  The spec’s address can be specified explicitly, which requires a forward declaration of the spec. As another example, an extension module that automatically wraps C++ classes could use the typeid operator to obtain a token.
After a call to one of the existing PyType_From*(spec, ...) functions, the token of the created type can be checked with PyType_GetSlot(type, Py_tp_token).
Specification
The PyHeapTypeObject struct will have a new member, the ht_token void pointer (NULL by default), which will not be inherited by subclasses.
The existing PyType_FromMetaclass(..., spec, ...) function will do the following, when the proposed type slot ID, Py_tp_token, is detected in spec->slots:
ht_token = spec if PyType_Slot.pfunc == spec->slots else PyType_Slot.pfunc
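The rule above can be sketched in plain C without the CPython headers. This is only an illustration under stated assumptions: MockSlot, MockSpec, MOCK_tp_token and resolve_token are hypothetical stand-ins for PyType_Slot, PyType_Spec, the proposed slot ID and the logic inside PyType_FromMetaclass, not the actual implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot ID; the real value would be assigned by CPython. */
#define MOCK_tp_token 1000

/* Simplified stand-ins for PyType_Slot and PyType_Spec. */
typedef struct { int slot; void *pfunc; } MockSlot;
typedef struct { const char *name; MockSlot *slots; } MockSpec;

/* Return the value that would end up in ht_token for this spec.
   The slot array ends with a {0, NULL} sentinel, as in PyType_Spec. */
static void *resolve_token(MockSpec *spec)
{
    assert(spec != NULL && spec->slots != NULL);
    for (MockSlot *s = spec->slots; s->slot != 0; s++) {
        if (s->slot == MOCK_tp_token) {
            /* A pointer to the slot array itself means "use the spec". */
            return (s->pfunc == (void *)spec->slots) ? (void *)spec : s->pfunc;
        }
    }
    return NULL;  /* slot absent: ht_token keeps its NULL default */
}

/* Two specs share one slot array, like foo_spec and bar_spec above. */
static MockSlot shared_slots[] = {
    {MOCK_tp_token, shared_slots},
    {0, NULL},
};
static MockSpec foo_spec = {"foo", shared_slots};
static MockSpec bar_spec = {"bar", shared_slots};

/* A spec whose token is an explicit pointer owned by the module. */
static int explicit_token;
static MockSlot explicit_slots[] = {
    {MOCK_tp_token, &explicit_token},
    {0, NULL},
};
static MockSpec baz_spec = {"baz", explicit_slots};
```

Under these assumptions, foo_spec and bar_spec resolve to different tokens even though they share a slot array, because each token is the address of its own spec.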
Helpers
No public function is planned to be added.
An alternative subclass check would be:
- Walk the subtype’s tp_mro from the end, or walk tp_bases recursively.
- Look for the token of the desired superclass.
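The MRO variant of these two steps might look roughly like the sketch below. It uses a hypothetical MockType struct in place of a real heap type (a real tp_mro is a tuple of type objects), and find_base_by_token is an illustrative name, not a proposed API.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a heap type: only the token and the MRO. */
typedef struct MockType {
    const char *name;
    void *token;             /* plays the role of ht_token */
    struct MockType **mro;   /* plays the role of tp_mro; mro[0] is the type itself */
    size_t mro_len;
} MockType;

/* Walk the MRO from the end and return the first type that carries
   `token`, or NULL if no type in the MRO has it. */
static MockType *find_base_by_token(MockType *type, void *token)
{
    assert(type != NULL && token != NULL);
    for (size_t i = type->mro_len; i-- > 0; ) {
        if (type->mro[i]->token == token) {
            return type->mro[i];
        }
    }
    return NULL;
}

/* Tokens are just addresses owned by the (mock) module. */
static int base_token, other_token;

/* Derived -> Base -> object; tentative definitions let the MRO arrays
   refer to the types before they are initialized. */
static MockType object_type, base_type, derived_type;
static MockType *object_mro[]  = {&object_type};
static MockType *base_mro[]    = {&base_type, &object_type};
static MockType *derived_mro[] = {&derived_type, &base_type, &object_type};
static MockType object_type  = {"object",  NULL,        object_mro,  1};
static MockType base_type    = {"Base",    &base_token, base_mro,    2};
/* The token is not inherited, so Derived has none of its own. */
static MockType derived_type = {"Derived", NULL,        derived_mro, 3};
```

A single walk both answers the subclass question (a non-NULL result) and yields the superclass object, with no module state involved.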
Backwards Compatibility
One new pointer is added to the PyHeapTypeObject struct:
ht_token member, which will not be documented.
One type slot ID is added:
Py_tp_token to set PyHeapTypeObject.ht_token, which will be documented together with the usage of a public helper function for subclass checking.
Alternative
Another workable approach would be:
PyType_Slot foo_slots[] = {
{Py_tp_token, Py_USE_SPEC}, // Py_USE_SPEC is NULL
...
};
- {Py_tp_token, NULL}:
  The token will be the pointer to the associated type spec rather than NULL.
- Absence of the slot:
  The token will not be stored in the heap type.
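For comparison, the alternative rule could be sketched as follows, again with hypothetical mock types rather than the real PyType_Slot and PyType_Spec. MOCK_USE_SPEC mirrors the Py_USE_SPEC name used above and is assumed to be NULL.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_tp_token 1000   /* hypothetical slot ID */
#define MOCK_USE_SPEC NULL   /* mirrors "Py_USE_SPEC is NULL" */

/* Simplified stand-ins for PyType_Slot and PyType_Spec. */
typedef struct { int slot; void *pfunc; } MockSlot;
typedef struct { const char *name; MockSlot *slots; } MockSpec;

/* Alternative rule: a NULL slot value selects the spec itself as the
   token; a missing slot leaves the token unset. */
static void *resolve_token_alt(MockSpec *spec)
{
    assert(spec != NULL && spec->slots != NULL);
    for (MockSlot *s = spec->slots; s->slot != 0; s++) {
        if (s->slot == MOCK_tp_token) {
            return (s->pfunc == MOCK_USE_SPEC) ? (void *)spec : s->pfunc;
        }
    }
    return NULL;  /* slot absent: no token is stored */
}

static MockSlot use_spec_slots[] = {
    {MOCK_tp_token, MOCK_USE_SPEC},
    {0, NULL},
};
static MockSlot no_token_slots[] = {
    {0, NULL},
};
static MockSpec foo_spec_alt = {"foo", use_spec_slots};
static MockSpec plain_spec   = {"plain", no_token_slots};
```

With this variant the slot array no longer has to refer to itself, at the cost of making an explicit NULL mean something different from an absent slot.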
Previous discussions