bpo-40137: Micro-optimize _PyType_GetModuleByDef()#25393
bpo-40137: Micro-optimize _PyType_GetModuleByDef()#25393vstinner wants to merge 1 commit intopython:masterfrom vstinner:type_getmodulebyref
Conversation
Honestly, the runtime check on PyModule_GetState() argument could replaced with an assertion. But I prefer to not change in as part of this PR. See https://bugs.python.org/issue37406 for my previous attempt: "Disable runtime checks in release mode". But _PyModule_GetState() remains useful to ensure that the code can be inlined, especially on macOS which doesn't use LTO: https://bugs.python.org/issue41181 |
Make _PyType_GetModuleByDef() 2.3 ns faster.
Benchmark: 43.2 ns +- 0.7 ns -> 40.9 ns +- 1.0 ns: 1.05x faster
./python -m pyperf timeit \
--duplicate=4096
-s "from functools import lru_cache; f = lru_cache(lambda: 42)" \
"f()"
Changes:
* _PyType_GetModuleByDef(): add fast-path for tp_mro[0] and use
_PyType_HasFeature().
* Add a new pycore_moduleobject.h internal C API header file.
* Add _PyModule_GetDef() and _PyModule_GetState() which can be
inlined without LTO and don't check invalid argument at runtime.
* Replace PyModule_GetState() with _PyModule_GetState() in _abc,
_array, _operator, _pickle, _queue, _random, _struct and os
extension modules. The _array, _queue and _struct extensions are
now built with Py_BUILD_CORE_MODULE macro defined.
|
This looks good to my eyes but I haven't deeply thought about what the slow path is doing and what cases it handling. So, this is a 95% sign-off. |
|
🤖 New build scheduled with the buildbot fleet by @rhettinger for commit 158c102 🤖 If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again. |
|
Hum. I'm not 100% sure about the assumptions that I made in my optimization. I plan to write first preparation changes to make the assumptions reliable :-) |
|
Also, PR #25492 makes this optimization less important. |
|
This is still important. I mitigated only one case. If this PR isn't applied, I think all the other cases should be reverted. We're making everyone pay a cost and it looks like there is currently zero offsetting benefit. It is not clear that these ever should have gone in the first place. |
|
I splitted this PR into 3 smaller PRs with a few minor fixes: https://bugs.python.org/issue40137#msg391547 The main change is that I added a runtime check to ensure that a type MRO cannot be empty. I close this PR. |
Make _PyType_GetModuleByDef() 2.3 ns faster.
Benchmark: 43.2 ns +- 0.7 ns -> 40.9 ns +- 1.0 ns: 1.05x faster
./python -m pyperf timeit
--duplicate=4096
-s "from functools import lru_cache; f = lru_cache(lambda: 42)"
"f()"
Changes:
_PyType_HasFeature().
inlined without LTO and don't check invalid argument at runtime.
_array, _operator, _pickle, _queue, _random, _struct and os
extension modules. The _array, _queue and _struct extensions are
now built with Py_BUILD_CORE_MODULE macro defined.
https://bugs.python.org/issue40137