bpo-38644: Add _PyObject_VectorcallTstate()#17052
bpo-38644: Add _PyObject_VectorcallTstate()#17052vstinner merged 1 commit intopython:masterfrom vstinner:call_tstate2
Conversation
|
@jdemeyer: Would it be possible to change _PyObject_Vectorcall() and _PyObject_MakeTpCall() to add a tstate parameter? Or is it better to add new functions? The functions are private, so we don't provide any backward compatibility warranty. |
|
Please benchmark this very carefully on Windows. One note from Mark was that there's a performance loss on Windows when passing more than 4 arguments in C. That the main reason why Vectorcall combines the "number of args" and "flags" into a single argument. |
|
tl; dr I don't see any significant performance difference when running microbenchmarks on Linux. Linux x86-64 ABI allows to pass up to six function parameters as registers: https://en.wikipedia.org/wiki/X86_calling_conventions#System_V_AMD64_ABI I ran a benchmark on Linux (Fedora 30) with CPU isolation ( EDIT: I compiled Python with PGO+LTO: I used all benchmarks called "bench*call.py" in my https://github.com/vstinner/pymicrobench project: collection of Python microbenchmarks. If I ignore differences smaller than 5%, all results are shown as "Not significant". They are micro benchmarks. IMHO on a microbenchmark, a significant difference should be at least 10% (-10% or +10%). Raw results: Script to run the benchmark: I moved all .json to ref/ (reference Python) or patch/ (patched Python). Script to compare results: |
|
That's not surprising. Are you planning to run Windows benchmarks as well? |
I'm now trying to redo the same benchmark on Windows. |
|
It's hard to me to understand microbenchmark results on Windows because I don't know how to minimize the std dev. I installed psutil, so pyperf calls proc.nice(psutil.REALTIME_PRIORITY_CLASS) to set the benchmark process to the highest priority. But all differences are smaller than 10%. Some microbenchmarks are faster, some are slower. But it may come from the benchmark "noise". Benchmarks on Windows. I used the following commands to build Python: I cloned https://github.com/vstinner/pymicrobench To create the venv, I used: I downloaded https://files.pythonhosted.org/packages/03/9a/95c4b3d0424426e5fd94b5302ff74cea44d5d4f53466e1228ac8e73e14b4/psutil-5.6.5.tar.gz and extracted using "python -m tarfile -e psutil-5.6.5.tar.gz". I hacked its setup.py to add Then I ran benchmarks using: Comparison ignoring differences smaller than 5%: Comparison: The problem is that I don't know how to run benchmar ks on Windows. For example, on \bench_fastcall_partial.json, the std dev is between +- 0.8 ns and +- 9 ns: When I read |
* Add _PyObject_VectorcallTstate() function: similar to _PyObject_Vectorcall(), but with tstate parameter * Add tstate parameter to _PyObject_MakeTpCall()
|
I removed _PyObject_MakeTpCallTstate(): I modified _PyObject_MakeTpCall() to add a tstate parameter instead. You should not call _PyObject_MakeTpCall() directly. But I chose to leave _PyObject_Vectorcall() unchanged, and add _PyObject_VectorcallTstate(). |
|
I didn't notice any significant performance overhead of this change, nor speedup. The balance seems to be null. This change is more about correctness rather than performance. See https://bugs.python.org/issue36710 for the rationale. |
|
I'm not really keen on this change, especially the change to What would be helpful from a performance point of view is to fix up |
|
@markshannon: "I'm not really keen on this change, (...)" I started a thread on python-dev: https://mail.python.org/archives/list/[email protected]/thread/PQBGECVGVYFTVDLBYURLCXA3T7IPEHHO/ I invite you to participate to the discussion there ;-) |
* Add _PyObject_VectorcallTstate() function: similar to _PyObject_Vectorcall(), but with tstate parameter * Add tstate parameter to _PyObject_MakeTpCall()
* Add _PyObject_VectorcallTstate() function: similar to _PyObject_Vectorcall(), but with tstate parameter * Add tstate parameter to _PyObject_MakeTpCall()
_PyObject_Vectorcall() with tstate parameter
https://bugs.python.org/issue38644