-
-
Notifications
You must be signed in to change notification settings - Fork 33.6k
gh-139103: fix free-threading dataclass.__init__ perf issue
#141596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi @colesbury , Please help take a review and correct me if I misunderstand anything about this case. 😊 Wish you a good day. Thanks very much! Best Regards, |
65887ba to
aaaec25
Compare
ZeroIntensity
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also run the pyperformance benchmarks on this PR? Deferred reference counting can damage single-threaded code.
| @@ -0,0 +1 @@ | |||
| Make ``type_setattro`` use defer refcount in free-threading for functions without defer refcount. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is too technical for a news entry. Instead, can we say something like "Improve multithreaded scaling of dataclasses on the free-threaded build"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ZeroIntensity ,
Thanks very much for your suggestion!
I will update the news entry as suggested.
Best Regards,
Edward
Objects/typeobject.c
Outdated
| #ifdef Py_GIL_DISABLED | ||
| if (value != NULL && PyFunction_Check(value)) { | ||
| if (!_PyObject_HasDeferredRefcount(value)) { | ||
| BEGIN_TYPE_LOCK(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to hold the type lock here, since PyUnstable_Object_EnableDeferredRefcount is thread-safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ZeroIntensity ,
Thanks very much for pointing this out. ❤
I go through the code of PyUnstable_Object_EnableDeferredRefcount and find that it's thread-safe.
Lines 2734 to 2739 in 3d14805
| if (_Py_atomic_compare_exchange_uint8(&op->ob_gc_bits, &bits, bits | _PyGC_BITS_DEFERRED) == 0) | |
| { | |
| // Someone beat us to it! | |
| return 0; | |
| } | |
| _Py_atomic_add_ssize(&op->ob_ref_shared, _Py_REF_SHARED(_Py_REF_DEFERRED, 0)); |
It uses compare-and-set to guard the _PyGC_BITS_DEFERRED bit flag before updating the payload of the deferred refcount. So only one thread could make this change.
I removed the TYPE_LOCK and the double check of the _PyObject_HasDeferredRefcount. They are unnecessary.
Best Regards,
Edward
Objects/typeobject.c
Outdated
| if (value != NULL && PyFunction_Check(value)) { | ||
| if (!_PyObject_HasDeferredRefcount(value)) { | ||
| BEGIN_TYPE_LOCK(); | ||
| if (!_PyObject_HasDeferredRefcount(value)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this checked twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ZeroIntensity ,
I tried to double-check the flag to reduce the conflict of entering the lock.
But as you mentioned above, we don't need a lock for PyUnstable_Object_EnableDeferredRefcount .
This double check has been removed with the lock.
An atomic action on the type flag is very efficient and is safe for our scenario.
Thanks very much for your suggestion!
Best Regards,
Edward
Hi @ZeroIntensity , Got it! 😊 Here is my perf report on this PR and main cc6b62a . https://gist.github.com/LindaSummer/8d3420b10bf591b0e1d76336787e0a49 Best Regards, |
colesbury
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM w/ some minor formatting adjustments
|
Thanks @LindaSummer for the PR, and @colesbury for merging it 🌮🎉.. I'm working now to backport this PR to: 3.14. |
|
Sorry, @LindaSummer and @colesbury, I could not cleanly backport this to |
…issue (pythongh-141596) The dataclasses `__init__` function is generated dynamically by a call to `exec()` and so doesn't have deferred reference counting enabled. Enable deferred reference counting on functions when assigned as an attribute to type objects to avoid reference count contention when creating dataclass instances. (cherry picked from commit ce79154) Co-authored-by: Edward Xu <[email protected]>
|
GH-141750 is a backport of this pull request to the 3.14 branch. |
|
Thanks @LindaSummer! |
…h-141596) (gh-141750) The dataclasses `__init__` function is generated dynamically by a call to `exec()` and so doesn't have deferred reference counting enabled. Enable deferred reference counting on functions when assigned as an attribute to type objects to avoid reference count contention when creating dataclass instances. (cherry picked from commit ce79154) Co-authored-by: Edward Xu <[email protected]>
…ythongh-141596) The dataclasses `__init__` function is generated dynamically by a call to `exec()` and so doesn't have deferred reference counting enabled. Enable deferred reference counting on functions when assigned as an attribute to type objects to avoid reference count contention when creating dataclass instances.
…ythongh-141596) The dataclasses `__init__` function is generated dynamically by a call to `exec()` and so doesn't have deferred reference counting enabled. Enable deferred reference counting on functions when assigned as an attribute to type objects to avoid reference count contention when creating dataclass instances.
Issue
#139103
Proposed Changes
type_setattro.Comment
Test Case
Here is my original test case for this problem.
Here is the output for this testcase.
configure command:
./configure --disable-gil "CC=clang"on 4ceb077 .Root Cause
In
test_3, I use_testcapi.pyobject_enable_deferred_refcountto make the generated__init__with deferred refcount, and the problem is fixed.Final Action
So I use the
PyUnstable_Object_EnableDeferredRefcountprovided in_testcapi.pyobject_enable_deferred_refcountto make the function object a deferred refcount one.cpython/Modules/_testcapi/object.c
Lines 127 to 132 in ed81baf
Here is the output of my test script in this PR.
Here is the result of the new benchmark case in the current PR.
We could see that the instantiate_dataclass is 5.1x faster.
Please correct me if I misunderstand anything about this case.
Thanks very much! 😊
Best Regards,
Edward Xu