Contributor

@LindaSummer LindaSummer commented Nov 15, 2025

Issue

#139103

Proposed Changes

  • Enable deferred reference counting on function objects assigned through type_setattro (free-threaded build only).
  • Add a benchmark test case.

Comment

Test Case

Here is my original test case for this problem.

import _testcapi
from threading import Thread
from time import time

def test_1():
    class Foo:
        def __init__(self, x):
            self.x = x

    niter = 5 * 1000 * 1000

    def benchmark(n):
        for i in range(n):
            Foo(x=1)

    for nth in (1, 4):
        t0 = time()
        threads = [Thread(target=benchmark, args=(niter,)) for _ in range(nth)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        print(f"{nth=} {(time() - t0) / nth}")
        
def test_2():
    class Foo2:
        def __init__(self, x):
            pass

    _Foo2_x = int

    # Build __init__ with exec(), mimicking how dataclasses generates it.
    create_str = """def create_init(_Foo2_x,):
        def __init__(self, x: _Foo2_x):
            self.x = x
        return (__init__,)
    """
    ns = {}
    exec(create_str, globals(), ns)
    # Pass the annotation type itself rather than a dict of all locals.
    fn = ns['create_init'](_Foo2_x)
    setattr(Foo2, '__init__', fn[0])
    niter = 5 * 1000 * 1000
    def benchmark(n):
        for i in range(n):
            Foo2(x=1)
    
    for nth in (1, 4):
        t0 = time()
        threads = [Thread(target=benchmark, args=(niter,)) for _ in range(nth)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        print(f"{nth=} {(time() - t0) / nth}")
        
def test_3():
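    class Foo3:
        def __init__(self, x):
            pass

    _Foo3_x = int

    # Build __init__ with exec(), mimicking how dataclasses generates it.
    create_str = """def create_init(_Foo3_x,):
        def __init__(self, x: _Foo3_x):
            self.x = x
        return (__init__,)
    """
    ns = {}
    exec(create_str, globals(), ns)
    # Pass the annotation type itself rather than a dict of all locals.
    fn = ns['create_init'](_Foo3_x)
    setattr(Foo3, '__init__', fn[0])
    # Only difference from test_2: enable deferred refcount on the new __init__.
    _testcapi.pyobject_enable_deferred_refcount(Foo3.__init__)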
    niter = 5 * 1000 * 1000
    def benchmark(n):
        for i in range(n):
            Foo3(x=1)
    
    for nth in (1, 4):
        t0 = time()
        threads = [Thread(target=benchmark, args=(niter,)) for _ in range(nth)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        print(f"{nth=} {(time() - t0) / nth}")
        
if __name__ == "__main__":
    print("------test_1-------")
    test_1()
    print("------test_2-------")
    test_2()
    print("------test_3-------")
    test_3()

Here is the output of this test case.
Configure command: ./configure --disable-gil "CC=clang", built at commit 4ceb077.

------test_1-------
nth=1 2.6803863048553467
nth=4 0.7154628038406372
------test_2-------
nth=1 2.715298891067505
nth=4 1.4010276794433594
------test_3-------
nth=1 2.5294859409332275
nth=4 0.6726199388504028
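
One way to read these numbers: the script prints elapsed time divided by the thread count, and every thread runs the full niter iterations, so with perfect scaling the nth=4 value would be one quarter of the nth=1 value. The sketch below computes that ratio for the three runs above. The helper name scaling_efficiency is made up for this illustration and is not part of the PR.

```python
def scaling_efficiency(t1, tn, nth):
    """Fraction of ideal scaling; 1.0 means nth threads add no wall time.

    t1 and tn are the values printed by the script, i.e. elapsed / nth.
    """
    return t1 / (nth * tn)


# (nth=1, nth=4) values copied from the output above.
results = {
    "test_1": (2.6803863048553467, 0.7154628038406372),
    "test_2": (2.715298891067505, 1.4010276794433594),
    "test_3": (2.5294859409332275, 0.6726199388504028),
}

efficiency = {
    name: scaling_efficiency(t1, t4, 4) for name, (t1, t4) in results.items()
}
for name, value in efficiency.items():
    print(f"{name}: {value:.2f}")
```

By this measure test_1 and test_3 come out near 0.94, while test_2 is near 0.48, so the exec()-generated __init__ roughly halves the scaling.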

Root Cause

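In test_3, I call _testcapi.pyobject_enable_deferred_refcount on the generated __init__ to enable deferred reference counting, and the scaling problem disappears: the nth=4 result drops from about 1.40 to about 0.67. This points to reference count contention on the exec()-generated __init__, which does not have deferred reference counting enabled, as the root cause.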
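
As a rough way to probe this from Python, the sketch below checks that a dataclass __init__ is compiled from a string rather than from a source file, then tries the same _testcapi helper on it. It assumes _testcapi may be missing or may lack the helper (it is a CPython-internal test module), in which case the hypothetical wrapper try_enable_deferred_refcount returns None.

```python
from dataclasses import dataclass

try:
    import _testcapi  # CPython-internal test module; absent in many installs
except ImportError:
    _testcapi = None


def try_enable_deferred_refcount(func):
    """Return the helper's int result, or None if the helper is unavailable."""
    enable = getattr(_testcapi, "pyobject_enable_deferred_refcount", None)
    if enable is None:
        return None
    return enable(func)


@dataclass
class Point:
    x: int


# Code compiled from a string carries a pseudo filename such as "<string>".
init_filename = Point.__init__.__code__.co_filename
generated_from_string = init_filename.startswith("<")

result = try_enable_deferred_refcount(Point.__init__)
print(f"{init_filename=} {generated_from_string=} {result=}")
```

A result of 0 means deferred reference counting was not newly enabled, either because the build does not support it or because it was already on for that object.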

Final Action

So in type_setattro I call PyUnstable_Object_EnableDeferredRefcount, the same function that _testcapi.pyobject_enable_deferred_refcount wraps, to enable deferred reference counting on the function object.

static PyObject *
pyobject_enable_deferred_refcount(PyObject *self, PyObject *obj)
{
    int result = PyUnstable_Object_EnableDeferredRefcount(obj);
    return PyLong_FromLong(result);
}

Here is the output of my test script in this PR.

------test_1-------
nth=1 2.489182233810425
nth=4 0.6642101407051086
------test_2-------
nth=1 2.475900173187256
nth=4 0.6655464172363281
------test_3-------
nth=1 2.4749510288238525
nth=4 0.6595655679702759

Here is the result of the new benchmark case in the current PR.

./python ./Tools/ftscalingbench/ftscalingbench.py -t 8
Running benchmarks with 8 threads
object_cfunction           5.4x faster
cmodule_function           5.2x faster
object_lookup_special      5.7x faster
context_manager            5.6x faster
mult_constant              5.2x faster
generator                  5.5x faster
pymethod                   4.9x faster
pyfunction                 5.7x faster
module_function            5.4x faster
load_string_const          5.8x faster
load_tuple_const           5.6x faster
create_pyobject            5.7x faster
create_closure             5.7x faster
create_dict                5.4x faster
thread_local_read          5.4x faster
method_caller              6.0x faster
instantiate_dataclass      5.1x faster

We can see that instantiate_dataclass is now 5.1x faster with 8 threads, in line with the other benchmarks.
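
For readers who want to reproduce a number of this kind outside the benchmark tool, here is a minimal sketch of a scaling measurement for dataclass instantiation. It is not the code added to Tools/ftscalingbench in this PR; the class Point, the functions work, timed_run and speedup, and the iteration counts are all assumptions made for illustration.

```python
from dataclasses import dataclass
from threading import Thread
from time import perf_counter


@dataclass
class Point:
    x: int
    y: int
    z: int


def work(n):
    # Each iteration calls the dataclass-generated __init__.
    for _ in range(n):
        Point(x=1, y=2, z=3)


def timed_run(nthreads, n):
    """Wall-clock seconds for nthreads threads, each doing n instantiations."""
    threads = [Thread(target=work, args=(n,)) for _ in range(nthreads)]
    start = perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return perf_counter() - start


def speedup(nthreads, n=200_000):
    """Throughput with nthreads threads relative to a single thread."""
    base = timed_run(1, n)
    multi = timed_run(nthreads, n)
    return nthreads * base / multi


if __name__ == "__main__":
    print(f"dataclass instantiation  {speedup(4):.1f}x faster")
```

On a build with the GIL enabled the value should stay around 1x or below, since the threads cannot run Python code in parallel.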

Please correct me if I have misunderstood anything about this case.

Thanks very much! 😊

Best Regards,
Edward Xu


python-cla-bot bot commented Nov 15, 2025

All commit authors signed the Contributor License Agreement.


Contributor Author

LindaSummer commented Nov 15, 2025

Hi @colesbury ,

Could you please review this PR and correct me if I have misunderstood anything about this case? 😊

Wish you a good day.

Thanks very much!

Best Regards,
Edward

Member

@ZeroIntensity ZeroIntensity left a comment

Could you also run the pyperformance benchmarks on this PR? Deferred reference counting can hurt the performance of single-threaded code.

@@ -0,0 +1 @@
Make ``type_setattro`` use defer refcount in free-threading for functions without defer refcount.
Member

This is too technical for a news entry. Instead, can we say something like "Improve multithreaded scaling of dataclasses on the free-threaded build"?

Contributor Author

Hi @ZeroIntensity ,

Thanks very much for your suggestion!
I will update the news entry as suggested.

Best Regards,
Edward

#ifdef Py_GIL_DISABLED
    if (value != NULL && PyFunction_Check(value)) {
        if (!_PyObject_HasDeferredRefcount(value)) {
            BEGIN_TYPE_LOCK();
Member

We don't need to hold the type lock here, since PyUnstable_Object_EnableDeferredRefcount is thread-safe.

Contributor Author

@LindaSummer LindaSummer Nov 17, 2025

Hi @ZeroIntensity ,

Thanks very much for pointing this out. ❤

I went through the code of PyUnstable_Object_EnableDeferredRefcount and confirmed that it is thread-safe.

cpython/Objects/object.c

Lines 2734 to 2739 in 3d14805

if (_Py_atomic_compare_exchange_uint8(&op->ob_gc_bits, &bits, bits | _PyGC_BITS_DEFERRED) == 0)
{
    // Someone beat us to it!
    return 0;
}
_Py_atomic_add_ssize(&op->ob_ref_shared, _Py_REF_SHARED(_Py_REF_DEFERRED, 0));

It uses an atomic compare-and-exchange to set the _PyGC_BITS_DEFERRED flag before updating the deferred refcount in ob_ref_shared, so only one thread can make this change.

I removed the type lock and the double check of _PyObject_HasDeferredRefcount, since both are unnecessary.
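
To illustrate why the compare-and-exchange makes an outer lock redundant, here is a toy Python model of the pattern. Python has no user-level atomic compare-and-exchange, so the AtomicBits class below emulates one with an internal lock; that class, the DEFERRED bit value, and enable_deferred are all hypothetical stand-ins for this sketch, not CPython's implementation.

```python
import threading


class AtomicBits:
    """Toy stand-in for an atomic byte; a lock emulates hardware atomicity."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_exchange(self, expected, desired):
        # Store desired only if the current value still equals expected.
        with self._lock:
            if self._value != expected:
                return False
            self._value = desired
            return True


DEFERRED = 0x40  # hypothetical bit value, chosen only for this sketch


def enable_deferred(bits, winners):
    current = bits.load()
    if current & DEFERRED:
        return 0  # already enabled
    if not bits.compare_exchange(current, current | DEFERRED):
        return 0  # another thread changed the bits first
    # Follow-up work (the refcount update in the real code) runs exactly once.
    winners.append(threading.get_ident())
    return 1


bits = AtomicBits()
winners = []
threads = [
    threading.Thread(target=enable_deferred, args=(bits, winners))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"threads that performed the update: {len(winners)}")
```

However the eight threads interleave, only one compare_exchange from the unset state can succeed, so the follow-up update needs no additional lock from the caller.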

Best Regards,
Edward

    if (value != NULL && PyFunction_Check(value)) {
        if (!_PyObject_HasDeferredRefcount(value)) {
            BEGIN_TYPE_LOCK();
            if (!_PyObject_HasDeferredRefcount(value)) {
Member

Why is this checked twice?

Contributor Author

Hi @ZeroIntensity ,

I added the second check to reduce contention on the lock.
But as you mentioned above, we don't need a lock for PyUnstable_Object_EnableDeferredRefcount.
The double check has been removed together with the lock.

An atomic operation on the object's GC bits is cheap and is safe for our scenario.

Thanks very much for your suggestion!

Best Regards,
Edward

@colesbury colesbury self-requested a review November 17, 2025 14:00
Contributor Author

LindaSummer commented Nov 17, 2025

Could you also run the pyperformance benchmarks on this PR? Deferred reference counting can hurt the performance of single-threaded code.

Hi @ZeroIntensity ,

Got it! 😊
I will run the pyperformance benchmarks and post the results for both the main branch and this PR.

Here is my performance report comparing this PR with main at cc6b62a.

https://gist.github.com/LindaSummer/8d3420b10bf591b0e1d76336787e0a49

Best Regards,
Edward

Contributor

@colesbury colesbury left a comment

LGTM w/ some minor formatting adjustments

@colesbury colesbury enabled auto-merge (squash) November 18, 2025 23:10
@colesbury colesbury merged commit ce79154 into python:main Nov 19, 2025
46 checks passed
@colesbury colesbury added the needs backport to 3.14 bugs and security fixes label Nov 19, 2025
@miss-islington-app

Thanks @LindaSummer for the PR, and @colesbury for merging it 🌮🎉.. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖

@miss-islington-app

Sorry, @LindaSummer and @colesbury, I could not cleanly backport this to 3.14 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker ce791541769a41beabec0f515cd62e504d46ff1c 3.14

colesbury pushed a commit to colesbury/cpython that referenced this pull request Nov 19, 2025
…issue (pythongh-141596)

The dataclasses `__init__` function is generated dynamically by a call to `exec()` and so doesn't have deferred reference counting enabled. Enable deferred reference counting on functions when assigned as an attribute to type objects to avoid reference count contention when creating dataclass instances.
(cherry picked from commit ce79154)

Co-authored-by: Edward Xu <[email protected]>

bedevere-app bot commented Nov 19, 2025

GH-141750 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.14 bugs and security fixes label Nov 19, 2025
@colesbury
Contributor

Thanks @LindaSummer!

colesbury added a commit that referenced this pull request Nov 19, 2025
…h-141596) (gh-141750)

The dataclasses `__init__` function is generated dynamically by a call to `exec()` and so doesn't have deferred reference counting enabled. Enable deferred reference counting on functions when assigned as an attribute to type objects to avoid reference count contention when creating dataclass instances.
(cherry picked from commit ce79154)

Co-authored-by: Edward Xu <[email protected]>