gh-139871: Optimize bytearray unique bytes iconcat #141862

cmaloney · 2025-11-22T23:28:46Z

If the bytearray is empty and a uniquely referenced bytes object is being concatenated (ex. one just received from read), just use its storage as the backing for the bytearray rather than copying it. The bigger the bytes the bigger the saving.

build_bytes_unique: Mean +- std dev: [base] 383 ns +- 11 ns -> [iconcat_opt] 342 ns +- 5 ns: 1.12x faster
build_bytearray: Mean +- std dev: [base] 496 ns +- 8 ns -> [iconcat_opt] 471 ns +- 13 ns: 1.05x faster
encode: Mean +- std dev: [base] 482 us +- 2 us -> [iconcat_opt] 13.8 us +- 0.1 us: 34.78x faster

Benchmark hidden because not significant (1): build_bytes

Geometric mean: 2.53x faster

note: Performance of build_bytes is expected to stay constant.

import pyperf

runner = pyperf.Runner()

count1 = 1_000
count2 = 100
count3 = 10_000

CHUNK_A = b'a' * count1
CHUNK_B = b'b' * count2
CHUNK_C = b'c' * count3

def build_bytes():
    # Bytes not uniquely referenced.
    ba = bytearray()
    ba += CHUNK_A
    ba += CHUNK_B
    ba += CHUNK_C

def build_bytes_unique():
    ba = bytearray()
    # Repeat inline results in uniquely referenced bytes.
    ba += b'a' * count1
    ba += b'b' * count2
    ba += b'c' * count3

def build_bytearray():
    # Each bytearray appended is uniquely referenced.
    ba = bytearray()
    ba += bytearray(CHUNK_A)
    ba += bytearray(CHUNK_B)
    ba += bytearray(CHUNK_C)

runner.bench_func('build_bytes', build_bytes)
runner.bench_func('build_bytes_unique', build_bytes_unique)
runner.bench_func('build_bytearray', build_bytearray)
runner.timeit(
    name="encode",
    setup="a = 'a' * 1_000_000",
    stmt="bytearray(a, encoding='utf8')")

From my understanding of reference counting I think this is safe to do for iconcat (and would be safe to do for ba[:] = b'\0' * 1000 discuss topic). The briefly refcount 2 isn't ideal but I think good enough for the performance delta. I'm hoping if I can ship an implementation of gh-87613 can do the same optimization for bytearray(b'\0' * 4096).

If the iconcat refcount 2 part isn't good, can tweak to keep the enecode + bytearray performance improvement without changing iconcat generally.

cc: @vstinner , @encukou

Issue: Add .take_bytes([n]) a zero-copy path from bytearray to bytes #139871

If the bytearray is empty and a uniquely referenced bytes object is being concatenated (ex. one just recieved from read), just use its storage as the backing for the bytearray rather than copying it. build_bytes_unique: Mean +- std dev: [base] 383 ns +- 11 ns -> [iconcat_opt] 342 ns +- 5 ns: 1.12x faster build_bytearray: Mean +- std dev: [base] 496 ns +- 8 ns -> [iconcat_opt] 471 ns +- 13 ns: 1.05x faster encode: Mean +- std dev: [base] 482 us +- 2 us -> [iconcat_opt] 13.8 us +- 0.1 us: 34.78x faster Benchmark hidden because not significant (1): build_bytes Geometric mean: 2.53x faster note: Performance of build_bytes is expected to stay constant. ```python import pyperf runner = pyperf.Runner() count1 = 1_000 count2 = 100 count3 = 10_000 CHUNK_A = b'a' * count1 CHUNK_B = b'b' * count2 CHUNK_C = b'c' * count3 def build_bytes(): # Bytes not uniquely referenced. ba = bytearray() ba += CHUNK_A ba += CHUNK_B ba += CHUNK_C def build_bytes_unique(): ba = bytearray() # Repeat inline results in uniquely referenced bytes. ba += b'a' * count1 ba += b'b' * count2 ba += b'c' * count3 def build_bytearray(): # Each bytearray appended is uniquely referenced. ba = bytearray() ba += bytearray(CHUNK_A) ba += bytearray(CHUNK_B) ba += bytearray(CHUNK_C) runner.bench_func('build_bytes', build_bytes) runner.bench_func('build_bytes_unique', build_bytes_unique) runner.bench_func('build_bytearray', build_bytearray) runner.timeit( name="encode", setup="a = 'a' * 1_000_000", stmt="bytearray(a, encoding='utf8')") ```

vstinner · 2025-11-24T15:53:52Z

Objects/bytearrayobject.c


+    // optimization: Avoid copying the bytes coming in when possible.
+    if (self->ob_alloc == 0 && _PyObject_IsUniquelyReferenced(other)) {
+        // note: ob_bytes_object is always the immortal empty bytes here.


Can you replace the comment with an assertion?

vstinner · 2025-11-24T15:54:54Z

Objects/bytearrayobject.c

+    // optimization: Avoid copying the bytes coming in when possible.
+    if (self->ob_alloc == 0 && _PyObject_IsUniquelyReferenced(other)) {
+        // note: ob_bytes_object is always the immortal empty bytes here.
+        if (!_canresize(self)) {


Why not also run this check when (self->ob_alloc == 0 && _PyObject_IsUniquelyReferenced(other)) is false? So move it before the outer if?

For the general case it's currently checked by bytearray_resize_lock_held.

Checking here technically changes the order of operations where an error can happen (the main codepath the buffer protocol always is called and can error before _canresize). Not certain how important that is to keep in this case. Keeping order would mean moving _canresize check inside the type checks here.

Objects/bytearrayobject.c

Co-authored-by: Victor Stinner <[email protected]>

Objects/bytearrayobject.c

encukou · 2025-11-25T15:35:43Z

Here's a test that should pass, but doesn't:

    // make some bytes
    PyObject *bytes = PyBytes_FromString("aaB");
    assert(bytes);
    // make an empty bytearray
    PyObject *ba = PyByteArray_FromStringAndSize("", 0);
    assert(ba);
    // append bytes to bytearray (in place, getting a new reference)
    PyObject *new_ba = PySequence_InPlaceConcat(ba, bytes);
    assert(new_ba == ba);
    Py_DECREF(new_ba);
    // pop from bytearray
    Py_DECREF(PyObject_CallMethod(ba, "pop", ""));

    // check that our bytes was not modified
    assert(memcmp(PyBytes_AsString(bytes), "aaB", 3) == 0);

    Py_DECREF(bytes);
    Py_DECREF(ba);

AFAIK, you need to use PyUnstable_Object_IsUniqueReferencedTemporary.

colesbury · 2025-11-25T17:07:12Z

Objects/bytearrayobject.c

+            PyObject *taken = PyObject_CallMethodNoArgs(other,
+                                                        &_Py_ID(take_bytes));


This looks unsafe to me. If you call a method, you may invalidate the assumptions you verified earlier

Maybe call bytearray_take_bytes_impl() directly to reduce the risk of side effects? And you can check again _PyObject_IsUniquelyReferenced() in an assertion.

cmaloney · 2025-11-29T00:59:32Z

(Iterating on this locally; should have updates next week)

Co-authored-by: Victor Stinner <[email protected]>

cmaloney · 2025-12-03T21:54:36Z

@encukou : When I try using PyUnstable_Object_IsUniqueReferencedTemporary then use ./python -m test test_bytes -W I get: python: ../cpython/Include/internal/pycore_stackref.h:695: PyObject *PyStackRef_AsPyObjectBorrow(_PyStackRef): Assertion !PyStackRef_IsTaggedInt(ref)' failed.`. Not sure exactly why.

I was modeling this off of PyBytes_Concat which does a _PyObject_IsUniquelyReferenced but looking more closely that isn't the implementation behind the sequence operations (bytes_concat) nor does it modify the right hand side in any way.

I don't see a way to do the generalized optimization currently; still think there should be a way just not sure the path and suspect it's a number of steps. I'll probably close this PR and open one for just the encoding case (bytearray('test', encoding='utf-8')) and put more general byearray iconcat, extend, and construction optimization in the back of my head for the moment.

colesbury · 2025-12-03T21:59:51Z

That seems like a bug in PyUnstable_Object_IsUniqueReferencedTemporary. We should skip over tagged ints when checking variables on the stack:

cpython/Objects/object.c

Lines 2758 to 2767 in c0c6514

    
               _PyStackRef *base = _PyFrame_Stackbase(frame); 
        
               _PyStackRef *stackpointer = frame->stackpointer; 
        
               while (stackpointer > base) { 
        
                   stackpointer--; 
        
                   if (op == PyStackRef_AsPyObjectBorrow(*stackpointer)) { 
        
                       return PyStackRef_IsHeapSafe(*stackpointer); 
        
                   } 
        
               } 
        
               return 0; 
        
           }

cmaloney · 2025-12-03T22:57:47Z

Created GH-142243 doing just the bytearray('test', encoding='utf-8') portion of this. My time is becoming more limited shortly (starting a new job) so won't have quite as much time to explore new to me corners of CPython. Definitely happy to revisit more generally (there's a lot of optimizations both here and in bytes), maybe at PyConUS this year.

cmaloney · 2025-12-03T23:33:50Z

Tested with if (PyStackRef_IsTaggedInt(*stackpointer)) { continue; } in PyUnstable_Object_IsUniqueReferencedTemporary and that seems to work; that change I think needs a separate issue + news + test. Added to my backlog but not sure when I'll be able to get to.

bedevere-app bot added the awaiting review label Nov 22, 2025

bedevere-app bot mentioned this pull request Nov 22, 2025

Add .take_bytes([n]) a zero-copy path from bytearray to bytes #139871

Closed

cmaloney added the skip news label Nov 22, 2025

cmaloney mentioned this pull request Nov 23, 2025

gh-141863: Use bytearray.take_bytes in asyncio.streams #141864

Merged

vstinner reviewed Nov 24, 2025

View reviewed changes

cmaloney and others added 2 commits November 24, 2025 11:20

Apply suggestions from code review

08364c1

Co-authored-by: Victor Stinner <[email protected]>

Switch from comment to assertion

70c15bf

vstinner reviewed Nov 25, 2025

View reviewed changes

Objects/bytearrayobject.c Outdated Show resolved Hide resolved

colesbury reviewed Nov 25, 2025

View reviewed changes

cmaloney and others added 3 commits December 3, 2025 13:16

Call take_bytes directly

614e743

Update Objects/bytearrayobject.c

c5e9da6

Co-authored-by: Victor Stinner <[email protected]>

Borrow -> Borrowed

e6fd741

cmaloney mentioned this pull request Dec 11, 2025

gh-139871: Optimize bytearray construction with encoding #142243

Open

colesbury mentioned this pull request Dec 11, 2025

PyUnstable_Object_IsUniqueReferencedTemporary doesn't handle tagged ints properly #142589

Closed

		PyObject *taken = PyObject_CallMethodNoArgs(other,
		&_Py_ID(take_bytes));

Uh oh!

gh-139871: Optimize bytearray unique bytes iconcat #141862

Are you sure you want to change the base?

gh-139871: Optimize bytearray unique bytes iconcat #141862

Uh oh!

Conversation

cmaloney commented Nov 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vstinner Nov 24, 2025

Choose a reason for hiding this comment

Uh oh!

vstinner Nov 24, 2025

Choose a reason for hiding this comment

Uh oh!

cmaloney Nov 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

encukou commented Nov 25, 2025

Uh oh!

colesbury Nov 25, 2025

Choose a reason for hiding this comment

Uh oh!

vstinner Nov 26, 2025

Choose a reason for hiding this comment

Uh oh!

cmaloney commented Nov 29, 2025

Uh oh!

cmaloney commented Dec 3, 2025

Uh oh!

colesbury commented Dec 3, 2025

Uh oh!

cmaloney commented Dec 3, 2025

Uh oh!

cmaloney commented Dec 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

cmaloney commented Nov 22, 2025 •

edited

Loading

cmaloney Nov 24, 2025 •

edited

Loading