Skip to content

Conversation

@cmaloney
Copy link
Contributor

@cmaloney cmaloney commented Nov 22, 2025

If the bytearray is empty and a uniquely referenced bytes object is being concatenated (ex. one just received from read), just use its storage as the backing for the bytearray rather than copying it. The bigger the bytes the bigger the saving.

build_bytes_unique: Mean +- std dev: [base] 383 ns +- 11 ns -> [iconcat_opt] 342 ns +- 5 ns: 1.12x faster
build_bytearray: Mean +- std dev: [base] 496 ns +- 8 ns -> [iconcat_opt] 471 ns +- 13 ns: 1.05x faster
encode: Mean +- std dev: [base] 482 us +- 2 us -> [iconcat_opt] 13.8 us +- 0.1 us: 34.78x faster

Benchmark hidden because not significant (1): build_bytes

Geometric mean: 2.53x faster

note: Performance of build_bytes is expected to stay constant.

import pyperf

runner = pyperf.Runner()

count1 = 1_000
count2 = 100
count3 = 10_000

CHUNK_A = b'a' * count1
CHUNK_B = b'b' * count2
CHUNK_C = b'c' * count3

def build_bytes():
    # Bytes not uniquely referenced.
    ba = bytearray()
    ba += CHUNK_A
    ba += CHUNK_B
    ba += CHUNK_C

def build_bytes_unique():
    ba = bytearray()
    # Repeat inline results in uniquely referenced bytes.
    ba += b'a' * count1
    ba += b'b' * count2
    ba += b'c' * count3

def build_bytearray():
    # Each bytearray appended is uniquely referenced.
    ba = bytearray()
    ba += bytearray(CHUNK_A)
    ba += bytearray(CHUNK_B)
    ba += bytearray(CHUNK_C)

runner.bench_func('build_bytes', build_bytes)
runner.bench_func('build_bytes_unique', build_bytes_unique)
runner.bench_func('build_bytearray', build_bytearray)
runner.timeit(
    name="encode",
    setup="a = 'a' * 1_000_000",
    stmt="bytearray(a, encoding='utf8')")

From my understanding of reference counting I think this is safe to do for iconcat (and would be safe to do for ba[:] = b'\0' * 1000 discuss topic). The briefly refcount 2 isn't ideal but I think good enough for the performance delta. I'm hoping if I can ship an implementation of gh-87613 can do the same optimization for bytearray(b'\0' * 4096).

If the iconcat refcount 2 part isn't good, can tweak to keep the enecode + bytearray performance improvement without changing iconcat generally.

cc: @vstinner , @encukou

If the bytearray is empty and a uniquely referenced bytes object is
being concatenated (ex. one just recieved from read), just use its
storage as the backing for the bytearray rather than copying it.

build_bytes_unique: Mean +- std dev: [base] 383 ns +- 11 ns -> [iconcat_opt] 342 ns +- 5 ns: 1.12x faster
build_bytearray: Mean +- std dev: [base] 496 ns +- 8 ns -> [iconcat_opt] 471 ns +- 13 ns: 1.05x faster
encode: Mean +- std dev: [base] 482 us +- 2 us -> [iconcat_opt] 13.8 us +- 0.1 us: 34.78x faster

Benchmark hidden because not significant (1): build_bytes

Geometric mean: 2.53x faster

note: Performance of build_bytes is expected to stay constant.
```python
import pyperf

runner = pyperf.Runner()

count1 = 1_000
count2 = 100
count3 = 10_000

CHUNK_A = b'a' * count1
CHUNK_B = b'b' * count2
CHUNK_C = b'c' * count3

def build_bytes():
    # Bytes not uniquely referenced.
    ba = bytearray()
    ba += CHUNK_A
    ba += CHUNK_B
    ba += CHUNK_C

def build_bytes_unique():
    ba = bytearray()
    # Repeat inline results in uniquely referenced bytes.
    ba += b'a' * count1
    ba += b'b' * count2
    ba += b'c' * count3

def build_bytearray():
    # Each bytearray appended is uniquely referenced.
    ba = bytearray()
    ba += bytearray(CHUNK_A)
    ba += bytearray(CHUNK_B)
    ba += bytearray(CHUNK_C)

runner.bench_func('build_bytes', build_bytes)
runner.bench_func('build_bytes_unique', build_bytes_unique)
runner.bench_func('build_bytearray', build_bytearray)
runner.timeit(
    name="encode",
    setup="a = 'a' * 1_000_000",
    stmt="bytearray(a, encoding='utf8')")
```

// optimization: Avoid copying the bytes coming in when possible.
if (self->ob_alloc == 0 && _PyObject_IsUniquelyReferenced(other)) {
// note: ob_bytes_object is always the immortal empty bytes here.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you replace the comment with an assertion?

// optimization: Avoid copying the bytes coming in when possible.
if (self->ob_alloc == 0 && _PyObject_IsUniquelyReferenced(other)) {
// note: ob_bytes_object is always the immortal empty bytes here.
if (!_canresize(self)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not also run this check when (self->ob_alloc == 0 && _PyObject_IsUniquelyReferenced(other)) is false? So move it before the outer if?

Copy link
Contributor Author

@cmaloney cmaloney Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the general case it's currently checked by bytearray_resize_lock_held.

Checking here technically changes the order of operations where an error can happen (the main codepath the buffer protocol always is called and can error before _canresize). Not certain how important that is to keep in this case. Keeping order would mean moving _canresize check inside the type checks here.

@encukou
Copy link
Member

encukou commented Nov 25, 2025

Here's a test that should pass, but doesn't:

    // make some bytes
    PyObject *bytes = PyBytes_FromString("aaB");
    assert(bytes);
    // make an empty bytearray
    PyObject *ba = PyByteArray_FromStringAndSize("", 0);
    assert(ba);
    // append bytes to bytearray (in place, getting a new reference)
    PyObject *new_ba = PySequence_InPlaceConcat(ba, bytes);
    assert(new_ba == ba);
    Py_DECREF(new_ba);
    // pop from bytearray
    Py_DECREF(PyObject_CallMethod(ba, "pop", ""));

    // check that our bytes was not modified
    assert(memcmp(PyBytes_AsString(bytes), "aaB", 3) == 0);

    Py_DECREF(bytes);
    Py_DECREF(ba);

AFAIK, you need to use PyUnstable_Object_IsUniqueReferencedTemporary.

Comment on lines 347 to 348
PyObject *taken = PyObject_CallMethodNoArgs(other,
&_Py_ID(take_bytes));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks unsafe to me. If you call a method, you may invalidate the assumptions you verified earlier

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call bytearray_take_bytes_impl() directly to reduce the risk of side effects? And you can check again _PyObject_IsUniquelyReferenced() in an assertion.

@cmaloney
Copy link
Contributor Author

(Iterating on this locally; should have updates next week)

@cmaloney
Copy link
Contributor Author

cmaloney commented Dec 3, 2025

@encukou : When I try using PyUnstable_Object_IsUniqueReferencedTemporary then use ./python -m test test_bytes -W I get: python: ../cpython/Include/internal/pycore_stackref.h:695: PyObject *PyStackRef_AsPyObjectBorrow(_PyStackRef): Assertion !PyStackRef_IsTaggedInt(ref)' failed.`. Not sure exactly why.

I was modeling this off of PyBytes_Concat which does a _PyObject_IsUniquelyReferenced but looking more closely that isn't the implementation behind the sequence operations (bytes_concat) nor does it modify the right hand side in any way.

I don't see a way to do the generalized optimization currently; still think there should be a way just not sure the path and suspect it's a number of steps. I'll probably close this PR and open one for just the encoding case (bytearray('test', encoding='utf-8')) and put more general byearray iconcat, extend, and construction optimization in the back of my head for the moment.

@colesbury
Copy link
Contributor

That seems like a bug in PyUnstable_Object_IsUniqueReferencedTemporary. We should skip over tagged ints when checking variables on the stack:

cpython/Objects/object.c

Lines 2758 to 2767 in c0c6514

_PyStackRef *base = _PyFrame_Stackbase(frame);
_PyStackRef *stackpointer = frame->stackpointer;
while (stackpointer > base) {
stackpointer--;
if (op == PyStackRef_AsPyObjectBorrow(*stackpointer)) {
return PyStackRef_IsHeapSafe(*stackpointer);
}
}
return 0;
}

@cmaloney
Copy link
Contributor Author

cmaloney commented Dec 3, 2025

Created GH-142243 doing just the bytearray('test', encoding='utf-8') portion of this. My time is becoming more limited shortly (starting a new job) so won't have quite as much time to explore new to me corners of CPython. Definitely happy to revisit more generally (there's a lot of optimizations both here and in bytes), maybe at PyConUS this year.

@cmaloney
Copy link
Contributor Author

cmaloney commented Dec 3, 2025

Tested with if (PyStackRef_IsTaggedInt(*stackpointer)) { continue; } in PyUnstable_Object_IsUniqueReferencedTemporary and that seems to work; that change I think needs a separate issue + news + test. Added to my backlog but not sure when I'll be able to get to.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants