Skip to content

Conversation

@cmaloney
Copy link
Contributor

@cmaloney cmaloney commented Dec 3, 2025

When a str is encoded in bytearray.__init__ the encoder tends to create a new unique bytes object. Rather than allocate new memory and copy the bytes use the already created bytes object as bytearray backing. The bigger the str the bigger the saving.

Mean +- std dev: [main_encoding] 497 us +- 9 us -> [encoding] 14.2 us +- 0.3 us: 34.97x faster

import pyperf

runner = pyperf.Runner()

runner.timeit(
    name="encode",
    setup="a = 'a' * 1_000_000",
    stmt="bytearray(a, encoding='utf8')")

When a `str` is encoded in `bytearray.__init__` the encoder tends to
create a new unique bytes object. Rather than allocate new memory and
copy the bytes use the already created bytes object as bytearray
backing. The bigger the `str` the bigger the saving.

Mean +- std dev: [main_encoding] 497 us +- 9 us -> [encoding] 14.2 us +- 0.3 us: 34.97x faster

```python
import pyperf

runner = pyperf.Runner()

runner.timeit(
    name="encode",
    setup="a = 'a' * 1_000_000",
    stmt="bytearray(a, encoding='utf8')")
```
@cmaloney
Copy link
Contributor Author

cc: @vstinner this construction form doesn't appear a lot in the CPython codebase but does exist in other codebases.

This one I think is a safe subset from #141862; hope to revisit that eventually but it's definitely many-step to get working just right.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. It seems to be safe to pick the bytes object in this case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants