gh-139871: Optimize bytearray construction with encoding #142243

cmaloney · 2025-12-03T22:52:01Z

When a `str` is encoded in `bytearray.__init__`, the encoder tends to create a new, uniquely referenced bytes object. Rather than allocating new memory and copying the bytes, use the already-created bytes object as the bytearray's backing storage. The bigger the `str`, the bigger the saving.

Mean +- std dev: [main_encoding] 497 us +- 9 us -> [encoding] 14.2 us +- 0.3 us: 34.97x faster

import pyperf

runner = pyperf.Runner()

runner.timeit(
    name="encode",
    setup="a = 'a' * 1_000_000",
    stmt="bytearray(a, encoding='utf8')")
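
For a rough check without `pyperf`, the same construction can be timed with the stdlib `timeit` module. This is only a sketch: the helper name `compare_construction` and the repeat counts are illustrative assumptions, not part of the PR, and absolute numbers will vary by build and machine.

```python
import timeit


def compare_construction(size=1_000_000, number=200):
    """Time bytearray(str, encoding=...) against an explicit two-step build.

    Hypothetical helper for illustration only; not part of the PR.
    Returns the average seconds per call for each form.
    """
    text = "a" * size
    # Path touched by this PR: bytearray.__init__ runs the encoder itself.
    direct = timeit.timeit(
        lambda: bytearray(text, encoding="utf8"), number=number)
    # Reference path: encode to bytes first, then build a bytearray from it.
    two_step = timeit.timeit(
        lambda: bytearray(text.encode("utf8")), number=number)
    return direct / number, two_step / number


if __name__ == "__main__":
    direct, two_step = compare_construction()
    print(f"bytearray(s, encoding='utf8'): {direct * 1e6:.1f} us per call")
    print(f"bytearray(s.encode('utf8')):   {two_step * 1e6:.1f} us per call")
```

Running it on a build with and without the change should show the same trend as the `pyperf` result above, though `pyperf` remains the better tool for numbers worth quoting.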

Issue: Add .take_bytes([n]) a zero-copy path from bytearray to bytes #139871

cmaloney · 2025-12-11T06:20:41Z

cc: @vstinner this construction form doesn't appear a lot in the CPython codebase but does exist in other codebases.

I think this one is a safe subset of #141862; I hope to revisit that eventually, but it will definitely take many steps to get working just right.

vstinner

LGTM. It seems to be safe to pick the bytes object in this case.
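
A small behavioural check can illustrate what "safe" has to mean from the Python side: whichever buffer backs the bytearray, the result must equal the encoded bytes and stay independently mutable. This is a sketch under that assumption; `check_bytearray_semantics` is a hypothetical helper, not a test from the PR.

```python
def check_bytearray_semantics(text="héllo wörld" * 1000, encoding="utf8"):
    """Check observable behaviour of bytearray(str, encoding=...).

    Hypothetical helper for illustration only; not taken from the PR.
    """
    expected = text.encode(encoding)
    buf = bytearray(text, encoding=encoding)

    # Contents must match a plain str.encode() of the same text.
    assert buf == expected
    assert len(buf) == len(expected)

    # The bytearray must be freely mutable and resizable.
    buf[0] = ord("X")
    buf.extend(b"!!")
    del buf[-2:]

    # Mutating the bytearray must not leak into separately encoded bytes.
    assert expected == text.encode(encoding)
    assert buf[1:] == expected[1:]
    return buf


if __name__ == "__main__":
    check_bytearray_semantics()
    print("bytearray(str, encoding=...) behaves as expected")
```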

bedevere-app bot added the awaiting review label Dec 3, 2025

bedevere-app bot mentioned this pull request Dec 3, 2025

Add .take_bytes([n]) a zero-copy path from bytearray to bytes #139871

Closed

cmaloney added the skip news label Dec 3, 2025

cmaloney mentioned this pull request Dec 3, 2025

gh-139871: Optimize bytearray unique bytes iconcat #141862

Open

Merge branch 'main' into ba_tb_encoding

663ed88

vstinner approved these changes Dec 11, 2025

bedevere-app bot added awaiting merge and removed awaiting review labels Dec 11, 2025
