GH-141686: Break cycles created by JSONEncoder.iterencode #141687
Conversation
```diff
 elif isinstance(value, int):
     # see comment for int/float in _make_iterencode
-    yield _intstr(value)
+    yield int.__repr__(value)
```
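For context (my own sketch, not part of the diff): as I understand the `_make_iterencode` comment being referenced, the encoder calls `int.__repr__` directly so that an `int` subclass with an overridden `__repr__` can't inject arbitrary text into the JSON output. The `BadInt` subclass below is hypothetical, purely for illustration:

```python
# Hypothetical subclass for illustration: its overridden __repr__ would
# corrupt the JSON output if the encoder used a plain repr() call.
class BadInt(int):
    def __repr__(self):
        return "not valid json"

x = BadInt(5)
assert repr(x) == "not valid json"  # subclass hook fires with repr()
assert int.__repr__(x) == "5"       # unbound call bypasses the override
```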
How does this affect performance for a large list of integers?
Looks like doing things the "normal" way (int.__repr__) is about 0-10% faster on a little microbenchmark:
```python
import time

LOOPS = 1 << 24

def outer():
    _intstr = int.__repr__  # cached in a closure cell, like the old code
    def old():
        for _ in range(LOOPS):
            _intstr(42)
    def new():
        for _ in range(LOOPS):
            int.__repr__(42)
    return old, new

def bench(func):
    start = time.perf_counter()
    func()
    end = time.perf_counter()
    print(end - start)

old, new = outer()
bench(old)
bench(new)
```

I think the combination of cell + bound method unwrapping is enough overhead to cancel out the gains from avoiding a (cached) global lookup and a (cached) attribute load. But it's pretty much in the noise anyway, so probably not worth keeping the old micro-optimization.
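A small sketch of the bytecode difference in question (the function names here are my own, not from the patch): a name read through a closure cell compiles to `LOAD_DEREF`, while a plain module-level read compiles to a specializable `LOAD_GLOBAL`:

```python
import dis

G = len  # module-level global

def uses_global(x):
    return G(x)  # name read compiles to LOAD_GLOBAL

def make_closure():
    L = len  # becomes a closure cell for the inner function
    def uses_cell(x):
        return L(x)  # name read compiles to LOAD_DEREF
    return uses_cell

ops_global = {i.opname for i in dis.get_instructions(uses_global)}
ops_cell = {i.opname for i in dis.get_instructions(make_closure())}
assert "LOAD_GLOBAL" in ops_global
assert "LOAD_DEREF" in ops_cell
```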
This bench shows only a 1-2% difference on my machine.
Thank you. On my machine, this bench shows a 2-3% difference on 3.14 and main, and a 20-25% difference on 3.12 and 3.13. So this micro-optimization was justified in the past, but perhaps not so much now. It cannot be backported if you are planning to backport this PR.
No plans to backport.
serhiy-storchaka left a comment
LGTM. 👍
…de (pythonGH-141687)" This reverts commit 4cfa695.
As mentioned in the issue, clearing these cycles explicitly results in a ~10% slowdown for `json.dump` microbenchmarks when the GC is disabled, but a ~20% speedup when the GC is enabled (since we're now eagerly doing the work done less efficiently by the GC).

I've also removed a "hack" that tried to turn globals into locals (in practice these "locals" are actually cells). This may have been faster in the past, but what it really does now is turn cached `LOAD_GLOBAL` specializations into unoptimized `LOAD_DEREF` instructions. Removing these gives another ~10% or so speedup for me.

`json.JSONEncoder.iterencode` creates reference cycles #141686
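For reference, here is a minimal sketch (my own reduction, not the actual `_make_iterencode` code) of how mutually recursive closures like the ones `iterencode` builds form a reference cycle through their closure cells, which only the cyclic GC can reclaim:

```python
import gc

def make_encoder():
    # _iterencode and _iterencode_list reference each other through
    # their closure cells, forming a reference cycle -- the same shape
    # of cycle the pure-Python encoder produces.
    def _iterencode(o):
        if isinstance(o, list):
            yield from _iterencode_list(o)
        else:
            yield repr(o)
    def _iterencode_list(lst):
        yield "["
        for item in lst:
            yield from _iterencode(item)
        yield "]"
    return _iterencode

gc.disable()
try:
    gc.collect()  # start from a clean slate
    encode = make_encoder()
    assert "".join(encode([1, 2])) == "[12]"
    del encode
    # Reference counting alone cannot free the closure cycle; a manual
    # collection reports the cyclic objects it reclaimed.
    assert gc.collect() > 0
finally:
    gc.enable()
```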