-
-
Notifications
You must be signed in to change notification settings - Fork 33.9k
bpo-37388: Check codec error handler in development mode #14341
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Proof-of-concept PR to always check the error handle in development mode. If the idea is approved, I will extend it to every decoder and encoder. |
serhiy-storchaka
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will add an overhead for every decoding operation, even for decoding an empty or single-byte bytes object.
|
Microbenchmark:
Cost of this PR, the overhead seems small (2 ns) and may be lower with a PGO build: Cost of checking the error handler, it's quite significant, 38 ns on 89 ns: I would appreciate if someone could check with CPU isolation and a build using PGO. |
|
Could you please test with an empty and 1-byte data? Use |
I wrote bench.py that I attached to https://bugs.python.org/issue37388: My laptop is not really stable without CPU isolation: |
|
Sorry, I forgot my menton that the new benchmark results were measured at commit e77c1e548e0dff8dffc3f68b655014a503eb138f which uses __builtin_expect(x, 0). |
As expected, I misused this dangerous macro :-) I used UNLIKELY instead of LIKELY causing branch misprediction which adds a 5 ns overhead... Fixed benchmark using LIKELY, commit 09de3a9eefd8934405a1a6c4b9ee065475dbe0a1: With std dev: |
|
New benchmark run with CPU isolation, the std dev is way better (way smaller). The overhead on b"".decode() is 1.8 ns. It's around 1% in general: |
|
Since @serhiy-storchaka likes the idea, I completed my PR to handle more cases and I wrote documentation. @serhiy-storchaka: Would you mind to review the update PR? @methane: Would yiu mind to review it as well? I know that you care about performance of codecs :-) I removed the LIKELY/UNLIKELY macros. A PGO compilation builds the branch prediction statistics for us automatically. I'm trying to avoid starting to use these macros, since they can have the opposite effect of the expected effect, when misused. See my first attempt: I used UNLIKELY rather than LIKELY, and so added an overhead rather than an optimization... |
serhiy-storchaka
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about open()? Should not we add early checks for encoding and errors?
|
I modified open() to check errors early. open() already checks encoding indirectly. I modified TextIOWrapper in practice. I added tests. Oh, in some cases, the encoding is not tested neither: I will complete my PR to also test the encoding. |
In development mode and in debug build, encoding and errors arguments are now checked on string encoding and decoding operations. Examples: open(), str.encode() and bytes.decode(). By default, for best performances, the errors argument is only checked at the first encoding/decoding error, and the encoding argument is sometimes ignored for empty strings.
|
I rebased my PR, squashed commits. I add many unit tests on open, byte, bytearray, io.open, _pyio.open. I also added checks on the encoding argument, not only on the errors argument. encoding is ignored sometimes with empty strings. |
|
Thank You! |
Thank you, your "Boom, Shaka Laka, Boom!" example is now part of the Python code base! |
In development mode and in debug build, encoding and errors arguments are now checked on string encoding and decoding operations. Examples: open(), str.encode() and bytes.decode(). By default, for best performances, the errors argument is only checked at the first encoding/decoding error, and the encoding argument is sometimes ignored for empty strings.
In development mode and in debug build, encoding and errors arguments are now checked on string encoding and decoding operations. Examples: open(), str.encode() and bytes.decode(). By default, for best performances, the errors argument is only checked at the first encoding/decoding error, and the encoding argument is sometimes ignored for empty strings.
https://bugs.python.org/issue37388