bpo-31170: Write unit test for Expat 2.2.4 UTF-8 bug#3570
bpo-31170: Write unit test for Expat 2.2.4 UTF-8 bug#3570vstinner merged 3 commits intopython:masterfrom vstinner:expat224_utf8_bug
Conversation
|
The XML file comes from the libexpat bug: libexpat/libexpat#115 |
There was a problem hiding this comment.
I don't know Chinese. I used other example for reproducing this bug: https://bugs.python.org/issue31303#msg300997.
Or simpler:
xml.etree.ElementTree.XML(b'<a b="' + b'x'*1023 + b'\xc3\xa0"/>')
There was a problem hiding this comment.
xml.etree.ElementTree.XML(b'')
Ok, I also added this test. I prefer 2 tests rather than a single one :-) I like to reuse the same XML data which was used to reproduce the initial Expat bug.
There was a problem hiding this comment.
I have proposed different test not because I don't know Chinese, but because it is simpler, don't need an external file, and easier extensible. For example you can test with two strings
b'<a b="' + b'\xc3\xa0'*1024 + b'"/>'
b'<a b="x' + b'\xc3\xa0'*1024 + b'"/>'
and be sure that it covers any buffer boundaries in the range of two kilobytes.
The original Chinese example works only for specific buffer size (1 KiB) and specific buffering strategy (attribute value is written from the start of the buffer).
There was a problem hiding this comment.
For example you can test with two strings : (...)
Ok, I added these two examples as well.
FYI I tested and b'' (2 KB) worked well on Expat <= 2.2.3, but b'' failed (2 KB + 1).
I added 4 tests. I should now be enough to test for non-regression, no? If you want more tests, IMHO they should be written in Expat, not CPython ;-)
I only wanted to make sure that the fix was applied, not try all corner cases.
tiran
left a comment
There was a problem hiding this comment.
I'm fine with using the reproducer from the expat bug.
Non-regression tests for the Expat 2.2.3 UTF-8 decoder bug.
Lib/test/test_xml_etree.py
Outdated
| # Check that Expat 2.2.4 fixed the bug. | ||
|
|
||
| # 1 KB buffer | ||
| text = b'x' * 1023 + b'\xc3\xa0' |
There was a problem hiding this comment.
This test is redundant. The third test supersedes it.
Lib/test/test_xml_etree.py
Outdated
| text = b'x' * 1023 + b'\xc3\xa0' | ||
| self.check_expat224_utf8_bug(text) | ||
|
|
||
| # 2 KB buffer |
There was a problem hiding this comment.
It would be better to document these two tests as testing buffer bounds at odd and even positions.
And please use binary prefixes: KiB.
vstinner
left a comment
There was a problem hiding this comment.
New update to take latest @serhiy-storchaka comments in account.
Lib/test/test_xml_etree.py
Outdated
| text = b'x' * 1023 + b'\xc3\xa0' | ||
| self.check_expat224_utf8_bug(text) | ||
|
|
||
| # 2 KB buffer |
Lib/test/test_xml_etree.py
Outdated
| # Check that Expat 2.2.4 fixed the bug. | ||
|
|
||
| # 1 KB buffer | ||
| text = b'x' * 1023 + b'\xc3\xa0' |
Lib/test/test_xml_etree.py
Outdated
| # | ||
| # Test buffer bounds at odd and even positions. | ||
|
|
||
| # 2 KiB buffer (even) |
There was a problem hiding this comment.
Vice verse. The first test covers the case of buffer bound at odd position, and the second test covers the case of buffer bound at even position.
"2 KiB" is not related. The tests cover all buffer sizes (up to 2 KiB).
There was a problem hiding this comment.
I removed the comments. "# Test buffer bounds at odd and even positions." is enough.
There was a problem hiding this comment.
Sorry for my comments were confusing.
|
Thanks for the review and advices @serhiy-storchaka! |
|
Thanks @Haypo for the PR 🌮🎉.. I'm working now to backport this PR to: 2.7, 3.6. |
1 similar comment
|
Thanks @Haypo for the PR 🌮🎉.. I'm working now to backport this PR to: 2.7, 3.6. |
|
Sorry, @Haypo, I could not cleanly backport this to |
|
GH-3745 is a backport of this pull request to the 2.7 branch. |
|
GH-3746 is a backport of this pull request to the 3.6 branch. |
https://bugs.python.org/issue31170