[3.9] gh-135661: Fix CDATA section parsing in HTMLParser (GH-135665) (GH-137774) #139661
ambv merged 2 commits into python:3.9
ezio-melotti left a comment:
3.9 seems to have two additional tests at the end of test_htmlparser:
- test_invalid_keyword_error_exception
- test_invalid_keyword_error_pass
These are missing in 3.10+ and the former is currently failing:
======================================================================
FAIL: test_invalid_keyword_error_exception (test.test_htmlparser.AttributesTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/cpython/cpython/Lib/test/test_htmlparser.py", line 1118, in test_invalid_keyword_error_exception
    parser.feed('<![invalid>')
AssertionError: InvalidMarkupException not raised
Let me check how they came to be removed from 3.10. If that isn't problematic, we'll do the same here.
@serhiy-storchaka the additional tests were added in #32256. Do you think they are valid?
Originally, the HTML parser used the code from […]. These tests were correct for the existing code, but are no longer correct for the new code. If we decide to accept these changes in 3.9, then the tests should be removed (there is no replacement, because the tested code no longer exists). If we decide to give up on the backport, they remain.
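For anyone who wants to check locally what the parser does with the input from the failing test, a small probe along these lines may help. This is only a sketch: `EventRecorder` and `probe` are hypothetical names, not part of the test suite, and the result depends on the Python version and patch level (older code may raise, newer code may report a comment), so no expected output is given for the `<![invalid>` case.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical helper: records the callbacks the parser invokes."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_comment(self, data):
        self.events.append(("comment", data))

    def handle_data(self, data):
        self.events.append(("data", data))

    def unknown_decl(self, data):
        self.events.append(("unknown decl", data))


def probe(markup):
    """Feed markup to a fresh parser and return the recorded events.

    If the parser raises, return a single ("raised", <exception name>)
    entry instead, so both behaviours can be compared across versions.
    """
    parser = EventRecorder()
    try:
        parser.feed(markup)
        parser.close()
    except Exception as exc:
        return [("raised", type(exc).__name__)]
    return parser.events


if __name__ == "__main__":
    # Output differs between patched and unpatched interpreters.
    print(probe("<![invalid>"))
```

Running the same script under each branch being discussed shows whether the old error path is still reachable.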
"] ]>" and "]] >" no longer end the CDATA section.
Make CDATA section parsing context-dependent.
Add private method HTMLParser._set_support_cdata() to change the context.
If called with True, "<![CDATA[" starts a CDATA section which ends with "]]>".
If called with False, "<![CDATA[" starts a bogus comment which ends with ">".
(cherry picked from commit 0cbbfc4)
(cherry picked from commit dcf2476)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
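
The two rules in the commit message can be illustrated with a standalone sketch. This is not the CPython implementation and does not call `HTMLParser`; `scan_cdata_like` and its return shape are hypothetical, chosen only to show how the same input is split differently depending on whether CDATA is supported in the current context.

```python
CDATA_OPEN = "<![CDATA["


def scan_cdata_like(text, support_cdata):
    """Hypothetical helper: classify markup that starts with "<![CDATA[".

    Returns (kind, content, end), where end is the index just past the
    terminator, or None if the opener or the terminator is missing.
    """
    if not text.startswith(CDATA_OPEN):
        return None
    if support_cdata:
        # CDATA section: only the exact sequence "]]>" closes it,
        # so "] ]>" and "]] >" stay part of the content.
        start = len(CDATA_OPEN)
        close = text.find("]]>", start)
        if close == -1:
            return None
        return ("cdata", text[start:close], close + 3)
    # Bogus comment: everything after "<!" up to the first ">".
    start = 2
    close = text.find(">", start)
    if close == -1:
        return None
    return ("bogus comment", text[start:close], close + 1)


if __name__ == "__main__":
    sample = "<![CDATA[a ] ]> b ]] > c ]]> tail"
    print(scan_cdata_like(sample, True))   # ('cdata', 'a ] ]> b ]] > c ', 28)
    print(scan_cdata_like(sample, False))  # ('bogus comment', '[CDATA[a ] ]', 15)
```

With CDATA support the section runs to the first exact "]]>"; without it, the first ">" ends the bogus comment and the rest of the input would be parsed as ordinary content.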