Closed
Labels
3.13: bugs and security fixes
3.14: bugs and security fixes
3.15: new features, bugs and security fixes
stdlib: Standard Library Python modules in the Lib/ directory
type-bug: An unexpected behavior, bug, or error
Description
Bug report
Bug description:
When HTMLParser is initialized with convert_charrefs=False, it mishandles an invalid named entity reference at the end of the input (e.g., &A, which is not a valid HTML entity). The parser silently drops the & character and passes only the following A to handle_data. I think this is a silent data loss problem.
    from html.parser import HTMLParser

    class MyParser(HTMLParser):
        def handle_data(self, data):
            print(f"handle_data received: {data!r}")

    parser_false = MyParser(convert_charrefs=False)
    parser_false.feed('&A')
    parser_false.close()

Output:

    handle_data received: 'A'

CPython versions tested on:
3.12
Operating systems tested on:
Linux
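To make the data loss easier to see, the reproduction above can be extended to run the same input through both convert_charrefs modes and compare what reaches handle_data. This is only a sketch: DataCollector and collect_data are hypothetical names introduced here, not part of html.parser, and the result for convert_charrefs=False depends on the CPython version in use.

```python
from html.parser import HTMLParser


class DataCollector(HTMLParser):
    """Hypothetical helper: records every chunk passed to handle_data."""

    def __init__(self, *, convert_charrefs):
        super().__init__(convert_charrefs=convert_charrefs)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def collect_data(source, *, convert_charrefs):
    """Parse `source` in one go and return the handle_data chunks."""
    parser = DataCollector(convert_charrefs=convert_charrefs)
    parser.feed(source)
    parser.close()
    return parser.chunks


# Compare both modes on the input from the report.
for mode in (True, False):
    chunks = collect_data("&A", convert_charrefs=mode)
    print(f"convert_charrefs={mode}: {chunks!r}")
```

On the 3.12 build reported above, the convert_charrefs=False run collects only 'A', while the convert_charrefs=True run should keep the text intact as '&A'. Joining the chunks and comparing them with the input is a simple way to flag the dropped & in other inputs as well.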
Project status: Done