Move backslash unescaping to treeprocessor #1272
waylan merged 3 commits into Python-Markdown:master
By unescaping backslash escapes in a treeprocessor, the text is properly escaped during serialization. Fixes Python-Markdown#1131. As it is recognized that various third-party extensions may be calling the old class at `postprocessors.UnescapePostprocessor`, the old class remains in the codebase, but has been deprecated and will be removed in a future release. The new class `treeprocessors.UnescapeTreeprocessor` should be used instead.
waylan left a comment
Below are a few comments and concerns I have about this change. Feedback is welcome.
  <p>Left paren: (</p>
  <p>Right paren: )</p>
- <p>Greater-than: ></p>
+ <p>Greater-than: &gt;</p>
This is the one and only change in behavior in the existing tests. I'm okay with this, however, as technically this results in valid output. The reason for the change is that the angle bracket gets escaped during serialization. Previously, a placeholder was there during serialization, which was swapped out for the actual character later. The whole point of this change was to better ensure valid HTML output, so this is an acceptable change in behavior.
Having an unescaped `>` in HTML was a bug, so it's good that you fixed it.
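For anyone following along, here is a minimal sketch of why the ordering matters. It uses only the standard library's `xml.etree.ElementTree`, not Python-Markdown's own serializer, and the placeholder format (STX, the character's ordinal, ETX) and the helper names `unescape` and `build_paragraph` are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumed placeholder format for this sketch:
# STX + ordinal of the escaped character + ETX.
STX, ETX = "\u0002", "\u0003"
PLACEHOLDER_RE = re.compile("{}(\\d+){}".format(STX, ETX))


def unescape(text):
    """Swap every placeholder for the literal character it stands for."""
    return PLACEHOLDER_RE.sub(lambda m: chr(int(m.group(1))), text)


def build_paragraph():
    """Build a <p> whose text holds a placeholder for a backslash-escaped '>'."""
    p = ET.Element("p")
    p.text = "Greater-than: {}{}{}".format(STX, ord(">"), ETX)
    return p


# Old order: serialize first, then unescape the finished string.
# The serializer only ever sees the placeholder, so the '>' ends up raw.
old_html = unescape(ET.tostring(build_paragraph(), encoding="unicode"))

# New order: unescape inside the tree, then serialize.
# The serializer now sees the '>' and escapes it.
p = build_paragraph()
p.text = unescape(p.text)
new_html = ET.tostring(p, encoding="unicode")

print(old_html)  # <p>Greater-than: ></p>
print(new_html)  # <p>Greater-than: &gt;</p>
```

The only difference between the two paths is whether the serializer gets to see the real character, which is what accounts for the one changed line in the test output.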
    """ Loop over all elements and unescape all text. """
    for elem in root.iter():
        # Unescape text content
        if elem.text and not elem.tag == 'code':
I'm not sure we actually need to skip `code` tags. If I remove the check, the tests all pass, and the previous code had no way to distinguish between code and other content. However, there is always a possibility that code could intentionally contain what looks like a placeholder, in which case the content should not be altered. Therefore, I have left the check in.
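To make the effect of the check concrete, here is a rough standalone sketch of such a tree walk. It is not the code from this PR: `unescape_tree` and `unescape` are hypothetical names, the placeholder format is assumed to be STX, the character's ordinal, ETX, and the handling of `tail` text is my own assumption about what a complete walk would need.

```python
import re
import xml.etree.ElementTree as ET

# Assumed placeholder format for this sketch.
STX, ETX = "\u0002", "\u0003"
PLACEHOLDER_RE = re.compile("{}(\\d+){}".format(STX, ETX))


def unescape(text):
    """Swap every placeholder for the literal character it stands for."""
    return PLACEHOLDER_RE.sub(lambda m: chr(int(m.group(1))), text)


def unescape_tree(root):
    """Unescape text in every element except the text inside <code>."""
    for elem in root.iter():
        # Leave the content of <code> alone: it may intentionally
        # contain something that merely looks like a placeholder.
        if elem.text and elem.tag != "code":
            elem.text = unescape(elem.text)
        # Assumption: tail text follows the closing tag, so it belongs
        # to the parent's content and is unescaped even after <code>.
        if elem.tail:
            elem.tail = unescape(elem.tail)


root = ET.Element("p")
root.text = "Left paren: {}{}{} ".format(STX, ord("("), ETX)
code = ET.SubElement(root, "code")
code.text = "{}{}{}".format(STX, ord(")"), ETX)  # placeholder look-alike
code.tail = " then {}{}{}".format(STX, ord(">"), ETX)

unescape_tree(root)
print(repr(root.text))  # paragraph text, unescaped
print(repr(code.text))  # left untouched
print(repr(code.tail))  # tail text, unescaped
```

With the `elem.tag != "code"` condition removed, the look-alike inside `<code>` would be rewritten as well, which is the case the check guards against.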
I'll try and look at this soon. I'd like to pull it and see how it impacts some of my things.
mitya57 left a comment
As all tests pass and the bug is fixed, I am happy with this change.
facelessuser left a comment
I see nothing breaking on my end. Seems good.
See Python-Markdown/markdown#1272. Signed-off-by: Anders Kaseorg <anders@zulip.com>
This replaced the deprecated `markdown.postprocessors.UnescapePostprocessor` in Python-Markdown/markdown#1272. Signed-off-by: Anders Kaseorg <andersk@mit.edu>
By unescaping backslash escapes in a treeprocessor, the text is properly
escaped during serialization. Fixes #1131.
As it is recognized that various third-party extensions may be calling the
old class at `postprocessors.UnescapePostprocessor`, the old class remains
in the codebase, but has been deprecated and will be removed in a future
release. The new class `treeprocessors.UnescapeTreeprocessor` should be
used instead.