Conversation
|
Thanks for the contribution! This PR is quite large, reviewing! |
src/element_handler/mod.rs
Outdated
| // instances of two or more newlines with a newline, followed by escaped | ||
| // newlines. Tricky part: replace LFCR differently than CR. Probably with | ||
| // groups. | ||
| Regex::new(r#"(\r?\n\s*)(\r?\n\s*)"#).unwrap()) |
There was a problem hiding this comment.
Is it possible to get rid of regex? We just got pr #49 for removing regex. It would be nice if regex could be removed or opt-in.
There was a problem hiding this comment.
Cool! But didn't see the commit?
src/element_handler/mod.rs
Outdated
| Regex::new(r#"(\r?\n\s*)(\r?\n\s*)"#).unwrap()) | ||
| .replace_all(&s, |caps: &Captures| { | ||
| caps[1].to_string() | ||
| + &(caps[2].replace("\r", " ").replace("\n", " ")) |
There was a problem hiding this comment.
These two HTML entities seem to have been swapped? https://developer.mozilla.org/en-US/docs/Glossary/CRLF
| @@ -99,7 +128,7 @@ fn code_blocks_with_lang_class() { | |||
| #[test] | |||
| fn code_blocks_with_lang_class_on_pre_tag() { | |||
There was a problem hiding this comment.
Does the faithful mode break this test? Should the default convert() still work?
There was a problem hiding this comment.
Okay, this is a special test case in #14. Some libraries don't write the language class in the code tag.
There was a problem hiding this comment.
OK, thanks for pointing this out! I'll revert these changes to the test and run it in pure mode instead. I'll also add a link to #14 to explain the purpose of the test.
There was a problem hiding this comment.
Can we keep this file unchanged? So we can easily pull the latest tests from https://github.com/mixmark-io/turndown/blob/master/test/index.html
There was a problem hiding this comment.
I fully agree that we should be using tests from turndown. However:
- The tests don't cover some important cases; they're also wrong in some cases. For example, 2d5cdcc fixes Turndown's use of
_for emphasis, since it doesn't work inside words; the correct option is to use*; see Emphasis doesn't work within a word with_mixmark-io/turndown#441. - The tests are updated rarely. There's been one commit changing it this year, another commit last year, then the next commit is from 2020.
- The changes I introduced are easy to merge in.
So, I'd suggest merging in changes when they are (infrequently) made. I re-reviewed the latest from Turndown and saw one test I'd accidentally removed; the other changes were errors, poorer test coverage, etc.
There was a problem hiding this comment.
Reasonable. Perhaps we can simply use pure mode in Rust for the turndown test cases.
There was a problem hiding this comment.
I actually found several bugs using faithful mode on the Turndown test cases; only a few are really pure mode only. So, I'd prefer to keep these tests. In general, faithful mode is a stricter test and helps point out bugs that pure mode ignores.
|
Thanks for the review! I've pushed changes that addresses two of the comments. Still thinking about replacing regex. |
|
I pushed a commit that replaced |
Oops, I realized I didn't push it. Pushed regex fixes plus test fixes. |
|
All look good to me, Merged! Thanks again for the work! |
|
Thanks for the merge! I'm really excited to see this feature in htmd -- it's unique and provides some neat capabilities. |

This PR adds what I call "faithful mode" -- if the HTML to translate can't be exactly captured by Markdown, then this mode outputs the HTML instead. For example, the HTML snippet
<em>Hello</em>becomes*Hello*, while<span id="foo">Hello</span>becomesHelloin the default translation mode (pure), but stays as<span id="foo">Hello</span>in faithful translation mode.