Commit 3517a70
committed
Fix legacy text conversion filter for CP50220
CP50220 converts some codepoints which represent kana
(hiragana/katakana) to a different form. This is the only difference
between CP50220 and CP50221 (which doesn't perform such conversion).
In some cases, this conversion means collapsing two codepoints to
a single output byte sequence. Since the legacy text conversion
filters only worked a byte at a time, the legacy filter had to
cache a byte, then wait until it was called again with the next
byte to compare the cached byte with the following one.
That was all fine, but it didn't work as intended when there were
errors (invalid byte sequences) in the input. Our code (both old
and new) for emitting error markers recursively calls the same
conversion filter. When the old CP50220 filter was called
recursively, the logic for managing cached bytes did not behave
as intended. As a result, the error markers could be reordered
with other characters in the output.
I used an ugly hack to fix this in 6938e35; when making a
recursive call to emit an error marker, temporarily swap out
`filter->filter_function` to bypass the byte-caching code,
so the error marker immediately goes through to the output.
This worked, but I overlooked the fact that the very same
problem can occur if an invalid byte sequence is detected
*in the flush function*. Apply the same (ugly) fix.1 parent 4b37033 commit 3517a70
1 file changed
+2
-0
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
555 | 555 | | |
556 | 556 | | |
557 | 557 | | |
| 558 | + | |
558 | 559 | | |
| 560 | + | |
559 | 561 | | |
560 | 562 | | |
561 | 563 | | |
| |||
0 commit comments