Commit 18e526c
Fix legacy text conversion filter for SJIS-2004
EUC-JP-2004 includes special byte sequences starting with 0x8E
for kana. The legacy output routine for EUC-JP-2004 emits
these sequences if the value of the output variable `s` is
between 0x80 and 0xFF.
Since the same routine was also used for SJIS-2004 and
ISO-2022-JP-2004, before 8a915ed, the same 0x8E sequences
would be emitted when converting to those text encodings as well.
But that is completely wrong. 0x8E 0x__ does not mean the same
in SJIS-2004 or ISO-2022-JP-2004 as it does in EUC-JP-2004.
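
The difference in meaning can be checked with any codec library that knows both encodings. As a rough illustration only (this uses Python's built-in `euc_jis_2004` and `shift_jis_2004` codecs, not the conversion filter this commit touches), the half-width katakana U+FF71 is taken as an arbitrary sample character:

```python
# Illustration with Python's standard codecs; not the code changed by this commit.
HALFWIDTH_A = "\uff71"  # HALFWIDTH KATAKANA LETTER A, chosen as a sample

# In EUC-JP-2004, half-width kana are written as 0x8E followed by the kana byte.
euc_bytes = HALFWIDTH_A.encode("euc_jis_2004")

# In SJIS-2004, the same character is a single byte in the 0xA1-0xDF range.
sjis_bytes = HALFWIDTH_A.encode("shift_jis_2004")

# Reading the EUC-JP-2004 byte pair as SJIS-2004 gives a different character,
# because 0x8E is the lead byte of a two-byte character there.
misread = euc_bytes.decode("shift_jis_2004")

print(euc_bytes.hex(), sjis_bytes.hex(), misread != HALFWIDTH_A)
```

So emitting the 0x8E pair into an SJIS-2004 stream would not produce an error on the reading side; it would silently produce the wrong character.
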
Therefore, in 8a915ed, I fixed the legacy conversion routine
by checking whether the output encoding is EUC-JP-2004 or not.
If it's not, and `s` is 0x80-0xFF, I made it emit an error.
Well, it turns out that single bytes with values from 0xA1
to 0xDF are meaningful in SJIS-2004. To emit these bytes when
appropriate, I had to amend the legacy conversion routine again.
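
The resulting three-way decision can be sketched as follows. This is a minimal Python model of the behavior described in this message, not the actual routine: the function name `emit_high_byte`, the `ConversionError` class, and the encoding-name strings are all hypothetical, and only the byte ranges and the per-encoding outcomes come from the text above.

```python
class ConversionError(Exception):
    """Hypothetical stand-in for the routine's 'emit an error' path."""


def emit_high_byte(s: int, encoding: str) -> bytes:
    """Model of the output decision for a value `s` in 0x80-0xFF.

    Assumed names throughout; only the ranges follow the commit message.
    """
    if not 0x80 <= s <= 0xFF:
        raise ValueError("this sketch only covers s in 0x80-0xFF")

    if encoding == "EUC-JP-2004":
        # EUC-JP-2004: kana are emitted as the special 0x8E sequence.
        return bytes([0x8E, s])

    if encoding == "SJIS-2004" and 0xA1 <= s <= 0xDF:
        # SJIS-2004: single bytes 0xA1-0xDF are meaningful on their own.
        return bytes([s])

    # Everything else (ISO-2022-JP-2004, or SJIS-2004 outside 0xA1-0xDF)
    # still reports an error, as introduced in 8a915ed.
    raise ConversionError(f"cannot emit 0x{s:02X} in {encoding}")
```

Under these assumptions, `emit_high_byte(0xB1, "SJIS-2004")` yields the single byte 0xB1, while the same value for ISO-2022-JP-2004 still raises an error.
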
(For clarity, this does NOT mean reverting to the behavior prior
to 8a915ed. We were right not to emit sequences starting with
0x8E in SJIS-2004. But in SJIS-2004, we *do* sometimes need to
emit single bytes from 0xA1-0xDF.)

1 parent 3517a70 commit 18e526c
1 file changed: 2 additions, 0 deletions.
[Diff content not preserved: two lines were added at lines 636-637 of the new file, between original lines 635 and 636.]