Fix legacy text conversion filter for CP50220

alexdowad · alexdowad · commit 3517a70f93e9 · 2022-08-16T16:43:27.000+02:00
CP50220 converts some codepoints which represent kana (hiragana/katakana) to a different form. This is the only difference between CP50220 and CP50221 (which doesn't perform such conversion). In some cases, this conversion means collapsing two codepoints to a single output byte sequence.

Since the legacy text conversion filters only worked a byte at a time, the legacy filter had to cache a byte, then wait until it was called again with the next byte to compare the cached byte with the following one.

That was all fine, but it didn't work as intended when there were errors (invalid byte sequences) in the input. Our code (both old and new) for emitting error markers recursively calls the same conversion filter. When the old CP50220 filter was called recursively, the logic for managing cached bytes did not behave as intended. As a result, the error markers could be reordered with other characters in the output.

I used an ugly hack to fix this in 6938e35: when making a recursive call to emit an error marker, temporarily swap out `filter->filter_function` to bypass the byte-caching code, so the error marker immediately goes through to the output.

This worked, but I overlooked the fact that the very same problem can occur if an invalid byte sequence is detected *in the flush function*. Apply the same (ugly) fix.
diff --git a/ext/mbstring/libmbfl/filters/mbfilter_cp5022x.c b/ext/mbstring/libmbfl/filters/mbfilter_cp5022x.c
@@ -555,7 +555,9 @@ static int mbfl_filt_conv_wchar_cp50220_flush(mbfl_convert_filter *filter)
 
 	if (filter->cache) {
 		int s = mb_convert_kana_codepoint(filter->cache, 0, NULL, NULL, mode);
+		filter->filter_function = mbfl_filt_conv_wchar_cp50221;
 		mbfl_filt_conv_wchar_cp50221(s, filter);
+		filter->filter_function = mbfl_filt_conv_wchar_cp50220;
 		filter->cache = 0;
 	}
 
