Commit 7914b8c
Use pakutoma's encoding check functions for mb_detect_encoding even in non-strict mode

In 6fc8d01, pakutoma added specialized validity checking functions
for some legacy text encodings like ISO-2022-JP and UTF-7. These
check functions perform a stricter validity check than the encoding
conversion functions for the same text encodings. For example, the
check function for ISO-2022-JP verifies that the string ends in the
correct state required by the specification for ISO-2022-JP.
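The end-state requirement can be illustrated with a short sketch. This is not the check function from 6fc8d01; it is a simplified stand-in that assumes the required final state is ASCII and only tracks the three-byte escape sequences ESC ( B, ESC ( J, ESC $ @ and ESC $ B. The names `jis_state` and `iso2022jp_ends_in_ascii` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical character-set states; not mbstring's internal names. */
enum jis_state { ST_ASCII, ST_JISX0201_ROMAN, ST_JISX0208 };

/*
 * Returns true only if every escape sequence is one of the recognized
 * ones and the string finishes in the ASCII state. Validation of the
 * character bytes themselves is deliberately left out of this sketch.
 */
static bool iso2022jp_ends_in_ascii(const unsigned char *s, size_t len)
{
    enum jis_state state = ST_ASCII;
    size_t i = 0;

    while (i < len) {
        if (s[i] != 0x1B) {          /* not ESC: skip an ordinary byte */
            i++;
            continue;
        }
        if (len - i < 3) {
            return false;            /* truncated escape sequence */
        }
        if (s[i + 1] == '(' && s[i + 2] == 'B') {
            state = ST_ASCII;
        } else if (s[i + 1] == '(' && s[i + 2] == 'J') {
            state = ST_JISX0201_ROMAN;
        } else if (s[i + 1] == '$' && (s[i + 2] == '@' || s[i + 2] == 'B')) {
            state = ST_JISX0208;
        } else {
            return false;            /* unrecognized escape sequence */
        }
        i += 3;
    }
    return state == ST_ASCII;
}
```

A converter can decode a string that stops while still in a two-byte mode without reporting any error, which is why a separate, stricter check is useful for detection.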
These check functions are already being used to make detection of text
encoding more accurate when 'strict' detection mode is enabled.

However, since the default is 'non-strict' detection (a bad API design
but we're stuck with it now), most users will not benefit from
pakutoma's work. I was previously reluctant to enable this new logic
for non-strict detection mode. My intention was to reduce the scope of
behavior changes, since almost *any* behavior change may affect *some*
user in a way we don't expect.

However, we definitely have users whose (production) code was broken
by the changes I made in 28b346b, and enabling pakutoma's check
functions for non-strict detection mode would un-break it. (See
GH-10192 as an example.) The added checks do also make sense.

In non-strict detection mode, we will not immediately reject candidate
encodings whose validity check function returns false; but they will
be much less likely to be selected. However, failure of the validity
check function is weighted less heavily than an encoding error detected
by the encoding conversion function.

1 parent 3ab10da commit 7914b8c
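The weighting described in the commit message can be sketched as a penalty-based selection. This is only an illustration, not the code in ext/mbstring: the struct, the function `pick_encoding`, and both penalty constants are hypothetical, and the sketch assumes that strict mode rejects a candidate outright on either kind of failure.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical weights; the actual values used by mbstring may differ. */
#define CONVERSION_ERROR_PENALTY 1000
#define FAILED_CHECK_PENALTY       50

struct candidate {
    const char *name;
    int conversion_errors;  /* errors reported by the conversion function */
    bool check_passed;      /* result of the stricter validity check */
    int base_demerits;      /* heuristic score; lower means more plausible */
};

/* Returns the index of the best candidate, or -1 if all were rejected. */
static int pick_encoding(const struct candidate *c, size_t n, bool strict)
{
    int best = -1;
    long best_score = 0;

    for (size_t i = 0; i < n; i++) {
        bool failed = c[i].conversion_errors > 0 || !c[i].check_passed;
        if (strict && failed) {
            continue;  /* assumed strict behavior: reject immediately */
        }
        long score = c[i].base_demerits
                   + (long)c[i].conversion_errors * CONVERSION_ERROR_PENALTY;
        if (!c[i].check_passed) {
            score += FAILED_CHECK_PENALTY;  /* lighter than a conversion error */
        }
        if (best < 0 || score < best_score) {
            best = (int)i;
            best_score = score;
        }
    }
    return best;
}
```

With these assumed weights, a candidate that fails only the validity check can still win in non-strict mode if every other candidate produced a conversion error, but it loses to any clean candidate with a comparable base score.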
File tree
- ext/mbstring
- tests
3 files changed
+25 -13 lines changed