decoder2: Add support for surrogates#25193
Conversation
|
Connected to Huly®: V_0.6-24328 |
|
|
||
| fn test_surrogate() { | ||
| assert decoder2.decode[string](r'"\ud83d\ude00"')! == '😀' | ||
| assert decoder2.decode[string](r'"\ud83d\ude00 text"')! == '😀 text' |
There was a problem hiding this comment.
What will be the result, if you later JSON encode the decoded string?
Can you please add a test for that too?
There was a problem hiding this comment.
the json2 encoder currently handles these characters incorrectly see #25115. In my new implementation it outputs utf-8 by default unless specified otherwise
|
Excellent work @Larsimusrex. I am curious, what are some of the JSON encoders, that produce such output? |
|
Python definitely does it by default. I think java and c# too. |
|
Python is already preinstalled on the CI, so we can add a V test, that invokes a python program, that generates a json encoded value with surrogates on stdout, then decodes it and asserts on the output. We also support |
Will now decode utf-16 surrogates, used by some encoders for characters outside the bilingual plane.
println(decoder2.decode[string](r'"\ud83d\ude00"')!) // '😀'