decoder2: Add support for surrogates #25193

Merged
spytheman merged 1 commit into vlang:master from Larsimusrex:add_surrogate_decode on Aug 30, 2025

@Larsimusrex

Contributor

Will now decode UTF-16 surrogate pairs, which some encoders use to escape characters outside the Basic Multilingual Plane.

println(decoder2.decode[string](r'"\ud83d\ude00"')!) // '😀'
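For readers unfamiliar with the encoding, the arithmetic behind a surrogate pair can be sketched as follows. This is an illustration in Python, not the decoder2 implementation from this PR; the function name `combine_surrogates` is hypothetical.

```python
def combine_surrogates(high: int, low: int) -> str:
    """Combine a UTF-16 surrogate pair into a single character.

    Hypothetical helper for illustration only; it is not part of decoder2.
    """
    # A high (lead) surrogate lies in 0xD800..0xDBFF.
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high surrogate: {high:#06x}")
    # A low (trail) surrogate lies in 0xDC00..0xDFFF.
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low surrogate: {low:#06x}")
    # Each surrogate carries 10 bits; the result is offset by 0x10000
    # because the pair only encodes code points above the BMP.
    code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
    return chr(code_point)


# The pair from the example above, \ud83d\ude00, maps to U+1F600.
emoji = combine_surrogates(0xD83D, 0xDE00)
print(emoji, hex(ord(emoji)))
```

A decoder that sees a `\uXXXX` escape in the high-surrogate range would presumably need to read the following escape as well before emitting UTF-8, which is why the two escapes cannot be decoded independently.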

@huly-for-github

Connected to Huly®: V_0.6-24328


fn test_surrogate() {
assert decoder2.decode[string](r'"\ud83d\ude00"')! == '😀'
assert decoder2.decode[string](r'"\ud83d\ude00 text"')! == '😀 text'

Contributor

What will be the result if you later JSON-encode the decoded string?
Can you please add a test for that too?

Contributor Author

The json2 encoder currently handles these characters incorrectly; see #25115. In my new implementation, it outputs UTF-8 by default unless specified otherwise.

@spytheman

Contributor

Excellent work @Larsimusrex.

I am curious: which JSON encoders produce such output?
If they are easy to install on the CI, we can add round-trip tests with them.

@Larsimusrex

Contributor Author

Python definitely does it by default. I think Java and C# do too.
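The Python behaviour mentioned here can be checked with the standard `json` module. A minimal sketch: `json.dumps` defaults to `ensure_ascii=True`, which escapes non-ASCII characters, and passing `ensure_ascii=False` keeps the character as-is.

```python
import json

# By default (ensure_ascii=True), characters outside the BMP are written
# as a pair of \uXXXX escapes, one per UTF-16 surrogate.
escaped = json.dumps("😀")
print(escaped)

# With ensure_ascii=False the character is emitted unescaped instead.
raw = json.dumps("😀", ensure_ascii=False)
print(raw)

# Both forms decode back to the same string.
print(json.loads(escaped) == json.loads(raw))
```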

@spytheman

Contributor

Python is already preinstalled on the CI, so we can add a V test that invokes a Python program to generate a JSON-encoded value with surrogates on stdout, then decodes that output and asserts on the result.

We also support // vtest build: present_python? so that the V test can be skipped in environments that do not have Python.

@spytheman spytheman merged commit 24f9128 into vlang:master Aug 30, 2025
79 checks passed