Custom encode: : {"a":["\u0000","\t","a","\u00f1","\u2665","\ud834\udd1e"]}
json2 encode: : {"a":["\u0000","\t","a","{c3}{b1}","\u2665","{f0}{9d}{84}{9e}"]}
json_emoji/json_emoji.v:31: FAIL: fn main.main: assert e2 == input
left value: e2 = {"a":["\u0000","\t","a","ñ","\u2665","𝄞"]}
right value: input = {"a":["\u0000","\t","a","\u00f1","\u2665","\ud834\udd1e"]}
V panic: Assertion failed...
v hash: bbb61ab
/tmp/v_1000/json_emoji.01K2QAZ1E40Y1X0SAN2VN3EKAY.tmp.c:4718: at _v_panic: Backtrace
/tmp/v_1000/json_emoji.01K2QAZ1E40Y1X0SAN2VN3EKAY.tmp.c:10753: by main__main
/tmp/v_1000/json_emoji.01K2QAZ1E40Y1X0SAN2VN3EKAY.tmp.c:10869: by main
What did you expect to see?
All asserts pass.
Problem
The JSON spec escapes Unicode characters in strings with only two bytes, in the form `\uxxxx`, where each x is a hex digit in the `0-F` range. By comparison, TOML supports both `\uXXXX` and `\UXXXXXXXX`, so it can store runes of up to four bytes. For JSON to store runes that need more than two bytes (code points above U+FFFF), a surrogate pair is advised:
To escape a code point that is not in the Basic Multilingual Plane, the character may be represented as a
twelve-character sequence, encoding the UTF-16 surrogate pair corresponding to the code point. So for
example, a string containing only the G clef character (U+1D11E) may be represented as "\uD834\uDD1E".
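The arithmetic behind the quoted rule can be sketched as follows. This is a minimal illustration in Python rather than V, and the function names `to_surrogate_pair` and `escape_code_point` are hypothetical; they are not part of json2 or of the program attached to this issue.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the Basic Multilingual Plane")
    offset = code_point - 0x10000      # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits go into the low surrogate
    return high, low


def escape_code_point(code_point):
    """Return the six-character escape, or the twelve-character pair, for one code point."""
    if code_point < 0x10000:
        return f"\\u{code_point:04x}"
    high, low = to_surrogate_pair(code_point)
    return f"\\u{high:04x}\\u{low:04x}"


print(escape_code_point(0x1D11E))  # G clef from the quoted example
```

For U+1D11E the offset is 0xD11E, which splits into 0x034 and 0x11E, giving the pair D834 and DD1E from the quoted example.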
Code
The program attached to this issue includes a custom string encoder that implements the surrogate pair algorithm described in the PDF linked above, so that a rune like 𝄞 (U+1D11E) is encoded as the surrogate pair `\ud834\udd1e`. As a test, an array of six runes is encoded and each encoder's output is compared against the expected JSON.
V version: V 0.4.11 bbb61ab
What did you do?
./v -g -o vdbg cmd/v && ./vdbg json_emoji/json_emoji.v && json_emoji/json_emoji
What did you see?
Results
The `json2` encoder correctly encodes 2-byte runes like ♥ (see #25103), but it does not implement surrogate pairs for runes like 𝄞. Runes like `ñ` also do not seem to be encoded properly. For these higher runes, `json2` outputs the raw byte codes, the first of which is always > 0x7F, which goes against the JSON spec.
So the idea is to incorporate a surrogate pair algorithm into the `json2` string encoding function: https://github.com/vlang/v/blob/master/vlib/x/json2/encoder.v#L472. I think one of the `json2` maintainers can make a better PR to extend the encoder than I can.
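As a rough picture of what such an extended string encoder would do, here is a minimal sketch in Python rather than V. The name `encode_json_string` is hypothetical and the code is not taken from `encoder.v`; it only assumes the behavior expected by the failing assert, namely that every rune outside printable ASCII is written as an escape.

```python
import json


def encode_json_string(text):
    """Encode text as an ASCII-only JSON string literal."""
    short_escapes = {'"': '\\"', "\\": "\\\\", "\b": "\\b", "\f": "\\f",
                     "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    out = ['"']
    for ch in text:
        cp = ord(ch)
        if ch in short_escapes:
            out.append(short_escapes[ch])
        elif 0x20 <= cp < 0x7F:
            out.append(ch)                      # printable ASCII is copied as-is
        elif cp < 0x10000:
            out.append(f"\\u{cp:04x}")          # single escape for BMP runes
        else:
            offset = cp - 0x10000               # surrogate pair for higher runes
            high = 0xD800 + (offset >> 10)
            low = 0xDC00 + (offset & 0x3FF)
            out.append(f"\\u{high:04x}\\u{low:04x}")
    out.append('"')
    return "".join(out)


# The six runes from the test array in this issue.
runes = ["\u0000", "\t", "a", "\u00f1", "\u2665", "\U0001d11e"]
encoded = '{"a":[' + ",".join(encode_json_string(r) for r in runes) + "]}"
print(encoded)

# Round trip through a standard JSON parser to check the escapes are valid.
assert json.loads(encoded) == {"a": runes}
```

The printed line should match the `input` value on the right-hand side of the failing assert.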
What's next
The `json2` decoder seems to partially decode all the runes properly. It is the same story with `json`.
Note
You can use the 👍 reaction to increase the issue's priority for developers.
Please note that only the 👍 reaction to the issue itself counts as a vote.
Other reactions and those to comments will not be taken into account.