-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Optimize storage of icu locale data #45296
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
to reduce SPC size by about 10k and avoid allocating huge string on the heap Fixes dotnet#45129
src/libraries/System.Private.CoreLib/src/System/Globalization/IcuLocaleData.cs
Show resolved
Hide resolved
| { | ||
| return (left); | ||
| // Binary search the array | ||
| while (lo <= hi) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: The corefx coding conventions have { on new line.
|
What generates the new table ? |
There is a short program included at the top as comment |
|
There is opportunity to optimize more here by removing the whole LCID tables/data and use the ICU APIs https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/uloc_8h.html#a5a5d08ba5eca798e1ee9e3eb981d6b39 and https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/uloc_8h.html#af513851228029d0e63316af1d0039fd8 but this can done be later and don't have to block this PR. Thanks @marek-safar for getting this done. |
| // Table which holds index into LocalesNames data and length of the string for each locale | ||
| // Values are binary searched and need to be sorted alphabetically | ||
| // | ||
| private static ushort[] s_localesNamesIndexes = new ushort[CulturesCount] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
readonly
|
|
||
| Console.WriteLine("};"); | ||
|
|
||
| Console.WriteLine("private static ushort[] s_localesNamesIndexes = new ushort [] {"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private static readonly ushort[] s_localesNamesIndexes
|
|
||
| Debug.Assert(s_localeNamesIndices[s_localeNamesIndices.Length - 1] == c_localeNames.Length); | ||
| Span<byte> lower_case = stackalloc byte[name.Length]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the name that comes in here ever be provided by a caller, such that (in an erroneous case) this could crash with a stack overflow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, there is check for length 2 lines above https://github.com/dotnet/runtime/pull/45296/files#diff-423b39c173efa91a947f7f39560ae5aa07a690040b4cc6438240107df45bdda9R3845
| 0x7c68 << 16 | 1555 << 4 | 7, // ha-latn | ||
| 0x7c86 << 16 | 2444 << 4 | 8, // quc-latn | ||
| 0x7c92 << 16 | 1888 << 4 | 7, // ku-arab | ||
| 0x1007f << 16 | 3175 << 4 | 11, // x-iv_mathan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These overflow 0x1007f and are truncated 0x007f; working on a fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix is in #45643
to reduce SPC size by about 10k and avoid allocating huge string on the heap
Fixes #45129