Improve snippet case transforms support for non-Latin scripts (fix: #286165) #287150

lucas-gomes-santana · 2026-01-12T18:33:14Z

Description

This PR addresses the problem reported in issue #286165. It improves snippet case transforms by replacing ASCII-only regular expressions with Unicode-aware patterns and locale-aware case mapping.

Previously, snippet transforms such as upcase, downcase, camelcase, pascalcase, kebabcase, and snakecase relied on [a-zA-Z]-based matching. As a result, non-Latin input (e.g. Cyrillic or Greek) was not recognized correctly and transforms were silently skipped, producing no output changes at all.

The changes in this PR:

Use Unicode property escapes (\p{L}, \p{Lu}, \p{Ll}, \p{Nd}) to properly detect letters and numbers across modern scripts.
Use locale-aware casing (toLocaleLowerCase / toLocaleUpperCase) instead of ASCII-only case conversion.
Preserve existing behavior for Latin input while improving support for scripts that have uppercase/lowercase distinctions (e.g. Cyrillic, Greek).
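The difference between ASCII-only and Unicode-aware matching can be sketched as follows. This is a minimal illustration, not the code from snippetParser.ts: the names `asciiWord`, `unicodeWord`, `splitWords`, and `toKebabSketch` are hypothetical, and the kebab-case logic is simplified to a single lowercase-to-uppercase boundary rule.

```typescript
// Hypothetical helpers for illustration only; not the code from snippetParser.ts.

// ASCII-only word matching: finds no words in Cyrillic or Greek input.
const asciiWord = /[a-zA-Z0-9]+/g;

// Unicode-aware matching: \p{L} is any letter, \p{Nd} is any decimal digit.
// The `u` flag is required for Unicode property escapes.
const unicodeWord = /[\p{L}\p{Nd}]+/gu;

function splitWords(value: string, pattern: RegExp): string[] {
  // String.prototype.match returns null when nothing matches.
  return value.match(pattern) ?? [];
}

// Insert a hyphen at each lowercase-to-uppercase transition, then lowercase.
function toKebabSketch(value: string): string {
  return value
    .replace(/(\p{Ll})(\p{Lu})/gu, '$1-$2')
    .toLowerCase();
}

console.log(splitWords('одинДва', asciiWord));   // []
console.log(splitWords('одинДва', unicodeWord)); // ['одинДва']
console.log(toKebabSketch('одинДва'));           // 'один-два'
console.log(toKebabSketch('έναςΔύο'));           // 'ένας-δύο'
```

With the ASCII-only pattern the Cyrillic input yields no words, so a transform built on it has nothing to operate on. The property-escape version treats the same input as a word and finds the case boundary.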

Limitations

This change does not aim to provide a fully language-aware or linguistically perfect solution for all scripts.
Word-based transforms (camelCase, PascalCase, kebab-case, snake_case) inherently rely on uppercase/lowercase transitions and therefore cannot be meaningfully applied to scripts without case (e.g. Chinese, Japanese, Arabic, Hebrew).

For such scripts, transforms effectively become no-ops, which is consistent with current behavior and preferable to producing arbitrary or destructive output.
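A small sketch of why caseless scripts fall through unchanged, assuming a boundary rule based on `\p{Ll}` followed by `\p{Lu}`. The helpers `hasCase` and `toSnakeSketch` are hypothetical and are not taken from snippetParser.ts.

```typescript
// Hypothetical helpers for illustration only; not the code from snippetParser.ts.

// Letters in caseless scripts belong to \p{Lo} (other letter),
// so neither \p{Ll} nor \p{Lu} matches them.
function hasCase(value: string): boolean {
  return /[\p{Ll}\p{Lu}]/u.test(value);
}

// Insert an underscore at each lowercase-to-uppercase transition, then lowercase.
function toSnakeSketch(value: string): string {
  return value
    .replace(/(\p{Ll})(\p{Lu})/gu, '$1_$2')
    .toLowerCase();
}

console.log(hasCase('一个测试'));       // false
console.log(toSnakeSketch('一个测试')); // '一个测试'
console.log(toSnakeSketch('하나둘'));   // '하나둘'
console.log(toSnakeSketch('одинДва'));  // 'один_два'
```

Because no case transition exists in the Chinese or Korean input, the replacement never fires and lowercasing has no effect, so the text passes through untouched.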

Summary

Fixes silent failures for non-Latin input in snippet transforms
Improves Unicode correctness without breaking existing behavior
Clearly scoped as an incremental improvement, not a universal linguistic solution

Final results (input -> upcase downcase camelcase pascalcase kebabcase snakecase):

одинДва -> ОДИНДВА одиндва одинДва ОдинДва один-два один_два (Russian)
一个测试 -> 一个测试 一个测试 一个测试 一个测试 一个测试 一个测试 (Simplified Chinese)
έναςΔύο -> ΈΝΑΣΔΎΟ έναςδύο έναςΔύο ΈναςΔύο ένας-δύο ένας_δύο (Greek)
ひらがなカタカナ -> ひらがなカタカナ ひらがなカタカナ ひらがなカタカナ ひらがなカタカナ ひらがなカタカナ ひらがなカタカナ (Japanese Hiragana + Katakana)
하나둘 -> 하나둘 하나둘 하나둘 하나둘 하나둘 하나둘 (Korean)
одинДва3 -> ОДИНДВА3 одиндва3 одинДва3 ОдинДва3 один-два3 один_два3 (Russian with number)
ένας_δύο -> ΈΝΑΣ_ΔΎΟ ένας_δύο έναςΔύο ΈναςΔύο ένας-δύο ένας_δύο (Greek with underscore)
こんにちはWorld -> こんにちはWORLD こんにちはworld こんにちはWorld こんにちはWorld world こんにちはworld (Japanese with English word)
واحدإثنين -> واحدإثنين واحدإثنين واحدإثنين واحدإثنين واحدإثنين واحدإثنين (Arabic)

Russian input before the regex changes:

одинДва -> ОДИНДВА одиндва одинДва одинДва одинДва одиндва (wrong formatting)

vs-code-engineering · 2026-01-12T18:33:39Z

📬 CODENOTIFY

The following users are being notified based on files changed in this PR:

@jrieken

Matched files:

src/vs/editor/contrib/snippet/browser/snippetParser.ts
src/vs/editor/contrib/snippet/test/browser/snippetParser.test.ts

lucas-gomes-santana · 2026-01-12T18:36:37Z

@microsoft-github-policy-service agree

dmitrivMS

You mentioned adding some tests - I think it would be really good in this case.

src/vs/editor/contrib/snippet/browser/snippetParser.ts

lucas-gomes-santana · 2026-01-14T19:57:27Z

@dmitrivMS I have now added tests for the modified regexes, including a test with Turkish. Waiting for review and feedback.

dmitrivMS · 2026-01-16T23:09:43Z

Apologies for the delay, I'll review and respond in about 24h.

lucas-gomes-santana

I think these changes should work now. I mentioned the test logs in earlier comments.

lucas-gomes-santana · 2026-01-25T15:25:40Z

> Apologies for the delay, I'll review and respond in about 24h.

Have you had a chance to review my changes? I think everything is working now.

jrieken

Thanks for this @lucas-gomes-santana

Improve Unicode support in snippet case transforms

3385dd0

Merge branch 'main' into fix/snippet-unicode-support

0042b74

lucas-gomes-santana mentioned this pull request Jan 12, 2026

Casing transforms in snippets only support Latin characters #286165

Closed

vs-code-engineering bot assigned jrieken Jan 12, 2026

dmitrivMS requested changes Jan 13, 2026

lucas-gomes-santana added 2 commits January 14, 2026 16:48

Merge branch 'microsoft:main' into fix/snippet-unicode-support

91b1d6b

Fix: Use toLowerCase instead of toLocaleLowerCase for case transforms

4d180b4

lucas-gomes-santana added 3 commits January 14, 2026 20:00

Merge branch 'microsoft:main' into fix/snippet-unicode-support

777e8fd

Merge branch 'microsoft:main' into fix/snippet-unicode-support

47bfada

Adding new tests for the changed regexs

7c06d59

lucas-gomes-santana requested a review from dmitrivMS January 14, 2026 23:55

Merge branch 'microsoft:main' into fix/snippet-unicode-support

7961740

lucas-gomes-santana commented Jan 19, 2026

jrieken approved these changes Jan 26, 2026

vs-code-engineering bot added this to the January 2026 milestone Jan 26, 2026

lucas-gomes-santana added 2 commits January 26, 2026 08:53

Merge branch 'microsoft:main' into fix/snippet-unicode-support

94003ca

Merge branch 'main' into fix/snippet-unicode-support

aec6cc8

jrieken enabled auto-merge January 26, 2026 16:05

joaomoreno approved these changes Jan 26, 2026

dmitrivMS approved these changes Jan 26, 2026

jrieken merged commit 283d8d0 into microsoft:main Jan 26, 2026
17 checks passed

ynss mentioned this pull request Jan 26, 2026

Git: Stash editor redirects to wrong stash when new stashes are created #290509

Open

lucas-gomes-santana deleted the fix/snippet-unicode-support branch January 28, 2026 12:32
