
Update spell-check dictionaries #3717

Merged
imnasnainaec merged 12 commits into master from dict-update
Apr 16, 2025

imnasnainaec (Collaborator) commented Apr 8, 2025

The Hindi (hi) dictionary has grown a lot since the last update, so I dropped the custom threshold adjustment (going from -t 1000 back to the default -t 2000) and added a max word length (-m 6) to keep the updated src/resources/dictionaries/hi down to 2.4 MB.
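
A length cap like -m 6 can be pictured as a simple filter over the wordlist. The sketch below is illustrative only: the function name `cap_word_length` is hypothetical, and it assumes the limit is inclusive and counts Unicode code points, which may differ from how the actual script measures Devanagari words.

```python
from typing import Iterable, List


def cap_word_length(words: Iterable[str], max_len: int) -> List[str]:
    """Keep only words with at most max_len code points (assumed inclusive)."""
    return [w for w in words if len(w) <= max_len]


if __name__ == "__main__":
    # Hypothetical sample; real wordlists are far larger.
    sample = ["नमस्ते", "घर", "अंतरराष्ट्रीय"]
    print(cap_word_length(sample, 6))
```

Because `len()` counts combining vowel signs and viramas as separate code points, a cap of 6 is stricter for Devanagari than it would be for a Latin-script wordlist.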

The Spanish (es) dictionary has shrunk a bit, so I added a custom threshold (-T 15000 instead of the default -T 20000) to preserve some of the file splitting.

Dropped numeral support to clear out noisy entries, especially in the French (fr) wordlist.
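
As a rough illustration of what dropping numerals could look like, the sketch below discards any entry containing a character in a Unicode number category (Nd, Nl, No). The function name `drop_numeric_entries` and the exact rule are assumptions; the real script may define "numeral" differently.

```python
import unicodedata
from typing import Iterable, List


def drop_numeric_entries(words: Iterable[str]) -> List[str]:
    """Remove words containing any character whose Unicode category starts with 'N'."""

    def has_numeral(word: str) -> bool:
        return any(unicodedata.category(ch).startswith("N") for ch in word)

    return [w for w in words if not has_numeral(w)]


if __name__ == "__main__":
    # Hypothetical noisy entries of the kind a scraped wordlist might contain.
    print(drop_numeric_entries(["maison", "3e", "année", "1er"]))
```

Checking the Unicode category rather than only ASCII digits also catches non-Latin digits, such as Devanagari numerals.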

Switched from NFKD to NFD normalization to allow for unexpected characters and to match #3718.
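
For readers unfamiliar with the two forms, this small sketch uses Python's standard `unicodedata` module to show the difference. Both forms decompose accented letters, but only NFKD also folds compatibility characters such as ligatures and superscripts into plain equivalents. The helper name `compare_forms` is made up for this example and is not part of the dictionary script.

```python
import unicodedata
from typing import Tuple


def compare_forms(text: str) -> Tuple[str, str]:
    """Return the (NFD, NFKD) normalizations of text."""
    return (
        unicodedata.normalize("NFD", text),
        unicodedata.normalize("NFKD", text),
    )


if __name__ == "__main__":
    for sample in ["\u00e9", "\ufb01", "\u00b2"]:  # é, the fi ligature, superscript two
        nfd, nfkd = compare_forms(sample)
        # Print code points so the decomposition is visible.
        print(
            [f"U+{ord(c):04X}" for c in sample],
            [f"U+{ord(c):04X}" for c in nfd],
            [f"U+{ord(c):04X}" for c in nfkd],
        )
```

Under NFD the ligature and the superscript stay as they are, so such characters remain distinguishable in the wordlist instead of being silently rewritten.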


codecov bot commented Apr 8, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.32%. Comparing base (4a6f4d5) to head (25c1bd1).
Report is 29 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3717   +/-   ##
=======================================
  Coverage   73.32%   73.32%           
=======================================
  Files         285      285           
  Lines       10654    10656    +2     
  Branches     1327     1327           
=======================================
+ Hits         7812     7814    +2     
  Misses       2450     2450           
  Partials      392      392           
Flag       Coverage Δ
backend    82.83% <ø> (ø)
frontend   65.91% <ø> (+0.01%) ⬆️


imnasnainaec marked this pull request as draft April 8, 2025 18:20
imnasnainaec marked this pull request as ready for review April 8, 2025 18:27
imnasnainaec marked this pull request as draft April 11, 2025 14:12
imnasnainaec marked this pull request as ready for review April 11, 2025 15:59
andracc (Collaborator) left a comment

Reviewed 52 of 117 files at r1, 101 of 101 files at r2, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @imnasnainaec)

imnasnainaec merged commit ed921c1 into master Apr 16, 2025
18 checks passed
imnasnainaec deleted the dict-update branch April 16, 2025 18:10
andracc pushed a commit that referenced this pull request Apr 22, 2025