Do not warn about similar ASCII-only idents#2923
Do not warn about similar ASCII-only idents#2923pyfisch wants to merge 1 commit intorust-lang:masterfrom pyfisch:patch-2
Conversation
See discussion at rust-lang/rust#55467 (comment) ff. Rename `less_used_codepoints` to `uncommon_codepoints` (rust-lang/rust#55467 (comment))
|
One caveat of ignoring ASCII-only identifiers is that the compiler still warns about mixed script identifiers with a potentially confusable ASCII character. By this metric |
|
Looks great. By the way, another unsolved question:
Reading the actual generated table, the answer is also clearly 'yes', i think. The mixed script confusable alphabet for ascii latin is: and a single lib.rs with cc @Manishearth |
|
@crlf0710 that table seems incorrectly generated. |
|
oh, never mind |
|
Yes, we need a Latin exception, because the stdlib is Latin. The RFC states this outright. |
|
Should I just remove the (un)resolved question about |
|
No, that's not how unresolved questions work, we resolve them in the tracking issue if necessary. |
|
Implementation adjustment is ready at rust-lang/rust#72069 |
Is there some way we can implement the notion that the actual confusable characters must be from different scripts? |
|
That's already in the RFC in the heuristics for mixed_script_confusables |
Aren't them the same? Person and 人 have the same meaning but there is no easy way to write el or e1 in Chinese or Japanese. So a mix of characters is still likely needed, such as 人1 or 人2, it would be weird to see 人一 or 人二. I don't know if writing variables in non-latin would be helpful but a mix for numeric might be needed. |
No, one is rén + the letter e + the number 1, the other is rén + the letter e + the lowercase letter l |
|
I would rather use @joshtriplett's heuristic here. mixed_script_confusables already uses this, we just need to reuse this: we lint on different strings that map to each other via mixed-script-confusables only. |
|
@Manishearth Ah, you are talking about the By the way, how did you type |
|
@pickfire sigh the point is that the current patch catches that case incorrectly because it contains a non-confusable non-ascii character as well. I have a compose key on my keyboard. I also have a chinese input method set up, but I wanted to specifically list out the word there. |
| [TR39RestrictionLevel]: https://www.unicode.org/reports/tr39/#Restriction_Level_Detection | ||
| [unicode-set-confusables]: https://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%5B%3AIdentifier_Status%3DAllowed%3A%5D%26%5B%3AXID_Continue%3DYes%3A%5D%26%5B%3AConfusable_MA%3A%5D%5D&g=&i=Confusable_MA%2CScript_Extensions | ||
| [unicode-set-allowed]: https://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%5B%3AIdentifier_Status%3DAllowed%3A%5D%26%5B%3AXID_Continue%3DYes%3A%5D%5D&g=&i=Script_Extensions No newline at end of file | ||
| [unicode-set-allowed]: https://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%5B%3AIdentifier_Status%3DAllowed%3A%5D%26%5B%3AXID_Continue%3DYes%3A%5D%5D&g=&i=Script_Extensions |
There was a problem hiding this comment.
Is this link correct? Why do I get 404?
There was a problem hiding this comment.
the tool is down right now, unicode is having IT issues
|
I don't think this is considered anymore. |
See discussion at rust-lang/rust#55467 (comment) ff.
Rename
less_used_codepointstouncommon_codepoints(rust-lang/rust#55467 (comment))cc @crlf0710