Do not warn about similar ASCII-only idents #2923

pyfisch · 2020-05-08T08:12:56Z

See discussion at rust-lang/rust#55467 (comment) ff.

Rename less_used_codepoints to uncommon_codepoints (rust-lang/rust#55467 (comment))


pyfisch · 2020-05-08T08:19:18Z

One caveat of ignoring ASCII-only identifiers is that the compiler still warns about mixed script identifiers with a potentially confusable ASCII character. By this metric person_e1 and person_el are not confusable but 人_e1 and 人_el are confusable.
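To make the caveat concrete, here is a minimal sketch of the rule as described in this comment. It is not the rustc implementation: `toy_skeleton` and `would_warn` are hypothetical names, and the tiny look-alike table stands in for the real Unicode TR39 confusables data.

```rust
/// Toy skeleton: folds a few look-alike ASCII characters together.
/// Hypothetical stand-in for the TR39 confusables mapping; it only
/// covers enough characters to reproduce the example in the thread.
fn toy_skeleton(ident: &str) -> String {
    ident
        .chars()
        .map(|c| match c {
            '1' | 'I' => 'l',
            '0' => 'O',
            other => other,
        })
        .collect()
}

/// Rule sketched in this PR: two distinct identifiers are reported only
/// if their skeletons match AND at least one of them is not ASCII-only.
fn would_warn(a: &str, b: &str) -> bool {
    a != b && toy_skeleton(a) == toy_skeleton(b) && !(a.is_ascii() && b.is_ascii())
}
```

Under these assumptions, `would_warn("person_e1", "person_el")` is `false` because both names are ASCII-only, while `would_warn("人_e1", "人_el")` is `true` even though the only differing characters are `1` and `l`.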

crlf0710 · 2020-05-08T08:19:36Z

Looks great.

By the way, another unsolved question:

In mixed_script_confusables, do we actually need to make an exception for Latin identifiers?

Reading the actual generated table, the answer is also clearly 'yes', I think.

The mixed script confusable alphabet for ascii latin is:
ABCEHIJKMNOPSTUVWXYZ
abcefghijlnopqrsuvwxy

and a single lib.rs with pub fn foo() {} will trigger the mixed_script_confusable lint if the exception is not made.
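The point about `pub fn foo() {}` can be checked against the table above with a short sketch. The function names are hypothetical, and the "every letter of the script is confusable" test is a simplification of the lint's heuristic, assumed here only for illustration.

```rust
/// ASCII Latin letters listed in the comment above as mixed-script confusable.
const CONFUSABLE_UPPER: &str = "ABCEHIJKMNOPSTUVWXYZ";
const CONFUSABLE_LOWER: &str = "abcefghijlnopqrsuvwxy";

fn is_listed_confusable(c: char) -> bool {
    CONFUSABLE_UPPER.contains(c) || CONFUSABLE_LOWER.contains(c)
}

/// True if every ASCII letter across all identifiers appears in the table,
/// i.e. the crate's Latin usage consists only of confusable characters.
/// Without a Latin exception, that is the situation that would be flagged.
fn latin_uses_only_confusables(idents: &[&str]) -> bool {
    idents
        .iter()
        .flat_map(|ident| ident.chars())
        .filter(|c| c.is_ascii_alphabetic())
        .all(is_listed_confusable)
}
```

With this sketch, a crate whose only identifier is `foo` returns `true`, since `f` and `o` are both in the table. Adding an identifier such as `main` returns `false`, because `m` is not in the lowercase list.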

cc @Manishearth

Manishearth · 2020-05-08T15:21:36Z

@crlf0710 that table seems incorrectly generated.

Manishearth · 2020-05-08T15:22:54Z

oh, never mind

Manishearth · 2020-05-08T15:23:59Z

Yes, we need a Latin exception, because the stdlib is Latin. The RFC states this outright.

pyfisch · 2020-05-09T09:34:25Z

Should I just remove the (un)resolved question about mixed_script_confusables?

Manishearth · 2020-05-09T13:41:08Z

No, that's not how unresolved questions work; we resolve them in the tracking issue if necessary.

crlf0710 · 2020-05-10T13:50:22Z

Implementation adjustment is ready at rust-lang/rust#72069

joshtriplett · 2020-05-12T22:13:23Z

> One caveat of ignoring ASCII-only identifiers is that the compiler still warns about mixed script identifiers with a potentially confusable ASCII character. By this metric person_e1 and person_el are not confusable but 人_e1 and 人_el are confusable.

Is there some way we can implement the notion that the actual confusable characters must be from different scripts?

Manishearth · 2020-05-12T22:28:40Z

That's already in the RFC in the heuristics for mixed_script_confusables

pickfire · 2020-05-13T03:31:34Z

> One caveat of ignoring ASCII-only identifiers is that the compiler still warns about mixed script identifiers with a potentially confusable ASCII character. By this metric person_e1 and person_el are not confusable but 人_e1 and 人_el are confusable.

Aren't they the same? Person and 人 have the same meaning, but there is no easy way to write el or e1 in Chinese or Japanese. So a mix of characters is still likely to be needed, such as 人1 or 人2; it would be weird to see 人一 or 人二.

I don't know if writing variables in non-Latin scripts would be helpful, but a mix for numerals might be needed.

Manishearth · 2020-05-13T04:51:20Z

> Aren't they the same?

No, one is rén + the letter e + the number 1; the other is rén + the letter e + the lowercase letter l.

Manishearth · 2020-05-13T04:52:30Z

I would rather use @joshtriplett's heuristic here. mixed_script_confusables already uses this, we just need to reuse this: we lint on different strings that map to each other via mixed-script-confusables only.
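A rough sketch of that heuristic, assuming a hypothetical three-entry cross-script table (Cyrillic letters mapped to their Latin look-alikes) in place of the real mixed-script confusables data: identifiers are compared only after mapping through that table, so same-script look-alikes such as `1` and `l` never contribute to a match.

```rust
/// Hypothetical cross-script table: maps a few Cyrillic letters to the
/// Latin letters they resemble. Everything else maps to itself.
fn mixed_script_prototype(c: char) -> char {
    match c {
        '\u{0430}' => 'a', // CYRILLIC SMALL LETTER A
        '\u{0435}' => 'e', // CYRILLIC SMALL LETTER IE
        '\u{043E}' => 'o', // CYRILLIC SMALL LETTER O
        other => other,
    }
}

/// Skeleton built from the cross-script table only.
fn mixed_script_skeleton(ident: &str) -> String {
    ident.chars().map(mixed_script_prototype).collect()
}

/// Lint only on different strings that become equal after the
/// cross-script mapping, as described in the comment above.
fn would_warn_mixed_script_only(a: &str, b: &str) -> bool {
    a != b && mixed_script_skeleton(a) == mixed_script_skeleton(b)
}
```

In this sketch `人_e1` and `人_el` are not reported, because `1` and `l` are not in the cross-script table, whereas `paypal` written with Cyrillic `а` is reported against the all-Latin `paypal`.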

pickfire · 2020-05-13T09:19:00Z

@Manishearth Ah, you are talking about the e1 and el. But isn't that up to the font used by the developer, should we even limit that?

By the way, how did you type rén?

Manishearth · 2020-05-13T14:38:14Z

@pickfire *sigh* the point is that the current patch catches that case incorrectly because it contains a non-confusable non-ASCII character as well.

I have a compose key on my keyboard. I also have a Chinese input method set up, but I wanted to specifically list out the word there.

pickfire · 2020-05-13T15:53:34Z

text/2457-non-ascii-idents.md

 [TR39RestrictionLevel]: https://www.unicode.org/reports/tr39/#Restriction_Level_Detection
 [unicode-set-confusables]: https://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%5B%3AIdentifier_Status%3DAllowed%3A%5D%26%5B%3AXID_Continue%3DYes%3A%5D%26%5B%3AConfusable_MA%3A%5D%5D&g=&i=Confusable_MA%2CScript_Extensions
-[unicode-set-allowed]: https://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%5B%3AIdentifier_Status%3DAllowed%3A%5D%26%5B%3AXID_Continue%3DYes%3A%5D%5D&g=&i=Script_Extensions
+[unicode-set-allowed]: https://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%5B%3AIdentifier_Status%3DAllowed%3A%5D%26%5B%3AXID_Continue%3DYes%3A%5D%5D&g=&i=Script_Extensions


Is this link correct? Why do I get 404?

The tool is down right now; Unicode is having IT issues.

pyfisch · 2020-07-02T11:58:57Z

I don't think this is considered anymore.

Do not warn about similar ASCII-only idents

1225ab4


crlf0710 mentioned this pull request May 10, 2020

Refactor and adjust non_ascii_idents lints. rust-lang/rust#72069

Merged


pyfisch closed this Jul 2, 2020
