-
-
Notifications
You must be signed in to change notification settings - Fork 14.7k
Translate \r\n -> \n when reading files #62865
Copy link
Copy link
Closed
Labels
A-frontendArea: Compiler frontend (errors, parsing and HIR)Area: Compiler frontend (errors, parsing and HIR)C-cleanupCategory: PRs that clean code up or issues documenting cleanup.Category: PRs that clean code up or issues documenting cleanup.T-compilerRelevant to the compiler team, which will review and decide on the PR/issue.Relevant to the compiler team, which will review and decide on the PR/issue.
Metadata
Metadata
Assignees
Labels
A-frontendArea: Compiler frontend (errors, parsing and HIR)Area: Compiler frontend (errors, parsing and HIR)C-cleanupCategory: PRs that clean code up or issues documenting cleanup.Category: PRs that clean code up or issues documenting cleanup.T-compilerRelevant to the compiler team, which will review and decide on the PR/issue.Relevant to the compiler team, which will review and decide on the PR/issue.
Type
Fields
Give feedbackNo fields configured for issues without a type.
Currently, we handle
\r\nexplicitly in the lexer. We should do this at the file read time instead.Motivation:
Line endings should not affect semantics of the language. For example, git on windows by default checkouts with
\r\nline endings, and it would be bad if compiling code on windows led to observably different results. This could be handled on a case-by-case basis in the lexer (the current approach), but it's easy to miss something (for example Raw Byte strings do not handle\r#60604). Additionally, proc macros currently see original tokens, and so can observe different line endings. The simplest way to make compiler lineending-invariant is to normalize line endings as soon as possible.For IDE features, which work close to the source code, and especially for code generation during refactorings, the surface are where you need to handle different line endings is much larger. It would be easier for IDE to assume
\nline endings and convert at one place, at the boundary.i propose that we convert
\r\nonce, inSourceFile::newconstructor. Note that we already do BOM-removal there, so, in theory, all code should already be prepared to mismatch between in-memory and on-disk file content. The replacement shortens the string, so it can be a pretty fast in-place transformation.One technical question is what to do with isolated
\r? I see two options:\r\r\n->\r\n, and requires other code to explicitely not treat\r\nas line ending.\rin the source text. That is, after normalization,\rcan not be found in the source code at all.I propose to go with the second option it's slightly more annoying to implement, but seems more robust.