-
-
Notifications
You must be signed in to change notification settings - Fork 14.2k
Description
Currently, we handle \r\n explicitly in the lexer. We should do this at the file read time instead.
Motivation:
-
Line endings should not affect semantics of the language. For example, git on windows by default checkouts with
\r\nline endings, and it would be bad if compiling code on windows led to observably different results. This could be handled on a case-by-case basis in the lexer (the current approach), but it's easy to miss something (for example Raw Byte strings do not handle\r#60604). Additionally, proc macros currently see original tokens, and so can observe different line endings. The simplest way to make compiler lineending-invariant is to normalize line endings as soon as possible. -
For IDE features, which work close to the source code, and especially for code generation during refactorings, the surface are where you need to handle different line endings is much larger. It would be easier for IDE to assume
\nline endings and convert at one place, at the boundary.
i propose that we convert \r\n once, in SourceFile::new constructor. Note that we already do BOM-removal there, so, in theory, all code should already be prepared to mismatch between in-memory and on-disk file content. The replacement shortens the string, so it can be a pretty fast in-place transformation.
One technical question is what to do with isolated \r? I see two options:
- leave them as is. This makes the transform non-idempotent though:
\r\r\n->\r\n, and requires other code to explicitely not treat\r\nas line ending. - error at file load time if there's a lone
\rin the source text. That is, after normalization,\rcan not be found in the source code at all.
I propose to go with the second option it's slightly more annoying to implement, but seems more robust.