Skip to content

Conversation

@graphemecluster
Copy link
Contributor

@graphemecluster graphemecluster commented Jan 24, 2025

This PR fixes several issues related to identifiers scanning:

For regular expression group name

For all identifiers

- Added the `scanIdentifierStart` method
- Refactored `scanIdentifierParts` & `scanIdentifier`

Implementing the above automatically fixes another issue that Unicode
escapes & extended Unicode escapes are not recognised at the beginning
of a RegExp group name.
@typescript-bot typescript-bot added the For Uncommitted Bug PR for untriaged, rejected, closed or missing bug label Jan 24, 2025
Copy link
Contributor Author

@graphemecluster graphemecluster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to leave out the last commit when reviewing for a better diff.

// "-" and ":" are valid in JSX Identifiers
(identifierVariant === LanguageVariant.JSX ? (ch === CharacterCodes.minus || ch === CharacterCodes.colon) : false) ||
// "-" is valid in JSX Identifiers. ":" is part of JSXNamespacedName but not JSXIdentifier.
identifierVariant === LanguageVariant.JSX && ch === CharacterCodes.minus ||
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing this line doesn't affect everything in the test suite, but I am not sure if it affects code completion.

ch = peekExtendedUnicodeEscape();
if (ch >= 0 && isIdentifierPart(ch, languageVersion)) {
if (ch >= 0 && isIdentifierPart(ch, languageVersion, identifierVariant)) {
result += text.substring(start, pos);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the missing line that causes #61043.

pos += charSize(ch);
return token; // Still `SyntaxKind.Unknown`
tokenFlags = TokenFlags.None;
return scanIdentifier(ScriptTarget.ESNext);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested and it does not affect anything in the test suite even if pos is not advanced.

return token = SyntaxKind.Identifier;
}

function scanIdentifier(languageVersion: ScriptTarget, identifierVariant?: LanguageVariant | "RegExpGroupName") {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, either an internal value can be added into LanguageVariant, or an enum called IdentifierVariant can be created, but for the latter case, I am not sure if isIdentifierPart and isIdentifierText should also be amended since it might be breaky, at least at type level.

Comment on lines +277 to +281
/(?<\u{D800}\u{DC00}>)\k<\u{D800}\u{DC00}>/;

!!! error TS1514: Expected a capturing group name.
~~~~~~~~
!!! error TS1538: Unicode escape sequences are only available when the Unicode (u) flag or the Unicode Sets (v) flag is set.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first escape sequence error is not reported as it starts at the same position as the TS1514 error. Is it really desirable? Personally, I would prefer to have an additional check that the last error is not of length zero, in which case it is not considered to be overlapping.

@graphemecluster
Copy link
Contributor Author

graphemecluster commented Nov 4, 2025

@RyanCavanaugh I understand that your Team is busy working on Corsa, but since you just pinged me regarding another regex issue, would it be possible for you to take some time to have a look at this pull request too?

Edit: Due to Ron’s unfortunate situation, please allow me to bring this to the attention of an extra person. @jakebailey?

@jakebailey
Copy link
Member

I intend to review this, yes, though I will note that whatever we add here is going to have to get ported to the new Go compiler so hopefully it's not too complicated or tied to JS strings

@jakebailey jakebailey requested review from jakebailey and removed request for rbuckton November 17, 2025 23:14
@typescript-bot typescript-bot added For Backlog Bug PRs that fix a backlog bug and removed For Uncommitted Bug PR for untriaged, rejected, closed or missing bug labels Nov 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

For Backlog Bug PRs that fix a backlog bug

Projects

Status: Not started

Development

Successfully merging this pull request may close these issues.

Text between Unicode escapes within an identifier is skipped

3 participants