Skip to content

Fix incorrect atomic loop optimization when body contains backtracking#124254

Merged
stephentoub merged 1 commit intodotnet:mainfrom
stephentoub:fixatomicloop
Feb 11, 2026
Merged

Fix incorrect atomic loop optimization when body contains backtracking#124254
stephentoub merged 1 commit intodotnet:mainfrom
stephentoub:fixatomicloop

Conversation

@stephentoub
Copy link
Member

MakeLoopAtomic wraps a loop in an Atomic group, which discards all backtracking state from the body. However, CanBeMadeAtomic only proves that giving back iterations is unnecessary... it does not account for within-body backtracking. When the body itself contains backtracking constructs, the atomic wrapper incorrectly suppresses that internal backtracking, producing wrong match results.

The fix adds a MayContainBacktracking check that walks the loop body tree before allowing MakeLoopAtomic. A loop is only made atomic when the body has no backtracking constructs of its own. The backtracking-construct detection logic is extracted into a reusable IsBacktrackingConstruct property on RegexNode, which RegexTreeAnalyzer now also uses, removing the duplicated switch.

MakeLoopAtomic wraps a loop in an Atomic group, which discards all backtracking state from the body. However, CanBeMadeAtomic only proves that giving back iterations is unnecessary... it does not account for within-body backtracking. When the body itself contains backtracking constructs, the atomic wrapper incorrectly suppresses that internal backtracking, producing wrong match results.

The fix adds a MayContainBacktracking check that walks the loop body tree before allowing MakeLoopAtomic. A loop is only made atomic when the body has no backtracking constructs of its own. The backtracking-construct detection logic is extracted into a reusable IsBacktrackingConstruct property on RegexNode, which RegexTreeAnalyzer now also uses, removing the duplicated switch.

Tests are added covering lazy/greedy single-char loops, char-class loops, alternations, and general multi-char loops inside outer loop bodies, as well as cases where the optimization should still apply (non-backtracking bodies).
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes an unsound regex optimization where wrapping certain loops in an Atomic group could incorrectly suppress within-body backtracking (not just “giving back” loop iterations), leading to wrong match results. The PR adds a conservative check to ensure a loop body has no intrinsic backtracking constructs before applying MakeLoopAtomic, and refactors backtracking-construct detection into a reusable RegexNode property.

Changes:

  • Add a subtree walk (MayContainBacktracking) to block MakeLoopAtomic when the loop body contains backtracking constructs.
  • Extract backtracking-construct detection into RegexNode.IsBacktrackingConstruct and reuse it from RegexTreeAnalyzer.
  • Add regression tests covering nested-loop cases where atomic wrapping must not suppress inner backtracking.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Match.Tests.cs Adds regression cases for loops whose bodies require backtracking to produce correct matches.
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexTreeAnalyzer.cs Replaces a duplicated switch with RegexNode.IsBacktrackingConstruct.
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs Adds IsBacktrackingConstruct + MayContainBacktracking and gates MakeLoopAtomic accordingly.
Comments suppressed due to low confidence (1)

src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs:2612

  • The MayContainBacktracking summary/remarks mention “variable-width loops”, but the detection is based on IsBacktrackingConstruct which keys off M != N (variable iteration count) and alternation. Consider rewording the summary to avoid implying it’s about width rather than iteration variability.
        /// <summary>
        /// Checks whether a node tree may contain backtracking constructs (variable-width loops or alternations).
        /// </summary>

@stephentoub
Copy link
Member Author

/ba-g failures are unrelated

@stephentoub stephentoub enabled auto-merge (squash) February 11, 2026 18:22
@stephentoub stephentoub merged commit 5d62ba2 into dotnet:main Feb 11, 2026
94 of 100 checks passed
@stephentoub stephentoub deleted the fixatomicloop branch February 11, 2026 18:22
@stephentoub
Copy link
Member Author

/backport to release/10.0

@github-actions
Copy link
Contributor

Started backporting to release/10.0 (link to workflow run)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments