Fix incorrect atomic loop optimization when body contains backtracking#124254
Fix incorrect atomic loop optimization when body contains backtracking#124254stephentoub merged 1 commit intodotnet:mainfrom
Conversation
MakeLoopAtomic wraps a loop in an Atomic group, which discards all backtracking state from the body. However, CanBeMadeAtomic only proves that giving back iterations is unnecessary... it does not account for within-body backtracking. When the body itself contains backtracking constructs, the atomic wrapper incorrectly suppresses that internal backtracking, producing wrong match results. The fix adds a MayContainBacktracking check that walks the loop body tree before allowing MakeLoopAtomic. A loop is only made atomic when the body has no backtracking constructs of its own. The backtracking-construct detection logic is extracted into a reusable IsBacktrackingConstruct property on RegexNode, which RegexTreeAnalyzer now also uses, removing the duplicated switch. Tests are added covering lazy/greedy single-char loops, char-class loops, alternations, and general multi-char loops inside outer loop bodies, as well as cases where the optimization should still apply (non-backtracking bodies).
|
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions |
There was a problem hiding this comment.
Pull request overview
Fixes an unsound regex optimization where wrapping certain loops in an Atomic group could incorrectly suppress within-body backtracking (not just “giving back” loop iterations), leading to wrong match results. The PR adds a conservative check to ensure a loop body has no intrinsic backtracking constructs before applying MakeLoopAtomic, and refactors backtracking-construct detection into a reusable RegexNode property.
Changes:
- Add a subtree walk (
MayContainBacktracking) to blockMakeLoopAtomicwhen the loop body contains backtracking constructs. - Extract backtracking-construct detection into
RegexNode.IsBacktrackingConstructand reuse it fromRegexTreeAnalyzer. - Add regression tests covering nested-loop cases where atomic wrapping must not suppress inner backtracking.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Match.Tests.cs | Adds regression cases for loops whose bodies require backtracking to produce correct matches. |
| src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexTreeAnalyzer.cs | Replaces a duplicated switch with RegexNode.IsBacktrackingConstruct. |
| src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs | Adds IsBacktrackingConstruct + MayContainBacktracking and gates MakeLoopAtomic accordingly. |
Comments suppressed due to low confidence (1)
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs:2612
- The MayContainBacktracking summary/remarks mention “variable-width loops”, but the detection is based on
IsBacktrackingConstructwhich keys offM != N(variable iteration count) and alternation. Consider rewording the summary to avoid implying it’s about width rather than iteration variability.
/// <summary>
/// Checks whether a node tree may contain backtracking constructs (variable-width loops or alternations).
/// </summary>
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs
Show resolved
Hide resolved
|
/ba-g failures are unrelated |
|
/backport to release/10.0 |
|
Started backporting to |
MakeLoopAtomic wraps a loop in an Atomic group, which discards all backtracking state from the body. However, CanBeMadeAtomic only proves that giving back iterations is unnecessary... it does not account for within-body backtracking. When the body itself contains backtracking constructs, the atomic wrapper incorrectly suppresses that internal backtracking, producing wrong match results.
The fix adds a MayContainBacktracking check that walks the loop body tree before allowing MakeLoopAtomic. A loop is only made atomic when the body has no backtracking constructs of its own. The backtracking-construct detection logic is extracted into a reusable IsBacktrackingConstruct property on RegexNode, which RegexTreeAnalyzer now also uses, removing the duplicated switch.