Do not use the string bit vector in regexes if it's all ones or all zeroes. #72317
stephentoub merged 2 commits into dotnet:main from
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
Thanks. Are these new post-string-building cases exercised by our existing tests? i.e., does code coverage show that all the new branches are hit?
Fixes #72312.
When emitting code that matches regex character classes, if all other shortcuts fail, we resort to hardcoding whether each ASCII character belongs to the set, using a string as a bit vector, and calling `CharInClass` for all other characters. Sometimes, based on `RegexCharClass.Analyze`, we determine that the character class contains either all ASCII characters or none of them, and skip the bit vector lookup. However, these analysis results are not perfect and cannot be relied on when, for example, the character class contains a subtraction. This PR checks the string bit vector after constructing it, and if it contains only ones or only zeroes, emits code that does not use it.
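A minimal Python sketch of the idea, for illustration only. It assumes the layout where the 128 ASCII membership bits are packed 16 per UTF-16 code unit into an 8-character string and tested as `bits[ch >> 4] & (1 << (ch & 0xF))`. The function names and the `in_class` predicate (a stand-in for the `CharInClass` slow path) are hypothetical; this is not the PR's C# implementation. The point is that the all-zeroes/all-ones check runs on the finished vector, so it is exact even when `Analyze` could not decide, as with subtractions.

```python
from typing import Callable

ASCII_WORDS = 8  # 128 ASCII chars / 16 bits per UTF-16 code unit


def build_ascii_bit_vector(in_class: Callable[[int], bool]) -> str:
    """Pack ASCII membership into an 8-char string, 16 bits per char.

    Bit (c & 0xF) of char (c >> 4) is set iff code point c is in the class.
    """
    words = [0] * ASCII_WORDS
    for c in range(128):
        if in_class(c):
            words[c >> 4] |= 1 << (c & 0xF)
    return "".join(chr(w) for w in words)


def classify_bit_vector(bits: str) -> str:
    """Inspect the finished vector and pick how to handle ASCII input."""
    if all(ord(w) == 0 for w in bits):
        return "none"    # no ASCII char is in the set: the lookup is useless
    if all(ord(w) == 0xFFFF for w in bits):
        return "all"     # every ASCII char is in the set: the lookup is useless
    return "lookup"      # mixed membership: keep the string bit vector


def make_matcher(in_class: Callable[[int], bool]) -> Callable[[int], bool]:
    """Return a matcher modeling (as an assumption) the three emitted shapes.

    `in_class` is a hypothetical stand-in for the CharInClass slow path.
    """
    bits = build_ascii_bit_vector(in_class)
    kind = classify_bit_vector(bits)
    if kind == "none":
        # ASCII chars fail immediately; only non-ASCII hits the slow path.
        return lambda ch: ch >= 128 and in_class(ch)
    if kind == "all":
        # ASCII chars match immediately; only non-ASCII hits the slow path.
        return lambda ch: ch < 128 or in_class(ch)
    # General case: bit vector for ASCII, slow path for everything else.
    return lambda ch: (
        (ord(bits[ch >> 4]) & (1 << (ch & 0xF))) != 0 if ch < 128 else in_class(ch)
    )
```

For example, a predicate equivalent to `[\u0000-\uFFFF-[\u0080-\uFFFF]]` (all of ASCII, written with a subtraction) classifies as `"all"`, and a digits-only predicate classifies as `"lookup"`. The check costs 128 membership tests at code-generation time and nothing extra at match time.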