Avoid allocating Regex when using static methods#496
Avoid allocating Regex when using static methods#496stephentoub merged 2 commits intodotnet:masterfrom
Conversation
Regex supports a cache that's meant to be used for the static methods, e.g. the static Regex.IsMatch. However, currently that cache isn't caching the actual Regex class, but rather a sub object that stores some of the state from the Regex and then is used to rehydrate a new Regex instance. This means we allocate a new Regex object when these static methods are used, even if the data is found in the cache. This means an operation like the static Regex.IsMatch is never allocation-free. There's also weirdness around this caching, in that when a new Regex instance is constructed, it checks the cache (paying the relevant lookup costs) even though it doesn't add back to the cache, so the only way it would ever find something in the cache is if the same set of inputs (e.g. pattern, options, timeout, etc.) are used with both the Regex ctor and with the static methods, where the static methods would populate the cache that's then consulted by the ctor. This adds unnecessary expense on the common path for a very uncommon situation; it also complicates the code non-trivially. This commit changes things to cleanly separate the cache from Regex, and to actually cache the Regex instance itself.
src/libraries/System.Text.RegularExpressions/tests/Regex.Cache.Tests.cs
Outdated
Show resolved
Hide resolved
b808a24 to
862c173
Compare
eerhardt
left a comment
There was a problem hiding this comment.
This looks really good. I just had a few comments on places that confused me while reading the code.
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Regex.cs
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Regex.Cache.cs
Show resolved
Hide resolved
|
@stephentoub just curious, did you also run the other 2 regex perf classes in the perf repo (Perf_Regex and RegexRedux). I am curious how these look with all your recent changes. |
I have. I'll post numbers in my next PR. Thanks. |
|
The benchmarks games "best" C# entry for regex-redux now invokes PCRE. Would be interesting to see how close/far we are from that with our regex implementation and the recent improvements... |
It would (we even have a Q6600 we could boot up to measure officially) but as I understand it, that entry avoids the widening to a .NET string that we're still inevitably doing. (Unless we have something like https://github.com/dotnet/corefx/issues/24145 or an equivalent) |
|
Failures are due to https://github.com/dotnet/core-eng/issues/8456 |
| !left.Equals(right); | ||
|
|
||
| public override int GetHashCode() => | ||
| _pattern.GetHashCode() ^ |
There was a problem hiding this comment.
HashCode.Combine ?
Although in this case, perf is far more important than entropy. Plus, the problem with ^ that it coalesces nulls isn't an issue these won't be null.
Regex supports a cache that's meant to be used for the static methods, e.g. the static Regex.IsMatch. However, currently that cache isn't caching the actual Regex class, but rather a sub object that stores some of the state from the Regex and then is used to rehydrate a new Regex instance. This means we allocate a new Regex object when these static methods are used, even if the data is found in the cache. This means an operation like the static Regex.IsMatch is never allocation-free. There's also weirdness around this caching, in that when a new Regex instance is constructed, it checks the cache (paying the relevant lookup costs) even though it doesn't add back to the cache, so the only way it would ever find something in the cache is if the same set of inputs (e.g. pattern, options, timeout, etc.) are used with both the Regex ctor and with the static methods, where the static methods would populate the cache that's then consulted by the ctor. This adds unnecessary expense on the common path for a very uncommon situation; it also complicates the code non-trivially.
This commit changes things to cleanly separate the cache from Regex, and to actually cache the Regex instance itself.
Running the "Regex*Static" perf tests:
cc: @eerhardt, @danmosemsft, @ViktorHofer