-
Notifications
You must be signed in to change notification settings - Fork 4.9k
Remove System.Net.Http dependency on System.Text.RegularExpressions #41110
Conversation
This dependency is the only reason we end up with a 105K System.Text.RegularExpressions.dll as part of a trimmed new MVC app. Regex is used in the case where on Windows a bypass list is provided, in which case each item in the list is changed into a regex, which is then evaluated against each url provided to SocketsHttpHandler. But the patterns usable are simple: the only special character recognized is an asterisk, which can map to zero or more of any character. So, we can instead employ a simple processor for such patterns, which then eliminates the need to reference System.Text.RegularExpressions.dll from System.Net.Http.dll. It also happens to be faster.
a4ac806 to
6962f70
Compare
src/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpWindowsProxy.cs
Show resolved
Hide resolved
|
I was hoping I could write it in a simple way like this, which works fine, then transform into tail-call. But tail-call can't work with the alternation. private static bool Match(string t, int it, string p, int ip)
{
if (ip == p.Length) // if there's no more pattern, it's a match iff there's no more text
return it == t.Length;
if (it == t.Length) // if there's no more text, it's a match iff pattern is '*' and it's a match for subpattern
return p[ip] == '*' && Match(t, it, p, ip + 1);
if (p[ip] == '*') // if pattern is *, it's a match iff it's a match for subtext (* consumes and continues)
// or it's a match for subpattern (* consumes nothing)
return Match(t, it + 1, p, ip) || Match(t, it, p, ip + 1);
if (char.ToLowerInvariant(p[ip]) == char.ToLowerInvariant(t[it]))
// if pattern and text match, it's a match if it's a match for subtext + subpattern
return Match(t, it + 1, p, ip + 1);
return false;
} |
| try | ||
| { | ||
| // Escape any special characters and unescape * to get wildcard pattern match. | ||
| Regex re = new Regex(Regex.Escape(tmp).Replace("\\*", ".*?") + "$", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It adds "$" at the end but not "^" at the start. Does that mean that it would match suffixes-only, while your new code will not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good catch. That seems wrong in the previous code (e.g. if the pattern were "a.com" it would say that "ba.com" matched). @wfurt, was that intentional? The new code could be changed to match by adding a * at the beginning of the pattern if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I think this is bug. With curl, .example.com would also match www.example.com (https://curl.haxx.se/libcurl/c/CURLOPT_NOPROXY.html) but for windows it does not work that way. *a.com should match a.com and ba.com but a.com rule should not match ba.com
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks, then I'll leave this as-is. Is fixing it in master sufficient, or should a fix for this (presumably prepending "^" +) be considered for 3.1?
Yeah, but this can end up having really problematic performance in the worst case. Try running this with your code: Match(new string('a', 100), 0, "a*a*a*a*a*a*a*a*b", 0) |
src/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpWindowsProxy.cs
Outdated
Show resolved
Hide resolved
…pWindowsProxy.cs Co-Authored-By: Jan Kotas <[email protected]>
...tem.Net.Http.WinHttpHandler/tests/UnitTests/System.Net.Http.WinHttpHandler.Unit.Tests.csproj
Show resolved
Hide resolved
…otnet/corefx#41110) * Remove System.Net.Http dependency on System.Text.RegularExpressions This dependency is the only reason we end up with a 105K System.Text.RegularExpressions.dll as part of a trimmed new MVC app. Regex is used in the case where on Windows a bypass list is provided, in which case each item in the list is changed into a regex, which is then evaluated against each url provided to SocketsHttpHandler. But the patterns usable are simple: the only special character recognized is an asterisk, which can map to zero or more of any character. So, we can instead employ a simple processor for such patterns, which then eliminates the need to reference System.Text.RegularExpressions.dll from System.Net.Http.dll. It also happens to be faster. * Address PR feedback * Update src/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpWindowsProxy.cs Co-Authored-By: Jan Kotas <[email protected]> Commit migrated from dotnet/corefx@0e65e92
This dependency is the only reason we end up with a 105K System.Text.RegularExpressions.dll as part of a trimmed new MVC app. Regex is used in the case where on Windows a bypass list is provided, in which case each item in the list is changed into a regex, which is then evaluated against each url provided to SocketsHttpHandler. But the patterns usable are simple: the only special character recognized is an asterisk, which can map to zero or more of any character. So, we can instead employ a simple processor for such patterns, which then eliminates the need to reference System.Text.RegularExpressions.dll from System.Net.Http.dll. It also happens to be faster.
cc: @ViktorHofer, @jkotas, @wfurt