Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

@stephentoub
Copy link
Member

This dependency is the only reason we end up with a 105K System.Text.RegularExpressions.dll as part of a trimmed new MVC app. Regex is used in the case where on Windows a bypass list is provided, in which case each item in the list is changed into a regex, which is then evaluated against each url provided to SocketsHttpHandler. But the patterns usable are simple: the only special character recognized is an asterisk, which can map to zero or more of any character. So, we can instead employ a simple processor for such patterns, which then eliminates the need to reference System.Text.RegularExpressions.dll from System.Net.Http.dll. It also happens to be faster.

cc: @ViktorHofer, @jkotas, @wfurt

This dependency is the only reason we end up with a 105K System.Text.RegularExpressions.dll as part of a trimmed new MVC app.  Regex is used in the case where on Windows a bypass list is provided, in which case each item in the list is changed into a regex, which is then evaluated against each url provided to SocketsHttpHandler.  But the patterns usable are simple: the only special character recognized is an asterisk, which can map to zero or more of any character.  So, we can instead employ a simple processor for such patterns, which then eliminates the need to reference System.Text.RegularExpressions.dll from System.Net.Http.dll.  It also happens to be faster.
@davidsh davidsh added this to the 5.0 milestone Sep 14, 2019
@danmoseley
Copy link
Member

I was hoping I could write it in a simple way like this, which works fine, then transform into tail-call. But tail-call can't work with the alternation.

        private static bool Match(string t, int it, string p, int ip)
        {
            if (ip == p.Length) // if there's no more pattern, it's a match iff there's no more text
                return it == t.Length;

            if (it == t.Length) // if there's no more text, it's a match iff pattern is '*' and it's a match for subpattern
                return p[ip] == '*' && Match(t, it, p, ip + 1);

            if (p[ip] == '*') // if pattern is *, it's a match iff it's a match for subtext (* consumes and continues) 
                              // or it's a match for subpattern (* consumes nothing)
                return Match(t, it + 1, p, ip) || Match(t, it, p, ip + 1);

            if (char.ToLowerInvariant(p[ip]) == char.ToLowerInvariant(t[it]))
                // if pattern and text match, it's a match if it's a match for subtext + subpattern
                return Match(t, it + 1, p, ip + 1);

            return false;
        }

try
{
// Escape any special characters and unescape * to get wildcard pattern match.
Regex re = new Regex(Regex.Escape(tmp).Replace("\\*", ".*?") + "$",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It adds "$" at the end but not "^" at the start. Does that mean that it would match suffixes-only, while your new code will not?

Copy link
Member Author

@stephentoub stephentoub Sep 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good catch. That seems wrong in the previous code (e.g. if the pattern were "a.com" it would say that "ba.com" matched). @wfurt, was that intentional? The new code could be changed to match by adding a * at the beginning of the pattern if necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I think this is bug. With curl, .example.com would also match www.example.com (https://curl.haxx.se/libcurl/c/CURLOPT_NOPROXY.html) but for windows it does not work that way. *a.com should match a.com and ba.com but a.com rule should not match ba.com

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks, then I'll leave this as-is. Is fixing it in master sufficient, or should a fix for this (presumably prepending "^" +) be considered for 3.1?

@stephentoub
Copy link
Member Author

I was hoping I could write it in a simple way like this

Yeah, but this can end up having really problematic performance in the worst case. Try running this with your code:

Match(new string('a', 100), 0, "a*a*a*a*a*a*a*a*b", 0)

@stephentoub stephentoub merged commit 0e65e92 into dotnet:master Sep 16, 2019
@stephentoub stephentoub deleted the httpregex branch September 16, 2019 18:57
@stephentoub stephentoub added the assembly-size Issues related to the size of assemblies, before or after trimming label Sep 19, 2019
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…otnet/corefx#41110)

* Remove System.Net.Http dependency on System.Text.RegularExpressions

This dependency is the only reason we end up with a 105K System.Text.RegularExpressions.dll as part of a trimmed new MVC app.  Regex is used in the case where on Windows a bypass list is provided, in which case each item in the list is changed into a regex, which is then evaluated against each url provided to SocketsHttpHandler.  But the patterns usable are simple: the only special character recognized is an asterisk, which can map to zero or more of any character.  So, we can instead employ a simple processor for such patterns, which then eliminates the need to reference System.Text.RegularExpressions.dll from System.Net.Http.dll.  It also happens to be faster.

* Address PR feedback

* Update src/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpWindowsProxy.cs

Co-Authored-By: Jan Kotas <[email protected]>


Commit migrated from dotnet/corefx@0e65e92
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-System.Net.Http assembly-size Issues related to the size of assemblies, before or after trimming

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants