-
-
Notifications
You must be signed in to change notification settings - Fork 33.7k
[Security] bpo-30713: Reject newline in urllib.parse #2303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The splittype(), splitport() and splithost() functions of the urllib.parse module now reject URLs which contain a newline character.
| global _typeprog | ||
| if _typeprog is None: | ||
| _typeprog = re.compile('([^/:]+):(.*)', re.DOTALL) | ||
| _typeprog = re.compile('([^/:\n]+):(.*)', re.DOTALL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is wrong with '\n' in a type? In any case it is compared with fixed set of supported schemes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is wrong with '\n' in a type?
It doesn't make sense to me to have a newline in a type, it really looks like an attempt to compromise a server.
| _typeprog = re.compile('([^/:\n]+):(.*)', re.DOTALL) | ||
|
|
||
| match = _typeprog.match(url) | ||
| match = _typeprog.fullmatch(url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no difference between match() and fullmatch() here.
| global _hostprog | ||
| if _hostprog is None: | ||
| _hostprog = re.compile('//([^/#?]*)(.*)', re.DOTALL) | ||
| _hostprog = re.compile('//([^/#?\n]*)(.*)') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will caused just returning a tuple (None, url) as in the case when the host is not specified. The second item can contain '\n'. This doesn't mean splithost() rejects the url.
If you want to make splithost() rejecting URLs with newlines, check explicitly '\n' in url and raise an exception. But I think this is not the best place of doing such checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't mean splithost() rejects the url.
Right now, we don't raise an exception if an URL looks "invalid". So I tried to fit into the current behaviour: return (None, invalid_url). Maybe we should raise an exception instead?
| global _portprog | ||
| if _portprog is None: | ||
| _portprog = re.compile('(.*):([0-9]*)$', re.DOTALL) | ||
| _portprog = re.compile('(.*):([0-9]*)') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same as for splithost(). splithost('example.org\n') will return a tuple ('example.org\n', None).
|
Sorry @Haypo, I think newlines are a special case in some regular expression implementations, but I don’t remember the details for Python, so it is not clear to me what your code will do. I trust Serhiy has better knowledge with regular expressions :) |
|
https://bugs.python.org/issue29606 was fixed in ftplib. urllib is not the right place to reject invalid inputs. |
The splittype(), splitport() and splithost() functions of the
urllib.parse module now reject URLs which contain a newline
character.