Skip to content

Conversation

@vstinner
Copy link
Member

The splittype(), splitport() and splithost() functions of the
urllib.parse module now reject URLs which contain a newline
character.

The splittype(), splitport() and splithost() functions of the
urllib.parse module now reject URLs which contain a newline
character.
@vstinner vstinner changed the title bpo-30713: Reject newline in urllib.parse [Security] bpo-30713: Reject newline in urllib.parse Jun 28, 2017
@vstinner vstinner added the type-security A security issue label Jun 28, 2017
global _typeprog
if _typeprog is None:
_typeprog = re.compile('([^/:]+):(.*)', re.DOTALL)
_typeprog = re.compile('([^/:\n]+):(.*)', re.DOTALL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is wrong with '\n' in a type? In any case it is compared with fixed set of supported schemes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is wrong with '\n' in a type?

It doesn't make sense to me to have a newline in a type, it really looks like an attempt to compromise a server.

_typeprog = re.compile('([^/:\n]+):(.*)', re.DOTALL)

match = _typeprog.match(url)
match = _typeprog.fullmatch(url)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no difference between match() and fullmatch() here.

global _hostprog
if _hostprog is None:
_hostprog = re.compile('//([^/#?]*)(.*)', re.DOTALL)
_hostprog = re.compile('//([^/#?\n]*)(.*)')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will caused just returning a tuple (None, url) as in the case when the host is not specified. The second item can contain '\n'. This doesn't mean splithost() rejects the url.

If you want to make splithost() rejecting URLs with newlines, check explicitly '\n' in url and raise an exception. But I think this is not the best place of doing such checks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't mean splithost() rejects the url.

Right now, we don't raise an exception if an URL looks "invalid". So I tried to fit into the current behaviour: return (None, invalid_url). Maybe we should raise an exception instead?

global _portprog
if _portprog is None:
_portprog = re.compile('(.*):([0-9]*)$', re.DOTALL)
_portprog = re.compile('(.*):([0-9]*)')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same as for splithost(). splithost('example.org\n') will return a tuple ('example.org\n', None).

@vadmium
Copy link
Member

vadmium commented Jul 1, 2017

Sorry @Haypo, I think newlines are a special case in some regular expression implementations, but I don’t remember the details for Python, so it is not clear to me what your code will do. I trust Serhiy has better knowledge with regular expressions :)

@vstinner
Copy link
Member Author

https://bugs.python.org/issue29606 was fixed in ftplib. urllib is not the right place to reject invalid inputs.

@vstinner vstinner closed this Jul 26, 2017
@vstinner vstinner deleted the urllib_newline branch July 26, 2017 02:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type-security A security issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants