[Security] bpo-30713: Reject newline in urllib.parse #2303

vstinner · 2017-06-20T16:11:24Z

The splittype(), splitport() and splithost() functions of the
urllib.parse module now reject URLs which contain a newline
character.


serhiy-storchaka · 2017-06-28T05:07:33Z

Lib/urllib/parse.py

    global _typeprog
    if _typeprog is None:
-        _typeprog = re.compile('([^/:]+):(.*)', re.DOTALL)
+        _typeprog = re.compile('([^/:\n]+):(.*)', re.DOTALL)


What is wrong with '\n' in a type? In any case, it is compared with a fixed set of supported schemes.

vstinner · 2017-07-21

> What is wrong with '\n' in a type?

It doesn't make sense to me to have a newline in a type, it really looks like an attempt to compromise a server.
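To make the exchange above concrete, here is a minimal sketch of what the two patterns do with a newline inside the scheme. `split_type` is a hypothetical stand-in modelled on the patterns in the diff, not the real `urllib.parse.splittype()`, and `SUPPORTED` is an assumed scheme set used only for illustration.

```python
import re

# Patterns copied from the diff above.
OLD_TYPE = re.compile('([^/:]+):(.*)', re.DOTALL)
NEW_TYPE = re.compile('([^/:\n]+):(.*)', re.DOTALL)

# Assumption: a caller that only dispatches on known schemes.
SUPPORTED = {'http', 'https', 'ftp'}


def split_type(url, prog):
    """Hypothetical helper: return (scheme, rest), or (None, url) if no scheme."""
    match = prog.match(url)
    if match:
        scheme, data = match.groups()
        return scheme.lower(), data
    return None, url


url = 'ht\ntp://example.org/'
old_result = split_type(url, OLD_TYPE)  # ('ht\ntp', '//example.org/')
new_result = split_type(url, NEW_TYPE)  # (None, 'ht\ntp://example.org/')

# Either way the bogus scheme never matches a supported one.
old_is_supported = old_result[0] in SUPPORTED  # False
```

With the old pattern the newline ends up inside the scheme, which then fails the lookup; with the new pattern no scheme is reported at all and the URL is returned unchanged.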

serhiy-storchaka · 2017-06-28T05:08:38Z

Lib/urllib/parse.py

+        _typeprog = re.compile('([^/:\n]+):(.*)', re.DOTALL)

-    match = _typeprog.match(url)
+    match = _typeprog.fullmatch(url)


There is no difference between match() and fullmatch() here.
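A short sketch of why the two calls agree for this pattern: it ends in `(.*)` compiled with `re.DOTALL`, so any successful `match()` already runs to the end of the string. The sample strings are made up for illustration; the last lines show that the equivalence disappears once `re.DOTALL` is dropped.

```python
import re

# Pattern from the diff: trailing (.*) with DOTALL consumes everything.
prog = re.compile('([^/:\n]+):(.*)', re.DOTALL)

# Illustrative inputs (assumptions, not taken from the PR's tests).
samples = [
    'http://example.org/',
    'mailto:a@b\nc',
    'no-scheme',
    'ht\ntp://x',
    ':empty',
]


def groups(match):
    """Return the captured groups, or None when there was no match."""
    return match.groups() if match else None


same_with_dotall = all(
    groups(prog.match(s)) == groups(prog.fullmatch(s)) for s in samples
)

# Contrast: without DOTALL, '.' stops at a newline and the calls diverge.
no_dotall = re.compile('([^/:\n]+):(.*)')
partial = groups(no_dotall.match('mailto:a@b\nc'))      # ('mailto', 'a@b')
strict = groups(no_dotall.fullmatch('mailto:a@b\nc'))   # None
```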

serhiy-storchaka · 2017-06-28T05:15:09Z

Lib/urllib/parse.py

    global _hostprog
    if _hostprog is None:
-        _hostprog = re.compile('//([^/#?]*)(.*)', re.DOTALL)
+        _hostprog = re.compile('//([^/#?\n]*)(.*)')


This will just return a tuple (None, url), as in the case when the host is not specified. The second item can still contain '\n'. This doesn't mean splithost() rejects the url.

If you want splithost() to reject URLs with newlines, explicitly check for '\n' in url and raise an exception. But I think this is not the best place to do such checks.

vstinner · 2017-07-21

> This doesn't mean splithost() rejects the url.

Right now, we don't raise an exception if a URL looks "invalid". So I tried to fit into the current behaviour: return (None, invalid_url). Maybe we should raise an exception instead?
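The two options discussed here can be sketched as follows. `split_host` and `split_host_strict` are hypothetical helpers modelled on the diff, not the real `urllib.parse.splithost()`. The sketch assumes the patched function calls `fullmatch()`, as the `splittype()` hunk does; the thread does not show that line for `splithost()`.

```python
import re

# Patterns copied from the diff above.
OLD_HOST = re.compile('//([^/#?]*)(.*)', re.DOTALL)
NEW_HOST = re.compile('//([^/#?\n]*)(.*)')


def split_host(url, prog):
    """Hypothetical helper: return (host, path), or (None, url) on no match.

    Assumption: fullmatch() is used, so a newline anywhere makes NEW_HOST fail.
    """
    match = prog.fullmatch(url)
    if match:
        host_port, path = match.groups()
        if path and path[0] != '/':
            path = '/' + path
        return host_port, path
    return None, url


def split_host_strict(url):
    """Hypothetical variant that rejects newlines with an explicit check."""
    if '\n' in url:
        raise ValueError('URL contains a newline: %r' % url)
    return split_host(url, OLD_HOST)


bad = '//example.org\n/path'
old_result = split_host(bad, OLD_HOST)  # ('example.org\n', '/path')
new_result = split_host(bad, NEW_HOST)  # (None, '//example.org\n/path')
good_result = split_host('//example.org/path', NEW_HOST)  # ('example.org', '/path')
```

Under these assumptions the patched pattern does not reject anything: the newline simply moves from the host into the untouched second item. Only the explicit check surfaces the problem to the caller.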

serhiy-storchaka · 2017-06-28T05:17:43Z

Lib/urllib/parse.py

    global _portprog
    if _portprog is None:
-        _portprog = re.compile('(.*):([0-9]*)$', re.DOTALL)
+        _portprog = re.compile('(.*):([0-9]*)')


The same as for splithost(). splitport('example.org\n') will return a tuple ('example.org\n', None).
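The point about the port pattern can be checked with a similar sketch. `split_port` is a hypothetical helper modelled on the diff, not the real `urllib.parse.splitport()`, and the use of `fullmatch()` with the new pattern is an assumption.

```python
import re

# Patterns copied from the diff above.
OLD_PORT = re.compile('(.*):([0-9]*)$', re.DOTALL)
NEW_PORT = re.compile('(.*):([0-9]*)')


def split_port(host, prog, use_fullmatch):
    """Hypothetical helper: return (host, port), or (host, None) without a port."""
    match = prog.fullmatch(host) if use_fullmatch else prog.match(host)
    if match:
        name, port = match.groups()
        if port:
            return name, port
    return host, None


# No colon at all: both patterns fall through and hand the newline back.
old_plain = split_port('example.org\n', OLD_PORT, False)  # ('example.org\n', None)
new_plain = split_port('example.org\n', NEW_PORT, True)   # ('example.org\n', None)

# Newline inside the host part, followed by a port.
old_inner = split_port('exam\nple.org:80', OLD_PORT, False)  # ('exam\nple.org', '80')
new_inner = split_port('exam\nple.org:80', NEW_PORT, True)   # ('exam\nple.org:80', None)
```

In every case the newline survives in the returned tuple, so the caller still has to deal with it.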

vadmium · 2017-07-01T06:03:39Z

Sorry @Haypo, I think newlines are a special case in some regular expression implementations, but I don’t remember the details for Python, so it is not clear to me what your code will do. I trust Serhiy has better knowledge of regular expressions :)
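For reference, these are the newline special cases in Python's `re` module that matter for the patterns in this PR: `.` skips `'\n'` unless `re.DOTALL` is set, and `$` also matches just before a trailing newline. The input strings are made up for illustration.

```python
import re

text = 'a\nb'
dot_default = re.match('.*', text).group()         # 'a': '.' stops at '\n'
dot_all = re.match('.*', text, re.DOTALL).group()  # 'a\nb': '.' matches '\n'

# '$' matches at the end of the string or just before a final newline.
dollar = re.match('abc$', 'abc\n')          # a match object
strict_end = re.match(r'abc\Z', 'abc\n')    # None: \Z is the true end only
full = re.fullmatch('abc$', 'abc\n')        # None: the '\n' is left unmatched

# Consequence for the original port pattern: a trailing newline is dropped.
old_port = re.compile('(.*):([0-9]*)$', re.DOTALL)
dropped = old_port.match('example.org:80\n').groups()  # ('example.org', '80')
```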

vstinner · 2017-07-26T02:43:21Z

https://bugs.python.org/issue29606 was fixed in ftplib. urllib is not the right place to reject invalid inputs.

the-knights-who-say-ni added the CLA signed label Jun 20, 2017

vstinner mentioned this pull request Jun 20, 2017

bpo-30713: Reject newline in urllib.parse #2301

Closed

bpo-30713: Reject newline in urllib.parse

242076a

The splittype(), splitport() and splithost() functions of the urllib.parse module now reject URLs which contain a newline character.

vstinner changed the title ~~bpo-30713: Reject newline in urllib.parse~~ [Security] bpo-30713: Reject newline in urllib.parse Jun 28, 2017

vstinner added the type-security A security issue label Jun 28, 2017

vstinner requested review from serhiy-storchaka and vadmium June 28, 2017 01:51

serhiy-storchaka reviewed Jun 28, 2017


vstinner closed this Jul 26, 2017

vstinner deleted the urllib_newline branch July 26, 2017 02:43

mcepl mentioned this pull request Apr 17, 2019

bpo-35906: Avoid headers injections in urllib #11768

Closed
