-
-
Notifications
You must be signed in to change notification settings - Fork 33.7k
bpo-36274: Encode request lines with surrogate escapes #12315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Lib/test/test_httplib.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe there is too much duplication in the unit tests for both of your pull requests. Please use a loop or two (like what is done in test_invalid_headers()).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure; done. Any preference on which approach to take, though?
While this is out of spec according to RFC 7230 (which limits
expected octets to some subset of ASCII), it is often useful to
be able to mimic an out-of-spec client when testing a server or
application.
Don't use Latin-1 (though that would be in keeping with how we
handle headers and bodies) to encourage callers to write
RFC-complient clients. Rather, use surrogate escape sequences
('\udc80' - '\udcff') to increase friction while still
allowing out-of-spec requests to be expressable.
https://bugs.python.org/issue36274
d57b342 to
0573040
Compare
jaraco
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change looks good to me and is my preferred option from those presented. The only thing I'd like to see is some explanation in the tests linking to the justification (already made in the issue tracker).
|
After reviewing this request with @ericsnowcurrently, we've decided that this approach is dangerous in that it has the potential to expose users unexpectedly to non-compliant behavior, where as currently they are assured compliance. In particular, if a user had input from a source where it was surrogate-escaped non-ascii unicode, the request would currently be rejected but now will be accepted. Instead, we would like to see a more explicit opt-in, such as through a separate method or through a setting on the call and/or client object. As a result, I'll be closing this PR ad the alternate and will follow up in the bug. |
While this is out of spec according to RFC 7230 (which limits
expected octets to some subset of ASCII), it is often useful to
be able to mimic an out-of-spec client when testing a server or
application.
Don't use Latin-1 (though that would be in keeping with how we
handle headers and bodies) to encourage callers to write
RFC-complient clients. Rather, use surrogate escape sequences
('\udc80' - '\udcff') to increase friction while still
allowing out-of-spec requests to be expressable.
This is the second fix proposed in the bug report; the first was submitted as #12314 so reviewers can decide between fixes.
https://bugs.python.org/issue36274
https://bugs.python.org/issue36274