Skip to content

Conversation

@tipabu
Copy link
Contributor

@tipabu tipabu commented Mar 13, 2019

While this is out of spec according to RFC 7230 (which limits
expected octets to some subset of ASCII), it is often useful to
be able to mimic an out-of-spec client when testing a server or
application.

Don't use Latin-1 (though that would be in keeping with how we
handle headers and bodies) to encourage callers to write
RFC-complient clients. Rather, use surrogate escape sequences
('\udc80' - '\udcff') to increase friction while still
allowing out-of-spec requests to be expressable.

This is the second fix proposed in the bug report; the first was submitted as #12314 so reviewers can decide between fixes.

https://bugs.python.org/issue36274

https://bugs.python.org/issue36274

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there is too much duplication in the unit tests for both of your pull requests. Please use a loop or two (like what is done in test_invalid_headers()).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure; done. Any preference on which approach to take, though?

While this is out of spec according to RFC 7230 (which limits
expected octets to some subset of ASCII), it is often useful to
be able to mimic an out-of-spec client when testing a server or
application.

Don't use Latin-1 (though that would be in keeping with how we
handle headers and bodies) to encourage callers to write
RFC-complient clients. Rather, use surrogate escape sequences
('\udc80' - '\udcff') to increase friction while still
allowing out-of-spec requests to be expressable.

https://bugs.python.org/issue36274
@tipabu tipabu force-pushed the bpo-36274-surrogate-escape branch from d57b342 to 0573040 Compare July 8, 2019 18:09
Copy link
Member

@jaraco jaraco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks good to me and is my preferred option from those presented. The only thing I'd like to see is some explanation in the tests linking to the justification (already made in the issue tracker).

@jaraco
Copy link
Member

jaraco commented Sep 11, 2019

After reviewing this request with @ericsnowcurrently, we've decided that this approach is dangerous in that it has the potential to expose users unexpectedly to non-compliant behavior, where as currently they are assured compliance. In particular, if a user had input from a source where it was surrogate-escaped non-ascii unicode, the request would currently be rejected but now will be accepted. Instead, we would like to see a more explicit opt-in, such as through a separate method or through a setting on the call and/or client object.

As a result, I'll be closing this PR ad the alternate and will follow up in the bug.

@jaraco jaraco closed this Sep 11, 2019
@tipabu tipabu deleted the bpo-36274-surrogate-escape branch July 17, 2020 18:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants