-
-
Notifications
You must be signed in to change notification settings - Fork 33.7k
Description
Bug report
Bug description:
Issue
The add_handler() method of urllib.request.OpenerDirector matches too broadly, and generates spurious handler entries as a result.
Specifically, any method name that starts with zero or one underscore, followed by error, open, request, or response, and that isn't in the small list of exclusions, will be treated as a handler method.
Minimal reproducible example
>>> import urllib.request
>>> class BogusHandler(urllib.request.BaseHandler):
... def _open(self):
... pass
... def open(self):
... pass
... def error(self):
... pass
...
>>> opdir = urllib.request.OpenerDirector()
>>> opdir.add_handler(BogusHandler())
>>> opdir.handle_open
{'': [<__main__.BogusHandler...>], 'ope': [<__main__.BogusHandler...>]}
>>> opdir.handle_error
{'erro': {'error': [<__main__.BogusHandler...>]}}Cause
Lines 419 to 421 in 401fff7
| i = meth.find("_") | |
| protocol = meth[:i] | |
| condition = meth[i+1:] |
In the normal case, this code splits a method name like http_open on the first underscore character, giving protocol == 'http' and condition == 'open'.
If there is a leading underscore character, this produces a false match with an empty protocol. For example, a method called _open will give protocol == '' and condition == 'open'.
If there is no underscore character, then meth.find('_') == -1, and weird results happen. In this case, condition contains the entire method name, and protocol contains all but the last character. For example, a method called open will give protocol == 'ope' and condition == 'open'.
The subsequent lines branch on the content of condition, with only error*, open, request, and response treated as handler methods.
Impact
Low. It is probably impossible that a request would have the empty string as its protocol, and unlikely that it would be ope, reques, respons, or erro. If one of these did happen, the spurious match wouldn't be called: OpenerDirector would go looking for (say) ope_open, probably not find it, and raise an AttributeError. (It's for this reason that I'm positive this is a bug, and not an undocumented feature where open, etc. are meant to be registered as handlers.)
Spurious entries are added to the internal members handle_open, handle_error, process_response, process_request, and (especially) the catch-all handler, using bisect to perform sort-preserving insertions. This is unnecessary busywork, but I have no data on any actual performance impact.
Solution
It suffices to add a check like so, immediately after meth.find("_"):
if i < 1:
continueThis catches both a leading underscore (i == 0) and no underscore (i == -1). I've forked the repo and made such a change myself, but submitting a pull request is a big step up from filing an issue, and it expects an issue to be filed first in any case, so… here it is! 😄
Related issues
I can only see one relevant existing issue, #61924. This was a withdrawn proposal to clean up the add_handler() code. It was more far-reaching than the solution above, but (on my inspection at least) it would've fixed this bug. At the time, @bitdancer commented, "I'll try to keep this patch in mind if we have occasion to fix anything in that code".
The code to extract a specific type of error behaves similarly:
Lines 424 to 425 in 401fff7
| j = condition.find("_") + i + 1 | |
| kind = meth[j+1:] |
However, I'm not positive that it's an error for it to give (for example) kind == 'error' when the method name is *_error (or _error or error), so I haven't explored further.
CPython versions tested on:
3.12
Operating systems tested on:
No response