bpo-25711: Rewrite zipimport in pure Python.#6809
bpo-25711: Rewrite zipimport in pure Python.#6809serhiy-storchaka merged 17 commits intopython:masterfrom
Conversation
fcf8f37 to
fb93cdd
Compare
Lib/zipimport.py
Outdated
|
|
||
| path = _get_module_path(self, fullname) | ||
| if mi: | ||
| fullpath = path + path_sep + '__init__.py' |
There was a problem hiding this comment.
bootstrap_external._path_join()
There was a problem hiding this comment.
What is the benefit of using bootstrap_external._path_join()?
There was a problem hiding this comment.
Readability. When I read path + path_sep + '__init__.py' I have to mentally piece together that this is constructing a file path, while _path_join(path, '__init__.py)` gives me that context implicitly.
Lib/zipimport.py
Outdated
| if mi: | ||
| fullpath = path + path_sep + '__init__.py' | ||
| else: | ||
| fullpath = path + '.py' |
There was a problem hiding this comment.
Might as well use an f-string.
There was a problem hiding this comment.
What is the benefit of using an f-string instead of + for concatenating two strings?
Lib/zipimport.py
Outdated
| # add __path__ to the module *before* the code gets | ||
| # executed | ||
| path = _get_module_path(self, fullname) | ||
| fullpath = f'{self.archive}{path_sep}{path}' |
There was a problem hiding this comment.
_bootstrap_external._path_join()
Lib/zipimport.py
Outdated
|
|
||
| # implementation | ||
|
|
||
| def _unpack_uint32(data): |
There was a problem hiding this comment.
_bootstrap_external._r_long()
There was a problem hiding this comment.
What to do with _unpack_uint16? _unpack_uint32 looks to me more descriptive than _r_long.
There was a problem hiding this comment.
You can keep _unpack_uint16, I just don't think we need to have two implementations to do the exact same thing of reading in the bytes of a little-endian 4 bytes int. If you don't like the name then feel free to rename it in importlib.
Lib/zipimport.py
Outdated
| name = name.decode('latin1').translate(cp437_table) | ||
|
|
||
| name = name.replace('/', path_sep) | ||
| path = f'{archive}{path_sep}{name}' |
There was a problem hiding this comment.
bootstrap_external._path_join()
Lib/zipimport.py
Outdated
|
|
||
| def _unpack_uint16(data): | ||
| """Convert 2 bytes in little-endian to an integer.""" | ||
| assert len(data) == 2 |
There was a problem hiding this comment.
Nit: Wouldn't be better if we raised ValueError here and in _unpack_uint32()?
There was a problem hiding this comment.
In all cases the length of the data is checked before calling _unpack_uint*(). The assertion condition is always true.
warsaw
left a comment
There was a problem hiding this comment.
Comments just based on reading the code. I'm going to test this branch with importlib_resources (standalone) next.
Lib/zipimport.py
Outdated
| @@ -0,0 +1,640 @@ | |||
| '''zipimport provides support for importing Python modules from Zip archives. | |||
There was a problem hiding this comment.
Style nit: this should be a """ string.
Lib/zipimport.py
Outdated
| try: | ||
| raw_data = fp.read(data_size) | ||
| except OSError: | ||
| raise OSError("zipimport: can't read data") |
There was a problem hiding this comment.
Is this just here to change the error message? I guess that's for compatibility with the message raised by the C implementation? I'm not sure how useful that is, and it does kind of hide the original exception (which might be useful for debugging). My inclination is to preserve the original OSError rather than produce one with a different message, but if you feel otherwise, why not use a raise from here too?
There was a problem hiding this comment.
Removed the try block.
| try: | ||
| decompress = _get_decompress_func() | ||
| except: | ||
| raise ZipImportError("can't decompress data; zlib not available") |
There was a problem hiding this comment.
Added except Exception.
warsaw
left a comment
There was a problem hiding this comment.
The branch does work with importlib_resources AFAICT.
|
When you're done making the requested changes, leave the comment: |
serhiy-storchaka
left a comment
There was a problem hiding this comment.
Thank you for your review @warsaw. I have addressed most of your comments. But AFAIK from error is not needed in the Python code. Exceptions is chained by default. You need to add from None for suppressing chaining.
Lib/zipimport.py
Outdated
| 'PQRSTUVWXYZ[\\]^_' | ||
| '`abcdefghijklmno' | ||
| 'pqrstuvwxyz{|}~\x7f' | ||
|
|
There was a problem hiding this comment.
Just for readability. The above part is ASCII, and 16 chars per row, the below part is non-ASCII, and 8 chars per row. What comment should I add?
| from zlib import decompress | ||
| except: | ||
| _bootstrap._verbose_message('zipimport: zlib UNAVAILABLE') | ||
| raise ZipImportError("can't decompress data; zlib not available") |
There was a problem hiding this comment.
The C implementation catches all exceptions. I agree, that this is evil and will change this.
Lib/zipimport.py
Outdated
| try: | ||
| raw_data = fp.read(data_size) | ||
| except OSError: | ||
| raise OSError("zipimport: can't read data") |
There was a problem hiding this comment.
Removed the try block.
| try: | ||
| decompress = _get_decompress_func() | ||
| except: | ||
| raise ZipImportError("can't decompress data; zlib not available") |
There was a problem hiding this comment.
Added except Exception.
serhiy-storchaka
left a comment
There was a problem hiding this comment.
I prefer to make behavior changing clean up and remove unneeded tests in separate PR.
| try: | ||
| toc_entry = self._files[key] | ||
| except KeyError: | ||
| raise OSError(0, '', key) |
There was a problem hiding this comment.
For raising a proper FileNotFoundError you need to specify the corresponding errno.
import errno
raise FileNotFoundError(errno.ENOENT, 'No such file or directory', key)(and maybe using pathname is more proper than key).
I prefer to defer this change to later. It is nontrivial and may need additional discussion and tests.
warsaw
left a comment
There was a problem hiding this comment.
I think we're down to one issue/question for this branch, a merge conflict, and deferring cleanups to a follow on branch.
warsaw
left a comment
There was a problem hiding this comment.
Thanks for doing this - I think it will lead to great improvements in zipimport in the future. My only suggestion would be to open a bpo issue for the subsequent clean up branch, since there are things that we know can be improved on the next pass. I think this is a good change to land.
https://bugs.python.org/issue25711