-
-
Notifications
You must be signed in to change notification settings - Fork 33.7k
bpo-36305: Fixes to path handling and parsing in pathlib #12361
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Lib/pathlib.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The is not the solution. "C:" is a drive-relative path. Of course it has a drive. We need to fix absolute to properly support drive-relative paths. See the FIXME comment in the code. It's not necessarily as simple as calling ntpath._getfullpatname, as suggested in the comment, unless we give up on preserving ".." components in Windows. (That wouldn't be a problem really since preserving ".." components serves no point in Windows, unlike POSIX.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. I will modify the PR.
This is not a bug. The resolved value of "c:" depends on its working directory, so joining it with "a" has to remain as "c:a". If we call |
Lib/test/test_pathlib.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that "c:" is not a file stream path, and neither is "c:a". The system interprets "[A-Za-z]:" as a drive. If we want to create a file stream named "a" in a file named "c" in the working directory, we have to use a qualified path such as "./c:a".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which by the way is a separate bug in pathlib, if you want to look into it. For example, it's wrong for it to remove the explicit reference to the working directory in the following:
>>> p = Path('./c:a')
>>> str(p)
'c:a'
We gave it the path of a file stream, but it mistakenly handles it as a drive-relative path when getting the path as a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... and yet,
>>> p = Path('c:a')
>>> p
WindowsPath('c:a')
>>> Path('.') / p
WindowsPath('c:a')
>>> Path('./c:a')
WindowsPath('c:a')I don't think it's reasonable for Path('.') / Path('c:a') to give a different result than Path('./c:a'). I'm pretty sure I've seen it documented somewhere (although not in the official docs) that using Path('a/b/c') is a portable and shorter way of writing Path('a')/Path('b')/Path('c'), and the proposed behaviour violates that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does make sense, and does seem to have been the assumption in pathlib (seeing the implementation for parse_parts).
But this is a wrong assumption, as sometimes a path part can be interpreted in more than one way (for example, c:a that can either be a drive-relative path or a file with a stream). That means that separating a path to its parts may cause a loss of information about the original path and lead to incorrect results.
Assuming that this assumption can't be true, I think that the current behavior -
WindowsPath('a') / WindowsPath('./c:b') returning WindowsPath('c:b') is surely worse than
WindowsPath('a') / WindowsPath('./c:b') returning WindowsPath('a/c:b') (which would be the case if my fixes are applied)
|
Part of this was fixed with bpo-34775, causing the merge conflict. |
Pathlib's internal representation of paths is more complex than a single string. This allows operations like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once #101667 lands, you should only need to change _format_parsed_parts() to fix the original bug.
| check(['a', 'Z:b', 'c'], ('Z:', '', ['Z:', 'a', 'b', 'c'])) | ||
| check(['Y:a', 'Z:b', 'c'], ('Z:', '', ['Z:', 'a', 'b', 'c'])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look right - if we switch drives mid-way through we should discard everything before. This is what os.path.join() does.
| equivalences.update({ | ||
| 'c:a': [ ('c:', 'a'), ('c:', 'a/'), ('/', 'c:', 'a') ], | ||
| './a:b': [ ('./a:b',) ], | ||
| 'a:b:c': [ ('./b:c', 'a:'), ('b:', 'a:b:c') ], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the above, I don't see why ('./b:c', 'a:') shouldn't resolve to a:.
The bugs
This PR fixes three bugs with path parsing in
pathlib.The following examples show these bugs (when the cwd is
C:\d):WindowsPath('C:a').absolute()should returnWindowsPath('C:\\d\\a')but returnsWindowsPath('C:a').This is caused by flawed logic in the
parse_partsmethod of the_Flavourclass.WindowsPath('./b:a').absolute()should returnWindowsPath('C:\\d\\b:a')but returnsWindowsPath('b:a').This is caused by the limited interface of
parse_parts, and affects thePath.absolute,Path.expanduserandPath.__rtruediv__methods.WindowsPath('./b:a').resolve()should returnWindowsPath('C:\\d\\b:a')but returnsWindowsPath('b:a').This is caused by missing logic in the
resolvemethod and inPath.__str__The fixes
To fix the first bug, I fixed a flaw in the
parse_partsmethod.The second one was more complicated, as with the current interface of
parse_parts(called by_parse_args), the bug can't be fixed. Let's take a simple example:WindowsPath(WindowsPath('./a:b'))andWindowsPath('a:b')- before the bugfix, they are equal. That happens because in both cases,parse_partsis called with['a:b']. This part can be interpreted in two ways - either as the relative path'b'with the drive'a:', or as a file'a'with the NTFS data-stream'b'.That means that in some cases, passing the flattened
_partsof a path toparse_partsis lossy. Therefore we have to a modifyparse_parts's interface to enable passing more detailed information about the given parts. What I decided to do was allow passing tuples in addition to strings, thus supporting the old interface. The tuples would contain the drive, root and path parts, enough information to determine the correct parsing of path parts with data-streams, and maybe more future edge cases.After modifying
parse_parts's interface, I changed_parse_argsto use it and madePath.absolute,Path.expanduserandPath.__rtruediv__passPathobjects to_parse_argsinstead of path parts, to preserve the path information.To solve the third bug I had to make small changes to both the
resolvemethod and toPath.__str__.Notes
In addition to the changes in the code, I've added regression tests and modified old incorrect tests.
Details about drive-relative paths can be found here.
https://bugs.python.org/issue36305