- In the internal NT API, filenames are 16-bit Unicode strings. It’s not strictly UTF-16 because surrogate codes are not validated as surrogate pairs.
- The API also does not normalize filenames to a particular Unicode normal form (e.g. “NFC” or “NFKC”).
- If a filesystem directory is case insensitive, name comparisons first translate to upper case using a locale-invariant case table. One-to-many case conversions are not supported (e.g. “ß” maps to “ß”, not to “SS”).
- Starting with Windows 10, NTFS supports case-sensitive directories.
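As a rough illustration of the comparison rule above, here is a sketch that approximates a one-to-one upper-case mapping with Python's own case tables. The helper names `one_to_one_upper` and `names_match` are hypothetical, and Python's Unicode data is not the NT case table, so results can differ for some characters.

```python
import unicodedata

def one_to_one_upper(name):
    """Upper-case each character, leaving it unchanged when Python's
    mapping would expand it to several characters (e.g. 'ß' -> 'SS')."""
    return "".join(
        c.upper() if len(c.upper()) == 1 else c
        for c in name
    )

def names_match(a, b):
    # Assumption: this only approximates the NT case table.
    # No Unicode normalization is applied before comparing.
    return one_to_one_upper(a) == one_to_one_upper(b)

nfc = unicodedata.normalize("NFC", "caf\u00e9")
nfd = unicodedata.normalize("NFD", "caf\u00e9")

print(names_match("Spam.TXT", "spam.txt"))  # True
print(names_match("straße", "STRASSE"))     # False: no one-to-many mapping
print(names_match(nfc, nfd))                # False: names are not normalized
```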
For bytes paths, Python 3.6+ uses UTF-8 as the filesystem encoding. Bytes paths get decoded to wide-character strings before calling system functions. The error handler is “surrogatepass” due to the possibility of lone surrogate codes in filenames. This is sometimes called 8-bit Wobbly Transformation Format (WTF-8).
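The effect of the “surrogatepass” handler can be seen with the codec alone, on any platform. This is a minimal sketch with a made-up filename containing a lone low surrogate. It calls `str.encode` and `bytes.decode` directly rather than going through the filesystem encoding.

```python
# Hypothetical filename containing a lone low surrogate (not valid UTF-16).
name = "spam\udc80.txt"

# Strict UTF-8 refuses to encode a lone surrogate.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With "surrogatepass" the surrogate is encoded as a 3-byte sequence
# and decodes back to the same string.
raw = name.encode("utf-8", "surrogatepass")
roundtrip = raw.decode("utf-8", "surrogatepass")

print(strict_ok)          # False
print(raw)                # b'spam\xed\xb2\x80.txt'
print(roundtrip == name)  # True
```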
Short filenames are a legacy feature for compatibility with ancient applications.
- ReFS and exFAT filesystems do not support short filenames.
- NTFS allows disabling the automatic creation of short filenames, either for individual filesystems or system-wide, and they can be stripped from existing files. This can improve performance since NTFS stores short filenames as separate, specially-flagged entries in a directory.
- FAT32 generates short filenames that can include non-ASCII OEM characters, which violates the documented specification. It also uses a best-fit encoding that can be problematic. For example, given OEM is code page 850, “spĀm.txt” has the associated short name “SPAM.TXT”. In this case, most people will be surprised that opening or creating “spam.txt” actually opens or replaces “spĀm.txt”.
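The name collision in the FAT32 example can be sketched in Python. This is only an approximation: it does not use the real Windows best-fit table for code page 850. It assumes that a character missing from the code page falls back to the base character of its canonical decomposition. The helper `approx_best_fit` is hypothetical, and 8.3 truncation is not modeled.

```python
import unicodedata

def _encodable(ch, codepage):
    try:
        ch.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False

def approx_best_fit(name, codepage="cp850"):
    """Approximate an upper-cased best-fit OEM name (not the Windows table)."""
    out = []
    for ch in name:
        if not _encodable(ch, codepage):
            # Assumption: fall back to the base character of the canonical
            # decomposition, or "_" if that is not encodable either.
            base = unicodedata.normalize("NFD", ch)[0]
            ch = base if _encodable(base, codepage) else "_"
        out.append(ch)
    return "".join(out).upper()

long_name = "sp\u0100m.txt"  # "spĀm.txt"
print(approx_best_fit(long_name))                                   # SPAM.TXT
print(approx_best_fit(long_name) == approx_best_fit("spam.txt"))    # True
```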
The list of reserved DOS device names includes “NUL”, “CON”, “CONIN$”, “CONOUT$”, “AUX”, “PRN”, “COM<1-9>” and “LPT<1-9>”. The names are case insensitive. These devices are virtually present in the unqualified current directory on all Windows versions, just like the drive-letter names “A:” through “Z:”. The device name can be followed by a colon and any number of dots and spaces. For example:
>>> stat.S_ISCHR(os.stat('CONIN$:. . . .').st_mode)
True
Unlike drive-letter names, the virtually present DOS device names cannot have a path, and the optional colon is not part of the real device name. For example:
>>> os.getcwd()
'C:\\Temp'
>>> nt._getfullpathname('CON:/spam')
'C:\\Temp\\CON:\\spam'
>>> nt._getfullpathname('CON:')
'\\\\.\\CON'
Prior to Windows 11, DOS device names are reserved in a wider range of cases than drive-letter names:
- DOS device names can have an extension that gets ignored (e.g. “CON.txt”).
- DOS device names are present in the explicitly referenced current directory (e.g. “.\CON”), as well as the parent directory of most opened paths (e.g. “C:\Temp\CON”), except never in UNC paths.
For some reason the latter behavior is still implemented for the “NUL” device on Windows 11. For example:
>>> nt._getfullpathname('./NUL')
'\\\\.\\NUL'
>>> nt._getfullpathname('Temp/NUL')
'\\\\.\\NUL'
>>> nt._getfullpathname('C:/Temp/NUL')
'\\\\.\\NUL'
DOS devices have never been virtually present in UNC share paths or device paths, where they’re just regular filenames, at least as far as the API is concerned. For example:
>>> nt._getfullpathname('//localhost/C$/Temp/NUL')
'\\\\localhost\\C$\\Temp\\NUL'
>>> nt._getfullpathname('//./C:/Temp/NUL')
'\\\\.\\C:\\Temp\\NUL'
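The rules described so far can be collected into a conservative check that runs on any platform. This is a sketch of the wider pre-Windows 11 behavior, not a reimplementation of the system's path parser. The function name `looks_like_dos_device` is hypothetical, and the treatment of trailing characters is simplified.

```python
import ntpath

RESERVED = frozenset(
    ["NUL", "CON", "CONIN$", "CONOUT$", "AUX", "PRN"]
    + [f"COM{i}" for i in range(1, 10)]
    + [f"LPT{i}" for i in range(1, 10)]
)

def looks_like_dos_device(path):
    """Return True if the final component may be treated as a DOS device."""
    path = path.replace("/", "\\")
    drive, rest = ntpath.splitdrive(path)
    # UNC share paths and device paths are exempt.
    if drive.startswith("\\\\"):
        return False
    name = ntpath.basename(rest)
    # Ignore an optional colon, any extension, and trailing spaces.
    base = name.partition(":")[0].partition(".")[0].rstrip(" ")
    return base.upper() in RESERVED

print(looks_like_dos_device("CONIN$:. . . ."))           # True
print(looks_like_dos_device("con.txt"))                  # True
print(looks_like_dos_device("C:/Temp/NUL"))              # True
print(looks_like_dos_device("CON:/spam"))                # False
print(looks_like_dos_device("//localhost/C$/Temp/NUL"))  # False
print(looks_like_dos_device("//./C:/Temp/NUL"))          # False
```

Erring on the side of the older, wider rules means a name flagged here may still be usable on Windows 11, but a name that passes should be safe on either.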
A filesystem or filesystem redirector (e.g. SMB) may disallow creating DOS device names, even in cases that the API doesn’t reserve. For example:
>>> open('//localhost/C$/Temp/NUL', 'w')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
PermissionError: [Errno 13] Permission denied: '//localhost/C$/Temp/NUL'