bpo-42236: Use UTF-8 encoding if nl_langinfo(CODESET) fails#23086
Merged
vstinner merged 1 commit intopython:masterfrom Nov 1, 2020
vstinner:utf8_fs_encoding
Merged
bpo-42236: Use UTF-8 encoding if nl_langinfo(CODESET) fails#23086vstinner merged 1 commit intopython:masterfrom vstinner:utf8_fs_encoding
vstinner merged 1 commit intopython:masterfrom
vstinner:utf8_fs_encoding
Conversation
If the nl_langinfo(CODESET) function returns an empty string, Python now uses UTF-8 as the filesystem encoding. In May 2010 (commit b744ba1), I modified Python to log a warning and use UTF-8 as the filesystem encoding (instead of None) if nl_langinfo(CODESET) returns an empty string. In August 2020 (commit 94908bb), I modified Python startup to fail with a fatal error and a specific error message if nl_langinfo(CODESET) returns an empty string. The intent was to prevent guessing the encoding and also investigate user configuration where this case happens. In 10 years (2010 to 2020), I saw zero user report about the error message related to nl_langinfo(CODESET) returning an empty string. Today, UTF-8 became the defacto standard and it's safe to make the assumption that the user expects UTF-8. For example, nl_langinfo(CODESET) can return an empty string on macOS if the LC_CTYPE locale is not supported, and UTF-8 is the default encoding on macOS. While this change is likely to not affect anyone in practice, it should make UTF-8 lover happy ;-) Rewrite also the documentation explaining how Python selects the filesystem encoding and error handler.
Member
Author
|
@methane: A PR written for your greatest pleasure :-D |
adorilson
pushed a commit
to adorilson/cpython
that referenced
this pull request
Mar 13, 2021
…-23086) If the nl_langinfo(CODESET) function returns an empty string, Python now uses UTF-8 as the filesystem encoding. In May 2010 (commit b744ba1), I modified Python to log a warning and use UTF-8 as the filesystem encoding (instead of None) if nl_langinfo(CODESET) returns an empty string. In August 2020 (commit 94908bb), I modified Python startup to fail with a fatal error and a specific error message if nl_langinfo(CODESET) returns an empty string. The intent was to prevent guessing the encoding and also investigate user configuration where this case happens. In 10 years (2010 to 2020), I saw zero user report about the error message related to nl_langinfo(CODESET) returning an empty string. Today, UTF-8 became the defacto standard and it's safe to make the assumption that the user expects UTF-8. For example, nl_langinfo(CODESET) can return an empty string on macOS if the LC_CTYPE locale is not supported, and UTF-8 is the default encoding on macOS. While this change is likely to not affect anyone in practice, it should make UTF-8 lover happy ;-) Rewrite also the documentation explaining how Python selects the filesystem encoding and error handler.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
If the nl_langinfo(CODESET) function returns an empty string, Python
now uses UTF-8 as the filesystem encoding.
In May 2010 (commit b744ba1), I
modified Python to log a warning and use UTF-8 as the filesystem
encoding (instead of None) if nl_langinfo(CODESET) returns an empty
string.
In August 2020 (commit 94908bb), I
modified Python startup to fail with a fatal error and a specific
error message if nl_langinfo(CODESET) returns an empty string. The
intent was to prevent guessing the encoding and also investigate user
configuration where this case happens.
In 10 years (2010 to 2020), I saw zero user report about the error
message related to nl_langinfo(CODESET) returning an empty string.
Today, UTF-8 became the defacto standard and it's safe to make the
assumption that the user expects UTF-8. For example,
nl_langinfo(CODESET) can return an empty string on macOS if the
LC_CTYPE locale is not supported, and UTF-8 is the default encoding
on macOS.
While this change is likely to not affect anyone in practice, it
should make UTF-8 lover happy ;-)
Rewrite also the documentation explaining how Python selects the
filesystem encoding and error handler.
https://bugs.python.org/issue42236