-
-
Notifications
You must be signed in to change notification settings - Fork 33.6k
WIP bpo-34589: Move locale coercion back to the CLI app #9257
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP bpo-34589: Move locale coercion back to the CLI app #9257
Conversation
vstinner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dislike the idea of having PYTHONCOERCECLOCALE env var which behaves differently than all other PYTHON* env vars. See my -X proposal: https://bugs.python.org/issue34589#msg325282
|
When you're done making the requested changes, leave the comment: |
|
I merged my PR #9073 instead: it is closer to the current code, allows to implement a new -X coerce_c_locale command line option, and still respects -E and -I option: ignore PYTHONCOERCECLOCALE. Maybe we can refactor the Python initialization code later, but we need a fix on the current code before 3.7.1 release which is scheduled tomorrow by our release manager, Ned. |
|
What the hell Victor? This is NOT OK! |
Victor is attempting to arbitrarily overrule the PEP process
|
I've dismissed Victor's review, as he is inappropriately attempting to overrule the PEP process due to his personal preferences. |
|
@vstinner "Closer to the current code" is not an advantage when the structure of the current code goes completely against the way PEP 538 was designed to work (i.e. the CPython runtime never even seeing the original uncoerced locale, except in the locale coercion helper functions) |
…le-coercion-back-to-the-intended-location
|
Initial Travis failure affecting bb16468 was in test_nntplib relating to being unable to assign an address, so I restarted the job. |
|
Got the test_nntplib failure again, so restarted all the CI checks. See https://bugs.python.org/issue19756 (in particular, the last couple of comments) |
|
Moved back to WIP, as I plan to make further changes to the docs wording, and will be making it so that PYTHONCOERCECLOCALE is always handled the same way as the C-level locale configuration variables: read as ASCII independently of the Python |
…le-coercion-back-to-the-intended-location
- PYTHONCOERCECLOCALE is handled the same way as LANG, LC_CTYPE and LC_ALL (i.e. independently of -E and -I) - locale coercion tests are once again running in isolated mode - be explicit that the env var key and value must be encoded as ASCII to have any effect - implement bpo-30672, such that the POSIX locale is always handled the same way as the C locale, even when it isn't a simple platform level alias for the latter
9ff65d7 to
e48e5b7
Compare
|
@vstinner I know you object-in-principle to the idea of having The big items are:
(This isn't urgent, since it's currently only for Python 3.8, and even if a variant of it does get backported, it would be for 3.7.2 at the earliest) |
|
Sorry, I will not review this change. (If you want my opinion, I dislike the PR since it introduces an exception to the -E option and I dislike that.) |
|
Sorry this is taking so long. I wasn't super familiar with the locale stuff so I've been taking my time to make sure I've got it straight. Let me make sure I understand the change correctly. Status quo:
This PR:
Here are the differences I noticed: Status quo:
This PR:
In practice, it seems that the key changes are:
|
ericsnowcurrently
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change makes sense to me, especially in light of the following from PEP 538:
The interpreter will always check for the PYTHONCOERCECLOCALE environment variable at startup (even when running under the -E or -I switches), as the locale coercion check necessarily takes place before any command line argument processing. For consistency, the runtime check to determine whether or not to suppress the locale compatibility warning will be similarly independent of these settings.
There are a few minor things to address otherwise.
One point of clarification: is it intentional that the warning is not emitted under -E even if the locale is coerced?
Python/pylifecycle.c
Outdated
| const char *ctype_loc = setlocale(LC_CTYPE, NULL); | ||
| return ctype_loc != NULL && strcmp(ctype_loc, "C") == 0; | ||
| return ctype_loc != NULL && | ||
| (strcmp(ctype_loc, "C") == 0 || strcmp(ctype_loc, "C") == 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean for one of these two to be "c" (lower-case) or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy-and-paste error (second one was supposed to POSIX), combined with deferral of the related tests to a follow-up PR for https://www.bugs.python.org/issue30672
Modules/main.c
Outdated
| /* Set LC_CTYPE to the user preferred locale */ | ||
| _Py_SetLocaleFromEnv(LC_CTYPE); | ||
|
|
||
| /* TODO: With locale coercion moved back to _PyUnix_Main, this can |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function comment above for pymain_read_conf() is out-of-date with this change, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarified (also filed https://bugs.python.org/issue34914 which I was planning to do anyway, updating that comment just reminded me I hadn't done it yet)
| void | ||
| _Py_CoerceLegacyLocale(int warn) | ||
| int | ||
| _Py_CoerceLegacyLocale(const char **coercion_target, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be worth noting somewhere that this is one of the few C-API functions that may be called before the runtime is configured (or possibly even initialized). I know it's somewhat orthogonal to the objectives of this PR, but such a comment would give a little context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a note to both this and the legacy locale detection API.
The downside of the notes I added is that a lot of other functions have those same restrictions, but don't have a marker comment like the ones I just added, so they may give folks an incorrect impression.
So it may be better to take those comments out again, and file an enhancement issue on BPO pointing out that this is a maintainability issue, where it's not obvious which parts of the code need to be careful about only calling pre-init friendly APIs.
| pymain_init_stdio(pymain, config); | ||
|
|
||
| if (config->warn_on_c_locale && pymain->locale_coercion_warning != NULL) { | ||
| fprintf(stderr, pymain->locale_coercion_warning, pymain->locale_coercion_target); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we used sprintf in _Py_CoerceLegacyLocale() then we'd only need to return the fully-formed warning from it and we could drop pymain->locale_coercion_target entirely. I mention this because that field was a little misleading/confusing at first (I was expecting it be used elsewhere).
Did you have some other plan for that field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue I had with doing the message formatting in _Py_CoerceLegacyLocale is coping with the dynamic memory allocation needed (you could use a statically allocated array instead, but that's a different kind of ick).
However, I'm thinking there may be a better way to defer the rendering of the warning while still only needing one field on the main struct: make the field a function pointer accepting a C output stream that gets called instead of a C string that gets printed, and then in the warning rendering function, substitute the actual current locale into the message, rather than saving the name of the coercion target (This is all internal to the CPython command line app, so we don't need to worry about intervening locale changes).
With that change, this would just be a standard C callback API, rather than something that tightly couples the code that determines the warning is needed to the code that actually prints it after the standard streams are configured.
Misc/NEWS.d/next/Core and Builtins/2018-09-20-21-36-57.bpo-34589.9zERCD.rst
Show resolved
Hide resolved
|
When you're done making the requested changes, leave the comment: |
…le-coercion-back-to-the-intended-location
…le-coercion-back-to-the-intended-location
Misc/NEWS.d/next/Core and Builtins/2018-09-20-21-36-57.bpo-34589.9zERCD.rst
Show resolved
Hide resolved
…le-coercion-back-to-the-intended-location
|
While messing with the current locale setting from a support library rather than the main application still bothers me, I've come to the conclusion that it doesn't bother me enough to want to introduce a non-trivial distinction between the way Python 3.7 handles this scenario, and the way Python 3.8+ handles it. https://bugs.python.org/issue34589#msg332123 provides a bit more rationale (essentially, I expect that the the combination of circumstances required for there to be an externally observable difference in behaviour between Python 3.7.0 and this PR is going to be sufficiently rare that it simply isn't worth adding a net 200+ lines of code to maintain). Accordingly, closing this PR and the associated issue indefinitely - while I preferred the original simpler implementation of locale coercion in isolation, all the machinery that the current more complex implementation relies on has to exist to handle UTF-8 mode anyway, so reverting locale coercion back to its original implementation model would end up making the interpreter as a whole more complicated (both to maintain, and to understand). |
This PR moves locale coercion back to happening as soon as possible in the CLI app, so there's no re-encoding dance involved in handling it, and so you can emulate systems that don't have a suitable target locale with
PYTHONCOERCECLOCALE=0, even when working on code that runs with-Eor-I.To help with the documentation enhancements in the PR, it also implements [bpo-30672](https://www.bugs.python.org/issue30672) (which makes the POSIX locale behave the same way as it does on glibc, even when it's not a simple alias for the default C locale)
https://bugs.python.org/issue34589