Skip to content

Conversation

@ncoghlan
Copy link
Contributor

@ncoghlan ncoghlan commented Sep 13, 2018

This PR moves locale coercion back to happening as soon as possible in the CLI app, so there's no re-encoding dance involved in handling it, and so you can emulate systems that don't have a suitable target locale with PYTHONCOERCECLOCALE=0, even when working on code that runs with -E or -I.

To help with the documentation enhancements in the PR, it also implements [bpo-30672](https://www.bugs.python.org/issue30672) (which makes the POSIX locale behave the same way as it does on glibc, even when it's not a simple alias for the default C locale)

https://bugs.python.org/issue34589

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dislike the idea of having PYTHONCOERCECLOCALE env var which behaves differently than all other PYTHON* env vars. See my -X proposal: https://bugs.python.org/issue34589#msg325282

@bedevere-bot
Copy link

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@vstinner
Copy link
Member

I merged my PR #9073 instead: it is closer to the current code, allows to implement a new -X coerce_c_locale command line option, and still respects -E and -I option: ignore PYTHONCOERCECLOCALE.

Maybe we can refactor the Python initialization code later, but we need a fix on the current code before 3.7.1 release which is scheduled tomorrow by our release manager, Ned.

@vstinner vstinner closed this Sep 17, 2018
@ncoghlan
Copy link
Contributor Author

What the hell Victor? This is NOT OK!

@ncoghlan ncoghlan reopened this Sep 19, 2018
@ncoghlan ncoghlan dismissed vstinner’s stale review September 19, 2018 07:34

Victor is attempting to arbitrarily overrule the PEP process

@ncoghlan
Copy link
Contributor Author

I've dismissed Victor's review, as he is inappropriately attempting to overrule the PEP process due to his personal preferences.

@ncoghlan
Copy link
Contributor Author

@vstinner "Closer to the current code" is not an advantage when the structure of the current code goes completely against the way PEP 538 was designed to work (i.e. the CPython runtime never even seeing the original uncoerced locale, except in the locale coercion helper functions)

@ncoghlan
Copy link
Contributor Author

Initial Travis failure affecting bb16468 was in test_nntplib relating to being unable to assign an address, so I restarted the job.

@ncoghlan ncoghlan changed the title WIP: bpo-34589: Move locale coercion back to the CLI app bpo-34589: Move locale coercion back to the CLI app Sep 20, 2018
@ncoghlan ncoghlan closed this Sep 20, 2018
@ncoghlan ncoghlan reopened this Sep 20, 2018
@ncoghlan
Copy link
Contributor Author

Got the test_nntplib failure again, so restarted all the CI checks. See https://bugs.python.org/issue19756 (in particular, the last couple of comments)

@ncoghlan ncoghlan changed the title bpo-34589: Move locale coercion back to the CLI app WIP bpo-34589: Move locale coercion back to the CLI app Sep 25, 2018
@ncoghlan
Copy link
Contributor Author

Moved back to WIP, as I plan to make further changes to the docs wording, and will be making it so that PYTHONCOERCECLOCALE is always handled the same way as the C-level locale configuration variables: read as ASCII independently of the Python -E and -I switches

- PYTHONCOERCECLOCALE is handled the same way as LANG,
  LC_CTYPE and LC_ALL (i.e. independently of -E and -I)
- locale coercion tests are once again running in isolated
  mode
- be explicit that the env var key and value must be encoded
  as ASCII to have any effect
- implement bpo-30672, such that the POSIX locale is always
  handled the same way as the C locale, even when it isn't a
  simple platform level alias for the latter
@ncoghlan ncoghlan force-pushed the bpo-34589-move-locale-coercion-back-to-the-intended-location branch from 9ff65d7 to e48e5b7 Compare September 30, 2018 07:19
@ncoghlan ncoghlan changed the title WIP bpo-34589: Move locale coercion back to the CLI app bpo-34589: Move locale coercion back to the CLI app Sep 30, 2018
@ncoghlan
Copy link
Contributor Author

@vstinner I know you object-in-principle to the idea of having PYTHONCOERCECLOCALE behave the same way as LANG, LC_CTYPE, and LC_ALL do (rather than having it behave like other PYTHON* settings), but I'd still appreciate it if you were willing to review the changes I'm proposing to the CoreConfig initialisation code.

The big items are:

  • the coercion related settings go away, replaced with a simple setting related solely to displaying C locale related warnings (or not)
  • a separate read_unconditional_env_vars helper function is added

(This isn't urgent, since it's currently only for Python 3.8, and even if a variant of it does get backported, it would be for 3.7.2 at the earliest)

@vstinner
Copy link
Member

vstinner commented Oct 1, 2018

Sorry, I will not review this change. (If you want my opinion, I dislike the PR since it introduces an exception to the -E option and I dislike that.)

@ericsnowcurrently
Copy link
Member

Sorry this is taking so long. I wasn't super familiar with the locale stuff so I've been taking my time to make sure I've got it straight. Let me make sure I understand the change correctly.

Status quo:

  1. deal with commandline/env (and locale)
    1. init runtime state
    2. default to Py_UTF8Mode
    3. call _Py_SetLocaleFromEnv(LC_CTYPE)
    4. parse commandline
    5. maybe populate core config from env vars (with locale already set), including coerce_c_locale* (from PYTHONCOERCECLOCALE)
    6. call _Py_CoerceLegacyLocale() if coerce_c_locale is set (from step 1.5)...and go back to step 1.4
    7. go back to step 1.4 if switched to UTF8 mode
  2. init stdio
  3. init core
  4. configure main interpreter
  5. initialize main interpreter
  6. run the program

This PR:

  1. call _Py_SetLocaleFromEnv(LC_CTYPE)
  2. call Py_CoerceLegacyLocale() if a legacy locale is detected and PYTHONCOERCECLOCALE is set
  3. deal with commandline/env (and locale)
    1. init runtime state
    2. default to Py_UTF8Mode
    3. parse commandline
    4. maybe populate core config from env vars, not including coerce_c_locale*
    5. always populate warn_c_locale (using the locale-encoded value from PYTHONCOERCECLOCALE)
    6. go back to step 3 if switched to UTF8 mode
  4. init stdio
  5. print the locale coercion warning if appropriate
  6. init core
  7. configure main interpreter
  8. initialize main interpreter
  9. run the program

Here are the differences I noticed:

Status quo:

  • 2 _PyCoreConfig fields (one to indicate that coercion should happen & one for the warning)
  • initial locale setting (with LC_CTYPE)
    • right before processing commandline (step 1.3 above)
  • reading PYTHONCOERCECLOCALE (step 1.5 above)
    • not used if -E was used
    • relies on currently-set locale (possibly UTF mode)
  • _Py_LegacyLocaleDetected()
    • called in config_init_locale() (basically right after env vars are processed; step 1.5 above)
    • used as a fallback if PYTHONCOERCECLOCALE wasn't set
  • _Py_CoerceLegacyLocale()
    • called when in the midst of processing the commandline (step 1.6 above)
    • called if PYTHONCOERCECLOCALE set or _Py_LegacyLocaleDetected() returns true
  • warning
    • happens immediately when _Py_CoerceLegacyLocale() is called (step 1.6 above)

This PR:

  • 1 _PyCoreConfig field (to indicate the warning)
  • the locale is initialized to LC_CTYPE immediately (step 1 above)
  • reading PYTHONCOERCECLOCALE (step 2 above)
    • used even if -E was used
    • relies only on LC_CTYPE locale
  • reading PYTHONCOERCECLOCALE (step 3.5 above)
    • only used if -E was used
    • relies on currently-set locale
  • _Py_LegacyLocaleDetected()
    • called right away, before any other locale stuff might have been done or any env vars/cmdline processed (step 2 above)
  • _Py_CoerceLegacyLocale()
    • also called right away and only if a legacy locale is detected (step 2 above)
    • called if PYTHONCOERCECLOCALE set (and not "0") and _Py_LegacyLocaleDetected() returns true
  • warning
    • happens right before core initialization (step 5 above)

In practice, it seems that the key changes are:

  • _Py_CoerceLegacyLocale() is only called if a legacy locale is detected and PYTHONCOERCECLOCALE is set
  • -E no longer affects the decision to call _Py_CoerceLegacyLocale()

Copy link
Member

@ericsnowcurrently ericsnowcurrently left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change makes sense to me, especially in light of the following from PEP 538:

The interpreter will always check for the PYTHONCOERCECLOCALE environment variable at startup (even when running under the -E or -I switches), as the locale coercion check necessarily takes place before any command line argument processing. For consistency, the runtime check to determine whether or not to suppress the locale compatibility warning will be similarly independent of these settings.

There are a few minor things to address otherwise.

One point of clarification: is it intentional that the warning is not emitted under -E even if the locale is coerced?

const char *ctype_loc = setlocale(LC_CTYPE, NULL);
return ctype_loc != NULL && strcmp(ctype_loc, "C") == 0;
return ctype_loc != NULL &&
(strcmp(ctype_loc, "C") == 0 || strcmp(ctype_loc, "C") == 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean for one of these two to be "c" (lower-case) or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy-and-paste error (second one was supposed to POSIX), combined with deferral of the related tests to a follow-up PR for https://www.bugs.python.org/issue30672

Modules/main.c Outdated
/* Set LC_CTYPE to the user preferred locale */
_Py_SetLocaleFromEnv(LC_CTYPE);

/* TODO: With locale coercion moved back to _PyUnix_Main, this can
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function comment above for pymain_read_conf() is out-of-date with this change, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarified (also filed https://bugs.python.org/issue34914 which I was planning to do anyway, updating that comment just reminded me I hadn't done it yet)

void
_Py_CoerceLegacyLocale(int warn)
int
_Py_CoerceLegacyLocale(const char **coercion_target,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be worth noting somewhere that this is one of the few C-API functions that may be called before the runtime is configured (or possibly even initialized). I know it's somewhat orthogonal to the objectives of this PR, but such a comment would give a little context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a note to both this and the legacy locale detection API.

The downside of the notes I added is that a lot of other functions have those same restrictions, but don't have a marker comment like the ones I just added, so they may give folks an incorrect impression.

So it may be better to take those comments out again, and file an enhancement issue on BPO pointing out that this is a maintainability issue, where it's not obvious which parts of the code need to be careful about only calling pre-init friendly APIs.

pymain_init_stdio(pymain, config);

if (config->warn_on_c_locale && pymain->locale_coercion_warning != NULL) {
fprintf(stderr, pymain->locale_coercion_warning, pymain->locale_coercion_target);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we used sprintf in _Py_CoerceLegacyLocale() then we'd only need to return the fully-formed warning from it and we could drop pymain->locale_coercion_target entirely. I mention this because that field was a little misleading/confusing at first (I was expecting it be used elsewhere).

Did you have some other plan for that field?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue I had with doing the message formatting in _Py_CoerceLegacyLocale is coping with the dynamic memory allocation needed (you could use a statically allocated array instead, but that's a different kind of ick).

However, I'm thinking there may be a better way to defer the rendering of the warning while still only needing one field on the main struct: make the field a function pointer accepting a C output stream that gets called instead of a C string that gets printed, and then in the warning rendering function, substitute the actual current locale into the message, rather than saving the name of the coercion target (This is all internal to the CPython command line app, so we don't need to worry about intervening locale changes).

With that change, this would just be a standard C callback API, rather than something that tightly couples the code that determines the warning is needed to the code that actually prints it after the standard streams are configured.

@bedevere-bot
Copy link

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@ncoghlan ncoghlan changed the title bpo-34589: Move locale coercion back to the CLI app WIP bpo-34589: Move locale coercion back to the CLI app Nov 6, 2018
@ncoghlan
Copy link
Contributor Author

While messing with the current locale setting from a support library rather than the main application still bothers me, I've come to the conclusion that it doesn't bother me enough to want to introduce a non-trivial distinction between the way Python 3.7 handles this scenario, and the way Python 3.8+ handles it.

https://bugs.python.org/issue34589#msg332123 provides a bit more rationale (essentially, I expect that the the combination of circumstances required for there to be an externally observable difference in behaviour between Python 3.7.0 and this PR is going to be sufficiently rare that it simply isn't worth adding a net 200+ lines of code to maintain).

Accordingly, closing this PR and the associated issue indefinitely - while I preferred the original simpler implementation of locale coercion in isolation, all the machinery that the current more complex implementation relies on has to exist to handle UTF-8 mode anyway, so reverting locale coercion back to its original implementation model would end up making the interpreter as a whole more complicated (both to maintain, and to understand).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants