-
-
Notifications
You must be signed in to change notification settings - Fork 33.7k
bpo-38076 Clear the interpreter state only after clearing module globals #18039
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-38076 Clear the interpreter state only after clearing module globals #18039
Conversation
|
🤖 New build scheduled with the buildbot fleet by @pablogsal for commit f2cfc13 🤖 If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again. |
|
I'm about to hop on a plane so I can't check it now but I'm curious what happens with: |
Still segfaults on my system: |
|
That of course fails since we are clearing the module state before we clear up the sys globals. You bring up a good point though. I'll just push this past the clearing of every module dictionary. I just tested that locally and it worked, even on your new example. I will add a new test and update the PR |
|
Another possibility would be restoring sys to its original form like we do builtins: https://github.com/eduardo-elizondo/cpython/blob/f2cfc13f65e94f755b5d47218f4479bca30118fc/Python/import.c#L564 And then collecting and then killing off sys/builtin and collecting again. |
DinoV
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be good to add the various test cases in the bug to the PR as well.
|
@DinoV given that sys is also treated specially just like builtins, it might make sense to also add this. Also, agreed now that you've pointed out a couple of other edge cases, I'll add more tests. |
|
Actually, I decided against creating a sysdict copy to avoid increasing the heap memory needed for the interpreter state. Just pushing further back the module state clearing does the trick. By the way, as you might have noticed in the sys test, |
| } | ||
| _PyModule_ClearDict(interp->builtins); | ||
|
|
||
| /* Clear module dict copies stored in the interpreter state */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm tempted to change this comment to: "Clear module dict copies stored in the interpreter state only after all the modules have been cleared" or something along those lines.
|
Wow. Funny story. I was debugging some Python crash I got lately, and realized it has something to do with mishandled refcounts. I hooked After "solving" that with a temporary Just before getting off the computer, I hopped by the latest PRs of CPython, to see what kind of interpreter bugs people usually encounter and fix. And so it happens the very first PR I stumble upon is this exact problem. Luckiest click ever. TL;DR it was funny and I can confirm moving |
|
Thank you for the patch Eddie. This is a dark and ugly corner of CPython, as you probably realized. Over the years, many revisions have been made to the interpreter shutdown logic. E.g. doing things in various orders, adding extra steps, GC passes, etc. I think in nearly each revision, someone had an example where the old shutdown process produced a behavior they didn't like and they were motivated to tweak until their certain problem went away. It's a bit like playing "wack a mole", hammer one bug down, another pops up. Not saying your change is bad or wrong, I'm just giving you some background history. I suspect it is possible your change will result in someone else's program failing to shutdown gracefully. E.g. somehow they depend on interpreter state getting cleared first. Based on a quick review, the ordering you propose looks more sensible and robust. I've been thinking about a better way to do shutdown for a while now. To me, it seems intimately tied to the GC and how we handle finalizers. I.e. if there wasn't any finalizers, it should not matter what order we destroy things. Could we use gcmodule.c finalize_garbage() and handle_legacy_finalizers() to deal with finalizers and then not have them executing while we are destroying things? Making shutdown work that way would be a big change. |
I was recently working on a prototype for this idea based on the desired that the maximum amount of objects are destroyed when the interpreter ends. I could open an issue so we could evaluate that |
|
I would be interested to see that prototype Pablo. I don't know what we should do with this PR then. Based on a quick review, it looks sane to me but I don't feel confident I understand the full impact of the change. It is sort of a "band aid" fix and I'm okay with that if it improves things overall. Fixing a segfault is certainly a good thing. |
encukou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on a quick review, it looks sane to me but I don't feel confident I understand the full impact of the change. It is sort of a "band aid" fix and I'm okay with that if it improves things overall. Fixing a segfault is certainly a good thing.
I'm in the same position.
Let's apply the band-aid now, so it gets real-world testing in the alphas.
Currently, during runtime destruction,
_PyImport_Cleanupis clearing the interpreter state before clearing out the modules themselves. This leads to a segfault on modules that rely on the module state to clear themselves up.For example, let's take the small snippet added in the issue by @DinoV :
The module
_structuses the module state to runpack. Therefore, the module state has to be alive until after the module has been cleared out to successfully runC.__del__. This happens at line 606, when_PyImport_Cleanupcalls_PyModule_Clear. In fact, the loop that calls_PyModule_Clearhas in its comments:That means that we can't clear the module state (which is used by C Extensions) before we run that loop.
Moving
_PyInterpreterState_ClearModulesuntil after it, fixes the segfault in the code snippet.Finally, this updates a test in
ioto correctly assert the error that it now throws (since it now finds the io module state). The test that uses this is:test_create_at_shutdown_without_encoding. Given this test is now working is a proof that the module state now stays alive even when__del__is called at module destruction time. Thus, I didn't add a new tests for this.https://bugs.python.org/issue38076
Automerge-Triggered-By: @encukou