bpo-17852: Fix PR #3372, flush BufferedWriter objects on exit.#4847
bpo-17852: Fix PR #3372, flush BufferedWriter objects on exit.#4847nascheme wants to merge 2 commits intopython:masterfrom
Conversation
We can't use _Py_PyAtExit() as it only supports registering a single callback. It is used by the atexit module and so we can't use it. We can't use Py_AtExit() either because it calls functions too late in the interpreter shutdown process. Instead, create io._flush_all_buffers. In io.py, register it with the atexit module.
Lib/_pyio.py
Outdated
| for w in _all_writers: | ||
| try: | ||
| w.flush() | ||
| except: |
There was a problem hiding this comment.
except Exception sounds better here.
pitrou
left a comment
There was a problem hiding this comment.
I think the approach here is nice and adequate. It would be nice to come up with a test case, if that's easily doable.
|
When you're done making the requested changes, leave the comment: |
|
I have made the requested changes; please review again. |
|
Thanks for making the requested changes! @pitrou: please review the changes made to this pull request. |
|
I spent some time trying to think of how to build a test. It is difficult. I believe you need a BufferedWriter + FileIO pair to be part of a reference cycle. Then you need the finalization of the FileIO object to happen before the BufferedWriter. At least, that is my memory of how to trigger the non-flushing behaviour. I spent about half a day trying to trigger the bug originally and figure out what was happening. Unfortunately I did not save the script that triggered it. I do distinctly recall that the buffered file needs to be part of a reference cycle. The motivation for me to make this patch was to hopefully save someone else going through solving a similar bug. |
|
Ha ha, I managed to trigger it. I added a script to bpo-17852. Also added some explanation that this bug has to do with topologically ordering of finalizers. |
|
Could you convert your script into a simple unit test? |
|
Closing this PR as it is not really a fix. It works if the reference cycle containing the raw file and buffered file object are not re-claimed before atexit gets run. However, the cycle can be claimed during a normal gc.collect() run. In that case, the raw file can get closed before the buffered file and data in the buffer will be discarded. I thought having the raw file keep a list of weak refs to the buffer and calling flush() on those when the raw close() is called would be a proper fix. However, that doesn't work either. The GC clears the weakrefs in handle_weakrefs() before it calls The only other idea I have is to split the buffered IO object into two parts. A "proxy" object that wraps the underlying state. The raw file object would keep a strong reference to this underlying state object.Then the raw file can call flush() before closing itself. |
New version of PR #3372, which was reverted by 317def9.
We can't use _Py_PyAtExit() as it only supports registering a single
callback. It is used by the atexit module and so we can't use it. We
can't use Py_AtExit() either because it calls functions too late in the
interpreter shutdown process. Instead, create io._flush_all_buffers.
In io.py, register it with the atexit module.
https://bugs.python.org/issue17852