-
-
Notifications
You must be signed in to change notification settings - Fork 33.6k
gh-140601: Add ResourceWarning to iterparse when not closed #140603
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-140601: Add ResourceWarning to iterparse when not closed #140603
Conversation
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
7d1d79a to
54742a1
Compare
When iterparse() opens a file by filename and is not explicitly closed, emit a ResourceWarning to alert developers of the resource leak. This implements the TODO comment at line 1270 of ElementTree.py which has been requesting this feature since the close() method was added. - Add _closed flag to IterParseIterator to track state - Emit ResourceWarning in __del__ if not closed - Add comprehensive test cases - Update existing tests to properly close iterators Signed-off-by: Osama Abdelkader <[email protected]>
54742a1 to
bf3f888
Compare
|
Please don't force push to CPython PRs; it's preferred to keep commit history intact for easier review. See: https://devguide.python.org/getting-started/pull-request-lifecycle/#quick-guide |
|
cc: @serhiy-storchaka who added the |
Lib/test/test_xml_etree.py
Outdated
| it = iterparse(SIMPLE_XMLFILE) | ||
| del it | ||
| import gc | ||
| gc.collect() # Ensure previous iterator is cleaned up |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to use support.gc_collect() instead of directly calling gc.collect() was specifically added for use in tests to ensure consistent and deterministic garbage collection behavior across platforms and configurations.
serhiy-storchaka
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests were intentionally added to check that the resource is closed without explicit close(). Don't add close(). Just replace check_no_resource_warning() with assertWarns(ResourceWarning) and add gc_collect(). There are other tests that explicitly call close().
- Use nonlocal close_source instead of _closed flag - Bind warnings.warn as default parameter in __del__ - Use assertWarns(ResourceWarning) instead of adding close() calls - Use support.gc_collect() for consistent test behavior - Remove broad exception handling Signed-off-by: Osama Abdelkader <[email protected]>
Signed-off-by: Osama Abdelkader <[email protected]>
Signed-off-by: Osama Abdelkader <[email protected]>
F-strings can fail during cleanup. Use % formatting like subprocess.Popen does for safer __del__ behavior. Signed-off-by: Osama Abdelkader <[email protected]>
Check that warning message contains 'unclosed file' and the filename to ensure it's from iterparse, not the file destructor. Addresses review feedback from serhiy-storchaka. Signed-off-by: Osama Abdelkader <[email protected]>
The ResourceWarning tests are already covered in test_iterparse() as noted by serhiy-storchaka in review. Signed-off-by: Osama Abdelkader <[email protected]>
Removing source=self prevents unraisable exception in C extension. The warning message still contains the filename via %r formatting. Signed-off-by: Osama Abdelkader <[email protected]>
|
I've addressed all the review feedback: Is this expected behavior for warnings in Thanks! |
|
No, it is not an expected behavior. I have fixed the warnings. The underlying file is closed when the iterator has been exhausted or raised an exception, so no explicit close() is needed in this case. And many tests depended on this. I have removed a ResourceWarning in such case. Please add a |
Document that iterparse now emits ResourceWarning in Python 3.15 when not explicitly closed. Signed-off-by: Osama Abdelkader <[email protected]>
|
Thanks @serhiy-storchaka for the support. |
|
I did not merge this PR yet, because there is an important question -- should the warning be emitted if the generator was automatically closed after iterating to the end (or after parsing error)? I silenced them, otherwise many existing tests would need an explicit But now I think that the warning should be emitted even if the file was closed. Yes, this will force users to add an explicit |
|
👍 to always emitting the warning. I think more critical in potential data loss cases (ex. there's a write buffer) but it's also worthwhile when it helps prevent hard to track down easy to resolve bugs |
| try: | ||
| _warn(f"unclosed iterparse iterator {source.name!r}", ResourceWarning, stacklevel=2) | ||
| finally: | ||
| source.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this is better?
| try: | |
| _warn(f"unclosed iterparse iterator {source.name!r}", ResourceWarning, stacklevel=2) | |
| finally: | |
| source.close() | |
| name = getattr(source, 'name', None) | |
| if name: | |
| _warn("unclosed iterparse iterator %r" % (name,), | |
| ResourceWarning, stacklevel=2) | |
| source.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
source.name always exist and not empty. This code is only executed when iterparse() opened a file by name.
BTW, if the iterparse() argument was file descriptor 0, your variant would not work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right - I missed that close_source=True guarantees source.name exists, and if name: breaks on fd 0. Thanks for catching that!
The original code is correct. (Though % formatting might be slightly safer than f-string in __del__, but not a blocker.)
| nonlocal close_source | ||
| if close_source: | ||
| source.close() | ||
| close_source = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this option be better here?
try:
if close_source:
close_source = False
source.close()
finally:
gen.close()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder. It seems that only exception can be raised here is a KeyboardInterrupt. The difference between calling and not calling gen.close() is whether you can continue iterating after iterparse().close() (cached elements can still be yielded). But if iterparse().close() failed, you do not have any guarantees. Anyway, the file is closed first, so there will not be leaks. I think there will be no practical difference, so we can keep the simplest code.
Lib/test/test_xml_etree.py
Outdated
|
|
||
| # Not exhausting the iterator still closes the resource (bpo-43292) | ||
| with warnings_helper.check_no_resource_warning(self): | ||
| def test_iterparse_not_close(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to split one huge test into several smaller ones, since this test has already become too big?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe. There are 4 different cases here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are other big tests here, so it is worth to refactor them before merging this PR. See #141499.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree
Co-authored-by: Cody Maloney <[email protected]>
* main: (463 commits) pythongh-140601: Add ResourceWarning to iterparse when not closed (pythonGH-140603) pythongh-137969: Fix double evaluation of `ForwardRef`s which rely on globals (python#140974) pythongh-139109: A new tracing JIT compiler frontend for CPython (pythonGH-140310) pythongh-141004: Document `PyErr_RangedSyntaxLocationObject` (python#141521) pythongh-140873: Add support of non-descriptor callables in functools.singledispatchmethod() (pythonGH-140884) pythongh-139653: Add PyUnstable_ThreadState_SetStackProtection() (python#139668) pythongh-141004: Document `PyCode_Optimize` (pythonGH-141378) pythongh-141004: Document C APIs for dictionary keys, values, and items (pythonGH-141009) pythongh-137959: Fix `TIER1_TO_TIER2` macro name in JIT InternalDocs (pythonGH-141496) pythongh-139871: Add `bytearray.take_bytes([n])` to efficiently extract `bytes` (pythonGH-140128) pythongh-140601: Refactor ElementTree.iterparse() tests (pythonGH-141499) pythongh-135801: Add the module parameter to compile() etc (pythonGH-139652) pythongh-140260: fix data race in `_struct` module initialization with subinterpreters (python#140909) pythongh-137109: refactor warning about threads when forking (python#141438) pythongh-141004: Document `PyRun_InteractiveOneObject` (pythonGH-141405) pythongh-124111: Fix TCL 9 thread detection (pythonGH-128103) pythongh-141442: Add escaping to iOS testbed arguments (python#141443) pythongh-140936: Fix JIT assertion crash at finalization if some generator is alive (pythonGH-140969) Add details about JIT build infrastructure and updating dependencies to `Tools/jit` (python#141167) pythongh-141412: Use reliable target URL for urllib example (pythonGH-141428) ...
* 'main' of github.com:python/cpython: (464 commits) pythongh-140601: Add ResourceWarning to iterparse when not closed (pythonGH-140603) pythongh-137969: Fix double evaluation of `ForwardRef`s which rely on globals (python#140974) pythongh-139109: A new tracing JIT compiler frontend for CPython (pythonGH-140310) pythongh-141004: Document `PyErr_RangedSyntaxLocationObject` (python#141521) pythongh-140873: Add support of non-descriptor callables in functools.singledispatchmethod() (pythonGH-140884) pythongh-139653: Add PyUnstable_ThreadState_SetStackProtection() (python#139668) pythongh-141004: Document `PyCode_Optimize` (pythonGH-141378) pythongh-141004: Document C APIs for dictionary keys, values, and items (pythonGH-141009) pythongh-137959: Fix `TIER1_TO_TIER2` macro name in JIT InternalDocs (pythonGH-141496) pythongh-139871: Add `bytearray.take_bytes([n])` to efficiently extract `bytes` (pythonGH-140128) pythongh-140601: Refactor ElementTree.iterparse() tests (pythonGH-141499) pythongh-135801: Add the module parameter to compile() etc (pythonGH-139652) pythongh-140260: fix data race in `_struct` module initialization with subinterpreters (python#140909) pythongh-137109: refactor warning about threads when forking (python#141438) pythongh-141004: Document `PyRun_InteractiveOneObject` (pythonGH-141405) pythongh-124111: Fix TCL 9 thread detection (pythonGH-128103) pythongh-141442: Add escaping to iOS testbed arguments (python#141443) pythongh-140936: Fix JIT assertion crash at finalization if some generator is alive (pythonGH-140969) Add details about JIT build infrastructure and updating dependencies to `Tools/jit` (python#141167) pythongh-141412: Use reliable target URL for urllib example (pythonGH-141428) ...
* 'main' of github.com:python/cpython: (464 commits) pythongh-140601: Add ResourceWarning to iterparse when not closed (pythonGH-140603) pythongh-137969: Fix double evaluation of `ForwardRef`s which rely on globals (python#140974) pythongh-139109: A new tracing JIT compiler frontend for CPython (pythonGH-140310) pythongh-141004: Document `PyErr_RangedSyntaxLocationObject` (python#141521) pythongh-140873: Add support of non-descriptor callables in functools.singledispatchmethod() (pythonGH-140884) pythongh-139653: Add PyUnstable_ThreadState_SetStackProtection() (python#139668) pythongh-141004: Document `PyCode_Optimize` (pythonGH-141378) pythongh-141004: Document C APIs for dictionary keys, values, and items (pythonGH-141009) pythongh-137959: Fix `TIER1_TO_TIER2` macro name in JIT InternalDocs (pythonGH-141496) pythongh-139871: Add `bytearray.take_bytes([n])` to efficiently extract `bytes` (pythonGH-140128) pythongh-140601: Refactor ElementTree.iterparse() tests (pythonGH-141499) pythongh-135801: Add the module parameter to compile() etc (pythonGH-139652) pythongh-140260: fix data race in `_struct` module initialization with subinterpreters (python#140909) pythongh-137109: refactor warning about threads when forking (python#141438) pythongh-141004: Document `PyRun_InteractiveOneObject` (pythonGH-141405) pythongh-124111: Fix TCL 9 thread detection (pythonGH-128103) pythongh-141442: Add escaping to iOS testbed arguments (python#141443) pythongh-140936: Fix JIT assertion crash at finalization if some generator is alive (pythonGH-140969) Add details about JIT build infrastructure and updating dependencies to `Tools/jit` (python#141167) pythongh-141412: Use reliable target URL for urllib example (pythonGH-141428) ...
…thonGH-140603) When iterparse() opens a file by filename and is not explicitly closed, emit a ResourceWarning to alert developers of the resource leak. Signed-off-by: Osama Abdelkader <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
When iterparse() opens a file by filename and is not explicitly closed, emit a ResourceWarning to alert developers of the resource leak.
This implements the TODO comment at line 1270 of ElementTree.py which has been requesting this feature since the close() method was added.