bpo-38912: fix close before connect callback on SSL tests#22691
bpo-38912: fix close before connect callback on SSL tests#22691aeros merged 1 commit intopython:masterfrom JustinTArthur:ensure_connect_in_ssl_tests
Conversation
There was a problem hiding this comment.
I was unable to replicate the failure under normal conditions (on Arch Linux, kernel 5.8.x). However, adding time.sleep(.1) to the start of MyProto.connection_made() resulted in the race condition occurring regularly:
======================================================================
ERROR: test_create_server_ssl_verified (test.test_asyncio.test_events.PollEventLoopTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/aeros/repos/cpython/Lib/test/test_asyncio/test_events.py", line 1111, in test_create_server_ssl_verified
proto.transport.close()
AttributeError: 'NoneType' object has no attribute 'close'
Including the time.sleep(.1) change, I ran ~500 iterations of test_create_server_ssl_verified using the changes applied in this PR:
[aeros:~/repos/cpython]$ ./python -m test -v test_asyncio.test_events --match test_create_server_ssl_verified -j4 -F
...
Ran 3 tests in 0.709s
OK
0:02:28 load avg: 2.63 [565] test_asyncio.test_events passed
test_create_server_ssl_verified (test.test_asyncio.test_events.EPollEventLoopTests) ... ok
test_create_server_ssl_verified (test.test_asyncio.test_events.PollEventLoopTests) ... ok
test_create_server_ssl_verified (test.test_asyncio.test_events.SelectEventLoopTests) ... ok
and ~500 iterations of test_create_unix_server_ssl_verified:
[aeros:~/repos/cpython]$ ./python -m test -v test_asyncio.test_events --match test_create_unix_server_ssl_verified -j4 -F
...
Ran 3 tests in 0.215s
OK
0:01:23 load avg: 5.05 [508] test_asyncio.test_events passed
test_create_unix_server_ssl_verified (test.test_asyncio.test_events.EPollEventLoopTests) ... ok
test_create_unix_server_ssl_verified (test.test_asyncio.test_events.PollEventLoopTests) ... ok
test_create_unix_server_ssl_verified (test.test_asyncio.test_events.SelectEventLoopTests) ... ok
So, I think it's safe to say that it was indeed an issue with a race condition between the event loop and the test prototype's connection_made(), as suggested in the bpo issue, and that this patch addresses it. To be additionally certain that it no longer occurs on other platforms, I'll add a test-with-buildbots label (note that other intermittent failures unrelated to this PR could potentially occur, so we'll be specifically looking out for relevant test_asyncio.test_events issues).
|
Thanks for running them and help test, @aeros. Looks like all passed except for an ENV_CHANGED from |
There was a problem hiding this comment.
Hmm, it looks like we reached a similar error within ssl_proto (from warnings log in buildbot/PPC64LE RHEL8 Refleaks PR):
...
./home/buildbot/buildarea/pull_request.cstratak-RHEL8-ppc64le.refleak/build/Lib/asyncio/sslproto.py:320: ResourceWarning: unclosed transport <asyncio.sslproto._SSLProtocolTransport object at 0x7fffa21fc830>
_warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Warning -- Unraisable exception
Exception ignored in: <function _SSLProtocolTransport.__del__ at 0x7fffaa142c30>
/home/buildbot/buildarea/pull_request.cstratak-RHEL8-ppc64le.refleak/build/Lib/asyncio/selector_events.py:704: ResourceWarning: unclosed transport <_SelectorSocketTransport closing fd=9>
_warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
At the very least though, I think that based on my local testing and buildbot fleet results, the failure occurs much less commonly now, which is a clear improvement from before. As the core dev that's probably the most experienced in chasing down buildbot failures, @vstinner, do you have any thoughts? My vote is +1, but I wouldn't mind a second opinion here.
|
I've decided to proceed with merging since it's a clear difference from before with the test failure happening significantly less often (even if it doesn't 100% address the issue). Thanks @JustinTArthur! |
|
Thanks @JustinTArthur for the PR, and @aeros for merging it 🌮🎉.. I'm working now to backport this PR to: 3.8, 3.9. |
pythonGH-22691) Reduces the rate at which the ENV CHANGED failure occurs in test_asyncio SSL tests (due to unclosed transport), but does not 100% resolve it. (cherry picked from commit de73d43) Co-authored-by: Justin Turner Arthur <[email protected]>
|
GH-22799 is a backport of this pull request to the 3.9 branch. |
pythonGH-22691) Reduces the rate at which the ENV CHANGED failure occurs in test_asyncio SSL tests (due to unclosed transport), but does not 100% resolve it. (cherry picked from commit de73d43) Co-authored-by: Justin Turner Arthur <[email protected]>
|
GH-22800 is a backport of this pull request to the 3.8 branch. |
GH-22691) Reduces the rate at which the ENV CHANGED failure occurs in test_asyncio SSL tests (due to unclosed transport), but does not 100% resolve it. (cherry picked from commit de73d43) Co-authored-by: Justin Turner Arthur <[email protected]>
GH-22691) Reduces the rate at which the ENV CHANGED failure occurs in test_asyncio SSL tests (due to unclosed transport), but does not 100% resolve it. (cherry picked from commit de73d43) Co-authored-by: Justin Turner Arthur <[email protected]>
|
Thanks for the review and testing, @aeros. I hope to get some time to look at the remaining. We might be able to do a sweeping change to wrap cleanup activities in finally blocks or context managers for most of those tests to help ensure that only the leading exception is raised. |
pythonGH-22691) Reduces the rate at which the ENV CHANGED failure occurs in test_asyncio SSL tests (due to unclosed transport), but does not 100% resolve it.
As shown in bpo-38912 comments' log entries,
test_create_unix_server_ssl_verifiedandtest_create_server_ssl_verifiedcould end up trying to close a transport they didn't have a reference to yet because a testProtocol'sconnection_madecallback had yet to be called by the event loop:This PR ensures the callback is waited on before closing the connection to the client (and assuming that the desired SSL connectivity was established).
https://bugs.python.org/issue38912