Skip to content

Conversation

@JustinTArthur
Copy link
Contributor

@JustinTArthur JustinTArthur commented Oct 14, 2020

As shown in bpo-38912 comments' log entries, test_create_unix_server_ssl_verified and test_create_server_ssl_verified could end up trying to close a transport they didn't have a reference to yet because a test Protocol's connection_made callback had yet to be called by the event loop:

======================================================================
ERROR: test_create_server_ssl_verified (test.test_asyncio.test_events.SelectEventLoopTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/vstinner/python/master/Lib/test/test_asyncio/test_events.py", line 1110, in test_create_server_ssl_verified
    proto.transport.close()
AttributeError: 'NoneType' object has no attribute 'close'

This PR ensures the callback is waited on before closing the connection to the client (and assuming that the desired SSL connectivity was established).

https://bugs.python.org/issue38912

Copy link
Contributor

@aeros aeros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was unable to replicate the failure under normal conditions (on Arch Linux, kernel 5.8.x). However, adding time.sleep(.1) to the start of MyProto.connection_made() resulted in the race condition occurring regularly:

======================================================================
ERROR: test_create_server_ssl_verified (test.test_asyncio.test_events.PollEventLoopTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/aeros/repos/cpython/Lib/test/test_asyncio/test_events.py", line 1111, in test_create_server_ssl_verified
    proto.transport.close()
AttributeError: 'NoneType' object has no attribute 'close'

Including the time.sleep(.1) change, I ran ~500 iterations of test_create_server_ssl_verified using the changes applied in this PR:

[aeros:~/repos/cpython]$ ./python -m test -v test_asyncio.test_events --match test_create_server_ssl_verified -j4 -F
...
Ran 3 tests in 0.709s

OK
0:02:28 load avg: 2.63 [565] test_asyncio.test_events passed
test_create_server_ssl_verified (test.test_asyncio.test_events.EPollEventLoopTests) ... ok
test_create_server_ssl_verified (test.test_asyncio.test_events.PollEventLoopTests) ... ok
test_create_server_ssl_verified (test.test_asyncio.test_events.SelectEventLoopTests) ... ok

and ~500 iterations of test_create_unix_server_ssl_verified:

[aeros:~/repos/cpython]$ ./python -m test -v test_asyncio.test_events --match test_create_unix_server_ssl_verified -j4 -F
...
Ran 3 tests in 0.215s

OK
0:01:23 load avg: 5.05 [508] test_asyncio.test_events passed
test_create_unix_server_ssl_verified (test.test_asyncio.test_events.EPollEventLoopTests) ... ok
test_create_unix_server_ssl_verified (test.test_asyncio.test_events.PollEventLoopTests) ... ok
test_create_unix_server_ssl_verified (test.test_asyncio.test_events.SelectEventLoopTests) ... ok

So, I think it's safe to say that it was indeed an issue with a race condition between the event loop and the test prototype's connection_made(), as suggested in the bpo issue, and that this patch addresses it. To be additionally certain that it no longer occurs on other platforms, I'll add a test-with-buildbots label (note that other intermittent failures unrelated to this PR could potentially occur, so we'll be specifically looking out for relevant test_asyncio.test_events issues).

@aeros aeros added 🔨 test-with-buildbots Test PR w/ buildbots; report in status section topic-asyncio skip news tests Tests in the Lib/test dir labels Oct 15, 2020
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @aeros for commit 2a86d28 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Oct 15, 2020
@JustinTArthur
Copy link
Contributor Author

Thanks for running them and help test, @aeros. Looks like all passed except for an ENV_CHANGED from buildbot/PPC64LE RHEL8 Refleaks PR.

Copy link
Contributor

@aeros aeros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it looks like we reached a similar error within ssl_proto (from warnings log in buildbot/PPC64LE RHEL8 Refleaks PR):

...
./home/buildbot/buildarea/pull_request.cstratak-RHEL8-ppc64le.refleak/build/Lib/asyncio/sslproto.py:320: ResourceWarning: unclosed transport <asyncio.sslproto._SSLProtocolTransport object at 0x7fffa21fc830>
  _warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Warning -- Unraisable exception
Exception ignored in: <function _SSLProtocolTransport.__del__ at 0x7fffaa142c30>
/home/buildbot/buildarea/pull_request.cstratak-RHEL8-ppc64le.refleak/build/Lib/asyncio/selector_events.py:704: ResourceWarning: unclosed transport <_SelectorSocketTransport closing fd=9>
  _warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
ResourceWarning: Enable tracemalloc to get the object allocation traceback

At the very least though, I think that based on my local testing and buildbot fleet results, the failure occurs much less commonly now, which is a clear improvement from before. As the core dev that's probably the most experienced in chasing down buildbot failures, @vstinner, do you have any thoughts? My vote is +1, but I wouldn't mind a second opinion here.

@aeros
Copy link
Contributor

aeros commented Oct 20, 2020

I've decided to proceed with merging since it's a clear difference from before with the test failure happening significantly less often (even if it doesn't 100% address the issue). Thanks @JustinTArthur!

@aeros aeros merged commit de73d43 into python:master Oct 20, 2020
@miss-islington
Copy link
Contributor

Thanks @JustinTArthur for the PR, and @aeros for merging it 🌮🎉.. I'm working now to backport this PR to: 3.8, 3.9.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Oct 20, 2020
pythonGH-22691)

Reduces the rate at which the ENV CHANGED failure occurs in test_asyncio SSL tests (due to unclosed transport), but does not 100% resolve it.
(cherry picked from commit de73d43)

Co-authored-by: Justin Turner Arthur <[email protected]>
@bedevere-bot
Copy link

GH-22799 is a backport of this pull request to the 3.9 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Oct 20, 2020
pythonGH-22691)

Reduces the rate at which the ENV CHANGED failure occurs in test_asyncio SSL tests (due to unclosed transport), but does not 100% resolve it.
(cherry picked from commit de73d43)

Co-authored-by: Justin Turner Arthur <[email protected]>
@bedevere-bot
Copy link

GH-22800 is a backport of this pull request to the 3.8 branch.

miss-islington added a commit that referenced this pull request Oct 20, 2020
GH-22691)

Reduces the rate at which the ENV CHANGED failure occurs in test_asyncio SSL tests (due to unclosed transport), but does not 100% resolve it.
(cherry picked from commit de73d43)

Co-authored-by: Justin Turner Arthur <[email protected]>
miss-islington added a commit that referenced this pull request Oct 20, 2020
GH-22691)

Reduces the rate at which the ENV CHANGED failure occurs in test_asyncio SSL tests (due to unclosed transport), but does not 100% resolve it.
(cherry picked from commit de73d43)

Co-authored-by: Justin Turner Arthur <[email protected]>
@JustinTArthur JustinTArthur deleted the ensure_connect_in_ssl_tests branch October 20, 2020 01:47
@JustinTArthur
Copy link
Contributor Author

Thanks for the review and testing, @aeros. I hope to get some time to look at the remaining. We might be able to do a sweeping change to wrap cleanup activities in finally blocks or context managers for most of those tests to help ensure that only the leading exception is raised.

adorilson pushed a commit to adorilson/cpython that referenced this pull request Mar 13, 2021
pythonGH-22691)

Reduces the rate at which the ENV CHANGED failure occurs in test_asyncio SSL tests (due to unclosed transport), but does not 100% resolve it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

skip news tests Tests in the Lib/test dir topic-asyncio

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants