Skip to content

Conversation

@pablogsal
Copy link
Member

@pablogsal pablogsal commented Dec 2, 2018

Co-authored-by: tzickel [email protected]

This PR fixes a regression introduced in #8450 that manifests as a hanging process when using a multiprocessing.pool.imap or similar iterator objects without keeping a reference to the pool itself.

https://bugs.python.org/issue35378

@pablogsal pablogsal self-assigned this Dec 2, 2018
@pablogsal pablogsal added type-bug An unexpected behavior, bug, or error tests Tests in the Lib/test dir skip news labels Dec 2, 2018
@pablogsal
Copy link
Member Author

CC: @tzickel

@tzickel
Copy link
Contributor

tzickel commented Dec 3, 2018

A. We should also backport to 2.7 as well (since currently the patch is also there).
B. Why do we need cache=None in all the definitions ?
C. You remove all of the self._pool but not all of self._cache which point to a part of it, why ?

@pablogsal
Copy link
Member Author

pablogsal commented Dec 3, 2018

A. We should also backport to 2.7 as well (since currently the patch is also there).

Yes, but I prefer to have the implementation fixed before opening the manual backport.

B. Why do we need cache=None in all the definitions?

Because the old API allows passing a cache that is not the one in the pool (pool._cache). This way, you can still detach the pool and the cache. I prefer not to restrain the internal API when making a fix for a regresion. Also, this also justifies a bit more passing the pool: if you don't provide the cache, the one in the pool will be used (although this is a minor detail).

@tzickel
Copy link
Contributor

tzickel commented Dec 3, 2018

I added C. why not self._pool = self._cache = None ?

@pablogsal
Copy link
Member Author

pablogsal commented Dec 3, 2018

I added C. why not self._pool = self._cache = None ?

Unless I am missing something. the cache does not point to the pool/keep alive the pool. If that were true, this issue will not be happening. On the other hand I suppose this does not hurt, specially when the cache is populated with other jobs.


def __init__(self, cache, callback, error_callback):
def __init__(self, pool, callback, error_callback, cache=None):
self._pool = pool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You wrote "This PR fixes a regression introduced in PR #8450" which comes from https://bugs.python.org/issue34172. This issue is a reference cycle fix if I understand correctly, but here you create a new strong reference to the pool. I see a risk of creating new reference cycles. No?

Before https://bugs.python.org/issue34172 fix, ApplyResult didn't hold a reference to the pool: you add a new reference.

Maybe it's fine, but I'm just concerned that this PR might reintroduce a bug similar but different than https://bugs.python.org/issue34172

I don't understand why you need to hold a strong reference to the pool.

Copy link
Member Author

@pablogsal pablogsal Dec 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to hold a reference to the pool because the regression is that the pool may be destroyed while one of its iterators is still alive if no references are left to the pool. In this case, iterating over the iterator of the pool hangs. The solution links the lifetime of the pool to the lifetime of the iterator (basically: keeping the pool alive while the iterator is alive). When the iterator is exhausted, the references to the pool are cleaned.

The issue in #8450 is different, is a reference cycle between the Pool object, and the self._worker_handler Thread object, which is more fundamental.

In this case, the cycle needs to exist while the iterator is alive: all objects that depend on the pool need to make sure that the pool is alive while they are still in use (a non-consumed iterator...etc), otherwise hangs or inconsistent state can happen. This did not happen before #8450 because of the reference cycle between the Pool object and self._worker_handler prevented the pool to be destroyed.

This PR makes sure to break the cycle when the pool objects are not needed anymore.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More long explanation available here:

https://bugs.python.org/msg330996

@vstinner
Copy link
Member

vstinner commented Dec 3, 2018

Let's discuss on https://bugs.python.org/issue34172

class ApplyResult(object):

def __init__(self, cache, callback, error_callback):
def __init__(self, pool, callback, error_callback, cache=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the additional cache argument?

@pablogsal
Copy link
Member Author

I am going to close this PR because we need to re-apply the origin patch to master and the new preferred solution is the weak reference.

@pablogsal pablogsal closed this Dec 9, 2018
@pablogsal pablogsal deleted the bpo35378 branch December 9, 2018 16:40
@pablogsal pablogsal restored the bpo35378 branch January 9, 2019 18:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

awaiting merge skip news tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants