bpo-35378: Link the lifetime of the pool to the pool's iterators and results #10852

pablogsal · 2018-12-02T20:33:11Z

Co-authored-by: tzickel [email protected]

This PR fixes a regression introduced in #8450 that manifests as a hanging process when using multiprocessing.pool.imap or similar iterator objects without keeping a reference to the pool itself.

https://bugs.python.org/issue35378

pablogsal · 2018-12-02T20:34:33Z

CC: @tzickel

Lib/multiprocessing/pool.py

tzickel · 2018-12-03T17:55:30Z

A. We should also backport to 2.7 (since the patch is currently there as well).
B. Why do we need cache=None in all the definitions?
C. You remove all of the self._pool references but not all of the self._cache references, which point to a part of it. Why?

Lib/multiprocessing/pool.py

pablogsal · 2018-12-03T18:03:28Z

A. We should also backport to 2.7 (since the patch is currently there as well).

Yes, but I prefer to have the implementation fixed before opening the manual backport.

B. Why do we need cache=None in all the definitions?

Because the old API allows passing a cache that is not the one in the pool (pool._cache). This way, you can still detach the pool and the cache. I prefer not to restrict the internal API when fixing a regression. This also gives a bit more justification for passing the pool: if you don't provide the cache, the one in the pool will be used (although this is a minor detail).
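The fallback described in this reply can be sketched roughly as follows. This is a toy model, not the code in Lib/multiprocessing/pool.py: `ToyPool` and `ToyResult` are hypothetical names, and only the signature `__init__(self, pool, callback, error_callback, cache=None)` and the `pool._cache` attribute are taken from the diff under review.

```python
class ToyPool:
    """Hypothetical stand-in for a pool; only the _cache attribute matters here."""

    def __init__(self):
        self._cache = {}


class ToyResult:
    """Hypothetical stand-in for a result object using the proposed signature."""

    def __init__(self, pool, callback, error_callback, cache=None):
        self._pool = pool
        # Fall back to the pool's own cache unless the caller supplies another one.
        self._cache = pool._cache if cache is None else cache
        self._callback = callback
        self._error_callback = error_callback


pool = ToyPool()

# No cache given: the result shares the pool's cache.
default_result = ToyResult(pool, None, None)

# Explicit cache given: the pool and the cache stay detached.
detached_cache = {}
detached_result = ToyResult(pool, None, None, cache=detached_cache)
```

The `cache is None` check (rather than a truthiness check) matters in this sketch because an empty dict is a valid cache.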

tzickel · 2018-12-03T18:06:33Z

I added C. Why not self._pool = self._cache = None?

pablogsal · 2018-12-03T18:09:11Z

I added C. Why not self._pool = self._cache = None?

Unless I am missing something, the cache does not point to the pool or keep the pool alive. If it did, this issue would not be happening. On the other hand, I suppose this does not hurt, especially when the cache is populated with other jobs.

vstinner · 2018-12-03T22:21:19Z

Lib/multiprocessing/pool.py


-    def __init__(self, cache, callback, error_callback):
+    def __init__(self, pool, callback, error_callback, cache=None):
+        self._pool = pool


You wrote "This PR fixes a regression introduced in PR #8450" which comes from https://bugs.python.org/issue34172. This issue is a reference cycle fix if I understand correctly, but here you create a new strong reference to the pool. I see a risk of creating new reference cycles. No?

Before https://bugs.python.org/issue34172 fix, ApplyResult didn't hold a reference to the pool: you add a new reference.

Maybe it's fine, but I'm just concerned that this PR might reintroduce a bug similar to, but different from, https://bugs.python.org/issue34172.

I don't understand why you need to hold a strong reference to the pool.

We need to hold a reference to the pool because the regression is that, once no references to the pool are left, the pool may be destroyed while one of its iterators is still alive. In that case, iterating over the iterator hangs. The solution links the lifetime of the pool to the lifetime of the iterator (basically, it keeps the pool alive while the iterator is alive). When the iterator is exhausted, its references to the pool are cleared.
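The lifetime link described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyPool`, `ToyIterator`, and `pool_is_alive` are hypothetical names, no worker processes are involved, and results are computed eagerly, unlike the real pool. It only shows the ownership pattern: the iterator holds a strong reference to its pool and drops it on exhaustion.

```python
import gc
import weakref


class ToyPool:
    """Hypothetical stand-in for a pool that hands out iterators over its results."""

    def imap(self, func, iterable):
        # Eager computation keeps the sketch short; the real pool is lazy.
        return ToyIterator(self, [func(item) for item in iterable])


class ToyIterator:
    """Keeps its pool alive until every result has been consumed."""

    def __init__(self, pool, results):
        self._pool = pool  # strong reference: the pool cannot be collected yet
        self._results = iter(results)

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self._results)
        except StopIteration:
            self._pool = None  # exhausted: release the pool
            raise


def pool_is_alive(ref):
    gc.collect()  # do not rely on reference counting alone
    return ref() is not None


pool = ToyPool()
pool_ref = weakref.ref(pool)
iterator = pool.imap(abs, [-1, -2, -3])
del pool  # the caller keeps only the iterator, not the pool

alive_before = pool_is_alive(pool_ref)  # iterator still pending
values = list(iterator)                 # consume everything
alive_after = pool_is_alive(pool_ref)   # reference was dropped on exhaustion
```

In this sketch the pool survives `del pool` only because the iterator still refers to it, and it becomes collectable once the iterator raises `StopIteration`.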

The issue in #8450 is different: it is a reference cycle between the Pool object and the self._worker_handler Thread object, which is more fundamental.

In this case, the cycle needs to exist while the iterator is alive: all objects that depend on the pool need to make sure that the pool stays alive while they are still in use (a non-consumed iterator, etc.); otherwise hangs or inconsistent state can happen. This did not happen before #8450 because the reference cycle between the Pool object and self._worker_handler prevented the pool from being destroyed.

This PR makes sure to break the cycle when the pool objects are not needed anymore.
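To make the cycle argument concrete, here is a toy sketch that assumes CPython's reference counting. `ToyPool`, `ToyResult`, `submit`, `finish`, and `pool_survives_refcounting` are hypothetical names, not the real multiprocessing API. A pending result is stored in the pool's cache and also points back at the pool, so the two form a cycle; finishing the job removes the cache entry and clears the back-reference.

```python
import gc
import weakref


class ToyPool:
    """Hypothetical pool whose cache maps job ids to pending results."""

    def __init__(self):
        self._cache = {}

    def submit(self, job_id):
        result = ToyResult(self, job_id)
        self._cache[job_id] = result  # pool -> cache -> result
        return result


class ToyResult:
    """Hypothetical result that points back at its pool while pending."""

    def __init__(self, pool, job_id):
        self._pool = pool  # result -> pool closes the cycle
        self._job_id = job_id

    def finish(self):
        # Break the cycle once the result is no longer pending.
        del self._pool._cache[self._job_id]
        self._pool = None


def pool_survives_refcounting(finish_first):
    """Return True if the pool outlives all outside references (cyclic GC off)."""
    gc.disable()  # rely on reference counting alone
    try:
        pool = ToyPool()
        pool_ref = weakref.ref(pool)
        result = pool.submit(0)
        if finish_first:
            result.finish()
        del pool, result
        return pool_ref() is not None
    finally:
        gc.enable()
        gc.collect()  # clean up any cycle left behind


survives_while_pending = pool_survives_refcounting(finish_first=False)
survives_after_finish = pool_survives_refcounting(finish_first=True)
```

While the job is pending, only the cyclic garbage collector can reclaim the pair; once the job finishes, plain reference counting is enough.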

A longer explanation is available here:

https://bugs.python.org/msg330996

vstinner · 2018-12-03T23:08:24Z

Let's discuss on https://bugs.python.org/issue34172

pitrou · 2018-12-06T09:21:17Z

Lib/multiprocessing/pool.py

 class ApplyResult(object):

-    def __init__(self, cache, callback, error_callback):
+    def __init__(self, pool, callback, error_callback, cache=None):


Why the additional cache argument?

pablogsal · 2018-12-09T16:40:43Z

I am going to close this PR because we need to re-apply the original patch to master, and the new preferred solution is the weak reference.

pablogsal self-assigned this Dec 2, 2018

the-knights-who-say-ni added the CLA signed label Dec 2, 2018

bedevere-bot added the awaiting merge label Dec 2, 2018

pablogsal force-pushed the bpo35378 branch from 4f33052 to e053b38 Compare December 2, 2018 20:33

pablogsal added the type-bug, tests, and skip news labels Dec 2, 2018

bpo-35378: Link the lifetime of the pool to the pool's iterators and results

4b4d921

pablogsal force-pushed the bpo35378 branch from e053b38 to 4b4d921 Compare December 2, 2018 20:38

pablogsal added needs backport to 3.6 labels Dec 2, 2018

pablogsal requested a review from vstinner December 2, 2018 20:53

tzickel reviewed Dec 3, 2018

tzickel reviewed Dec 3, 2018

pablogsal force-pushed the bpo35378 branch from 98bd021 to fb2f809 Compare December 3, 2018 18:05

pablogsal force-pushed the bpo35378 branch from fb2f809 to 9ca5017 Compare December 3, 2018 18:13

Use a longer (parametrised) timeout for the slower buildbots

171dab0

pablogsal force-pushed the bpo35378 branch from 9ca5017 to 171dab0 Compare December 3, 2018 18:19

vstinner reviewed Dec 3, 2018

pitrou reviewed Dec 6, 2018

pablogsal closed this Dec 9, 2018

pablogsal deleted the bpo35378 branch December 9, 2018 16:40

pablogsal restored the bpo35378 branch January 9, 2019 18:34
