
Conversation

@gpshead (Member) commented Nov 2, 2025

Summary

Fixes a deadlock in ProcessPoolExecutor when using max_tasks_per_child. The executor would hang indefinitely after a worker process exited upon reaching its task limit, preventing any remaining queued tasks from being processed.


Note: This (long) description was written by Claude Code (Sonnet 4.5) with minimal editing by me before I opened the Draft PR. I've reviewed the code, which is derived from the existing PR and issue discussions, and am mostly happy with it. There remains the valid question of "why does this semaphore even exist?", but I'd like to keep the potential removal of that separate from fixing this issue for simplicity's sake. -- @gpshead


The Bug

When a worker process exits after completing max_tasks_per_child tasks, the executor's idle-worker accounting becomes incorrect due to a stale semaphore value. The semaphore is incremented whenever a task completes, but once the pool is at max capacity those increments stop being matched by decrements, so the value drifts above the true idle-worker count. When _adjust_process_count() is called to replace the dead worker, it incorrectly believes an idle worker exists (due to the inflated semaphore) and fails to spawn a replacement, causing a deadlock.

Affected versions: Python 3.11, 3.12, 3.13, 3.14, 3.15
Bug introduced in: Python 3.11 (PR #27373 - adding the max_tasks_per_child feature)

Reproduction

from concurrent.futures import ProcessPoolExecutor

if __name__ == '__main__':
    # One worker, recycled after every 2 tasks; ten tasks are queued.
    with ProcessPoolExecutor(1, max_tasks_per_child=2) as exe:
        futs = [exe.submit(print, i) for i in range(10)]

Result: Prints 0 and 1, then hangs forever waiting for task 2.

The Fix

This PR implements a refined version of @tabrezm's proposed solution from the issue/PR discussion. The fix adds a process_died parameter to _adjust_process_count() to distinguish between two calling contexts:

  1. Process death (e.g., from max_tasks_per_child): Semaphore may be wrong → check actual process count first
  2. Normal task submission: Semaphore is accurate → check it first to preserve idle worker reuse optimization

Key insight: The process count (len(self._processes)) is the authoritative source of truth. The semaphore is a derived optimization hint that can become stale. When a process dies, we must trust the authoritative state.
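A minimal sketch of that control flow (names follow the description above and the process.py excerpts quoted later in this thread; this is not the exact patch):

def _adjust_process_count(self, process_died=False):
    # Sketch only -- the real code lives in Lib/concurrent/futures/process.py.
    if self._processes is None:
        return

    if process_died:
        # The semaphore may be stale after a worker death: trust the
        # authoritative process count and always backfill the worker.
        if len(self._processes) < self._max_workers:
            self._spawn_process()
        return

    # Normal submission path: an idle worker will take the task, so
    # no new process is needed (preserves the reuse optimization).
    if self._idle_worker_semaphore.acquire(blocking=False):
        return

    if len(self._processes) < self._max_workers:
        self._spawn_process()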

This approach preserves the idle-worker reuse optimization on the normal submission path while guaranteeing that a dead worker is always replaced.

Comparison to PR #115642

The original PR #115642 attempted to fix semaphore accounting by replacing it with a lock-protected counter. However:

  • More complex (~30 lines changed, new lock primitive)
  • Production reports of new deadlocks with large worker pools (64 workers)
  • Adds lock contention

Our approach is surgical: fix only the specific case where the semaphore is wrong (process death), leave the normal path unchanged.

Testing

Added three comprehensive regression tests:

  • test_max_tasks_per_child_rapid_task_submission_gh115634: Core bug scenario (1 worker, 10 tasks, max_tasks=2)
  • test_max_tasks_per_child_multiple_worker_restarts_gh115634: Multiple restart cycles (15 tasks, max_tasks=3)
  • test_max_tasks_per_child_multiple_workers_gh115634: Multiple workers (2 workers, 12 tasks, max_tasks=2)

All tests include:

  • Proper resource cleanup via try/finally blocks
  • Descriptive assertion messages for debugging
  • Timeout protection with support.SHORT_TIMEOUT
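In sketch form, the core regression test looks roughly like this (illustrative only; the actual tests in test_process_pool.py differ in details):

import unittest
from concurrent.futures import ProcessPoolExecutor
from test import support

class MaxTasksPerChildDeadlockTest(unittest.TestCase):
    def test_rapid_task_submission(self):
        # One worker recycled every 2 tasks; on unpatched CPython the
        # executor deadlocks once the first worker exits.
        executor = ProcessPoolExecutor(1, max_tasks_per_child=2)
        try:
            futures = [executor.submit(int, i) for i in range(10)]
            results = [f.result(timeout=support.SHORT_TIMEOUT)
                       for f in futures]
            self.assertEqual(results, list(range(10)),
                             "queued tasks never ran after a worker "
                             "hit its max_tasks_per_child limit")
        finally:
            # Ensure worker processes are released even on failure.
            executor.shutdown(wait=False, cancel_futures=True)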

Test results:

  • ✅ All 123 ProcessPoolExecutor tests pass
  • ✅ All new regression tests pass using relevant multiprocessing start methods.
  • ✅ Original bug reproduction fixed
  • ✅ No regressions in idle worker reuse tests

Files Changed

  • Lib/concurrent/futures/process.py (+19 lines): The fix
  • Lib/test/test_concurrent_futures/test_process_pool.py (+56 lines): Regression tests
  • Misc/NEWS.d/next/Library/2025-11-02-05-28-56.gh-issue-115634.JbcNnF.rst: NEWS entry

Related Issues

  • gh-115634: ProcessPoolExecutor hangs when 1 < max_tasks_per_child < num_submitted // max_workers
  • PR #115642: an earlier fix attempt that replaced the semaphore with a lock-protected counter


gpshead and others added 2 commits November 2, 2025 06:57

Fix deadlock in ProcessPoolExecutor when using max_tasks_per_child.
When a worker process exited after completing max_tasks_per_child tasks,
the executor could incorrectly believe an idle worker existed due to
stale semaphore values, preventing a replacement worker from being
spawned and causing the executor to hang indefinitely.

The fix adds a process_died parameter to _adjust_process_count() to
distinguish between process death (where semaphore might be wrong) and
normal task submission (where semaphore is accurate). When a process
dies, we check the actual process count before consulting the semaphore,
ensuring a replacement worker is always spawned when needed.

This preserves the existing idle worker reuse optimization while fixing
the deadlock scenario reported in pythongh-115634.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Make the pythongh-115634 regression tests more robust:

1. Add try/finally blocks to ensure executor.shutdown() is always
   called, preventing resource leaks on test failures or timeouts

2. Replace per-item assertion loops with single list comparisons
   for clearer failure messages - when a test fails, the assertion
   will show exactly which results differ

3. Add descriptive assertion messages that explain the issue context
   without requiring reading the test code

These improvements ensure tests clean up properly even when run
against unpatched CPython where the deadlock would occur, and make
assertion failures more actionable for debugging.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@gpshead (Member, Author) commented Nov 2, 2025

TL;DR

The _idle_worker_semaphore in ProcessPoolExecutor has fundamentally broken accounting once the pool reaches max_workers - it tracks idle workers via increment (task completion) and decrement (acquire in _adjust_process_count()), but at capacity the decrement stops happening while increments continue, causing the value to drift upward and become meaningless. This broken accounting is the root cause of gh-115634: when a worker dies (from max_tasks_per_child), the inflated semaphore makes the executor think an idle worker exists when it doesn't, preventing a replacement from spawning and causing a deadlock. Our fix works around this by prioritizing len(self._processes) (authoritative) over the semaphore (broken) when handling process death, but doesn't fix the underlying problem. The best long-term solution is likely removing the semaphore entirely - the "optimization" it provides is negligible (we already check process count), and it's a recurring source of bugs.

This analysis was prepared, drafted, and posted by Claude Code (Sonnet 4.5) at the direct request of @gpshead.


Analysis: The _idle_worker_semaphore Problem

Executive Summary

The _idle_worker_semaphore in ProcessPoolExecutor has fundamentally broken accounting once the pool reaches max_workers. Our fix (gh-115634) works around this by prioritizing the authoritative process count over the semaphore when handling process death, but does not fix the underlying problem.

Key Finding: @gaogaotiantian (PR #115642 author) identified the core issue (source):

"_idle_worker_semaphore is meaningless as long as we reach the maximum number of workers because the math would be wrong since then."

What the Semaphore Was Designed For

Original Purpose (bpo-39207, PR #19453)

The semaphore was introduced in Python 3.9 to enable on-demand worker spawning - an optimization to avoid spawning all workers upfront.

Intended behavior:

  • Start with 0 workers
  • Spawn workers as tasks are submitted (up to max_workers)
  • Track idle workers to avoid spawning when one is available

The optimization goal: If you submit 1 task to a pool with max_workers=100, spawn only 1 worker, not all 100.

How It's Supposed to Work (Ideally)

Semaphore value = number of idle workers

1. Worker finishes task → release() → semaphore++
2. New task submitted → acquire(blocking=False)
   - If succeeds: idle worker available, use it (semaphore--)
   - If fails: no idle workers, spawn new one (if under max)
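A condensed sketch of that intended protocol (not the actual process.py code; release happens in the result-handling path, the non-blocking acquire in _adjust_process_count()):

import threading

idle_workers = threading.Semaphore(0)

def on_task_completed():
    idle_workers.release()          # one more worker is now idle

def on_task_submitted(processes, max_workers, spawn_process):
    if idle_workers.acquire(blocking=False):
        return                      # an idle worker will take the task
    if len(processes) < max_workers:
        spawn_process()             # no idle worker: grow the pool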

Why the Accounting Breaks

The Fatal Flaw

Problem: acquire(blocking=False) in _adjust_process_count() fails when semaphore value is 0, even though we're calling it to potentially spawn a new worker, not to consume an idle one.

What actually happens once at max_workers:

Initial state: 4 workers (max_workers=4), all busy, semaphore=0

Task 1 completes → release() → semaphore=1
Task 2 completes → release() → semaphore=2
Task 3 completes → release() → semaphore=3
Task 4 completes → release() → semaphore=4  ← All workers now idle

New task submitted:
  → _adjust_process_count() called
  → acquire(blocking=False) succeeds! (semaphore=3)
  → Returns without spawning
  → But we DON'T need to spawn - we have workers! ✓

More tasks submitted (workers processing):
  → _adjust_process_count() called repeatedly
  → acquire(blocking=False) keeps succeeding (semaphore > 0)
  → Never spawn (correct, we're at max)
  → But semaphore value becomes divorced from reality!

The accounting divergence:

  • Semaphore gets incremented every time a task completes
  • Semaphore gets decremented only when acquire() succeeds in _adjust_process_count()
  • Once at max_workers, _adjust_process_count() is called but doesn't mean a worker is becoming busy
  • Result: Semaphore value drifts upward, becoming meaningless

Why a Counter Would Work (Comparison)

Same scenario with a signed integer counter instead:

Initial state: 4 workers (max_workers=4), all busy, counter=0

Submit task while at capacity:
  → _adjust_process_count() called
  → counter-- → counter=-1  ← Can go negative!
  → Meaning: "1 task wants an idle worker, but none available"

Task completes:
  → counter++ → counter=0  ← Back to balanced

More tasks submitted while busy:
  → counter becomes -1, -2, -3... (tracks the "queue pressure")

Tasks complete:
  → counter increments back: -3 → -2 → -1 → 0
  → Always accurate: negative = tasks waiting for idle workers
                     positive = idle workers available
                     zero = balanced

Key difference: Counter tracks the net state (idle workers minus waiting tasks), which can be negative. Semaphore can only track positive values, so accounting breaks when the true state would be negative.
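A self-contained sketch of that counter idea (illustrative only; this is not the code from PR #115642):

import threading

class IdleWorkerCounter:
    """Signed idle-worker accounting: value > 0 means idle workers,
    value < 0 means tasks waiting ("idle-worker debt"), 0 is balanced."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def worker_became_idle(self):
        # Called when a task completes and its worker goes idle.
        with self._lock:
            self._value += 1

    def task_submitted(self):
        # Called on submit; returns True if an idle worker absorbs the
        # task.  Unlike a semaphore, the value may go negative,
        # recording the "debt" so later completions rebalance it.
        with self._lock:
            had_idle = self._value > 0
            self._value -= 1
            return had_idle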

The Concrete Bug in gh-115634

State: 1 worker (max=1), max_tasks_per_child=2

Worker processes task 0 → release() → semaphore=1
Worker processes task 1 → exits (reached limit)

Process death handler:
  → _adjust_process_count() called
  → acquire(blocking=False) → succeeds! (semaphore was 1)
  → Thinks: "idle worker exists, don't spawn"
  → DEADLOCK: 0 workers, but code thinks there's 1 idle

Current Usage of the Semaphore

Where It's Used (3 locations)

  1. Line 713: Initialization

    self._idle_worker_semaphore = threading.Semaphore(0)
  2. Line 365: Task completion (release)

    if executor := self.executor_reference():
        if process_exited:
            executor._adjust_process_count(process_died=True)
        else:
            executor._idle_worker_semaphore.release()  # ← Increment
  3. Line 777: Checking for idle workers (acquire)

    # Normal path in _adjust_process_count()
    if self._idle_worker_semaphore.acquire(blocking=False):
        return  # ← Decrement if succeeds

When It's Consulted

Only in one place: _adjust_process_count() when called from submit() (and only for non-fork start methods, due to the _safe_to_dynamically_spawn_children check).

Not consulted:

  • When process dies and needs replacement (our fix changes this)
  • In fork mode (disabled entirely)
  • Anywhere else in the codebase

What the Semaphore Actually Achieves

Under max_workers: ✓ Works as intended

  • Accurately tracks idle workers
  • Prevents unnecessary spawning
  • Enables on-demand worker creation

At max_workers: ✗ Fundamentally broken

Maintainer Discussion Insights

@pitrou's Analysis (CPython core dev)

(source):

"I think we should think more generally about what _idle_worker_semaphore is supposed to mean... if the process has exited, we should simply not acquire the semaphore as the process was not accounted as idle."

Key insight: The semantic confusion between "acquiring semaphore for spawn check" vs "worker becoming busy".

(source):

"_idle_worker_semaphore is not really used for synchronization since we never block on it. We merely use it as some kind of atomic integer."

@gaogaotiantian's Analysis (PR #115642 author)

(source):

"The math behind _idle_worker_semaphore is simply fragile and it only works when we have not reached max worker number."

(same source):

"_idle_worker_semaphore is meaningless as long as we reach the maximum number of workers because the math would be wrong... all the finished tasks will release the semaphore so all of them will increase the counter and the counter will just be something huge."

This is the key insight - the semaphore accounting is fundamentally incompatible with being at max capacity.

Proposed Solutions

1. PR #115642's Approach: Lock + Counter

Replace semaphore with _idle_worker_lock + _idle_worker_number (int).

Pros:

  • Counter can go negative (semaphore cannot)
    • Why negative values matter: They represent "idle worker debt" - the number of tasks that want workers beyond what's available
    • Example: 4 workers at max, all busy (counter=0). Submit 3 more tasks → counter becomes -3 ("need 3 more idle workers but at capacity")
    • When tasks complete: counter increments back toward 0, maintaining correct accounting
    • Semaphores can't go below 0, so they can't track this "debt", causing accounting to diverge
  • Math can be made correct
  • Fixes the accounting problem properly

Cons:

  • ~30 lines changed
  • Introduces new lock (contention concerns)
  • Production reports of new deadlocks with 64 workers (per @tabrezm, source)
  • More complex

2. @tabrezm's Approach (our fix): Prioritize Process Count

Check len(self._processes) before semaphore when process dies (source).

Pros:

  • ~20 lines changed
  • No new primitives
  • Simple and surgical
  • Production-tested by @tabrezm
  • Preserves existing optimization

Cons:

  • Doesn't fix the underlying accounting bug
  • Semaphore still meaningless at max_workers
  • "Papering over" the real problem

3. Remove the Semaphore Entirely

Always check len(self._processes) < max_workers to decide spawning.

Pros:

  • Simplest possible solution
  • No accounting bugs
  • Most obviously correct

Cons:

  • Removes the idle worker reuse optimization
  • Performance regression: Every submit checks process count
  • Would need to prove optimization wasn't valuable

4. Fix the Semaphore Accounting (Complex)

Make the semaphore actually track idle workers correctly.

Requires:

  • Decrement when worker becomes busy (how to detect this?)
  • Increment when worker becomes idle (already doing this)
  • Handle the "at max_workers" case differently
  • Complex state machine changes

Consensus: Probably not worth it given the alternatives.

What Actually Benefits from the Semaphore?

The Idle Worker Reuse Optimization

Scenario: Pool has 4 workers, all idle, 1 task submitted.

Without semaphore:

if len(self._processes) < self._max_workers:
    self._spawn_process()  # Wouldn't spawn (4 < 4 is False) ✓

Wait... the optimization already exists via the process-count check!

The Real Benefit: Avoiding Lock Contention?

Looking at the code, the semaphore avoids needing to:

  1. Lock self._processes
  2. Check len(self._processes)
  3. Unlock

Instead, it's a lock-free check via acquire(blocking=False).

But: We already access self._processes in the same function (line 780), so we're not avoiding that anyway!

Recommendation

Short-term: Keep our fix (prioritize process count)

  • Solves the immediate deadlock
  • Minimal risk
  • Production-tested
  • Can be backported safely

Medium-term: Remove the semaphore entirely

Rationale:

  1. The "optimization" it provides is negligible - we already check len(self._processes)
  2. The accounting is fundamentally broken at max_workers
  3. It's a source of bugs (gh-115634, "ProcessPoolExecutor hangs when 1 < max_tasks_per_child < num_submitted // max_workers", potentially others)
  4. The code would be simpler and more obviously correct without it
  5. Performance impact: minimal (one extra dict length check per submit)

Proposed removal:

def _adjust_process_count(self):
    if self._processes is None:
        return

    # Simple and correct: just check if we need more workers
    if len(self._processes) < self._max_workers:
        self._spawn_process()

Remove:

  • self._idle_worker_semaphore initialization (line 713)
  • executor._idle_worker_semaphore.release() (line 365)
  • The semaphore check (line 777)

Result: ~15 lines deleted, simpler code, no accounting bugs.

Long-term: If optimization is really needed

Profile to prove the semaphore provides measurable benefit, then implement it correctly with:

  • Lock-protected counter (can go negative)
  • Clear semantics about when it's incremented/decremented
  • Comprehensive testing

But current evidence suggests the optimization is not worth the complexity.

Conclusion

The _idle_worker_semaphore was introduced with good intentions (on-demand spawning optimization) but suffers from a fundamental design flaw: its accounting breaks once the pool reaches max_workers, which is precisely when it would be most useful.

Our fix (gh-115634) works around this by not trusting the semaphore when replacing dead workers, but doesn't address the root cause. The semaphore remains broken at max capacity - we just avoid the most severe consequences.

The best long-term solution is likely to remove the semaphore entirely, as:

  1. The optimization it provides is minimal (if any)
  2. The accounting is unfixable without major surgery
  3. Simpler code is more maintainable
  4. It's a recurring source of bugs


