
Conversation

@gpshead (Member) commented Nov 2, 2025

Summary

Fixes a deadlock in ProcessPoolExecutor when using max_tasks_per_child. The executor would hang indefinitely after a worker process exited upon reaching its task limit, preventing any remaining queued tasks from being processed.


Note: This (long) description was written by Claude Code (Sonnet 4.5) with minimal editing by me before I opened the Draft PR. I've reviewed the code, which is derived from the existing PR and issue discussions, and am mostly happy with it. There remains the valid question of "why does this semaphore even exist?", but I'd like to keep the potential removal of that separate from fixing this issue for simplicity's sake. -- @gpshead


The Bug

When a worker process exits after completing max_tasks_per_child tasks, the executor's idle-worker accounting becomes incorrect due to a stale semaphore value. The semaphore is incremented whenever a task completes, but once the pool is at max capacity those increments stop being matched by decrements, so the value drifts above the true idle-worker count. When _adjust_process_count() is called to replace the dead worker, it incorrectly believes an idle worker exists (due to the inflated semaphore) and fails to spawn a replacement, causing a deadlock.

Affected versions: Python 3.11, 3.12, 3.13, 3.14, 3.15
Bug introduced in: Python 3.11 (PR #27373 - adding the max_tasks_per_child feature)

Reproduction

from concurrent.futures import ProcessPoolExecutor

if __name__ == '__main__':
    # One worker, recycled after every 2 tasks; ten tasks are queued.
    with ProcessPoolExecutor(1, max_tasks_per_child=2) as exe:
        futs = [exe.submit(print, i) for i in range(10)]

Result: Prints 0 and 1, then hangs forever waiting for task 2.

The Fix

This PR implements a refined version of @tabrezm's proposed solution from the issue/PR discussion. The fix adds a process_died parameter to _adjust_process_count() to distinguish between two calling contexts:

  1. Process death (e.g., from max_tasks_per_child): Semaphore may be wrong → check actual process count first
  2. Normal task submission: Semaphore is accurate → check it first to preserve idle worker reuse optimization

Key insight: The process count (len(self._processes)) is the authoritative source of truth. The semaphore is a derived optimization hint that can become stale. When a process dies, we must trust the authoritative state.
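A minimal sketch of that control flow (names follow the description above and the process.py excerpts quoted later in this thread; this is not the exact patch):

def _adjust_process_count(self, process_died=False):
    # Sketch only -- the real code lives in Lib/concurrent/futures/process.py.
    if self._processes is None:
        return

    if process_died:
        # The semaphore may be stale after a worker death: trust the
        # authoritative process count and always backfill the worker.
        if len(self._processes) < self._max_workers:
            self._spawn_process()
        return

    # Normal submission path: an idle worker will take the task, so
    # no new process is needed (preserves the reuse optimization).
    if self._idle_worker_semaphore.acquire(blocking=False):
        return

    if len(self._processes) < self._max_workers:
        self._spawn_process()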

This approach preserves the idle-worker reuse optimization on the normal submission path while guaranteeing that a dead worker is always replaced.

Comparison to PR #115642

The original PR #115642 attempted to fix semaphore accounting by replacing it with a lock-protected counter. However:

  • More complex (~30 lines changed, new lock primitive)
  • Production reports of new deadlocks with large worker pools (64 workers)
  • Adds lock contention

Our approach is surgical: fix only the specific case where the semaphore is wrong (process death), leave the normal path unchanged.

Testing

Added three comprehensive regression tests:

  • test_max_tasks_per_child_rapid_task_submission_gh115634: Core bug scenario (1 worker, 10 tasks, max_tasks=2)
  • test_max_tasks_per_child_multiple_worker_restarts_gh115634: Multiple restart cycles (15 tasks, max_tasks=3)
  • test_max_tasks_per_child_multiple_workers_gh115634: Multiple workers (2 workers, 12 tasks, max_tasks=2)

All tests include:

  • Proper resource cleanup via try/finally blocks
  • Descriptive assertion messages for debugging
  • Timeout protection with support.SHORT_TIMEOUT
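In sketch form, the core regression test looks roughly like this (illustrative only; the actual tests in test_process_pool.py differ in details):

import unittest
from concurrent.futures import ProcessPoolExecutor
from test import support

class MaxTasksPerChildDeadlockTest(unittest.TestCase):
    def test_rapid_task_submission(self):
        # One worker recycled every 2 tasks; on unpatched CPython the
        # executor deadlocks once the first worker exits.
        executor = ProcessPoolExecutor(1, max_tasks_per_child=2)
        try:
            futures = [executor.submit(int, i) for i in range(10)]
            results = [f.result(timeout=support.SHORT_TIMEOUT)
                       for f in futures]
            self.assertEqual(results, list(range(10)),
                             "queued tasks never ran after a worker "
                             "hit its max_tasks_per_child limit")
        finally:
            # Ensure worker processes are released even on failure.
            executor.shutdown(wait=False, cancel_futures=True)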

Test results:

  • ✅ All 123 ProcessPoolExecutor tests pass
  • ✅ All new regression tests pass using relevant multiprocessing start methods.
  • ✅ Original bug reproduction fixed
  • ✅ No regressions in idle worker reuse tests

Files Changed

  • Lib/concurrent/futures/process.py (+19 lines): The fix
  • Lib/test/test_concurrent_futures/test_process_pool.py (+56 lines): Regression tests
  • Misc/NEWS.d/next/Library/2025-11-02-05-28-56.gh-issue-115634.JbcNnF.rst: NEWS entry

Related Issues

  • gh-115634: ProcessPoolExecutor hangs when 1 < max_tasks_per_child < num_submitted // max_workers
  • PR #115642: an earlier fix attempt that replaced the semaphore with a lock-protected counter


gpshead and others added 2 commits November 2, 2025 06:57

Fix deadlock in ProcessPoolExecutor when using max_tasks_per_child.
When a worker process exited after completing max_tasks_per_child tasks,
the executor could incorrectly believe an idle worker existed due to
stale semaphore values, preventing a replacement worker from being
spawned and causing the executor to hang indefinitely.

The fix adds a process_died parameter to _adjust_process_count() to
distinguish between process death (where semaphore might be wrong) and
normal task submission (where semaphore is accurate). When a process
dies, we check the actual process count before consulting the semaphore,
ensuring a replacement worker is always spawned when needed.

This preserves the existing idle worker reuse optimization while fixing
the deadlock scenario reported in pythongh-115634.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Make the pythongh-115634 regression tests more robust:

1. Add try/finally blocks to ensure executor.shutdown() is always
   called, preventing resource leaks on test failures or timeouts

2. Replace per-item assertion loops with single list comparisons
   for clearer failure messages - when a test fails, the assertion
   will show exactly which results differ

3. Add descriptive assertion messages that explain the issue context
   without requiring reading the test code

These improvements ensure tests clean up properly even when run
against unpatched CPython where the deadlock would occur, and make
assertion failures more actionable for debugging.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@gpshead (Member, Author) commented Nov 2, 2025

TL;DR

The _idle_worker_semaphore in ProcessPoolExecutor has fundamentally broken accounting once the pool reaches max_workers - it tracks idle workers via increment (task completion) and decrement (acquire in _adjust_process_count()), but at capacity the decrement stops happening while increments continue, causing the value to drift upward and become meaningless. This broken accounting is the root cause of gh-115634: when a worker dies (from max_tasks_per_child), the inflated semaphore makes the executor think an idle worker exists when it doesn't, preventing a replacement from spawning and causing a deadlock. Our fix works around this by prioritizing len(self._processes) (authoritative) over the semaphore (broken) when handling process death, but doesn't fix the underlying problem. The best long-term solution is likely removing the semaphore entirely - the "optimization" it provides is negligible (we already check process count), and it's a recurring source of bugs.

This analysis was prepared, drafted, and posted by Claude Code (Sonnet 4.5) at the direct request of @gpshead.


Analysis: The _idle_worker_semaphore Problem

Executive Summary

The _idle_worker_semaphore in ProcessPoolExecutor has fundamentally broken accounting once the pool reaches max_workers. Our fix (gh-115634) works around this by prioritizing the authoritative process count over the semaphore when handling process death, but does not fix the underlying problem.

Key Finding: @gaogaotiantian (PR #115642 author) identified the core issue (source):

"_idle_worker_semaphore is meaningless as long as we reach the maximum number of workers because the math would be wrong since then."

What the Semaphore Was Designed For

Original Purpose (bpo-39207, PR #19453)

The semaphore was introduced in Python 3.9 to enable on-demand worker spawning - an optimization to avoid spawning all workers upfront.

Intended behavior:

  • Start with 0 workers
  • Spawn workers as tasks are submitted (up to max_workers)
  • Track idle workers to avoid spawning when one is available

The optimization goal: If you submit 1 task to a pool with max_workers=100, spawn only 1 worker, not all 100.

How It's Supposed to Work (Ideally)

Semaphore value = number of idle workers

1. Worker finishes task → release() → semaphore++
2. New task submitted → acquire(blocking=False)
   - If succeeds: idle worker available, use it (semaphore--)
   - If fails: no idle workers, spawn new one (if under max)
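A condensed sketch of that intended protocol (not the actual process.py code; release happens in the result-handling path, the non-blocking acquire in _adjust_process_count()):

import threading

idle_workers = threading.Semaphore(0)

def on_task_completed():
    idle_workers.release()          # one more worker is now idle

def on_task_submitted(processes, max_workers, spawn_process):
    if idle_workers.acquire(blocking=False):
        return                      # an idle worker will take the task
    if len(processes) < max_workers:
        spawn_process()             # no idle worker: grow the pool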

Why the Accounting Breaks

The Fatal Flaw

Problem: acquire(blocking=False) in _adjust_process_count() fails when semaphore value is 0, even though we're calling it to potentially spawn a new worker, not to consume an idle one.

What actually happens once at max_workers:

Initial state: 4 workers (max_workers=4), all busy, semaphore=0

Task 1 completes → release() → semaphore=1
Task 2 completes → release() → semaphore=2
Task 3 completes → release() → semaphore=3
Task 4 completes → release() → semaphore=4  ← All workers now idle

New task submitted:
  → _adjust_process_count() called
  → acquire(blocking=False) succeeds! (semaphore=3)
  → Returns without spawning
  → But we DON'T need to spawn - we have workers! ✓

More tasks submitted (workers processing):
  → _adjust_process_count() called repeatedly
  → acquire(blocking=False) keeps succeeding (semaphore > 0)
  → Never spawn (correct, we're at max)
  → But semaphore value becomes divorced from reality!

The accounting divergence:

  • Semaphore gets incremented every time a task completes
  • Semaphore gets decremented only when acquire() succeeds in _adjust_process_count()
  • Once at max_workers, _adjust_process_count() is called but doesn't mean a worker is becoming busy
  • Result: Semaphore value drifts upward, becoming meaningless

Why a Counter Would Work (Comparison)

Same scenario with a signed integer counter instead:

Initial state: 4 workers (max_workers=4), all busy, counter=0

Submit task while at capacity:
  → _adjust_process_count() called
  → counter-- → counter=-1  ← Can go negative!
  → Meaning: "1 task wants an idle worker, but none available"

Task completes:
  → counter++ → counter=0  ← Back to balanced

More tasks submitted while busy:
  → counter becomes -1, -2, -3... (tracks the "queue pressure")

Tasks complete:
  → counter increments back: -3 → -2 → -1 → 0
  → Always accurate: negative = tasks waiting for idle workers
                     positive = idle workers available
                     zero = balanced

Key difference: Counter tracks the net state (idle workers minus waiting tasks), which can be negative. Semaphore can only track positive values, so accounting breaks when the true state would be negative.
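A self-contained sketch of that counter idea (illustrative only; this is not the code from PR #115642):

import threading

class IdleWorkerCounter:
    """Signed idle-worker accounting: value > 0 means idle workers,
    value < 0 means tasks waiting ("idle-worker debt"), 0 is balanced."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def worker_became_idle(self):
        # Called when a task completes and its worker goes idle.
        with self._lock:
            self._value += 1

    def task_submitted(self):
        # Called on submit; returns True if an idle worker absorbs the
        # task.  Unlike a semaphore, the value may go negative,
        # recording the "debt" so later completions rebalance it.
        with self._lock:
            had_idle = self._value > 0
            self._value -= 1
            return had_idle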

The Concrete Bug in gh-115634

State: 1 worker (max=1), max_tasks_per_child=2

Worker processes task 0 → release() → semaphore=1
Worker processes task 1 → exits (reached limit)

Process death handler:
  → _adjust_process_count() called
  → acquire(blocking=False) → succeeds! (semaphore was 1)
  → Thinks: "idle worker exists, don't spawn"
  → DEADLOCK: 0 workers, but code thinks there's 1 idle

Current Usage of the Semaphore

Where It's Used (3 locations)

  1. Line 713: Initialization

    self._idle_worker_semaphore = threading.Semaphore(0)
  2. Line 365: Task completion (release)

    if executor := self.executor_reference():
        if process_exited:
            executor._adjust_process_count(process_died=True)
        else:
            executor._idle_worker_semaphore.release()  # ← Increment
  3. Line 777: Checking for idle workers (acquire)

    # Normal path in _adjust_process_count()
    if self._idle_worker_semaphore.acquire(blocking=False):
        return  # ← Decrement if succeeds

When It's Consulted

Only in one place: _adjust_process_count() when called from submit() (and only for non-fork start methods, due to the _safe_to_dynamically_spawn_children check).

Not consulted:

  • When process dies and needs replacement (our fix changes this)
  • In fork mode (disabled entirely)
  • Anywhere else in the codebase

What the Semaphore Actually Achieves

Under max_workers: ✓ Works as intended

  • Accurately tracks idle workers
  • Prevents unnecessary spawning
  • Enables on-demand worker creation

At max_workers: ✗ Fundamentally broken

Maintainer Discussion Insights

@pitrou's Analysis (CPython core dev)

(source):

"I think we should think more generally about what _idle_worker_semaphore is supposed to mean... if the process has exited, we should simply not acquire the semaphore as the process was not accounted as idle."

Key insight: The semantic confusion between "acquiring semaphore for spawn check" vs "worker becoming busy".

(source):

"_idle_worker_semaphore is not really used for synchronization since we never block on it. We merely use it as some kind of atomic integer."

@gaogaotiantian's Analysis (PR #115642 author)

(source):

"The math behind _idle_worker_semaphore is simply fragile and it only works when we have not reached max worker number."

(same source):

"_idle_worker_semaphore is meaningless as long as we reach the maximum number of workers because the math would be wrong... all the finished tasks will release the semaphore so all of them will increase the counter and the counter will just be something huge."

This is the key insight - the semaphore accounting is fundamentally incompatible with being at max capacity.

Proposed Solutions

1. PR #115642's Approach: Lock + Counter

Replace semaphore with _idle_worker_lock + _idle_worker_number (int).

Pros:

  • Counter can go negative (semaphore cannot)
    • Why negative values matter: They represent "idle worker debt" - the number of tasks that want workers beyond what's available
    • Example: 4 workers at max, all busy (counter=0). Submit 3 more tasks → counter becomes -3 ("need 3 more idle workers but at capacity")
    • When tasks complete: counter increments back toward 0, maintaining correct accounting
    • Semaphores can't go below 0, so they can't track this "debt", causing accounting to diverge
  • Math can be made correct
  • Fixes the accounting problem properly

Cons:

  • ~30 lines changed
  • Introduces new lock (contention concerns)
  • Production reports of new deadlocks with 64 workers (per @tabrezm, source)
  • More complex

2. @tabrezm's Approach (our fix): Prioritize Process Count

Check len(self._processes) before semaphore when process dies (source).

Pros:

  • ~20 lines changed
  • No new primitives
  • Simple and surgical
  • Production-tested by @tabrezm
  • Preserves existing optimization

Cons:

  • Doesn't fix the underlying accounting bug
  • Semaphore still meaningless at max_workers
  • "Papering over" the real problem

3. Remove the Semaphore Entirely

Always check len(self._processes) < max_workers to decide spawning.

Pros:

  • Simplest possible solution
  • No accounting bugs
  • Most obviously correct

Cons:

  • Removes the idle worker reuse optimization
  • Performance regression: Every submit checks process count
  • Would need to prove optimization wasn't valuable

4. Fix the Semaphore Accounting (Complex)

Make the semaphore actually track idle workers correctly.

Requires:

  • Decrement when worker becomes busy (how to detect this?)
  • Increment when worker becomes idle (already doing this)
  • Handle the "at max_workers" case differently
  • Complex state machine changes

Consensus: Probably not worth it given the alternatives.

What Actually Benefits from the Semaphore?

The Idle Worker Reuse Optimization

Scenario: Pool has 4 workers, all idle, 1 task submitted.

Without semaphore:

if len(self._processes) < self._max_workers:
    self._spawn_process()  # Wouldn't spawn (4 < 4 is False) ✓

Wait... the optimization already exists via the process-count check!

The Real Benefit: Avoiding Lock Contention?

Looking at the code, the semaphore avoids needing to:

  1. Lock self._processes
  2. Check len(self._processes)
  3. Unlock

Instead, it's a lock-free check via acquire(blocking=False).

But: We already access self._processes in the same function (line 780), so we're not avoiding that anyway!

Recommendation

Short-term: Keep our fix (prioritize process count)

  • Solves the immediate deadlock
  • Minimal risk
  • Production-tested
  • Can be backported safely

Medium-term: Remove the semaphore entirely

Rationale:

  1. The "optimization" it provides is negligible - we already check len(self._processes)
  2. The accounting is fundamentally broken at max_workers
  3. It's a source of bugs (gh-115634, "ProcessPoolExecutor hangs when 1 < max_tasks_per_child < num_submitted // max_workers", potentially others)
  4. The code would be simpler and more obviously correct without it
  5. Performance impact: minimal (one extra dict length check per submit)

Proposed removal:

def _adjust_process_count(self):
    if self._processes is None:
        return

    # Simple and correct: just check if we need more workers
    if len(self._processes) < self._max_workers:
        self._spawn_process()

Remove:

  • self._idle_worker_semaphore initialization (line 713)
  • executor._idle_worker_semaphore.release() (line 365)
  • The semaphore check (line 777)

Result: ~15 lines deleted, simpler code, no accounting bugs.

Long-term: If optimization is really needed

Profile to prove the semaphore provides measurable benefit, then implement it correctly with:

  • Lock-protected counter (can go negative)
  • Clear semantics about when it's incremented/decremented
  • Comprehensive testing

But current evidence suggests the optimization is not worth the complexity.

Conclusion

The _idle_worker_semaphore was introduced with good intentions (on-demand spawning optimization) but suffers from a fundamental design flaw: its accounting breaks once the pool reaches max_workers, which is precisely when it would be most useful.

Our fix (gh-115634) works around this by not trusting the semaphore when replacing dead workers, but doesn't address the root cause. The semaphore remains broken at max capacity - we just avoid the most severe consequences.

The best long-term solution is likely to remove the semaphore entirely, as:

  1. The optimization it provides is minimal (if any)
  2. The accounting is unfixable without major surgery
  3. Simpler code is more maintainable
  4. It's a recurring source of bugs


