ExecutorFactory is not compatible with fork multiprocessing #2545

@iatkinson

Apache Iceberg version

0.10.0 (latest release)

Please describe the bug 🐞

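When pyiceberg is used in multiprocessing code, the ExecutorFactory can end up in an invalid state after a new process is forked: the child process inherits the singleton executor created in the parent, and work submitted to it in the child never completes. A minimal example that reproduces the hang: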

import multiprocessing
from concurrent.futures import ProcessPoolExecutor

from pyiceberg.utils.concurrent import ExecutorFactory


def use_executor_to_return(value):
    executor = ExecutorFactory.get_or_create()
    future = executor.submit(lambda: value)
    return future.result()

if __name__ == "__main__":

    multiprocessing.set_start_method("fork", force=True)

    main_value = use_executor_to_return(10)

    with ProcessPoolExecutor() as process_executor:
        # Hangs since fork + singleton executor does not work.
        future = process_executor.submit(use_executor_to_return, 20)

    # Never reached.
    assert main_value == 10
    assert future.result() == 20
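
One possible direction for a fix, sketched with the standard library only. This is not pyiceberg's code: `ForkAwareExecutorFactory`, `_reset_after_fork`, and the `max_workers=4` setting are hypothetical names and values chosen for illustration. The idea is that a forked child keeps the parent's executor object but none of its worker threads, so the sketch drops the inherited singleton in the child via `os.register_at_fork` and lets the next call build a fresh pool.

```python
import os
import threading
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Optional


class ForkAwareExecutorFactory:
    """Hypothetical stand-in for a singleton executor factory (not pyiceberg's class)."""

    _instance: Optional[Executor] = None
    _lock = threading.Lock()

    @classmethod
    def get_or_create(cls) -> Executor:
        # Lazily create one shared thread pool per process.
        with cls._lock:
            if cls._instance is None:
                # max_workers=4 is an arbitrary value for this sketch.
                cls._instance = ThreadPoolExecutor(max_workers=4)
            return cls._instance

    @classmethod
    def _reset_after_fork(cls) -> None:
        # Intended to run in the child right after fork(). The inherited pool
        # has no live worker threads there, so forget it. The lock is replaced
        # too, in case another parent thread held it at the moment of the fork.
        cls._lock = threading.Lock()
        cls._instance = None


# os.register_at_fork is only available on platforms that support fork().
if hasattr(os, "register_at_fork"):
    os.register_at_fork(after_in_child=ForkAwareExecutorFactory._reset_after_fork)
```

If the example above used a factory built along these lines in place of `ExecutorFactory`, the forked worker should get its own thread pool on its first `get_or_create()` call instead of waiting on threads that only exist in the parent. Comparing `os.getpid()` against the PID that created the pool would be an alternative that avoids the fork hook.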

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
