Skip to content

matmul operator @ can freeze / hang when used with default python multiprocessing using fork context instead of spawn #15973

@bicici

Description

@bicici

The freeze / hang can happen with large matrices and in parallel settings. For instance, sklearn/neural_network/_multilayer_perceptron.py is using safe_sparse_dot, which calls the matmul operator @ for ret = a @ b. Affected package:
sklearn/neural_network/_multilayer_perceptron.py
from ..utils.extmath import safe_sparse_dot
ret = a @ b

python also freeze with expressions like exp(3e400) for float('inf') with -Ofast in test_buffer.py and such freeze may be related to those types of operators together with -Ofast in cpython. Therefore, compiling with fewer optimization flags might also overcome the issue and prevent @ to freeze the program. The freeze occurs when matrices are about larger than 5000 x 100.

MKL inteloneapi 2021.1-beta05 is freezing. openblas is not freezing.

Test program:
``
import concurrent.futures
from numpy import random, matmul

def mmtest(X, i):
print ('matmul @ call', i)
y_hat = X @ X.T
print ('done', i)
return y_hat

def mmtest_matmul(X, i):
print ('matmul func call', i)
y_hat = matmul(X, X.T)
print ('done', i)
return y_hat

def f_mpmm(X):
executor = concurrent.futures.ProcessPoolExecutor(7)
futures = []
futures.append(executor.submit(mmtest, X, 0))
futures.append(executor.submit(mmtest, X, 1))
futures.append(executor.submit(mmtest, X, 2))
futures.append(executor.submit(mmtest, X, 3))
futures.append(executor.submit(mmtest_matmul, X, 4))
futures.append(executor.submit(mmtest_matmul, X, 5))
futures.append(executor.submit(mmtest_matmul, X, 6))
concurrent.futures.wait(futures)
executor.shutdown()

def f_mm(X):
mmtest(X, 0)
mmtest(X, 1)
mmtest(X, 2)
mmtest(X, 3)
mmtest_matmul(X, 4)
mmtest_matmul(X, 5)
mmtest_matmul(X, 6)

def test():
X = random.randn(5000, 100); y = random.randn(5000)
print ('testing serial')
f_mm(X)
print ('testing multiprocessing')
f_mpmm(X)

if name == 'main':
test()
``

Test output with numpy built with Intel MKL:
testing serial
matmul @ call 0
done 0
matmul @ call 1
done 1
matmul @ call 2
done 2
matmul @ call 3
done 3
matmul func call 4
done 4
matmul func call 5
done 5
matmul func call 6
done 6
testing multiprocessing
matmul @ call 0
matmul @ call 1
matmul @ call 2
matmul @ call 3
matmul func call 4
matmul func call 5
matmul func call 6

[frozen]

Test output with numpy built with openblas:
testing serial
matmul @ call 0
done 0
matmul @ call 1
done 1
matmul @ call 2
done 2
matmul @ call 3
done 3
matmul func call 4
done 4
matmul func call 5
done 5
matmul func call 6
done 6
testing multiprocessing
matmul @ call 0
matmul @ call 1
matmul @ call 2
matmul @ call 3
matmul func call 4
matmul func call 5
matmul func call 6
done 0
done 1
done 2
done 3
done 6
done 4
done 5

Related files:
sklearn/neural_network/_multilayer_perceptron.py
sklearn/utils/extmath.py

Related issues:
"parallel processes freezing when matrices are too big"
joblib/joblib#138
"matmul operator freeze within safe_sparse_dot and bug fix"
scikit-learn/scikit-learn#16919

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions