Description
The freeze/hang can happen with large matrices in parallel settings. For instance, `sklearn/neural_network/_multilayer_perceptron.py` uses `safe_sparse_dot`, which calls the matmul operator `@` (`ret = a @ b`). Affected code:
- `sklearn/neural_network/_multilayer_perceptron.py`: `from ..utils.extmath import safe_sparse_dot`
- `sklearn/utils/extmath.py` (inside `safe_sparse_dot`): `ret = a @ b`
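For NumPy arrays, `a @ b` is implemented by `numpy.matmul`, so the two helper functions in the test program (`mmtest` and `mmtest_matmul`) exercise the same operation, and for 2-D inputs `numpy.dot` gives the same result. A quick check on a small matrix (the 50 x 10 shape is an arbitrary stand-in for the reported 5000 x 100):

```python
import numpy as np

# Small stand-in for the 5000 x 100 matrix used in the report
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))

via_operator = X @ X.T          # the form used by `ret = a @ b`
via_matmul = np.matmul(X, X.T)  # the form used by mmtest_matmul
via_dot = np.dot(X, X.T)        # classic 2-D matrix product

print(via_operator.shape)  # (50, 50)
print(np.allclose(via_operator, via_matmul), np.allclose(via_operator, via_dot))  # True True
```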
Python can also freeze on expressions such as `exp(3e400)` that produce `float('inf')` when CPython is built with `-Ofast` (seen in `test_buffer.py`), so this freeze may be related to such operations combined with `-Ofast`. Compiling with fewer optimization flags might therefore also work around the issue and keep `@` from freezing the program.

The freeze occurs when the matrices are larger than about 5000 x 100.
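For scale, at the reported threshold each worker's `X @ X.T` result is a 5000 x 5000 float64 array, and `ProcessPoolExecutor` pickles every return value back to the parent process. The sizes involved, as plain arithmetic (no large arrays are allocated):

```python
import numpy as np

n_rows, n_cols = 5000, 100                # threshold shape from the report
n_workers = 7                             # ProcessPoolExecutor(7) in the test program
itemsize = np.dtype(np.float64).itemsize  # random.randn returns float64: 8 bytes

input_bytes = n_rows * n_cols * itemsize   # X, sent to each worker
result_bytes = n_rows * n_rows * itemsize  # X @ X.T, sent back from each worker

print(input_bytes / 1e6, "MB per input")                         # 4.0 MB per input
print(result_bytes / 1e6, "MB per result")                       # 200.0 MB per result
print(n_workers * result_bytes / 1e9, "GB of results in total")  # 1.4 GB of results in total
```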
NumPy built against Intel oneAPI MKL 2021.1-beta05 freezes; NumPy built against OpenBLAS does not.
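To confirm which BLAS a given NumPy installation uses before reproducing, `numpy.show_config()` prints the build configuration. The optional third-party package `threadpoolctl` can additionally report the BLAS library actually loaded at runtime and its thread count. A small sketch; `blas_runtime_info` is a hypothetical helper name, not part of NumPy or scikit-learn:

```python
import numpy as np

try:
    # threadpoolctl is a separate package (pip install threadpoolctl); optional here
    from threadpoolctl import threadpool_info
except ImportError:
    threadpool_info = None


def blas_runtime_info():
    """Return (library, version, num_threads) for each BLAS loaded in this process.

    Returns an empty list when threadpoolctl is not installed.
    """
    if threadpool_info is None:
        return []
    return [
        (pool.get("internal_api"), pool.get("version"), pool.get("num_threads"))
        for pool in threadpool_info()
        if pool.get("user_api") == "blas"
    ]


print("numpy", np.__version__)
np.show_config()            # build-time configuration, including the BLAS section
print(blas_runtime_info())  # runtime view: "mkl" vs "openblas" in the first field
```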
Test program:
```python
import concurrent.futures

from numpy import random, matmul


def mmtest(X, i):
    # Matrix product via the @ operator (the path safe_sparse_dot uses)
    print('matmul @ call', i)
    y_hat = X @ X.T
    print('done', i)
    return y_hat


def mmtest_matmul(X, i):
    # Same product via an explicit numpy.matmul call
    print('matmul func call', i)
    y_hat = matmul(X, X.T)
    print('done', i)
    return y_hat


def f_mpmm(X):
    # Run the seven products in a pool of 7 worker processes
    executor = concurrent.futures.ProcessPoolExecutor(7)
    futures = []
    futures.append(executor.submit(mmtest, X, 0))
    futures.append(executor.submit(mmtest, X, 1))
    futures.append(executor.submit(mmtest, X, 2))
    futures.append(executor.submit(mmtest, X, 3))
    futures.append(executor.submit(mmtest_matmul, X, 4))
    futures.append(executor.submit(mmtest_matmul, X, 5))
    futures.append(executor.submit(mmtest_matmul, X, 6))
    # With the MKL build the workers never finish, so this wait never returns
    concurrent.futures.wait(futures)
    executor.shutdown()


def f_mm(X):
    # Run the same seven products serially in the current process
    mmtest(X, 0)
    mmtest(X, 1)
    mmtest(X, 2)
    mmtest(X, 3)
    mmtest_matmul(X, 4)
    mmtest_matmul(X, 5)
    mmtest_matmul(X, 6)


def test():
    X = random.randn(5000, 100)  # size at which the freeze appears
    y = random.randn(5000)       # unused in this test
    print('testing serial')
    f_mm(X)
    print('testing multiprocessing')
    f_mpmm(X)


# Guard keeps worker processes from re-running test() under the spawn start method
if __name__ == '__main__':
    test()
```
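If, as seems plausible but is not confirmed here, the hang comes from forking worker processes after MKL's threading runtime was already started by the serial run in the parent, one way to sidestep it is to avoid `fork` entirely. The sketch below is a hypothetical variant (`product` and `f_mtmm` are not part of the original program) that runs the same seven products in a `ThreadPoolExecutor`; NumPy releases the GIL during the matrix product, so the threads can still run in parallel. Another option, not shown, is `ProcessPoolExecutor(7, mp_context=multiprocessing.get_context("spawn"))`, which starts each worker in a fresh interpreter.

```python
import concurrent.futures

import numpy as np


def product(X, i, use_operator=True):
    # Same work as mmtest / mmtest_matmul, returned together with the call index
    y_hat = X @ X.T if use_operator else np.matmul(X, X.T)
    return i, y_hat


def f_mtmm(X, n_workers=7):
    """Hypothetical thread-based counterpart of f_mpmm: no fork, one shared process."""
    with concurrent.futures.ThreadPoolExecutor(n_workers) as executor:
        futures = [
            executor.submit(product, X, i, use_operator=(i < 4))  # 0-3: @, 4-6: matmul
            for i in range(7)
        ]
        # .result() re-raises any exception from a worker instead of hiding it
        return dict(f.result() for f in concurrent.futures.as_completed(futures))


X = np.random.randn(500, 100)  # smaller than the reported 5000 x 100 to keep memory low
results = f_mtmm(X)
print(sorted(results))   # [0, 1, 2, 3, 4, 5, 6]
print(results[0].shape)  # (500, 500)
```

Unlike `f_mpmm`, the results come back without being pickled, since all threads share the parent's memory.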
Test output with numpy built with Intel MKL:
testing serial
matmul @ call 0
done 0
matmul @ call 1
done 1
matmul @ call 2
done 2
matmul @ call 3
done 3
matmul func call 4
done 4
matmul func call 5
done 5
matmul func call 6
done 6
testing multiprocessing
matmul @ call 0
matmul @ call 1
matmul @ call 2
matmul @ call 3
matmul func call 4
matmul func call 5
matmul func call 6
[frozen]
Test output with numpy built with openblas:
testing serial
matmul @ call 0
done 0
matmul @ call 1
done 1
matmul @ call 2
done 2
matmul @ call 3
done 3
matmul func call 4
done 4
matmul func call 5
done 5
matmul func call 6
done 6
testing multiprocessing
matmul @ call 0
matmul @ call 1
matmul @ call 2
matmul @ call 3
matmul func call 4
matmul func call 5
matmul func call 6
done 0
done 1
done 2
done 3
done 6
done 4
done 5
Related files:
- `sklearn/neural_network/_multilayer_perceptron.py`
- `sklearn/utils/extmath.py`

Related issues:
- "parallel processes freezing when matrices are too big" (joblib/joblib#138)
- "matmul operator freeze within safe_sparse_dot and bug fix" (scikit-learn/scikit-learn#16919)