OpenBLAS 6 times slower than MKL on DGEMV() #532
Description
Small vector scenario.
26.7 seconds for OpenBLAS in Julia:
blas_set_num_threads(CPU_CORES)
const trans = 'N'
const a = ones((201, 150))
const x = ones(150)
@time for k=1:1000000; s = BLAS.gemv(trans, a, x); end
4.6 seconds for MKL in Python:
import numpy as np
from scipy.linalg.blas import dgemv
from timeit import default_timer as timer
alpha = 1.0
a = np.ones((201, 150), order='F')
x = np.ones(150)
start = timer()
for k in range(1000000):
    s = dgemv(alpha, a, x)
exec_time=(timer() - start)
print()
print("Execution took", str(round(exec_time, 3)), "seconds")
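For reference, with the default (non-transposed) arguments, dgemv(alpha, a, x) computes y = alpha * A * x. A minimal pure-Python sketch of that operation, using the same all-ones 201 x 150 input as above, is shown here. The helper name gemv_reference is hypothetical and is only meant to make the expected result explicit, not to be benchmarked.

```python
def gemv_reference(alpha, a, x):
    """Return alpha * (a @ x) for a matrix stored as a list of rows."""
    # One dot product per matrix row, scaled by alpha.
    return [alpha * sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


rows, cols = 201, 150
a = [[1.0] * cols for _ in range(rows)]  # same shape as the small scenario
x = [1.0] * cols

y = gemv_reference(1.0, a, x)
print(len(y), y[0])
```

Every entry of the result is the sum of 150 ones, so both libraries should return a vector of 201 values equal to 150.0.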
Large vector scenario.
15.7 seconds for OpenBLAS in Julia:
blas_set_num_threads(CPU_CORES)
const trans = 'N'
const a = ones((4, 100000))
const x = ones(100000)
@time for k=1:100000; s = BLAS.gemv(trans, a, x); end
7.9 seconds for MKL in Python:
import numpy as np
from scipy.linalg.blas import dgemv
from timeit import default_timer as timer
alpha = 1.0
a = np.ones((4, 100000), order='F')
x = np.ones(100000)
start = timer()
for k in range(100000):
    s = dgemv(alpha, a, x)
exec_time=(timer() - start)
print()
print("Execution took", str(round(exec_time, 3)), "seconds")
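To put the four timings on a common scale, they can be converted to an effective throughput. The sketch below assumes roughly 2 * rows * cols floating-point operations per GEMV call (one multiply and one add per matrix element) and ignores the alpha scaling; the function name gemv_gflops is hypothetical, and the timings are the ones reported above.

```python
def gemv_gflops(rows, cols, calls, seconds):
    """Effective GFLOP/s, assuming 2 * rows * cols flops per GEMV call."""
    flops = 2 * rows * cols * calls
    return flops / seconds / 1e9


# (rows, cols, number of calls, measured seconds) from the runs above
timings = {
    "small, OpenBLAS": (201, 150, 10**6, 26.7),
    "small, MKL": (201, 150, 10**6, 4.6),
    "large, OpenBLAS": (4, 100000, 10**5, 15.7),
    "large, MKL": (4, 100000, 10**5, 7.9),
}

throughput = {name: gemv_gflops(*args) for name, args in timings.items()}
for name, gflops in throughput.items():
    print("{}: {:.1f} GFLOP/s".format(name, gflops))
```

These figures also include the per-call overhead of each language's wrapper, so they are only a rough comparison of the two libraries.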
The test environment is WinPython-64bit-3.4.3.2FlavorJulia, available at http://sourceforge.net/projects/winpython/files/WinPython_3.4/3.4.3.2/flavors/
The same Python time was measured in 64-bit Anaconda3 v2.1.0.
From versioninfo(true) in Julia:
Julia Version 0.3.7
System: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-4700HQ CPU @ 2.40GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
Using the CPU meter in Task Manager, I observed that OpenBLAS runs single-threaded while MKL uses 4 threads. From this I would predict OpenBLAS to be about 4 times slower than MKL, but in the small vector scenario OpenBLAS is actually about 6 times slower (26.7 s vs. 4.6 s). Maybe an optimization for Haswell would help OpenBLAS match MKL's speed.
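The slowdown factors can be checked directly from the measured times. This is a small sketch of that arithmetic; the variable names are illustrative, and the 4x figure is only the naive expectation from the thread counts seen in Task Manager.

```python
# Measured wall-clock times in seconds, taken from the runs above
small = {"openblas": 26.7, "mkl": 4.6}
large = {"openblas": 15.7, "mkl": 7.9}

# Naive expectation if threading alone explained the gap (1 thread vs. 4)
expected_from_threads = 4.0

small_ratio = small["openblas"] / small["mkl"]
large_ratio = large["openblas"] / large["mkl"]

print("small vector scenario: {:.2f}x slower".format(small_ratio))
print("large vector scenario: {:.2f}x slower".format(large_ratio))
print("small exceeds thread-count estimate:", small_ratio > expected_from_threads)
print("large exceeds thread-count estimate:", large_ratio > expected_from_threads)
```

The small scenario comes out above the 4x estimate and the large scenario below it, so the thread count alone does not seem to account for both results.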
I haven't tested SGEMV(), but it may need to be parallelized too. DGEMV() and SGEMV() are commonly used functions in DSP, and their performance is important for my move from Python to Julia.