
OpenBLAS 6 times slower than MKL on DGEMV() #532

@hiccup7


Small vector scenario.
26.7 seconds for OpenBLAS in Julia:

blas_set_num_threads(CPU_CORES)
const trans = 'N'
const a = ones((201, 150))
const x = ones(150)
@time for k=1:1000000; s = BLAS.gemv(trans, a, x); end

4.6 seconds for MKL in Python:

import numpy as np
from scipy.linalg.blas import dgemv
from timeit import default_timer as timer
alpha = 1.0
a = np.ones((201, 150), order='F')
x = np.ones(150)
start = timer()
for k in range(1000000):
    s = dgemv(alpha, a, x)
exec_time = timer() - start
print()  # blank line before the result; a bare `print` does nothing in Python 3
print("Execution took", str(round(exec_time, 3)), "seconds")

Large vector scenario.
15.7 seconds for OpenBLAS in Julia:

blas_set_num_threads(CPU_CORES)
const trans = 'N'
const a = ones((4, 100000))
const x = ones(100000)
@time for k=1:100000; s = BLAS.gemv(trans, a, x); end

7.9 seconds for MKL in Python:

import numpy as np
from scipy.linalg.blas import dgemv
from timeit import default_timer as timer
alpha = 1.0
a = np.ones((4, 100000), order='F')
x = np.ones(100000)
start = timer()
for k in range(100000):
    s = dgemv(alpha, a, x)
exec_time = timer() - start
print()  # blank line before the result; a bare `print` does nothing in Python 3
print("Execution took", str(round(exec_time, 3)), "seconds")

Tested environment is WinPython-64bit-3.4.3.2FlavorJulia at http://sourceforge.net/projects/winpython/files/WinPython_3.4/3.4.3.2/flavors/
The same Python time was measured in 64-bit Anaconda3 v2.1.0.

From versioninfo(true) in Julia:

Julia Version 0.3.7
System: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-4700HQ CPU @ 2.40GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)

Using the CPU meter in Task Manager, I observed that OpenBLAS runs single-threaded while MKL uses 4 threads. From this I would expect OpenBLAS to be about 4 times slower than MKL, but in the small vector scenario OpenBLAS is actually about 6 times slower (26.7 seconds vs. 4.6 seconds). An optimization for Haswell might help OpenBLAS match MKL's speed.
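
To make the comparison concrete, here is a rough sketch in Python that turns the timings reported above into slowdown ratios and approximate throughput. It re-measures nothing: the numbers are copied from this report, the helper names `slowdown` and `gflops` are made up for illustration, and the flop count assumes the usual estimate of about 2*m*n floating-point operations per DGEMV call.

```python
# Timings copied from this report (seconds); nothing here is re-measured.
# Each entry: rows m, columns n, number of calls, OpenBLAS time, MKL time.
SCENARIOS = {
    "small vector (201x150)": (201, 150, 10**6, 26.7, 4.6),
    "large vector (4x100000)": (4, 100000, 10**5, 15.7, 7.9),
}


def slowdown(openblas_s, mkl_s):
    """Return how many times slower the OpenBLAS run was than the MKL run."""
    return openblas_s / mkl_s


def gflops(m, n, calls, seconds):
    """Approximate throughput, assuming about 2*m*n flops per DGEMV call."""
    return 2.0 * m * n * calls / seconds / 1e9


for name, (m, n, calls, openblas_s, mkl_s) in SCENARIOS.items():
    print(name)
    print("  slowdown (OpenBLAS / MKL):", round(slowdown(openblas_s, mkl_s), 2))
    print("  OpenBLAS GFLOP/s:", round(gflops(m, n, calls, openblas_s), 2))
    print("  MKL GFLOP/s:", round(gflops(m, n, calls, mkl_s), 2))
```

The ratios come out at roughly 5.8 for the small vector scenario and roughly 2.0 for the large one. Both also include whatever per-call overhead the Julia and SciPy wrappers add, so they are only an approximate comparison of the two BLAS libraries.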

I haven't tested SGEMV(), but it may need to be parallelized too. DGEMV() and SGEMV() are commonly used functions in DSP, and good performance in both is important for me to be able to move from Python to Julia.
