Skip to content

Optimize gemv for small M, large N only if it can be done in a threadsafe manner#1865

Merged
martin-frbg merged 2 commits intoOpenMathLib:developfrom
martin-frbg:issue1844
Nov 12, 2018
Merged

Optimize gemv for small M, large N only if it can be done in a threadsafe manner#1865
martin-frbg merged 2 commits intoOpenMathLib:developfrom
martin-frbg:issue1844

Conversation

@martin-frbg
Copy link
Copy Markdown
Collaborator

commit 8e5a108 tried to improve the performance of #532 but introduced a static array in the process, breaking programs like numpy that call into OpenBLAS from several concurrent threads. Supersedes the simple revert from #1852 with the intention to fix #1844 without sacrificing performance where possible.

…safe

otherwise the introduction of a static array in 8e5a108 to improve #532 breaks concurrent calls from multiple threads as seen in #1844
@brada4
Copy link
Copy Markdown
Contributor

brada4 commented Nov 10, 2018

Looks like includes the observations from mentioned issue.

@martin-frbg
Copy link
Copy Markdown
Collaborator Author

Sorry, I did not get the comment ? (Unfortunately I cannot test this on my hexacore system this weekend, but I had tested the general idea outlined in the comments on #1852 earlier)

@brada4
Copy link
Copy Markdown
Contributor

brada4 commented Nov 10, 2018

It looks that it enables gemv optimisation for OMP only.
Does not fail with Android UserLAnd (sort of debian chroot, atom CPU) in OMP or pthread builds, though I cannot tell if it ever failed before in this combination.
EDIT: it fails in pre=patches pthread version and ubuntu numpy, though multiprocessorness of this "debian" is not always guaranteed, you somehow need to spin many bg apps to make cpus wake from park. Does not seem possible from "app" I'd wait for more reliable tests...

@martin-frbg
Copy link
Copy Markdown
Collaborator Author

The intention is that it enables the optimization only when the static buffer can be made thread safe, that is either under OMP or when the compiler is capable of assigning C11-like TLS.

@brada4
Copy link
Copy Markdown
Contributor

brada4 commented Nov 11, 2018

Patches do the trick most likely just that android tablet chroot is not always multiprocessor like a normal computer....

@martin-frbg
Copy link
Copy Markdown
Collaborator Author

Merged after successful local tests

@mattip
Copy link
Copy Markdown
Contributor

mattip commented Nov 12, 2018

Thanks. Since this is a critical bug for NumPy, will it be part of a bugfix release?

@martin-frbg martin-frbg added this to the 0.3.4 milestone Nov 12, 2018
@martin-frbg
Copy link
Copy Markdown
Collaborator Author

Guess 0.3.4 is overdue in any case, will see if I can do it next weekend or at least by the end of this month. Need to try to get a decent handle on the equally ugly #1851 as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OpenBLAS threadsafety issues for downstream libraries (NumPy)

3 participants