[FEATURE] Enable dynamic linking with MKL and compiler-based OpenMP #20474
leezu merged 16 commits into apache:master from
Hey @akarbown, thanks for submitting the PR.
CI supported jobs: [edge, sanity, miscellaneous, centos-cpu, windows-cpu, unix-cpu, unix-gpu, clang, website, centos-gpu, windows-gpu] Note:
src/initialize.cc
```cpp
#if defined(__INTEL_LLVM_COMPILER)
  mkl_set_threading_layer(MKL_THREADING_INTEL);
#else
  mkl_set_threading_layer(MKL_THREADING_GNU);
#endif
```
Does this work on Windows? The Intel developer reference says the GNU layer is "for GNU threading on Linux* operating system only": https://software.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/support-functions/single-dynamic-library-control/mkl-set-threading-layer.html
Good point! I forgot about Windows! I'll exclude it in a moment.
Force-pushed from 56c1404 to b3c66c6.
@mxnet-bot run ci[sanity]
Jenkins CI successfully triggered : [sanity]
@mxnet-bot run ci [all]
@mxnet-bot run ci[all]
@leezu, could you help me rerun the 'sanity' check in this PR? I've checked it locally and don't see any issues. I suspect the sanity check failed because of a timeout (SIGTERM). Is that possible?
@josephevans, can we extend the maximum time for sanity? This PR triggers a rebuild of the Docker image used for sanity, which apparently causes the timeout.
@mxnet-bot run ci [unix-cpu]
@mxnet-bot run ci[website]
Jenkins CI successfully triggered : [website]
@mxnet-bot run ci[clang, miscellaneous, unix-cpu, unix-gpu]
Jenkins CI successfully triggered : [unix-gpu, clang, miscellaneous, unix-cpu]
@mxnet-bot run ci[unix-cpu]
Jenkins CI successfully triggered : [unix-cpu]
@mxnet-bot run ci[unix-gpu, miscellaneous]
Jenkins CI successfully triggered : [unix-gpu, miscellaneous]
@mxnet-bot run ci[miscellaneous]
Jenkins CI successfully triggered : [miscellaneous]
@mxnet-bot run ci[miscellaneous]
Jenkins CI successfully triggered : [miscellaneous]
@mxnet-bot run ci[miscellaneous]
Jenkins CI successfully triggered : [miscellaneous]
@mxnet-bot run ci[unix-cpu, unix-gpu]
Jenkins CI successfully triggered : [unix-cpu, unix-gpu]
@mxnet-bot run ci[unix-cpu]
Jenkins CI successfully triggered : [unix-cpu]
Yes, we need to keep this new GitHub Action, as it covers the MKL-related build and tests on macOS.
I have created pull request #20647 to try to fix it.
leezu left a comment:
Great, thank you! Two small questions:
tools/staticbuild/build_lib.sh
```bash
  ..
ninja
if [[ ! $PLATFORM == 'darwin' ]] && [[ $BLAS == 'mkl' ]]; then
  patchelf --set-rpath "/opt/intel/oneapi/mkl/${INTEL_MKL}/lib/intel64/:\$ORIGIN" --force-rpath libmxnet.so
fi
```
Will oneapi/mkl always be in /opt/intel? Is there a reason for not asking users to fix their run-time search path environment variables instead (which ideally would automatically be set correctly upon installation of oneapi/mkl)?
> Will oneapi/mkl always be in /opt/intel?
I think that unless the user explicitly defines another location, the default location for oneMKL is /opt/intel/oneapi/, at least when it is installed the way the mkl.sh file does it.
> Is there a reason for not asking users to fix their run-time search path environment variables instead (which ideally would automatically be set correctly upon installation of oneapi/mkl)?
I added it for the tests, so that at run time libmxnet.dylib can find the MKL libraries. It's not the best solution. Now I think it might be better to source the /opt/intel/oneapi/setvars.sh script just before the test execution (in the *.yml file). What do you think? Or did you have something else in mind?
I assume source /opt/intel/oneapi/setvars.sh is also what users would be expected to do if they install MKL on Mac? If so, I think that will be more robust than hardcoding the rpath.
Yes, sure. However, I've just realized that when linking with the MKL static libraries there is no need to source /opt/intel/oneapi/setvars.sh. It is only required when linking with the MKL dynamic libraries. Thanks for pointing that out!
cmake/upstream/FindBLAS.cmake
```cmake
# MXNET NOTE: This change comes from the newest file version
# https://gitlab.kitware.com/cmake/cmake/-/issues/22295
```
Do you mean you backported this change to our file? Or is our file now up-to-date with upstream again? If it's the latter, you don't need to add the note here. If it's the former, why not update to the latest upstream instead of backporting?
Good point. I can remove that comment, as this change is in the upstream version.
@mxnet-bot run ci [windows-gpu]
Jenkins CI successfully triggered : [windows-gpu]
@mxnet-bot run ci [windows-gpu]
Jenkins CI successfully triggered : [windows-gpu]
oneMKL 2021.3 fixed OpenMP linking when using SDL with MKL_THREADING_LAYER set to GNU.
This is a temporary change to check whether adding MKL runtime support crashes macOS.
Turn off SDL for MKL on macOS, as it needs fixes.
Add proper mkl_threading flags for macOS. Enable all macOS tests plus the MKL tests. Rebuild NumPy with MKL BLAS (instead of OpenBLAS).
@mxnet-bot run ci[unix-cpu, unix-gpu, centos-gpu]
Jenkins CI successfully triggered : [unix-cpu, unix-gpu, centos-gpu]
@mxnet-bot run ci[centos-gpu]
Jenkins CI successfully triggered : [centos-gpu]
Thank you @akarbown!
Description
oneMKL 2021.3 fixes the issue described here. This makes it possible to link with the MKL dynamic libraries without loading multiple OpenMP runtimes into a single process. It works by linking MXNet with the oneMKL Single Dynamic Library (SDL) and then setting the appropriate threading layer at run time with the function mkl_set_threading_layer() (or through the environment variable MKL_THREADING_LAYER).
Related to: #19610, #18255 and #17794.
Changes
Comments
Should the documentation mention oneMKL 2021.3 as the recommended version?