Skip to content
This repository was archived by the owner on Nov 17, 2023. It is now read-only.

[FEATURE] Enable dynamic linking with MKL and compiler based OpenMP#20474

Merged
leezu merged 16 commits intoapache:masterfrom
akarbown:compiler-based-openmp2
Oct 13, 2021
Merged

[FEATURE] Enable dynamic linking with MKL and compiler based OpenMP#20474
leezu merged 16 commits intoapache:masterfrom
akarbown:compiler-based-openmp2

Conversation

@akarbown
Copy link
Copy Markdown
Contributor

OneMKL 2021.3 fixed linking OpenMP while using SDL and MKL_THREADING_LAYER set to GNU.

Description

OneMKL 2021.3 fixes the issue described here. Thus, it enables linking with MKL dynamic libraries without having multiple OneMPs in a single process. It is possible due to linking MxNET with oneMKL Single Dynamic Library (SDL) and then setting the appropriate threading layer at run time in a function mkl_threading_layer() (or through environment variable MKL_THREADING_LAYER).

Connected with: [#19610], [#18255] and [#17794].

Changes

  1. Add oneMKL 2021.3 to ubuntu docker images.
  2. Enable MKL SDL (MKL_USE_SINGLE_DYNAMIC_LIBRARY) as the default linking when MKL version is grower than 2021.2 and static linking is turned off. (Bug no: MKLD-11109, OneMKL release notes) .
  3. Otherwise, MKL static libraries are taken into account and used to build MxNET library.
  4. Add support of the new oneMKL file structure in the FindBLAS.cmake file (fix comes from the cmake 3.20: #6210 ).

Comments

Does using oneMKL 2021.3 as the recommended one should be mentioned in the documentation?

@mxnet-bot
Copy link
Copy Markdown

Hey @akarbown , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [edge, sanity, miscellaneous, centos-cpu, windows-cpu, unix-cpu, unix-gpu, clang, website, centos-gpu, windows-gpu]


Note:
Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@mseth10 mseth10 added the pr-work-in-progress PR is still work in progress label Jul 30, 2021
@akarbown akarbown changed the title [FEATURE] Enables dynamic linking with MKL and compiler based OpenMP [FEATURE] Enable dynamic linking with MKL and compiler based OpenMP Jul 30, 2021
#if defined( __INTEL_LLVM_COMPILER)
mkl_set_threading_layer(MKL_THREADING_INTEL);
#else
mkl_set_threading_layer(MKL_THREADING_GNU);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work with Windows? Intel developer's reference states "for GNU threading on Linux* operating system only" https://software.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/support-functions/single-dynamic-library-control/mkl-set-threading-layer.html

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I forgot about Windows! I'll exclude it in a moment.

@akarbown akarbown force-pushed the compiler-based-openmp2 branch from 56c1404 to b3c66c6 Compare July 30, 2021 19:56
@akarbown
Copy link
Copy Markdown
Contributor Author

akarbown commented Aug 2, 2021

@mxnet-bot run ci[sanity]

@mxnet-bot
Copy link
Copy Markdown

Jenkins CI successfully triggered : [sanity]

@akarbown
Copy link
Copy Markdown
Contributor Author

akarbown commented Aug 2, 2021

@mxnet-bot run ci [all]

@akarbown
Copy link
Copy Markdown
Contributor Author

akarbown commented Aug 2, 2021

@mxnet-bot run ci[all]

@akarbown
Copy link
Copy Markdown
Contributor Author

akarbown commented Aug 2, 2021

@leezu, could you help me with rerunning 'sanity' check in this PR? I've checked it locally and I don't see any issues. I suppose that this sanity check failed because of the timeout (SIGTERM). Is it possible?

@leezu
Copy link
Copy Markdown
Contributor

leezu commented Aug 2, 2021

@josephevans can we extend the max time for sanity? This PR triggers rebuild of the Docker used for Sanity, and apparently thus timeouts

@akarbown
Copy link
Copy Markdown
Contributor Author

akarbown commented Aug 3, 2021

@mxnet-bot run ci [unix-cpu]

@akarbown
Copy link
Copy Markdown
Contributor Author

akarbown commented Aug 5, 2021

@mxnet-bot run ci[website]

@mxnet-bot
Copy link
Copy Markdown

Jenkins CI successfully triggered : [website]

@akarbown
Copy link
Copy Markdown
Contributor Author

akarbown commented Aug 5, 2021

@mxnet-bot run ci[clang, miscellaneous, unix-cpu, unix-gpu]

@mxnet-bot
Copy link
Copy Markdown

Jenkins CI successfully triggered : [unix-gpu, clang, miscellaneous, unix-cpu]

@akarbown
Copy link
Copy Markdown
Contributor Author

akarbown commented Aug 6, 2021

@mxnet-bot run ci[unix-cpu]

@mxnet-bot
Copy link
Copy Markdown

Jenkins CI successfully triggered : [unix-cpu]

@akarbown
Copy link
Copy Markdown
Contributor Author

akarbown commented Aug 6, 2021

@mxnet-bot run ci[unix-gpu, miscellaneous]

@mxnet-bot
Copy link
Copy Markdown

Jenkins CI successfully triggered : [unix-gpu, miscellaneous]

@akarbown
Copy link
Copy Markdown
Contributor Author

akarbown commented Aug 6, 2021

@mxnet-bot run ci[miscellaneous]

@mxnet-bot
Copy link
Copy Markdown

Jenkins CI successfully triggered : [miscellaneous]

@akarbown
Copy link
Copy Markdown
Contributor Author

akarbown commented Aug 6, 2021

@mxnet-bot run ci[miscellaneous]

@mxnet-bot
Copy link
Copy Markdown

Jenkins CI successfully triggered : [miscellaneous]

@akarbown
Copy link
Copy Markdown
Contributor Author

akarbown commented Aug 9, 2021

@mxnet-bot run ci[miscellaneous]

@mxnet-bot
Copy link
Copy Markdown

Jenkins CI successfully triggered : [miscellaneous]

@akarbown
Copy link
Copy Markdown
Contributor Author

akarbown commented Aug 9, 2021

@mxnet-bot run ci[unix-cpu, unix-gpu]

@mxnet-bot
Copy link
Copy Markdown

Jenkins CI successfully triggered : [unix-cpu, unix-gpu]

@akarbown
Copy link
Copy Markdown
Contributor Author

akarbown commented Aug 9, 2021

@mxnet-bot run ci[unix-cpu]

@mxnet-bot
Copy link
Copy Markdown

Jenkins CI successfully triggered : [unix-cpu]

@barry-jin
Copy link
Copy Markdown
Contributor

barry-jin commented Oct 7, 2021

Now it seems the change seems to be tested and checked for MacOS and with MKL BLAS.
Do you think that leaving that new github action for MKL on MacOS make sense? If so, can it look as it is or change it somehow?

Yes, we need to keep this new github action as it will cover MKL related build and tests on MacOS.

Remark: I see that windows-gpu fails, but it's rather not connected with that change but maybe with the VS 2019 version 16.11 Release? As I see that for v16.8.1 (MSVC 19.28.29333.0) it passed without any issues, while for v16.11.4 (MSVC 19.29.30136.0) it fails. But I'm not 100% sure.

I have created a pull request #20647 to try to fix it.

Copy link
Copy Markdown
Contributor

@leezu leezu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thank you! Two small questions

..
ninja
if [[ ! $PLATFORM == 'darwin' ]] && [[ $BLAS == 'mkl' ]]; then
patchelf --set-rpath "/opt/intel/oneapi/mkl/${INTEL_MKL}/lib/intel64/:\$ORIGIN" --force-rpath libmxnet.so
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will oneapi/mkl always be in /opt/intel? Is there a reason for not asking users to fix their run-time search path environment variables instead (which ideally would automatically be set correctly upon installation of oneapi/mkl)?

Copy link
Copy Markdown
Contributor Author

@akarbown akarbown Oct 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

> Will oneapi/mkl always be in /opt/intel?
I think that if the user do not define explicitly other location, the default location for oneMKL supposed to be /opt/intel/oneapi/ at least the way it was installed in the way as it is in the mkl.sh file.
> Is there a reason for not asking users to fix their run-time search path environment variables instead (which ideally would automatically be set correctly upon installation of oneapi/mkl)?
I've added it for the sake of the tests so that while running them (in the runtime) libmxnet.dylib could see the MKL libraries. It's not the best solution. Now, I think that maybe it would be better to source /opt/intel/oneapi/setvars.sh script just before the test execution (in the *.yml file). What do you think? Or did you have something else in mind?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume source /opt/intel/oneapi/setvars.sh is also what users would be expected to do if they install MKL on Mac? If so, I think that'll be more robust than hardcoding the rpath

Copy link
Copy Markdown
Contributor Author

@akarbown akarbown Oct 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sure. However, I've just realized that when linking with MKL static libraries there is no need to source /opt/intel/oneapi/setvars.sh. It's mandatory in case of linking with MKL dynamic libraries. Thanks for pointing that out!

Comment on lines +528 to +529
# MXNET NOTE: This change comes form the newest file version
# https://gitlab.kitware.com/cmake/cmake/-/issues/22295
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean you backported this change to our file? Or is our file now up-to-date with upstream again? If it's the latter, you don't need to add the note here. If it's the former, why not update to the latest upstream instead of backporting?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I can remove that comment as this is in the upstream version.

@barry-jin
Copy link
Copy Markdown
Contributor

@mxnet-bot run ci [windows-gpu]

@mxnet-bot
Copy link
Copy Markdown

Jenkins CI successfully triggered : [windows-gpu]

@szha
Copy link
Copy Markdown
Member

szha commented Oct 10, 2021

@mxnet-bot run ci [windows-gpu]

@mxnet-bot
Copy link
Copy Markdown

Jenkins CI successfully triggered : [windows-gpu]

@akarbown
Copy link
Copy Markdown
Contributor Author

@mxnet-bot run ci[unix-cpu, unix-gpu, centos-gpu]

@mxnet-bot
Copy link
Copy Markdown

Jenkins CI successfully triggered : [unix-cpu, unix-gpu, centos-gpu]

@akarbown
Copy link
Copy Markdown
Contributor Author

@mxnet-bot run ci[centos-gpu]

@mxnet-bot
Copy link
Copy Markdown

Jenkins CI successfully triggered : [centos-gpu]

Copy link
Copy Markdown
Contributor

@mozga-intel mozga-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks!

@leezu
Copy link
Copy Markdown
Contributor

leezu commented Oct 13, 2021

Thank you @akarbown!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

pr-awaiting-review PR is waiting for code review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants