[FEATURE] Enable dynamic linking with MKL and compiler based OpenMP #20474

akarbown · 2021-07-30T11:16:49Z

OneMKL 2021.3 fixed linking OpenMP while using SDL and MKL_THREADING_LAYER set to GNU.

Description

OneMKL 2021.3 fixes the issue described here. Thus, it enables linking with MKL dynamic libraries without having multiple OpenMP runtimes in a single process. This is possible by linking MXNet with the oneMKL Single Dynamic Library (SDL) and then setting the appropriate threading layer at run time through the function mkl_set_threading_layer() (or through the environment variable MKL_THREADING_LAYER).

Connected with: [#19610], [#18255] and [#17794].

Changes

Add oneMKL 2021.3 to ubuntu docker images.
Enable MKL SDL (MKL_USE_SINGLE_DYNAMIC_LIBRARY) as the default linking mode when the MKL version is greater than 2021.2 and static linking is turned off (bug no. MKLD-11109, OneMKL release notes).
Otherwise, MKL static libraries are used to build the MXNet library.
Add support for the new oneMKL file structure in the FindBLAS.cmake file (the fix comes from CMake 3.20: #6210).
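
The selection rule in the list above can be sketched in shell as follows. This only illustrates the stated condition and is not the CMake logic from the PR: the function names and the ON/OFF values for the static-linking flag are assumptions, and `sort -V` is assumed to be available.

```shell
# Hypothetical helper: succeeds when version $1 is strictly greater than $2.
version_gt() {
    [ "$1" = "$2" ] && return 1
    # sort -V orders version strings; if $2 sorts first, $1 is the greater one.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Mirrors the rule stated above: SDL only when MKL > 2021.2 and
# static linking is off; otherwise fall back to the static libraries.
choose_mkl_linking() {
    mkl_version="$1"   # e.g. 2021.3
    static_link="$2"   # ON or OFF (assumed flag values)
    if [ "$static_link" = "OFF" ] && version_gt "$mkl_version" "2021.2"; then
        echo "sdl"
    else
        echo "static"
    fi
}
```

For example, `choose_mkl_linking 2021.3 OFF` selects SDL, while the same call with 2021.2 or with static linking ON selects the static libraries.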

Comments

Should the documentation mention oneMKL 2021.3 as the recommended version?

mxnet-bot · 2021-07-30T11:16:54Z

Hey @akarbown , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

To trigger all jobs: @mxnet-bot run ci [all]
To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [edge, sanity, miscellaneous, centos-cpu, windows-cpu, unix-cpu, unix-gpu, clang, website, centos-gpu, windows-gpu]

Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

leezu · 2021-07-30T17:19:29Z

src/initialize.cc

+  #if defined( __INTEL_LLVM_COMPILER)
+    mkl_set_threading_layer(MKL_THREADING_INTEL);
+  #else
+    mkl_set_threading_layer(MKL_THREADING_GNU);


Does this work with Windows? Intel developer's reference states "for GNU threading on Linux* operating system only" https://software.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/support-functions/single-dynamic-library-control/mkl-set-threading-layer.html

Good point! I forgot about Windows! I'll exclude it in a moment.
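
For the environment-variable route mentioned in the description, a minimal shell sketch that respects the Linux-only restriction on GNU threading might look like this. The helper name `mkl_threading_layer_for` is hypothetical and not part of the PR; `MKL_THREADING_LAYER` is the variable named in the description, and leaving it unset on other systems is a choice made for this sketch only.

```shell
# Hypothetical helper: choose a value for MKL_THREADING_LAYER from the OS name.
# GNU threading is documented for Linux only, so other systems get no override.
mkl_threading_layer_for() {
    case "$1" in
        Linux) echo "GNU" ;;
        *)     echo "" ;;
    esac
}

layer="$(mkl_threading_layer_for "$(uname -s)")"
if [ -n "$layer" ]; then
    # Export before starting the process that loads the MKL single dynamic library.
    export MKL_THREADING_LAYER="$layer"
fi
```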

akarbown · 2021-08-02T13:23:57Z

@mxnet-bot run ci[sanity]

mxnet-bot · 2021-08-02T13:24:04Z

Jenkins CI successfully triggered : [sanity]

akarbown · 2021-08-02T14:02:41Z

@mxnet-bot run ci [all]

akarbown · 2021-08-02T14:44:38Z

@mxnet-bot run ci[all]

akarbown · 2021-08-02T15:05:13Z

@leezu, could you help me with rerunning 'sanity' check in this PR? I've checked it locally and I don't see any issues. I suppose that this sanity check failed because of the timeout (SIGTERM). Is it possible?

leezu · 2021-08-02T20:54:58Z

@josephevans can we extend the max time for sanity? This PR triggers a rebuild of the Docker image used for sanity, which apparently causes the timeout.

akarbown · 2021-08-03T09:16:42Z

@mxnet-bot run ci [unix-cpu]

akarbown · 2021-08-05T20:22:42Z

@mxnet-bot run ci[website]

mxnet-bot · 2021-08-05T20:22:49Z

Jenkins CI successfully triggered : [website]

akarbown · 2021-08-05T21:07:34Z

@mxnet-bot run ci[clang, miscellaneous, unix-cpu, unix-gpu]

mxnet-bot · 2021-08-05T21:07:41Z

Jenkins CI successfully triggered : [unix-gpu, clang, miscellaneous, unix-cpu]

akarbown · 2021-08-06T06:10:37Z

@mxnet-bot run ci[unix-cpu]

mxnet-bot · 2021-08-06T06:10:43Z

Jenkins CI successfully triggered : [unix-cpu]

akarbown · 2021-08-06T08:20:15Z

@mxnet-bot run ci[unix-gpu, miscellaneous]

mxnet-bot · 2021-08-06T08:20:21Z

Jenkins CI successfully triggered : [unix-gpu, miscellaneous]

akarbown · 2021-08-06T11:07:43Z

@mxnet-bot run ci[miscellaneous]

mxnet-bot · 2021-08-06T11:07:48Z

Jenkins CI successfully triggered : [miscellaneous]

akarbown · 2021-08-06T13:41:47Z

@mxnet-bot run ci[miscellaneous]

mxnet-bot · 2021-08-06T13:41:51Z

Jenkins CI successfully triggered : [miscellaneous]

akarbown · 2021-08-09T09:01:14Z

@mxnet-bot run ci[miscellaneous]

mxnet-bot · 2021-08-09T09:01:21Z

Jenkins CI successfully triggered : [miscellaneous]

akarbown · 2021-08-09T15:59:54Z

@mxnet-bot run ci[unix-cpu, unix-gpu]

mxnet-bot · 2021-08-09T16:00:01Z

Jenkins CI successfully triggered : [unix-cpu, unix-gpu]

akarbown · 2021-08-09T18:22:01Z

@mxnet-bot run ci[unix-cpu]

mxnet-bot · 2021-08-09T18:22:05Z

Jenkins CI successfully triggered : [unix-cpu]

barry-jin · 2021-10-07T22:39:21Z

> Now it seems the change has been tested and checked for MacOS and with MKL BLAS.
> Do you think that keeping the new GitHub action for MKL on MacOS makes sense? If so, can it stay as it is, or should it be changed somehow?

Yes, we need to keep this new github action as it will cover MKL related build and tests on MacOS.

> Remark: I see that windows-gpu fails, but it is probably not connected with this change; it may be related to the VS 2019 version 16.11 release. For v16.8.1 (MSVC 19.28.29333.0) it passed without any issues, while for v16.11.4 (MSVC 19.29.30136.0) it fails. But I'm not 100% sure.

I have created a pull request #20647 to try to fix it.

leezu

Great, thank you! Two small questions

leezu · 2021-10-08T18:09:55Z

tools/staticbuild/build_lib.sh

      ..
 ninja
+if [[ ! $PLATFORM == 'darwin' ]] && [[ $BLAS == 'mkl' ]]; then
+    patchelf --set-rpath "/opt/intel/oneapi/mkl/${INTEL_MKL}/lib/intel64/:\$ORIGIN" --force-rpath libmxnet.so


Will oneapi/mkl always be in /opt/intel? Is there a reason for not asking users to fix their run-time search path environment variables instead (which ideally would automatically be set correctly upon installation of oneapi/mkl)?

> Will oneapi/mkl always be in /opt/intel?
I think that if the user does not explicitly define another location, the default location for oneMKL is supposed to be /opt/intel/oneapi/, at least when it is installed the way the mkl.sh file does it.
> Is there a reason for not asking users to fix their run-time search path environment variables instead (which ideally would automatically be set correctly upon installation of oneapi/mkl)?
I've added it for the sake of the tests, so that libmxnet.dylib can see the MKL libraries at run time. It's not the best solution. Now I think that it may be better to source the /opt/intel/oneapi/setvars.sh script just before the test execution (in the *.yml file). What do you think? Or did you have something else in mind?

I assume source /opt/intel/oneapi/setvars.sh is also what users would be expected to do if they install MKL on Mac? If so, I think that'll be more robust than hardcoding the rpath

Yes, sure. However, I've just realized that when linking with MKL static libraries there is no need to source /opt/intel/oneapi/setvars.sh. It's mandatory in case of linking with MKL dynamic libraries. Thanks for pointing that out!
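
As a rough sketch of the setvars.sh approach agreed on above (an illustration, not the workflow change from the PR), a test step could load the oneAPI environment like this. The function name `load_oneapi_env` is hypothetical, and /opt/intel/oneapi is only the default location discussed in this thread.

```shell
# Hypothetical helper: source setvars.sh from a oneAPI root if it exists.
# Only needed when MXNet links the MKL dynamic libraries.
load_oneapi_env() {
    root="${1:-/opt/intel/oneapi}"
    if [ -f "$root/setvars.sh" ]; then
        # Clear this function's positional parameters so they are not
        # passed on to the sourced script as arguments.
        set --
        . "$root/setvars.sh"
    else
        echo "setvars.sh not found under $root" >&2
        return 1
    fi
}
```

Calling `load_oneapi_env` before the tests, instead of forcing an rpath with patchelf, keeps the install location configurable for users who put oneAPI somewhere else.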

leezu · 2021-10-08T18:11:47Z

cmake/upstream/FindBLAS.cmake

+      # MXNET NOTE: This change comes form the newest file version
+      # https://gitlab.kitware.com/cmake/cmake/-/issues/22295


Do you mean you backported this change to our file? Or is our file now up-to-date with upstream again? If it's the latter, you don't need to add the note here. If it's the former, why not update to the latest upstream instead of backporting?

Good point, I can remove that comment as this is in the upstream version.

barry-jin · 2021-10-08T21:54:38Z

@mxnet-bot run ci [windows-gpu]

mxnet-bot · 2021-10-08T21:54:41Z

Jenkins CI successfully triggered : [windows-gpu]

szha · 2021-10-10T01:47:46Z

@mxnet-bot run ci [windows-gpu]

mxnet-bot · 2021-10-10T01:47:52Z

Jenkins CI successfully triggered : [windows-gpu]


akarbown · 2021-10-13T06:13:04Z

@mxnet-bot run ci[unix-cpu, unix-gpu, centos-gpu]

mxnet-bot · 2021-10-13T06:13:11Z

Jenkins CI successfully triggered : [unix-cpu, unix-gpu, centos-gpu]

akarbown · 2021-10-13T11:20:44Z

@mxnet-bot run ci[centos-gpu]

mxnet-bot · 2021-10-13T11:20:47Z

Jenkins CI successfully triggered : [centos-gpu]

mozga-intel

LGTM! Thanks!

leezu · 2021-10-13T15:26:03Z

Thank you @akarbown!

mseth10 added the pr-work-in-progress label Jul 30, 2021

akarbown changed the title ~~[FEATURE] Enables dynamic linking with MKL and compiler based OpenMP~~ [FEATURE] Enable dynamic linking with MKL and compiler based OpenMP Jul 30, 2021

leezu reviewed Jul 30, 2021


akarbown force-pushed the compiler-based-openmp2 branch from 56c1404 to b3c66c6 Compare July 30, 2021 19:56

leezu approved these changes Oct 8, 2021


leezu reviewed Oct 8, 2021


akarbown mentioned this pull request Oct 8, 2021

Fix the link for downloading MKL on MacOS. #20655

Open

akarbown added 16 commits October 12, 2021 22:03

[FEATURE] Enable dynamic linking with MKL and compiler based OpenMP

6042b6d

OneMKL 2021.3 fixed linking OpenMP while using SDL and MKL_THREADING_LAYER set to GNU.

Disabling SDL with MKL threading on Windows

7007130

Use multi-threading 'on' as the default option

72c060e

Sets the interface layer for Intel oneAPI MKL at runtime

e5dd2a7

Clean up the apt cache

0c5f9cc

Moving mkl runtime initialization to the function

e6f7641

Cleaning MKL find_path cmake directories

6f628a7

[WIP] Adding github runner for MAC OS to check MKL specific changes

ae6686b

This is a temporary change to check if adding MKL runtime support won't crash MacOS.

clang format + mkl workflow rename

59db10d

Fixing some formatting + installing patchelf

32aa4cf

setting up Mac OS rpath for MKL libraries

dde35e4

Run only mkl tests

4b80278

Fix for finding MKL libraries on MacOs by FindBLAS.cmake

30f90b5

Turn off SDL for MKL on MacOS as it need fixes.

Enable linking MxNET with MKL static libraries on MacOS

5195792

Add proper mkl_threading flags for Mac Os. Enable all tests that are for MacOS + MKL tests. Rebuild numpy with MKL BLAS (instead of OpenBLAS).

Excluding MKL bf16 tests as CI MacOs machines seems not to have avx512 support.

ca1bcbe

Remove forcing rpath and some unnecessary comments

3cd9340

mozga-intel approved these changes Oct 13, 2021
