[BUGFIX] Fix AmpCast for float16 #19749

anko-intel · 2021-01-13T15:48:45Z

Description

OneDNN doesn't support the float16 format, so a fallback to the standard
implementation is needed.
This fixes issue #19631.
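
As a rough illustration of the fallback described above, here is a minimal, self-contained sketch. It is not the code from this PR: `DType`, `BackendSupports`, and `SelectCastPath` are hypothetical names, and the only assumption taken from the description is that float16 must not go to the OneDNN path.

```cpp
#include <string>

// Hypothetical stand-in for the three float types amp_cast deals with;
// this is not the real mshadow enum.
enum class DType { kFloat32, kFloat16, kBfloat16 };

// Assumption for this sketch: the accelerated backend handles every
// listed type except float16.
bool BackendSupports(DType dtype) {
  return dtype != DType::kFloat16;
}

// Returns the name of the path that would handle the cast:
// the accelerated one when the dtype is supported, otherwise the
// standard (fallback) implementation.
std::string SelectCastPath(DType input_dtype) {
  if (BackendSupports(input_dtype)) {
    return "onednn";
  }
  return "fallback";
}
```

Calling `SelectCastPath(DType::kFloat16)` selects the fallback path, while the other two types stay on the accelerated path.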

Checklist

Essentials

PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage
Code is well-documented

Comments

If this change is a backward incompatible change, why must this change be made.
Interesting edge cases to note here

mxnet-bot · 2021-01-13T15:48:47Z

Hey @anko-intel, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

To trigger all jobs: @mxnet-bot run ci [all]
To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [clang, unix-cpu, unix-gpu, centos-gpu, windows-cpu, windows-gpu, miscellaneous, website, sanity, edge, centos-cpu]

Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

anko-intel · 2021-01-18T08:56:49Z

@rongzha1 - could you review?

rongzha1 · 2021-01-18T15:14:50Z

src/operator/tensor/amp_cast.cc

-  mkldnn::memory::dims i_dims = mkldnn::memory::dims(i_ndim);
-  for (size_t i = 0; i < i_ndim; i++) {
-    i_dims[i] = static_cast<int>(data.shape()[i]);
+  if (data.dtype() != mshadow::kFloat16) {


Shall we add isValidMKLDNNDataType() to check whether the data type is supported by MKLDNN? mshadow has many data types, and some of them are not supported: https://github.com/apache/incubator-mxnet/blob/64f737cdd59fe88d2c5b479f25d011c5156b6a8a/3rdparty/mshadow/mshadow/base.h#L364:3

I considered that. Creating an isValidMKLDNNDataType() function would make sense if it could be used in many places, such as MKLDNNStorageType() for FInferStorageType. But in this particular situation the amp_cast operator only accepts 3 float types (see https://github.com/apache/incubator-mxnet/blob/v1.x/src/operator/tensor/amp_cast.h#L70 ), so I just excluded float16 as not supported in MKLDNN.
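
For reference, a general helper along the lines of the suggested isValidMKLDNNDataType() might look like the sketch below. This is a hypothetical illustration only: the `TypeFlag` enum is a simplified stand-in for the mshadow type flags, `IsValidBackendDataType` is an invented name, and the set of types marked as supported is an assumption made for the example, not taken from this PR.

```cpp
// Simplified, hypothetical subset of type flags; the real mshadow enum
// has more entries and specific numeric values.
enum class TypeFlag {
  kFloat32, kFloat64, kFloat16, kUint8, kInt32, kInt8, kInt64, kBfloat16
};

// Returns true when the accelerated backend is assumed to handle the type.
// The supported set below is an assumption for illustration only.
bool IsValidBackendDataType(TypeFlag dtype) {
  switch (dtype) {
    case TypeFlag::kFloat32:
    case TypeFlag::kBfloat16:
    case TypeFlag::kInt32:
    case TypeFlag::kInt8:
    case TypeFlag::kUint8:
      return true;
    default:
      // float16, float64, int64 and anything else take the fallback path.
      return false;
  }
}
```

An allow-list like this keeps unknown or newly added types on the fallback path by default, which is the safer failure mode when a backend gains types over time.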

rongzha1 · 2021-01-19T01:22:42Z

src/operator/tensor/amp_cast.cc

-  mkldnn::memory::dims i_dims = mkldnn::memory::dims(i_ndim);
-  for (size_t i = 0; i < i_ndim; i++) {
-    i_dims[i] = static_cast<int>(data.shape()[i]);
+  if (data.dtype() != mshadow::kFloat16) {


anko-intel · 2021-01-25T08:48:59Z

@PatricZhao, @szha could you review and merge if everything is ok?

szha

Thanks for the fix! Could you add a test for verification?

anko-intel · 2021-01-27T10:43:33Z

Hi @szha,
Originally I thought that float16 was not intended to be passed to amp_cast in a CPU context, so I treated this change as a fix only for #19631.
Now I have enabled the existing float32->float16 test for amp_cast on CPU.

anko-intel · 2021-01-27T13:14:24Z

@mxnet-bot run ci [centos-cpu, unix-gpu]

mxnet-bot · 2021-01-27T13:14:31Z

Jenkins CI successfully triggered : [centos-cpu, unix-gpu]

* Fix AmpCast for float16

  OneDNN doesn't support float16 format so fallback to standard implementation is needed. It fixes issue 19631.

* Enable amp_cast test for float16 on CPU context

lanking520 added the pr-awaiting-testing label Jan 13, 2021

Fix AmpCast for float16

aed0619

OneDNN doesn't support float16 format so fallback to standard implementation is needed. It fixes issue 19631.

anko-intel force-pushed the anko_up_1_x_19631_amp_cast branch from 4d22ab7 to aed0619 Compare January 13, 2021 16:12

lanking520 added the pr-awaiting-review and pr-awaiting-testing labels and removed the pr-awaiting-testing and pr-awaiting-review labels Jan 13, 2021

rongzha1 reviewed Jan 18, 2021

rongzha1 approved these changes Jan 19, 2021

szha reviewed Jan 26, 2021

Enable amp_cast test for float16 on CPU context

b691151

lanking520 added the pr-awaiting-testing label and removed the pr-awaiting-review label Jan 27, 2021

lanking520 added the pr-work-in-progress label and removed the pr-awaiting-testing label Jan 27, 2021

lanking520 added the pr-awaiting-testing label and removed the pr-work-in-progress and pr-awaiting-testing labels Jan 27, 2021

lanking520 added the pr-awaiting-review label Jan 27, 2021

szha merged commit 0a65920 into apache:v1.x Feb 5, 2021

anko-intel mentioned this pull request Mar 10, 2021

[BUGFIX] Fix AmpCast for float16 #20003

Merged

szha pushed a commit that referenced this pull request Mar 12, 2021

[BUGFIX] Fix AmpCast for float16 (#19749) (#20003)

ece9633

chinakook pushed a commit to chinakook/mxnet that referenced this pull request May 2, 2021

[BUGFIX] Fix AmpCast for float16 (apache#19749) (apache#20003)

fdd6133
