Skip to content
This repository was archived by the owner on Nov 17, 2023. It is now read-only.

[1.8] Include oneDNN gemm fix#19251

Merged
szha merged 3 commits intoapache:v1.8.xfrom
leezu:2020-09/onednn-gemm
Oct 5, 2020
Merged

[1.8] Include oneDNN gemm fix#19251
szha merged 3 commits intoapache:v1.8.xfrom
leezu:2020-09/onednn-gemm

Conversation

@leezu
Copy link
Copy Markdown
Contributor

@leezu leezu commented Sep 29, 2020

@leezu leezu requested a review from samskalicky September 29, 2020 18:54
@mxnet-bot
Copy link
Copy Markdown

Hey @leezu , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [centos-cpu, windows-gpu, windows-cpu, sanity, clang, unix-cpu, website, centos-gpu, edge, miscellaneous, unix-gpu]


Note:
Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@TaoLv
Copy link
Copy Markdown
Member

TaoLv commented Sep 30, 2020

Hi @leezu, are you trying to include a unreleased version of oneDNN to MXNet v1.8? Or you just want to verify the fix? We're asking oneDNN team for a v1.6.4 patch release.

@leezu
Copy link
Copy Markdown
Contributor Author

leezu commented Sep 30, 2020

Could you provide more details on what fixes are missing from uxlfoundation/oneDNN@5ce95ef that will go into 1.6.4?

I think it's up to @samskalicky as release manager to decide if he just includes uxlfoundation/oneDNN@5ce95ef or wait for 1.6.4 (or don't include the fix at all)

@samskalicky
Copy link
Copy Markdown
Contributor

samskalicky commented Sep 30, 2020

Other than waiting for 1.6.4 patch release, what are our options for avoiding the issue in 1.8? Will reverting any/all of #18822 #18867 #19161 help?

I dont see any issue with setting 3rdparty/mkldnn to a unreleased commit to fix this bug either.

Copy link
Copy Markdown
Contributor

@samskalicky samskalicky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a test now that is passing with this fix?

@leezu
Copy link
Copy Markdown
Contributor Author

leezu commented Sep 30, 2020

We don't have a test in 1.x branch. One can construct a test by first calling 'tests/python/unittest/test_contrib_intgemm.py::test_contrib_intgemm_multiply[api1-16-192-4]' followed by a different test calling a hybridized Dense layer and ensuring that the output doesn't contain NaN, as per #19185 (comment)

@samskalicky
Copy link
Copy Markdown
Contributor

We don't have a test in 1.x branch. One can construct a test by first calling 'tests/python/unittest/test_contrib_intgemm.py::test_contrib_intgemm_multiply[api1-16-192-4]' followed by a different test calling a hybridized Dense layer and ensuring that the output doesn't contain NaN, as per #19185 (comment)

Ok, np. did we verify the fix actually fixes the issue in 1.8.x manually?

@pioy
Copy link
Copy Markdown

pioy commented Sep 30, 2020

Let's see if #19260 results can show that the fix really fixes the issue.

Update: It seems, that the failing tests don't exist on 1.8, so the PR is irrelevant.

@samskalicky
Copy link
Copy Markdown
Contributor

Thanks @leezu do you plan to open another PR for this fix on v1.x as well?

@leezu
Copy link
Copy Markdown
Contributor Author

leezu commented Oct 2, 2020

@mxnet-bot run ci [unix-gpu]

@mxnet-bot
Copy link
Copy Markdown

Jenkins CI successfully triggered : [unix-gpu]

@leezu leezu added the pr-awaiting-review PR is waiting for code review label Oct 2, 2020
@bartekkuncer
Copy link
Copy Markdown
Contributor

bartekkuncer commented Oct 2, 2020

@leezu The number of tags has changed so for the test to pass you have to change it from 205 to 219 in MemFormat test (103 line in mkldnn_test.cc file).
If that is not a problem I will create PR for v1.x branch.
PR for v1.x branch:
#19276

@leezu leezu requested a review from anirudh2290 as a code owner October 2, 2020 13:42
@samskalicky
Copy link
Copy Markdown
Contributor

@bartekkuncer @pioy looks like the CI is passing now. Is this PR ready to merge? Can you review/approve it?

@bartekkuncer
Copy link
Copy Markdown
Contributor

@bartekkuncer @pioy looks like the CI is passing now. Is this PR ready to merge? Can you review/approve it?

@samskalicky Change looks good now. Ready to be merged.

@szha szha merged commit 25bebdf into apache:v1.8.x Oct 5, 2020
@leezu leezu deleted the 2020-09/onednn-gemm branch October 5, 2020 15:09
@leezu
Copy link
Copy Markdown
Contributor Author

leezu commented Oct 5, 2020

This change is included in v1.x as part of #19276

@samskalicky
Copy link
Copy Markdown
Contributor

This change is included in v1.x as part of #19276

Thanks @leezu for the tracking!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

pr-awaiting-review PR is waiting for code review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants