This repository was archived by the owner on Nov 17, 2023. It is now read-only.

[FEATURE] Integrate oneDNN binary primitive support for forward add, subtract, multiply, divide. #20713

Merged
bgawrych merged 7 commits into apache:master from agrabows:master_1dnn_binary
Jan 18, 2022

Conversation

@agrabows
Contributor

Description

Binary broadcast operators such as add, subtract, multiply, and divide are implemented in both the NDArray and NumPy modules, and no oneDNN support exists for them. The goal of this task was to dispatch execution of those operators to the oneDNN binary primitive.
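For reference, the broadcast semantics these operators must preserve when dispatched to oneDNN can be sketched in plain NumPy (illustrative shapes and values only, not taken from the PR):

```python
import numpy as np

# Elementwise binary broadcast semantics that an oneDNN-backed
# implementation must reproduce (shapes are arbitrary examples).
lhs = np.arange(6, dtype=np.float32).reshape(2, 3)   # [[0,1,2],[3,4,5]]
rhs = np.arange(1, 4, dtype=np.float32)              # [1,2,3], broadcast over rows

add = lhs + rhs
sub = lhs - rhs
mul = lhs * rhs
div = lhs / rhs

assert add.shape == sub.shape == mul.shape == div.shape == (2, 3)
```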

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Merge execution of the add, subtract, multiply, and divide operators into one NNVM_REGISTER_OP() function
  • Implement oneDNN dispatch for binary broadcast operators

Comments

A speedup was observed in all measured cases, up to ~350%.
[benchmark results image]

@mxnet-bot

Hey @agrabows, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [centos-gpu, sanity, centos-cpu, windows-gpu, edge, website, miscellaneous, unix-cpu, clang, unix-gpu, windows-cpu]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

  • Oct 28, 2021: @mseth10 added pr-awaiting-testing and pr-work-in-progress, then removed pr-awaiting-testing
  • Nov 16, 2021: @mseth10 cycled the pr-awaiting-testing and pr-work-in-progress labels (three identical events)
  • Nov 25, 2021: @mseth10 cycled the pr-awaiting-testing and pr-work-in-progress labels (three events)
@agrabows
Contributor Author

@mxnet-bot run ci [unix-cpu]

@mxnet-bot

Jenkins CI successfully triggered : [unix-cpu]

@agrabows
Contributor Author

@mxnet-bot run ci [windows-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [windows-gpu]

static MX_THREAD_LOCAL binary_op_fwd_map fwds;
#endif
OpSignature key;
key.AddSign(static_cast<int>(alg));
Contributor

What about "attrs" in the key? I think we should either add attrs to the key or remove it from the DNNLBinaryOpFwd constructor parameters.

Contributor

+1 to remove attrs

Contributor Author

attrs has been removed wherever possible.
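The caching concern discussed above generalizes: every parameter that influences the built primitive must be folded into the lookup key. A hypothetical Python analogue of the thread-local cache pattern (get_fwd and build are illustrative names, not MXNet or oneDNN APIs):

```python
import threading

_tls = threading.local()  # per-thread storage, mirroring MX_THREAD_LOCAL

def get_fwd(alg, lhs_shape, rhs_shape, dtype, build):
    """Return a cached forward primitive, building it on first use.

    Anything that changes the constructed primitive (algorithm kind,
    shapes, dtype) must be part of the key; a parameter that is passed
    to build() but left out of the key would cause a stale primitive
    to be reused for mismatched inputs.
    """
    cache = getattr(_tls, "fwds", None)
    if cache is None:
        cache = _tls.fwds = {}
    key = (alg, tuple(lhs_shape), tuple(rhs_shape), dtype)
    if key not in cache:
        cache[key] = build(alg, lhs_shape, rhs_shape, dtype)
    return cache[key]
```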

bool dispatched = false;
if (!dispatched && common::ContainsOnlyStorage(*in_attrs, kDefaultStorage)) {
#if MXNET_USE_ONEDNN == 1
if (dev_mask == mshadow::cpu::kDevMask)
Contributor

Does disabling oneDNN at runtime still work?

[[8, 1, 6, 1], [7, 1, 5]], [[5, 4], [1]],
[[256, 256, 3], [3]], [[5, 4], [4]],
[[15, 3, 5], [3, 5]], [[15, 3, 5], [1, 5]],
[[15, 3, 5], [3, 1]]])
Contributor

tests/python/unittest/test_numpy_op.py::test_np_binary_funcs

op::mshadow_op::mixed_minus,
op::mshadow_op::mixed_rminus>)
#if MXNET_USE_ONEDNN == 1
.set_attr<FComputeEx>("FComputeEx<cpu>", NumpyBinaryOperatorComputeExCPU<op::mshadow_op::minus>)
Contributor

What about the mixed version? Does it work properly on GPU when oneDNN is enabled (the default configuration)? Could you check whether there is a test for it?

Contributor Author

oneDNN dispatch is only considered after the dev_mask == mshadow::cpu::kDevMask condition is met, so the GPU workflow is not affected.

namespace mxnet {
namespace op {

using binary_op_fwd_t = dnnl::binary;
Contributor

binary_op_fwd_t => binary_fwd_t ?

static MX_THREAD_LOCAL binary_op_fwd_map fwds;
#endif
OpSignature key;
key.AddSign(static_cast<int>(alg));
Contributor

+1 to remove attrs

auto engine = mxnet::CpuEngine::Get()->get_engine();
auto src0 = inputs[0].GetDNNLData();
auto src1 = inputs[1].GetDNNLData();
dnnl_output_t out_mem = CreateDNNLMem(outputs[0], fwd_pd->dst_desc(), req[0], &inputs[0]);
Contributor

Either inputs[0] or inputs[1] can be in place; it may be worth checking which input is actually used as the output in the in-place case.
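The suggestion above can be illustrated with a small aliasing check (a NumPy sketch; inplace_input_index is a hypothetical helper, not part of MXNet):

```python
import numpy as np

def inplace_input_index(out, inputs):
    """Return the index of the input that shares memory with `out`,
    or None when the operation is not in place."""
    for i, arr in enumerate(inputs):
        if np.shares_memory(out, arr):
            return i
    return None
```

An integration could then hand the aliasing input, rather than unconditionally inputs[0], to CreateDNNLMem when the write is in place.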

[[8, 1, 6, 1], [7, 1, 5]], [[5, 4], [1]],
[[256, 256, 3], [3]], [[5, 4], [4]],
[[15, 3, 5], [3, 5]], [[15, 3, 5], [1, 5]],
[[15, 3, 5], [3, 1]]])
Contributor

Please check whether it works when the rhs shape has more dimensions than the lhs, e.g. [15, 3] and [4, 15, 3].
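In plain NumPy the case raised here broadcasts by left-padding the shorter shape with size-1 axes; since oneDNN generally expects both operands to have the same number of dimensions, an integration would typically have to do that padding explicitly (a sketch of the general pattern, not this PR's exact code):

```python
import numpy as np

lhs = np.ones((15, 3), dtype=np.float32)
rhs = np.full((4, 15, 3), 2.0, dtype=np.float32)

out = lhs + rhs                     # NumPy treats lhs as (1, 15, 3)
assert out.shape == (4, 15, 3)

# Explicit form of the left-padding a same-ndim backend would need:
padded = lhs.reshape((1,) * (rhs.ndim - lhs.ndim) + lhs.shape)
assert padded.shape == (1, 15, 3)
assert np.array_equal(padded + rhs, out)
```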

@agrabows
Contributor Author

@mxnet-bot run ci [windows-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [windows-gpu]

Member

@szha left a comment

LGTM. Leaving it open for a bit for other reviewers to take a look at the revisions.

auto ndim_1 = inputs[1].shape().ndim();
return ndim_0 >= 1 && ndim_0 <= 6 && ndim_1 >= 1 && ndim_1 <= 6 &&
inputs[0].shape().Size() != 0 && inputs[1].shape().Size() != 0 &&
dtype == mshadow::kFloat32 && dtype == inputs[1].dtype();
Contributor

Please check whether oneDNN supports bfloat16; if it does, please create a separate PR for it.
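The support predicate in the snippet above can be mirrored in NumPy terms (a sketch of the check's logic; the real code operates on NDArray, and the accepted dtype set would grow if bfloat16 support lands in a follow-up PR):

```python
import numpy as np

def onednn_binary_supported(lhs, rhs):
    """Mirror of the C++ check: both tensors 1-6 dimensional,
    non-empty, float32, and of matching dtype."""
    return (1 <= lhs.ndim <= 6 and 1 <= rhs.ndim <= 6
            and lhs.size != 0 and rhs.size != 0
            and lhs.dtype == np.float32
            and lhs.dtype == rhs.dtype)
```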

Labels

pr-awaiting-review PR is waiting for code review

8 participants