[FEATURE] Integrate oneDNN binary primitive support for forward add, subtract, multiply, divide. #20713
Conversation
Hey @agrabows, thanks for submitting the PR.
CI supported jobs: [centos-gpu, sanity, centos-cpu, windows-gpu, edge, website, miscellaneous, unix-cpu, clang, unix-gpu, windows-cpu]
@mxnet-bot run ci [unix-cpu]
Jenkins CI successfully triggered: [unix-cpu]
@mxnet-bot run ci [windows-gpu]
Jenkins CI successfully triggered: [windows-gpu]
static MX_THREAD_LOCAL binary_op_fwd_map fwds;
#endif
OpSignature key;
key.AddSign(static_cast<int>(alg));
What about "attrs" in the key?
I think we should either add attrs to the key or remove them from the DNNLBinaryOpFwd constructor parameters.
attrs removed where possible.
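For illustration, the thread-local primitive cache pattern under discussion can be sketched in plain C++ as follows. `OpSignature`, `BinaryFwd`, and `GetCachedFwd` here are simplified stand-ins for MXNet's actual types, showing why everything that distinguishes one primitive from another (algorithm kind, shapes, and any relevant attrs) must be folded into the key:

```cpp
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Simplified stand-in for MXNet's OpSignature: collects integers that
// identify a primitive configuration, so cached primitives are reused
// only for identical (alg, shape, ...) combinations.
struct OpSignature {
  std::vector<int64_t> signs;
  void AddSign(int64_t v) { signs.push_back(v); }
  bool operator==(const OpSignature& o) const { return signs == o.signs; }
};

struct OpSignatureHash {
  size_t operator()(const OpSignature& s) const {
    size_t h = 0;
    for (int64_t v : s.signs)
      h = h * 31 + std::hash<int64_t>()(v);
    return h;
  }
};

// Placeholder for the cached forward primitive (DNNLBinaryOpFwd).
struct BinaryFwd {
  int alg;
};

// Thread-local cache: one primitive per unique signature. A lookup with
// the same alg and shapes returns the already-constructed entry.
BinaryFwd& GetCachedFwd(int alg, const std::vector<int64_t>& lhs_shape) {
  static thread_local std::unordered_map<OpSignature, BinaryFwd, OpSignatureHash> fwds;
  OpSignature key;
  key.AddSign(alg);
  for (int64_t d : lhs_shape)
    key.AddSign(d);
  auto it = fwds.find(key);
  if (it == fwds.end())
    it = fwds.emplace(key, BinaryFwd{alg}).first;
  return it->second;
}
```

If attrs stayed a constructor parameter without entering the key, two calls differing only in attrs would silently share one cached primitive, which is why the reviewer asks for one or the other.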
bool dispatched = false;
if (!dispatched && common::ContainsOnlyStorage(*in_attrs, kDefaultStorage)) {
#if MXNET_USE_ONEDNN == 1
  if (dev_mask == mshadow::cpu::kDevMask)
Does disabling oneDNN at runtime still work?
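The concern here is that the dispatch path should still honor a runtime kill switch. A minimal sketch of that dispatch shape, assuming an environment-variable switch (the variable name `DEMO_ONEDNN_ENABLED` and `kCpuDevMask` value are illustrative, not MXNet's actual names):

```cpp
#include <cstdlib>
#include <cstring>

// Stand-in for mshadow::cpu::kDevMask.
constexpr int kCpuDevMask = 1;

// Runtime switch: enabled unless the environment variable is set to "0",
// mirroring the spirit of a flag like MXNET_ONEDNN_ENABLED.
bool OneDNNEnabled() {
  const char* v = std::getenv("DEMO_ONEDNN_ENABLED");
  return v == nullptr || std::strcmp(v, "0") != 0;
}

// Dispatch to the oneDNN path only on CPU and only when not disabled
// at runtime; otherwise the caller falls back to the default kernel.
bool DispatchToOneDNN(int dev_mask) {
  return dev_mask == kCpuDevMask && OneDNNEnabled();
}
```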
[[8, 1, 6, 1], [7, 1, 5]], [[5, 4], [1]],
[[256, 256, 3], [3]], [[5, 4], [4]],
[[15, 3, 5], [3, 5]], [[15, 3, 5], [1, 5]],
[[15, 3, 5], [3, 1]]])
tests/python/unittest/test_numpy_op.py::test_np_binary_funcs
op::mshadow_op::mixed_minus,
op::mshadow_op::mixed_rminus>)
#if MXNET_USE_ONEDNN == 1
.set_attr<FComputeEx>("FComputeEx<cpu>", NumpyBinaryOperatorComputeExCPU<op::mshadow_op::minus>)
What about the mixed version? Does it work properly on GPU when oneDNN is enabled (the default configuration)? Could you check whether there is a test for it?
oneDNN dispatch is only considered after the dev_mask == mshadow::cpu::kDevMask condition is met, so the GPU workflow is not affected.
namespace mxnet {
namespace op {

using binary_op_fwd_t = dnnl::binary;
binary_op_fwd_t => binary_fwd_t ?
src/operator/nn/dnnl/dnnl_binary.cc (outdated)
auto engine = mxnet::CpuEngine::Get()->get_engine();
auto src0 = inputs[0].GetDNNLData();
auto src1 = inputs[1].GetDNNLData();
dnnl_output_t out_mem = CreateDNNLMem(outputs[0], fwd_pd->dst_desc(), req[0], &inputs[0]);
Either inputs[0] or inputs[1] can be in-place; it may be worth checking which input is actually used as the output in the in-place case.
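The reviewer's point is that hard-coding `&inputs[0]` assumes the first input always aliases the output. A minimal sketch of detecting which input (if any) shares the output buffer, using a placeholder `Array` type rather than MXNet's NDArray:

```cpp
#include <cstddef>

// Placeholder for an array handle; only the data pointer matters here.
struct Array {
  const void* data;
};

// Returns 0 or 1 for whichever input shares the output's buffer
// (the in-place case), or -1 when the output is a fresh allocation.
int InplaceInput(const Array& out, const Array& in0, const Array& in1) {
  if (out.data == in0.data) return 0;
  if (out.data == in1.data) return 1;
  return -1;
}
```

The result of such a check could then be used to pass the correct input to the in-place memory path instead of always passing the first one.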
[[8, 1, 6, 1], [7, 1, 5]], [[5, 4], [1]],
[[256, 256, 3], [3]], [[5, 4], [4]],
[[15, 3, 5], [3, 5]], [[15, 3, 5], [1, 5]],
[[15, 3, 5], [3, 1]]])
Please check whether it works when the rhs shape has more dimensions than the lhs, e.g. [15, 3] and [4, 15, 3].
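For reference, the NumPy-style broadcasting rule the tests exercise aligns shapes on their trailing axes, so it is symmetric in which operand has more dimensions. A self-contained sketch (not MXNet's actual shape-inference code):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// NumPy-style broadcast: align shapes on the trailing axes; each dim pair
// must be equal or contain a 1. Returns {} when the shapes are incompatible.
// Works whichever side has more dimensions, e.g. [15,3] with [4,15,3].
std::vector<int64_t> BroadcastShape(std::vector<int64_t> a,
                                    std::vector<int64_t> b) {
  if (a.size() < b.size())
    std::swap(a, b);  // make `a` the longer shape
  std::vector<int64_t> out(a.size());
  const size_t offset = a.size() - b.size();
  for (size_t i = 0; i < a.size(); ++i) {
    const int64_t da = a[i];
    const int64_t db = (i < offset) ? 1 : b[i - offset];  // pad lhs with 1s
    if (da != db && da != 1 && db != 1)
      return {};  // incompatible
    out[i] = std::max(da, db);
  }
  return out;
}
```

Under this rule the reviewer's case [15, 3] with [4, 15, 3] broadcasts to [4, 15, 3], so the test parametrization above should cover an rhs-longer-than-lhs pair as well.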
@mxnet-bot run ci [windows-gpu]
Jenkins CI successfully triggered: [windows-gpu]
szha left a comment:
LGTM. Leaving it open for a bit for other reviewers to take a look at the revisions.
auto ndim_1 = inputs[1].shape().ndim();
return ndim_0 >= 1 && ndim_0 <= 6 && ndim_1 >= 1 && ndim_1 <= 6 &&
       inputs[0].shape().Size() != 0 && inputs[1].shape().Size() != 0 &&
       dtype == mshadow::kFloat32 && dtype == inputs[1].dtype();
Please check whether oneDNN supports bfloat16; if it does, please create a separate PR for it.
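The support predicate being reviewed can be sketched as a standalone check: both inputs need 1 to 6 dimensions, non-empty shapes, and matching float32 dtypes. `TensorDesc` and the `kFloat32` value are simplified stand-ins, not mshadow's actual types:

```cpp
#include <cstdint>
#include <vector>

// Stand-in for mshadow::kFloat32.
constexpr int kFloat32 = 0;

// Minimal tensor descriptor: shape plus dtype tag.
struct TensorDesc {
  std::vector<int64_t> shape;
  int dtype;
  int64_t Size() const {
    if (shape.empty()) return 0;
    int64_t s = 1;
    for (int64_t d : shape) s *= d;
    return s;
  }
};

// Mirrors the condition above: 1 <= ndim <= 6 on both sides, no empty
// tensors, and both dtypes equal to float32.
bool SupportDNNLBinary(const TensorDesc& a, const TensorDesc& b) {
  auto ok_ndim = [](const TensorDesc& t) {
    return t.shape.size() >= 1 && t.shape.size() <= 6;
  };
  return ok_ndim(a) && ok_ndim(b) && a.Size() != 0 && b.Size() != 0 &&
         a.dtype == kFloat32 && a.dtype == b.dtype;
}
```

Extending such a predicate with a bfloat16 branch would be the natural shape of the follow-up PR the reviewer suggests, if oneDNN supports it.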
Description
Binary broadcast operators such as add, subtract, multiply, and divide are implemented in both the NDArray and NumPy modules, but no oneDNN support existed for them. The goal of this task was to dispatch execution of these operators to the oneDNN binary primitive.
Checklist
Essentials
Changes
Comments
A speedup was observed in all cases, up to ~350%.
