This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Back port optimization to broadcast_axis to MXNet 1.x #18773

Merged
szha merged 2 commits into apache:v1.x from access2rohit:mx17 on Jul 28, 2020

Conversation

@access2rohit (Contributor) commented Jul 23, 2020

Description

Backport optimization to broadcast_axis for CPU #17882 and GPU #18168 to MXNet 1.7.x
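For context, `broadcast_axis` replicates an input along the given axes (which must have extent 1) out to the requested sizes. A minimal NumPy stand-in for the operator's semantics, illustrative only and not the optimized kernel being backported:

```python
import numpy as np

def broadcast_axis(data, axis, size):
    """Replicate `data` along `axis` (extent must be 1) out to `size`.
    NumPy sketch of mx.nd.broadcast_axis semantics, for illustration."""
    axes = axis if isinstance(axis, (list, tuple)) else (axis,)
    sizes = size if isinstance(size, (list, tuple)) else (size,)
    target = list(data.shape)
    for ax, sz in zip(axes, sizes):
        assert data.shape[ax] == 1, "broadcast axis must have extent 1"
        target[ax] = sz
    return np.broadcast_to(data, target).copy()

x = np.arange(3).reshape(1, 3)         # shape (1, 3)
y = broadcast_axis(x, axis=0, size=4)  # shape (4, 3): rows are copies of x
```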

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

@mxnet-bot

Hey @access2rohit, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [sanity, unix-cpu, windows-cpu, centos-gpu, clang, website, miscellaneous, centos-cpu, windows-gpu, unix-gpu, edge]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@access2rohit (Contributor, Author)

@leezu @sandeep-krishnamurthy can you help merge? These are cherry-picked from master (already merged there).

@access2rohit access2rohit changed the base branch from v1.7.x to v1.x July 23, 2020 02:13
@access2rohit access2rohit changed the title Back port optimization to broadcast_axis to MXNet1.7.x Back port optimization to broadcast_axis to MXNet1.x Jul 23, 2020
@ChaiBapchya (Contributor)

Looks good since it's a cherry-pick.
For the functionality check, the broadcast_axis-specific tests in unix-cpu should verify that on CI.
For the performance check, do you mind pasting comparisons w/ and w/o this change on the 1.x branch, just to be sure these perf gains hold for 1.x as well?

@access2rohit (Contributor, Author)

Looks good since it's a cherry-pick.
For the functionality check, the broadcast_axis-specific tests in unix-cpu should verify that on CI.
For the performance check, do you mind pasting comparisons w/ and w/o this change on the 1.x branch, just to be sure these perf gains hold for 1.x as well?

@ChaiBapchya
the results are identical to the ones I posted on my original PR. These results haven't changed over time. Do you still want additional runs?

@ChaiBapchya (Contributor) commented Jul 24, 2020

That PR was against master [2.0]. This PR is for the 1.x branch [which forked off from master a while ago]. Running perf before and after this PR and getting the numbers would ensure we do our due diligence before merging.

@lanking520 lanking520 added the pr-awaiting-review (PR is waiting for code review) label Jul 26, 2020
@access2rohit (Contributor, Author)

| Code | SSD 1 epoch time (sec) | % speedup/slowdown w.r.t. Master (large tensor disabled) |
| --- | --- | --- |
| Master (large tensor disabled) | 231 | 0% speedup/slowdown |
| Master (large tensor enabled) | 378 | 63.23% slowdown |
| Master + CPU-optimized broadcast_axis (large tensor disabled) | 237 | 2.6% slowdown |
| Master + CPU-optimized broadcast_axis (large tensor enabled) | 234 | 1.3% slowdown |
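Timings like the epoch numbers above are typically collected by running the same workload before and after the change and averaging wall-clock time. A generic harness sketch, with `np.broadcast_to` as a hypothetical placeholder workload standing in for a broadcast_axis-heavy step (not the actual SSD benchmark):

```python
import time
import numpy as np

def time_op(fn, warmup=2, runs=5):
    """Average wall-clock seconds per call, after a short warmup."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

# Placeholder workload; swap in the real training step to reproduce the table.
x = np.ones((1, 1024))
baseline = time_op(lambda: np.broadcast_to(x, (1024, 1024)).copy())
print(f"avg per run: {baseline:.6f}s")
```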

access2rohit and others added 2 commits July 28, 2020 17:17
* adding separate int32_t kernel for GPU in broadcast_axis/to/like operators

* using structure instead of temp workspace to pass stride and shape

* replacing hardcoded int32_t with generic index_t

* combining CPU and GPU kernels to leverage cached stride calculation and fast access shape data in both

Co-authored-by: Rohit Kumar Srivastava <srivastava.141@buckeyemail.osu.edu>
* adding comments explaining code optimizations

* fixing broadcast_axis kernel to int32

* fixing slice_axis kernel to int32

* combining CPU and GPU implementation method signatures and cleaned up
code

* adding new broadcast_axis to np_matmul

Co-authored-by: Rohit Kumar Srivastava <srivastava.141@buckeyemail.osu.edu>
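The "cached stride calculation" the commit messages refer to amounts to computing row-major strides once and reusing them when mapping each output index back to the input element it reads, with broadcast axes contributing nothing to the input offset. A hedged Python sketch of that indexing scheme (the real kernels are C++/CUDA; names here are illustrative):

```python
def row_major_strides(shape):
    """Strides in elements for a row-major layout, computed once and reused."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

def broadcast_index(out_idx, out_shape, in_shape):
    """Map a flat output index to the flat input index it reads from.
    Axes with extent 1 in in_shape are broadcast: they add 0 to the offset."""
    out_strides = row_major_strides(out_shape)
    in_strides = row_major_strides(in_shape)
    in_off = 0
    for dim in range(len(out_shape)):
        coord = (out_idx // out_strides[dim]) % out_shape[dim]
        if in_shape[dim] != 1:
            in_off += coord * in_strides[dim]
    return in_off

# Broadcasting (1, 3) -> (2, 3): both output rows read input row 0.
assert broadcast_index(4, (2, 3), (1, 3)) == 1  # out (1,1) -> in (0,1)
```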
@ChaiBapchya (Contributor)

Why is it master? Shouldn't it be v1.x?

@access2rohit (Contributor, Author) commented Jul 28, 2020

Why is it master? Shouldn't it be v1.x?

1.x is the master for all future 1.x releases.

@ChaiBapchya (Contributor) left a comment

Thanks.

@szha szha merged commit 7bef9cb into apache:v1.x Jul 28, 2020

Labels

pr-awaiting-review PR is waiting for code review


6 participants