Skip to content
This repository was archived by the owner on Nov 17, 2023. It is now read-only.

[FEATURE] AdaBelief operator#20065

Merged
szha merged 5 commits intoapache:masterfrom
khaotik:adabelief-i19950
Apr 30, 2021
Merged

[FEATURE] AdaBelief operator#20065
szha merged 5 commits intoapache:masterfrom
khaotik:adabelief-i19950

Conversation

@khaotik
Copy link
Copy Markdown
Contributor

@khaotik khaotik commented Mar 20, 2021

Description

#19950
Hello, this is my first PR to apache MXNet. Glad to work with the community.

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • added C++ operator under src/operator/contrib/adabelief*
  • briefly modified src/operator/contrib/adamw* to avoid symbol name clashes
  • src/operator/contrib/adabelief*

Comments

  • This is mostly a copycat of AdamW optimizer, and therefore fair amount of code duplication. It might be a good idea to perform a refactor regarding adam-like optimizers later.

@mxnet-bot
Copy link
Copy Markdown

Hey @khaotik , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [website, windows-gpu, unix-cpu, edge, miscellaneous, sanity, centos-cpu, clang, windows-cpu, centos-gpu, unix-gpu]


Note:
Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@szha
Copy link
Copy Markdown
Member

szha commented Mar 21, 2021

@khaotik thanks for the contribution! It looks like CI fails to start due to a merge failure. Perhaps you could squash the commits into a single commit, rebase to the latest master, and then do a force push?

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-awaiting-testing PR is reviewed and waiting CI build and test labels Mar 21, 2021
@szha
Copy link
Copy Markdown
Member

szha commented Mar 21, 2021

OK the CI is working for this PR now. There are some formatting issues to address in the sanity check: https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fsanity/detail/PR-20065/2/pipeline#step-75-log-169

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Mar 21, 2021
@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Mar 21, 2021
@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Mar 21, 2021
@khaotik
Copy link
Copy Markdown
Contributor Author

khaotik commented Mar 21, 2021

Hmm, test failed due to gcc "internal compiler error" during build phase. I find this a bit puzzling.

[2021-03-21T15:52:36.650Z] FAILED: CMakeFiles/mxnet.dir/src/operator/numpy/np_kron.cc.o 

[2021-03-21T15:52:36.650Z] /usr/local/bin/ccache /usr/bin/g++-7  -DDMLC_CORE_USE_CMAKE -DDMLC_LOG_FATAL_THROW=1 -DDMLC_LOG_STACK_TRACE_SIZE=0 -DDMLC_MODERN_THREAD_LOCAL=0 -DDMLC_STRICT_CXX11 -DDMLC_USE_CXX11 -DDMLC_USE_CXX11=1 -DDMLC_USE_CXX14 -DMSHADOW_FORCE_STREAM -DMSHADOW_INT64_TENSOR_SIZE=1 -DMSHADOW_IN_CXX11 -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_CUDA=1 -DMSHADOW_USE_CUDNN -DMSHADOW_USE_MKL=0 -DMSHADOW_USE_SSE -DMXNET_USE_BLAS_OPEN=1 -DMXNET_USE_CUDA=1 -DMXNET_USE_ILP64_LAPACKE=1 -DMXNET_USE_INTGEMM=1 -DMXNET_USE_LAPACK=1 -DMXNET_USE_LAPACKE_INTERFACE=1 -DMXNET_USE_LIBJPEG_TURBO=0 -DMXNET_USE_NVTX=1 -DMXNET_USE_ONEDNN=1 -DMXNET_USE_OPENCV=1 -DMXNET_USE_OPENMP=1 -DMXNET_USE_OPERATOR_TUNING=1 -DMXNET_USE_SIGNAL_HANDLER=1 -DUSE_CUDNN -D__USE_XOPEN2K8 -Dmxnet_EXPORTS -I/work/mxnet/3rdparty/onednn/include -I3rdparty/onednn/include -I/work/mxnet/include -I/work/mxnet/src -I/work/mxnet/3rdparty/tvm/nnvm/include -I/work/mxnet/3rdparty/tvm/include -I/work/mxnet/3rdparty/dmlc-core/include -I/work/mxnet/3rdparty/dlpack/include -I/work/mxnet/3rdparty/mshadow -I3rdparty/intgemm -I/work/mxnet/3rdparty/intgemm -I/work/mxnet/3rdparty/onednn/src/../include -I/work/mxnet/3rdparty/miniz -I3rdparty/dmlc-core/include -isystem /usr/include/opencv4 -isystem /usr/local/cuda/include -D_GLIBCXX_ASSERTIONS  -Wall -Wno-sign-compare -O3 -g -fopenmp -O2 -g -DNDEBUG -fPIC   -Werror -Wno-error=unused-variable -Wno-unused-parameter -Wno-unknown-pragmas -Wno-unused-local-typedefs -msse3 -mf16c -fopenmp -std=gnu++1z -MD -MT CMakeFiles/mxnet.dir/src/operator/numpy/np_kron.cc.o -MF CMakeFiles/mxnet.dir/src/operator/numpy/np_kron.cc.o.d -o CMakeFiles/mxnet.dir/src/operator/numpy/np_kron.cc.o -c /work/mxnet/src/operator/numpy/np_kron.cc

[2021-03-21T15:52:36.650Z] g++-7: internal compiler error: Killed (program cc1plus)

@szha
Copy link
Copy Markdown
Member

szha commented Mar 21, 2021

@khaotik that might have been the host running out of memory.

There are some additional errors related to AMP: https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fcentos-gpu/detail/PR-20065/1/pipeline/#step-170-log-891.

The way to fix it is to include the new operators in this list, which signifies that when AMP is enabled, these operators need not to be casted.

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test and removed pr-work-in-progress PR is still work in progress labels Mar 22, 2021
@lanking520 lanking520 added pr-work-in-progress PR is still work in progress and removed pr-awaiting-testing PR is reviewed and waiting CI build and test labels Mar 22, 2021
@khaotik
Copy link
Copy Markdown
Contributor Author

khaotik commented Mar 22, 2021

Here are failed tests:

tests/python/unittest/test_operator.py::test_layer_norm
tests/python/unittest/test_operator.py::test_norm

I can't see immediate relationship with my PR here. Do they just fail at random?

@mxnet-bot run ci [centos-cpu, unix-cpu]

@mxnet-bot
Copy link
Copy Markdown

Jenkins CI successfully triggered : [centos-cpu, unix-cpu]

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-awaiting-review PR is waiting for code review and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Mar 22, 2021
@szha
Copy link
Copy Markdown
Member

szha commented Mar 22, 2021

Here are failed tests:


tests/python/unittest/test_operator.py::test_layer_norm

tests/python/unittest/test_operator.py::test_norm

I can't see immediate relationship with my PR here. Do they just fail at random?

@mxnet-bot run ci [centos-cpu, unix-cpu]

Yes I believe those are just flaky and we can overcome with rerunning the failed pipelines for now.

@mxnet-bot
Copy link
Copy Markdown

Jenkins CI successfully triggered : [unix-cpu, centos-cpu]

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-awaiting-review PR is waiting for code review pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress labels Mar 22, 2021
@khaotik
Copy link
Copy Markdown
Contributor Author

khaotik commented Mar 22, 2021

@mxnet-bot run ci [centos-cpu]

@mxnet-bot
Copy link
Copy Markdown

Jenkins CI successfully triggered : [centos-cpu]

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-awaiting-review PR is waiting for code review and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Mar 22, 2021
Copy link
Copy Markdown
Member

@szha szha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@szha szha merged commit 059a055 into apache:master Apr 30, 2021
@szha
Copy link
Copy Markdown
Member

szha commented Apr 30, 2021

@khaotik thanks for the contribution!

@pribadihcr
Copy link
Copy Markdown

Hi, How to enable AdaBelief or AdamW in c++ here https://github.com/apache/mxnet/blob/v2.0.0.beta1/cpp-package/include/mxnet-cpp/optimizer.hpp. Thanks

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

pr-awaiting-review PR is waiting for code review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants