Conversation
|
Hey @mseth10 , Thanks for submitting the PR
CI supported jobs: [edge, miscellaneous, unix-cpu, centos-gpu, website, sanity, windows-gpu, unix-gpu, clang, centos-cpu, windows-cpu] Note: |
|
|
||
| set(CUDACXX "/usr/local/cuda-10.0/bin/nvcc" CACHE STRING "Cuda compiler") | ||
| set(MXNET_CUDA_ARCH "3.0;5.0;6.0;7.0;7.5" CACHE STRING "Cuda architectures") | ||
| set(MXNET_CUDA_ARCH "3.0;5.0;6.0;7.0" CACHE STRING "Cuda architectures") |
There was a problem hiding this comment.
dropping support for 7.5 cuda arch is it favorable?
For eg for Tesla T4 [G4 instances] cuda arch supported is 7.5
@leezu What do you think?
There was a problem hiding this comment.
It's a temporary fix to get the CD working, I checked the builds for cu100 and cu102, both fail because of binary size issues. We should work on adding back 7.5 arch after making sure the build works.
There was a problem hiding this comment.
Sidenote: 7.0 binaries also run on 7.5
| if platform.system() == 'Darwin': | ||
| shutil.copytree(os.path.join(CURRENT_DIR, 'mxnet-build/3rdparty/mkldnn/build/install/include'), | ||
| os.path.join(CURRENT_DIR, 'mxnet/include/mkldnn')) | ||
| shutil.copytree(os.path.join(CURRENT_DIR, 'mxnet-build/3rdparty/mkldnn/build/install/include'), |
There was a problem hiding this comment.
Curious how
- previously when nightly CD was working, why was mkldnn include done only for Darwin
- now, to fix nightly CD, this needs to be done for all OS
There was a problem hiding this comment.
This platform condition was added by mistake in some earlier commit. We are fixing that here. No reason to package dnnl header files for only Darwin.
|
Also @mseth10 do you mind updating the description [removing extraneous content] and also adding the Jenkins Pipeline build where you tested it out or a command you used to reproduce & test this fix. Thanks |
|
@ChaiBapchya updated the description. Check it out. |
|
Retriggered one previously broken pipeline as http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/restricted-mxnet-cd%2Fmxnet-cd-release-job/detail/mxnet-cd-release-job/1048/pipeline |
…he#18205) * use cmake for cd static build, skip running kvstore tests * update dnnl headers stash location * remove unnecessary platform condition * remove 7.5 arch for cu100, cu101, cu102 Co-authored-by: Ubuntu <ubuntu@ip-172-31-3-62.us-west-2.compute.internal>
Description
This PR fixes nightly CD GPU tests by updating the build toolchain to use cmake static build and updating dnnl headers stash location. It removes 7.5 arch for cu100, cu101, cu102 builds to solve oversized libmxnet.so binary issue with cmake builds.
It also fixes the issue with dnnl headers packaging into nightly build artifacts. Fixes #18120
Here's the link to a broken CD pipeline: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/restricted-mxnet-cd%2Fmxnet-cd-release-job/detail/mxnet-cd-release-job/1024/pipeline/
Commands used to reproduce and test the fix: