This repository was archived by the owner on Nov 17, 2023. It is now read-only.
[v.1x] Attempt to fix v1.x cd by installing new cuda compt package #19959
Merged
Zha0q1 merged 7 commits into apache:v1.x on Feb 26, 2021
Hey @Zha0q1, thanks for submitting the PR.
CI supported jobs: [windows-cpu, clang, edge, unix-cpu, centos-cpu, centos-gpu, website, windows-gpu, miscellaneous, unix-gpu, sanity]
Contributor
With this change, are we using CUDA 11.2 inside the cu110 and cu102 Docker containers?
Contributor
Author
No, we essentially just install the latest CUDA compatibility package. This should not affect the build, though.
Zha0q1 added a commit to Zha0q1/incubator-mxnet that referenced this pull request on Feb 26, 2021:
…pache#19959)
* update cude compt for cd
* Update Dockerfile.build.ubuntu_gpu_cu102
* Update Dockerfile.build.ubuntu_gpu_cu102
* Update Dockerfile.build.ubuntu_gpu_cu110
* Update runtime_functions.sh
* Update Dockerfile.build.ubuntu_gpu_cu110
* Update Dockerfile.build.ubuntu_gpu_cu102
Zha0q1 added a commit that referenced this pull request on Mar 2, 2021:
* [v1.x] Migrate to use ECR as docker cache instead of dockerhub (#19654)
* [v1.x] Update CI build scripts to install python 3.6 from deadsnakes repo (#19788)
* Install python3.6 from deadsnakes repo, since 3.5 is EOL'd and get-pip.py no longer works with 3.5.
* Set symlink for python3 to point to newly installed 3.6 version.
* Setting symlink or using update-alternatives causes add-apt-repository to fail, so instead just set alias in environment to call the correct python version.
* Setup symlinks in /usr/local/bin, since it comes first in the path.
* Don't use absolute path for python3 executable, just use python3 from path.
Co-authored-by: Joe Evans <joeev@amazon.com>
* Disable unix-gpu-cu110 pipeline for v1.x build since we now build with cuda 11.0 in windows pipelines. (#19828)
Co-authored-by: Joe Evans <joeev@amazon.com>
* [v1.x] For ECR, ensure we sanitize region input from environment variable (#19882)
* Set default for cache_intermediate.
* Make sure we sanitize region extracted from registry, since we pass it to os.system.
Co-authored-by: Joe Evans <joeev@amazon.com>
* [v1.x] Address CI failures with docker timeouts (v2) (#19890)
* Add random sleep only, since retry attempts are already implemented.
* Reduce random sleep to 2-10 sec.
Co-authored-by: Joe Evans <joeev@amazon.com>
* [v1.x] CI fixes to make more stable and upgradable (#19895)
* Test moving pipelines from p3 to g4.
* Remove fallback codecov command - the existing (first) command works and the second always fails a few times before finally succeeding (and also doesn't support the -P parameter, which causes an error.)
* Stop using docker python client, since it still doesn't support latest nvidia 'gpus' attribute. Switch to using subprocess calls using list parameter (to avoid shell injections). See docker/docker-py#2395
* Remove old files.
* Fix comment
* Set default environment variables
* Fix GPU syntax.
* Use subprocess.run and redirect output to stdout, don't run docker in interactive mode.
* Check if codecov works without providing parameters now.
* Send docker stderr to sys.stderr
* Support both nvidia-docker configurations, first try '--gpus all', and if that fails, then try '--runtime nvidia'.
Co-authored-by: Joe Evans <joeev@amazon.com>
* fix cd
* fix cudnn version for cu10.2 buiuld
* WAR the dataloader issue with forked processes holding stale references (#19924)
* skip some tests
* fix ski[
* [v.1x] Attempt to fix v1.x cd by installing new cuda compt package (#19959)
* update cude compt for cd
* Update Dockerfile.build.ubuntu_gpu_cu102
* Update Dockerfile.build.ubuntu_gpu_cu102
* Update Dockerfile.build.ubuntu_gpu_cu110
* Update runtime_functions.sh
* Update Dockerfile.build.ubuntu_gpu_cu110
* Update Dockerfile.build.ubuntu_gpu_cu102
* update command
Co-authored-by: Joe Evans <joseph.evans@gmail.com>
Co-authored-by: Joe Evans <joeev@amazon.com>
Co-authored-by: Joe Evans <github@250hacks.net>
Co-authored-by: Przemyslaw Tredak <ptredak@nvidia.com>
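The commit message above mentions two CI changes that are easy to picture in code: invoking docker through subprocess with a list of arguments (so no shell parses the command), and trying '--gpus all' first, then '--runtime nvidia' if that fails. The following is only a minimal sketch of that idea, not the repository's actual build script; the function names, the `--rm` flag, and the injectable `runner` parameter are assumptions made for illustration.

```python
import subprocess
import sys
from typing import Callable, List, Sequence

# The two GPU flag variants come from the commit message; everything else
# in this sketch (names, '--rm', the runner hook) is hypothetical.
GPU_FLAG_VARIANTS = (["--gpus", "all"], ["--runtime", "nvidia"])


def build_docker_cmd(image: str, command: Sequence[str],
                     gpu_flags: Sequence[str]) -> List[str]:
    """Build the docker invocation as an argument list, never a shell string."""
    return ["docker", "run", "--rm", *gpu_flags, image, *command]


def _run(cmd: List[str]) -> int:
    # A list argument without shell=True means no shell interprets the input,
    # which is what avoids shell injection.
    result = subprocess.run(cmd, stdout=sys.stdout, stderr=sys.stderr)
    return result.returncode


def run_with_gpu_fallback(image: str, command: Sequence[str],
                          runner: Callable[[List[str]], int] = _run) -> int:
    """Try each GPU flag variant in order and stop at the first success.

    Simplification: any non-zero exit triggers the fallback, including a
    failure of the containerized command itself.
    """
    returncode = 1
    for gpu_flags in GPU_FLAG_VARIANTS:
        returncode = runner(build_docker_cmd(image, command, gpu_flags))
        if returncode == 0:
            break
    return returncode
```

Passing a fake `runner` makes the fallback order checkable without docker installed; a real script would probably want to distinguish "flag not supported" from "command failed" before retrying.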
Zha0q1 added a commit to Zha0q1/incubator-mxnet that referenced this pull request on Mar 2, 2021:
…pache#19959)
* update cude compt for cd
* Update Dockerfile.build.ubuntu_gpu_cu102
* Update Dockerfile.build.ubuntu_gpu_cu102
* Update Dockerfile.build.ubuntu_gpu_cu110
* Update runtime_functions.sh
* Update Dockerfile.build.ubuntu_gpu_cu110
* Update Dockerfile.build.ubuntu_gpu_cu102
mseth10 pushed a commit to mseth10/incubator-mxnet that referenced this pull request on Mar 15, 2021:
* [v1.x] Migrate to use ECR as docker cache instead of dockerhub (apache#19654)
* [v1.x] Update CI build scripts to install python 3.6 from deadsnakes repo (apache#19788)
* Install python3.6 from deadsnakes repo, since 3.5 is EOL'd and get-pip.py no longer works with 3.5.
* Set symlink for python3 to point to newly installed 3.6 version.
* Setting symlink or using update-alternatives causes add-apt-repository to fail, so instead just set alias in environment to call the correct python version.
* Setup symlinks in /usr/local/bin, since it comes first in the path.
* Don't use absolute path for python3 executable, just use python3 from path.
Co-authored-by: Joe Evans <joeev@amazon.com>
* Disable unix-gpu-cu110 pipeline for v1.x build since we now build with cuda 11.0 in windows pipelines. (apache#19828)
Co-authored-by: Joe Evans <joeev@amazon.com>
* [v1.x] For ECR, ensure we sanitize region input from environment variable (apache#19882)
* Set default for cache_intermediate.
* Make sure we sanitize region extracted from registry, since we pass it to os.system.
Co-authored-by: Joe Evans <joeev@amazon.com>
* [v1.x] Address CI failures with docker timeouts (v2) (apache#19890)
* Add random sleep only, since retry attempts are already implemented.
* Reduce random sleep to 2-10 sec.
Co-authored-by: Joe Evans <joeev@amazon.com>
* [v1.x] CI fixes to make more stable and upgradable (apache#19895)
* Test moving pipelines from p3 to g4.
* Remove fallback codecov command - the existing (first) command works and the second always fails a few times before finally succeeding (and also doesn't support the -P parameter, which causes an error.)
* Stop using docker python client, since it still doesn't support latest nvidia 'gpus' attribute. Switch to using subprocess calls using list parameter (to avoid shell injections). See docker/docker-py#2395
* Remove old files.
* Fix comment
* Set default environment variables
* Fix GPU syntax.
* Use subprocess.run and redirect output to stdout, don't run docker in interactive mode.
* Check if codecov works without providing parameters now.
* Send docker stderr to sys.stderr
* Support both nvidia-docker configurations, first try '--gpus all', and if that fails, then try '--runtime nvidia'.
Co-authored-by: Joe Evans <joeev@amazon.com>
* fix cd
* fix cudnn version for cu10.2 buiuld
* WAR the dataloader issue with forked processes holding stale references (apache#19924)
* skip some tests
* fix ski[
* [v.1x] Attempt to fix v1.x cd by installing new cuda compt package (apache#19959)
* update cude compt for cd
* Update Dockerfile.build.ubuntu_gpu_cu102
* Update Dockerfile.build.ubuntu_gpu_cu102
* Update Dockerfile.build.ubuntu_gpu_cu110
* Update runtime_functions.sh
* Update Dockerfile.build.ubuntu_gpu_cu110
* Update Dockerfile.build.ubuntu_gpu_cu102
* update command
Co-authored-by: Joe Evans <joseph.evans@gmail.com>
Co-authored-by: Joe Evans <joeev@amazon.com>
Co-authored-by: Joe Evans <github@250hacks.net>
Co-authored-by: Przemyslaw Tredak <ptredak@nvidia.com>
We updated the AMI of the restricted-mxnetlinux-gpu nodes to use the new NVIDIA 460 driver, so we also need to install the new CUDA compatibility package so that GPUs can be found during the test stage. (The old compatibility package won't work with the new NVIDIA driver.)
#19948
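Since the stated goal is that GPUs can be found during the test stage, a quick first-level check inside the container could look like the sketch below. It is not part of this PR: the function names are hypothetical, and it assumes `nvidia-smi` is on the PATH and Python 3.7+ (for `capture_output`). It only confirms that the driver exposes devices to the container; it does not prove that the CUDA compatibility libraries match the driver.

```python
import re
import shutil
import subprocess
from typing import List

# `nvidia-smi -L` prints one line per device, for example:
#   GPU 0: Tesla T4 (UUID: GPU-...)
_GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (.+?)\)$")


def parse_gpu_list(output: str) -> List[str]:
    """Return the device names found in `nvidia-smi -L` output."""
    names = []
    for line in output.splitlines():
        match = _GPU_LINE.match(line.strip())
        if match:
            names.append(match.group(2))
    return names


def visible_gpus() -> List[str]:
    """List GPUs visible to the driver, or [] if none can be queried."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(["nvidia-smi", "-L"],
                            capture_output=True, text=True)
    if result.returncode != 0:
        return []
    return parse_gpu_list(result.stdout)
```

A test stage could call `visible_gpus()` up front and fail fast with a clear message when the list is empty, instead of failing later inside a GPU test.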