Add tests for the NVIDIA GPU tests #433
}

func verifyNvidiaInstallation(c cluster.TestCluster) {
	m := c.Machines()[0]
With
-	m := c.Machines()[0]
+	if kola.AzureOptions.Size != "Standard_NC6s_v3" {
+		c.Skip("skipping due to wrong instance size")
+	}
+	m := c.Machines()[0]
I think we could do this in scripts:
diff --git a/ci-automation/vendor-testing/azure.sh b/ci-automation/vendor-testing/azure.sh
index 17081b3598..c57210d651 100755
--- a/ci-automation/vendor-testing/azure.sh
+++ b/ci-automation/vendor-testing/azure.sh
@@ -74,7 +74,7 @@ query_kola_tests() {
     kola list --platform=azure --filter "${@}"
 }
 
-other_instance_types=()
+other_instance_types=('Standard_NC6s_v3')
 if [[ "${CIA_ARCH}" = 'amd64' ]]; then
     other_instance_types+=('V1')
 fi
@@ -85,6 +85,6 @@ run_kola_tests_on_instances \
     "${CIA_FIRST_RUN}" \
     "${other_instance_types[@]}" \
     '--' \
-    'cl.internet' \
+    'cl.internet' 'cl.misc.nvidia' \
     '--' \
     "${@}"
If we want to keep this check, we could even move it to the SkipFunc (as kola.AzureOptions.Size should be accessible)
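For illustration, the instance-size check could be pulled out into a small predicate along these lines. This is only a sketch: `azureOptions` is a hypothetical stand-in for `kola.AzureOptions`, and the function signature is an assumption, not kola's actual SkipFunc type.

```go
package main

import "fmt"

// azureOptions is a hypothetical stand-in for kola.AzureOptions;
// only the Size field mentioned in the review is modeled here.
type azureOptions struct {
	Size string
}

// requiredSize is the GPU instance size named in the review.
const requiredSize = "Standard_NC6s_v3"

// skipOnNonGPUSize reports whether the test should be skipped.
// The signature is an assumption for illustration, not kola's
// real SkipFunc signature.
func skipOnNonGPUSize(opts azureOptions) bool {
	return opts.Size != requiredSize
}

func main() {
	// Standard_D2s_v3 is just an arbitrary non-GPU size for the demo.
	for _, size := range []string{"Standard_NC6s_v3", "Standard_D2s_v3"} {
		fmt.Printf("%s: skip=%v\n", size, skipOnNonGPUSize(azureOptions{Size: size}))
	}
}
```

Keeping the decision in one predicate means the test body itself no longer needs the `c.Skip` branch.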
811dc81 to 3def6fb
	// This test is to test the NVIDIA installation, limited to AZURE for now
	Platforms:     []string{"azure"},
	Architectures: []string{"amd64"},
	Flags:         []register.Flag{register.NoEnableSelinux},
Any chance to track the AVC message (for issue tracking and/or documentation purposes)?
Good idea! I will need to search the logs. If it's not there, I can trigger a local/Jenkins build to capture the AVC messages and track them in an issue.
You should see it in the journal logs (before disabling the SELinux flags)
Yes, I meant searching the journal logs, if they have not been deleted from bincache.
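As a rough sketch of that search, a saved journal dump could be filtered for AVC denials like this. Matching on `avc:` plus `denied` is a simple heuristic rather than a real audit-record parser, `avcDenials` is a hypothetical helper, and the sample lines are synthetic placeholders, not actual log output from this test.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// avcDenials returns the lines of a journal dump that look like
// SELinux AVC denials. Matching on "avc:" plus "denied" is a
// heuristic, not a full audit-record parser.
func avcDenials(journal string) []string {
	var hits []string
	sc := bufio.NewScanner(strings.NewReader(journal))
	for sc.Scan() {
		line := sc.Text()
		if strings.Contains(line, "avc:") && strings.Contains(line, "denied") {
			hits = append(hits, line)
		}
	}
	return hits
}

func main() {
	// Synthetic placeholder lines, for illustration only.
	sample := "example: unrelated message\n" +
		"example: avc:  denied  { placeholder }\n" +
		"example: another unrelated message\n"
	for _, l := range avcDenials(sample) {
		fmt.Println(l)
	}
}
```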
	out, err := c.SSH(m, "systemctl is-active nvidia.service")
	if !bytes.Contains(out, []byte("inactive")) {
		return fmt.Errorf("nvidia.service: %q: %v", out, err)
	}
Just FYI, you could do it this way:
-	out, err := c.SSH(m, "systemctl is-active nvidia.service")
-	if !bytes.Contains(out, []byte("inactive")) {
-		return fmt.Errorf("nvidia.service: %q: %v", out, err)
-	}
+	_, err := c.SSH(m, "systemctl --quiet is-active nvidia.service")
+	return err
We just want to assert whether the unit is active. With --quiet, the command prints nothing and simply exits with 0 if the unit is active.
There is also _ = c.MustSSH(…) which fails the test.
That is correct, but in this case we don't want to fail the test immediately, as we are in a retry loop, no?
Signed-off-by: Sayan Chowdhury <schowdhury@microsoft.com>
d04c5d0 to 4cd37d7
Add tests for the NVIDIA GPU tests
Testing done
changelog/ directory (user-facing change, bug fix, security fix, update)
/boot and /usr size, packages, list files for any missing binaries, kernel modules, config files, etc.