Thread sanitizer tests in the CI pipeline #2068

jblueh · 2023-07-03T11:20:55Z

Proposed Changes

To simplify the maintenance of hybrid parallel SU2, we want to run thread sanitizer tests as part of the CI pipeline. We use thread-sanitizer enabled containers for this. This is work in progress and there are things that need testing, so I want to take it step by step while observing the behaviour of the CI pipeline.

Related Work

su2code/Docker-Builds#17 and follow-on PRs, #1679

PR Checklist

I am submitting my contribution to the develop branch.
My contribution generates no new compiler warnings (try with --warnlevel=3 when using meson).
My contribution is commented and consistent with SU2 style (https://su2code.github.io/docs_v7/Style-Guide/).
I have added a test case that demonstrates my contribution, if necessary.
I have updated appropriate documentation (Tutorials, Docs Page, config_template.cpp), if necessary.

jblueh

Thread sanitizer tests are now executed as part of the CI pipeline. With the tweaks explained in the code comments, the time currently spent on thread sanitizer builds and tests is approximately:

| job | time |
| --- | --- |
| build `BaseOMP-tsan` | 20 min |
| build `ReverseOMP-tsan` | 45 min |
| run `hybrid_regression.py` with tsan | 42 min |
| run `hybrid_regression_AD.py` with tsan | 12 min |

which is reasonable, I would say, given that thread sanitizer runs are expected to be slower and that some of the other build and test jobs take around 20min as well.

I will update the documentation on containers with thread sanitizer specifics when this is merged.

jblueh · 2023-07-18T08:45:44Z

Common/src/geometry/CPhysicalGeometry.cpp

+/*--- Use a thread-sanitizer dependent loop schedule to work around suspected false positives ---*/
+#ifndef __SANITIZE_THREAD__
+#define CPHYSGEO_PARFOR SU2_OMP_FOR_DYN(roundUpDiv(nPoint, 2 * omp_get_max_threads()))
+#else
+#define CPHYSGEO_PARFOR SU2_OMP_FOR_()
+#endif
+
+#define END_CPHYSGEO_PARFOR END_SU2_OMP_FOR
+


With a dynamic loop schedule, there were thread sanitizer findings in optimized debug builds that I suspect to be false positives. This changes the loop schedule if the thread sanitizer is used.

jblueh · 2023-07-18T08:47:35Z

TestCases/hybrid_regression.py

    cosine_gust.test_iter = 79
    cosine_gust.test_vals = [-2.418813, 0.004650, -0.001878, -0.000637, -0.000271]
    cosine_gust.unsteady  = True
+    cosine_gust.enabled_with_tsan = False


Tests can be disabled for thread sanitizer testing. I disabled all tests that took longer than 10 minutes to run.
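As a rough illustration of how such a flag can be used, the sketch below filters a test list on `enabled_with_tsan`. Only the attribute name comes from the diff above. The `SketchTestCase` class, the `select_tests` helper and the `flatplate` test name are hypothetical stand-ins, not the actual code in `TestCases/TestCase.py`.

```python
from dataclasses import dataclass


@dataclass
class SketchTestCase:
    """Hypothetical stand-in for the real TestCase class."""
    tag: str
    enabled_with_tsan: bool = True  # attribute name taken from the diff above


def select_tests(tests, running_with_tsan):
    """Return the tests to run; under tsan, skip those flagged as disabled."""
    if not running_with_tsan:
        return list(tests)
    return [t for t in tests if t.enabled_with_tsan]


# Illustrative test objects; "flatplate" is an assumed name.
cosine_gust = SketchTestCase("cosine_gust", enabled_with_tsan=False)
flatplate = SketchTestCase("flatplate")

selected = select_tests([cosine_gust, flatplate], running_with_tsan=True)
print([t.tag for t in selected])
```

Defaulting the flag to `True` means that only the slow tests need an explicit opt-out, which keeps the change to the regression scripts small.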

jblueh · 2023-07-18T08:51:21Z

TestCases/TestCase.py

+        new_iter = self.test_iter + 1
+
+        if running_with_tsan:
+
+          # detect restart
+          restart_iter = 0
+          for line in lines:
+              if line.strip().split("=")[0].strip() == "RESTART_ITER":
+                  restart_iter = int(line.strip().split("=")[1].strip())
+                  break
+
+          if  new_iter > restart_iter + 2:
+            new_iter = restart_iter + 2
+


Thread sanitizer tests run at most two iterations beyond any restart iteration, to reduce the overall runtime cost.
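The capping logic in the diff can be written as a small standalone function, which makes the restart handling easier to follow. This is a sketch under assumptions: the function name `capped_iteration` and its argument list are hypothetical, and the real code operates on attributes of the test case rather than on function arguments.

```python
def capped_iteration(test_iter, config_lines, running_with_tsan):
    """Return the iteration count to write into the test configuration.

    Without tsan this is test_iter + 1. With tsan it is additionally
    limited to two iterations past RESTART_ITER (0 if no restart is set).
    """
    new_iter = test_iter + 1
    if running_with_tsan:
        restart_iter = 0
        for line in config_lines:
            key, sep, value = line.partition("=")
            if sep and key.strip() == "RESTART_ITER":
                restart_iter = int(value.strip())
                break
        # min() also ensures that the iteration count never increases.
        new_iter = min(new_iter, restart_iter + 2)
    return new_iter


print(capped_iteration(79, ["RESTART_ITER= 70"], running_with_tsan=True))
print(capped_iteration(79, ["RESTART_ITER= 70"], running_with_tsan=False))
```

With a restart at iteration 70, the first call is limited to 72, while the second call keeps the regular value of 80.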

jblueh · 2023-07-18T08:53:19Z

TestCases/TestCase.py

-            try:
-                fromdate = time.ctime(os.stat(fromfile).st_mtime)
-                fromlines = open(fromfile, 'U').readlines()
+        if not running_with_tsan: # thread sanitizer tests only check the return code, no need to compare outputs


A test running with the thread sanitizer stops with a non-zero return code as soon as a data race is detected, so the actual test output is not parsed.
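A return-code-only check could look roughly like the sketch below. The helper name `passed_under_tsan` is hypothetical, and the child processes are plain Python one-liners that stand in for an instrumented solver run. The exit code 1 is just an arbitrary non-zero value.

```python
import subprocess
import sys


def passed_under_tsan(command):
    """Run a command and judge it by its return code alone.

    The output is discarded because no reference values are compared.
    """
    result = subprocess.run(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == 0


# Stand-ins for a clean run and for a run stopped by a detected data race.
clean_run = [sys.executable, "-c", "raise SystemExit(0)"]
racy_run = [sys.executable, "-c", "raise SystemExit(1)"]

print(passed_under_tsan(clean_run), passed_under_tsan(racy_run))
```

Skipping the output comparison also fits the reduced iteration count, since the reference values in the regression scripts belong to the original number of iterations.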

jblueh · 2023-07-18T09:02:17Z

.github/workflows/regression.yml

+          - config_set: BaseOMP-tsan
+            flags: '--buildtype=debugoptimized -Dwith-omp=true -Denable-mixedprec=true -Denable-tecio=false --warnlevel=3'
+          - config_set: ReverseOMP-tsan
+            flags: '--buildtype=debugoptimized -Denable-autodiff=true -Denable-normal=false -Dwith-omp=true -Denable-mixedprec=true -Denable-tecio=false --warnlevel=3'


Optimized debug builds have reasonable runtime and stack traces that are still useful if a data race is detected. This is in line with the recommendations.

I had to remove `--werror` because there is a warning about control reaching the end of the non-void function CMultizoneDriver::Monitor(). IIRC a fix for this is pending somewhere?

Yep, in #2011

`--werror` can be re-added there 👍

pcarruscag

Very nice addition 👍 you can leave the conversations where you documented the main aspects unresolved and I will merge as admin.

Update container image versions.

13a3b31

jblueh added the changelog:feature label Jul 3, 2023

jblueh added 6 commits July 3, 2023 17:03

Merge branch 'develop' into feature_tsan_ci

b99c2a4

Specify container images as part of the job matrix.

f946c26

Revert "Specify container images as part of the job matrix."

f578d93

This reverts commit f946c26.

Add separate job for tsan builds.

5da06cc

No tsan builds on ARM64 for now.

43b1035

Use non-ambiguous names.

f936984

jblueh added changelog:feature and removed changelog:feature labels Jul 4, 2023

jblueh added 2 commits July 4, 2023 11:34

Introduce --tsan argument for hybrid regression tests.

15ebb3b

Prepare python scripts for thread-sanitzer tests.

3fc49d5

jblueh mentioned this pull request Jul 4, 2023

Prepare test containers for thread sanitizer tests su2code/Docker-Builds#22

Merged

jblueh added 17 commits July 4, 2023 18:09

Update container image versions.

780fb2d

Define variables outside the if block.

ae3d799

Add thread sanitizer tests to the workflow.

f192be1

Fix dependency.

1a519cc

Skip thread sanitizer tests on ARM64 for now.

955be02

Add --tsan parameter.

177e889

Exclude pywrapper tests from tsan testing.

e0a6351

Disable hybrid AD tsan tests that take too long.

1e6e9d4

Exclude further pywrapper tests.

f95fadd

Disable hybrid tsan tests that take too long.

a981cac

Disable transonic_stator as well.

13307f0

Merge branch 'develop' into feature_tsan_ci

5e7b9b7

Deliberately introduce a data race for testing.

acf37bb

Silence warning.

8510417

Silence more warnings.

37e6bc1

Use tsan with debug builds.

96ad0bb

Disable ForwardOMP-tsan since it is not used right now.

dcaa3b6

jblueh added 19 commits July 7, 2023 11:39

Work around segfault.

7eea769

Merge branch 'develop' into feature_tsan_ci

188b49a

Fix.

bfce8b5

Don't treat warnings as errors for now.

d41a860

Remove deliberate data race.

8dba8a6

Use optimized debug builds.

8163706

Merge branch 'develop' into feature_tsan_ci

a28146a

Thread sanitizer builds don't need the python wrapper.

6241f69

Move if out of parallel region.

3fa5caa

Different loop schedule to fix suspected false positives.

8802c66

Formatting.

4366d5c

Work around syntax check.

1df0ea1

Formatting.

1835a84

Address compiler warning.

9ef63ef

Merge branch 'develop' into feature_tsan_ci

f52ba05

Introduce a thread-sanitizer dependent loop macro.

6d965b9

Default to two iterations with the thread sanitizer.

9facc26

Ensure that the number of iterations does not increase.

063ddf2

Regard restart iter.

88f474f

jblueh changed the title ~~[WIP] Thread sanitizer tests in the CI pipeline~~ Thread sanitizer tests in the CI pipeline Jul 18, 2023

jblueh commented Jul 18, 2023

pcarruscag reviewed Jul 18, 2023

.github/workflows/regression.yml

pcarruscag approved these changes Jul 18, 2023

jblueh added 2 commits July 18, 2023 17:20

Update container image in release action.

c6931d7

Merge branch 'develop' into feature_tsan_ci

b9a7777

pcarruscag merged commit 8f5d770 into develop Jul 18, 2023

pcarruscag deleted the feature_tsan_ci branch July 18, 2023 17:57

jblueh mentioned this pull request Mar 21, 2024

Address sanitizer tests in the CI pipeline #2246

Merged

6 tasks
