Thread sanitizer tests in the CI pipeline#2068
Conversation
This reverts commit f946c26.
jblueh
left a comment
There was a problem hiding this comment.
Thread sanitizer tests are now executed as part of the CI pipeline. With the tweaks explained in the code comments, the time spent on thread sanitizer builds and tests is right now approximately
| job | time |
|---|---|
build BaseOMP-tsan |
20min |
build ReverseOMP-tsan |
45min |
run hybrid_regression.py with tsan |
42min |
run hybrid_regression_AD.py with tsan |
12min |
which is reasonable, I would say, given that thread sanitizer runs are expected to be slower and that some of the other build and test jobs take around 20min as well.
I will update the documentation on containers with thread sanitizer specifics when this is merged.
| /*--- Use a thread-sanitizer dependent loop schedule to work around suspected false positives ---*/ | ||
| #ifndef __SANITIZE_THREAD__ | ||
| #define CPHYSGEO_PARFOR SU2_OMP_FOR_DYN(roundUpDiv(nPoint, 2 * omp_get_max_threads())) | ||
| #else | ||
| #define CPHYSGEO_PARFOR SU2_OMP_FOR_() | ||
| #endif | ||
|
|
||
| #define END_CPHYSGEO_PARFOR END_SU2_OMP_FOR | ||
|
|
There was a problem hiding this comment.
With a dynamic loop schedule, there were thread sanitizer findings in optimized debug builds that I suspect to be false positives. This changes the loop schedule if the thread sanitizer is used.
| cosine_gust.test_iter = 79 | ||
| cosine_gust.test_vals = [-2.418813, 0.004650, -0.001878, -0.000637, -0.000271] | ||
| cosine_gust.unsteady = True | ||
| cosine_gust.enabled_with_tsan = False |
There was a problem hiding this comment.
Tests can be disabled for thread sanitizer testing. I disabled all tests that took longer than 10 minutes to run.
| new_iter = self.test_iter + 1 | ||
|
|
||
| if running_with_tsan: | ||
|
|
||
| # detect restart | ||
| restart_iter = 0 | ||
| for line in lines: | ||
| if line.strip().split("=")[0].strip() == "RESTART_ITER": | ||
| restart_iter = int(line.strip().split("=")[1].strip()) | ||
| break | ||
|
|
||
| if new_iter > restart_iter + 2: | ||
| new_iter = restart_iter + 2 | ||
|
|
There was a problem hiding this comment.
Thread sanitzer tests run at most two iterations, to reduce the overall runtime cost.
| try: | ||
| fromdate = time.ctime(os.stat(fromfile).st_mtime) | ||
| fromlines = open(fromfile, 'U').readlines() | ||
| if not running_with_tsan: # thread sanitizer tests only check the return code, no need to compare outputs |
There was a problem hiding this comment.
A test running with the thread sanitizer stops with a non-zero return code as soon as a data race is detected, the actual test output is not parsed.
| - config_set: BaseOMP-tsan | ||
| flags: '--buildtype=debugoptimized -Dwith-omp=true -Denable-mixedprec=true -Denable-tecio=false --warnlevel=3' | ||
| - config_set: ReverseOMP-tsan | ||
| flags: '--buildtype=debugoptimized -Denable-autodiff=true -Denable-normal=false -Dwith-omp=true -Denable-mixedprec=true -Denable-tecio=false --warnlevel=3' |
There was a problem hiding this comment.
Optimized debug builds have reasonable runtime and stack traces that are still useful if a data race is detected. This is in line with the recommendations.
I had to remove the -werror, there is a warning about control reaching the end of the non-void function CMultizoneDriver::Monitor(). IIRC a fix for this is pending somewhere?
There was a problem hiding this comment.
The -werror can be readded there 👍
pcarruscag
left a comment
There was a problem hiding this comment.
Very nice addition 👍 you can leave the conversations where you documented the main aspects unresolved and I will merge as admin.
Proposed Changes
To simplify the maintenance of hybrid parallel SU2, we want to run thread sanitizer tests as part of the CI pipeline. We use thread-sanitizer enabled containers for this. This is work in progress and there are things that need testing, so I want to take it step by step while observing the behaviour of the CI pipeline.
Related Work
su2code/Docker-Builds#17 and follow-on PRs, #1679
PR Checklist