Fix #2546: Implemented ADT-based probe search and batched AllReduce #2679

guptapratykshh · 2026-01-04T21:03:19Z

Proposed Changes

This PR addresses the performance bottleneck experienced when using a large number of probes (e.g., >100) in parallel simulations. The previous implementation used a brute-force O(N_points) linear search per probe to find the nearest grid point. The resulting O(N_probes * N_points) total cost caused significant slowdowns.

The changes include:

ADT-Based Nearest Neighbor Search: Implemented an Alternating Digital Tree (ADT) strategy for probe location. This reduces search complexity to O(log(N_points)) per probe. A heuristic is used to switch to ADT only when the number of probes exceeds a threshold (default: 10).
Batched Communication: Consolidated the AllReduce operations for probe values. Instead of performing one MPI reduction per probe, all probe values are now collected and reduced in a single batched operation at the end of the output routine, significantly reducing MPI overhead.
Regression Testing: Added a new regression test case (test_11_probes.cfg) to parallel_regression.py that specifically exercises the ADT path (11 probes) and verifies the correctness of the probe output against known values.
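
The two changes above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in CFlowOutput.cpp: a k-d tree stands in for SU2's ADT, the MPI reduction is simulated with lists (one buffer per rank, where the owning rank holds the probe value and the others hold zero), and every name (ADT_THRESHOLD, locate_probes, allreduce_sum, ...) is hypothetical. Only the threshold of 10 comes from the description.

```python
import math

ADT_THRESHOLD = 10  # hypothetical name; 10 is the default quoted in the PR text


def brute_force_nearest(points, probe):
    """Linear scan: index of the grid point closest to `probe`."""
    return min(range(len(points)), key=lambda i: math.dist(points[i], probe))


def build_tree(points, indices=None, depth=0):
    """Build a k-d tree (stand-in for the ADT) as nested tuples."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])
    indices.sort(key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return (indices[mid], axis,
            build_tree(points, indices[:mid], depth + 1),
            build_tree(points, indices[mid + 1:], depth + 1))


def tree_nearest(node, points, probe, best=None):
    """Return (distance, index) of the nearest point, pruning far subtrees."""
    if node is None:
        return best
    idx, axis, left, right = node
    candidate = (math.dist(points[idx], probe), idx)
    if best is None or candidate < best:
        best = candidate
    delta = probe[axis] - points[idx][axis]
    near, far = (left, right) if delta < 0 else (right, left)
    best = tree_nearest(near, points, probe, best)
    if abs(delta) <= best[0]:  # splitting plane within reach: check the other side
        best = tree_nearest(far, points, probe, best)
    return best


def locate_probes(points, probes, threshold=ADT_THRESHOLD):
    """Use the tree only when there are more probes than `threshold`."""
    if len(probes) <= threshold:
        return [brute_force_nearest(points, p) for p in probes]
    tree = build_tree(points)
    return [tree_nearest(tree, points, p)[1] for p in probes]


def allreduce_sum(rank_buffers, call_counter):
    """Simulated sum-AllReduce: element-wise sum over the per-rank buffers."""
    call_counter[0] += 1
    return [sum(column) for column in zip(*rank_buffers)]


def reduce_per_probe(rank_buffers, call_counter):
    """Old pattern: one reduction call for every probe."""
    n_probes = len(rank_buffers[0])
    return [allreduce_sum([[buf[p]] for buf in rank_buffers], call_counter)[0]
            for p in range(n_probes)]


def reduce_batched(rank_buffers, call_counter):
    """New pattern: a single reduction call for the whole probe buffer."""
    return allreduce_sum(rank_buffers, call_counter)
```

With 11 probes, locate_probes takes the tree path and should return the same indices as the linear scan. With a single probe it stays on the linear scan. The batched reduction returns the same values as the per-probe version while issuing one call instead of one per probe.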

Related Work

Fixes issue #2546 (Probe performance bottleneck).

PR Checklist

I am submitting my contribution to the develop branch.
My contribution generates no new compiler warnings (try with --warnlevel=3 when using meson).
My contribution is commented and consistent with SU2 style (https://su2code.github.io/docs_v7/Style-Guide/).
I used the pre-commit hook to prevent dirty commits and used pre-commit run --all to format old commits.
I have added a test case that demonstrates my contribution, if necessary.
I have updated appropriate documentation (Tutorials, Docs Page, config_template.cpp), if necessary.

guptapratykshh · 2026-01-04T21:05:52Z

Please take a look at this PR. In the last PR, all of the changes to CFlowOutput.cpp were removed by mistake while removing the formatting changes. @pcarruscag

pcarruscag

Thanks, looks good now.

TestCases/parallel_regression.py

SU2_CFD/src/output/CFlowOutput.cpp

bigfooted · 2026-01-06T08:29:08Z

@guptapratykshh can you update the regression values in parallel_regression.py (after you've checked that everything works as intended)?

pcarruscag · 2026-01-06T16:00:37Z

Are you using AI to review PRs @bigfooted?
Why would this PR change the probe value @guptapratykshh?!

pcarruscag

Unexplained regressions

guptapratykshh · 2026-01-07T09:03:46Z

The values I initially committed for this new test case (probe_performance_11) were incorrect: the probe values were accidentally taken from Iteration 0 (initialization), while the RMS density was from the last iteration.

I updated parallel_regression.py to consistently check the values at the final iteration (Iter 3), ensuring the test actually validates the simulation result.

pcarruscag · 2026-01-07T15:33:56Z

The regression values for test flatplate_udobj changed. This case only uses one probe, so its results should not be affected by the ADT change. I suspect some kind of memory error or undefined behavior.
Please determine what is introducing the discrepancy.

TestCases/parallel_regression.py

Fix #2546: Implemented ADT-based probe search and batched AllReduce

72c3227

pcarruscag approved these changes Jan 5, 2026

TestCases/parallel_regression.py (outdated, resolved)

SU2_CFD/src/output/CFlowOutput.cpp (outdated, resolved)

guptapratykshh and others added 2 commits January 5, 2026 10:25

Address review: Refactor duplicate code and cleanup tests

50cbf62

Merge branch 'develop' into fix/probe-performance-clean

1894c61

bigfooted added the changelog:fix label Jan 5, 2026

pcarruscag reviewed Jan 5, 2026

SU2_CFD/src/output/CFlowOutput.cpp (outdated, resolved)

Update SU2_CFD/src/output/CFlowOutput.cpp

3af399c

pcarruscag reviewed Jan 5, 2026

TestCases/parallel_regression.py (outdated, resolved)

SU2_CFD/src/output/CFlowOutput.cpp (outdated, resolved)

Apply suggestions from code review

8d3ddca

pcarruscag reviewed Jan 5, 2026

SU2_CFD/src/output/CFlowOutput.cpp (resolved)

Update SU2_CFD/src/output/CFlowOutput.cpp

542572e

guptapratykshh added 2 commits January 6, 2026 15:26

Update regression values for probe test case (Iter 3)

3a43e2a

Merge branch 'develop' into fix/probe-performance-clean

5af1898

pcarruscag requested changes Jan 6, 2026

Merge branch 'develop' into fix/probe-performance-clean

81b87f6

guptapratykshh added 3 commits January 7, 2026 21:33

Merge branch 'develop' into fix/probe-performance-clean

8713ad1

Merge branch 'develop' into fix/probe-performance-clean

f3ce25f

Refactor probe value handling and memory allocation

a256d39

guptapratykshh requested a review from pcarruscag January 10, 2026 17:43

pcarruscag approved these changes Jan 10, 2026

pcarruscag reviewed Jan 10, 2026

TestCases/parallel_regression.py (outdated, resolved)

Update TestCases/parallel_regression.py

43ab0e6

pcarruscag merged commit b84f237 into su2code:develop Jan 10, 2026
2 checks passed
