Fix #2546: Implemented ADT-based probe search and batched AllReduce #2679
|
Please look into this PR, while removing the formatting all the changes of the file ( |
|
@guptapratykshh can you update the regression values in parallel_regression.py (after you've checked that everything works as intended)? |
|
Are you using AI to review PRs @bigfooted? |
|
The values I initially committed for this new test case ( I updated parallel_regression.py to consistently check the values at the final iteration (Iter 3), ensuring the test actually validates the simulation result. |
|
The regression values for test |
Proposed Changes
This PR addresses the performance bottleneck that occurs when a large number of probes (e.g., >100) is used in parallel simulations. The previous implementation found the nearest grid point with a brute-force linear search over all N_points grid points for each probe, giving O(N_probes * N_points) total complexity and causing significant slowdowns.
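To illustrate the cost difference, here is a minimal Python sketch that compares a brute-force nearest-point scan with a tree-based search. It is not the code from this PR: a plain k-d tree stands in for the ADT, and all names (`brute_force_nearest`, `build_kdtree`, `kdtree_nearest`, `grid`, `probes`) are hypothetical.

```python
import random


def brute_force_nearest(points, query):
    """Scan every point: O(N_points) work per probe."""
    best_idx, best_d2 = -1, float("inf")
    for i, p in enumerate(points):
        d2 = sum((a - b) ** 2 for a, b in zip(p, query))
        if d2 < best_d2:
            best_idx, best_d2 = i, d2
    return best_idx


def build_kdtree(points, indices=None, depth=0):
    """Build a k-d tree once; each node is (point index, left, right)."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])
    indices.sort(key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return (
        indices[mid],
        build_kdtree(points, indices[:mid], depth + 1),
        build_kdtree(points, indices[mid + 1:], depth + 1),
    )


def kdtree_nearest(points, node, query, depth=0, best=None):
    """Return (squared distance, index) of the point nearest to query."""
    if node is None:
        return best
    idx, left, right = node
    d2 = sum((a - b) ** 2 for a, b in zip(points[idx], query))
    if best is None or d2 < best[0]:
        best = (d2, idx)
    axis = depth % len(query)
    diff = query[axis] - points[idx][axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = kdtree_nearest(points, near, query, depth + 1, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if diff * diff < best[0]:
        best = kdtree_nearest(points, far, query, depth + 1, best)
    return best


# Synthetic data standing in for grid points and probe locations.
random.seed(0)
grid = [tuple(random.random() for _ in range(3)) for _ in range(2000)]
probes = [tuple(random.random() for _ in range(3)) for _ in range(100)]

tree = build_kdtree(grid)  # built once, reused for every probe
tree_hits = [kdtree_nearest(grid, tree, q)[1] for q in probes]
brute_hits = [brute_force_nearest(grid, q) for q in probes]
```

The tree is built once and each query then prunes most of the grid, which is the general reason a spatial tree scales better than a per-probe scan when the probe count grows.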
The changes include:
AllReduce operations for probe values. Instead of performing one MPI reduction per probe, all probe values are now collected and reduced in a single batched operation at the end of the output routine, significantly reducing MPI overhead.
Related Work
Fixes issue #2546 (Probe performance bottleneck).
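The batched reduction described under Proposed Changes can be sketched in Python as follows. This is a simulation, not the PR's code: ranks are modeled as plain lists, `allreduce_sum` is a hypothetical stand-in for a sum-type MPI AllReduce, and the convention that each probe value is non-zero only on the rank that owns it is an assumption made for illustration.

```python
def allreduce_sum(per_rank_buffers, call_counter):
    """Stand-in for a sum AllReduce: element-wise sum across simulated ranks.

    Each call represents one collective operation, so it bumps the counter.
    """
    call_counter[0] += 1
    return [sum(vals) for vals in zip(*per_rank_buffers)]


def reduce_per_probe(local_values):
    """Old pattern: one collective per probe."""
    calls = [0]
    result = []
    for p in range(len(local_values[0])):
        single = [[rank_vals[p]] for rank_vals in local_values]
        result.extend(allreduce_sum(single, calls))
    return result, calls[0]


def reduce_batched(local_values):
    """New pattern: pack all probe values into one buffer, reduce once."""
    calls = [0]
    return allreduce_sum(local_values, calls), calls[0]


# Three simulated ranks, four probes. Assumption: the owning rank holds the
# probe value and every other rank contributes 0.0.
local_values = [
    [1.5, 0.0, 0.0, 4.0],
    [0.0, 2.5, 0.0, 0.0],
    [0.0, 0.0, 3.5, 0.0],
]

per_probe, n_calls_per_probe = reduce_per_probe(local_values)
batched, n_calls_batched = reduce_batched(local_values)
```

Both variants produce the same values, but the number of collective calls drops from one per probe to one in total, so the per-call latency is paid once rather than N_probes times.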
PR Checklist
pre-commit run --all to format old commits.