Hybrid parallel (OpenMP) support for incompressible solvers#1178
Hybrid parallel (OpenMP) support for incompressible solvers#1178pcarruscag merged 17 commits intodevelopfrom
Conversation
| StrainMag_Max = max(StrainMag_Max, strainMax); | ||
| Omega_Max = max(Omega_Max, omegaMax); | ||
| } | ||
|
|
There was a problem hiding this comment.
I suppose you do the reduction over the threads yourself because of efficiency, when the if statement is false, but wouldn't it be more readable to have a reduction clause in the OMP for loop?
There was a problem hiding this comment.
Thank you for having a look at the code Edwin, the issues with these kind of reductions were:
- The microsoft compilers do not support max/min reductions;
- For the code to compile with AD (forward) we would need to define custom reduction operators, which might not be very readable;
- Earlier discussions about OpenMP+AD (reverse) suggested that avoiding reduction clauses would make it easier to develop a thread-safe AD tool.
For performance the reduction clause would be more efficient, as the compiler can make min/max atomic, however I do not know of a nice simple way to implement atomic min/max.
| SU2_OMP_BARRIER | ||
| SU2_OMP_MASTER | ||
| { | ||
| SU2_OMP_MASTER { |
There was a problem hiding this comment.
For efficiency reasons then the Allreduce could be done in a single call with two arguments, as for both Omega and the Strain you compute the maximum.
| class CIncEulerSolver : public CFVMFlowSolverBase<CIncEulerVariable, INCOMPRESSIBLE> { | ||
| protected: | ||
| CFluidModel *FluidModel = nullptr; /*!< \brief fluid model used in the solver */ | ||
| vector<CFluidModel*> FluidModel; /*!< \brief fluid model used in the solver. */ |
There was a problem hiding this comment.
Don't you want to put the fluid model in the base class? CEulerSolver has exactly the same variable.
There was a problem hiding this comment.
It is a bit different for the NEMO solver, it also has a "FluidModel" but of a different type.
vdweide
left a comment
There was a problem hiding this comment.
The few questions I had have been answered, so this one looks good to me.
TobiKattmann
left a comment
There was a problem hiding this comment.
Thanks a lot for adding omp support to the inc solver 💐 and moving some reg test accordingly
And thanks for moving some functions to the FVMFlowSolverBase 👍
Much appreciated
| */ | ||
| class CEulerSolver : public CFVMFlowSolverBase<CEulerVariable, COMPRESSIBLE> { | ||
| protected: | ||
| using BaseClass = CFVMFlowSolverBase<CEulerVariable, COMPRESSIBLE>; |
There was a problem hiding this comment.
You need this to call an implementation of the base class later like BaseClass::SetInitialCondition in this very CEulerSolver. But SetInitialCondition is also public in the FVMBase class so it should be callable anyway because we inherit that via class CEulerSolver : public CFVMFlowSolverBase ... where am I wrong?
There was a problem hiding this comment.
If you call SetInitialCondition in SetInitialCondition you get an infinite loop, I just wanted to make more expressive I was calling the implementation of the parent first.
| inline virtual void Viscous_Residual(unsigned long iEdge, CGeometry *geometry, CSolver **solver_container, | ||
| CNumerics *numerics, CConfig *config) { } | ||
| void Viscous_Residual_impl(unsigned long iEdge, CGeometry *geometry, CSolver **solver_container, | ||
| CNumerics *numerics, CConfig *config); | ||
| using CSolver::Viscous_Residual; /*--- Silence warning ---*/ |
There was a problem hiding this comment.
The idea behind this to be able to 'fake' override Viscous_Residual for viscous flows with just calling Viscous_Residual_impl because for inviscid flow the empty implementation from here is taken. This trick has to be used because FVMBase -> Euler -> NS is the inheritance and Viscous_Residual itself cannot be overloaded because Euler is the middleman?
And what happens to the call in CIntegration to Viscous_Residual for NS flow. Is there a doubled contribution then because it is called there and in the centered/upwind residual?
There was a problem hiding this comment.
Well, in Cintegration the empty impl of CSolver is used and in the centered/upwind residual the correct Viscous_Residual impl with the Viscous_Residual_impl init ...
Seems that i have to go back and read a couple of pages in 'Inheritance for complete morons'. fml
There was a problem hiding this comment.
Yep that's all correct.
There was a problem hiding this comment.
Except the self flagellation part, no need for that xD
| * \note The convective residual methods include a call to this for each edge, | ||
| * this allows convective and viscous loops to be "fused". |
There was a problem hiding this comment.
Is there a speed-up due to this? Or is this just for simplification?
There was a problem hiding this comment.
Fast code goes brrrrrrrrrrrrrrrrrrrr
| template <class V, ENUM_REGIME R> | ||
| void CFVMFlowSolverBase<V, R>::LoadRestart(CGeometry **geometry, CSolver ***solver, | ||
| CConfig *config, int iter, bool update_geo) { | ||
| LoadRestart_impl(geometry, solver, config, iter, update_geo); | ||
| } |
There was a problem hiding this comment.
Can you explain why having the LoadRestart_impl is beneficial? The LoadRestart moved from the Euler and IncEuler to FVMFlowSolverBase so that is good but why the additional wrapper function
| * \return Value of the pressure at the infinity. | ||
| */ | ||
| inline CFluidModel* GetFluidModel(void) const final { return FluidModel;} | ||
| inline CFluidModel* GetFluidModel(void) const final { return FluidModel[omp_get_thread_num()]; } |
There was a problem hiding this comment.
Wait, I guess (real) shared-memory is not viable for the FluidModel as it is filled and computes stuff on the fly so each thread gets its own?
| } | ||
| } | ||
| } | ||
| BaseClass::SetInitialCondition(geometry, solver_container, config, TimeIter); |
There was a problem hiding this comment.
this is the BaseClass call I mean above
| CConfig *config, unsigned short iMesh, unsigned short iRKStep) { | ||
|
|
||
| CNumerics* numerics = numerics_container[CONV_TERM]; | ||
| CNumerics* numerics = numerics_container[CONV_TERM + omp_get_thread_num()*MAX_TERMS]; |
There was a problem hiding this comment.
Here you need multiple numerics instances because in the OMP loop they all individually fill the container and evaluate their residual, right?
| void CIncEulerSolver::Centered_Residual(CGeometry *geometry, CSolver **solver_container, CNumerics **numerics_container, | ||
| CConfig *config, unsigned short iMesh, unsigned short iRKStep) { | ||
|
|
There was a problem hiding this comment.
Here it would be prob nice to reflect in the name that it contains the viscous residual.
There was a problem hiding this comment.
Yeah... but then I would need to change all the solvers, the vectorized residual loop is called EdgeFluxResidual to be more inclusive...
|
|
||
| /*--- Read the restart data from either an ASCII or binary SU2 file. ---*/ | ||
|
|
||
| if (config->GetRead_Binary_Restart()) { | ||
| Read_SU2_Restart_Binary(geometry[MESH_0], config, restart_filename); | ||
| } else { | ||
| Read_SU2_Restart_ASCII(geometry[MESH_0], config, restart_filename); | ||
| } | ||
|
|
||
| /*--- Load data from the restart into correct containers. ---*/ | ||
|
|
||
| counter = 0; | ||
| for (iPoint_Global = 0; iPoint_Global < geometry[MESH_0]->GetGlobal_nPointDomain(); iPoint_Global++ ) { | ||
|
|
||
| /*--- Retrieve local index. If this node from the restart file lives | ||
| on the current processor, we will load and instantiate the vars. ---*/ | ||
|
|
||
| iPoint_Local = geometry[MESH_0]->GetGlobal_to_Local_Point(iPoint_Global); | ||
|
|
||
| if (iPoint_Local > -1) { | ||
|
|
||
| /*--- We need to store this point's data, so jump to the correct | ||
| offset in the buffer of data from the restart file and load it. ---*/ | ||
|
|
||
| index = counter*Restart_Vars[1] + skipVars; | ||
| for (iVar = 0; iVar < nVar_Restart; iVar++) Solution[iVar] = Restart_Data[index+iVar]; | ||
| nodes->SetSolution(iPoint_Local,Solution); | ||
| iPoint_Global_Local++; | ||
|
|
||
| /*--- For dynamic meshes, read in and store the | ||
| grid coordinates and grid velocities for each node. ---*/ | ||
|
|
||
| if (dynamic_grid && val_update_geo) { | ||
|
|
||
| /*--- Read in the next 2 or 3 variables which are the grid velocities ---*/ | ||
| /*--- If we are restarting the solution from a previously computed static calculation (no grid movement) ---*/ | ||
| /*--- the grid velocities are set to 0. This is useful for FSI computations ---*/ | ||
|
|
||
| /*--- Rewind the index to retrieve the Coords. ---*/ | ||
| index = counter*Restart_Vars[1]; | ||
| for (iDim = 0; iDim < nDim; iDim++) { Coord[iDim] = Restart_Data[index+iDim]; } | ||
|
|
||
| su2double GridVel[MAXNDIM] = {0.0}; | ||
| if (!steady_restart) { | ||
| /*--- Move the index forward to get the grid velocities. ---*/ | ||
| index = counter*Restart_Vars[1] + skipVars + nVar_Restart + turbVars; | ||
| for (iDim = 0; iDim < nDim; iDim++) { GridVel[iDim] = Restart_Data[index+iDim]; } | ||
| } | ||
|
|
||
| for (iDim = 0; iDim < nDim; iDim++) { | ||
| geometry[MESH_0]->nodes->SetCoord(iPoint_Local, iDim, Coord[iDim]); | ||
| geometry[MESH_0]->nodes->SetGridVel(iPoint_Local, iDim, GridVel[iDim]); | ||
| } | ||
| } | ||
|
|
||
| /*--- For static FSI problems, grid_movement is 0 but we need to read in and store the | ||
| grid coordinates for each node (but not the grid velocities, as there are none). ---*/ | ||
|
|
||
| if (static_fsi && val_update_geo) { | ||
| /*--- Rewind the index to retrieve the Coords. ---*/ | ||
| index = counter*Restart_Vars[1]; | ||
| for (iDim = 0; iDim < nDim; iDim++) { Coord[iDim] = Restart_Data[index+iDim];} | ||
|
|
||
| for (iDim = 0; iDim < nDim; iDim++) { | ||
| geometry[MESH_0]->nodes->SetCoord(iPoint_Local, iDim, Coord[iDim]); | ||
| } | ||
| } | ||
|
|
||
| /*--- Increment the overall counter for how many points have been loaded. ---*/ | ||
| counter++; | ||
|
|
||
| } | ||
| } | ||
|
|
||
| /*--- Detect a wrong solution file ---*/ | ||
|
|
||
| if (iPoint_Global_Local < nPointDomain) { | ||
| SU2_MPI::Error(string("The solution file ") + restart_filename + string(" doesn't match with the mesh file!\n") + | ||
| string("It could be empty lines at the end of the file."), CURRENT_FUNCTION); | ||
| } | ||
|
|
||
| /*--- Update the geometry for flows on deforming meshes ---*/ | ||
|
|
||
| if ((dynamic_grid || static_fsi) && val_update_geo) { | ||
|
|
||
| /*--- Communicate the new coordinates and grid velocities at the halos ---*/ | ||
|
|
||
| geometry[MESH_0]->InitiateComms(geometry[MESH_0], config, COORDINATES); | ||
| geometry[MESH_0]->CompleteComms(geometry[MESH_0], config, COORDINATES); | ||
|
|
||
| if (dynamic_grid) { | ||
| geometry[MESH_0]->InitiateComms(geometry[MESH_0], config, GRID_VELOCITY); | ||
| geometry[MESH_0]->CompleteComms(geometry[MESH_0], config, GRID_VELOCITY); | ||
| } | ||
|
|
||
| /*--- Recompute the edges and dual mesh control volumes in the | ||
| domain and on the boundaries. ---*/ | ||
|
|
||
| geometry[MESH_0]->SetControlVolume(config, UPDATE); | ||
| geometry[MESH_0]->SetBoundControlVolume(config, UPDATE); | ||
| geometry[MESH_0]->SetMaxLength(config); | ||
|
|
||
| /*--- Update the multigrid structure after setting up the finest grid, | ||
| including computing the grid velocities on the coarser levels. ---*/ | ||
|
|
||
| for (iMesh = 1; iMesh <= config->GetnMGLevels(); iMesh++) { | ||
| iMeshFine = iMesh-1; | ||
| geometry[iMesh]->SetControlVolume(config, geometry[iMeshFine], UPDATE); | ||
| geometry[iMesh]->SetBoundControlVolume(config, geometry[iMeshFine],UPDATE); | ||
| geometry[iMesh]->SetCoord(geometry[iMeshFine]); | ||
| if (dynamic_grid) { | ||
| geometry[iMesh]->SetRestricted_GridVelocity(geometry[iMeshFine], config); | ||
| } | ||
| geometry[iMesh]->SetMaxLength(config); | ||
| } | ||
| } | ||
|
|
||
| /*--- Communicate the loaded solution on the fine grid before we transfer | ||
| it down to the coarse levels. We alo call the preprocessing routine | ||
| on the fine level in order to have all necessary quantities updated, | ||
| especially if this is a turbulent simulation (eddy viscosity). ---*/ | ||
|
|
||
| solver[MESH_0][FLOW_SOL]->InitiateComms(geometry[MESH_0], config, SOLUTION); | ||
| solver[MESH_0][FLOW_SOL]->CompleteComms(geometry[MESH_0], config, SOLUTION); | ||
|
|
||
| /*--- For turbulent simulations the flow preprocessing is done by the turbulence solver | ||
| * after it loads its variables (they are needed to compute flow primitives). ---*/ | ||
| if (!turbulent) { | ||
| solver[MESH_0][FLOW_SOL]->Preprocessing(geometry[MESH_0], solver[MESH_0], config, MESH_0, NO_RK_ITER, RUNTIME_FLOW_SYS, false); | ||
| } | ||
|
|
||
| /*--- Interpolate the solution down to the coarse multigrid levels ---*/ | ||
|
|
||
| for (iMesh = 1; iMesh <= config->GetnMGLevels(); iMesh++) { | ||
| for (iPoint = 0; iPoint < geometry[iMesh]->GetnPoint(); iPoint++) { | ||
| Area_Parent = geometry[iMesh]->nodes->GetVolume(iPoint); | ||
| for (iVar = 0; iVar < nVar; iVar++) Solution[iVar] = 0.0; | ||
| for (iChildren = 0; iChildren < geometry[iMesh]->nodes->GetnChildren_CV(iPoint); iChildren++) { | ||
| Point_Fine = geometry[iMesh]->nodes->GetChildren_CV(iPoint, iChildren); | ||
| Area_Children = geometry[iMesh-1]->nodes->GetVolume(Point_Fine); | ||
| Solution_Fine = solver[iMesh-1][FLOW_SOL]->GetNodes()->GetSolution(Point_Fine); | ||
| for (iVar = 0; iVar < nVar; iVar++) { | ||
| Solution[iVar] += Solution_Fine[iVar]*Area_Children/Area_Parent; | ||
| } | ||
| } | ||
| solver[iMesh][FLOW_SOL]->GetNodes()->SetSolution(iPoint,Solution); | ||
| } | ||
| solver[iMesh][FLOW_SOL]->InitiateComms(geometry[iMesh], config, SOLUTION); | ||
| solver[iMesh][FLOW_SOL]->CompleteComms(geometry[iMesh], config, SOLUTION); | ||
|
|
||
| if (!turbulent) { | ||
| solver[iMesh][FLOW_SOL]->Preprocessing(geometry[iMesh], solver[iMesh], config, iMesh, NO_RK_ITER, RUNTIME_FLOW_SYS, false); | ||
| } | ||
| } | ||
|
|
||
| /*--- Update the old geometry (coordinates n and n-1) in dual time-stepping strategy ---*/ | ||
| if (dual_time && config->GetGrid_Movement() && !config->GetDeform_Mesh() && | ||
| (config->GetKind_GridMovement() != RIGID_MOTION)) { | ||
| Restart_OldGeometry(geometry[MESH_0], config); | ||
| } | ||
|
|
||
| /*--- Delete the class memory that is used to load the restart. ---*/ | ||
|
|
||
| delete [] Restart_Vars; Restart_Vars = nullptr; | ||
| delete [] Restart_Data; Restart_Data = nullptr; | ||
| LoadRestart_impl(geometry, solver, config, val_iter, val_update_geo, Solution, nVar_Restart); | ||
|
|
There was a problem hiding this comment.
@TobiKattmann to satisfy the quirkiness of the incompressible solver that needs a default value for temperature when there is no energy equation
Proposed Changes
Remove some more duplication, fix some issues, and make the incompressible solvers compatible with OpenMP.
Related Work
Fix #1175
PR Checklist