Hybrid parallel CFEASolver and CMeshSolver #843

Merged — pcarruscag merged 40 commits into develop from hybrid_parallel_fea_solver on Jan 24, 2020

Conversation

@pcarruscag (Member)

Proposed Changes

Another snapshot from #824.
All major routines of CFEASolver (and by association CMeshSolver) now use OpenMP directives.
Parallelization is at the level of each individual routine, i.e. threads are "started" locally.
This required avoiding the use of small auxiliary array member variables, as that is not thread safe (multiple threads cannot write to the same location). In doing so I ended up doing some cleanup, and so I thought it would be better to move the files, strip trailing spaces, etc.; that should actually make the review easier.
I document the major changes below.

Related Work

#824 #789
Depends on #834 and #830

PR Checklist

  • I am submitting my contribution to the develop branch.
  • My contribution generates no new compiler warnings (try with the '-Wall -Wextra -Wno-unused-parameter -Wno-empty-body' compiler flags).
  • My contribution is commented and consistent with SU2 style.
  • I have added a test case that demonstrates my contribution, if necessary.
  • I have updated appropriate documentation (Tutorials, Docs Page, config_template.cpp), if necessary.

dep bot added the dependent label — Dec 21, 2019
Comment on lines +1992 to +1995
for (iMGlevel = 0; iMGlevel <= config->GetnMGLevels(); iMGlevel++) {
numerics[iMGlevel] = new CNumerics** [MAX_SOLS];
for (iSol = 0; iSol < MAX_SOLS; iSol++)
numerics[iMGlevel][iSol] = new CNumerics* [MAX_TERMS*omp_get_max_threads()]();
Member Author:

One of the major changes is that we need one numerics instance per thread; instead of adding another * I enlarged the last dimension of the numerics container.

Contributor:

MAX_TERMS is still 6, while MAX_TERMS_FEA was 10 before, and if I recall correctly that was to enable multi-material laws. How has this changed?

Member Author:

It looked like we only have 5 material models / terms:

const int FEA_TERM = 0;      /*!< \brief Position of the finite element analysis terms in the numerics container array. */
const int DE_TERM = 1;       /*!< \brief Position of the dielectric terms in the numerics container array. */

const int MAT_NHCOMP  = 2;   /*!< \brief Position of the Neo-Hookean compressible material model. */
const int MAT_IDEALDE = 3;   /*!< \brief Position of the Ideal-DE material model. */
const int MAT_KNOWLES = 4;   /*!< \brief Position of the Knowles material model. */

And so 6 max terms would be enough?

break;
}
if (!(properties_file.fail())) {
numerics[MESH_0][FEA_SOL][MAT_NHCOMP+offset] = new CFEM_NeoHookean_Comp(nDim, nVar_FEM, config);
Member Author:

The container can/should then be used with an "offset" of thread*MAX_TERMS.
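
For illustration, a minimal sketch of that indexing pattern (only the container and constants quoted above are from the PR; everything else is assumed):

/*--- Hypothetical sketch: each thread addresses its own slice of the enlarged
 * last dimension of the numerics container via an offset of thread*MAX_TERMS. ---*/
const int thread = omp_get_thread_num();
const int offset = thread*MAX_TERMS;

/*--- This thread's private instances, e.g. the FEA term and a material model. ---*/
CNumerics* fea_term = numerics[MESH_0][FEA_SOL][FEA_TERM   + offset];
CNumerics* nh_comp  = numerics[MESH_0][FEA_SOL][MAT_NHCOMP + offset];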

Comment on lines +2739 to +2740
void CDriver::Numerics_Postprocessing(CNumerics *****numerics, CSolver***, CGeometry**,
CConfig *config, unsigned short val_iInst) {
Member Author:

I simplified (a lot) the numerics postprocessing to just delete everything.
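
For context, a minimal sketch of what "delete everything" can amount to for the container allocated as quoted earlier (index types are assumptions, and the outer instance dimension would be handled analogously); note that the value-initialization "()" in the allocation leaves unused slots null, and deleting a null pointer is a no-op:

/*--- Hypothetical sketch: blanket deletion of the numerics container,
 * slots that were never instantiated are nullptr and deleting them is harmless. ---*/
for (unsigned short iMGlevel = 0; iMGlevel <= config->GetnMGLevels(); iMGlevel++) {
  for (int iSol = 0; iSol < MAX_SOLS; iSol++) {
    for (int iTerm = 0; iTerm < MAX_TERMS*omp_get_max_threads(); iTerm++)
      delete numerics[iMGlevel][iSol][iTerm];
    delete [] numerics[iMGlevel][iSol];
  }
  delete [] numerics[iMGlevel];
}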

Contributor:

Great idea, I was searching for it 😅😅😅

Comment on lines +44 to +50
# OpenMP
omp = get_option('with-omp')

if omp
omp_dep = dependency('openmp', language:'cpp')
endif

Member Author:

I updated meson to build with OpenMP support: add -Dwith-omp=true (default false) to the call to meson.py. The number of threads is still set via environment variable (see #830).

Contributor:

Thanks! That should make it much easier to build.

Comment on lines 248 to 264
#ifdef HAVE_OMP
/*--- Get the element coloring. ---*/

const auto& coloring = geometry->GetElementColoring();

auto nColor = coloring.getOuterSize();
ElemColoring.resize(nColor);

for(auto iColor = 0ul; iColor < nColor; ++iColor) {
ElemColoring[iColor].size = coloring.getNumNonZeros(iColor);
ElemColoring[iColor].indices = coloring.innerIdx(iColor);
}

ColorGroupSize = 1; /// TODO: This needs to come from geometry or config

omp_chunk_size = computeStaticChunkSize(nPointDomain, omp_get_max_threads(), OMP_MAX_SIZE);
#endif
Member Author:

The element coloring is obtained from CGeometry as a shallow copy at construction.
I have yet to test the performance impact of the color group size; group size 1 gives maximum parallelization but worse spatial locality.

Member Author:

Update: The impact of not grouping elements for coloring (size 1) is not significant for the FEA solver: not a lot of data is accessed per element relative to the computational effort, and linear algebra takes up 80%+ of the time anyway.

Comment on lines +957 to +972
#ifdef HAVE_OMP
/*--- Chunk size is at least OMP_MIN_SIZE and a multiple of the color group size. ---*/
auto chunkSize = roundUpDiv(OMP_MIN_SIZE, ColorGroupSize)*ColorGroupSize;

/*--- Loop over element colors. ---*/
for (auto color : ElemColoring)
{
SU2_OMP_FOR_DYN(chunkSize)
for(auto k = 0ul; k < color.size; ++k) {

auto iElem = color.indices[k];
#else
/*--- Natural coloring. ---*/
{
for (auto iElem = 0ul; iElem < nElement; iElem++) {
#endif
Member Author:

The coloring is then used like this; the body of the loop is almost unchanged, apart from the use of local variables and picking one CElement and one CNumerics per thread.
When compiling without OpenMP support, everything reverts to the natural way of looping over elements, and no overhead or indirection is introduced by coloring.

Contributor:

Is this linked to the number of threads of the preconditioners, or totally independent?

Member Author:

Totally independent.


auto Ta = element->Get_Kt_a(iNode);
for (iVar = 0; iVar < nVar; iVar++)
LinSysRes[indexNode[iNode]*nVar+iVar] -= simp_penalty*Ta[iVar];
Member Author:

In a number of places we now write directly to CSysVector / CSysMatrix instead of populating a temporary first.

Contributor:

I like it, it's cleaner like this.

Comment on lines +1441 to +1445
/*--- Reduction variable to compute the maximum von Misses stress,
* each thread uses the first element of each row, the rest of
* the row is padding to prevent false sharing. ---*/
su2activematrix auxMaxVonMisses(omp_get_max_threads(), 64/sizeof(su2double));
auxMaxVonMisses.setConstant(0.0);
Member Author:

This is probably the trickiest bit in this PR; reduction operations are tricky in general. I could clean this up a bit by using the OpenMP reduction clause, but this "manual" way is future-compatible with the AD type (OpenMP reductions are only defined for built-in types; they can be declared for other types, but that sounds like witchcraft to me).
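
For context, a minimal sketch of the padded per-thread reduction being described (the stress computation and its helper name are placeholders; the loop macro and chunk size follow the ones quoted earlier in this PR):

/*--- Hypothetical sketch: each thread only writes to its own row, which is one
 * cache line wide, so there is no false sharing between threads. ---*/
su2activematrix auxMaxVonMisses(omp_get_max_threads(), 64/sizeof(su2double));
auxMaxVonMisses.setConstant(0.0);

SU2_OMP_PARALLEL
{
  const int thread = omp_get_thread_num();

  SU2_OMP_FOR_DYN(omp_chunk_size)
  for (auto iPoint = 0ul; iPoint < nPointDomain; ++iPoint) {
    const su2double vm = ComputeVonMisesStress(iPoint);   /* placeholder for the real computation */
    if (vm > auxMaxVonMisses(thread,0)) auxMaxVonMisses(thread,0) = vm;
  }
} /*--- end of the parallel region ---*/

/*--- Serial fold of the per-thread maxima (a reduction across MPI ranks would follow). ---*/
su2double maxVonMisses = 0.0;
for (int i = 0; i < omp_get_max_threads(); ++i)
  if (auxMaxVonMisses(i,0) > maxVonMisses) maxVonMisses = auxMaxVonMisses(i,0);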

app.add_flag("-d,--dryrun", dry_run, "Enable dry run mode.\n"
"Only execute preprocessing steps using a dummy geometry.");
app.add_option("-t,--threads", num_threads, "Number of OpenMP threads per MPI rank.");
Member Author:

The number of threads per rank can now be set via this command line option; it defaults to whatever is defined by the environment.
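
For illustration only, the natural way such an option would be consumed (the exact wiring in the code may differ):

/*--- Hypothetical sketch: honour the command-line thread count if given,
 * otherwise keep whatever the environment (OMP_NUM_THREADS) dictates. ---*/
#ifdef HAVE_OMP
if (num_threads > 0) omp_set_num_threads(num_threads);
#endif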

pcarruscag changed the base branch from hybrid_parallel_gradients_limiters to develop — January 18, 2020 23:04
Comment on lines +1445 to +1446
/* DESCRIPTION: Custom number of threads used for additive domain decomposition for ILU and LU_SGS (0 is "auto"). */
addUnsignedLongOption("LINEAR_SOLVER_PREC_THREADS", Linear_Solver_Prec_Threads, 0);
Member Author:

With this PR the ILU and LU_SGS preconditioners can now run with a custom number of threads specified through the config file.
For structural cases (and mesh deformation) it gets harder and harder (more linear iterations) to solve the linear systems as more processors are thrown at the problem (which is annoying in multizone problems); being able to independently control the number of preconditioner threads alleviates this.

Contributor:

I guess the default, or auto, just keeps the number of threads for the requested MPI parallelization, is that correct?

Member Author:

Yup it uses the same threads that are used everywhere else.
To answer another of your questions:

When using the mesh solver in a CFD case (so, single zone), is it possible to use different levels of parallelism for the CFD problem and for the mesh deformation problem? And, if so, is it something worth doing?

Although one could very easily change the number of threads used in all mesh deformation and FEA routines, at the moment the mesh deformation runs synchronously with the rest of the code, and so running the discretization parts with fewer threads would not be advantageous.
The area where it might be worth reducing the number of threads is those two preconditioners, if and only if:

  • The linear solver is not able to reach adequate convergence (or is diverging) due to the preconditioner being too weak (too partitioned).
  • The code is running on a very bandwidth-limited machine, in which case reducing threads a bit might give a speedup due to fewer linear iterations being required.

Other areas of the code will not suffer as much from having too many threads, since threads do not increase the number of halo points (communication costs) and allow for "automatic" load balancing.

@WallyMaier (Contributor) left a comment:

Thanks for the work @pcarruscag! This looks good to me, though my C++ isn't the best. I appreciate all the cleanup that you have done.

this->centered = NO_CENTERED;
this->upwind = NO_UPWIND;
this->space = SPACE_CENTERED;
this->space = NO_CONVECTIVE;
Contributor:

Good catch!

Member Author:

Thanks @koodlyakshay.
@rsanfer, you asked about this below; it addresses #842.

@rsanfer (Contributor) left a comment:

Hi @pcarruscag, thanks so much for this great contribution! Indeed, it addresses a problem of over-parallelization for FEA and mesh cases that has been there for a while. Kudos!

I saw tests are not passing yet and I asked a few questions here and there, but overall I'd say this is pretty much ready. Happy to approve it, but I'll wait a few days to see if there are any further comments from the community. Just a request: if possible, add one or two test cases so the implementation stays safe going forward (and, of course, so I can play around with the new features a little bit 😏).

Honestly, my knowledge of OpenMP is limited, to say the least; I have never really used it or coded with it. I guess that shows in my questions here and there, so I can't really comment on some technicalities. However, I do have some additional questions:

  • When using the mesh solver in a CFD case (so, single zone), is it possible to use different levels of parallelism for the CFD problem and for the mesh deformation problem? And, if so, is it something worth doing?
  • Should this just run "out of the box" with a working installation of OpenMP in any machine, or is there anything else fancy needed?
  • Is the previous behaviour exactly kept, or are there any modifications in the basic, non-OpenMP version of the code? (Not that I mind, just curious).

Also, just an additional (hopefully constructive) comment: I find all of these developments great, and I honestly think that you are doing an amazing job on performance and overall code improvement. However, as a non-C++-master myself, I'm just a little concerned about whether some advanced programming may become an entrance barrier to new additions to the community. I'm aware that you have been doing very well at documenting the code and the various PRs, but I'd say we should try to find a strategy to ease the learning curve for potential new developers (maybe some developer tutorials? a collection of the comments/discussions on the PRs moved to the wiki? a list of links/useful resources?).

Thanks again!

Ruben

break;
}
}
for (auto iVar = 0ul; iVar < nVar; iVar++)
Contributor:

Maybe it's due to my limited knowledge, but this declaration seems strange to me...

Member Author:

ul is for unsigned long (which I was getting tired of writing xD); otherwise auto would deduce int, which is the default type for integer literals.
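
For reference:

auto i = 0;    /* deduced as int */
auto j = 0ul;  /* deduced as unsigned long, matching the point/element counters */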

/*--- Populate ---*/
for(iElem=0; iElem<nElem; ++iElem)

SU2_OMP_PARALLEL
Contributor:

I'm not really familiar with OpenMP. What is the behaviour if the code is compiled without OpenMP?

Member Author:

"Business as usual": there should be no runtime overhead. No coloring is applied to the loops, the SU2_OMP macros are disabled, and the "omp_xxx" functions are constexpr, so the compiler should optimize them away very easily.

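A sketch of the kind of fallback definitions being described (the real ones live in the OpenMP wrapper header from #830; names and signatures here are illustrative):

/*--- Illustrative fallbacks when building without OpenMP: the macros expand to
 * nothing and the omp_* queries become compile-time constants that the
 * optimizer folds away. ---*/
#ifndef HAVE_OMP
inline constexpr int omp_get_max_threads(void) { return 1; }
inline constexpr int omp_get_thread_num(void)  { return 0; }
#define SU2_OMP_PARALLEL
#define SU2_OMP_FOR_DYN(chunk)
#endif
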
Disp_nP1 = nodes->GetSolution(iPoint);
/*--- Coordinates of the current point at n+1, n, & n-1 time levels. ---*/

const su2double* Disp_nM1 = nodes->GetSolution_time_n1(iPoint);
Contributor:

Is this for efficiency reasons?

Member Author:

see big reply

@pcarruscag (Member Author) commented Jan 23, 2020:

Thank you for the thorough review @rsanfer! I'll reply to your main questions and some of the smaller ones here to centralize things.

Just a request: if possible, add one or two test cases so the implementation stays safe going forward (and, of course, so I can play around with the new features a little bit).

The test cases are the same; no changes there other than the optional config option introduced above. When the hybrid stuff covers most of the code I would add an entire build configuration, e.g. BaseMPIOMP, and a corresponding test-case suite.

  • Should this just run "out of the box" with a working installation of OpenMP in any machine, or is there anything else fancy needed?

I would leave it to the community to decide what the defaults should be. For a lot of new users that don't run on clusters, probably just calling SU2_CFD and not having to worry about MPI would be nice (a lot of the issues on CFD Online are MPI-related).

  • Is the previous behaviour exactly kept, or are there any modifications in the basic, non-OpenMP version of the code? (Not that I mind, just curious).

Other than the algorithmic (but mathematically equivalent) changes introduced to limiters and gradients in #834, yes.

What's the advantage of having one numerics term per thread?

It is a requirement: we need to write data into the numerics before using them, and multiple threads cannot write to the same location (i.e. the internal structures of CNumerics), therefore one instance per thread is required.

...Also, I think I missed the point where the numerics container is extended beyond MAX_TERMS

The allocation of space for one numerics per thread is done above in line 1995 of my 21 Dec 2019 comment: ...MAX_TERMS*omp_get_max_threads()....
The instantiation of one numerics per thread is then done by executing the rest of the preprocessing in parallel and, instead of using XYZ_TERM, using XYZ_TERM+offset, where offset = thread_id * MAX_TERMS.
I think someone (maybe Tim) mentioned that we could revisit the ownership relations of the numerics classes, i.e. allocate them as members of their respective solvers; if we do that, we can think of having a purpose-built container that automates the per-thread creation and access.
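
As a rough illustration of that idea (purely hypothetical, not part of this PR):

#include <vector>

/*--- Hypothetical per-thread numerics container: one CNumerics* per
 * (thread, term) pair, with each calling thread indexing only its own slice. ---*/
class CNumericsPerThread {
  std::vector<CNumerics*> data;   /* laid out as [thread][term] */
  int nTerms = 0;
public:
  void allocate(int numTerms) {
    nTerms = numTerms;
    data.assign(size_t(numTerms)*omp_get_max_threads(), nullptr);
  }
  CNumerics*& operator[](int term) {
    return data[omp_get_thread_num()*nTerms + term];  /* this thread's slice */
  }
  ~CNumericsPerThread() { for (auto* p : data) delete p; }
};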

Why are they redefined each time inside the loop?
Is this for efficiency reasons?

Referring to variables being declared inside loops: one stylistic reason is that declaring everything at the top of a function is the C way of doing things; the C++ people whose books/blogs I've read and talks I've watched recommend keeping scopes (the inside of the loop being one) as clean as possible.
The only reason not to do this is if you explicitly want re-use; in the case of trivial types this does not improve efficiency, and in the context of OpenMP code it can create issues. Just like we need one numerics per thread, if we declare variables outside a parallel loop the default OpenMP behaviour is to consider them shared, and concurrent writes to shared locations = gdb and many bad words xD.
EDIT: I should mention that if the parallel region is started before the variable declarations, they become local and all is well, with the exception of class members; those will be shared most of the time (this is where const correctness can give some peace of mind).
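
A generic illustration of that point (not SU2 code; names are made up):

su2double aux[3];            /* declared before the parallel region: shared by default */

SU2_OMP_PARALLEL
{
  /* Writing to "aux" here would be a data race, every thread sees the same array. */

  su2double local[3];        /* declared inside the region: private, one copy per thread */

  SU2_OMP_FOR_DYN(chunkSize)
  for (auto iElem = 0ul; iElem < nElement; ++iElem) {
    local[0] = SomePerElementQuantity(iElem);   /* hypothetical helper; safe, "local" is private */
    /* ... */
  }
}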

Also, just an additional (hopefully constructive) comment: I find all of these developments great, and I honestly think that you are doing an amazing job on performance and overall code improvement. However, as a non-C++-master myself, I'm just a little concerned about whether some advanced programming may become an entrance barrier to new additions to the community.

As I wrote in the preamble of #789:
"But please participate even if you never heard of these topics, your opinion about readability and "developability" of the code is important! I think the code-style should be accessible to people starting a PhD (after they read a bit about C++...)."
I try to encapsulate and hide the tricky bits to keep the code as readable as possible; whether I am succeeding or not is for the community to decide. In all these PRs I've been pointing to the areas I think are trickier; if someone, anyone, feels they are absolutely incomprehensible please say something... either here, through Slack, or by email (I think it shows in the commits); I understand not everyone is keen on GitHub exposure.

I'm aware that you have been doing very well at documenting the code and the various PRs, but I'd say we should try to find a strategy to ease the learning curve for potential new developers (maybe some developer tutorials? a collection of the comments/discussions on the PRs moved to the wiki? a list of links/useful resources?).

I agree with documenting broad design decisions (that is the intent of #789) and with developer tutorials (how to implement a new X) once we are content with the restructurings, otherwise they will quickly go outdated... or actually...
We should probably first think about the answers to "how to implement a new X" and restructure/refactor as a function of that.
Based on previous efforts at keeping wikis updated while code is being developed, I much prefer this GitHub style where you can clearly tell what version of the code the comments refer to. A collection of comments/discussions organized by topic and linked to a feature is somewhat what I had in mind when I opened a "big PR" (#824) with little branches such as this one; I can try to complete that with a list of links/useful resources, references as it were. Good idea!

pcarruscag merged commit 43f01fb into develop — Jan 24, 2020
pcarruscag deleted the hybrid_parallel_fea_solver branch — January 24, 2020 13:55
jblueh mentioned this pull request — Mar 1, 2021