Inspiration

We’ve always seen galaxies and nebulae in TV-shows and movies like Star Wars, Star Trek, and The Expanse (my personal favorite). However, we don’t get to see things like galaxies colliding in front of our eyes. It’s so interesting how such simple rules of gravity can create such complex patterns that we see through our telescopes. For this hackathon, we decided to simulate galaxies and collisions in between them.

What it does

We have two separate applications we were unable to combine in the given timeframe.

The Mac/iPad application allows a user to compute and view a 3D simulation of roughly twenty thousand particles interacting with one another using Newton’s gravity model (we aren’t simulating relativistic effects because the time complexity is too great O(n^3)).

Originally, we were planning to stream simulation data from a server that stores simulation data, but we couldn’t finish that in time. However, we were able to finish a heavily optimized simulator that uses the Barnes-Hut n-body algorithm and a step-kick-step integrator unlike a Eulerian integrator that has unbounded error. The error is calculated by the difference in total mechanical energy of the system at the beginning and current time step (total energy stays constant in a closed system!).

How we built it

The Mac/iPad app is built using the languages Swift and the Metal Shading Language (subset of C++). This app uses the naive O(n^2) algorithm that computes the net force on each body using every other body in the simulation. However, this can still work for fairly good values of N through GPU acceleration. Saaketh and I (Arpan) worked on optimizations such as reducing cache misses and eliminating thread divergence, which allowed us to simulate and render thirty thousand bodies in real time. However, this didn’t come close to what was possible using a better algorithm.

Vivek and Deepa worked on what was supposed to be our backend uses Python, Cython, and Numba to simulate the Barnes-Hut n-body algorithm that is O(n * log(n)), providing a lot of benefits at high values of N. The Mac/iPad version uses a Eulerian integrator that is identical to what you learn in Calculus class when approximating the values on a differential equation using initial conditions. However, this has the problem that it can cause the error to increase progressively over time. The step-kick-step integrator works by averaging the previous and current velocities to keep the error very close to zero. Although it isn’t always zero, it is guaranteed to not diverge over time.

Challenges we ran into

We had many roadblocks when creating our simulations on the client GPU and the backend.

  1. Utilizing Metal directly provided many benefits, namely performance. However, in order to actually fully utilize those benefits, we had to also render particles using Metal, including matrix transformations for rotation and projectection. We were dealing with raw pixels and coordinate information, forcing us to make our own 3D rendering pipeline using Metal kernel functions.
  2. Originally, we weren’t able to get good performance on the GPU since we had so many if statements. You see, if statements in GPU code causes slowdowns because they run code across threads, at the same time. Think of it as a marching band. They all take steps in coordinating. Using if statements causes some threads to stray, hence the name thread divergence.
  3. We also had a lot of cache misses since we had poorly constructed our for loops. Luckily these were easy to figure out since there’s a lot of material to read about on optimizing cache usage. Also, the XCode Profiler gave us tips on which functions were behaving poorly.

On the backend, Vivek and Deepa also had a lot of roadblocks:

  1. The Barnes-Hut algorithm is rather complex to implement since it completely changes the calculations and requires a tree to be built every frame. Making this correct AND fast was very challenging.
  2. The algorithm is technically an approximation, although it’s a very good one and can be tuned as needed. Also, using a step-kick-step integrated kept the error very low, and in some cases, better than the more exact, naive implementation that used eulerian integration.

What's next for GalaxyViz

We want to complete the UI and work on a new rendering system. The thing is, the naive implementation is O(n), the Barnes-Hut implementation is O(n*logn), but streaming pre-simulated data through a websocket is O(n). However, we are limited by the bandwidth someone has for large values of N. To fix this, we want to create new, but definitely more complex renderer using density regions similar to how smoothed marching cubes work. This would allow us to simulate and stream much higher values of N on lower bandwidths. Through this, we could also simulate much more complex interactions.

Built With

Share this project:

Updates