Inspiration

ML models can run anywhere, but inference stacks add overhead. I wanted a transparent path from graph to metal: emit plain C, own the memory, and measure every cycle.

What it does

  • Imports ONNX models and lowers to a small JSON IR.
  • Plans memory with an arena and first-fit reuse.
  • Emits portable C kernels (MatMul, Add, ReLU).
  • Folds Gemm → MatMul(+Add) and constant transposes.
  • Saves weights and wires them at fixed offsets.
  • Runs models with zero dependencies beyond a C compiler.
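
To make the "plain C, fixed offsets" idea concrete, here is a minimal sketch of the flavor of code such an emitter produces. The kernel bodies, the `arena` symbol, and every shape and offset below are illustrative assumptions, not the project's actual output.

```c
#include <stddef.h>

/* Portable MatMul kernel: C = A (m x k) * B (k x n), row-major. */
static void matmul_f32(const float *a, const float *b, float *c,
                       size_t m, size_t k, size_t n)
{
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p)
                acc += a[i * k + p] * b[p * n + j];
            c[i * n + j] = acc;
        }
}

/* Elementwise ReLU, applied in place. */
static void relu_f32(float *x, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (x[i] < 0.0f)
            x[i] = 0.0f;
}

/* One arena holds every tensor; the planner bakes in the offsets. */
static float arena[8192];

void run_model(void)
{
    /* Shapes and offsets here are made up for illustration. */
    matmul_f32(arena + 0,    /* input,  1 x 64,  offset 0    */
               arena + 64,   /* weight, 64 x 32, offset 64   */
               arena + 2112, /* output, 1 x 32,  offset 2112 */
               1, 64, 32);
    relu_f32(arena + 2112, 32);
}
```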

How I built it

  • Front end: ONNX parsing, dtype/shape capture, constant extraction.
  • IR: JSON for values, ops, consts, and entry I/O.
  • Passes: DCE, constant folding, alias resolution for view ops, memory liveness.
  • Planner: interval coloring + first-fit to pack tensors into a single arena.
  • C emitter: generates kernels and a schedule that reads/writes fixed offsets.
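
As a rough illustration of the planner step above, here is a first-fit packer over live intervals. The `Tensor` struct, the `plan` function, and the quadratic scan are a sketch of the general technique, assuming liveness arrives as [first_use, last_use] schedule steps; the project's actual interval-coloring details may differ.

```c
#include <stddef.h>

/* A tensor's live interval (from the liveness pass) plus its
 * size and the arena offset the planner will assign. */
typedef struct {
    int first_use, last_use;   /* schedule steps where it is live */
    size_t size;               /* bytes */
    size_t offset;             /* assigned arena offset */
} Tensor;

/* Two tensors conflict only if they are live at the same time
 * AND their byte ranges in the arena overlap. */
static int conflicts(const Tensor *a, const Tensor *b)
{
    int live = a->first_use <= b->last_use && b->first_use <= a->last_use;
    int mem  = a->offset < b->offset + b->size &&
               b->offset < a->offset + a->size;
    return live && mem;
}

/* First fit: place each tensor at the lowest offset that clears
 * every already-placed, time-overlapping tensor. Tensors are
 * assumed sorted by first_use; quadratic, fine for small graphs.
 * Returns the total arena size required. */
size_t plan(Tensor *t, int n)
{
    size_t arena_size = 0;
    for (int i = 0; i < n; ++i) {
        size_t off = 0;
        for (;;) {
            int clash = 0;
            t[i].offset = off;
            for (int j = 0; j < i; ++j)
                if (conflicts(&t[i], &t[j])) {
                    off = t[j].offset + t[j].size; /* bump past j */
                    clash = 1;
                    break;
                }
            if (!clash)
                break;
        }
        if (off + t[i].size > arena_size)
            arena_size = off + t[i].size;
    }
    return arena_size;
}
```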

Challenges I ran into

With only a day, I could only support primitive ops, and I had to normalize incoming graphs down to the blocks my emitter actually generates. That meant building my own models for testing to make sure the compiler supported them. I was also learning the compiler pipeline as I went. At one point my C code was 2700% slower than PyTorch, so I had to rewrite the entire emitter to be more efficient.
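
The writeup doesn't record exactly what the rewrite changed, but a classic culprit at that magnitude is MatMul loop order. As a hedged example of the kind of fix involved (not necessarily the one made here), reordering the loops from i-j-p to i-p-j makes the inner loop walk B and C contiguously instead of striding through B, which on large matrices can alone be an order-of-magnitude win:

```c
#include <stddef.h>

/* The naive j-inner loop reads B with a stride of n floats every
 * iteration, thrashing the cache. With the p loop hoisted, the
 * inner loop streams both B and C row-by-row. Illustrative only. */
static void matmul_f32_ikj(const float *a, const float *b, float *c,
                           size_t m, size_t k, size_t n)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j)
            c[i * n + j] = 0.0f;
        for (size_t p = 0; p < k; ++p) {
            const float a_ip = a[i * k + p];  /* reused across the row */
            for (size_t j = 0; j < n; ++j)
                c[i * n + j] += a_ip * b[p * n + j];
        }
    }
}
```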

Accomplishments that I'm proud of and what I learned

I'm proud of the working pipeline from graph to code, especially the memory-planning arena. This was my first time using topo sort outside of a programming competition. For a lot of people, AI/ML is just a black box: we 'train', get a 'model', and the model takes input and returns output. After this, I've developed a working understanding of what goes on inside a model (the difference between knowing DSA and actually solving problems with it). I'm proud that I covered this much ground in such a short amount of time.
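
For readers who haven't met it outside contests: topological sort is what turns the op graph into a legal execution schedule, so each op is emitted only after everything it reads has been computed. A minimal sketch using Kahn's algorithm follows; the adjacency-matrix form and the `topo_order` helper are hypothetical, chosen for brevity rather than taken from the project.

```c
#define MAX_OPS 64

/* edges[u][v] != 0 means op u feeds op v. Fills order[] with a
 * valid schedule; returns 0 if the graph has a cycle. */
int topo_order(int n, char edges[MAX_OPS][MAX_OPS], int order[MAX_OPS])
{
    int indeg[MAX_OPS] = {0};
    for (int u = 0; u < n; ++u)
        for (int v = 0; v < n; ++v)
            indeg[v] += edges[u][v];

    /* Seed the worklist with ops that have no pending inputs. */
    int queue[MAX_OPS], head = 0, tail = 0;
    for (int v = 0; v < n; ++v)
        if (indeg[v] == 0)
            queue[tail++] = v;

    int emitted = 0;
    while (head < tail) {
        int u = queue[head++];
        order[emitted++] = u;
        /* Retire u's edges; schedule consumers that become ready. */
        for (int v = 0; v < n; ++v)
            if (edges[u][v] && --indeg[v] == 0)
                queue[tail++] = v;
    }
    return emitted == n;
}
```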

Stretch Goals

I plan to keep working on this until it's consistently matching PyTorch/scikit-learn outputs and beating them on speed. Most of the work will be in the pass pipeline, but I'll have to do a lot more reading on LLVM and compiler design in general. In terms of functionality, I want to:

  • Expand support for more data types & tensor operations.
  • Create a
