Nihal Potdar

Research Engineer @ Databricks Mosaic AI

I am a research engineer at Databricks focusing on inference performance and GPU kernels. I enjoy working on ML systems and performance engineering, and have been thinking about this area for a while!

At Databricks, I played a key role in building a custom inference runtime optimized for serving open-source language models. My core focus for this runtime was on techniques that improve hardware utilization: everything from advanced quantization algorithms and efficient fused MoE architectures to novel self-attention and grouped GEMM GPU kernels, plus hardware techniques to better overlap kernel execution. The core work that I led is listed in this Databricks research blog. Variations of the techniques I proposed have since been adopted into popular open-source projects like vLLM and SGLang.

Prior to Databricks, I was at the University of Waterloo, where I did research on both ML algorithms and systems. I also completed a number of internships working across the stack, trying to understand the practical bottlenecks in ML applications and learning about performance engineering at scale.

Selected Work

Experience