I'm a C and C++ programmer proficient in AMD's HIP and NVIDIA's CUDA. I develop high-performance GPU kernels for AMD GPUs.
At AMD, I work with the Machine Learning team to maintain the Composable Kernel library. My work focuses on Flash Attention and GEMM optimizations.
In the past, I worked on mobile graphics drivers at Samsung in San Jose and published games on Steam as part of my master's degree project.
I have played competitive Team Fortress 2 as Medic and still do :)
- Flash Attention from Scratch (Python | Triton) (https://github.com/aviralgoel/flash_attention_triton)
- Try Again (C# | Unity) - YouTube Demo || CameraScript Gist || CameraPanning Gist







