I am a 4th-year CS PhD student at the University of Maryland, College Park, advised by Prof. Dinesh Manocha. I am broadly interested in multi-modal learning and its different applications. My research primarily studies the interplay between the vision and audio modalities and develops systems with a comprehensive understanding of both.
I am currently working as an ML Research intern at Apple MLR, hosted by Chun-Liang Li and Karren Yang. I spent the summer of '24 at Meta Reality Labs as a research scientist intern, hosted by Ruohan Gao. Before that, I was a student researcher at Google Research with Avisek Lahiri and Vivek Kwatra on the Talking Heads team, working on speech-driven facial synthesis. Previously, I spent a wonderful summer at Adobe Research as a research PhD intern, working with Joseph K J on the Multi-modal AI team on multi-modal audio generation. I am also fortunate to have had the chance to work with Prof. Kristen Grauman, Prof. Salman Khan, and Prof. Mohamed Elhoseiny, among other wonderful mentors and collaborators.
Before starting my PhD, I worked as a Machine Learning Scientist on the Camera and Video AI team at ShareChat, India. I was also a visiting researcher at the Computer Vision and Pattern Recognition Unit, Indian Statistical Institute Kolkata, under Prof. Ujjwal Bhattacharya. Before that, I was a Senior Research Engineer with the Vision Intelligence Group at Samsung R&D Institute Bangalore, where I primarily developed novel AI-powered solutions for Samsung's smart devices.
I received my MTech in Computer Science & Engineering from IIIT Hyderabad, where I was fortunate to be advised by Prof. C V Jawahar. During my undergrad, I worked as a research intern under Prof. Pabitra Mitra at IIT Kharagpur and at the CVPR Unit at ISI Kolkata.
Looking for Internship/Industrial RS/Post-Doc Positions:
I am actively looking for internships, full-time industrial research scientist roles, and post-doc positions
in multi-modal learning, generative modeling, agentic AI, and related areas. Kindly reach out if you think I would be a good fit.
Feel free to reach out if you're interested in research collaboration!
Oct 2021 - Paper on audio-visual summarization accepted at BMVC 2021.
Sep 2021 - Blog on Video Quality Enhancement released at Tech @ ShareChat.
July 2021 - Paper on reflection removal accepted at ICCV 2021.
June 2021 - Joined ShareChat Data Science team.
May 2021 - Paper on audio-visual joint segmentation accepted at ICIP 2021.
Dec 2018 - Accepted an offer from Samsung Research. Joining in June '19.
Sep 2018 - Received Dean's Merit List Award for academic excellence at IIIT Hyderabad.
Oct 2017 - Our work on a multi-scale, low-latency face detection framework received Best Paper Award at NGCT-2017.
Selected publications
I am interested in solving Computer Vision, Computer Audition, and Machine Learning problems and applying the solutions to broad AI
applications. My research focuses on applying multi-modal learning (Vision + X) to generative modeling and holistic cross-modal understanding
with minimal supervision. In the past, I focused on computational photography, tackling
challenges such as image reflection removal, intrinsic image decomposition, inverse rendering, and video quality assessment.
Representative papers are highlighted. For a full list of publications, please refer to
my Google Scholar.
AMusE: Audio-Visual Benchmark and Alignment Framework for Agentic Multi-Speaker Understanding