I graduated with a Master of Science in Robotics (MSR) from
the Robotics Institute at Carnegie Mellon University, where I worked with Prof. Abhinav Gupta and collaborated with Prof. Pedro Morgado at UW-Madison. Before my Master's, I worked as a Research Assistant at CMU with
Prof. David Held at the R-Pad Lab, in collaboration with the Pittsburgh-based autonomous driving company
Argo AI.
During my Master's at CMU, I worked on self-supervised representation learning methods for multimodal audio-visual video, and as a Research Assistant, I worked on self-supervised algorithms for real-world 3D LiDAR point clouds.
I am always looking to collaborate with motivated students on research projects during the semester. If you are interested, please feel free to reach out to me via email.
Looking for Spring/Summer 2026 internship opportunities!
UniPhy is a unified latent-conditioned neural model that learns a common latent space encoding the properties of diverse materials. At inference, given motion observations of a system with unknown material parameters, UniPhy infers the material via latent optimization through the differentiable simulator.
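A minimal sketch of this inference-time loop, with a toy stand-in for the differentiable simulator (the function names and dynamics are illustrative, not the actual UniPhy implementation):

```python
import torch

def simulate(z, init_state, steps=50):
    # Toy stand-in dynamics whose rollout depends smoothly on the
    # material latent z; a real differentiable simulator would go here.
    state, traj = init_state, []
    for _ in range(steps):
        state = state + 0.01 * torch.tanh(z[0] * state + z[1]) + 0.01 * z[2]
        traj.append(state)
    return torch.stack(traj)

# Motion observations from a system with unknown material parameters.
init_state = torch.randn(32)
observed = simulate(torch.tensor([0.7, -0.2, 0.1]), init_state).detach()

# Infer the material latent by gradient descent through the simulator.
z = torch.zeros(3, requires_grad=True)
opt = torch.optim.Adam([z], lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(simulate(z, init_state), observed)
    loss.backward()
    opt.step()
```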
Multi-Modal Representation Learning / Large Video-Language Models
We leverage a large video-language model to anticipate action sequences that are plausible in the real world. We instill an understanding of action-sequence plausibility in the model by introducing two objective functions: a counterfactual-based plausible action sequence learning loss and a long-horizon action repetition loss.
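A rough sketch of what these two objectives could look like, assuming a scoring model that assigns a scalar plausibility to an action sequence (these are illustrative stand-ins, not the exact losses from the paper):

```python
import torch
import torch.nn.functional as F

def counterfactual_plausibility_loss(score_real, score_counterfactual, margin=1.0):
    # Rank the observed (plausible) action sequence above a counterfactual
    # one, e.g. the same actions in a shuffled, implausible order.
    return F.relu(margin - score_real + score_counterfactual).mean()

def action_repetition_loss(logits):
    # logits: (T, num_actions) predictions over a long horizon. Penalize
    # adjacent steps that concentrate probability on the same action,
    # discouraging degenerate rollouts that repeat one action forever.
    probs = F.softmax(logits, dim=-1)
    return (probs[1:] * probs[:-1]).sum(dim=-1).mean()
```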
We propose a self-supervised algorithm to learn representations from untrimmed, egocentric videos containing audible interactions.
Our method uses the audio signal in two unique ways: (1) to identify moments in time that are conducive to better self-supervised learning, and (2) to learn representations that focus on the visual state changes caused by audible interactions.
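A toy illustration of the first idea, using short-time audio energy as a stand-in for the paper's moment-selection mechanism:

```python
import numpy as np

def audible_moments(waveform, sr, win=0.5, top_k=8):
    # Rank fixed-length windows of the audio track by energy and return
    # the timestamps of the loudest ones as candidate interaction moments.
    hop = int(win * sr)
    frames = waveform[: len(waveform) // hop * hop].reshape(-1, hop)
    energy = (frames ** 2).mean(axis=1)
    return np.sort(np.argsort(energy)[-top_k:]) * win  # seconds
```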
A self-supervised method to complete partial point clouds in real-world settings such as LiDAR, where ground-truth complete point cloud
annotations are unavailable. We achieve this via inpainting: a region of the point cloud is removed, and the network is trained to reconstruct the removed region.
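A minimal sketch of the inpainting setup, assuming an (N, 3) point cloud tensor and a Chamfer distance as the completion loss (names and the ball-removal heuristic are illustrative):

```python
import torch

def make_inpainting_pair(points, radius=0.2):
    # points: (N, 3). Drop every point inside a ball around a random seed;
    # the dropped region becomes the network's completion target.
    seed = points[torch.randint(len(points), (1,))].squeeze(0)
    keep = (points - seed).norm(dim=-1) > radius
    return points[keep], points[~keep]

def chamfer(pred, target):
    # Symmetric Chamfer distance between the predicted completion and
    # the held-out region, used as the self-supervised training loss.
    d = torch.cdist(pred, target)
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
```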
A method for training scene flow estimation with two self-supervised losses, based on nearest neighbors and cycle consistency.
These self-supervised losses allow us to train our method on large unlabeled autonomous driving datasets.
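A compact sketch of the two losses, assuming point clouds from consecutive frames and a predicted per-point flow (the reverse-flow callable is hypothetical, standing in for the same network run backward):

```python
import torch

def nearest_neighbor_loss(p1_warped, p2):
    # Pull each source point, warped by the predicted flow, toward its
    # nearest neighbor in the next frame (a one-sided Chamfer distance).
    return torch.cdist(p1_warped, p2).min(dim=1).values.mean()

def cycle_consistency_loss(p1, flow_fwd, predict_backward_flow):
    # Warping forward and then backward should return each point to its
    # starting position.
    p1_warped = p1 + flow_fwd
    return (p1_warped + predict_backward_flow(p1_warped) - p1).norm(dim=-1).mean()
```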
Predicted actions and spatial relationships between objects detected by YOLO in images, combining VGG-Net-based visual features with
Word2Vec-based semantic features.
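An illustrative fusion head for this setup; the feature dimensions, relation count, and class name are assumptions, not the original architecture:

```python
import torch
import torch.nn as nn

class RelationClassifier(nn.Module):
    # Concatenate VGG visual features for a detected object pair with
    # Word2Vec embeddings of their class labels, then predict the
    # action / spatial relation between them.
    def __init__(self, vis_dim=4096, w2v_dim=300, num_relations=20):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(2 * vis_dim + 2 * w2v_dim, 512),
            nn.ReLU(),
            nn.Linear(512, num_relations),
        )

    def forward(self, vis_a, vis_b, w2v_a, w2v_b):
        return self.head(torch.cat([vis_a, vis_b, w2v_a, w2v_b], dim=-1))
```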
A method to capture anomalous behavior in a social network based on the degree, betweenness, and closeness centrality of graph nodes, using
Graph Neural Networks (GNNs) in Keras.
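A simplified sketch of this pipeline; the example graph and the single hand-rolled aggregation step stand in for the actual data and GNN architecture:

```python
import networkx as nx
import numpy as np
from tensorflow import keras

G = nx.karate_club_graph()  # stand-in for the social-network graph
deg = nx.degree_centrality(G)
btw = nx.betweenness_centrality(G)
clo = nx.closeness_centrality(G)
feats = np.array([[deg[n], btw[n], clo[n]] for n in G.nodes])

# One round of neighborhood aggregation over the row-normalized
# adjacency: a simplified stand-in for a graph convolution layer.
A = nx.to_numpy_array(G) + np.eye(len(G))
feats = (A / A.sum(axis=1, keepdims=True)) @ feats

# Per-node anomaly scorer over the aggregated centrality features.
model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(3,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```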
Trajectory analysis of spatio-temporal graph nodes using the DeepWalk algorithm in NetworkX (Python), with SVMs for classification and for detecting
changing points of interest.
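A condensed sketch of the DeepWalk-plus-SVM pipeline (the example graph and labels are placeholders for the project's data):

```python
import networkx as nx
import numpy as np
from gensim.models import Word2Vec
from sklearn.svm import SVC

def deepwalk_embeddings(G, walks_per_node=10, walk_len=20, dim=64):
    # DeepWalk: treat uniform random walks as "sentences" and train a
    # skip-gram Word2Vec model on them to embed graph nodes.
    rng = np.random.default_rng(0)
    walks = []
    for _ in range(walks_per_node):
        for start in G.nodes:
            walk, node = [str(start)], start
            for _ in range(walk_len - 1):
                nbrs = list(G.neighbors(node))
                if not nbrs:
                    break
                node = nbrs[rng.integers(len(nbrs))]
                walk.append(str(node))
            walks.append(walk)
    w2v = Word2Vec(walks, vector_size=dim, window=5, min_count=0, sg=1)
    return np.array([w2v.wv[str(n)] for n in G.nodes])

# Downstream: fit an SVM on the node embeddings for classification.
G = nx.karate_club_graph()  # stand-in graph
X = deepwalk_embeddings(G)
y = [int(G.nodes[n]["club"] == "Officer") for n in G.nodes]  # example labels
clf = SVC(kernel="rbf").fit(X, y)
```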
A method for multi-modal depression detection from audio, video, and textual modalities using LSTMs. This work leverages emotion cues as an early indicator of
depression.
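A minimal late-fusion sketch of such an architecture; the per-modality feature dimensions and the fusion scheme are assumptions, not the exact model:

```python
import torch
import torch.nn as nn

class MultimodalDepressionNet(nn.Module):
    # One LSTM per modality; the final hidden states are concatenated
    # and fed to a binary classifier.
    def __init__(self, audio_dim=40, video_dim=128, text_dim=300, hidden=64):
        super().__init__()
        self.audio = nn.LSTM(audio_dim, hidden, batch_first=True)
        self.video = nn.LSTM(video_dim, hidden, batch_first=True)
        self.text = nn.LSTM(text_dim, hidden, batch_first=True)
        self.cls = nn.Linear(3 * hidden, 2)  # depressed vs. not

    def forward(self, a, v, t):
        _, (ha, _) = self.audio(a)
        _, (hv, _) = self.video(v)
        _, (ht, _) = self.text(t)
        fused = torch.cat([ha[-1], hv[-1], ht[-1]], dim=-1)
        return self.cls(fused)
```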