I am a Senior Machine Learning Researcher at Qualcomm AI Research, working in Roland Memisevic's group at the intersection of large vision-language models and robotics. Before that, I completed a thesis-based MSc at Simon Fraser University (SFU), where I was advised by Prof. Manolis Savva.
In 2019, I completed my undergraduate degree in Computer Science at FAST-NUCES Karachi, Pakistan, where I worked on class imbalance under the guidance of Prof. Tahir Syed.
We present Qualcomm Interactive Cooking, a dataset and benchmark to enable the development of live, step-by-step task guidance by situated AI assistants.
We introduce Oat-VLA, a vision-language-action (VLA) model with Object-Agent-centric tokenization that drastically reduces the number of vision tokens, enabling roughly 2x faster training than OpenVLA while outperforming it on real-world tasks.
We present a benchmark for compositional learning in robotics, consisting of 33 manipulation tasks across 3 levels of compositionality. We also open-source a large dataset of ground-truth trajectories generated using oracle solvers.
We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge to push the frontier of first-person video understanding of skilled human activity.
We present the Habitat Synthetic Scene Dataset, a dataset of 211 high-quality 3D scenes, and use it to investigate the impact of synthetic 3D scene dataset scale and realism on the task of training embodied agents to find and navigate to objects.
We curated a large dataset of dashcam videos for understanding road anomalies, and proposed an object-object interaction reasoning approach for detecting anomalies without additional supervision.