I'm actively seeking job opportunities. Please get in touch if you believe I can be a good fit for your team.
Research Interests
I work at the intersection of computer vision, natural language processing, and cognition.
This primarily consists of developing algorithms based on how humans understand concepts and
perceive the world. Humans have the ability to be creative: they understand and apply concepts from
different domains. To make machines generalize in a similar way, I study Compositional Zero-Shot Learning
(CZSL), in which computers learn from limited data to compose different concepts into
unseen combinations. I'm interested in Recognition (Video Understanding) and Generation of fine-grained unseen compositions, for editing
and generating new images and videos (Gen AI).
TL;DR
Utilizes pre-trained Text-to-Image Diffusion models to achieve video in-painting
(inserting a new object into a video),
eliminating the need to train a video generation model.
Paper coming soon.
TL;DR
Open-Vocabulary Compositional Zero-shot Learning (OV-CZSL) enables learning of entirely new attributes, objects, and their compositions.
Models can expand beyond their seen vocabulary.
Project / Paper
TL;DR
WayEx: a novel method for learning complex goal-conditioned robotics tasks from a
single demonstration, using a unique reward function and knowledge expansion.
Project / Paper
TL;DR
Benchmark suite for fruits, vegetables and various cutting styles from multiple views.
Compositional Image Generation supports generating
unseen cutting styles of different objects.
Project / Paper / Media Blogs: (UMD News, TechXplore, MARKTECHPost)
TL;DR
Compositional Zero-shot Learning (CZSL) for attributes and objects is solved by
disentangling concepts in the visual feature space
and using them to hallucinate novel complex concepts.
Project / Paper / Code
TL;DR
Using the initial and final states of objects (two frames), learns to classify the action being
performed in the video that causes the object state change.
Project / Paper
TL;DR
Dynamically updates the adjacency matrices of graphs used in Graph Convolutional Networks (GCNs); can be
used for semi-supervised learning and
zero/few-shot action recognition tasks.
Project / Paper
TL;DR
Analyze different types of Knowledge Graphs (KGs), namely action embeddings,
action-object embeddings, and visual embeddings, for the Zero/Few-shot Action Recognition task.
Paper
TL;DR
Analyze the tendency of Visual Question Answering (VQA) models to rely on statistical biases
in the dataset to answer questions, rather than paying attention to the visual content.
Paper
TL;DR
Design an auction-based spectrum allocation framework for transactions between primary users and
secondary users, while maximizing the spatial reuse capacity.
Paper