Saketh Rambhatla

I am a Research Scientist at Meta AI. Previously, I was a postdoctoral researcher working with Dr. Ishan Misra. I obtained my Ph.D. in ECE at the University of Maryland (UMD), College Park, where I worked on in-the-wild visual understanding with Dr. Rama Chellappa and Dr. Abhinav Shrivastava.

I completed my bachelor's and master's degrees at the Indian Institute of Technology, Kharagpur.

Email  /  CV  /  Google Scholar  

Research

At Meta, I spend most of my time working on Generative AI. My research has resulted in state-of-the-art image and video generation models and has received wide media coverage. During my Ph.D., I worked on object tracking, person re-identification, object detection and discovery, and multi-modal inconsistency detection.

Publications

Movie Gen: A Cast of Media Foundation Models
GenAI, Meta

A cast of foundation models that can generate HD videos and synchronized audio. Enables additional capabilities like precise instruction-based video editing and generation of personalized videos.

Trajectory-aligned Space-time tokens for Few-Shot Action Recognition
ECCV, 2024

Leveraging point trajectories and self-supervised representations for few-shot action recognition.

SelfEval: Leveraging the discriminative nature of generative models for evaluation.
Sai Saketh Rambhatla, Ishan Misra
Under Submission

Repurpose generative models as discriminative models to evaluate generative performance.

Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning.
ECCV, 2024

State-of-the-art Diffusion-based Text-to-Video generation method.

InstanceDiffusion: Instance-level Control for Image Generation.
CVPR, 2024

A novel and effective method to enable precise instance-level control for text-to-image generation.

UVIS: Unsupervised Video Instance Segmentation.
CVPR Workshops, 2024

Unsupervised Video Instance Segmentation without any video annotations or densely labeled pre-training.

MOST: Multiple Object localization with Self-supervised Transformers for object discovery.
ICCV, 2023 (Oral Presentation)

Localize multiple objects in real-world images in an unsupervised fashion, without any training, using self-supervised transformers.

SparseDet: Improving Sparsely Annotated Object Detection with Pseudo-positive Mining.
ICCV, 2023

Improve the performance of detectors trained with missing annotations by posing the problem as region-based semi-supervised learning.

The Pursuit of Knowledge: Discovering and Localizing new concepts using Dual Memory
Sai Saketh Rambhatla, Rama Chellappa, Abhinav Shrivastava
ICCV, 2021

Equip machines with capabilities to automatically discover and learn models for new categories from a large unlabeled dataset.

Towards Discovery and Attribution of Open-world GAN Generated Images.
Saksham Suri*, Sharath Girish*, Sai Saketh Rambhatla, Abhinav Shrivastava
ICCV, 2021

Automatically discover and attribute open-world GAN images.

Self-Denoising neural networks for few shot learning
Steven Schwarcz, Sai Saketh Rambhatla, Rama Chellappa

Novel architecture based on denoising auto-encoders to improve few shot learning.

An Empirical analysis of Boosting Deep Networks
Sai Saketh Rambhatla, Michael Jones, Rama Chellappa
IJCNN, 2022

Empirical evidence that a single large neural network is usually more accurate than a boosted ensemble of neural networks with the same total number of parameters.

Towards Accurate Visual and Natural language-based vehicle retrieval systems.
Pirazh Khorramshahi*, Sai Saketh Rambhatla*, Rama Chellappa
NVIDIA AI City Challenge, CVPR Workshops, 2021

Proposed a real-time system for image-based vehicle re-identification and natural language-based vehicle retrieval.

Towards Real-Time Systems for Vehicle Re-Identification, Multi-Camera Tracking, and Anomaly Detection.
NVIDIA AI City Challenge, CVPR Workshops, 2020

Proposed a real-time system for vehicle re-identification, multi-camera tracking, and anomaly detection in a network of traffic cameras.

Detecting Human-Object Interactions via Functional Generalization.
AAAI, 2020

Humans interact with functionally similar objects in a similar manner.

Spatial Priming for Detecting Human-Object Interactions
arXiv

A method for exploiting the spatial layout information of a human and an object for detecting HOIs in images.

A Dual-Path Model With Adaptive Attention for Vehicle Re-Identification
ICCV, 2019   (Oral Presentation)

Proposed a novel dual-path adaptive attention model for Vehicle re-identification.

Body Part Alignment and Temporal Attention Pooling for Video-Based Person ReIdentification
Sai Saketh Rambhatla, Michael Jones
BMVC, 2019

Training deep networks with the ability to align features achieves state-of-the-art performance on person re-identification.



Service

Reviewer, International Journal of Computer Vision

Reviewer, Pattern Recognition Letters

Reviewer, IEEE Access

Reviewer, ECCV 2020, 2022, 2024

Reviewer, ICCV 2021, 2023

Reviewer, NeurIPS 2023

Reviewer, ICLR 2023, 2024

Reviewer, ICML 2024

Reviewer, CVPR 2021, 2022, 2023, 2024

Reviewer, AAAI 2021, 2022

Graduate Teaching Assistant, ENEE222 Fall 2016

Graduate Teaching Assistant, ENEE324 Spring 2017

Graduate Teaching Assistant, ENEE425 Fall 2017

Graduate Teaching Assistant, ENEE630 Fall 2017

