Saketh Rambhatla

	Moviegen: A cast of media foundation models GenAI, Meta Project Page | Tech Report A cast of foundation models that can generate HD videos and synchronized audio. Enables additional capabilities like precise instruction-based video editing and generation of personalized videos.
	Trajectory-aligned Space-time tokens for Few-Shot Action Recognition Pulkit Kumar, Namitha Padmanabhan, Luke Luo, Sai Saketh Rambhatla, Abhinav Shrivastava ECCV, 2024 Project Page | arXiv | Code Leveraging point trajectories and self-supervised representations for few-shot action recognition.
	SelfEval: Leveraging the discriminative nature of generative models for evaluation. Sai Saketh Rambhatla, Ishan Misra Under Submission arXiv Repurpose generative models as discriminative models to evaluate generative performance.
	Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning. Rohit Girdhar^†, Mannat Singh^†, Andrew Brown, Quentin Duval, Samaneh Azadi, Sai Saketh Rambhatla, Mian Akbar Shah, Xi Yin, Devi Parikh, Ishan Misra ECCV, 2024 Project Page | arXiv State-of-the-art diffusion-based text-to-video generation method.
	InstanceDiffusion: Instance-level Control for Image Generation. Xudong Wang, Trevor Darrell, Sai Saketh Rambhatla, Rohit Girdhar, Ishan Misra CVPR, 2024 Project Page | arXiv | Code A novel and effective method to enable precise instance-level control for text-to-image generation.
	UVIS: Unsupervised Video Instance Segmentation. Shuaiyi Huang, Saksham Suri, Kamal Gupta, Sai Saketh Rambhatla, Ser-nam Lim, Abhinav Shrivastava CVPR Workshops, 2024 Paper Unsupervised Video Instance Segmentation without any video annotations or densely labeled pre-training.
	MOST: Multiple Object localization with Self-supervised Transformers for object discovery. Sai Saketh Rambhatla, Ishan Misra, Rama Chellappa, Abhinav Shrivastava ICCV, 2023 (Oral Presentation) Project Page | arXiv | Code | Poster Localize multiple objects in real-world images in an unsupervised fashion, without any training, using self-supervised transformers.
	SparseDet: Improving Sparsely Annotated Object Detection with Pseudo-positive Mining. Saksham Suri, Sai Saketh Rambhatla, Rama Chellappa, Abhinav Shrivastava ICCV, 2023 Project Page | arXiv | Code | Poster Improve performance of detectors trained using missing annotations by posing the problem as region-based semi-supervised learning.
	The Pursuit of Knowledge: Discovering and Localizing new concepts using Dual Memory. Sai Saketh Rambhatla, Rama Chellappa, Abhinav Shrivastava ICCV, 2021 Project Page | arXiv | Code | Poster Equip machines with capabilities to automatically discover and learn models for new categories from a large unlabeled dataset.
	Towards Discovery and Attribution of Open-world GAN Generated Images. Saksham Suri, Sharath Girish, Sai Saketh Rambhatla, Abhinav Shrivastava ICCV, 2021 Project Page | arXiv Automatically discover and attribute open-world GAN images.
	Self-Denoising neural networks for few-shot learning. Steven Schwarcz, Sai Saketh Rambhatla, Rama Chellappa arXiv Novel architecture based on denoising auto-encoders to improve few-shot learning.
	An Empirical analysis of Boosting Deep Networks. Sai Saketh Rambhatla, Michael Jones, Rama Chellappa IJCNN, 2022 Paper Empirical evidence that a single large neural network is usually more accurate than a boosted ensemble of neural networks with the same total number of parameters.
	Towards Accurate Visual and Natural language-based vehicle retrieval systems. Pirazh Khorramshahi, Sai Saketh Rambhatla, Rama Chellappa NVIDIA AI City Challenge, CVPR Workshops, 2021 Paper Proposed a real-time system for image-based vehicle re-identification and natural language-based vehicle retrieval.
	Towards Real-Time Systems for Vehicle Re-Identification, Multi-Camera Tracking, and Anomaly Detection. Neehar Peri, Pirazh Khorramshahi, Sai Saketh Rambhatla, Vineet Shenoy, Saumya Rawat, Jun-Cheng Chen, Rama Chellappa NVIDIA AI City Challenge, CVPR Workshops*, 2020 Paper Proposed a real-time system for vehicle re-identification, multi-camera tracking, and anomaly detection in a network of traffic cameras.
	Detecting Human-Object Interactions via Functional Generalization. Ankan Bansal, Sai Saketh Rambhatla, Abhinav Shrivastava, Rama Chellappa AAAI, 2020 Project Page | arXiv Humans interact with functionally similar objects in a similar manner.
	Spatial Priming for Detecting Human-Object Interactions. Ankan Bansal, Sai Saketh Rambhatla, Abhinav Shrivastava, Rama Chellappa arXiv Project Page | arXiv A method for exploiting the spatial layout of a human and an object to detect human-object interactions (HOIs) in images.
	A Dual-Path Model With Adaptive Attention for Vehicle Re-Identification. Pirazh Khorramshahi, Amit Kumar, Neehar Peri, Sai Saketh Rambhatla, Jun-Cheng Chen, Rama Chellappa ICCV, 2019 (Oral Presentation)* arXiv Proposed a novel dual-path adaptive attention model for vehicle re-identification.
	Body Part Alignment and Temporal Attention Pooling for Video-Based Person Re-Identification. Sai Saketh Rambhatla, Michael Jones BMVC, 2019 Paper Training deep networks with the ability to align features achieves state-of-the-art performance on person re-identification.