Agrim Gupta (@agrimgupta92) / X

Agrim Gupta

362 posts

@agrimgupta92

Simulating reality @ Meta

web.stanford.edu/~agrim/

Joined January 2017

Pinned
Agrim Gupta
@agrimgupta92
Aug 5, 2025
Introducing Genie 3, our state-of-the-art world model that generates interactive worlds from text, enabling real-time interaction at 24 fps with minutes-long consistency at 720p. 🧵👇
459K
Agrim Gupta
@agrimgupta92
Dec 16, 2024
"A pair of hands skillfully slicing a ripe tomato on a wooden cutting board" #veo
3.6M
Agrim Gupta
@agrimgupta92
Dec 11, 2023
We introduce W.A.L.T, a diffusion model for photorealistic video generation. Our model is a transformer trained on image and video generation in a shared latent space. 🧵👇
431K
Agrim Gupta
@agrimgupta92
May 23, 2023
How should we leverage internet videos for learning visual correspondence? In our latest work we introduce SiamMAE: Siamese Masked Autoencoders for self-supervised representation learning from videos. web: siam-mae-video.github.io paper: siam-mae-video.github.io/resources/pape… 👇🧵
171K
Agrim Gupta
@agrimgupta92
Aug 9, 2019
We have released the LVIS v0.5 dataset for long-tail object detection, with 1200+ categories and 700k+ high-quality instance segmentation masks. Paper: arxiv.org/abs/1908.03195 Website: lvisdataset.org/explore API: github.com/lvis-dataset/l… With Ross Girshick and Piotr Dollar @FacebookAI
arxiv.org
LVIS: A Dataset for Large Vocabulary Instance Segmentation
Progress on object detection is enabled by datasets that focus the research community's attention on open challenges. This process led us from simple images to complex scenes and from bounding...
Agrim Gupta
@agrimgupta92
Aug 5, 2025
Replying to @agrimgupta92
3/ One emergent capability I find remarkable is long-term consistency, especially because we don’t use any explicit 3D representations or priors. Simply training the model to generate the next frame auto-regressively teaches it to maintain physical consistency across time
42K
Agrim Gupta
@agrimgupta92
Mar 23, 2022
1/ Can we replicate the success of large scale pre-training --> task specific fine tuning for robotics? This is hard as robots have different act/obs space, morphology and learning speed! We introduce MetaMorph🧵👇 Paper: arxiv.org/abs/2203.11931 Code: github.com/agrimgupta92/m…
Agrim Gupta
@agrimgupta92
May 14, 2024
When will every pixel be generated? Every 2 years AI systems can generate 10x more pixels. At this rate of progress we will have AI generated TV episodes by 2029 and movies by 2031.
149K
Agrim Gupta
@agrimgupta92
Feb 4, 2021
Excited to share our work on understanding the relationship between environmental complexity, evolved morphology, and the learnability of intelligent control. Paper: arxiv.org/abs/2102.02202 Video: youtu.be/MMrIiNavkuY w/ @silviocinguetta @SuryaGanguli @drfeifei
Agrim Gupta
@agrimgupta92
Aug 5, 2025
Replying to @agrimgupta92
4/ Finally, I think future iterations of models like Genie 3 will have a significant impact on accelerating robotics and real-world AI. Here's a glimpse of what that could look like: an agent pursuing a goal (go to tomatoes) in an environment generated by our model.
22K
Agrim Gupta
@agrimgupta92
Jun 27, 2022
1/ Can we build video prediction models by masked visual pretraining via Transformer? We present MaskViT: a simple & parameter efficient method to generate high res. videos in real time. Paper: arxiv.org/abs/2206.11894 Web: maskedvit.github.io 🧵👇
Agrim Gupta
@agrimgupta92
Dec 16, 2024
Today we are introducing Veo2: SOTA video generation model. You can try the model here: labs.google/fx/tools/video…
Sundar Pichai
@sundarpichai
Dec 16, 2024
Introducing Veo 2, our new, state-of-the-art video model (with better understanding of real-world physics & movement, up to 4K resolution). You can join the waitlist on VideoFX. Our new and improved Imagen 3 model also achieves SOTA results, and is coming today to 100+ countries
14K
Agrim Gupta
@agrimgupta92
Oct 6, 2021
1/ Excited to share that our work on Deep Evolutionary Reinforcement Learning (DERL): a framework for large scale evolution of embodied agents in physically realistic environments is now published in @NatureComms Paper nature.com/articles/s4146… Video youtube.com/watch?v=zltE0w…
Embodied intelligence via learning and evolution
From nature.com
Agrim Gupta
@agrimgupta92
Dec 11, 2023
Replying to @agrimgupta92
6/ Finally, our model can be used to generate videos with consistent 3D camera motion.
12K