Agrim Gupta
362 posts
Agrim Gupta
@agrimgupta92
Simulating reality @ Meta
web.stanford.edu/~agrim/
Joined January 2017
324 Following
5,545 Followers
  • Pinned
    Agrim Gupta
    @agrimgupta92
    Aug 5, 2025
    Introducing Genie 3, our state-of-the-art world model that generates interactive worlds from text, enabling real-time interaction at 24 fps with minutes-long consistency at 720p. 🧵👇
    459K
  • Agrim Gupta
    @agrimgupta92
    Dec 16, 2024
    "A pair of hands skillfully slicing a ripe tomato on a wooden cutting board" #veo
    3.6M
  • Agrim Gupta
    @agrimgupta92
    Dec 11, 2023
    We introduce W.A.L.T, a diffusion model for photorealistic video generation. Our model is a transformer trained on image and video generation in a shared latent space. 🧵👇
    431K
  • Agrim Gupta
    @agrimgupta92
    May 23, 2023
    How should we leverage internet videos for learning visual correspondence? In our latest work we introduce SiamMAE: Siamese Masked Autoencoders for self-supervised representation learning from videos. Web: siam-mae-video.github.io Paper: siam-mae-video.github.io/resources/pape… 👇🧵
    171K
  • Agrim Gupta
    @agrimgupta92
    Aug 9, 2019
    We have released the LVIS v0.5 dataset for long-tail object detection, with 1200+ categories and 700k+ high-quality instance segmentation masks. Paper: arxiv.org/abs/1908.03195 Website: lvisdataset.org/explore API: github.com/lvis-dataset/l… With Ross Girshick and Piotr Dollar @FacebookAI
    arxiv.org
    LVIS: A Dataset for Large Vocabulary Instance Segmentation
    Progress on object detection is enabled by datasets that focus the research community's attention on open challenges. This process led us from simple images to complex scenes and from bounding...
  • Agrim Gupta
    @agrimgupta92
    Aug 5, 2025
    Replying to @agrimgupta92
    3/ One emergent capability I find remarkable is long-term consistency, especially because we don't use any explicit 3D representations or priors. Simply training the model to generate the next frame autoregressively teaches it to maintain physical consistency across time.
    42K
  • Agrim Gupta
    @agrimgupta92
    Mar 23, 2022
    1/ Can we replicate the success of large-scale pre-training followed by task-specific fine-tuning for robotics? This is hard because robots differ in action/observation spaces, morphology, and learning speed! We introduce MetaMorph 🧵👇 Paper: arxiv.org/abs/2203.11931 Code: github.com/agrimgupta92/m…
  • Agrim Gupta
    @agrimgupta92
    May 14, 2024
    When will every pixel be generated? Every 2 years AI systems can generate 10x more pixels. At this rate of progress we will have AI generated TV episodes by 2029 and movies by 2031.
    149K
  • Agrim Gupta
    @agrimgupta92
    Feb 4, 2021
    Excited to share our work on understanding the relationship between environmental complexity, evolved morphology, and the learnability of intelligent control. Paper: arxiv.org/abs/2102.02202 Video: youtu.be/MMrIiNavkuY w/ @silviocinguetta @SuryaGanguli @drfeifei
  • Agrim Gupta
    @agrimgupta92
    Aug 5, 2025
    Replying to @agrimgupta92
    4/ Finally, I think future iterations of models like Genie 3 will have a significant impact on accelerating robotics and real-world AI. Here's a glimpse of what that could look like: an agent pursuing a goal (go to tomatoes) in an environment generated by our model.
    22K
  • Agrim Gupta
    @agrimgupta92
    Jun 27, 2022
    1/ Can we build video prediction models by masked visual pretraining with Transformers? We present MaskViT: a simple & parameter-efficient method to generate high-res. videos in real time. Paper: arxiv.org/abs/2206.11894 Web: maskedvit.github.io 🧵👇
  • Agrim Gupta
    @agrimgupta92
    Dec 16, 2024
    Today we are introducing Veo 2, a SOTA video generation model. You can try the model here: labs.google/fx/tools/video…
    Sundar Pichai
    Google
    @sundarpichai
    Dec 16, 2024
    Introducing Veo 2, our new, state-of-the-art video model (with better understanding of real-world physics & movement, up to 4K resolution). You can join the waitlist on VideoFX. Our new and improved Imagen 3 model also achieves SOTA results, and is coming today to 100+ countries
    14K
  • Agrim Gupta
    @agrimgupta92
    Oct 6, 2021
    1/ Excited to share that our work on Deep Evolutionary Reinforcement Learning (DERL), a framework for large-scale evolution of embodied agents in physically realistic environments, is now published in @NatureComms. Paper: nature.com/articles/s4146… Video: youtube.com/watch?v=zltE0w…
    Embodied intelligence via learning and evolution
    From nature.com
  • Agrim Gupta
    @agrimgupta92
    Dec 11, 2023
    Replying to @agrimgupta92
    6/ Finally, our model can be used to generate videos with consistent 3D camera motion.
    12K
