Wenlong Huang
726 posts
Wenlong Huang
@wenlong_huang
PhD Student @StanfordSVL. Previously @MIT_CSAIL @Berkeley_AI @GoogleDeepMind @NVIDIARobotics. Robotics, Foundation Models, Spatial Intelligence.
Stanford, CA
wenlonghuang.com
Joined May 2019
1,389 Following
5,673 Followers
  • Pinned
    Wenlong Huang
    @wenlong_huang
    Jan 8
    What if we can simulate an *interactive 3D world*, from a single image, in the wild, in real time? Introducing PointWorld-1B: a large pre-trained 3D world model that predicts env dynamics given RGB-D capture and robot actions. 🌐 point-world.github.io from @Stanford @nvidia
    273K
  • Wenlong Huang
    @wenlong_huang
    Jul 7, 2023
    How to harness foundation models for *generalization in the wild* in robot manipulation? Introducing VoxPoser: use LLM+VLM to label affordances and constraints directly in 3D perceptual space for zero-shot robot manipulation in the real world! 🌐 voxposer.github.io 🧵👇
    294K
  • Wenlong Huang
    @wenlong_huang
    Aug 29, 2024
    What structural task representation enables multi-stage, in-the-wild, bimanual, reactive manipulation? Introducing ReKep: LVM to label keypoints & VLM to write keypoint-based constraints, solve w/ optimization for diverse tasks, w/o task-specific training or env models. 🧵👇
    191K
  • Wenlong Huang
    @wenlong_huang
    Mar 2, 2023
    Large language models gathered tons of world knowledge by speaking human language. But can they ever speak “robot language”? Introducing “Grounded Decoding”: a scalable way to decode *grounded text* from LLM for robots. Website: grounded-decoding.github.io 🧵👇
    88K
  • Wenlong Huang
    @wenlong_huang
    May 16, 2025
    How to scale visual affordance learning that is fine-grained, task-conditioned, works in-the-wild, in dynamic envs? Introducing Unsupervised Affordance Distillation (UAD): distills affordances from off-the-shelf foundation models, *all without manual labels*. Very excited this
    94K
  • Wenlong Huang
    @wenlong_huang
    Apr 8, 2022
    Thrilled to announce that I will join @Stanford for my PhD! Extremely grateful to @pathak2206 @IMordatch @pabbeel for years of amazing mentorship and Zhuowen Tu for introducing me to AI research. Looking forward to tackling interesting problems in robotics and AI @StanfordAILab!
  • Wenlong Huang
    @wenlong_huang
    Nov 11, 2024
    Excited to share that ReKep won Best Paper Award at CoRL LEAP workshop! Extracting plannable task representations from foundation models unlocks great potential for generalization in manipulation. Huge shout-out to my collaborators and advisor @chenwang_j @YunzhuLiYZ
    Wenlong Huang
    @wenlong_huang
    Aug 29, 2024
    What structural task representation enables multi-stage, in-the-wild, bimanual, reactive manipulation? Introducing ReKep: LVM to label keypoints & VLM to write keypoint-based constraints, solve w/ optimization for diverse tasks, w/o task-specific training or env models. 🧵👇
    22K
  • Wenlong Huang
    @wenlong_huang
    May 1, 2025
    Excited to announce the “Structured World Models for Robotic Manipulation” workshop at #RSS2025 in LA! Website: swomo-rss.github.io Call for Papers (Deadline: May 23): swomo-rss.github.io/index.html#call Come join us with a stellar lineup of speakers to discuss the various important &
    16K
  • Wenlong Huang
    @wenlong_huang
    Feb 22, 2025
    Excited to co-organize the tutorial on Foundation Models Meet Embodied Agents at AAAI 2025 in Philadelphia, with @ManlingLi_ @YunzhuLiYZ @maojiayuan! …models-meet-embodied-agents.github.io 📅 Date: February 25, 2025 ⏰ Time: 8:30 AM – 12:30 PM 📍 Location: Room 118A We will present a
    29K
  • Wenlong Huang
    @wenlong_huang
    May 22, 2024
    Very well-written thread about LLM in robotics! My 2 cents is: robotics requires a full-stack approach - whether it's symbolic or LLM or hybrid planners, one has to think about the abstractions they operate in, especially pertaining to closely-tied perception-action loops. 1/N
    Chris Paxton
    @chris_j_paxton
    May 21, 2024
    One of the most interesting questions to me right now is: can LLMs plan, why/why not, and to what extent do we care about this, especially as it pertains to robotics?
    15K
  • Wenlong Huang
    @wenlong_huang
    Nov 5, 2024
    Will be presenting this in Poster Session 3 (Nov 7 afternoon) at @corl_conf! The work builds upon our years-long exploration of leveraging foundation models as task representations in manipulation, starting from language decomposition (LMs as Zero-Shot Planners, Inner Monologue,
    Wenlong Huang
    @wenlong_huang
    Aug 29, 2024
    What structural task representation enables multi-stage, in-the-wild, bimanual, reactive manipulation? Introducing ReKep: LVM to label keypoints & VLM to write keypoint-based constraints, solve w/ optimization for diverse tasks, w/o task-specific training or env models. 🧵👇
    7K
  • Wenlong Huang
    @wenlong_huang
    Feb 13, 2025
    Excited to release our work IKER, led by the amazing @shivanshpatel35 and @XinchenYinYXC ! iker-robot.github.io Following up to ReKep, we found that 3D keypoint relations remain as a powerful task representation that simultaneously 1) grounds VLM’s knowledge in perceptual
    Shivansh Patel
    @shivanshpatel35
    Feb 13, 2025
    How can VLMs specify visual rewards for diverse manipulation tasks and evolve them iteratively? Introducing Iterative Keypoint Reward (IKER)—a visually grounded reward that leverages VLMs for flexible, human-like task execution through a real-to-sim-to-real pipeline. 🧵🔽
    4.9K
  • Wenlong Huang
    @wenlong_huang
    Sep 25, 2024
    Spatial intelligence requires grounding world knowledge in spatial domain beyond text - VLM generating keypoints is promising to achieve this! This enables robots to perform some cool tasks as in ReKep: x.com/wenlong_huang/…. Looking forward to more capabilities of future VLMs!
    Ai2
    @allen_ai
    Sep 25, 2024
    Meet Molmo: a family of open, state-of-the-art multimodal AI models. Our best model outperforms proprietary systems, using 1000x less data. Molmo doesn't just understand multimodal data—it acts on it, enabling rich interactions in both the physical and virtual worlds. Try it
    6.7K
  • Wenlong Huang
    @wenlong_huang
    May 18, 2025
    On my way to #ICRA2025 ✈️ Excited to meet new friends and catch up with old ones! I will be giving two talks: Invited talk @ "Foundation Models and Neuro-Symbolic AI for Robotics" 📍Room 305 ⏰May 19, 4:40 - 5:00pm 🔗sairlab.org/icra25/ Oral for UAD (Best Paper Finalist)
    Shivansh Patel
    @shivanshpatel35
    Feb 13, 2025
    How can VLMs specify visual rewards for diverse manipulation tasks and evolve them iteratively? Introducing Iterative Keypoint Reward (IKER)—a visually grounded reward that leverages VLMs for flexible, human-like task execution through a real-to-sim-to-real pipeline. 🧵🔽
    sairlab.org
    ICRA’25 Workshop on Foundation Models and Neuro-Symbolic AI for Robotics
    A series of interactive talks on foundation models and neuro-symbolic AI for robotics.
    3.9K
