Wenlong Huang

726 posts

@wenlong_huang

PhD Student @StanfordSVL. Previously @MIT_CSAIL @Berkeley_AI @GoogleDeepMind @NVIDIARobotics. Robotics, Foundation Models, Spatial Intelligence.

Stanford, CA

Joined May 2019

Pinned
Wenlong Huang
@wenlong_huang
Jan 8
What if we can simulate an *interactive 3D world*, from a single image, in the wild, in real time? Introducing PointWorld-1B: a large pre-trained 3D world model that predicts env dynamics given RGB-D capture and robot actions. 🌐 point-world.github.io from @Stanford @nvidia
273K
Wenlong Huang
@wenlong_huang
Jul 7, 2023
How to harness foundation models for *generalization in the wild* in robot manipulation? Introducing VoxPoser: use LLM+VLM to label affordances and constraints directly in 3D perceptual space for zero-shot robot manipulation in the real world! 🌐 voxposer.github.io 🧵👇
294K
Wenlong Huang
@wenlong_huang
Aug 29, 2024
What structural task representation enables multi-stage, in-the-wild, bimanual, reactive manipulation? Introducing ReKep: LVM to label keypoints & VLM to write keypoint-based constraints, solve w/ optimization for diverse tasks, w/o task-specific training or env models. 🧵👇
191K
Wenlong Huang
@wenlong_huang
Mar 2, 2023
Large language models gathered tons of world knowledge by speaking human language. But can they ever speak “robot language”? Introducing “Grounded Decoding”: a scalable way to decode *grounded text* from LLM for robots. Website: grounded-decoding.github.io 🧵👇
88K
Wenlong Huang
@wenlong_huang
May 16, 2025
How to scale visual affordance learning that is fine-grained, task-conditioned, works in-the-wild, in dynamic envs? Introducing Unsupervised Affordance Distillation (UAD): distills affordances from off-the-shelf foundation models, *all without manual labels*. Very excited this
94K
Wenlong Huang
@wenlong_huang
Apr 8, 2022
Thrilled to announce that I will join @Stanford for my PhD! Extremely grateful to @pathak2206 @IMordatch @pabbeel for years of amazing mentorship and Zhuowen Tu for introducing me to AI research. Looking forward to tackling interesting problems in robotics and AI @StanfordAILab!
Wenlong Huang
@wenlong_huang
Nov 11, 2024
Excited to share that ReKep won Best Paper Award at CoRL LEAP workshop! Extracting plannable task representations from foundation models unlocks great potential for generalization in manipulation. Huge shout-out to my collaborators and advisor @chenwang_j @YunzhuLiYZ
Wenlong Huang
@wenlong_huang
Aug 29, 2024
What structural task representation enables multi-stage, in-the-wild, bimanual, reactive manipulation? Introducing ReKep: LVM to label keypoints & VLM to write keypoint-based constraints, solve w/ optimization for diverse tasks, w/o task-specific training or env models. 🧵👇
22K
Wenlong Huang
@wenlong_huang
May 1, 2025
Excited to announce the “Structured World Models for Robotic Manipulation” workshop at #RSS2025 in LA! Website: swomo-rss.github.io Call for Papers (Deadline: May 23): swomo-rss.github.io/index.html#call Come join us with a stellar lineup of speakers to discuss the various important &
16K
Wenlong Huang
@wenlong_huang
Feb 22, 2025
Excited to co-organize the tutorial on Foundation Models Meet Embodied Agents at AAAI 2025 in Philadelphia, with @ManlingLi_ @YunzhuLiYZ @maojiayuan! …models-meet-embodied-agents.github.io 📅 Date: February 25, 2025 ⏰ Time: 8:30 AM – 12:30 PM 📍 Location: Room 118A We will present a
29K
Wenlong Huang
@wenlong_huang
May 22, 2024
Very well-written thread about LLM in robotics! My 2 cents is: robotics requires a full-stack approach - whether it's symbolic or LLM or hybrid planners, one has to think about the abstractions they operate in, especially pertaining to closely-tied perception-action loops. 1/N
Chris Paxton
@chris_j_paxton
May 21, 2024
One of the most interesting questions to me right now is: can LLMs plan, why/why not, and to what extent do we care about this, especially as it pertains to robotics?
15K
Wenlong Huang
@wenlong_huang
Nov 5, 2024
Will be presenting this in Poster Session 3 (Nov 7 afternoon) at @corl_conf! The work builds upon our years-long exploration of leveraging foundation models as task representations in manipulation, starting from language decomposition (LMs as Zero-Shot Planners, Inner Monologue,
Wenlong Huang
@wenlong_huang
Aug 29, 2024
What structural task representation enables multi-stage, in-the-wild, bimanual, reactive manipulation? Introducing ReKep: LVM to label keypoints & VLM to write keypoint-based constraints, solve w/ optimization for diverse tasks, w/o task-specific training or env models. 🧵👇
7K
Wenlong Huang
@wenlong_huang
Feb 13, 2025
Excited to release our work IKER, led by the amazing @shivanshpatel35 and @XinchenYinYXC! iker-robot.github.io Following up on ReKep, we found that 3D keypoint relations remain a powerful task representation that simultaneously 1) grounds VLM’s knowledge in perceptual
Shivansh Patel
@shivanshpatel35
Feb 13, 2025
How can VLMs specify visual rewards for diverse manipulation tasks and evolve them iteratively? Introducing Iterative Keypoint Reward (IKER)—a visually grounded reward that leverages VLMs for flexible, human-like task execution through a real-to-sim-to-real pipeline. 🧵🔽
4.9K
Wenlong Huang
@wenlong_huang
Sep 25, 2024
Spatial intelligence requires grounding world knowledge in spatial domain beyond text - VLM generating keypoints is promising to achieve this! This enables robots to perform some cool tasks as in ReKep: x.com/wenlong_huang/…. Looking forward to more capabilities of future VLMs!
Ai2
@allen_ai
Sep 25, 2024
Meet Molmo: a family of open, state-of-the-art multimodal AI models. Our best model outperforms proprietary systems, using 1000x less data. Molmo doesn't just understand multimodal data—it acts on it, enabling rich interactions in both the physical and virtual worlds. Try it
6.7K
Wenlong Huang
@wenlong_huang
May 18, 2025
On my way to #ICRA2025 ✈️ Excited to meet new friends and catch up with old ones! I will be giving two talks: Invited talk @ "Foundation Models and Neuro-Symbolic AI for Robotics" 📍Room 305 ⏰May 19, 4:40 - 5:00pm 🔗sairlab.org/icra25/ Oral for UAD (Best Paper Finalist)
sairlab.org
ICRA’25 Workshop on Foundation Models and Neuro-Symbolic AI for Robotics
A series of interactive talks on foundation models and neuro-symbolic AI for robotics.
3.9K