We’ve been exploring how interactive 3D worlds can serve as scalable training environments for embodied agents.
As a first step, we trained a VLM agent to play Elden Ring in real time.
Read about it here:
Antim Labs
World creation layer for robotics sims
- Antim Labs reposted: Simulation is a core part of how we build and evaluate robots. I spoke about simulation, robotics, and the role of interactive worlds at AI Engineer Singapore this past weekend! Full talk: youtube.com/watch?v=_xQnSN… Huge shoutout to @SherryYanJiang @unprofeshme @agrimsingh @swyx
- Antim Labs reposted: Announcing HUD's RL environments for RSI hackathon! 🎉 Join us June 20–21 in SF if you're interested in RL and want to push the frontier forward! (w/ $100,000+ in prizes and compute credits 👀)
- Antim Labs reposted: We're launching early access to Gizmo, our automated sim creation tool. From text and/or image inputs, our agent generates SimReady assets and scenes from dimensioned primitives, with correct affordances and articulation.
- Replying to @AntimLabs: We'll keep evaluating frontier models as they are released, and expanding the leaderboards to smaller VLMs with better latency. VGBench is live at antimlabs.com/vgbench.
- Replying to @AntimLabs: The gap is clear: models show flashes of vision-driven competence, then break on navigation loops, ambiguous controls, pathfinding, or basic drag-and-drop.
- Replying to @AntimLabs: Even strong models struggle across many games, especially when tasks require long-horizon planning, spatial reasoning, and goal persistence. Scores use a checkpointing system, and the best performers barely cross 5% in the Full setting.
- Replying to @AntimLabs: VGBench has two settings: Full = real-time play; Lite = the game pauses while the model thinks. Lite was added because inference latency is still a real bottleneck for agents in live environments.
- Replying to @AntimLabs: The setup: agents play from raw visual input plus a high-level system prompt describing the objective and controls. No game-specific scaffolding or auxiliary information is provided. This is a video of Gemini 3.1 Pro Preview playing The Legend of Zelda: Link's Awakening (sped
- Replying to @AntimLabs: VGBench evaluates VLM-based agents on a suite of curated video games, scoring them on game progression through visual understanding alone. We've updated the leaderboard with the latest frontier models: @OpenAI GPT-5.4, @AnthropicAI Claude Opus 4.6, and @GoogleDeepMind Gemini 3.1
- Replying to @AntimLabs: Video games are created to be intuitive for humans to learn and master by leveraging innate inductive biases, making them an ideal testbed for evaluating those same capabilities in VLMs.
- We are excited to launch VideoGameBench on Antim Labs, created by @a1zhang, Thomas L. Griffiths (@cocosci_lab), @karthik_r_n, and @OfirPress at Princeton.
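
To make the Full vs. Lite distinction and the checkpoint-style score in the thread above concrete, here is a minimal Python sketch. It is not VGBench's code: the names `Setting`, `frames_elapsed_during_inference`, and `checkpoint_score`, the 60 fps default, and the "fraction of checkpoints reached" scoring rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Setting:
    """Hypothetical description of a benchmark setting."""
    name: str
    pauses_while_thinking: bool


# Assumed encoding of the two settings described in the thread.
FULL = Setting("Full", pauses_while_thinking=False)  # real-time play
LITE = Setting("Lite", pauses_while_thinking=True)   # game pauses during inference


def frames_elapsed_during_inference(setting: Setting, latency_s: float, fps: int = 60) -> int:
    """Game frames that pass while the model is thinking.

    In Lite the game is paused, so no frames pass. In Full the game keeps
    running, so the agent acts on a stale observation. The fps value is an
    assumption, not a VGBench parameter.
    """
    if setting.pauses_while_thinking:
        return 0
    return round(latency_s * fps)


def checkpoint_score(reached: int, total: int) -> float:
    """Assumed scoring rule: fraction of ordered checkpoints reached."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= reached <= total:
        raise ValueError("reached must be between 0 and total")
    return reached / total


if __name__ == "__main__":
    for setting in (FULL, LITE):
        missed = frames_elapsed_during_inference(setting, latency_s=2.0)
        print(f"{setting.name}: {missed} frames pass during a 2.0 s model call")
    print(f"score: {checkpoint_score(1, 20):.0%}")
```

Under these assumptions, a 2-second model call costs 120 frames of unobserved gameplay in Full and none in Lite, which is one way to read why latency is called a bottleneck for real-time agents.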










