Agents: Continual Harness, PokeAgent, LLM Economist | Research Intern @PrimeIntellect | CS PhD @Princeton | Former CMU, Waymo
- They say the first 100 citations are the hardest. Happy to achieve this small milestone 🎉
- Can a Large Language Model (LLM) with zero Pokémon-specific training achieve expert-level performance in competitive Pokémon battles? Introducing PokéChamp, our minimax LLM agent that reaches top 30%-10% human-level Elo on Pokémon Showdown! New paper on arXiv and code on GitHub!
- Excited to announce that I will be spending the summer at @Waymo on the simulation realism team! I’ll be working on learning to generate simulated worlds. 🚙🚙🚙 Send me a message if you're in the Bay Area and want to chat!
- 🚀 Launch day! The NeurIPS 2025 PokéAgent Challenge is live. Two tracks: ① Showdown Battling – imperfect-info, turn-based strategy ② Pokemon Emerald Speedrunning – long-horizon RPG planning 5M labeled replays • starter kit • baselines. Bring your LLM, RL, or hybrid
- I am happy to share that I will be joining the PhD in Computer Science program at @Princeton with @chijinML as a Francis Robbins Upton and NSF GRFP Fellow. I am very grateful to my advisors, mentors, and peers at @SCSatCMU and @RutgersU over the past years for their support.
- Little-known fact about Carnegie Mellon University’s Robotics Institute… we keep it stocked with popcorn so the place always smells like a movie theater🍿🎥
- Wow, the exponential rollout at @Waymo is super exciting. It's the most widely deployed real-world autonomous agent and multi-agent system!
- Excited to share that the PokeAgent challenge was accepted as a @NeurIPSConf competition! This should serve as an excellent standardized benchmark for competitive games AND ‘speedrunning’ the RPG. I hope to see both the RL and LLM agent communities working together here to eval
- I'm excited to release the PokéChamp dataset! 🎮 2 million cleaned battle logs from Pokémon Showdown to help train expert-level AI agents for competitive Pokémon. Check out the data on @huggingface and code on GitHub! 🔗 huggingface.co/datasets/milkk… 🔗 github.com/sethkarten/pok…
- Fantastic RL result from the Puffer group showing the sheer complexity of Pokemon Red as a benchmark. LLM agents need to take notes. Quoted post: “We beat Pokemon Red with online RL! Details here over the next several days. Led by @dsrubinstein. Follow him, me, @DanAdvantage, @kywch500, @computerender for more!”
- Heading to #ICML2025 next week! If you’re into all things API (Artificial Pokémon Intelligence) from our PokéChamp spotlight to the upcoming NeurIPS PokeAgent Challenge, LLM-agent scaffolding & reasoning, or mechanism-design nudging, let’s connect. DMs open!
- 🎓 University students & AI researchers — push your Pokémon AI agents further! The NeurIPS 2025 PokéAgent Challenge is offering compute credits, courtesy of our sponsor Google DeepMind, to help you train bigger models & run more experiments. 📌 To apply: 1️⃣ Make a submission to