B'Log

Rethinking RL Infra for Agents

Tue, 26 May 2026 00:00:00 +0000

Why the agentic shift breaks classical RL infra, a tour of Forge, ROLL, SkyRL and Slime, and my recent take with Polar (Agentic RL on Any Harness at Scale).

Demystifying Agent Sandbox

Sun, 28 Dec 2025 00:00:00 +0000

Modern AI agents are typically scaffolded with a runtime sandbox, and these Computer-Use Agents (CUA) autonomously run code, use the terminal, take notes, and access the Internet and MCPs – exactly like humans do when interacting with the digital world.

Yet the underlying reasoning and practices remain unclear to most, so let’s dive into popular agent scaffolds like Claude Code and MiniMax Agent, demystifying the design principles and discovering how agents benefit from using a computer.

Why I start to write

Sat, 13 Dec 2025 00:00:00 +0000

This is a long-overdue start to my personal blog. Starting a blog isn’t as easy as it seems: I don’t want to waste people’s time with casual anecdotes, yet an overly formal academic write-up would likely be overkill and scare people away.

There are many people who truly enjoy machine learning and find joy in sharing knowledge. I’ve been a long-time follower of AI/tech blogs from Andrej Karpathy, Lilian Weng, Yao Fu, and others. I usually prefer blogs over papers because blogs feel more honest and less AI-polished (or written to attract citations). Yet almost everyone I followed stopped posting in early 2025. I understand that the recent shifts and hype in SF have left everyone busy building and/or financially free. Still, I’d be sad if this vibe disappeared; it has been truly helpful to me and many others over the past few years.

Binfeng Xu

I’m a research engineer at NVIDIA. Currently, I work on Agent RL and harness codesign for computer-use and continual learning.

Formerly, I was a researcher at Samsung Research (SRA), where I led LLM post-training and distillation infra. I enjoy training large neural nets, building open-source projects, and competing on Kaggle, where I rank in the top 1% globally.

Papers

Polar: Agentic RL on Any Harness at Scale
Binfeng Xu, Hao Zhang, Shaokun Zhang, Songyang Han, Mingjie Liu, Jian Hu, Shizhe Diao, Zhenghui Jin, Yunheng Zou, Michael Demoret, Jan Kautz, Yi Dong