At a recent dinner I met a very senior engineer at one of the Big Four tech cos.
His team develops tooling for a 0-engineer future. They're not allowed to tell anyone internally what they're working on to avoid mass panic. He figures mega layoffs start in 18 months.
Kyle Corbitt
2,940 posts
Seattle
Joined September 2012
- Spoke to a Microsoft engineer on the GPT-6 training cluster project. He kvetched about the pain they're having provisioning infiniband-class links between GPUs in different regions. Me: "why not just colocate the cluster in one region?" Him: "Oh yeah we tried that first. We
- Just launched agent.exe, a free, open-source Mac/Windows/Linux app that lets you use Claude 3.5 Sonnet to control your computer! This was a fun little project to explore the API and see what the model can do. Computer use is really cool—I expect 2025 will be the year of agents.
- Guys, fine-tuned Llama 3.1 8B is completely cracked. Just ran it through our fine-tuning test suite and it blows GPT-4o mini out of the water on every task. There has never been an open model this small, this good.
- Announcing MCP•RL: teach your model how to use any MCP server automatically using reinforcement learning! Just connect any MCP server, and your model will start playing with it and (using RL) "learn from experience" how to use its tools most effectively!
- Wow, may be the most significant paper of 2025! A team at Tsinghua has figured out how to get an AI to generate its own training data, and surpassed the performance of models trained on expert human-curated data. We may not hit another data wall between here and ASI. Quoted post: ❄️ Introducing Absolute Zero Reasoner: Our reasoner learns to both propose tasks that maximize learnability and improve reasoning by solving them, entirely through self-play, with no external data! It overall outperforms other "zero" models in math & coding domains. 🧵 1/
- Recently overheard a Groq employee: apparently their per-token costs are 1-2 orders of magnitude higher than what they charge, and the new chip won't materially help. There's no credible plan to fix this. This is why they aren't raising rate limits. Very bearish.
- A few weeks ago, OpenAI announced Reinforcement Fine-Tuning (RFT)—a new way to adapt LLMs to complex tasks with very little training data. Here’s a quick rundown of how it works, why it’s a big deal, and when you should use it. 🧵
- 🚨 We’ve just published a recipe to train a frontier-level deep research agent using RL. With just 30 hours on an H200, any developer can now beat Sonnet-4 on DeepResearch Bench using open-source tools. (Thread 🧵)
- Big news: we've figured out how to make a *universal* reward function that lets you apply RL to any agent with: - no labeled data - no hand-crafted reward functions - no human feedback! A 🧵 on RULER
- Crazy fact that everyone deploying LLMs should know—GPT-4 is "smarter" at temperature=1 than temperature=0, even on deterministic tasks. I honestly didn't believe this myself until I tried it, but shows up clearly on our evals. ht to @eugeneyan for the tip!
- 🚀 Meet ART·E—our open-source RL-trained email research agent that searches your inbox and answers questions more accurately, faster, and cheaper than o3. Let's go deeper on how we built it. 🧵
- "RL from a single example works" "RL with random rewards works" "Base model pass@256 can match RL model pass@1" "RL updates a small % of params" Recent papers all point in the same direction: RL is mostly just eliciting latent behavior already learned in pretraining, not
- Replying to @Jessassin: my general understanding of the business model is "whoever builds agi first wins the whole game." you can agree with them or not, but openai really does believe they're playing for all the marbles here.










