Drew Breunig (@dbreunig) / X

Drew Breunig

15.9K posts

@dbreunig

Writing about and working on AI, DSPy, geo, and data.

Bay Area

Joined March 2008

Drew Breunig
@dbreunig
Oct 21, 2025
Google summarizes @NousResearch as a t-shirt company with a side business in AI.
98K
Drew Breunig
@dbreunig
Jun 23, 2025
As your context bloats, you hit different failure modes. These failures hit agents hardest because they operate in exactly the scenarios where contexts balloon: gathering information, making sequential tool calls, engaging in multi-turn reasoning, & accumulating histories.
131K
Drew Breunig
@dbreunig
Oct 25, 2025
For people wondering today, yes, Kimi K2 RL’ed writing quality.
dbreunig.com
How Kimi K2 RL’ed Qualitative Data to Write Better
Our last post on Kimi K2 dives into how the Moonshot team used reinforcement learning (RL) on qualitative tasks. If you haven’t already, check out the last two explorations:
72K
Drew Breunig
@dbreunig
Oct 6, 2025
OpenAI's prompt optimizer in AgentKit is GEPA. It's gonna be everywhere...
Flurin Laim
@flurin17
Oct 6, 2025
Replying to @flurin17
One of the developers actually confirmed it. They use GEPA.
90K
Drew Breunig
@dbreunig
Sep 22, 2025
If the DSPy docs are confusing (I get it, no judgement!), check out my talk from Databricks' Summit. I keep getting notes from people who attended telling me it helped them finally "get" DSPy and they haven't stopped using it since.
43K
Drew Breunig
@dbreunig
Nov 9, 2025
Seems like every other week people find my writing about Kimi K2. Hats off to the @Kimi_Moonshot team for writing such a great technical paper and sharing so many details worth writing about. Here's the paper: github.com/MoonshotAI/Kim… Here's my writing on... How they rephrased
Kimi-K2/tech_report.pdf at main · MoonshotAI/Kimi-K2
From github.com
32K
Drew Breunig
@dbreunig
Aug 23, 2025
I got around to kicking the tires on GEPA prompt optimization in DSPy, seeing if it could match the reported gsm8k benchmark for Qwen3-4b-thinking. Started with the simplest signature: qa_bot = dspy.Predict('question -> answer') GEPA got it from 67.2% to 92.8%.
25K
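Prompt optimizers like GEPA need a metric that scores each prediction. As a hypothetical sketch (not the metric used in the run above), a gsm8k-style check might compare only the final number in the gold and predicted answers, then report accuracy as a percentage. The helper names `extract_number`, `gsm8k_match`, and `accuracy` are illustrative, and this is plain Python rather than a DSPy API.

```python
import re
from typing import List, Optional, Tuple


def extract_number(text: str) -> Optional[str]:
    """Hypothetical helper: return the last number in a string, commas stripped."""
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def gsm8k_match(gold: str, predicted: str) -> bool:
    """Exact match on the final numeric answer only; the reasoning text is ignored."""
    g, p = extract_number(gold), extract_number(predicted)
    return g is not None and p is not None and float(g) == float(p)


def accuracy(pairs: List[Tuple[str, str]]) -> float:
    """Percentage of (gold, predicted) pairs whose final numbers match."""
    if not pairs:
        return 0.0
    hits = sum(gsm8k_match(gold, pred) for gold, pred in pairs)
    return 100.0 * hits / len(pairs)
```

Scoring only the final number keeps the metric tolerant of however the model phrases its reasoning, which matters when the optimizer is rewriting the prompt on every iteration.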
Drew Breunig
@dbreunig
Aug 20, 2025
With tools like DSPy and techniques like GEPA, we're at the 1st level of prompt optimization: easily create effective prompts, for a given model, for a given task. I'm excited for the 2nd level of prompt optimization: constant eval collection via usage, w/ regular prompt optimization.
19K
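One way to picture that "2nd level" loop is a collector that turns every piece of user feedback into an eval example and flags when enough new examples have piled up to justify another optimization run. This is a minimal sketch under stated assumptions: `UsageEvalCollector`, the `approved` feedback flag, and the fixed `reoptimize_every` threshold are all hypothetical, and the actual re-optimization step is left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Example:
    question: str
    answer: str
    approved: bool  # hypothetical user-feedback signal (thumbs up / down)


@dataclass
class UsageEvalCollector:
    """Hypothetical: grow an eval set from usage and signal when to re-optimize."""
    reoptimize_every: int = 100
    evalset: List[Example] = field(default_factory=list)
    _since_last: int = field(default=0, init=False)

    def record(self, question: str, answer: str, approved: bool) -> bool:
        """Store one labeled interaction; return True when a re-optimization run is due."""
        # Both approved and rejected outputs are kept: each is a labeled eval case.
        self.evalset.append(Example(question, answer, approved))
        self._since_last += 1
        if self._since_last >= self.reoptimize_every:
            self._since_last = 0  # reset the counter for the next batch
            return True
        return False
```

A count-based trigger is the simplest possible policy; a real system might instead re-optimize on a schedule or when scores on the growing eval set drop.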
Drew Breunig
@dbreunig
Aug 21, 2025
Here's your DeepSeek 3.1 headline: the same scores with 25-50% fewer tokens.
20K
Drew Breunig
@dbreunig
Oct 18, 2024
Simplifying the AI noise by segmenting everything into 3 big use cases: Gods, Interns, and Cogs.
dbreunig.com
The 3 AI Use Cases: Gods, Interns, and Cogs
Simplifying and navigating the AI noise by segmenting everything into 3 big use cases.
117K
Drew Breunig
@dbreunig
Jun 15, 2025
Here's the write up of my Data+AI Summit talk on the perils of prompts in code and how to mitigate them with DSPy. As prompts grow in complexity, they begin to resemble programming. Don't program your prompts. Program your program.
dbreunig.com
Let the Model Write the Prompt
Notes from a talk I delivered at the 2025 Data + AI Summit, detailing the problem with prompts in your code and how DSPy can make everything better.
49K
Drew Breunig
@dbreunig
Jan 9, 2025
Before you pick a model or write a prompt, build your eval. If you’re building with LLMs, your eval is your most valuable asset. It lets you test new models, iterate faster on prompts or pipelines, and ensures your product is always moving forward.
dbreunig.com
Your Eval is More Important Than the Model
A well-built custom eval lets you quickly test the newest models, iterate faster when developing prompts and pipelines, and ensure you’re always moving forward against your product’s specific goal....
12K
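The "build your eval first" workflow can be sketched as a fixed eval set plus a harness that scores any candidate (a new model, a new prompt, a new pipeline) the same way. In this minimal sketch each candidate is assumed to be a plain Python callable from input to output, and scoring is assumed to be exact string match; `run_eval` and `compare` are hypothetical names, not a specific library's API.

```python
from typing import Callable, Dict, List, Tuple

EvalSet = List[Tuple[str, str]]  # (input, expected output) pairs


def run_eval(predict: Callable[[str], str], evalset: EvalSet) -> float:
    """Fraction of cases where the candidate's output exactly matches the expected one."""
    if not evalset:
        raise ValueError("eval set is empty")
    correct = sum(predict(x).strip() == y.strip() for x, y in evalset)
    return correct / len(evalset)


def compare(candidates: Dict[str, Callable[[str], str]],
            evalset: EvalSet) -> List[Tuple[str, float]]:
    """Score every candidate on the same eval set, best first."""
    scores = [(name, run_eval(fn, evalset)) for name, fn in candidates.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

Because the eval set stays fixed while candidates change, swapping in the newest model is a one-line addition to `candidates`, and the ranking shows whether the change actually moves the product forward.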
Drew Breunig
@dbreunig
Jun 23, 2025
Replying to @dbreunig
How Long Contexts Fail (a post on how to mitigate these issues is due this week...)
dbreunig.com
How Long Contexts Fail
Taking care of your context is the key to building successful agents. Just because there’s a 1 million token context window doesn’t mean you should fill it.
45K
Drew Breunig
@dbreunig
Dec 12, 2024
Really enjoy DSPy’s workflow for LLM work. Handing off the specifics of prompt generation and engineering back to the LLM makes a lot of sense:
dbreunig.com
Pipelines & Prompt Optimization with DSPy
Writing about AI, geo, culture, media, data, and the ways they interact.
18K