Drew Breunig
15.9K posts
Writing about and working on AI, DSPy, geo, and data.
- As your context bloats, you hit different failure modes. These failures hit agents hardest because they operate in exactly the scenarios where contexts balloon: gathering information, making sequential tool calls, engaging in multi-turn reasoning, & accumulating histories.
- For people wondering today, yes, Kimi K2 RL’ed writing quality.
- OpenAI's prompt optimizer in AgentKit is GEPA. It's gonna be everywhere... Replying to @flurin17: One of the developers actually confirmed. They use GEPA.
- If the DSPy docs are confusing (I get it, no judgement!), check out my talk from Databricks' Summit. I keep getting notes from people who attended telling me it helped them finally "get" DSPy and they haven't stopped using it since.
- Seems like every other week people find my writing about Kimi K2. Hats off to the @Kimi_Moonshot team for writing such a great technical paper and sharing so many details worth writing about. Here's the paper: github.com/MoonshotAI/Kim… Here's my writing on... How they rephrased
- I got around to kicking the tires on GEPA prompt optimization in DSPy, seeing if it could match the reported gsm8k benchmark for Qwen3-4b-thinking. Started with the simplest signature: qa_bot = dspy.Predict('question -> answer') GEPA got it from 67.2% to 92.8%.
- With tools like DSPy and techniques like GEPA, we're at the 1st level of prompt optimization: easily create effective prompts, for a given model, for a given task. I'm excited for the 2nd level of prompt optimization: constant eval collection via usage, w/ regular prompt optimization.
- Here's your DeepSeek 3.1 headline: the same scores with 25-50% fewer tokens.
- Simplifying the AI noise by segmenting everything into 3 big use cases: Gods, Interns, and Cogs.
- Here's the write up of my Data+AI Summit talk on the perils of prompts in code and how to mitigate them with DSPy. As prompts grow in complexity, they begin to resemble programming. Don't program your prompts. Program your program.
- Before you pick a model or write a prompt, build your eval. If you’re building with LLMs, your eval is your most valuable asset. It lets you test new models, iterate faster on prompts or pipelines, and ensures your product is always moving forward.
- Replying to @dbreunig: How Long Contexts Fail (a post on how to mitigate these issues is due this week...)
- Really enjoy DSPy’s workflow for LLM work. Handing off the specifics of prompt generation and engineering back to the LLM makes a lot of sense.
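Several of the posts above come back to the same idea: build your eval first, then measure whatever prompt or model you swap in. Here is a minimal sketch of that idea in plain Python. It is not DSPy's or GEPA's API. The names `normalize`, `exact_match_accuracy`, `stub_predict`, and `EVAL_SET` are hypothetical, the three questions are made-up illustrations rather than gsm8k data, and the stub stands in for a real model call.

```python
from typing import Callable, Iterable, Tuple


def normalize(answer: str) -> str:
    """Trim, lowercase, and drop thousands separators so ' 1,000 ' matches '1000'."""
    return answer.strip().lower().replace(",", "")


def exact_match_accuracy(
    predict: Callable[[str], str],
    examples: Iterable[Tuple[str, str]],
) -> float:
    """Return the fraction of examples whose normalized prediction equals the gold answer."""
    examples = list(examples)
    if not examples:
        raise ValueError("eval set is empty")
    correct = sum(
        normalize(predict(question)) == normalize(gold)
        for question, gold in examples
    )
    return correct / len(examples)


def stub_predict(question: str) -> str:
    """Hypothetical stand-in for an LLM call; swap in your real predictor here."""
    canned = {
        "What is 2 + 2?": "4",
        "What is 10 * 100?": "1,000",
    }
    return canned.get(question, "unknown")


# Made-up illustrative examples, not real benchmark data.
EVAL_SET = [
    ("What is 2 + 2?", "4"),
    ("What is 10 * 100?", "1000"),
    ("What is 7 - 5?", "2"),
]

score = exact_match_accuracy(stub_predict, EVAL_SET)
print(f"accuracy: {score:.1%}")
```

Because the predictor is just a callable, the same eval set can score a hand-written prompt, an optimized prompt, or a different model without changing the harness.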





