Imagine you’re walking a tightrope tied to the peaks of two mountains. When you reach the middle, it’s a long way to safety – forwards or backwards – and a long way down if you fall.
Changing code’s a bit like walking a tightrope. Every step we take risks a fall, and the more changes we make, the more likely we are to experience catastrophe.

Now imagine the same rope, but tied to wooden posts just a few feet apart and a few feet tall. The risk of a fall with each step remains the same, but you’re never far from safety, and if you do fall, it’s no big deal. You can just climb back up and carry on from the last post you reached.
If “safety” in terms of software means code we’re confident works, and that we could therefore ship if we wanted to, then we want those safe points to be close together and “low to the ground”: easy to climb back on at the last safe point and try again if we fall.
In practice, this means getting into the habit of committing our changes whenever we see all the tests pass. Provided they’re good tests, of course. Another thing LLM coding assistants are notorious for is generating meaningless, “weak” tests – or even commenting them out when they fail. Gosh, I wonder where they learned that? Monkey see, monkey do. My advice? Test your tests!
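One way to test your tests is the idea behind mutation testing (my choice of technique here, not necessarily the one the author uses): deliberately break the code and check that the tests notice. A minimal sketch, assuming a hypothetical `apply_discount` function, a hand-made broken “mutant” of it, and two hand-written tests:

```python
# Hypothetical code under test.
def apply_discount(price: float, percent: float) -> float:
    return price * (1 - percent / 100)

# A deliberately broken "mutant" of the same function (sign flipped).
def apply_discount_mutant(price: float, percent: float) -> float:
    return price * (1 + percent / 100)

def weak_test(fn) -> bool:
    """Only checks that a float comes back, so it passes for almost anything."""
    return isinstance(fn(100.0, 10.0), float)

def strong_test(fn) -> bool:
    """Pins the actual behaviour: 10% off 100 is 90."""
    return abs(fn(100.0, 10.0) - 90.0) < 1e-9

def catches_mutant(test, good, mutant) -> bool:
    """A useful test passes on the real code and fails on the broken code."""
    return test(good) and not test(mutant)

if __name__ == "__main__":
    print("weak test catches mutant:",
          catches_mutant(weak_test, apply_discount, apply_discount_mutant))
    print("strong test catches mutant:",
          catches_mutant(strong_test, apply_discount, apply_discount_mutant))
```

The weak test stays green whether the discount is subtracted or added, so it gives no protection; the strong one fails on the mutant. Mutation-testing tools automate the same check by generating many mutants for you.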
Committing on green goes hand-in-hand with working in short feedback loops, solving one problem at a time and testing continuously. The bigger the feedback loops, the more changes between safe points, the further apart the wooden posts get, and the bigger the drop if we fall.
And the bigger the sunk cost.
If I change one line of code and tests fail, it’s no big deal to figure out which change broke the code. I can usually see the problem and fix it quickly. If not, I can revert to the previous working commit and try again with very little time lost.
If I change 100 lines of code and tests fail… Well, now I have to figure out which of those 100 changes broke it, and if I can’t, a reset throws away a lot of work. In this situation, we’re naturally reluctant to cut our losses.
LLMs can generate a lot of changes very quickly, and because they understand nothing, each change is significantly more likely to break the software.
And models can’t distinguish between working code and broken code. It’s all just context to a language model. Ideally, we don’t want the broken stuff figuring in its machinations, so it’s important to remove broken code as soon as it appears, leaving the model building on solid ground whenever possible.
The easy way to do that is a hard reset back to the previous working commit. Otherwise, we risk sending the model into a “doom loop”, where it keeps trying to fix the problem but actually makes things worse with each attempt, contaminating the context for subsequent passes. Getting back to solid ground usually means resetting the context, too.
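The commit-on-green, reset-on-red habit can be sketched as a small Python helper. This is a minimal sketch under my own assumptions: the project is a git working copy and `pytest -q` is the test command (swap in your own). `git reset --hard` only restores tracked files, so the sketch also runs `git clean -fd` to delete any new, untracked files the model created; be aware that this also deletes any untracked files of your own that aren’t ignored. The `run` parameter exists only so the commands can be swapped out for a dry run.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]

def default_run(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    # Run a command in the current directory without raising on a non-zero exit.
    return subprocess.run(list(cmd), capture_output=True, text=True)

TEST_CMD = ["pytest", "-q"]  # assumption: replace with your project's test command

def checkpoint(message: str, run: Runner = default_run) -> bool:
    """Commit if all tests pass; otherwise hard-reset to the last working commit.

    Returns True on green (changes committed), False on red (changes discarded).
    """
    if run(TEST_CMD).returncode == 0:
        # Green: this is a new safe point, so record it.
        run(["git", "add", "-A"])
        run(["git", "commit", "-m", message])
        return True
    # Red: throw away every change to tracked files since the last commit...
    run(["git", "reset", "--hard", "HEAD"])
    # ...and remove untracked files and directories, which reset --hard leaves alone.
    run(["git", "clean", "-fd"])
    return False
```

Calling `checkpoint("Add discount calculation")` after each small change either moves the last post forward or drops you straight back onto it.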
Some “AI” coding assistant users report success with a “three strikes and out” policy. If the tests fail, the model is given two more attempts to fix any problems before a hard reset. But I’ve found that a “zero tolerance” approach works well for me. I revert the code, adapt the prompt – often looking for a smaller intermediate step – and ask the model to try again.
(And, yes, I do have a policy on how many attempts I’ll allow before I write the code myself. We’ll also be talking about when to grab the wheel in a future post.)
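The difference between “three strikes and out” and “zero tolerance” comes down to one number: how many attempts the model gets, on top of its own failing code, before the hard reset. A minimal sketch, where `ask_model`, `tests_pass`, `commit` and `hard_reset` are hypothetical callables standing in for your coding assistant, test runner and version control:

```python
from typing import Callable

def attempt_step(
    ask_model: Callable[[int], None],
    tests_pass: Callable[[], bool],
    commit: Callable[[], None],
    hard_reset: Callable[[], None],
    max_attempts: int,
) -> bool:
    """Let the model try one change up to max_attempts times.

    max_attempts=3 models "three strikes and out": the original try plus two
    fix-up attempts on top of the failing code. max_attempts=1 models "zero
    tolerance": revert at the first red. Returns True if the step ended on
    green and was committed, False if the code was reset.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        ask_model(attempt)  # attempt 1 is the original request; later ones are fixes
        if tests_pass():
            commit()  # green: a new safe point
            return True
    hard_reset()  # back to the last safe point with a clean slate
    return False
```

Under zero tolerance, a `False` result is the cue to adapt the prompt, perhaps asking for a smaller intermediate step, and call it again.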
LLMs work better on a clean slate.