code craft – Codemanship's Blog

Public Code Craft Training – July 7-9

For the small percentage of engineering orgs who’d genuinely like to be shipping more reliable software and be more responsive to the needs of their business and their users – it’s a niche, I know – I’m running a public 3-day online Code Craft workshop on July 7-9.

If you’re a developer, twist your manager’s arm – especially if they’re expecting you to be more productive using tools like Claude Code and Copilot.

If you’re an engineering leader, this is the real AI-assisted software engineering training your teams need – and, funnily enough, it’s mostly about software engineering and only a little bit about AI. It’s about making teams AI-ready.

It’s 6x half-day modules that give developers a practical, hands-on introduction to the foundational technical practices that enable teams to accelerate release cycles, shrink lead times and improve release reliability – with and without AI.

Specification By Example
Test-Driven Development
Refactoring
Design Principles
Continuous Delivery
Code Craft & AI – grounded on hard data, includes how to apply CRESS principles for context engineering to AI-assisted workflows

To learn more and register, visit https://codemanship.co.uk/codecraft.html

Places are limited.

May Workshops for Self-Funding Learners – Update

Hiya. Just a quick note about the Essential Code Craft training workshops aimed at self-funding learners that are happening this month.

Specification By Example

Tuesday May 12 (evening) has 3 places available. I’m guessing it’ll be sold out by the end of this week.

Saturday May 16 (morning) is half-full at time of writing.

-> Register

Test-Driven Development

Tuesday May 19 (evening) is sold out, but you can add yourself to the waitlist in case anybody drops out and to be among the first to hear about future workshops.

Saturday May 23 (morning) still has plenty of places available.

-> Register

And if you want to keep an eye out for future workshops in June and beyond, bookmark our dedicated web page for self-funding learners on Ticket Tailor.

Upcoming skills we’ll be covering include Modular Design and Refactoring. Y’know? Boring skills that are nevertheless essential.

Essential Code Craft – Workshops In May

May will soon be upon us, and I’ve scheduled three out-of-hours workshops for self-funding learners in my Essential Code Craft series.

If your employer won’t invest in you, invest in yourself and join us.

Tues May 12th 18:45 BST & Sat May 16th 09:45 BST- Specification By Example

Over more than 70 years of developing software products and systems, we’ve learned that misunderstandings about the meaning of requirements is one of the biggest sources of avoidable rework.

Reducing ambiguity in specifications can dramatically reduce the risk of misinterpretation, whether it’s among human stakeholders or when we’re working with AI coding tools.

Tues May 19th 18:45 BST – Test-Driven Development

For nearly 30 years, Test-Driven Development has been the technical core of successful agile software development.

Teams have shortened delivery lead times dramatically, while actually improving the reliability of their releases, and lowering the cost of changing software using TDD.

And TDD is proving to be not just compatible with AI-assisted software development, but essential.

Essential Code Craft – The Roadmap

Some of you may have noticed that I’ve been running out-of-hours training workshops for self-funding learners recently, under the banner of Essential Code Craft.

In a way, this is a return to the early days of Codemanship when I ran regular weekend workshops – priced for individual pockets – that were mostly attended by developers investing in their own skills and career development.

Many of those people are now CTOs and heads of engineering, and I’ve been fortunate – and grateful – that quite a few have brought me in to provide the same kind of training for their teams.

But with senior engineering leaders now very distracted by the code-generating firehose – and while I wait for them to realise that nothing’s actually changed as far as software engineering fundamentals are concerned – I’m pivoting back to self-funders.

So far – just as it was way back when – the first two workshops filled up quickly. While the boss might not be thinking about investing in their developers at the moment, it seems a lot of developers are looking to invest in themselves.

And this is exactly the moment to do it. While a gazillion developers hunt for magic incantations to make a probabilistic next-token predictor act like something other than a probabilistic next-token predictor, the people who’ve done their homework already know: better results with AI coding tools have very little to do with the tools, and almost everything to do with the processes around them.

And it’s a double-win. The practices that produce the best outcomes with AI are the exact same practices that produce the best outcomes without AI.

The key to being effective with AI is being effective without it.

And here’s the hedge, but only for the informed gamblers – developer hiring is rising again, but the demographic of these new hires is changing. Employers are favouring senior developers with significant pre-LLM experience.

I, and a few others, predicted this would happen. Demand would be highest for people who can do the things AI coding tools can’t – like, well, understand code. I mean really understand it. Not “LGTM” understanding. Deep comprehension of programs.

Not only that, but for all kinds of good reasons – economic, environmental, energy, ethical, geopolitical – the future of hyperscale LLMs is by no means predictable. Folks grappling with reduced token limits and rapidly degrading performance with Anthropic’s newest models will hopefully have figured out by now that building workflows that depend heavily in hyperscale LLMs is building on quicksand.

Who are Acme Megacorp gonna’ hire – the dev who sits on their hands because they’re waiting for their token limit to reset, or the dev who can just carry on at roughly the same overall pace of delivery?

And we should be under no illusions that teams who’ve mastered the fundamentals of software delivery are routinely outperforming teams who haven’t – with or without AI. AI is clearly not the differentiator.

So, whether you’re going to apply these disciplines with Claude Code or Codex, or with IntelliJ or VS Code, they still matter – arguably more than ever.

And what are these disciplines? What is Essential Code Craft?

Specification By Example – build shared understanding and pin down requirements with testable specifications
Test-Driven Development – rapidly iterate working software designs with short delivery lead times and reliable releases
Continuous Integration – keep teams more in sync with their changes, merging and testing them many times a day to ensure a working, shippable-at-any-time product
Continuous Collaboration – keep teams on the same page by continuously communicating with practices like pair programming and teaming
Refactoring – reshape code to make change easier, while keeping it working and shippable at all times
Modular Design – optimise software architecture to localise the “blast radius” and minimise the cost of changes, while making rapid testing and smarter reuse easier
Continuous Inspection – minimise the bottleneck and the “LGTM” effect of downstream code review by making it a continuous and highly automated process
Continuous Delivery – combine these fundamentals in a delivery process that can get the proverbial peas from the farmer’s field to the kitchen table through rapid, reliable integration, build and deployment pipelines
Continuous Improvement – build development capability in an evidence-based way, learning what really works and what doesn’t as you build skills, automate tools and workflows, and explore and experiment with your approach – and that’s where I come in!)

Workshops on Specification By Example and Test-Driven Development are already live and taking registrations. If there’s demand, more will follow.

The roadmap is to build a set of repeating individual workshops, rotating monthly, that will eventually cover all of these disciplines – some explicitly, some implicitly like Continuous Integration and pair programming, which will be an integral part of most workshops.

Self-funders can pick and choose which to attend, and my hope is that they’ll be a bit like Pokemon cards – gotta collect ’em all!

Keep an eye on the Codemanship Ticket Tailor box office for details of upcoming workshops.

Also, details of new workshop times will be posted here first, so subscribe to this blog if you’d like to be kept in the loop for future workshops.

Specification By Example Was Essential Before AI. It’s Twice As Essential Now.

_Psst. _{If your boss won’t invest in training you in Specification By Example, I’m running out-of-hours workshops on May 12 and 16 specifically for self-funding learners. £99 + UK VAT.}

The research I’ve done over the last 3 years into AI-assisted programming, including my own closed-loop experiments, found that one major factor in the likelihood that an LLM will correctly interpret a specification is whether or not examples are included to clarify requirements.

Completion rates – as measured by acceptance tests passed – improve dramatically, even in a single pass.

In multiple passes, with feedback from acceptance testing, models given examples converge on impressive completion of ~80%, while without examples they tend to just go around in circles, with completion barely improving.

This should come as no surprise, because we saw a similar effect with dev teams before AI coding tools appeared on the scene. Teams who clarify requirements using examples are much more likely to interpret what the customer (or the product manager, or the business analyst) means correctly.

And, as requirements misunderstandings are typically one of the biggest sources of avoidable rework, they save a lot of time and money correcting mistakes that could have been spotted before a line of code was written.

The LLM equivalent means the same outcomes – features delivered as the prompter intended – in fewer passes, using fewer tokens (and burning down fewer proverbial forests).

Done right, specifications with examples can be translated pretty directly into executable tests that can drive the design and development of working software using techniques like Test-Driven Development.

A specification for totaling items in an order that uses examples is test-ready. Essentially, it is a test.

    def test_one_item(self):
        product = Product(id=327, price=159.95, stock=7, hold=1)
        order = Order([Item(product=product, quantity=1)])

        total = order.total()
        
        self.assertEqual(total, 159.5)

When I’ve provided specifications with examples in TDD training workshops, and measured successful interpretation of requirements by students, I’ve found the same trend that my experiments found with LLMs – it roughly doubles, and often hits 100% completion.

When I don’t include examples… Well, I’ve lost count of the number of times students thought that telling the Mars Rover to turn right moved it x+1, or that Roman Numerals should be converted into integers. As a trainer, it saves me and my students a lot of time – especially if I get to them later.

But humans can do things LLMs can’t, like understand, reason and learn. So levels of misinterpretation tend to be lower, because we can apply an understanding of the world and judge whether a requirement makes sense. Misinterpretation by AI coding assistants is a higher risk, and therefore the need to clarify is significantly heightened.

As is the need to use language consistently. While some folks claim – presumably because they haven’t tested it in any meaningful way – that LLMs don’t need code to be human-readable, the evidence is clear that they really, really do.

I’ve seen many times myself how completion rates dropped significantly when code wasn’t clearly and consistently signposted, using language that had a close conceptual correlation to our specifications. If I call it “sales tax” in one interaction, and “VAT” in another, the model struggles to anchor on a name for that variable, often interpreting them as distinct variables in the code.

Specifying with examples gives us an opportunity to establish a shared vocabulary for describing our problem domain, which aids communication between stakeholders, but also between humans and LLMs.

When developers are stuck for a name for a class or a function that makes the intent of that code clear, I encourage them to write that intent in plain English and take inspiration from that. Specifying with examples can help establish a shared language before a line of code’s been written.

Essential Code Craft – Test-Driven Development – April 7 & 11

Something I hear often: “I’d love to go on one of your courses, but my boss won’t pay for it”.

Codemanship’s new Essential Code Craft training workshops are aimed at software developers who are self-funding their professional growth. If your employer won’t invest in you, perhaps you can invest in you. (Businesses and other VAT-registered entities should visit codemanship.co.uk for details of corporate training for teams.)

For nearly 30 years, Test-Driven Development has been the technical core of successful agile software development.

Teams have shortened delivery lead times dramatically, while actually improving the reliability of their releases, and lowering the cost of changing software using TDD.

And TDD is proving to be not just compatible with AI-assisted software development, but essential.

In this introductory workshop, you will learn how to solve problems working in TDD micro-cycles, rigorously specifying desired software behaviour using tests, writing the simplest solution code to pass those tests, and refactoring safely to enable a simple, clean design to emerge.

The emphasis will be on learning by doing, with succinct practical instruction and guidance from a 25-year+ TDD practitioner and teacher.

You will work in pairs in your chosen programming language, swapping through Continuous Integration into a shared GitHub* repository after each TDD cycle, reinforcing the relationship between TDD, refactoring, version control and CI.

When you register, you’ll be asked to list up to 3 programming languages you’re comfortable working in (e.g., Java, Ruby, Go), and I’ll use that to pair folks as best I can up for the exercise. (Tip: put at least one popular one on the list – we may struggle to find you a pairing partner for Prolog)

I’ll demonstrate in either Java, Python, JS or C#, depending on which of those is listed most often by registrants.

* Requires an active personal GitHub account

This workshop includes a 15-minute break

Find out more and register here

Engineering Leaders: Your AI Adoption Doesn’t Start With AI

In the past few months, I’ve been hearing from more and more teams that the use of AI coding tools is being strongly encouraged in their organisations.

I’ve also been hearing that this mandate often comes with high expectations about the productivity gains leaders expect this technology to bring. But this narrative is rapidly giving way to frustration when these gains fail to materialise.

The best data we have shows that a minority of development teams are reporting modest gains – in the order of 5%-15% – in outcomes like delivery lead times and throughput. The rest appear to be experiencing negative impacts, with lead times growing and the stability of releases getting worse.

The 2025 DevOps Research & Assessment State of AI-assisted Software Development report makes it clear that the teams reporting gains were already high-performing or elite by DORA’s classification, releasing frequently, with short lead times and with far fewer fires in production to put out.

As the report puts it, this is not about tools or technology – and certainly not about AI. It’s about the engineering capability of the team and the surrounding organisation.

It’s about the system.

Teams who design, test, review, refactor, merge and release in bigger batches are overwhelmed by what DORA describes as “downstream chaos” when AI code generation makes those batches even bigger. Queues and delays get longer, and more problems leak into releases.

Teams who design, test, review, refactor, merge and release continuously in small batches tend to get a boost from AI.

In this respect, the team’s ranking within those DORA performance classifications is a reasonably good predictor of the impact on outcomes when AI coding assistants are introduced.

The DORA website helpfully has a “quick check” diagnostic questionnaire that can give you a sense of where your team sits in their performance bands.

(Answer as accurately as you can. Perception and aspiration aren’t capability.)

The overall result is usefully colour-coded. Red is bad, blue is good. Average is Meh. Yep, Meh is a colour.

If your team’s overall performance is in the purple or red, AI code generation’s likely to make things worse.

If your team’s performance is comfortably in the blue, they may well get a little boost. (You can abandon any hopes of 2x, 5x or 10x productivity gains. At the level of team outcomes, that’s pure fiction.)

The upshot of all this is that before you even think about attaching a code-generating firehose to your development process, you need to make sure the team’s already performing at a blue level.

If they’re not, then they’ll need to shrink their batch sizes – take smaller steps, basically – and accelerate their design, test, review, refactor and merge feedback loops.

Before you adopt AI, you need to be AI-ready.

Many teams go in the opposite direction, tackling whole features in a single step – specifying everything, letting the AI generate all the code, testing it after-the-fact, reviewing the code in larger change-sets (“LGTM”), doing large-scale refactorings using AI, and integrating the whole shebang in one big bucketful of changes.

Heavy AI users like Microsoft and Amazon Web Services have kindly been giving us a large-scale demonstration of where that leads – more bugs, more outages, and significant reputational damage.

A smaller percentage of teams are learning that what worked well before AI works even better with it. Micro-iterative practices like Test-Driven Development, Continuous Integration, Continuous Inspection, and real refactoring (one small change at a time) are not just compatible with AI-assisted development, they’re essential for avoiding the “downstream chaos” DORA finds in the purple-to-red teams.

And while many focus on the automation aspects of Continuous Delivery – and a lot of automation is required to accelerate the feedback loops – by far the biggest barrier to pushing teams into the blue is skills.

Yes. SKILLS.

Skills that most developers, regardless of their level of experience, don’t have. The vast majority of developers have never even seen practices like TDD, refactoring and CI being performed for real.

That’s certainly because real practitioners are pretty rare, so they’re unlikely to bump into one. But much of this is because of their famously steep learning curves. TDD, for example, takes months of regular practice to to be able to use it on real production systems.

And, as someone who’s been practicing TDD and teaching it for more than 25 years, I know it requires ongoing mindful practice to maintain the habits that make it work. Use it or lose it!

An experienced guide can be incredibly valuable in that journey. It’s unrealistic to expect developers new to these practices to figure it all out for themselves.

Maybe you’re lucky to have some of the 1% of software developers – yes, it really is that few – who can actually do this stuff for real. Or even one of the 0.1% who has had a lot of experience helping developers learn them. (Just because they can do it, it doesn’t necessarily follow that they can teach it.)

This is why companies like mine exist. With high-quality training and mentoring from someone who not only has many thousands of hours of practice, but also thousands of hours of experience teaching these skills, the journey can be rapidly accelerated.

I made all the mistakes so that you don’t have to.

And now for the good news: when you build this development capability, the speed-ups in release cycles and lead times, while reliability actually improves, happen whether you’re using AI or not.

71% of Developers and Engineering Leaders Believe “AI” Makes Engineering Discipline More Important

I ran the same poll on LinkedIn and Mastodon, asking:

In your estimation, does AI-assisted and agentic coding make engineering discipline:

More important than before?

As important as before?

Less important than before?

Don’t know

464 people responded, and the final votes tell an interesting story.

More than two-thirds of developers and engineering leaders believe that engineering discipline is more important when we’re using “AI”.

And a whopping 94% believe it’s at least as important as before.

This directly contradicts the narrative that “software engineering is dead” in this age of “AI” coding assistants and agents. And the evidence bears that out. “AI” is an amplifier of engineering capability, not a magic wand that fixes bottlenecks, blockers and quality leaks. Unless you address them (and that’s mostly a skills thing – it doesn’t come with your Claude Code plan), it just makes them worse.

Teams using Claude Code, Cursor, Copilot and other LLM-based code generating tools experience positive benefits – in terms of outcomes like lead times and release stability – if they’re applying good technical practices.

Teams that don’t apply enough engineering discipline experience negative impact on those outcomes. They deliver less reliable software that costs more to change, and they deliver it later.

And those teams are sadly in the majority, according to the DORA data.

So the beliefs seem to match the reality – engineering discipline is undeniably more important when we’re drinking from a code-generating firehose.

The mystery that remains is why, given we mostly all seem to agree on this, we see no signs of increased investment in engineering capability. Demand for training hasn’t been rising (and I would know if it had) – and that includes training in “AI-assisted” development practices.

Teams appear to have been left to figure it out for themselves by a process of trial and error.

At the same time, many developers now report feeling under pressure to deliver the claimed productivity gains of “AI” – spend 2 minutes on sites like LinkedIn and you’ll see some very high expectations being set – and the lack of support probably isn’t helping.

Practices like Specification By Example, Test-Driven Development, continuous inspection, refactoring, and continuous integration aren’t just compatible with “AI”-assisted workflows – they’re essential for them.

I’ve spent more than 25 years applying these practices successfully, and teaching them to thousands of developers face-to-face and countless more through online video tutorials, blog posts and social media.

And I’ve devoted a large chunk of the last three years exploring how they can benefit “AI”-assisted workflows. Check out my AI-Ready Software Developer blog series for the highlights.

For teams who want a hands-on introduction, my training courses now incorporate “AI”. Once you’ve learned these practices in your IDE, you’ll get a chance to apply them using tools like Claude Code and Cursor and see the difference they can make. This way, you can become more effective with and without “AI”.

It turns out that’s the key to succeeding with it.

Tech Leaders: Low-Performing Teams Are A Gift, Not A Curse

It’s an age-old story. You’re parachuted in to lead a software development organisation that’s experiencing – let’s be diplomatic – challenges.

Releases are far apart and full of drama. Lead times are long, and unless it’s super, super necessary – and they’re prepared to dig deep into their pockets and wait – the business has just stopped asking for changes. It saves times. They know the answer will probably be “too expensive”.

The only respite comes with code freezes and release embargos – never on a Friday, never over the holidays. But is a break really a break when you know what you’ll be coming back to?

Morale is low, the best people keep leaving, and they can almost smell the chaos on you when you’re trying to hire them.

The finger-pointing is never-ending. And soon enough, those fingers will be pointing at you. Honeymoon periods don’t last long, and the clock is ticking to effect noticeable positive change. They lit the fuse the moment you said “yes”.

I know, right! Nightmare!

Except it isn’t. For a leader new in the chair, it’s actually a golden ticket.

The mistake most tech leaders make is to go strategic – bring in the big guns, make the big changes. It’s a big song and dance number, often with the words “Agile” and/or “transformation” above the door.

This creates highly-visible activity at management and process level. But not meaningful change. Agility Theatre is first and foremost still theatre. You’re focusing on trying to speed up the outer feedback loops of software and systems development – at high cost – without recognising where the real leverage is.

Where I work as a teacher and advisor is at the level of small, almost invisible changes to the way software developers work day-to-day, hour-to-hour, minute-to-minute.

The biggest leverage in a system of feedback loops within feedback loops is in the innermost loops.

Think about it in programming terms: you have nested loops in a function that’s really slow, and you need to speed it up. Which loop do you focus your attention on – the outer loop, or the inner loop? Which one would optimising give you the most leverage for the least change?

I worked with a FinTech company in London who called me in because they’d been going round in circles trying to stabilise a release. Every time they deployed, the system exploded on contact with users – costing a lot of money in fines and compensation, and doing much damage to their reputation.

“Agile coaches” had been engaging with management and, to a lesser extent, teams to try to “fix the process”.

I took one look, saw their total reliance on manual testing, and recognised the “doom loop” immediately. Testing the complete system took a big old QA team several weeks, during which time the developers were busy fixing the bugs reported from the last test cycle, while – of course – introducing all-new bugs (and reintroducing a few old favourites).

They’d been stuck in this loop for 9 months, at a cost running into millions of pounds – plus whatever they’d been paying for the “Agile coaching” that had made no discernible difference. You can’t fix this problem with “stand-ups”, “cadences” and “epics”.

I took the developers into a meeting room and taught them to write NUnit tests. Prioritising the most unstable features, the problem was gone within 6 months, never to return.

And something else happened – something quite magical. Not only did the software become much more reliable, but release cycles got shorter. Better software, sooner.

That’s quite the notch on the belt for a new CTO or head of engineering, dontcha think?

After decades of helping organisations build software development capability, I know from wide experience that taking a team from bad to okay is way easier than taking them from okay to good, which is way easier than from good to excellent.

Having helped teams climb that ladder all the way to the top, I can attest that the first rung is a real gift. Your CEO or investors probably can’t distinguish excellent from good, but the difference between bad and okay is very noticeable to businesses.

Unit testing is table stakes in that journey, but the difference it makes is often huge. Low-visibility, but very high-leverage.

Here’s the 101: the inner loops of software development – coding, testing, reviewing, refactoring, merging – are where the highest-leverage changes can be made at surprisingly low cost.

90% of the time, nobody but the developers themselves need get involved. No extra budget’s required (you might even save money). No management permission need be sought. No buy-in from other parts of the organisation is necessary.

High-leverage changes that can slip under the radar, but that can also have quite profound impact.

And results are much easier to sell than promises. Imagine, when your honeymoon period’s up, being able to point to a graph showing how lead times have shrunk, and how fires in production have become rare.

I’ve seen that buy tech leaders the higher trust and autonomy they need to make those strategic changes that the infamous organisational antibodies may well have attacked before. They were immunised by results.

Now for the really good news. In all those cases I’ve been involved in – many over the years – tech leaders only did one thing.

I am not an “Agile coach” or a “transformation lead”. I do not sit in meetings with management. I do not address strategic matters or matters of high-level process.

I work directly with developers, helping them to make those high-leverage, mostly invisible changes to their innermost workflows. Your CEO will likely never meet me, or even hear my name.

The first rule of Code Craft is that we do not talk about Code Craft.

Right now, you may be thinking, “But Jason, we’ve got AI. Developers can produce 2x the code in the same time.”

Do you think 2x the code hitting that FinTech company’s testing bottleneck would have made things better? Or would that just have been more bugs hitting QA in bigger batches?

Attaching a code-generating firehose to your dev process is very likely to overwhelm bottlenecks like testing, code review and merging – making the system as a whole slower and leakier. I call it “LGTM-speed”.

Tightening up those inner loops is even more essential when we’re using “AI” code generators. If we don’t, lead times get longer, releases get less stable, and the cost of change goes up. Worse software, later.

Good luck presenting that graph to the C-suite!

If you need help tightening up the inner feedback loops in your software development processes – with or without “AI” – that’s right up my street.

Visit https://codemanship.co.uk/ for details of training and coaching in the technical practices that enable rapid, reliable and sustainable evolution of software to meet changing needs.

Clean Contexts

You’ve probably heard of “clean code” (and the “clean coder”, and “clean architecture”, and other things Bob Martin has added the word “clean” in front of to get another book out of it).

In this dawning age of “AI”-assisted software development, I’d like to propose clean contexts.

What is a “clean context”? Well, I’m glad you asked.

A clean context:

Addresses one problem – one failing test, one code quality rule, one refactoring etc.
Is small enough to stay inside the model’s effective context limit – which is going to be orders of magnitude smaller than the advertised maximum context
Uses clear and consistent shared language – if you’ve been calling it “sales tax”, don’t suddenly start calling it “VAT”
Clarifies with examples that can be used as success criteria (i.e., tests) – The code samples used in training were paired with usage examples, so it improves matching
Only contains information pertinent to the task – don’t divert the model’s attention (literally)
Only contains accurate information (“ground truth”) – the code and the architecture as it is now (not a bunch of changes back when you asked the tool to summarise it), the test failure message, the mutation testing results and so on. Ground your interactions in reality.
Only contains working code – if the model breaks the code, don’t feed it back to it. It can’t tell broken code from working code, and you’ll pollute the context. Revert and try again. The exception to this is bug fixes, of course. But if the model introduced the bug – git reset –hard
Contains code that doesn’t go outside the model’s data distribution – LLMs famously choke on code that lacks clarity, is overly complex and lacks separation of concerns because it’s far outside the distribution of examples they were trained on. When it comes to gnarly legacy code, I’ve had more success breaking it down myself initially before letting Claude loose on it. Y’know, like how an adult bird chews the food first before feeding it to its chicks.

And remember that a prompt isn’t the entire context. Claude Code and Cursor will use static analysis to determine what source code needs to be added. Context files may be added (e.g., CLAUDE.md). And of course, everything in the conversation- your (or your agent’s) prompts and the model’s responses – are all part of the context. When an LLM “hallucinates”, that becomes part of the context, and the model has no way of determining fact from its own fiction. It’s all just context to a language model.

This is why I purge and then construct a new, task-specific context with each interaction. Many users are reporting how much more accurate LLMs tend to be with a fresh context.

Our goal with a clean context is to minimise ambiguity and the risk of misinterpretation, to minimise attention dilution and context drift, context pollution and context “rot”, and as much as possible, stay within the LLM’s training data distribution.

Basically, we’re aiming to maximise the chances of an accurate prediction from the LLM, and spend less time cleaning up mistakes and digging the tool out of “doom loops”.

Importantly, working in small steps – solving one problem at a time – opens up many more opportunities after each step to get feedback from testing, code review and merging, so clean contexts are highly compatible with much more iterative approaches.

Just as Continuous Delivery enables us to make progress by putting one foot in front of the other, ensuring a working product after every step, we also aim to start every step with a clean context that significantly reduces the risk of a stumble.