The Solution To BDUF Isn’t Faster BDUF

Big Design Up-Front – making lots of design decisions before getting real-world feedback on any of them – fails at any speed of decision-making. Indeed, the faster we make design decisions, the more we tend to fail.

Most folks make a category error of approaching design as a shopping list of decisions. We decide A, B, C, D. If it turns out B is wrong, we can just change B.

But it’s not a list – it’s a tree. Each decision constrains future choices. If B is wrong, C and D may well be wrong, too, if they’re consequences of B.
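Here's a minimal sketch of the difference, assuming a hypothetical set of decisions A to D in which C and D both follow from B. The names and the `consequences` mapping are invented for illustration, and real codebases rarely record this tree explicitly.

```python
# Hypothetical decision tree: each key maps to the decisions that follow from it.
consequences = {
    "A": ["B"],
    "B": ["C", "D"],
    "C": [],
    "D": [],
}


def invalidated_by(decision, tree):
    """Return the decision plus everything downstream of it, in visiting order."""
    affected = [decision]
    for child in tree.get(decision, []):
        affected.extend(invalidated_by(child, tree))
    return affected


# If B turns out to be wrong, C and D are in doubt too.
print(invalidated_by("B", consequences))
```

With a shopping list, changing B would touch one item. With the tree, it touches B and everything hanging off it.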

And we might also make the mistake of thinking we can just unpick the decision tree, but it’s not as simple as that. First of all, we have to distinguish the leaves from the nodes. What is the root decision that we got wrong? And all we see in the resulting code is leaves.

Also, dependencies between decisions don’t operate like dependencies in code. So we’d have to extract an entire branch of decisions from an orthogonally-interconnected architecture. Decisions X, Y and Z may have logical similarities that lead to shared modules that exist independently of whether features I, J and K reuse them.

BDUF is like a game of Play Your Cards Right where they don’t turn over any cards until the end. And it doesn’t matter how fast you play it, or how many goes you get, you’ll almost certainly lose.

The key factor in dramatically improving our odds of making good design decisions is how many consequent decisions we make before we get real-world feedback – how far we keep driving down that road after we make the turn.
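As rough arithmetic, and assuming purely for illustration that each decision has the same independent chance of being right, the odds that a whole run of unchecked decisions is right shrink geometrically. The 70% figure below is an invented parameter, not a measurement, and `chance_all_right` is a hypothetical helper.

```python
def chance_all_right(p_right, decisions):
    """Probability that every one of `decisions` independent guesses is right."""
    return p_right ** decisions


# Assumed 70% chance per decision; try your own number.
for n in (1, 5, 10, 50):
    print(f"{n:>2} decisions before feedback: {chance_all_right(0.7, n):.6f}")
```

If later decisions depend on earlier ones, as they do in a tree, the real picture is likely worse than this independent model suggests.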

And an AI Maserati’s just going to make things worse – we can go even further in the wrong direction, even faster.

So when someone boasts that Claude Cowork generated a PRD for 50 features in an hour, I can’t help thinking “Tell me there are no users without telling me there are no users”.

Can I back up what I’m claiming here? Yes, I think I probably can.

You’re looking in the wrong place. The value’s in the feedback, folks – not in the plan.

The Hottest AI Coding Skill In 2026? Coding Without AI

Psst. If your boss won’t invest in training you in Specification By Example or Test-Driven Development, I’m running out-of-hours workshops in May specifically for self-funding learners. £99 + UK VAT.

Statistics are showing that software developer hiring’s been on the rise again for about a year, but it seems priorities have changed.

Time was that our profession was a pyramid, with a base of entry-level and “junior” hires outnumbering seasoned professionals. But this time, we’re an aging population, with employers favouring developers with significant pre-AI experience.

“The current market has become increasingly senior-driven, with fewer junior roles available and employers expecting even entry-level candidates to have all the skills to hit the ground running.”

Harvey Nash, Software trends in the year ahead: A UK hiring outlook

Meanwhile, my LinkedIn feed’s awash with posts complaining that not only are coding tests still very much a thing, but they’re becoming even more of a thing. “Why”, they lament, “do employers want coding skills when AI can do all that?”

The problem is that – a significant proportion of the time – AI actually can’t do all that. Anyone who’s used AI coding assistants and agents for more than an hour or two will know that there are many times when we have to intervene. And intervening at the very least requires us to really understand what we’re intervening in.

As a trainer and mentor, I’ve watched code comprehension degrade alarmingly over the past 10 years as developers have been relying more and more on copying and pasting code from sources like Stack Overflow.

I especially implore junior developers not to do it. It very noticeably stunts their growth as programmers. The code really does need to go in through your eyes, through your brain and out through your fingers for knowledge to sink in. The language and reasoning centres of our brain need to be actively engaged.

If you lost your phone and urgently needed to call a friend, could you remember their number? Speed-dialing code has a similar effect.

In the last year, that’s gone into warp drive now that we have a powerful tool that automates the copypasta process at scale.

The effect of AI use on code comprehension is well-documented. The more we use it, the less we understand the code that’s being generated. And not just because we didn’t write it, it transpires – our ability to understand code generally atrophies. Use it or lose it.

Employers in 2026, it seems, favour developers who haven’t lost it. As I’ve been only half-joking for over a year, the devs who’ll be in highest demand in this age of AI will be the ones who don’t need it.

With highly public outages becoming routine, it looks like managers are reprioritising to make sure that when the you-know-what hits the fan on a critical system at 2 am, the person answering that call is capable of fixing it even when the AI is having one of its famous senior moments.

Those same employers, of course, then insist that the senior developers they hired because of their effectiveness without AI should use AI as much as possible. Sigh.

In my blog series The AI-Ready Software Developer, I wrote a post called Staying Sharp about how important it’s going to be to maintain our “traditional” programming skills. We need to find some time in the day or the week to leave the proverbial car at home and walk.

And if hitting your token limit is a blocker to doing more programming, you may already have deskilled yourself out of the market.

Essential Code Craft – Workshops In May

May will soon be upon us, and I’ve scheduled three out-of-hours workshops for self-funding learners in my Essential Code Craft series.

If your employer won’t invest in you, invest in yourself and join us.

Tues May 12th 18:45 BST & Sat May 16th 09:45 BST – Specification By Example

Over more than 70 years of developing software products and systems, we’ve learned that misunderstandings about the meaning of requirements are one of the biggest sources of avoidable rework.

Reducing ambiguity in specifications can dramatically reduce the risk of misinterpretation, whether it’s among human stakeholders or when we’re working with AI coding tools.

Tues May 19th 18:45 BST – Test-Driven Development

For nearly 30 years, Test-Driven Development has been the technical core of successful agile software development.

Teams have shortened delivery lead times dramatically, while actually improving the reliability of their releases, and lowering the cost of changing software using TDD.

And TDD is proving to be not just compatible with AI-assisted software development, but essential.

Essential Code Craft – The Roadmap

Some of you may have noticed that I’ve been running out-of-hours training workshops for self-funding learners recently, under the banner of Essential Code Craft.

In a way, this is a return to the early days of Codemanship when I ran regular weekend workshops – priced for individual pockets – that were mostly attended by developers investing in their own skills and career development.

Many of those people are now CTOs and heads of engineering, and I’ve been fortunate – and grateful – that quite a few have brought me in to provide the same kind of training for their teams.

But with senior engineering leaders now very distracted by the code-generating firehose – and while I wait for them to realise that nothing’s actually changed as far as software engineering fundamentals are concerned – I’m pivoting back to self-funders.

So far – just as it was way back when – the first two workshops filled up quickly. While the boss might not be thinking about investing in their developers at the moment, it seems a lot of developers are looking to invest in themselves.

And this is exactly the moment to do it. While a gazillion developers hunt for magic incantations to make a probabilistic next-token predictor act like something other than a probabilistic next-token predictor, the people who’ve done their homework already know: better results with AI coding tools have very little to do with the tools, and almost everything to do with the processes around them.

And it’s a double-win. The practices that produce the best outcomes with AI are the exact same practices that produce the best outcomes without AI.

The key to being effective with AI is being effective without it.

And here’s the hedge, but only for the informed gamblers – developer hiring is rising again, but the demographic of these new hires is changing. Employers are favouring senior developers with significant pre-LLM experience.

I, and a few others, predicted this would happen. Demand would be highest for people who can do the things AI coding tools can’t – like, well, understand code. I mean really understand it. Not “LGTM” understanding. Deep comprehension of programs.

Not only that, but for all kinds of good reasons – economic, environmental, energy, ethical, geopolitical – the future of hyperscale LLMs is by no means predictable. Folks grappling with reduced token limits and rapidly degrading performance with Anthropic’s newest models will hopefully have figured out by now that building workflows that depend heavily on hyperscale LLMs is building on quicksand.

Who are Acme Megacorp gonna hire – the dev who sits on their hands because they’re waiting for their token limit to reset, or the dev who can just carry on at roughly the same overall pace of delivery?

And we should be under no illusions that teams who’ve mastered the fundamentals of software delivery are routinely outperforming teams who haven’t – with or without AI. AI is clearly not the differentiator.

So, whether you’re going to apply these disciplines with Claude Code or Codex, or with IntelliJ or VS Code, they still matter – arguably more than ever.

And what are these disciplines? What is Essential Code Craft?

  • Specification By Example – build shared understanding and pin down requirements with testable specifications
  • Test-Driven Development – rapidly iterate working software designs with short delivery lead times and reliable releases
  • Continuous Integration – keep teams more in sync with their changes, merging and testing them many times a day to ensure a working, shippable-at-any-time product
  • Continuous Collaboration – keep teams on the same page by continuously communicating with practices like pair programming and teaming
  • Refactoring – reshape code to make change easier, while keeping it working and shippable at all times
  • Modular Design – optimise software architecture to localise the “blast radius” and minimise the cost of changes, while making rapid testing and smarter reuse easier
  • Continuous Inspection – minimise the bottleneck and the “LGTM” effect of downstream code review by making it a continuous and highly automated process
  • Continuous Delivery – combine these fundamentals in a delivery process that can get the proverbial peas from the farmer’s field to the kitchen table through rapid, reliable integration, build and deployment pipelines
  • Continuous Improvement – build development capability in an evidence-based way, learning what really works and what doesn’t as you build skills, automate tools and workflows, and explore and experiment with your approach (and that’s where I come in!)

Workshops on Specification By Example and Test-Driven Development are already live and taking registrations. If there’s demand, more will follow.

The roadmap is to build a set of repeating individual workshops, rotating monthly, that will eventually cover all of these disciplines – some explicitly, some implicitly like Continuous Integration and pair programming, which will be an integral part of most workshops.

Self-funders can pick and choose which to attend, and my hope is that they’ll be a bit like Pokemon cards – gotta collect ’em all!

Keep an eye on the Codemanship Ticket Tailor box office for details of upcoming workshops.

Also, details of new workshop times will be posted here first, so subscribe to this blog if you’d like to be kept in the loop for future workshops.

Engineering Leaders: Your AI Adoption Doesn’t Start With AI

In the past few months, I’ve been hearing from more and more teams that the use of AI coding tools is being strongly encouraged in their organisations.

I’ve also been hearing that this mandate often comes with high expectations about the productivity gains leaders expect this technology to bring. But this narrative is rapidly giving way to frustration when these gains fail to materialise.

The best data we have shows that a minority of development teams are reporting modest gains – in the order of 5%-15% – in outcomes like delivery lead times and throughput. The rest appear to be experiencing negative impacts, with lead times growing and the stability of releases getting worse.

The 2025 DevOps Research & Assessment State of AI-assisted Software Development report makes it clear that the teams reporting gains were already high-performing or elite by DORA’s classification, releasing frequently, with short lead times and with far fewer fires in production to put out.

As the report puts it, this is not about tools or technology – and certainly not about AI. It’s about the engineering capability of the team and the surrounding organisation.

It’s about the system.

Teams who design, test, review, refactor, merge and release in bigger batches are overwhelmed by what DORA describes as “downstream chaos” when AI code generation makes those batches even bigger. Queues and delays get longer, and more problems leak into releases.

Teams who design, test, review, refactor, merge and release continuously in small batches tend to get a boost from AI.

In this respect, the team’s ranking within those DORA performance classifications is a reasonably good predictor of the impact on outcomes when AI coding assistants are introduced.

The DORA website helpfully has a “quick check” diagnostic questionnaire that can give you a sense of where your team sits in their performance bands.

(Answer as accurately as you can. Perception and aspiration aren’t capability.)

The overall result is usefully colour-coded. Red is bad, blue is good. Average is Meh. Yep, Meh is a colour.

If your team’s overall performance is in the purple or red, AI code generation’s likely to make things worse.

If your team’s performance is comfortably in the blue, they may well get a little boost. (You can abandon any hopes of 2x, 5x or 10x productivity gains. At the level of team outcomes, that’s pure fiction.)

The upshot of all this is that before you even think about attaching a code-generating firehose to your development process, you need to make sure the team’s already performing at a blue level.

If they’re not, then they’ll need to shrink their batch sizes – take smaller steps, basically – and accelerate their design, test, review, refactor and merge feedback loops.

Before you adopt AI, you need to be AI-ready.

Many teams go in the opposite direction, tackling whole features in a single step – specifying everything, letting the AI generate all the code, testing it after-the-fact, reviewing the code in larger change-sets (“LGTM”), doing large-scale refactorings using AI, and integrating the whole shebang in one big bucketful of changes.

Heavy AI users like Microsoft and Amazon Web Services have kindly been giving us a large-scale demonstration of where that leads – more bugs, more outages, and significant reputational damage.

A smaller percentage of teams are learning that what worked well before AI works even better with it. Micro-iterative practices like Test-Driven Development, Continuous Integration, Continuous Inspection, and real refactoring (one small change at a time) are not just compatible with AI-assisted development, they’re essential for avoiding the “downstream chaos” DORA finds in the purple-to-red teams.

And while many focus on the automation aspects of Continuous Delivery – and a lot of automation is required to accelerate the feedback loops – by far the biggest barrier to pushing teams into the blue is skills.

Yes. SKILLS.

Skills that most developers, regardless of their level of experience, don’t have. The vast majority of developers have never even seen practices like TDD, refactoring and CI being performed for real.

That’s partly because real practitioners are pretty rare, so they’re unlikely to bump into one. But much of this is because of their famously steep learning curves. TDD, for example, takes months of regular practice before you can use it on real production systems.

And, as someone who’s been practicing TDD and teaching it for more than 25 years, I know it requires ongoing mindful practice to maintain the habits that make it work. Use it or lose it!

An experienced guide can be incredibly valuable in that journey. It’s unrealistic to expect developers new to these practices to figure it all out for themselves.

Maybe you’re lucky to have some of the 1% of software developers – yes, it really is that few – who can actually do this stuff for real. Or even one of the 0.1% who has had a lot of experience helping developers learn them. (Just because they can do it, it doesn’t necessarily follow that they can teach it.)

This is why companies like mine exist. With high-quality training and mentoring from someone who not only has many thousands of hours of practice, but also thousands of hours of experience teaching these skills, the journey can be rapidly accelerated.

I made all the mistakes so that you don’t have to.

And now for the good news: when you build this development capability, the speed-ups in release cycles and lead times, and the improvements in reliability, happen whether you’re using AI or not.

The Gorman Paradox – Solution II: They’re In The Bin

Software development’s essentially a learning process. Most of the value in a product or system’s added in response to user and eventually market feedback.

With each iteration we get the design less wrong. With each iteration, we learn.

The effect of batch size on learning is profound.

I urge teams to work on the basis that every design decision is guesswork until it hits the real world. We can’t know with certainty that we made the right decisions.

Getting user feedback is the only meaningful mechanism we have to “turn the cards over” and find out if we guessed right. In this sense, learning is characterised as reducing or eliminating uncertainty in product design. Teams who do this faster will tend to out-learn their competition.

Imagine trying to guess a random 4-digit number in one go vs. guessing one digit at a time.

In both approaches, we start with the same odds of guessing it right: 1/10,000. But with each guess, the uncertainty collapses orders of magnitude faster when we’re guessing one digit at a time. The latter approach out-learns the former.

Even if we had an “AI” random 4-digit number generator that enabled us to make 10x as many guesses in the same time, guessing one digit at a time would still out-learn us.
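The arithmetic can be sketched in a few lines, under some assumptions of mine: every guess gets a plain right-or-wrong answer, we never repeat a wrong guess, and in the digit-at-a-time game we get that answer after each digit. `expected_guesses` is a hypothetical helper, not anything from a library.

```python
from fractions import Fraction


def expected_guesses(options):
    """Average guesses to hit one of `options` equally likely values,
    never repeating a wrong guess: (options + 1) / 2."""
    return Fraction(options + 1, 2)


whole_number = expected_guesses(10_000)        # all 4 digits per guess
digit_by_digit = 4 * expected_guesses(10)      # feedback after each digit

print(float(whole_number))        # 5000.5
print(float(digit_by_digit))      # 22.0
print(float(whole_number / 10))   # 500.05, even with 10x faster guessing
```

On those assumptions, the worst cases are 10,000 guesses versus 40. A 10x faster guesser still needs around 500 guesses' worth of time on average to do what the one-digit-at-a-time player does in 22.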

The chances of a complete solution delivered in a single pass – guessing all 4 digits in one go – being even on the same continent as correct are vanishingly remote, and we learn very little because of the nature of user feedback.

If I deliver 50 changes (e.g., new features) in a single release and ask users “waddaya think?”, I won’t get meaningful feedback about all 50 changes.

Most likely I’ll get general feedback of the “LGTM” or “meh” variety, and maybe some specific feedback about things that stood out. (Bugs in a release tend to overshadow anything else, for example – the proverbial fly in the soup. “Waddaya think of the soup?” “There’s a fly in it!”)

If I deliver ONE change, they’ll probably have something meaningful to say about it. We can at least observe what impact that one change has on user behaviour (e.g., engagement, completing tasks, etc.).

So we learn faster when we iterate fewer changes into the hands of users at a time. This inevitably forces us to apply the brakes on the creation of code, because we need to wait for feedback, and we need to do that often.

I see many posts here from folks claiming to have generated entire applications in days or even hours using LLM-based coding tools. That’s the equivalent of “guessing all 4 digits at a time using an ‘AI’ 4-digit number generator”. That’s an entire application – hundreds of design decisions – created without any user feedback.

Creating an entire application in a single pass is every bit as “Big Design Up-Front” as wireframing or modeling the whole thing in UML in advance. And assumptions and guesses in your early decisions get compounded in later decisions, piling up uncertainty under a mountain of interconnected complexity. Failure is almost inevitable.

This is another potential solution to the Gorman Paradox.

Where are all the “AI”-generated apps? In the bin.

It just so happens that I train and mentor teams in the technical practices that enable them to learn faster from user and market feedback. I know, right! What are the chances?

And it also just so happens that any Codemanship training course booked by January 31st 2026 is HALF-PRICE. Which is nice.

The Great Filter (Or Why High Performance Still Eludes Most Dev Teams, Even With AI)

In my post about The Gorman Paradox, I compare the lack of any evidence of “AI”-assisted productivity gains to be found out here in the Real World™ with the famous Fermi Paradox that asks, if the universe is teeming with intelligent life, where is everybody?

It’s been over 3 years, and we’ve seen no uptick in products being added to the app stores. We’ve seen no rising tide on business bottom lines. We’ve seen no impact on national GDPs.

There is a likely explanation, and it’s the most obvious one: “AI”-assisted coding doesn’t actually make the majority of dev teams more productive. For sure, it produces more code. But, on average, it creates no net additional value.

The DORA data does find some teams reaping modest gains in terms of software delivery lead times without sacrificing reliability, and – interestingly – the data shows that those high-performing teams using “AI” were already high-performing without it.

The majority of teams showed that “AI” actually slowed them down, and these were the teams who were already pretty slow before “AI”. Attaching a code-generating firehose to the process just made them marginally slower.

The differentiator? Are the high-performing teams super-skilled programmers? Are they getting paid more? Are they putting something in the office water supply?

It turns out that what separates the teams who get a negative boost from the teams who get a positive boost is that the latter have addressed the bottlenecks in their development process.

Blocking activities, like detailed up-front design, after-the-fact testing, Pull Request code reviews, and big merges to the main branch, have been turned into continuous activities.

Teams work in much smaller batches and in much tighter feedback loops, designing, testing, inspecting and merging many times an hour instead of every few days.

Work doesn’t sit in queues waiting for someone’s attention. There are very few traffic lights between the developer’s desktop and the outside world to slow that traffic down.

And this means that changes can make it into the hands of users very rapidly, with highly automated, highly reliable, frictionless delivery pipelines that – as the supermarket ads used to say – get the peas from the farmer’s field to your table in no time at all.

The just-in-time grocery supply chains of supermarkets are a good analogy for the processes high-performing teams are using. Supermarkets don’t buy a year’s supply of fresh peas once a year. They buy tomorrow’s supply today, and their formidable logistical capabilities get those peas on the shelves pronto.

Those formidable logistical capabilities didn’t just appear, either. They’re the product of many decades of investment. Supermarket chains have sunk billions into getting better at it, so they can maximise cash flow by minimising the amount of working capital they have committed at any time.

They don’t want millions of pounds-worth of produce sitting in warehouses making them no money.

And businesses don’t want millions of pounds-worth of software changes sitting in queues waiting to be released. They want them out there in the hands of users, creating value in the form of learning what works and what doesn’t. Software that can’t be used has no value.

Walk into any large organisation and take a snapshot of how much investment in developed code is “in progress”. For some, it literally is millions of pounds-worth – tens or hundreds of thousands of pounds, multiplied by dozens or hundreds of teams.

The impact on a business of being able to out-learn the competition can be so profound, we might ask ourselves “Why isn’t everybody doing this?” Can you imagine a supermarket chain deciding not to bother with JIT supply? They wouldn’t last long.

It’s come into focus even more sharply with the rise of “AI”-assisted software development. It’s quite clear now that even modest productivity gains lie on the other side of the spectrum with teams who have addressed their bottlenecks and have low-friction delivery pipelines.

I see a “Great Filter” that continues to prevent the large majority of dev teams making it to that Nirvana. It requires a big, ongoing investment in the software development capability needed.

We’re talking about investment in people and skills. We’re talking about investment in teams and organisational design. We’re talking about investment in tooling and automation. We’re talking about investment in research and experimentation. We’re talking about investment in talent pipelines and outreach. We’re talking about investment in developer communities and the profession of software development.

Typically, I’ve seen that companies who managed to progress from the bottleneck-ridden ways of working to highly iterative, frictionless methods needed to invest 20-25% of their entire development budget in building and maintaining that capability.

And building that kind of capability takes years.

You can’t buy it. You can’t install it. You can’t have it flown in fresh from Silicon Valley.

And, like organ transplants, any attempt to transplant that kind of capability into your business will be met with organisational anti-bodies protecting the status quo.

And that, folks, is The Great Filter.

Most organisations are simply not prepared to make that kind of commitment in time, effort and money.

Sure, they want the business benefits of faster lead times, more reliable releases, and a lower cost of change. But they’re just not willing to pony up to get it.

On a daily basis, I see people online warning us not to “get left behind by AI”. The reality is that the people who really are getting left behind are the ones who think that the bottlenecks and blockers they’ve struggled with in the past will magically get out of the way of the code-generating firehose.

Low-performing teams, now grappling with the downstream chaos caused by “AI” code generation, will probably always be the norm. And the value of this technology will probably never be realised by those businesses.

If you’re one of the few who are serious about building software development capability, my training courses in the technical practices that enable rapid, reliable and sustained evolution of software to meet changing needs are half price if you confirm your booking by Jan 31st.

Walking Skeletons, Delivery Pipelines & DevOps Drills

On my 3-day Code Craft training workshop (and if you’re reading this in January 2026, training’s half-price if you confirm your booking by Jan 31st), there’s a team exercise where the group need to work together to deliver a simple program to the customer’s (my) laptop where I can acceptance-test it.

It’s primarily an exercise in Continuous Delivery, bringing together many of the skills explored earlier in the course like Test-Driven Development and Continuous Integration.

But it also exercises the muscles individual or pair-programmed exercises don’t reach. Any problem, even a simple one like the Mars Rover, tends to become much more complicated when we tackle it as a team. It requires a lot of communication and coordination. A team will typically take more time to complete it.

And it also exercises muscles that developers these days have never used before. In 2026, the average developer has never created, say, a command-line project from scratch in their tech stack. They’ve never set up a repo using their version control tool. They’ve never created a build script for Continuous Integration builds. They’ve never written a script to automatically deploy working software.

In the age of “developer experience”, a lot of people have these things done for them. Entry-level devs land on a project and it’s all just there.

That may seem like a convenience initially, but it comes with a sort of learned helplessness, with total reliance on other people to create and adapt build and deployment logic when it’s needed. A lot of developers would be on a significant learning curve if they ever needed to get a project up and running or to change, say, a build script.

It’s the delivery pipeline that frustrates most teams’ attempts to get any functionality in front of the customer in this exercise.

I urge them at the start to get that pipeline in place first. Code that can’t be used has no value. They may have written all of it, but if I can’t test it on my machine – nil points. Just like in real life.

They’re encouraged to create a “walking skeleton” for their tech stack – e.g., a command-line program that outputs “Hello, world!”, and has one dummy unit test.
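For a Python stack, a walking skeleton might be as small as this. It's a sketch only: the file name `hello.py` and the function name `greeting` are my assumptions, and I've used the standard library's `unittest` so there's nothing extra to install.

```python
# hello.py: a hypothetical walking skeleton (file and function names are assumptions)
import unittest


def greeting():
    return "Hello, world!"


class GreetingTest(unittest.TestCase):
    # One dummy test, just to prove the test runner is wired in.
    def test_greeting(self):
        self.assertEqual("Hello, world!", greeting())


if __name__ == "__main__":
    print(greeting())
```

Running `python hello.py` prints the greeting, and `python -m unittest hello` runs the dummy test. That's all the skeleton needs to do until the pipeline works end to end.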

This can then be added to a new GitHub repository, and the rest of the team can be invited to collaborate on it. That’s the first part of the pipeline.

Then someone can create a build script that runs the tests, and is triggered by pushes to the main (trunk) branch. On GitHub, if we keep our technical architecture vanilla for our tech stack (e.g., a vanilla Java/Maven project structure), GitHub Actions can usually generate a script for us. It might need a tweak or two – the right version of Java, for example – but it will get us in the ballpark.

So now everyone in the team can clone a repo that has a skeleton project with a dummy unit test and a simple output to check that it’s working end to end.

That’s the middle of the pipeline. We now have what we need to at least do Continuous Integration.

The final part of the pipeline is when the food makes it to the customer’s table. I remind teams that my laptop is a developer’s machine, and that I have versions of Python, Node.js, Java and .NET installed, as well as a Git client.

So, they could write a batch script that clones the repo, builds the software (e.g., runs pip install for a Python project), and runs the program. When I see “Hello, world!” appear on my screen, we have lift-off. The team can begin implementing the Mars Rover, and whenever a feature is complete, they can ping me and ask me to run that script again to test it.
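Here's one way that script could look for a Python project, written in Python rather than as a batch file so it runs the same on any of those machines. It's a sketch under assumptions: the repo has a `requirements.txt` and an entry point called `hello.py`, and the function names are mine.

```python
import subprocess
import sys


def deploy_commands(repo_url, target_dir):
    """The three pipeline steps as argument lists: clone, install, run."""
    return [
        ["git", "clone", repo_url, target_dir],
        # Assumes the project keeps its dependencies in requirements.txt.
        [sys.executable, "-m", "pip", "install", "-r", f"{target_dir}/requirements.txt"],
        # Assumes the program's entry point is hello.py.
        [sys.executable, f"{target_dir}/hello.py"],
    ]


def deploy(repo_url, target_dir, run=subprocess.run):
    """Run each step in order; check=True stops at the first failure."""
    for command in deploy_commands(repo_url, target_dir):
        run(command, check=True)


# Example call (hypothetical repository URL):
# deploy("https://github.com/example-team/mars-rover.git", "mars-rover")
```

The `run` parameter is only there so the steps can be checked without touching the network. In normal use you'd leave it alone and let `subprocess.run` do the work.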

And thus, value begins to flow, in the form of meaningful user feedback from working software. (Aww, bless. Did you think the software was the value? No, mate. The value’s in what we learn, not what we deliver.)

And, of course, in the real world, that delivery pipeline will evolve, adding more quality gates (e.g., linting), parallelising test execution as the suite gets larger, progressing to more sophisticated deployment models and that sort of thing, as needs change.

DevOps – the marriage of software development and operations – means that the team writing the solution code also handles these matters. We don’t throw it over the wall to a separate “DevOps” team. That’s kind of the whole point of DevOps, really. When we need a change to, say, the build script, we – the team – make that change.

But you might be surprised how many people who describe themselves as “DevOps Engineers” wouldn’t even know where to start. (Or maybe you wouldn’t.)

It’s not their fault if they’ve been given no exposure to operations. And it’s not every day that we start a project from scratch, so the opportunities to gain experience are few and far between.

Given just how critical these pipelines are to our delivery lead times, it’s surprising how little time and effort many organisations invest in getting good at them. It should be a core competency in software development.

It’s especially mysterious why so many businesses allow it to become a bottleneck by favouring specialised teams instead of T-shaped DevOps software engineers who can do most of it themselves instead of waiting for someone else to do it. Teams could have a specialised expert on hand for the rare times when deep expertise is really needed.

If the average developer knew the 20% they’d need 80% of the time to create and change delivery pipelines for their tech stack(s), there’d be a lot less waiting on “DevOps specialists” (which is an oxymoron, of course).

Just as a contractor who has to move house often tends to become very efficient at it, developers who have to get delivery pipelines up and running often tend to be much better at the yak shaving it involves.

So I encourage teams to make these opportunities by doing regular “DevOps drills” for their tech stacks. Get a Node Express “Hello, world” pipeline up and running from scratch. Get a Spring Boot pipeline up and running from scratch. And so on.

Typically, I see teams doing them monthly, and as they gain confidence, varying the parameters (e.g., parallel test execution, deployment to a cluster and so on), and making the quality gates more sophisticated (security testing, linting, mutation testing and so on), while learning how to optimise pipelines to keep them as frictionless as possible.

The AI-Ready Software Developer: Conclusion – Same Game, Different Dice

In this series, I’ve explored the principles and practices that teams seeing modest improvements in software development outcomes have been applying.

More than four years after the first “AI” coding assistant, GitHub Copilot, appeared, the evidence is clear. Claims of teams achieving 2x, 5x, even 10x productivity gains simply don’t stand up to scrutiny. There’s no shortage of anecdotal evidence, but not a shred of hard data. It seems when we measure it, the gains mysteriously disappear.

The real range, when it’s measured in terms of team outcomes like delivery lead time and release stability, is roughly 0.8x – 1.2x, with negative effects being substantially more common than positives.

And we know why. Faster cars != faster traffic. Gains in code generation, according to the latest DORA State of AI-Assisted Software Development report, are lost to “downstream chaos” for the majority of teams.

Coding never was the bottleneck in software development, and optimising a non-bottleneck in a system with real bottlenecks just makes those bottlenecks worse.
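
A toy model makes the point. The stage names and weekly rates below are invented for illustration, not measurements: the pipeline delivers at the rate of its slowest stage, and speeding up a stage that isn't the slowest just grows the queue in front of the next one.

```python
def throughput(stage_rates: dict[str, float]) -> float:
    """A pipeline delivers no faster than its slowest stage (items per week)."""
    return min(stage_rates.values())


def queue_growth(stage_rates: dict[str, float], upstream: str, downstream: str) -> float:
    """Items per week piling up between two adjacent stages."""
    return max(0.0, stage_rates[upstream] - stage_rates[downstream])


# Hypothetical rates, in changes per week
before = {"coding": 10.0, "review": 6.0, "testing": 5.0, "release": 8.0}
after = {**before, "coding": 50.0}  # code generation gets 5x faster

print(throughput(before), throughput(after))  # 5.0 5.0
print(queue_growth(before, "coding", "review"))  # 4.0
print(queue_growth(after, "coding", "review"))  # 44.0
```

Delivery rate doesn't move, while the review backlog grows eleven times faster. In real teams a bigger backlog also slows the reviewers down, which this sketch doesn't attempt to model.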

Far from boosting team productivity, for the majority of “AI” users, it’s actually slowing them down, while also negatively impacting product or system reliability and maintainability. They’re producing worse software, later.

Most of those teams won’t be aware that it’s happening, of course. They attached a code-generating firehose to their development plumbing, and while the business is asking why they’re not getting the power shower they were promised, most teams are measuring the water pressure coming out of the hose (lines of code, commits, Pull Requests) and not out of the shower (business outcomes), because those numbers look far more impressive.

The teams who are seeing improvements in lead times of 5%, 10%, 15%, without sacrificing reliability and without increasing the cost of change, are doing it the way they were always doing it:

  • Working in small batches, solving one problem at a time
  • Iterating rapidly, with continuous testing, code review, refactoring and integration
  • Architecting highly modular designs that localise the “blast radius” of changes
  • Organising around end-to-end outcomes instead of around role or technology specialisms
  • Working with high autonomy, making timely decisions on the ground instead of sending them up the chain of command

When I observe teams that fall into the “high-performing” and “elite” categories of the DORA capability classifications using tools like Claude Code and Cursor, I see feedback loops being tightened. Batch sizes get even smaller, quality gates get even narrower, iterations get even faster. They keep “AI” on a very tight leash, and that by itself could well account for the improvements in outcomes.

Meanwhile, the majority of teams are doing the opposite. They’re trying to specify large amounts of work in detail up-front. They’re leaving “AI agents” to chew through long tasks that have wide impact, generating or modifying hundreds or even thousands of lines of code while developers go to the proverbial pub.

And, of course, they test and inspect too late, applying too little rigour – “Looks good to me.” They put far too much trust in the technology, relying on “rules” and “guardrails” set out in Markdown files that we know LLMs will misinterpret and ignore randomly, barely keeping one hand on the wheel.

As far as I’ve seen, no team actually winning with the technology works like that. They’re keeping both hands firmly on the wheel. They’re doing the driving. As AI luminary Andrej Karpathy put it, “agentic” solutions built on top of LLMs just don’t work reliably enough today to leave them to get on with it.

It may be many years before they do. Statistical mechanics predicts it could well be never, with the order-of-magnitude improvement in accuracy needed to make them reliable enough (wrong 2% of the time instead of 20%) calculated to require 10^20 times the compute to train. To do that on similar timescales to the hyperscale models of today would require Dyson Spheres (plural) to power it.

Any autonomous software developer – human or machine – requires Actual Intelligence: the ability to reason, to learn, to plan and to understand. There’s no reason to believe that any technology built using deep learning alone will ever be capable of those things, regardless of how plausibly it can mimic them, and no matter how big we scale it. LLMs are almost certainly a dead end for AGI.

For this reason I’ve resisted speculating about how good the technology might become in the future, even though the entire value proposition we see coming out of the frontier labs continues to be about future capabilities. The gold is always over the next hill, it seems.

Instead, I’ve focused my experiments and my learning on present-day reality. And the present-day reality that we’ll likely have to live with for a long time is that LLMs are unreliable narrators. End of. Any approach that doesn’t embrace this fact is doomed to fail.

That’s not to say, though, that there aren’t things we can do to reduce the “hallucinations” and confabulations, and therefore the downstream chaos.

LLMs perform well – are less unreliable – when we present them with problems that are well-represented in their training data. The errors they make are usually a product of going outside of their data distribution, presenting them with inputs that are too complex, too novel or too niche.

Ask them for one thing, in a common problem domain, and chances are much higher that they’ll get it right. Ask them for 10 things, or for something in the long-tail of sparse training examples, and we’re in “hallucination” territory.

Clarifying with examples (e.g., test cases) helps to minimise the semantic ambiguity of inputs, reducing the risk of misinterpretation, and this is especially helpful when the model’s working with code because the samples they’re trained on are paired with those kinds of examples. They give the LLM more to match on.
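
As a small illustration, take a hypothetical Mars Rover requirement: “the rover turns right”. On its own that leaves room for interpretation. Paired with a handful of examples, it doesn’t. The function and data names here are mine, made up for the sketch.

```python
# Each example pairs a starting direction with the expected direction
# after one right turn. These pin down what "turn right" means.
EXAMPLES = [
    ("N", "E"),
    ("E", "S"),
    ("S", "W"),
    ("W", "N"),
]

RIGHT_OF = {"N": "E", "E": "S", "S": "W", "W": "N"}


def turn_right(facing: str) -> str:
    """Return the compass direction 90 degrees clockwise of `facing`."""
    return RIGHT_OF[facing]


for facing, expected in EXAMPLES:
    assert turn_right(facing) == expected, f"{facing} should turn to {expected}"
```

The same examples do double duty: they go into the prompt to clarify the request, and they run as tests to check the result.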

Contexts need to be small and specific to the current task. How small? Research suggests that the effective usable context sizes of even the frontier LLMs are orders of magnitude smaller than advertised. Going over 1,000 tokens is likely to produce errors, but even contexts as small as 100 tokens can produce problems.

Attention dilution, drift, “probability collapse” (play one at chess and you’ll see what I mean), and the famous “lost in the middle” effect make the odds of a model following all of the rules in your CLAUDE.md file, or all the requirements for a whole feature, vanishingly remote. They just can’t accurately pay attention to that many things.

But even if they could, trying to match on dozens of criteria simultaneously will inevitably send them out-of-distribution.
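
A back-of-the-envelope calculation shows how quickly the odds shrink. It rests on two loud assumptions of mine: that each rule is followed independently of the others, and that the per-rule compliance rate is 95%. Neither figure comes from research. Real models are messier, but the compounding is the point.

```python
def chance_all_followed(per_rule_compliance: float, rule_count: int) -> float:
    """Probability that every rule is followed, assuming independence."""
    return per_rule_compliance ** rule_count


# 0.95 is a hypothetical per-rule compliance rate, chosen for illustration
for rules in (1, 10, 50):
    print(rules, round(chance_all_followed(0.95, rules), 3))
```

Under those assumptions, ten rules are all followed roughly 60% of the time, and fifty rules less than 8% of the time.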

So the smart money focuses on one problem at a time and one rule at a time, working in rapid iterations, testing and inspecting after every step to ensure everything’s tickety-boo before committing the change (singular) and moving on to the next problem.

And when everything’s not tickety-boo – e.g., tests start failing – they do a hard reset and try again, perhaps breaking the task down into smaller, more in-distribution steps. Or, after the model’s failed 2-3 times, writing the code themselves to get themselves out of a “doom loop”.
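
That loop can be sketched in a few lines. The four callables are placeholders for whatever your tooling does (prompting the assistant, running the test suite, `git commit`, `git reset --hard`), and the limit of three attempts is just the upper end of the range above.

```python
from typing import Callable


def attempt_step(
    generate: Callable[[], None],
    tests_pass: Callable[[], bool],
    commit: Callable[[], None],
    hard_reset: Callable[[], None],
    max_attempts: int = 3,
) -> bool:
    """Try one small change, up to max_attempts times.

    Returns True if the change was committed, or False when it's time
    for the developer to write the code themselves.
    """
    for _ in range(max_attempts):
        generate()
        if tests_pass():
            commit()  # one change (singular) per commit
            return True
        hard_reset()  # discard the failed attempt entirely, don't patch it
    return False
```

Note that a failed attempt is thrown away rather than repaired, so every retry starts from the last known-good commit.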

There will be times – many times – when you’ll be writing or tweaking or fixing the code yourself. Over-relying on the tool is likely to cause your skills to atrophy, so it’s important to keep your hand in.

It will also be necessary to stay on top of the code. The risk, when code’s being created faster than we can understand it, is that a kind of “comprehension debt” will rapidly build up. When we have to edit the code ourselves, it’s going to take us significantly longer to understand it.

And, of course, it compounds the “looks good to me” problem with our own version of the Gell-Mann amnesia effect. Something I’ve heard often over the last 3 years is people saying “Well, it’s not good with <programming language they know well>, but it’s great at <programming language they barely know>”. The less we understand the output, the less we see the brown M&Ms in the bowl.

“Agentic” coding assistants are claimed to be able to break complex problems down, and plan and execute large pieces of work in smaller steps. Even if they can – and remember that LLMs don’t reason and don’t plan, they just produce plausible-looking reasoning and plausible-looking plans – that doesn’t mean we can hit “Play” and walk away to leave them to it. We still need to check the results at every step and be ready to grab the wheel when the model inevitably takes a wrong turn.

Many developers report how LLM accuracy falls off a cliff when tasked with making changes to code that lacks separation of concerns, and we know why this is too. Changing large modules with many dependencies brings a lot more code into play, which means the model has to work with a much larger context. And we’re out-of-distribution again.

The really interesting thing is that the teams DORA found were succeeding with “AI” were already working this way. Practices like Test-Driven Development, refactoring, modular design and Continuous Integration are highly compatible with working with “AI” coding assistants. Not just compatible, in fact – essential.

But we shouldn’t be surprised, really. Software development – with or without “AI” – is inherently uncertain. Is this really what the user needs? Will this architecture scale like we want? How do I use that new library? How do I make Java do this, that or the other?

It’s one unknown after another. Successful teams don’t let that uncertainty pile up, heaping speculation and assumption on top of speculation and assumption. They turn the cards over as they’re being dealt. Small steps, rapid feedback. Adapting to reality as it emerges.

Far from “changing the game”, probabilistic “AI” coding assistants have just added a new layer of uncertainty. Same game, different dice.

Those of us who’ve been promoting and teaching these skills for decades may have the last laugh, as more and more teams discover it really is the only effective way to drink from the firehose.

Skills like Test-Driven Development, refactoring, modular design and Continuous Integration don’t come with your Claude Code plan. You can’t buy them or install them like an “AI” coding assistant. They take time to learn – lots of time. Expert guidance from an experienced practitioner can expedite things and help you avoid the many pitfalls.

If you’re looking for training and coaching in the practices that are distinguishing the high-performing teams from the rest – with or without “AI” – visit my website.

The AI-Ready Software Developer #20 – It’s The Bottlenecks, Stupid!

For many years now, cycling has been consistently the fastest way to get around central London. Faster than taking the tube. Faster than taking the train. Faster than taking the bus. Faster than taking a cab. Faster than taking your car.

All of these other modes of transport are, in theory, faster than a bike. But the bike will tend to get there first, not because it’s the fastest vehicle, but because it’s subject to the fewest constraints.

Cars, cabs, trains and buses move not at the top speed of the vehicle, but at the speed of the system.

And, of course, when we measure their journey speed at an average 9 mph, we don’t see them crawling along steadily at that pace.

“Travelling” in London is really mostly waiting. Waiting at junctions. Waiting at traffic lights. Waiting to turn. Waiting for the bus to pull out. Waiting on rail platforms. Waiting at tube stations. Waiting for the pedestrian to cross. Waiting for that van to unload.

Cyclists spend significantly less time waiting, and that makes them faster across town overall.

Similarly, development teams that can produce code much faster, but work in a system with real constraints – lots of waiting – will tend to be outperformed overall by teams who might produce code significantly slower, but who are less constrained – spend less time waiting.

What are developers waiting for? What are the traffic lights, junctions and pedestrian crossings in our work?

If I submit a Pull Request, I’m waiting for it to be reviewed. If I send my code for testing, I’m waiting for the results. If I don’t have SQL skills, and I need a new column in the database, I’m waiting for the DBA to add it for me. If I need someone on another team to make a change to their API, more waiting. If I pick up a feature request that needs clarifying, I’m waiting for the customer or the product owner to shed some light. If I need my manager to raise a request for a laptop, then that’s just yet more waiting.

Teams with handovers, sign-offs and other blocking activities in their development process will tend to be outperformed by teams who spend less time waiting, regardless of the raw coding power available to them.
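
To put some (entirely made-up) numbers on it: lead time for a change is hands-on time plus all the time spent blocked. The two teams below are hypothetical, and so are their hours.

```python
from dataclasses import dataclass


@dataclass
class Team:
    name: str
    coding_hours: float              # hands-on time per change
    waiting_hours: tuple[float, ...]  # time blocked at each handover or sign-off

    def lead_time(self) -> float:
        """Total elapsed hours from starting a change to delivering it."""
        return self.coding_hours + sum(self.waiting_hours)


# Hypothetical figures: review queue, test phase and sign-off vs. one short wait
supercar = Team("fast coding, many handovers", 1.0, (24.0, 8.0, 16.0))
bicycle = Team("slower coding, few handovers", 4.0, (1.0,))

for team in (supercar, bicycle):
    print(f"{team.name}: {team.lead_time():.1f} hours")
```

The first team codes four times faster and still takes 49 hours to the second team’s 5. Making its coding instantaneous would only get it down to 48.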

Teams who treat activities like testing, code review, customer interaction and merging as “phases” in their process will tend to be outperformed by teams who do them continuously, regardless of how many LOC or tokens per minute they’re capable of generating.

This isn’t conjecture. The best available evidence is pretty clear. Teams who’ve addressed the bottlenecks in their system are getting there sooner – and in better shape – than teams who haven’t. With or without “AI”.

The teams who collaborate with customers every day – many times a day – outperform teams who have limited, infrequent access.

The teams who design, test, review, refactor and integrate continuously outperform teams who do them in phases.

The teams with wider skillsets outperform highly-specialised teams.

The teams working in cohesive and loosely-coupled enterprise architectures outperform teams working in distributed monoliths.

The teams with more autonomy outperform teams working in command-and-control hierarchies.

None of these things comes with your Claude Code plan. You can’t buy them. You can’t install them. But you can learn them.

And if you’re ticking none of those boxes, and you still think a code-generating supercar is going to make things better, I have a Bugatti Chiron Sport you might be interested in buying. Perfect for the school run!