teams – Codemanship's Blog

In Teams, Individual Productivity Is A Harmful Illusion

The hardest lesson I had to learn as a software developer in the early part of my career was that what felt “productive” to me locally – uninterrupted time, code getting created fast etc – often turned out to be a bad sign for overall team outcomes.

Taking interruptions as an example, I mistook concurrency for parallelism in the way I worked. Concurrency is about communication and coordination, and when a bunch of devs are working on different aspects of the same problem at the same time, communication and coordination become the primary activities.

Coding interrupts communication and coordination. CODING IS THE INTERRUPTION.

The more time I spend coding and not communicating, the further I drift from the rest of the team.

We talk a good game about building shared understanding and aligning teams, but then we do everything we can to minimise that in the pursuit of individual “productivity”.

Of course, there are ways we can go about writing code that maintain communication and coordination – but your boss might not like them. (Sing along if you know the words – “But that’s 2 developers doing the work of 1!”)

Cliffs Notes version: what feels productive to you is often counter-productive for the team.

The fix? Keep one eye on the horizon, not both eyes on your feet. Effective teams need a high level of situational awareness – being cognizant of what’s going on around you.

And we can work in ways that make us more interruptible. But your boss might not like them, either. (“One little test at a time? It looks so slow!”)

I’ve found so many times that having clear end goals – unambiguously articulated, and ideally measurable for meaningful feedback – has counteracted this illusion of individual productivity.

Indeed, a little like Douglas Harding’s Headless Way, after a while you realise that there is no individual productivity in software development. There is only the team.

Of course – as has happened so many times before – most teams are running in the exact opposite direction with AI-assisted coding, pursuing this seductive illusion of “individual productivity” at the expense of team outcomes.

And – just as all those times before – the solutions that actually work are known to a small few.

Public Code Craft Training – July 7-9

For the small percentage of engineering orgs who’d genuinely like to be shipping more reliable software and be more responsive to the needs of their business and their users – it’s a niche, I know – I’m running a public 3-day online Code Craft workshop on July 7-9.

If you’re a developer, twist your manager’s arm – especially if they’re expecting you to be more productive using tools like Claude Code and Copilot.

If you’re an engineering leader, this is the real AI-assisted software engineering training your teams need – and, funnily enough, it’s mostly about software engineering and only a little bit about AI. It’s about making teams AI-ready.

It’s 6x half-day modules that give developers a practical, hands-on introduction to the foundational technical practices that enable teams to accelerate release cycles, shrink lead times and improve release reliability – with and without AI.

Specification By Example
Test-Driven Development
Refactoring
Design Principles
Continuous Delivery
Code Craft & AI – grounded in hard data, includes how to apply CRESS principles for context engineering to AI-assisted workflows

To learn more and register, visit https://codemanship.co.uk/codecraft.html

Places are limited.

Slow. The. F**k. Down.

“Slow is smooth, and smooth is fast.”

US Navy SEALs training mantra

Another day, another data set telling us what we already knew.

In the latest AI Engineering Report from Faros, the software development telemetry folks, a study of 22,000 developers working on more than 4,000 teams found what they call an “acceleration whiplash” effect caused by AI code generation.

As with other large-scale studies, more code’s undoubtedly being generated faster. Individual developer “productivity” – which I put in quotation marks because there’s no such thing – is up. Nobody’s contesting that.

But they also saw the same “downstream chaos” that the DORA folks saw in their data, and CircleCI saw in 28 million merge, build & deployment workflows.

Incidents per Pull Request were up 242%. Monthly incidents were up 58%. Bugs were up 54% (and that’s up 9% on 2025, so it’s accelerating).

Work restarts are up 14%. 26% more tasks show no activity for a week or more, languishing in code review purgatory. More work in progress, more work stalled, more work abandoned. As they put it, “beginning is easy and finishing is hard”. And the best available data is telling us that unconstrained AI code generation is making it much harder.

Notably, they found that PRs that made it into production without any review were up 31%. Teams are shipping at “LGTM-speed”.

Hey, it’s this guy again.

This, they found, is accompanied by rising levels of cognitive load caused by bigger change sets requiring oversight – much bigger – and by increased context-switching caused by more work in progress; more plates spinning. And the average outcome of spinning more plates is broken plates (and broken plate-spinners).

Bugs, incidents and rework are rising rapidly, and – surprise, surprise – delivery lead times (by their definition, time from changes being committed to getting into production) are up 480%. That’s more than 5x!

In the 90s, CASE tool vendors often used the marketing strapline “better software, faster”. Perhaps AI coding tool vendors’ strapline should read “worse software, later”.

All of this was predictable, and all of it was predicted – not least by me. Bigger changes hitting downstream bottlenecks faster just causes longer delays. We walked into this ambush with our eyes wide open. It’s the bottlenecks, stupid!
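A toy model makes the bottleneck effect concrete. The sketch below is not drawn from the Faros, DORA or CircleCI data: the function name `average_wait_days` and every number in it are illustrative assumptions. It treats code review as a first-in, first-out queue with a fixed daily capacity, and measures how long changes wait as more of them arrive each day.

```python
from collections import deque


def average_wait_days(arrivals_per_day: int, reviews_per_day: int, days: int) -> float:
    """Average days a change waits in a FIFO review queue (toy model)."""
    queue = deque()  # arrival day of each change still waiting for review
    waits = []       # days waited by each change that did get reviewed
    for day in range(days):
        queue.extend([day] * arrivals_per_day)
        # The bottleneck: only a fixed number of reviews can happen per day.
        for _ in range(min(reviews_per_day, len(queue))):
            waits.append(day - queue.popleft())
    return sum(waits) / len(waits) if waits else 0.0


# Hypothetical numbers: reviewers can handle 5 changes a day.
for arrivals in (5, 6, 10):
    print(arrivals, round(average_wait_days(arrivals, 5, days=20), 2))
```

When arrivals match capacity, the wait stays at zero. Once they exceed it, the queue and the wait grow every day, however fast the code was written.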

75+ years of software development has taught us important lessons about how we should work. That most teams still haven’t got the memo in 2026 – and many teams who did get the memo seem to have forgotten – is embarrassing for our profession. All it took was a shiny new toy, and software engineering fundamentals went out of the window. It’s not like that hasn’t happened before, but never on this scale or at this speed.

And what are those fundamentals?

Work in small slices, solving one problem at a time
Test, review and merge continuously
The gold isn’t in what we ship, it’s in what we learn from what we ship. A team that ships 2x and gets 0.5x meaningful user feedback is a 0.5x team. The time it takes to get that feedback is the ultimate speed limit, not how fast features can be churned out.
Slow the fuck down! Move forward by putting one sure foot in front of the other

I was in my twenties when I learned that what feels slow in software development can often turn out to be fast when we measure it at the system level, and not at my individual level. I learned to mistrust my gut feeling of being “productive”. In reality, my individual productivity is an illusion – albeit a very seductive one.

We’ve known for decades that in software development, batch size and feedback loops do the heavy lifting. How fast code’s being generated is a tiny little mouse compared to those elephants.

And until teams address the elephant in the room, the “downstream chaos” is just going to get worse as AI code generation becomes an ever-larger part of the process. It’s a good job we didn’t put software in everything, or this could have serious consequences for society!

Here’s your guide to how dev teams can become AI-ready, and it has very little to do with AI, and a lot to do with how you approach development.

Essential Code Craft – The Roadmap

Some of you may have noticed that I’ve been running out-of-hours training workshops for self-funding learners recently, under the banner of Essential Code Craft.

In a way, this is a return to the early days of Codemanship when I ran regular weekend workshops – priced for individual pockets – that were mostly attended by developers investing in their own skills and career development.

Many of those people are now CTOs and heads of engineering, and I’ve been fortunate – and grateful – that quite a few have brought me in to provide the same kind of training for their teams.

But with senior engineering leaders now very distracted by the code-generating firehose – and while I wait for them to realise that nothing’s actually changed as far as software engineering fundamentals are concerned – I’m pivoting back to self-funders.

So far – just as it was way back when – the first two workshops filled up quickly. While the boss might not be thinking about investing in their developers at the moment, it seems a lot of developers are looking to invest in themselves.

And this is exactly the moment to do it. While a gazillion developers hunt for magic incantations to make a probabilistic next-token predictor act like something other than a probabilistic next-token predictor, the people who’ve done their homework already know: better results with AI coding tools have very little to do with the tools, and almost everything to do with the processes around them.

And it’s a double-win. The practices that produce the best outcomes with AI are the exact same practices that produce the best outcomes without AI.

The key to being effective with AI is being effective without it.

And here’s the hedge, but only for the informed gamblers – developer hiring is rising again, but the demographic of these new hires is changing. Employers are favouring senior developers with significant pre-LLM experience.

I, and a few others, predicted this would happen. Demand would be highest for people who can do the things AI coding tools can’t – like, well, understand code. I mean really understand it. Not “LGTM” understanding. Deep comprehension of programs.

Not only that, but for all kinds of good reasons – economic, environmental, energy, ethical, geopolitical – the future of hyperscale LLMs is by no means predictable. Folks grappling with reduced token limits and rapidly degrading performance with Anthropic’s newest models will hopefully have figured out by now that building workflows that depend heavily on hyperscale LLMs is building on quicksand.

Who are Acme Megacorp gonna hire – the dev who sits on their hands because they’re waiting for their token limit to reset, or the dev who can just carry on at roughly the same overall pace of delivery?

And we should be in no doubt that teams who’ve mastered the fundamentals of software delivery are routinely outperforming teams who haven’t – with or without AI. AI is clearly not the differentiator.

So, whether you’re going to apply these disciplines with Claude Code or Codex, or with IntelliJ or VS Code, they still matter – arguably more than ever.

And what are these disciplines? What is Essential Code Craft?

Specification By Example – build shared understanding and pin down requirements with testable specifications
Test-Driven Development – rapidly iterate working software designs with short delivery lead times and reliable releases
Continuous Integration – keep teams more in sync with their changes, merging and testing them many times a day to ensure a working, shippable-at-any-time product
Continuous Collaboration – keep teams on the same page by continuously communicating with practices like pair programming and teaming
Refactoring – reshape code to make change easier, while keeping it working and shippable at all times
Modular Design – optimise software architecture to localise the “blast radius” and minimise the cost of changes, while making rapid testing and smarter reuse easier
Continuous Inspection – minimise the bottleneck and the “LGTM” effect of downstream code review by making it a continuous and highly automated process
Continuous Delivery – combine these fundamentals in a delivery process that can get the proverbial peas from the farmer’s field to the kitchen table through rapid, reliable integration, build and deployment pipelines
Continuous Improvement – build development capability in an evidence-based way, learning what really works and what doesn’t as you build skills, automate tools and workflows, and explore and experiment with your approach (and that’s where I come in!)

Workshops on Specification By Example and Test-Driven Development are already live and taking registrations. If there’s demand, more will follow.

The roadmap is to build a set of repeating individual workshops, rotating monthly, that will eventually cover all of these disciplines – some explicitly, some implicitly, like Continuous Integration and pair programming, which will be an integral part of most workshops.

Self-funders can pick and choose which to attend, and my hope is that they’ll be a bit like Pokemon cards – gotta collect ’em all!

Keep an eye on the Codemanship Ticket Tailor box office for details of upcoming workshops.

Also, details of new workshop times will be posted here first, so subscribe to this blog if you’d like to be kept in the loop for future workshops.

The Mythical Agent-Month

Psst. If your boss won’t invest in training you in Specification By Example (BDD, ATDD), I’m running out-of-hours workshops on May 12 and 16 specifically for self-funding learners. £99 + UK VAT.

One of the more frustrating aspects of this new “AI era” is watching people rediscover things we’ve known about software development for many decades.

Apparently, for example, it really helps if our specifications include tests when we use AI. You don’t say?

Perhaps most annoying, though, are pronouncements that – “with AI”, of course – small teams outperform large teams. Yes indeed. “With AI”, a team of 6 developers can do the work of a team of 26.

Forget the fact that we’ve known that a team of 6 will tend to outperform a team of 26 for at least 50 years. Or the fact that we know precisely why small teams tend to get more done than large teams.

It’s called “Brooks’ Law”, named after Fred P. Brooks, author of the seminal software engineering book The Mythical Man-Month, published in 1975.

Among the topics Brooks expounds on is the effect of team size on the lines of communication required to keep that team in sync.

It can be calculated by imagining pair-programming configurations. A team of 2 can only pair with each other, so they are continuously synchronising.

A team of 3 – let’s call them Aunt Flo, Farmer Barleymow and Bod (since we’re in 1975) – can be paired 3 ways to keep in sync:

Flo & Barleymow
Flo & Bod
Bod & Barleymow

A team of 4 requires 6 different pairings to keep everyone in sync. A team of 5 requires 10 unique pairings.

With each additional team member, the number of lines of communication required to keep everyone on the same page increases non-linearly. As the team grows, the communication overhead explodes.
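The counting above can be sketched in a few lines of Python. The function name `lines_of_communication` is a hypothetical label for illustration; the arithmetic is simply the number of unique pairs in a group of n people, n × (n − 1) / 2.

```python
from itertools import combinations


def lines_of_communication(team_size: int) -> int:
    """Number of unique pairings in a team: n * (n - 1) / 2."""
    return team_size * (team_size - 1) // 2


# The 1975 team of three, paired every possible way.
team = ["Flo", "Barleymow", "Bod"]
pairings = list(combinations(team, 2))
print(len(pairings))  # 3

# How the count grows as the team grows.
for size in (2, 3, 4, 5, 6, 12, 26):
    print(size, lines_of_communication(size))
```

By this count, a team of 6 has 15 lines of communication to maintain, and a team of 26 has 325.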

The upshot of all this is that larger teams spend orders of magnitude more time keeping in sync, or – more commonly – dealing with the downstream consequences of not keeping in sync.

Brooks’ Law simply states that adding people to a late project makes it later.

It also means that adding developers to a team has rapidly diminishing returns. A team of 6 will probably get more done than a team of 3, but not twice as much. And a team of 12 may well get less done than a team of 6.

So am I at all surprised by this revelation that – “with AI” – a team of 6 can deliver more value than a team of 26? No. It’s exactly what I’d expect, with or without AI.

And I’ve seen it play out many times in my career: that cohesive, small team who were getting shit done has a dozen more bodies thrown at them in the mistaken belief that more shit will get done faster.

Typically, productivity slows to a crawl as the team attempts to get everybody pushing in roughly the same direction.

When it comes to software economics, team size – and team makeup – has very high leverage. Always has, and always will, regardless of who’s typing the code.

So it’s curious how, in the 51 years since The Mythical Man-Month was published, the standard management response to slipping schedules and escalating costs has been to increase team size. It’s almost as if they haven’t heard of Brooks’ Law.

Now, here’s where this gets interesting. The combinatorial explosion in lines of communication required to keep team members in sync applies whether those team members are human or AI.

Agents working on the same code need to keep as much as possible on the same page about what’s being done to what. When 2 agents work in parallel on their own branches, the longer they go without synchronising, the more their picture of the plan and of the code drifts apart, and the bigger the risk of conflicts grows.

Add a third, and that overhead increases linearly. But add a fourth, and we’re off to the races.

At this point, agents either spend more time waiting in merge queues, or even more time dealing with the consequences of not waiting to merge safely.

Those nice folks who make Cursor helpfully demonstrated what happens when agent “swarms” are let loose on the same code base at the same time.

Here’s the rub: because merges – to happen safely – have to go in single file, there are real hard limits to how many merges can happen in any period of time. And therefore real hard limits to how many people or agents can be changing the same code base at the same time.
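A back-of-the-envelope sketch of that ceiling, assuming merges are strictly serialised and each one occupies the pipeline for a fixed time. The function names and the figures (a 10-minute merge-and-verify cycle, an 8-hour day) are hypothetical, chosen only to show the shape of the limit.

```python
def max_merges_per_day(minutes_per_merge: float, working_minutes: float = 8 * 60) -> int:
    """Upper bound on merges when they must go through in single file."""
    return int(working_minutes // minutes_per_merge)


def merges_per_contributor(contributors: int, minutes_per_merge: float) -> float:
    """Fair share of that daily ceiling for each person or agent."""
    return max_merges_per_day(minutes_per_merge) / contributors


# Hypothetical: every merge takes 10 minutes to integrate, build and test.
print(max_merges_per_day(10))          # 48
print(merges_per_contributor(4, 10))   # 12.0
print(merges_per_contributor(24, 10))  # 2.0
```

Under these assumptions, 24 contributors sharing one code base could each merge only twice a day, which pushes them towards bigger batches and longer-lived branches.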

And that’s just the coordination required to avoid breaking the build. That’s before we even think about coordinating over requirements, over architecture, over coding standards etc etc etc.

When I run team workshops, the size of the team is a pretty reliable predictor of which ones will successfully complete the exercise. A team of 4 has a big advantage over a team of 12.

Although I’ve yet to test this empirically, what I’m seeing and hearing is that the number of agents working in parallel on the same set of source files might follow a similar trend.

Indeed, the folks I’ve been watching experiment with agentic software development the longest appear to have landed on a single thread of execution as the optimal solution. Even when there are multiple specialised agents involved, they’re taking turns.

It’s quite possible that the limit of the number of agents working in parallel – without very high separation of concerns from the code other agents are working on – is just one.

When we spin too many plates, the result is usually broken plates.

The AI-Ready Software Developer #24 – Specification Is A Conversation

Psst. If your boss won’t invest in training you in Specification By Example, I’m running out-of-hours workshops on May 12 and 16 specifically for self-funding learners. £99 + UK VAT.

A sentiment I see often on social media about “AI”-assisted and agentic coding goes something along the lines of “If you’re just translating specs into code, your job is disappearing”.

It sounds reasonable on the surface, if you believe that’s all many programmers were doing. Someone – say, a product manager or an architect – hands the programmer a specification for a feature, and the programmer just “codes it up” like a pharmacist filling a prescription.

But was that ever really a thing?

In reality, most software specifications are incomplete and ambiguous, and often contain logical contradictions that are hard to spot – because of the incompleteness and the ambiguity.

Think of the movie script that contains the line “A huge battle ensues”. The studio asks “How much will that cost?” The producer has absolutely no idea, because that part of the script still needs to be written. The line’s just a placeholder for more work to flesh out the details. And in software development, just as in movie-making, the devil is in the details. That’s where the time and the money go.

And that’s the reality of software specifications written in natural languages like English, even ones written by programmers. At best, they’re placeholders for conversations. Extreme Programming actually makes that explicit: a “user story” is not a requirements specification. It’s just a placeholder for a chat with the person who wrote it. That’s why it’s a waste of time making them detailed.

And this means that the programmer’s job is not just to “code up the spec”, it’s to figure out what the specification actually means. What exactly happens in this huge battle?

And because specifications are incomplete and ambiguous and often contradictory, this process inevitably has to walk everything back to figuring out what the need being addressed is in the first place.

This is why, for so many years, I drummed it into product managers, requirements analysts and the like to come to the team not with the “what”, and certainly not with the “how”, but with the “why” – what problem are we aiming to solve?

We then work as a team – leveraging our combined expertise in systems design and development and the problem domain – to learn together how to solve the problem through successive iterations of best guesses informed by rapid user feedback.

Now, I could be wrong, but that doesn’t sound like “just translating specs into code” to me.

The promise of this new generation of “AI” coding tools is that non-programmers will be able to iterate working software by themselves. And this is true, to an extent.

Tools like Claude Code and Cursor have proven themselves to be very useful for generating prototypes and proofs-of-concept with no programmer involvement, enabling business analysts, UX designers, product managers, start-up founders and pastry chefs to test simple ideas quickly and cheaply.

The problem is that without the expert judgement of experienced programmers, these prototypes don’t mature into reliable, scalable, secure software that stands up to real-world production rigours.

So, at some point, you’ll have to pick up the phone to Programmers-R-Us and get some involved if you want your experiment to scale. Have your cheque book ready!

And this is where the problems really start. You now have kind-of, sort-of working software that validates your idea. There’s a fork in the road here. You can either:

Have the programmers find and fix all the problems to make the prototype market-ready
Use the prototype as the specification and have the programmers build a production-quality version from scratch with, y’know, tests and architecture and stuff

Let’s go through Door #1.

So, the prototype sort-of works, but there are bugs – oh boy, are there bugs! – and security vulnerabilities and performance bottlenecks and scaling blockers and some gone-off cheddar and discarded prams and all the kind of stuff that LLMs will tend to leave in your code if you let them. Which you did, because you can’t tell a switch statement from a discarded pram.

So the programmers need to test the software thoroughly to find all the usage scenarios where the software doesn’t do what it’s supposed to. And there’s the Catch-22. What is it supposed to do? If only there was a complete and precise specification!

We don’t fare much better with Door #2.

Now your programmers have to reverse-engineer the prototype to figure out what it does. What happens when the user leaves that field blank and clicks “Continue”? What happens when the clock strikes midnight and interest needs to be applied to the account? What happens when the vehicle remains stationary for more than 5 minutes? Edge case after edge case after edge case.

You run into the exact same wall. A complete and precise specification for any non-trivial software is made up of thousands of definitive answers to these kinds of questions. Software systems are the most complex machines we’ve ever built. You can’t specify them on the back of a cigarette packet.
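To give a flavour of what one of those definitive answers looks like, here is a minimal sketch of a single question, the blank field, pinned down as examples that double as tests. Everything in it is hypothetical: the `check_email_field` function, the wording of the outcomes and the deliberately naive validity rule are assumptions for illustration, and each is exactly the kind of decision that would need agreeing with the customer.

```python
# Each example is one agreed answer: (what the user typed, what should happen).
EXAMPLES = [
    ("", "rejected: email is required"),
    ("   ", "rejected: email is required"),
    ("jason@example.com", "accepted"),
    ("jason.example.com", "rejected: email is not valid"),
]


def check_email_field(raw: str) -> str:
    """Hypothetical rule for an email field when the user clicks Continue."""
    value = raw.strip()
    if not value:
        return "rejected: email is required"
    # Naive validity check (an assumption): something@domain-with-a-dot.
    local, at, domain = value.partition("@")
    if not (at and local and "." in domain):
        return "rejected: email is not valid"
    return "accepted"


# The examples are the specification, and they run as tests.
for raw, expected in EXAMPLES:
    assert check_email_field(raw) == expected, (raw, expected)
```

That is four examples for one field on one screen. Multiply that by every input and every rule in the system.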

Or, to put it the way a customer once put it to me when we had this discussion about a new feature, “Why are you making it so complicated, Jason?”

“I’m not making it complicated. That’s how complicated what you’re asking for is.”

The system has to handle all of these inputs in some meaningful way, otherwise it will break. If the user’s email address isn’t valid, a whole bunch of features won’t work. Are you happy for the system to just not work for those users?

Then, as it almost always does, the conversation turned into a negotiation about the scope and complexity of that feature for the next release. We can always remove one variable now, and add it in a later iteration. It’s an old physics trick (see: Special Relativity).

And this is why requirements specifications are placeholders for conversations. If there’s no conversation, issues will not get addressed by experts who understand them until much, much later when they’re much, much harder to fix.

This is why, as a tech lead, I almost always – when presented with a “requirements specification” as a fait accompli – pressed the “Reset” button and started the conversation again at “Okay, so what seems to be the problem?”

That’s Door #3 – involve programmers early. Because those conversations have to happen whether you like it or not, and the sooner you have them, the sooner you’ll converge on a workable, production-ready solution.

A simple prototype can help you validate your idea before you pick up that phone, but the more design decisions you make before involving experts, the bigger and badder the catch-up’s going to be later. And you might be surprised – when you have a clear end goal in mind – how simple the simplest proofs-of-concept can be.

I’ve been in this game for 34 years, and in that time I’ve seen countless attempts to demarcate this process of building an understanding of not just what the software needs to do, but why it needs to do it.

They all inevitably walk into the same wall. You cannot pay someone else to understand something for you. It’s like paying someone to revise for your exams.

Software specification is necessarily a conversation between people with needs – and, ideally, money – and people who specialise in meeting needs using computers. ’Twas ever thus, ’twill ever be.

Unless, of course, your specification is complete, consistent and mathematically precise.

And a complete, consistent, mathematically precise specification of a computer program is that computer program. That’s what source code is, and that’s why programming languages were invented.

A person who just translates complete, consistent and mathematically precise specifications into executable code is a compiler.

Fans of Spec-Driven Development may be feeling vindicated now, because they believe their specifications are complete, consistent and precise. If you’ve clarified requirements using examples – to me and you, tests – that might push your specs closer to that level of integrity.

But even if your specs really are completely complete, and completely consistent and completely precise – and even if LLMs were capable of reliably translating such specifications into code (which they’re not) – you need to remember that they will still be full of assumptions about what’s really needed. Basically, a formal specification is just formalised guesswork.

To quote the Second Doctor, “Logic, my dear Zoe, merely enables one to be wrong with authority”.

The real knowledge isn’t in the spec, or in the code, it’s in the feedback we get when people use it in the real world. In this sense, iterating is the ultimate requirements discipline – it’s where most of the real value gets discovered.

So, by all means, spec away. But don’t spec far – just enough to test an assumption with user feedback from working software. And user feedback’s like code reviews – the more changes we ask for feedback on, the less attention gets paid to most of them.

Research has found that when users give feedback, they often anchor on one or two standout moments – positive or negative – rather than the entire user experience. Psychologists call it the “peak-end rule” – it’s “LGTM” for user eyeballs.

Spec one change to functionality at a time, build it in rapid, tested iterations, ship it through a reliable delivery pipeline, and then go get that focused feedback. Because the spec very probably will need to change.

And if the spec rarely changes, I’d worry that we aren’t listening to our users. Either that or we got incredibly lucky (or clairvoyant).

It’s all one big, ongoing conversation.

Engineering Leaders: Your AI Adoption Doesn’t Start With AI

In the past few months, I’ve been hearing from more and more teams that the use of AI coding tools is being strongly encouraged in their organisations.

I’ve also been hearing that this mandate often comes with high expectations about the productivity gains leaders expect this technology to bring. But this narrative is rapidly giving way to frustration when these gains fail to materialise.

The best data we have shows that a minority of development teams are reporting modest gains – in the order of 5%-15% – in outcomes like delivery lead times and throughput. The rest appear to be experiencing negative impacts, with lead times growing and the stability of releases getting worse.

The 2025 DevOps Research & Assessment State of AI-assisted Software Development report makes it clear that the teams reporting gains were already high-performing or elite by DORA’s classification, releasing frequently, with short lead times and with far fewer fires in production to put out.

As the report puts it, this is not about tools or technology – and certainly not about AI. It’s about the engineering capability of the team and the surrounding organisation.

It’s about the system.

Teams who design, test, review, refactor, merge and release in bigger batches are overwhelmed by what DORA describes as “downstream chaos” when AI code generation makes those batches even bigger. Queues and delays get longer, and more problems leak into releases.

Teams who design, test, review, refactor, merge and release continuously in small batches tend to get a boost from AI.

In this respect, the team’s ranking within those DORA performance classifications is a reasonably good predictor of the impact on outcomes when AI coding assistants are introduced.

The DORA website helpfully has a “quick check” diagnostic questionnaire that can give you a sense of where your team sits in their performance bands.

(Answer as accurately as you can. Perception and aspiration aren’t capability.)

The overall result is usefully colour-coded. Red is bad, blue is good. Average is Meh. Yep, Meh is a colour.

If your team’s overall performance is in the purple or red, AI code generation’s likely to make things worse.

If your team’s performance is comfortably in the blue, they may well get a little boost. (You can abandon any hopes of 2x, 5x or 10x productivity gains. At the level of team outcomes, that’s pure fiction.)

The upshot of all this is that before you even think about attaching a code-generating firehose to your development process, you need to make sure the team’s already performing at a blue level.

If they’re not, then they’ll need to shrink their batch sizes – take smaller steps, basically – and accelerate their design, test, review, refactor and merge feedback loops.

Before you adopt AI, you need to be AI-ready.

Many teams go in the opposite direction, tackling whole features in a single step – specifying everything, letting the AI generate all the code, testing it after-the-fact, reviewing the code in larger change-sets (“LGTM”), doing large-scale refactorings using AI, and integrating the whole shebang in one big bucketful of changes.

Heavy AI users like Microsoft and Amazon Web Services have kindly been giving us a large-scale demonstration of where that leads – more bugs, more outages, and significant reputational damage.

A smaller percentage of teams are learning that what worked well before AI works even better with it. Micro-iterative practices like Test-Driven Development, Continuous Integration, Continuous Inspection, and real refactoring (one small change at a time) are not just compatible with AI-assisted development, they’re essential for avoiding the “downstream chaos” DORA finds in the purple-to-red teams.

And while many focus on the automation aspects of Continuous Delivery – and a lot of automation is required to accelerate the feedback loops – by far the biggest barrier to pushing teams into the blue is skills.

Yes. SKILLS.

Skills that most developers, regardless of their level of experience, don’t have. The vast majority of developers have never even seen practices like TDD, refactoring and CI being performed for real.

That’s certainly because real practitioners are pretty rare, so they’re unlikely to bump into one. But much of this is because of their famously steep learning curves. TDD, for example, takes months of regular practice before you can use it on real production systems.

And, as someone who’s been practising TDD and teaching it for more than 25 years, I know it requires ongoing mindful practice to maintain the habits that make it work. Use it or lose it!

An experienced guide can be incredibly valuable in that journey. It’s unrealistic to expect developers new to these practices to figure it all out for themselves.

Maybe you’re lucky enough to have some of the 1% of software developers – yes, it really is that few – who can actually do this stuff for real. Or even one of the 0.1% who has had a lot of experience helping developers learn it. (Just because they can do it, it doesn’t necessarily follow that they can teach it.)

This is why companies like mine exist. With high-quality training and mentoring from someone who not only has many thousands of hours of practice, but also thousands of hours of experience teaching these skills, the journey can be rapidly accelerated.

I made all the mistakes so that you don’t have to.

And now for the good news: when you build this development capability, release cycles and lead times speed up while reliability actually improves, whether you’re using AI or not.

A New DORA Performance Level – Catastrophically Bad

DORA (DevOps Research & Assessment) has 4 broad levels for dev team performance: Elite, High, Medium, and Low.

A Low-Performing team deploys less than once a month, has lead times for changes > 1 month, sees as many as half their deployments go boom, and takes more than a week to fix them.

An Elite team deploys changes multiple times a day, has lead times typically of < 1 day, failure rates < 15% (fewer than roughly 1 in 7 deployments go boom), and can fix failed releases in under an hour.
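If it helps to see those two ends of the scale side by side, here’s a rough sketch in Python. It only encodes the Low and Elite descriptions given above; the TeamMetrics fields, the function names and the "more than 30 deploys a month" stand-in for "multiple times a day" are my own assumptions for illustration, not DORA’s definitions. Anything in between is simply reported as in between.

```python
from dataclasses import dataclass


@dataclass
class TeamMetrics:
    """Hypothetical container for a team's four delivery metrics."""
    deploys_per_month: float     # average production deployments per month
    lead_time_days: float        # lead time for changes, in days
    change_failure_rate: float   # fraction of deployments that fail, 0.0 to 1.0
    recovery_hours: float        # typical time to fix a failed release, in hours


def is_elite(m: TeamMetrics) -> bool:
    # Assumption: "multiple times a day" is approximated as > 30 deploys a month.
    return (m.deploys_per_month > 30
            and m.lead_time_days < 1
            and m.change_failure_rate < 0.15
            and m.recovery_hours < 1)


def is_low(m: TeamMetrics) -> bool:
    # "As many as half" is an upper bound on failures, not a threshold,
    # so the failure rate is deliberately left out of this check.
    return (m.deploys_per_month < 1
            and m.lead_time_days > 30
            and m.recovery_hours > 7 * 24)


def classify(m: TeamMetrics) -> str:
    if is_elite(m):
        return "Elite"
    if is_low(m):
        return "Low"
    # The High and Medium boundaries aren't spelled out here, so don't guess.
    return "Between Low and Elite"


print(classify(TeamMetrics(60, 0.5, 0.05, 0.5)))   # Elite
print(classify(TeamMetrics(0.5, 45, 0.5, 200)))    # Low
print(classify(TeamMetrics(4, 7, 0.2, 24)))        # Between Low and Elite
```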

I picture a Low-Performing team walking a tightrope between two mountain peaks. It’s a long way to safety (working, shippable code), a long way down if they fall, and a long climb back up to try again.

I picture an Elite team as walking the same length tightrope, tied to wooden posts a few feet apart, just 3 feet off the ground. Safety’s never far away, and a fall’s no big deal. They can quickly recover.

I would like to propose a 5th performance level, one that I’ve seen for real more than once: Catastrophically Bad.

At this level, the tightrope has snapped.

I’ve seen teams stuck in a death spiral where no changes can be deployed because every release goes boom. One example was a financial services company here in London who’d been running themselves ragged trying to stabilise a release for almost a year.

Every deployment had to be rolled back, and every deployment cost them high six figures in client compensation, not to mention loss of reputation.

You know the drill: testing was done 100% manually and took weeks. While the devs were fixing the bugs testing found, they were introducing all-new bugs (and reintroducing a few old favourites).

A change failure rate of 100% and a lead time of – effectively – infinity.
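To make that arithmetic concrete, here’s a minimal sketch. The Deployment record and the made-up history are purely illustrative (they’re not the client’s data), and it assumes lead time only counts for changes that actually stay in production.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Deployment:
    """Hypothetical record of one release attempt."""
    days_in_the_making: float   # days from starting the change to deploying it
    rolled_back: bool           # True if the release went boom and was reverted


def change_failure_rate(deployments: List[Deployment]) -> float:
    """Fraction of deployments that had to be rolled back."""
    if not deployments:
        raise ValueError("no deployments to measure")
    failed = sum(1 for d in deployments if d.rolled_back)
    return failed / len(deployments)


def lead_time_days(deployments: List[Deployment]) -> float:
    """Average lead time of the changes that survived in production.

    Assumption: a rolled-back change was never really delivered, so if
    nothing survives, the lead time is effectively infinite.
    """
    delivered = [d.days_in_the_making for d in deployments if not d.rolled_back]
    if not delivered:
        return math.inf
    return sum(delivered) / len(delivered)


# Illustrative history: every release attempt is rolled back.
history = [Deployment(90, True), Deployment(120, True), Deployment(150, True)]

print(change_failure_rate(history))   # 1.0
print(lead_time_days(history))        # inf
```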

How does a Catastrophically Bad development process that delivers nothing turn into at the very least a Low-Performing process that delivers something? (And yes, this is the mythical “hyper-productivity” the Scrum folks told you about – and no, Scrum isn’t the answer).

What we did with my client started with 2 fundamental changes:

Fast-running automated “smoke” tests, selected by analysing which features broke most often
CI traffic lights – a wrapper around version control that forced check-ins to go in single file, and blocked check-ins and check-outs of changes when the light wasn’t “green” (meaning, no check-in in progress and build is working).
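Neither change needs much machinery. Here’s a rough Python sketch of both ideas, not the tooling we actually built: most_fragile_features, TrafficLight and mark_build_fixed are names I’ve made up for illustration, and how a red build gets repaired is my assumption, since the description above doesn’t cover it.

```python
from collections import Counter


def most_fragile_features(breakages, top_n=3):
    """Return the features that broke most often, most frequent first.

    `breakages` is a list with one feature name per recorded breakage.
    These are the candidates for the first fast-running smoke tests.
    """
    return [feature for feature, _ in Counter(breakages).most_common(top_n)]


class TrafficLight:
    """Gate in front of version control.

    Green means: no check-in in progress AND the build is working.
    Check-ins and check-outs are refused whenever the light isn't green.
    """

    def __init__(self):
        self.build_ok = True
        self.checkin_by = None   # who holds the single check-in slot, if anyone

    def is_green(self):
        return self.build_ok and self.checkin_by is None

    def can_check_out(self):
        return self.is_green()

    def begin_checkin(self, developer):
        """Claim the check-in slot; returns False if the light isn't green."""
        if not self.is_green():
            return False
        self.checkin_by = developer
        return True

    def finish_checkin(self, developer, build_passed):
        """Release the slot and record whether the build still works."""
        if self.checkin_by != developer:
            raise ValueError(f"{developer} does not hold the check-in slot")
        self.checkin_by = None
        self.build_ok = build_passed

    def mark_build_fixed(self):
        # Assumption: someone repairs the build out-of-band and resets the light.
        self.build_ok = True


breakages = ["payments", "login", "payments", "reports", "login", "payments"]
print(most_fragile_features(breakages, top_n=2))   # ['payments', 'login']

light = TrafficLight()
print(light.begin_checkin("alice"))   # True: the light was green
print(light.begin_checkin("bob"))     # False: a check-in is in progress
light.finish_checkin("alice", build_passed=True)
print(light.begin_checkin("bob"))     # True: single file, one at a time
```

The point of the design is the single slot: nobody can pile new changes on top of a check-in that hasn’t been proven yet, and nobody can pull a broken build into their working copy.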

It took 12 weeks to go from Catastrophically Bad to Medium-Performing. (Pro tip for new leaders and process improvement consultants – poorly-performing teams are a gift because they have such low-hanging fruit).

You can build from here. In this case, by showing teams how to safely change legacy code in ways that add more fast-running regression tests and gradually simplify and modularise the parts of the code that are changing the most.

(The title image is taken from a Catastrophically Bad agentic “team”)

71% of Developers and Engineering Leaders Believe “AI” Makes Engineering Discipline More Important

I ran the same poll on LinkedIn and Mastodon, asking:

In your estimation, does AI-assisted and agentic coding make engineering discipline:

More important than before?

As important as before?

Less important than before?

Don’t know

464 people responded, and the final votes tell an interesting story.

More than two-thirds of developers and engineering leaders believe that engineering discipline is more important when we’re using “AI”.

And a whopping 94% believe it’s at least as important as before.

This directly contradicts the narrative that “software engineering is dead” in this age of “AI” coding assistants and agents. And the evidence bears that out. “AI” is an amplifier of engineering capability, not a magic wand that fixes bottlenecks, blockers and quality leaks. Unless you address them (and that’s mostly a skills thing – it doesn’t come with your Claude Code plan), it just makes them worse.

Teams using Claude Code, Cursor, Copilot and other LLM-based code generating tools experience positive benefits – in terms of outcomes like lead times and release stability – if they’re applying good technical practices.

Teams that don’t apply enough engineering discipline experience negative impact on those outcomes. They deliver less reliable software that costs more to change, and they deliver it later.

And those teams are sadly in the majority, according to the DORA data.

So the beliefs seem to match the reality – engineering discipline is undeniably more important when we’re drinking from a code-generating firehose.

The mystery that remains is why, given that most of us seem to agree on this, we see no signs of increased investment in engineering capability. Demand for training hasn’t been rising (and I would know if it had) – and that includes training in “AI-assisted” development practices.

Teams appear to have been left to figure it out for themselves by a process of trial and error.

At the same time, many developers now report feeling under pressure to deliver the claimed productivity gains of “AI” – spend 2 minutes on sites like LinkedIn and you’ll see some very high expectations being set – and the lack of support probably isn’t helping.

Practices like Specification By Example, Test-Driven Development, continuous inspection, refactoring, and continuous integration aren’t just compatible with “AI”-assisted workflows – they’re essential for them.

I’ve spent more than 25 years applying these practices successfully, and teaching them to thousands of developers face-to-face and countless more through online video tutorials, blog posts and social media.

And I’ve devoted a large chunk of the last three years exploring how they can benefit “AI”-assisted workflows. Check out my AI-Ready Software Developer blog series for the highlights.

For teams who want a hands-on introduction, my training courses now incorporate “AI”. Once you’ve learned these practices in your IDE, you’ll get a chance to apply them using tools like Claude Code and Cursor and see the difference they can make. This way, you can become more effective with and without “AI”.

It turns out that’s the key to succeeding with it.

Will You Finally Address Your Development Bottlenecks In 2026?

I’ve spent the best part of 3 decades telling teams that to minimise the bottleneck of testing changes to their code, they’ll need to build testing right into their innermost workflow, and write fast-running automated regression tests.