Engineering Leaders: Your AI Adoption Doesn’t Start With AI

In the past few months, I’ve been hearing from more and more teams that the use of AI coding tools is being strongly encouraged in their organisations.

I’ve also been hearing that this mandate often comes with high expectations about the productivity gains leaders expect this technology to bring. But this narrative is rapidly giving way to frustration when these gains fail to materialise.

The best data we have shows that a minority of development teams are reporting modest gains – in the order of 5%-15% – in outcomes like delivery lead times and throughput. The rest appear to be experiencing negative impacts, with lead times growing and the stability of releases getting worse.

The 2025 DevOps Research & Assessment State of AI-assisted Software Development report makes it clear that the teams reporting gains were already high-performing or elite by DORA’s classification, releasing frequently, with short lead times and with far fewer fires in production to put out.

As the report puts it, this is not about tools or technology – and certainly not about AI. It’s about the engineering capability of the team and the surrounding organisation.

It’s about the system.

Teams who design, test, review, refactor, merge and release in bigger batches are overwhelmed by what DORA describes as “downstream chaos” when AI code generation makes those batches even bigger. Queues and delays get longer, and more problems leak into releases.

Teams who design, test, review, refactor, merge and release continuously in small batches tend to get a boost from AI.

In this respect, the team’s ranking within those DORA performance classifications is a reasonably good predictor of the impact on outcomes when AI coding assistants are introduced.

The DORA website helpfully has a “quick check” diagnostic questionnaire that can give you a sense of where your team sits in their performance bands.

(Answer as accurately as you can. Perception and aspiration aren’t capability.)

The overall result is usefully colour-coded. Red is bad, blue is good. Average is Meh. Yep, Meh is a colour.

If your team’s overall performance is in the purple or red, AI code generation’s likely to make things worse.

If your team’s performance is comfortably in the blue, they may well get a little boost. (You can abandon any hopes of 2x, 5x or 10x productivity gains. At the level of team outcomes, that’s pure fiction.)

The upshot of all this is that before you even think about attaching a code-generating firehose to your development process, you need to make sure the team’s already performing at a blue level.

If they’re not, then they’ll need to shrink their batch sizes – take smaller steps, basically – and accelerate their design, test, review, refactor and merge feedback loops.

Before you adopt AI, you need to be AI-ready.

Many teams go in the opposite direction, tackling whole features in a single step – specifying everything, letting the AI generate all the code, testing it after-the-fact, reviewing the code in larger change-sets (“LGTM”), doing large-scale refactorings using AI, and integrating the whole shebang in one big bucketful of changes.

Heavy AI users like Microsoft and Amazon Web Services have kindly been giving us a large-scale demonstration of where that leads – more bugs, more outages, and significant reputational damage.

A smaller percentage of teams are learning that what worked well before AI works even better with it. Micro-iterative practices like Test-Driven Development, Continuous Integration, Continuous Inspection, and real refactoring (one small change at a time) are not just compatible with AI-assisted development, they’re essential for avoiding the “downstream chaos” DORA finds in the purple-to-red teams.
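To make “micro-iterative” concrete, here’s a minimal sketch of a single TDD cycle in Python – the behaviour and the names are purely illustrative, not taken from any particular project:

```python
# One micro-iteration: RED -> GREEN -> REFACTOR, then commit.
# (Illustrative names only.)

# RED: one small failing test that specifies the next tiny slice of behaviour.
def test_empty_basket_total_is_zero():
    assert basket_total([]) == 0

# GREEN: the simplest code that makes that one test pass.
def basket_total(prices):
    return sum(prices)

# REFACTOR: one small, behaviour-preserving change at a time (a clearer name,
# an extracted function), re-running the tests after each change. Then commit
# and pick the next test. That whole loop is the batch.
```

With an “AI” assistant in the loop, the cycle is the same: one test, one small change, checked and committed before moving on.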

And while many focus on the automation aspects of Continuous Delivery – and a lot of automation is required to accelerate the feedback loops – by far the biggest barrier to pushing teams into the blue is skills.

Yes. SKILLS.

Skills that most developers, regardless of their level of experience, don’t have. The vast majority of developers have never even seen practices like TDD, refactoring and CI being performed for real.

That’s partly because real practitioners are pretty rare, so most developers are unlikely to bump into one. But much of it is down to these practices’ famously steep learning curves. TDD, for example, takes months of regular practice before you can use it on real production systems.

And, as someone who’s been practising TDD and teaching it for more than 25 years, I know it requires ongoing mindful practice to maintain the habits that make it work. Use it or lose it!

An experienced guide can be incredibly valuable in that journey. It’s unrealistic to expect developers new to these practices to figure it all out for themselves.

Maybe you’re lucky to have some of the 1% of software developers – yes, it really is that few – who can actually do this stuff for real. Or even one of the 0.1% who has had a lot of experience helping developers learn them. (Just because they can do it, it doesn’t necessarily follow that they can teach it.)

This is why companies like mine exist. With high-quality training and mentoring from someone who not only has many thousands of hours of practice, but also thousands of hours of experience teaching these skills, the journey can be rapidly accelerated.

I made all the mistakes so that you don’t have to.

And now for the good news: when you build this development capability, the speed-ups in release cycles and lead times – with reliability actually improving – happen whether you’re using AI or not.

The Gorman Paradox – Solution II: They’re In The Bin

Software development’s essentially a learning process. Most of the value in a product or system’s added in response to user and eventually market feedback.

With each iteration we get the design less wrong. With each iteration, we learn.

The effect of batch size on learning is profound.

I urge teams to work on the basis that every design decision is guesswork until it hits the real world. We can’t know with certainty that we made the right decisions.

Getting user feedback is the only meaningful mechanism we have to “turn the cards over” and find out if we guessed right. In this sense, learning is characterised as reducing or eliminating uncertainty in product design. Teams who do this faster will tend to out-learn their competition.

Imagine trying to guess a random 4-digit number in one go vs. guessing one digit at a time.

In both approaches, we start with the same odds of guessing it right: 1/10,000. But with each guess, the uncertainty collapses orders of magnitude faster when we’re guessing one digit at a time. The latter approach out-learns the former.

Even if we had an “AI” random 4-digit number generator that enabled us to make 10x as many guesses in the same time, someone guessing one digit at a time would still out-learn us.
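A quick simulation makes the arithmetic concrete – this is a hypothetical sketch in Python, with the strategy names made up for illustration:

```python
import random

def whole_number_guesses(secret):
    """Guess all 4 digits in one go; the only feedback is right/wrong."""
    candidates = list(range(10_000))
    random.shuffle(candidates)
    return candidates.index(secret) + 1

def one_digit_at_a_time_guesses(secret):
    """Guess a single digit, get feedback, lock it in, move on to the next."""
    guesses = 0
    for digit in f"{secret:04d}":
        options = list("0123456789")
        random.shuffle(options)
        guesses += options.index(digit) + 1
    return guesses

random.seed(1)
secrets = [random.randrange(10_000) for _ in range(2_000)]
whole = sum(map(whole_number_guesses, secrets)) / len(secrets)
digits = sum(map(one_digit_at_a_time_guesses, secrets)) / len(secrets)

print(f"all four digits at once: ~{whole:,.0f} guesses on average")    # ~5,000
print(f"one digit at a time:     ~{digits:.0f} guesses on average")    # ~22
print(f"10x faster 'AI' guesser: ~{whole / 10:,.0f} effective tries")  # still ~500
```

On average, guessing whole numbers takes around 5,000 attempts, one digit at a time takes around 22, and even a tenfold speed-up on the single-pass approach still leaves it far behind.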

The chances of a complete solution delivered in a single pass – guessing all 4 digits in one go – being even on the same continent as correct are vanishingly remote, and we learn very little because of the nature of user feedback.

If I deliver 50 changes (e.g., new features) in a single release and ask users “waddaya think?”, I won’t get meaningful feedback about all 50 changes.

Most likely I’ll get general feedback of the “LGTM” or “meh” variety, and maybe some specific feedback about things that stood out. (Bugs in a release tend to overshadow anything else, for example – the proverbial fly in the soup. “Waddaya think of the soup?” “There’s a fly in it!”)

If I deliver ONE change, they’ll probably have something meaningful to say about it. We can at least observe what impact that one change has on user behaviour (e.g., engagement, completing tasks etc).

So we learn faster when we iterate fewer changes into the hands of users at a time. This inevitably forces us to apply the brakes on the creation of code, because we need to wait for feedback, and we need to do that often.

I see many posts here from folks claiming to have generated entire applications in days or even hours using LLM-based coding tools. That’s the equivalent of guessing all 4 digits in one go with the “AI” number generator: an entire application – hundreds of design decisions – created without any user feedback.

Creating an entire application in a single pass is every bit as “Big Design Up-Front” as wireframing or modeling the whole thing in UML in advance. And assumptions and guesses in your early decisions get compounded in later decisions, piling up uncertainty under a mountain of interconnected complexity. Failure is almost inevitable.

This is another potential solution to the Gorman Paradox.

Where are all the “AI”-generated apps? In the bin.

It just so happens that I train and mentor teams in the technical practices that enable them to learn faster from user and market feedback. I know, right! What are the chances?

And it also just so happens that any Codemanship training course booked by January 31st 2026 is HALF-PRICE. Which is nice.

The Great Filter (Or Why High Performance Still Eludes Most Dev Teams, Even With AI)

In my post about The Gorman Paradox, I compare the lack of any evidence of “AI”-assisted productivity gains to be found out here in the Real World™ with the famous Fermi Paradox that asks, if the universe is teeming with intelligent life, where is everybody?

It’s been over 3 years, and we’ve seen no uptick in products being added to the app stores. We’ve seen no rising tide on business bottom lines. We’ve seen no impact on national GDPs.

There is a likely explanation, and it’s the most obvious one: “AI”-assisted coding doesn’t actually make the majority of dev teams more productive. For sure, it produces more code. But, on average, it creates no net additional value.

The DORA data does find some teams reaping modest gains in terms of software delivery lead times without sacrificing reliability, and – interestingly – the data shows that those high-performing teams using “AI” were already high-performing without it.

For the majority of teams, the data shows that “AI” actually slowed them down – and these were the teams who were already pretty slow before “AI”. Attaching a code-generating firehose to the process just made them marginally slower.

The differentiator? Are the high-performing teams super-skilled programmers? Are they getting paid more? Are they putting something in the office water supply?

It turns out that what separates the teams who get a negative boost from the teams who get a positive boost is that the latter have addressed the bottlenecks in their development process.

Blocking activities, like detailed up-front design, after-the-fact testing, Pull Request code reviews, and big merges to the main branch, have been turned into continuous activities.

Teams work in much smaller batches and in much tighter feedback loops, designing, testing, inspecting and merging many times an hour instead of every few days.

Work doesn’t sit in queues waiting for someone’s attention. There are very few traffic lights between the developer’s desktop and the outside world to slow that traffic down.

And this means that changes can make it into the hands of users very rapidly, with highly automated, highly reliable, frictionless delivery pipelines that – as the supermarket ads used to say – get the peas from the farmer’s field to your table in no time at all.

The just-in-time grocery supply chains of supermarkets are a good analogy for the processes high-performing teams are using. Supermarkets don’t buy a year’s supply of fresh peas once a year. They buy tomorrow’s supply today, and their formidable logistical capabilities get those peas on the shelves pronto.

Those formidable logistical capabilities didn’t just appear, either. They’re the product of many decades of investment. Supermarket chains have sunk billions into getting better at it, so they can maximise cash flow by minimising the amount of working capital they have committed at any time.

They don’t want millions of pounds-worth of produce sitting in warehouses making them no money.

And businesses don’t want millions of pounds-worth of software changes sitting in queues waiting to be released. They want them out there in the hands of users, creating value in the form of learning what works and what doesn’t. Software that can’t be used has no value.

Walk into any large organisation and take a snapshot of how much investment in developed code is “in progress”. For some, it literally is millions of pounds-worth – tens or hundreds of thousands of pounds, multiplied by dozens or hundreds of teams.

The impact on a business of being able to out-learn the competition can be so profound, we might ask ourselves “Why isn’t everybody doing this?” Can you imagine a supermarket chain deciding not to bother with JIT supply? They wouldn’t last long.

This has come into focus even more sharply with the rise of “AI”-assisted software development. It’s quite clear now that even modest productivity gains accrue only to the teams who have addressed their bottlenecks and have low-friction delivery pipelines.

I see a “Great Filter” that continues to prevent the large majority of dev teams making it to that Nirvana. It requires a big, ongoing investment in the software development capability needed.

We’re talking about investment in people and skills. We’re talking about investment in teams and organisational design. We’re talking about investment in tooling and automation. We’re talking about investment in research and experimentation. We’re talking about investment in talent pipelines and outreach. We’re talking about investment in developer communities and the profession of software development.

Typically, I’ve seen that companies who manage to progress from the bottleneck-ridden ways of working to highly iterative, frictionless methods needed to invest 20-25% of their entire development budget in building and maintaining that capability.

And building that kind of capability takes years.

You can’t buy it. You can’t install it. You can’t have it flown in fresh from Silicon Valley.

And, like an organ transplant, any attempt to transplant that kind of capability into your business will be met with organisational antibodies protecting the status quo.

And that, folks, is The Great Filter.

Most organisations are simply not prepared to make that kind of commitment in time, effort and money.

Sure, they want the business benefits of faster lead times, more reliable releases, and a lower cost of change. But they’re just not willing to pony up to get it.

On a daily basis, I see people online warning us not to “get left behind by AI”. The reality is that the people who really are getting left behind are the ones who think that the bottlenecks and blockers they’ve struggled with in the past will magically get out of the way of the code-generating firehose.

Low-performing teams, now grappling with the downstream chaos caused by “AI” code generation, will probably always be the norm. And the value of this technology will probably never be realised by those businesses.

If you’re one of the few who are serious about building software development capability, my training courses in the technical practices that enable rapid, reliable and sustained evolution of software to meet changing needs are half price if you confirm your booking by Jan 31st.

Walking Skeletons, Delivery Pipelines & DevOps Drills

On my 3-day Code Craft training workshop (and if you’re reading this in January 2026, training’s half-price if you confirm your booking by Jan 31st), there’s a team exercise where the group need to work together to deliver a simple program to the customer’s (my) laptop where I can acceptance-test it.

It’s primarily an exercise in Continuous Delivery, bringing together many of the skills explored earlier in the course like Test-Driven Development and Continuous Integration.

But it also exercises muscles that individual or pair-programming exercises don’t reach. Any problem, even a simple one like the Mars Rover, tends to become much more complicated when we tackle it as a team. It requires a lot of communication and coordination. A team will typically take more time to complete it.

And it also exercises muscles that developers these days have never used before. In 2026, the average developer has never created, say, a command-line project from scratch in their tech stack. They’ve never set up a repo using their version control tool. They’ve never created a build script for Continuous Integration builds. They’ve never written a script to automatically deploy working software.

In the age of “developer experience”, a lot of people have these things done for them. Entry-level devs land on a project and it’s all just there.

That may seem like a convenience initially, but it comes with a sort of learned helplessness, with total reliance on other people to create and adapt build and deployment logic when it’s needed. A lot of developers would be on a significant learning curve if they ever needed to get a project up and running or to change, say, a build script.

It’s the delivery pipeline that frustrates most teams’ attempts to get any functionality in front of the customer in this exercise.

I urge them at the start to get that pipeline in place first. Code that can’t be used has no value. They may have written all of it, but if I can’t test it on my machine – nil points. Just like in real life.

They’re encouraged to create a “walking skeleton” for their tech stack – e.g., a command-line program that outputs “Hello, world!”, and has one dummy unit test.
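In Python, for example, that entire skeleton might be just two tiny files – the file and function names here are illustrative, not prescribed by the exercise:

```python
# hello.py – the whole "walking skeleton": a command-line program that proves
# the pipeline works end to end.
def greeting():
    return "Hello, world!"

if __name__ == "__main__":
    print(greeting())
```

```python
# test_hello.py – one dummy unit test, just enough for a CI build to run.
from hello import greeting

def test_greeting():
    assert greeting() == "Hello, world!"
```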

This can then be added to a new GitHub repository, and the rest of the team can be invited to collaborate on it. That’s the first part of the pipeline.

Then someone can create a build script that runs the tests and is triggered by pushes to the main (trunk) branch. On GitHub, if we keep our technical architecture vanilla for our tech stack (e.g., a standard Java/Maven project structure), GitHub Actions can usually generate a script for us. It might need a tweak or two – the right version of Java, for example – but it will get us in the ballpark.

So now everyone in the team can clone a repo that has a skeleton project with a dummy unit test and a simple output to check that it’s working end to end.

That’s the middle of the pipeline. We now have what we need to at least do Continuous Integration.

The final part of the pipeline is when the food makes it to the customer’s table. I remind teams that my laptop is a developer’s machine, and that I have versions of Python, Node.js, Java and .NET installed, as well as a Git client.

So, they could write a batch script that clones the repo, builds the software (e.g., runs pip install for a Python project), and runs the program. When I see “Hello, world!” appear on my screen, we have lift-off. The team can begin implementing the Mars Rover, and whenever a feature is complete, they can ping me and ask me to run that script again to test it.
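A minimal sketch of that final leg, written here in Python rather than as a batch file – the repository URL is a placeholder, and it assumes Git and Python are already on the customer’s machine:

```python
# deploy_and_run.py – clone the repo, install any dependencies, run the program.
import os
import subprocess
import sys
import tempfile

REPO = "https://github.com/example-team/mars-rover.git"  # placeholder URL

workdir = tempfile.mkdtemp(prefix="delivery-")
subprocess.run(["git", "clone", REPO, workdir], check=True)

# Install dependencies only if the project declares any.
requirements = os.path.join(workdir, "requirements.txt")
if os.path.exists(requirements):
    subprocess.run([sys.executable, "-m", "pip", "install", "-r", requirements],
                   check=True)

# Run the program on the customer's machine.
subprocess.run([sys.executable, os.path.join(workdir, "hello.py")], check=True)
```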

And thus, value begins to flow, in the form of meaningful user feedback from working software. (Aww, bless. Did you think the software was the value? No, mate. The value’s in what we learn, not what we deliver.)

And, of course, in the real world, that delivery pipeline will evolve, adding more quality gates (e.g., linting), parallelising test execution as the suite gets larger, progressing to more sophisticated deployment models and that sort of thing, as needs change.

DevOps – the marriage of software development and operations – means that the team writing the solution code also handles these matters. We don’t throw it over the wall to a separate “DevOps” team. That’s kind of the whole point of DevOps, really. When we need a change to, say, the build script, we – the team – make that change.

But you might be surprised how many people who describe themselves as “DevOps Engineers” wouldn’t even know where to start. (Or maybe you wouldn’t.)

It’s not their fault if they’ve been given no exposure to operations. And it’s not every day that we start a project from scratch, so the opportunities to gain experience are few and far between.

Given just how critical these pipelines are to our delivery lead times, it’s surprising how little time and effort many organisations invest in getting good at them. It should be a core competency in software development.

It’s especially mysterious why so many businesses allow it to become a bottleneck by favouring specialised teams instead of T-shaped DevOps software engineers who can do most of it themselves instead of waiting for someone else to do it. Teams could have a specialised expert on hand for the rare times when deep expertise is really needed.

If the average developer knew the 20% they’d need 80% of the time to create and change delivery pipelines for their tech stack(s), there’d be a lot less waiting on “DevOps specialists” (which is an oxymoron, of course).

Just as a contractor who has to move house frequently tends to become very efficient at it, developers who frequently have to get delivery pipelines up and running tend to be much better at the yak shaving it involves.

So I encourage teams to create these opportunities by doing regular “DevOps drills” for their tech stacks. Get a Node Express “Hello, world” pipeline up and running from scratch. Get a Spring Boot pipeline up and running from scratch. And so on.

Typically, I see teams doing them monthly, and as they gain confidence, varying the parameters (e.g., parallel test execution, deployment to a cluster and so on), and making the quality gates more sophisticated (security testing, linting, mutation testing and so on), while learning how to optimise pipelines to keep them as frictionless as possible.

The AI-Ready Software Developer: Conclusion – Same Game, Different Dice

In this series, I’ve explored the principles and practices that teams seeing modest improvements in software development outcomes have been applying.

More than four years after the first “AI” coding assistant, GitHub Copilot, appeared, the evidence is clear. Claims of teams achieving 2x, 5x, even 10x productivity gains simply don’t stand up to scrutiny. No shortage of anecdotal evidence, but not a shred of hard data. It seems that when we measure it, the gains mysteriously disappear.

The real range, when it’s measured in terms of team outcomes like delivery lead time and release stability, is roughly 0.8x – 1.2x, with negative effects being substantially more common than positives.

And we know why. Faster cars != faster traffic. Gains in code generation, according to the latest DORA State of AI-Assisted Software Development report, are lost to “downstream chaos” for the majority of teams.

Coding never was the bottleneck in software development, and optimising a non-bottleneck in a system with real bottlenecks just makes those bottlenecks worse.

Far from boosting team productivity, for the majority of “AI” users, it’s actually slowing them down, while also negatively impacting product or system reliability and maintainability. They’re producing worse software, later.

Most of those teams won’t be aware that it’s happening, of course. They attached a code-generating firehose to their development plumbing, and while the business is asking why they’re not getting the power shower they were promised, most teams are measuring the water pressure coming out of the hose (lines of code, commits, Pull Requests) and not out of the shower (business outcomes), because those numbers look far more impressive.

The teams who are seeing improvements in lead times of 5%, 10%, 15%, without sacrificing reliability and without increasing the cost of change, are doing it the way they were always doing it:

  • Working in small batches, solving one problem at a time
  • Iterating rapidly, with continuous testing, code review, refactoring and integration
  • Architecting highly modular designs that localise the “blast radius” of changes
  • Organising around end-to-end outcomes instead of around role or technology specialisms
  • Working with high autonomy, making timely decisions on the ground instead of sending them up the chain of command

When I observe teams that fall into the “high-performing” and “elite” categories of the DORA capability classifications using tools like Claude Code and Cursor, I see feedback loops being tightened. Batch sizes get even smaller, quality gates get even narrower, iterations get even faster. They keep “AI” on a very tight leash, and that by itself could well account for the improvements in outcomes.

Meanwhile, the majority of teams are doing the opposite. They’re trying to specify large amounts of work in detail up-front. They’re leaving “AI agents” to chew through long tasks that have wide impact, generating or modifying hundreds or even thousands of lines of code while developers go to the proverbial pub.

And, of course, they test and inspect too late, applying too little rigour – “Looks good to me.” They put far too much trust in the technology, relying on “rules” and “guardrails” set out in Markdown files that we know LLMs will misinterpret and ignore randomly, barely keeping one hand on the wheel.

As far as I’ve seen, no team actually winning with the technology works like that. They’re keeping both hands firmly on the wheel. They’re doing the driving. As AI luminary Andrej Karpathy put it, “agentic” solutions built on top of LLMs just don’t work reliably enough today to leave them to get on with it.

It may be many years before they do. Statistical mechanics predicts it could well be never, with the order-of-magnitude improvement in accuracy needed to make them reliable enough (wrong 2% of the time instead of 20%) calculated to require 10^20 times the compute to train. To do that on similar timescales to the hyperscale models of today would require Dyson Spheres (plural) to power it.

Any autonomous software developer – human or machine – requires Actual Intelligence: the ability to reason, to learn, to plan and to understand. There’s no reason to believe that any technology built using deep learning alone will ever be capable of those things, regardless of how plausibly they can mimic them, and no matter how big we scale them. LLMs are almost certainly a dead end for AGI.

For this reason I’ve resisted speculating about how good the technology might become in the future, even though the entire value proposition we see coming out of the frontier labs continues to be about future capabilities. The gold is always over the next hill, it seems.

Instead, I’ve focused my experiments and my learning on present-day reality. And the present-day reality that we’ll likely have to live with for a long time is that LLMs are unreliable narrators. End of. Any approach that doesn’t embrace this fact is doomed to fail.

That’s not to say, though, that there aren’t things we can do to reduce the “hallucinations” and confabulations, and therefore the downstream chaos.

LLMs perform well – are less unreliable – when we present them with problems that are well-represented in their training data. The errors they make are usually a product of going outside of their data distribution, presenting them with inputs that are too complex, too novel or too niche.

Ask them for one thing, in a common problem domain, and chances are much higher that they’ll get it right. Ask them for 10 things, or for something in the long-tail of sparse training examples, and we’re in “hallucination” territory.

Clarifying with examples (e.g., test cases) helps to minimise the semantic ambiguity of inputs, reducing the risk of misinterpretation, and this is especially helpful when the model’s working with code because the samples they’re trained on are paired with those kinds of examples. They give the LLM more to match on.
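For illustration, a single small ask might be pinned down with executable examples before any code is generated – the function name and the test cases below are hypothetical:

```python
# One problem, specified with examples: the assistant is asked only for an
# implementation of apply_discount() that makes these tests pass.
# (Function name and cases are hypothetical, purely for illustration.)
import pytest

from prices import apply_discount


@pytest.mark.parametrize("total, code, expected", [
    (100.0, None,     100.0),   # no discount code: total unchanged
    (100.0, "SAVE10",  90.0),   # known code: 10% off
    (100.0, "BOGUS",  100.0),   # unknown code: ignored
])
def test_apply_discount(total, code, expected):
    assert apply_discount(total, code) == expected
```

One function, three examples, a few hundred tokens of context – squarely inside the distribution, and trivially checkable the moment the code comes back.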

Contexts need to be small and specific to the current task. How small? Research suggests that the effective usable context sizes of even the frontier LLMs are orders of magnitude smaller than advertised. Going over 1,000 tokens is likely to produce errors, but even contexts as small as 100 tokens can produce problems.

Attention dilution, drift, “probability collapse” (play one at chess and you’ll see what I mean), and the famous “lost in the middle” effect make the odds of a model following all of the rules in your CLAUDE.md file, or all the requirements for a whole feature, vanishingly remote. They just can’t accurately pay attention to that many things.

But even if they could, trying to match on dozens of criteria simultaneously will inevitably send them out-of-distribution.

So the smart money focuses on one problem at a time and one rule at a time, working in rapid iterations, testing and inspecting after every step to ensure everything’s tickety-boo before committing the change (singular) and moving on to the next problem.

And when everything’s not tickety-boo – e.g., tests start failing – they do a hard reset and try again, perhaps breaking the task down into smaller, more in-distribution steps. Or, after the model’s failed 2-3 times, they write the code themselves to get out of a “doom loop”.

There will be times – many times – when you’ll be writing or tweaking or fixing the code yourself. Over-relying on the tool is likely to cause your skills to atrophy, so it’s important to keep your hand in.

It will also be necessary to stay on top of the code. The risk, when code’s being created faster than we can understand it, is that a kind of “comprehension debt” will rapidly build up. When we have to edit the code ourselves, it’s going to take us significantly longer to understand it.

And, of course, it compounds the “looks good to me” problem with our own version of the Gell-Mann amnesia effect. Something I’ve heard often over the last 3 years is people saying “Well, it’s not good with <programming language they know well>, but it’s great at <programming language they barely know>”. The less we understand the output, the less we see the brown M&Ms in the bowl.

“Agentic” coding assistants are claimed to be able to break complex problems down, and plan and execute large pieces of work in smaller steps. Even if they can – and remember that LLMs don’t reason and don’t plan, they just produce plausible-looking reasoning and plausible-looking plans – that doesn’t mean we can hit “Play” and walk away to leave them to it. We still need to check the results at every step and be ready to grab the wheel when the model inevitably takes a wrong turn.

Many developers report how LLM accuracy falls off a cliff when tasked with making changes to code that lacks separation of concerns, and we know why this is, too. Changing large modules with many dependencies brings a lot more code into play, which means the model has to work with a much larger context. And we’re out-of-distribution again.

The really interesting thing is that the teams DORA found were succeeding with “AI” were already working this way. Practices like Test-Driven Development, refactoring, modular design and Continuous Integration are highly compatible with working with “AI” coding assistants. Not just compatible, in fact – essential.

But we shouldn’t be surprised, really. Software development – with or without “AI” – is inherently uncertain. Is this really what the user needs? Will this architecture scale like we want? How do I use that new library? How do I make Java do this, that or the other?

It’s one unknown after another. Successful teams don’t let that uncertainty pile up, heaping speculation and assumption on top of speculation and assumption. They turn the cards over as they’re being dealt. Small steps, rapid feedback. Adapting to reality as it emerges.

Far from “changing the game”, probabilistic “AI” coding assistants have just added a new layer of uncertainty. Same game, different dice.

Those of us who’ve been promoting and teaching these skills for decades may have the last laugh, as more and more teams discover it really is the only effective way to drink from the firehose.

Skills like Test-Driven Development, refactoring, modular design and Continuous Integration don’t come with your Claude Code plan. You can’t buy them or install them like an “AI” coding assistant. They take time to learn – lots of time. Expert guidance from an experienced practitioner can expedite things and help you avoid the many pitfalls.

If you’re looking for training and coaching in the practices that are distinguishing the high-performing teams from the rest – with or without “AI” – visit my website.

The AI-Ready Software Developer #20 – It’s The Bottlenecks, Stupid!

For many years now, cycling has been consistently the fastest way to get around central London. Faster than taking the tube. Faster than taking the train. Faster than taking the bus. Faster than taking a cab. Faster than taking your car.

All of these other modes of transport are, in theory, faster than a bike. But the bike will tend to get there first, not because it’s the fastest vehicle, but because it’s subject to the fewest constraints.

Cars, cabs, trains and buses move not at the top speed of the vehicle, but at the speed of the system.

And, of course, when we measure their journey speed at an average 9 mph, we don’t see them crawling along steadily at that pace.

“Travelling” in London is really mostly waiting. Waiting at junctions. Waiting at traffic lights. Waiting to turn. Waiting for the bus to pull out. Waiting on rail platforms. Waiting at tube stations. Waiting for the pedestrian to cross. Waiting for that van to unload.

Cyclists spend significantly less time waiting, and that makes them faster across town overall.

Similarly, development teams that can produce code much faster, but work in a system with real constraints – lots of waiting – will tend to be outperformed overall by teams who might produce code significantly slower, but who are less constrained – spend less time waiting.

What are developers waiting for? What are the traffic lights, junctions and pedestrian crossings in our work?

If I submit a Pull Request, I’m waiting for it to be reviewed. If I send my code for testing, I’m waiting for the results. If I don’t have SQL skills, and I need a new column in the database, I’m waiting for the DBA to add it for me. If I need someone on another team to make a change to their API, more waiting. If I pick up a feature request that needs clarifying, I’m waiting for the customer or the product owner to shed some light. If I need my manager to raise a request for a laptop, then that’s just yet more waiting.

Teams with handovers, sign-offs and other blocking activities in their development process will tend to be outperformed by teams who spend less time waiting, regardless of the raw coding power available to them.

Teams who treat activities like testing, code review, customer interaction and merging as “phases” in their process will tend to be outperformed by teams who do them continuously, regardless of how many LOC or tokens per minute they’re capable of generating.

This isn’t conjecture. The best available evidence is pretty clear. Teams who’ve addressed the bottlenecks in their system are getting there sooner – and in better shape – than teams who haven’t. With or without “AI”.

The teams who collaborate with customers every day – many times a day – outperform teams who have limited, infrequent access.

The teams who design, test, review, refactor and integrate continuously outperform teams who do them in phases.

The teams with wider skillsets outperform highly-specialised teams.

The teams working in cohesive and loosely-coupled enterprise architectures outperform teams working in distributed monoliths.

The teams with more autonomy outperform teams working in command-and-control hierarchies.

None of these things comes with your Claude Code plan. You can’t buy them. You can’t install them. But you can learn them.

And if you’re ticking none of those boxes, and you still think a code-generating supercar is going to make things better, I have a Bugatti Chiron Sport you might be interested in buying. Perfect for the school run!

Are You Training Your Junior Developers, Or Hazing Them?

One of the ways I feel lucky in my software development career is in how I got started.

I learned programming by building – well, trying to build – programs. Complete working programs – mostly simple games – on computers I had total control over.

I designed the games (remember graph paper?). I composed the music. I wrote the code. I tested the programs – perhaps not as thoroughly or as often as I should have. I copied the C30 cassettes. And I swapped them for other home-produced games on the playground. I even sold a couple.

I was the CEO, CTO, head of sales and marketing, product manager and lead developer, head of distribution, and QA manager of my own micro-tech company… that just happened not to make any real money, but that’s a minor detail.

I did it all, hands-on.

Then, after I stumbled out of university and needed money, I freelanced for a while, working directly with customers to understand their requirements, designing user interfaces, writing code, designing databases, testing the software – perhaps not as thoroughly or as often as I should have – packaging and deploying finished products, and answering the phone when users encountered a problem. Which they did. Often.

So I started my career as a full lifecycle software developer: requirements, design, programming, databases, testing, releases and operations.

Did I screw it up at times? Oh boy, yeah! But I learned fast. I had to, to get paid. And, importantly, I got to see the whole process, work with a range of technologies, and wear a bunch of different hats.

And these were only mini projects. The world didn’t burn down when my SQL corrupted some data. The work was relatively low on risk, but high on learning. It built my competence and my confidence quickly.

When I got my first salaried job as a “software engineer”, I was then given what I would describe as a 2-year apprenticeship where I learned a lot of foundational stuff that would have been damned useful to know when I was freelancing.

And I was encouraged to try my hand at a wide range of things. My many screw-ups just never made it into any releases. The guardrails were very effective.

Importantly, while I was given a fairly free rein, I was closely supervised and mentored by developers with many years more experience. And I was given a lot of training.

Sadly, this is very different to how most developers start their careers these days. Instead of creating a wide range of learning opportunities on low-risk work, entry-level devs are confined to narrow, menial tasks – typically the ones “senior” developers would prefer not to do. “Training”, for too many, looks more like hazing.

It’s not at all uncommon for a junior dev to spend 6-12 months doing little else but fixing bugs on production systems, or working through SonarQube issues, or manning the support hotline. “It’s all they’re good for.”

New features? Product strategy? Talking to customers? Architecture? UX? Process improvement? The interesting stuff? That’s senior work.

Most often, it’s the risk that they’ll make mistakes that deters managers from giving junior developers too much freedom. But that’s a fundamental misjudgment. Mistakes and failure are integral to the learning process. The real risk is that you’ll grow developers who are afraid to try.

And while they’re painting the proverbial fences, it’s rare that they get much structured training or mentoring, either. Most organisations view senior developers as too valuable to “waste” on such things.

I see it differently. I think there comes a point in a developer’s career where the real waste is not letting them share their experience.

In this sense, there are “three ages” of a software developer, as their focus shifts from mostly learning, to mostly doing, to mostly teaching.

The job of a junior developer is to grow into a productive and well-rounded practitioner. And the productivity of a junior developer should be measured not by how much they deliver, but by how fast they grow. Month-on-month, year-on-year, what difference do we see in their capability and their confidence?

Businesses are so obsessed with cooking with the green tomatoes, they forget that with more time and more watering, they’ll grow into far more versatile red tomatoes.

Keeping them in a narrow lane, blinkered to the wider development process, stunts their growth. I’ve met many devs with decades of experience who were to all intents and purposes still junior developers.

When we frame it in those terms, the emphasis shifts from “What value can we extract from this junior dev today?” to “What potential can we add?”

In that light, it makes sense to structure their work around providing the most valuable learning opportunities. If they create tangible business value along the way, all the better. But that’s not the primary aim. The primary aim is to produce better software developers, and their work is a vehicle for that.

And if they somehow manage to burn the house down, that’s a you problem. How did their mistake make it into production?

Training & Mentoring is a Common Good

“Why should I train my developers? They’ll just leave.”

“Why is it so hard to find developers with the skills I need?”

Over a thirty-three-year career, I’ve heard variations on these questions many, many times. And, typically, from the exact same people.

Many businesses are reluctant to invest in developers because tenures tend to be shorter than the time it takes for that investment to pay off. By the time a junior turns into a genuinely productive professional developer – someone who can work largely unsupervised and create more value than they cost – chances are they’ll be doing that somewhere else.

The mental leap employers don’t seem able to take is understanding that “somewhere else” is, from a previous employer’s perspective, them.

Where did their productive developers come from? Did they emerge fully-formed from a college or a school or a boot camp? Or are they products of years of learning – on the job, from books, from courses, from each other, and so on – that somebody invested in?

Somebody took that hit. Whether they did it explicitly by paying for training and education or providing mentoring, or implicitly by shouldering the learning curve in everyday work, the only reason there’s any pasta sauce available at all is because somebody grew tomatoes.

I encourage employers to think of professional development in terms of paying it forward. The mindset that it should only be provided when there’s a direct – and often immediate – benefit has led to an industry of perpetual beginners.

Developers whose growth was stunted by a lack of investment in knowledge and skills set the example for inexperienced developers who are about to see their growth stunted in turn.

Because nobody sees it as their responsibility. It’s that patch of grass that doesn’t belong to anybody, so nobody cuts it, even though everybody complains about it.

Developers who are lucky enough to have lots of free time – usually young single men – may proactively hone their skills out of hours. But that’s not a strategy that scales to a $1.5 trillion-a-year profession. That would require more structure and more investment. A lot more. Imagine if your doctor was mostly self-taught in their spare time…

And it excludes a large part of the population who might well make great developers, but have young children, or care for elderly relatives, or volunteer in their communities, and can’t find the time to build skills that – let’s be honest now – are a bigger benefit for employers than for anybody else.

It shouldn’t be expected that the grass mows itself. By all means, if you can, then good for you. But the fact remains that a skilled software developer can be worth a lot of money to a business, and I don’t think it’s at all unreasonable to expect employers to chip in.

I see developers as shared resources. Over their career, they’ll likely bring value to a bunch of different enterprises. It’s rare that we stay in the same place for our entire useful dev life.

In that sense, I see training and mentoring developers as a common good. And I believe that, in the long term, while developers should definitely own their own learning journey, employers should be expected to contribute to it while they’re together.

It should be a collaboration that hopefully brings benefit to both parties directly, but more importantly takes a long-term view and brings wider benefits to a whole range of organisations over their careers.

One benefit in particular is how developers who received proper long-term training and mentoring – it’s no secret that I’m an advocate of structured 3-5-year apprenticeships – can go on to pass their knowledge and skills on to people coming into the profession.

Frankly, if that was the norm, we might be looking at a very different industry. And, for employers, the ripples you start will eventually find their way back to your shores when you’re hiring.

“Why is it so hard to find developers who can do TDD, refactoring and Continuous Integration?” It’s because so few get expert training and mentoring in them. Invest in your developers, and build the capability to rapidly, reliably and sustainably evolve working software to meet rapidly-changing business needs.

“Our senior developers already know this stuff, Jason?”

I hear this very often from managers who’ll invest in training entry-level developers, but only entry-level.

Do they, though?

A large-scale study of developer activity in the IDE found that, of the devs who said they did TDD, only 8% were doing anything even close in reality. Most didn’t even run their tests, let alone drive their designs from them and run them continuously. Developers checking in code they haven’t even seen compile is more common than you might think.

They may well believe that they’re doing them, of course. They learned it from someone (who learned it from someone) who learned it from e.g. a YouTube tutorial made by someone who’s evidently never actually seen it being done. (I check every year – there’s a LOT of those, and they get a LOT of views.)

After all this time working with so many different teams in a wide range of problem domains, I can tell you for a fact that the practices developers claim they’re doing – TDD, refactoring, “clean code”, CI & CD, modular design, etc. – usually aren’t being practised much at all. That’s the norm, I’m afraid.

Unsurprisingly, the employer therefore sees none of the benefits in shrinking lead times, more reliable releases, and a more sustainable pace of delivery. The work remains mostly fighting fires and heroics around every deployment, rapidly eating up your budget on frantic motion-without-progress.

(And now we’re seeing that being amplified by you-know-what!)

Turns out you can’t just say you’re doing it. You have to ACTUALLY DO IT to get the benefits.

The way it often plays out is:

– You send your new hires to Jason (or someone like Jason). Jason teaches them some good habits that we’ve seen over the decades are likely to reduce delivery lead times, improve release stability and lower the cost of change.

– New hires go back to their teams, where – day-in and day-out – they see senior colleagues setting a bad example, and being rewarded for heroically putting out the fires they started.

– They may resolve to find themselves a job where they’ll get to apply what they’ve learned, and not feel pressured to just hack things out like everybody else.

– But, more commonly, they’ll just give up and go with the flow. Or, more accurately, the lack of it. Their careers take the path most-travelled, and you continue to wonder why it’s so hard to find senior developers who can do this stuff.

I would urge you to consider this when deciding who needs training and mentoring. I appreciate it’s a touchy subject with folks who claim they’re already doing these things. But there are ways you can broach it: a “refresher”, “mentoring the juniors”, and so on.

It really helps to align teams, and make the learning more “sticky” in day-to-day work. Otherwise, there’s a very real chance your junior developers will be un-taught by their senior peers.

And then, like I said, you don’t get the benefits – just the fires.

The AI-Ready Software Developer #13 – *You* Are The Intelligence

Human beings are funny old things. Over millions of years of evolution we’ve developed some traits that served us well in the wild, but might arguably work against us in our domesticated form.

We’re susceptible to psychological tics that can distort our thinking and make us act irrationally, even against our own interests.

One of those tics is our tendency to assign agency or intent to things that demonstrably don’t have it. We evolved to have Theory of Mind so that we can put ourselves in another person or animal’s shoes, and ask “Can that sabre-toothed tiger see me behind this tree?” or “Is Ugg planning to steal my best rock?”

The problems can start when we apply Theory of Mind to the weather (why does it insist on raining as soon as I put my coat on?), machinery (this washing machine hates me!), or – just for instance – a Large Language Model.

It’s understandable when we mistake software that matches patterns and predicts what comes next for something that actually thinks, because the patterns it’s matching are products of actual thinking – Actual Intelligence.

Heck, when he was stranded on that remote island, Tom Hanks formed a close friendship with a volleyball, and all that took was a handprint with eyes. The bar before anthropomorphism kicks in isn’t set very high.

Many LLM users ascribe qualities and abilities to the models that they demonstrably don’t have, like the ability to reason or to understand or to plan.

What they can do is to help us to reason and to understand and to plan.

Very importantly, we can also learn. In real time. From surprisingly few examples. And we don’t need a 100 MW power supply and the contents of Lake Michigan to do it.

In a collaboration between a human expert and an LLM, if we assign roles according to our strengths, the LLM is the powerful statistical pattern matcher and token predictor, trained on the sum total of current human knowledge – be it accurate or not – as of its training cut-off date. But it cannot think. It’s the world’s most well-read idiot. And we are the brains of the outfit.

We also need to remember that, despite what enthusiastic promoters of “agentic” coding assistants claim, LLMs have no capability to see the bigger picture and to think and plan strategically about things like the business domain, the user’s goals, the system architecture, or any of those “bird’s eye” concerns. Because they have no ability to think.

When we ask them to, they’ll “hallucinate” a high-level plan for us quite happily (and there I go, anthropomorphising). Like most “AI” output, it will look very plausible – more convincing than a handprint on a volleyball. But on closer inspection, there’s a very high probability that it will be full of Brown M&Ms. At such context sizes, it’s pretty much guaranteed.

And this is where psychology comes in again. Some people don’t see the problems. Maybe they don’t recognise them when they see them? Maybe they choose not to see them? Some folks really want to believe…

I have found it necessary to continually remind myself of the true nature of LLMs when I’m using them, and of the inherent – and very probably unfixable – limitations of their architecture.

The developers I’m seeing getting the best results using LLMs use them in ways that play to the tool’s strengths, and retain complete control over work that plays to theirs – keeping the LLM on a very short leash. They have the map. They set the route. They do the navigating.