codemanship – Codemanship's Blog

The Solution To BDUF Isn’t Faster BDUF

Big Design Up-Front – making lots of design decisions before getting real-world feedback on any of them – fails at any speed of decision-making. Indeed, the faster we make design decisions, the more we tend to fail.

Most folks make a category error of approaching design as a shopping list of decisions. We decide A, B, C, D. If it turns out B is wrong, we can just change B.

But it’s not a list – it’s a tree. Each decision constrains future choices. If B is wrong, C and D may well be wrong, too, if they’re consequences of B.
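
To make the tree concrete, here’s a minimal Python sketch. It assumes we could actually write down each decision alongside the earlier decision it follows from. The decision names and the `PARENT` map are hypothetical. Getting B wrong takes its consequences C and D with it, while E, which only follows from A, survives. Treat the design as a shopping list and you’d only ever flag B.

```python
from collections import defaultdict

# Hypothetical decision tree: each decision maps to the earlier decision it follows from.
# None marks a root decision.
PARENT = {"A": None, "B": "A", "C": "B", "D": "B", "E": "A"}


def consequences(parent_map):
    """Invert the tree so we can walk from a decision to the decisions that follow from it."""
    children = defaultdict(list)
    for decision, parent in parent_map.items():
        if parent is not None:
            children[parent].append(decision)
    return children


def invalidated_by(wrong, parent_map):
    """The wrong decision plus every decision downstream of it, sorted by name."""
    children = consequences(parent_map)
    affected, stack = [], [wrong]
    while stack:
        decision = stack.pop()
        affected.append(decision)
        stack.extend(children[decision])
    return sorted(affected)


print(invalidated_by("B", PARENT))  # ['B', 'C', 'D']; E only follows from A, so it survives
```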

And we might also make the mistake of thinking we can just unpick the decision tree, but it’s not as simple as that. First of all, we have to distinguish the leaves from the nodes: which root decision did we get wrong? And all we see in the resulting code are the leaves.
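
Working backwards from the leaves is the harder direction. One way to sketch it, again assuming a hypothetical, fully recorded tree: walk up from each leaf that turned out wrong and take the deepest decision they all share (their lowest common ancestor) as a candidate root. It’s only a candidate, and in practice the tree is exactly what the code doesn’t show you.

```python
# Hypothetical decision tree: decision -> the earlier decision it follows from.
PARENT = {"A": None, "B": "A", "C": "B", "D": "B", "E": "A", "F": "C"}


def path_to_root(decision, parent_map):
    """The decision followed by each of its ancestors, ending at the root."""
    path = []
    while decision is not None:
        path.append(decision)
        decision = parent_map[decision]
    return path


def candidate_root(failing_leaves, parent_map):
    """Deepest decision that every failing leaf descends from (a lowest common ancestor)."""
    if not failing_leaves:
        return None
    paths = [path_to_root(leaf, parent_map) for leaf in failing_leaves]
    shared = set(paths[0]).intersection(*paths[1:])
    for decision in paths[0]:  # walking upwards, so the first shared one is the deepest
        if decision in shared:
            return decision
    return None


print(candidate_root(["F", "D"], PARENT))  # B
```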

Also, dependencies between decisions don’t operate like dependencies in code. So we’d have to extract an entire branch of decisions from an orthogonally-interconnected architecture. Decisions X, Y and Z may have logical similarities that lead to shared modules that exist independently of whether features I, J and K reuse them.

BDUF is like a game of Play Your Cards Right where they don’t turn over any cards until the end. And it doesn’t matter how fast you play it, or how many goes you get, you’ll almost certainly lose.

The key factor in dramatically improving our odds of making good design decisions is how many consequent decisions we make before we get real-world feedback – how far we keep driving down that road after we make the turn.

And an AI Maserati’s just going to make things worse – we can go even further in the wrong direction, even faster.

So when someone boasts that Claude Cowork generated a PRD for 50 features in an hour, I can’t help thinking “Tell me there are no users without telling me there are no users”.

Can I back up what I’m claiming here? Yes, I think I probably can.

You’re looking in the wrong place. The value’s in the feedback, folks – not in the plan.

Return of the Revenge of the Son of Software Process Engineering

I watched an interview recently with Claude Code creator Boris Cherny where he talked about how he doesn’t prompt anymore, he just “writes loops” that do the prompting for him.

Putting aside the quality of the end product it’s producing, it’s fascinating watching a whole generation of AI-assisted and agentic developers reinvent something that’s been around for a long time.

The last time I kept my hands clean with Software Process Engineering (I could have said “got my hands dirty”, but SPE always felt like the opposite to me) was when I was Development System Architect at Symbian in 2006-2007. Ostensibly, my job was to model Symbian’s software engineering processes, so we could add that to the mountain of other process documentation teams could ignore.

You model engineering processes in pretty much the same way you’d model any business process. There are goals. There are roles. There are workflows. There’s information. There are rules. You get the picture.
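
For anyone who hasn’t met SPE, here’s roughly what those ingredients look like written down as data. It’s a minimal, hypothetical Python sketch, not SPEM itself (which is a UML-based metamodel), and the type and field names are made up for illustration. Swap the roles for agents and it’s much the same shape as the agent harnesses discussed later in this post.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Role:
    name: str


@dataclass
class Step:
    name: str
    performed_by: Role
    consumes: list[str] = field(default_factory=list)  # information going in
    produces: list[str] = field(default_factory=list)  # information coming out
    rules: list[str] = field(default_factory=list)     # constraints on how it's done


@dataclass
class Workflow:
    goal: str
    steps: list[Step]

    def steps_for(self, role: Role) -> list[str]:
        """What the model says this role should be doing, in order."""
        return [step.name for step in self.steps if step.performed_by == role]


developer, reviewer = Role("Developer"), Role("Reviewer")
change_flow = Workflow(
    goal="Get a change into the main branch without breaking the build",
    steps=[
        Step("Write a failing test", developer, produces=["test"]),
        Step("Make the test pass", developer, consumes=["test"], produces=["diff"]),
        Step("Review the diff", reviewer, consumes=["diff"], rules=["Reviewer is not the author"]),
    ],
)
print(change_flow.steps_for(developer))  # ['Write a failing test', 'Make the test pass']
```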

I’d created my own UML profile that extended the official Software Process Engineering Metamodel – originally designed to enable teams to customise the Rational Unified Process (which was only ever intended as a template or a toolkit for defining processes, not a process in its own right). I incorporated a metamodel for modeling goals and performance measures based on the Balanced Scorecard, and some ideas about mapping goals to processes and processes to system use cases – “You do this step using Perforce” sort of thing.

I, of course, had lost all faith in SPE by 2006. But I could do it, and do it well. I just knew that nobody touching the code was likely to ever look at these models. Because once upon a time, I was one of those coders being “programmed” by a methodologist, and I didn’t.

But the underlying conceit of SPE – that teams are factory machines that can be programmed using the metamodel – appears to be experiencing a renaissance of late with the rise of “harness engineering”. I see people defining development workflows, with goals and roles and workflows and information and rules… You get the picture.

They might not realise it, but this is software process engineering. I even see folks claiming to have codified entire “teams” of agents – each with their own goals, roles, workflows, information and rules – that interact and coordinate in wider workflows.

And it suffers from the exact same delusion at its centre – a programmable machine that executes instructions, turning use cases into realisations, turning realisations into class models, turning class models into code. Like in a factory.

In reality, the “machines” are non-deterministic and the outcomes are by no means guaranteed. The only time anyone ever did it the way I drew it is when it was me drawing and doing. My sphere of real control extends no further than me. And even then, there are times when I – quite rightly – don’t listen to me.

Any process model you define will be – at best – an abstract approximation of how you do it. And if it wasn’t built by observing what you really do, it won’t even be that. 92% of Java developers who said they did TDD actually didn’t do anything even in the ballpark when observed.

I recommend not going down this road. It’s a Fool’s Errand – whether you’re codifying development processes for teams made of people, or for teams made of hyper-scale token predictors. These folks are seriously underestimating how incredibly hard it is, and the end results show it.

I recommend that workflows be controlled by Actual Intelligence that can learn and adapt in the messy and unpredictable real world. In my “Ralph loop”, I am Ralph.

Nobody in history has ever described software development in machine-executable detail and had it actually work at scale. Sure, you might be the first. Anything’s possible.

But the fact you didn’t know how many truly great software engineers have tried, or that software process engineering was even a thing with a name (and a metamodel), doesn’t bode well.

If you’re curious, though, here’s a guide to the SPEM by Sparx Systems (who seem to have added it to their UML modeling solution since my Symbian days).

A Car Crash In Slow Motion

Since I’m among friends, I hope I can be open with you.

I started Codemanship 17 years ago in my late 30s, as a response to being asked by a recruiter for the gazillionth time “Why are you still a software developer?”

I’d been contracting for 12 years, and been programming professionally for 18, and that is what I do. I passed through lead developer roles into architect and then senior/head architect roles, and decided to walk my career back to being hands-on as a developer, but with enough authority and control over how my teams worked to do a good job – despite management.

Over the previous decade, I’d spent more and more time mentoring developers, as well as bits of structured training here and there.

The job that flipped the switch in me to make the jump permanently to that role was working as a Software Development Coach – a title I invented for myself because I didn’t like the one they’d given me (Technical Architect) – at BBC Worldwide. Even if I say so myself, I made a difference there. Not just to one developer or one team. Software development at BBC Worldwide was different by the time I moved on.

I didn’t move far, though. The lunchtime talks and workshops I’d instigated were increasingly being attended by software engineers from down the road at BBC corporate. After the Worldwide gig, I spent several years popping in to various BBC sites in London and Manchester running training and a very successful peer-led coaching experiment in TV Platforms (the iPlayer folks).

And in between, I ran my last team full-time for a small new consulting company owned by two guys who knew nothing about software development. I won’t go into the details, but that last contract left such a sour taste – and the BBC work was looming – that I finally did it and became a full-time trainer and coach, and founded Codemanship in 2009.

If you’ve started your own business, you’ll know that the first 2-3 years can be tough. I had savings, but one client was never going to be enough. I had a pretty modest goal: make half what I was making as a contractor, doing something I really enjoy. (And I like to think it has real value, too – we’ll circle back to that.)

So I found a small building on Blackfriars Road in South London – opposite Southwark tube station and, importantly, a decent boozer – and rented training rooms for weekend courses for folks funding their own career growth. I didn’t have the business contacts, but I knew a lot of software developers.

For a fraction of the price of corporate training, groups of ~16 enthusiastic folks gave up their weekends and a few hundred quid of their hard-earned cash to do what you might recognise as the ancestors of the Code Craft courses that I’ve run for many clients and for thousands of developers since then.

It was such good value that folks flew in from as far afield as Russia – back when they could – and the US and Canada. They’d book a hotel for a day or two after to see the sights – make a city break of it.

We’d spend a day TDD-ing and refactoring and SOLID-ing and wotnot, and then retire to The Ring to reflect on the day and talk around what we’d covered, going into really cool asides and general chit-chat.

Training has never felt like work to me – hence my ambition to do it for my living – but these weekend workshops felt even less like work, and more like tech meetups or conferences. Y’know – the interesting bits in between and after the talks. If there were 17 of us in the workshop, there’d usually be 8-10 of us in the pub after. That’s a sociable ratio.

Private workshops for corporate clients rarely end like that. (With exceptions, of course – Hi to the folks at Hostelworld in Porto and their magically replenishing beer fridge!)

The weekend workshops felt much more like community events than corporate training. They’re very hands-on, folks are pairing up and meeting new people, and I am “Jason from Twitter/LinkedIn/Jason’s Blog”, and not just “That Guy Who’s Running The Course I’ve Been Told To Go On”.

Although it never really occurred to me, these original training cohorts became my busy bees, buzzing from gig to gig, gaining seniority and influence year on year, until corporate orders started coming in from places they were working.

Codemanship’s client base grew quite organically from things like this, as well as from my activities within the developer community – speaking at events, organising conferences, going to meetups etc.

In fits and starts, the business grew. Sure, there were fallow periods, and there were busy periods. And I didn’t feel the need after a while to run out-of-hours training. Organising small, low-priced public workshops is a lot more work per £ it brings in. There was just enough corporate training and coaching to keep the lights on. And, generally, the trend was “number go up” by roughly 10% a year.

By 2019, I was on track to achieve my goal – half my contract income doing something I love. At the time, I really didn’t think I was reaching for the Moon.

Then, in early 2020… Well, you know what happened in early 2020.

Business just disappeared for 6 months. But then something fell into my lap, courtesy of Nat Pryce, that ultimately led to autumn 2020 to autumn 2022 being the best two years the business ever had. And although I knew an ongoing coaching gig was going to be a financial aberration, and that I shouldn’t get used to it, during the same time, the training side of the business grew too – beyond my original goal.

The summer of 2022 was the peak. I was better off than I’d been for many years, and was even about to put in an offer on a detached house in Wiltshire – with a garage and a garden and a utility room! Imagine that – having a room just for utility! Folks like me living in an expensive-ish London postcode can only dream of such luxuries.

Then, as with all peaks, it’s downhill on the other side. Unwittingly, I’d been the beneficiary of a hiring frenzy bankrolled by free credit during the lockdown era. It was only then I realised just how closely my sales tracked with entry-level hiring. In the room, I saw plenty of senior developers. But, it turns out, those workshops wouldn’t have been booked at all if it wasn’t for the junior intake in the room. I’d become the Onboarding Guy.

In 2023, sales dropped 65% – an almost exact match to entry-level hiring here in the UK and Europe. And the ongoing coaching gig had already ended with the rapid rise in interest rates, so that generous tap was turned off.

That’s a big pay cut.

But, I’d been through these cycles before, and had weathered them with savings and loans. So that’s what I did. I didn’t panic. I didn’t think “Shit, I need to get a contract”. I thought “This, too, shall pass.”

In 2024, it didn’t pass. Entry-level hiring fell further. Layoffs, layoffs, layoffs in the news. But still, I didn’t panic. I had savings. I had time.

In 2025, hiring started to recover, but not entry-level hiring. (Hey, it’s a good job all those senior developers know TDD, right?)

But by the middle of the year, something gave me hope and made me stay my course. By this time, after two years of experimenting with and researching AI-assisted coding, I’d figured something out – the principles and practices that I’d been teaching for 25 years, far from becoming less important, were becoming more important than ever with the rise of AI.

Clever Jason! They’ll be queuing outside my door any moment.

As 2025 went on, and more and more good data rolled in, that position just got more and more solid.

Any minute now…

AI coding tool adoption passed a tipping point over the Christmas break, as many engineering leaders finally found some time to play with the technology, got it to build them a Calendar app or a TO-DO list in a nanosecond, and came back to the office and proclaimed to their teams “You will use this!” Because real software development is exactly the same as doing self-contained mini projects by yourself for fun.

So we’ve been seeing more and more teams finding out what I – and many others – figured out a year or more ago. Without solid engineering foundations, that stuff will hurt you. It’ll slow down your release cycles, it’ll make your lead times longer, and it’ll create a growing mountain of quality problems that leaks into production. The evidence is now overwhelming that’s really what’s happening for the majority of teams.

And when I’ve spoken to engineering leaders and polled them about it, they agree that engineering foundations are more important than ever.

Any minute now…

In the meantime, my savings are gone, my credit cards are maxxed out, and orders in 2026 are down to 10% of what they were five years ago.

Two months ago, I pivoted back to where I started – out-of-hours training for people funding their own learning. If your boss doesn’t see the value in engineering foundations, maybe you do. And, if I set the price accordingly, maybe I can put that kind of training within your reach.

And this actually started well. The first few workshops on Tuesday evenings and Saturday mornings sold out. And they’ve really taken me back to that training room on Blackfriars Road, because they’ve felt much more like community events with some training thrown in to give us something to talk about.

I’ve really been enjoying the after-workshop discussions, and the ratio has again been very sociable – typically more than 70% stay on to chat. And this gave me hope.

To quote John Cleese in the 80s movie Clockwise, “It’s not the despair, Laura. I can stand the despair. It’s the hope.”

After those first few, interest has dropped off dramatically. I suspect the 60 or so folks who’ve bought tickets are the extent of the market within my reach.

I do not know where it goes from here.

So, after a little cry early this morning – not kidding – I think maybe it’s time to do some adulting and let go of this particular life goal. I can’t hold out any longer. In fact, I should have let go last year because now I’m a year older and in a real hole.

I don’t know what I’m going to do next. I find myself 55 years old and having not been employed by anybody else for 17 years. Friends will know that I’ve stayed very hands-on and current throughout that, and am still very capable of working as a developer and also leading teams. And – having used so many over the years – I can learn programming languages, tools and tech stacks very fast, even at my age.

But it’s not you I’d need to convince. As I understand it, job applicants have to contend with so many layers of corporate gatekeepers these days (human and AI) – who wouldn’t know a software developer from a hole in the ground – that I suspect I will struggle to get in front of the right people.

The final public workshops will go ahead as planned. Folks have bought tickets. And there are still some places left – I’d be very happy if you could join us. This could well be your last chance to experience what I’ve spent 17 years making a unique, hands-on training experience.

June 16 & 20 – Refactoring

June 30 – Specification By Example

July 7-9 – Code Craft (one final public voyage for my flagship 3-day workshop)

And if you ask me to run a private workshop for your team, I’m not going to say no. I’d be a fool to.

But I’m officially now “between careers”. Where that ends up at my age… I guess I’m going to find out.

Codemanship has turned out to be half my entire career. I’d hoped one day it would be my retirement. I love to do this job when I’m given the chance. And, if you follow me on social media, you probably know I do it even when nobody’s paying, which has been most of the time. (And, of course, there were times when I didn’t realise I wasn’t being paid – but that’s the life of a small business owner.)

I can’t complain. It’s been my dream job, and I’m very grateful to everyone I’ve met along the way.

Feedback With A Face

One handy thing about living in London, if you enjoy stand-up comedy, is that so many comedians test new material here in small venues – often playing “works in progress” to audiences of just a few dozen.

Stewart Lee famously iterates his show over many, many performances at the Leicester Square Theatre before he takes it on tour to bigger venues and has it recorded for TV and DVD.

Here’s the thing: comedy requires feedback. Immediate feedback. Not an aggregate report at the end of the show, but in-the-moment feedback about how a joke’s landing. In big theatres, audiences can become that faceless aggregate, but in 100-seater venues, every data point has a face.

And that matters. It matters when you can see the faces and hear the responses from your audience. Because now each one of them matters, and that’s a very different kind of feedback to being told that “27% thought that the routine about Prince Andrew went on a bit too long” after they’ve all gone home.

I hear developers all the time complaining that there are just too many users to get that kind of feedback-with-a-face. I say that’s a choice – like skipping the warm-up gigs at Old Rope at the Comedy Store and taking your show straight to the O2 Arena.

It’s worth cultivating small audiences to test new material on. Sure, you don’t get to see the aggregate trends – only big audiences can give you that. But you can see their faces, and you know immediately if the jokes aren’t landing. And if you’re going to die on stage, it’s preferable not to do it in front of 20,000 paying punters.

One final thought: I’ve observed our industry morph from one where the data points had faces and individual users’ experiences mattered to one where we only play the proverbial stadiums, and we only see the trends, not the faces.

This, I suspect – while not a direct cause – has been an enabler of “enshittification”. It’s much easier to do that to a faceless aggregate.

Faster Feedback -> Better Outcomes

The impact of feedback loops like testing in software development can be as profound as it is widely misunderstood.

Movie-making had a similar problem up until the 1960s. Crew shoots a take during the day. Director has to wait until the film’s processed so they can watch “the dailies” to check for any mistakes nobody noticed at the time – like an extra using an iPhone in what’s supposed to be 1889 – and to see if the shot actually works dramatically, comedically etc.

If they wanted to fix it, back in the day, that could mean rebuilding the set, or transporting everyone – cast, crew, equipment, costumes, props etc – back to the location. Remounting shots is a big deal.

In 1960, comic actor and director Jerry Lewis started using “video-assist” while working on The Bellboy. Takes were captured simultaneously on film and on video, so the director could check each shot in “video village” immediately after the take. If a joke wasn’t working, they could see it straight away and adjust for the next take. By the mid-60s, the technology had been refined using a beam splitter to ensure the video captured was showing exactly what the film camera was recording. WYSIWYG.

It made a big difference. When we move the feedback much closer to the action and the myriad decisions made in just a single shot, fixing problems gets much quicker and much, much cheaper. So – unsurprisingly – more problems get fixed.

Cinephiles like myself may have noticed a tangible leap in the quality of films being made during the 1960s and early 1970s, as this technology became mainstream.

In software development, we have our equivalents of “video-assist” – techniques we can use to bring the feedback much closer to the decision, making mistakes much quicker and cheaper to fix.

A good example is developer testing. Instead of making a whole bunch of changes to the code and then testing all of them, we make one change and immediately run to our equivalent of “video village” – a unit test suite, for example – to check for problems.
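
Here’s a minimal illustration of that video-village loop, using Python’s standard `unittest` module. The prescription-reminder function and its tests are hypothetical. The point is the rhythm: change one thing, run the suite, and know within seconds whether the take worked.

```python
import unittest


def days_until_reorder(pills_left: int, pills_per_day: int, lead_days: int = 7) -> int:
    """Hypothetical: days until a 'reorder your prescription' reminder should fire.

    The reminder fires `lead_days` before the supply runs out, and never in the past.
    """
    if pills_per_day <= 0:
        raise ValueError("pills_per_day must be positive")
    days_of_supply = pills_left // pills_per_day
    return max(days_of_supply - lead_days, 0)


class DaysUntilReorderTest(unittest.TestCase):
    def test_reminds_a_week_before_supply_runs_out(self):
        self.assertEqual(days_until_reorder(28, 1), 21)

    def test_counts_multiple_doses_per_day(self):
        self.assertEqual(days_until_reorder(28, 2), 7)

    def test_never_schedules_a_reminder_in_the_past(self):
        self.assertEqual(days_until_reorder(3, 1), 0)

    def test_rejects_a_non_positive_dose(self):
        with self.assertRaises(ValueError):
            days_until_reorder(28, 0)


def run_suite() -> unittest.TestResult:
    """Our 'video village': run after every small change, not after a batch of them."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DaysUntilReorderTest)
    return unittest.TextTestRunner(verbosity=2).run(suite)


if __name__ == "__main__":
    run_suite()
```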

Teams that rely on downstream testing are doing the equivalent of waiting to see the dailies. When problems are caught, fixing them becomes a bigger deal. Likely as not, the developers have moved on. The set’s been struck, so to speak, and remounting those shots is a bigger deal.

What other examples can you think of where we move feedback closer to the decision in software development?

Feedbackmaxxing

You know the TV gameshow Play Your Cards Right? Contestants are shown a sequence – in two rows – of giant playing cards presented face-down. The host turns over the first card. The contestant then has to guess if the next card is higher or lower than that one.

They move across the board, guessing and then revealing one card at a time until either the contestant guesses wrong or they complete the sequence and win the game.

Now imagine a version of that where they don’t turn the cards over until the contestant has guessed higher or lower for the entire sequence.

“That’s just silly, Jason.”

You’re absolutely right. It is silly. Very silly. The odds of winning the game would be so remote that we’d probably never see it happen.
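
If you’d rather see the odds than take my word for it, here’s a rough Monte Carlo sketch in Python. The rules are simplified assumptions: one row of five cards from a standard deck, aces high, a tie counts as a wrong call, and in the blind version every call is effectively a coin flip because nothing has been turned over yet. The informed player just bets on whichever side of the visible card has more of the range left.

```python
import random

RANKS = range(2, 15)  # 2-10, then J=11, Q=12, K=13, A=14 (ace high is our assumption)


def deal(rng: random.Random, n_cards: int) -> list[int]:
    """Shuffle a standard 52-card deck and lay out the first n_cards face down."""
    deck = [rank for rank in RANKS for _ in range(4)]
    rng.shuffle(deck)
    return deck[:n_cards]


def play(cards: list[int], rng: random.Random, reveal_as_we_go: bool) -> bool:
    """True if every higher/lower call along the row is right. A tie counts as wrong."""
    for previous, nxt in zip(cards, cards[1:]):
        if reveal_as_we_go:
            call_higher = previous < 8  # we can see it, so bet on the bigger side of the range
        else:
            call_higher = rng.random() < 0.5  # still face down: we're guessing blind
        if nxt == previous or (nxt > previous) != call_higher:
            return False
    return True


def win_rate(reveal_as_we_go: bool, n_cards: int = 5, trials: int = 10_000, seed: int = 1) -> float:
    """Fraction of simulated games won, using a fixed seed so runs are repeatable."""
    rng = random.Random(seed)
    wins = sum(play(deal(rng, n_cards), rng, reveal_as_we_go) for _ in range(trials))
    return wins / trials


print(f"cards turned over as we go:       {win_rate(True):.1%}")
print(f"all cards turned over at the end: {win_rate(False):.1%}")
```

Expect the first rate to come out several times higher than the second. Neither player is any smarter; one of them simply gets to see each card before making the next call.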

So why are you developing software that way?

Be honest now – you are.

You don’t turn the cards over one at a time. You make a whole bunch of guesses about what the users or the business really needs. Then you make a whole bunch of design decisions that may or may not be the right decisions. Then you make a whole bunch of changes to the code that may or may not work. And only then do you turn the cards over to see if all those many guesses were good guesses.

Every decision, and every change to the code, carries uncertainty. And that uncertainty compounds with every subsequent decision or change. If we have a 90% chance of getting one right, we have an 81% chance of getting two right, a 35% chance of getting ten right, and a 0.003% chance of getting 100 right. The more uncertainty accumulates, the longer we spend driving in the dark with the lights off.
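
The arithmetic is just repeated multiplication. This sketch makes the simplifying assumption that every decision is independent and equally likely to be right. Real decisions often depend on each other, so read it as an illustration rather than a model.

```python
def chance_all_right(p_each: float, n_decisions: int) -> float:
    """Chance that n independent decisions are all right, if each is right with p_each."""
    return p_each ** n_decisions


for n in (1, 2, 10, 100):
    print(f"{n:>3} decisions: {chance_all_right(0.9, n):.4%}")
# Prints 90.0000%, 81.0000%, 34.8678% and 0.0027% respectively
```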

These decisions and these changes don’t exist in isolation. One decision is often a consequence of an earlier decision – another junction along the way of the path we chose. One change to the code will constrain our choice of future changes.

If we take a wrong turn with any decision or any change (which is just another decision, really), how long can we afford to waste heading down the wrong road? How long will it take and how much will it cost to get back on the right road?

The further we go before we get a meaningful answer, the bigger the wasted time and effort, and the more it will cost to correct.

And this is where sunk cost enters the chat. When the cost of correcting a mistake is too high, teams will tend to choose to live with the mistake. Waddayagonnado?

And that’s how you make software, that is.

A smarter way is to turn the cards over as they’re being played. Test your guesses against reality as soon as possible, so the next guess is less likely to be a stop on the wrong road.

If you guessed wrong, no problemo. Correcting your mistake is quick and cheap. You don’t have to undo 100 decisions that followed, then make 100 new ones.

So a critical metric in software development is how long it takes for us to test our decisions after they’ve been made. That feedback latency needs to be as low as possible.

I’m now calling this approach feedbackmaxxing, because that’s how we talk these days apparently.

Feedbackmaxxing is maximising feedback frequency while minimising feedback latency across the entire software development system

This is about two variables we can control in our development process:

Batch Size – how many decisions need feedback (e.g., from testing, from code review, from users) at a time?
Feedback Frequency – how often do we get that feedback?

The bigger the batches, the longer it takes to get feedback. The smaller the batches, the sooner we learn what works and what doesn’t.

The smart players work in small batches – they solve one problem at a time – and engineer their feedback loops to be very fast.

Software development cycles are loops within loops. We have that outer loop – will a reminder to reorder a prescription reduce missed doses? And we have the inner loop – did that change I just made to the code work? Did it break anything that was depending on it?

The smart players know something about how to optimise nested loops, too. They know that to speed up the outer loop – the real-world user feedback from working releases – you focus your attention on the innermost loop.

How long does it take to build and test the software? If the answer is an hour, you have a big problem. Your choices are not great – you can either test one change at a time, and spend most of your day waiting for feedback. Or – and this is the most popular choice – you make a lot of changes, and then test them, in the mistaken belief this will save you time. “I’m too busy building on top of broken code for testing!”

The other systemic effect that large batches have is that – because they take longer to get feedback on (reviewing a 5-line diff vs. a 500-line diff, for example) – changes tend to end up sitting in queues waiting their turn.

Make the batches bigger, the queues get larger, and delays get longer. The more decisions we make before testing them, the slower we get overall.
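
A toy model shows the shape of the effect. Everything here is an assumption chosen for illustration: one change an hour, changes held back until a batch has piled up, and review time growing linearly with batch size. Little’s law, a standard result from queueing theory (average items in a system = arrival rate × average time each spends in it), then turns the wait into the number of changes sitting in the queue.

```python
def average_wait_hours(batch_size: int, hours_per_change: float = 1.0,
                       review_hours_per_change: float = 0.25) -> float:
    """Average time from a change being made to it getting review feedback.

    Toy model with made-up rates:
    - one change is made every `hours_per_change`;
    - changes are held back until `batch_size` of them have piled up;
    - the batch is then reviewed in one go, taking `review_hours_per_change` per change.
    Assumes review_hours_per_change < hours_per_change, so the reviewer keeps up.
    """
    # The first change in a batch waits for batch_size - 1 more; the last waits for none.
    waiting_for_batch = (batch_size - 1) / 2 * hours_per_change
    review_time = batch_size * review_hours_per_change
    return waiting_for_batch + review_time


def average_changes_awaiting_feedback(batch_size: int, hours_per_change: float = 1.0,
                                      review_hours_per_change: float = 0.25) -> float:
    """Little's law: items in the system = arrival rate x average time in the system."""
    arrival_rate = 1 / hours_per_change
    return arrival_rate * average_wait_hours(batch_size, hours_per_change, review_hours_per_change)


for batch in (1, 10, 50):
    print(f"batch of {batch:>2}: {average_wait_hours(batch):5.2f} h to feedback, "
          f"{average_changes_awaiting_feedback(batch):5.2f} changes waiting")
```

Going from one change at a time to batches of ten takes this made-up team from a quarter of an hour to seven hours between making a change and hearing about it, and from a quarter of a change waiting to seven.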

The evidence at this point is overwhelming that AI code generation speeds developers up, but slows teams down. We’ve been maxxing the wrong thing.

Large Language Models can make a lot of decisions – e.g., a lot of changes to our code – very, very quickly. It comes as no surprise that data from studying work queues across thousands of teams shows diffs getting bigger and bigger, queues getting larger and larger, and lead times for getting changes into production getting longer and longer.

In the most meaningful sense, feedback latency isn’t the time elapsed after a decision’s been made before we get feedback, but the number of subsequent decisions made that are a consequence of it – how many miles we carried on down that road. Lightning-fast code generation doesn’t help us here. If anything, it probably makes latency worse – in a Maserati, we get much further down what may be the wrong road than if we’d walked.
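
Counting in decisions rather than minutes makes the Maserati problem easy to see. This sketch assumes, for illustration, a chain of decisions where each follows from the last, and feedback that only arrives once per interval; the rates are invented. Going faster only ever raises the count. Short of slowing down, the way to bring it down is to shorten the interval between feedback.

```python
HOUR = 3600  # seconds


def decisions_stacked_on_first(feedback_interval_s: int, seconds_per_decision: int) -> int:
    """Later decisions built on the first one before anything in the batch is checked.

    Assumes each decision in the interval follows from the one before it (a chain),
    and that feedback only arrives once per interval.
    """
    decisions_in_batch = feedback_interval_s // seconds_per_decision
    return max(decisions_in_batch - 1, 0)


print(decisions_stacked_on_first(HOUR, 600))  # 5: walking pace, one decision every 10 minutes
print(decisions_stacked_on_first(HOUR, 30))   # 119: Maserati pace, one every 30 seconds
print(decisions_stacked_on_first(60, 30))     # 1: same Maserati, feedback every minute
```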

“Ah, but Jason, we can just get the agent to regenerate the software again from the original specs.” Uh-huh. Tell me you’ve never tried that on anything non-trivial without telling me you’ve never tried that on anything non-trivial.

“Aha! But we can just get the agent to make the changes we need.” This is where the peak-end rule bites us on the backside. Ask users, for example, for feedback on a single design choice, and you’ll get specific, meaningful, useful thoughts. Ask them for feedback on 50 choices, and they’ll talk about the one or two things that stood out, and the last thing they saw. (See also: code reviews – “Looks good to me”).

And then there’s the established fact that LLMs are good at generating code that they’re bad at modifying later. And the more complex the code base is, the worse they get. I wish you the best of luck with that!

You are drinking from a code-generating firehose, and it’s getting out of control.

The answer to your AI-generated woes is feedbackmaxxing. Ask one question at a time. Get an answer as soon as possible. Test continuously. Review continuously. Integrate continuously. Get real-world feedback continuously.

A lot of people struggle to picture what that looks like.

Once you’ve seen it, though, your journey to Feedbackmaxxville (twinned with Gas Town) can begin.

Talking of which…

What If The Real Key To AI Coding Is Old-Fashioned & Boring?

“The key to AI-assisted and agentic software development is <insert thing you were selling before>”

The Big Design Up-Front folks say the key is better specifications. The plan-driven folks say it’s better plans. The architects say it’s better architecture. The product managers say it’s better product management. The command-and-control folks say it’s better agent orchestration. The test automators say it’s better test suites. The folks selling static analysis tools say it’s better automated code reviews. The folks selling the models say… well, we know what they say. MORE TOKENS!!!

It’s true that I’m also claiming that the key to AI-assisted software development is something I just happen to specialise in – development practices that work in small batches and rapid feedback loops.

The difference is that the data’s led me back here, just like it led me to it in the first place.

The only thing that AI code generation has really changed is the speed at which code’s generated and the amount of code that needs designing, testing, reviewing, refactoring and integrating.

Data collected on thousands of teams by the DevOps Research & Assessment group shows code being created faster, only to end up languishing in queues waiting for user feedback, design decisions, testing, review and merging to the release branch. Net effect – slower delivery and less stable releases.

Data collected on millions of CI workflows by CircleCI shows code being created faster on developer branches, only to end up languishing in queues waiting for user feedback, design decisions, testing, review and merging to the release branch. Net effect – slower delivery and less stable releases.

Data collected on thousands of teams by Faros shows code being created faster on developer branches, only to end up languishing in queues waiting for user feedback, design decisions, testing, review and merging to the release branch. Net effect – slower delivery and less stable releases.

The problem is what it always was – phase-gated development processes that try to handle design, testing, review, refactoring, merging and releasing large batches of changes.

You can’t specify your way out of it. You can’t architect your way out of it. You can’t automate your way out of it (because judgement will always be needed – Actual Intelligence). You can’t product manage or type-check or DDD or team topology your way out of it.

That’s not to say these things bring no value. They all do.

But batch sizes and feedback loops hold the biggest leverage here, by orders of magnitude. They always did and they always will.

But who wants to hear about taking smaller steps, right? That’s just boring stuff from the 1990s.

Would it help if I called it “feedbackmaxxing”?

In Teams, Individual Productivity Is A Harmful Illusion

The hardest lesson I had to learn as a software developer in the early part of my career was that what felt “productive” to me locally – uninterrupted time, code getting created fast etc – often turned out to be a bad sign for overall team outcomes.

Taking interruptions as an example, I mistook concurrency for parallelism in the way I worked. Concurrency is about communication and coordination, and when a bunch of devs are working on different aspects of the same problem at the same time, communication and coordination become the primary activities.

Coding interrupts communication and coordination. CODING IS THE INTERRUPTION.

The more time I spend coding and not communicating, the further I drift from the rest of the team.

We talk a good game about building shared understanding and aligning teams, but then we do everything we can to minimise that in the pursuit of individual “productivity”.

Of course, there are ways we can go about writing code that maintain communication and coordination – but your boss might not like them. (Sing along if you know the words – “But that’s 2 developers doing the work of 1!”)

Cliffs Notes version: what feels productive to you is often counter-productive for the team.

The fix? Keep one eye on the horizon, not both eyes on your feet. Effective teams need a high level of situational awareness – being cognizant of what’s going on around you.

And we can work in ways that make us more interruptible. But your boss might not like them, either. (“One little test at a time? It looks so slow!”)

I’ve found so many times that having clear end goals – unambiguously articulated, and ideally measurable for meaningful feedback – has counteracted this illusion of individual productivity.

Indeed, a little like Douglas Harding’s Headless Way, after a while you realise that there is no individual productivity in software development. There is only the team.

Of course – as has happened so many times before – most teams are running in the exact opposite direction with AI-assisted coding, pursuing this seductive illusion of “individual productivity” at the expense of team outcomes.

And – just as all those times before – the solutions that actually work are known to a small few.

Public Code Craft Training – July 7-9

For the small percentage of engineering orgs who’d genuinely like to be shipping more reliable software and be more responsive to the needs of their business and their users – it’s a niche, I know – I’m running a public 3-day online Code Craft workshop on July 7-9.

If you’re a developer, twist your manager’s arm – especially if they’re expecting you to be more productive using tools like Claude Code and Copilot.

If you’re an engineering leader, this is the real AI-assisted software engineering training your teams need – and, funnily enough, it’s mostly about software engineering and only a little bit about AI. It’s about making teams AI-ready.

It’s 6x half-day modules that give developers a practical, hands-on introduction to the foundational technical practices that enable teams to accelerate release cycles, shrink lead times and improve release reliability – with and without AI.

Specification By Example
Test-Driven Development
Refactoring
Design Principles
Continuous Delivery
Code Craft & AI – grounded on hard data, includes how to apply CRESS principles for context engineering to AI-assisted workflows

To learn more and register, visit https://codemanship.co.uk/codecraft.html

Places are limited.

Extending The Horizon Of Agent Autonomy Is A Testing Problem

I’ve talked before about how improbable long-horizon autonomous agentic workflows are. Every step is a throw of a weighted dice, and with each additional step the probability of success goes down.
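To put rough numbers on the weighted dice: if each step independently succeeds with probability p, the chance that a chain of n steps all succeed is p^n. Independence is a simplifying assumption of this sketch (real steps aren't independent, and errors can compound). The 95% per-step success rate below is hypothetical, picked only for illustration. At 20 steps, the whole chain comes out clean only about 36% of the time.

```python
def chain_success_probability(p_step: float, steps: int) -> float:
    """Probability that every step in a chain succeeds.

    Assumes (for illustration only) that each step succeeds independently
    with the same probability p_step.
    """
    if not 0.0 <= p_step <= 1.0:
        raise ValueError("p_step must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return p_step ** steps


if __name__ == "__main__":
    # Hypothetical per-step success rate, not a measured figure.
    for n in (1, 5, 10, 20, 50):
        print(f"{n:>3} steps: {chain_success_probability(0.95, n):.3f}")
```

Even a step that "usually works" becomes a long shot once you chain enough of them together.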

On top of this, decisions have dependencies, and this means that errors can compound downstream. Take a wrong turn at step N, and steps N+1, N+2, N+3 could well build on that mistake.

The other side of the equation is verification. Mistakes aren’t a problem if they’re caught before they compound.

So we now have two components: the probability of an error, and the probable number of subsequent steps before the error’s detected.
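One way to make the second component concrete, under assumptions that are mine rather than anything stated here: suppose that after every step (including the one that introduced the error) a check catches an existing error with some fixed probability d, independently each time. The number of further steps stacked on top of the error before it's caught is then geometric, with mean (1 - d)/d. A perfect check after every step (d = 1) means nothing gets built on a mistake; a check that catches only 1 error in 20 means about 19 steps built on it, on average. The function names are hypothetical.

```python
import random


def expected_steps_built_on_error(detect_prob: float) -> float:
    """Expected number of further steps taken on top of an error before it's caught.

    Assumes (hypothetically) a check after every step that catches the error
    with probability detect_prob, independently each time. The count of steps
    built on the error is then geometric, with mean (1 - d) / d.
    """
    if not 0.0 < detect_prob <= 1.0:
        raise ValueError("detect_prob must be in (0, 1]")
    return (1.0 - detect_prob) / detect_prob


def simulate_steps_built_on_error(detect_prob: float, trials: int, seed: int = 42) -> float:
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        steps = 0
        # The check misses with probability 1 - d; each miss means another
        # step gets stacked on top of the error.
        while rng.random() >= detect_prob:
            steps += 1
        total += steps
    return total / trials


if __name__ == "__main__":
    for d in (1.0, 0.5, 0.2, 0.05):
        print(f"d={d:.2f}: expected {expected_steps_built_on_error(d):5.1f}, "
              f"simulated {simulate_steps_built_on_error(d, 20_000):5.1f}")
```

The lever here is d: checking more often and more effectively shrinks how far a mistake travels before anyone (or anything) notices.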

More bluntly, if the agent f***ed up, how soon would we/it know?

The E in my CRESS principles for context engineering stands for “Empirical” – input contexts should be grounded in observed reality, not unverified model output. I visualise raw model output as being like untreated sewage. Yes, there’s water in it. But it’s not safe for the model to drink.

To make it safe, it needs to be tested against reality and potentially debugged and refactored, or flushed down the drain if it’s too far gone.

I don’t know about you, but I’d think twice about drinking water from the tap if I knew that the only testing it had been through was someone holding it up to the light and pronouncing “Looks good to me”.

This takes us into the wonderful world of test assurance, and into territory that will be alien to the vast majority of software teams. The longer the autonomous horizon, the higher the assurance needs to be.

I’m seeing lots of folks (finally!) discovering the value of mutation testing in agentic workflows – a technique for testing your tests by deliberately introducing errors into the code and seeing whether the tests fail. And there’s no doubting this helps close the gaps that errors can leak through from one step to the next.
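A toy illustration of the idea, not of how any particular tool works internally: the mutants below are written by hand, whereas tools like PIT (Java), Stryker or mutmut (Python) generate them from source automatically. The function `eligible` and its test cases are hypothetical. A mutant that survives (every test still passes against it) points at a gap in the tests.

```python
from typing import Callable, List, Tuple

Predicate = Callable[[int, float], bool]
Case = Tuple[Tuple[int, float], bool]


def eligible(age: int, total: float) -> bool:
    """Hypothetical code under test: discount for 65+ or orders over 100."""
    return age >= 65 or total > 100.0


# Each mutant is the original with one small deliberate error.
MUTANTS: List[Tuple[str, Predicate]] = [
    ("age > 65", lambda age, total: age > 65 or total > 100.0),
    ("or -> and", lambda age, total: age >= 65 and total > 100.0),
    ("total >= 100", lambda age, total: age >= 65 or total >= 100.0),
    ("age <= 65", lambda age, total: age <= 65 or total > 100.0),
]

WEAK_TESTS: List[Case] = [((70, 10.0), True), ((30, 150.0), True), ((30, 10.0), False)]
# Adds the boundary cases that the weak suite's surviving mutants point to.
STRONG_TESTS: List[Case] = WEAK_TESTS + [((65, 10.0), True), ((30, 100.0), False)]


def passes(fn: Predicate, cases: List[Case]) -> bool:
    """True if fn gives the expected answer for every test case."""
    return all(fn(*args) == expected for args, expected in cases)


def surviving_mutants(cases: List[Case]) -> List[str]:
    """Names of mutants the tests fail to catch (all tests still pass)."""
    if not passes(eligible, cases):
        raise AssertionError("tests must pass against the original code first")
    return [name for name, mutant in MUTANTS if passes(mutant, cases)]


if __name__ == "__main__":
    for label, cases in (("weak", WEAK_TESTS), ("strong", STRONG_TESTS)):
        survivors = surviving_mutants(cases)
        score = 1 - len(survivors) / len(MUTANTS)
        print(f"{label}: mutation score {score:.0%}, survivors {survivors}")
```

The weak suite kills only half the mutants: both boundary changes survive, because nothing tests exactly 65 or exactly 100. Adding those two boundary cases kills them.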

But the kind of full autonomy Anthropic and others claim will soon be upon us requires degrees of assurance that go way beyond even that needed for safety-critical systems into uncharted territory.

Now, personally, I think testing and verification in software has left a lot to be desired for many decades. Almost none of you have ever gotten in the ballpark of what I consider to be good enough, even for line-of-business applications, let alone safety-critical ones.

But even if we could drag ourselves into that ballpark, I know from experience that high-integrity software engineering still requires acres of human judgement and learning that LLMs will likely never be capable of.

But getting there might extend the agentic horizon from, say, N steps to 1.1N steps before we need to course-correct. And that could be the key to squeezing out more net value from the technology – maybe our lead times shrink from L to 0.9L?
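For a rough feel of what N to 1.1N might demand, here's a back-of-envelope model. Its assumptions are mine: each step independently lets an undetected error through with probability e, and the horizon N is the number of steps before the chance of a clean run falls to some acceptable threshold t.

```latex
(1 - e)^{N} = t
\quad\Longrightarrow\quad
N = \frac{\ln t}{\ln(1 - e)} \approx \frac{\ln(1/t)}{e} \qquad \text{for small } e

\frac{N'}{N} \approx \frac{e}{e'}, \qquad
N' = 1.1\,N \;\Longrightarrow\; e' \approx \frac{e}{1.1} \approx 0.91\,e
```

Under this model, stretching the horizon by 10% needs roughly a 9% cut in the rate of errors slipping through undetected, at every single step.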

The fun part is that I know for a fact – having tried for nearly 30 years to get teams interested in upping the integrity of their products – that 99% will not want to hear that the answer is MORE RIGOUR.

(And, of course, we’re just talking about one kind of testing here. When we add in other qualities of software, like maintainability, performance and security – you can probably see why I consider full autonomy a Fool’s Errand.)