feedback loops – Codemanship's Blog

The Solution To BDUF Isn’t Faster BDUF

Big Design Up-Front – making lots of design decisions before getting real-world feedback on any of them – fails at any speed of decision-making. Indeed, the faster we make design decisions, the more we tend to fail.

Most folks make a category error of approaching design as a shopping list of decisions. We decide A, B, C, D. If it turns out B is wrong, we can just change B.

But it’s not a list – it’s a tree. Each decision constrains future choices. If B is wrong, C and D may well be wrong, too, if they’re consequences of B.

And we might also make the mistake of thinking we can just unpick the decision tree, but it’s not as simple as that. First of all, we have to distinguish the leaves from the nodes. What is the root decision that we got wrong. And all we see in the resulting code is leaves.

Also, dependencies between decisions don’t operate like dependencies in code. So we’d have to extract an entire branch of decisions from an orthogonally-interconnected architecture. Decision X, Y and Z may have logical similarities that lead to shared modules that exist independently of whether features I, J and K reuse them.

BDUF is like a game of Play Your Cards Right where they don’t turn over any cards until the end. And it doesn’t matter how fast you play it, or how many goes you get, you’ll almost certainly lose.

The key factor in dramatically improving our odds of making good design decisions is how many consequent decisions we make before we get real-world feedback – how far we keep driving down that road after we make the turn.

And an AI Maserati’s just going to make things worse – we can go even further in the wrong direction, even faster.

So when someone boasts that Claude Cowork generated a PRD for 50 features in an hour, I can’t help thinking “Tell me there are no users without telling me there are no users”.

Can I back up what I’m claiming here? Yes, I think I probably can.

You’re looking in the wrong place. The value’s in the feedback, folks – not in the plan.

Feedback With A Face

One handy thing about living in London, if you enjoy stand-up comedy, is that so many comedians test new material here in small venues – often playing “works in progress” to audiences of just a few dozen.

Stewart Lee famously iterates his show over many, many performances at the Leicester Square Theatre before he takes it on tour to bigger venues and has it recorded for TV and DVD.

Here’s the thing: comedy requires feedback. Immediate feedback. Not an aggregate report at the end of the show, but in-the-moment feedback about how a joke’s landing. In big theatres, audiences can become that faceless aggregate, but in 100-seater venues, every data point has a face.

And that matters. It matters when you can see the faces and hear the responses from your audience. Because now each one of them matters, and that’s a very different kind of feedback to being told that “27% thought that the routine about Prince Andrew went on a bit too long” after they’ve all gone home.

I hear developers all the time complaining that there are just too many users to get that kind of feedback-with-a-face. I say that’s a choice – like skipping the warm-up gigs at Old Rope at the Comedy Store and taking your show straight to the O2 Arena.

It’s worth cultivating small audiences to test new material on. Sure, you don’t get to see the aggregate trends – only big audiences can give you that. But you can see their faces, and you know immediately if the joke’s aren’t landing. And if you’re going to die on stage, it’s preferable not to do it in front of 20,000 paying punters.

One final thought: I’ve observed our industry morph from one where the data points had faces and individual users’ experiences mattered to one where we only play the proverbial stadiums, and we only see the trends, not the faces.

This, I suspect – while not a direct cause – has been an enabler of “enshittification”. It’s much easier to do that to a faceless aggregate.

Faster Feedback -> Better Outcomes

The impact of feedback loops like testing in software development can be as profound as it is widely misunderstood.

Movie-making had a similar problem up until the 1960s. Crew shoots a take during the day. Director has to wait until the film’s processed so they can watch “the dailies” to check for any mistakes nobody noticed at the time – like an extra using an iPhone in what’s supposed to be 1889 – and to see if the shot actually works dramatically, comedically etc.

If they wanted to fix it, back in the day, that could mean rebuilding the set, or transporting everyone – cast, crew, equipment, costumes, props etc – back to the location. Remounting shots is a big deal.

In 1960, comic actor and director Jerry Lewis started using “video-assist” while working on The Bellboy. Takes were captured simultaneously on film and on video, so the director can check each shot in “video village” immediately after the take. If a joke’s not working, they can see straight away and adjust for the next take. By the mid-60s, the technology had been refined using a beam splitter to ensure the video captured was showing exactly what the film camera was recording. WYSIWYG.

It made a big difference. When we move the feedback much closer to the action and the myriad decisions made in just a single shot, fixing problems gets much quicker and much, much cheaper. So – unsurprisingly – more problems get fixed.

Cinephiles like myself may have noticed a tangible leap in the quality of films being made during the 1960s and early 1970s, as this technology became mainstream.

In software development, we have our equivalents of “video-assist” – techniques we can use to bring the feedback much closer to the decision, making mistakes much quicker and cheaper to fix.

A good example is developer testing. Instead of making a whole bunch of changes to the code and then testing all of them, we make one change and immediately run to our equivalent of “video village” – a unit test suite, for example – to check for problems.

Teams that rely on downstream testing are doing the equivalent of waiting to see the dailies. When problems are caught, fixing them becomes a bigger deal. Likely as not, the developers have moved on. The set’s been struck, so to speak, and remounting those shots is a bigger deal.

What other examples can you think of where we move feedback closer to the decision in software development?

Feedbackmaxxing

You know the TV gameshow Play Your Cards Right? Contestants are shown a sequence – in two rows – of giant playing cards presented face-down. The host turns over the first card. The contestant then has to guess if the next card is higher or lower than that one.

They move across the board, guessing and then revealing one card at a time until either the contestant guesses wrong or they complete the sequence and win the game.

Now imagine a version of that where they don’t turn the cards over until the contestant has guessed higher or lower for the entire sequence.

“That’s just silly, Jason.”

You’re absolutely right. It is silly. Very silly. The odds of winning the game would be so remote that we’d probably never see it happen.

So why are you developing software that way?

Be honest now – you are.

You don’t turn the cards over one a time. You make a whole bunch of guesses about what the users or the business really needs. Then you make a whole bunch of design decisions that may or may not be the right decisions. Then you make a whole bunch of changes to the code that may or may not work. And only then do you turn the cards over to see if all those many guesses were good guesses.

Every decision, and every change to the code, carries uncertainty. And that uncertainty compounds with every subsequent decision or change. If we have a 90% chance of getting one right, we have an 81% chance of getting two right, a 35% chance of getting ten right, and 0.003% chance of getting 100 right. The more uncertainty accumulates, the longer we spend driving in the dark with the lights off.

These decisions and these changes don’t exist in isolation. One decision is often a consequence of an earlier decision – another junction along the way of the path we chose. One change to the code will constrain our choice of future changes.

If we take a wrong turn with any decision or any change (which is just another decision, really), how long can we afford to waste heading down the wrong road? How long will it take and how much will it cost to get back on the right road?

The further we go before we get a meaningful answer, the bigger the wasted time and effort, and the more it will cost to correct.

And this is where sunk cost enters the chat. When the cost of correcting a mistake is too high, teams will tend to choose to live with the mistake. Waddayagonnado?

And that’s how you make software, that is.

A smarter way is to turn the cards over as they’re being played. Test your guesses against reality as soon as possible, so the next guess is less likely to be a stop on the wrong road.

If you guessed wrong, no problemo. Correcting your mistake is quick and cheap. You don’t have to undo 100 decisions that followed, then make 100 new ones.

So a critical metric in software development is how long it takes for us to test our decisions after they’ve been made. That feedback latency needs to be as low as possible.

I’m now calling this approach feedbackmaxxing, because that’s how we talk these days apparently.

Feedbackmaxxing is maximising feedback frequency while minimising feedback latency across the entire software development system

This is about two variables we can control in our development process:

Batch Size – how many decisions need feedback (e.g., from testing, from code review, from users) at a time?
Feedback Frequency– how often do we get that feedback?

The bigger the batches, the longer it takes to get feedback. The smaller the batches, the sooner we learn what works and what doesn’t.

The smart players work in small batches – they solve one problem at a time – and engineer their feedback loops to be very fast.

Software development cycles are loops within loops. We have that outer loop – will a reminder to reorder a prescription reduce missed doses? And we have the inner loop – did that change I just made to the code work? Did it break anything that was depending on it?

The smart players know something about how to optimise nested loops, too. They know that to speed up the outer loop – the real-world user feedback from working releases – you focus your attention on the innermost loop.

How long does it take to build and test the software? If the answer is an hour, you have a big problem. Your choices are not great – you can either test one change at a time, and spend most of your day waiting for feedback. Or- and this is the most popular choice – you make a lot of changes, and then test them, in the mistaken belief this will save you time. “I’m too busy building on top of broken code for testing!”

The other systemic effect that large batches has is – because they take longer to get feedback on (reviewing a 5-line diff vs. a 500-line diff, for example) – changes tend to end up sitting in queues waiting their turn.

Make the batches bigger, the queues get larger, and delays get longer. The more decisions we make before testing them, the slower we get overall.

The evidence at this point is overwhelming that AI code generation speeds developers up, but slows teams down. We’ve been maxxing the wrong thing.

Large Language Models can make a lot of decisions – e.g., a lot of changes to our code – very, very quickly. It comes as no surprise that data from studying work queues across thousands of teams shows diffs getting bigger and bigger, queues getting large and larger, and lead times for getting changes into production getting longer and longer.

In the most meaningful sense, feedback latency isn’t the time elapsed after a decision’s been made before we get feedback, but the number of subsequent decisions made that are a consequence of it – how many miles did we carry on down that road. Lightning fast code generation doesn’t help us here. If anything, it probably makes latency worse – we’re much further down potentially the wrong road driving a Maserati than if we’d walked.

“Ah, but Jason, we can just get the agent to regenerate the software again from the original specs.” U-huh? Tell me you’ve never tried that on anything non-trivial without telling me you’ve never tried that on anything non-trivial.

“Aha! But we can just get the agent to make the changes we need.” This is where the peak-end rule bites on the backside. Ask users, for example, for feedback on a single design choice, and you’ll get specific, meaningful, useful thoughts. Ask them for feedback on 50 choices, and they’ll talk about the one or two things that stood out, and the last thing they saw. (See also: code reviews – “Looks good to me”).

And then there’s the established fact that LLMs are good at generating code that they’re bad at modifying later. And the more complex the code base is, the worse they get. I wish you the best of luck with that!

You are drinking from a code-generating firehose, and it’s getting out of control.

The answer to your AI-generated woes is feedbackmaxxing. Ask one question at a time. Get an answer as soon as possible. Test continuously. Review continuously. Integrate continuously. Get real-world feedback continuously.

A lot of people struggle to picture what that looks like.

Once you’ve seen it, though, your journey to Feedbackmaxxville (twinned with Gas Town) can begin.

Talking of which…