CRESS Principles for Context Engineering – R is for Refutable

If speculative ideas can not be tested, they’re not science; they don’t even rise to the level of being wrong.

Wolfgang Pauli

When we interact with a language model, we’re communicating in natural language. And communicating in natural language is a lossy process.

There’s what I intended it to mean, and then there’s the meaning the model interprets, and they’re often not the same thing.

Many bad things have happened in the world because the receiver misinterpreted the intent of the sender. So it’s important to know with high confidence if we’ve grabbed the wrong end of the stick.

The inherent ambiguity of natural languages works against our desire to make our meaning clear.

In real-world communication, a simple technique to uncover misunderstandings is to test interpretations to see if they satisfy the original intent.

Including a test in an instruction given to an LLM serves two useful purposes:

  1. It restricts pattern-matching to outputs that also match the test, and not just the natural language instruction. Coding models are actually trained by pairing code samples with tests of some kind, and more recently test execution has been used as a reward function in reinforcement learning. LLMs are sort of built for tests.
  2. It potentially gives us a direct way to check if the output doesn’t satisfy the intent. If our success criteria are turned into executable tests – e.g. unit tests – then we can run them against the output and get immediate feedback.

Imagine we want our LLM to generate code to add items to an online shopping basket. I regularly see prompts that look something like this.

Please generate a Python function for adding items to a shopping
basket. It should take product and quantity as parameters.

But the devil’s in the detail. What exactly are we expecting to happen when the function adds the item? How will we know if it doesn’t happen the way we intended?

I’ve been providing BDD-style tests in my contexts, along the lines of:

Given an empty basket,
And the customer has selected the product with ID 811 and stock of 3
When the customer adds the product to the basket with quantity 2
Then a new order item is added to the basket with product 811 and quantity 2
And 2 of product 811’s stock are put on hold, leaving available stock of 1

This gives the LLM much more to go on regarding the expected behaviour – the precise intent – of adding an item to the basket.

And it can be directly translated into unit tests:

import unittest

# Product and add_to_basket are the code under test (not shown here).

class AddToBasket(unittest.TestCase):
    def test_order_item_is_added(self):
        basket = []
        product = Product(id=811, stock=3)
        add_to_basket(basket, product, quantity=2)
        item = basket[0]
        self.assertEqual(item.product, product)
        self.assertEqual(item.quantity, 2)

    def test_stock_put_on_hold(self):
        basket = []
        product = Product(id=811, stock=3)
        add_to_basket(basket, product, quantity=2)
        self.assertEqual(product.hold, 2)
        self.assertEqual(product.available_stock(), 1)

(NB: In my workflow, I’d tackle one test at a time – we’ll cover that in the final two letters in CRESS.)

Provided the executable tests the LLM generates match the intent – and it’s really important to check that they do – any implementation it generates will need to pass them.
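To make that concrete, here is a minimal sketch of one implementation that would pass the two tests above. It is only an illustration, not the code from my experiments: the OrderItem class name is hypothetical, and the fields on Product are inferred from what the tests touch (id, stock, hold and available_stock()).

```python
from dataclasses import dataclass


@dataclass
class Product:
    id: int
    stock: int
    hold: int = 0  # units reserved by baskets but not yet purchased

    def available_stock(self) -> int:
        return self.stock - self.hold


@dataclass
class OrderItem:  # hypothetical name; the tests only need .product and .quantity
    product: Product
    quantity: int


def add_to_basket(basket: list, product: Product, quantity: int) -> None:
    # Record the order item, then reserve the requested units of stock.
    basket.append(OrderItem(product, quantity))
    product.hold += quantity
```

Notice that the examples say nothing about what should happen when the quantity exceeds the available stock, so this sketch doesn't handle it either. That's exactly the kind of gap another example would close.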

If the implementation doesn’t pass the tests, or the tests don’t match the intent, I revert the changes, flush the context (see “C is for Current”) and try again – perhaps adding further clarification to the context, like additional tests, if needed.

Does this really make a difference? It certainly does. I conducted closed-loop experiments where I tasked Claude Code – using Opus 4.6 – to implement a set of features for a small, but non-trivial, system.

I’d written my own reference implementation with tests that used a simple API that didn’t reveal any internal design details. I preserved the API and moved the tests to where Claude couldn’t see them, leaving just my instructions and the API for it to work with.

When Claude had finished, I moved the tests back into the project and ran them, scoring each pass by the % of tests passing.

I didn’t intervene until Claude said it was done. (In real life, I don’t use it this way, of course.)

In one version of the experiment, I provided BDD-style examples in the prompt. In another, I just gave Claude the basic feature descriptions. In both versions, Claude was instructed to generate its own tests from its interpretation of the requirements.

In a single pass, measured by % of tests passing, the difference was big.

Over multiple passes, feeding back test results after each, the difference got even bigger.

With test examples provided, the agent has explicit success criteria to converge on. Without them, it just goes around in circles, literally aimlessly. Poor little Ralph!

One final thought: not all interactions with an AI coding tool will be about adding or changing functionality. What if the task is a refactoring?

Well, hopefully your refactorings have goals – they’re done with intent to improve the design.

In my TDD workflow, at every green light – whenever the tests are passing again – I perform a mini code review on the changes. I might, for example, run a linter over the diff. Let’s say one of my code quality checks – just another kind of test – is for functions or methods that have a cyclomatic complexity > 5.

If the LLM changes a function and makes CC = 6, I now have a failing test. I could revert and feed that back in another pass, though giving an LLM two objectives in the same interaction reduces the odds of either being satisfied, so we could be here all day throwing the dice over and over again.

Or I could ask the LLM to refactor the function, and then run the check again to see if the restructured version is within limits.

However I choose to handle it, importantly I have a clear way to know when it hasn’t worked.

Is It Time To Get Back To Fundamentals?

I have a friend who built a recording studio in his garden. The building – an adapted garden office – cost £15,000.

Inside, he installed a pre-owned Neve 24-track mixing console with motorised faders in a custom-built desk – total cost: £17,000.

Add to that easily another £15K-20K of high-end gear and studio fittings, and he probably spent about £50,000 on that home studio in all. It took him 3 years in his spare time to build it out.

What does it sound like? I don’t know. I’ve never heard any music come out of it.

I, on the other hand, bought a 2-channel audio interface for £150 + some software and recorded 5 albums and 8 EPs – some getting radio-play on rock/metal stations. I was even Indie Band Of The Week on Metal Express Radio.

And it struck me that, while my hobby is making music, Jeff’s real hobby is building studios.

And that, folks, is the current state of AI-assisted software development. I see folks building some pretty elaborate studios, but I’m not hearing much in the way of finished music coming out of them.

Maybe it’s time to get back to basics and start focusing on the end product again?

Talking of fundamentals, if your boss won’t invest in training you in foundational software development practices like Specification By Example and Test-Driven Development, I’m running out-of-hours workshops in May specifically for self-funding learners. £99 + UK VAT.

Have We Lost Sight Of Our Patients & Their Problems?

Many of us pretend that software releases are an end in themselves – that shipping what we said we would means success. We give the medicine to the patient, and that’s the end of the treatment.

Hopefully your doctor isn’t quite so naïve. The treatment doesn’t end when the pharmacist fills the prescription, or even when the patient takes the medicine.

There’s the little matter of the effect of the medicine on the patient – is it actually working? Does their blood pressure go down? Does their heart rhythm stabilise? Is the medicine producing the desired outcome?

In the UK, if you test positive for any one of three conditions – high blood pressure, Type 2 diabetes or high cholesterol – you’ll be tested for the other two. Bad things tend to come in threes.

If it turns out you’ve got the full set, interventions for all three may be required – ranging from lifestyle changes to prescription drugs, depending on how acute each condition is.

And your doctor’s unlikely to prescribe treatments for all three at once, unless it’s really urgent. Typically, they’ll prescribe, say, a calcium channel blocker for high blood pressure and then monitor your BP for a while – long enough that they’d expect to see some significant change.

Depending on the feedback – the measurements that indicate what effect the treatment’s having – they may up the dose, or add another prescription, or send you on a meditation course, or confiscate your smartphone. It all really depends on what works and what doesn’t, as measured over time.

Once the numbers are going in the right direction, they may then move on to other treatments for other conditions – e.g., statins for your high cholesterol. And again, they’ll monitor what effect each treatment’s having on the patient in reality.

Biology’s complicated, and the effect of a medical intervention on a specific patient can’t be predicted with high accuracy. Yes, statins will probably bring your cholesterol down, just like it probably won’t snow in London in April. But it’s by no means guaranteed.

Businesses are also complicated, even a tiny business like mine. And the effect of an intervention like, say, changing the design of the home page is by no means guaranteed. We might guess that displaying our top-selling vegan products prominently will increase their sales, but until our changes hit the real world, that’s all it is – guesswork.

And if vegan sales go up, do sales of hamburgers and sausages go down?

The word “solution” implies we’re solving a problem, but this all too often gets lost in the cut-and-thrust of software development. We become bogged down in the detail of prescribing and dispensing the medicine, and too easily lose sight of the patient and their condition.

In my workshops for self-funding learners on Specification By Example, you’ll learn to start not with the prescription nor with the pharmacy, but to put the patient & the problem at the centre of the development process.

Specification By Example – May 12 18:45 BST & May 16 09:45 BST

The AI-Ready Software Developer #18 – “Productivity”. You Keep Using That Word.

It’s 20 years since I created a website with the banner “I Care About Software” as part of a loose “post-agile” movement that sought to step back from the tribes and factions that had grown to dominate software development at the time.

Regardless of whether we believed X, Y or Z was the “best way”, could we at least agree that the outcomes matter?

It matters if the software does what the user expects it to do. It matters if it does it reliably. It matters that it does it when they need it. It matters that when they need it to do something else, they don’t have to wait a year or three for us to bring them that change.

Unlike many other professions, and with few exceptions, we’re under no compulsion to produce useful, usable, reliable software or to be responsive to the customer’s needs. It’s largely voluntary.

We don’t usually get fined when we ship bugs. We won’t be sanctioned if the platform goes down for 24 hours. We won’t get struck off some professional register if the lead time on changes is months or years (or never).

(Of course, eventually, if we’re consistently bad, we can go out of business. But historically, another job – where we can screw up another business – hasn’t been difficult to find, even with a long trail of bodies behind us.)

And we don’t usually get a bonus for releases that go without incident, or a promotion for consistently maintaining short lead times.

In this sense, we have less incentive to do a good job than a takeaway delivery driver.

A friend once kindly introduced me to the project managers in her company to give them the old “better, sooner, for longer” pitch. I talked about teams I’d worked with who had built the capability to deliver and deliver and deliver, week after week, year after year, with no drama and no fires to put out.

They actually said the quiet part out loud: “But we get paid to put out the fires!”

For software developers, the carrot and the stick usually have very little to do with actual outcomes that customers and end users might care about. This is evidenced by the fact that so few teams keep even one eye on those outcomes.

The average development team doesn’t actually know how much of their time is spent fixing bugs instead of responding to user needs. They don’t know what their lead times are, or how they might be changing over the lifetime of the product or system. They’re often the last to know when the website’s down.

Most damning of all, the average development team has no idea what the users’ needs or the business goals of the product actually are. And that’s where the value that we all talk about really is, you’d have thought.

And so it’s entirely possible – inevitable, even – for the priorities of dev teams and of the people paying for and using the software to become very misaligned.

I’m always struck by the chasm that can grow between them, with developers genuinely believing they’re doing a great job while users just roll their eyes. You’d be surprised how often teams are blissfully unaware of how dissatisfied their customers are.

So, before you start that 2-year REPLACE ALL THE THINGS WITH RUST project, stop to ask yourselves “What impact would this have on overall outcomes?”

If your goal is to make your software more memory-safe, are there other ways that might be less radical or disruptive? (You might be surprised what you can do with static analysis, for example.)

Is it possible to do it a bit at a time, under the radar, to minimise the impact on customer-perceived value?

Will it really solve any problem the business actually has at all? I’m a fan of asking what the intended business outcomes are. You’d be amazed how often technical initiatives explode on contact with that question.

Which brings me to the topic du jour. The Gorman Paradox asks why, if “AI” coding assistants are having the profound impact on development team productivity many report – 2x, 5x, 10x, 100x (!) – we see no sign of that in the app stores, on business bottom lines, or in the wider economy? Where’s all this extra productivity going?

I also have to ask why the reports of productivity gains using “AI” vary so widely, with anecdotal reports of increases in excess of 1000%, and measured variances in the range of -20% to +20%.

The words doing all the work here are “anecdotal” and “measured”, I suspect. But it also matters precisely what is being measured.

Optimistic findings are usually based on measurements of things the customer doesn’t care about – lines of code, commits, Pull Requests etc.

The pessimistic – or certainly less sensational – findings are usually based on measurements of things the customer does care about, like lead times, reliability and overall costs.

It’s well-understood why producing more code faster – faster than we can understand it and test it – tends to overwhelm the real bottlenecks in the software development process. So there’s no great mystery about how “AI” code generation can actually reduce overall system performance.

What has been mysterious is why some teams see it, and most teams don’t.

They attach a code-generating firehose to their process and can’t understand why the business is complaining that they’re not getting the power shower they were promised.

There is a candidate for a causal mechanism. Most teams don’t see the impact on systemic outcomes because they’re not looking.

So when a developer tells you that, say, Claude Code has made them 10x more productive, they’re not lying. (Well, okay, maybe some of them are.) They just have a very different understanding of what “productivity” means.

If we’re to survive as professionals in this “age of AI”, I recommend pinning your flag to the mast of user needs and business outcomes.

Most importantly, we should be measuring our success by the business goals of the software, or the feature, or the change. If the goal is to, say, increase our share of the vegan takeaway market, the ultimate test is whether in reality we actually do.

This is the ultimate definition of “Done”.

We claim to develop software iteratively, but that implies we’re iterating towards some goal. If iterations don’t converge, we get (literal) chaos – just a random walk through solution space. Which would be a sadly accurate summary of the majority of efforts, with most teams unable to articulate what the goals actually are. If, indeed, there are any.

Aligning Teams Around Shared Goals (Is A Very Good Idea)

Software development peeps: if I asked you what is the ultimate business goal of the software you’re working on, would you know? Are you sure it even has one?

I’m gonna tell you a story from my early days as a contractor working in London. I took over the lead in a team of 8 developers, and very quickly could see that things were going badly.

Putting aside all the technical obstacles they were wrestling with, like a heavy reliance on manual regression testing, and everybody trying to merge the day before a scheduled release etc, the thing that really struck me was the huge amount of time being spent on arguing.

Arguing about the tech stack. Arguing about the architecture. Arguing about the approach. The team was split into factions, all pulling in different directions, providing no net forward momentum.

In my experience, this is what happens when teams don’t have a clear direction to align on. What this team needs, I thought to myself, is a goal – a magnetic north to get them pointing roughly in the same direction.

So I went back to the business – because our business analysts couldn’t answer the question (and hadn’t asked, evidently) – and asked “What are you hoping to get from this new system?”

I’m going to change their goal to protect the innocent (it’s a bit of a giveaway). Let’s imagine they replied, “We’re looking to grow our vegan customer base”.

Now the team’s arguments had their magnetic north. How will this help grow our vegan customer base? You’d be surprised, in that light, how many of the contentions simply evaporated. (Or maybe you wouldn’t.)

You might be less surprised by the profound shift in the team’s focus. Use cases got dropped. New use cases were explored. The technical architecture got simplified. Communication improved, both within the team and with other dev teams, ops, and the business.

Because now we all had something in common to talk about.

Software products and systems don’t exist in a vacuum. They’re almost always part of something bigger. And if religion teaches us anything, it’s that people like to feel part of something bigger than themselves.

The shift in focus from delivering software to solving a problem can completely rewrite priorities and realign teams. And it completely pulls the rug from under what most developers think of as “productivity”.

Why does this matter more today?

We’re currently seeing our industry go through quite possibly its worst navel-gazing episode, certainly since I’ve been in it. I’ve never seen so many developers obsessing over the “how”, and not giving a moment’s thought to the “what”, let alone the “why”.

Who cares how fast we can climb the wrong mountains?

And finally, let’s pause to reflect on that word, “iterative”. Iteration without convergence is chaos – literally.

That rather begs the question, converging on what, exactly?

“Productivity”. You Keep Using That Word.

Bill writes a book with about 80,000 words. It takes him 500 hours.

Priti writes a book with about 60,000 words. It takes her 2,000 hours.

Which author is more productive?

It’s a nonsensical question, of course.

Maybe Priti’s book sold 10x what Bill’s did. Maybe it won the Booker Prize. Maybe Priti was commissioned straight away to write another book, with a big advance. Maybe Steven Spielberg bought the rights to Priti’s book and he’s going to make it into a blockbuster movie franchise.

Maybe Bill’s kids, who he wrote the book for, absolutely love it, and he doesn’t care what anybody else thinks. Maybe Bill’s book sold 100 copies and changed 100 lives. Maybe, in writing the book, Bill came to terms with a past trauma and is moving on with his life.

Here’s the thing about “productivity”: until we know what the goals are, it’s meaningless.

And if they have distinctly different goals, it’s also meaningless to measure Priti’s against Bill’s performance. Bill is apples, and Priti is oranges.

The industry’s recent obsession with developer productivity is equally meaningless. More code faster is greater productivity? More features? More Pull Requests?

If Priti’s book has more chapters than Bill’s, is Priti a more productive author?

If we create a software solution with the aim of reducing the cost of deliveries for our business, and the cost goes up, do we look in the repo to see if it was because we didn’t write enough code or do enough commits?

Or do we sit down and have a good long think about what changes we could make that would bring us closer to the goal? Do we observe the system in action to see where costs might be accruing?

And would that good long think and that observing end up as a statistic in some study about “lazy developers”, because we’re not producing more stuff?

Any formula that aims to describe software development productivity should include a term for value created. And teams should be empowered to move that dial – to choose their battles so their expensive and limited time is better-spent.

And “value” is very much in the eye of the beholder. What matters to me might not matter to you. It could be measured in dollars or pounds or yen. It could be measured in cats rehomed. It could be measured in meals on wheels delivered. It could be measured in lives saved.

And it’s usually multi-dimensional. Many businesses have been dashed on the rocks of a blinkered outlook, chasing one target at the expense of all others.

“If you give a manager a numerical target, he’ll make it even if he has to destroy the company in the process.”

– W. Edwards Deming.

Tensions within a business – like the HR team trying to improve employee morale while finance are cutting childcare – can be created by failing to consider the dependencies between outcomes, and to balance the needs of multiple groups of stakeholders.

With any goal, it’s important to explore its impact from different perspectives. Sure, we can cut costs on ingredients, but will the customers still enjoy the taste? All very well increasing margins, but for naught if we’re losing custom.

In the average software organisation, so little thought is given to why we’re building what we’re building. And when it is (usually by business stakeholders), those goals are not often communicated to development teams. It’s one of the most common complaints I hear from developers – nobody’s told them what that feature or change is actually supposed to achieve. What problem does it solve?

And this can easily translate into dysfunctions in the organisation of teams. Teams organised around technology stacks and technical disciplines – “front-end”, “back-end”, “database”, “QA”, “ops”, “architecture”, “UX design” – have about as much chance of achieving a business goal as a sack full of ferrets has of changing a lightbulb.

Smart companies organise around business outcomes, creating cross-functional teams of technical and business experts tasked with solving a business problem. We are on the same team because we share the same goal.

And when we iterate working software into the business, we have a clear idea of what it is we’re iterating towards, and ways to know when we’re getting closer to achieving it. Software development is an iterative, goal-seeking process.

Each release is an experiment. We don’t just push code out the door and move on. We observe as the experiment plays out, and learn for the next iteration.

Don’t optimise for delivery. Optimise for learning.

And yes, that’s going to create gaps in your commit history. But just because you shipped, that doesn’t mean you’re done yet.

If Releases Are Experiments, What’s Your Hypothesis?

A view I share with a small but growing number of people is the idea that software releases are experiments. An experiment needs a hypothesis, and that hypothesis needs to be falsifiable – otherwise how can we meaningfully test it?

“Make user happier”, “Make performance gooderer”, “Increase engagement”. These hypotheses are what Wolfgang Pauli would have called “not even wrong”.

“Reducing average time taken to find and order vegan takeaway will increase vegan orders” is a testable hypothesis.

But even then, it could probably do with some tightening – how do we measure that time? Is it the time from the customer’s session starting to an order being placed? And what do we mean by a “vegan order”? An order from a vegan restaurant? An order with all items marked as ‘vegan’, regardless of the restaurant? etc.

This is where domain modeling can come into its own. Hypotheses can be expressed formally in terms of the model (e.g., with UML and OCL, or entity-relationship models + pseudo-SQL), so we’re left in no doubt precisely what we mean by “average time taken” and “vegan orders”, and can know with high confidence when the data we get from running the experiment contradicts it.
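As an illustration, here is a small sketch that pins down those two terms against a toy model. I've swapped in plain Python for UML/OCL or pseudo-SQL, and everything in it is an assumption made for the example: the Order and OrderItem classes, their fields, and the particular definitions chosen (a vegan order is one where every item is marked vegan, and time taken runs from session start to order placed).

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class OrderItem:
    name: str
    vegan: bool


@dataclass
class Order:
    session_started: datetime
    placed_at: datetime
    items: list


def is_vegan_order(order: Order) -> bool:
    # Assumed definition: every item is marked vegan, regardless of restaurant.
    return bool(order.items) and all(item.vegan for item in order.items)


def average_seconds_to_order(orders: list) -> float:
    # Assumed definition: time from session start to the order being placed.
    return mean((o.placed_at - o.session_started).total_seconds()
                for o in orders)
```

Choosing the other interpretation, say an order from a vegan restaurant, would mean changing is_vegan_order, and it would be obvious to everyone that the hypothesis had changed with it.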

And the design of each experiment is crucially important to make sure we’re getting the best – most useful – data. How big does the sample size need to be for statistically significant results? Could we run it in one region? How long should we run it for? When the experiment is over, how do we roll it back if we need to?
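On the sample size question, one common back-of-the-envelope approach is the normal-approximation formula for comparing two proportions, for example the share of sessions that end in a vegan order before and after the change. The sketch below assumes that framing, a two-sided test, and the conventional defaults of 5% significance and 80% power. The function name and parameters are mine, and it's no substitute for a proper experimental design.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p_control: float, p_variant: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate sessions needed in each group to detect the difference."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)
```

The thing to notice is how quickly the number grows as the expected effect shrinks: halving the difference we hope to detect roughly quadruples the sample we need, which has a direct bearing on how long the experiment has to run.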

Within the context of a process of Continuous Delivery, where for some teams the mantra is “one feature per release”, we might instead practice “one hypothesis per experiment”. The learnings from each experiment are then fed back into another pass, where we formulate a new hypothesis if ours was refuted by the data. Or, if the experiment was a success, we move on to the next problem: “I’m worried some of these restaurants aren’t preparing vegan food separately”. And around we go.

You’re totally going to start doing this tomorrow, aren’t you?*

* I’ve been asking that every year for the past 28 years.

The A-Z of Code Craft – Q is for Quality

Gosh! Where to start with “quality”? Okay, let’s nip this one in the bud. I’m not going to be talking about testing.

I suppose an interesting place to start might be with the word’s meaning.

“The standard of something as measured against other things of a similar kind; the degree of excellence of something.”

This definition of “quality” is the one we reach for when we say things like “It’s a high-quality hotel” and “Feel the quality of this leather”. It suggests that something is better than other things of the same purpose or nature.

And when many people think of “code craft”, this is often the picture they have in their mind: a “master craftsperson” making superior products with skill and care.

But it’s a little too vague to be a useful definition. “Superior” in what ways?

Which brings us to a second definition:

“A distinctive attribute or characteristic possessed by someone or something.”

Here, there’s no concept of “better”. “He had a shifty quality about him” is not a glowing review. “The chicken had a rubbery quality” is not a 5-star recommendation. We’re just describing an attribute or a property. Whether that attribute or property is good or bad, or better or worse, is very much in the eye of the beholder. Hey, maybe you like your chicken rubbery!

In the context of software, there are infinite qualities we can measure and describe: lines of code, number of features, maximum concurrent users, downtime, mean time to failure, difficulty for users to learn, and so on. We can go on (and on) describing software in its infinite dimensions until the cows come home.

But not all qualities are things we might care about, or that our customers might care about, or that end users might care about, or that industry regulators might care about.

So we have to choose what qualities we’re going to focus on. We have to decide what’s important to us. Because, and here’s the funny thing, when we set our sights on a quality, it has a tendency to be manifested. You get what you measure. (Or “Be careful what you wish for!”)

Arguably, the most elusive and notoriously mercurial quality of software is its value. The industry’s still stuck in an old-fashioned, one-dimensional mindset about the value of software products and systems, namely money. Chasing a single number in a complex world is typically a recipe for dysfunction. The best example in recent history is “shareholder value”, the invention of which could now be considered potentially an extinction-level event.

“People with targets and jobs dependent upon meeting them will probably meet the targets – even if they have to destroy the enterprise to do it.”

W. Edwards Deming

But some organisations have started to take a more balanced view. What, for example, is the impact of a software product on the reputation of a business? What contribution does a product make to communities the business interacts with? How does a product help or hinder in attracting the best hires?

The most effective approaches to defining quality and designing strategies for achieving it take a balanced view, considering multiple perspectives, and building and testing theories about how one quality relates to others. This helps teams avoid falling into the trap of lowering the cost of making the proverbial cakes at the expense of, say, customer satisfaction and future sales.

When I’m leading the team – when it’s up to me, basically – our development process, as well as our approach to improving the development process, is driven not by an idea for a product or a solution, but by a set of goals that take into account a balance of needs. Call it “outcome-oriented”, “goal-oriented” or even “problem-oriented” development.

We don’t start by describing the software or the system. We start by describing how the world will be different with the software in it. And we expect our understanding of that to change as we learn more through releasing software into that world, just as we expect that world itself to change because of the software.

In this sense, the software isn’t an end in itself, but part of a wider and continuously evolving strategy. The original sin of software development management, from which all the other sins flow, is failing to involve the developers in the formulation of that strategy – failing to make them stakeholders in the outcome – and failing to give them a reason to care about quality.


If you’re serious about building your team’s capability to rapidly, reliably and sustainably evolve software to meet rapidly changing business needs, visit codemanship.co.uk for details of high-quality hands-on training and mentoring for software developers.

The A-Z of Code Craft – I is for Iterative

A complex system that works is invariably found to have evolved from a simple system that worked. The inverse proposition also appears to be true: A complex system designed from scratch never works and cannot be made to work.

John Gall, Systemantics: How Systems Work And Especially How They Fail (1975)

Software developers have known since pretty much the start that getting a solution of any appreciable complexity right in a single pass is nigh-on impossible.

That should come as no surprise. The chances of one line of code being spot-on might be reasonably good. But 100 lines? 100,000 lines? 1 million lines? The odds are so stacked against us that they’re effectively zero.

We like to think of software as machines, but the complexity of modern software systems is more comparable to biology. We’ve never built machines with that many moving parts.

Nature has a tried and tested way of solving problems of this level of complexity, though: EVOLUTION.

As Gall notes, we don’t start with the whole all-singing, all-dancing version. We start simple – as simple as possible – and then we iterate, adding more to the design and feeding back lessons learned from testing to see if the software’s fit for purpose. Importantly, every iteration of the software works (and if it doesn’t, git reset --hard). Complexity emerges one small, simple step at a time.
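To make that loop concrete, here’s a minimal sketch in Python. It’s an illustration of the idea, not a real tool: the names evolve, works and add are made up for this example, and the “system” is just a list of feature names that “works” as long as nothing in it is marked as broken.

```python
from typing import Callable, Iterable, TypeVar

S = TypeVar("S")


def evolve(
    simplest: S,
    steps: Iterable[Callable[[S], S]],
    works: Callable[[S], bool],
) -> S:
    """Grow a system one small step at a time, keeping only versions that work."""
    if not works(simplest):
        raise ValueError("start from a simple system that works")
    current = simplest
    for step in steps:
        candidate = step(current)
        if works(candidate):
            current = candidate  # keep the change and build on it
        # otherwise the candidate is discarded and we carry on from the last
        # working version: the equivalent of `git reset --hard`
    return current


# Hypothetical example: each step adds one feature to the list.
def add(feature: str) -> Callable[[list], list]:
    return lambda system: system + [feature]


result = evolve(
    ["add item"],
    [add("remove item"), add("BROKEN discount codes"), add("checkout")],
    works=lambda system: not any(f.startswith("BROKEN") for f in system),
)
print(result)  # ['add item', 'remove item', 'checkout']
```

The broken step never makes it into the result, and every version we build on is one that passed the test. In real life, of course, works is the hard part: it stands in for all the testing and user feedback that tells us whether the software’s fit for purpose.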

When we approach it like this, the emphasis in software development shifts profoundly, from delivering code or delivering features, to learning how to achieve users’ goals and solve problems. This is why frequent small releases of working software, designed in close collaboration with our users, are so very important.

It’s also why the most effective teams are always keeping one eye on the prize, continuously – there’s that word again! – revisiting the goals and asking “Did we solve the problem?” If the answer’s “No”, and it usually is, we go round again, feeding back what worked and what needs to change into the next small release. The faster we iterate, the sooner we solve the problem.

Rapid iteration of solutions is no small ask, though. If we want to put working software in the hands of users, say, once a day, then that software needs to be tested and integrated at least once a day (and probably many times a day). Once we pull on that thread, a whole set of disciplines emerges that some of us call “code craft”.

And so here we are!


If you’re serious about building your team’s capability to rapidly, reliably and sustainably evolve software to meet rapidly changing business needs, my Code Craft and Test-Driven Development live remote training workshops are HALF PRICE until March 31st 2025.

Where’s User Experience In Your Development Process?

I ran a little poll through the Codemanship Twitter account yesterday, and thought I’d share the result with you.

There are two things that strike me about the results. Firstly, it looks like teams who actively involve user experience experts throughout the design process are very much in the minority. To be honest, this comes as no great surprise. My own observations of development teams over the years tend to see UXD folks getting involved early on – often before any developers are involved, or any customer tests have been discussed – in a kind of Waterfall fashion. “We’re agile. But the user interface design must not change.”

To me, this is as nonsensical as those times when I’ve arrived on a project that has no use cases or customer tests, but somehow magically has a very fleshed-out database schema that we are not allowed to change.

Let’s be clear about this: the purpose of the user experience is to enable the user to achieve their goals. That is a discussion for everybody involved in the design process. It’s also something we’re unlikely to get right first time, so iterating the UXD multiple times with the benefit of end user feedback will almost certainly be necessary.

The most effective teams do not organise themselves into functional silos of requirements analysis, UXD, architecture, programming, security, data management, testing, release and operations & support and so on, throwing some kind of output (a use case, a wireframe, a UML diagram, source code, etc) over the wall to the next function.

The most effective teams organise themselves around achieving a goal. Whoever’s needed to deliver on that should be in the room – especially when those goals are being discussed and agreed.

I could have worded the question in my poll “User Experience Designers: when you explore user goals, how often are the developers involved?” I suspect the results would have been similar. Because it’s the same discussion.

On a movie production, you have people who write scripts, people who say the lines, people who create sets, people who design costumes, and so on. But, whatever their function, they are all telling the same story.

The realisation of working software requires multiple disciplines, and all of them should be serving the story. The best teams recognise this, and involve all of the disciplines early and throughout the process.

But, sadly, this still seems quite rare. I hear lip service being paid, but see little concrete evidence that it’s actually going on.

The second thing I noticed about this poll is that, despite several retweets, the response is actually pretty low compared to previous polls. This, I suspect, also tells a story. I know from both observation and from polls that teams who actively engage with their customers – let alone UXD professionals etc – in their BDD/ATDD process are a small minority (maybe about 20%). Most teams write the “customer tests” themselves, and mistake using a BDD tool like Cucumber for actually doing BDD.

But I also get a distinct sense, working with many dev teams, that UXD just isn’t on their radar. That is somebody else’s problem. This is a major, major miscalculation – every bit as much as believing that quality assurance is somebody else’s problem. Any line of code that doesn’t in some way change the user’s experience – and I use the term “user” in the wider sense that includes, for example, people supporting the software in production, who will have their own user experience – is a line of code that should be deleted. Who is it for? Whose story does it serve?

We are all involved in creating the user experience. Bad special effects can ruin a movie, you know.

We may not all be qualified in UXD, of course. And that’s why the experts need to be involved in the ongoing design process, because UX decisions are being taken throughout development. It only ends when the software ends (and even that process – decommissioning – is a user experience).

Likewise, every decision a UI designer takes will have technical implications, and they may not be the experts in that. Which is why the other disciplines need to be involved from the start. It’s very easy to write a throwaway line in your movie script like “Oh look, it’s Bill, and he’s brought 100,000 giant fighting robots with him”, but writing 100,000 giant fighting robots and making 100,000 giant fighting robots actually appear on the screen are two very different propositions.

So let’s move on from the days of developers being handed wire-frames and told to “code this up”, and from developers squeezing input validation error messages into random parts of web forms, and bring these – and all the other – disciplines together into what I would call a “development team”.