Codemanship's Blog – Page 3 – Musings And Mutterings By Jason Gorman

Is It Time To Get Back To Fundamentals?

I have a friend who built a recording studio in his garden. The building – an adapted garden office – cost £15,000.

Inside, he installed a pre-owned Neve 24-track mixing console with motorised faders in a custom-built desk – total cost: £17,000.

Add to that easily another £15K-20K of high-end gear and studio fittings, and he probably spent about £50,000 on that home studio in all. It took him 3 years in his spare time to build it out.

What does it sound like? I don’t know. I’ve never heard any music come out of it.

I, on the other hand, bought a 2-channel audio interface for £150 + some software and recorded 5 albums and 8 EPs – some getting radio-play on rock/metal stations. I was even Indie Band Of The Week on Metal Express Radio.

And it struck me that, while my hobby is making music, Jeff’s real hobby is building studios.

And that, folks, is the current state of AI-assisted software development. I see folks building some pretty elaborate studios, but I’m not hearing much in the way of finished music coming out of them.

Maybe it’s time to get back to basics and start focusing on the end product again?

Talking of fundamentals, if your boss won’t invest in training you in foundational software development practices like Specification By Example and Test-Driven Development, I’m running out-of-hours workshops in May specifically for self-funding learners. £99 + UK VAT.

The Hottest AI Coding Skill In 2026? Coding Without AI

Psst. If your boss won’t invest in training you in Specification By Example or Test-Driven Development, I’m running out-of-hours workshops in May specifically for self-funding learners. £99 + UK VAT.

Statistics are showing that software developer hiring’s been on the rise again for about a year, but it seems priorities have changed.

Time was that our profession was a pyramid, with a base of entry-level and “junior” hires outnumbering seasoned professionals. But this time, we’re an aging population, with employers favouring developers with significant pre-AI experience.

“The current market has become increasingly senior-driven, with fewer junior roles available and employers expecting even entry-level candidates to have all the skills to hit the ground running.”

Harvey Nash, Software trends in the year ahead: A UK hiring outlook

Meanwhile, my LinkedIn feed’s awash with posts complaining that not only are coding tests still very much a thing, but they’re becoming even more of a thing. “Why”, they lament, “do employers want coding skills when AI can do all that?”

The problem is that – a significant proportion of the time – AI actually can’t do all that. Anyone who’s used AI coding assistants and agents for more than an hour or two will know that there are many times when we have to intervene. And intervening at the very least requires us to really understand what we’re intervening in.

As a trainer and mentor, I’ve watched code comprehension degrade alarmingly over the past 10 years as developers have been relying more and more on copying and pasting code from sources like Stack Overflow.

I especially implore junior developers not to do it. It very noticeably stunts their growth as programmers. The code really does need to go in through your eyes, through your brain and out through your fingers for knowledge to sink in. The language and reasoning centres of our brain need to be actively engaged.

If you lost your phone and urgently needed to call a friend, could you remember their number? Speed-dialing code has a similar effect.

In the last year, that’s gone into warp drive now that we have a powerful tool that automates the copypasta process at scale.

The effect of AI use on code comprehension is well-documented. The more we use it, the less we understand the code that’s being generated. And not just because we didn’t write it, it transpires – our ability to understand code generally atrophies. Use it or lose it.

Employers in 2026, it seems, favour developers who haven’t lost it. As I’ve been only half-joking for over a year, the devs who’ll be in highest demand in this age of AI will be the ones who don’t need it.

With highly public outages becoming routine, it looks like managers are reprioritising to make sure that when the you-know-what hits the fan on a critical system at 2 am, the person answering that call is capable of fixing it even when the AI is having one of its famous senior moments.

Those same employers, of course, then insist that the senior developers they hired because of their effectiveness without AI should use AI as much as possible. Sigh.

In my blog series The AI-Ready Software Developer, I wrote a post called Staying Sharp about how important it’s going to be to maintain our “traditional” programming skills. We need to find some time in the day or the week to leave the proverbial car at home and walk.

And if hitting your token limit is a blocker to doing more programming, you may already have deskilled yourself out of the market.

Essential Code Craft – Workshops In May

May will soon be upon us, and I’ve scheduled three out-of-hours workshops for self-funding learners in my Essential Code Craft series.

If your employer won’t invest in you, invest in yourself and join us.

Tues May 12th 18:45 BST & Sat May 16th 09:45 BST – Specification By Example

Over more than 70 years of developing software products and systems, we’ve learned that misunderstandings about the meaning of requirements are one of the biggest sources of avoidable rework.

Reducing ambiguity in specifications can dramatically reduce the risk of misinterpretation, whether it’s among human stakeholders or when we’re working with AI coding tools.

Tues May 19th 18:45 BST – Test-Driven Development

For nearly 30 years, Test-Driven Development has been the technical core of successful agile software development.

Teams have shortened delivery lead times dramatically, while actually improving the reliability of their releases, and lowering the cost of changing software using TDD.

And TDD is proving to be not just compatible with AI-assisted software development, but essential.

Have We Lost Sight Of Our Patients & Their Problems?

Many of us pretend that software releases are an end in themselves – that shipping what we said we would means success. We give the medicine to the patient, and that’s the end of the treatment.

Hopefully your doctor isn’t quite so naïve. The treatment doesn’t end when the pharmacist fills the prescription, or even when the patient takes the medicine.

There’s the little matter of the effect of the medicine on the patient – is it actually working? Does their blood pressure go down? Does their heart rhythm stabilise? Is the medicine producing the desired outcome?

In the UK, if you test positive for any one of three conditions – high blood pressure, Type 2 diabetes or high cholesterol – you’ll be tested for the other two. Bad things tend to come in threes.

If it turns out you’ve got the full set, interventions for all three may be required – ranging from lifestyle changes to prescription drugs, depending on how acute each condition is.

And your doctor’s unlikely to prescribe treatments for all three at once, unless it’s really urgent. Typically, they’ll prescribe, say, a calcium channel blocker for high blood pressure and then monitor your BP for a while – long enough that they’d expect to see some significant change.

Depending on the feedback – the measurements that indicate what effect the treatment’s having – they may up the dose, or add another prescription, or send you on a meditation course, or confiscate your smartphone. It all really depends on what works and what doesn’t, as measured over time.

Once the numbers are going in the right direction, they may then move on to other treatments for other conditions – e.g., statins for your high cholesterol. And again, they’ll monitor what effect each treatment’s having on the patient in reality.

Biology’s complicated, and the effect of a medical intervention on a specific patient can’t be predicted with high accuracy. Yes, statins will probably bring your cholesterol down, just like it probably won’t snow in London in April. But it’s by no means guaranteed.

Businesses are also complicated, even a tiny business like mine. And the effect of an intervention like, say, changing the design of the home page is by no means guaranteed. We might guess that displaying our top-selling vegan products prominently will increase their sales, but until our changes hit the real world, that’s all it is – guesswork.

And if vegan sales go up, do sales of hamburgers and sausages go down?

The word “solution” implies we’re solving a problem, but this all too often gets lost in the cut-and-thrust of software development. We become bogged down in the detail of prescribing and dispensing the medicine, and too easily lose sight of the patient and their condition.

In my workshops for self-funding learners on Specification By Example, you’ll learn to start not with the prescription nor with the pharmacy, but to put the patient & the problem at the centre of the development process.

Specification By Example – May 12 18:45 BST & May 16 09:45 BST

Essential Code Craft – The Roadmap

Some of you may have noticed that I’ve been running out-of-hours training workshops for self-funding learners recently, under the banner of Essential Code Craft.

In a way, this is a return to the early days of Codemanship when I ran regular weekend workshops – priced for individual pockets – that were mostly attended by developers investing in their own skills and career development.

Many of those people are now CTOs and heads of engineering, and I’ve been fortunate – and grateful – that quite a few have brought me in to provide the same kind of training for their teams.

But with senior engineering leaders now very distracted by the code-generating firehose – and while I wait for them to realise that nothing’s actually changed as far as software engineering fundamentals are concerned – I’m pivoting back to self-funders.

So far – just as it was way back when – the first two workshops filled up quickly. While the boss might not be thinking about investing in their developers at the moment, it seems a lot of developers are looking to invest in themselves.

And this is exactly the moment to do it. While a gazillion developers hunt for magic incantations to make a probabilistic next-token predictor act like something other than a probabilistic next-token predictor, the people who’ve done their homework already know: better results with AI coding tools have very little to do with the tools, and almost everything to do with the processes around them.

And it’s a double-win. The practices that produce the best outcomes with AI are the exact same practices that produce the best outcomes without AI.

The key to being effective with AI is being effective without it.

And here’s the hedge, but only for the informed gamblers – developer hiring is rising again, but the demographic of these new hires is changing. Employers are favouring senior developers with significant pre-LLM experience.

I, and a few others, predicted this would happen. Demand would be highest for people who can do the things AI coding tools can’t – like, well, understand code. I mean really understand it. Not “LGTM” understanding. Deep comprehension of programs.

Not only that, but for all kinds of good reasons – economic, environmental, energy, ethical, geopolitical – the future of hyperscale LLMs is by no means predictable. Folks grappling with reduced token limits and rapidly degrading performance with Anthropic’s newest models will hopefully have figured out by now that building workflows that depend heavily on hyperscale LLMs is building on quicksand.

Who are Acme Megacorp gonna’ hire – the dev who sits on their hands because they’re waiting for their token limit to reset, or the dev who can just carry on at roughly the same overall pace of delivery?

And we should be under no illusions that teams who’ve mastered the fundamentals of software delivery are routinely outperforming teams who haven’t – with or without AI. AI is clearly not the differentiator.

So, whether you’re going to apply these disciplines with Claude Code or Codex, or with IntelliJ or VS Code, they still matter – arguably more than ever.

And what are these disciplines? What is Essential Code Craft?

Specification By Example – build shared understanding and pin down requirements with testable specifications
Test-Driven Development – rapidly iterate working software designs with short delivery lead times and reliable releases
Continuous Integration – keep teams more in sync with their changes, merging and testing them many times a day to ensure a working, shippable-at-any-time product
Continuous Collaboration – keep teams on the same page by continuously communicating with practices like pair programming and teaming
Refactoring – reshape code to make change easier, while keeping it working and shippable at all times
Modular Design – optimise software architecture to localise the “blast radius” and minimise the cost of changes, while making rapid testing and smarter reuse easier
Continuous Inspection – minimise the bottleneck and the “LGTM” effect of downstream code review by making it a continuous and highly automated process
Continuous Delivery – combine these fundamentals in a delivery process that can get the proverbial peas from the farmer’s field to the kitchen table through rapid, reliable integration, build and deployment pipelines
Continuous Improvement – build development capability in an evidence-based way, learning what really works and what doesn’t as you build skills, automate tools and workflows, and explore and experiment with your approach (and that’s where I come in!)

Workshops on Specification By Example and Test-Driven Development are already live and taking registrations. If there’s demand, more will follow.

The roadmap is to build a set of repeating individual workshops, rotating monthly, that will eventually cover all of these disciplines – some explicitly, some implicitly like Continuous Integration and pair programming, which will be an integral part of most workshops.

Self-funders can pick and choose which to attend, and my hope is that they’ll be a bit like Pokémon cards – gotta collect ’em all!

Keep an eye on the Codemanship Ticket Tailor box office for details of upcoming workshops.

Also, details of new workshop times will be posted here first, so subscribe to this blog if you’d like to be kept in the loop for future workshops.

Specification By Example Was Essential Before AI. It’s Twice As Essential Now.

Psst. If your boss won’t invest in training you in Specification By Example, I’m running out-of-hours workshops on May 12 and 16 specifically for self-funding learners. £99 + UK VAT.

The research I’ve done over the last 3 years into AI-assisted programming, including my own closed-loop experiments, found that one major factor in the likelihood that an LLM will correctly interpret a specification is whether or not examples are included to clarify requirements.

Completion rates – as measured by acceptance tests passed – improve dramatically, even in a single pass.

In multiple passes, with feedback from acceptance testing, models given examples converge on impressive completion of ~80%, while without examples they tend to just go around in circles, with completion barely improving.

This should come as no surprise, because we saw a similar effect with dev teams before AI coding tools appeared on the scene. Teams who clarify requirements using examples are much more likely to interpret what the customer (or the product manager, or the business analyst) means correctly.

And, as requirements misunderstandings are typically one of the biggest sources of avoidable rework, they save a lot of time and money correcting mistakes that could have been spotted before a line of code was written.

The LLM equivalent means the same outcomes – features delivered as the prompter intended – in fewer passes, using fewer tokens (and burning down fewer proverbial forests).

Done right, specifications with examples can be translated pretty directly into executable tests that can drive the design and development of working software using techniques like Test-Driven Development.

A specification for totaling items in an order that uses examples is test-ready. Essentially, it is a test.

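    def test_one_item(self):
        # Example: an order containing one unit of a product priced at 159.95
        product = Product(id=327, price=159.95, stock=7, hold=1)
        order = Order([Item(product=product, quantity=1)])

        total = order.total()

        # The total for a single item is that product's price
        self.assertEqual(total, 159.95)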
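To make an example like the one above executable, the test needs something to run against. What follows is only a minimal sketch. The names `Product`, `Item`, `Order` and `total()` come from the example, but the fields, the use of `Decimal` for money and the second test case are my assumptions for illustration, not the design used in the workshops.

```python
from dataclasses import dataclass
from decimal import Decimal
import unittest


# Hypothetical domain classes: only the class names and total() appear in
# the example above. Everything else here is assumed for illustration.
@dataclass(frozen=True)
class Product:
    id: int
    price: Decimal
    stock: int
    hold: int


@dataclass(frozen=True)
class Item:
    product: Product
    quantity: int


class Order:
    def __init__(self, items):
        self.items = list(items)

    def total(self):
        # Sum of price x quantity across all items; an empty order totals zero.
        return sum(
            (item.product.price * item.quantity for item in self.items),
            Decimal("0"),
        )


class OrderTotalTest(unittest.TestCase):
    def test_one_item(self):
        product = Product(id=327, price=Decimal("159.95"), stock=7, hold=1)
        order = Order([Item(product=product, quantity=1)])

        self.assertEqual(order.total(), Decimal("159.95"))

    def test_two_of_the_same_item(self):
        # A second example (assumed) pins down what "quantity" means.
        product = Product(id=327, price=Decimal("159.95"), stock=7, hold=1)
        order = Order([Item(product=product, quantity=2)])

        self.assertEqual(order.total(), Decimal("319.90"))
```

Saved as a module, this can be run with `python -m unittest`. Each new example in the specification becomes one more test method, so the spec and the test suite grow together.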

When I’ve provided specifications with examples in TDD training workshops, and measured successful interpretation of requirements by students, I’ve found the same trend that my experiments found with LLMs – it roughly doubles, and often hits 100% completion.

When I don’t include examples… Well, I’ve lost count of the number of times students thought that telling the Mars Rover to turn right moved it x+1, or that Roman Numerals should be converted into integers. As a trainer, including examples saves me and my students a lot of time – especially if I get to them later.

But humans can do things LLMs can’t, like understand, reason and learn. So levels of misinterpretation tend to be lower, because we can apply an understanding of the world and judge whether a requirement makes sense. Misinterpretation by AI coding assistants is a higher risk, and therefore the need to clarify is significantly heightened.

As is the need to use language consistently. While some folks claim – presumably because they haven’t tested it in any meaningful way – that LLMs don’t need code to be human-readable, the evidence is clear that they really, really do.

I’ve seen many times myself how completion rates dropped significantly when code wasn’t clearly and consistently signposted, using language that had a close conceptual correlation to our specifications. If I call it “sales tax” in one interaction, and “VAT” in another, the model struggles to anchor on a name for that variable, often interpreting them as distinct variables in the code.

Specifying with examples gives us an opportunity to establish a shared vocabulary for describing our problem domain, which aids communication between stakeholders, but also between humans and LLMs.

When developers are stuck for a name for a class or a function that makes the intent of that code clear, I encourage them to write that intent in plain English and take inspiration from that. Specifying with examples can help establish a shared language before a line of code’s been written.

The Mythical Agent-Month

Psst. If your boss won’t invest in training you in Specification By Example (BDD, ATDD), I’m running out-of-hours workshops on May 12 and 16 specifically for self-funding learners. £99 + UK VAT.

One of the more frustrating aspects of this new “AI era” is watching people rediscover things we’ve known about software development for many decades.

Apparently, for example, it really helps if our specifications include tests when we use AI. You don’t say?

Perhaps most annoying, though, are pronouncements that – “with AI”, of course – small teams outperform large teams. Yes indeed. “With AI”, a team of 6 developers can do the work of a team of 26.

Forget the fact that we’ve known that a team of 6 will tend to outperform a team of 26 for at least 50 years. Or the fact that we know precisely why small teams tend to get more done than large teams.

It’s called “Brooks’ Law”, named after Fred P. Brooks, author of the seminal software engineering book The Mythical Man-Month, published in 1975.

Among the topics Brooks expounds on is the effect of team size on the lines of communication required to keep that team in sync.

It can be calculated by imagining every possible pair-programming configuration. A team of 2 can only pair with each other, so they are continuously synchronising.

A team of 3 – let’s call them Aunt Flo, Farmer Barleymow and Bod (since we’re in 1975) – can be paired 3 ways to keep in sync:

Flo & Barleymow
Flo & Bod
Bod & Barleymow

A team of 4 requires 6 different pairings to keep everyone in sync. A team of 5 requires 10 unique pairings.

With each additional team member, the number of lines of communication required to keep everyone on the same page increases non-linearly. As the team grows, the communication overhead explodes.
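The pairing arithmetic above is the number of unique pairs in a team of n people, which is n(n-1)/2. Here is a small sketch that enumerates the pairings and counts them for a few team sizes. The function names are mine, chosen for illustration.

```python
from itertools import combinations


def pairings(team):
    """Return every unique pair of team members."""
    return list(combinations(team, 2))


def lines_of_communication(team_size):
    """Number of unique pairs in a team of the given size: n(n-1)/2."""
    return team_size * (team_size - 1) // 2


if __name__ == "__main__":
    # The team of 3 from the example above
    print(pairings(["Flo", "Barleymow", "Bod"]))

    # How the count grows with team size
    for size in (2, 3, 4, 5, 6, 12, 26):
        print(size, lines_of_communication(size))
```

By this count a team of 6 has 15 lines of communication to maintain, a team of 12 has 66, and a team of 26 has 325.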

The upshot of all this is that larger teams spend orders of magnitude more time keeping in sync, or – more commonly – dealing with the downstream consequences of not keeping in sync.

Brooks’ Law simply states that adding people to a late project makes it later.

It also means that adding developers to a team has rapidly diminishing returns. A team of 6 will probably get more done than a team of 3, but not twice as much. And a team of 12 may well get less done than a team of 6.

So am I at all surprised by this revelation that – “with AI” – a team of 6 can deliver more value than a team of 26? No. It’s exactly what I’d expect, with or without AI.

And I’ve seen it play out many times in my career: that cohesive, small team who were getting shit done has a dozen more bodies thrown at them in the mistaken belief that more shit will get done faster.

Typically, productivity slows to a crawl as the team attempts to get everybody pushing in roughly the same direction.

When it comes to software economics, team size – and team makeup – has very high leverage. Always has, and always will, regardless of who’s typing the code.

So it’s curious how, in the 51 years since TMM-M was published, the standard management response to slipping schedules and escalating costs has been to increase team size. It’s almost as if they haven’t heard of Brooks’ Law.

Now, here’s where this gets interesting. The combinatorial explosion in lines of communication required to keep team members in sync applies whether those team members are human or AI.

Agents working on the same code need to keep as much as possible on the same page about what’s being done to what. When 2 agents work in parallel on their own branches, the longer they go without synchronising, the more their picture of the plan and of the code drifts apart, and the bigger the risk of conflicts grows.

Add a third, and that overhead increases linearly. But add a fourth, and we’re off to the races.

At this point, agents either spend more time waiting in merge queues, or even more time dealing with the consequences of not waiting to merge safely.

Those nice folks who make Cursor helpfully demonstrated what happens when agent “swarms” are let loose on the same code base at the same time.

Here’s the rub: because merges – to happen safely – have to go in single file, there are real hard limits to how many merges can happen in any period of time. And therefore real hard limits to how many people or agents can be changing the same code base at the same time.
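A back-of-the-envelope sketch of that hard limit, under assumptions of my own: merges are strictly serialised, each one occupies the queue for a fixed number of minutes (build and test included), and every person or agent wants to merge at a regular interval. The function names and the numbers in the usage note are hypothetical, and real pipelines will be messier.

```python
def max_merges_per_hour(minutes_per_merge):
    """Upper bound on merges per hour when merges run strictly one at a time."""
    return 60 / minutes_per_merge


def max_parallel_workers(minutes_per_merge, minutes_between_merges_per_worker):
    """Most workers (people or agents) a serial merge queue can keep up with.

    Each worker asks to merge once every `minutes_between_merges_per_worker`
    minutes. Beyond this number, requests arrive faster than the queue can
    drain, so waiting time grows without bound.
    """
    return int(minutes_between_merges_per_worker // minutes_per_merge)
```

With a hypothetical 5-minute merge, `max_merges_per_hour(5)` gives 12. If each worker wants to merge every 30 minutes, `max_parallel_workers(5, 30)` gives 6. Past that point, workers either queue for longer or integrate less often, and integrating less often is exactly the drift described above.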

And that’s just the coordination required to avoid breaking the build. That’s before we even think about coordinating over requirements, over architecture, over coding standards etc etc etc.

When I run team workshops, the size of the team is a pretty reliable predictor of which ones will successfully complete the exercise. A team of 4 has a big advantage over a team of 12.

Although I’ve yet to test this empirically, what I’m seeing and hearing is that the number of agents working in parallel on the same set of source files might follow a similar trend.

Indeed, the folks I’ve been watching experiment with agentic software development the longest appear to have landed on a single thread of execution as the optimal solution. Even when there are multiple specialised agents involved, they’re taking turns.

It’s quite possible that the limit of the number of agents working in parallel – without very high separation of concerns from the code other agents are working on – is just one.

When we spin too many plates, the result is usually broken plates.

The AI-Ready Software Developer #24 – Specification Is A Conversation

Psst. If your boss won’t invest in training you in Specification By Example, I’m running out-of-hours workshops on May 12 and 16 specifically for self-funding learners. £99 + UK VAT.

A sentiment I see often on social media about “AI”-assisted and agentic coding goes something along the lines of “If you’re just translating specs into code, your job is disappearing”.

It sounds reasonable on the surface, if you believe that’s all many programmers were doing. Someone – say, a product manager or an architect – hands the programmer a specification for a feature, and the programmer just “codes it up” like a pharmacist filling a prescription.

But was that ever really a thing?

In reality, most software specifications are incomplete and ambiguous, and often contain logical contradictions that are hard to spot – because of the incompleteness and the ambiguity.

Think of the movie script that contains the line “A huge battle ensues”. The studio asks “How much will that cost?” The producer has absolutely no idea, because that part of the script still needs to be written. The line’s just a placeholder for more work to flesh out the details. And in software development, just as in movie-making, the devil is in the details. That’s where the time and the money goes.

And that’s the reality of software specifications written in natural languages like English, even ones written by programmers. At best, they’re placeholders for conversations. Extreme Programming actually makes that explicit: a “user story” is not a requirements specification. It’s just a placeholder for a chat with the person who wrote it. That’s why it’s a waste of time making them detailed.

And this means that the programmer’s job is not just to “code up the spec”, it’s to figure out what the specification actually means. What exactly happens in this huge battle?

And because specifications are incomplete and ambiguous and often contradictory, this process inevitably has to walk everything back to figuring out what the need being addressed is in the first place.

This is why, for so many years, I drummed it into product managers, requirements analysts and the like to come to the team not with the “what”, and certainly not with the “how”, but with the “why” – what problem are we aiming to solve?

We then work as a team – leveraging our combined expertise in systems design and development and the problem domain – to learn together how to solve the problem through successive iterations of best guesses informed by rapid user feedback.

Now, I could be wrong, but that doesn’t sound like “just translating specs into code” to me.

The promise of this new generation of “AI” coding tools is that non-programmers will be able to iterate working software by themselves. And this is true, to an extent.

Tools like Claude Code and Cursor have proven themselves to be very useful for generating prototypes and proofs-of-concept with no programmer involvement, enabling business analysts, UX designers, product managers, start-up founders and pastry chefs to test simple ideas quickly and cheaply.

The problem is that without the expert judgement of experienced programmers, it doesn’t mature to reliable, scalable, secure software that stands up to real-world production rigours.

So, at some point, you’ll have to pick up the phone to Programmers-R-Us and get some involved if you want your experiment to scale. Have your cheque book ready!

And this is where the problems really start. You now have kind-of, sort-of working software that validates your idea. There’s a fork in the road here. You can either:

Have the programmers find and fix all the problems to make the prototype market-ready
Use the prototype as the specification and have the programmers build a production-quality version from scratch with, y’know, tests and architecture and stuff

Let’s go through Door #1.

So, the prototype sort-of works, but there are bugs – oh, boy are there bugs?! – and security vulnerabilities and performance bottlenecks and scaling blockers and some gone-off cheddar and discarded prams and all the kind of stuff that LLMs will tend to leave in your code if you let them. Which you did, because you can’t tell a switch statement from a discarded pram.

So the programmers need to test the software thoroughly to find all the usage scenarios where the software doesn’t do what it’s supposed to. And there’s the Catch-22. What is it supposed to do? If only there was a complete and precise specification!

We don’t fare much better with Door #2.

Now your programmers have to reverse-engineer the prototype to figure out what it does. What happens when the user leaves that field blank and clicks “Continue”? What happens when the clock strikes midnight and interest needs to be applied to the account? What happens when the vehicle remains stationary for more than 5 minutes? Edge case after edge case after edge case.

You run into the exact same wall. A complete and precise specification for any non-trivial software is made up of thousands of definitive answers to these kinds of questions. Software systems are the most complex machines we’ve ever built. You can’t specify them on the back of a cigarette packet.

Or, to put it the way a customer once put it to me when we had this discussion about a new feature, “Why are you making it so complicated, Jason?”

“I’m not making it complicated. That’s how complicated what you’re asking for is.”

The system has to handle all of these inputs in some meaningful way, otherwise it will break. If the user’s email address isn’t valid, a whole bunch of features won’t work. Are you happy for the system to just not work for those users?

Then, as it almost always does, the conversation turned into a negotiation about the scope and complexity of that feature for the next release. We can always remove one variable now, and add it in a later iteration. It’s an old physics trick (see: Special Relativity).

And this is why requirements specifications are placeholders for conversations. If there’s no conversation, issues will not get addressed by experts who understand them until much, much later when they’re much, much harder to fix.

This is why, as a tech lead, I almost always – when presented with a “requirements specification” as a fait accompli – pressed the “Reset” button and started the conversation again at “Okay, so what seems to be the problem?”

That’s Door #3 – involve programmers early. Because those conversations have to happen whether you like it or not, and the sooner you have them, the sooner you’ll converge on a workable, production-ready solution.

A simple prototype can help you validate your idea before you pick up that phone, but the more design decisions you make before involving experts, the bigger and badder the catch-up’s going to be later. And you might be surprised – when you have a clear end goal in mind – how simple the simplest proofs-of-concept can be.

I’ve been in this game for 34 years, and in that time I’ve seen countless attempts to demarcate this process of building an understanding of not just what the software needs to do, but why it needs to do it.

They all inevitably walk into the same wall. You cannot pay someone else to understand something for you. It’s like paying someone to revise for your exams.

Software specification is necessarily a conversation between people with needs – and, ideally, money – and people who specialise in meeting needs using computers. ’Twas ever thus, ’twill ever be.

Unless, of course, your specification is complete, consistent and mathematically precise.

And a complete, consistent, mathematically precise specification of a computer program is that computer program. That’s what source code is, and that’s why programming languages were invented.

A person who just translates complete, consistent and mathematically precise specifications into executable code is a compiler.

Fans of Spec-Driven Development may be feeling vindicated now because you believe your specifications are complete, consistent and precise. If you’ve clarified requirements using examples – to you and me, tests – that might push them closer to that level of rigour.
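To make "examples are tests" a bit more concrete, here's a minimal sketch in Python. The delivery-charge rule, the £50 threshold and every name in it are made up purely for illustration; the point is only that each example is a concrete case a customer could look at and say "yes" or "no, that's not what I meant".

```python
from decimal import Decimal


def delivery_charge(order_total: Decimal) -> Decimal:
    """Hypothetical rule: orders of £50.00 or more ship free, otherwise £4.99."""
    if order_total >= Decimal("50.00"):
        return Decimal("0.00")
    return Decimal("4.99")


# Each example pins down one concrete case from the conversation,
# including the boundary where misunderstandings tend to hide.
EXAMPLES = [
    (Decimal("49.99"), Decimal("4.99")),   # just under the threshold
    (Decimal("50.00"), Decimal("0.00")),   # exactly on the threshold
    (Decimal("120.00"), Decimal("0.00")),  # comfortably over it
]


def check_examples() -> int:
    """Run every example as a test and return how many were checked."""
    for order_total, expected_charge in EXAMPLES:
        assert delivery_charge(order_total) == expected_charge
    return len(EXAMPLES)
```

Notice that the examples tell you nothing about whether free delivery over £50 is actually what the business needs. They only make the guess precise enough to test.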

But even if your specs really are completely complete, and completely consistent and completely precise – and even if LLMs were capable of reliably translating such specifications into code (which they’re not) – you need to remember that they will still be full of assumptions about what’s really needed. Basically, a formal specification is just formalised guesswork.

To quote the Second Doctor, “Logic, my dear Zoe, merely enables one to be wrong with authority”.

The real knowledge isn’t in the spec, or in the code, it’s in the feedback we get when people use it in the real world. In this sense, iterating is the ultimate requirements discipline – it’s where most of the real value gets discovered.

So, by all means, spec away. But don’t spec far – just enough to test an assumption with user feedback from working software. And user feedback’s like code reviews – the more changes we ask for feedback on, the less attention gets paid to most of them.

Research has found that when users give feedback, they often anchor on one or two standout moments – positive or negative – rather than the entire user experience. Psychologists call it the “peak-end rule” – it’s “LGTM” for user eyeballs.

Spec one change to functionality at a time, build it in rapid, tested iterations, ship it through a reliable delivery pipeline, and then go get that focused feedback. Because the spec very probably will need to change.

And if the spec rarely changes, I’d worry that we aren’t listening to our users. Either that or we got incredibly lucky (or clairvoyant).

It’s all one big, ongoing conversation.

The AI-Ready Software Developer #23 – Speaking Clearly

Psst. If your boss won’t invest in training you in Test-Driven Development, I’m running out-of-hours workshops on April 7 and 11 specifically for self-funding learners. £99 + UK VAT.

I see a lot of wrongheaded takes on “AI” coding assistants and agents online, but one of the most misguided goes something along the lines of “Code doesn’t need to be easy for humans to understand anymore”.

Presumably, these people have never stopped to ask themselves why they’re called “language models”, but let’s entertain their chain of thought for a moment.

The theory is that humans won’t need to understand the code generated by LLMs (wrong!) because they won’t need to edit the code themselves (wrong!), and so code should be optimised for things like token efficiency, not readability (WRONG!)

Some even go so far as to propose token-optimised programming languages designed by and especially for LLMs.

Working backwards through this sequence of compounding errors quickly unpicks the logic.

First of all, even if LLMs could design a general-purpose, Turing-complete programming language from scratch – which they can’t – it would require huge numbers of examples to train them to work with such a language. It might work for small DSLs, but languages like Java and Go are a whole other ballpark.

Without that large corpus of training data – showing examples of every aspect of the language many, many times – every request to generate code in it would be out-of-distribution. There’s a good reason why Claude, GPT, Gemini etc are better at generating code in popular languages. So, are we going to write those millions of examples? In a non-human-readable programming language?

Secondly, when you look at the tokens in code, how much of it is language syntax – reserved words, semicolons and so on – and how much of it is names that we have chosen for the things in our code to make it more self-descriptive?

“Ah, but Jason, AI-generated code doesn’t need to be self-descriptive, because we don’t need to understand it.”

I have bad news for you on that front, I’m afraid. Research – both my experiments and larger-scale studies – has found that the less clear code is to human readers, the less clear it is to LLMs. When code is obfuscated, models make more mistakes interpreting it.

This should come as no surprise. Language Models – the clue is in the name. What did folks think they were matching patterns in?

Human-readable code is model-optimized code.

LLMs don’t “understand” code like compilers do, they pattern-match on language.

So:

Meaningful identifiers = stronger semantic anchors
Consistent naming = better pattern recall
Verbose clarity = richer signal

And, just as it’s incredibly helpful on a software team if everybody’s speaking the same shared language to describe the problem they’re setting out to solve – including (especially) in the code itself – it’s also essential that we use consistent shared language in our prompts and in the code. If I ask Claude to change the “sales tax” calculation logic, and in the code the function’s called ‘v_tx’, we’re probably not going to get a glowing result.

Confusion of Tongues: The Construction of the Tower of Babel, Lucas van Valckenborch, 1594
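Going back to the "sales tax" example, here's a rough sketch in Python of the same calculation written both ways. The parameter names and the rounding rule are my assumptions for illustration, not anything from a real codebase.

```python
from decimal import Decimal, ROUND_HALF_UP


# Obfuscated version: nothing in this code connects it to the words
# "sales tax" that might appear in a prompt, a ticket or a conversation.
def v_tx(a, r):
    return (a * r).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Self-descriptive version: same logic, but the shared language of the
# problem domain appears in the code itself.
def calculate_sales_tax(net_amount: Decimal, sales_tax_rate: Decimal) -> Decimal:
    sales_tax = net_amount * sales_tax_rate
    # Assumed rounding rule: nearest penny, halves rounded up.
    return sales_tax.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Both functions return the same answer. Only one of them gives a reader, human or model, anything to match "sales tax" against.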

And then there’s the take that humans won’t need to edit the code. Some even go so far as to say that LLMs will be generating machine code directly, bypassing the human-readable source.

The fact of the matter is that LLMs are unreliable. Very unreliable. I’ve documented the common experience of agents like Claude Code getting stuck in “doom loops”, when a task or problem goes outside their training data distribution, and they just can’t do it. This is a fact of life when we’re working with the technology, and the general consensus – even among the hyperscalers – is that it’s an unfixable problem at any achievable scale.

There will always be a need for the human in the loop, and there will always be a need for that human to understand the code.

If it’s token efficiency you’re after, the smart thing to do is to focus on reducing the probability of mistakes and unnecessary rework. Deliberately obfuscating our code is going to work against us in that respect.

The Mouse That Roared

Once upon a time, in a land far, far away (just north of London Bridge), there was a medium-sized financial services company with a problem. A big problem.

For 9 months, they had been trying to deploy a new release of their core platform, and every single attempt had exploded in their – and their customers’ – faces.

Teams of testers worked round the clock trying to find the bugs. Developers worked 12-hour days, 6 days a week trying to fix them. But the software just kept getting more and more broken. (As did the teams.)

Because, while the developers were fixing the bugs, the changes they were making were introducing new bugs – along with reintroducing some old favourites – that the testers weren’t finding until weeks later, if they found them at all. Users were 2x more likely to find a bug than the testers, and the stability of the platform was so bad that every release ended up being rolled back within a couple of days.

They were beached.

I’ve mentioned before that the DevOps Research & Assessment classifications of software delivery performance need a new level below “Poor”, which I called “Catastrophically Bad”. This is when every deployment fails, problems can’t be fixed – just reverted – and lead times are effectively infinite. Nothing’s getting delivered – well, nothing that sticks – and there’s no light at the end of that tunnel.

In desperation, an army of consultants and coaches was brought in who – at vast expense – set about “fixing” the process. Backlogs were groomed. Daily meetings were attended. Velocities were tracked. Estimates were tweaked. Test plans were optimised. Graphs and charts were prominently displayed.

And at the end of all that, the consultants and coaches confidently concluded, “Yep, you’re basically f***ed.” They exited the building with… well, let’s just say there wasn’t a leaving card. At least now they had the graphs to prove it. Money well spent, I’m sure we’d all agree.

That’s where I came in. I politely declined to engage with management on their “agile process” – as I have for many years now with all my clients.

Instead, I asked the developers to change one thing about how they worked. I took them into a meeting room, and showed them how to write NUnit tests.

In the next couple of weeks, we wrote some system-level smoke tests that they could run a few times a day just to provide some basic assurance that the thing wasn’t totally borked. Which, for a long time, it was.

Then I instructed them to write an automated “unit” test before any change they made to the code. Fixing a bug? Write a test for it first. Making a change to the logic? Write a test for it first. Refactoring the architecture? Write a test for it first. (And if you can’t, then refactor it with careful manual testing until you can, and then write a test for it.)
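The team were writing NUnit tests in .NET. For illustration only, here's roughly what "fixing a bug? Write a test for it first" looks like as a sketch in Python with plain pytest-style assertions instead. The function and the bug report are hypothetical: imagine the original code crashed with a division by zero whenever there were no trades.

```python
def average_trade_price(prices):
    """Hypothetical function under repair.

    Reported bug: an empty list of trades raised ZeroDivisionError.
    The guard clause below is the fix, added only after the first
    test had been written and seen to fail.
    """
    if not prices:
        return None
    return sum(prices) / len(prices)


# Step 1: written BEFORE the fix, this test reproduces the reported bug.
def test_average_price_is_none_when_there_are_no_trades():
    assert average_trade_price([]) is None


# Step 2: this test pins down existing behaviour, so the fix can't
# quietly break the case that already worked.
def test_average_price_of_existing_trades_is_unchanged():
    assert average_trade_price([100.0, 102.0]) == 101.0
```

The first test stays in the suite after the bug is fixed, which is what stops an "old favourite" from being reintroduced weeks later without anybody noticing.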

It took a couple-or-three months, but as the unit test coverage gradually increased – and in the areas of the code that were changing and breaking most often – the software stabilised enough to finally produce a release that stuck; a first foot forward in over a year.

The second foot forward – the next stable release – came a month after that, and a third a month after that. The platform – and the business built around it – was walking again.

This is where interesting things start to happen to the overall development process. With releases now happening fairly incident-free once a month instead of, well, never, the “agile process” adapted around that. Product managers were now thinking about next month more than they were thinking about next year. (Specifically, where they might be working next year.)

Again, I was invited to engage with them, and again I politely declined. I wasn’t done with this innermost feedback loop yet.

Tests covered about 40% of the code – basically, the code that had been changed in those 3 months – but that coverage was growing every day. By the time it got to around 80% six months later, release cycles had accelerated from monthly, still with a significant reliance on manual regression testing, to weekly, with a small amount of manual testing. And the backlog had been paid down to the point where lead times were in the order of a couple of weeks.

And while the test assurance grew, and the architecture became more modular to accommodate it – which significantly reduced the “blast radius” of changes and merge conflicts – lead times shrank even further.

And, again, the wider process adapted to this new reality, shrinking user feedback loops and becoming more willing to gamble and experiment, knowing that they were now placing much smaller bets.

The management process at this point looked dramatically different to how they were doing things when I arrived. And I hadn’t engaged with it at all, beyond recommending a couple of books about goals.

The organisation had evolved from plan-driven, big-bang releases (that always went bang), to iterative, goal-driven releases that enabled experimentation and learning.

And that all happened in response to shrinking lead times, enabled by accelerating release cycles, made possible by very fast, very frequent automated testing.

We didn’t need to ask for extra budget. We didn’t require software licenses. We didn’t need management to engage or give permission. We didn’t ask anybody outside of the dev teams to change anything. We just f***ing did it – this one little change to how the teams wrote code – and the entire organisation adapted around it.

In the following years, that innermost feedback loop got tightened even more, with more investment in skills and automation. And, of course, the wider processes changed, as did the makeup of the teams, but not because we demanded they should. The delivery cycle accelerated, and the business adapted around that. They became more goal-oriented, more feedback-driven, and more experimental, and the organisation started to reflect that.

They now release changes multiple times a day, testing one change in the market at a time, and rapidly feeding back what they learn. Or, as you may know it, actual agility.

Sure, it took a few years for them to get there. But consultants and coaches had spent a year effecting no change at all at a very high cost. I spent a few days a month with them for a couple of years.

The consultants and coaches made the mistake that almost all agile transformations make: trying to improve software delivery capability without actually engaging with it. You can’t plan or manage your way to daily releases.

You’re looking at the wrong feedback loop – the wrong cog in the clockwork. The big cogs aren’t driving the little cogs. The little cogs are driving the whole system.