The A-Z of Code Craft – I is for Iterative

A complex system that works is invariably found to have evolved from a simple system that worked. The inverse proposition also appears to be true: A complex system designed from scratch never works and cannot be made to work.

John Gall, Systemantics: How Systems Work And Especially How They Fail (1975)

Software developers have known since pretty much the start that getting a solution of any appreciable complexity right in a single pass is nigh-on impossible.

That should come as no surprise. The chances of one line of code being spot-on might be reasonably good. But 100 lines? 100,000 lines? 1 million lines? The odds are so stacked against us that they’re effectively zero.
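
To put rough numbers on that intuition, here's a back-of-the-envelope sketch in Python. It assumes, purely for illustration, that each line has an independent 99% chance of being spot-on. Both the 99% figure and the independence assumption are mine, not measured data.

```python
def chance_all_correct(lines: int, per_line: float = 0.99) -> float:
    """Probability that every line is right, treating lines as independent.

    per_line is a hypothetical figure chosen for illustration only.
    """
    return per_line ** lines


for n in (1, 100, 100_000, 1_000_000):
    print(f"{n:>9,} lines: {chance_all_correct(n):.3g}")
```

Under those assumptions, 100 lines already come out at roughly 37%, and the larger sizes are indistinguishable from zero.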

We like to think of software as machines, but the complexity of modern software systems is more comparable to biology. We’ve never built machines with that many moving parts.

Nature has a tried and tested way of solving problems of this level of complexity, though: EVOLUTION.

As Gall notes, we don’t start with the whole all-singing, all-dancing version. We start simple – as simple as possible – and then we iterate, adding more to the design and feeding back lessons learned from testing to see if the software’s fit for purpose. Importantly, every iteration of the software works (and if it doesn’t, git reset --hard). Complexity emerges one small, simple step at a time.

When we approach it like this, the emphasis in software development shifts profoundly, from delivering code or delivering features, to learning how to achieve users’ goals and solve problems. This is why frequent small releases of working software, designed in close collaboration with our users, are so very important.

It’s also why the most effective teams are always keeping one eye on the prize, continuously – there’s that word again! – revisiting the goals and asking “Did we solve the problem?” If the answer’s “No”, and it usually is, we go round again, feeding back what worked and what needs to change into the next small release. The faster we iterate, the sooner we solve the problem.

Rapid iteration of solutions is no small ask, though. If we want to put working software in the hands of users, say, once a day, then that software needs to be tested and integrated at least once a day (and probably many times a day). Once we pull on that thread, a whole set of disciplines emerges that some of us call “code craft”.

And so here we are!


If you’re serious about building your team’s capability to rapidly, reliably and sustainably evolve software to meet rapidly changing business needs, my Code Craft and Test-Driven Development live remote training workshops are HALF PRICE until March 31st 2025.

The A-Z of Code Craft – H is for Hexagonal

In yesterday’s “G is for GUI”, I talked about the benefits of cleanly separating the logical design of the user interface from the implementation details of its representation (e.g., separating the logic of applying for a mortgage from the web page the end user clicks the buttons on).

There’s a wider principle at work here. Separating the core logic of our systems from implementation details like the operating system, the file system, network operations (e.g., web service calls), databases and wotnot is, it turns out, a very good idea.

HTML is one of those implementation details, as are front-end frameworks like React and Flutter. Applying for a mortgage is one concern, divs and spans are entirely another. Remembering what customers have ordered in the past is one concern, storing and retrieving orders from an Oracle database is quite another.

An architectural metaphor that seeks to provide a clear distinction pictures the software’s design as a hexagon. Inside the hexagon is the essential core logic. The sides of the hexagon represent clearly distinct implementation details – separate concerns – like the UI, the file system, databases, user authentication, and so on. (In the real world, of course, there may not be exactly six sides to your “hexagon”).

The goal of Hexagonal Architecture (sometimes referred to as the “Ports & Adapters” architecture) is to make it so we can easily change one aspect of the implementation – e.g., switch to a different database – without having to change any of the other aspects.

With this in mind, the different components of the system need to be loosely coupled. We don’t want, say, the UI depending on the business logic directly, nor do we want the business logic depending directly on the database. Incoming requests are handled by “ports” that forward messages or events to the right part of the core logic, and outgoing requests (e.g., database operations or web service calls) are directed through “adapters” that present our own interface – so we decide when it changes – to that external dependency.
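
As a minimal sketch of the outgoing side of that idea, assuming Python and an order-history example borrowed from earlier in this post: the core logic talks only to an interface we own, and an adapter sits behind it. Every name here (`Order`, `OrderStore`, `OrderHistory`, `InMemoryOrderStore`) is hypothetical and invented for illustration. A real system would have a database-backed adapter where this one uses a list.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Order:
    customer_id: str
    item: str


class OrderStore(Protocol):
    """Our own interface to 'wherever orders are kept' (hypothetical)."""

    def save(self, order: Order) -> None: ...

    def for_customer(self, customer_id: str) -> list[Order]: ...


class OrderHistory:
    """Core logic: remembers what customers ordered, knows nothing of databases."""

    def __init__(self, store: OrderStore) -> None:
        self._store = store

    def place(self, customer_id: str, item: str) -> Order:
        order = Order(customer_id, item)
        self._store.save(order)
        return order

    def past_items(self, customer_id: str) -> list[str]:
        return [o.item for o in self._store.for_customer(customer_id)]


class InMemoryOrderStore:
    """Adapter standing in for a real database adapter in this sketch."""

    def __init__(self) -> None:
        self._orders: list[Order] = []

    def save(self, order: Order) -> None:
        self._orders.append(order)

    def for_customer(self, customer_id: str) -> list[Order]:
        return [o for o in self._orders if o.customer_id == customer_id]


history = OrderHistory(InMemoryOrderStore())
history.place("c42", "teapot")
history.place("c42", "kettle")
print(history.past_items("c42"))
```

Switching databases would then mean writing another class with the same two methods, leaving `OrderHistory` untouched.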

When we consider architecture at scale, we need to remember that these separate concerns – UI, databases, web services, etc – are their own hexagons. Each has its own essential core logic (the thing that it’s about), each has its own external dependencies, and its own ports and adapters that the developers of those components concern themselves with.

It’s important not to get too hung up on the metaphor, though. Like “layered”, or “onion”, or “clean” architecture, it’s just a way of reminding us to cleanly separate the different concerns in our software, and in particular to keep implementation details like UI frameworks and databases firmly at arm’s length from the essential core logic.

And let’s not forget that, as software grows, there’ll likely be multiple separate concerns in the business logic itself. Managing communications with customers and fulfilling customer orders might benefit from a clean separation of concerns, even if they’re implemented in the same monolithic component. It’s not unheard of to have hexagons inside our hexagons – turtles all the way down!


The A-Z of Code Craft – G is for GUI

There’s a patch of grass near my house that doesn’t belong to any of us living on the street. Nobody mows it. We don’t even think of it as a “lawn”, and we don’t care for it. But we often complain about it. It causes problems.

In software, too, there can be patches that don’t receive the same attention as the rest of the source code. We might not even think of them as “code”. Historically, graphical user interfaces have suffered from this lack of care.

It’s not uncommon to find “front end” code lacking in fast automated tests, for example. Unsurprisingly, we often find that front-end code is broken as a result.

And this is usually because front-end GUI code can be hard to test in isolation. This is almost always because of a lack of separation of concerns in that part of the architecture. (See “E is for Encapsulation”).

Developers will even tell us that “unit” testing GUI code isn’t possible. But this is almost always not the case (although some front-end frameworks don’t make it easy, it has to be said).

When we review that code, we’ll usually find that display logic (how is the mortgage interest rate formatted?), system information (what is the mortgage interest rate?), user interaction logic (what does it mean when I click the button labelled “Calculate Interest”?), and core “business” logic (how is the mortgage interest rate calculated?) are mixed in together in the same UI module.

This makes it nigh-on impossible to test display rendering, system state, user interactions and core logic separately. Basically, to test that the interest rate’s calculated correctly, somebody/something has to click the buttons and check the outputs that are being displayed.

A front-end architecture that more cleanly separates these concerns makes it much easier to write fast tests for the majority of that code. The idea of a “view model”, in particular, can enable us to capture the logic of the user’s journey without mixing that up with the details of how that experience is actually represented on the screen.

We can unit test logically what happens when the calculateInterest() function of the MortgageApplicationView is invoked. We don’t have to load the web page and we don’t have to click the “Calculate Interest” button to check how the system responds. Rendering for, say, the browser is another step beyond that; an implementation detail we want to hold at arm’s length. If we’re smart we can also unit test how the MortgageApplicationView is rendered as HTML in a separate test.
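
Here's a rough sketch of what that could look like in Python, where the post's calculateInterest() becomes `calculate_interest()`. The fields, the injected `interest_for` function, the `render_html` helper and the £ formatting are all assumptions made up for this example, not a prescribed design.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MortgageApplicationView:
    """Plain object capturing the user's journey: no HTML, no framework."""

    interest_for: Callable[[float], float]  # core logic, injected (hypothetical)
    principal: float = 0.0
    interest_display: str = ""

    def calculate_interest(self) -> None:
        """What clicking 'Calculate Interest' means, logically."""
        interest = self.interest_for(self.principal)
        self.interest_display = f"£{interest:,.2f}"


def render_html(view: MortgageApplicationView) -> str:
    """Rendering is a separate step that gets its own test."""
    return f'<span id="interest">{view.interest_display}</span>'


def test_calculate_interest_updates_display() -> None:
    # A stub stands in for the real interest calculation.
    view = MortgageApplicationView(interest_for=lambda p: p / 10, principal=200_000)
    view.calculate_interest()
    assert view.interest_display == "£20,000.00"


def test_view_renders_interest_as_html() -> None:
    view = MortgageApplicationView(
        interest_for=lambda p: 0.0, interest_display="£20,000.00"
    )
    assert render_html(view) == '<span id="interest">£20,000.00</span>'
```

Neither test loads a page or clicks a button, and the interest calculation itself can be tested somewhere else entirely.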

Some devs might say “But our front-end framework has MVC/MVP/view models”. Great! But if we rely on those to capture our logical user experience, we’re tying it closely to that framework. React, Vue.js, Flutter and other UI frameworks are implementation details. We don’t want to mix our logic with implementation details. UI logic should be POxOs (Plain Old Java Objects, Plain Old Python Objects etc) so that we have full control over them.


The A-Z of Code Craft – F is for Freedom

An anti-pattern I see often in software development is an organisation taking an A team – highly skilled, highly motivated – and somehow managing to squeeze a D performance out of them.

The problem is almost always that they won’t let the A team bring their A game. The team are micromanaged, often with dysfunctional processes and practices and other constraints imposed on them by management.

As a trainer and mentor of dev teams, I sadly often hear complaints like “We want to write more unit tests, but our manager won’t let us” or “We’re not allowed to push directly to the trunk” or “They don’t let us talk to the end users”.

Now, I would say “Why did you ask for permission?”, because I’m that kind of a developer. I will assume I was hired because they thought I’d do a good job (although I have been hired specifically not to do a good job, but that’s office politics for you). So I’m going to try to do the best job I can, the best way I know how.

And that means I’m going to continuously test the code. I’m going to continuously integrate my changes. I’m going to talk to end users. A lot.

One common characteristic of high-performing development teams is they have a considerable amount of autonomy to do what needs to be done to get good results for their customers. And here’s where it gets a little tricky.

Most people in positions of authority don’t willingly give away decision-making power. They find autonomy threatening. For all the talk of “servant-leaders”, they’re rare in reality.

Most often, when dev teams have high autonomy, it’s not because it was given, it’s because it was TAKEN.

But there are two sides to this bargain. If we get to work the way we think we should work, we have to deliver the goods. Teams who don’t deliver – perhaps because they’re being micromanaged – tend to get micromanaged even more closely. It’s a paradox. We have to earn the trust to be given the trust to do what needs to be done to earn that trust.

Our confidence – now there’s a word! – that we can deliver, and our ability to assert ourselves within the process, is a key enabler of achieving necessary levels of autonomy. It’s a quality A teams need to play an A game. Sorry folks. That’s the reality.

The mirror universe version of all this is those developers who insist on doing things badly. The thing about D developers is they can often genuinely believe they’re playing an A game (when I was new to the industry, I was inexperienced enough to believe this, too – I just didn’t know what an A game looked like).

Skills and education are pivotal here, as is a more democratic approach to setting the direction within the team. As much as I’d love to believe that my strong leadership was what delivered, I have to remind myself that – in reality – it’s the TEAM who deliver, and they do it the best way they can agree on.


The A-Z of Code Craft – E is for Encapsulation

Imagine you’re a pastry chef working in a professional kitchen. For some reason, the utensils you use to make pastries and cakes aren’t kept on your workstation. The rolling pins are kept on the meat workstation. The cookie cutters are kept on the fish station. The pastry brushes are kept on the saucier station. To do your job, you spend much of your time going to other chefs’ stations, and your workflow has to change every time they reorganise their stations.

A more efficient kitchen design would store the rolling pins, cookie cutters and pastry brushes on the pastry station, giving you the tools you need to do your job, and freeing you up from needing to know the details of other chefs’ workstations.

The technical term for this in software design is “encapsulation”.

In software design, data and dependencies are the “utensils” used by modules to fulfil their responsibilities. When a module needs to know something about another module, we call that “coupling”. When the knowledge a module requires to do its job is internalised inside that module, we call that “cohesion”. THINGS THAT CHANGE TOGETHER, BELONG TOGETHER.

A good modular design is said to be “cohesive and loosely-coupled”, making it easier to change one part of the system without having to change other parts. Changing how loan repayments are calculated doesn’t affect how, say, interest rates are calculated. They are SEPARATE CONCERNS.

Separation of concerns has a profound impact on our ability to change, test and reuse software. If coupling between modules is high, changes can ripple out along the couplings, causing the smallest changes to have wide-reaching effects. If the module we want to reuse is tightly coupled to other modules, we’ll need them, too – buying the whole Mercedes just to use the radio! If loan repayments and interest rates are calculated in the same module, we can’t test repayments without involving interest rates. If the module that calculates interest rates is a web service, our repayments tests are going to be slow.
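
A small sketch of the repayments example, assuming Python and the usual amortised-loan formula. The function names and the idea of passing the rate source in as `current_rate` are assumptions of mine for illustration. The point is only that the repayment maths never needs to know where rates come from.

```python
from typing import Callable


def monthly_repayment(principal: float, annual_rate: float, months: int) -> float:
    """Amortised-loan repayment; knows nothing about where rates come from."""
    if annual_rate == 0:
        return principal / months
    r = annual_rate / 12  # monthly rate as a fraction
    return principal * r / (1 - (1 + r) ** -months)


def quote_repayment(
    principal: float, months: int, current_rate: Callable[[], float]
) -> float:
    """Glue code: the rate source is passed in, so tests can swap it."""
    return round(monthly_repayment(principal, current_rate(), months), 2)


# In production current_rate might call a web service; here a stub stands in,
# so the repayments test stays fast.
print(quote_repayment(12_000, 12, current_rate=lambda: 0.0))
```

With the stubbed zero rate, that's simply 12,000 spread over 12 months.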

Encapsulation and separation of concerns apply at every scale in software design, from individual functions to systems of systems. At larger scales, the cost of coupling rises by orders of magnitude. Tightly-coupled classes are a pain. Tightly-coupled web services kill businesses every day.

Finally, consider also how encapsulation might be applied to TEAMS. What impact does it have when, say, the user experience designer on a product’s placed on a separate team?


The A-Z of Code Craft – D is for D.R.Y.

“Don’t Repeat Yourself” is a widely misunderstood, often misapplied, and consequently much-maligned principle in the design of software.

While it’s true that repetition in code can hurt us, by multiplying the cost of change, it’s by no means the worst thing we can do. Indeed, sometimes repetition can help us if it makes code easier to understand. (If you refactor code to remove duplication, stop to ask if that’s made it harder to follow. If it has, put the duplication back!)

But that’s not what D.R.Y. is really about. Think of it this way: what’s the opposite of duplication? REUSE.

When we see multiple repetitions of a similar thing – be it copied-and-pasted code, or a repeated concept that appears in multiple places (I remember one application that had 3 Customer tables in the database, each created by different people for different features) – that’s a hint about what our design needs to be.

When we refactor to consolidate, we discover the need for reusable abstractions like parameterised functions or shared classes. Duplication points us towards potential modularity.
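
For instance, here's a deliberately tiny, made-up example in Python: three pasted discount functions that differ only in two numbers, and the parameterised function they point towards. The tiers, rates and thresholds are invented for illustration.

```python
# Before: three pasted copies that differ only in the rate and the threshold.
def gold_discount(total: float) -> float:
    return total * 0.10 if total >= 100 else 0.0


def silver_discount(total: float) -> float:
    return total * 0.05 if total >= 200 else 0.0


def bronze_discount(total: float) -> float:
    return total * 0.02 if total >= 300 else 0.0


# After: the duplication shows us which parts vary, so they become parameters.
def discount(total: float, rate: float, threshold: float) -> float:
    return total * rate if total >= threshold else 0.0


print(discount(250, rate=0.05, threshold=200) == silver_discount(250))
```

The abstraction wasn't guessed at up front. The three copies are the evidence for it.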

This is an evidence-based approach to design. We don’t speculate that a function might be reused, we see where it will be reused; we see the need for it in the current code.

Duplication in code can act as bread crumbs leading us to a better design and to genuinely useful – because they’re being used – reusable components. Removing duplication is where some of our most popular libraries and frameworks came from.

As for taking it too far, it’s certainly true that jumping on it too quickly can produce over-abstracted code, and a much higher risk of choosing the wrong abstractions. The more examples we see, the more likely an abstraction is both to be the right one and to actually pay for itself in the future.

But let the duplication build up, and the refactoring’s going to take longer. In the zero-sum game of software development, things that take longer are less likely to happen, so we need to strike a balance.

The “Rule of Three” is a rough and ready guide for how many examples we might want to see before we refactor. Sometimes more, sometimes fewer, but on average, around three.

Scale is also a factor here. Reuse creates dependencies. If those cross team boundaries, it really needs to be worth it.

Don’t forget, either, that repetition applies not just to our code, but also to our process for creating it. Automating repeated tests (regression tests) is a good example of how refactoring duplication of effort in our process can streamline delivery.

Be mindful, though, that just as over-abstraction is a risk in refactoring duplicated code, over-zealous automation is a risk in refactoring duplicated effort. I’ve worked with teams who have so many scripts and custom tools that it takes weeks or even months for new joiners to get up to speed, and some of those tools saved them less time and money than they took to create and maintain.


The A-Z of Code Craft – C is for Continuous

Old-fashioned approaches to creating software often encourage us to think of the activities involved as stages or phases in the process: the design phase, the coding phase, the testing phase, the integration phase, the release phase, and so on.

This approach has some major drawbacks. In fact, many of us have found that it simply doesn’t work on problems of any appreciable complexity.

The moment we start writing code, we see how the design needs to change. The moment we start testing, we see how the code needs to change. The moment we integrate our changes, we see how ours or other people’s code needs to change. The moment we release working software into the world, we learn how the software needs to change.

Around and around we go, feeding back our lessons into a never-ending continuous cycle of designing, coding (which I might argue is also designing), testing, integrating and releasing. The lines between these activities become very blurred. If I’m writing a failing test in a test-driven approach, am I designing, or am I coding, or am I testing? When I’m refactoring code, am I designing, or coding, or testing?

The correct answer is: YES.

And if we work backwards from the goal of having working software that can be shipped at any time, we inevitably arrive at the need for continuous integration, and that doesn’t work without continuous testing, and that doesn’t work if we try to design and write all the code before we do any testing. Instead, we work in micro feedback loops, progressing one small step at a time, gathering feedback throughout so we can iterate towards a good result.

But the continuous loops don’t end there. To ensure the software’s open to change, we also need to be continuously reviewing or inspecting the code. And to get the bigger picture right as the software grows – considering how the pieces of the jigsaw fit together – we need to be continuously architecting our products and systems on that larger scale.

And, finally, to be capable of doing these things well – abilities none of us are born with – we need to be continuously learning. In each nugget of feedback, we can see things that went well, and things we could do better. Rather than saving it all up for a “post-mortem” after a major release, and trying to change 1,001 things in our approach – which never works out! – we need to act on that feedback throughout the process, evolving our approach one lesson at a time.

Some, myself included, might say that if code craft could be crystallised in one word, that word would be “continuous”.


The A-Z of Code Craft – B is for Builds

Automated software builds, where the product’s prepared for a potential release from the source files, are a central part of code craft. 

Every time developers merge their code to the main (“trunk”) branch, an automated build’s triggered that checks out the latest version of the code, resolves its dependencies, compiles the code if necessary, and runs the automated tests to make sure it all works in the build environment (and not just on the developer’s machine).

If all is good, and all the tests – and other possible quality checks, like code linting – pass, then we can have confidence that the current version of the code sitting on the trunk could be shipped if we wanted, though that might require further steps like containerisation.

It’s important that our build “pipeline” – the sequence of steps performed in an automated build – contains sufficiently robust quality gates to give us that confidence, or issues may leak into production. 

As well as constructing a shippable version of the software from the source code, we can also think of automated builds as being like passport control at an airport departure gate, preventing software getting on the release plane if it’s likely to present a problem.

If any tests or checks fail during an automated build, we say that the build is “broken”, and the software’s blocked from being released. This makes it everybody’s problem, so it’s important that broken builds are fixed quickly, or the code on the trunk is rolled back to the previous working build so the team can carry on delivering value.
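
To make the quality gate idea concrete, here's a toy sketch in Python of a pipeline as an ordered list of steps where the first failure breaks the build. It's an illustration only: `run_build` and the stand-in steps are hypothetical, and a real pipeline would live in your build server's own configuration and shell out to a compiler, test runner, linter and so on.

```python
from typing import Callable

Step = tuple[str, Callable[[], bool]]


def run_build(steps: list[Step]) -> tuple[bool, list[str]]:
    """Run steps in order; the first failing step 'breaks' the build."""
    log: list[str] = []
    for name, step in steps:
        if not step():
            log.append(f"{name}: FAILED")
            return False, log  # later steps never run; release is blocked
        log.append(f"{name}: ok")
    return True, log


# Stand-in steps that just return pass/fail.
pipeline: list[Step] = [
    ("checkout", lambda: True),
    ("resolve dependencies", lambda: True),
    ("tests", lambda: False),  # simulate a failing test run
    ("lint", lambda: True),
]

releasable, log = run_build(pipeline)
print(releasable, log)
```

Here the simulated test failure stops the run before linting, and `releasable` comes back false: nothing gets on the release plane.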

The best developers are very “build-aware”. They keep one eye on the status of the build, because it signals changes to the code base being made by other people on the team. If a build succeeds, they’ll get those latest changes and merge them into their local copy to keep in sync. If a build fails, they know it’s not safe to merge their changes into the trunk, or to get changes from the trunk, until the build’s fixed.

The execution time of builds has a profound effect on delivery lead times, due to a phenomenon known as “short delay, long queue”. This is why performing builds manually isn’t a realistic option in continuous delivery. It’s very often the case that speeding up builds will increase agility at the team level.
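
One way to get a feel for how a short delay turns into a long queue is a textbook queueing model. The sketch below is my own illustration, not a measurement. It assumes a single build server, merges arriving at random at 5 per hour, and every build taking the same fixed time (an M/D/1 queue, in the jargon). The function name and all the numbers are hypothetical.

```python
def mean_wait_minutes(merges_per_hour: float, build_minutes: float) -> float:
    """Average time a merge waits for the build server (M/D/1 queue).

    Assumes merges arrive at random and every build takes the same time.
    """
    utilisation = merges_per_hour / 60 * build_minutes
    if utilisation >= 1:
        raise ValueError("builds can't keep up; the queue grows without bound")
    return utilisation * build_minutes / (2 * (1 - utilisation))


for build in (6, 10):
    wait = mean_wait_minutes(5, build)
    print(f"{build}-minute build: about {wait:.0f} minutes queuing")
```

Under those assumptions, stretching the build from 6 to 10 minutes takes the average wait from about 3 minutes to about 25. The delay grew by four minutes; the queue grew by much more.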


The A-Z of Code Craft – A is for Agility

When the idea of software development as a craft became popular, a lot of the talk was about “professionalism”, “mastery” and “beautiful code”. But this missed the point of craft, and arguably helped to alienate business stakeholders, giving them the impression software craft was some kind of esoteric pursuit with no value to businesses.

Nothing could be further from the truth. When we shift the focus from what craft means to us as developers, we can consider how practices like continuous testing, refactoring, modular design and continuous integration impact businesses.

The disciplines of code craft enable continuous delivery – the ability to ship working software that does what the customer wants at any time, and to do that reliably and sustainably.

No delays. No need for an “acceptance testing” or “stabilisation phase”. No downtime or frantic bug fixing after release. The software’s what was agreed, and it’s ready to go at the push of a button. Again, and again, and again, for as long as the business needs.

This dramatically shrinks lead times on new features and changes, making software much more responsive to changing business needs.

The mistake a lot of organisations have made is believing that the key to agility is what they’ve heard of as “Agile” – Scrum, Lean, Kanban etc. Nu-uh! The key to /real/ agility is code craft, and you build your “Agile” processes around those technical disciplines.

Because if the software’s not fit for release at any time, all the stand-up meetings, burndown charts and Jira tickets won’t amount to more than “Agility Theatre”. This is why so many “Agile transformations” failed. You can’t manage your way to software agility without building that capability in your dev teams.


The LLM In The Room

Over 2 years ago, the at-the-time not-for-profit research organisation OpenAI released a new version of their Large Language Model, GPT 3.5, under the friendlier brand name of ChatGPT, and started a media and market frenzy.

This was arguably the first time a chat interface could genuinely fool users into believing it was a person, and there was much talk about the age of “artificial general intelligence” and even “super-intelligence” now being upon us. Many pundits predicted the end of knowledge workers like lawyers, doctors, and – of course – software developers within a few years.

Naturally, this was a claim I had to check out for myself, so when GPT-4 was released a few months later, I signed up for the “Plus” version of ChatGPT to get (limited) access to it and started to experiment in various problem domains, including programming and software development.

Like millions of people, I was initially very impressed with GPT-4 (not so much with 3.5, I have to say). But as I started to try to actually do things – specific things – with it, its limitations became more and more apparent. While it is indeed remarkable that what is essentially a predictive texting engine can write Python or Java or C# that actually compiles – let’s not take that away from OpenAI – the actual code itself was less impressive.

In fact, it was often not acceptable at all. LLMs – and generative transformers more generally – are not very good at specifics. An honest marketing slogan for the technology might be “Impressive, but wrong.”

I found myself having to double-check everything and correct more than half the code, and I routinely ended up having to “coach” GPT-4 to get half-decent results that didn’t look like they were written by an intern in a hurry. This often took longer than if I’d just written the code myself. As I’ve evaluated each new model, this has stubbornly remained the case.

No doubt in the intervening 20 months, “A.I. coding assistants” have improved, and I’ve been keeping a close eye on new models as they’ve been emerging to see just how much improved. In January 2025, we’re still at a point where LLM-generated code needs double-checking, correcting and refactoring too often to make them usable on anything beyond small one-shot “How do I…?” tasks. They are – as of today – at best, conversational interfaces to code examples included in their training data. They’re an improvement on Stack Overflow searches.

Hyperbolic claims by some of achieving 10x or even 100x productivity with these tools, or of non-programmers creating complex working products with them, like reports of flying saucers, have a tendency to evaporate on contact with reality. As yet, I’ve not seen a shred of hard evidence to back them up.

More tempered claims of modest productivity gains, backed up by hard data (i.e., not surveys of how productive devs feel LLMs are making them), paint a very ambiguous picture. Maybe they help a little. Maybe they don’t. Programming’s such a small part of software development that even if they did speed it up 10x – which at this point I’m confident they don’t – that’s a 90% saving on 10% of the work. There’s even hard data to suggest that, at the team level – and that’s where the productivity rubber meets the road – extensive LLM use can actually have a small negative impact. More code faster != more value sooner. I try to bear in mind that the feeling of productivity can often be deceptive. (For every “I stayed late and wrote a tonne of code uninterrupted” story, there’s usually at least four more “I spent the whole morning trying to understand 100s of changes some dude had pushed the night before” stories.)
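
The arithmetic behind that “90% saving on 10% of the work” is worth spelling out. A quick sketch in Python, taking the 10% share and the 10x speed-up from the paragraph above at face value (the function name is just for illustration):

```python
def overall_saving(share_of_work: float, speedup: float) -> float:
    """Fraction of total effort saved when only one share of the work gets faster."""
    return share_of_work * (1 - 1 / speedup)


saving = overall_saving(share_of_work=0.10, speedup=10)
print(f"{saving:.0%} of total effort saved")
```

That's 9% of the total, even on the most generous reading of the 10x claim.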

One obvious long-term risk of having a big chunk of your code generated by “AI” at speed is that a team’s understanding of their code base will run away from them, creating a kind of “comprehension debt” that seems likely to significantly increase the cost of fixing problems that the LLM can’t fix. We should keep an eye on the Mean-Time To Recovery of businesses who proudly claim that a growing percentage of their code’s “AI-generated” (presumably to impress investors).

Now, a conversational interface to gazillions of code examples – a kind of Stack Overflow++ – is not to be sniffed at. Good for them! But what it most certainly is not is a replacement for actual software developers. Not even close. But outside of our profession, confident pronouncements by CEOs and media pundits that these tools can replace developers have been doing real damage to the industry.

As “software developers”, they remain stubbornly not good enough. It would appear that this is an un-fixable problem, no matter how much training data and compute they throw at it. Pattern matchers are gonna pattern-match!

At some point, even investors, executives and commentators are going to be confronted with the reality that this technology hasn’t replaced any software developers. If anything, all the low-quality code these tools are churning out is creating a Mount Everest of technical debt that will require even more developers to keep the wheels on their enterprises turning in the future.

At this point, someone usually says “Ah, but Jason, maybe they’re not good enough now, but what about future models?” And this is where we all place our bets.

Some, like Microsoft, OpenAI and Nvidia, are betting that model performance is just going to keep improving until we reach AGI and beyond, even if we have to burn the planet to get there. This is their “growth story” upon which their current stock prices – riding at record highs – are based. If it’s not true, their stock prices will plummet back to what they were before this current “A.I.” bubble started to inflate. That’s trillions of dollars wiped off the NASDAQ. So there’s a lot of very wealthy people with a very big interest in it turning out to be true. This is the biggest bet in history.

So anything that one of these models does that kind of sort of looks like AGI – in a certain light, from a distance, if we squint – is leapt upon as evidence that the Singularity is upon us, and that we should all start digging bunkers and buying canned goods in preparation for the inevitable Butlerian Jihad.

I’m skeptical of that. These claims are usually supported by A.I. performance benchmarks, and the models can be trained and fine-tuned to do well in these standard tests. There’s no shortage of training data.

And when I say “well”, I mean not as well as a human expert, but better than the average Joe. And while the gap closes little by little, that “little” seems to get “littler” with each new iteration. I speculated that transformer performance would converge on not-quite-good-enough. Needs more work. See me after. Not so much “super-intelligence” as “super-mediocrity”. Yes, it can write code, but not good code. Yes, it can play chess. Just not well. And so on.

The strength of LLMs is that they are not-quite-good-enough at very many text-based problems. But commercially, what’s the value proposition here? A not-quite-good-enough programmer that is also a not-quite-good-enough tax lawyer? An under-performing car that can also bake cakes is still an under-performing car.

And even as LLMs inch forward, there’s also the cost to consider with each new model. At $20 per month for ChatGPT Plus, OpenAI were losing money hand-over-fist. The price of the new plan is ten times that. And they’re still burning through enormous amounts of investor cash. Executives at OpenAI have recently been floating the idea of a $2,000/month plan. But would they break even at that price? Reports that a single task performed by the newest model, in “high-compute” mode, can cost thousands of dollars, and still fall short of expert performance, make me wonder if the final destination of all this research, all this fanfare, and all this MONEY, might be a world where human experts are both the better and the cheaper option. That would be very funny. I would laugh a lot as the world economy collapses!

Much has been made of the idea that the newest models can follow and evaluate multiple “chains of thought”, and there seems little doubt that this improves their performance in benchmark tests. I’m not at all convinced that this is, as the makers claim, “reasoning”.

There’s also the question of what these models are evaluating their “chain of thought” against. What’s telling them that this is the right maths answer, or the best chess move, or the right Python code? How could a language model know?

I wonder if OpenAI are, in these cases, using their LLMs as interfaces to, say, maths programs, or chess programs, or Python testing or linting tools. And is that “artificial general intelligence”, or is that a natural language interface to point solutions; application-specific intelligence?

And after all that, the end results are still not-quite-good-enough, even with oceans of computing power thrown at the problem.

I don’t have a crystal ball, so this is just a bet. And I’m betting that LLMs will eventually – once decision makers finally see the tiger in the Magic Eye picture of generative A.I. – find their natural fit in the world as very impressive conversational natural language interfaces. The question that follows is: natural language interfaces to what, exactly? And in many cases, the answer is: something we haven’t figured out how to build yet.

So, back into A.I. winter we go, until the next major breakthrough. Perhaps next time, businesses will have been so badly burned by the crash – we’ve never seen tech hyped on this scale before, and it’s distorting everything – that they’ll think a little more critically about claims of “A.G.I.” and “super-intelligence”.

I’d like to think that investors and executives, unlike LLMs, are capable of learning from experience and applying a little dynamic reasoning next time around.

In the meantime, we – software developers, and the businesses who rely on us – have a looming pipeline problem of potentially epic proportions. Businesses who’ve stopped hiring and training entry-level developers because “GitHub Copilot can do what they do” are going to find out what happens when nobody plants tomatoes because “Hey, who needs tomatoes? We’ve already got pasta sauce”.

Combine that with a backlog that stretches to the Moon of real business problems neglected while “A.I.” has been sucking all the oxygen out of the room, and a planet-sized amount of LLM-generated technical debt, and you have the perfect storm.

When that happens, I’ll be here if you need me, shopping for superyachts 🙂

NB: For those thinking “Yes, but what about the environmental and ethical impact of LLMs?” As a paid-up member of the Green Party, I’m right there with you. But my argument isn’t aimed at people with a track record of making business decisions on ethical grounds. We don’t live in that world any more (if we ever did).