It’s the System, Stupid!

Since this “Age of A.I.” arrived in late 2022, something’s been nagging at me. As more and more data rolls in, we see an apparent paradox emerging where “A.I.” coding assistants are concerned.

Individual developers report productivity gains using these tools (though many also report significant frustrations with, for example, “hallucinations”).

And at the same time, data clearly shows that the more teams use them, the bigger the negative impact on team outcomes like delivery throughput and release stability.

How can both these things be true?

We have one very plausible candidate for a causal mechanism, and it’s an age-old story in our industry.

When programmers get a feeling that they’re getting things done faster, they’re often only considering the part where they write the code – particularly when that’s their part of the process.

What they’re not considering is the whole software development process, and especially downstream activities like testing, code review, merging, deployment and operations.

More code faster can mean bigger change sets – more to test (and more bugs to fix), more code to review (and more refactorings to get it through review), more changes to merge (and more conflicts to resolve), and so on.

“A.I.” code generation’s a local optimisation that can come at the expense of the development system as a whole, especially if that system is more batch-oriented, with design, coding, testing, review, merging and release operating like sequential phases in the delivery of a new feature. In such a system, more code faster means bigger bottlenecks later. So there’s no paradox at all: one causes the other.

When teams work in much smaller cycles – make one change, test it, review the code, refactor, commit that and maybe push it to the trunk – they may experience far fewer downstream bottlenecks, with or without “A.I.” coding assistance. Arguably, coding assistants might make little noticeable difference in such a workflow.

The DORA data strongly indicates that the teams with the shortest lead times and the highest release stability tend to work this way, with continuous testing, code review and merging as the code’s being written.

And all this got me to thinking, maybe we’re targeting machine learning and “A.I.” at the wrong problem. Instead of focusing on individual developer productivity with things like code generation, perhaps this technology would yield more fruit if it was focused on systemic issues and reducing bottlenecks.

Maybe, for example, ML models could be more productively applied to reviewing code than to generating it. Could a “smart” linter reduce the need for after-the-fact code review?

Of course, many of us already enjoy the benefits of “smart” linters. We call it “pair programming” or “ensemble programming”. And, having used static code analysis tools that incorporated statistical models or neural networks, I can’t say the results were all that impressive. It’s hard to see such a tool significantly out-performing a classic linter plus a second pair of experienced eyes (if such eyes are available to you, of course, and maybe that’s the use case).

Perhaps the real value might be found in widening our view. What if a model (or models) could be trained on data collected across the entire cycle, from product strategy through to operational telemetry, support and beyond?

Imagine a model that, given, say, a Figma UI wireframe, could predict how many support calls you’d be likely to get about it. Imagine a model that, given a source file, could predict its mean time to failure in production.

More generally, imagine a model that could, with reasonable accuracy, predict the downstream impact of upstream activities, so that as SuperDuperAgenticAI spits out its slop, alarm bells start going off about where this is likely to lead if it gets any further.

A pipe dream, you might think. But in actual fact, such predictive technologies exist in other disciplines like electronic engineering, where statistical and ML models are used to predict the reliability and probable lifetimes of printed circuit boards, for example.

There would be some major hurdles to overcome to apply similar techniques to software development, though, not least of which is the jungle of higgledy-piggledy data formats our many proprietary tools and platforms produce. Electronics has established data interchange standards. We, for the most part, do not – probably because that would require enough of us to agree on some stuff, and that isn’t really our strong suit.

But, if these challenges could be overcome, or worked around (e.g., with a translation/encoding layer), I’m pretty sure there are patterns hidden in our complex and multi-dimensional workflow data that maybe nobody’s spotted yet. I mean, we’ve barely scratched the surface in the last 70+ years.

In a very handwavy sense, though, I feel quite sure now that “A.I.” is being targeted at the wrong problem in software: with an exclusive focus on individual developer productivity, when the focus should be on the system as a whole.

In the meantime, we’re pretty sure at this point that things like continuous design, continuous testing, continuous code review and continuous integration do have a positive systemic impact, so focusing on those is probably the most productive thing I can do for the foreseeable future.


If your team would like training and mentoring in the technical practices that we know speed up delivery cycles, shorten lead times and improve product and system reliability, with or without “A.I.”, pay us a visit.

The A-Z of Code Craft – O is for Outside-In

When I teach modular software design, I proffer four qualities of well-designed modules.

Well-designed modules…

  1. Have one reason to change
  2. Hide their internal workings
  3. Have easily swappable dependencies
  4. Have interfaces designed from their user’s point of view

That fourth one opens a smorgasbord of successful software design techniques – and not just module design – dating back to the beginnings of software engineering.

When considering the design of software modules – at any level of granularity, from methods and functions all the way up to entire systems – we’ve learned that an effective approach is to ask not “What does this module need to do?”, but “What does its user need to tell it to do?”

Use cases are one example of approaching design from the outside: from the needs and goals of users, rather than the features and behaviours of systems. Test-Driven Development is another example where design begins with a desired user outcome (and that user could be another module, of course).

It’s not magic. When we start design by considering how modules and systems will be used (and we could look at modules as mini-systems in their own right – it’s turtles all the way down), we are usually led to designs that are useful.

In both use case-driven design, and TDD, the internal design of modules is driven directly by that external point of view. We begin by defining the desired user outcome (the “what”). We don’t begin by considering the internal design details (the “how”). The “how” is a consequence of the “what”, and design flows in that direction – from the outside in. (For example, working backwards from failing tests to implementation design.)

The reverse approach, where we design the pieces of the jigsaw and then try to put the pieces together at the end (“inside-out” design) has proven to have considerable drawbacks:

  1. The wrong implementation code
  2. Jigsaw pieces that don’t fit
  3. Test code that bakes in the internal design

When we define the shape of the jigsaw pieces first, from the user’s point of view, their implementations are guaranteed to fit.

This was the original intention of Mock Objects. They can serve as placeholders for internal components that don’t exist yet. So when we’re writing a test for checking out a shopping basket, and we know that something will need to send a shipping note to the warehouse, but we don’t want to think about how that works yet, we can “mock” a warehouse interface as a dependency of the module we’re testing. That mock, and our expectations about how it should be used by the checkout module, define a contract from that external point of view.
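To make that concrete, here’s a minimal Python sketch using unittest.mock. The Checkout class and the shape of the warehouse contract are invented for this example, but the pattern is the real thing: the mock stands in for a warehouse that doesn’t exist yet, and the test pins down the contract from the checkout’s point of view.

```python
from unittest.mock import Mock

class Checkout:
    """Checks out a shopping basket. The warehouse is a dependency
    whose implementation doesn't exist yet -- only its contract does."""
    def __init__(self, warehouse):
        self.warehouse = warehouse

    def complete(self, basket, address):
        total = sum(price for _, price in basket)
        # The contract: checking out sends a shipping note to the warehouse
        self.warehouse.send_shipping_note(basket, address)
        return total

# In the test, a mock stands in for the not-yet-written warehouse module
warehouse = Mock()
checkout = Checkout(warehouse)
total = checkout.complete([("book", 10.0), ("pen", 2.5)], "1 Example St")

# Our expectations of the warehouse, from the checkout's point of view
warehouse.send_shipping_note.assert_called_once_with(
    [("book", 10.0), ("pen", 2.5)], "1 Example St")
```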

When we get around to implementing the warehouse module, its interface is already explicitly defined from the user’s point of view.

Outside-In design could be more accurately described as “usage-driven design”. It is working backwards from the user’s needs.


If you’re serious about building your team’s capability to rapidly, reliably and sustainably evolve software to meet rapidly changing business needs, visit codemanship.co.uk for details of high-quality hands-on training and mentoring for software developers.

The A-Z of Code Craft – M is for Modularity

The ultimate goal of code craft is to be able to rapidly and sustainably evolve working software to meet rapidly changing business needs.

The key to this is short delivery lead times, and the key to that is making sure the software’s shippable at any time.

The key to software being shippable at any time is continuous testing, and the key to continuous testing is tests that run very fast.

If it takes 8 hours to sufficiently test the software, we’re at least 8 hours away from it being shippable (in practice – because you can introduce a lot of bugs in 8 hours – a lot longer).

If it takes 80 seconds, then a potential release is much, much closer to hand (and the bug count’s likely to be much, much lower – it’s a win-win).

So fast automated tests are the key to agility. And the key to fast automated tests is good separation of concerns in the architecture, otherwise known as modularity. (See “E is for Encapsulation”)

In an effectively modular design, different aspects of the system can be changed, reused and – most importantly – tested without needing to change, reuse or test other aspects. So we can test the calculation of the mortgage interest rate without having to involve, say, a database or a UI in that test.
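As an illustration of what that buys us (with a deliberately simplified interest formula, invented for this example, not a real mortgage calculation): the core logic is a pure function, so the test needs no database and no UI, and runs in microseconds.

```python
# Core logic kept free of external concerns: no database, no UI.
def monthly_interest(principal, annual_rate):
    """Deliberately simplified: one month's interest on a balance."""
    return principal * annual_rate / 12

# The test involves nothing but the function itself, so it's very fast.
assert round(monthly_interest(200_000, 0.06), 2) == 1000.0
```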

Most programming languages and tech stacks have mechanisms for encapsulating code at multiple scales, from individual functions or classes all the way up to distributed services and systems of systems. But the same principles apply at every level.

Well-designed modules:

  • Have one reason to change
  • Hide their internal workings
  • Are easily swappable
  • Have interfaces designed for the client’s needs (not “What does this module do?”, but “What does the client need to tell it to do?”)

In particular, when it comes to achieving test suites where the vast majority run very fast, we need to cleanly separate our application’s logic from external concerns like accessing files or databases, calling web services, and so on. (See “H is for Hexagonal”).

I’ll repeat it one more time, for the folks at the back: the key to agility is modularity.


If you’re serious about building your team’s capability to rapidly, reliably and sustainably evolve software to meet rapidly changing business needs, my Code Craft and Test-Driven Development live remote training workshops are HALF PRICE until March 31st 2025.

The A-Z of Code Craft – L is for Liskov

One of the big benefits of modular software design is that it enables us to change systems by swapping module implementations instead of rewriting existing modules. But that requires us to take care to ensure new implementations satisfy the original contract for using that module’s services, or we’ll break client modules.

The Liskov Substitution Principle (the L in SOLID, named after computer scientist Barbara Liskov) states that an instance of any type can be substituted with an instance of any of its subtypes without altering the correct behaviour of the program.

The LSP is often thought of as an object-oriented design principle, but, in practice, it applies to any mechanism of substitution in software design, including subclasses, interfaces, function pointers, web services, and so on.
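The venerable rectangle/square illustration (not from any particular codebase – it’s the textbook example) shows what a broken substitution looks like in practice. The subtype type-checks fine, but it quietly violates the contract the client code was written against.

```python
class Rectangle:
    def __init__(self, w, h):
        self.w, self.h = w, h
    def set_width(self, w):
        self.w = w
    def area(self):
        return self.w * self.h

class Square(Rectangle):
    """Looks like a valid subtype, but silently changes the contract:
    setting the width also changes the height."""
    def __init__(self, side):
        super().__init__(side, side)
    def set_width(self, w):
        self.w = self.h = w

def stretch(rect):
    # Client code written against Rectangle's contract
    rect.set_width(10)
    return rect.area()

assert stretch(Rectangle(2, 5)) == 50   # as the client expects
assert stretch(Square(5)) == 100        # surprise: the contract is broken
```

The client, stretch(), did nothing wrong; the subtype changed the rules.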

When we, for example, substitute a different implementation of an API that breaks the original contract, we contravene Gorman’s First Law of Software Development:

“Thou shalt not break shit that was working”

LSP doesn’t just work across type relationships. It also works across versions. I see teams spending a lot of time fixing code that was broken by new releases of dependencies. And I mean a lot of time.

An extension of the LSP could state something like “A version of a component can be substituted with any newer version”. Another term for this is “backwards-compatibility”.

More often than not, teams are thoughtless about backwards-compatibility; routinely breaking contracts without realising they’re doing it.

A technique that’s gaining in popularity is contract testing. This involves creating two different set-ups for the same set of automated tests: one that stubs and mocks external dependencies, and one that uses the real end points. If the stubbed and mocked tests are all passing, but the tests using the real end points suddenly start failing, that suggests something’s changed at the other end.
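A sketch of the idea in Python, with hypothetical names (StubRatesGateway and the “positive fraction” contract are invented for illustration): the same test function runs against two set-ups, one stubbed and one – in a slower pipeline stage – wired to the real end point.

```python
class StubRatesGateway:
    """Canned responses encoding our assumptions about the provider."""
    def current_rate(self, product):
        return 0.045

def contract_test(gateway):
    # The contract both sides must honour: a rate is a positive fraction
    rate = gateway.current_rate("fixed-5yr")
    assert isinstance(rate, float)
    assert 0 < rate < 1
    return True

# Stubbed run: fast and always available.
assert contract_test(StubRatesGateway())
# The real run passes a live gateway instead. If it starts failing while
# the stubbed run still passes, the provider's contract has changed.
```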

The biggest pay-off comes when the API team can run the client team’s contract tests themselves before releasing, giving them a heads-up that they’ve broken Gorman’s First Law.


The A-Z of Code Craft – I is for Iterative

A complex system that works is invariably found to have evolved from a simple system that worked. The inverse proposition also appears to be true: A complex system designed from scratch never works and cannot be made to work.

John Gall, Systemantics: How Systems Work And Especially How They Fail (1975)

Software developers have known since pretty much the start that getting a solution of any appreciable complexity right in a single pass is nigh-on impossible.

That should come as no surprise. The chances of one line of code being spot-on might be reasonably good. But 100 lines? 100,000 lines? 1 million lines? The odds are so stacked against us that they’re effectively zero.

We like to think of software as machines, but the complexity of modern software systems is more comparable to biology. We’ve never built machines with that many moving parts.

Nature has a tried and tested way of solving problems of this level of complexity, though: EVOLUTION.

As Gall notes, we don’t start with the whole all-singing, all-dancing version. We start simple – as simple as possible – and then we iterate, adding more to the design and feeding back lessons learned from testing to see if the software’s fit for purpose. Importantly, every iteration of the software works (and if it doesn’t, git reset --hard). Complexity emerges one small, simple step at a time.

When we approach it like this, the emphasis in software development shifts profoundly, from delivering code or delivering features, to learning how to achieve users’ goals and solve problems. This is why frequent small releases of working software, designed in close collaboration with our users, are so very important.

It’s also why the most effective teams are always keeping one eye on the prize, continuously – there’s that word again! – revisiting the goals and asking “Did we solve the problem?” If the answer’s “No”, and it usually is, we go round again, feeding back what worked and what needs to change into the next small release. The faster we iterate, the sooner we solve the problem.

Rapid iteration of solutions is no small ask, though. If we want to put working software in the hands of users, say, once a day, then that software needs to be tested and integrated at least once a day (and probably many times a day). Once we pull on that thread, a whole set of disciplines emerges that some of us call “code craft”.

And so here we are!


The A-Z of Code Craft – H is for Hexagonal

In yesterday’s “G is for GUI”, I talked about the benefits of cleanly separating the logical design of the user interface from the implementation details of its representation (e.g., separating the logic of applying for a mortgage from the web page the end user clicks the buttons on).

There’s a wider principle at work here. Separating the core logic of our systems from implementation details like the operating system, the file system, network operations (e.g., web service calls), databases and wotnot is, it turns out, a very good idea.

HTML is one of those implementation details, as are front-end frameworks like React and Flutter. Applying for a mortgage is one concern, divs and spans are entirely another. Remembering what customers have ordered in the past is one concern, storing and retrieving orders from an Oracle database is quite another.

An architectural metaphor that seeks to provide a clear distinction pictures the software’s design as a hexagon. Inside the hexagon is the essential core logic. The sides of the hexagon represent clearly distinct implementation details – separate concerns – like the UI, the file system, databases, user authentication, and so on. (In the real world, of course, there may not be exactly six sides to your “hexagon”).

The goal of Hexagonal Architecture (sometimes referred to as the “Ports & Adapters” architecture) is to make it so we can easily change one aspect of the implementation – e.g., switch to a different database – without having to change any of the other aspects.

With this in mind, the different components of the system need to be loosely coupled. We don’t want, say, the UI depending on the business logic directly, nor do we want the business logic depending directly on the database. Incoming requests are handled by “ports” that forward messages or events to the right part of the core logic, and outgoing requests (e.g., database operations or web service calls) are directed through “adapters” that present our own interface – so we decide when it changes – to that external dependency.
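Here’s a small Python sketch of the shape of it. The names (OrderStore, CheckoutService, InMemoryOrderStore) are invented for illustration; the point is the direction of the dependencies – the core defines the port, and adapters plug into it.

```python
# Port: an interface the core defines, from its own point of view.
class OrderStore:
    def save(self, order):
        raise NotImplementedError

# Core logic depends only on the port, never on a concrete database.
class CheckoutService:
    def __init__(self, store: OrderStore):
        self.store = store

    def place_order(self, items):
        order = {"items": items, "total": sum(p for _, p in items)}
        self.store.save(order)
        return order

# Adapter: one implementation of the port.
class InMemoryOrderStore(OrderStore):
    def __init__(self):
        self.orders = []
    def save(self, order):
        self.orders.append(order)

store = InMemoryOrderStore()
service = CheckoutService(store)
order = service.place_order([("book", 10.0), ("pen", 2.5)])
```

Swapping the in-memory adapter for, say, an Oracle-backed one means writing a new adapter; the core logic doesn’t change at all.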

When we consider architecture at scale, we need to remember that these separate concerns – UI, databases, web services, etc – are their own hexagons. Each has its own essential core logic (the thing that it’s about), each has its own external dependencies, and its own ports and adapters that the developers of those components concern themselves with.

It’s important not to get too hung up on the metaphor, though. Like “layered”, or “onion”, or “clean” architecture, it’s just a way of reminding us to cleanly separate the different concerns in our software, and in particular to keep implementation details like UI frameworks and databases firmly at arm’s length from the essential core logic.

And let’s not forget that, as software grows, there’ll likely be multiple separate concerns in the business logic itself. Managing communications with customers and fulfilling customer orders might benefit from a clean separation of concerns, even if they’re implemented in the same monolithic component. It’s not unheard of to have hexagons inside our hexagons – turtles all the way down!


The A-Z of Code Craft – G is for GUI

There’s a patch of grass near my house that doesn’t belong to any of us living on the street. Nobody mows it. We don’t even think of it as a “lawn”, and we don’t care for it. But we often complain about it. It causes problems.

In software, too, there can be patches that don’t receive the same attention as the rest of the source code. We might not even think of them as “code”. Historically, graphical user interfaces have suffered from this lack of care.

It’s not uncommon to find “front end” code lacking in fast automated tests, for example. Unsurprisingly, we often find that front-end code is broken as a result.

And this is usually because front-end GUI code can be hard to test in isolation. This is almost always because of a lack of separation of concerns in that part of the architecture. (See “E is for Encapsulation”).

Developers will even tell us that “unit” testing GUI code isn’t possible. But this is almost always not the case (although some front-end frameworks don’t make it easy, it has to be said).

When we review that code, we’ll usually find that display logic (how is the mortgage interest rate formatted?), system information (what is the mortgage interest rate?), user interaction logic (what does it mean when I click the button labelled “Calculate Interest”?), and core “business” logic (how is the mortgage interest rate calculated?) are mixed in together in the same UI module.

This makes it nigh-on impossible to test display rendering, system state, user interactions and core logic separately. Basically, to test that the interest rate’s calculated correctly, somebody/something has to click the buttons and check the outputs that are being displayed.

A front-end architecture that more cleanly separates these concerns makes it much easier to write fast tests for the majority of that code. The idea of a “view model”, in particular, can enable us to capture the logic of the user’s journey without mixing that up with the details of how that experience is actually represented on the screen.

We can unit test logically what happens when the calculateInterest() function of the MortgageApplicationView is invoked. We don’t have to load the web page and we don’t have to click the “Calculate Interest” button to check how the system responds. Rendering for, say, the browser is another step beyond that; an implementation detail we want to hold at arm’s length. If we’re smart we can also unit test how the MortgageApplicationView is rendered as HTML in a separate test.
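Here’s what that kind of test can look like in Python (the figures, the simplified interest formula and the formatting are invented for illustration, with calculateInterest rendered as calculate_interest):

```python
# View model: the logic of the user's journey, free of HTML and frameworks.
class MortgageApplicationView:
    def __init__(self):
        self.principal = 0.0
        self.annual_rate = 0.0
        self.displayed_interest = ""

    def calculate_interest(self):
        # User interaction logic + display logic, with no rendering involved
        monthly = self.principal * self.annual_rate / 12
        self.displayed_interest = f"£{monthly:.2f}"

view = MortgageApplicationView()
view.principal = 200_000
view.annual_rate = 0.06
view.calculate_interest()   # no page load, no button click
```

A separate, equally isolated test could then check how this view model gets rendered as HTML.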

Some devs might say “But our front-end framework has MVC/MVP/view models”. Great! But if we rely on those to capture our logical user experience, we’re tying it closely to that framework. React, Vue.js, Flutter and other UI frameworks are implementation details. We don’t want to mix our logic with implementation details. UI logic should be POxOs (Plain Old Java Objects, Plain Old Python Objects, etc.) so that we have full control over them.


The A-Z of Code Craft – E is for Encapsulation

Imagine you’re a pastry chef working in a professional kitchen. For some reason, the utensils you use to make pastries and cakes aren’t kept on your workstation. The rolling pins are kept on the meat workstation. The cookie cutters are kept on the fish station. The pastry brushes are kept on the saucier station. To do your job, you spend much of your time going to other chefs’ stations, and your workflow has to change every time they reorganise their stations.

A more efficient kitchen design would store the rolling pins, cookie cutters and pastry brushes on the pastry station, giving you the tools you need to do your job, and freeing you up from needing to know the details of other chefs’ workstations.

The technical term for this in software design is “encapsulation”.

In software design, data and dependencies are the “utensils” used by modules to fulfil their responsibilities. When a module needs to know something about another module, we call that “coupling”. When the knowledge a module requires to do its job is internalised inside that module, we call that “cohesion”. THINGS THAT CHANGE TOGETHER, BELONG TOGETHER.

A good modular design is said to be “cohesive and loosely-coupled”, making it easier to change one part of the system without having to change other parts. Changing how loan repayments are calculated doesn’t affect how, say, interest rates are calculated. They are SEPARATE CONCERNS.

Separation of concerns has a profound impact on our ability to change, test and reuse software. If coupling between modules is high, changes can ripple out along the couplings, causing the smallest changes to have wide-reaching effects. If the module we want to reuse is tightly coupled to other modules, we’ll need them, too – buying the whole Mercedes just to use the radio! If loan repayments and interest rates are calculated in the same module, we can’t test repayments without involving interest rates. If the module that calculates interest rates is a web service, our repayments tests are going to be slow.

Encapsulation and separation of concerns applies at every scale in software design, from individual functions to systems of systems. At larger scales, the cost of coupling rises by orders of magnitude. Tightly-coupled classes are a pain. Tightly-coupled web services kill businesses every day.

Finally, consider also how encapsulation might be applied to TEAMS. What impact does it have when, say, a product’s user experience designer is placed on a separate team?


The A-Z of Code Craft – D is for D.R.Y.

“Don’t Repeat Yourself” is a widely misunderstood, often misapplied, and consequently much-maligned principle in the design of software.

While it’s true that repetition in code can hurt us, by multiplying the cost of change, it’s by no means the worst thing we can do. Indeed, sometimes repetition can help us if it makes code easier to understand. (If you refactor code to remove duplication, stop to ask if that’s made it harder to follow. If it has, put the duplication back!)

But that’s not what D.R.Y. is really about. Think of it this way: what’s the opposite of duplication? REUSE.

When we see multiple repetitions of a similar thing – be it copied-and-pasted code, or a repeated concept that appears in multiple places (I remember one application that had 3 Customer tables in the database, each created by different people for different features) – that’s a hint about what our design needs to be.

When we refactor to consolidate, we discover the need for reusable abstractions like parameterised functions or shared classes. Duplication points us towards potential modularity.
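A tiny Python example of that discovery in action (the VAT rates are just for illustration): two copied-and-pasted functions differ only in one number, and consolidating them promotes that difference to a parameter.

```python
# Before: the same shape of code, copied and pasted with small variations.
def price_with_uk_vat(net):
    return round(net * 1.20, 2)

def price_with_german_vat(net):
    return round(net * 1.19, 2)

# After: the repetition reveals the abstraction -- one parameterised
# function, with the varying part (the rate) promoted to an argument.
def price_with_vat(net, rate):
    return round(net * (1 + rate), 2)

assert price_with_vat(100.0, 0.20) == price_with_uk_vat(100.0)
assert price_with_vat(100.0, 0.19) == price_with_german_vat(100.0)
```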

This is an evidence-based approach to design. We don’t speculate that a function might be reused, we see where it will be reused; we see the need for it in the current code.

Duplication in code can act as bread crumbs leading us to a better design and to genuinely useful – because they’re being used – reusable components. Removing duplication is where some of our most popular libraries and frameworks came from.

As for taking it too far, it’s certainly true that jumping on duplication too quickly can produce over-abstracted code, and a much higher risk of choosing the wrong abstractions. The more examples we see, the more likely an abstraction is to be both the right one, and to actually pay for itself in the future.

But let the duplication build up, and the refactoring’s going to take longer. In the zero-sum game of software development, things that take longer are less likely to happen, so we need to strike a balance.

The “Rule of Three” is a rough and ready guide for how many examples we might want to see before we refactor. Sometimes more, sometimes fewer, but on average, around three.

Scale is also a factor here. Reuse creates dependencies. If those cross team boundaries, it really needs to be worth it.

Don’t forget, either, that repetition applies not just to our code, but to our process for creating it. Automating repeated tests (regression tests) is a good example of how refactoring duplication of effort in our process can streamline delivery.

Be mindful, though, that just as over-abstraction is a risk in refactoring duplicated code, over-zealous automation is a risk in refactoring duplicated effort. I’ve worked with teams who have so many scripts and custom tools that it takes weeks or even months for new joiners to get up to speed, and some of those tools saved them less time and money than they took to create and maintain.


The A-Z of Code Craft – C is for Continuous

Old-fashioned approaches to creating software often encourage us to think of the activities involved as stages or phases in the process: the design phase, the coding phase, the testing phase, the integration phase, the release phase, and so on.

This approach has some major drawbacks. In fact, many of us have found that it simply doesn’t work on problems of any appreciable complexity.

The moment we start writing code, we see how the design needs to change. The moment we start testing, we see how the code needs to change. The moment we integrate our changes, we see how ours or other people’s code needs to change. The moment we release working software into the world, we learn how the software needs to change.

Around and around we go, feeding back our lessons into a never-ending continuous cycle of designing, coding (which I might argue is also designing), testing, integrating and releasing. The lines between these activities become very blurred. If I’m writing a failing test in a test-driven approach, am I designing, or am I coding, or am I testing? When I’m refactoring code, am I designing, or coding, or testing?

The correct answer is: YES.

And if we work backwards from the goal of having working software that can be shipped at any time, we inevitably arrive at the need for continuous integration, and that doesn’t work without continuous testing, and that doesn’t work if we try to design and write all the code before we do any testing. Instead, we work in micro feedback loops, progressing one small step at a time, gathering feedback throughout so we can iterate towards a good result.

But the continuous loops don’t end there. To ensure the software’s open to change, we also need to be continuously reviewing or inspecting the code. And to get the bigger picture right as the software grows – considering how the pieces of the jigsaw fit together – we need to be continuously architecting our products and systems on that larger scale.

And, finally, to be capable of doing these things well – abilities none of us are born with – we need to be continuously learning. In each nugget of feedback, we can see things that went well, and things we could do better. Rather than saving it all up for a “post-mortem” after a major release, and trying to change 1,001 things in our approach – which never works out! – we need to act on that feedback throughout the process, evolving our approach one lesson at a time.

Some, myself included, might say that if code craft could be crystallised in one word, that word would be “continuous”.

