
A widely known – and even more widely misunderstood – principle in software design is Don’t Repeat Yourself (“D.R.Y.”).
Duplication in code can be a problem, because it can multiply the cost of making changes to the repeated parts. But it’s by no means the worst thing we can do in code. I’ll take code that’s easy to understand over code that has zero duplication any day. (And I’ll take code that works over code that’s easy to understand, too.)
So, although duplication is listed in many sources as a code smell, its importance in the maintainability stakes is perhaps sometimes overstated. But D.R.Y. serves a purpose in the actual design process itself.
Think about it: what’s the opposite of duplication? Reuse. When we see repeated examples of code, they can act as a signpost towards some kind of modular, reusable replacement. Repeated blocks of code could become a parameterised function or method. Copied and pasted groups of functions or methods could become a shared module or class.
By paying attention to duplication and refactoring to consolidate it, modular abstractions emerge in our code; shared functions/methods, shared modules/classes, polymorphism, and so on.
It’s argued by some, like Kent Beck, that pulling on that string of duplicated code to allow a modular design to emerge is a more evidence-based, “scientific” approach to design. We introduce abstractions, not because we think we might need them (another code smell called “Speculative Generality”), but because we see multiple examples where we do need them in the working code as it presently is, and not as we imagine it might be or should be.
The skill in following duplication to a modular design is in seeing the patterns. And, here, it serves us to see more examples so we’re more likely to choose the right generalisation, encapsulated in the right abstraction.
But leave it too long – let the repetition go on and on – and we face another problem, which is that any refactoring we do to consolidate it will take longer and longer. In the zero-sum game of software development, where we have limited time and resources, things that take longer (and/or cost more) are less likely to happen.
So we want to wait until we see enough examples, but not so long that we never get around to refactoring it. This balance is captured in The Rule Of Three. On average (i.e., not always), we wait to see three examples of repetition before we refactor. Could be more, could be fewer, but on average, three.
The other thing about The Rule of Three is that, when we see something repeated once, the odds of it being repeated again (and again) are quite small. When we see code (or a concept – a much longer blog post!) repeated three or more times, chances are higher that it will be repeated again. In the coding kata that the exercise I’m going to introduce is based on, you should see things really speeding up after you’ve refactored out the duplication in the solution code.
And, of course, code isn’t the only thing we repeat in software development. The process itself can be full of examples of duplicated effort. For example, I might manually deploy my web application every day. The stuff I’m doing at the command line – stopping servers, deleting old folders, copying new files across, running database updates, restarting servers, etc – I could do with a batch script so it becomes an automated single-click process. If the time and effort involved in automating deployments is significantly outweighed by the time saved doing deployments every day, that’s a profitable venture.
In the same way that duplicated code can signpost a better modular design, duplicated effort can point us more scientifically to a better process. And in this sense, it’s useful to be aware of where that duplicated effort’s occurring. Time and Motion studies kind of thing.
Anyway, here’s a coding kata that exercises your Rule Of Three senses. It’s based on a well-known kata that’s good for practicing spotting and removing duplication in solution code, but we’re going to expand on that.
The problem you’re going to write code to solve is the Roman Numerals kata. Ordinarily, developers tackle this exercise by writing their tests first (TDD). But in our version, we’re going to go test-after. Like in the bad old days. But still working in micro-feedback loops. So, write a little bit of code – change or add one thing – then test it.
For example, write a function that converts 1 into “I”, then test that. Then change the function to turn 2 into “II” and then perform both tests. Then change the function to turn 3 into “III”, and then test all three cases. And so on. So, baby code-test steps. And, of course, if you see a pattern of repetition in your solution code, consider refactoring once you’ve seen three complete examples of that pattern. (Pro tip: don’t jump in too soon, the pattern’s larger than you think!)
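To make those baby steps concrete, here’s a sketch in Python of what the solution might look like after the first three code-test steps (the function name `roman` and the string-building approach are just illustrative choices, not part of the kata’s rules):

```python
# Hypothetical snapshot after three baby steps:
# 1 -> "I", 2 -> "II", 3 -> "III".
def roman(number):
    numeral = ""
    for _ in range(number):
        numeral += "I"  # generalisation that covers all three cases so far
    return numeral

# The "tests" at this stage are just manual checks we run each time:
print(roman(1))  # expect "I"
print(roman(2))  # expect "II"
print(roman(3))  # expect "III"
```

Note how each step is tiny: change one thing, then re-run every check. The repetition of that print-and-eyeball routine is exactly the duplicated effort the kata’s rules will ask you to notice.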
The Rule Of Three kata has… well… three rules:
- When you’ve performed the same test – e.g., 1 = “I” – at least three times, automate that test in a main() function or method so you can run it with a single command in a terminal, and easily inspect the result in your console window (test name, pass/fail, expected result, actual result). DO NOT USE A TESTING FRAMEWORK. The goal here is to discover a framework by consolidating duplication.
- When you’ve written at least three automated tests, refactor repeated test code into a shared set of abstractions (e.g., shared functions) to remove that duplication, and then carry on.
- If you do this exercise alongside other developers or pairs (highly recommended), when three of you have your own set of shared testing abstractions, compare and consolidate them into a single shared testing library that you all use. Then carry on. You may find, as the number of automated tests grows further, that more evolution of the testing framework will help you to Don’t Repeat Yourself. So we may be looking at some kind of trunk-based concurrent development here 🙂
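As one possible illustration of where the first two rules might lead, here’s a Python sketch of a hand-rolled harness after the repeated print-and-compare code has been consolidated into a shared `check` function (all names here are hypothetical – your own abstractions should emerge from your own duplication):

```python
# Illustrative conversion function under test (not the kata's "answer").
def roman(number):
    numeral = ""
    for value, symbol in [(10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]:
        while number >= value:
            numeral += symbol
            number -= value
    return numeral

# Shared testing abstraction, refactored out of three-plus copies of the
# same test code: prints test name, pass/fail, expected and actual results.
def check(name, expected, actual):
    result = "PASS" if expected == actual else "FAIL"
    print(f"{result}: {name} - expected {expected!r}, got {actual!r}")

# A main() that runs every test with a single command in the terminal.
def main():
    check("1 converts to I", "I", roman(1))
    check("4 converts to IV", "IV", roman(4))
    check("9 converts to IX", "IX", roman(9))
    check("14 converts to XIV", "XIV", roman(14))

if __name__ == "__main__":
    main()
```

Squint at `check` and `main` and you can see the beginnings of an xUnit-style framework – which is the point: you discover the framework by consolidating duplication, rather than reaching for one up front.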



