By bike, it takes about 15 minutes to get from my house to Wimbledon Village in South West London. In a sports car that’s 10 times as fast as a bicycle – let’s call it a “10x” mode of transport – it still takes about 15 minutes to get from my house to Wimbledon Village.
When we travel on London’s roads, the journey time’s mostly determined not by the performance of our vehicle, but by how much time we spend waiting. Waiting at traffic lights. Waiting at junctions. Waiting at pedestrian crossings. Waiting to join roundabouts. It’s mostly waiting.
During rush hour, the average journey speed in London is just 9 miles/hour, whether you’re in a Porsche 911 or on a bicycle. This is not a limit of your vehicle; it’s a limit of the system your vehicle has to work within.
We see a similar effect with software developers. Take any “10x” developer and put them in a 1x system, and you’ll get 1x performance out of them every time. (Yes, even if they use an “A.I.” coding assistant!) Fitting a jet engine to your car isn’t going to get you to Wimbledon Village any sooner.
If you really want to get more value sooner out of a dev team, don’t focus on the performance of the developers, focus on reducing the time they spend waiting – the time they spend blocked from creating value by the system they’re working within.
Sadly, blocking behaviours are rife in our industry. Pull Request code reviews are a good example, where a developer’s changes sit on the shelf waiting to be approved before they can make it into the end product.
There are many other examples of blocking behaviour, such as waiting for customer input, waiting for the UX designer to provide wireframes, waiting for a QA team to test the software, and so on. The average team spends most of their time not moving forward, but sitting at proverbial traffic lights.
In concurrent programming, we have a concept of “non-blocking” processes that can continue without waiting for another process to finish. Maximising the non-blocking parts can hugely improve the performance of the system as a whole.
There are so many common blockers that it’s beyond the scope of this little essay to discuss them all, but I can offer some general advice on unblocking your development teams:
Trust and empower teams to make more of the decisions
Encapsulate the knowledge and skills needed to deliver user/business outcomes within teams
Limit the amount of work in progress. It can be hard to see the bottlenecks and blockers when the team has a bunch of half-finished features in play.
View all handovers and sign-offs with hostile suspicion. They are the traffic lights of your dev process.
If something’s hurting (e.g., merging feature branches), do it more often. If possible, do it continuously. This is especially true of communication.
Java Jane doesn’t have to wait for the DBA to add a column if she can do it herself. T-shaped developers get blocked far less often.
More traffic = slower traffic. Team size has a similar, very well-known effect.
Dependencies are not your friend. If adding a new feature involves every team, you’re going to be doing a lot of waiting.
Remember that traffic lights exist for a reason. The speed demons in our teams are as likely to cause accidents as the speed demons on our roads. When unblocking processes, remember to make sure safety isn’t compromised. Not testing code before a release might seem faster…
And don’t forget – it’s the speed of the system we’re optimising. Focus more attention on that, not on individual developers.
If you’re serious about building your team’s capability to rapidly, reliably and sustainably evolve software to meet rapidly changing business needs, my Code Craft and Test-Driven Development live remote training workshops are HALF PRICE until March 31st 2025.
It’s been a while since I set a programming challenge with a seasonal theme – about a year, in fact – so here’s a new one which I hope you’ll enjoy, even if you don’t celebrate Xmas.
Imagine we run a railway that Santa and his helpers use to ship presents from their factory in Lapland non-stop to a distribution depot 860 km away just outside Helsinki. (That’s how it works. Don’t argue.)
The elves working at the depot need to know what time to expect the train so they can make sure everything’s ready at their end – reindeer fed, sleigh oiled, Santa’s lunchbox packed, etc. To aid in improving the accuracy of their ETA, they have commissioned us to write a software system.
They’ve placed sensors along the track at intervals of about 1 km, and each one sends a signal to the depot as the train passes. There are contact points at the front and the rear of the train, so each time the train passes a sensor it triggers 2 data messages – front and rear. The train is 200 m long.
Each data message includes the location of that particular sensor: how far along the track it is from the factory in Lapland, in kilometres.
The elves, of course, insist on a JSON format for these data messages, which looks like this:
Our software must determine the time that has elapsed between the first and second message, and use that information to calculate the speed of the train as it passed that sensor, and from there calculate an estimated time of arrival at the depot to the nearest minute.
This estimate will be updated with every new pair of messages sent by each sensor along the track to give the elves a live picture of the progress of the train.
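Here is a minimal sketch of the receiver's core calculation, assuming each message can be timestamped on arrival and that the ETA is for the front of the train reaching the depot at 860 km. EtaCalculator and its methods are hypothetical names, and the sketch deliberately uses the simplest possible model: the train holds the speed measured at the latest sensor all the way to the depot.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

/** Hypothetical ETA calculator: assumes the train keeps its measured speed to the depot. */
final class EtaCalculator {
    static final double TRAIN_LENGTH_KM = 0.2; // the train is 200 m long
    static final double DEPOT_KM = 860.0;      // depot distance from the factory

    /** Speed in km/h, from the time between the front and rear contacts passing one sensor. */
    static double speedKmh(Instant frontPassed, Instant rearPassed) {
        double hours = Duration.between(frontPassed, rearPassed).toNanos() / 3_600_000_000_000.0;
        return TRAIN_LENGTH_KM / hours;
    }

    /** Naive ETA, rounded to the nearest minute, for a sensor sensorKm from the factory. */
    static Instant eta(double sensorKm, Instant frontPassed, Instant rearPassed) {
        double kmh = speedKmh(frontPassed, rearPassed);
        // When the rear contact passes the sensor, the front is already 200 m further on.
        double remainingKm = DEPOT_KM - (sensorKm + TRAIN_LENGTH_KM);
        long remainingSeconds = Math.round(remainingKm / kmh * 3600);
        Instant unrounded = rearPassed.plusSeconds(remainingSeconds);
        // Nearest minute: add 30 seconds, then drop the seconds.
        return unrounded.plusSeconds(30).truncatedTo(ChronoUnit.MINUTES);
    }
}
```

Because the train brakes over its last stretch of track, this constant-speed estimate comes out about v/(2a), roughly 21 seconds, early while the train is cruising at 150 km/h; adding that correction, and treating the acceleration and braking phases separately, are possible refinements.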
Although in real life the train would need to vary its speed depending on conditions, for this exercise assume the train accelerates from rest at a constant 1 m/s/s and decelerates at the same rate, and has a maximum speed of 150 km/h.
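These constraints fix the ideal journey time, which is a handy sanity check for both programs. Converting the top speed and working through the accelerate, cruise, brake profile over the full distance D = 860 km with a = 1 m/s²:

```latex
\begin{aligned}
v_{\max} &= 150\ \text{km/h} = \tfrac{150\,000}{3600}\ \text{m/s} \approx 41.67\ \text{m/s} \\
t_{\text{accel}} &= \frac{v_{\max}}{a} \approx 41.67\ \text{s},
\qquad d_{\text{accel}} = \frac{v_{\max}^{2}}{2a} \approx 868\ \text{m} \\
T &= 2\,t_{\text{accel}} + \frac{D - 2\,d_{\text{accel}}}{v_{\max}}
   = \frac{D}{v_{\max}} + \frac{v_{\max}}{a}
   = 20\,640\ \text{s} + 41.67\ \text{s} \approx 20\,682\ \text{s}
\end{aligned}
```

So a trip with no stops should take about 5 h 44 min 42 s, or 5 h 45 min to the nearest minute.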
For this exercise, you will need to create two programs:
One that simulates the train’s journey, sending pairs of sensor-passing data messages roughly every kilometre of the train’s progress
Another to receive these messages and update the train’s estimated time of arrival and display it on the screen
The sensors will send one final message when the train has reached the depot to let the receiver know the journey has ended.
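For the simulator, the heart of the job is working out when each contact passes each sensor. The sketch below assumes the front of the train starts at rest at 0 km, that a sensor sits at every whole kilometre, and that the train follows the constant-acceleration profile described above. JourneySimulator and the SensorEvent record are hypothetical; SensorEvent merely stands in for the elves' JSON message.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simulator core: when does each contact pass each sensor? */
final class JourneySimulator {
    static final double ACCEL = 1.0;                  // m/s^2, same rate for braking
    static final double MAX_SPEED = 150_000.0 / 3600; // 150 km/h in m/s
    static final double DISTANCE = 860_000.0;         // metres from factory to depot
    static final double TRAIN_LENGTH = 200.0;         // metres

    /** Seconds after departure at which the front of the train reaches position x (metres). */
    static double timeAt(double x) {
        double rampDistance = MAX_SPEED * MAX_SPEED / (2 * ACCEL); // ~868 m to reach top speed
        double totalTime = DISTANCE / MAX_SPEED + MAX_SPEED / ACCEL;
        if (x <= rampDistance) {
            return Math.sqrt(2 * x / ACCEL);                           // accelerating
        } else if (x <= DISTANCE - rampDistance) {
            return MAX_SPEED / ACCEL + (x - rampDistance) / MAX_SPEED; // cruising
        } else {
            return totalTime - Math.sqrt(2 * (DISTANCE - x) / ACCEL);  // braking
        }
    }

    /** Stand-in for one sensor message; the real JSON format is whatever the elves specify. */
    record SensorEvent(double sensorKm, String contact, double secondsAfterDeparture) {}

    /** Front and rear events for every whole-kilometre sensor the whole train clears. */
    static List<SensorEvent> events() {
        List<SensorEvent> events = new ArrayList<>();
        for (int km = 1; km * 1000.0 + TRAIN_LENGTH <= DISTANCE; km++) {
            double sensor = km * 1000.0;
            events.add(new SensorEvent(km, "front", timeAt(sensor)));
            // The rear contact passes once the front is a train-length beyond the sensor.
            events.add(new SensorEvent(km, "rear", timeAt(sensor + TRAIN_LENGTH)));
        }
        return events;
    }
}
```

A fuller simulator would also append the final "arrived" message at timeAt(DISTANCE) and replay the events to the receiver in (possibly accelerated) real time, for example by sleeping between sends.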
I’ve been spending my early mornings buried in Java threading recently. Although we talk often of concurrency and “thread safety” in this line of work, there’s surprisingly little actual multi-threaded code being written. Normally, when developers talk about multi-threading, we’re referring to how we write code to handle asynchronous operations in other people’s code (e.g., promises in JavaScript).
My advice to developers has always been to avoid writing multi-threaded code wherever possible. Concurrency is notoriously difficult to get right, and the safest multi-threaded code is single-threaded.
I’ve been eating my own dog food on that, and it occurred to me a couple of weeks back that I’ve written very little multi-threaded code myself in recent years.
But there is still some multi-threaded code being written in languages like Java, C# and Python for high-performance solutions that are targeted at multi-CPU platforms. And over the last few months I’ve been helping a client with just such a solution for scaling up property-based tests to run on multi-core Cloud platforms.
One of the issues we faced was how to test our multi-threaded code.
There’s a practical issue of executing multiple threads in a single-threaded unit test – particularly synchronizing so that we can assert an outcome after all threads have completed their work.
And also, thread scheduling is out of our control and – on Windows and similar platforms – unpredictable and non-repeatable. A race condition or a deadlock might not show up every time we run a test.
Over the last couple of weeks, I’ve been playing with a rough prototype to try and answer these questions. It uses a simple producer-consumer example – loading parcels into a loading bay and then taking them off the loading bay and loading them into a truck – to illustrate the challenges of both safe multi-threading and multi-threaded testing.
When I test multi-threaded code, I’m interested in two properties:
Safety – what should always be true while the code is executing?
Liveness – what should eventually be achieved?
To test safety, an assertion needs to be checked throughout execution. To test liveness, an assertion needs to be checked after execution.
After writing code to do this, I refactored the useful parts into custom assertion methods, always() and eventually().
always() takes a list of Runnables (Java’s equivalent of functions that accept no parameters and have no return value) that will concurrently perform the work we want to test. It will submit each Runnable to a fixed thread pool a specified number of times (thread count) and then wait for all the threads in the pool to terminate.
On a single separate thread, a boolean function (in Java, Supplier<Boolean>) is evaluated multiple times throughout the execution of the threads under test. This checker thread stops once the worker threads have terminated or timed out. If, at any point during execution, the assertion evaluates to false, the test will fail.
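A minimal sketch of an always()-style assertion along these lines might look like this. The signature always(actions, assertion, threadCount, timeoutMillis) is an assumption (the post's own example does pass the thread count as the third parameter), as is the extra check after the workers stop; this isn't the code from the syncloop repository.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

/** Sketch of an always()-style safety assertion; the signature is hypothetical. */
final class SafetyAssertions {

    static void always(List<Runnable> actions, Supplier<Boolean> assertion,
                       int threadCount, long timeoutMillis) {
        // One pool thread per submission, so all the workers can run at the same time.
        ExecutorService pool = Executors.newFixedThreadPool(actions.size() * threadCount);
        AtomicBoolean workersDone = new AtomicBoolean(false);
        AtomicBoolean violated = new AtomicBoolean(false);

        // A single checker thread samples the assertion for as long as the workers run.
        Thread checker = new Thread(() -> {
            while (!workersDone.get()) {
                if (!assertion.get()) violated.set(true);
            }
            if (!assertion.get()) violated.set(true); // one last check after the workers stop
        });
        checker.setDaemon(true);
        checker.start();

        // Submit each action threadCount times. Exceptions thrown by a worker end up in the
        // discarded Futures, so this sketch only sees their effects through the assertion.
        for (Runnable action : actions) {
            for (int i = 0; i < threadCount; i++) {
                pool.submit(action);
            }
        }
        pool.shutdown(); // accept no new tasks; the submitted ones keep running

        try {
            if (!pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)) {
                pool.shutdownNow(); // timed out (a deadlock, perhaps): interrupt the workers
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pool.shutdownNow();
        } finally {
            workersDone.set(true);
        }

        try {
            checker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (violated.get()) {
            throw new AssertionError("safety property was violated during execution");
        }
    }
}
```

The pool is sized so every submitted worker gets its own thread; with a smaller pool, workers that spin while waiting on each other could starve the very threads they are waiting for.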
In use, it looks like this:
bayLoader and truckLoader are objects that implement the Runnable interface. They will be submitted to the thread pool 2x each (because we’ve specified a thread count of 2 as our third parameter), so there will be 4 worker threads in total, accessing the same data defined in our set-up.
The bayLoader threads will load parcels on to the loading bay, which holds a maximum of 50 parcels, until all the parcels have been loaded.
The truckLoader threads will unload parcels from the loading bay and load them on to the truck, until the entire manifest of parcels has been loaded.
A safety property of this concurrent logic is that there should never be more than 50 parcels in the loading bay at any time, and that’s what our always assertion checks multiple times during execution:
() -> bay.getParcelCount() <= 50
When I run this test once, it passes. Running it multiple times, it still passes. But just because a test’s passing, that doesn’t mean our code really works. Let’s deliberately introduce an error into our test assertion to make sure it fails.
() -> bay.getParcelCount() <= 49
The first time I run this, the test fails. And the second and third times. But on the fourth run, the test passes. This is the thread non-determinism problem; we have no control over when our assertion is checked during execution. Sometimes it catches a safety error. Sometimes the error slips through the gaps and the test misses it.
The good news is that if it catches an error just once, that proves we have an error in our concurrent logic. Of course, if we catch no errors, that doesn’t prove they’re not there. (Absence of evidence isn’t evidence of absence.)
What if we run the test 100 times? Rather than sit there clicking the “run” button over and over, I can rig this test up as a JUnitParams parameterised test and feed it 100 test cases. (If you don’t have a parameterised testing feature, you can just loop 100 times.)
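If your framework lacks parameterised tests, the loop-100-times approach can be as simple as this hypothetical helper, which counts failing runs instead of stopping at the first one, so you get a tally such as 91/100 rather than a single red or green.

```java
/** Hypothetical helper: runs a test body repeatedly and counts the runs that fail. */
final class RepeatRunner {

    static int countFailures(int runs, Runnable testBody) {
        int failures = 0;
        for (int i = 0; i < runs; i++) {
            try {
                testBody.run();
            } catch (AssertionError e) {
                failures++; // record the failure and keep going
            }
        }
        return failures;
    }
}
```

In JUnit 5, @RepeatedTest(100) achieves much the same thing without a hand-written loop.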
When I run this, it fails 91/100 times. Changing the assertion back, it passes 100/100. So I can have 100% confidence the code satisfies this safety property? Not so fast. 100 test runs leaves plenty of gaps. Maybe I can be 99% confident with 100 test runs. How about we do 1000 test runs? Again, they all pass. So that gives me maybe 99.9% confidence. 10,000 could give me 99.99% confidence. And so on.
Thankfully, after a little performance engineering, 10,000 tests run in less than 30 seconds. All green.
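One way to make those confidence figures more concrete rests on a strong assumption: that each run independently exposes a given bug with some fixed probability p (thread scheduling doesn't strictly guarantee this). Under that assumption, the chance of every run missing the bug, and the usual "rule of three" bound on p when no run fails, are:

```latex
P(\text{all } n \text{ runs pass} \mid \text{bug present}) = (1 - p)^{n},
\qquad
p \lesssim \frac{3}{n} \ \ (\text{approx. 95\% confidence, if none of the } n \text{ runs fail})
```

For the deliberately broken assertion, p is roughly 0.91, so ten passing runs in a row would have a probability of about 0.09^10 ≈ 3.5 × 10^-11. For 10,000 clean runs, the bound is 3/10,000 = 0.0003, so a bug that surfaces less often than about once in 3,300 runs could still go unnoticed.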
The eventually() assertion method works along similar lines, except that it only evaluates its assertion once at the end (and therefore runs significantly faster):
If my code encounters a deadlock, the worker threads will time out after 1000 milliseconds. If a race condition occurs and our data becomes corrupted, the assertion will fail. Running this 10,000 times shows all the tests are green. I’m 99.99% confident my concurrent logic works.
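The eventually() counterpart can be sketched in the same way. Again, the signature is hypothetical, and treating a timeout as an immediate failure (rather than only letting the final assertion fail) is a design choice of this sketch.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Sketch of an eventually()-style liveness assertion; the signature is hypothetical. */
final class LivenessAssertions {

    static void eventually(List<Runnable> actions, Supplier<Boolean> assertion,
                           int threadCount, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(actions.size() * threadCount);
        for (Runnable action : actions) {
            for (int i = 0; i < threadCount; i++) {
                pool.submit(action);
            }
        }
        pool.shutdown(); // no new tasks; wait for the submitted ones below

        boolean finished;
        try {
            finished = pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            finished = false;
        }
        if (!finished) {
            pool.shutdownNow(); // interrupt any stuck workers
            throw new AssertionError("workers still running after " + timeoutMillis + " ms (deadlock?)");
        }
        // Evaluated once, after every worker has finished.
        if (!assertion.get()) {
            throw new AssertionError("liveness property was not achieved");
        }
    }
}
```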
Finally, speaking of deadlocks and race conditions, how might we avoid those?
A race condition can occur when two or more threads access the same data at the same time and at least one of them changes it. In particular, we run the risk of a pre-condition paradox when bay loaders attempt to load parcels on to the loading bay while truck loaders attempt to unload parcels from it.
The bay loader can only load a parcel if the bay is not full. A truck loader can only unload a parcel if the bay is not empty.
When I run my tests with this implementation of LoadingBay, 12% of them fail their liveness and safety checks. There’s a non-zero chance that, say, a bay loader checks that the bay isn’t full, and then another bay loader loads the 50th parcel in between that check and the first loader’s load. Similarly, a truck loader might check that the bay isn’t empty, but before it can unload the last parcel, another truck loader thread takes it.
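To make the race concrete, here is a hypothetical, deliberately unsafe bay in which the pre-condition check and the action are separate steps. Parcels are plain Strings for brevity (an assumption; the post uses its own parcel type). On a single thread it behaves perfectly, which is exactly why the bug only shows up under unlucky interleavings.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical bay with no synchronisation at all: check and act are separate steps. */
final class UnsafeLoadingBay {
    static final int CAPACITY = 50;
    private final Deque<String> parcels = new ArrayDeque<>(); // ArrayDeque isn't thread-safe either

    boolean isFull() { return parcels.size() >= CAPACITY; }
    boolean isEmpty() { return parcels.isEmpty(); }
    int getParcelCount() { return parcels.size(); }

    /** Check-then-act: another loader can run between the check and the add. */
    void loadIfNotFull(String parcel) {
        if (!isFull()) {             // check: say 49 parcels, so not full...
            // ...another loader may add the 50th parcel right here...
            parcels.addLast(parcel); // act: ...and now there are 51
        }
    }

    /** Check-then-act again: another unloader may take the last parcel after the check. */
    String unloadIfNotEmpty() {
        if (!isEmpty()) {
            return parcels.pollFirst(); // may be null if the bay was emptied in between
        }
        return null;
    }
}
```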
To avoid this situation, we need to ensure that pre-condition checks and actions are executed in a single, atomic sequence with no chance of other threads interfering.
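One way to make each check and its action atomic is to put both inside the same synchronized method, as in this hypothetical sketch. Notice what happens when the pre-condition fails, though: tryLoad() just returns false and tryUnload() returns null. A caller that ignores those results will believe a parcel was loaded when it wasn't, or put a null parcel on the truck, which would produce symptoms much like the ones described next.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical bay where each check and its action run under the same lock. */
final class AtomicLoadingBay {
    static final int CAPACITY = 50;
    private final Deque<String> parcels = new ArrayDeque<>();

    /** Atomic check-and-load. Returns false, and loads nothing, if the bay is full. */
    synchronized boolean tryLoad(String parcel) {
        if (parcels.size() >= CAPACITY) {
            return false;
        }
        parcels.addLast(parcel);
        return true;
    }

    /** Atomic check-and-unload. Returns null if the bay is empty. */
    synchronized String tryUnload() {
        return parcels.pollFirst();
    }

    synchronized int getParcelCount() {
        return parcels.size();
    }
}
```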
When I test this implementation, tests still fail. The problem is that some parcels aren’t getting loaded on to the bay (though the bay loader thinks they have been), and some parcels aren’t getting unloaded, either. Our truck loader may be putting null parcels on the truck.
When loading, the bay must not be full. When unloading, it must not be empty. So our worker threads need to wait until their pre-conditions are satisfied. Now, Java threading gives us wait() methods, but they wait for another thread to call notify() or notifyAll() (or for a specified timeout to expire), not for a condition to become true. We need to wait until a condition becomes true.
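Here is a hypothetical sketch of waiting on a condition with a retry loop instead of wait()/notify(): each attempt checks the pre-condition and acts inside one synchronized block, then releases the lock before retrying so other threads can make progress. It busy-waits (Thread.onSpinWait() is only a hint to the runtime), so a blocked thread keeps burning CPU.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical bay whose load() and unload() retry until their pre-conditions hold. */
final class WaitingLoadingBay {
    static final int CAPACITY = 50;
    private final Deque<String> parcels = new ArrayDeque<>();

    /** Retries until the bay has room, then loads; the check and the add share one lock. */
    void load(String parcel) {
        while (true) {
            synchronized (this) {
                if (parcels.size() < CAPACITY) {
                    parcels.addLast(parcel);
                    return;
                }
            }
            Thread.onSpinWait(); // lock released here, so unloaders can make room
        }
    }

    /** Retries until the bay has a parcel, then unloads it; never returns null. */
    String unload() {
        while (true) {
            synchronized (this) {
                if (!parcels.isEmpty()) {
                    return parcels.pollFirst();
                }
            }
            Thread.onSpinWait(); // lock released here, so loaders can add parcels
        }
    }

    synchronized int getParcelCount() {
        return parcels.size();
    }
}
```

For this particular producer-consumer shape, the standard library's ArrayBlockingQueue (constructed with a capacity of 50, using put() and take()) gives the same wait-until-not-full and wait-until-not-empty behaviour without spinning.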
This passes all 10,000 safety and liveness test runs, so I have 99.99% confidence we don’t have a race condition. But…
What happens when all the parcels have been loaded on to the truck? There’s a risk of deadlock if the bay remains permanently empty.
So we also need a way to stop the loading and unloading process once the entire manifest has been loaded.
I’ve dealt with this in a similar way to waiting for pre-conditions to be satisfied, except this time we repeat loading and unloading until the parcels are all on the truck.
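Here is a hypothetical sketch of that idea, with the manifest, bay and truck guarded by one lock. Each worker loops until its job is done, and each pass only acts if the pre-condition holds, so no thread gets stuck waiting on a bay that will never be refilled. LoadingJob and its methods are illustrative names, not the repo's.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical shared state for one loading job, guarded by this object's lock. */
final class LoadingJob {
    static final int BAY_CAPACITY = 50;
    private final Deque<String> manifest = new ArrayDeque<>(); // parcels not yet on the bay
    private final Deque<String> bay = new ArrayDeque<>();
    private final List<String> truck = new ArrayList<>();
    private final int totalParcels;

    LoadingJob(List<String> parcels) {
        manifest.addAll(parcels);
        totalParcels = parcels.size();
    }

    /** Bay loader: repeat until the manifest is empty, loading whenever the bay has room. */
    void loadBay() {
        while (true) {
            synchronized (this) {
                if (manifest.isEmpty()) return;         // until: nothing left to load
                if (bay.size() < BAY_CAPACITY) {        // when: bay not full
                    bay.addLast(manifest.pollFirst());
                }
            }
            Thread.onSpinWait();
        }
    }

    /** Truck loader: repeat until every parcel is on the truck, unloading when the bay has one. */
    void loadTruck() {
        while (true) {
            synchronized (this) {
                if (truck.size() == totalParcels) return; // until: whole manifest on the truck
                if (!bay.isEmpty()) {                     // when: bay not empty
                    truck.add(bay.pollFirst());
                }
            }
            Thread.onSpinWait();
        }
    }

    synchronized int bayCount() { return bay.size(); }
    synchronized int truckCount() { return truck.size(); }
}
```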
You may have already spotted the patterns in these two forms of loops:
Execute this action when this condition is true
Execute this action until this condition is true
Let’s refactor to encapsulate those nasty while loops.
There. That looks a lot better, doesn’t it? All nice and functional.
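Helpers for the two patterns, "when" and "until", might look something like this hypothetical sketch (the names, and passing an explicit lock object, are assumptions rather than the repo's API):

```java
import java.util.function.BooleanSupplier;

/** Hypothetical conditional-synchronisation helpers; names are illustrative only. */
final class SyncLoops {

    /** Execute action when condition is true: retry until it holds, then act under the lock. */
    static void when(Object lock, BooleanSupplier condition, Runnable action) {
        while (true) {
            synchronized (lock) {
                if (condition.getAsBoolean()) {
                    action.run();
                    return;
                }
            }
            Thread.onSpinWait(); // lock released between attempts so other threads can progress
        }
    }

    /**
     * Execute step until done is true. The step runs while the lock is held, so it must not
     * block or wait (for example, it must not call when() with a condition that could stay false).
     */
    static void until(Object lock, BooleanSupplier done, Runnable step) {
        while (true) {
            synchronized (lock) {
                if (done.getAsBoolean()) {
                    return;
                }
                step.run();
            }
            Thread.onSpinWait();
        }
    }
}
```

With these, a truck loader could be written as SyncLoops.until(job, () -> truck.size() == total, () -> { if (!bay.isEmpty()) truck.add(bay.pollFirst()); }), where job, truck, bay and total are hypothetical names for the shared state.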
I tend to find conditional synchronisation easier to wrap my head around than all the wait() and notify() and callbacks malarkey, and my experience so far suggests I produce more reliable multi-threaded code with this approach.
My explorations continue, but I thought there might be folk out there who’d find it useful to see where I’ve got so far with this.
You can see the current source code at https://github.com/jasongorman/syncloop (it’s just a proof of concept, so it’s provided with no warranty or support, of course).