Ned Batchelder: Generating data shapes with Hypothesis
In my last blog post (A testing conundrum), I described trying to test my Hasher class which hashes nested data. I couldn’t get Hypothesis to generate usable data for my test. I wanted to assert that two equal data items would hash equally, but Hypothesis was finding pairs like [0] and [False]. These are equal but hash differently because the hash takes the types into account.
In the blog post I said,
I don’t want a fixed schema for the data Hasher would accept, but tests to compare data generated from the same schema. It shouldn’t compare a list of ints to a list of bools. Hypothesis is good at generating things randomly. Usually it generates data, but we can also use it to generate schemas!
Hypothesis basics
Before describing my solution, I’ll take a quick detour to describe how Hypothesis works.
Hypothesis calls their randomness machines “strategies”. Here is a strategy that will produce random integers between -99 and 1000:
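A sketch of such a strategy (the bounds are from the post; the exact code block was elided):

```python
from hypothesis import strategies as st

# A strategy producing random integers between -99 and 1000, inclusive.
ints = st.integers(min_value=-99, max_value=1000)

print(ints.example())  # draw one sample value for inspection
```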
Strategies can be composed:
This will produce lists of integers from -99 to 1000. The lists will have up to 50 elements.
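Composed, that might look like:

```python
from hypothesis import strategies as st

# Lists of integers from -99 to 1000, with at most 50 elements.
int_lists = st.lists(st.integers(min_value=-99, max_value=1000), max_size=50)
```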
Strategies are used in tests with the @given decorator, which takes a strategy and runs the test a number of times with different example data drawn from the strategy. In your test you check a desired property that holds true for any data the strategy can produce.
To demonstrate, here’s a test of sum() that checks that summing a list of numbers in two halves gives the same answer as summing the whole list:
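A sketch of that test (the integer bounds and list size are assumptions carried over from the earlier examples):

```python
from hypothesis import given, strategies as st

@given(st.lists(st.integers(min_value=-99, max_value=1000), max_size=50))
def test_sum_in_halves(nums):
    # Summing the two halves should give the same answer as summing the whole.
    mid = len(nums) // 2
    assert sum(nums[:mid]) + sum(nums[mid:]) == sum(nums)

test_sum_in_halves()  # a @given-wrapped test can be called directly
```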
By default, Hypothesis will run the test 100 times, each with a different randomly generated list of numbers.
Schema strategies
The solution to my data comparison problem is to have Hypothesis generate a random schema in the form of a strategy, then use that strategy to generate two examples. Doing this repeatedly gets us pairs of data with the same “shape,” which is exactly what the tests need.
This is kind of twisty, so let’s look at it in pieces. We start with a list of strategies that produce primitive values:
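The exact set of primitives isn’t shown here; a plausible sketch might be:

```python
from hypothesis import strategies as st

# Strategies that each produce one kind of primitive leaf value.
# (The exact set in the post's code may differ.)
primitive_strategies = [
    st.none(),
    st.booleans(),
    st.integers(),
    st.floats(allow_nan=False),  # NaN != NaN would break equality-based tests
    st.text(),
]
```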
Then a list of strategies that produce hashable values, which are all the primitives, plus tuples of any of the primitives:
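Roughly like this, assuming the hypothetical primitives list above (the post later notes that its tuples_of makes homogeneous tuples of varying length, which is what this sketch does):

```python
from hypothesis import strategies as st

primitive_strategies = [st.none(), st.booleans(), st.integers(), st.text()]

def tuples_of(leaf):
    # Homogeneous tuples: varying length, all elements from one leaf strategy.
    return st.lists(leaf, max_size=5).map(tuple)

# Hashable values: every primitive, plus tuples of each primitive.
hashable_strategies = primitive_strategies + [tuples_of(p) for p in primitive_strategies]
```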
We want to be able to make nested dictionaries with leaves of some other type. This function takes a leaf-making strategy and produces a strategy to make those dictionaries:
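One way to write such a function is with st.recursive (a sketch; the sizes and key strategy are assumptions):

```python
from hypothesis import strategies as st

def dicts_of(leaf):
    # Given a strategy for leaf values, return a strategy for dicts whose
    # values are either leaves or further nested dicts of the same kind.
    return st.recursive(
        st.dictionaries(st.text(max_size=5), leaf, max_size=3),
        lambda inner: st.dictionaries(st.text(max_size=5), inner | leaf, max_size=3),
        max_leaves=4,
    )
```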
Finally, here’s our strategy that makes schema strategies:
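Putting the hypothetical pieces together, the twist is that the examples drawn from this strategy are themselves strategies:

```python
from hypothesis import strategies as st

primitive_strategies = [st.none(), st.booleans(), st.integers(), st.text()]

def tuples_of(leaf):
    # Homogeneous tuples of varying length.
    return st.lists(leaf, max_size=5).map(tuple)

def dicts_of(leaf):
    # Possibly-nested dicts with the given leaves.
    return st.recursive(
        st.dictionaries(st.text(max_size=5), leaf, max_size=3),
        lambda inner: st.dictionaries(st.text(max_size=5), inner | leaf, max_size=3),
        max_leaves=4,
    )

hashable_strategies = primitive_strategies + [tuples_of(p) for p in primitive_strategies]

# A strategy whose examples are themselves strategies: each draw picks one
# hashable leaf type and wraps it in a nested-dict schema.
nested_data_schemas = st.sampled_from(hashable_strategies).map(dicts_of)
```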
For debugging, it’s helpful to generate an example strategy from this strategy, and then an example from that, many times:
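Something like this loop works for eyeballing the output (using a minimal stand-in for nested_data_schemas so the snippet runs on its own):

```python
from hypothesis import strategies as st

# Minimal stand-in for nested_data_schemas, for demonstration only.
nested_data_schemas = st.sampled_from([st.integers(), st.booleans(), st.text()]).map(
    lambda leaf: st.dictionaries(st.text(max_size=3), leaf, max_size=3)
)

# Draw a schema, then draw data from that schema, several times.
for _ in range(5):
    schema = nested_data_schemas.example()  # an example that is itself a strategy
    print(schema.example())                 # data drawn from that schema
```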
Hypothesis is good at making data we’d never think to try ourselves. Here is some of what it made:
Finally writing the test
Time to use all of this in a test:
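A sketch of that test; the real nested_data_schemas and Hasher come from the post’s code, so both are replaced here with hypothetical stand-ins:

```python
from hypothesis import given, strategies as st

# Stand-in for the real nested_data_schemas strategy.
nested_data_schemas = st.sampled_from([st.integers(), st.booleans()]).map(
    lambda leaf: st.dictionaries(st.text(max_size=3), leaf, max_size=3)
)

def hash_data(data):
    # Hypothetical stand-in for Hasher: order-independent hash of a flat dict.
    return hash(tuple(sorted(data.items())))

@given(nested_data_schemas.flatmap(lambda schema: st.tuples(schema, schema)))
def test_equal_data_hash_equal(data_pair):
    data1, data2 = data_pair
    if data1 == data2:
        assert hash_data(data1) == hash_data(data2)

test_equal_data_hash_equal()
```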
Here I use the .flatmap() method to draw an example from the nested_data_schemas strategy and call the provided lambda with the drawn example, which is itself a strategy. The lambda uses st.tuples to make tuples with two examples drawn from the strategy. So we get one data schema, and two examples from it as a tuple passed into the test as data_pair. The test then unpacks the data, hashes them, and makes the appropriate assertion.
This works great: the tests pass. To check that the test was working well, I made some breaking tweaks to the Hasher class. If Hypothesis is configured to generate enough examples, it finds data examples demonstrating the failures.
I’m pleased with the results. Hypothesis is something I’ve been wanting to use more, so I’m glad I took this chance to learn more about it and get it working for these tests. To be honest, this is way more than I needed to test my Hasher class. But once I got started, I wanted to get it right, and learning is always good.
I’m a bit concerned that the standard setting (100 examples) isn’t enough to find the planted bugs in Hasher. There are many parameters in my strategies that could be tweaked to keep Hypothesis from wandering too broadly, but I don’t know how to decide what to change.
Actually
The code in this post is different than the actual code I ended up with. Mostly this is because I was working on the code while I was writing this post, and discovered some problems that I wanted to fix. For example, the tuples_of function makes homogeneous tuples: varying lengths with elements all of the same type. This is not the usual use of tuples (see Lists vs. Tuples). Adapting for heterogeneous tuples added more complexity, which was interesting to learn, but I didn’t want to go back and add it here.
You can look at the final strategies.py to see that and other details, including type hints for everything, which was a journey of its own.
Postscript: AI assistance
I would not have been able to come up with all of this by myself. Hypothesis is very powerful, but requires a new way of thinking about things. It’s twisty to have functions returning strategies, and especially strategies producing strategies. The docs don’t have many examples, so it can be hard to get a foothold on the concepts.
Claude helped me by providing initial code, answering questions, debugging when things didn’t work out, and so on. If you are interested, this is one of the discussions I had with it.
https://nedbatchelder.com/blog/202512/generating_data_shapes_with_hypothesis.html