There are lovely libraries near my house. Three, all in different directions. I have spent hours in each, especially before my kids were in school. This is a story about the one called the Douglass-Truth Library, the surprising way it got that name, and a computer simulation inspired by that surprise. Maybe it will surprise you, too.
The Douglass-Truth Library. Its name stands out from its peers. They have names like “Beacon Hill Library” and “Central Library”. Most Seattle libraries are named to match their neighborhoods.
Not the Douglass-Truth Library. We just celebrated the 50th anniversary of its renaming. It was renamed in 1975 to honor the abolitionists Frederick Douglass and Sojourner Truth. And how were these particular leaders selected for this honor? A community vote!
Back in the early 1970s, community leaders in a neighborhood that had become predominantly African American thought their library deserved a name that reflected the community. They organized a ballot listing ten distinguished Black Americans and invited the neighborhood to vote.
The ballot clearly says “vote for one”.
How did the library get named after two?
A tie vote! Out of a ten-person race.
That is so unlikely. Is it so unlikely? How unlikely?
When I slow down to think about it, I think it must depend on how many votes there were. One hundred voters tying does not seem likely, but it seems more likely than ten thousand voters ending in a tie. How many ballots were cast? I could not find any documentation of this. But when I was at the renaming anniversary party last December, someone remembered: around 2,000 people voted.
So now can we figure out how unlikely?
Almost. I used to love simplifying probability equations and making approximations for these sorts of formulas, and if I were writing this blog 25 years ago, it would be all about Stirling’s approximation and how with pencil and paper we could puzzle this out. Stirling showed that tie probability scales like 1/√n, and that means 2,000 voters doesn’t give you 1-in-2,000 odds; it’s more like 1/45. Ties are rarer with more voters, but not vanishingly rare.
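For the curious, the pencil-and-paper version in the simplest setting of two equally popular candidates and n voters looks roughly like this (an illustration of where the 1/√n scaling comes from, not the exact calculation for a ten-candidate ballot):

P(\text{tie}) = \binom{n}{n/2}\, 2^{-n} \approx \sqrt{\frac{2}{\pi n}} \qquad \text{via Stirling's approximation } n! \approx \sqrt{2\pi n}\,(n/e)^n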
Lately, though, I’ve become more of a computer-simulation type. Pencil and paper is fine, but random number sampling is even more fun.
From here, I’ll give just a sketch, so you can have fun the way you like to have fun.
Assumptions: maybe the simplest way to start is if everyone is equally liked and voters are simply choosing randomly (no strategic behaviors). But would it be more likely to have a first-place tie if two front runners were each getting nearly 50% of the vote?
Simulation: With simplifying assumptions, I can simulate one person’s vote with np.random.choice(candidate_list, p=vote_probabilities), but it is faster to simulate everyone’s vote simultaneously. Then I need to tally the votes and see if there is a tie for first place.
return tallies.iloc[0] == tallies.iloc[1]  # True if tie for first
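Here is a minimal sketch of how that might come together (my own illustrative version; the function name, trial count, and probabilities are invented for the example, and the tallying line above is the heart of it):

import numpy as np
import pandas as pd

def tie_probability(n_voters=2000, n_candidates=10,
                    vote_probabilities=None, n_trials=10_000, seed=0):
    """Estimate the chance of a first-place tie by repeated simulation."""
    rng = np.random.default_rng(seed)
    n_ties = 0
    for _ in range(n_trials):
        # simulate everyone's vote at once; p=None means all candidates equally liked
        votes = rng.choice(n_candidates, size=n_voters, p=vote_probabilities)
        tallies = pd.Series(votes).value_counts()  # sorted from most to fewest votes
        if tallies.iloc[0] == tallies.iloc[1]:  # True if tie for first
            n_ties += 1
    return n_ties / n_trials

print(tie_probability())  # all ten equally liked, 2,000 voters
# pass vote_probabilities=[...] to model two front runners or a clear favorite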
Results: If all ten candidates are equally liked, there’s roughly a 1 in 20 chance of a first-place tie (with 2,000 voters). The chance drops as turnout grows, but stays above 3% even with 5,000 voters.
If there are two front runners, it is more like one in 100, and if one candidate has a comfortable majority, then a tie is extremely rare.
Conclusion: The math is fun. The computer simulation is fun. But the real breakthrough happened 50 years ago when the votes came in tied and some genius invented a new option: name the library after both heroes.
A hands-on guide to catching simulation bugs with automated tests
TL;DR
What you’ll learn: Write rigorous tests for randomized algorithms without arbitrary thresholds.
What you’ll get: A failing test that catches a subtle directional bias bug with Bayes factor = 10⁷⁹ (decisive evidence).
The approach: Run simulations many times, count outcomes, and validate the proportions using Bayesian hypothesis testing. The heart of this Fuzzy Checking Pattern is the fuzzy_assert_proportion() method shown below.
The Problem: How Do You Test Randomized Algorithms?
Imagine you’re using computer simulation in your research, like an agent-based model or a Monte Carlo simulation. You run your code and it produces output. Then you run it again and get different output. That’s expected! It’s random. But here’s the challenge:
How do you know your “random” behavior is actually correct?
Traditional unit tests fail:
# This doesn't work - random walks vary!
assert result == 42
# Neither does this - what threshold?
assert 40 <= result <= 44
Instead of asking “is this close enough?” (with arbitrary thresholds), ask: “What’s the evidence ratio for bug vs. no-bug?”
We’ll do this with the Bayes factor:
Bayes factor > 100 = “decisive” evidence of a bug → Test FAILS
Bayes factor < 0.1 = “substantial” evidence of correctness → Test PASSES
Bayes factor between 0.1 and 100 = “inconclusive” → WARNING (need more data)
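To make the idea concrete, here is a toy Bayes-factor calculation (an illustration of the concept only, not the internal model that vivarium_testing_utils uses; the counts are made up):

import numpy as np
from scipy.stats import binom

# Suppose only 30 of 1000 walks exited at the left edge, where we expected 25%.
k, n, expected_p = 30, 1000, 0.25

# Likelihood of the data if there is no bug (left exits really happen 25% of the time)
likelihood_no_bug = binom.pmf(k, n, expected_p)

# Likelihood if there is a bug (the true proportion could be anything,
# so average the likelihood over a uniform grid of possible proportions)
p_grid = np.linspace(0.001, 0.999, 999)
likelihood_bug = binom.pmf(k, n, p_grid).mean()

bayes_factor = likelihood_bug / likelihood_no_bug
print(f"Bayes factor (bug vs. no bug): {bayes_factor:.2e}")  # astronomically large

A factor that large is far past the “decisive” threshold above, which is exactly the kind of signal we want a test to raise when a bug is real.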
The Bug We’ll Catch
We’ll test a simple 2D random walk simulation. The walker starts at the center of a grid and takes random steps (left, right, up, down) until it reaches an edge.
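The bug lives in the list of allowed moves passed to the walk. Reconstructing from the description in this post, the correct and buggy move lists look roughly like this (for illustration; your variable names may differ):

# Correct version: one step in each of the four directions
moves = [[-1, 0], [1, 0], [0, -1], [0, 1]]

# Buggy version (reconstructed from the description below)
buggy_moves = [[-1, 0], [1, 0], [0, -1], [0, -1]]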
Can you spot it? [0, -1] appears twice (up), and [0, 1] (down) is missing!
Impact: The walker can move left (25%), right (25%), up (50%), but never down (0%). This creates a bias that’s hard to spot with traditional asserts but shows up dramatically in statistical tests.
I have separated my simulation code (random_walk.py) from my automated testing code (test_random_walk.py). I recommend this for you, too. The simulation does your science; the tests check whether your science has bugs.
1. Write Your Simulation (in random_walk.py)
The fill_grid(grid, moves) function simulates a random walk and returns where it ended:
import random

def fill_grid(grid, moves):
    """Fill grid with random walk starting from center."""
    center = grid.size // 2      # grid.size is the side length of the square grid
    size_1 = grid.size - 1
    x, y = center, center
    num = 0
    # keep stepping until the walker reaches any edge of the grid
    while (x != 0) and (y != 0) and (x != size_1) and (y != size_1):
        grid[x, y] += 1
        num += 1
        m = random.choice(moves)
        x += m[0]
        y += m[1]
    return num, x, y  # Steps taken and final position
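To run this on its own, you need a grid object with a .size attribute (the side length) and grid[x, y] indexing. Here is a minimal stand-in plus a usage example (the Grid class below is my own placeholder, not the one from the original code):

import numpy as np

class Grid:
    """Minimal stand-in grid: .size is the side length, cells are ints at [x, y]."""
    def __init__(self, size):
        self.size = size
        self._cells = np.zeros((size, size), dtype=int)

    def __getitem__(self, key):
        return self._cells[key]

    def __setitem__(self, key, value):
        self._cells[key] = value

grid = Grid(21)
steps, exit_x, exit_y = fill_grid(grid, [[-1, 0], [1, 0], [0, -1], [0, 1]])
print(f"{steps} steps, exited at ({exit_x}, {exit_y})")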
2. Write Your Tests (in test_random_walk.py)
The fuzzy_assert_proportion() method performs Bayesian hypothesis testing to validate that observed proportions match expectations:
from vivarium_testing_utils import FuzzyChecker

# Example: Validate that ~25% of walks exit at left edge
FuzzyChecker().fuzzy_assert_proportion(
    observed_numerator=254,       # 254 walks exited left
    observed_denominator=1000,    # Out of 1000 total walks
    target_proportion=0.25,       # We expect 25%
)
The buggy random walk exits left only 3% of the time (expected 25%):
pytest test_random_walk.py::test_buggy_version_catches_exit_bias -v
AssertionError: buggy_left_exit_proportion value 0.03 is significantly less than expected, bayes factor = 1.37e+79
That’s astronomically decisive evidence of a bug. The buggy version can move up twice but never down, so 94.6% of walks exit at the top edge, dramatically reducing other exits.
The test file includes:
One test for the correct version (validates exit edge proportions)
One test for the buggy version (demonstrates catching the bug)
Examples of using fuzzy_assert_proportion() with exit locations
Run the tests with:
pytest test_random_walk.py
Adapting This for Your Own Work
Ready to use fuzzy checking in your own spatial simulations? Here’s how:
Step 1: Install the Package
pip install vivarium_testing_utils pytest
Step 2: Identify Statistical Properties
What should your simulation do in aggregate?
Agent-based models: “30% of agents should be in state A”
Monte Carlo: “Average outcome should be between X and Y”
Spatial sims: “Density should be symmetric when flipped horizontally or vertically”
Step 3: Write Fuzzy Tests
import pytest
from vivarium_testing_utils import FuzzyChecker

@pytest.fixture(scope="session")
def fuzzy_checker():
    checker = FuzzyChecker()
    yield checker
    checker.save_diagnostic_output("./diagnostics")  # this pattern saves a csv for further inspection

def test_my_property(fuzzy_checker):
    # Run your simulation many times
    successes = 0
    total = 0
    for seed in range(1000):
        result = my_simulation(seed)
        if condition_met(result):
            successes += 1
        total += 1
    # Validate with Bayesian inference
    fuzzy_checker.fuzzy_assert_proportion(
        observed_numerator=successes,
        observed_denominator=total,
        target_proportion=0.30,  # Your expected proportion
        name="my_property_validation",
    )
Step 4: Tune Sample Sizes
Small samples (n < 100): Tests might be inconclusive
Medium samples (n = 100-1000): Good for most properties
Large samples (n > 1000): High power to detect subtle bugs
If you get “inconclusive” warnings, increase your number of simulation runs.
A Challenge: Can You Find the Bug With Even Simpler Observations?
The tests above observe exit locations. But can you detect the bug using only the grid visit counts?
This is perhaps Greg Wilson’s original challenge: find a statistical property of the grid itself that differs between correct and buggy versions.
Some ideas to explore:
Does the distribution of visits differ between quadrants?
Are edge cells visited at different rates?
Does the center-to-edge gradient change?
What about the variance in visit counts?
Can you detect the bias without even tracking final positions?
Try implementing a test that catches the bug using only the grid object after the walk (fill_grid fills it in place).
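Here is one possible starting point, as a sketch under my own assumptions: it reuses the stand-in Grid class and the reconstructed moves lists from above, and it treats individual cell visits as if they were independent (they are not quite, since visits within one walk are correlated), so read its Bayes factor as a rough guide.

import random

def test_vertical_visit_symmetry(fuzzy_checker):
    # With an unbiased walker, visits above and below the center row should
    # balance out, so about half of the off-center visits land on the "up" side.
    size = 21
    center = size // 2
    up_visits = 0
    off_center_visits = 0
    for seed in range(1000):
        random.seed(seed)
        grid = Grid(size)
        fill_grid(grid, moves)  # swap in buggy_moves to watch this test fail
        for x in range(size):
            for y in range(size):
                if y < center:  # the "up" side, since [0, -1] decreases y
                    up_visits += int(grid[x, y])
                if y != center:
                    off_center_visits += int(grid[x, y])
    fuzzy_checker.fuzzy_assert_proportion(
        observed_numerator=up_visits,
        observed_denominator=off_center_visits,
        target_proportion=0.5,
        name="vertical_visit_symmetry",
    )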
Additional Challenges: Deepen Your Understanding
Ready to experiment? Try these exercises to build intuition about fuzzy checking:
1. Sample Size Exploration
Reduce num_runs from 1000 to 100 in the directional balance test.
What happens to the Bayes factors?
Do tests become inconclusive?
How many runs do you need for decisive evidence?
2. Create a Subtle Bug
Modify the moves list to this alternative buggy version: [[-1, 0], [1, 0], [1, 0], [0, -1], [0, 1]] (right appears twice here instead of up; since the list still contains all four directions, the walk has some chance to exit from each side).
Does fuzzy checking catch this subtler bias?
How does the Bayes factor compare to the up/down bug?
What does this reveal about detection power?
3. Validate New Properties
Write a new test that validates:
The center cell is visited most often
The walk forms a roughly circular distribution
The total path length scales as (grid size)²
Hint: For the center cell test, compare grid[center, center] to the average of edge cells using fuzzy_assert_proportion().
Testing stochastic simulations doesn’t have to rely on arbitrary thresholds or manual inspection. Bayesian fuzzy checking provides a rigorous, principled approach:
✅ Quantifies evidence with Bayes factors
✅ Expresses uncertainty naturally
✅ Catches subtle bugs that traditional tests may miss
The vivarium_testing_utils package makes this approach accessible with a simple, clean API. Whether you’re testing random walks, agent-based models, or Monte Carlo simulations, fuzzy checking helps you validate statistical properties with confidence.
What About More Complex Simulations?
This tutorial used a simple random walk where tracking direction counts was straightforward. But what about more complex spatial processes?
Greg Wilson’s blog post includes another example: invasion percolation, where a “filled” region grows by randomly selecting neighboring cells to fill next. The grid patterns are much more complex than a random walk.
How would you test that? What statistical properties would you validate? How would you instrument the code to observe the right quantities?
This tutorial was created to demonstrate practical statistical validation for spatial simulations. The fuzzy checking methodology was developed at IHME for validation and verification of complex health simulations.
I’m still settling back into blogging as a custom, so perhaps that is why it has taken me six months to think of announcing our new Python package here! Without further ado, let me introduce you to pseudopeople.
It is a Python package that generates realistic simulated data about a fictional United States population, designed for use in testing entity resolution methods or other data science algorithms at scale.
To see it for yourself, here is a three-line quickstart, suitable for use in Google Colab or a Jupyter notebook:
!pip install pseudopeople
import pseudopeople as psp
psp.generate_decennial_census()
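That last call returns a pandas DataFrame of simulated census responses (so far as I can attest from the versions I have used), so a natural next step is to poke at it with the usual tools:

df = psp.generate_decennial_census()
df.head()  # simulated people, with realistic noise in names, ages, and addresses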
It has been the thing that kept me too busy to blog for the last year. But it did generate some aesthetically pleasing figures for test purposes, as well as some population health results of interest. More details to come.