Tag Archives: python

The Douglass-Truth Library Has How Unlikely a Name? A Little Simulation in Python

There are lovely libraries near my house.  Three, all in different directions.  I have spent hours in each, especially before my kids were in school.  This is a story about the one called the Douglass-Truth Library, the surprising way it got that name, and a computer simulation inspired by that surprise.  Maybe it will surprise you, too.

The Douglass-Truth Library.  Its name stands out from its peers. They have names like “Beacon Hill Library” and “Central Library”. Most Seattle libraries are named to match their neighborhoods.

Not the Douglass-Truth Library.  We just celebrated the 50th anniversary of its renaming. It was renamed in 1975 to honor the abolitionists Frederick Douglass and Sojourner Truth.  And how were these particular leaders selected for this honor?  A community vote!

Image

Back in the early 1970s, community leaders in a neighborhood that had become predominantly African American thought their library deserved a name that reflected the community. They organized a ballot listing ten distinguished Black Americans and invited the neighborhood to vote.

The ballot clearly says “vote for one”.

How did the library get named after two?

A tie vote! Out of a ten-person race.

That is so unlikely. Is it so unlikely? How unlikely?

When I slow down to think about it, I think it must depend on how many votes there were.  One hundred voters tying does not seem likely, but it seems more likely than ten thousand voters ending in a tie.  How many ballots were cast?  I could not find any documentation of this.  But when I was at the renaming anniversary party last December, someone remembered: around 2,000 people voted.

So now can we figure out how unlikely?

Almost.  I used to love simplifying probability equations and making approximations for these sorts of formulas, and if I were writing this blog 25 years ago, it would be all about Stirling’s approximation and how with pencil and paper we could puzzle this out.  Stirling’s approximation shows that the tie probability scales like $\sqrt{\frac{1}{N}}$, which means 2,000 voters doesn’t give you 1-in-2,000 odds; it’s more like 1/45. Ties are rarer with more voters, but not vanishingly rare.
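For anyone who wants to check the pencil-and-paper scaling, here is a minimal sketch under a simplifying assumption that is mine, not the library’s history: a two-candidate race where every voter flips a fair coin. The function names are made up for illustration. The exact tie probability is a central binomial term, and Stirling’s approximation turns it into $\sqrt{2/(\pi N)}$, so the constant in front of $\sqrt{1/N}$ is about 0.8 and the two-candidate figure for 2,000 voters comes out closer to 1 in 56.

```python
import math

def exact_two_way_tie_probability(n_votes):
    """Exact P(tie) when each voter picks one of two candidates with a fair coin flip."""
    if n_votes % 2:
        return 0.0  # an odd number of ballots cannot split evenly
    # Python integers are arbitrary precision, so this ratio is computed without overflow
    return math.comb(n_votes, n_votes // 2) / 2 ** n_votes

def stirling_two_way_tie_probability(n_votes):
    """Stirling's approximation to the same quantity: sqrt(2 / (pi * N))."""
    return math.sqrt(2 / (math.pi * n_votes))

for n_votes in (100, 2_000, 10_000):
    exact = exact_two_way_tie_probability(n_votes)
    approx = stirling_two_way_tie_probability(n_votes)
    print(f"N={n_votes}: exact={exact:.5f}, Stirling={approx:.5f}, about 1 in {1 / exact:.0f}")
```

This covers only the two-candidate coin-flip case; a ten-person race needs a different calculation, or a simulation.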

Lately, though, I’ve become more of a computer-simulation type.  Pencil and paper is fine, but random number sampling is even more fun.

(Simulation code on GitHub | Direct link to run that code in Colab)

From here, I’ll give just a sketch, so you can have fun the way you like to have fun.

Assumptions: maybe the simplest way to start is if everyone is equally liked and voters are simply choosing randomly (no strategic behaviors).  But would it be more likely to have a first-place tie if two front runners were each getting nearly 50% of the vote?

Simulation: With simplifying assumptions, I can simulate one person’s vote with np.random.choice(candidate_list, p=vote_probabilities), but it is faster to simulate everyone’s vote simultaneously.  Then I need to tally the votes and see if there is a tie for first place.

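import numpy as np
import pandas as pd

def simulate_election(n_votes, candidate_probs):
    # Draw every voter's choice at once; each vote is a candidate index
    votes = np.random.choice(len(candidate_probs), size=n_votes, p=candidate_probs)
    tallies = pd.Series(votes).value_counts()  # sorted from most to fewest votes
    return len(tallies) > 1 and tallies.iloc[0] == tallies.iloc[1]  # True if tie for first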
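One simulated election only says whether that one election tied. To get a probability, one option is to repeat the election many times and take the fraction that end in a first-place tie. The sketch below is my own variation, not the code from the GitHub link: it draws whole vote tallies with a multinomial sample instead of sampling individual ballots, and the names estimate_tie_probability, n_elections, and seed are made up for illustration.

```python
import numpy as np

def estimate_tie_probability(n_votes, candidate_probs, n_elections=20_000, seed=12345):
    """Monte Carlo estimate of the chance that the top two vote counts are equal."""
    rng = np.random.default_rng(seed)
    # Each row is one simulated election; each column is one candidate's vote count
    tallies = rng.multinomial(n_votes, candidate_probs, size=n_elections)
    # Sort each row so the last two columns hold the runner-up and winner counts
    top_two = np.sort(tallies, axis=1)[:, -2:]
    ties = top_two[:, 0] == top_two[:, 1]
    return ties.mean()

# Assumed scenario: ten equally liked candidates and 2,000 voters
equal_probs = np.full(10, 0.1)
print(estimate_tie_probability(2_000, equal_probs))
```

Changing candidate_probs to favor two front runners, or one clear favorite, is a quick way to explore the other scenarios.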

Results: If all ten candidates are equally liked, there’s roughly a 1 in 20 chance of a first-place tie (with 2,000 voters). The chance drops as turnout grows, but stays above 3% even with 5,000 voters.

If there are two front runners it is more like one in 100, and if one candidate has a comfortable majority, then a tie is extremely rare. 

Conclusion: The math is fun. The computer simulation is fun. But the real breakthrough happened 50 years ago when the votes came in tied and some genius invented a new option: name the library after both heroes.

Filed under simulation

AI in Epi 554 (part 2)

Following up on the general guidance I offered Epi 554 last week, this week I tried to get specific about how to use AI assistance in debugging. I think there is room for improvement, but I’m going to get it out to you, and maybe you’ll tell me how to improve.

Debugging 1: When the code fails

Maybe I’ve told you that AI is BS.  But that doesn’t make it useless.

Useful for debugging: use it so that you don’t stay stuck for long
example: error in code from Lab 2, shown below:

Image

(if you know how to fix this… don’t worry you’ll have an error msg that is less obvi soon; and if you are above-average in debugging… this approach might make you worse!)

Teach an advanced AI technique called “Prompt Engineering”. Example: paste error, type “why?” (aside: be polite in your prompts, for a better world and for better answers).  Let’s not go through the answer in detail; I want to focus on how and when to ask.

  • You can be more verbose, e.g. you can explain what you were trying to do, paste your error, and ask why you got this error and if it has ideas on how to fix the error that you got.
  • You can also use the first precept of prompt engineering: tell AI who you want it to be.  e.g. “You are a friendly and expert teaching assistant.” or “You are a busy and distracted professor.” (?) 
  • Customize as preferred, e.g. if appropriate, you can start with “you answer in English, but you know that I speak Spanish as a native and English is not my first language.”
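The bullets above can be sketched as a tiny helper that assembles a debugging prompt from its parts. This is only an illustration: build_debug_prompt and its arguments are hypothetical names, the example error is made up, and nothing here calls an AI service. It just produces text to paste into whichever chat tool you use.

```python
def build_debug_prompt(goal, code, error,
                       role="You are a friendly and expert teaching assistant.",
                       language_note=""):
    """Assemble a polite, verbose debugging prompt as one block of text."""
    parts = [role]  # first precept: tell the AI who you want it to be
    if language_note:
        parts.append(language_note)  # optional customization, e.g. about your native language
    parts.append(f"I was trying to: {goal}")
    parts.append("Here is my code:\n" + code)
    parts.append("Here is the error I got:\n" + error)
    parts.append("Could you please explain why I got this error "
                 "and suggest how to fix it? Thank you!")
    return "\n\n".join(parts)

# Made-up example inputs, not the error from Lab 2
prompt = build_debug_prompt(
    goal="compute the mean of a column",
    code="df['age'].mean()",
    error="NameError: name 'df' is not defined",
)
print(prompt)
```

If the first answer does not fix things, the same pieces (what you tried, the new code, the new error) make a reasonable follow-up message.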

You try: here is an error to work with [[I didn’t actually come up with this]], and an answer that still doesn’t fix it.  What might you ask next?

Lauren Wilner (Epi 560 TA) says: For debugging, I have found that ChatGPT is mediocre. I give it the code I ran and the error I got, generally, but I find that often it gives me either (1) code that has the same error again or (2) new code that has a different error.

Summary: AI is BS, not useless; Useful for debugging; don’t stay stuck for long, prompt with code example, and polite request for help. Keep convo going if necess.  It is just imitating the way words often hang together in online text, like Stack Overflow and Cross Validated, but if it gets your code to run… then you have running code!

Filed under education

AI Assistance for Pseudopeople: GPTs for configuration dicts

Over the last year, I’ve been hard at work making simulated data. I love making simulated data, and I finally put a minimal blog about it up (https://healthyalgorithms.com/2023/11/19/introducing-pseudopeople-simulated-person-data-in-python/)

I have a persistent challenge when I use pseudopeople in my work: configuring the noise requires a deeply nested python dictionary, and I can never remember what goes in it.

After reading a recent dispatch from Simon Willison, I thought that maybe the new “GPTs” affordances from OpenAI could help me deal with this. I’m very optimistic about the potential of AI assistance for data science work.

And with just a short time of messing around, I have something I’m pretty happy with:
https://chat.openai.com/g/g-7e9Dfx1fv-pseudopeople-config-wizard

Image

If you try it out and want to confirm that your custom config works, here is a Google Colab that you can use to test it out: https://colab.research.google.com/drive/1UG38OZigDwBy4zNJHo5fZ752LdalQ7Bw?usp=sharing

Filed under census, software engineering

Introducing Pseudopeople: simulated person data in python

Image

I’m still settling back into blogging as a custom, so perhaps that is why it has taken me six months to think of announcing our new python package here! Without further ado, let me introduce you to pseudopeople.

It is a Python package that generates realistic simulated data about a fictional United States population, designed for use in testing entity resolution methods or other data science algorithms at scale.

To see it for yourself, here is a three-line quickstart, suitable for using in a Google Colab or a Jupyter Notebook:

!pip install pseudopeople

import pseudopeople as psp
psp.generate_decennial_census()

Enjoy!

Filed under census, simulation

One more IDV in Python approach

https://plot.ly/dash/
https://community.plot.ly/c/dash
https://github.com/plotly/dash
https://plot.ly/dash/getting-started

Filed under Uncategorized

It’s 2018, how to IDV in Python?

I’ve got a fun little viz that I need to demo for Important People (IP) in early March [editor’s note: still not done… that deadline was highly optimistic!]. How to do it?

In Python? Sure. In a Jupyter notebook? Maybe. With Matplotlib? Probably not… at least I’d better have a look at the state of the alternatives.

Did I mention that it is essential for this viz to be *interactive*? It needs to allow the Important People to explore the predictions of some ML model, or at least allow me to explore them while they call out how to explore.

Years ago, I attempted to designate a particular plot the “hello, world” of data viz. Remember that? I think we should extend it to a hello world of interactive data viz. Maybe just choosing the number of digits is enough. Or should it follow the visual information seeking mantra? But “hello, world” cannot be too complicated.

yhat?

Altair
https://altair-viz.github.io
https://github.com/altair-viz/altair_widgets/blob/master/examples/Iris.ipynb
http://pbpython.com/altair-intro.html

Bokeh
https://bokeh.pydata.org/en/latest/docs/gallery.html#gallery

Interactive Data Visualization using Bokeh (in Python)

Click to access Python_Bokeh_Cheat_Sheet.pdf

https://www.datacamp.com/courses/interactive-data-visualization-with-bokeh/
https://www.datacamp.com/community/blog/bokeh-cheat-sheet-python
https://demo.bokehplots.com/apps/movies

A Dramatic Tour through Python’s Data Visualization Landscape (including ggplot and Altair)

Filed under Uncategorized

Love to Software Carpentry

I have been a fan of this educational offering for a while now, and I have been mentioning that for a while now, too. But I am moved to say it again, because I’m planning a four-session Intro to Python training for aspiring Health Metrics Scientists, and the SWC curriculum is making that so easy.  It could have been so hard. ❤ u SWC.

Filed under Uncategorized

NLP in Python: n-gram language model for Verbal Autopsy responses

This turned out to be a bit of a downer, but it was a good learning exercise, and the general approach will be useful for generating test data on a different project.  See notebook here.

Filed under Uncategorized

Introducing Vivarium (again)

Just before that year of not writing anything here, I mentioned that I have a new microsimulation platform, and it is called Vivarium.  That is still true, and now it even has some documentation: https://vivarium.readthedocs.io/en/latest/ 

It has been the thing that kept me too busy to blog for the last year.  But it did generate some aesthetically pleasing figures for test purposes, as well as some population health results of interest.  More details to come.

Filed under simulation, Uncategorized

Righter signatures in Jupyter

Did you know you can change the signature of functions dynamically in Python 3? It is a bit nasty, but maybe it will make things look nicer for vivarium users.

Attempt: https://github.com/ihmeuw/vivarium/pull/2
Docs: https://docs.python.org/3/library/inspect.html#inspect.Signature
SO question that got me started: https://stackoverflow.com/a/33112180/1935494
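Here is a minimal sketch of the general trick, not necessarily what the vivarium pull request does: inspect.signature honors a function’s __signature__ attribute, so assigning a hand-built inspect.Signature changes what help() and Jupyter tooltips display. The decorator name with_signature and the parameter names are made up for illustration.

```python
import inspect

def with_signature(parameter_names):
    """Decorator that advertises the given parameter names as the function's signature."""
    def decorator(func):
        params = [
            inspect.Parameter(name, inspect.Parameter.POSITIONAL_OR_KEYWORD)
            for name in parameter_names
        ]
        # inspect.signature() checks __signature__ before inspecting the function itself
        func.__signature__ = inspect.Signature(params)
        return func
    return decorator

@with_signature(["age", "sex", "year"])
def lookup(*args, **kwargs):
    # The real call behavior is unchanged; only the displayed signature differs
    return args, kwargs

print(inspect.signature(lookup))  # (age, sex, year)
```

The nasty part is that the displayed signature is purely cosmetic: lookup still accepts any arguments, so the advertised names and the actual behavior can drift apart.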

Filed under Uncategorized