Influence of Presidents on Baby Names

Of Presidents and Babies

In 2014, Ben Blatt used Social Security Administration (SSA) data to suggest anecdotally that Americans have a habit of naming their newborns after sitting presidents. Blatt found that the number of babies (per 100,000 births) given the same name as a president increases when that politician runs for the presidency and during his tenure in the Oval Office.

For example, the relative popularity of the name “Grover” increased during the presidential election of 1884, which Grover Cleveland won, and remained higher than expected during the Cleveland administration. This pattern repeated itself, though to a lesser degree, when Grover Cleveland was elected to a second, non-consecutive term in 1892.


Quasi-Experiment

I sought to replicate the results from Blatt’s analysis with data from the same source using a quasi-experiment (the code for which can be found here). Specifically, I tested the hypothesis that the relative popularity of a president’s name is larger:

  1. During the president’s tenure than during the period preceding it, and
  2. Than the popularity of the names of the other presidents.

This quasi-experimental reformulation of Blatt’s hypothesis is a more stringent test of the idea that sitting presidents influence parents’ naming decisions, because it can show whether the influence is specific to (1) the tenure of the sitting president and (2) his particular name (as opposed to presidential first names in general).

Data

I used SSA name data from Social Security card applications for births that occurred in the United States between 1910 and 2014. (Note that these data are only available by year, whereas Blatt’s data count baby names by day.)

The analyzed data were limited to presidents who took office after the earliest year in the newborn data; that is, all presidents since, and including, Woodrow Wilson. For all presidents, first names were used instead of nicknames (e.g., James rather than Jimmy).
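The SSA distributes these counts as one plain-text file per year (yob1910.txt through yob2014.txt), with lines of the form name,sex,count. Below is a minimal loading sketch; the local names/ directory is an assumption of the sketch, while the file naming and format are the SSA’s own.

```python
import pandas as pd

# Read the SSA's yearly national name files into one DataFrame.
frames = []
for year in range(1910, 2015):
    df = pd.read_csv(f"names/yob{year}.txt", names=["name", "sex", "count"])
    df["year"] = year
    frames.append(df)
births = pd.concat(frames, ignore_index=True)
```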

Method

For each president, I counted two values:

  1. The number of babies given his name during his administration [1] (e.g., the number of babies given the name William during Bill Clinton’s administration; prez_during), and
  2. The number of babies given the name of all non-sitting presidents during the same period of time (others_during).

The ratio of these two counts (prez_during / others_during) estimates the popularity of the sitting president’s name relative to the popularity of all of the other presidents’ names during his administration.

For each president, I repeated these two counts for a period of the same length immediately preceding his administration (e.g., the number of babies given the name William during the years that preceded Bill Clinton’s two-term administration; prez_pre and others_pre).

The ratio of these two numbers (prez_pre / others_pre) captures the same relative popularity, but for the period before this name belonged to the sitting president.

If Americans have a bias to name their babies after sitting presidents, then the difference between these two ratios (prez_during / others_during - prez_pre / others_pre) should, on average, be positive. In other words, whatever the popularity advantage of a president’s name over other presidential names before he takes office, that advantage should increase during his tenure.
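To make the procedure concrete, here is a minimal sketch of the four counts and the ratio difference, reusing the births DataFrame loaded above. The presidents list is illustrative and truncated; the full script linked above is the authoritative version.

```python
presidents = [  # (first name, first year in office, last year in office)
    ("Woodrow", 1913, 1921),
    ("William", 1993, 2001),   # Bill Clinton: first name, not nickname
    ("Barack", 2009, 2014),    # newborn data end in 2014
    # ... the remaining presidents since Wilson
]

def name_count(name, start, end):
    """Babies given `name` in the years `start` through `end`, inclusive."""
    mask = (births["name"] == name) & births["year"].between(start, end)
    return births.loc[mask, "count"].sum()

diffs = []
for name, start, end in presidents:
    span = end - start + 1  # administration length in calendar years
    others = {n for n, _, _ in presidents} - {name}
    prez_during = name_count(name, start, end)
    others_during = sum(name_count(n, start, end) for n in others)
    prez_pre = name_count(name, start - span, start - 1)
    others_pre = sum(name_count(n, start - span, start - 1) for n in others)
    diffs.append(prez_during / others_during - prez_pre / others_pre)
```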

Results

Across the 17 presidents in the data, the mean difference of these ratios was 0.008 (SD = 0.02), an estimate that is not statistically reliable under a two-tailed t-test (t = 1.59, p = 0.13). Nonetheless, the ratio difference was in the expected direction for 11 of the 17 presidents (65%). The figure below shows the mean ratio difference with the standard error of the mean as its error bar.

Mean ratio difference across the 17 presidents. The error bar denotes +/- 1 standard error of the mean.
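The test itself is a one-sample, two-tailed t-test of the ratio differences against zero. A sketch with SciPy, using the diffs list computed above:

```python
from scipy import stats

# Two-tailed one-sample t-test: is the mean ratio difference nonzero?
t_stat, p_value = stats.ttest_1samp(diffs, popmean=0.0)
share_positive = sum(d > 0 for d in diffs) / len(diffs)
print(f"t = {t_stat:.2f}, p = {p_value:.2f}, "
      f"positive differences: {share_positive:.0%}")
```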

Conclusion

These results suggest that a president’s first name is about as popular with newborns before his administration as during it. However, the small sample size and the coarse temporal resolution of the data (newborn counts are not available at a daily or monthly level) may simply have made this effect difficult to detect, if it truly exists.

[1] Given the limited temporal specificity of the newborn data, I define an administration as the period from the year the president takes office to the year the president leaves office, inclusive.

Headline Readability Varies By News Outlet

Of the following two sentences, which do you think is more readable; that is, which one is easier to read and understand?

  1. Supreme Court strikes down overall political donation cap
  2. Supreme Court allows more private money in election campaigns

I don’t know about you, but I had to look up the meaning of “overall political donation cap.” In contrast, I could easily infer what “private money in election campaigns” means.

These sentences are news headlines (from the New York Times and CNN, respectively) about the Supreme Court’s decision in McCutcheon v. FEC. In reading the news coverage of this decision, I was struck by these two headlines: they communicate the same news, yet they vary starkly in readability!

These two headlines raise the question: do different news outlets write headlines that vary systematically in their readability? For example, are headlines from the New York Times less readable, on average, than CNN’s? Or is the readability difference we see between these two McCutcheon headlines a product of chance?

To try to answer this question, we need data in the form of news headlines.

COLLECTING HEADLINES FROM GOOGLE NEWS


Thankfully, Google News offers plenty of such data. I wrote a Python script to scrape headlines and the names of their outlets from this website roughly 20 times a day between April 14 and May 2, 2014.
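In outline, the script fetched the page periodically and pulled out headline/outlet pairs. The sketch below conveys the idea; the CSS selectors are placeholders, since Google News has changed its markup since 2014 and the original script’s selectors will differ.

```python
import requests
from bs4 import BeautifulSoup

# Fetch the Google News front page and extract (headline, outlet) pairs.
response = requests.get(
    "https://news.google.com/",
    headers={"User-Agent": "Mozilla/5.0"},  # some servers reject bare clients
    timeout=10,
)
soup = BeautifulSoup(response.text, "html.parser")

headlines = []
for article in soup.select("article"):      # placeholder selector
    title = article.select_one("h3 a")      # placeholder selector
    outlet = article.select_one(".source")  # placeholder selector
    if title and outlet:
        headlines.append((title.get_text(strip=True),
                          outlet.get_text(strip=True)))
```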

After cleaning the results (text data are messy!), data collection yielded 9,289 unique headlines from 928 different outlets. To focus on outlets that provided a decent number of headlines, I limited the analysis to the 4,476 unique headlines from the 20 outlets that each contributed at least 100 headlines.

(The data as well as the code used to collect, analyze, and visualize them can be found in the GitHub repository google-news.)

MEASURING READABILITY

How do you measure the readability of a piece of text? One famous metric is the Flesch-Kincaid Grade Level. It uses word and sentence length to estimate the years of schooling a reader needs to read and understand a piece of text: the longer the words and sentences, the more schooling is needed.

For example, the New York Times and CNN headlines above have grade levels of 11.1 and 8.9, respectively. These scores suggest that 11th graders should be able to read both headlines, whereas 9th graders may struggle with the New York Times headline.
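The formula itself is simple: for a text with some number of words, sentences, and syllables, the grade level is 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59. Here is a self-contained sketch for single-sentence texts such as headlines; note that the naive syllable heuristic only approximates the dictionary-based counts behind the scores quoted above.

```python
import re

def count_syllables(word):
    """Crude heuristic: count vowel groups, minus a trailing silent 'e'."""
    word = re.sub(r"e$", "", word.lower())
    return max(1, len(re.findall(r"[aeiouy]+", word)))

def fk_grade(headline):
    """Flesch-Kincaid Grade Level, treating the headline as one sentence."""
    words = re.findall(r"[A-Za-z']+", headline)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) + 11.8 * (syllables / len(words)) - 15.59

print(fk_grade("Supreme Court strikes down overall political donation cap"))
```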

DISTRIBUTION OF HEADLINE READABILITY

Analysis of our Google News data revealed that news headlines are generally easy to read; with a mean grade level of 7.7 (SD = 4.3), 8th graders can read most of them. The graph below (interactive version here) shows how headline readability is distributed around this average grade level.

google_news_2

Headlines binned by Flesch-Kincaid Grade Level.

Elementary school graduates (grade levels below 6) can read about a third of the headlines (35.9%). Middle school graduates (grade levels below 9) can handle almost two thirds of them (65.2%). Finally, high school graduates should be able to read and understand 9 out of 10 headlines on Google News.
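Computing these cumulative shares is straightforward once every headline has a score. A sketch, reusing fk_grade and the headlines list from the earlier sketches:

```python
# Cumulative share of headlines readable with a given level of schooling.
grades = [fk_grade(title) for title, _ in headlines]

for label, cutoff in [("elementary school", 6),
                      ("middle school", 9),
                      ("high school", 12)]:
    share = sum(g < cutoff for g in grades) / len(grades)
    print(f"{label:>17}: {share:.1%} of headlines")
```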

AVERAGE HEADLINE READABILITY BY NEWS OUTLET

To answer the question that motivated this project, we can compare headline readability across the 20 different outlets. The graph below (interactive version here) shows that different news outlets write headlines that vary quite systematically in their readability.

google_news

Mean Flesch-Kincaid Grade Level by news outlet. Error bars denote +/- 1 standard error.

Voice of America wrote the least readable headlines, requiring almost a 10th grade education to read them. It was followed somewhat distantly by Fox News and BBC News, with grade levels around 8.5. The Fox News result is surprising given that its audience tends to be less educated than those of other outlets (Pew Research, 2012).

The average grade levels of the next 14 outlets ranged from 8.2 (Los Angeles Times) to 7.2 (Businessweek). That is, a single year of education spans the differences in average headline readability among 70% of the outlets in the sample. Most of these differences, however, are too small and variable to be considered statistically reliable.

On the lower end of the spectrum, ESPN had the most readable headlines, requiring only about five and a half years of education. It was followed by two outlets with headlines of roughly 6th grade readability: USA Today (6.3) and ABC News (6.8).
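For reference, the per-outlet averages and standard errors behind the figure can be computed with a single groupby, assuming one (headline, outlet) pair per row as in the sketches above:

```python
import pandas as pd

# Mean grade level, standard error, and headline count per outlet,
# keeping only outlets with at least 100 headlines.
df = pd.DataFrame({"outlet": [outlet for _, outlet in headlines],
                   "grade": grades})
by_outlet = (df.groupby("outlet")["grade"]
               .agg(mean="mean", sem="sem", n="size")
               .query("n >= 100")
               .sort_values("mean", ascending=False))
print(by_outlet)
```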

CONCLUSION

News headlines are relatively easy to read: a high school graduate can handle 9 in 10. However, headline readability varies strikingly between outlets. The least and most readable outlets differ in average grade level by more than four years: readers with about six years of schooling can read ESPN headlines, but it takes nearly a 10th-grade education to read Voice of America’s.

We began by comparing two headlines: one from the New York Times (grade level 11.1) and another from CNN (grade level 8.9). If our data collection had ended there, we would have incorrectly concluded that New York Times headlines are harder to read than CNN’s.

As our systematic data collection from Google News showed, these two headlines do not reflect the overall trend in readability between their outlets (on average, CNN headlines are harder to read than the NYT’s).

The discrepancy between these two conclusions highlights how anecdotal evidence, if untested with the systematic collection of data, can skew our understanding of how the world really works.