Note: I have not published blog posts about my academic papers over the past few years. To ensure that my blog contains a more comprehensive record of my published papers and to surface them for folks who missed them, I will periodically (re) publish blog posts about some “older” published projects. This post draws material from a previously published post by Kaylea Champion on the Community Data Science Blog.
Taboo subjects—such as sexuality and mental health—are as important to discuss as they are difficult to raise in conversation. Although many people turn to online resources for information on taboo subjects, censorship and low-quality information are common in search results. In two papers I recently published at CSCW—both led by Kaylea Champion—we presented a series of analyses showing how taboo shapes the process of collaborative knowledge building on English Wikipedia.
The first study is a quantitative analysis showing that articles on taboo subjects are much more popular and are the subject of more vandalism than articles on non-taboo topics. In surprising news, we also found that they were edited more often and were of higher quality!
The first challenge we faced in conducting this work was identifying taboo articles. Kaylea had a brilliant idea for a new computational approach to doing so without relying on our individual intuitions about what qualifies as taboo (something we understood would be highly specific to our own culture, class, etc). Her approach was to make use of an insight from linguistics: people develop euphemisms as ways to talk about taboos (i.e., think about all the euphemisms we’ve devised for death, or sex, or menstruation, or mental health).
We used this insight to build a new machine-learning classifier based on English Wiktionary definitions. If a ‘sense’ of a word was tagged as euphemistic, we treated the words in the definition as indicators of taboo. The end result was a series of words and phrases that most powerfully differentiate taboo from non-taboo. We then did a simple match between those words and phrases and the titles of Wikipedia articles. The topics were taboo enough that we were a little uncomfortable discussing them in our meetings! We built a comparison sample of articles whose titles are words that, like our taboo articles, appear in Wiktionary definitions.
In the first paper, we used this new dataset to test a series of hypotheses about how taboo shapes collaborative production in Wikipedia. Our initial hypotheses were based on the idea that taboo information is often in high demand but that Wikipedians might be reluctant to associate their names (or usernames) with taboo topics. The result, we argued, would be articles that were in high demand but of low quality.
We found that taboo articles are thriving on Wikipedia! In summary, we found that in comparison to non-taboo articles:
- Taboo articles are more popular (as expected).
- Taboo articles receive more contributions (contrary to expectations).
- Taboo articles receive more low-quality contributions (as expected).
- Taboo articles are higher quality (contrary to expectations).
- Taboo article contributors are more likely to contribute without an account (as expected), and have less experience (as expected), but that accountholders are more likely to make themselves more identifiable by having a user page, disclosing their gender, and making themselves emailable (all three of these are contrary to expectation!).

Kaylea attempted to understand these somewhat confusing results by designing a fantastic mixed-methods analysis that sought to unpack some of the nuance missing in the quantitative analysis by delving deep into the “life histories” of four articles on English Wikipedia: two on taboo topics related to women’s anatomy (Clitoris and Menstration) and two nontaboo articles chosen for comparison (Cell membrance and Philip Pullman).
Although the findings from the analysis can be difficult to summarize succinctly (as with many qualitative studies), we showed how the taboo example articles’ success was hard-won amid real challenges and attacks. The paper describes how challenges were overcome through resilient leadership, often provided by a single dedicated individual. The paper provides a template for how taboo can be—and frequently is—overcome by dedicated Wikipedians in ways that provide useful knowledge resources in real demand.
For more details, visualizations, statistics, and more, we hope you’ll take a look at our papers, both linked below.
The full citation for the papers are: (1) Champion, Kaylea, and Benjamin Mako Hill. 2023. “Taboo and Collaborative Knowledge Production: Evidence from Wikipedia.” Proceedings of the ACM on Human-Computer Interaction 7 (CSCW2): 299:1-299:25. https://doi.org/10.1145/3610090. (2) Champion, Kaylea, and Benjamin Mako Hill. 2024. “Life Histories of Taboo Knowledge Artifacts.” Proceedings of the ACM: Human-Computer Interaction 8 (CSCW2): 505:1-505:32. https://doi.org/10.1145/3687044.
We have also released replication materials for the paper, including all the data and code used to conduct the analyses.
This blog post and the paper it describes are collaborative work by Kaylea Champion and Benjamin Mako Hill.






























