<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="3.9.3">Jekyll</generator><link href="https://whiting.me/feed.xml" rel="self" type="application/atom+xml" /><link href="https://whiting.me/" rel="alternate" type="text/html" hreflang="en" /><updated>2026-03-06T17:04:31+00:00</updated><id>https://whiting.me/feed.xml</id><title type="html">Mark E. Whiting</title><subtitle>Mark is Chief Scientist at Pareto and research fellow at University of Pennsylvania</subtitle><entry><title type="html">Quantifying common sense</title><link href="https://whiting.me/Quantifying-common-sense/" rel="alternate" type="text/html" title="Quantifying common sense" /><published>2024-01-16T00:00:00+00:00</published><updated>2024-01-16T00:00:00+00:00</updated><id>https://whiting.me/Quantifying%20common%20sense</id><content type="html" xml:base="https://whiting.me/Quantifying-common-sense/">&lt;p&gt;Together with &lt;a href=&quot;https://css.seas.upenn.edu/people/duncan-watts/&quot;&gt;Duncan Watts&lt;/a&gt; and others, I have developed an empirical method for quantifying common sense at the level of an individual or a collective.&lt;/p&gt;

&lt;p&gt;Our &lt;a href=&quot;https://www.pnas.org/doi/10.1073/pnas.2309535121&quot;&gt;paper has been published at PNAS&lt;/a&gt;, and the significance statement from the article is as follows:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Common sense, while often portrayed as universal, is paradoxically also often claimed not to exist. Here, we resolve this puzzling situation by introducing a formal methodology to empirically quantify common sense both at individual and collective levels. We then demonstrate the method with a dataset involving human raters evaluating claims. We show that common sense varies considerably across types of claims but aligns most closely with plainly worded, factual claims about physical reality; in contrast, it does not vary much across different types of people. We also find limited presence of collective common sense, undermining universalist claims and supporting skeptics. Finally, we argue that quantifying common sense is useful both for applications in social science and AI.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The work has received attention from a number of sources, which Altmetric summarizes nicely:&lt;/p&gt;

&lt;div class=&quot;altmetric-embed altmetric-box&quot; data-badge-type=&quot;medium-donut&quot; data-badge-details=&quot;right&quot; data-altmetric-id=&quot;158425670&quot;&gt;&lt;/div&gt;</content><author><name></name></author><category term="commonsense" /><category term="experiments" /><category term="meta-scientific" /><category term="methods" /><category term="Penn" /><summary type="html">Together with Duncan Watts and others, I have developed an empirical method for quantifying common sense at the level of an individual or a collective.</summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://whiting.me/commonsense.png" /><media:content medium="image" url="https://whiting.me/commonsense.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Integrative experiments</title><link href="https://whiting.me/Integrative-experiments/" rel="alternate" type="text/html" title="Integrative experiments" /><published>2022-12-21T00:00:00+00:00</published><updated>2022-12-21T00:00:00+00:00</updated><id>https://whiting.me/Integrative%20experiments</id><content type="html" xml:base="https://whiting.me/Integrative-experiments/">&lt;p&gt;Together with &lt;a href=&quot;http://amaatouq.io/&quot;&gt;Abdullah Almaatouq&lt;/a&gt;, &lt;a href=&quot;https://css.seas.upenn.edu/people/duncan-watts/&quot;&gt;Duncan Watts&lt;/a&gt; and others, we have been developing and using a different approach to running experiments. In &lt;a href=&quot;https://doi.org/10.1017/S0140525X22002874&quot;&gt;Beyond Playing 20 Questions with Nature: Integrative Experiment Design in the Social and Behavioral Sciences&lt;/a&gt; we write:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;The dominant paradigm of experiments in the social and behavioral sciences views an experiment as a test of a theory, where the theory is assumed to generalize beyond the experiment’s specific conditions. According to this view, which Alan Newell once characterized as “playing twenty questions with nature,” theory is advanced one experiment at a time, and the integration of disparate findings is assumed to happen via the scientific publishing process. In this article, we argue that the process of integration is at best inefficient, and at worst it does not, in fact, occur. We further show that the challenge of integration cannot be adequately addressed by recently proposed reforms that focus on the reliability and replicability of individual findings, nor simply by conducting more or larger experiments. Rather, the problem arises from the imprecise nature of social and behavioral theories and, consequently, a lack of commensurability across experiments conducted under different conditions. Therefore, researchers must fundamentally rethink how they design experiments and how the experiments relate to theory. We specifically describe an alternative framework, integrative experiment design, which intrinsically promotes commensurability and continuous integration of knowledge. In this paradigm, researchers explicitly map the design space of possible experiments associated with a given research question, embracing many potentially relevant theories rather than focusing on just one. The researchers then iteratively generate theories and test them with experiments explicitly sampled from the design space, allowing results to be integrated across experiments. Given recent methodological and technological developments, we conclude that this approach is feasible and would generate more-reliable, more-cumulative empirical and theoretical knowledge than the current paradigm—and with far greater efficiency.&lt;/p&gt;
&lt;/blockquote&gt;</content><author><name></name></author><category term="integrative" /><category term="experiments" /><category term="meta-scientific" /><category term="methods" /><category term="Penn" /><category term="MIT" /><summary type="html">Together with Abdullah Almaatouq, Duncan Watts and others, we have been developing and using a different approach to running experiments. In Beyond Playing 20 Questions with Nature: Integrative Experiment Design in the Social and Behavioral Sciences we write:</summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://whiting.me/integrative%20experiments.png" /><media:content medium="image" url="https://whiting.me/integrative%20experiments.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">COVID dashboards</title><link href="https://whiting.me/COVID-dashboards/" rel="alternate" type="text/html" title="COVID dashboards" /><published>2021-08-15T00:00:00+00:00</published><updated>2021-08-15T00:00:00+00:00</updated><id>https://whiting.me/COVID-dashboards</id><content type="html" xml:base="https://whiting.me/COVID-dashboards/">&lt;p&gt;Together with &lt;a href=&quot;https://css.seas.upenn.edu/people/duncan-watts/&quot;&gt;Duncan Watts&lt;/a&gt; and others on the team at the &lt;a href=&quot;https://css.seas.upenn.edu&quot;&gt;CSSLab&lt;/a&gt; at Penn, and in conjunction with the city of Philadelphia, we built a collection of interactive data dashboards that visually summarize human mobility patterns over time and space for a number of cities, starting with Philadelphia, along with highlighting potentially relevant demographic correlates. The dashboards are available at &lt;a href=&quot;https://covid.seas.upenn.edu&quot;&gt;covid.seas.upenn.edu&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The dashboards are built in &lt;a href=&quot;https://observablehq.com/collection/@wattslab/covid-dashboards&quot;&gt;Observable&lt;/a&gt;. The data are a proprietary combination of cell phone GPS data, demographic data derived from the American Community Survey, and COVID-19 caseload data from the New York Times.&lt;/p&gt;</content><author><name></name></author><category term="COVID-19" /><category term="dashboard" /><category term="observable" /><category term="data" /><category term="visualization" /><category term="Penn" /><summary type="html">Together with Duncan Watts and others on the team at the CSSLab at Penn, and in conjunction with the city of Philadelphia, we built a collection of interactive data dashboards that visually summarize human mobility patterns over time and space for a number of cities, starting with Philadelphia, along with highlighting potentially relevant demographic correlates. The dashboards are available at covid.seas.upenn.edu.</summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://whiting.me/COVID%20dahsboard.jpg" /><media:content medium="image" url="https://whiting.me/COVID%20dahsboard.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Team fracture</title><link href="https://whiting.me/team-fracture/" rel="alternate" type="text/html" title="Team fracture" /><published>2019-11-12T00:00:00+00:00</published><updated>2019-11-12T00:00:00+00:00</updated><id>https://whiting.me/team-fracture</id><content type="html" xml:base="https://whiting.me/team-fracture/">&lt;p&gt;While at &lt;a href=&quot;https://hci.stanford.edu&quot;&gt;Stanford&lt;/a&gt;, &lt;a href=&quot;https://hci.stanford.edu/msb/&quot;&gt;Michael Bernstein&lt;/a&gt; and I, with the help of many great undergrads, designed an experiment to study the consistency of team fracture, a notion we defined as a loss of team viability so severe that the team no longer wants to work together &lt;a class=&quot;button 
smallCaps&quot; href=&quot;https://hci.stanford.edu/publications/2019/fracture/fracture-cscw2019.pdf&quot;&gt;CSCW’19&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We asked, &lt;strong&gt;was a problematic team always doomed to frustration, or could it have ended another way?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To study this, we introduced an online experiment that reconvenes the same team repeatedly, without members realizing that they have worked together before, enabling us to temporarily erase previous team dynamics. We found that reconvened teams are strikingly polarized by task in the consistency of their fracture outcomes. On a creative task, teams might as well have been a completely different set of people: the same teams changed their fracture outcomes at a rate no better than chance. On a cognitive conflict task and on an intellective task, by contrast, teams replayed the same dynamics without realizing it, rarely changing their fracture outcomes. Our results indicate that, for some tasks, team fracture can be strongly influenced by interactions in the first moments of a team’s collaboration, and that interventions targeting these initial moments may be critical to scaffolding long-lasting teams.&lt;/p&gt;

&lt;p&gt;In follow-up work, we studied the predictability of team fracture &lt;a class=&quot;button smallCaps&quot; href=&quot;https://hci.stanford.edu/publications/2020/viability-prediction/viability-prediction.pdf&quot;&gt;CSCW’21&lt;/a&gt;, using a range of human-rated and automatically generated features. We found that automatically generated features alone served as a strong predictor of team fracture. We made the underlying system available at &lt;a href=&quot;https://viability.stanford.edu&quot;&gt;viability.stanford.edu&lt;/a&gt;.&lt;/p&gt;</content><author><name></name></author><category term="fracture" /><category term="teams" /><category term="Stanford" /><category term="study" /><category term="CSCW" /><summary type="html">While at Stanford, Michael Bernstein and I, with the help of many great undergrads, designed an experiment to study the consistency of team fracture, a notion we defined as a loss of team viability so severe that the team no longer wants to work together CSCW’19.</summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://whiting.me/fracture.png" /><media:content medium="image" url="https://whiting.me/fracture.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Fair work</title><link href="https://whiting.me/fair-work/" rel="alternate" type="text/html" title="Fair work" /><published>2019-10-30T00:00:00+00:00</published><updated>2019-10-30T00:00:00+00:00</updated><id>https://whiting.me/fair-work</id><content type="html" xml:base="https://whiting.me/fair-work/">&lt;p&gt;While at &lt;a href=&quot;https://hci.stanford.edu&quot;&gt;Stanford&lt;/a&gt;, &lt;a href=&quot;https://linkedin.com/in/grant-hugh&quot;&gt;Grant Hugh&lt;/a&gt;, &lt;a href=&quot;https://hci.stanford.edu/msb/&quot;&gt;Michael Bernstein&lt;/a&gt; and I released a mechanism to make it possible for Amazon Mechanical Turk requesters to pay workers a fairer wage by adding only one line of code to their tasks &lt;a 
class=&quot;button smallCaps&quot; href=&quot;https://hci.stanford.edu/publications/2019/fairwork/fairwork-hcomp2019.pdf&quot;&gt;HCOMP’19&lt;/a&gt;. Our paper was awarded an Honorable Mention!&lt;/p&gt;

&lt;p&gt;You can use the service by visiting &lt;a href=&quot;https://fairwork.stanford.edu&quot;&gt;fairwork.stanford.edu&lt;/a&gt;, where you will get a line of code to include in your tasks on MTurk.&lt;/p&gt;
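&lt;p&gt;Purely as an illustration of the shape of that line (the tag below is a hypothetical sketch with a placeholder &lt;code&gt;src&lt;/code&gt; path; use the exact tag that &lt;a href=&quot;https://fairwork.stanford.edu&quot;&gt;fairwork.stanford.edu&lt;/a&gt; generates for you), the integration is a single script tag added to the task’s HTML:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;!-- hypothetical example: copy the actual tag from fairwork.stanford.edu --&amp;gt;
&amp;lt;script src=&quot;https://fairwork.stanford.edu/...&quot;&amp;gt;&amp;lt;/script&amp;gt;&lt;/code&gt;&lt;/pre&gt;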

&lt;p&gt;Below I’ve included the paper abstract:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Fair Work: Crowd Work Minimum Wage with One Line of Code&lt;/strong&gt;&lt;br /&gt;
Accurate task pricing in microtask marketplaces requires substantial effort via trial and error, contributing to a pattern of worker underpayment. In response, we introduce Fair Work, enabling requesters to automatically pay their workers minimum wage by adding a one-line script tag to their task HTML on Amazon Mechanical Turk. &lt;strong&gt;Fair Work automatically surveys workers to find out how long the task takes, then aggregates those self-reports and auto-bonuses workers up to a minimum wage if needed.&lt;/strong&gt; Evaluations demonstrate that the system estimates payments more accurately than requesters and that worker time surveys are close to behaviorally observed time measurements. With this work, we aim to lower the threshold for pro-social work practices in microtask marketplaces.&lt;/p&gt;
&lt;/blockquote&gt;</content><author><name></name></author><category term="Fair" /><category term="work" /><category term="mturk" /><category term="Stanford" /><category term="study" /><category term="system" /><category term="HCOMP" /><summary type="html">While at Stanford, Grant Hugh, Michael Bernstein and I released a mechanism to make it possible for Amazon Mechanical Turk requesters to pay workers a fairer wage by adding only one line of code to their tasks HCOMP’19. Our paper was awarded an Honorable Mention!</summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://whiting.me/fairwork.png" /><media:content medium="image" url="https://whiting.me/fairwork.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Crowdsourcing</title><link href="https://whiting.me/crowdsourcing/" rel="alternate" type="text/html" title="Crowdsourcing" /><published>2017-06-01T00:00:00+00:00</published><updated>2017-06-01T00:00:00+00:00</updated><id>https://whiting.me/crowdsourcing</id><content type="html" xml:base="https://whiting.me/crowdsourcing/">&lt;p&gt;Crowdsourcing tools today offer ad hoc and remote work arrangements at an unprecedented scale, yet many of the perks of traditional jobs have not yet been realized for this community. The &lt;a href=&quot;https://github.com/crowdresearch&quot;&gt;Stanford Crowd Research Collective&lt;/a&gt; developed &lt;a href=&quot;https://www.daemo.org&quot;&gt;Daemo&lt;/a&gt; &lt;a class=&quot;button smallCaps&quot; href=&quot;http://hci.stanford.edu/publications/2017/crowdguilds/CSCWDemo.pdf&quot;&gt;CSCW’17&lt;/a&gt;, a crowdsourcing platform aiming to mitigate many of these challenges. 
For example, we introduced crowd guilds &lt;a class=&quot;button smallCaps&quot; href=&quot;http://hci.stanford.edu/publications/2017/crowdguilds/guilds.pdf&quot;&gt;CSCW’17&lt;/a&gt;, to improve reputation and feedback within crowdsourcing, inspired by traditional worker guilds; the &lt;a href=&quot;https://crowdresearch.github.io/open-gov/constitution/2016/11/21/constitution.html&quot;&gt;Daemo Constitution&lt;/a&gt; &lt;a class=&quot;button smallCaps&quot; href=&quot;http://collectiveintelligenceconference.org&quot;&gt;CI’17&lt;/a&gt;, to give workers and requesters agency in platform governance; boomerang &lt;a class=&quot;button smallCaps&quot; href=&quot;http://hci.stanford.edu/publications/2016/boomerang/boomerang-uist.pdf&quot;&gt;UIST’16&lt;/a&gt;, to incentivize honest feedback; and prototype tasks &lt;a class=&quot;button smallCaps&quot; href=&quot;https://arxiv.org/pdf/1707.05645.pdf&quot;&gt;HCOMP’17&lt;/a&gt;, to fix tasks before they launch.&lt;/p&gt;</content><author><name></name></author><category term="jekyll" /><category term="update" /><summary type="html">Crowdsourcing tools today offer ad hoc and remote work arrangements at an unprecedented scale, yet many of the perks of traditional jobs have not yet been realized for this community. The Stanford Crowd Research Collective developed Daemo CSCW’17, a crowdsourcing platform aiming to mitigate many of these challenges. 
For example we introduced crowd guilds CSCW’17, to improve reputation and feedback within crowdsourcing, inspired by traditional worker guilds, Daemo Constitution CI’17, to give workers and requesters agency in platform governance, boomerang UIST’16, to incentivize honest feedback, and prototype tasks HCOMP’17, to fix tasks before they launch.</summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://whiting.me/guilds.png" /><media:content medium="image" url="https://whiting.me/guilds.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Automated design</title><link href="https://whiting.me/automated-design/" rel="alternate" type="text/html" title="Automated design" /><published>2017-01-01T00:00:00+00:00</published><updated>2017-01-01T00:00:00+00:00</updated><id>https://whiting.me/automated-design</id><content type="html" xml:base="https://whiting.me/automated-design/">&lt;p&gt;To automate design, the nuances of design decisions need to be represented computationally. This is challenging because even humans often don’t agree on the decisions going into a design. In my PhD, I introduced an automatic, grammar-based system for structuring design information &lt;a class=&quot;button smallCaps&quot; href=&quot;http://link.springer.com/chapter/10.1007/978-3-319-44989-0_15&quot;&gt;DCC’16&lt;/a&gt;, and showed how it could be applied to design problems &lt;a class=&quot;button smallCaps&quot; href=&quot;http://doi.org/10.1017/S0890060417000464&quot;&gt;AI EDAM&lt;/a&gt;.&lt;/p&gt;</content><author><name></name></author><category term="jekyll" /><category term="update" /><summary type="html">To automate design, the nuances of design decisions need to be represented computationally. This is challenging because even humans often don’t agree on the decisions going into a design. 
In my PhD, I introduced an automatic, grammar-based system for structuring design information DCC’16, and showed how it could be applied to design problems AI EDAM.</summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://whiting.me/grammar.png" /><media:content medium="image" url="https://whiting.me/grammar.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Incentive design</title><link href="https://whiting.me/incentive-design/" rel="alternate" type="text/html" title="Incentive design" /><published>2016-12-01T00:00:00+00:00</published><updated>2016-12-01T00:00:00+00:00</updated><id>https://whiting.me/incentive-design</id><content type="html" xml:base="https://whiting.me/incentive-design/">&lt;p&gt;Using incentives to motivate human behavior is very effective, but poorly designed incentives often lead to worse behavior. For example, the common technique of requiring a certain number of characters of feedback in an online feedback form is more likely to lead to feedback padded with spaces than higher quality responses. With Dilrukshi Gamage, Thejan Rajapakshe, Haritha Thilakarathne, Indika Perera, and Shantha Fernando, we explored how aligning the incentives for peer feedback in online learning helps improve the perceived quality and length of the feedback &lt;a class=&quot;button smallCaps&quot; href=&quot;https://arxiv.org/pdf/1703.06169.pdf&quot;&gt;L@S’17&lt;/a&gt;.&lt;/p&gt;</content><author><name></name></author><category term="jekyll" /><category term="update" /><summary type="html">Using incentives to motivate human behavior is very effective, but poorly designed incentives often lead to worse behavior. For example, the common technique of requiring a certain number of characters of feedback in an online feedback form is more likely to lead to feedback padded with spaces than higher quality responses. 
With Dilrukshi Gamage, Thejan Rajapakshe, Haritha Thilakarathne, Indika Perera, and Shantha Fernando, we explored how aligning the incentives for peer feedback in online learning helps improve the perceived quality and length of the feedback L@S’17.</summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://whiting.me/uom.png" /><media:content medium="image" url="https://whiting.me/uom.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Hybrid systems</title><link href="https://whiting.me/hybrid-systems/" rel="alternate" type="text/html" title="Hybrid systems" /><published>2016-01-01T00:00:00+00:00</published><updated>2016-01-01T00:00:00+00:00</updated><id>https://whiting.me/hybrid-systems</id><content type="html" xml:base="https://whiting.me/hybrid-systems/">&lt;p&gt;Humans are pretty good at chess, computers are better, and humans and computers together have been even better. However, we are still not very good at getting the best out of this kind of working relationship. Mentoring undergraduate students at CMU and in the &lt;a href=&quot;http://www.sis.pitt.edu/i3/&quot;&gt;Pitt i3&lt;/a&gt;, we built systems to encourage humans and computers to work together. For example, we explored how socio-technical systems can provide novel insight into new product development opportunities by performing computational analysis on Amazon product reviews &lt;a class=&quot;button smallCaps&quot; href=&quot;#&quot;&gt;iCONF’16&lt;/a&gt;, and we studied how hybrid tools can help researchers get started with crowdsourcing faster &lt;a class=&quot;button smallCaps&quot; href=&quot;#&quot;&gt;iCONF’18&lt;/a&gt;.&lt;/p&gt;</content><author><name></name></author><category term="jekyll" /><category term="update" /><summary type="html">Humans are pretty good at chess, computers are better, and humans and computers together have been even better. 
However, we are still not very good at getting the best out of this kind of working relationship. Mentoring undergraduate students at CMU and in the Pitt i3, we built systems to encourage humans and computers to work together. For example, we explored how socio-technical systems can provide novel insight into new product development opportunities, by performing computational analysis on Amazon product reviews iCONF’16, and we studied how hybrid tools can help researchers get started with crowdsourcing faster iCONF’18.</summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://whiting.me/hybrid.png" /><media:content medium="image" url="https://whiting.me/hybrid.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>