<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="3.9.3">Jekyll</generator><link href="https://whiting.me/feed.xml" rel="self" type="application/atom+xml" /><link href="https://whiting.me/" rel="alternate" type="text/html" hreflang="en" /><updated>2026-06-03T14:01:03+00:00</updated><id>https://whiting.me/feed.xml</id><title type="html">Mark E. Whiting</title><subtitle>Mark is Chief Scientist at Pareto and a research fellow at the University of Pennsylvania</subtitle><entry><title type="html">Common sense, machines, and what they don’t know</title><link href="https://whiting.me/common-sense-llms/" rel="alternate" type="text/html" title="Common sense, machines, and what they don’t know" /><published>2025-10-15T00:00:00+00:00</published><updated>2025-10-15T00:00:00+00:00</updated><id>https://whiting.me/common-sense-llms</id><content type="html" xml:base="https://whiting.me/common-sense-llms/">&lt;p&gt;The &lt;a href=&quot;/2024/01/16/Quantifying-common-sense.html&quot;&gt;framework we introduced&lt;/a&gt; for quantifying common sense was built around people — individuals rating claims, and the structure of agreement across a population. The obvious next question is what happens when the raters aren’t human.&lt;/p&gt;

&lt;p&gt;With &lt;a href=&quot;https://tuandung.net&quot;&gt;Tuan Dung (Josh) Nguyen&lt;/a&gt; and &lt;a href=&quot;https://css.seas.upenn.edu/people/duncan-watts/&quot;&gt;Duncan Watts&lt;/a&gt;, we applied the same methodology at scale to large language models, evaluating commonsense knowledge in humans and in LLMs on the same set of claims &lt;a class=&quot;button smallCaps&quot; href=&quot;https://doi.org/10.1093/pnasnexus/pgag029&quot;&gt;PNAS Nexus&lt;/a&gt;. The comparison is useful both ways: it tells us what current models know in the same terms we used for people, and it exposes where benchmarks designed for humans break down when pointed at a machine.&lt;/p&gt;

&lt;p&gt;Running alongside this, a broader community effort — organized by &lt;a href=&quot;https://www.csl.sri.com/~chaudhri/&quot;&gt;Vinay Chaudhri&lt;/a&gt; with many others — has been articulating what a new knowledge resource for AI might look like, one that goes beyond existing knowledge graphs and taps into the kind of structured, commonsense, and expert knowledge that modern AI systems still struggle to use reliably &lt;a class=&quot;button smallCaps&quot; href=&quot;https://doi.org/10.1002/aaai.70035&quot;&gt;AI Magazine&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Taken together, these projects push at the same question from two sides: how do we measure what machines know, and how do we build the resources that would let them know more?&lt;/p&gt;</content><author><name></name></author><category term="commonsense" /><category term="LLMs" /><category term="AI" /><category term="knowledge" /><category term="Penn" /><summary type="html">The framework we introduced for quantifying common sense was built around people — individuals rating claims, and the structure of agreement across a population. The obvious next question is what happens when the raters aren’t human.</summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://whiting.me/common-sense-llms.png" /><media:content medium="image" url="https://whiting.me/common-sense-llms.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">AI hasn’t fixed teamwork</title><link href="https://whiting.me/ai-and-teamwork/" rel="alternate" type="text/html" title="AI hasn’t fixed teamwork" /><published>2025-09-15T00:00:00+00:00</published><updated>2025-09-15T00:00:00+00:00</updated><id>https://whiting.me/ai-and-teamwork</id><content type="html" xml:base="https://whiting.me/ai-and-teamwork/">&lt;p&gt;Every wave of productivity technology arrives with a promise that it will finally make teamwork better. Generative AI is the current one, and perhaps the last one.&lt;/p&gt;

&lt;p&gt;With &lt;a href=&quot;https://qingxiao.info&quot;&gt;Qing Xiao&lt;/a&gt;, &lt;a href=&quot;https://xehu.github.io&quot;&gt;Xinlan Emily Hu&lt;/a&gt;, &lt;a href=&quot;https://arvindkarunakaran.com&quot;&gt;Arvind Karunakaran&lt;/a&gt;, &lt;a href=&quot;https://www.cs.cmu.edu/~hongs/&quot;&gt;Hong Shen&lt;/a&gt;, and &lt;a href=&quot;https://hanchengcao.me&quot;&gt;Hancheng Cao&lt;/a&gt;, we followed a project-based software development organization longitudinally from 2023 to 2025 — through the period in which generative AI tools went from novelty to default — to see what actually changed in how teams collaborated. The &lt;a href=&quot;https://doi.org/10.48550/arXiv.2509.10956&quot;&gt;preprint is on arXiv&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The short version of what we found is in the title: AI hasn’t fixed teamwork. It has, however, shifted the collaborative culture in measurable and sometimes surprising ways — in who does what, in what counts as a contribution, and in the texture of day-to-day coordination. That’s a distinct finding from either “AI is transformative” or “AI changes nothing,” and it lines up with a broader pattern in our &lt;a href=&quot;/2025/09/01/task-space.html&quot;&gt;team research&lt;/a&gt;: the effects of a new tool depend on the task, the team, and the culture they arrive into.&lt;/p&gt;</content><author><name></name></author><category term="AI" /><category term="teams" /><category term="collaboration" /><category term="longitudinal" /><summary type="html">Every wave of productivity technology arrives with a promise that it will finally make teamwork better. Generative AI is the current one, and perhaps the last one.</summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://whiting.me/ai-and-teamwork.png" /><media:content medium="image" url="https://whiting.me/ai-and-teamwork.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">A task space for team research</title><link href="https://whiting.me/task-space/" rel="alternate" type="text/html" title="A task space for team research" /><published>2025-09-01T00:00:00+00:00</published><updated>2025-09-01T00:00:00+00:00</updated><id>https://whiting.me/task-space</id><content type="html" xml:base="https://whiting.me/task-space/">&lt;p&gt;Teams research is fragmented. Every discipline that studies groups — organizational behavior, social psychology, HCI, economics, operations — has its own favorite tasks: brainstorming, jury deliberation, prisoner’s dilemma, hidden profile, creative writing prompts, estimation games. Results on these tasks rarely talk to each other, and it’s genuinely hard to tell whether that’s because the findings conflict or because the tasks do. 
This problem extends beyond the choice of task to how tasks are operationalized and how experiments are parameterized and measured.&lt;/p&gt;

&lt;p&gt;With &lt;a href=&quot;https://xehu.github.io&quot;&gt;Xinlan Emily Hu&lt;/a&gt;, &lt;a href=&quot;https://linneagandhi.com&quot;&gt;Linnea Gandhi&lt;/a&gt;, &lt;a href=&quot;https://css.seas.upenn.edu/people/duncan-watts/&quot;&gt;Duncan Watts&lt;/a&gt;, and &lt;a href=&quot;http://amaatouq.io&quot;&gt;Abdullah Almaatouq&lt;/a&gt;, we introduce the &lt;strong&gt;Task Space&lt;/strong&gt;: a framework that organizes the tasks teams do along dimensions that matter for how teams actually perform them, so findings from one task can be meaningfully compared with findings from another &lt;a class=&quot;button smallCaps&quot; href=&quot;https://doi.org/10.1287/mnsc.2023.03544&quot;&gt;Management Science&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This sits on top of a stack of earlier work that kept pointing at the same gap. In &lt;a href=&quot;/2019/11/12/team-fracture.html&quot;&gt;Did It Have To End This Way?&lt;/a&gt; we showed that the same teams can produce very different outcomes depending on the task in front of them &lt;a class=&quot;button smallCaps&quot; href=&quot;https://doi.org/10.1145/3359311&quot;&gt;CSCW’19&lt;/a&gt;. In &lt;a href=&quot;https://doi.org/10.1145/3392877&quot;&gt;Parallel Worlds&lt;/a&gt; we reconvened the same teams, without their members knowing, to see how much of their trajectory was locked in &lt;a class=&quot;button smallCaps&quot; href=&quot;https://doi.org/10.1145/3392877&quot;&gt;CSCW’20&lt;/a&gt;. And in &lt;a href=&quot;https://hci.stanford.edu/publications/2021/juries/HuWhitingBernstein_JuriesConsistentDecisions_CHI2021.pdf&quot;&gt;Online Juries&lt;/a&gt; we found that for some decision tasks, teams are remarkably consistent — while for others, they’re essentially random &lt;a class=&quot;button smallCaps&quot; href=&quot;https://doi.org/10.1145/3411764.3445433&quot;&gt;CHI’21&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The Task Space takes the next step: rather than studying individual tasks and hoping the findings generalize, it maps the space those tasks live in so that generalization can be tested directly.&lt;/p&gt;</content><author><name></name></author><category term="teams" /><category term="tasks" /><category term="integrative" /><category term="framework" /><category term="Penn" /><category term="MIT" /><summary type="html">Teams research is fragmented. Every discipline that studies groups — organizational behavior, social psychology, HCI, economics, operations — has its own favorite tasks: brainstorming, jury deliberation, prisoner’s dilemma, hidden profile, creative writing prompts, estimation games. Results on these tasks rarely talk to each other, and it’s genuinely hard to tell whether that’s because the findings conflict or because the tasks do. This problem extends beyond the choice of task to how tasks are operationalized and how experiments are parameterized and measured.</summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://whiting.me/task-space.png" /><media:content medium="image" url="https://whiting.me/task-space.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Quantifying common sense</title><link href="https://whiting.me/Quantifying-common-sense/" rel="alternate" type="text/html" title="Quantifying common sense" /><published>2024-01-16T00:00:00+00:00</published><updated>2024-01-16T00:00:00+00:00</updated><id>https://whiting.me/Quantifying%20common%20sense</id><content type="html" xml:base="https://whiting.me/Quantifying-common-sense/">&lt;p&gt;Common sense is one of those ideas that everyone appeals to and no one agrees on. It’s supposedly universal, but it’s also regularly invoked to complain that other people don’t have any. That tension — universal in principle, contested in practice — is actually measurable, if you’re careful about what you count.&lt;/p&gt;

&lt;p&gt;With &lt;a href=&quot;https://css.seas.upenn.edu/people/duncan-watts/&quot;&gt;Duncan Watts&lt;/a&gt; and collaborators, I developed a method to quantify common sense empirically, at two levels: for an individual (how aligned is this person’s take on a claim with everyone else’s?) and for a collective (how much does a group actually agree?). Running the method over a large set of human-rated claims, we found that what we think of as “commonsense” varies a lot depending on the kind of claim. The clearest agreement shows up on plainly worded, factual claims about the physical world; agreement drops off sharply for claims that are social, normative, or ambiguously worded. Interestingly, who the raters are matters much less than what kind of claim it is. And at the collective level, the universal common sense people often assume exists mostly doesn’t &lt;a class=&quot;button smallCaps&quot; href=&quot;https://doi.org/10.1073/pnas.2309535121&quot;&gt;PNAS&lt;/a&gt;.&lt;/p&gt;
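
&lt;p&gt;To make the two levels concrete, here is a deliberately simplified sketch in Python. It assumes each rater gives a binary agree (1) or disagree (0) rating to every claim, scores an individual by how often their rating matches the majority of the other raters, and scores a collective by the share of claims on which at least a threshold fraction of raters give the same rating. The data format, the function names, and the 0.9 threshold are illustrative assumptions, and this is not the paper’s exact formulation.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from statistics import mean

# Hypothetical data format: rater name -> list of 0/1 ratings, one per claim
# (1 = agree with the claim, 0 = disagree). Every rater rates every claim.

def individual_alignment(ratings, rater):
    # Share of claims on which this rater matches the majority of the OTHER raters.
    mine = ratings[rater]
    others = [r for name, r in ratings.items() if name != rater]
    matches = []
    for j, value in enumerate(mine):
        share_agree = mean(o[j] for o in others)  # fraction of others who agree with claim j
        majority = 1 if share_agree >= 0.5 else 0  # ties count as agreement (an arbitrary choice)
        matches.append(1 if value == majority else 0)
    return mean(matches)

def collective_agreement(ratings, threshold=0.9):
    # Share of claims on which at least `threshold` of all raters give the same rating.
    per_claim = list(zip(*ratings.values()))  # one tuple of ratings per claim
    consensual = sum(1 for col in per_claim if max(mean(col), 1 - mean(col)) >= threshold)
    return consensual / len(per_claim)

ratings = {
    'a': [1, 1, 0, 1],
    'b': [1, 1, 1, 0],
    'c': [1, 0, 0, 1],
}
print(individual_alignment(ratings, 'a'))  # 0.75
print(collective_agreement(ratings))       # 0.25
&lt;/code&gt;&lt;/pre&gt;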

&lt;p&gt;The paper’s full significance statement spells this out:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Common sense, while often portrayed as universal, is paradoxically also often claimed not to exist. Here, we resolve this puzzling situation by introducing a formal methodology to empirically quantify common sense both at individual and collective levels. We then demonstrate the method with a dataset involving human raters evaluating claims. We show that common sense varies considerably across types of claims but aligns most closely with plainly worded, factual claims about physical reality; in contrast, [it] does not vary much across different types of people. We also find limited presence of collective common sense, undermining universalist claims and supporting skeptics. Finally, we argue that quantifying common sense is useful both for applications in social science and AI.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The method turns out to be useful beyond humans — we’ve since used it to &lt;a href=&quot;/2025/10/15/common-sense-llms.html&quot;&gt;evaluate common sense in large language models&lt;/a&gt;, where having a single framework that applies to both humans and machines is genuinely handy.&lt;/p&gt;

&lt;p&gt;The work has received attention from a number of outlets, which Altmetric summarizes nicely:&lt;/p&gt;

&lt;div class=&quot;altmetric-embed altmetric-box&quot; data-badge-type=&quot;medium-donut&quot; data-badge-details=&quot;right&quot; data-altmetric-id=&quot;158425670&quot;&gt;&lt;/div&gt;</content><author><name></name></author><category term="commonsense" /><category term="experiments" /><category term="meta-scientific" /><category term="methods" /><category term="Penn" /><summary type="html">Common sense is one of those ideas that everyone appeals to and no one agrees on. It’s supposedly universal, but it’s also regularly invoked to complain that other people don’t have any. That tension — universal in principle, contested in practice — is actually measurable, if you’re careful about what you count.</summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://whiting.me/commonsense.png" /><media:content medium="image" url="https://whiting.me/commonsense.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">How well can social scientists predict society?</title><link href="https://whiting.me/forecasting/" rel="alternate" type="text/html" title="How well can social scientists predict society?" /><published>2023-04-15T00:00:00+00:00</published><updated>2023-04-15T00:00:00+00:00</updated><id>https://whiting.me/forecasting</id><content type="html" xml:base="https://whiting.me/forecasting/">&lt;p&gt;One way to ask whether a theory is any good is to ask what it predicts. During COVID, a large group of social scientists ran a forecasting tournament to find out — teams competed to predict near-term changes in societal outcomes like mood, polarization, political ideology, discrimination, and life satisfaction, using whatever theoretical apparatus they wanted.&lt;/p&gt;

&lt;p&gt;I contributed to the effort as part of the &lt;a href=&quot;https://predictions.uwaterloo.ca/&quot;&gt;Forecasting Collaborative&lt;/a&gt;, led by &lt;a href=&quot;https://uwaterloo.ca/psychology/people-profiles/igor-grossmann&quot;&gt;Igor Grossmann&lt;/a&gt; and colleagues. The headline result is sobering: across nearly every outcome, expert forecasts were not reliably better than simple statistical benchmarks, and often worse &lt;a class=&quot;button smallCaps&quot; href=&quot;https://doi.org/10.1038/s41562-022-01517-1&quot;&gt;Nat Hum Behav&lt;/a&gt;.&lt;/p&gt;
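
&lt;p&gt;Comparing forecasts against a simple statistical benchmark can be made concrete with a scale-free error measure. The sketch below uses mean absolute scaled error (MASE), a standard metric for this kind of comparison, which divides a forecast’s mean absolute error by the in-sample error of a naive “no change” forecast; values below 1 mean the forecast beat that benchmark. Treat it as an illustration under that assumption rather than the tournament’s exact scoring procedure, and note that the series are made-up numbers.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def mase(actual, forecast, history):
    # Mean absolute error of the forecast over the evaluation period.
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    # Scale: mean absolute one-step change in the historical series, i.e. the
    # in-sample error of a naive forecast that predicts 'no change'.
    steps = [abs(history[t] - history[t - 1]) for t in range(1, len(history))]
    scale = sum(steps) / len(steps)
    return mae / scale

# Made-up monthly values of some societal indicator.
history = [50, 52, 51, 53, 54]   # observed before the forecasts were made
actual = [55, 56, 55]            # what actually happened afterwards
expert = [58, 60, 61]            # a hypothetical expert forecast
naive = [54, 54, 54]             # last observed value carried forward

print(round(mase(actual, expert, history), 2))  # 2.89: larger error than the naive forecast
print(round(mase(actual, naive, history), 2))   # 0.89
&lt;/code&gt;&lt;/pre&gt;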

&lt;p&gt;That finding fits uncomfortably well with the argument we make in &lt;a href=&quot;/2022/12/21/Integrative-experiments.html&quot;&gt;Beyond Playing 20 Questions with Nature&lt;/a&gt; and in our work on &lt;a href=&quot;/2024/01/16/Quantifying-common-sense.html&quot;&gt;common sense&lt;/a&gt;: if a field can’t generate reliable predictions about the phenomena it studies, it’s worth asking hard questions about how the field is cumulating knowledge in the first place. Forecasting tournaments are one of the cleaner tests we have — they are hard to game, easy to score, and directly tied to the kind of claims theories are supposed to support.&lt;/p&gt;</content><author><name></name></author><category term="forecasting" /><category term="societal" /><category term="change" /><category term="meta-science" /><category term="Penn" /><summary type="html">One way to ask whether a theory is any good is to ask what it predicts. During COVID, a large group of social scientists ran a forecasting tournament to find out — teams competed to predict near-term changes in societal outcomes like mood, polarization, political ideology, discrimination, and life satisfaction, using whatever theoretical apparatus they wanted.</summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://whiting.me/forecasting.png" /><media:content medium="image" url="https://whiting.me/forecasting.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Integrative experiments</title><link href="https://whiting.me/Integrative-experiments/" rel="alternate" type="text/html" title="Integrative experiments" /><published>2022-12-21T00:00:00+00:00</published><updated>2022-12-21T00:00:00+00:00</updated><id>https://whiting.me/Integrative%20experiments</id><content type="html" xml:base="https://whiting.me/Integrative-experiments/">&lt;p&gt;The dominant way we run experiments in social and behavioral science — one experiment at a time, each treated as a test of 
a theory assumed to generalize — has a serious problem. The integration across experiments that is supposed to happen in the published record largely doesn’t, and the recent push for more reliable single findings doesn’t fix it. You can do every individual experiment perfectly well and still not end up with a cumulative theory.&lt;/p&gt;

&lt;p&gt;With &lt;a href=&quot;http://amaatouq.io&quot;&gt;Abdullah Almaatouq&lt;/a&gt;, &lt;a href=&quot;https://cocosci.princeton.edu&quot;&gt;Tom Griffiths&lt;/a&gt;, &lt;a href=&quot;https://suchow.io&quot;&gt;Jordan Suchow&lt;/a&gt;, &lt;a href=&quot;https://knowledgelab.org/team/&quot;&gt;James Evans&lt;/a&gt;, and &lt;a href=&quot;https://css.seas.upenn.edu/people/duncan-watts/&quot;&gt;Duncan Watts&lt;/a&gt;, we argue that the fix has to happen at the level of experimental design. In &lt;strong&gt;integrative experiments&lt;/strong&gt;, researchers explicitly map the space of possible experiments associated with a research question, then iteratively sample from that space. Instead of trying to defend a single experimental condition as the one that captures the phenomenon, you treat the design space itself as the object of study &lt;a class=&quot;button smallCaps&quot; href=&quot;https://doi.org/10.1017/S0140525X22002874&quot;&gt;BBS&lt;/a&gt;.&lt;/p&gt;
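
&lt;p&gt;The two steps, mapping the space and then sampling from it, can be sketched in a few lines of Python. The design dimensions below (group size, task complexity, communication channel) are hypothetical placeholders, and plain random sampling without replacement stands in for the adaptive, model-guided sampling an integrative study might use instead.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import itertools
import random

# Hypothetical design dimensions; a real study would derive them from the
# research question and the literature around it.
DIMENSIONS = {
    'group_size': [1, 3, 6, 12],
    'task_complexity': ['low', 'medium', 'high'],
    'communication': ['none', 'chat', 'video'],
}

def design_space(dimensions):
    # Step 1: enumerate every combination of levels (the full factorial space).
    names = list(dimensions)
    return [dict(zip(names, levels)) for levels in itertools.product(*dimensions.values())]

def sample_batches(space, batch_size, rounds, seed=0):
    # Step 2: draw successive batches of conditions that have not been run yet.
    # A uniform random order stands in for adaptive selection, which would
    # choose each batch using the results of the previous ones.
    rng = random.Random(seed)
    remaining = list(space)
    rng.shuffle(remaining)
    for _ in range(rounds):
        if not remaining:
            break
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        yield batch

space = design_space(DIMENSIONS)
print(len(space))  # 36 conditions: 4 * 3 * 3
for i, batch in enumerate(sample_batches(space, batch_size=8, rounds=2)):
    print('round', i, 'first condition:', batch[0])
&lt;/code&gt;&lt;/pre&gt;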

&lt;p&gt;The paper drew a large set of commentaries from across the field, which we responded to in a follow-up &lt;a class=&quot;button smallCaps&quot; href=&quot;https://doi.org/10.1017/S0140525X23002789&quot;&gt;BBS&lt;/a&gt;. The discussion — about what theories are for, how generalization should work, and whether the field needs a different unit of analysis — is, for me, the most interesting part of the project.&lt;/p&gt;

&lt;p&gt;Most of the other threads in my recent work connect here. &lt;a href=&quot;/2021/03/15/empirica.html&quot;&gt;Empirica&lt;/a&gt; is the infrastructure that makes integrative experiments actually runnable; the &lt;a href=&quot;/2024/01/16/Quantifying-common-sense.html&quot;&gt;common sense framework&lt;/a&gt; and the &lt;a href=&quot;/2025/09/01/task-space.html&quot;&gt;task space&lt;/a&gt; are both attempts to make the design space in a particular domain tractable; and the &lt;a href=&quot;/2023/04/15/forecasting.html&quot;&gt;forecasting results&lt;/a&gt; are one way of showing that the status quo is leaving a lot on the table.&lt;/p&gt;</content><author><name></name></author><category term="integrative" /><category term="experiments" /><category term="meta-scientific" /><category term="methods" /><category term="Penn" /><category term="MIT" /><summary type="html">The dominant way we run experiments in social and behavioral science — one experiment at a time, each treated as a test of a theory assumed to generalize — has a serious problem. The integration across experiments that is supposed to happen in the published record largely doesn’t, and the recent push for more reliable single findings doesn’t fix it. 
You can do every individual experiment perfectly well and still not end up with a cumulative theory.</summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://whiting.me/integrative%20experiments.png" /><media:content medium="image" url="https://whiting.me/integrative%20experiments.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Ambiguity online</title><link href="https://whiting.me/online-ambiguity/" rel="alternate" type="text/html" title="Ambiguity online" /><published>2022-05-01T00:00:00+00:00</published><updated>2022-05-01T00:00:00+00:00</updated><id>https://whiting.me/online-ambiguity</id><content type="html" xml:base="https://whiting.me/online-ambiguity/">&lt;p&gt;Face-to-face, a huge amount of what we communicate is nonverbal — a pause, a glance, a shrug. Online, most of that channel disappears, and the little that remains gets compressed into things like read receipts, likes, profile changes, and emoji. These signals are easy to produce and easy to misread.&lt;/p&gt;

&lt;p&gt;With &lt;a href=&quot;https://soyeonpark.com&quot;&gt;So Yeon Park&lt;/a&gt; and &lt;a href=&quot;https://web.stanford.edu/~mshanks/&quot;&gt;Michael Shanks&lt;/a&gt; at Stanford, as part of the HPI-Stanford Design Thinking Research Program, we studied these “nonverbal online actions” — how people use them, how others interpret them, and where the gaps between sender and receiver sit &lt;a class=&quot;button smallCaps&quot; href=&quot;https://doi.org/10.1007/978-3-030-76324-4_5&quot;&gt;DTR’21&lt;/a&gt;. A follow-up looked specifically at what happens when those actions create confusion, and what people do to repair it &lt;a class=&quot;button smallCaps&quot; href=&quot;https://doi.org/10.1007/978-3-031-09297-8_3&quot;&gt;DTR’22&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A sharper version of the same problem is synthetic media. With &lt;a href=&quot;https://dilrukshigamage.wordpress.com&quot;&gt;Dilrukshi Gamage&lt;/a&gt;, &lt;a href=&quot;https://piyushghasiya.github.io&quot;&gt;Piyush Ghasiya&lt;/a&gt;, &lt;a href=&quot;https://vamshi-b.github.io&quot;&gt;Vamshi Bonagiri&lt;/a&gt;, and &lt;a href=&quot;https://sites.google.com/view/sasahara&quot;&gt;Kazutoshi Sasahara&lt;/a&gt;, we analyzed Reddit conversations about deepfakes to see how people actually talk about them — the concerns they raise, the distinctions they draw, and the societal implications that surface in their own words &lt;a class=&quot;button smallCaps&quot; href=&quot;https://doi.org/10.1145/3491102.3517446&quot;&gt;CHI’22&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Across these projects, the pattern that keeps coming up is that online communication is less a thinner version of in-person communication than it is a different medium with its own ambiguities — and the interesting research question is usually how people navigate those ambiguities, not whether they exist.&lt;/p&gt;</content><author><name></name></author><category term="online" /><category term="communication" /><category term="nonverbal" /><category term="deepfakes" /><category term="Stanford" /><summary type="html">Face-to-face, a huge amount of what we communicate is nonverbal — a pause, a glance, a shrug. Online, most of that channel disappears, and the little that remains gets compressed into things like read receipts, likes, profile changes, and emoji. These signals are easy to produce and easy to misread.</summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://whiting.me/online-ambiguity.png" /><media:content medium="image" url="https://whiting.me/online-ambiguity.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">COVID dashboards</title><link href="https://whiting.me/COVID-dashboards/" rel="alternate" type="text/html" title="COVID dashboards" /><published>2021-08-15T00:00:00+00:00</published><updated>2021-08-15T00:00:00+00:00</updated><id>https://whiting.me/COVID-dashboards</id><content type="html" xml:base="https://whiting.me/COVID-dashboards/">&lt;p&gt;Together with &lt;a href=&quot;https://css.seas.upenn.edu/people/duncan-watts/&quot;&gt;Duncan Watts&lt;/a&gt; and others on the team at the &lt;a href=&quot;https://css.seas.upenn.edu&quot;&gt;CSSLab&lt;/a&gt; at Penn, and in conjunction with the City of Philadelphia, we built a collection of interactive data dashboards that visually summarize human mobility patterns over time and space for a number of cities, starting with Philadelphia, and highlight potentially relevant demographic correlates. 
The dashboards are available at &lt;a href=&quot;https://covid.seas.upenn.edu&quot;&gt;covid.seas.upenn.edu&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The dashboards are built in &lt;a href=&quot;https://observablehq.com/collection/@wattslab/covid-dashboards&quot;&gt;Observable&lt;/a&gt;. The data are a proprietary combination of cell phone GPS data, demographic data derived from the American Community Survey, and COVID-19 caseload data from the New York Times.&lt;/p&gt;

&lt;p&gt;A related piece of the same broader effort was the &lt;strong&gt;non-pharmaceutical interventions (NPI) dataset&lt;/strong&gt;, where a large team of student annotators tracked how local COVID-19 policy — mask mandates, business closures, gathering limits, and so on — changed in near real time across many jurisdictions. Policies shifted week to week, and there was no single authoritative source for what was actually in force where, which made modeling the pandemic’s trajectory much harder than it needed to be. With &lt;a href=&quot;https://biocomplexity.virginia.edu/person/benjamin-hurt&quot;&gt;Benjamin Hurt&lt;/a&gt;, &lt;a href=&quot;https://engineering.virginia.edu/faculty/madhav-marathe&quot;&gt;Madhav Marathe&lt;/a&gt;, &lt;a href=&quot;https://hci.stanford.edu/msb/&quot;&gt;Michael Bernstein&lt;/a&gt;, and many others, we released the resulting annotated dataset and the annotation workflow behind it &lt;a class=&quot;button smallCaps&quot; href=&quot;https://doi.org/10.1038/s41597-023-01979-6&quot;&gt;Scientific Data&lt;/a&gt;.&lt;/p&gt;</content><author><name></name></author><category term="COVID-19" /><category term="dashboard" /><category term="observable" /><category term="data" /><category term="visualization" /><category term="Penn" /><summary type="html">Together with Duncan Watts and others on the team at the CSSLab at Penn, and in conjunction with the City of Philadelphia, we built a collection of interactive data dashboards that visually summarize human mobility patterns over time and space for a number of cities, starting with Philadelphia, and highlight potentially relevant demographic correlates. 
The dashboards are available at covid.seas.upenn.edu.</summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://whiting.me/COVID%20dahsboard.jpg" /><media:content medium="image" url="https://whiting.me/COVID%20dahsboard.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Online learning at scale</title><link href="https://whiting.me/online-learning/" rel="alternate" type="text/html" title="Online learning at scale" /><published>2021-06-15T00:00:00+00:00</published><updated>2021-06-15T00:00:00+00:00</updated><id>https://whiting.me/online-learning</id><content type="html" xml:base="https://whiting.me/online-learning/">&lt;p&gt;Massive open online courses promised education at a scale that in-person teaching can’t reach, but the instructor–student ratio that works in a seminar doesn’t survive the jump to tens of thousands of learners. So much of what makes a class work — feedback, discussion, a sense that anyone is paying attention — has to be reconstructed out of what the learners themselves do for each other.&lt;/p&gt;

&lt;p&gt;Over several years with &lt;a href=&quot;https://dilrukshigamage.wordpress.com&quot;&gt;Dilrukshi Gamage&lt;/a&gt;, &lt;a href=&quot;https://www.uom.lk/staff/indika-perera&quot;&gt;Indika Perera&lt;/a&gt;, &lt;a href=&quot;https://www.uom.lk/staff/shantha-fernando&quot;&gt;Shantha Fernando&lt;/a&gt;, &lt;a href=&quot;https://open.hpi.de/users/tstaubitz&quot;&gt;Thomas Staubitz&lt;/a&gt; and others, we studied how to make peer-driven learning actually work. We found that aligning incentives to the quality of feedback, rather than simply its length, produced feedback learners judged more useful &lt;a class=&quot;button smallCaps&quot; href=&quot;https://doi.org/10.1145/3051457.3054013&quot;&gt;L@S’17&lt;/a&gt;. We showed that seeding peer assessment groups with trained “introduced peers” improved the quality of subsequent discussion &lt;a class=&quot;button smallCaps&quot; href=&quot;https://doi.org/10.1109/TALE.2018.8615262&quot;&gt;TALE’18&lt;/a&gt;. We surveyed the broader state of the field in a systematic literature review of peer assessment in MOOCs &lt;a class=&quot;button smallCaps&quot; href=&quot;https://doi.org/10.1080/01587919.2021.1911626&quot;&gt;Distance Education&lt;/a&gt;. And we showed that treating learners as communities of practice — rather than isolated users moving through a course — measurably improves outcomes &lt;a class=&quot;button smallCaps&quot; href=&quot;https://asianchi.acm.org&quot;&gt;Asian CHI’21&lt;/a&gt;, which was recognized with a best paper award.&lt;/p&gt;

&lt;p&gt;The through-line across these projects is that scale doesn’t remove the social structure of learning; it just changes who has to supply it.&lt;/p&gt;</content><author><name></name></author><category term="MOOCs" /><category term="peer" /><category term="assessment" /><category term="online" /><category term="learning" /><summary type="html">Massive open online courses promised education at a scale that in-person teaching can’t reach, but the instructor–student ratio that works in a seminar doesn’t survive the jump to tens of thousands of learners. So much of what makes a class work — feedback, discussion, a sense that anyone is paying attention — has to be reconstructed out of what the learners themselves do for each other.</summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://whiting.me/online-learning.png" /><media:content medium="image" url="https://whiting.me/online-learning.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Running experiments at scale</title><link href="https://whiting.me/empirica/" rel="alternate" type="text/html" title="Running experiments at scale" /><published>2021-03-15T00:00:00+00:00</published><updated>2021-03-15T00:00:00+00:00</updated><id>https://whiting.me/empirica</id><content type="html" xml:base="https://whiting.me/empirica/">&lt;p&gt;Most of the empirical work I do with collaborators needs infrastructure that doesn’t really exist off the shelf. Running a single behavioral experiment with a handful of people is well supported; running thousands of them, each with groups of participants interacting in real time, is not.&lt;/p&gt;

&lt;p&gt;Led by &lt;a href=&quot;http://amaatouq.io&quot;&gt;Abdullah Almaatouq&lt;/a&gt;, together with &lt;a href=&quot;https://joshua-a-becker.com&quot;&gt;Joshua Becker&lt;/a&gt;, &lt;a href=&quot;https://jhoughton.com&quot;&gt;James Houghton&lt;/a&gt;, &lt;a href=&quot;https://github.com/npaton&quot;&gt;Nicolas Paton&lt;/a&gt;, and &lt;a href=&quot;https://css.seas.upenn.edu/people/duncan-watts/&quot;&gt;Duncan Watts&lt;/a&gt;, we built and extended &lt;a href=&quot;https://empirica.ly&quot;&gt;Empirica&lt;/a&gt;, an open-source virtual lab designed for exactly this kind of high-throughput, macro-level experimentation. Empirica handles the hard parts — synchronization, treatment randomization, real-time interaction, and iterative design — so researchers can focus on the experiment itself &lt;a class=&quot;button smallCaps&quot; href=&quot;https://doi.org/10.3758/s13428-020-01535-9&quot;&gt;BRM’21&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Empirica was one piece of a broader conversation, across many labs, about what it would take to actually scale up behavioral and social science. In a working paper with a large group of collaborators, we laid out a vision for shared infrastructure, shared protocols, and shared participant access as the missing middle of the field &lt;a class=&quot;button smallCaps&quot; href=&quot;https://doi.org/10.17605/OSF.IO/KNVJS&quot;&gt;OSF’21&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This infrastructure turned out to be essential to the work on &lt;a href=&quot;/2022/12/21/Integrative-experiments.html&quot;&gt;integrative experiments&lt;/a&gt; that came next — you can’t run a design space full of experiments if each one costs you a month of engineering.&lt;/p&gt;</content><author><name></name></author><category term="Empirica" /><category term="infrastructure" /><category term="virtual" /><category term="lab" /><category term="MIT" /><category term="Penn" /><summary type="html">Most of the empirical work I do with collaborators needs infrastructure that doesn’t really exist off the shelf. Running a single behavioral experiment with a handful of people is well supported; running thousands of them, each with groups of participants interacting in real time, is not.</summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://whiting.me/empirica.png" /><media:content medium="image" url="https://whiting.me/empirica.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>