Guest post from Thais Lobo, Liliana Bounegru & Jonathan W. Y. Gray, King’s College London.
This work was supported by the Centre for Digital Culture and Department of Digital Humanities at King’s College London and developed further through collaborations with researchers and students at the University of Amsterdam.

Digital journalists increasingly turn to web archives like the Wayback Machine to follow how things on the Internet break, change or disappear – from deleted posts to quietly edited pages.
The web has become not only a source of information but also the subject of media investigations, prompting journalists, researchers and activists to use digital archives to reconstruct timelines, verify claims, uncover hidden connections and hold powerful actors to account.
As online materials grow more fragile and prone to disappearance, the Internet Archive’s Wayback Machine has been critical in making “lost” web pages available, and it recently celebrated archiving over a trillion pages.
As we’ve previously written about on this blog, the Wayback Machine is an important resource for our work as media researchers, helping us to trace histories of digital media objects (for example, changes in ad tracker signatures of viral “fake news” sites over time).
We are also interested in how others use web archives across fields, and what we can learn from each other.
In this piece we draw on the Internet Archive’s News Stories collection to surface practices and use cultures of the Wayback Machine amongst journalists and media organisations. We analysed a dataset of about 8,600 news articles, assembled by the IA via daily Google News keyword searches since 2018.
Drawing on a combination of digital methods, machine learning and lots of reading, we surfaced nine ways that journalists use the Wayback Machine in their reporting.
***
1. following what is deleted
Shifting political alliances are a common driver of online footprint erasure. Deleted tweets have revealed that current allies were once critics (here and here), and earlier conflicting stances on personal blogs and websites have been juxtaposed with current career aspirations (here, here, here and here).
Unannounced takedowns of collections or site sections on government websites often prompt investigations using archival snapshots. Examples include removed editions of presidential newsletters and deleted staff contact lists for services supporting vulnerable groups, signalling access-to-information breaches.

The removal of official publications has also prompted further contextualisation, revealing cases in which information was deleted because it was incomplete, inaccurate or inconveniently timed.
Beyond politics, erasures on corporate websites highlight commercial and reputational pressures, such as deleted statements on forced labour, product safety and climate deception.

2. following what has been altered
Subtle alterations on webpages can also reveal efforts to reshape narratives in plain sight.
Reporting based on archived pages shows how wording edits can move in opposite directions: from hardening language on migration ahead of a policy announcement to softening controversial statements in view of a political nomination, or erasing customer protection promises prior to a bankruptcy filing.

In other cases, small additions to online content have proved just as revealing. Before-and-after snapshots of a blog post showed how a supposed early warning about a virus threat was added only after the pandemic began. Similarly, changes to a social media platform’s API rules appeared shortly after third-party apps were banned, subtly reframing the policy to align with new restrictions.
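As a rough illustration of this kind of before-and-after comparison, the sketch below uses Python’s standard `difflib` module to list the lines added and removed between two versions of a page’s text. The function name `changed_lines` and the sample sentences are invented for the example, and it assumes the text of each snapshot has already been extracted from the archived HTML. It is not a description of how any of the reporters above worked.

```python
import difflib


def changed_lines(before: str, after: str) -> list[str]:
    """Return the removed (-) and added (+) lines between two text versions."""
    diff = list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="earlier snapshot",
            tofile="later snapshot",
            lineterm="",
            n=0,  # no surrounding context lines, only the changes
        )
    )
    # The first two entries are the "---" / "+++" file headers; skip them,
    # then keep only the removed and added lines (dropping "@@" hunk markers).
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


# Invented sample text standing in for two archived versions of a page.
earlier = "We welcome all applicants.\nContact us anytime."
later = "We welcome eligible applicants.\nContact us anytime."

for line in changed_lines(earlier, later):
    print(line)
```

Comparing extracted text rather than raw HTML keeps markup changes, such as a redesigned template, from drowning out the wording edits that matter for the story.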
3. following what is banned
Sometimes removals are deliberate, often at the request of companies seeking to enforce copyright, control branding, or limit liability.
Reports from media investigations highlight how such bans can affect games (here, here, here and here), apps and technical reviews.

In some cases, the bans intersect with political pressures, such as Hong Kong news outlets being shuttered under pro‑Beijing pressure, and disinformation networks being taken down due to links to state actors.
4. following what is broken
Archived snapshots are also often the only way to reconstruct what preceded a link break, when it happened, and what information was effectively cut off.
For example, an investigation into a set of broken URLs on a government website revealed that the pages themselves had not been removed, but the links pointed to outdated servers, creating a false impression of secrecy that sparked a conspiracy theory.
In another case, a major technical glitch took multiple Nigerian government websites offline, cutting off access to official information and showing how even unintentional failures can undermine transparency.
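One way to approximate when a page broke is to look at the HTTP status codes the Wayback Machine recorded for each capture. The sketch below builds a query for the Wayback Machine’s CDX server and scans the returned rows for the first failed capture that follows a successful one. The helper names `cdx_query_url` and `find_break`, the example URL and the sample rows are all made up for illustration. The code does not make any network request itself, and it assumes the JSON output has a header row followed by one row per capture.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(page_url: str) -> str:
    """Build a CDX query asking for the timestamp and status code of each capture."""
    params = {"url": page_url, "output": "json", "fl": "timestamp,statuscode"}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def find_break(rows: list[list[str]]):
    """Return (last_ok, first_failed) timestamps, or None if no break is found.

    `rows` is the decoded JSON response: a header row, then one row per capture.
    Only the first 4xx/5xx capture that follows a 200 capture is reported.
    """
    if not rows:
        return None
    header, *captures = rows
    ts_i = header.index("timestamp")
    status_i = header.index("statuscode")
    last_ok = None
    for row in sorted(captures, key=lambda r: r[ts_i]):
        status = row[status_i]
        if status == "200":
            last_ok = row[ts_i]
        elif last_ok is not None and status.startswith(("4", "5")):
            return last_ok, row[ts_i]
    return None


# Invented rows in the shape of a CDX JSON response.
sample_rows = [
    ["timestamp", "statuscode"],
    ["20200101120000", "200"],
    ["20200315120000", "200"],
    ["20200402120000", "404"],
]
print(cdx_query_url("example.gov/reports"))
print(find_break(sample_rows))
```

The gap between the two timestamps only narrows down the window in which a link broke. Captures are irregular, so the break may have happened at any point between them.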
5. following what is hacked
Archived snapshots of compromised websites and social media accounts offer another way of using web archives as a traceable historical record.

For example, past screenshots of Twitter’s bio page revealed inconsistencies in claims about an alleged takeover of the US president’s social media account. In other cases, such snapshots helped surface a forensic trail and distinguish unauthorised activity carried out by activists (here and here) from activity linked to cybercriminal groups (here).
6. following what is connected
Archived web data often uncovers unexpected links between the owners of domains that appear unrelated on the surface.

For example, journalists used analytics codes found in copies of sites preserved by the Wayback Machine to uncover disinformation networks. In another investigation, archived records verified that a website redirect to Joe Biden’s presidential campaign was unrelated to him, debunking conspiracy theories about the domain’s ownership.

Snapshots of a fake Black Lives Matter Facebook page and its associated websites allowed reporters to trace the individuals behind the operation. Similarly, archived versions of Amazon storefronts exposed networks of accounts generating affiliate revenue from coordinated product listings.
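To give a sense of how shared analytics codes can link sites, the sketch below scans the HTML of several pages for legacy Google Analytics IDs of the form `UA-1234567-1`, where the middle number identifies the account, and reports accounts that appear on more than one domain. The function name `shared_analytics_accounts`, the domains and the IDs are hypothetical, and the example assumes the archived HTML has already been downloaded. Real investigations typically combine several kinds of identifiers and verify matches by hand.

```python
import re
from collections import defaultdict

# Legacy Google Analytics IDs: "UA-<account number>-<property index>".
UA_PATTERN = re.compile(r"\bUA-(\d{4,10})-\d{1,4}\b")


def shared_analytics_accounts(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each analytics account number to the domains it appears on.

    `pages` maps a domain name to the HTML of one of its archived pages.
    Only accounts seen on two or more domains are returned.
    """
    accounts: dict[str, set[str]] = defaultdict(set)
    for domain, html in pages.items():
        for account in UA_PATTERN.findall(html):
            accounts[account].add(domain)
    return {acct: doms for acct, doms in accounts.items() if len(doms) > 1}


# Invented HTML fragments standing in for archived snapshots.
sample_pages = {
    "site-a.example": "<script>ga('create', 'UA-1234567-1', 'auto');</script>",
    "site-b.example": "<script>ga('create', 'UA-1234567-3', 'auto');</script>",
    "site-c.example": "<script>ga('create', 'UA-7654321-1', 'auto');</script>",
}
print(shared_analytics_accounts(sample_pages))
```

A shared account number is a lead rather than proof of common ownership, since codes can be copied along with a site template.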
7. following what is reported
Archived web pages have proven vital for tracing how stories are presented across media outlets and platforms.
Investigations have examined archived versions of individual pages, such as headline coverage relying heavily on unverified claims, a news agency’s premature editorial assessment, or the unflagging of branded content.

In another case, snapshots of the Google homepage captured during the 2018 State of the Union speech disproved a viral claim that Google ignored Donald Trump’s address in favour of Barack Obama.
8. following what is unchanged
In other investigations, the most revealing detail is what did not change.
For example, during a bushfire crisis in Australia, archived pages showed that a key policy statement by the Greens party was left untouched, despite a disinformation campaign claiming the contrary.
Similarly, a social media account reported as having been reactivated under a new wave of laissez-faire moderation had, in fact, never been suspended.
9. following what is saved
When forums, platforms and websites shut down, it is the work of crowdsourced archivists that captures their traces before they vanish for good.
In several reported cases, users raced to preserve spaces such as a long-running forum for sex workers, a 16-year-old Q&A site, a meme-sharing platform, and a free music library.
Archiving web pages can become part of the story.
***
These are some of the ways we’ve noticed journalists using web archives – and there are many more! If you know of other interesting examples, we’d love to hear from you.
We hope that these nine ways may help to inspire critical and creative uses of web archives to “follow the changes” – exploring what they can tell us about digital culture and society, and the times we live in.





















