Digitizing with the Internet Archive

Digitizing with the Internet Archive is a seamless, access-driven process designed to make your collections more available to the world. At the heart of our digitization service is the Scribe Workstation. From delivery of materials in any form, to metadata integration, we streamline every step. Our trained technicians digitize each item in full color, capturing structural metadata and oversized pages when needed. After thorough quality assurance, items are uploaded to archive.org in multiple formats, making them text searchable and accessible to all.

About the Scribe Workstation

The Scribe Workstation features a durable aluminum frame that supports two adjustable camera mounting rails, each equipped with a color camera, so that the recto and verso pages of a book are captured simultaneously. It includes a floating V-shaped book cradle designed to minimize stress on materials, along with a glass platen that can be raised and lowered using a foot pedal for hands-free operation. Museum-grade lights provide even illumination, while a dedicated computer captures the high-resolution images and performs initial pre-processing. After digitization and on-site quality assurance, the images are uploaded via rsync to processing servers.

Close-up view of a V-shaped book scanner station at the Internet Archive Digitization Services, featuring dual overhead cameras and a central monitor for digital imaging.
Person pushing a metal cart loaded with archival boxes and books.

Delivery

We’re happy to accept any form of shipment: hand delivery, mail carrier, private courier, and so on. If you need help arranging a shipment, your local digitization manager can assist.

Cabinet with multiple labeled drawers containing physical metadata or inventory records, each marked with handwritten number ranges.

Metadata

We aim to provide full metadata for every item we digitize. There are several ways we can do this:

  • Z39.50 – We can set up a Z39.50 connection with your catalog, enabling us to pull metadata directly from your library.
  • Metadata Template – You can provide us with a CSV file containing all the metadata you would like included with your items. If you would like to see an example of our CSV template, you can view it here.
  • Basic Metadata – Items are uploaded with an identifier, and you catalog them once they are online.
  • Or – Let us compose it for you.
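
For the Metadata Template option, it can help to check a CSV before sending it. The sketch below is one way to do that with Python's standard library. The column names (`identifier`, `title`, `creator`, `date`, `language`) and the helper functions are assumptions made for illustration; the actual template may use different fields, so adjust the lists to match it.

```python
import csv
import io

# Hypothetical column set; the real template may use different fields.
REQUIRED_COLUMNS = ("identifier", "title")
OPTIONAL_COLUMNS = ("creator", "date", "language")


def write_metadata_csv(records):
    """Serialize a list of dicts to CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=REQUIRED_COLUMNS + OPTIONAL_COLUMNS,
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        # Columns missing from a record are written as empty cells.
        writer.writerow(record)
    return buffer.getvalue()


def find_problems(csv_text):
    """Return a list of human-readable problems found in the CSV text."""
    problems = []
    seen = set()
    reader = csv.DictReader(io.StringIO(csv_text))
    # Data rows are counted from 2 because row 1 is the header.
    for row_number, row in enumerate(reader, start=2):
        for column in REQUIRED_COLUMNS:
            if not (row.get(column) or "").strip():
                problems.append(f"row {row_number}: missing {column}")
        identifier = (row.get("identifier") or "").strip()
        if identifier and identifier in seen:
            problems.append(f"row {row_number}: duplicate identifier {identifier}")
        seen.add(identifier)
    return problems


# Illustrative records only; these are not real items.
records = [
    {"identifier": "example-item-0001", "title": "An Example Title",
     "creator": "Doe, Jane", "date": "1901"},
    {"identifier": "example-item-0002", "title": ""},
]
print(find_problems(write_metadata_csv(records)))
```

Run as written, the second record is reported because its title is empty. A check like this catches blank required cells and repeated identifiers before the file reaches your digitization manager.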

Technician operating a high-resolution book scanner at the Internet Archive Digitization Services, carefully digitizing a bound book under professional lighting.

Digitize

We provide a completely non-destructive digitization service.

Items are scanned in full color by our specially trained Digitization Technicians, using two high-resolution digital cameras. During digitization, structural metadata such as page types and page numbers is captured. Once the item has been digitized, any oversized pages are shot at our folio station and inserted seamlessly into the digital facsimile.

Depending on the size of the shipment and the complexity of your material, our turnaround time can be as little as 4 to 6 weeks.

Close-up of server hardware with a panel labeled “Internet Archive” featuring the organization's column logo.

Upload

After digitization, items enter our republisher process. Images are cropped, de-skewed, and thoroughly quality assessed. They are then uploaded to archive.org into your own Library Collection, where our patrons can read and enjoy them.

As part of our standard service we provide images in JP2 format. If required, we can also provide lossless TIFF files. For more information please see our handy help page on file formats.

Person wearing an Internet Archive shirt viewing a digitized book on a desktop computer screen in a scanning or access workstation.

Access

Images are processed using Tesseract OCR (Optical Character Recognition), which makes each page text searchable. Several file formats are created for each item, including jp2.zip, DjVu, PDF, EPUB, full text, and JSON. DAISY files can also be generated on the spot for those with print disabilities and an LOC access key.
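
To see which formats exist for a given item, one option is to read the item's record from the archive.org metadata endpoint and group its files by format label. The sketch below assumes the record contains a `files` list whose entries carry `name` and `format` keys. The sample record, its identifier, and its format labels are illustrative values, not taken from a real item.

```python
import json
import urllib.request
from collections import defaultdict
from urllib.parse import quote


def fetch_item_metadata(identifier):
    """Fetch an item's record from the archive.org metadata endpoint."""
    url = f"https://archive.org/metadata/{quote(identifier)}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def files_by_format(item_metadata):
    """Map each format label to the sorted file names that carry it."""
    grouped = defaultdict(list)
    for entry in item_metadata.get("files", []):
        # Entries without a format label are grouped under "unknown".
        grouped[entry.get("format", "unknown")].append(entry["name"])
    return {label: sorted(names) for label, names in grouped.items()}


def download_url(identifier, filename):
    """Build the download link for one file of one item."""
    return f"https://archive.org/download/{quote(identifier)}/{quote(filename)}"


# Illustrative record; a real one would come from fetch_item_metadata().
sample_record = {
    "files": [
        {"name": "example-item-0001_jp2.zip", "format": "Single Page Processed JP2 ZIP"},
        {"name": "example-item-0001.pdf", "format": "Text PDF"},
        {"name": "example-item-0001.epub", "format": "EPUB"},
    ]
}
for label, names in files_by_format(sample_record).items():
    for name in names:
        print(label, download_url("example-item-0001", name))
```

The network call is kept in its own function so the grouping logic can be tried on a saved record without contacting archive.org.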

We encourage our partners to upload their born-digital and previously digitized materials using our S3 uploader. For information on bulk uploading to the Internet Archive, please view our help page.
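
As a rough illustration of what an upload through the S3-like interface involves, the sketch below builds the URL and headers for a single-file PUT request without sending anything. It assumes the `s3.us.archive.org` endpoint, the `LOW` authorization scheme, and the `x-archive-meta-*` header convention described in the Internet Archive's S3 documentation; the function name, identifier, and key values are placeholders. Check the help page for current details before relying on it.

```python
from urllib.parse import quote

# Endpoint of the Internet Archive's S3-like API (assumed from its public docs).
S3_ENDPOINT = "https://s3.us.archive.org"


def build_upload_request(identifier, filename, metadata, access_key, secret_key):
    """Return (url, headers) for a PUT that uploads one file to one item."""
    url = f"{S3_ENDPOINT}/{quote(identifier)}/{quote(filename)}"
    headers = {
        "authorization": f"LOW {access_key}:{secret_key}",
        # Ask the service to create the item if it does not exist yet.
        "x-archive-auto-make-bucket": "1",
    }
    # Each metadata field travels as its own header.
    # This sketch handles plain ASCII values only.
    for field, value in metadata.items():
        headers[f"x-archive-meta-{field}"] = str(value)
    return url, headers


# Placeholder values; substitute your own identifier and keys.
url, headers = build_upload_request(
    "example-item-0001",
    "example scan.pdf",
    {"title": "An Example Title", "mediatype": "texts"},
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
)
print(url)

# Sending is left commented out so the sketch has no side effects:
# import urllib.request
# with open("example scan.pdf", "rb") as handle:
#     request = urllib.request.Request(
#         url, data=handle.read(), headers=headers, method="PUT"
#     )
#     urllib.request.urlopen(request)
```

For bulk uploads, the same request can be built once per file in a loop, reusing the identifier so that all files land in the same item.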

Our bookreader can also be imported directly into your library website, making items easily available to your patrons.

Brewster Kahle, founder of the Internet Archive, holding a copy of Libraries of the Future against his chest.

“It doesn’t look like a printout with a staple, it doesn’t look like a report. It looks like a book.”

Brewster Kahle, digital librarian