Digitizing with the Internet Archive
Digitizing with the Internet Archive is a seamless, access-driven process designed to make your collections more widely available. At the heart of our digitization service is the Scribe Workstation. From delivery of materials in any form to metadata integration, we streamline every step. Our trained technicians digitize each item in full color, capturing structural metadata and photographing oversized pages when needed. After thorough quality assurance, items are uploaded to archive.org in multiple formats, making them text searchable and accessible to all.
About the Scribe Workstation
The Scribe Workstation features a durable aluminum frame that supports two adjustable camera mounting rails, each equipped with a color camera, so the recto and verso pages of a book are captured simultaneously. It includes a floating V-shaped book cradle that minimizes stress on materials, along with a glass platen that is raised and lowered with a foot pedal for hands-free operation. Museum-grade lights provide even illumination, while a dedicated computer captures the high-resolution images and performs initial pre-processing. After digitization and on-site quality assurance, the images are uploaded to our processing servers via rsync.
Delivery
We’re happy to accept any form of shipment: hand delivery, mail carrier, private courier and so on. If you need help organising this, your local digitization manager can assist.
Metadata
We aim to provide full metadata for every item we digitize. There are several ways we can do this:
- Z39.50 – We can set up a Z39.50 connection with your catalog, enabling us to pull metadata directly from your library.
- Metadata Template – You can provide us with a CSV containing all the metadata you would like included with your items. If you would like to see an example of our CSV template, you can view it here.
- Basic Metadata – We upload each item with an identifier, and you catalog your items once they are online.
- Metadata Creation – Let us compose it for you.
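If you choose the Metadata Template route, the file is ordinary CSV with one row per item. The sketch below shows one way to generate such a file from records you already hold. The column names in `FIELDS`, the sample identifier and the function name `build_metadata_csv` are all hypothetical placeholders; substitute the headings from the actual template your digitization manager provides.

```python
import csv
import io

# Hypothetical column set: replace with the headings from the real template.
FIELDS = ["identifier", "title", "creator", "date", "language", "call_number"]


def build_metadata_csv(records: list[dict[str, str]]) -> str:
    """Return CSV text with a header row and one row per item."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        # Reject misspelled or unexpected column names early.
        unknown = set(record) - set(FIELDS)
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        # Fields left out of a record become empty cells.
        writer.writerow({field: record.get(field, "") for field in FIELDS})
    return buffer.getvalue()


# Hypothetical example record.
rows = [
    {"identifier": "examplebook00smit", "title": "An Example Book",
     "creator": "Smith, Jane", "date": "1901"},
]
print(build_metadata_csv(rows))
```

Using `csv.DictWriter` rather than joining strings by hand means values containing commas, such as "Smith, Jane", are quoted correctly.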
Digitize
We provide a completely non-destructive digitization service.
Our specially trained Digitization Technicians scan items in full color using two high-resolution digital cameras. During digitization, structural metadata such as page types and page numbers is captured. Once the item has been digitized, any oversized pages are photographed at our folio station and inserted seamlessly into the digital facsimile.
Depending on the size of your shipment and the complexity of your material, our turnaround time can be as short as 4-6 weeks.
Upload
After digitization, items enter our republisher process, where images are cropped, de-skewed and thoroughly checked for quality. They are then uploaded to archive.org into your own Library Collection, to be enjoyed and read by our patrons.
As part of our standard service, we provide images in JP2 (JPEG 2000) format. If required, we can also provide lossless TIFF files. For more information, please see our handy help page on file formats.
Access
Images are processed with Tesseract OCR (Optical Character Recognition), which makes each page text searchable. Several file formats are created for each item, including jp2.zip, DjVu, PDF, EPUB, full text and JSON. DAISY files can also be generated on the spot for patrons with print disabilities who hold an LOC access key.
We encourage our partners to upload their born-digital and previously digitised materials using our S3 uploader. For information on bulk uploading to the Internet Archive, please view our help page.
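For partners curious about what an S3-style upload involves, here is a minimal sketch that assembles the URL and headers for a single-file PUT to the Internet Archive's S3-compatible endpoint without sending anything. The header names (`authorization: LOW key:secret`, `x-amz-auto-make-bucket`, `x-archive-meta-*` and numbered `x-archive-metaNN-*` for repeated fields) reflect our reading of the IA-S3 documentation and should be checked there. The function name `build_s3_upload`, the identifier, the key placeholders and the metadata values are hypothetical. Non-ASCII values are deliberately rejected here as a simplification, because they require extra encoding.

```python
from urllib.parse import quote

# Endpoint as given in the Internet Archive's IA-S3 documentation.
S3_ENDPOINT = "https://s3.us.archive.org"


def build_s3_upload(identifier, filename, access_key, secret_key, metadata):
    """Build the URL and headers for a single-file IA-S3 PUT (sends nothing)."""
    url = f"{S3_ENDPOINT}/{quote(identifier)}/{quote(filename)}"
    headers = {
        # IA-S3 keys are sent with the "LOW" scheme rather than AWS signing.
        "authorization": f"LOW {access_key}:{secret_key}",
        # Create the item (the S3 "bucket") if it does not exist yet.
        "x-amz-auto-make-bucket": "1",
    }
    for key, value in metadata.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            # Simplification: non-ASCII values need extra encoding per IA-S3 docs.
            if not v.isascii():
                raise ValueError(f"non-ASCII value for {key!r}")
        if len(values) == 1:
            headers[f"x-archive-meta-{key}"] = values[0]
        else:
            # Repeated fields are numbered: x-archive-meta01-subject, ...
            for i, v in enumerate(values, start=1):
                headers[f"x-archive-meta{i:02d}-{key}"] = v
    return url, headers


# Hypothetical item and placeholder credentials.
url, headers = build_s3_upload(
    "examplebook00smit", "examplebook00smit.pdf",
    "YOUR_ACCESS_KEY", "YOUR_SECRET_KEY",
    {"mediatype": "texts", "title": "An Example Book",
     "subject": ["History", "Maps"]},
)
print(url)
```

The request could then be sent with any HTTP client, for example `requests.put(url, data=f, headers=headers)` with `f` an open binary file. In practice, the Internet Archive's `internetarchive` Python package and its `ia upload` command take care of these details and are the better choice for bulk work.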
Our BookReader can also be embedded directly in your library website, making items easily available to your patrons.
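Embedding typically comes down to an `<iframe>` pointing at an item's embed URL. The sketch below generates such a snippet for a given identifier. It assumes the `https://archive.org/embed/<identifier>` URL pattern used by archive.org's share/embed option. The default width and height, the function name `bookreader_embed` and the sample identifier are hypothetical choices, so compare the output with the embed code archive.org offers for your item.

```python
from html import escape
from urllib.parse import quote


def bookreader_embed(identifier: str, width: int = 560, height: int = 384) -> str:
    """Return an <iframe> snippet that embeds one archive.org item."""
    # Assumed URL pattern for archive.org's embeddable viewer.
    src = f"https://archive.org/embed/{quote(identifier, safe='')}"
    return (
        f'<iframe src="{escape(src)}" width="{width}" height="{height}" '
        'frameborder="0" allowfullscreen></iframe>'
    )


# Hypothetical identifier.
print(bookreader_embed("examplebook00smit"))
```

Percent-encoding the identifier and HTML-escaping the attribute keep the snippet well-formed even if an identifier contains unexpected characters.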
“It doesn’t look like a printout with a staple, it doesn’t look like a report. It looks like a book.”
Brewster Kahle, digital librarian
