Site Reliability Engineering: How Google Runs Production Systems 1st Edition, Kindle Edition

4.7 out of 5 stars (1,219)

The overwhelming majority of a software system's lifespan is spent in use, not in design or implementation. So, why does conventional wisdom insist that software engineers focus primarily on the design and development of large-scale computing systems?

In this collection of essays and articles, key members of Google's Site Reliability Team explain how and why their commitment to the entire lifecycle has enabled the company to successfully build, deploy, monitor, and maintain some of the largest software systems in the world. You'll learn the principles and practices that enable Google engineers to make systems more scalable, reliable, and efficient: lessons directly applicable to your organization.

This book is divided into four sections:

  • Introduction: Learn what site reliability engineering is and why it differs from conventional IT industry practices
  • Principles: Examine the patterns, behaviors, and areas of concern that influence the work of a site reliability engineer (SRE)
  • Practices: Understand the theory and practice of an SRE's day-to-day work: building and operating large distributed computing systems
  • Management: Explore Google's best practices for training, communication, and meetings that your organization can use

Due to its large file size, this book may take longer to download.
This title is only available on select devices and the latest version of the Kindle app. Please refer to the supported device list before purchase.

Available on these devices

Kindle E-Readers

  • Kindle Voyage (7th Generation)
  • Kindle Paperwhite (12th Generation)
  • Kindle Touch (4th Generation)
  • Kindle Colorsoft (1st Generation)
  • Kindle Scribe Colorsoft (1st Generation)
  • Kindle Oasis (8th Generation)
  • Kindle Oasis (9th Generation)
  • Kindle Oasis (10th Generation)
  • Kindle (11th Generation, 2022 Release)
  • Kindle Paperwhite (11th Generation)
  • Kindle Scribe (2024 Release)
  • Kindle Paperwhite (5th Generation)
  • Kindle (8th Generation)
  • Kindle Paperwhite (10th Generation)
  • Kindle (7th Generation)
  • Kindle Paperwhite (7th Generation)
  • Kindle (10th Generation)
  • Kindle Scribe (1st Generation, 2022 Release)
  • Kindle Scribe (3rd Generation)
  • Kindle Paperwhite (6th Generation)
  • Kindle (11th Generation, 2024 Release)

Fire Tablets

  • Fire HD 10 Plus
  • Fire HD 8 (12th Generation)
  • Fire HD 10 (11th Generation)
  • Fire HD 8 (8th Generation)
  • Fire HD 8 (10th Generation)
  • Fire HD 10 (13th Generation)
  • Fire Max 11 (13th Generation)
  • Fire 7 (9th Generation)
  • Fire 7 (12th Generation)
  • Fire HD 10 (9th Generation)

Free Kindle Reading Apps

  • Kindle for Android Phones
  • Kindle for Android Tablets
  • Kindle for iPhone
  • Kindle for PC
  • Kindle for iPad
  • Kindle for Mac
  • Kindle for Web

From the Publisher

How to Read This Book

This book is a series of essays written by members and alumni of Google’s Site Reliability Engineering organization. It’s much more like conference proceedings than it is like a standard book by an author or a small number of authors. Each chapter is intended to be read as a part of a coherent whole, but a good deal can be gained by reading on whatever subject particularly interests you. (If there are other articles that support or inform the text, we reference them so you can follow up accordingly.)

You don’t need to read in any particular order, though we’d suggest at least starting with Chapters 2 and 3, which describe Google’s production environment and outline how SRE approaches risk, respectively. (Risk is, in many ways, the key quality of our profession.) Reading cover-to-cover is, of course, also useful and possible; our chapters are grouped thematically, into Principles (Part II), Practices (Part III), and Management (Part IV). Each has a small introduction that highlights what the individual pieces are about, and references other articles published by Google SREs, covering specific topics in more detail. Additionally, there’s a companion website mentioned in the book that has a number of helpful resources.

We hope this will be at least as useful and interesting to you as putting it together was for us.

— The Editors.

Explore the book & companion workbook

  • Site Reliability Engineering: How Google Runs Production Systems. Customer reviews: 4.7 out of 5 stars (1,219). Price: $27.89
  • The Site Reliability Workbook: Practical Ways to Implement SRE. Customer reviews: 4.7 out of 5 stars (387). Price: $28.89

Editorial Reviews

About the Author

Niall Murphy leads the Ads Site Reliability Engineering team at Google Ireland. He has been involved in the Internet industry for about 20 years, and is currently chairperson of INEX, Ireland’s peering hub. He is the author or coauthor of a number of technical papers and/or books, including "IPv6 Network Administration" for O’Reilly, and a number of RFCs. He is currently cowriting a history of the Internet in Ireland, and is the holder of degrees in Computer Science, Mathematics, and Poetry Studies, which is surely some kind of mistake. He lives in Dublin with his wife and two sons.

Betsy Beyer is a Technical Writer for Google Site Reliability Engineering in NYC. She has previously written documentation for Google Datacenters and Hardware Operations teams. Before moving to New York, Betsy was a lecturer on technical writing at Stanford University.

Chris Jones is a Site Reliability Engineer for Google App Engine, a cloud platform-as-a-service product serving over 28 billion requests per day. Based in San Francisco, he has previously been responsible for the care and feeding of Google’s advertising statistics, data warehousing, and customer support systems. In other lives, Chris has worked in academic IT, analyzed data for political campaigns, and engaged in some light BSD kernel hacking, picking up degrees in Computer Engineering, Economics, and Technology Policy along the way. He’s also a licensed professional engineer.

Jennifer Petoff is a Program Manager for Google’s Site Reliability Engineering team and based in Dublin, Ireland. She has managed large global projects across wide-ranging domains including scientific research, engineering, human resources, and advertising operations. Jennifer joined Google after spending eight years in the chemical industry. She holds a PhD in Chemistry from Stanford University and a BS in Chemistry and a BA in Psychology from the University of Rochester.

Product details

  • ASIN: B01DCPXKZ6
  • Publisher: O'Reilly Media
  • Publication date: March 23, 2016
  • Edition: 1st
  • Language: English
  • File size: 12.0 MB
  • Enhanced typesetting: Enabled
  • X-Ray: Not Enabled
  • Word Wise: Enabled
  • Print length: 865 pages
  • ISBN-10: 9781491951163
  • ISBN-13: 978-1491951163
  • Page Flip: Enabled
  • Best Sellers Rank: #318,960 in Kindle Store
  • Customer Reviews: 4.7 out of 5 stars (1,219)

Customer reviews

4.7 out of 5 stars
1,219 global ratings
Kindle edition is horribly formatted
1 out of 5 stars
The Kindle edition is horribly formatted. Headings, subheadings and call-outs are not visible as such, but appear as normal body text. Unacceptable for such an expensive publication. Additional information: this is on Windows 10, with the Kindle reader app from the Windows Store. Attached screenshot shows how the author and editor credits, as well as a quote, appear as normal body text at the start of a chapter: this problem persists through the entire publication, and is especially annoying for subheadings. All my other Kindle books look fine using the same reader app.

Top reviews from the United States

  • Reviewed in the United States on October 4, 2025
    Format: Paperback, Verified Purchase
    Amazing book with high quality content
  • Reviewed in the United States on April 20, 2019
    Format: Paperback, Verified Purchase
    It's worth noting that there is a great Coursera course about SRE from Google. It will not cover as much as the book, but it is a distilled version to learn the basics.

    This book has a lot of great information, which I have found invaluable over the years. One of the harder things for growing organizations is to keep teams focused, and I've seen that DevOps and SRE practices help to zero in on what is essential.

    A lot of Automation related work feels like 'yak shaving,' which is a term to refer to entirely unrelated things that don't add value to our product. For development teams, this feels very frustrating. Why would I want to make a script to automate this? We only use it once a year!

    SRE helps to solve these frustrations, to some extent, with practices that help organizations understand why they should communicate, why they should talk about issues, and why we measure some things on some level and not others.
    9 people found this helpful
  • Reviewed in the United States on January 28, 2026
    Format: Paperback, Verified Purchase
    Excellent
  • Reviewed in the United States on December 22, 2018
    Format: Kindle, Verified Purchase
    First off - it's worth noting that Google lets you read this entire book for free on their website.

    I bought the Kindle version anyways because I spend enough time in front of a backlit screen that it seemed worth it to read something this large using a device that's better on your eyes. Unfortunately the Kindle version is formatted terribly and I wish I'd bought the print version instead. The book is broken up into Parts which are broken up into Chapters which are further broken up into headlined sections. The Kindle version identifies those headlined sections as chapters which is somewhat useless.

    Anyways, the first few chapters aren't especially useful unless you work at Google. They mostly discuss what's unique about Google's computing infrastructure. Despite this, they were EASILY my favorite part of the book because the material is so interesting and their approach is so unique. After that, each chapter is written in a way that it can stand on its own if you aren't reading the entire book, or are reading it out of order. This is convenient for people who want to pick and choose what parts they want to read, but means that people who are reading the entire thing wind up getting a lot of the same information multiple times. It's all written by different people too, which on the one hand makes it not quite as repetitive, but on the other hand makes it hard to just skim over the sections with info you already have because you don't recognize it as information you already know until you've processed it.

    Overall this is a fantastic book on DevOps, SRE, and current trends in the industry. It's a great read for anyone who wants to apply some "best practices" to their role. I would however say that reading the entire thing is overkill for most people and not necessarily the best use of your time if you have other things you'd like to be learning as well.

    Part 1 - Fascinating read. I imagine this would be a good overview if you're about to start at Google and want a sneak peek at how things are done, but I'm only speculating this as an outsider.

    Part 2 - Interesting and useful concepts for modern cloud computing.

    Part 3 - Some useful info and a lot of stuff that's not really unique to Google in my experience. Read the parts that you think you could use some improvement on, skip the rest.

    Part 4 - A condensed view from a managerial perspective of things you already read in Part 3.

    Part 5 - Some case studies, comparisons from other businesses, a useless recap, and examples that could be useful to share using the website version of the book if you're trying to explain to your team what new concepts are being implemented.
    47 people found this helpful
  • Reviewed in the United States on January 8, 2021
    Format: Paperback, Verified Purchase
    I was amazed by the depth of this book, and the way it covers several aspects of what it takes to operate a complex and distributed software system. I was particularly impressed with the details of some chapters related to monitoring, load balancing (at the front end and back end), designing applications to manage overload conditions, and being on call.
    I think the book has a lot to teach and inspire. Some of the approaches described are very counterintuitive like the error budget, and the blameless postmortem culture. One of the shortcomings I noticed was that some chapters are hard to read because they treat rather advanced topics. The fact that the book has very few illustrations makes it hard to understand some of the concepts at times. Overall, an invaluable resource.
    One person found this helpful
  • Reviewed in the United States on June 26, 2016
    Format: Paperback, Verified Purchase
    I really liked this book. Cool to see how Google actually runs things at their scale. Got me thinking about things I never thought about when it comes to my work in tech. This could sound like the book makes you paranoid, but I think that's too negative. I felt more like I now have a little license and education on how things can (and will) fail and how I can better prepare for and mitigate them. It's like you got to do a ride-along with a busy ambulance service; it gets you thinking "hmm, maybe I should take that CPR course and brush up on the Heimlich maneuver...".

    Even though several of the topics covered weren't things I deal with day to day, I think the mindset you develop after seeing how they solve various issues applies to most any IT / tech endeavor (i.e. whether you're in ops, a SWE, etc.). I think if this book's subject interests you at all, you'll really appreciate having read it.
    16 people found this helpful
  • Reviewed in the United States on April 1, 2022
    Format: Paperback, Verified Purchase
    Tons of nuggets about best practices, how they can be useful across industry, Google's tooling, how they got there, challenges faced, communication between engineers and SRE, how to look at problems, and so much more.
    There were parts of the book that can be too deep or not well explained, and end up boring. I just skipped pages to move on to the next learning.
    Overall a good addition to my library.
  • Reviewed in the United States on February 11, 2024
    Format: Paperback, Verified Purchase
    I think Google's practices are now standard across the industry. A lot of things mentioned in the book are already in practice at my employer. Good read.

Top reviews from other countries

  • J. Andrews
    5.0 out of 5 stars The book every infrastructure engineer and DevOps person should read
    Reviewed in the United Kingdom on January 21, 2018
    Format: Kindle, Verified Purchase
    If you are new to infrastructure engineering this book will inform you as to an approach and model to use as you start down this road. If you are an experienced engineer then you will see a lot of truth in what is written here. It may change your viewpoint or solidify an existing one; whatever the case, this book is an essential reference and an honest account with a huge amount of wisdom.
  • Niels Albers
    5.0 out of 5 stars Must read for the serious DevOps engineer
    Reviewed in the Netherlands on May 4, 2016
    Format: Kindle, Verified Purchase
    Just the first chapter alone lists a number of concrete issues that anyone who has any experience with operations at all will recognise, and the recommendations this book makes just make sense. Actually, not only people with DevOps experience should be reading this; there is a lot in here that their managers could certainly profit from, in every sense of the word.
    Key words:
    - Error budget
    - Toil / development balance (and the 50% time rule)
    - The impossibility of never having a failure.

    I'm still working my way through the book, but every new chapter has new insights that really help to put our complex job into perspective, and offer concrete ways of making our work better.
  • Grisi
    5.0 out of 5 stars Deep info
    Reviewed in Australia on November 1, 2023
    Format: Paperback, Verified Purchase
    Want to know about an Engineer - read this book - it's Deep
  • Chf
    5.0 out of 5 stars Interesting and useful
    Reviewed in Italy on May 16, 2016
    Format: Kindle, Verified Purchase
    Of course, I don't have the same infrastructure as Google, but many problems are the same.
    This book is very interesting because it shows different tips & tricks to resolve and manage communication problems between departments and, of course, reliability problems.
    I suggest it to every IT professional, ITIL expert, DevOps wannabe and, of course, CTO.
  • Óscar Casal Sánchez
    5.0 out of 5 stars Excellent book
    Reviewed in Spain on April 18, 2017
    Format: Kindle, Verified Purchase
    An excellent book that offers many perspectives on how to build a team and how to tackle problems. It also walks through all of a company's processes: budgets, monitoring, SLAs, launching a service, maintaining a service...

    This book shows that Google's culture is "blameless" and that there is no line between devs and ops; there is the concept of SRE, which could be said to be similar to today's DevOps, though with more responsibilities.

    A book that everyone who works in IT should read, and also all the
