Site Reliability Engineering: How Google Runs Production Systems 1st Edition, Kindle Edition
The overwhelming majority of a software system’s lifespan is spent in use, not in design or implementation. So, why does conventional wisdom insist that software engineers focus primarily on the design and development of large-scale computing systems?
In this collection of essays and articles, key members of Google’s Site Reliability Team explain how and why their commitment to the entire lifecycle has enabled the company to successfully build, deploy, monitor, and maintain some of the largest software systems in the world. You’ll learn the principles and practices that enable Google engineers to make systems more scalable, reliable, and efficient: lessons directly applicable to your organization.
This book is divided into four sections:
- Introduction: Learn what site reliability engineering is and why it differs from conventional IT industry practices
- Principles: Examine the patterns, behaviors, and areas of concern that influence the work of a site reliability engineer (SRE)
- Practices: Understand the theory and practice of an SRE’s day-to-day work: building and operating large distributed computing systems
- Management: Explore Google's best practices for training, communication, and meetings that your organization can use
- ISBN-10: 1491951168
- ISBN-13: 978-1491951163
- Edition: 1st
- Publisher: O'Reilly Media
- Publication date: March 23, 2016
- Language: English
- File size: 12.0 MB
Supported devices
Kindle E-Readers
- Kindle Voyage (7th Generation)
- Kindle Paperwhite (12th Generation)
- Kindle Touch (4th Generation)
- Kindle Colorsoft (1st Generation)
- Kindle Scribe Colorsoft (1st Generation)
- Kindle Oasis (8th Generation)
- Kindle Oasis (9th Generation)
- Kindle Oasis (10th Generation)
- Kindle (11th Generation, 2022 Release)
- Kindle Paperwhite (11th Generation)
- Kindle Scribe (2024 Release)
- Kindle Paperwhite (5th Generation)
- Kindle (8th Generation)
- Kindle Paperwhite (10th Generation)
- Kindle (7th Generation)
- Kindle Paperwhite (7th Generation)
- Kindle (10th Generation)
- Kindle Scribe (1st Generation, 2022 Release)
- Kindle Scribe (3rd Generation)
- Kindle Paperwhite (6th Generation)
- Kindle (11th Generation, 2024 Release)
Fire Tablets
- Fire HD 10 Plus
- Fire HD 8 (12th Generation)
- Fire HD 10 (11th Generation)
- Fire HD 8 (8th Generation)
- Fire HD 8 (10th Generation)
- Fire HD 10 (13th Generation)
- Fire Max 11 (13th Generation)
- Fire 7 (9th Generation)
- Fire 7 (12th Generation)
- Fire HD 10 (9th Generation)
Free Kindle Reading Apps
- Kindle for Android Phones
- Kindle for Android Tablets
- Kindle for iPhone
- Kindle for PC
- Kindle for iPad
- Kindle for Mac
- Kindle for Web
Products related to this item
Customers also bought or read
- Software Engineering at Google: Lessons Learned from Programming Over Time (Kindle Edition, $34.67)
- System Design Interview – An insider's guide (#1 Best Seller in Distributed Systems & Computing; Kindle Edition, $39.99)
- Implementing Service Level Objectives: A Practical Guide to SLIs, SLOs, and Error Budgets (Kindle Edition, $27.86)
- The Staff Engineer's Path: A Guide for Individual Contributors Navigating Growth and Change (Kindle Edition, $19.99)
- Operating Systems: Three Easy Pieces (#1 Best Seller in Computer Operating Systems Theory; Kindle Edition, $9.99)
- Refactoring: Improving the Design of Existing Code (Addison-Wesley Signature Series (Fowler)) (Kindle Edition, $46.39)
From the brand
Sharing the knowledge of experts
O'Reilly's mission is to change the world by sharing the knowledge of innovators. For over 40 years, we've inspired companies and individuals to do new things (and do them better) by providing the skills and understanding that are necessary for success.
Our customers are hungry to build the innovations that propel the world forward. And we help them do just that.
From the Publisher
How to Read This Book
This book is a series of essays written by members and alumni of Google’s Site Reliability Engineering organization. It’s much more like conference proceedings than it is like a standard book by an author or a small number of authors. Each chapter is intended to be read as a part of a coherent whole, but a good deal can be gained by reading on whatever subject particularly interests you. (If there are other articles that support or inform the text, we reference them so you can follow up accordingly.)
You don’t need to read in any particular order, though we’d suggest at least starting with Chapters 2 and 3, which describe Google’s production environment and outline how SRE approaches risk, respectively. (Risk is, in many ways, the key quality of our profession.) Reading cover-to-cover is, of course, also useful and possible; our chapters are grouped thematically, into Principles (Part II), Practices (Part III), and Management (Part IV). Each has a small introduction that highlights what the individual pieces are about, and references other articles published by Google SREs, covering specific topics in more detail. Additionally, there’s a companion website mentioned in the book that has a number of helpful resources.
We hope this will be at least as useful and interesting to you as putting it together was for us.
— The Editors.
| Explore the book & companion workbook | Site Reliability Engineering | The Site Reliability Workbook |
|---|---|---|
| Subtitle | How Google Runs Production Systems | Practical Ways to Implement SRE |
| Customer Reviews | 4.7 out of 5 stars (1,219) | 4.7 out of 5 stars (387) |
| Price | $27.89 | $28.89 |
Editorial Reviews
About the Author
Niall Murphy leads the Ads Site Reliability Engineering team at Google Ireland. He has been involved in the Internet industry for about 20 years, and is currently chairperson of INEX, Ireland’s peering hub. He is the author or coauthor of a number of technical papers and/or books, including "IPv6 Network Administration" for O’Reilly, and a number of RFCs. He is currently cowriting a history of the Internet in Ireland, and is the holder of degrees in Computer Science, Mathematics, and Poetry Studies, which is surely some kind of mistake. He lives in Dublin with his wife and two sons.
Betsy Beyer is a Technical Writer for Google Site Reliability Engineering in NYC. She has previously written documentation for Google Datacenters and Hardware Operations teams. Before moving to New York, Betsy was a lecturer on technical writing at Stanford University.
Chris Jones is a Site Reliability Engineer for Google App Engine, a cloud platform-as-a-service product serving over 28 billion requests per day. Based in San Francisco, he has previously been responsible for the care and feeding of Google’s advertising statistics, data warehousing, and customer support systems. In other lives, Chris has worked in academic IT, analyzed data for political campaigns, and engaged in some light BSD kernel hacking, picking up degrees in Computer Engineering, Economics, and Technology Policy along the way. He’s also a licensed professional engineer.
Jennifer Petoff is a Program Manager for Google’s Site Reliability Engineering team and based in Dublin, Ireland. She has managed large global projects across wide-ranging domains including scientific research, engineering, human resources, and advertising operations. Jennifer joined Google after spending eight years in the chemical industry. She holds a PhD in Chemistry from Stanford University and a BS in Chemistry and a BA in Psychology from the University of Rochester.
Product details
- ASIN : B01DCPXKZ6
- Publisher : O'Reilly Media
- Publication date : March 23, 2016
- Edition : 1st
- Language : English
- File size : 12.0 MB
- Enhanced typesetting : Enabled
- X-Ray : Not Enabled
- Word Wise : Enabled
- Print length : 865 pages
- ISBN-10 : 1491951168
- ISBN-13 : 978-1491951163
- Page Flip : Enabled
- Best Sellers Rank: #318,960 in Kindle Store
About the authors
Niall Murphy
- Working on an SRE-based startup from Dublin, Ireland
- Twitter http://twitter.com/niallm
- Photos at http://www.edge-cases.photos

Jennifer Petoff is Director of Google Cloud Platform (GCP) & Technical Infrastructure (TI) Education and is based in Lisbon, Portugal. She leads training programs for Google's GCP and TI Engineering Teams. Jennifer is one of the co-editors of the best-selling book, "Site Reliability Engineering: How Google Runs Production Systems"; lead author of "Training Site Reliability Engineers: What Your Organization Needs to Create a Learning Program"; and is a regular speaker at DevOps and SRE conferences around the world.
Jennifer joined Google in 2007 after spending eight years in the chemical industry. She holds a PhD in Chemistry from Stanford University and a BS in Chemistry and a BA in Psychology from the University of Rochester in the United States.

Betsy is a Technical Writer for Google in NYC specializing in Site Reliability Engineering. She has previously written documentation for Google's Data Center and Hardware Operations Teams in Mountain View and across its globally distributed data centers. Before moving to New York, Betsy was a lecturer on technical writing at Stanford University. En route to her current career, Betsy studied International Relations and English Literature, and holds degrees from Stanford and Tulane.
Customer reviews
Top reviews from the United States
- Reviewed in the United States on October 4, 2025. Format: Paperback. Verified Purchase.
Amazing book with high-quality content.
- Reviewed in the United States on April 20, 2019. Format: Paperback. Verified Purchase.
It's worth noting that there is a great Coursera course about SRE from Google. It doesn't cover as much as the book, but it is a distilled version for learning the basics.
This book has a lot of great information, which I found invaluable over the years. One of the harder things for growing organizations is keeping teams focused, and I've seen DevOps and SRE practices help them zero in on what is essential.
A lot of automation-related work feels like 'yak shaving', a term for entirely unrelated things that don't add value to our product. For development teams, this is very frustrating: why would I write a script to automate this? We only use it once a year!
SRE helps to resolve these frustrations, to some extent, with practices that help organizations understand why they should communicate, why they should talk about issues, and why we measure some things at some levels and not others.
- Reviewed in the United States on December 22, 2018. Format: Kindle. Verified Purchase.
First off, it's worth noting that Google lets you read this entire book for free on their website.
I bought the Kindle version anyway because I spend enough time in front of a backlit screen that it seemed worth reading something this large on a device that's easier on the eyes. Unfortunately, the Kindle version is formatted terribly, and I wish I'd bought the print version instead. The book is broken up into Parts, which are broken up into Chapters, which are further broken up into headlined sections. The Kindle version identifies those headlined sections as chapters, which is somewhat useless.
Anyway, the first few chapters aren't especially useful unless you work at Google. They mostly discuss what's unique about Google's computing infrastructure. Despite this, they were EASILY my favorite part of the book because the material is so interesting and their approach is so unique. After that, each chapter is written so that it can stand on its own if you aren't reading the entire book or are reading it out of order. This is convenient for people who want to pick and choose which parts to read, but it means that people reading the entire thing get a lot of the same information multiple times. It's all written by different people, too, which on the one hand makes it not quite as repetitive, but on the other hand makes it hard to skim the sections with information you already have, because you don't recognize it as information you already know until you've processed it.
Overall, this is a fantastic book on DevOps, SRE, and current trends in the industry. It's a great read for anyone who wants to apply some "best practices" to their role. I would, however, say that reading the entire thing is overkill for most people and not necessarily the best use of your time if you have other things you'd like to learn as well.
Part 1 - Fascinating read. I imagine this would be a good overview if you're about to start at Google and want a sneak peek at how things are done, but I'm only speculating as an outsider.
Part 2 - Interesting and useful concepts for modern cloud computing.
Part 3 - Some useful info and a lot of stuff that's not really unique to Google in my experience. Read the parts that you think you could use some improvement on, skip the rest.
Part 4 - A condensed view from a managerial perspective of things you already read in Part 3.
Part 5 - Some case studies, comparisons from other businesses, a useless recap, and examples that could be useful to share using the website version of the book if you're trying to explain to your team what new concepts are being implemented.
- Reviewed in the United States on January 8, 2021. Format: Paperback. Verified Purchase.
I was amazed by the depth of this book and the way it covers several aspects of what it takes to operate a complex, distributed software system. I was particularly impressed with the details of some chapters related to monitoring, load balancing (at the front end and back end), designing applications to manage overload conditions, and being on call.
I think the book has a lot to teach and inspire. Some of the approaches described, like the error budget and the blameless postmortem culture, are very counterintuitive. One shortcoming I noticed was that some chapters are hard to read because they treat rather advanced topics. The book also has very few illustrations, which makes some of the concepts hard to understand at times. Overall, an invaluable resource.
- Reviewed in the United States on June 26, 2016. Format: Paperback. Verified Purchase.
I really liked this book. It's cool to see how Google actually runs things at their scale. It got me thinking about things I never thought about when it comes to my work in tech. This could sound like the book makes you paranoid, but I think that's too negative. I felt more like I now have a little license and education on how things can (and will) fail and how I can better prepare for and mitigate them. It's like doing a ride-along with a busy ambulance service: it gets you thinking "hmm, maybe I should take that CPR course and brush up on the Heimlich maneuver...".
Even though several of the topics covered weren't things I deal with day to day, I think the mindset you develop after seeing how they solve various issues applies to most any IT / tech endeavor (i.e. whether you're in ops, a SWE, etc.). I think if this book's subject interests you at all, you'll really appreciate having read it.
- Reviewed in the United States on April 1, 2022. Format: Paperback. Verified Purchase.
Tons of nuggets about best practices, how they can be useful across the industry, Google's tooling, how they got there, challenges faced, communication between engineers and SRE, how to look at problems, and so much more.
Some parts of the book go too deep or aren't explained well, and end up being boring. I just skipped pages to move on to the next lesson.
Overall a good addition to my library.
- Reviewed in the United States on February 11, 2024. Format: Paperback. Verified Purchase.
I think Google's practices are now standard across the industry. A lot of things mentioned in the book are already in practice at my employer. Good read.
Top reviews from other countries
J. Andrews, 5.0 out of 5 stars: The book every infrastructure engineer and DevOps person should read
Reviewed in the United Kingdom on January 21, 2018. Format: Kindle. Verified Purchase.
If you are new to infrastructure engineering, this book will inform you of an approach and model to use as you start down this road. If you are an experienced engineer, you will see a lot of truth in what is written here. It may change your viewpoint or solidify an existing one; whatever the case, this book is an essential reference and an honest account with a huge amount of wisdom.
Niels Albers, 5.0 out of 5 stars: Must read for the serious DevOps engineer
Reviewed in the Netherlands on May 4, 2016. Format: Kindle. Verified Purchase.
The first chapter alone lists a number of concrete issues that anyone with any operations experience at all will recognise, and the recommendations this book makes just make sense. Actually, it's not only people with DevOps experience who should be reading this; there is a lot in here that their managers could profit from, in every sense of the word.
Key words:
- Error budget
- Toil / development balance (and the 50% time rule)
- The impossibility of never having a failure.
I'm still working my way through the book, but every new chapter has new insights that really help to put our complex job into perspective, and offer concrete ways of making our work better.
Grisi, 5.0 out of 5 stars: Deep info
Reviewed in Australia on November 1, 2023. Format: Paperback. Verified Purchase.
Want to know about engineering? Read this book. It's deep.
Chf, 5.0 out of 5 stars: Interesting and useful
Reviewed in Italy on May 16, 2016. Format: Kindle. Verified Purchase.
Of course, I don't have the same infrastructure as Google, but many problems are the same.
This book is very interesting because it shows different tips and tricks for resolving and managing communication problems between departments and, of course, reliability problems.
I recommend it to every IT professional, ITIL expert, DevOps wannabe and, of course, CTO.
Óscar Casal Sánchez, 5.0 out of 5 stars: Excellent book
Reviewed in Spain on April 18, 2017. Format: Kindle. Verified Purchase.
An excellent book that offers many perspectives on how to build a team and how to tackle problems. It also walks through all of a company's processes: budgets, monitoring, SLAs, launching a service, maintaining a service...
The book shows that Google's culture is "blameless" and that there is no line between devs and ops; instead there is the concept of SRE, which could be said to be similar to today's DevOps, though with more responsibilities.
A book that everyone who works in IT should read, and also all the