Site Reliability Engineering: How Google Runs Production Systems, 1st Edition, Kindle Edition, by Niall Richard Murphy, Betsy Beyer, Chris Jones, and Jennifer Petoff

Customers also bought or read

Observability Engineering: Achieving Production Excellence (224 ratings; Kindle Edition; $34.67)
Software Engineering at Google: Lessons Learned from Programming Over Time (821 ratings; Kindle Edition; $34.67)
Building Secure and Reliable Systems: Best Practices for Designing, Implementing, and Maintaining Systems (162 ratings; Kindle Edition; $44.93)
Platform Engineering (68 ratings; Kindle Edition; $37.05)
System Design Interview – An insider's guide (3,307 ratings; Distributed Systems & Computing; Kindle Edition; $39.99)
The DevOps Handbook: How to Create World-Class Agility, Reliability, & Security in Technology Organizations (3,026 ratings; Kindle Edition; $27.99)
Systems Performance (328 ratings; Kindle Edition; $46.39)
Software Architecture: The Hard Parts: Modern Trade-Off Analyses for Distributed Architectures (709 ratings; Software Design Tools; Kindle Edition; $37.99)
Database Internals: A Deep Dive into How Distributed Data Systems Work (544 ratings; Kindle Edition; $34.51)
Release It!: Design and Deploy Production-Ready Software (496 ratings; Kindle Edition; $38.34)
Understanding Distributed Systems, Second Edition: What every developer should know about large distributed applications (271 ratings; Kindle Edition; $35.00)
The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win (14,850 ratings; Business Production & Operations; Kindle Edition; $17.99)
Implementing Service Level Objectives: A Practical Guide to SLIs, SLOs, and Error Budgets (91 ratings; Kindle Edition; $27.86)
The Kubernetes Book (1,557 ratings; Kindle Edition; $9.99)
Building Microservices: Designing Fine-Grained Systems (815 ratings; Kindle Edition; $30.65)
The Staff Engineer's Path: A Guide for Individual Contributors Navigating Growth and Change (861 ratings; Kindle Edition; $19.99)
Seeking SRE: Conversations About Running Production Systems at Scale (94 ratings; Kindle Edition; $36.45)
A Philosophy of Software Design, 2nd Edition (2,642 ratings; Kindle Edition; $9.99)
Fundamentals of Software Architecture: A Modern Engineering Approach (65 ratings; Computer Programming Logic; Kindle Edition; $54.53)
TCP/IP Illustrated: The Protocols, Volume 1 (Addison-Wesley Professional Computing Series) (309 ratings; TCP-IP; Kindle Edition; $53.59)
Operating Systems: Three Easy Pieces (680 ratings; Computer Operating Systems Theory; Kindle Edition; $9.99)
Domain-Driven Design: Tackling Complexity in the Heart of Software (1,560 ratings; Kindle Edition; $54.73)
Kubernetes: Up and Running: Dive into the Future of Infrastructure (115 ratings; Kindle Edition; $41.79)
Refactoring: Improving the Design of Existing Code (Addison-Wesley Signature Series (Fowler)) (1,204 ratings; Kindle Edition; $46.39)
Terraform: Up and Running: Writing Infrastructure as Code (256 ratings; Kindle Edition; $40.73)
High Performance Browser Networking: What every web developer should know about networking and web performance (208 ratings; Kindle Edition; $37.32)
Java Concurrency in Practice (869 ratings; Kindle Edition; $32.98)
Learning Go: An Idiomatic Approach to Real-World Go Programming (175 ratings; Kindle Edition; $41.46)
Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations (3,479 ratings; Kindle Edition; $16.99)
Become an Effective Software Engineering Manager (335 ratings; Kindle Edition; $39.99)

From the brand

Your partner in learning

Sharing the knowledge of experts

O'Reilly's mission is to change the world by sharing the knowledge of innovators. For over 40 years, we've inspired companies and individuals to do new things (and do them better) by providing the skills and understanding that are necessary for success.

Our customers are hungry to build the innovations that propel the world forward. And we help them do just that.

From the Publisher

This book is divided into four sections:

Introduction—Learn what site reliability engineering is and why it differs from conventional IT industry practices
Principles—Examine the patterns, behaviors, and areas of concern that influence the work of a site reliability engineer (SRE)
Practices—Understand the theory and practice of an SRE’s day-to-day work: building and operating large distributed computing systems
Management—Explore Google's best practices for training, communication, and meetings that your organization can use

How to Read This Book

This book is a series of essays written by members and alumni of Google’s Site Reliability Engineering organization. It’s much more like conference proceedings than it is like a standard book by an author or a small number of authors. Each chapter is intended to be read as a part of a coherent whole, but a good deal can be gained by reading on whatever subject particularly interests you. (If there are other articles that support or inform the text, we reference them so you can follow up accordingly.)

You don’t need to read in any particular order, though we’d suggest at least starting with Chapters 2 and 3, which describe Google’s production environment and outline how SRE approaches risk, respectively. (Risk is, in many ways, the key quality of our profession.) Reading cover-to-cover is, of course, also useful and possible; our chapters are grouped thematically, into Principles (Part II), Practices (Part III), and Management (Part IV). Each has a small introduction that highlights what the individual pieces are about, and references other articles published by Google SREs, covering specific topics in more detail. Additionally, there’s a companion website mentioned in the book that has a number of helpful resources.

We hope this will be at least as useful and interesting to you as putting it together was for us.

— The Editors.

Title	Site Reliability Engineering	The Site Reliability Workbook
Subtitle	How Google Runs Production Systems	Practical Ways to Implement SRE
Customer Reviews	1,219	387
Price	$27.89	$28.89

Editorial Reviews

About the Author

Niall Murphy leads the Ads Site Reliability Engineering team at Google Ireland. He has been involved in the Internet industry for about 20 years, and is currently chairperson of INEX, Ireland’s peering hub. He is the author or coauthor of a number of technical papers and/or books, including "IPv6 Network Administration" for O’Reilly, and a number of RFCs. He is currently cowriting a history of the Internet in Ireland, and is the holder of degrees in Computer Science, Mathematics, and Poetry Studies, which is surely some kind of mistake. He lives in Dublin with his wife and two sons.

Betsy Beyer is a Technical Writer for Google Site Reliability Engineering in NYC. She has previously written documentation for Google Datacenters and Hardware Operations teams. Before moving to New York, Betsy was a lecturer on technical writing at Stanford University.

Chris Jones is a Site Reliability Engineer for Google App Engine, a cloud platform-as-a-service product serving over 28 billion requests per day. Based in San Francisco, he has previously been responsible for the care and feeding of Google’s advertising statistics, data warehousing, and customer support systems. In other lives, Chris has worked in academic IT, analyzed data for political campaigns, and engaged in some light BSD kernel hacking, picking up degrees in Computer Engineering, Economics, and Technology Policy along the way. He’s also a licensed professional engineer.

Jennifer Petoff is a Program Manager for Google’s Site Reliability Engineering team and based in Dublin, Ireland. She has managed large global projects across wide-ranging domains including scientific research, engineering, human resources, and advertising operations. Jennifer joined Google after spending eight years in the chemical industry. She holds a PhD in Chemistry from Stanford University and a BS in Chemistry and a BA in Psychology from the University of Rochester.

Product details

ASIN: B01DCPXKZ6
Publisher: O'Reilly Media
Publication date: March 23, 2016
Edition: 1st
Language: English
File size: 12.0 MB
Enhanced typesetting: Enabled
X-Ray: Not Enabled
Word Wise: Enabled
Print length: 865 pages
ISBN-10: 9781491951163
ISBN-13: 978-1491951163
Page Flip: Enabled
Best Sellers Rank: #318,960 in Kindle Store
- #6 in Network Disaster & Recovery Administration
- #8 in Computer Systems Analysis & Design (Books)
- #15 in Distributed Systems & Computing
Customer Reviews: 1,219 ratings

About the authors

Niall Richard Murphy
- Working on an SRE-based startup from Dublin, Ireland
- Twitter http://twitter.com/niallm
- Photos at http://www.edge-cases.photos
Chris Jones
Jennifer Petoff
Jennifer Petoff is Director of Google Cloud Platform (GCP) & Technical Infrastructure (TI) Education and is based in Lisbon, Portugal. She leads training programs for Google's GCP and TI Engineering Teams. Jennifer is one of the co-editors of the best-selling book, "Site Reliability Engineering: How Google Runs Production Systems"; lead author of "Training Site Reliability Engineers: What Your Organization Needs to Create a Learning Program"; and is a regular speaker at DevOps and SRE conferences around the world.
Jennifer joined Google in 2007 after spending eight years in the chemical industry. She holds a PhD in Chemistry from Stanford University and a BS in Chemistry and a BA in Psychology from the University of Rochester in the United States.
Betsy Beyer
Betsy is a Technical Writer for Google in NYC specializing in Site Reliability Engineering. She has previously written documentation for Google's Data Center and Hardware Operations Teams in Mountain View and across its globally-distributed data centers. Before moving to New York, Betsy was a lecturer on technical writing at Stanford University. En route to her current career, Betsy studied International Relations and English Literature, and holds degrees from Stanford and Tulane.

Customer reviews

1,219 global ratings

Top reviews from the United States

Wats_lopes
Amazing book with high quality content
Reviewed in the United States on October 4, 2025
Format: Paperback | Verified Purchase

Amazing book with high quality content

Claudio Rodriguez
Great book to help small companies
Reviewed in the United States on April 20, 2019
Format: Paperback | Verified Purchase
It's worth noting that there is a great Coursera course about SRE from Google. It will not cover as much as the book, but it is a distilled version for learning the basics.

This book has a lot of great information, which I found invaluable over the years. One of the harder things for growing organizations is to keep teams focused, and I've seen that DevOps and SRE practices help to zero in on what is essential.

A lot of automation-related work feels like 'yak shaving,' which is a term to refer to entirely unrelated things that don't add value to our product. For development teams, this feels very frustrating. Why would I want to make a script to automate this? We only use it once a year!

SRE helps to solve these frustrations, to some extent, with practices that help organizations understand why they should communicate, why they should talk about issues, and why we measure some things on some level and not others.

9 people found this helpful

Amazon Customer
Excellent
Reviewed in the United States on January 28, 2026
Format: Paperback | Verified Purchase
Excellent

Eric H.
Lots of great information, but also a lot of redundancy
Reviewed in the United States on December 22, 2018
Format: Kindle | Verified Purchase
First off - it's worth noting that Google lets you read this entire book for free on their website.

I bought the Kindle version anyways because I spend enough time in front of a backlit screen that it seemed worth it to read something this large using a device that's better on your eyes. Unfortunately the Kindle version is formatted terribly and I wish I'd bought the print version instead. The book is broken up into Parts which are broken up into Chapters which are further broken up into headlined sections. The Kindle version identifies those headlined sections as chapters which is somewhat useless.

Anyways, the first few chapters aren't especially useful unless you work at Google. They mostly discuss what's unique about Google's computing infrastructure. Despite this, they were EASILY my favorite part of the book because the material is so interesting and their approach is so unique. After that, each chapter is written in a way that it can stand on its own if you aren't reading the entire book, or are reading it out of order. This is convenient for people who want to pick and choose what parts they want to read, but means that people who are reading the entire thing wind up getting a lot of the same information multiple times. It's all written by different people too, which on the one hand makes it not quite as repetitive, but on the other hand makes it hard to just skim over the sections with info you already have because you don't recognize it as information you already know until you've processed it.

Overall this is a fantastic book on DevOps, SRE, and current trends in the industry. It's a great read for anyone who wants to apply some "best practices" to their role. I would however say that reading the entire thing is overkill for most people and not necessarily the best use of your time if you have other things you'd like to be learning as well.

Part 1 - Fascinating read. I imagine this would be a good overview if you're about to start at Google and want a sneak peek at how things are done, but I'm only speculating this as an outsider.

Part 2 - Interesting and useful concepts for modern cloud computing.

Part 3 - Some useful info and a lot of stuff that's not really unique to Google in my experience. Read the parts that you think you could use some improvement on, skip the rest.

Part 4 - A condensed view from a managerial perspective of things you already read in Part 3.

Part 5 - Some case studies, comparisons from other businesses, a useless recap, and examples that could be useful to share using the website version of the book if you're trying to explain to your team what new concepts are being implemented.

47 people found this helpful

R. J. Silva
Comprehensive and Detailed Roadmap for Operating a Large Production Environment
Reviewed in the United States on January 8, 2021
Format: Paperback | Verified Purchase
I was amazed by the depth of this book, and the way it covers several aspects of what it takes to operate a complex and distributed software system. I was particularly impressed with the details of some chapters related to monitoring, load balancing (at the front end and back end), designing applications to manage overload conditions, and being on call.
I think the book has a lot to teach and inspire. Some of the approaches described are very counterintuitive like the error budget, and the blameless postmortem culture. One of the shortcomings I noticed was that some chapters are hard to read because they treat rather advanced topics. The fact that the book has very few illustrations makes it hard to understand some of the concepts at times. Overall, an invaluable resource.

One person found this helpful

RJ
Used most of a highlighter on this book...
Reviewed in the United States on June 26, 2016
Format: Paperback | Verified Purchase
I really liked this book. Cool to see how Google actually runs things at their scale. Got me thinking about things I never thought about when it comes to my work in tech. This could sound like the book makes you paranoid, but I think that's too negative. I felt more like I now have a little license and education on how things can (and will) fail and how I can better prepare for and mitigate them. It's like you got to do a ride along in a busy ambulance service, gets you thinking "hmm, maybe I should take that CPR course and brush up on the Heimlich maneuver...".

Even though several of the topics covered weren't things I deal with day to day, I think the mindset you develop after seeing how they solve various issues applies to most any IT / tech endeavor (i.e. whether you're in ops, a SWE, etc.). I think if this book's subject interests you at all, you'll really appreciate having read it.

16 people found this helpful

A Goel
Great insight in Google SRE and best practices
Reviewed in the United States on April 1, 2022
Format: Paperback | Verified Purchase
Tons of nuggets about best practices, how they can be useful across industry, Google's tooling, how they got there, challenges faced, communication between engineers and SRE, how to look at problems, and so much more.
Some parts of the book can get too deep or are not well explained, and end up boring. I just skipped pages to move on to the next learning.
Overall a good addition to my library.

Shop
Good peek under the hood.
Reviewed in the United States on February 11, 2024
Format: Paperback | Verified Purchase
I think Google's practices are now standard across the industry. A lot of things mentioned in the book are already in practice at my employer. Good read.

Top reviews from other countries

J. Andrews
The book every infrastructure engineer and DevOps person should read
Reviewed in the United Kingdom on January 21, 2018
Format: Kindle | Verified Purchase

If you are new to infrastructure engineering this book will inform you as to an approach and model to use as you start down this road. If you are an experienced engineer then you will see a lot of truth in what is written here. It may change your viewpoint or solidify an existing one; whatever the case, this book is an essential reference and an honest account with a huge amount of wisdom.

Niels Albers
Must read for the serious DevOps engineer
Reviewed in the Netherlands on May 4, 2016
Format: Kindle | Verified Purchase
Just the first chapter alone lists a number of concrete issues that anyone who has any experience with operations at all will recognise, and the recommendations this book makes just make sense. Actually, not only people with DevOps experience should be reading this; there is a lot in here that their managers could certainly profit from, in every sense of the word.
Key words:
- Error budget
- Toil / development balance (and the 50% time rule)
- The impossibility of never having a failure.

I'm still working my way through the book, but every new chapter has new insights that really help to put our complex job into perspective, and offer concrete ways of making our work better.

Grisi
Deep info
Reviewed in Australia on November 1, 2023
Format: Paperback | Verified Purchase
What to know about an engineer: read this book. It's deep.

Chf
Interesting and useful
Reviewed in Italy on May 16, 2016
Format: Kindle | Verified Purchase
Of course, I don't have the same infrastructure as Google, but many of the problems are the same.
This book is very interesting because it shows different tips and tricks for resolving and managing communication problems between departments and, of course, reliability problems.
I recommend it to every IT professional, ITIL expert, DevOps wannabe and, of course, CTO.

Óscar Casal Sánchez
Excellent book
Reviewed in Spain on April 18, 2017
Format: Kindle | Verified Purchase
An excellent book that offers many points of view on how to build a team and how to tackle problems. It also walks through all of a company's processes: budgets, monitoring, SLAs, service launch, service maintenance...

This book shows that Google's culture is "blameless" and that there is no line between devs and ops; there is the concept of SRE, which could be said to be similar to today's DevOps, though with more responsibilities.

A book that everyone who works in IT should read, and also all the

