Conversation
After reading some articles [1] [2] yesterday about Docker and the "init" process I got to thinking about the problems that we've been seeing on Travis. The basic problem is that a Linux system may need an "init" process to work properly when processes become zombies. Docker by default doesn't handle this and the root process typically isn't an init process, so this can occasionally cause quite a few problems. We've been seeing spurious errors on Travis inside containers which look like OOM and such, but my guess is that zombie processes were being reparented to the top-level shell. The shell didn't expect the zombies and then behaved very strangely. This commit fixes these problems by using Yelp's "dumb-init" program [2] as the init process in all of our containers. This ensures that there's a valid init ready to reap children when they're reparented, which our test suite apparently generates a bunch of throughout the tests and such. [1]: https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/ [2]: https://engineeringblog.yelp.com/2016/01/dumb-init-an-init-for-docker.html
Contributor
|
(rust_highfive has picked a reviewer for you, use r? to override) |
Member
Author
|
@bors: r+ |
Collaborator
|
📌 Commit 5e991e0 has been approved by |
Collaborator
|
⌛ Testing commit 5e991e0 with merge 01d53df... |
bors
added a commit
that referenced
this pull request
Dec 14, 2016
Fix travis builds After reading some articles [1] [2] yesterday about Docker and the "init" process I got to thinking about the problems that we've been seeing on Travis. The basic problem is that a Linux system may need an "init" process to work properly when processes become zombies. Docker by default doesn't handle this and the root process typically isn't an init process, so this can occasionally cause quite a few problems. We've been seeing spurious errors on Travis inside containers which look like OOM and such, but my guess is that zombie processes were being reparented to the top-level shell. The shell didn't expect the zombies and then behaved very strangely. This commit fixes these problems by using Yelp's "dumb-init" program [2] as the init process in all of our containers. This ensures that there's a valid init ready to reap children when they're reparented, which our test suite apparently generates a bunch of throughout the tests and such. [1]: https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/ [2]: https://engineeringblog.yelp.com/2016/01/dumb-init-an-init-for-docker.html
Contributor
|
Well I'm glad to see all that digging we did yesterday had some useful outcome. :) We really should just get your Linux CI moved to Taskcluster. Not that it would have necessarily made a difference for this specific issue, but we have a pretty nice working setup with in-tree Dockerfiles for Firefox builds, and I suspect you'd benefit from some of the other flexibility. |
Collaborator
Merged
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
After reading some articles 1 2 yesterday about Docker and the "init"
process I got to thinking about the problems that we've been seeing on Travis.
The basic problem is that a Linux system may need an "init" process to work
properly when processes become zombies. Docker by default doesn't handle this
and the root process typically isn't an init process, so this can occasionally
cause quite a few problems.
We've been seeing spurious errors on Travis inside containers which look like
OOM and such, but my guess is that zombie processes were being reparented to the
top-level shell. The shell didn't expect the zombies and then behaved very
strangely.
This commit fixes these problems by using Yelp's "dumb-init" program 2 as the
init process in all of our containers. This ensures that there's a valid init
ready to reap children when they're reparented, which our test suite apparently
generates a bunch of throughout the tests and such.