Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

detect OOM reaper action on p4-fusion#51284

Merged
peterguy merged 17 commits intomainfrom
pg/42385
May 17, 2023
Merged

detect OOM reaper action on p4-fusion#51284
peterguy merged 17 commits intomainfrom
pg/42385

Conversation

@peterguy
Copy link
Contributor

@peterguy peterguy commented Apr 29, 2023

Address #42385 and other issues that crop up by wrapping p4-fusion calls with a shell script that detects when p4-fusion is killed, gathers resource usage stats about the killed process, outputs info about the death along with the resource usage stats, and exits with a non-zero return code, so that the error is picked up by the UI and shown in the repo list (see attached video)

To reduce friction, the shell script is named p4-fusion and the actual p4-fusion binary executable is renamed p4-fusion-binary. If we want to use different names, we will also need to modify the gitserver code that runs the p4-fusion commands.

Test plan

Build a gitserver Docker image, run a p4-fusion command in it, kill the p4-fusion-binary process, see that the output ends with info about the killed process, and see that the return code is non-zero.

Build a gitserver Docker image:

VERSION=dev IMAGE=sourcegraph/gitserver ./cmd/gitserver/build.sh

Start a gitserver instance:

HOSTNAME=127.0.0.1:3178
GITSERVER_EXTERNAL_ADDR=127.0.0.1:3503
GITSERVER_ADDR=127.0.0.1:3503
SRC_REPOS_DIR=$HOME/.sourcegraph/repos_3
SRC_PROF_HTTP=127.0.0.1:3553
GITSERVER_INDEX=3
docker run \
--rm \
-e "GITSERVER_EXTERNAL_ADDR=${GITSERVER_EXTERNAL_ADDR}" \
-e "GITSERVER_ADDR=0.0.0.0:${HOSTNAME##*:}" \
-e "SRC_FRONTEND_INTERNAL=host.docker.internal:${SRC_FRONTEND_INTERNAL##*:}" \
-e "SRC_PROF_HTTP=0.0.0.0:${SRC_PROF_HTTP##*:}" \
-e "HOSTNAME=${HOSTNAME}" \
-p ${GITSERVER_ADDR}:${HOSTNAME##*:} \
-p ${SRC_PROF_HTTP}:${SRC_PROF_HTTP##*:} \
-v ${SRC_REPOS_DIR}:/data/repos \
--detach \
--name gitserver-${GITSERVER_INDEX} \
sourcegraph/gitserver

Connect two terminals to it:

docker exec -it gitserver-3 bash

In one terminal, run this command (get the admin password from 1Password):

P4PORT=perforce.sgdev.org:1666
P4USER=admin
export P4PORT P4USER
p4 login -a <<<"REDACTED PASSWORD"
p4-fusion \
--path //go/... \
--client "" \
--user "${P4USER}" \
--src /data/repos/go/.git \
--networkThreads 64 \
--printBatch 10 \
--port "${P4PORT}" \
--lookAhead 2000 \
--retries 10 \
--refresh 100000 \
--maxChanges 4000 \
--includeBinaries false \
--fsyncEnable true \
--noColor true

In the other terminal, use pkill -9 p4-fusion-binar to end it. That's not a typo: Alpine's process table stores only the first 15 characters of the command. You could instead use pkill -9 -f p4-fusion-binary, but that matches against the entire command line so it's more dangerous.

In the first terminal:

  • see the output end with something like:

p4-fusion was killed by an external signal. At the time of its demise, it had been running for 00:20, had used 0:01.00 CPU time, reserved 390.95m RAM and was using .14m.

  • type$? and hit Enter to see the return code of the p4-fusion command. It should be 137.

Success!

- detect when p4-fusion gets killed
- report status resource usage in stdout and return code
it's getting complex enough that an anonymous subprocess is unwieldy.
and make the gitserver build scripts identical and able to detect if they are OSS or enterprise,
because the actual build process is the same, except for some naming.
@cla-bot cla-bot bot added the cla-signed label Apr 29, 2023
@peterguy peterguy self-assigned this Apr 29, 2023
@peterguy peterguy added perforce first-class-perforce Issues associated with make Perforce a first class code host labels Apr 29, 2023
peterguy added 4 commits May 1, 2023 07:28
use a pipe to capture output from the p4-fusion binary so that when it exits with a non-zero exit code,
its output can be searched for a handled signal.
Also added support for including the signal name in the output, if it can be found.
@peterguy peterguy requested a review from a team May 2, 2023 22:03
When being killed by the OOM reaper, the p4-fusion-binary process goes into a zombie state for a while before exiting.
The stats collected while in that zombie state are invalid.
Add the process state to the `ps` output and collect stats only if it is still alive.
@peterguy peterguy marked this pull request as ready for review May 11, 2023 18:24
@peterguy peterguy changed the title address issue #42385 detect OOM reaper action on p4-fusion May 12, 2023
Copy link
Contributor

@indradhanush indradhanush left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some questions, nothing major to be blocking.

We should run a main-dry-run against this to make sure everything is in place before merging.

peterguy and others added 7 commits May 16, 2023 17:16
@peterguy peterguy merged commit 018beee into main May 17, 2023
@peterguy peterguy deleted the pg/42385 branch May 17, 2023 14:26
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

cla-signed first-class-perforce Issues associated with make Perforce a first class code host perforce

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Check memory before cloning a large Perforce depot (or monitor for OOM)

3 participants