## Overview
Kind of an odd duck, but this came out of an internal need at my day job and I think it's worth examining for public use.
The tl;dr use case is to have a (long-running in their case, but that's actually orthogonal) Python process using Fabric 1 or Invoke to kick off subprocesses that then live outside the control of the main program. Roughly analogous to sourcing a shell script that uses `&` a bunch.
An obvious counter-argument to this case is "use real process supervision" -- but I've determined that my local users have a good-enough need for true "orphaned children" subprocesses that this falls under a legit, if corner, use case for a toolset like Invoke. Plus the normal adage that if one set of users is doing an unusual thing, there will be others, and I'd rather the software cope gracefully with this instead of surprising users.
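For concreteness, the Fabric 1 pattern in question looks roughly like this (`local` and its `capture=False` default are real Fabric 1 API; the worker script and log path are made up for illustration):

```python
from fabric.api import local

# capture=False (Fabric 1's default) attaches no pipes to the child, so the
# trailing "&" backgrounds the command inside /bin/sh, control returns
# immediately, and the worker keeps running after this Python process exits.
local("./worker.sh > worker.log 2>&1 &")
```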
## Investigation
- Under Fabric 1's `local(..., capture=False)` (the default behavior), it's possible to do `local("shell command &")` to background a shell command in a way that's completely disassociated from the parent Python process.
    - At least under Linux, this is equivalent to `/bin/sh -c "shell command &"`, which (for reasons I don't fully understand yet) results in the subprocess being a child of PID 1/init, instead of the Python process as normal.
- Under Invoke, this doesn't currently work because we keep much tighter track of the subprocess, including the reader threads hooked up to its pipes, which don't shut down until forced to by the subprocess closing those pipes (see the standalone sketch after this list).
    - If one says `pty=False` (the default), the ampersand appears to be ignored; the `run` call hangs out until the subprocess completes.
    - If one says `pty=True`, we get even stranger behavior where the subprocess appears to either not run or exit instantly (pending investigation - this is less critical for now).
    - I've determined that the child process does actually get assigned as a child of init even in this scenario, but Invoke still "blocks" to completion because of those reader threads.
- If we temporarily neuter the reader threads, we get the same behavior as under Fabric 1 - control returns to the `run` caller immediately, and the Python process can even exit without affecting the now-orphaned child.
    - If we pass `None` to the child process' pipes, and the child process is not internally redirecting output, its out/err still goes to our controlling terminal; this doesn't seem super useful though.
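To make the pipe mechanics above concrete, here's a minimal standalone reproduction on a POSIX system using only the standard library (no Invoke or Fabric involved); `sleep 5` merely stands in for a real long-lived command:

```python
import subprocess
import time

# Case 1: no pipes attached (roughly Fabric 1's capture=False). /bin/sh forks
# "sleep 5", backgrounds it, and exits at once; the orphaned sleep gets
# reparented to PID 1 and wait() returns immediately.
proc = subprocess.Popen("sleep 5 &", shell=True)
proc.wait()
print("no pipes: sh exited immediately; sleep is now init's child")

# Case 2: a pipe attached (roughly what Invoke's reader threads sit on). The
# backgrounded sleep inherits the write end of the stdout pipe, so read()
# doesn't see EOF until sleep itself exits -- even though sh is long gone.
start = time.time()
proc = subprocess.Popen("sleep 5 &", shell=True, stdout=subprocess.PIPE)
proc.stdout.read()  # blocks ~5 seconds
print("with a pipe: read() blocked for %.1fs" % (time.time() - start))
```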
## Upshot
A base case here could be to add a kwarg allowing the caller to say "hey, I don't care about this program's output - don't use any IO threads"; in tandem with a trailing ampersand (or e.g. `nohup`) this enables the use case in question.
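Purely to make that concrete, the call site might look something like this; the kwarg name (`io_threads`) and the script/log paths are invented for illustration and are not existing Invoke options:

```python
from invoke import task

@task
def start_worker(ctx):
    # Hypothetical kwarg (name made up): skip the IO reader threads entirely,
    # so the trailing "&" plus nohup lets the child outlive run() -- and the
    # interpreter -- much as Fabric 1's local() does today.
    ctx.run("nohup ./worker.sh > worker.log 2>&1 &", io_threads=False)
```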
That then leads to the question of what should happen to the `Result` returned by the `run` call, as it will generally be full of lies.
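One possible shape for a lie-free result, sketched here without depending on Invoke's actual internals (the class name and fields are invented for this sketch):

```python
class DisownedResult(object):
    """Hypothetical stand-in for a Result subclass covering disowned runs.

    The only trustworthy fact is the child's PID; stdout, stderr and the exit
    code are unknowable at return time, so expose them as explicit "not
    available" values rather than plausible-looking lies.
    """

    def __init__(self, command, pid):
        self.command = command
        self.pid = pid
        self.stdout = None   # never captured -- no reader threads ran
        self.stderr = None
        self.exited = None   # still running (or init's problem by now)

    @property
    def ok(self):
        # Refuse to claim success or failure; force callers to handle "unknown".
        raise ValueError("Disowned processes have no exit status")
```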
Furthermore, this looks very much like an actual "async `run()`" where a user may want to continue interacting with the subprocess - the only real difference is that I'd expect a naive async setup to "clean up" subprocesses before Python exits, and we don't truly want that here. But that feels orthogonal enough that it can wait for an iterative improvement.
## TODO
- Cut up `Runner._run_body` some more - I've wanted to do this for some time anyway, it's the longest single function in probably the whole codebase
- Add an option controlling whether the IO threads are used or not
    - Confirm whether that alone suffices for the real-world internal use case
- Update how `Result` behaves in this scenario - guessing we'd yield a different subclass?
- Consider extending that to continue interacting with the subproc in some fashion - perhaps the threads actually always run (allowing access to stdout/err) but we simply control whether they get join()'d before returning control (see the sketch after this list).
- Examine the ramifications for interpreter shutdown, both re: IO threads if they do run, and re: an option for reaping or ignoring child processes
- Figure out how Fabric 2 cares about this; think `asynchronous` should "just work", but `disown` is possibly a no-op (then again, openssh sshd does very roughly the same crap we do locally re: fork+execv, so it's plausible that if we "behave weirdly" and just cut off all our pipes, it will persist remotely?)
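As a rough illustration of the "threads always run, but we choose whether to join them" idea from the TODO above, here's a standard-library sketch (function and variable names are invented; this is not Invoke's implementation):

```python
import subprocess
import threading

def run_with_optional_join(command, wait=True):
    """Start `command` via the shell, draining its pipes on background threads.

    With wait=True we block until the pipes close and the child is reaped
    (roughly today's behavior); with wait=False we hand control back
    immediately while the drain threads keep running.
    """
    proc = subprocess.Popen(
        command, shell=True,
        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    )
    output = {"out": b"", "err": b""}

    def drain(stream, key):
        # read() only returns once every writer has closed the pipe -- i.e.
        # once the (possibly backgrounded) child is done with it.
        output[key] = stream.read()

    threads = [
        threading.Thread(target=drain, args=(proc.stdout, "out"), daemon=True),
        threading.Thread(target=drain, args=(proc.stderr, "err"), daemon=True),
    ]
    for t in threads:
        t.start()

    if wait:
        for t in threads:
            t.join()
        proc.wait()
    # With wait=False the caller gets control back right away; interpreter
    # shutdown behavior (per the TODO above) remains an open question here.
    return proc, threads, output
```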