Conversation
This comment has been minimized.
This comment has been minimized.
af547a6 to
7c4944d
Compare
|
|
||
| // Remove make-related flags to ensure Cargo can correctly set things up | ||
| cargo.env_remove("MAKEFLAGS"); | ||
| cargo.env_remove("MFLAGS"); |
There was a problem hiding this comment.
This was intentional because of #56090 (comment). I don't know if that issue has been fixed since or not.
There was a problem hiding this comment.
jobserver-rs has much better diagnostics now, even if it's not fixed now it will be more clear what's going on.
There was a problem hiding this comment.
I'll have a look. Possibly I could split the PR in two; the bootstrap - cmake - ninja-build - ... process tree doesn't have cargo in it.
There was a problem hiding this comment.
I quickly looked at it. Few observations:
The "bad file descriptor" issue applies only to old style jobservers that work with pipes. The issue happens when a parent process either closes the file descriptors explicitly or makes them non-inheritable, yet continues to advertise them in the MAKEFLAGS environment variable to child processes.
As far as I can see, sccache closes file descriptors explicitly, but for a valid reason: they start a daemon the first time you run sccache with has a lifetime (heh) longer than the jobserver. Afterwards, they start their own jobserver, and I suppose they set their own MAKEFLAGS too then.
Another source of this issue is possibly the Python scripts in this repo. Python defaults to making file descriptors non-inheritable (PEP 445). So, to support an old style jobserver, subprocess.Popen(..., close_fds=False) should be used in bootstrap.py in this repo.
However, given that (a) ninja only supports the new style jobserver, and (b) the sccache logic about closing fds doesn't run with the new style jobserver, I don't think it's worth trying to support the old jobserver protocol.
So, my proposal is, if MAKEFLAGS contains the string --jobserver-auth=fifo: (new style jobserver):
- Pass
MAKEFLAGSverbatim to child processes - Avoid passing
-j Ntocmaketo ensurecmakeandninjarespect the jobserver
That would be the minimal change, and there shouldn't be any bad file descriptor errors.
There was a problem hiding this comment.
I've pushed a change, I don't think there should be any concerns related to "bad file descriptor" issues now.
This comment was marked as off-topic.
This comment was marked as off-topic.
When bootstrapping Rust, the `-j N` flag was passed to CMake, which was then forwarded to Ninja. This prevents the jobserver from being used, and as a result leads to oversubscription when Rust is just one of the many packages built as part of a larger software stack. Since Cargo and the Rust compiler have long supported the jobserver, it would be good if also bootstrapping Rust itself would participate in the protocol, leading to composable parallelism. This change allows bootstrapping to respect an existing FIFO based jobserver. Old pipe based jobservers are not supported, because they are brittle: currently the Python scripts in bootstrap do not inherit the file descriptors, but do pass on `MAKEFLAGS`. Because Ninja only supports FIFO based jobservers, it's better to focus on new jobservers only. In summary: * Bootstrap Cargo passes `MAKEFLAGS` verbatim to subprocesses if it advertises a FIFO style jobserver, otherwise it unsets it. * `llvm.rs` does not pass `-j` to `cmake` when a FIFO style jobserver is set in `MAKEFLAGS. * Bootstrap Cargo no longer unsets `MKFLAGS`: from git blame, GNU Make considered it a historical artifact back in 1992, and it is never read by GNU Make, it's only set for backwards compatibility. Signed-off-by: Harmen Stoppels <me@harmenstoppels.nl>
7c4944d to
20cb59b
Compare
|
Question: Instead of magically inspecting the ambient MAKEFLAGS, would it make sense to have a bootstrap.toml setting that explicitly enables or disables MAKEFLAGS passthrough? That would potentially avoid the need for magic, while still allowing jobserver-based build configurations to easily opt into the behaviour they want. |
|
Thanks for your comment! I would argue that Currently This PR can be seen as adding Regarding consistency, I haven't checked this in-depth, but I don't think For a packager running a large build with more than just Rust, the expectation is that children respect the parent's resource limits (let's say |
|
The job Click to see the possible cause of the failure (guessed by this bot) |
|
Bear in mind that The mechanism we have for resolving those conflicts is |
|
That's a fair point. Preferably I'd like to see concrete issues before defensively contributing a configuration option that then stays around. But if you insist, I'm happy to add it, cause my goal is simply to make the rust build behave nicely. Why I don't expect serious issues with this PR:
|
When bootstrapping Rust, the
-j Nflag was passed to CMake, which wasthen forwarded to Ninja. This prevents the jobserver from being used,
and as a result leads to oversubscription when Rust is just one of the
many packages built as part of a larger software stack.
Since Cargo and the Rust compiler have long supported the jobserver, it
would be good if also bootstrapping Rust itself would participate in the
protocol, leading to composable parallelism.
This change allows bootstrapping to respect an existing FIFO based
jobserver. Old pipe based jobservers are not supported, because they are
brittle: currently the Python scripts in bootstrap do not inherit the
file descriptors, but do pass on
MAKEFLAGS, which has lead to errorslike "invalid file descriptor" in the past. Because Ninja only supports
FIFO based jobservers, it's better to focus on new jobservers only,
which shouldn't suffer from the "invalid file descriptor" issue.
In summary:
MAKEFLAGSverbatim to subprocesses if itadvertises a FIFO style jobserver, otherwise it unsets it. This ensures
subprocesses respect the jobserver during bootstrap.
llvm.rsdoes not pass-jtocmakewhen a FIFO style jobserver isset in
MAKEFLAGS. This ensures Ninja respects the jobserver.MKFLAGS: from git blame, GNU Makeconsidered it a historical artifact back in 1992, and it is never read
by GNU Make, it's only set for backwards compatibility in case sub-Makefiles
read it.
I've tested this with the Spack package manager starting the POSIX jobserver,
building node.js and rust in parallel with
-j16, which looks like this:As you can see there are 10
g++processes running for rust, and6for node.js, andwith a mix of
makeandninjaas build tools :).(The only violation I see now is
rust-lld, but I think that'll be fixed with the LLVM 23release)