bootstrap: respect POSIX jobserver #152057

Open
haampie wants to merge 1 commit into rust-lang:main from haampie:hs/fix/bootstrap-respect-jobserver-protocol

@haampie haampie commented Feb 3, 2026

When bootstrapping Rust, the -j N flag is passed to CMake, which then
forwards it to Ninja. This prevents the jobserver from being used and
leads to oversubscription when Rust is just one of the many packages
built as part of a larger software stack.

Since Cargo and the Rust compiler have long supported the jobserver, it
would be good if bootstrapping Rust itself also participated in the
protocol, leading to composable parallelism.

This change allows bootstrapping to respect an existing FIFO based
jobserver. Old pipe based jobservers are not supported, because they are
brittle: currently the Python scripts in bootstrap do not inherit the
file descriptors, but do pass on MAKEFLAGS, which has led to errors
like "invalid file descriptor" in the past. Because Ninja only supports
FIFO based jobservers, it's better to focus on new jobservers only,
which shouldn't suffer from the "invalid file descriptor" issue.
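
To make the difference between the two styles concrete, here is a minimal sketch of how the advertisement in MAKEFLAGS could be classified. It is not code from this PR: `JobserverStyle` and `classify` are hypothetical names, and the parsing assumes whitespace-separated flags, with the last jobserver flag taking precedence as described in the GNU Make manual.

```rust
/// How a jobserver is advertised in `MAKEFLAGS` (illustrative type, not from the PR).
#[derive(Debug, PartialEq)]
enum JobserverStyle<'a> {
    /// `--jobserver-auth=fifo:PATH`: clients open a named pipe by path,
    /// so nothing has to be inherited from the parent process.
    Fifo(&'a str),
    /// `--jobserver-auth=R,W` (or the older `--jobserver-fds=R,W`): clients
    /// must inherit these two file descriptors from their parent.
    Pipe { read_fd: i32, write_fd: i32 },
}

/// Classify the jobserver advertised in a `MAKEFLAGS` value, if any.
/// Simplification: flags are split on ASCII whitespace.
fn classify(makeflags: &str) -> Option<JobserverStyle<'_>> {
    // Only the last jobserver flag is relevant, so scan from the end.
    let value = makeflags.split_ascii_whitespace().rev().find_map(|word| {
        word.strip_prefix("--jobserver-auth=")
            .or_else(|| word.strip_prefix("--jobserver-fds="))
    })?;
    if let Some(path) = value.strip_prefix("fifo:") {
        return Some(JobserverStyle::Fifo(path));
    }
    let (read, write) = value.split_once(',')?;
    Some(JobserverStyle::Pipe {
        read_fd: read.parse().ok()?,
        write_fd: write.parse().ok()?,
    })
}

fn main() {
    let samples = [
        "-j16 --jobserver-auth=fifo:/tmp/GMfifo1234",
        "-j --jobserver-auth=3,4",
        "-k",
    ];
    for flags in samples {
        println!("{flags:?} -> {:?}", classify(flags));
    }
}
```

The `Pipe` case only works if descriptors 3 and 4 are still open and inheritable in the client, which is exactly what breaks when an intermediate process closes them. The `Fifo` case carries everything a client needs in the string itself.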

In summary:

  • Bootstrap Cargo passes MAKEFLAGS verbatim to subprocesses if it
    advertises a FIFO style jobserver, otherwise it unsets it. This ensures
    subprocesses respect the jobserver during bootstrap.
  • llvm.rs does not pass -j to cmake when a FIFO style jobserver is
    set in MAKEFLAGS. This ensures Ninja respects the jobserver.
  • Bootstrap Cargo no longer unsets MFLAGS: from git blame, GNU Make
    considered it a historical artifact back in 1992, and GNU Make never
    reads it. It's only set for backwards compatibility in case
    sub-Makefiles read it.

I've tested this with the Spack package manager starting the POSIX jobserver,
building node.js and rust in parallel with -j16, which looks like this:

$ pstree 382710
python3─┬─python3
        └─python3─┬─python3─┬─make───make───6*[ccache───g++───cc1plus]
                  │         └─{python3}
                  └─python3─┬─python3.11───bootstrap───cmake───ninja-build───10*[sh───ccache───g++───cc1plus]
                            └─{python3}

As you can see there are 10 g++ processes running for rust, and 6 for node.js, and
with a mix of make and ninja as build tools :).

(The only violation I see now is rust-lld, but I think that'll be fixed with the LLVM 23
release)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) labels Feb 3, 2026
Collaborator

rustbot commented Feb 3, 2026

r? @jieyouxu

rustbot has assigned @jieyouxu.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@haampie haampie force-pushed the hs/fix/bootstrap-respect-jobserver-protocol branch from af547a6 to 7c4944d Compare February 3, 2026 15:48

// Remove make-related flags to ensure Cargo can correctly set things up
cargo.env_remove("MAKEFLAGS");
cargo.env_remove("MFLAGS");
Member

This was intentional because of #56090 (comment). I don't know if that issue has been fixed since or not.

Contributor

jobserver-rs has much better diagnostics now; even if the issue isn't fixed, it will be clearer what's going on.

Author

I'll have a look. Possibly I could split the PR in two; the bootstrap - cmake - ninja-build - ... process tree doesn't have cargo in it.

Author

@haampie haampie Feb 4, 2026

I quickly looked at it. A few observations:

The "bad file descriptor" issue applies only to old style jobservers that work with pipes. The issue happens when a parent process either closes the file descriptors explicitly or makes them non-inheritable, yet continues to advertise them in the MAKEFLAGS environment variable to child processes.

As far as I can see, sccache closes file descriptors explicitly, but for a valid reason: they start a daemon the first time you run sccache, which has a lifetime (heh) longer than the jobserver. Afterwards, they start their own jobserver, and I suppose they set their own MAKEFLAGS too then.

Another source of this issue is possibly the Python scripts in this repo. Python defaults to making file descriptors non-inheritable (PEP 446). So, to support an old style jobserver, subprocess.Popen(..., close_fds=False) should be used in bootstrap.py in this repo.

However, given that (a) ninja only supports the new style jobserver, and (b) the sccache logic about closing fds doesn't run with the new style jobserver, I don't think it's worth trying to support the old jobserver protocol.


So, my proposal is, if MAKEFLAGS contains the string --jobserver-auth=fifo: (new style jobserver):

  1. Pass MAKEFLAGS verbatim to child processes
  2. Avoid passing -j N to cmake to ensure cmake and ninja respect the jobserver

That would be the minimal change, and there shouldn't be any bad file descriptor errors.
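
The two steps could look roughly like the sketch below. This is not the actual bootstrap code: `forward_makeflags` and `parallel_args` are hypothetical helper names, and the check is the plain substring test described above.

```rust
use std::process::Command;

/// Marker that identifies a new style (FIFO) jobserver in `MAKEFLAGS`.
const FIFO_MARKER: &str = "--jobserver-auth=fifo:";

/// Step 1: pass `MAKEFLAGS` verbatim to the child if it advertises a FIFO
/// jobserver; otherwise unset it so the child never sees stale pipe fds.
fn forward_makeflags(child: &mut Command, makeflags: Option<&str>) {
    match makeflags {
        Some(flags) if flags.contains(FIFO_MARKER) => {
            child.env("MAKEFLAGS", flags);
        }
        _ => {
            child.env_remove("MAKEFLAGS");
        }
    }
}

/// Step 2: only pass an explicit `-j N` when no FIFO jobserver is
/// advertised, so cmake and ninja are free to act as jobserver clients.
fn parallel_args(makeflags: Option<&str>, jobs: u32) -> Vec<String> {
    match makeflags {
        Some(flags) if flags.contains(FIFO_MARKER) => Vec::new(),
        _ => vec!["-j".to_string(), jobs.to_string()],
    }
}

fn main() {
    // Read the ambient value once and base both decisions on it.
    let ambient = std::env::var("MAKEFLAGS").ok();
    let makeflags = ambient.as_deref();

    // Build (but do not run) an illustrative cmake invocation.
    let mut cmake = Command::new("cmake");
    cmake.args(["--build", "."]).args(parallel_args(makeflags, 16));
    forward_makeflags(&mut cmake, makeflags);
    println!("{cmake:?}");
}
```

With a pipe style or absent jobserver this falls back to unsetting MAKEFLAGS and passing `-j N` explicitly.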

Author

I've pushed a change; I don't think there should be any concerns related to "bad file descriptor" issues now.

@rustbot rustbot assigned Zalathar and unassigned jieyouxu Feb 5, 2026
When bootstrapping Rust, the `-j N` flag was passed to CMake, which was
then forwarded to Ninja. This prevents the jobserver from being used,
and as a result leads to oversubscription when Rust is just one of the
many packages built as part of a larger software stack.

Since Cargo and the Rust compiler have long supported the jobserver, it
would be good if also bootstrapping Rust itself would participate in the
protocol, leading to composable parallelism.

This change allows bootstrapping to respect an existing FIFO based
jobserver. Old pipe based jobservers are not supported, because they are
brittle: currently the Python scripts in bootstrap do not inherit the
file descriptors, but do pass on `MAKEFLAGS`. Because Ninja only
supports FIFO based jobservers, it's better to focus on new jobservers
only.

In summary:

* Bootstrap Cargo passes `MAKEFLAGS` verbatim to subprocesses if it
  advertises a FIFO style jobserver, otherwise it unsets it.
* `llvm.rs` does not pass `-j` to `cmake` when a FIFO style jobserver is
  set in `MAKEFLAGS`.
* Bootstrap Cargo no longer unsets `MFLAGS`: from git blame, GNU Make
  considered it a historical artifact back in 1992, and it is never read
  by GNU Make; it's only set for backwards compatibility.

Signed-off-by: Harmen Stoppels <me@harmenstoppels.nl>
@haampie haampie force-pushed the hs/fix/bootstrap-respect-jobserver-protocol branch from 7c4944d to 20cb59b Compare February 5, 2026 09:07
Member

Zalathar commented Feb 5, 2026

Question: Instead of magically inspecting the ambient MAKEFLAGS, would it make sense to have a bootstrap.toml setting that explicitly enables or disables MAKEFLAGS passthrough?

That would potentially avoid the need for magic, while still allowing jobserver-based build configurations to easily opt into the behaviour they want.

Author

haampie commented Feb 5, 2026

Thanks for your comment! I would argue that MAKEFLAGS is sufficiently standard to be the authoritative source, so I don't see the need for a second configuration option.

Currently cargo, rustc, make, ninja and gcc all turn into a jobserver client based on the value of MAKEFLAGS. Soon, with LLVM 23, clang and lld can be added to this list. One exception is the opt-in gcc -flto=auto, but to me that's rather an inconvenience because it's hard to ensure this flag is consistently passed to the compiler at the level of a package.

This PR can be seen as adding bootstrap to that list of tools, ensuring consistency with cargo and rustc.

Regarding consistency, I haven't checked this in-depth, but I don't think cargo has a way to prevent it from becoming a client to a jobserver advertised in MAKEFLAGS, other than unsetting the variable.

For a packager running a large build with more than just Rust, the expectation is that children respect the parent's resource limits (let's say -j N is a decent first order approximation to max system load). An explicit package-specific opt-in in bootstrap.toml means it will only be discovered after the system has already locked up due to oversubscription, which is the reason I submitted this PR :D.

@rust-log-analyzer
Collaborator

The job x86_64-gnu-tools failed! Check out the build log.

Possible cause of the failure (guessed by this bot):
REPOSITORY                                   TAG       IMAGE ID       CREATED      SIZE
ghcr.io/dependabot/dependabot-updater-core   latest    bcec0b4e062b   9 days ago   783MB
=> Removing docker images...
Deleted Images:
untagged: ghcr.io/dependabot/dependabot-updater-core:latest
untagged: ghcr.io/dependabot/dependabot-updater-core@sha256:b662be51f7b8ef7e2c8464428f14e49cb79c36aa9afb7ecb9221dfe0f507050c
deleted: sha256:bcec0b4e062b5ffe11cc1c2729558c0cd96621c0271ab5e97ff3a56e0c25045a
deleted: sha256:64e147d5e54d9be8b8aa322e511cda02296eda4b8b8d063c6a314833aca50e29
deleted: sha256:5cba409bb463f4e7fa1a19f695450170422582c1bc7c0e934d893b4e5f558bc6
deleted: sha256:cddc6ebd344b0111eaab170ead1dfda24acdfe865ed8a12599a34d338fa8e28b
deleted: sha256:2412c3f334d79134573cd45e657fb6cc0abd75bef3881458b0d498d936545c8d
---
tests/ui/double_parens.rs ... ok
tests/ui/drain_collect.fixed ... ok
tests/ui/duplicate_underscore_argument.rs ... ok
tests/ui/duplicated_attributes.rs ... ok
tests/ui/duration_suboptimal_units_days_weeks.rs ... ok
tests/ui/duration_suboptimal_units.rs ... ok
tests/ui/duration_subsec.rs ... ok
tests/ui/double_parens.fixed ... ok
tests/ui/duration_suboptimal_units_days_weeks.fixed ... ok
tests/ui/duration_suboptimal_units.fixed ... ok
tests/ui/duration_subsec.fixed ... ok
tests/ui/elidable_lifetime_names.rs ... ok
tests/ui/eager_transmute.rs ... ok
tests/ui/else_if_without_else.rs ... ok
tests/ui/empty_docs.rs ... ok
---
...............................................    (147/147)

======== tests/rustdoc-gui/search-filter.goml ========

[ERROR] line 48: Error: The CSS selector "#search-tabs .count.loading" still exists: for command `wait-for-false: "#search-tabs .count.loading"`
    at <file:///checkout/obj/build/x86_64-unknown-linux-gnu/test/rustdoc-gui/doc/test_docs/index.html?search=test>

======== tests/rustdoc-gui/search-result-display.goml ========

[WARNING] line 39: Delta is 0 for "x", maybe try to use `compare-elements-position` instead?

Member

Zalathar commented Feb 5, 2026

Bear in mind that bootstrap has a variety of users with different and conflicting needs. What makes sense for downstream packager/distro-style workflows might be unwelcome for local development, or for upstream CI.

The mechanism we have for resolving those conflicts is bootstrap.toml, in conjunction with defaults and profiles and configure scripts.

Author

haampie commented Feb 5, 2026

That's a fair point.

Preferably I'd like to see concrete issues before defensively contributing a configuration option that then stays around. But if you insist, I'm happy to add it, because my goal is simply to make the rust build behave nicely.

Why I don't expect serious issues with this PR:

  • MAKEFLAGS is typically not set by CI because of reproducibility, nor by the user (I don't think people do export MAKEFLAGS=...). If it is set, it's typically because the build is kicked off by make, and then I would argue respecting its jobserver is desirable.

  • For comparison I looked at llvm/llvm-project and llvm/llvm-zorg: other than support for the jobserver, they don't mention or unset MAKEFLAGS in any scripts or sources. At the same time CI and developers use jobserver aware tools like cmake and ninja. Are there any differences in Rust development/infra that would make passing MAKEFLAGS problematic?
