Currently rustc requires that, whenever it compiles a crate, it understands the entire crate graph of all dependencies. In other words, when you compile a crate like `cargo` the compiler will have to load all dependencies (including dependencies of dependencies). This means that rustc has to actually find and locate the rlib artifacts on the filesystem.
For immediate dependencies (those specified via `extern crate`) the compiler has `--extern` arguments (passed by Cargo) to guide it. For dependencies of dependencies Cargo knows where the files are but doesn't tell rustc. Instead rustc is left to its own devices to actually find these rlibs.
This probing logic is currently implemented like so:
- Look at all files in the search paths of the compiler (passed via `-L` and also the sysroot).
- When looking for crate `foo`, filter files which start with `libfoo` and end with `.rlib`.
- Given this set of candidates, look at the metadata inside of all of them. Filter these candidates by the "hash" we're loading.
- If only one candidate remains, link to it. If more than one remains, bail out and generate an error.
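The steps above can be sketched roughly as follows. This is a minimal illustration, not rustc's actual loader: `Rlib`, `Probe`, and `probe_by_prefix` are hypothetical names, the search path is modelled as an in-memory list rather than real files, and treating "zero candidates" as an error is an assumption. The `inspected` field records every file whose metadata would have to be opened.

```rust
/// Hypothetical stand-in for an rlib found in a search path: its filename
/// plus the crate hash that would be read out of its metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct Rlib {
    pub filename: String,
    pub hash: u64,
}

/// Outcome of a probe: which files had their metadata inspected, and
/// either the chosen filename or an error message.
#[derive(Debug)]
pub struct Probe {
    pub inspected: Vec<String>,
    pub result: Result<String, String>,
}

pub fn probe_by_prefix(search_path: &[Rlib], crate_name: &str, wanted_hash: u64) -> Probe {
    let prefix = format!("lib{}", crate_name);

    // Keep every file that starts with `lib<name>` and ends with `.rlib`.
    let candidates: Vec<&Rlib> = search_path
        .iter()
        .filter(|f| f.filename.starts_with(&prefix) && f.filename.ends_with(".rlib"))
        .collect();

    // Every surviving candidate has to be opened to read its metadata.
    let inspected: Vec<String> = candidates.iter().map(|f| f.filename.clone()).collect();

    // Filter the candidates by the hash we're loading.
    let matching: Vec<&Rlib> = candidates
        .into_iter()
        .filter(|f| f.hash == wanted_hash)
        .collect();

    // Exactly one match is linked; more than one is an error.
    // (Erroring on zero matches is an assumption of this sketch.)
    let result = match matching.as_slice() {
        [only] => Ok(only.filename.clone()),
        [] => Err(format!("no candidate for `{}` matches the hash", crate_name)),
        _ => Err(format!("multiple candidates for `{}` match the hash", crate_name)),
    };

    Probe { inspected, result }
}
```

Calling `probe_by_prefix` for `serde` on a directory that also contains a `libserde_json-*.rlib` file would list both files in `inspected`, even though only one of them can match the hash.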
The "hash" is listed for all dependencies of dependencies because the compiler will record this information in the metadata of a compiled crate. For example, if we have a dependency chain where `cargo` depends on `tar`, which depends on `libc`, then we'll get loading behavior like so:
- The `tar` crate is found via `--extern` on the command line.
- When processing `tar`, rustc needs to load the `libc` crate. The metadata for `tar`, however, lists the hash of the `libc` crate it was originally compiled against.
- The compiler will go through the process above to find all `libc` candidates and then use the hash listed in `tar`'s metadata to find the one exact crate we're looking for.
Ok so with all that information, let's talk about downsides! One of the drawbacks of this logic is pretty obvious when we take a look at the `serde` crate. Let's say that rustc is looking for the `serde` dependency. This means it'll also look at crates like `serde_derive`, `serde_json`, `serde_yaml`, etc. (see the trend?). While not a huge perf problem today, this definitely has the chance to become a larger problem.
More worrisome, however, is that the Gecko folks are trying to use the Tup build system and when adding rustc support this causes lots of problems. Tup as a build system tracks what files are accessed as part of a build process (aka it's tracking what files rustc itself is accessing). To Tup it looks like rustc is depending on all these files! Rustc in reality is rejecting many of these files and not actually using them, but to Tup they look like false dependencies.
In order to solve this problem, let's get rustc to probe for fewer files at build time! The logic here of prefix/suffix matching is very old I believe and in this day and age I'm not actually sure if it's buying us much. Instead I think it may be best to switch to a precise match solution where we look for an exact filename on the filesystem instead of a set of filenames.
All Rust crates compiled through Cargo (and almost all Rust crates in general) are compiled with `-C extra-filename`, which is where those funny hashes get inserted into rlib filenames. These extra hashes are not currently encoded into a crate's metadata, but they'd be able to point rustc to the exact file that it needs!
I think that we'll want to start encoding the `-C extra-filename` argument into a crate's metadata so that transitive dependencies can copy this information, and then when rustc loads a crate it knows exactly what to look for.
Given our `cargo` example above, the `libc` loading step would now look like:
- The `tar` crate's metadata says it depends on `libc`, which was found with the extra filename `$extra_filename`. The compiler then looks for exactly `libc${extra_filename}.rlib` and doesn't look at extraneous files.
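As a rough sketch of what that exact lookup could look like (not an actual rustc implementation): `DepRecord`, `exact_rlib_name`, and `locate_exact` are hypothetical names, and the shape of the metadata record is an assumption made purely for illustration.

```rust
use std::path::PathBuf;

/// Hypothetical record that a crate's metadata would keep for each
/// dependency once the `-C extra-filename` value is encoded.
pub struct DepRecord {
    pub name: String,
    /// The value the dependency was compiled with, e.g. "-1a2b3c".
    pub extra_filename: String,
}

/// Build the one filename to look for: `lib<name><extra_filename>.rlib`.
pub fn exact_rlib_name(dep: &DepRecord) -> String {
    format!("lib{}{}.rlib", dep.name, dep.extra_filename)
}

/// Check each search directory for exactly that filename. No directory
/// listing and no prefix matching is needed, so unrelated rlibs are
/// never touched.
pub fn locate_exact(search_dirs: &[PathBuf], dep: &DepRecord) -> Option<PathBuf> {
    let file = exact_rlib_name(dep);
    search_dirs
        .iter()
        .map(|dir| dir.join(&file))
        .find(|path| path.is_file())
}
```

With this shape, the only filesystem accesses are one existence check per search directory, which is the property a file-tracking build system like Tup cares about.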
This I believe should solve the Tup problems and also head off any possible inefficiencies in crate loading. I'm also more than willing to help mentor the implementation of this if anyone's interested, just let me know!
cc @luser
cc @mshal