LLVM is quite a flexible compiler with a huge number of targets, but sometimes targets require custom versions or forks of LLVM. Up to now we've got two primary examples of this:
- Emscripten uses a fork of LLVM which has a custom backend that emits asm.js through its tooling.
- iOS may have restrictions which require it to ship bitcode, and as a result will probably require a particular version of LLVM.
While each of these targets may have a lot more going on in terms of future plans and whatnot, it suffices to say that for the near future (6 months to 1 year) it seems like Emscripten in particular won't be moving away from its LLVM fork and we'd like to keep its functionality. This desire to keep Emscripten creates a tension with upgrading LLVM on our end, as we can't upgrade until Emscripten does.
As a result, let's ship multiple copies of LLVM!
General idea
The overall idea for this issue is to allow each target to optionally have a custom LLVM backend. We would then be compiling LLVM multiple times, once per necessary backend, and shipping multiple copies of LLVM to users. At compile time the compiler would select which version of LLVM is appropriate, dynamically load it, and then use it to compile and generate code.
This means that our build system will need to prepare itself for building multiple copies of LLVM. By default developers probably won't be building multiple copies of LLVM, but the bots on Travis/AppVeyor would all be compiling multiple copies when making dist builds.
The current thinking is that rustc_driver-the-crate will no longer depend on rustc_trans. Instead rustc_trans will be compiled as usual except it will also expose a C interface. The driver will then dynamically select the right trans backend, open it up, and use the C API to register hooks and whatnot.
Compiler changes
I believe the first thing that'll need to be changed is how we build the compiler, specifically with how librustc_trans is loaded. I've been told that the rustc_trans crate is very close to only exposing basically a C API, and this would require us to complete that work. So the first task for this issue would be to work with the compiler team to ensure that the rustc_trans crate has a C API and the rustc_driver crate only uses this C API.
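To make "a C API" a bit more concrete, here's a minimal sketch of the shape such an interface could take. Everything in it is hypothetical: the `TransBackend` struct, its `abi_version` and `name` fields, and the `rustc_trans_backend` entry point are made up for illustration and are not what `rustc_trans` exposes today. The point is only that the driver talks to the backend through `extern "C"` functions and `#[repr(C)]` data rather than through Rust-level crate dependencies.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical table a backend dylib hands back to the driver.
/// `#[repr(C)]` keeps the layout stable across the dylib boundary.
#[repr(C)]
pub struct TransBackend {
    /// Hypothetical version number so the driver can reject mismatched backends.
    pub abi_version: u32,
    /// Returns a NUL-terminated, statically allocated backend name.
    pub name: extern "C" fn() -> *const c_char,
}

extern "C" fn standard_name() -> *const c_char {
    b"standard\0".as_ptr() as *const c_char
}

/// Hypothetical entry point. In a real dylib this would also be exported
/// under an unmangled symbol name so the driver could look it up after
/// opening the library.
pub extern "C" fn rustc_trans_backend() -> TransBackend {
    TransBackend {
        abi_version: 1,
        name: standard_name,
    }
}

/// Driver-side helper that only goes through the C-level interface.
pub fn describe(backend: &TransBackend) -> String {
    let ptr = (backend.name)();
    // SAFETY: in this sketch `name` always returns a pointer to a static,
    // NUL-terminated byte string.
    let name = unsafe { CStr::from_ptr(ptr) }.to_string_lossy().into_owned();
    format!("{} (abi v{})", name, backend.abi_version)
}
```

In the real thing the driver would get the entry point by opening the dylib at runtime instead of calling it directly, and the table would carry the actual hooks instead of just a name.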
Once that's been done the dependency between librustc_driver and librustc_trans can be broken. Instead we'll be doing something like:
- Remove rustc_trans from librustc_driver/Cargo.toml
- Change librustc_llvm to compile only as an rlib, not as both an rlib and a dylib.
- Add a new step to rustbuild in compile.rs called RustcTrans
- Implement RustcTrans similar to the step called Rustc, but this step will compile just the librustc_trans target
- Implement RustcTransLink similar to RustcLink, except it'll link just the one rustc_trans dylib into the sysroot in a specific location (detailed below)
- Augment the Assemble step to require RustcTransLink in addition to RustcLink
The sysroot (on unix) currently looks like:
bin/
  rustc
  rustdoc
lib/
  librustc_driver-xxx.so
  librustc_trans-xxx.so
  librustc_...so
  rustlib/
    $target/
      lib/
        libstd.rlib
        libcore.rlib
I think what we'll want to move to is something that looks like:
bin/
  rustc
  rustdoc
lib/
  librustc_driver-xxx.so
  librustc_...so
  rustlib/
    backends/
      librustc_trans-standard.so
      librustc_trans-emscripten.so
      librustc_trans-ios.so
    $target/
      lib/
        libstd.rlib
        libcore.rlib
Specifically the librustc_trans.so dynamic library no longer lives in lib. Instead multiple copies of it will live in lib/rustlib/backends. The RustcTransLink step is what will assemble the backends folder. Initially we'll just have the standard dynamic library sitting inside there.
Once this is done the driver needs to be modified in how it loads the rustc_trans crate. At runtime the driver will determine the target and look at an optional field in the custom target spec. This'll default to None, which says to load the "standard" backend, and if it's Some, rustc will instead look for a different backend. We'll add this field later, though.
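As a rough sketch of the lookup, assuming the unix layout above and a target spec field that boils down to an `Option<&str>`, the driver could map the field to a dylib path like this. The function name `backend_dylib_path` is made up for illustration, and actually opening the library is left out.

```rust
use std::env::consts::{DLL_PREFIX, DLL_SUFFIX};
use std::path::{Path, PathBuf};

/// Maps the (hypothetical) optional backend field of a target spec to the
/// dylib the driver should open. `None` selects the "standard" backend.
pub fn backend_dylib_path(sysroot: &Path, backend: Option<&str>) -> PathBuf {
    let name = backend.unwrap_or("standard");
    // DLL_PREFIX/DLL_SUFFIX are "lib"/".so" on Linux, so this yields e.g.
    // lib/rustlib/backends/librustc_trans-standard.so
    let file = format!("{}rustc_trans-{}{}", DLL_PREFIX, name, DLL_SUFFIX);
    sysroot
        .join("lib")
        .join("rustlib")
        .join("backends")
        .join(file)
}
```

A target that sets the field to `Some("emscripten")` would then resolve to the `rustc_trans-emscripten` dylib in the same folder, and every other target keeps getting `standard`.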
Ok so at this point, hopefully, rustc_driver is now loading librustc_trans through a dynamic library at runtime and we're ready for the next step!
Changes to rustbuild
Next up we need to get a second version of LLVM compiling. For now we'll stick to the motivational use case for this, Emscripten. First thing to do is to add a config option to config.toml.example, let's say something like:
[llvm]
# Configures multiple separately compiled backends to get created. This is
# used in Rust for the Emscripten target primarily right now which uses a
# fork of LLVM. This key is empty by default (only one LLVM backend is compiled)
# but it can be an array of strings, where currently the only accepted string is
# "emscripten"
#separately-compiled-backends = []
We'll then modify the Assemble step to check this config option. For each configured backend we'll execute RustcTransLink appropriately (adding a new option for the LLVM backend we'd like to create) and plumb that option all the way down to the Llvm target which will get modified appropriately.
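Here's a small sketch of the logic the Assemble step would need once the option has been parsed. It is not rustbuild's actual API: the `Backend` enum and `backends_to_build` function are hypothetical, and the configured strings are assumed to arrive as a plain slice. The standard backend is always built, and anything other than "emscripten" is rejected.

```rust
/// Hypothetical identifier for an LLVM backend that rustbuild can compile.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Standard,
    Emscripten,
}

/// Turns the strings from `separately-compiled-backends` into the list of
/// backends to build. The standard backend always comes first.
pub fn backends_to_build(configured: &[&str]) -> Result<Vec<Backend>, String> {
    let mut backends = vec![Backend::Standard];
    for name in configured {
        match *name {
            "emscripten" => {
                // Ignore duplicates so each backend is compiled only once.
                if !backends.contains(&Backend::Emscripten) {
                    backends.push(Backend::Emscripten);
                }
            }
            other => return Err(format!("unknown LLVM backend `{}`", other)),
        }
    }
    Ok(backends)
}
```

Assemble would then run RustcTransLink once per entry in the returned list, passing the backend along as the new option.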
Once this is done you should be able to configure via config.toml that you'd like to have an emscripten backend and when ./x.py build is executed it'll compile LLVM/librustc_trans twice into two separate directories.
In order to ensure that librustc_trans builds are cached appropriately we may also want to add features to the rustc_trans crate which get toggled depending on the LLVM backend, but this can be played around with when implementing.
Now at this point we've got multiple LLVM compilations, so let's put some polishing touches on things!
Distribution changes
We'll want to change the rustc component package to include the backends folder that we're creating. This will involve changing the Rustc step in dist.rs, and when you run ./x.py dist the rustc packages created should all have the librustc_trans dylib inside them at the backends location.
Eventually we'll also want to enable the multiple llvm backends by default when the configured release channel is not dev and the DEPLOY env var is set to 1. This can be done most likely in src/ci/run.sh by passing a new option.
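Just to pin down the condition, here's the check written out as a tiny Rust function. The real check would most likely be a couple of lines of shell in src/ci/run.sh, so treat the names `enable_extra_backends` and `enable_extra_backends_from_env` as purely illustrative.

```rust
use std::env;

/// Extra backends are on by default only for non-dev channels when the
/// DEPLOY environment variable is set to exactly "1".
pub fn enable_extra_backends(channel: &str, deploy: Option<&str>) -> bool {
    channel != "dev" && deploy == Some("1")
}

/// Convenience wrapper that reads DEPLOY from the process environment.
pub fn enable_extra_backends_from_env(channel: &str) -> bool {
    let deploy = env::var("DEPLOY").ok();
    enable_extra_backends(channel, deploy.as_deref())
}
```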
Finally what we'll want to do is add a second submodule. We'll want, for example, a src/llvm-emscripten submodule. This won't actually get checked out on most builds, but for the dist builds on the bots we'll make sure to update the submodule and run with it.
And... I think that may be it? I'm sure I'll need to fill in a lot of cracks along the way but I'm more than willing to help mentor this issue! If you're interested in implementing this please just let me know!