Queries: demand-driven compilation
As described in Overview of the compiler, the Rust compiler
is still (as of July 2021) transitioning from a
traditional “pass-based” setup to a “demand-driven” system.
The compiler query system is the key to rustc’s demand-driven organization.
The idea is pretty simple.
Instead of entirely independent passes
(parsing, type-checking, etc.), a set of function-like queries
compute information about the input source.
For example,
there is a query called type_of that, given the DefId of
some item, will compute the type of that item and return it to you.
Query execution is memoized. The first time you invoke a query, it will go do the computation, but the next time, the result is returned from a hashtable. Moreover, query execution fits nicely into incremental computation; the idea is roughly that, when you invoke a query, the result may be returned to you by loading stored data from disk.1
Eventually, we want the entire compiler control-flow to be query driven.
There will effectively be one top-level query (compile) that will run compilation on a crate; this
will in turn demand information about that crate, starting from the end.
For example:
- The
compilequery might demand to get a list of codegen-units (i.e. modules that need to be compiled by LLVM). - But computing the list of codegen-units would invoke some subquery that returns the list of all modules defined in the Rust source.
- That query in turn would invoke something asking for the HIR.
- This keeps going further and further back until we wind up doing the actual parsing.
Although this vision is not fully realized, large sections of the compiler (for example, generating MIR) currently work exactly like this.
Invoking queries
Invoking a query is simple.
The TyCtxt (“type context”) struct offers a method for each defined query.
For example, to invoke the type_of query, you would just do this:
let ty = tcx.type_of(some_def_id);
How the compiler executes a query
So you may be wondering what happens when you invoke a query method.
The answer is that, for each query, the compiler maintains a
cache – if your query has already been executed, then, the answer is
simple: we clone the return value out of the cache and return it
(therefore, you should try to ensure that the return types of queries
are cheaply cloneable; insert an Rc if necessary).
Providers
If, however, the query is not in the cache, then the compiler will
call the corresponding provider function.
A provider is a function implemented in a specific module and manually registered into either
the Providers struct (for local crate queries) or
the ExternProviders struct (for external crate queries)
during compiler initialization.
The macro system generates both structs,
which act as function tables for all query implementations, where each
field is a function pointer to the actual provider.
Note: Both the Providers and ExternProviders structs are generated by macros and act as function tables for all query implementations.
They are not Rust traits, but plain structs with function pointer fields.
Providers are defined per-crate. The compiler maintains, internally, a table of providers for every crate, at least conceptually. There are two sets of providers:
- The
Providersstruct for queries about the local crate (that is, the one being compiled) - The
ExternProvidersstruct for queries about external crates (that is, dependencies of the local crate)
Note that what determines the crate that a query is targeting is not the kind of query, but the key.
For example, when you invoke tcx.type_of(def_id), that could be a
local query or an external query, depending on what crate the def_id
is referring to (see the self::keys::Key trait for more information on how that works).
Providers always have the same signature:
fn provider<'tcx>(
tcx: TyCtxt<'tcx>,
key: QUERY_KEY,
) -> QUERY_RESULT {
...
}
Providers take two arguments: the tcx and the query key.
They return the result of the query.
N.B. Most of the rustc_* crates only provide local
providers. Almost all extern providers wind up going through the
rustc_metadata crate, which loads the information from the crate metadata.
But in some cases there are crates that
provide queries for both local and external crates, in which case
they define both a provide and a provide_extern function, through
wasm_import_module_map, that rustc_driver can invoke.
How providers are set up
When the tcx is created, it is given both the local and external providers by its creator using
the Providers struct from rustc_middle::util.
This struct contains both the local and external providers:
pub struct Providers {
pub queries: crate::query::Providers, // Local crate providers
pub extern_queries: crate::query::ExternProviders, // External crate providers
pub hooks: crate::hooks::Providers,
}
Each of these provider structs is generated by the macros and contains function pointers for their respective queries.
How are providers registered?
The util::Providers struct is filled in during compiler initialization, by the rustc_interface crate from the DEFAULT_QUERY_PROVIDERS static.
The actual provider functions are defined across various rustc_* crates (like rustc_middle, rustc_hir_analysis, etc).
To register providers, each crate exposes a provide function that looks like this:
pub fn provide(providers: &mut query::Providers) {
*providers = query::Providers {
type_of,
// ... add more providers here
..*providers
};
}
Note that this function accepts query::Providers not util::Providers.
It is exceedingly rare to need a provide function that doesn’t just accept query::Providers.
If more than the queries field of util::Providers is being updated then util::Providers can be accepted instead:
pub fn provide(providers: &mut rustc_middle::util::Providers) {
providers.queries.type_of = type_of;
// ... add more local providers here
providers.extern_queries.type_of = extern_type_of;
// ... add more external providers here
providers.hooks.some_hook = some_hook;
// ... add more hooks here
}
Note that util::Providers implements DerefMut to query::Providers so callers of the provide functions can pass in a util::Providers and it will just work for provider functions that accept query::Providers too
- This function takes a mutable reference to the
query::Providersstruct and sets the fields to point to the correct provider functions. - You can also assign queries individually, e.g.
providers.type_of = type_of;. - You can assign fields individually for each provider type (local, external, and hooks).
Adding a new provider
Suppose you want to add a new query called fubar.
This section focuses on wiring up the providers; for how to declare the query itself in the big rustc_queries! macro, see Adding a new query below.
In practice you usually:
- Decide which crate “owns” the query (for example
rustc_hir_analysis,rustc_mir_build, or anotherrustc_*crate). - In that crate, look for an existing
providefunction:
If it exists, you will extend it to set the field for your new query. If the crate does not yet have apub fn provide(providers: &mut query::Providers) { // existing assignments }providefunction, add one and make sure it is included inDEFAULT_QUERY_PROVIDERSin therustc_interfacecrate so that it actually gets called during initialization (see the discussion above). - Implement the provider function itself:
fn fubar<'tcx>(tcx: TyCtxt<'tcx>, key: LocalDefId) -> Fubar<'tcx> { ... } - Register it in the crate’s
providefunction:pub fn provide(providers: &mut query::Providers) { *providers = query::Providers { fubar, ..*providers }; }
How queries interact with external crate metadata
When a query is made for an external crate (i.e., a dependency), the query system needs to load the information from that crate’s metadata.
This is handled by the rustc_metadata crate, which is responsible for decoding and providing the information stored in the .rmeta files.
The process works like this:
-
When a query is made, the query system first checks if the
DefIdrefers to a local or external crate by checking ifdef_id.krate == LOCAL_CRATE. This determines whether to use the local provider fromProvidersor the external provider fromExternProviders. -
For external crates, the query system will look for a provider in the
ExternProvidersstruct. Therustc_metadatacrate registers these external providers through theprovide_externfunction inrustc_metadata/src/rmeta/decoder/cstore_impl.rs. Just like:#![allow(unused)] fn main() { pub fn provide_extern(providers: &mut ExternProviders) { providers.foo = |tcx, def_id| { // Load and decode metadata for external crate let cdata = CStore::from_tcx(tcx).get_crate_data(def_id.krate); cdata.foo(def_id.index) }; // Register other external providers... } } -
The metadata is stored in a binary format in
.rmetafiles that contains pre-computed information about the external crate, such as types, function signatures, trait implementations, and other information needed by the compiler. When an external query is made, therustc_metadatacrate:- Loads the
.rmetafile for the external crate - Decodes the metadata using the
Decodabletrait - Returns the decoded information to the query system
- Loads the
This approach avoids recompiling external crates, allows for faster compilation of dependent crates, and enables incremental compilation to work across crate boundaries.
Here is a simplified example, when you call tcx.type_of(def_id) for a type defined in an external crate, the query system will:
- Detect that the
def_idrefers to an external crate by checkingdef_id.krate != LOCAL_CRATE - Call the appropriate provider from
ExternProviderswhich was registered byrustc_metadata - The provider will load and decode the type information from the external crate’s metadata
- Return the decoded type to the caller
This is why most rustc_* crates only need to provide local providers - the external providers are handled by the metadata system.
The only exception is when a crate needs to provide special handling for external queries, in which case it would implement both local and external providers.
When we define a new query that should work across crates, it does not automatically become cross-crate just because it is listed in rustc_queries!.
You will typically need to:
- Add the query to
rustc_queries!with appropriate modifiers (for example whether it is cached on disk). - Implement a local provider in the owning crate and register it via that crate’s
providefunction. - Add an external provider in
rustc_metadataviaprovide_extern, and ensure the query’s result is encoded and decoded in the crate metadata.
An example of introducing such a cross-crate query can be found in commit 996a185 in the rust-lang/rust repository.
Adding a new query
How do you add a new query? Defining a query takes place in two steps:
- Declare the query name, its arguments and description.
- Supply query providers where needed.
To declare the query name and arguments, you simply add an entry to
the big macro invocation in compiler/rustc_middle/src/query/mod.rs.
Then you need to add a documentation comment to it with some internal description.
Then, provide the desc attribute which contains a user-facing description of the query.
The desc attribute is shown to the user in query cycles.
This looks something like:
rustc_queries! {
/// Records the type of every item.
query type_of(key: DefId) -> Ty<'tcx> {
cache_on_disk_if { key.is_local() }
desc { |tcx| "computing the type of `{}`", tcx.def_path_str(key) }
}
...
}
A query definition has the following form:
query type_of(key: DefId) -> Ty<'tcx> { ... }
^^^^^ ^^^^^^^ ^^^^^ ^^^^^^^^ ^^^
| | | | |
| | | | query modifiers
| | | result type
| | query key type
| name of query
query keyword
Let’s go over these elements one by one:
- Query keyword: indicates a start of a query definition.
- Name of query: the name of the query method (
tcx.type_of(..)). Also used as the name of a struct (ty::queries::type_of) that will be generated to represent this query. - Query key type: the type of the argument to this query.
This type must implement the
ty::query::keys::Keytrait, which defines (for example) how to map it to a crate, and so forth. - Result type of query: the type produced by this query.
This type should (a) not use
RefCellor other interior mutability and (b) be cheaply cloneable. Interning or usingRcorArcis recommended for non-trivial data types.2 - Query modifiers: various flags and options that customize how the query is processed (mostly with respect to incremental compilation).
So, to add a query:
- Add an entry to
rustc_queries!using the format above. - Link the provider by modifying the appropriate
providemethod; or add a new one if needed and ensure thatrustc_driveris invoking it.
External links
Related design ideas, and tracking issues:
- Design document: On-demand Rustc incremental design doc
- Tracking Issue: “Red/Green” dependency tracking in compiler
More discussion and issues:
-
The Incremental compilation in detail chapter gives a more in-depth description of what queries are and how they work. If you intend to write a query of your own, this is a good read. ↩
-
The one exception to those rules is the
ty::steal::Stealtype, which is used to cheaply modify MIR in place. See the definition ofStealfor more details. New uses ofStealshould not be added without alerting@rust-lang/compiler. ↩