Proposal
As part of Rust semantics, the Rust compiler generates memory copies to implement moves of values and copies of `Copy` types.
In practice, these moves appear as calls to `memcpy` (or similar), or are invisible as inlined memory copies, depending on what the optimizer decides to do. While a profiler will see these calls, or a particularly hot path of inlined `memcpy`, it is almost impossible to tell 1) whether they were compiler-generated (as opposed to, say, an explicit call to `.clone()`) and 2) if they were, which types are being moved or copied.
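For illustration, here is a hypothetical snippet (`MyLargeType`, `make_large`, and `checksum` are illustrative names) in which a large value is moved several times without any explicit `.clone()`. Whether each move actually becomes a `memcpy` call, an inlined copy, or is optimized away entirely depends on the optimizer.

```rust
/// Hypothetical large type: 780 bytes, well over a 64-byte cache line.
pub struct MyLargeType {
    pub data: [u8; 780],
}

/// Builds a value; the result may be constructed in place or copied out.
pub fn make_large() -> MyLargeType {
    MyLargeType { data: [1; 780] }
}

/// Takes its argument by value, so each call moves a whole `MyLargeType`.
pub fn checksum(x: MyLargeType) -> u32 {
    x.data.iter().map(|&b| u32::from(b)).sum()
}

/// Every step below is a move; none of them is written as `.clone()`,
/// yet any of them may be lowered to a 780-byte `memcpy`.
pub fn move_large_things_a_lot() -> u32 {
    let a = make_large();
    let b = a; // move: `a` is no longer usable
    let v = vec![b]; // move into the Vec's heap buffer
    v.into_iter().map(checksum).sum() // move out of the Vec and into `checksum`
}
```

In a profile, any resulting `memcpy` would be attributed only to `move_large_things_a_lot`, with nothing indicating which type was being moved.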
This proposes a new flag, `-Zannotate-moves`, which causes rustc to annotate these calls in the debug info. This annotation only affects the debug info and has no effect on the actual generated code. Each annotated move or copy is wrapped so that it appears to have been inlined from either

```rust
core::profiling::compiler_move<T, const SIZE: usize>(_src: *const T, _dst: *mut T);
```

or

```rust
core::profiling::compiler_copy<T, const SIZE: usize>(_src: *const T, _dst: *mut T);
```
So, for example, a profiler sample backtrace might look like:
```text
0: memcpy
1: core::profiling::compiler_move::<MyLargeType, 780>()
2: mycode::move_large_things_a_lot()
```
(This requires the profiler to understand inlined functions properly.)
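Because the type and size are encoded in the name of the inlined frame, a profiler or a post-processing script can attribute samples to specific types. The following is a minimal sketch that is not part of the proposal: `MoveKind`, `MoveFrame`, and `parse_move_frame` are hypothetical, and it assumes demangled frame names of exactly the form `core::profiling::compiler_move::<Type, SIZE>()`.

```rust
/// Which anchor function a frame was attributed to.
#[derive(Debug, PartialEq)]
pub enum MoveKind {
    Move,
    Copy,
}

/// Hypothetical parsed form of an annotated frame.
#[derive(Debug, PartialEq)]
pub struct MoveFrame<'a> {
    pub kind: MoveKind,
    pub type_name: &'a str,
    pub size: usize,
}

/// Parses a demangled frame name such as
/// `core::profiling::compiler_move::<MyLargeType, 780>()`.
/// Returns `None` for frames that are not move/copy annotations.
pub fn parse_move_frame(frame: &str) -> Option<MoveFrame<'_>> {
    let (kind, rest) = if let Some(r) = frame.strip_prefix("core::profiling::compiler_move::<") {
        (MoveKind::Move, r)
    } else if let Some(r) = frame.strip_prefix("core::profiling::compiler_copy::<") {
        (MoveKind::Copy, r)
    } else {
        return None;
    };
    // Drop an optional trailing `()`, then the closing `>`.
    let rest = rest.strip_suffix("()").unwrap_or(rest);
    let args = rest.strip_suffix('>')?;
    // The size is the last generic argument; splitting from the right keeps
    // type names that themselves contain ", " (e.g. `HashMap<K, V>`) intact.
    let (type_name, size) = args.rsplit_once(", ")?;
    Some(MoveFrame { kind, type_name, size: size.trim().parse().ok()? })
}
```

Summing sample counts per `(kind, type_name)` pair would then give a per-type breakdown of time spent in compiler-generated moves and copies.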
These functions are actually defined in `library/core/src/profiling.rs`, but they are never called; they exist solely as anchors for these annotations so that the debug info has something to reference.
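The sketch below is not taken from `library/core/src/profiling.rs`. It is a hypothetical stand-in (the names `compiler_move_anchor` and `anchor_frame_name` are made up) that illustrates why a never-called generic function works as an anchor: each `(T, SIZE)` instantiation has its own name, so the frame shown in a profile encodes both the type and the byte count. It assumes that `SIZE` is simply `size_of::<T>()`.

```rust
use std::any::type_name;
use std::mem::size_of;

/// Stand-in for an anchor such as `core::profiling::compiler_move`: generic
/// over the moved type and its size, and never meant to be called.
#[allow(dead_code)]
fn compiler_move_anchor<T, const SIZE: usize>(_src: *const T, _dst: *mut T) {
    unreachable!("debug-info anchor only; never called")
}

/// Hypothetical helper: the frame name a profiler would show for the
/// instantiation covering a move of `T`, assuming `SIZE == size_of::<T>()`.
pub fn anchor_frame_name<T>() -> String {
    format!(
        "core::profiling::compiler_move::<{}, {}>",
        type_name::<T>(),
        size_of::<T>()
    )
}
```

Note that `std::any::type_name` output is not guaranteed to be stable and is usually path-qualified for user-defined types, so real frame names may differ from this format.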
The `-Zannotate-moves` option takes either a boolean, to explicitly enable or disable the functionality, or a number, to set a size threshold in bytes. Only moves/copies of objects whose size is >= that threshold are annotated. The default is 65, so that small objects of up to 64 bytes, which fit in a typical cache line, are not annotated, on the assumption that copying them is cheap. It will also only annotate moves/copies that could result in a `memcpy`, which I think already excludes many scalar and vector moves.
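The option's semantics fit in a few lines. This is a hypothetical sketch, not rustc's actual option parser: the `AnnotateMoves` type, the accepted boolean spellings (`y`/`yes`/`on`/`true` and `n`/`no`/`off`/`false`), and treating a bare `-Zannotate-moves` as "enabled with the default threshold of 65" are all assumptions.

```rust
/// Hypothetical model of `-Zannotate-moves[=<bool|size>]`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AnnotateMoves {
    Disabled,
    /// Annotate moves/copies of at least this many bytes.
    Enabled { threshold: u64 },
}

/// Default threshold from the proposal: objects of 64 bytes or less are skipped.
pub const DEFAULT_THRESHOLD: u64 = 65;

impl AnnotateMoves {
    /// `None` models a bare `-Zannotate-moves` with no value (assumed to enable it).
    pub fn parse(value: Option<&str>) -> Result<Self, String> {
        match value {
            None | Some("y" | "yes" | "on" | "true") => {
                Ok(Self::Enabled { threshold: DEFAULT_THRESHOLD })
            }
            Some("n" | "no" | "off" | "false") => Ok(Self::Disabled),
            Some(s) => s
                .parse::<u64>()
                .map(|threshold| Self::Enabled { threshold })
                .map_err(|_| format!("expected a boolean or a size in bytes, got `{s}`")),
        }
    }

    /// Whether a compiler-generated move/copy of `size` bytes passes the size
    /// check. The proposal's additional "could result in a memcpy" filter is
    /// not modeled here.
    pub fn should_annotate(self, size: u64) -> bool {
        match self {
            Self::Disabled => false,
            Self::Enabled { threshold } => size >= threshold,
        }
    }
}
```

With the default, a 64-byte object (one typical cache line) is skipped while a 65-byte one is annotated, matching the `>=` comparison described above.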
The impact on binary size is very small and, again, confined to the debug info sections. With the 65-byte threshold, `librustc_driver.so` grows by less than 0.5%; even with a threshold of 8 bytes it grows by only about 2%. I have not measured the impact on compile time, but I would be very surprised if it were significant. As a result, I think this is something we could leave enabled by default (once stabilized).
I have two prototype PRs implementing this:
… as (small) future projects (I had thought they had more code in common with LLVM).
I have generated some initial flamegraphs of rustc compilations and found that, overall, rustc spends approximately 1.5% of its time in moves of various kinds (and about 0.1% in annotated copies).
Mentors or Reviewers
@RalfJung, @saethlin and @nnethercote provided very useful comments on my MIR-transform based PR.
Process
The main points of the Major Change Process are as follows:
… `@rustbot second` or kick off a team FCP with `@rfcbot fcp $RESOLUTION`.
You can read more about Major Change Proposals on forge.