5217fd7 to
0ace3e7
|
I wonder how hard it would be to store true 32-bit pointers in the const-eval allocation for the vtable. That would avoid all the hacks elsewhere around the size mismatch between const eval and runtime. |
0ace3e7 to
d58809f
|
☔ The latest upstream changes (presumably #147475) made this pull request unmergeable. Please resolve the merge conflicts. |
This is a WIP patch for implementing rust-lang/compiler-team#903. It adds a new unstable flag `-Zexperimental-relative-rust-abi-vtables` that makes vtables PIC-friendly. This is only supported for LLVM codegen and not for other backends.
Early feedback on this is welcome. I'm not sure whether the way I implemented it is the best one, since most of the actual vtable emission happens during LLVM codegen. That is, to MIR the vtable still looks like a normal table of pointers and byte arrays, and I only make the vtables relative at the codegen level.
Locally, I can build the stage 1 compiler and runtimes with relative vtables, but I couldn't figure out how to tell the build system to build only the stage 1 binaries with this flag, so I work around this by unconditionally enabling relative vtables in rustc. I think the end goal is either something akin to multilibs in clang, where the compiler chooses which runtimes to use based on compilation flags, or binding this ABI to the target so that it is part of the default ABI for that target (just like relative vtables are the default for Fuchsia in C++ with Clang). I think the latter is what target modifiers do (rust-lang#136966).
Action Items:
- I'm still experimenting with building Fuchsia with this to check that it works end to end, and I still need to do some measurements to see whether this is worth pursuing.
- More work will be needed to ensure the correct relative intrinsics are emitted with CFI and LTO. Right now I'm experimenting on a normal build.
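For readers new to the idea, here is a minimal sketch of why a relative layout is PIC-friendly. It is a toy model with integer "addresses", not the layout this patch emits: the `Image` type, its fields, and the three-slot table are all made up for illustration, and the real vtable header (drop, size, align) is ignored.

```rust
/// Toy model of one image containing a vtable and its methods.
/// All addresses are plain integers; nothing here touches real vtables.
#[derive(Debug, Clone)]
struct Image {
    load_base: u64,       // where the loader happened to place the image
    vtable_offset: u64,   // position of the vtable inside the image
    fn_offsets: Vec<u64>, // positions of the methods inside the image
}

impl Image {
    /// Absolute layout: each slot holds a full 64-bit address, so every slot
    /// depends on `load_base` and needs a dynamic relocation under PIC/PIE.
    fn absolute_slots(&self) -> Vec<u64> {
        self.fn_offsets.iter().map(|off| self.load_base + off).collect()
    }

    /// Relative layout: each slot holds a signed 32-bit distance from the
    /// vtable to the method. `load_base` cancels out of the subtraction.
    fn relative_slots(&self) -> Vec<i32> {
        self.fn_offsets
            .iter()
            .map(|off| {
                let distance = *off as i64 - self.vtable_offset as i64;
                assert!(
                    distance >= i32::MIN as i64 && distance <= i32::MAX as i64,
                    "method must lie within +/-2 GiB of the vtable"
                );
                distance as i32
            })
            .collect()
    }

    /// Resolving a relative slot at call time: vtable address plus offset.
    fn resolve_relative(&self, slot: i32) -> u64 {
        let vtable_addr = self.load_base + self.vtable_offset;
        (vtable_addr as i64 + slot as i64) as u64
    }
}

fn main() {
    let first = Image {
        load_base: 0x5555_0000_0000,
        vtable_offset: 0x9000,
        fn_offsets: vec![0x1000, 0x1040, 0x2000],
    };
    // Same image, loaded at a different base address.
    let second = Image { load_base: 0x7fff_0000_0000, ..first.clone() };

    // Absolute slots differ per load address; relative slots do not.
    assert_ne!(first.absolute_slots(), second.absolute_slots());
    assert_eq!(first.relative_slots(), second.relative_slots());

    // Both layouts still resolve to the same method addresses.
    for (slot, absolute) in second.relative_slots().into_iter().zip(second.absolute_slots()) {
        assert_eq!(second.resolve_relative(slot), absolute);
    }
    println!("relative slots: {:?}", first.relative_slots());
}
```

Because the relative slots are identical for every load address, they can be fixed at link time and need no dynamic relocations, at the price of one extra add when a method is looked up.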
d58809f to
6ff8b5f
|
☔ The latest upstream changes (presumably #152934) made this pull request unmergeable. Please resolve the merge conflicts. |
|
I'm testing this patch on my random crates, including some vtable-heavy ones. It reduces binary size by 1% to 5%, mainly from cutting down dynamic relocations ( However, I got some segfaults at runtime due to a vtable layout mismatch between const-eval and runtime (as mentioned above). That is,
fn main() {
    const X: &dyn std::fmt::Display = &42i32; // const-eval emits a vtable of absolute fn pointers
    println!("{X}"); // the runtime call assumes a relative vtable, oops
}
I also got a weird compile error with no further information when compiling rust-analyzer |
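A hedged sketch of how such a mismatch goes wrong, again as a toy byte-level model rather than the compiler's actual data structures. The helper names (`write_absolute`, `write_relative`, `read_relative`) and the addresses are invented for illustration: one side serializes 8-byte absolute pointers, the other side reads 4-byte relative offsets from the same bytes.

```rust
/// Serialize method addresses the way an absolute layout would: 8 bytes each.
fn write_absolute(addrs: &[u64]) -> Vec<u8> {
    addrs.iter().flat_map(|a| a.to_le_bytes()).collect()
}

/// Serialize the same methods as 4-byte offsets relative to `vtable_addr`.
fn write_relative(addrs: &[u64], vtable_addr: u64) -> Vec<u8> {
    addrs
        .iter()
        .flat_map(|a| ((*a as i64 - vtable_addr as i64) as i32).to_le_bytes())
        .collect()
}

/// Reader that assumes the relative layout: slot `i` is an i32 at byte 4 * i.
fn read_relative(bytes: &[u8], vtable_addr: u64, i: usize) -> Option<u64> {
    let s = bytes.get(4 * i..4 * i + 4)?;
    let offset = i32::from_le_bytes([s[0], s[1], s[2], s[3]]);
    Some((vtable_addr as i64 + offset as i64) as u64)
}

fn main() {
    let vtable_addr = 0x40_9000u64;
    let methods = [0x40_1000u64, 0x40_1040];

    // Writer and reader agree: every slot resolves to the right method.
    let good = write_relative(&methods, vtable_addr);
    assert_eq!(read_relative(&good, vtable_addr, 0), Some(0x40_1000));
    assert_eq!(read_relative(&good, vtable_addr, 1), Some(0x40_1040));

    // Writer used absolute pointers, reader still assumes relative offsets.
    let bad = write_absolute(&methods);
    // Slot 0 is the low half of the first pointer, added to the vtable address.
    assert_eq!(read_relative(&bad, vtable_addr, 0), Some(0x80_a000));
    // Slot 1 is the high half of the first pointer (zero here), so the
    // "method" resolves to the vtable itself.
    assert_eq!(read_relative(&bad, vtable_addr, 1), Some(0x40_9000));
    println!("mismatched read of slot 1 lands on the vtable at {:#x}", vtable_addr);
}
```

In this model neither mismatched slot points at a method, which is consistent with a call through a const-evaluated `&dyn` value crashing once the caller assumes the relative layout.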
My colleague Erick has a more up-to-date version of this at main...erickt:rust:relative-vtables, which should include support for building the runtimes and (hopefully) fixes for the merge conflicts I didn't have time to address here, so you might have more luck trying that out. (Fair warning: some of the updates there were vibe-coded, but they do seem to get the rustc and runtime tests passing, and we can build a bunch of downstream Rust projects with it.) I'll eventually come back and clean this PR up, but we're still trying to collect some numbers on the side. It could be that we missed a few cases, though. If there are any runtime assumptions about the vtable ABI, those will need to be changed as well. The same goes for const-eval, which I'm not sure I tackled in my initial patch. |
|
@oxalica - thanks for trying it out! Which crates are you testing it with? As @PiJoules said, we've got this patch passing the Rust test suite and working with servo, tokio, ripgrep, and chrome, and it is also showing savings between 0.25% and roughly 4%. I just need to get some performance numbers before resuming talks with the compiler team. I'd be happy to see if I can reproduce the segfaults. |
|
Thanks both of you for the work!
A public one is
I'm testing this PR rebased onto 99246f4, which is the latest non-conflicting commit, so it may be a bit out of date. If there are more updates (ones that fix the merge conflicts), it would be good to push them to this PR to make testing easier. The crash happens on
To me, the runtime cost of vtable calls does not matter much. A vtable call is already assumed to be slow because it cannot be inlined and is prone to branch misprediction, so another add instruction costing less than one cycle is nothing. My main goal is to reduce code size and startup cost. Absolute relocations increase the work to be done during dynamic linking before main, and reduce memory sharing (more data in process-private |
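To put rough numbers on the relocation argument, here is a back-of-the-envelope sketch. It assumes a 64-bit ELF target that uses RELA-format dynamic relocations without RELR packing, where one `Elf64_Rela` entry is 24 bytes, and it counts only method slots. The `Footprint` type and function names are hypothetical.

```rust
const ABS_SLOT_BYTES: u64 = 8; // one 64-bit pointer per slot
const ELF64_RELA_BYTES: u64 = 24; // r_offset + r_info + r_addend, 8 bytes each
const REL_SLOT_BYTES: u64 = 4; // one 32-bit offset per slot

/// Bytes a set of vtable slots contributes to each section in this toy model.
#[derive(Debug, PartialEq)]
struct Footprint {
    data_rel_ro: u64,
    rela_dyn: u64,
    rodata: u64,
}

impl Footprint {
    fn total(&self) -> u64 {
        self.data_rel_ro + self.rela_dyn + self.rodata
    }
}

/// Absolute slots live in relocated read-only data and each one needs a
/// dynamic relocation entry that the loader processes before `main`.
fn absolute_layout(slots: u64) -> Footprint {
    Footprint {
        data_rel_ro: slots * ABS_SLOT_BYTES,
        rela_dyn: slots * ELF64_RELA_BYTES,
        rodata: 0,
    }
}

/// Relative slots are link-time constants, so they can sit in plain
/// read-only data that is shareable between processes.
fn relative_layout(slots: u64) -> Footprint {
    Footprint {
        data_rel_ro: 0,
        rela_dyn: 0,
        rodata: slots * REL_SLOT_BYTES,
    }
}

fn main() {
    let slots = 10_000;
    let abs = absolute_layout(slots);
    let rel = relative_layout(slots);
    println!("absolute: {:?}, total {} bytes", abs, abs.total());
    println!("relative: {:?}, total {} bytes", rel, rel.total());
}
```

Real binaries will not match this ratio, since other data also lives in these sections and the extra add shows up in code size, but the direction (less relocated data and fewer dynamic relocations, more plain read-only data) is what the model predicts.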
|
@oxalica - I just pushed up a new version in https://github.com/erickt/rust/tree/relative-vtables that's rebased on top of rust. The big thing with it is that it tries to solve the problem that relative vtables introduce a breaking ABI change between the compiler and the standard library. So now you need to:
...
[rust]
experimental-relative-vtables = true
...
I haven't had a chance to test it against |
|
Got some benchmark numbers with the help of Gemini. The
1. rust Components
| Section | `stage2-with-rel` | `stage2-no-rel` | Delta |
|---|---|---|---|
| Total Size | 147,812,784 bytes | 146,520,056 bytes | +1,292,728 bytes (+0.88%) |
| `.data.rel.ro` | 1.8 MB | 2.8 MB | -977 KB (-35%) |
| `.rela.dyn` | 2.3 MB | 3.3 MB | -959 KB (-28%) |
| `.rodata` | 5.44 MB | 4.93 MB | +510 KB (+10.3%) |
| `.text` | 67.5 MB | 66.6 MB | +906 KB (+1.3%) |
rustc stage2 tools binary: rust-analyzer
| Section | `stage2-with-rel` | `stage2-no-rel` | Delta |
|---|---|---|---|
| Total Size | 55,765,360 bytes | 55,901,680 bytes | -136,320 bytes (-0.24%) |
| `.data.rel.ro` | 765 KB | 1,273 KB | -508 KB (-39%) |
| `.rela.dyn` | 1,207 KB | 2,132 KB | -925 KB (-43%) |
| `.rodata` | 2,811 KB | 2,541 KB | +270 KB (+10.6%) |
| `.text` | 27,601 KB | 27,337 KB | +264 KB (+0.9%) |
rustc stage2 tools binary: rust-analyzer-proc-macro-srv
| Section | `stage2-with-rel` | `stage2-no-rel` | Delta |
|---|---|---|---|
| Total Size | 1,949,040 bytes | 1,950,336 bytes | -1,296 bytes (-0.07%) |
| `.data.rel.ro` | 17 KB | 26 KB | -9 KB (-34%) |
| `.rela.dyn` | 45 KB | 57 KB | -12 KB (-21%) |
| `.rodata` | 59 KB | 54 KB | +5 KB (+9.2%) |
| `.text` | 985 KB | 985 KB | No change |
2. External Projects (Vendored Offline Builds)
ripgrep
- Build Time:
  - `stage2-with-rel`: 15.27s
  - `stage2-no-rel`: 14.70s
- Section Sizes:
| Section | `stage2-with-rel` | `stage2-no-rel` | Delta |
|---|---|---|---|
| Total Size | 25,416,024 bytes | 25,595,336 bytes | -179,312 bytes (-0.70%) |
| `.data.rel.ro` | 134,840 bytes | 173,624 bytes | -38,784 bytes (-22.34%) |
| `.rela.dyn` | 43,440 bytes | 62,232 bytes | -18,792 bytes (-30.20%) |
| `.rodata` | 852,884 bytes | 812,756 bytes | +40,128 bytes (+4.94%) |
| `.text` | 2,467,737 bytes | 2,460,761 bytes | +6,976 bytes (+0.28%) |
servo (servoshell)
- Build Time:
  - `stage2-with-rel`: 5m 26.40s
  - `stage2-no-rel`: 5m 27.26s
- Section Sizes:
| Section | `stage2-with-rel` | `stage2-no-rel` | Delta |
|---|---|---|---|
| Total Size | 223,778,568 bytes | 224,778,112 bytes | -999,544 bytes (-0.44%) |
| `.data.rel.ro` | 36,051,840 bytes | 45,565,304 bytes | -9,513,464 bytes (-20.88%) |
| `.rela.dyn` | 15,221,688 bytes | 18,552,240 bytes | -3,330,552 bytes (-17.95%) |
| `.rodata` | 32,986,680 bytes | 32,627,504 bytes | +359,176 bytes (+1.10%) |
| `.text` | 111,263,946 bytes | 111,399,786 bytes | -135,840 bytes (-0.12%) |
palc (Unittest Binary)
- Build Time (Tests, No-Run):
  - `stage2-with-rel`: 11.80s
  - `stage2-no-rel`: 11.50s
- Section Sizes:
| Section | `stage2-with-rel` | `stage2-no-rel` | Delta |
|---|---|---|---|
| Total Size | 1,042,808 bytes | 1,041,304 bytes | +1,504 bytes (+0.14%) |
| `.data.rel.ro` | 27,120 bytes | 31,304 bytes | -4,184 bytes (-13.37%) |
| `.rela.dyn` | 47,400 bytes | 54,264 bytes | -6,864 bytes (-12.65%) |
| `.rodata` | 57,892 bytes | 55,564 bytes | +2,328 bytes (+4.19%) |
| `.text` | 601,554 bytes | 600,098 bytes | +1,456 bytes (+0.24%) |
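For anyone re-checking the Delta columns, a small sketch of how they appear to be derived: the difference is taken as with-rel minus no-rel, and the percentage is relative to the no-rel baseline. That reading is inferred from the numbers, and the `delta` helper is hypothetical. The rows are the byte counts from the palc table.

```rust
/// Returns (difference in bytes, difference as a percentage of the baseline).
/// The baseline is the build without relative vtables.
fn delta(with_rel: u64, no_rel: u64) -> (i64, f64) {
    let diff = with_rel as i64 - no_rel as i64;
    (diff, diff as f64 * 100.0 / no_rel as f64)
}

fn main() {
    // (section, stage2-with-rel bytes, stage2-no-rel bytes) for palc.
    let rows = [
        ("Total Size", 1_042_808u64, 1_041_304u64),
        (".data.rel.ro", 27_120, 31_304),
        (".rela.dyn", 47_400, 54_264),
        (".rodata", 57_892, 55_564),
        (".text", 601_554, 600_098),
    ];
    for (section, with_rel, no_rel) in rows {
        let (diff, percent) = delta(with_rel, no_rel);
        println!("{section:<14} {diff:+} bytes ({percent:+.2}%)");
    }
}
```

The same function can be pointed at the other byte-exact tables (ripgrep, servo); the KB and MB tables are already rounded, so their percentages can only be reproduced approximately.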
This is a WIP patch for implementing rust-lang/compiler-team#903. It adds a new unstable flag `-Zexperimental-relative-rust-abi-vtables` that makes vtables PIC-friendly. This is only supported for LLVM codegen and not for other backends.
Early feedback on this is welcome. I'm not sure whether the way I implemented it is the best one, since most of the actual vtable emission happens during LLVM codegen. That is, to MIR the vtable still looks like a normal table of pointers and byte arrays, and I only make the vtables relative at the codegen level.
Locally, I can build the stage 1 compiler and runtimes with relative vtables, but I couldn't figure out how to tell the build system to build only the stage 1 binaries with this flag, so I work around this by unconditionally enabling relative vtables in rustc. I think the end goal is either something akin to multilibs in clang, where the compiler chooses which runtimes to use based on compilation flags, or binding this ABI to the target so that it is part of the default ABI for that target (just like relative vtables are the default for Fuchsia in C++ with Clang). I think the latter is what target modifiers do (#136966).
Action Items: