Change twice used large const table to static#82676
Conversation
This table is used twice in core::num::dec2flt::algorithm::power_of_ten.
According to the semantics of const, a separate huge definition of the
table is inlined at both places.
fn power_of_ten(e: i16) -> Fp {
assert!(e >= table::MIN_E);
let i = e - table::MIN_E;
let sig = table::POWERS.0[i as usize];
let exp = table::POWERS.1[i as usize];
Fp { f: sig, e: exp }
}
Theoretically this gets cleaned up by optimization passes, but in
practice I am experiencing a miscompile from LTO on this code. Making
the table a static, which would only be defined a single time and not
require attention from LTO, eliminates the miscompile and seems
semantically more appropriate anyway. A separate bug report on the LTO
bug is forthcoming.
|
r? @m-ou-se (rust-highfive has picked a reviewer for you, use r? to override) |
|
I wouldn't expect any significant difference unless there's some kind of const propagation being done by LLVM today, but have you checked disassembly or run some benchmarks? We could run rustc-perf but it's not really a float parsing workload, but can't hurt either - @bors try @rust-timer queue Otherwise this seems fine to me. An alternative could be splitting the constant in two so that there's just one use per "part", but doesn't feel obviously better. |
|
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
|
⌛ Trying commit bd51dea with merge 015314cb764d47a719b5c1c60c0f12101e8c0cc5... |
|
@Mark-Simulacrum yeah here's the disassembly: Beforecore::num::dec2flt::algorithm::power_of_ten::hf417b425dc5ff8a8:
0xd839f00 <+0>: pushq %rax
0xd839f01 <+1>: movl %edi, %eax
0xd839f03 <+3>: movswl %ax, %ecx
0xd839f06 <+6>: cmpl $0xfffffecf, %ecx
0xd839f0c <+12>: jl 0xd839f36 ; <+54> at algorithm.rs:17:5
0xd839f0e <+14>: addl $0x131, %eax
0xd839f13 <+19>: movswq %ax, %rdi
0xd839f17 <+23>: movzwl %ax, %eax
0xd839f1a <+26>: cmpl $0x263, %eax
0xd839f1f <+31>: jae 0xd839f51 ; <+81> at algorithm.rs:19:15
0xd839f21 <+33>: leaq 0x83bc830(%rip), %rcx
0xd839f28 <+40>: movq (%rcx,%rdi,8), %rax
0xd839f2c <+44>: movzwl 0x1318(%rcx,%rdi,2), %edx
0xd839f34 <+52>: popq %rcx
0xd839f35 <+53>: retq
0xd839f36 <+54>: leaq 0x83bc7cc(%rip), %rdi
0xd839f3d <+61>: leaq 0x65ab04c(%rip), %rdx
0xd839f44 <+68>: movl $0x23, %esi
0xd839f49 <+73>: callq *0x65ae839(%rip)
0xd839f4f <+79>: ud2
0xd839f51 <+81>: leaq 0x65ab050(%rip), %rdx
0xd839f58 <+88>: movl $0x263, %esi
0xd839f5d <+93>: callq *0x65ae81d(%rip)
0xd839f63 <+99>: ud2Expected table address in this assembly is $ (lldb) x/4xg 0x83bc830+0xd839f28
0x15bf6758: 0xf7c45042f7c45042 0xf7c45042f7c45042
0x15bf6768: 0xf7c45042f7c45042 0xf7c45042f7c45042
$ (lldb) memory find -c2 -e 0xe0b62e2929aba83c -- 0x00400000 0x20000000
data found at location: 0x12b6de68
0x12b6de68: 3c a8 ab 29 29 2e b6 e0 26 49 0b ba d9 dc 71 8c
0x12b6de78: 6f 1b 8e 28 10 54 8e af 4b a2 b1 32 14 e9 71 db
no more matches within the range.Aftercore::num::dec2flt::algorithm::power_of_ten::hf417b425dc5ff8a8:
0xd839ec0 <+0>: pushq %rax
0xd839ec1 <+1>: movl %edi, %eax
0xd839ec3 <+3>: movswl %ax, %ecx
0xd839ec6 <+6>: cmpl $0xfffffecf, %ecx
0xd839ecc <+12>: jl 0xd839ef6 ; <+54> at algorithm.rs:17:5
0xd839ece <+14>: addl $0x131, %eax
0xd839ed3 <+19>: movswq %ax, %rdi
0xd839ed7 <+23>: movzwl %ax, %eax
0xd839eda <+26>: cmpl $0x263, %eax
0xd839edf <+31>: jae 0xd839f11 ; <+81> at algorithm.rs:19:15
0xd839ee1 <+33>: leaq 0x53409c0(%rip), %rcx ; core::num::dec2flt::table::POWERS::hf668a1d7d9ba717f
0xd839ee8 <+40>: movq (%rcx,%rdi,8), %rax
0xd839eec <+44>: movzwl 0x1318(%rcx,%rdi,2), %edx
0xd839ef4 <+52>: popq %rcx
0xd839ef5 <+53>: retq
0xd839ef6 <+54>: leaq 0x83bc80c(%rip), %rdi
0xd839efd <+61>: leaq 0x65ab08c(%rip), %rdx
0xd839f04 <+68>: movl $0x23, %esi
0xd839f09 <+73>: callq *0x65ae879(%rip)
0xd839f0f <+79>: ud2
0xd839f11 <+81>: leaq 0x65ab090(%rip), %rdx
0xd839f18 <+88>: movl $0x263, %esi
0xd839f1d <+93>: callq *0x65ae85d(%rip)
0xd839f23 <+99>: ud2Expected table address: $ (lldb) x/4xg 0x53409c0+0xd839ee8
0x12b7a8a8: 0xe0b62e2929aba83c 0x8c71dcd9ba0b4926
0x12b7a8b8: 0xaf8e5410288e1b6f 0xdb71e91432b1a24brust/library/core/src/num/dec2flt/table.rs Lines 8 to 13 in 5233edc |
|
Is it worth adding a warning in rustc for such situations? |
|
Looks good to me. Presuming perf doesn't show anything surprising, r=me |
|
I would be shocked if there is a perf difference, since |
|
Yeah, I would also be shocked. If it's urgent feel free to approve before then, but I figured a few hours wouldn't really matter. |
|
☀️ Try build successful - checks-actions |
|
Queued 015314cb764d47a719b5c1c60c0f12101e8c0cc5 with parent 5233edc, future comparison URL. |
|
Finished benchmarking try commit (015314cb764d47a719b5c1c60c0f12101e8c0cc5): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
|
@bors r=Mark-Simulacrum rollup |
|
📌 Commit bd51dea has been approved by |
…laumeGomez Rollup of 7 pull requests Successful merges: - rust-lang#80734 (check that first arg to `panic!()` in const is `&str`) - rust-lang#81932 (Always compile rustdoc with debug logging enabled when `download-rustc` is set) - rust-lang#82018 (Remove the dummy cache in `DocContext`; delete RenderInfo) - rust-lang#82598 (Check stability and feature attributes in rustdoc) - rust-lang#82655 (Highlight identifier span instead of whole pattern span in `unused` lint) - rust-lang#82662 (Warn about unknown doc attributes) - rust-lang#82676 (Change twice used large const table to static) Failed merges: r? `@ghost` `@rustbot` modify labels: rollup
This table is used twice in core::num::dec2flt::algorithm::power_of_ten. According to the semantics of const, a separate huge definition of the table is inlined at both places.
rust/library/core/src/num/dec2flt/algorithm.rs
Lines 16 to 22 in 5233edc
Theoretically this gets cleaned up by optimization passes, but in practice I am experiencing a miscompile from LTO on this code. Making the table a static, which would only be defined a single time and not require attention from LTO, eliminates the miscompile and seems semantically more appropriate anyway. A separate bug report on the LTO bug is forthcoming.
Original addition of
constis from #27307.