Simplify/Optimize FileEncoder#115542

saethlin · 2023-09-04T17:37:40Z

FileEncoder is basically a BufWriter except that it exposes access to the not-written-to-yet region of the buffer so that some users can write directly to the buffer. This strategy is awesome because it lets us avoid calling memcpy for small copies, but the previous strategy was based on the writer accessing a &mut [MaybeUninit<u8>; N] and returning a &[u8] which is an API which currently mandates the use of unsafe code, making that interface in general not that appealing.

So this PR cleans up the FileEncoder implementation and builds on that general idea of direct buffer access in order to prevent memcpy calls in a few key places when encoding the dep graph and rmeta tables. The interface used here is now 100% safe, but with the caveat that internally we need to avoid trusting the number of bytes that the provided function claims to have written.

The original primary objective of this PR was to clean up the FileEncoder implementation so that the fix for the following issues would be easy to implement. The fix for these issues is to correctly update self.buffered even when writes fail, which I think it's easy to verify manually is now done, because all the FileEncoder methods are small.

Fixes #115298
Fixes #114671
Fixes #114045
Fixes #108100
Fixes #106787

saethlin · 2023-09-04T17:49:29Z

@bors try @rust-timer queue

bors · 2023-09-04T17:49:37Z

⌛ Trying commit a3671fd3993f906b9a9879b1465a918b9d80cac3 with merge 9bd418951e5c1d9a80f4d5ff524fde43a43149bf...

bors · 2023-09-04T19:00:23Z

☀️ Try build successful - checks-actions
Build commit: 9bd418951e5c1d9a80f4d5ff524fde43a43149bf (9bd418951e5c1d9a80f4d5ff524fde43a43149bf)

rust-timer · 2023-09-04T20:22:39Z

Finished benchmarking commit (9bd418951e5c1d9a80f4d5ff524fde43a43149bf): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	2.1%	[0.3%, 4.1%]	138
Regressions ❌ (secondary)	1.8%	[0.5%, 4.0%]	74
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	2.1%	[0.3%, 4.1%]	138

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.5%	[0.5%, 0.5%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-2.6%	[-2.6%, -2.6%]	1
All ❌✅ (primary)	0.5%	[0.5%, 0.5%]	1

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	3.8%	[0.7%, 10.2%]	118
Regressions ❌ (secondary)	4.3%	[1.5%, 8.0%]	47
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	3.8%	[0.7%, 10.2%]	118

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 627.22s -> 627.249s (0.00%)
Artifact size: 316.03 MiB -> 316.50 MiB (0.15%)

saethlin · 2023-09-05T04:15:22Z

That's pretty slow but the regressions are almost exclusively confined to I have to wonder if one of my other PRs will obviate this. Ah well, let's see what happens if we optimize the leb128 path a bunch...
@bors try @rust-timer queue

bors · 2023-09-05T04:15:30Z

⌛ Trying commit f16b987d8162c60ac2f91c185451b1182cff1acf with merge a0f5ca19b76ebc68ddad77a862195f68e03eec62...

bors · 2023-09-05T04:18:32Z

💔 Test failed - checks-actions

saethlin · 2023-09-05T04:43:28Z

@bors try @rust-timer queue

bors · 2023-09-05T04:43:37Z

⌛ Trying commit f16b987d8162c60ac2f91c185451b1182cff1acf with merge 230afd4b68f6d1786901abded18b5dc9ea431540...

bors · 2023-09-05T05:55:18Z

☀️ Try build successful - checks-actions
Build commit: 230afd4b68f6d1786901abded18b5dc9ea431540 (230afd4b68f6d1786901abded18b5dc9ea431540)

compiler/rustc_serialize/src/opaque.rs

WaffleLapkin · 2023-09-11T13:43:14Z

compiler/rustc_serialize/src/opaque.rs

-    }
-}
-
-impl Drop for FileEncoder {


Why do we not need to make sure .flush() is called here anymore? How do we make sure either finish or flush are always called?

I'm not sure what the right thing to do here is. I can add back the Drop impl, but if we have encountered an error there is nothing useful to do with it. We can't report it because we don't have TyCtxt in this crate. We can't return it, and panicking with the error if there is one would probably just produce rare ICE reports.

I suppose we could make the Drop impl just an unreachable! that reminds the user to call finish?

For now I've put back the Drop impl. That seems the least controversial if nothing else?

compiler/rustc_serialize/src/opaque.rs

compiler/rustc_metadata/src/rmeta/table.rs

compiler/rustc_query_system/src/dep_graph/serialized.rs

compiler/rustc_serialize/src/opaque.rs

saethlin · 2023-09-11T22:20:38Z

@rustbot ready

WaffleLapkin

There are still a couple of "design questions":

However they might be tackled later (or more likely never 😅 )

r=me with or without the nit

compiler/rustc_serialize/src/opaque.rs

saethlin · 2023-09-20T20:59:31Z

@bors r=WaffleLapkin rollup=never

bors · 2023-09-20T20:59:32Z

📌 Commit 6cee6b0 has been approved by WaffleLapkin

It is now in the queue for this repository.

bors · 2023-09-20T21:47:57Z

⌛ Testing commit 6cee6b0 with merge 3223b0b...

bors · 2023-09-20T23:34:27Z

☀️ Test successful - checks-actions
Approved by: WaffleLapkin
Pushing 3223b0b to master...

bors · 2023-09-20T23:34:27Z

☀️ Test successful - checks-actions
Approved by: WaffleLapkin
Pushing 3223b0b to master...

rust-timer · 2023-09-21T02:09:51Z

Finished benchmarking commit (3223b0b): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.0%	[0.7%, 1.3%]	2
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.6%	[-1.2%, -0.2%]	95
Improvements ✅ (secondary)	-0.6%	[-1.2%, -0.3%]	28
All ❌✅ (primary)	-0.6%	[-1.2%, 1.3%]	97

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	2.4%	[1.1%, 3.7%]	2
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-	-	0

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.0%	[1.0%, 1.0%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	1.0%	[1.0%, 1.0%]	1

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.1%	[0.0%, 0.1%]	6
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.2%	[-0.5%, -0.1%]	10
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-0.1%	[-0.5%, 0.1%]	16

Bootstrap: 632.887s -> 633.664s (0.12%)
Artifact size: 317.80 MiB -> 318.05 MiB (0.08%)

rylev · 2023-09-26T18:43:35Z

@saethlin @WaffleLapkin looks like a small regression snuck back in. The regressions seem real, but perhaps it's not worth it given the much larger amount of improvements. Thoughts?

WaffleLapkin · 2023-09-26T20:51:52Z

@rustbot label: +perf-regression-triaged

lqd · 2023-09-26T20:57:22Z

You may be looking at the relative graph instead of the absolute one?

WaffleLapkin · 2023-09-26T21:18:26Z

@lqd yes I am 🤦🏻
@rustbot label: -perf-regression-triaged
(I still think it's justifiable, but since it's not a fluke I'm not so sure)

lqd · 2023-09-26T21:39:26Z

(yeah, agreed.)

saethlin · 2023-09-27T00:54:37Z

If #116188 doesn't recoup the regression, I'll mark this as triaged on the basis that the regression is very small and in a change that's overwhelmingly positive. As you suggest, it's not worth our time to look into a tiny regression in an overwhelmingly green report unless there's an easy fix.

saethlin · 2023-09-29T23:06:21Z

The cachegrind diff for the regressed benchmark indicates that the regression is somehow in LLVM. I'm not really sure how that's possible, but between this cachegrind diff and the linked PR I think I've done the amount of investigation that this merits and also come up with nothing that points towards fixing the lone regression.

Ir    
--------------------------------------------------------------------------------
2,016,129,215  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir           file:function
--------------------------------------------------------------------------------
183,084,207  ???:llvm::ConstraintSystem::eliminateUsingFM()
112,154,338  ???:(anonymous namespace)::LazyValueInfoImpl::solve() [clone 
102,793,767  ???:llvm::BasicAAResult::alias(llvm::MemoryLocation const&, llvm::MemoryLocation const&, llvm::AAQueryInfo&, llvm::Instruction const*)
 90,761,916  ???:llvm::AAResults::getModRefInfo(llvm::Instruction const*, std::optional<llvm::MemoryLocation> const&, llvm::AAQueryInfo&)
 64,597,572  ???:llvm::ObjectSizeOffsetVisitor::compute(llvm::Value*)
-62,036,937  ???:llvm::APInt::operator*(llvm::APInt const&) const
 61,801,373  <all-jemalloc-files>:<all-jemalloc-functions>
 60,275,661  ???:llvm::ConstantRange::multiply(llvm::ConstantRange const&) const
 54,151,328  ???:rustc_data_structures::graph::dominators::dominators::<rustc_middle::mir::basic_blocks::BasicBlocks>
-53,446,123  ???:<core::cell::once::OnceCell<rustc_data_structures::graph::dominators::Dominators<rustc_middle::mir::BasicBlock>>>::get_or_try_init::<<core::cell::once::OnceCell<rustc_data_structures::graph::dominators::Dominators<rustc_middle::mir::BasicBlock>>>::get_or_init<<rustc_middle::mir::basic_blocks::BasicBlocks>::dominators::{closure#0}>::{closure#0}, !>
 50,772,992  ???:llvm::DataLayout::getTypeSizeInBits(llvm::Type*) const
 42,317,006  ???:llvm::IDFCalculatorBase<llvm::BasicBlock, false>::calculate(llvm::SmallVectorImpl<llvm::BasicBlock*>&)
 41,230,820  ???:computeKnownBitsFromOperator(llvm::Operator const*, llvm::APInt const&, llvm::KnownBits&, unsigned int, llvm::SimplifyQuery const&) [clone 
 37,819,326  ???:combineInstructionsOverFunction(llvm::Function&, llvm::InstructionWorklist&, llvm::AAResults*, llvm::AssumptionCache&, llvm::TargetLibraryInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::OptimizationRemarkEmitter&, llvm::BlockFrequencyInfo*, llvm::ProfileSummaryInfo*, unsigned int, llvm::LoopInfo*) [clone  
 35,505,970  ???:llvm::BasicAAResult::aliasGEP(llvm::GEPOperator const*, llvm::LocationSize, llvm::Value const*, llvm::LocationSize, llvm::Value const*, llvm::Value const*, llvm::AAQueryInfo&)
-35,083,171  ???:getDefaultInlineAdvice(llvm::CallBase&, llvm::AnalysisManager<llvm::Function>&, llvm::InlineParams const&) [clone 
 34,493,533  ???:<rustc_span::span_encoding::Span as rustc_serialize::serialize::Decodable<rustc_metadata::rmeta::decoder::DecodeContext>>::decode
 32,163,222  ???:(anonymous namespace)::PromoteMem2Reg::run()
 32,137,790  ???:llvm::InstCombinerImpl::run()
 31,327,292  ???:llvm::Value::stripAndAccumulateConstantOffsets(llvm::DataLayout const&, llvm::APInt&, bool, bool, llvm::function_ref<bool (llvm::Value&, llvm::APInt&)>) const
 30,795,545  /usr/src/debug/glibc/glibc/string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:__memset_avx2_unaligned_erms
 30,224,758  ???:<rustc_middle::ty::Ty as rustc_type_ir::fold::TypeSuperFoldable<rustc_middle::ty::context::TyCtxt>>::try_super_fold_with::<rustc_middle::ty::generic_args::ArgFolder>
 30,057,337  ???:llvm::shouldInline(llvm::CallBase&, llvm::function_ref<llvm::InlineCost (llvm::CallBase&)>, llvm::OptimizationRemarkEmitter&, bool)
-29,731,698  ???:<rustc_middle::ty::generic_args::GenericArg as rustc_type_ir::fold::TypeFoldable<rustc_middle::ty::context::TyCtxt>>::try_fold_with::<rustc_middle::ty::generic_args::ArgFolder>

saethlin added the S-experimental Status: Ongoing experiment that does not require reviewing and won't be merged in its current state. label Sep 4, 2023

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Sep 4, 2023

saethlin removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 4, 2023