Skip to content

Simplify/Optimize FileEncoder#115542

Merged
bors merged 2 commits intorust-lang:masterfrom
saethlin:fileencoder-is-bufwriter
Sep 20, 2023
Merged

Simplify/Optimize FileEncoder#115542
bors merged 2 commits intorust-lang:masterfrom
saethlin:fileencoder-is-bufwriter

Conversation

@saethlin
Copy link
Copy Markdown
Member

@saethlin saethlin commented Sep 4, 2023

FileEncoder is basically a BufWriter except that it exposes access to the not-written-to-yet region of the buffer so that some users can write directly to the buffer. This strategy is awesome because it lets us avoid calling memcpy for small copies, but the previous strategy was based on the writer accessing a &mut [MaybeUninit<u8>; N] and returning a &[u8] which is an API which currently mandates the use of unsafe code, making that interface in general not that appealing.

So this PR cleans up the FileEncoder implementation and builds on that general idea of direct buffer access in order to prevent memcpy calls in a few key places when encoding the dep graph and rmeta tables. The interface used here is now 100% safe, but with the caveat that internally we need to avoid trusting the number of bytes that the provided function claims to have written.

The original primary objective of this PR was to clean up the FileEncoder implementation so that the fix for the following issues would be easy to implement. The fix for these issues is to correctly update self.buffered even when writes fail, which I think it's easy to verify manually is now done, because all the FileEncoder methods are small.

Fixes #115298
Fixes #114671
Fixes #114045
Fixes #108100
Fixes #106787

@saethlin saethlin added the S-experimental Status: Ongoing experiment that does not require reviewing and won't be merged in its current state. label Sep 4, 2023
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Sep 4, 2023
@saethlin saethlin removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 4, 2023
@saethlin
Copy link
Copy Markdown
Member Author

saethlin commented Sep 4, 2023

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 4, 2023
@bors
Copy link
Copy Markdown
Collaborator

bors commented Sep 4, 2023

⌛ Trying commit a3671fd3993f906b9a9879b1465a918b9d80cac3 with merge 9bd418951e5c1d9a80f4d5ff524fde43a43149bf...

@bors
Copy link
Copy Markdown
Collaborator

bors commented Sep 4, 2023

☀️ Try build successful - checks-actions
Build commit: 9bd418951e5c1d9a80f4d5ff524fde43a43149bf (9bd418951e5c1d9a80f4d5ff524fde43a43149bf)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Copy Markdown
Collaborator

Finished benchmarking commit (9bd418951e5c1d9a80f4d5ff524fde43a43149bf): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.1% [0.3%, 4.1%] 138
Regressions ❌
(secondary)
1.8% [0.5%, 4.0%] 74
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 2.1% [0.3%, 4.1%] 138

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.5% [0.5%, 0.5%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-2.6% [-2.6%, -2.6%] 1
All ❌✅ (primary) 0.5% [0.5%, 0.5%] 1

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
3.8% [0.7%, 10.2%] 118
Regressions ❌
(secondary)
4.3% [1.5%, 8.0%] 47
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 3.8% [0.7%, 10.2%] 118

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 627.22s -> 627.249s (0.00%)
Artifact size: 316.03 MiB -> 316.50 MiB (0.15%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Sep 4, 2023
@rust-log-analyzer

This comment has been minimized.

@saethlin
Copy link
Copy Markdown
Member Author

saethlin commented Sep 5, 2023

That's pretty slow but the regressions are almost exclusively confined to I have to wonder if one of my other PRs will obviate this. Ah well, let's see what happens if we optimize the leb128 path a bunch...
@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@bors
Copy link
Copy Markdown
Collaborator

bors commented Sep 5, 2023

⌛ Trying commit f16b987d8162c60ac2f91c185451b1182cff1acf with merge a0f5ca19b76ebc68ddad77a862195f68e03eec62...

@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Copy Markdown
Collaborator

bors commented Sep 5, 2023

💔 Test failed - checks-actions

@bors bors added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Sep 5, 2023
@saethlin
Copy link
Copy Markdown
Member Author

saethlin commented Sep 5, 2023

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 5, 2023
@bors
Copy link
Copy Markdown
Collaborator

bors commented Sep 5, 2023

⌛ Trying commit f16b987d8162c60ac2f91c185451b1182cff1acf with merge 230afd4b68f6d1786901abded18b5dc9ea431540...

@saethlin saethlin changed the title Use BufWriter to implement FileEncoder Simplify FileEncoder? Sep 5, 2023
@bors
Copy link
Copy Markdown
Collaborator

bors commented Sep 5, 2023

☀️ Try build successful - checks-actions
Build commit: 230afd4b68f6d1786901abded18b5dc9ea431540 (230afd4b68f6d1786901abded18b5dc9ea431540)

@rust-timer

This comment has been minimized.

}
}

impl Drop for FileEncoder {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we not need to make sure .flush() is called here anymore? How do we make sure either finish or flush are always called?

Copy link
Copy Markdown
Member Author

@saethlin saethlin Sep 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the right thing to do here is. I can add back the Drop impl, but if we have encountered an error there is nothing useful to do with it. We can't report it because we don't have TyCtxt in this crate. We can't return it, and panicking with the error if there is one would probably just produce rare ICE reports.

I suppose we could make the Drop impl just an unreachable! that reminds the user to call finish?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I've put back the Drop impl. That seems the least controversial if nothing else?

@WaffleLapkin WaffleLapkin added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 11, 2023
@saethlin
Copy link
Copy Markdown
Member Author

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 11, 2023
Copy link
Copy Markdown
Member

@WaffleLapkin WaffleLapkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are still a couple of "design questions":

However they might be tackled later (or more likely never 😅 )

r=me with or without the nit

@WaffleLapkin WaffleLapkin added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 19, 2023
@saethlin saethlin force-pushed the fileencoder-is-bufwriter branch from 04246d8 to 6cee6b0 Compare September 20, 2023 20:58
@saethlin saethlin removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Sep 20, 2023
@saethlin
Copy link
Copy Markdown
Member Author

@bors r=WaffleLapkin rollup=never

@bors
Copy link
Copy Markdown
Collaborator

bors commented Sep 20, 2023

📌 Commit 6cee6b0 has been approved by WaffleLapkin

It is now in the queue for this repository.

@bors
Copy link
Copy Markdown
Collaborator

bors commented Sep 20, 2023

⌛ Testing commit 6cee6b0 with merge 3223b0b...

@bors
Copy link
Copy Markdown
Collaborator

bors commented Sep 20, 2023

☀️ Test successful - checks-actions
Approved by: WaffleLapkin
Pushing 3223b0b to master...

1 similar comment
@bors
Copy link
Copy Markdown
Collaborator

bors commented Sep 20, 2023

☀️ Test successful - checks-actions
Approved by: WaffleLapkin
Pushing 3223b0b to master...

@rust-timer
Copy link
Copy Markdown
Collaborator

Finished benchmarking commit (3223b0b): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.0% [0.7%, 1.3%] 2
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.6% [-1.2%, -0.2%] 95
Improvements ✅
(secondary)
-0.6% [-1.2%, -0.3%] 28
All ❌✅ (primary) -0.6% [-1.2%, 1.3%] 97

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
2.4% [1.1%, 3.7%] 2
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.0% [1.0%, 1.0%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 1.0% [1.0%, 1.0%] 1

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.1% [0.0%, 0.1%] 6
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.2% [-0.5%, -0.1%] 10
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -0.1% [-0.5%, 0.1%] 16

Bootstrap: 632.887s -> 633.664s (0.12%)
Artifact size: 317.80 MiB -> 318.05 MiB (0.08%)

@rylev
Copy link
Copy Markdown
Member

rylev commented Sep 26, 2023

@saethlin @WaffleLapkin looks like a small regression snuck back in. The regressions seem real, but perhaps it's not worth it given the much larger amount of improvements. Thoughts?

@WaffleLapkin
Copy link
Copy Markdown
Member

@rylev the two regressions are image-0.24.1 | opt | full | 1.27% and image-0.24.1 | opt | incr-full | 0.74%. If I'm reading it right, both of them spiked and then returned to normal, so this looks like a fluke.

@rustbot label: +perf-regression-triaged

@lqd
Copy link
Copy Markdown
Member

lqd commented Sep 26, 2023

You may be looking at the relative graph instead of the absolute one?

image

@WaffleLapkin
Copy link
Copy Markdown
Member

WaffleLapkin commented Sep 26, 2023

@lqd yes I am 🤦🏻
@rustbot label: -perf-regression-triaged
(I still think it's justifiable, but since it's not a fluke I'm not so sure)

@lqd
Copy link
Copy Markdown
Member

lqd commented Sep 26, 2023

(yeah, agreed.)

@saethlin
Copy link
Copy Markdown
Member Author

If #116188 doesn't recoup the regression, I'll mark this as triaged on the basis that the regression is very small and in a change that's overwhelmingly positive. As you suggest, it's not worth our time to look into a tiny regression in an overwhelmingly green report unless there's an easy fix.

@saethlin
Copy link
Copy Markdown
Member Author

The cachegrind diff for the regressed benchmark indicates that the regression is somehow in LLVM. I'm not really sure how that's possible, but between this cachegrind diff and the linked PR I think I've done the amount of investigation that this merits and also come up with nothing that points towards fixing the lone regression.

Ir    
--------------------------------------------------------------------------------
2,016,129,215  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir           file:function
--------------------------------------------------------------------------------
183,084,207  ???:llvm::ConstraintSystem::eliminateUsingFM()
112,154,338  ???:(anonymous namespace)::LazyValueInfoImpl::solve() [clone 
102,793,767  ???:llvm::BasicAAResult::alias(llvm::MemoryLocation const&, llvm::MemoryLocation const&, llvm::AAQueryInfo&, llvm::Instruction const*)
 90,761,916  ???:llvm::AAResults::getModRefInfo(llvm::Instruction const*, std::optional<llvm::MemoryLocation> const&, llvm::AAQueryInfo&)
 64,597,572  ???:llvm::ObjectSizeOffsetVisitor::compute(llvm::Value*)
-62,036,937  ???:llvm::APInt::operator*(llvm::APInt const&) const
 61,801,373  <all-jemalloc-files>:<all-jemalloc-functions>
 60,275,661  ???:llvm::ConstantRange::multiply(llvm::ConstantRange const&) const
 54,151,328  ???:rustc_data_structures::graph::dominators::dominators::<rustc_middle::mir::basic_blocks::BasicBlocks>
-53,446,123  ???:<core::cell::once::OnceCell<rustc_data_structures::graph::dominators::Dominators<rustc_middle::mir::BasicBlock>>>::get_or_try_init::<<core::cell::once::OnceCell<rustc_data_structures::graph::dominators::Dominators<rustc_middle::mir::BasicBlock>>>::get_or_init<<rustc_middle::mir::basic_blocks::BasicBlocks>::dominators::{closure#0}>::{closure#0}, !>
 50,772,992  ???:llvm::DataLayout::getTypeSizeInBits(llvm::Type*) const
 42,317,006  ???:llvm::IDFCalculatorBase<llvm::BasicBlock, false>::calculate(llvm::SmallVectorImpl<llvm::BasicBlock*>&)
 41,230,820  ???:computeKnownBitsFromOperator(llvm::Operator const*, llvm::APInt const&, llvm::KnownBits&, unsigned int, llvm::SimplifyQuery const&) [clone 
 37,819,326  ???:combineInstructionsOverFunction(llvm::Function&, llvm::InstructionWorklist&, llvm::AAResults*, llvm::AssumptionCache&, llvm::TargetLibraryInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::OptimizationRemarkEmitter&, llvm::BlockFrequencyInfo*, llvm::ProfileSummaryInfo*, unsigned int, llvm::LoopInfo*) [clone  
 35,505,970  ???:llvm::BasicAAResult::aliasGEP(llvm::GEPOperator const*, llvm::LocationSize, llvm::Value const*, llvm::LocationSize, llvm::Value const*, llvm::Value const*, llvm::AAQueryInfo&)
-35,083,171  ???:getDefaultInlineAdvice(llvm::CallBase&, llvm::AnalysisManager<llvm::Function>&, llvm::InlineParams const&) [clone 
 34,493,533  ???:<rustc_span::span_encoding::Span as rustc_serialize::serialize::Decodable<rustc_metadata::rmeta::decoder::DecodeContext>>::decode
 32,163,222  ???:(anonymous namespace)::PromoteMem2Reg::run()
 32,137,790  ???:llvm::InstCombinerImpl::run()
 31,327,292  ???:llvm::Value::stripAndAccumulateConstantOffsets(llvm::DataLayout const&, llvm::APInt&, bool, bool, llvm::function_ref<bool (llvm::Value&, llvm::APInt&)>) const
 30,795,545  /usr/src/debug/glibc/glibc/string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:__memset_avx2_unaligned_erms
 30,224,758  ???:<rustc_middle::ty::Ty as rustc_type_ir::fold::TypeSuperFoldable<rustc_middle::ty::context::TyCtxt>>::try_super_fold_with::<rustc_middle::ty::generic_args::ArgFolder>
 30,057,337  ???:llvm::shouldInline(llvm::CallBase&, llvm::function_ref<llvm::InlineCost (llvm::CallBase&)>, llvm::OptimizationRemarkEmitter&, bool)
-29,731,698  ???:<rustc_middle::ty::generic_args::GenericArg as rustc_type_ir::fold::TypeFoldable<rustc_middle::ty::context::TyCtxt>>::try_fold_with::<rustc_middle::ty::generic_args::ArgFolder>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Projects

None yet

8 participants