Do not deduplicate captured args while expanding format_args!#149926
Do not deduplicate captured args while expanding format_args!#149926ShoyuVanilla wants to merge 1 commit intorust-lang:mainfrom
format_args!#149926Conversation
|
Some changes occurred in src/tools/clippy cc @rust-lang/clippy Some changes occurred in compiler/rustc_ast_lowering/src/format.rs cc @m-ou-se |
|
r? @spastorino rustbot has assigned @spastorino. Use |
This comment has been minimized.
This comment has been minimized.
|
@rustbot author |
fb523ee to
3ebdaa4
Compare
|
@rustbot ready |
|
Nominating as per #145739 (comment) |
|
It'd be worth adding a test for the drop behavior. |
3ebdaa4 to
af89685
Compare
|
Given that this makes more sense for the language, along with the clean crater results and the intuition that it'd be surprising if anything actually leaned on this, I propose: @rfcbot fcp merge lang |
|
Team member @traviscross has proposed to merge this. The next step is review by the rest of the tagged team members: No concerns currently listed. Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! cc @rust-lang/lang-advisors: FCP proposed for lang, please feel free to register concerns. |
|
I don't think we should do this. It will make the generated code for I don't want to end up in a situation where it would make sense for Clippy to suggest something like: Adding @rust-rfcbot concern equivalence |
|
So I wrote a little script (gist) to find all string literals, count the number with ANY interpolation variables (well, braces anyhow) and then count the number with repeats. I ran it across the rust repo and got 2.5%, though that number includes tests. I'd like to run it across crates.io. Do with that what you will. |
|
I ran the script across the top 222 crates from crates.io. I found that about 5% of strings have repeated variables...
...that's actually more than I expected! It pushes me to think I am right to hold this concern. |
|
One more piece of data: I found exactly ZERO instances of repeated "capital" identifiers. e.g., |
|
If we assume that 5% is common enough that we DO want avoid an extra field for local variables, then I think it's reasonable to assume we also want to avoid an extra field for fields. I don't see why Deref impls can, technically, have side-effects. So while This further pushes me to the conclusion that the most appealing options are
I think a key variable may be how much you think it matters whether |
I do. Because although I would very likely write format!("{x} {x}")then also if I ever stumble accross this code: format!("{self.field} {self.field}")I would very likely, and spontaneously, deduplicate it myself into the following to improve readability: format!("{x} {x}", x=self.field)Which I think is, if I'm not alone, an argument in favour of syntactic deduplication. |
|
@nikomatsakis: I don't know. Presumably we still wouldn't want to deduplicate value expressions. I just struggle with the idea that Let's also dig into the axiom that the compiler and stdlib aren't special. In a world where functions return places, the lowering would need type information to deduplicate only place expressions. That seems more complicated to me than @dianne's AST → HIR lowering optimization. But if the desugaring doesn't deduplicate place expressions, then this isn't a problem. |
|
We already can't really syntactically distinguish place expressions from value expressions -- we need name resolution at least, which arguably is a semantic analysis (user-defined macros cannot do it). |
I'm working on a somewhat more precise analysis. I've instrumented rustc to capture details about format strings. I'll be running this through crater. It's built on top of @dianne's branch (in #152480) so that we can see the effect of that optimization and how much it matters whether we do it at all. So far, I've run this on a stage 2 build of the compiler and standard library (including all dependencies). In this sample, of format strings that use interpolation at all, only 0.5% have any duplicates. Of these, almost all are places, and @dianne's optimization recovers 96.1% of the size cost (i.e., all but 48 bytes total). The cost of not doing deduplication at all (i.e., not doing dianne's optimization) on the See below for the full results. |
|
Based on an instrumented top-10k crater run (in #154205, which includes 10,885 crates), here's what I found. Excluding 22 outlier crates, except where mentioned: Only 1.8% of crates and 0.1% of The median per-crate cost of not deduplicating at all is zero bytes. Considering only the 194 affected crates, the median cost is 32 bytes (mean: 46 bytes). The total cost across all these crates, summed together, is under 9KB. Including the 22 outliers, the total cost (summed across all 216 affected crates) is under 28KB. With dianne's optimization, only 27 non-outlier and 37 total crates are affected, with a total cost (summed across all affected crates) of under 1.4KB and 7.4KB, respectively. The outlier crate most affected by turning off deduplication is hddsgen at 7KB. Every single duplicate for this crate, though, is a place, so dianne's optimization would drop this cost to zero. This crate was first published 27 days ago. The next-largest outlier crate is pikpaktui, first published 6 weeks ago, at 2.8KB. This one doesn't benefit at all from the optimization. I don't mean any judgment by saying this, but these two and many earlier outlier crates I looked at seem heavily AI-generated. There's something about the way the models write format strings (and the number of them they write), at least in these outlier cases, that seems different to me than what humans do. Anyway, the full report is below. For my part, I judge the practical cost of not deduplicating as ε-zero. |
For awareness there is a crater issue regarding top-{n} not actually testing the top-n crates: |
|
checks Yup, it doesn't even compile majorly-used crates like |
|
OK. I'll schedule a full crater run then. |
|
Looking at the data from an instrumented full crater run, the numbers (now have better confidence intervals and) aren't meaningfully different than what was found running it against an arbitrary set of 10k crates. Most importantly:
I.e., the effect of not deduplicating at all remains ε-zero. Full report |
|
The vast majority of the projects where this is important, are small embedded projects where binary size is important. You won't find much of those on crates.io. There are plenty of library crates for embedded on crates.io, but those often don't use much string formatting themselves; that's something that usually happens mostly in the actual binary projects which are often not open source. There are many changes we could make to Rust that don't meaningfully impact most crates, but that doesn't mean we should purposely cause regressions. I will dread the day that I have to explain to someone that adding a 'meaningless'
The representation of Once we no longer use wide pointers for the arguments, no longer storing the fmt function pointer in the args array, the impact will be bigger. This gets much more significant if we add the |
|
@m-ou-se Do you have any ways we could measure that impact? Maybe we can bring in voices from the embedded working group? I don't know who's the right people to cc there, but it'd be cool if people could run their own measurements and report in. @traviscross is it easy to identify those values as a percentage of total format args, as well? and as a percentage of total binary size? |
|
I want to see if people agree with the way I'm thinking about this. I don't like the idea of approving this contingent on the optimization. I think the optimization is too complex to be in Rust's reference; if the impact is that significant, I'd like it to be something that we guarantee. This isn't to say I don't like the optimization, I think it's "ok", I just don't like it being a key part of the logic we stand on. I'd rather we evaluate the "de-duplicated via syntactic criteria" or "not de-duplicated", which are kind of the two extremes. The argument in favor of "don't deduplicate" is basically "this complexity isn't worth it, the impact in real life is negligible, and it cleans up the semantics corner cases". The argument in favor of "deduplicate" is basically "Sure, it's clean in one perspective, but it's not what people will actually want most of the time, we should do what they want, particularly since the semantics are very unlikely to be surprising in practice." I find both of these peruasive-- I don't think the impact on semantics is all that meaningful, but I also think it's easier to explain the simpler desugaring. I don't think the impact on binary sizes is really all that significant, but I also think it's more measurable than you might think, and I'd be curious if indeed it has outsized impact on a particular population. <-- This is why i'm asking @m-ou-se about how to loop in a few more embedded devs. (There is also the aspect of preserving existing behavior, but this doesn't strike me as clear cut, it's not a regression in the usual sense of the term, so I'm not especially persuaded by that.) |
|
To follow up on something I half said in the meeting, the other half of this is: I'd like to hear @traviscross, perhaps in another forum or perhaps here, or maybe a link to it, your take on how this impacts @joshtriplett's RFC. What is the alternate syntax you would want? To put it in terms of process, I'd kind of like the lang-team as a whole to give vibes both on the question at hand here and on the question of "suppose we decided to keep things as they are here and josh changed the RFC, what would be the concern then". Because I think we are premising somewhat on "we should do this to unblock that RFC" and I'm not entirely convinced that RFC ought to be considered blocked. |
If we were to not clean up the opsem on this such that interpolated expressions under the current syntax follow our expected language semantics for expressions, then I would propose, when extending the interpolation syntax to support more expressions, that we pick a distinct syntax for it (and, for that new syntax, follow our opsem for expressions). (The further we go into supporting arbitrary expressions, the more any divergence from our generally-intended opsem for expressions would matter in practice.) |
At the moment, I don't have any particular proposals in mind for the specific other syntax — just that it be distinct. |
|
Given that deduplication is a tool the current internal representation offers (and it sounds like what @m-ou-se is saying implies that it might get even more beneficial to use deduplication in the future?) I feel it’s suboptimal if our “best”/“nicest” (and newest) syntax for creating So with the benefit of the newer syntax, arguably people should usually be encouraged to re-write their code from something like let bar = 123;
format!("{} {} {0}", long_complicated_expression(), bar);or like let bar = 123;
format!("{foo} {} {foo}", bar, foo=long_complicated_expression(), bar);over to the "new" style of something like: let foo = &long_complicated_expression();
let bar = 123;
format!("{foo} {bar} {foo}");If that new form loses deduplication, that might be a problem for code that has not yet been re-written in this way. I wonder @traviscross if you could - for a bigger picture (and admittedly still limited to only the current benefits available from the deduplicated case of internal representation) - somehow manage to adjust your crater test to see what effects "disabling all deduplication" may have (i.e. including in cases where currently " |
I would not agree with that. Introducing new let-bindings just for a format string does not strike me as something we should encourage. Personally I would recommend writing this as format!("{foo} {bar} {foo}", foo=long_complicated_expression()); |
|
Some of the discussion above frames this as 'consistency vs efficiency' or 'field expressions vs binary size', but I think none of that framing is correct. We can just do both. rust-lang/rfcs#3626 proposes this works: println!("{z.field1} {z.field2} {z.field3}", z = SomeStruct::new());In the future, I'd like that fmt::Arguments object to only store a single pointer to that Given that, it feels perfectly consistent to me to have these two be equivalent: println!("{z.field1} {z.field2} {z.field3}", z=z);
println!("{z.field1} {z.field2} {z.field3}");So if we want rust-lang/rfcs#3626, it is entirely consistent to not make the change proposed in this PR. Which is also the thing we need for optimal efficiency. Footnotes
|
|
I'd like that fmt::Arguments object to only store a single pointer to that z object
How is that supposed to work, especially when the fields are behind deref coercions?
|
Expressed per invocation (across 21,663,853 unique invocations of
|
One option would be for the static part of fmt::Arguments to store fn pointers to wrapped Whether that's a good idea or not, I don't know yet. There are plenty of arguments for and against that approach. There might be other options. The point is that the framing of "we need this change for rust-lang/rfcs#3626" is just wrong. We don't know yet what we need for that. That's a discussion that still needs to happen. Given that that RFC proposes that This PR is about fixing #145739, about a highly unusual "internal mutability in const in format" situation (that Clippy already warns about), which is quite possibly not a bug at all, and isn't worth regressing any performance over. But the discussion has diverged and is now about how we would need this change for rust-lang/rfcs#3626 or what would be most consistent in a potential future version of Rust that includes that new feature. But we haven't even established how that RFC would work and whether this change is needed for that feature or not. What's the expression in English? Cart before the horse? 🛒🐴I think we should:
Right now it feels like we're trying to get this PR through based on a gut feeling that it might help with an RFC that we haven't worked out in enough detail yet. |
This is not easy, but we can upper-bound the values. The size of the Again, that's a very not-tight upper bound. How much of the total binary do |
|
I had a conversation with @m-ou-se and I've been meaning to write-up the outcome here. Dang, how the time flies by. (See also her summary) The TL;DR is this: I think the right next step is to review this in the context of Josh's RFC, so to speak, or really the overall trajectory of formatting, and probably in a design meeting. As such, I'm not planning to resolve my concern. Mara identified for me that her goal is that This appears to me to be a classic Rust trade-off -- can we have our cake (really nice format) and eat it too (works in embedded and "count every byte" contexts)? God I hope so! I'd like to at least give that a serious look. I do think a legit question to ask is "where does the cost from |
To clarify what changed for me, previously I was thinking of formatting as primarily targeting "regular code", in which case I think the optimization is pretty small. But if we're thinking of it as targeting "no-std" or "embedded" code, I think the calculus is different. And I like the idea of targeting that code. |
The "have our cake and eat it too" answer is probably (Of course, as it pertains to formatting, we'd then need either a syntactic marker that the format string contained only |
View all comments
Resolves #145739
I ran crater with #149291.
While there are still a few seemingly flaky, spurious results, no crates appear to be affected by this breaking change.
The only hit from the lint was
https://github.com/multiversx/mx-sdk-rs/blob/813927c03a7b512a3c6ef9a15690eaf87872cc5c/framework/meta-lib/src/tools/rustc_version_warning.rs#L19-L30,
which performs formatting on consts of type
::semver::Version. These constants contain a nested::semver::Identifier(Version.pre.identifier) that has a custom destructor. However, this case is not impacted by the change, so no breakage is expected.