Do not deduplicate captured args while expanding format_args! #149926
ShoyuVanilla wants to merge 1 commit into rust-lang:main
Conversation
|
Some changes occurred in src/tools/clippy (cc @rust-lang/clippy). Some changes occurred in compiler/rustc_ast_lowering/src/format.rs (cc @m-ou-se). |
|
r? @spastorino. rustbot has assigned @spastorino. |
|
@rustbot author |
|
@rustbot ready |
|
Nominating as per #145739 (comment) |
|
It'd be worth adding a test for the drop behavior. |
|
Given that this makes more sense for the language, along with the clean crater results and the intuition that it'd be surprising if anything actually leaned on this, I propose: @rfcbot fcp merge lang |
|
Team member @traviscross has proposed to merge this. The next step is review by the rest of the tagged team members: No concerns currently listed. Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! cc @rust-lang/lang-advisors: FCP proposed for lang, please feel free to register concerns. |
|
I don't think we should do this. It will make the generated code for

I don't want to end up in a situation where it would make sense for Clippy to suggest something like:

Adding @rust-rfcbot concern equivalence |
|
Thanks Dianne. Let me see if I can dash off a comment explaining where I stand here. The TL;DR is that my proposal would be:
This does cost some consistency, but I think in areas where consistency isn't necessarily expected. For example, it's already the case that The alternative gains in consistency but loses some optimization and backwards compatibility. It's not clear to me how much the optimization matters, but I think that it may, particularly since we use format-args all over the place in code. I know that people often find the machinery is heavyweight in embedded land.

I don't think the vast majority of users will care whichever way we decide here, but there will be some that are surprised by const-drops running or not running, and some who are surprised that their structs are bigger than they should be. I tend to think the former will only matter to Rust supermavens, and they can learn the way that the desugaring works; it's straightforward enough. The latter is an invisible tax across Rust codebases that may impact every user.

Expanded version: I think there's definite tension between several good Rust design principles:

- Efficient by default -- the idiomatic, obvious Rust code should generate efficient things, ~the same as what you would get if you did it by hand, or perhaps more efficient. To that end, if you desugared
- No need for a 'sufficiently smart compiler' -- we should not be leaning on super whiz-bang optimizations to get that efficiency by default, just the "obvious" ones that compilers typically do, such as inlining, copy prop, CSE. The kind of thing you would do by hand automatically. (We kinda cheat on this one, sometimes, leaning on fancy alias analysis; I think that the work on minirust etc. may let us out of that trap.)
- Stability without stagnation -- we should try to avoid changing behavior without a strong reason.
- Compiler and stdlib aren't special -- we try to expose primitives users could build themselves (or at least have a plan that they can eventually do so...).
- Context-free programming -- this is a tricky one, but obviously we aim to reduce the context needed for people to understand what some code will do when it executes. Probably need to either expand or refine this to be more specific. This one is somewhat aspirational but:
- Define through desugaring -- there should be a convenient syntax and an explicit syntax; the convenient one should desugar to the explicit one in a straightforward way. That is then used to resolve non-obvious edge cases around the convenient syntax.

Looking forward, I think we want
Reading over the proposed optimization, it seems to violate "compiler isn't special", in that I don't think we would ever expect to expose that kind of test to a user-defined macro. That's not the end of the world, but it seems unfortunate. Doing no optimization violates efficient by default -- this may not matter, it's only a small thing, on its own I might say "whatever" but it also changes behavior. My inclination is to try and preserve wins when we can. I think my proposal wins on stability and perf by default; I think it is neutral towards "define through desugaring", it's a bit more of a complex desugaring, but not wildly so, and it increases consistency with some other things (e.g., The optimization loses big on stdlib isn't special and I think that's kinda worse. |
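To make the two candidate lowerings concrete, here's a hedged sketch. This is not the actual `format_args!` expansion (which uses unstable internals); it only models the observable shape of the two choices using stable positional arguments: string-level deduplication behaves like capturing `x` once and referencing it from two placeholders, while not deduplicating behaves like capturing it once per use.

```rust
fn main() {
    let x = 42;

    // Roughly what string-level deduplication produces: one captured
    // argument slot, referenced by both placeholders.
    let deduped = format!("{0} {0}", x);

    // Roughly what "no deduplication" produces: one captured argument
    // slot per use of `x` in the format string.
    let not_deduped = format!("{0} {1}", x, x);

    // The rendered output is identical either way; the difference is in
    // how many argument slots (and thus how much static data) exist.
    assert_eq!(deduped, not_deduped);
    assert_eq!(deduped, "42 42");
}
```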
|
FWIW, Niko's "compiler and stdlib aren't special" argument has mostly won me over. If However, this is in tension with the desire to support |
|
I've given this some thought, I've also talked to @traviscross and @joshtriplett. I'm finding that I have a hard time convincing myself one way or the other on this! I suspect that BOTH of these are, to a first approximation, true:
If you could convince me that one of those was not true -- that somebody would notice -- that'd push me one way or the other more firmly. But I'd need some data. I'm going to try and see how often repeated variables occur in practice. Assuming my assumptions are valid -- that neither is all that big a deal -- then you have two competing, but largely abstract, principles
I think both are important. I go back and forth on which I think is more important. When it comes to the "fancy compiler optimization", yeah, it kind of lets you have both, but I find it overengineered for the problem, and it expands our "scope" of what it takes to achieve efficiency. If efficiency matters that much, I might rather do it the simple way of saying "we deduplicate at the string level". It is, in a way, less surprising to me. |
FWIW, I sympathize with that. I have also gone through phases of preferring either approach to this.^^ Though IMO the compiler optimizations actually provide a nice way out of this, I find them an elegant solution (have our semantic cake and eat the perf benefits, too) -- except that @m-ou-se doesn't like them, which gives me pause. |
|
So I wrote a little script (gist) to find all string literals, count the number with ANY interpolation variables (well, braces anyhow) and then count the number with repeats. I ran it across the rust repo and got 2.5%, though that number includes tests. I'd like to run it across crates.io. Do with that what you will. |
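The gist itself isn't reproduced in the thread, but the repeat-counting heuristic can be sketched roughly like this (an approximation, not the actual script; it checks only whether a single format string contains the same `{...}` brace group more than once, and omits the part that extracts string literals from source files):

```rust
use std::collections::HashMap;

/// Rough heuristic: does this format string contain any `{...}` placeholder
/// (brace group) that appears more than once?
fn has_repeated_placeholder(fmt: &str) -> bool {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    let mut rest = fmt;
    while let Some(start) = rest.find('{') {
        let after = &rest[start + 1..];
        match after.find('}') {
            Some(end) => {
                let name = &after[..end];
                // Skip `{{` escapes and empty (purely positional) placeholders.
                if !name.is_empty() && !name.starts_with('{') {
                    *counts.entry(name).or_insert(0) += 1;
                }
                rest = &after[end + 1..];
            }
            None => break,
        }
    }
    counts.values().any(|&n| n > 1)
}

fn main() {
    assert!(has_repeated_placeholder("{x} and {x}"));
    assert!(!has_repeated_placeholder("{x} and {y}"));
    assert!(!has_repeated_placeholder("no braces at all"));
}
```

As the comment says, this is brace-level counting only ("well, braces anyhow"): it makes no attempt to distinguish real interpolations from other brace contents.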
|
I ran the script across the top 222 crates from crates.io. I found that about 5% of strings have repeated variables...
...that's actually more than I expected! It pushes me to think I am right to hold this concern. |
|
One more piece of data: I found exactly ZERO instances of repeated "capital" identifiers. e.g., |
|
If we assume that 5% is common enough that we DO want to avoid an extra field for local variables, then I think it's reasonable to assume we also want to avoid an extra field for fields. I don't see why Deref impls can, technically, have side-effects. So while This further pushes me to the conclusion that the most appealing options are |
I think a key variable may be how much you think it matters whether |
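The point about `Deref` impls having side effects is observable: each use of a place expression like `*v` as a format argument takes its own reference, running `deref` once per use, so syntactically deduplicating identical place expressions would change how often those side effects run. A small sketch, using an invented `Loud` wrapper that counts its derefs:

```rust
use std::cell::Cell;
use std::ops::Deref;

// Hypothetical wrapper that counts how often it is dereferenced.
struct Loud<T> {
    inner: T,
    derefs: Cell<usize>,
}

impl<T> Deref for Loud<T> {
    type Target = T;
    fn deref(&self) -> &T {
        self.derefs.set(self.derefs.get() + 1); // side effect on every deref
        &self.inner
    }
}

fn main() {
    let v = Loud { inner: 7, derefs: Cell::new(0) };
    // Today, each `*v` argument takes its own reference, so `deref`
    // runs once per use.
    let s = format!("{} {}", *v, *v);
    assert_eq!(s, "7 7");
    assert_eq!(v.derefs.get(), 2);
    // Deduplicating syntactically identical place expressions would run
    // `deref` only once here, an observable semantic difference.
}
```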
I do. Because although I would very likely write `format!("{x} {x}")`, then also if I ever stumble across this code: `format!("{self.field} {self.field}")`, I would very likely, and spontaneously, deduplicate it myself into the following to improve readability: `format!("{x} {x}", x = self.field)`. Which I think is, if I'm not alone, an argument in favour of syntactic deduplication. |
|
@nikomatsakis: I don't know. Presumably we still wouldn't want to deduplicate value expressions. I just struggle with the idea that Let's also dig into the axiom that the compiler and stdlib aren't special. In a world where functions return places, the lowering would need type information to deduplicate only place expressions. That seems more complicated to me than @dianne's AST → HIR lowering optimization. But if the desugaring doesn't deduplicate place expressions, then this isn't a problem. |
|
We already can't really syntactically distinguish place expressions from value expressions -- we need name resolution at least, which arguably is a semantic analysis (user-defined macros cannot do it). |
I'm working on a somewhat more precise analysis. I've instrumented rustc to capture details about format strings. I'll be running this through crater. It's built on top of @dianne's branch (in #152480) so that we can see the effect of that optimization and how much it matters whether we do it at all. So far, I've run this on a stage 2 build of the compiler and standard library (including all dependencies). In this sample, of format strings that use interpolation at all, only 0.5% have any duplicates. Of these, almost all are places, and @dianne's optimization recovers 96.1% of the size cost (i.e., all but 48 bytes total). The cost of not doing deduplication at all (i.e., not doing dianne's optimization) on the See below for the full results. |
|
Based on an instrumented top-10k crater run (in #154205, which includes 10,885 crates), here's what I found. Excluding 22 outlier crates, except where mentioned:

Only 1.8% of crates and 0.1% of

The median per-crate cost of not deduplicating at all is zero bytes. Considering only the 194 affected crates, the median cost is 32 bytes (mean: 46 bytes). The total cost across all these crates, summed together, is under 9KB. Including the 22 outliers, the total cost (summed across all 216 affected crates) is under 28KB. With dianne's optimization, only 27 non-outlier and 37 total crates are affected, with a total cost (summed across all affected crates) of under 1.4KB and 7.4KB, respectively.

The outlier crate most affected by turning off deduplication is hddsgen at 7KB. Every single duplicate for this crate, though, is a place, so dianne's optimization would drop this cost to zero. This crate was first published 27 days ago. The next-largest outlier crate is pikpaktui, first published 6 weeks ago, at 2.8KB. This one doesn't benefit at all from the optimization. I don't mean any judgment by saying this, but these two and many earlier outlier crates I looked at seem heavily AI-generated. There's something about the way the models write format strings (and the number of them they write), at least in these outlier cases, that seems different to me than what humans do.

Anyway, the full report is below. For my part, I judge the practical cost of not deduplicating as ε-zero. |
For awareness there is a crater issue regarding top-{n} not actually testing the top-n crates: |
|
*checks* Yup, it doesn't even compile majorly-used crates like |
|
OK. I'll schedule a full crater run then. |
|
Looking at the data from an instrumented full crater run, the numbers (now have better confidence intervals and) aren't meaningfully different than what was found running it against an arbitrary set of 10k crates. Most importantly:
I.e., the effect of not deduplicating at all remains ε-zero. Full report |
|
The vast majority of the projects where this is important are small embedded projects where binary size is important. You won't find many of those on crates.io. There are plenty of library crates for embedded on crates.io, but those often don't use much string formatting themselves; that's something that usually happens mostly in the actual binary projects, which are often not open source. There are many changes we could make to Rust that don't meaningfully impact most crates, but that doesn't mean we should purposely cause regressions. I will dread the day that I have to explain to someone that adding a 'meaningless'
The representation of Once we no longer use wide pointers for the arguments and no longer store the fmt function pointer in the args array, the impact will be bigger. This gets much more significant if we add the |
|
@m-ou-se Do you have any ways we could measure that impact? Maybe we can bring in voices from the embedded working group? I don't know who the right people to cc there are, but it'd be cool if people could run their own measurements and report in. @traviscross: is it easy to identify those values as a percentage of total format args as well? And as a percentage of total binary size? |
|
I want to see if people agree with the way I'm thinking about this. I don't like the idea of approving this contingent on the optimization. I think the optimization is too complex to be in Rust's reference; if the impact is that significant, I'd like it to be something that we guarantee. This isn't to say I don't like the optimization, I think it's "ok", I just don't like it being a key part of the logic we stand on. I'd rather we evaluate the "de-duplicated via syntactic criteria" or "not de-duplicated", which are kind of the two extremes. The argument in favor of "don't deduplicate" is basically "this complexity isn't worth it, the impact in real life is negligible, and it cleans up the semantics corner cases". The argument in favor of "deduplicate" is basically "Sure, it's clean in one perspective, but it's not what people will actually want most of the time, we should do what they want, particularly since the semantics are very unlikely to be surprising in practice." I find both of these persuasive -- I don't think the impact on semantics is all that meaningful, but I also think it's easier to explain the simpler desugaring. I don't think the impact on binary sizes is really all that significant, but I also think it's more measurable than you might think, and I'd be curious if indeed it has outsized impact on a particular population. <-- This is why I'm asking @m-ou-se about how to loop in a few more embedded devs. (There is also the aspect of preserving existing behavior, but this doesn't strike me as clear cut, it's not a regression in the usual sense of the term, so I'm not especially persuaded by that.) |
|
To follow up on something I half said in the meeting, the other half of this is: I'd like to hear your take, @traviscross (perhaps in another forum, perhaps here, or maybe via a link), on how this impacts @joshtriplett's RFC. What is the alternate syntax you would want? To put it in terms of process, I'd kind of like the lang-team as a whole to give vibes both on the question at hand here and on the question of "suppose we decided to keep things as they are here and Josh changed the RFC, what would be the concern then?". Because I think we are premising somewhat on "we should do this to unblock that RFC" and I'm not entirely convinced that RFC ought to be considered blocked. |
If we were to not clean up the opsem on this such that interpolated expressions under the current syntax follow our expected language semantics for expressions, then I would propose, when extending the interpolation syntax to support more expressions, that we pick a distinct syntax for it (and, for that new syntax, follow our opsem for expressions). (The further we go into supporting arbitrary expressions, the more any divergence from our generally-intended opsem for expressions would matter in practice.) |
At the moment, I don't have any particular proposals in mind for the specific other syntax — just that it be distinct. |
|
Given that deduplication is a tool the current internal representation offers (and it sounds like what @m-ou-se is saying implies that it might get even more beneficial to use deduplication in the future?) I feel it's suboptimal if our "best"/"nicest" (and newest) syntax for creating So with the benefit of the newer syntax, arguably people should usually be encouraged to re-write their code from something like

```rust
let bar = 123;
format!("{} {} {0}", long_complicated_expression(), bar);
```

or like

```rust
let bar = 123;
format!("{foo} {} {foo}", bar, foo = long_complicated_expression());
```

over to the "new" style of something like:

```rust
let foo = &long_complicated_expression();
let bar = 123;
format!("{foo} {bar} {foo}");
```

If that new form loses deduplication, that might be a problem for code that has not yet been re-written in this way. I wonder @traviscross if you could - for a bigger picture (and admittedly still limited to only the current benefits available from the deduplicated case of internal representation) - somehow manage to adjust your crater test to see what effects "disabling all deduplication" may have (i.e. including in cases where currently " |
I would not agree with that. Introducing new let-bindings just for a format string does not strike me as something we should encourage. Personally I would recommend writing this as

```rust
format!("{foo} {bar} {foo}", foo = long_complicated_expression());
```
|
|
Some of the discussion above frames this as 'consistency vs efficiency' or 'field expressions vs binary size', but I think none of that framing is correct. We can just do both. rust-lang/rfcs#3626 proposes this works:

```rust
println!("{z.field1} {z.field2} {z.field3}", z = SomeStruct::new());
```

In the future, I'd like that fmt::Arguments object to only store a single pointer to that z object. Given that, it feels perfectly consistent to me to have these two be equivalent:

```rust
println!("{z.field1} {z.field2} {z.field3}", z = z);
println!("{z.field1} {z.field2} {z.field3}");
```

So if we want rust-lang/rfcs#3626, it is entirely consistent to not make the change proposed in this PR. Which is also the thing we need for optimal efficiency.
|
|
> I'd like that fmt::Arguments object to only store a single pointer to that z object
How is that supposed to work, especially when the fields are behind deref coercions?
|
Expressed per invocation (across 21,663,853 unique invocations of
|
One option would be for the static part of fmt::Arguments to store fn pointers to wrapped Whether that's a good idea or not, I don't know yet. There are plenty of arguments for and against that approach. There might be other options. The point is that the framing of "we need this change for rust-lang/rfcs#3626" is just wrong. We don't know yet what we need for that. That's a discussion that still needs to happen. Given that that RFC proposes that This PR is about fixing #145739, about a highly unusual "internal mutability in const in format" situation (that Clippy already warns about), which is quite possibly not a bug at all, and isn't worth regressing any performance over. But the discussion has diverged and is now about how we would need this change for rust-lang/rfcs#3626 or what would be most consistent in a potential future version of Rust that includes that new feature. But we haven't even established how that RFC would work and whether this change is needed for that feature or not. What's the expression in English? Cart before the horse? 🛒🐴 I think we should:
Right now it feels like we're trying to get this PR through based on a gut feeling that it might help with an RFC that we haven't worked out in enough detail yet. |
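The "fn pointers to wrapped" idea above is left unfinished in the thread; one hedged reading of it is that the static side of `fmt::Arguments` could hold per-placeholder functions that project a field out of a single shared base pointer, rather than storing one wide pointer per argument. A user-space sketch of that shape (the names `Project`, `fmt_x`, and `fmt_y` are invented for illustration, not anything in std):

```rust
use std::fmt;

struct Point { x: i32, y: i32 }

// One function per placeholder: given the single shared base object,
// project out the right field and format it. These could live in statics.
type FmtFn<T> = fn(&T, &mut fmt::Formatter<'_>) -> fmt::Result;

fn fmt_x(p: &Point, f: &mut fmt::Formatter<'_>) -> fmt::Result {
    fmt::Display::fmt(&p.x, f)
}
fn fmt_y(p: &Point, f: &mut fmt::Formatter<'_>) -> fmt::Result {
    fmt::Display::fmt(&p.y, f)
}

// Adapter pairing the shared base pointer with a projection function.
struct Project<'a, T>(&'a T, FmtFn<T>);

impl<T> fmt::Display for Project<'_, T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        (self.1)(self.0, f)
    }
}

fn main() {
    let p = Point { x: 1, y: 2 };
    // Only `&p` varies at runtime; three placeholders reuse one pointer.
    let s = format!("{} {} {}",
        Project(&p, fmt_x), Project(&p, fmt_y), Project(&p, fmt_x));
    assert_eq!(s, "1 2 1");
}
```

Whether the real design would look anything like this is exactly the open question the comment raises; the sketch only shows that per-field formatting through a single shared pointer is expressible.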
Resolves #145739
I ran crater with #149291.
While there are still a few seemingly flaky, spurious results, no crates appear to be affected by this breaking change.
The only hit from the lint was
https://github.com/multiversx/mx-sdk-rs/blob/813927c03a7b512a3c6ef9a15690eaf87872cc5c/framework/meta-lib/src/tools/rustc_version_warning.rs#L19-L30,
which performs formatting on consts of type `::semver::Version`. These constants contain a nested `::semver::Identifier` (`Version.pre.identifier`) that has a custom destructor. However, this case is not impacted by the change, so no breakage is expected.