[WIP] Token-based outer attributes handling#76130
Aaron1011 wants to merge 6 commits into rust-lang:master from
Conversation
|
(rust_highfive has picked a reviewer for you, use r? to override) |
In PR rust-lang#76130, I add a fourth field, which makes using a tuple variant somewhat unwieldy.
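The refactor being described can be sketched as follows. This is a hypothetical, heavily simplified model (the field names and stand-in types are invented for illustration; the real `MacCallStmt` lives in `rustc_ast`): once a variant carries several fields, a named struct extends more gracefully than a tuple variant.

```rust
#[derive(Debug)]
enum StmtKind {
    // Before: a tuple variant, unwieldy once a fourth field is added, e.g.
    // MacCall(MacCall, MacStmtStyle, AttrVec),
    // After: the fields live in a dedicated struct.
    MacCall(MacCallStmt),
}

#[derive(Debug)]
struct MacCallStmt {
    mac: String,        // stand-in for the real `MacCall` payload
    style: u8,          // stand-in for `MacStmtStyle`
    attrs: Vec<String>, // stand-in for `AttrVec`
    // A fourth field (e.g. captured tokens) now slots in naturally:
    tokens: Option<Vec<String>>,
}
```

Adding the fourth field only touches the struct definition and the places that construct it, rather than every tuple pattern match.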
…nkov Factor out StmtKind::MacCall fields into `MacCallStmt` struct
|
@bors try @rust-timer queue |
|
Awaiting bors try build completion |
|
⌛ Trying commit 083792bcf67d1e50afc9ad50500ab011d34b0bee with merge 30ddab741f8c02392c4383445436138f5a6ec53d... |
|
☀️ Try build successful - checks-actions, checks-azure |
|
Queued 30ddab741f8c02392c4383445436138f5a6ec53d with parent 80cacd7, future comparison URL. |
|
Finished benchmarking try commit (30ddab741f8c02392c4383445436138f5a6ec53d): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below. Importantly, though, if the results of this run are non-neutral, do not roll this PR up -- it will mask other regressions or improvements in the rollup. @bors rollup=never |
|
@bors try @rust-timer queue |
|
Awaiting bors try build completion |
|
⌛ Trying commit 23fd9fbd9d07096ca254592013a42b4cae12ba29 with merge b3535fba76a3fc7202284b7699afff8f8831e461... |
|
☀️ Try build successful - checks-actions, checks-azure |
|
Queued b3535fba76a3fc7202284b7699afff8f8831e461 with parent 42d896a, future comparison URL. |
|
Finished benchmarking try commit (b3535fba76a3fc7202284b7699afff8f8831e461): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below. Importantly, though, if the results of this run are non-neutral, do not roll this PR up -- it will mask other regressions or improvements in the rollup. @bors rollup=never |
|
@Aaron1011 |
Fixes rust-lang#74616. Makes progress towards rust-lang#43081. Unblocks PR rust-lang#76130.

When pretty-printing an AST node, we may insert additional parentheses to ensure that precedence is properly preserved in the code we output. However, the proc macro implementation relies on comparing a pretty-printed AST node to the captured `TokenStream`, and inserting extra parentheses changes the structure of the reparsed `TokenStream`, making the comparison fail.

This PR refactors the AST pretty-printing code to allow skipping the insertion of additional parentheses. Several freestanding methods are moved to trait methods on `PrintState`, which keeps track of an internal `insert_extra_parens` flag. This flag is normally `true`, but we expose a public method which allows pretty-printing a nonterminal with `insert_extra_parens = false`. To avoid changing the public interface of `rustc_ast_pretty`, the freestanding `_to_string` methods are changed to delegate to a newly-created `State`. The main pretty-printing code is moved to a new `state` module to ensure that it does not accidentally call any of these public helper functions (the internal functions with the same name should be used instead).
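The flag-controlled behavior can be illustrated with a toy printer. This is a minimal sketch, not rustc's actual `PrintState` API: the `Printer` struct, `Expr` enum, and precedence numbers are all invented, but the core idea is the same, namely that parentheses are only inserted when the flag allows it.

```rust
// Toy expression tree for demonstration purposes only.
enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

struct Printer {
    // Normally true; set to false when the printed output must reparse to
    // exactly the token stream that was originally captured.
    insert_extra_parens: bool,
}

impl Printer {
    // Print `e`, parenthesizing it only when its precedence is lower than
    // `min_prec` *and* extra parens are allowed.
    fn print(&self, e: &Expr, min_prec: u8) -> String {
        let (s, prec) = match e {
            Expr::Num(n) => (n.to_string(), 3),
            Expr::Add(a, b) => {
                (format!("{} + {}", self.print(a, 1), self.print(b, 2)), 1)
            }
            Expr::Mul(a, b) => {
                (format!("{} * {}", self.print(a, 2), self.print(b, 3)), 2)
            }
        };
        if self.insert_extra_parens && prec < min_prec {
            format!("({})", s)
        } else {
            s
        }
    }
}
```

With the flag on, `(1 + 2) * 3` keeps its disambiguating parens; with it off, the same tree prints as `1 + 2 * 3`, matching the captured tokens when that is what the user originally wrote.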
…ochenkov Refactor AST pretty-printing to allow skipping insertion of extra parens (Fixes rust-lang#75734)
|
Blocked on #77250 |
Instead of trying to collect tokens at each depth, we 'flatten' the stream as we go along, pushing open/close delimiters to our buffer just like regular tokens. Once capturing is complete, we reconstruct a nested `TokenTree::Delimited` structure, producing a normal `TokenStream`.

The reconstructed `TokenStream` is not created immediately - instead, it is produced on demand by a closure (wrapped in a new `LazyTokenStream` type). This closure stores a clone of the original `TokenCursor`, plus a record of the number of calls to `next()`/`next_desugared()`. This is sufficient to reconstruct the token stream seen by the callback without storing any additional state. If the token stream is never used (e.g. when a captured `macro_rules!` argument is never passed to a proc macro), we never actually create a `TokenStream`.

This implementation has a number of advantages over the previous one:

* It is significantly simpler, with no edge cases around capturing the start/end of a delimited group.
* It can be easily extended to allow replacing tokens at an arbitrary 'depth' by just using `Vec::splice` at the proper position. This is important for PR rust-lang#76130, which requires us to track information about attributes along with tokens.
* The lazy approach to `TokenStream` construction allows us to easily parse an AST struct, and then decide after the fact whether we need a `TokenStream`. This will be useful when we start collecting tokens for `Attribute` - we can discard the `LazyTokenStream` if the parsed attribute doesn't need tokens (e.g. is a builtin attribute).

The performance impact seems to be negligible (see rust-lang#77250 (comment)). There is a small slowdown on a few benchmarks, but it only rises above 1% for incremental builds, where it represents a larger fraction of the much smaller instruction count. There is a ~1% speedup on a few other incremental benchmarks - my guess is that the speedups and slowdowns will usually cancel out in practice.
…on, r=petrochenkov Rewrite `collect_tokens` implementations to use a flattened buffer
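The flatten-then-repackage step described above can be sketched as follows. This is a simplified illustration, not rustc's actual code: `FlatToken`, `Tree`, and `into_nested` are invented names, and real tokens carry spans and delimiter kinds, but the stack-based lossless reconstruction works the same way.

```rust
/// Nested token tree, standing in for `TokenTree`/`TokenTree::Delimited`.
#[derive(Debug, Clone, PartialEq)]
enum Tree {
    Token(String),
    Delimited(Vec<Tree>),
}

/// Flat buffer entry: delimiters are pushed just like regular tokens.
#[derive(Debug, Clone)]
enum FlatToken {
    Token(String),
    OpenDelim,
    CloseDelim,
}

/// Losslessly re-package a flat buffer into the nested structure.
fn into_nested(flat: &[FlatToken]) -> Vec<Tree> {
    // Stack of partially-built groups; the bottom entry is the top level.
    let mut stack: Vec<Vec<Tree>> = vec![Vec::new()];
    for tok in flat {
        match tok {
            FlatToken::Token(s) => {
                stack.last_mut().unwrap().push(Tree::Token(s.clone()))
            }
            // An open delimiter starts a fresh group...
            FlatToken::OpenDelim => stack.push(Vec::new()),
            // ...and the matching close pops it and nests it in its parent.
            FlatToken::CloseDelim => {
                let group = stack.pop().unwrap();
                stack.last_mut().unwrap().push(Tree::Delimited(group));
            }
        }
    }
    assert_eq!(stack.len(), 1, "unbalanced delimiters");
    stack.pop().unwrap()
}
```

Because the buffer is flat, a later pass can replace a token range at any nesting depth with a single `Vec::splice`, which is exactly the property the attribute-tracking work relies on.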
|
Triage: A reminder that #77250 has been merged, so it seems maybe this can continue... |
|
This is currently blocked on further work on statement attributes |
|
Closing in favor of #80689 |
Makes progress towards #43081

We now capture tokens for attributes, and remove them from the token stream when applying `#[cfg]` attributes (in addition to modifying the AST). As a result, derive proc-macros now receive the exact input (instead of the pretty-printed/retokenized input), even when fields/variants get `#[cfg]`-stripped.

Several changes are made in order to accomplish this:

* `PreexpTokenStream` and `PreexpTokenTree` are added. These are identical to `TokenStream` and `TokenTree`, with the exception of an `OuterAttributes` variant in `PreexpTokenTree`. This is used to represent captured attributes, allowing the target tokens to be removed by `#[cfg]`-stripping (and when invoking attribute macros).
* Tokens are now captured for each `Attribute`. This allows us to remove `prepend_attributes`, which required pretty-printing/retokenizing attributes in certain cases.
* `collect_tokens` was rewritten. The implementation is now much simpler - instead of keeping track of nested `TokenTree::Delimited` at various depths, we collect all tokens into a flat buffer (e.g. we push `TokenKind::OpenDelim` and `TokenKind::CloseDelim`). After capturing, we losslessly re-package these tokens back into the normal `TokenTree::Delimited` structure.
* We now store a `PreexpTokenStream` on AST structs instead of a plain `TokenStream`. This contains both the attributes and the target itself.
* `parse_outer_attributes` now passes the parsed attributes to a closure, instead of simply returning them. The closure is responsible for parsing the attribute target, and allows us to automatically construct the proper `PreexpTokenStream`.
* `HasAttrs` now has a `visit_tokens` method. This is used during `#[cfg]`-stripping to allow us to remove attribute targets from the `PreexpTokenStream`.

This PR is quite large (though ~1000 lines are tests). I opened it mainly to show how all of the individual pieces fit together, since some of them (e.g. the `parse_outer_attributes` change) don't make sense if we're not going to land this PR. However, many pieces are mergeable on their own - I plan to split them into their own PRs.

TODO:

* Diagnostics for `#[cfg]`/`#[cfg_attr]` are duplicated. This is because we run the same code for both the AST-based attributes and the token-based attributes. We could possibly skip validating the token-based attributes, but I was worried about accidentally skipping gating if the pretty-print/reparse check ever fails. These are deduplicated by the diagnostic infrastructure outside of compiletest, but it would be nice to fix this.
* Inner attributes are stored in the `PreexpTokenStream` alongside outer attributes, which allows us to handle `#![cfg(FALSE)]` and remove its parent. This should be good enough to handle all stable uses of inner attributes that are observable by proc-macros (currently, just cfg-stripping before `#[derive]` macros are invoked). Custom inner attributes are not handled.
* `Parser::parse_stmt_without_recovery` does not eat a trailing semicolon, which means we will not capture it in the token stream. I need to either refactor statement parsing to eat the semicolon inside `parse_outer_attributes_with_tokens`, or manually adjust the captured tokens.
* Token collection currently happens even when it is not needed (e.g. when there is no `cfg`, `cfg_attr`, `derive`, or any non-builtin attribute). There are many scenarios where we can skip token collection for some/all AST structs (no `derive` on an item, no `cfg` inside an item, no custom attributes on an item, etc.)
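The `OuterAttributes` idea above can be modeled in miniature. This is a hypothetical, heavily simplified sketch (the enum name mirrors the PR, but the fields, string tokens, and `strip_cfg_false` helper are invented): keeping the attribute tokens and the target tokens together lets `#[cfg]`-stripping drop a whole target from the token stream without any re-pretty-printing.

```rust
/// Simplified stand-in for `PreexpTokenTree`: like a token tree, plus a
/// variant that pairs captured attributes with their target's tokens.
#[derive(Debug, Clone, PartialEq)]
enum PreexpTokenTree {
    Token(String),
    OuterAttributes {
        attrs: Vec<String>,          // tokens of the attributes themselves
        target: Vec<PreexpTokenTree>, // tokens of the annotated item/field
    },
}

/// Drop every target annotated with `#[cfg(FALSE)]`, flattening the
/// surviving targets back into plain tokens.
fn strip_cfg_false(stream: Vec<PreexpTokenTree>) -> Vec<String> {
    let mut out = Vec::new();
    for tree in stream {
        match tree {
            PreexpTokenTree::Token(t) => out.push(t),
            PreexpTokenTree::OuterAttributes { attrs, target } => {
                if attrs.iter().any(|a| a == "#[cfg(FALSE)]") {
                    // Config'd-out target: its tokens vanish entirely,
                    // so a derive macro never sees them.
                    continue;
                }
                out.extend(strip_cfg_false(target));
            }
        }
    }
    out
}
```

In this model, a struct with one `#[cfg(FALSE)]` field and one kept field yields exactly the kept field's tokens, which is the "exact input, even when `#[cfg]`-stripped" behavior the PR description promises for derive macros.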