Skip to content

wit-parser: Remove duplicate spans from UnresolvedPackage#2441

Merged
alexcrichton merged 3 commits intobytecodealliance:mainfrom
wilfreddenton:wit-parser-dedupe-spans
Feb 7, 2026
Merged

wit-parser: Remove duplicate spans from UnresolvedPackage#2441
alexcrichton merged 3 commits intobytecodealliance:mainfrom
wilfreddenton:wit-parser-dedupe-spans

Conversation

@wilfreddenton
Copy link
Copy Markdown
Contributor

#2418

This PR removes redundant parallel span vectors from UnresolvedPackage by storing spans directly on the items they describe. This follows up on #2419 which added span fields to Interface, World, TypeDef, and Function.

I recognize that this is a relatively large change so if there is a desire to split it up into separate pieces of work I am amenable to that.

Removed from UnresolvedPackage

  • interface_spans: Vec<InterfaceSpan> - The InterfaceSpan stored a span for itself and spans for the functions contained within it. These are now stored in Interface.span and Function.span respectively.

  • world_spans: Vec<WorldSpan> - The WorldSpan struct stored a span for itself and for imports, exports, and includes. World spans now come from World.span. Import/export spans are now stored directly on WorldItem variants. Include spans are stored in the new WorldInclude struct.

  • type_spans: Vec<Span> - Type definition spans now come from TypeDef.span.

The following span storage remains in UnresolvedPackage and could potentially be addressed in follow-up work:

  • package_name_span: Span - Could be moved to a new span field on PackageName
  • unknown_type_spans: Vec<Span> - For unresolved type references during parsing
  • foreign_dep_spans: Vec<Span> - For foreign dependency references
  • required_resource_types: Vec<(TypeId, Span)> - For resource type requirements

These were left as-is to keep this PR focused on the main deduplication work.

Structural changes

  • <Item>.span - All item span properties were changed from Option<Span> to Span (using Span::Unknown for the empty case) primarily to avoid a bunch of expect/unwraps in AST code which is guaranteed to have range spans. Those hypothetical unwraps are now relegated to panics in the start/end getters which are guarded by is_known calls in a few specific cases. If Option<Span> is the preferred representation then I can change this back. Besides reverting, another option could be to just return 0; however, this could potentially mask logical errors.

  • WorldItem enum - Added span: Span field to the Interface and Type variants. Changed Type(TypeId) to Type { id: TypeId, span: Span }.

  • WorldInclude struct - New struct that consolidates stability, id, names, and span into a single type. Replaces World.includes: Vec<(Stability, WorldId)> and World.include_names: Vec<Vec<IncludeName>>.

  • SourceMap::highlight_span - New method that takes a Span directly and returns Option<String>, handling the is_known() check internally. Cleaning up error highlighting was motivated by the change from Option<Span> to Span because I wanted to centralize and reduce the number of calls to is_known.

Changes in other crates

Pattern matches on WorldItem needed updating in:

  • wit-component - encoding, validation, printing, metadata
  • wit-dylib - world item iteration
  • wit-encoder - conversion from parser types

These changes are mechanical - updating WorldItem::Type(id) to WorldItem::Type { id, .. } and adding .. to WorldItem::Interface patterns.

Additionally some test fixtures needed to be regenerated to pick up the new structure of the types.

encapsulate world item span adjustment

simplify highlighting

update other crates to build/test correctly
@wilfreddenton wilfreddenton requested a review from a team as a code owner February 6, 2026 23:13
@wilfreddenton wilfreddenton requested review from alexcrichton and removed request for a team February 6, 2026 23:13
Copy link
Copy Markdown
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me, thank you again for your work on this it's super appreciated! Two minor comments below, but otherwise happy to merge this 👍

Comment on lines +194 to +198
"type": 4
"type": {
"id": 4
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know how difficult it would be to configure serde to preserve the old encoding? The JSON output is known to be unstable-ish but if we can avoid changing it with a bit of configuration that's also worth it I think. If it's too difficult to preserve, though (e.g. requiring complicated custom impls) then this is fine to keep

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It required a custom variant serializer but it's just a simple wrapper on the existing serialize_id that just skips the span parameter. So I think this is ok.

Comment thread crates/wit-parser/src/ast/lex.rs Outdated
Comment on lines 25 to 36
pub enum Span {
/// No source location available (e.g., decoded from binary).
#[default]
Unknown,
/// A range in the source text.
Range {
/// The start of the range.
start: u32,
/// The end of the range (exclusive).
end: u32,
},
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the interest of keeping this structure small-ish (8-bytes instead of this current representation of 12-bytes), could this retain the Span { u32, u32 } field representation? My thinking is then:

  • The start and end fields would become private, retaining the function-based accessors.
  • Span::UNKNOWN would be Span { start: u32::MAX, end: u32::MAX } which is a sentinel value
  • is_known would test for start/end being not equal u32::MAX
  • There'd be a Span::new(a, b) constructor which would assert a/b are not both equal to u32::MAX and then would return a "real span"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I like this a lot better. Let me know if you want any changes to the assertions made in the impl Span.

@alexcrichton
Copy link
Copy Markdown
Member

Looks great, thanks!

@alexcrichton alexcrichton added this pull request to the merge queue Feb 7, 2026
Merged via the queue into bytecodealliance:main with commit 39cd181 Feb 7, 2026
35 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants