You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We currently have three closely-related symbol types.
Symbol is the fundamental type. A Symbol is an index. All operations work on that index. StableHash is not implemented for it, but there's no reason why it couldn't be. A Symbol can be a gensym, which gets special treatment -- it's a guaranteed unique index, even if its chars have been seen before.
InternedString is a thin wrapper around Symbol. You can convert a Symbol to an InternedString. It has two differences with Symbol.
Its PartialOrd/Ord/Hash impls use the chars, rather than the index.
Gensym-ness is ignored/irrelevant.
LocalInternedString is an alternative that contains a &str. You can convert both Symbol and InternedString to LocalInternedString. Its PartialOrd/Ord/Hash impls (plus PartialEq/Eq) naturally work on chars. Its main use is to provide a way to look some or all of the individual chars within a Symbol or InternedString, which is sometimes necessary.
I have always found the differences between these types confusing and hard to remember. Furthermore, the distinction between Symbol and InternedString is subtle and has caused bugs.
Also, gensyms in general make things a lot more complicated, and it would be great to eliminate them.
Here's what I would like as a final state.
Symbol exists.
InternedString does not exist.
LocalInternedString perhaps exists, but is only used temporarily when code needs access to the chars within a Symbol. Alternatively, Symbol could provide a with() method (like InternedString currently has) that provides access to the chars, and then LocalInternedString wouldn't be needed.
Symbol's impl of Hash uses the index, and its impl of StableHash uses the chars.
Not sure about Symbol's impl of PartialOrd/Ord. If a stable ordering is really needed (perhaps for error messages?) we could introduce a StableOrd trait and use that in the relevant places, or do a custom sort, or something.
Gensyms don't really exist. They are simulated: when you call gensym(), it just appends a unique suffix. It's worth noting that gensyms are always identifiers, and so the unique suffix can use a non-identifier char. And Interner could keep a counter. So "foo" would gensym to something lke "foo$1", "foo$2", etc. Once the suffix is added, they would just be treated as normal symbols (in terms of hashing, etc.) I would hope that identifier gensyms would never be compared with non-identifier symbols, so a false positive equality match should be impossible. (Different types for identifier symbols and non-identifier symbols would protect against that, but might cause other difficulties.) Alternatively, syntax_pos::symbol::Symbol::gensym() is incompatible with stable hashing. #49300 talks about other ways of dealing with gensyms.
All this should also help performance, because we'd end up with more operations on indexes, and only the necessary ones on chars (which require TLS lookups).
I haven't even touched on the way lifetimes work in the interner, which are subtle and error-prone. But having fewer types would only make improvements on that front simpler.
We currently have three closely-related symbol types.
Symbolis the fundamental type. ASymbolis an index. All operations work on that index.StableHashis not implemented for it, but there's no reason why it couldn't be. ASymbolcan be a gensym, which gets special treatment -- it's a guaranteed unique index, even if its chars have been seen before.InternedStringis a thin wrapper aroundSymbol. You can convert aSymbolto anInternedString. It has two differences withSymbol.PartialOrd/Ord/Hashimpls use the chars, rather than the index.LocalInternedStringis an alternative that contains a&str. You can convert bothSymbolandInternedStringtoLocalInternedString. ItsPartialOrd/Ord/Hashimpls (plusPartialEq/Eq) naturally work on chars. Its main use is to provide a way to look some or all of the individual chars within aSymbolorInternedString, which is sometimes necessary.I have always found the differences between these types confusing and hard to remember. Furthermore, the distinction between
SymbolandInternedStringis subtle and has causedbugs.
Also, gensyms in general make things a lot more complicated, and it would be great to eliminate them.
Here's what I would like as a final state.
Symbolexists.InternedStringdoes not exist.LocalInternedStringperhaps exists, but is only used temporarily when code needs access to the chars within aSymbol. Alternatively,Symbolcould provide awith()method (likeInternedStringcurrently has) that provides access to the chars, and thenLocalInternedStringwouldn't be needed.Symbol's impl ofHashuses the index, and its impl ofStableHashuses the chars.Symbol's impl ofPartialOrd/Ord. If a stable ordering is really needed (perhaps for error messages?) we could introduce aStableOrdtrait and use that in the relevant places, or do a custom sort, or something.gensym(), it just appends a unique suffix. It's worth noting that gensyms are always identifiers, and so the unique suffix can use a non-identifier char. AndInternercould keep a counter. So "foo" would gensym to something lke "foo$1", "foo$2", etc. Once the suffix is added, they would just be treated as normal symbols (in terms of hashing, etc.) I would hope that identifier gensyms would never be compared with non-identifier symbols, so a false positive equality match should be impossible. (Different types for identifier symbols and non-identifier symbols would protect against that, but might cause other difficulties.) Alternatively, syntax_pos::symbol::Symbol::gensym() is incompatible with stable hashing. #49300 talks about other ways of dealing with gensyms.I haven't even touched on the way lifetimes work in the interner, which are subtle and error-prone. But having fewer types would only make improvements on that front simpler.
Thoughts?
CC @petrochenkov @Zoxc @eddyb @Mark-Simulacrum @michaelwoerister