Is your feature request related to a problem or challenge? Please describe what you are trying to do.
This is part of the larger project to implement StringViewArray -- see #5374
In #5508, @RinChanNOWWW tracked adding casting to/from StringArray 🙏 ❤️
This ticket tracks adding additional data type support for StringViewArray and BinaryViewArray in the cast kernel: https://docs.rs/arrow/latest/arrow/compute/kernels/cast/index.html
Many systems (e.g. InfluxDB 3.0, Apache DataFusion Comet, and I think Coralogix) use DictionaryArrays. Thus, supporting casting to/from DictionaryArray will be important to permit easy integration into downstream consumers.
Describe the solution you'd like
Specifically the following conversions should be supported in the cast kernels:
- StringViewArray <--> DictionaryArray<IndexType, Utf8>
- StringViewArray <--> DictionaryArray<IndexType, LargeUtf8>
And similarly for Binary:
- BinaryViewArray <--> DictionaryArray<IndexType, Binary>
- BinaryViewArray <--> DictionaryArray<IndexType, LargeBinary>
Notes:
- Good test coverage is the most important part of this ticket
- I recommend smaller PRs if possible
- I think DictionaryArray<IndexType, LargeUtf8> --> StringViewArray can be implemented without copying strings
- I think StringViewArray --> DictionaryArray<IndexType, LargeUtf8> will likely require copying the strings
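The two notes above can be illustrated with a small, stdlib-only Rust sketch. It does not use the arrow-rs API: `DictColumn`, `ViewColumn`, `make_view`, `view_value`, `dict_to_view` and `view_to_dict` are hypothetical, simplified stand-ins (nulls ignored, LargeUtf8-style i64 offsets, keys assumed valid). The views follow the 16-byte layout from the Arrow columnar format: a 4-byte length, then either up to 12 inline bytes or a 4-byte prefix, buffer index and offset. Going from dictionary to view, the dictionary's values buffer can be moved in as a data buffer so long strings are referenced rather than copied, as long as every offset fits in the view's 4-byte offset field. Going the other way, the distinct values have to be copied into a new contiguous values buffer.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for DictionaryArray<Int32, LargeUtf8>: `keys[i]` selects
/// value `k`, whose bytes are `values[offsets[k]..offsets[k + 1]]`. Nulls are ignored.
struct DictColumn {
    keys: Vec<i32>,
    offsets: Vec<i64>,
    values: Vec<u8>,
}

/// Hypothetical stand-in for StringViewArray: one 16-byte view per row plus the
/// data buffers that long (> 12 byte) values point into.
struct ViewColumn {
    views: Vec<u128>,
    buffers: Vec<Vec<u8>>,
}

fn read_u32(b: &[u8]) -> u32 {
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

/// Encode one view: 4-byte length, then either the value inline (<= 12 bytes,
/// zero padded) or a 4-byte prefix, 4-byte buffer index and 4-byte offset.
fn make_view(bytes: &[u8], buffer_index: u32, offset: u32) -> u128 {
    let mut v = [0u8; 16];
    v[0..4].copy_from_slice(&(bytes.len() as u32).to_le_bytes());
    if bytes.len() <= 12 {
        v[4..4 + bytes.len()].copy_from_slice(bytes);
    } else {
        v[4..8].copy_from_slice(&bytes[0..4]);
        v[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        v[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    u128::from_le_bytes(v)
}

/// Decode row `i` of a view column into an owned byte vector.
fn view_value(col: &ViewColumn, i: usize) -> Vec<u8> {
    let v = col.views[i].to_le_bytes();
    let len = read_u32(&v[0..4]) as usize;
    if len <= 12 {
        v[4..4 + len].to_vec()
    } else {
        let buf = read_u32(&v[8..12]) as usize;
        let off = read_u32(&v[12..16]) as usize;
        col.buffers[buf][off..off + len].to_vec()
    }
}

/// Dictionary -> view without copying long strings: the dictionary's values
/// buffer is moved (not cloned) into data buffer 0 and each view points into it
/// (only the 4-byte prefix of a long value is copied into its view).
/// Returns None if a value's start offset does not fit in the view's u32 field.
fn dict_to_view(dict: DictColumn) -> Option<ViewColumn> {
    let mut views: Vec<u128> = Vec::with_capacity(dict.keys.len());
    for &k in &dict.keys {
        let (start, end) = (dict.offsets[k as usize], dict.offsets[k as usize + 1]);
        if start < 0 || start > u32::MAX as i64 {
            return None;
        }
        views.push(make_view(&dict.values[start as usize..end as usize], 0, start as u32));
    }
    Some(ViewColumn { views, buffers: vec![dict.values] })
}

/// View -> dictionary: each distinct value is copied once into a new
/// contiguous values buffer, and every row gets a key into it.
fn view_to_dict(col: &ViewColumn) -> DictColumn {
    let mut seen: HashMap<Vec<u8>, i32> = HashMap::new();
    let mut keys: Vec<i32> = Vec::new();
    let mut offsets: Vec<i64> = vec![0];
    let mut values: Vec<u8> = Vec::new();
    for i in 0..col.views.len() {
        let bytes = view_value(col, i);
        let key = match seen.get(&bytes) {
            Some(&k) => k,
            None => {
                let k = (offsets.len() - 1) as i32;
                values.extend_from_slice(&bytes); // string bytes are copied here
                offsets.push(values.len() as i64);
                seen.insert(bytes, k);
                k
            }
        };
        keys.push(key);
    }
    DictColumn { keys, offsets, values }
}
```

For example, a dictionary with values "foo", "a string over 12 bytes" and "" and keys [1, 0, 1, 2] becomes four views that all point into the original 25-byte values buffer; converting back yields keys [0, 1, 0, 2] over a freshly copied values buffer. A real kernel would additionally have to handle nulls, other key types, and splitting values buffers whose offsets exceed u32::MAX.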
Describe alternatives you've considered
I think casting from Dictionary
Additional context