fix: cast Binary/String dictionary to view #8912

Merged
Jefffrey merged 2 commits into apache:main from Jefffrey:fix-dict-to-view-cast on Dec 12, 2025

@Jefffrey
Contributor

Which issue does this PR close?

Rationale for this change

Be able to successfully cast from Dictionary type to View types.

What changes are included in this PR?

Add checks on which array types can use the fast path that was previously erroring.

Also do a little refactoring in surrounding code.

Are these changes tested?

Added new tests.

Are there any user-facing changes?

No.

@github-actions github-actions Bot added the arrow Changes to the arrow crate label Nov 23, 2025
Comment on lines +39 to +49
// TODO: handle LargeUtf8/LargeBinary -> View (need to check offsets can fit)
// TODO: handle cross types (String -> BinaryView, Binary -> StringView)
// (need to validate utf8?)
(Utf8, Utf8View) => view_from_dict_values::<K, Utf8Type, StringViewType>(
    array.keys(),
    array.values().as_string::<i32>(),
),
(Binary, BinaryView) => view_from_dict_values::<K, BinaryType, BinaryViewType>(
    array.keys(),
    array.values().as_binary::<i32>(),
),
Contributor Author

This is the main change: we check the values beforehand to see whether they are valid for the fast path. That check was missing before, and the unchecked assumption led to an error. I've intentionally kept the behaviour as intended (limiting the fast path to Dictionary<Utf8> -> Utf8View and Dictionary<Binary> -> BinaryView), since I was mainly interested in just fixing the cast bug. I left comments about potentially extending the fast path to more combinations (I can raise issues for this).
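
To illustrate the idea of checking the (value type, target type) pair before taking the fast path, here is a minimal standalone sketch. It is a toy model, not the arrow-rs code: `ToyDict`, `ValueType`, `ViewType`, `Route`, `choose_route` and `toy_cast` are all hypothetical names invented for this example, and the UTF-8 validation on the general route is an assumption prompted by the TODO above rather than a description of what the crate does.

```rust
// Toy model only: these types are hypothetical stand-ins, not arrow-rs APIs.

#[derive(Clone, Copy, Debug, PartialEq)]
enum ValueType {
    Utf8,
    Binary,
    LargeUtf8,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum ViewType {
    Utf8View,
    BinaryView,
}

#[derive(Debug, PartialEq)]
enum Route {
    FastPath,
    Fallback,
}

/// A dictionary: each key indexes into `values` (None models a null key).
struct ToyDict {
    value_type: ValueType,
    keys: Vec<Option<usize>>,
    values: Vec<Vec<u8>>,
}

/// Check the (value type, target type) pair before committing to the fast path.
fn choose_route(value_type: ValueType, target: ViewType) -> Route {
    match (value_type, target) {
        (ValueType::Utf8, ViewType::Utf8View) | (ValueType::Binary, ViewType::BinaryView) => {
            Route::FastPath
        }
        // Large offsets and cross types take the general route in this model.
        _ => Route::Fallback,
    }
}

/// Resolve every key to its value. Both routes yield the same logical rows here;
/// the only difference modelled is the extra validation on the general route.
fn toy_cast(dict: &ToyDict, target: ViewType) -> Result<(Route, Vec<Option<Vec<u8>>>), String> {
    let route = choose_route(dict.value_type, target);
    // Assumption for this sketch: Binary -> Utf8View must validate UTF-8 first.
    if route == Route::Fallback
        && dict.value_type == ValueType::Binary
        && target == ViewType::Utf8View
    {
        for v in &dict.values {
            std::str::from_utf8(v).map_err(|e| e.to_string())?;
        }
    }
    let rows = dict
        .keys
        .iter()
        .map(|&k| k.map(|i| dict.values[i].clone()))
        .collect();
    Ok((route, rows))
}
```

In this model a `Utf8` dictionary cast to `BinaryView` no longer reaches the fast path at all; it is routed to the general path up front instead of failing inside a helper that assumed matching types.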

Contributor Author

Raised for tracking: #8985


match to_type {
Dictionary(to_index_type, to_value_type) => {
let dict_array = array
Contributor Author

I moved this inner logic into a separate function to be similar to how the other arms delegate to functions for the full logic.

array: &dyn Array,
// Unpack a dictionary into a flattened array of type to_type
pub(crate) fn unpack_dictionary<K: ArrowDictionaryKeyType>(
array: &DictionaryArray<K>,
Contributor Author

Moved the downcast to a dictionary array into the parent function (dictionary_cast()) so that each specialized function no longer has to downcast itself.
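
As a rough illustration of that refactoring pattern (downcast once in the parent, pass the typed reference to the helpers), here is a small standalone sketch built on `std::any::Any`. Every name in it (`ToyDictArray`, `ToyTarget`, `Out`, `unpack_strings`, `unpack_bytes`, `dictionary_cast_sketch`) is hypothetical and exists only for this example; it does not mirror the real signatures in arrow-cast.

```rust
use std::any::Any;

// Hypothetical stand-in for a typed dictionary array; not the arrow-rs type.
struct ToyDictArray {
    keys: Vec<usize>,
    values: Vec<String>,
}

// Hypothetical cast targets for this sketch.
enum ToyTarget {
    Strings,
    Bytes,
}

#[derive(Debug, PartialEq)]
enum Out {
    Strings(Vec<String>),
    Bytes(Vec<Vec<u8>>),
}

// The helpers receive the already-typed array, so none of them repeats the downcast.
fn unpack_strings(dict: &ToyDictArray) -> Vec<String> {
    dict.keys.iter().map(|&k| dict.values[k].clone()).collect()
}

fn unpack_bytes(dict: &ToyDictArray) -> Vec<Vec<u8>> {
    dict.keys
        .iter()
        .map(|&k| dict.values[k].as_bytes().to_vec())
        .collect()
}

// Parent function: downcast exactly once, then delegate to a specialized helper.
fn dictionary_cast_sketch(array: &dyn Any, target: ToyTarget) -> Result<Out, String> {
    let dict = array
        .downcast_ref::<ToyDictArray>()
        .ok_or_else(|| "expected a dictionary array".to_string())?;
    Ok(match target {
        ToyTarget::Strings => Out::Strings(unpack_strings(dict)),
        ToyTarget::Bytes => Out::Bytes(unpack_bytes(dict)),
    })
}
```

With this shape, a failed downcast is reported in one place, and the helpers can state their real input type in their signatures.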

@Jefffrey Jefffrey marked this pull request as ready for review November 23, 2025 11:04
@Jefffrey Jefffrey changed the title from "fix: cast Binary/String dictionary to view issue" to "fix: cast Binary/String dictionary to view" Nov 23, 2025
Contributor

@alamb alamb left a comment

Thank you @Jefffrey -- sorry for the late review -- I have been trying to work through the arrow-rs backlog recently

I double checked that this PR fixes the bug and it makes sense to me (and is an improvement in clarity in my mind).

I also merged up from main to resolve a conflict

Thank you again

}

#[test]
fn test_string_dicts_to_binary_view() {
Contributor

I verified that these tests cover the fix by running them without the changes in this PR and they fail as expected


---- cast::tests::test_binary_dicts_to_string_view stdout ----

thread 'cast::tests::test_binary_dicts_to_string_view' (16945850) panicked at arrow-cast/src/cast/mod.rs:12222:72:
called `Result::unwrap()` on an `Err` value: ComputeError("Internal Error: Cannot cast Utf8View to StringArray of expected type")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- cast::tests::test_string_dicts_to_binary_view stdout ----

thread 'cast::tests::test_string_dicts_to_binary_view' (16946091) panicked at arrow-cast/src/cast/mod.rs:12194:74:
called `Result::unwrap()` on an `Err` value: ComputeError("Internal Error: Cannot cast BinaryView to BinaryArray of expected type")


failures:
    cast::tests::test_binary_dicts_to_string_view
    cast::tests::test_string_dicts_to_binary_view

test result: FAILED. 288 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.02s

@Jefffrey Jefffrey merged commit 08dcc0b into apache:main Dec 12, 2025
26 checks passed
@Jefffrey Jefffrey deleted the fix-dict-to-view-cast branch December 12, 2025 10:49
@Jefffrey
Contributor Author

Thanks @alamb

Development

Successfully merging this pull request may close these issues.

Cannot cast string dictionary to binary view
