Skip to content

[Variant] add strict mode to cast_to_variant#8233

Merged
alamb merged 9 commits into
apache:mainfrom
codephage2020:feat/issue-8155
Sep 9, 2025
Merged

[Variant] add strict mode to cast_to_variant#8233
alamb merged 9 commits into
apache:mainfrom
codephage2020:feat/issue-8155

Conversation

@codephage2020

@codephage2020 codephage2020 commented Aug 27, 2025

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

cast_to_variant will panic for values of Date64 / Timestamp that can not be converted to NaiveDate

What changes are included in this PR?

  1. add new api :
    pub fn cast_to_variant_with_options(input: &dyn Array, strict: bool) -> Result<VariantArray, ArrowError>
  • strict = true: Returns errors on conversion failures (default behavior)
  • strict = false: Returns null values for failed conversions
  1. add some tests to test non-strict mode.
  2. refactor: eliminate duplication in timestamp conversion using macro

Are these changes tested?

Yes.

Are there any user-facing changes?

no.

Signed-off-by: codephage2020 <tingwangyan2020@163.com>
@github-actions github-actions Bot added the parquet-variant parquet-variant* crates label Aug 27, 2025
Signed-off-by: codephage2020 <tingwangyan2020@163.com>
@codephage2020 codephage2020 marked this pull request as ready for review August 27, 2025 15:50

@liamzwbao liamzwbao left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix! Left few comments

Comment thread parquet-variant-compute/src/cast_to_variant.rs
Comment thread parquet-variant-compute/src/type_conversion.rs Outdated
Comment thread parquet-variant-compute/src/type_conversion.rs Outdated
Signed-off-by: codephage2020 <tingwangyan2020@163.com>
Signed-off-by: codephage2020 <tingwangyan2020@163.com>
@codephage2020

Copy link
Copy Markdown
Contributor Author

Thanks for your reviews @liamzwbao .

All feedback implemented - ready for re-review! 🔍

CC @alamb .

@codephage2020 codephage2020 changed the title [Variant]add strict mode to cast_to_variant [Variant] add strict mode to cast_to_variant Aug 28, 2025

@scovich scovich left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM, couple questions

Comment thread parquet-variant-compute/src/cast_to_variant.rs
/// * `strict` - If true, return error on conversion failure. If false, insert null for failed conversions.
pub fn cast_to_variant_with_options(
input: &dyn Array,
strict: bool,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any possibility/risk that we may need additional options in the future? If so, it might be better to pass a struct? Or maybe we just deal with that if/when it comes?

@codephage2020 codephage2020 Aug 29, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, very good discovery. I have thought about this implementation. We can refactor this part in the next PR.

Done.

Comment thread parquet-variant-compute/src/cast_to_variant.rs Outdated
Comment thread parquet-variant-compute/src/type_conversion.rs
Comment thread parquet-variant-compute/src/type_conversion.rs Outdated

@liamzwbao liamzwbao left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! it’d be good to address @scovich’s comments as well.

run_test_non_strict(
values,
vec![
None, // Invalid timestamp becomes null

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized that for overflow value, we may need to convert it to Variant::Null instead of None like what we did for Decimal, see this comment for context. However, I don't think it should be in this PR, we could discuss and make them consistent later.

cc @alamb

codephage2020 and others added 4 commits August 30, 2025 00:11
Signed-off-by: codephage2020 <tingwangyan2020@163.com>
Co-authored-by: Ryan Johnson <scovich@users.noreply.github.com>
Signed-off-by: codephage2020 <tingwangyan2020@163.com>
Signed-off-by: codephage2020 <tingwangyan2020@163.com>
fn convert_timestamp(
/// Options for controlling the behavior of `cast_to_variant_with_options`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CastOptions {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks quite similar to https://docs.rs/arrow/latest/arrow/compute/struct.CastOptions.html

However that seems to be defined in arrow-compute

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought more about this and I think a dedicated CastOptions struct makes sense for variant

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically they are different kernels and so some options like FormatOptions for the normal cast kernel may not be relevant for this one

@alamb alamb left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @codephage2020 -- this looks good to me. Sorry for the delay in reviewing this PR.

It looks like this PR has a few merge conflicts, but once those are resolved this will be ready to go

@alamb

alamb commented Sep 8, 2025

Copy link
Copy Markdown
Contributor

Thank you @scovich and @liamzwbao for the reviews

@codephage2020

codephage2020 commented Sep 9, 2025

Copy link
Copy Markdown
Contributor Author

Thanks @codephage2020 -- this looks good to me. Sorry for the delay in reviewing this PR.

It looks like this PR has a few merge conflicts, but once those are resolved this will be ready to go

Thanks @alamb , these conflicts have been resolved.

@alamb alamb merged commit 77df2ee into apache:main Sep 9, 2025
12 checks passed
@alamb

alamb commented Sep 9, 2025

Copy link
Copy Markdown
Contributor

Thanks again everyone

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet-variant parquet-variant* crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Variant] cast_to_variant will panic on certain Date64 or Timestamp Values values

4 participants