Skip to content

[arrow-ipc]: dictionary builders for delta - doc fix and integration tests for nested types#9853

Merged
alamb merged 5 commits into
apache:mainfrom
albertlockett:albert/delta-dict-minor-improvements
May 3, 2026
Merged

[arrow-ipc]: dictionary builders for delta - doc fix and integration tests for nested types#9853
alamb merged 5 commits into
apache:mainfrom
albertlockett:albert/delta-dict-minor-improvements

Conversation

@albertlockett
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

  • Closes #NNN.

Rationale for this change

In #9600 we added capability to call finish_preserve_values on any array builder, and it will propagate the choice to preserve dictionary values down into the nested builders. I thought it would be good to extend the integration tests we have in https://github.com/apache/arrow-rs/blob/main/arrow-ipc/tests/test_delta_dictionary.rs to cover the use cases, which was to call the method on builders with nested child builders (such as StructBuilder and ListBuilder).

While reviewing the PR for #9600 I also noticed a small issue with the docs related to using the StreamWriter with delta dictionaries, notably that we set up some options to use delta dictionaries but don't pass the options into the StreamWriter constructor:
https://docs.rs/arrow-ipc/58.1.0/arrow_ipc/writer/struct.StreamWriter.html#example---efficient-delta-dictionaries

What changes are included in this PR?

  • adds cases to the integration tests for delta dictionaries covering ListBuilder and StructBuilder
  • small docs correction for StreamWriter

Are these changes tested?

it is simply docs & tests

Are there any user-facing changes?

No

@github-actions github-actions Bot added the arrow Changes to the arrow crate label Apr 29, 2026
@albertlockett albertlockett changed the title Delta Dictionary Builder doc fix and integration tests [arrow-ipc]: dictionary builders for delta - doc fix and integration tests for nested types Apr 29, 2026
@albertlockett
Copy link
Copy Markdown
Contributor Author

@JakeDern would you mind having a look at this?

@JakeDern
Copy link
Copy Markdown
Contributor

@JakeDern would you mind having a look at this?

Looks good to me, thanks for adding the tests!

@alamb
Copy link
Copy Markdown
Contributor

alamb commented May 1, 2026

FYI @adamreichold

Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thank you @albertlockett and @JakeDern

Comment thread arrow-ipc/src/tests/delta_dictionary.rs Outdated
RecordBatch::try_new(schema.clone(), vec![Arc::new(array) as ArrayRef]).unwrap()
}

/// build batches where the dictionary array is nested within a struct array
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it worth also mentioning that the dict array is the first array?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, can't hurt. added in e5d900c

@alamb alamb merged commit 70e4069 into apache:main May 3, 2026
26 checks passed
@alamb
Copy link
Copy Markdown
Contributor

alamb commented May 3, 2026

Thanks @albertlockett

Rich-T-kid pushed a commit to Rich-T-kid/arrow-rs that referenced this pull request Jun 2, 2026
…tests for nested types (apache#9853)

# Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax.
-->

- Closes #NNN.

# Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

In apache#9600 we added capability to
call `finish_preserve_values` on any array builder, and it will
propagate the choice to preserve dictionary values down into the nested
builders. I thought it would be good to extend the integration tests we
have in
https://github.com/apache/arrow-rs/blob/main/arrow-ipc/tests/test_delta_dictionary.rs
to cover the use cases, which was to call the method on builders with
nested child builders (such as `StructBuilder` and `ListBuilder`).

While reviewing the PR for apache#9600 I also noticed a small issue with the
docs related to using the StreamWriter with delta dictionaries, notably
that we set up some options to use delta dictionaries but don't pass the
options into the `StreamWriter` constructor:

https://docs.rs/arrow-ipc/58.1.0/arrow_ipc/writer/struct.StreamWriter.html#example---efficient-delta-dictionaries

# What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

- adds cases to the integration tests for delta dictionaries covering
`ListBuilder` and `StructBuilder`
- small docs correction for `StreamWriter`

# Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

it is simply docs & tests

# Are there any user-facing changes?

No

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.
-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants