Skip to content

Fix RecordBatch::normalize() null bitmap bug and add StructArray::flatten()#9733

Merged
alamb merged 4 commits into
apache:mainfrom
sqd:oss_normalize_bug
Apr 22, 2026
Merged

Fix RecordBatch::normalize() null bitmap bug and add StructArray::flatten()#9733
alamb merged 4 commits into
apache:mainfrom
sqd:oss_normalize_bug

Conversation

@sqd
Copy link
Copy Markdown
Contributor

@sqd sqd commented Apr 15, 2026

Currently RecordBatch::normalize() has a bug in that the top level struct's null bitmap is not propagated into the resulting normalized arrays' null bitmap. In other words, a child element may suddenly appear non-null, losing the fact that the parent level struct is null at that index. See the test in this change for a bug reproduction.

This change fixes that behavior. Also adds StructArray::flatten() which mirrors arrow-cpp's semantics and handles the aforementioned behavior correctly. The fixed RecordBatch::normalize() now uses StructArray::flatten() under the hood.

Which issue does this PR close?

Are these changes tested?

Yes

Are there any user-facing changes?

No

…tten()

Currently RecordBatch::normalize() has a bug in that the top level
struct's null bitmap is not propagated into the resulting normalized
arrays' null bitmap. In other words, a child element may suddenly
appear non-null, losing the fact that the parent level struct is null at
that index. See the test in this change for a bug reproduction.

This change fixes that behavior. Also adds StructArray::flatten() which
mirrors arrow-cpp's semantics and handles the aforementioned behavior
correctly. The fixed RecordBatch::normalize() now uses
StructArray::flatten() under the hood.
@github-actions github-actions Bot added the arrow Changes to the arrow crate label Apr 15, 2026
@sqd
Copy link
Copy Markdown
Contributor Author

sqd commented Apr 15, 2026

@alamb
Copy link
Copy Markdown
Contributor

alamb commented Apr 15, 2026

@negli-me, since you added the normalize function, could you help review this PR

It was added in

@sqd
Copy link
Copy Markdown
Contributor Author

sqd commented Apr 18, 2026

I think this is the correct github handle, without "e" @ngli-me

Copy link
Copy Markdown
Contributor

@ngli-me ngli-me left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appreciate the ping, switched companies recently and got stuck in approval purgatory for outside work. Code looks good, tests had no issues for me. Thanks for the fix.

@sqd
Copy link
Copy Markdown
Contributor Author

sqd commented Apr 19, 2026

Thank you @ngli-me ! This is ready for another look @alamb

Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @sqd and @ngli-me (the right one this time!)

@sqd
Copy link
Copy Markdown
Contributor Author

sqd commented Apr 20, 2026

Thank you @alamb ! I am very new to this repo so not sure what the process is, but feel free to merge whenever, or let me know what else is needed.

@alamb
Copy link
Copy Markdown
Contributor

alamb commented Apr 22, 2026

Thanks @sqd -- I will merge this PR. Sorry for the delay

@alamb alamb merged commit 9d27619 into apache:main Apr 22, 2026
26 checks passed
Rich-T-kid pushed a commit to Rich-T-kid/arrow-rs that referenced this pull request Jun 2, 2026
…tten() (apache#9733)

Currently RecordBatch::normalize() has a bug in that the top level
struct's null bitmap is not propagated into the resulting normalized
arrays' null bitmap. In other words, a child element may suddenly appear
non-null, losing the fact that the parent level struct is null at that
index. See the test in this change for a bug reproduction.

This change fixes that behavior. Also adds StructArray::flatten() which
mirrors arrow-cpp's semantics and handles the aforementioned behavior
correctly. The fixed RecordBatch::normalize() now uses
StructArray::flatten() under the hood.

# Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax.
-->

- Closes apache#9732.

# Are these changes tested?

Yes

# Are there any user-facing changes?

No

Co-authored-by: Han You <han.you@imc.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RecordBatch::normalize() does not propagate top level null bitmap into the results

3 participants