Prevent `FixedSizeBinaryArray` `i32` offset overflows (try 2)#9872

alamb · 2026-05-01T20:07:41Z

Which issue does this PR close?

Closes Prevent FixedSizeBinaryArray::value offset truncation #9850

Rationale for this change

FixedSizeBinaryArray::value_offset_at uses i32 arithmetic, which can overflow. For offsets beyond i32::MAX, the computed offset is wrong.
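To make the failure mode concrete, here is a minimal sketch of the arithmetic involved. Both functions are hypothetical stand-ins written for illustration, not the actual arrow-rs code; `wrapping_mul` is used so the wraparound is visible regardless of build profile.

```rust
/// Hypothetical stand-in for an `i32`-based offset calculation.
/// `wrapping_mul` makes the silent wraparound explicit.
fn offset_i32(index: i32, value_length: i32) -> i32 {
    index.wrapping_mul(value_length)
}

/// The same calculation in `usize`, with overflow reported as `None`
/// instead of wrapping.
fn offset_usize(index: usize, value_length: usize) -> Option<usize> {
    index.checked_mul(value_length)
}
```

With `index = 3` and `value_length = 1 << 30`, the true byte offset is 3 * 2^30, which does not fit in `i32`: the `i32` version wraps to a negative number, while the `usize` version returns the correct value.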

What changes are included in this PR?

Prevent any FixedSizeBinaryArray from being constructed where the offset calculation could overflow
Add some other overflow checks

As @adamreeve pointed out on #9850, there are several places where the i32 arithmetic is problematic in FixedSizeBinaryArray. I will fix them for real in a separate follow-on PR by switching to entirely usize-based arithmetic for offset calculations.

However, since I hope to backport this PR to older releases, I would like something that is easy to review and has the least potential for unintended consequences.

Are these changes tested?

I added unit tests. However, I can't find any way to fully trigger the actual paths short of trying to allocate very large arrays, which I don't think is appropriate for unit tests.

Are there any user-facing changes?

Better limit checking

alamb · 2026-05-01T20:18:15Z

        T: Iterator<Item = Option<U>>,
        U: AsRef<[u8]>,
    {
+        let value_size = value_length.to_usize().ok_or_else(|| {


This is a new check that value_length can be converted to usize without truncation or wrapping (i.e., that value_length is not negative).
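As a rough illustration of why the checked conversion matters, the sketch below contrasts it with a plain `as` cast. `checked_value_size` is a hypothetical helper, and it uses the standard library's `usize::try_from` rather than the `to_usize()` call shown in the diff, so that it is self-contained.

```rust
/// Hypothetical helper: convert an `i32` width to `usize`, rejecting
/// negative values instead of letting them wrap.
fn checked_value_size(value_length: i32) -> Result<usize, String> {
    usize::try_from(value_length)
        .map_err(|_| format!("value_length must be non-negative, got {}", value_length))
}

/// What an unchecked `as` cast does with the same input: a negative
/// width silently becomes a huge positive one.
fn unchecked_value_size(value_length: i32) -> usize {
    value_length as usize
}
```

For `value_length = -1`, the checked helper returns an error, whereas the unchecked cast yields `usize::MAX`.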

alamb · 2026-05-01T20:18:38Z

        })?;

        let nulls = NullBuffer::from_unsliced_buffer(null_buf, len);
+        Self::validate_lengths(value_size, len)?;


new call to validate_lengths

alamb · 2026-05-01T20:19:44Z


-        let size = size.unwrap_or(0).try_into().unwrap();
+        let value_size = size.unwrap_or(0);
+        Self::validate_lengths(value_size, len)?;


validate_lengths here

alamb · 2026-05-01T20:20:04Z

            _ => panic!("Expected data type to be FixedSizeBinary"),
        };

-        let size = value_length as usize;


here is another unchecked conversion from value_length to usize

alamb · 2026-05-01T21:23:04Z

            }
        };

+        Self::validate_lengths(s, len)?;


this is the key check -- it prevents creating arrays that are susceptible to this overflow

I added it on each constructor path
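For readers who have not opened the diff, a check of this kind could look roughly like the sketch below. This is an assumption about the general shape only: the name `validate_lengths_sketch`, the error type, and the exact bound (total bytes must fit in `i32`) are illustrative, and the real `validate_lengths` may use a different bound.

```rust
/// Hypothetical sketch of a constructor-time check: reject any
/// combination of element width and element count whose total byte
/// length cannot be represented as an `i32` offset.
fn validate_lengths_sketch(value_size: usize, len: usize) -> Result<(), String> {
    // The multiplication itself may overflow `usize`, so check it first.
    let total = value_size
        .checked_mul(len)
        .ok_or_else(|| format!("{} * {} overflows usize", value_size, len))?;
    // Then require that the total fits in the `i32` range.
    i32::try_from(total)
        .map(|_| ())
        .map_err(|_| format!("total byte length {} exceeds i32::MAX", total))
}
```

A function like this only inspects two integers, so it can be unit tested with very large lengths without allocating anything.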

alamb · 2026-05-01T21:26:27Z

                    // the data.
-                    buffer.reserve(iter_size_hint * len);
-                    buffer.extend_zeros(slice.len() * prepend);
+                    if let Some(capacity) = iter_size_hint.checked_mul(len) {


I also updated some of the other arithmetic to use checked multiplication
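The pattern in the hunk above can be sketched in isolation as follows. This is only an illustration under assumptions: a plain `Vec<u8>` stands in for the arrow buffer type, and `reserve_if_fits` is a hypothetical name. Since the reservation is just a capacity hint, the sketch skips it when the product overflows.

```rust
/// Hypothetical sketch: reserve `items * item_len` bytes only when the
/// product fits in `usize`. Returns whether a reservation was made.
fn reserve_if_fits(buffer: &mut Vec<u8>, items: usize, item_len: usize) -> bool {
    if let Some(capacity) = items.checked_mul(item_len) {
        buffer.reserve(capacity);
        true
    } else {
        // The size hint is unusable; let the buffer grow on demand.
        false
    }
}
```

Compared with an unchecked `items * item_len`, this avoids a panic in debug builds and a wrapped, too-small reservation in release builds.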

adamreeve

This looks good to me, thanks Andrew

alamb · 2026-05-04T20:39:42Z

Thank you for the review @adamreeve

alamb · 2026-05-05T14:05:58Z

I filed this ticket to track the follow on work to lift this restriction

(re) Allow Large FixedSizeBinaryArrays #9906

…ray (#9905)

# Which issue does this PR close?

- Part of #9906
- First follow on to #9872

# Rationale for this change

While trying to avoid overflows due to using i32 arithmetic in FixedSizeBinaryArray, I found the use of the term `size` in parameters to be confusing when the field name is called `value_length`

# What changes are included in this PR?

Change several parameter / variable names to `value_length` to keep the code consistent

# Are these changes tested?

By CI

# Are there any user-facing changes?

No, this is an internal code refactor

alamb · 2026-05-05T19:17:03Z

Here is a PR to fix this overflow error for real:

(re) Allow Large FixedSizeBinaryArrays #9906

@adamreeve


@alamb

…9872) (#9928)

- Part of #9858
- Fixes #9898 in 57.x releases

This PR:

- Backports #9872 from @alamb to the `57_maintenance` line
- Supersedes the earlier closed attempt #9850 referenced by the issue

@alamb

…9872) (#9917)

- Part of #9857
- Fixes #9898 in 56.x releases

This PR:

- Backports #9872 from @alamb to the `56_maintenance` line
- Supersedes the earlier closed attempt #9850 referenced by the issue

# Which issue does this PR close?

- Part of #9859

# Rationale for this change

Even though we just did a release from 58, I want to get a release out that has these changes:

- #9872
- #9813

# What changes are included in this PR?

1. Update version to 58.3.0
2. Update CHANGELOG. See Rendered preview here: https://github.com/alamb/arrow-rs/blob/alamb/prepare_58.3.0/CHANGELOG.md

# Are these changes tested?

By CI

# Are there any user-facing changes?

yes

…e overflow checks (#9910)

# Which issue does this PR close?

- Closes #9906.

# Rationale for this change

`FixedSizeBinaryArray` still stores its public fixed width as `i32`, which means internal address calculations rely on repeated conversions between `i32` and pointer-sized offsets. We recently had an issue with some `i32`-based arithmetic overflowing (see #9898).

To avoid inadvertently using `i32` arithmetic, this PR proposes to change the internal representation of the `FixedSizeBinaryArray` to use `usize` and compute byte positions using `usize` (pointer-sized arithmetic) directly, with checked conversions only at the public API boundaries that still require `i32`.

I am quite pleased it is a net reduction in lines of code (admittedly most of that was the checks added in #9872).

# What changes are included in this PR?

- Store the fixed-width element size as `value_size: usize` inside `FixedSizeBinaryArray`.
- Rewrite internal position calculations in accessors and slicing to use `usize` arithmetic.
- Remove the old `validate_lengths` invariant that existed only to keep internal `i32` offset arithmetic in range.
- Remove implicit `as` casts from the implementation and replace them with checked conversions or typed bindings.

# Are these changes tested?

These changes are covered by CI.

# Are there any user-facing changes?

No.

---------

Co-authored-by: Adam Reeve <adreeve@gmail.com>
Co-authored-by: Adam Reeve <adam.reeve@gr-oss.io>

@adamreeve


Prevent FixedSizeBinaryArray offset overflows

701436e

github-actions Bot added the arrow Changes to the arrow crate label May 1, 2026

Check other paths, add errors

1152dbc

alamb commented May 1, 2026

View reviewed changes

alamb added 2 commits May 1, 2026 16:34

Adjust bounds

b4ea0e8

A few more potential overflows

e47512a

alamb changed the title ~~Prevent FixedSizeBinaryArray offset overflows (try 2)~~ Prevent FixedSizeBinaryArray i32 offset overflows (try 2) May 1, 2026

alamb mentioned this pull request May 1, 2026

Prevent FixedSizeBinaryArray::value offset truncation #9850

Closed

[arrow-array] revert naming-only size changes

4d0c5e0

alamb mentioned this pull request May 1, 2026

[arrow-array] rename size parameters to value_length alamb/arrow-rs#55

Draft

alamb commented May 1, 2026

View reviewed changes

alamb mentioned this pull request May 1, 2026

[arrow-array] rename FixedSizeBinaryArary parameters for consistency #9873

Closed

alamb commented May 1, 2026

View reviewed changes

alamb marked this pull request as ready for review May 1, 2026 21:27

alamb requested a review from adamreeve May 1, 2026 21:27

alamb mentioned this pull request May 2, 2026

Release arrow-rs / parquet Minor/Patch version 58.3.0 or 58.2.1 (May 2026) #9859

Closed

5 tasks

adamreeve approved these changes May 4, 2026

View reviewed changes

alamb merged commit 75f7916 into apache:main May 4, 2026
26 checks passed

This was referenced May 5, 2026

[arrow-array] Integer overflow in FixedSizeBinaryArray::value leads to undefined behavior #9898

Closed

[56_maintenance] Prevent FixedSizeBinaryArray i32 offset overflows (#9872) #9917

Merged

alamb mentioned this pull request May 5, 2026

[arrow-array] use usize arithmetic in FixedSizeBinaryArray, aggressive overflow checks #9910

Merged

alamb mentioned this pull request May 5, 2026

[57_maintenance] Prevent FixedSizeBinaryArray i32 offset overflows (#9872) #9928

Merged
