Prevent `FixedSizeBinaryArray::value` offset truncation#9850

alamb · 2026-04-29T17:06:19Z

Which issue does this PR close?

None.

Rationale for this change

FixedSizeBinaryArray::value_offset_at cast the requested index to i32 before multiplying by the element width. For indexes beyond i32::MAX, that truncation could produce a negative byte offset and cause value() to read before the start of the value buffer.

What changes are included in this PR?

Check for offset overflow
Adds regression tests

Note I also added some more docs for FixedSizeBinaryArray that may help reviewers

Are these changes tested?

I can't find any way to test this this issue without actually allocating a large array (over 2GB)

Are there any user-facing changes?

Better limit checking

alamb · 2026-04-29T17:43:44Z

+    /// checking for overflow.
    #[inline]
    fn value_offset_at(&self, i: usize) -> i32 {
        self.value_length * i as i32


this arithmetic can overflow if the result is larger than i32::MAX

adamreeve · 2026-04-30T00:24:34Z

    /// Caller is responsible for ensuring that the index is within the bounds
-    /// of the array
+    /// of the array and the resulting byte offset fits in `i32`
    pub unsafe fn value_unchecked(&self, i: usize) -> &[u8] {


Would it make sense to add new methods that are alternatives to value_offset and value_offset_at that return usize, so we don't need this limitation? Or at least update this method so it doesn't use them and doesn't suffer from i32 overflow. Because with this change, value now handles when value_offset is greater than max i32, but value_unchecked still doesn't.

I would expect value_unchecked to work correctly for all cases where value doesn't panic.

And existing code that uses value_unchecked might validate the index but not be aware of this hidden safety requirement.

Short answer is yes. I also spent some more time reviewing the code in FixedSizeBinaryArray and I am now convinced there are several other miuses of i32 <-> usize . I am working on an improvement, though I worry it will be a larger PR

See Prevent FixedSizeBinaryArray i32 offset overflows (try 2) #9872

alamb · 2026-04-30T19:49:17Z

I also added some more docs for FixedSizeBinaryArray that may help reviewers

Add more documentation for FixedSizeBinary arrays #9866

alamb · 2026-04-30T20:12:27Z

I have played around with several options for improving this code. I think there are several potential i32 math overflow issues, but fixing them all in a single PR is going to be somewhat hard to review and take some time

What I am thinking about is adding a new invariant to the FixedSizeArray constructor that prevents constructing arrays with value buffers larger than 2GB as a temporary workaround in one PR. Then I can fixup the actual arithmetic in a second:

# Which issue does this PR close? - related to #9850 # Rationale for this change While working on #9850 I noticed that FixedSizeBinaryArray::offset always returns zero. This was confusing to me and so I wanted to document it for future readrs # What changes are included in this PR? Add some comments # Are these changes tested? By CI # Are there any user-facing changes? No, this is internal comments

alamb · 2026-05-01T20:52:45Z

I have played around with several options for improving this code. I think there are several potential i32 math overflow issues, but fixing them all in a single PR is going to be somewhat hard to review and take some time

What I am thinking about is adding a new invariant to the FixedSizeArray constructor that prevents constructing arrays with value buffers larger than 2GB as a temporary workaround in one PR. Then I can fixup the actual arithmetic in a second:

I took this approach in the following PR:

Prevent FixedSizeBinaryArray i32 offset overflows (try 2) #9872

I will create a follow on PR from that one that changes the internals of FixedSizeBinary to use usize arithmetic everywhere, but that will be a more invasive change

# Which issue does this PR close? - related to https://github.com/apache/arrow-rs/pull/9850/changes#r3170266352 # Rationale for this change While working on #9850 I felt it would help to have some better docs about what a FixedSizeBinaryArray actually was, so I made some # What changes are included in this PR? Add some more background / explanatory docs about this array # Are these changes tested? By CI # Are there any user-facing changes? Docs only

@adamreeve

# Which issue does this PR close? - Closes #9850 # Rationale for this change `FixedSizeBinaryArray::value_offset_at` works use `i32` arithmetic which can overflow. For offsets beyond `i32::MAX`, that can be bad # What changes are included in this PR? 1. Prevent any FixedSizedBinaryArrays from being constructed where the offset calculation could overflow 2. Add some other overflow checks As @adamreeve [pointed out](#9850 (comment)) on #9850 there are several places where the `i32` arithmetic is problematic in `FixedSizeBinaryArray`. I will fix them for real in a different, follow on PR, by switching to entirely `usize` based arithmetic for offset calculations However, since I hope to backport this PR to older releases, I would like something that is easy to review and has the least potential for unintended consequences. # Are these changes tested? I added unit tests. However, I can't find any way to fully trigger the actual paths short of trying to allocate very large arrays, which I don't think is appropriate for unit tests. # Are there any user-facing changes? Better limit checking

@adamreeve

…#9872) - Closes apache#9850 `FixedSizeBinaryArray::value_offset_at` works use `i32` arithmetic which can overflow. For offsets beyond `i32::MAX`, that can be bad 1. Prevent any FixedSizedBinaryArrays from being constructed where the offset calculation could overflow 2. Add some other overflow checks As @adamreeve [pointed out](apache#9850 (comment)) on apache#9850 there are several places where the `i32` arithmetic is problematic in `FixedSizeBinaryArray`. I will fix them for real in a different, follow on PR, by switching to entirely `usize` based arithmetic for offset calculations However, since I hope to backport this PR to older releases, I would like something that is easy to review and has the least potential for unintended consequences. I added unit tests. However, I can't find any way to fully trigger the actual paths short of trying to allocate very large arrays, which I don't think is appropriate for unit tests. Better limit checking

@alamb

…9872) (#9928) - Part of #9858 - Fixes #9898 in 57.x releases This PR: - Backports #9872 from @alamb to the `57_maintenance` line - Supersedes the earlier closed attempt #9850 referenced by the issue

@alamb

…9872) (#9917) - Part of #9857 - Fixes #9898 in 56.x releases This PR: - Backports #9872 from @alamb to the `56_maintenance` line - Supersedes the earlier closed attempt #9850 referenced by the issue

# Which issue does this PR close? - related to apache#9850 # Rationale for this change While working on apache#9850 I noticed that FixedSizeBinaryArray::offset always returns zero. This was confusing to me and so I wanted to document it for future readrs # What changes are included in this PR? Add some comments # Are these changes tested? By CI # Are there any user-facing changes? No, this is internal comments

# Which issue does this PR close? - related to https://github.com/apache/arrow-rs/pull/9850/changes#r3170266352 # Rationale for this change While working on apache#9850 I felt it would help to have some better docs about what a FixedSizeBinaryArray actually was, so I made some # What changes are included in this PR? Add some more background / explanatory docs about this array # Are these changes tested? By CI # Are there any user-facing changes? Docs only

@adamreeve

…#9872) # Which issue does this PR close? - Closes apache#9850 # Rationale for this change `FixedSizeBinaryArray::value_offset_at` works use `i32` arithmetic which can overflow. For offsets beyond `i32::MAX`, that can be bad # What changes are included in this PR? 1. Prevent any FixedSizedBinaryArrays from being constructed where the offset calculation could overflow 2. Add some other overflow checks As @adamreeve [pointed out](apache#9850 (comment)) on apache#9850 there are several places where the `i32` arithmetic is problematic in `FixedSizeBinaryArray`. I will fix them for real in a different, follow on PR, by switching to entirely `usize` based arithmetic for offset calculations However, since I hope to backport this PR to older releases, I would like something that is easy to review and has the least potential for unintended consequences. # Are these changes tested? I added unit tests. However, I can't find any way to fully trigger the actual paths short of trying to allocate very large arrays, which I don't think is appropriate for unit tests. # Are there any user-facing changes? Better limit checking

github-actions Bot added the arrow Changes to the arrow crate label Apr 29, 2026

[arrow-array]: prevent FixedSizeBinaryArray offset truncation

7faf57d

alamb force-pushed the codex/fixed-size-binary-offset-overflow branch from e77819a to 7faf57d Compare April 29, 2026 17:42

alamb commented Apr 29, 2026

View reviewed changes

alamb marked this pull request as ready for review April 29, 2026 18:22

adamreeve reviewed Apr 30, 2026

View reviewed changes

alamb mentioned this pull request Apr 30, 2026

Alamb/fixed size binary overfkow try 2 #9867

Closed

alamb mentioned this pull request May 1, 2026

Prevent FixedSizeBinaryArray i32 offset overflows (try 2) #9872

Merged

alamb marked this pull request as draft May 1, 2026 20:51

alamb closed this in #9872 May 4, 2026

ksj1230 mentioned this pull request May 5, 2026

[arrow-array] Integer overflow in FixedSizeBinaryArray::value leads to undefined behavior #9898

Closed

This was referenced May 5, 2026

[56_maintenance] Prevent FixedSizeBinaryArray i32 offset overflows (#9872) #9917

Merged

[57_maintenance] Prevent FixedSizeBinaryArray i32 offset overflows (#9872) #9928

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prevent `FixedSizeBinaryArray::value` offset truncation#9850