Skip to content

Prevent FixedSizeBinaryArray i32 offset overflows (try 2)#9872

Merged
alamb merged 5 commits into
apache:mainfrom
alamb:alamb/fsb_overflow_try3
May 4, 2026
Merged

Prevent FixedSizeBinaryArray i32 offset overflows (try 2)#9872
alamb merged 5 commits into
apache:mainfrom
alamb:alamb/fsb_overflow_try3

Conversation

@alamb
Copy link
Copy Markdown
Contributor

@alamb alamb commented May 1, 2026

Which issue does this PR close?

Rationale for this change

FixedSizeBinaryArray::value_offset_at works use i32 arithmetic which can overflow. For offsets beyond i32::MAX, that can be bad

What changes are included in this PR?

  1. Prevent any FixedSizedBinaryArrays from being constructed where the offset calculation could overflow
  2. Add some other overflow checks

As @adamreeve pointed out on #9850 there are several places where the i32 arithmetic is problematic in FixedSizeBinaryArray. I will fix them for real in a different, follow on PR, by switching to entirely usize based arithmetic for offset calculations

However, since I hope to backport this PR to older releases, I would like something that is easy to review and has the least potential for unintended consequences.

Are these changes tested?

I added unit tests. However, I can't find any way to fully trigger the actual paths short of trying to allocate very large arrays, which I don't think is appropriate for unit tests.

Are there any user-facing changes?

Better limit checking

@github-actions github-actions Bot added the arrow Changes to the arrow crate label May 1, 2026
Comment thread arrow-array/src/array/fixed_size_binary_array.rs Outdated
Comment thread arrow-array/src/array/fixed_size_binary_array.rs Outdated
Comment thread arrow-array/src/array/fixed_size_binary_array.rs Outdated
Comment thread arrow-array/src/array/fixed_size_binary_array.rs Outdated
Comment thread arrow-array/src/array/fixed_size_binary_array.rs Outdated
T: Iterator<Item = Option<U>>,
U: AsRef<[u8]>,
{
let value_size = value_length.to_usize().ok_or_else(|| {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a new check that value_length can be converted to usize without truncation / wrap (aka that value_length is not negative)

})?;

let nulls = NullBuffer::from_unsliced_buffer(null_buf, len);
Self::validate_lengths(value_size, len)?;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new call to validate_lengths


let size = size.unwrap_or(0).try_into().unwrap();
let value_size = size.unwrap_or(0);
Self::validate_lengths(value_size, len)?;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validate_lengths here

_ => panic!("Expected data type to be FixedSizeBinary"),
};

let size = value_length as usize;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is another unchecked conversion from value_lengh to usize

@alamb alamb changed the title Prevent FixedSizeBinaryArray offset overflows (try 2) Prevent FixedSizeBinaryArray i32 offset overflows (try 2) May 1, 2026
}
};

Self::validate_lengths(s, len)?;
Copy link
Copy Markdown
Contributor Author

@alamb alamb May 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the key check -- it prevents creating arrays that are susceptible to this overflow

I added it on each constructor path

// the data.
buffer.reserve(iter_size_hint * len);
buffer.extend_zeros(slice.len() * prepend);
if let Some(capacity) = iter_size_hint.checked_mul(len) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also updated some of the other arithmetic to used checked multiplication

@alamb alamb marked this pull request as ready for review May 1, 2026 21:27
@alamb alamb requested a review from adamreeve May 1, 2026 21:27
Copy link
Copy Markdown
Contributor

@adamreeve adamreeve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me thanks Andrew

@alamb alamb merged commit 75f7916 into apache:main May 4, 2026
26 checks passed
@alamb
Copy link
Copy Markdown
Contributor Author

alamb commented May 4, 2026

Thank you for the review @adamreeve

@alamb
Copy link
Copy Markdown
Contributor Author

alamb commented May 5, 2026

I filed this ticket to track the follow on work to lift this restriction

alamb added a commit that referenced this pull request May 5, 2026
…ray (#9905)

# Which issue does this PR close?
- Part of #9906
- First follow on to #9872

# Rationale for this change

While trying to avoid overflows due to using i32 arithmetic in
FixedSizeBinaryArray, I found the use of the term `size` in parameters
to be confusing when the field name is called `value_length`

# What changes are included in this PR?

Change several parameter / variable names to `value_length` to keep the
code consistent

# Are these changes tested?

By CI

# Are there any user-facing changes?

No this is an internal code refactor
@alamb
Copy link
Copy Markdown
Contributor Author

alamb commented May 5, 2026

Here is a PR to fix this overflow error for real:

alamb added a commit to alamb/arrow-rs that referenced this pull request May 5, 2026
…#9872)

- Closes apache#9850

`FixedSizeBinaryArray::value_offset_at` works use `i32` arithmetic which
can overflow. For offsets beyond `i32::MAX`, that can be bad

1. Prevent any FixedSizedBinaryArrays from being constructed where the
offset calculation could overflow
2. Add some other overflow checks

As @adamreeve [pointed
out](apache#9850 (comment))
on apache#9850 there are several places
where the `i32` arithmetic is problematic in `FixedSizeBinaryArray`. I
will fix them for real in a different, follow on PR, by switching to
entirely `usize` based arithmetic for offset calculations

However, since I hope to backport this PR to older releases, I would
like something that is easy to review and has the least potential for
unintended consequences.

I added unit tests. However, I can't find any way to fully trigger the
actual paths short of trying to allocate very large arrays, which I
don't think is appropriate for unit tests.

Better limit checking
alamb added a commit that referenced this pull request May 6, 2026
…9872) (#9928)

- Part of #9858
- Fixes #9898 in 57.x releases

This PR:
- Backports #9872 from @alamb to
the `57_maintenance` line
- Supersedes the earlier closed attempt
#9850 referenced by the issue
alamb added a commit that referenced this pull request May 6, 2026
…9872) (#9917)

- Part of #9857
- Fixes #9898 in 56.x releases

This PR:
- Backports #9872 from @alamb to
the `56_maintenance` line
- Supersedes the earlier closed attempt
#9850 referenced by the issue
alamb added a commit that referenced this pull request May 7, 2026
# Which issue does this PR close?

- Part of #9859

# Rationale for this change

Even though we just did a release from 58, I want to get a release out
that has these changes:
- #9872
- #9813

# What changes are included in this PR?

1. Update version to 58.3.0
2. Update CHANGELOG. See Rendered preview here:
https://github.com/alamb/arrow-rs/blob/alamb/prepare_58.3.0/CHANGELOG.md

# Are these changes tested?

By CI
# Are there any user-facing changes?

yes
alamb added a commit that referenced this pull request May 14, 2026
…e overflow checks (#9910)

# Which issue does this PR close?

- Closes #9906.

# Rationale for this change

`FixedSizeBinaryArray` still stores its public fixed width as `i32`,
which means internal address calculations rely on repeated conversions
between `i32` and pointer-sized offsets. We recently had issue with some
i32 based arithmetic overflowing (see
#9898)

To avoid inadvertently using i32 arithmetic, this PR proposes to change
the internal representation of the FixedSizeBinaryArray to use `usize`
and compute byte positions using `usize` ( pointer-sized arithmetic)
directly, with checked conversions only at the public API boundaries
that still require `i32`.

I am quite pleased it is a net reduction in lines of code (admittedly
most of that was the checks added in
#9872

# What changes are included in this PR?

- Store the fixed-width element size as `value_size: usize` inside
`FixedSizeBinaryArray`.
- Rewrite internal position calculations in accessors and slicing to use
`usize` arithmetic.
- Remove the old `validate_lengths` invariant that existed only to keep
internal `i32` offset arithmetic in range.
- Remove implicit `as` casts from the implementation and replace them
with checked conversions or typed bindings.

# Are these changes tested?

These changes are covered by CI.

# Are there any user-facing changes?

No.

---------

Co-authored-by: Adam Reeve <adreeve@gmail.com>
Co-authored-by: Adam Reeve <adam.reeve@gr-oss.io>
Rich-T-kid pushed a commit to Rich-T-kid/arrow-rs that referenced this pull request Jun 2, 2026
…#9872)

# Which issue does this PR close?

- Closes apache#9850

# Rationale for this change

`FixedSizeBinaryArray::value_offset_at` works use `i32` arithmetic which
can overflow. For offsets beyond `i32::MAX`, that can be bad

# What changes are included in this PR?

1. Prevent any FixedSizedBinaryArrays from being constructed where the
offset calculation could overflow
2. Add some other overflow checks

As @adamreeve [pointed
out](apache#9850 (comment))
on apache#9850 there are several places
where the `i32` arithmetic is problematic in `FixedSizeBinaryArray`. I
will fix them for real in a different, follow on PR, by switching to
entirely `usize` based arithmetic for offset calculations

However, since I hope to backport this PR to older releases, I would
like something that is easy to review and has the least potential for
unintended consequences.

# Are these changes tested?

I added unit tests. However, I can't find any way to fully trigger the
actual paths short of trying to allocate very large arrays, which I
don't think is appropriate for unit tests.

# Are there any user-facing changes?
Better limit checking
Rich-T-kid pushed a commit to Rich-T-kid/arrow-rs that referenced this pull request Jun 2, 2026
…ray (apache#9905)

# Which issue does this PR close?
- Part of apache#9906
- First follow on to apache#9872

# Rationale for this change

While trying to avoid overflows due to using i32 arithmetic in
FixedSizeBinaryArray, I found the use of the term `size` in parameters
to be confusing when the field name is called `value_length`

# What changes are included in this PR?

Change several parameter / variable names to `value_length` to keep the
code consistent

# Are these changes tested?

By CI

# Are there any user-facing changes?

No this is an internal code refactor
Rich-T-kid pushed a commit to Rich-T-kid/arrow-rs that referenced this pull request Jun 2, 2026
# Which issue does this PR close?

- Part of apache#9859

# Rationale for this change

Even though we just did a release from 58, I want to get a release out
that has these changes:
- apache#9872
- apache#9813

# What changes are included in this PR?

1. Update version to 58.3.0
2. Update CHANGELOG. See Rendered preview here:
https://github.com/alamb/arrow-rs/blob/alamb/prepare_58.3.0/CHANGELOG.md

# Are these changes tested?

By CI
# Are there any user-facing changes?

yes
Rich-T-kid pushed a commit to Rich-T-kid/arrow-rs that referenced this pull request Jun 2, 2026
…e overflow checks (apache#9910)

# Which issue does this PR close?

- Closes apache#9906.

# Rationale for this change

`FixedSizeBinaryArray` still stores its public fixed width as `i32`,
which means internal address calculations rely on repeated conversions
between `i32` and pointer-sized offsets. We recently had issue with some
i32 based arithmetic overflowing (see
apache#9898)

To avoid inadvertently using i32 arithmetic, this PR proposes to change
the internal representation of the FixedSizeBinaryArray to use `usize`
and compute byte positions using `usize` ( pointer-sized arithmetic)
directly, with checked conversions only at the public API boundaries
that still require `i32`.

I am quite pleased it is a net reduction in lines of code (admittedly
most of that was the checks added in
apache#9872

# What changes are included in this PR?

- Store the fixed-width element size as `value_size: usize` inside
`FixedSizeBinaryArray`.
- Rewrite internal position calculations in accessors and slicing to use
`usize` arithmetic.
- Remove the old `validate_lengths` invariant that existed only to keep
internal `i32` offset arithmetic in range.
- Remove implicit `as` casts from the implementation and replace them
with checked conversions or typed bindings.

# Are these changes tested?

These changes are covered by CI.

# Are there any user-facing changes?

No.

---------

Co-authored-by: Adam Reeve <adreeve@gmail.com>
Co-authored-by: Adam Reeve <adam.reeve@gr-oss.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants