Prevent FixedSizeBinaryArray i32 offset overflows (try 2)#9872
Conversation
| T: Iterator<Item = Option<U>>, | ||
| U: AsRef<[u8]>, | ||
| { | ||
| let value_size = value_length.to_usize().ok_or_else(|| { |
There was a problem hiding this comment.
this is a new check that value_length can be converted to usize without truncation / wrap (aka that value_length is not negative)
| })?; | ||
|
|
||
| let nulls = NullBuffer::from_unsliced_buffer(null_buf, len); | ||
| Self::validate_lengths(value_size, len)?; |
There was a problem hiding this comment.
new call to validate_lengths
|
|
||
| let size = size.unwrap_or(0).try_into().unwrap(); | ||
| let value_size = size.unwrap_or(0); | ||
| Self::validate_lengths(value_size, len)?; |
| _ => panic!("Expected data type to be FixedSizeBinary"), | ||
| }; | ||
|
|
||
| let size = value_length as usize; |
There was a problem hiding this comment.
here is another unchecked conversion from value_lengh to usize
FixedSizeBinaryArray offset overflows (try 2)FixedSizeBinaryArray i32 offset overflows (try 2)
| } | ||
| }; | ||
|
|
||
| Self::validate_lengths(s, len)?; |
There was a problem hiding this comment.
this is the key check -- it prevents creating arrays that are susceptible to this overflow
I added it on each constructor path
| // the data. | ||
| buffer.reserve(iter_size_hint * len); | ||
| buffer.extend_zeros(slice.len() * prepend); | ||
| if let Some(capacity) = iter_size_hint.checked_mul(len) { |
There was a problem hiding this comment.
I also updated some of the other arithmetic to used checked multiplication
adamreeve
left a comment
There was a problem hiding this comment.
This looks good to me thanks Andrew
|
Thank you for the review @adamreeve |
|
I filed this ticket to track the follow on work to lift this restriction |
…ray (#9905) # Which issue does this PR close? - Part of #9906 - First follow on to #9872 # Rationale for this change While trying to avoid overflows due to using i32 arithmetic in FixedSizeBinaryArray, I found the use of the term `size` in parameters to be confusing when the field name is called `value_length` # What changes are included in this PR? Change several parameter / variable names to `value_length` to keep the code consistent # Are these changes tested? By CI # Are there any user-facing changes? No this is an internal code refactor
|
Here is a PR to fix this overflow error for real: |
…#9872) - Closes apache#9850 `FixedSizeBinaryArray::value_offset_at` works use `i32` arithmetic which can overflow. For offsets beyond `i32::MAX`, that can be bad 1. Prevent any FixedSizedBinaryArrays from being constructed where the offset calculation could overflow 2. Add some other overflow checks As @adamreeve [pointed out](apache#9850 (comment)) on apache#9850 there are several places where the `i32` arithmetic is problematic in `FixedSizeBinaryArray`. I will fix them for real in a different, follow on PR, by switching to entirely `usize` based arithmetic for offset calculations However, since I hope to backport this PR to older releases, I would like something that is easy to review and has the least potential for unintended consequences. I added unit tests. However, I can't find any way to fully trigger the actual paths short of trying to allocate very large arrays, which I don't think is appropriate for unit tests. Better limit checking
# Which issue does this PR close? - Part of #9859 # Rationale for this change Even though we just did a release from 58, I want to get a release out that has these changes: - #9872 - #9813 # What changes are included in this PR? 1. Update version to 58.3.0 2. Update CHANGELOG. See Rendered preview here: https://github.com/alamb/arrow-rs/blob/alamb/prepare_58.3.0/CHANGELOG.md # Are these changes tested? By CI # Are there any user-facing changes? yes
…e overflow checks (#9910) # Which issue does this PR close? - Closes #9906. # Rationale for this change `FixedSizeBinaryArray` still stores its public fixed width as `i32`, which means internal address calculations rely on repeated conversions between `i32` and pointer-sized offsets. We recently had issue with some i32 based arithmetic overflowing (see #9898) To avoid inadvertently using i32 arithmetic, this PR proposes to change the internal representation of the FixedSizeBinaryArray to use `usize` and compute byte positions using `usize` ( pointer-sized arithmetic) directly, with checked conversions only at the public API boundaries that still require `i32`. I am quite pleased it is a net reduction in lines of code (admittedly most of that was the checks added in #9872 # What changes are included in this PR? - Store the fixed-width element size as `value_size: usize` inside `FixedSizeBinaryArray`. - Rewrite internal position calculations in accessors and slicing to use `usize` arithmetic. - Remove the old `validate_lengths` invariant that existed only to keep internal `i32` offset arithmetic in range. - Remove implicit `as` casts from the implementation and replace them with checked conversions or typed bindings. # Are these changes tested? These changes are covered by CI. # Are there any user-facing changes? No. --------- Co-authored-by: Adam Reeve <adreeve@gmail.com> Co-authored-by: Adam Reeve <adam.reeve@gr-oss.io>
…#9872) # Which issue does this PR close? - Closes apache#9850 # Rationale for this change `FixedSizeBinaryArray::value_offset_at` works use `i32` arithmetic which can overflow. For offsets beyond `i32::MAX`, that can be bad # What changes are included in this PR? 1. Prevent any FixedSizedBinaryArrays from being constructed where the offset calculation could overflow 2. Add some other overflow checks As @adamreeve [pointed out](apache#9850 (comment)) on apache#9850 there are several places where the `i32` arithmetic is problematic in `FixedSizeBinaryArray`. I will fix them for real in a different, follow on PR, by switching to entirely `usize` based arithmetic for offset calculations However, since I hope to backport this PR to older releases, I would like something that is easy to review and has the least potential for unintended consequences. # Are these changes tested? I added unit tests. However, I can't find any way to fully trigger the actual paths short of trying to allocate very large arrays, which I don't think is appropriate for unit tests. # Are there any user-facing changes? Better limit checking
…ray (apache#9905) # Which issue does this PR close? - Part of apache#9906 - First follow on to apache#9872 # Rationale for this change While trying to avoid overflows due to using i32 arithmetic in FixedSizeBinaryArray, I found the use of the term `size` in parameters to be confusing when the field name is called `value_length` # What changes are included in this PR? Change several parameter / variable names to `value_length` to keep the code consistent # Are these changes tested? By CI # Are there any user-facing changes? No this is an internal code refactor
# Which issue does this PR close? - Part of apache#9859 # Rationale for this change Even though we just did a release from 58, I want to get a release out that has these changes: - apache#9872 - apache#9813 # What changes are included in this PR? 1. Update version to 58.3.0 2. Update CHANGELOG. See Rendered preview here: https://github.com/alamb/arrow-rs/blob/alamb/prepare_58.3.0/CHANGELOG.md # Are these changes tested? By CI # Are there any user-facing changes? yes
…e overflow checks (apache#9910) # Which issue does this PR close? - Closes apache#9906. # Rationale for this change `FixedSizeBinaryArray` still stores its public fixed width as `i32`, which means internal address calculations rely on repeated conversions between `i32` and pointer-sized offsets. We recently had issue with some i32 based arithmetic overflowing (see apache#9898) To avoid inadvertently using i32 arithmetic, this PR proposes to change the internal representation of the FixedSizeBinaryArray to use `usize` and compute byte positions using `usize` ( pointer-sized arithmetic) directly, with checked conversions only at the public API boundaries that still require `i32`. I am quite pleased it is a net reduction in lines of code (admittedly most of that was the checks added in apache#9872 # What changes are included in this PR? - Store the fixed-width element size as `value_size: usize` inside `FixedSizeBinaryArray`. - Rewrite internal position calculations in accessors and slicing to use `usize` arithmetic. - Remove the old `validate_lengths` invariant that existed only to keep internal `i32` offset arithmetic in range. - Remove implicit `as` casts from the implementation and replace them with checked conversions or typed bindings. # Are these changes tested? These changes are covered by CI. # Are there any user-facing changes? No. --------- Co-authored-by: Adam Reeve <adreeve@gmail.com> Co-authored-by: Adam Reeve <adam.reeve@gr-oss.io>
Which issue does this PR close?
FixedSizeBinaryArray::valueoffset truncation #9850Rationale for this change
FixedSizeBinaryArray::value_offset_atworks usei32arithmetic which can overflow. For offsets beyondi32::MAX, that can be badWhat changes are included in this PR?
As @adamreeve pointed out on #9850 there are several places where the
i32arithmetic is problematic inFixedSizeBinaryArray. I will fix them for real in a different, follow on PR, by switching to entirelyusizebased arithmetic for offset calculationsHowever, since I hope to backport this PR to older releases, I would like something that is easy to review and has the least potential for unintended consequences.
Are these changes tested?
I added unit tests. However, I can't find any way to fully trigger the actual paths short of trying to allocate very large arrays, which I don't think is appropriate for unit tests.
Are there any user-facing changes?
Better limit checking