fix(parquet): Avoid panic on malformed thrift bool fields in parquet metadata#9840
Conversation
Return a Parquet error when a compact thrift field is expected to be a bool but the field header contains a non-bool wire type. This replaces the remaining bool_val unwraps in generated thrift readers and the hand-expanded DataPageHeaderV2 reader. Add regression tests for malformed DictionaryPageHeader.is_sorted and DataPageHeaderV2.is_compressed headers to ensure corrupt input returns an error instead of panicking.
|
run benchmark metadata |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing fix-parquet-thrift-field-types-error-handling (d134340) to 4fa8d2f (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
etseidl
left a comment
There was a problem hiding this comment.
Looks like an improvement to me. Thanks @BoazC-MSFT
|
I plan to merge this after the 58.2.0 RC is tagged. Thanks again @BoazC-MSFT. |
…metadata (apache#9840) Return a Parquet error when a compact thrift field is expected to be a `bool` but the field header contains a non-bool wire type. This replaces the remaining `bool_val` unwraps in generated thrift readers and the hand-expanded `DataPageHeaderV2` reader. Add regression tests for malformed `DictionaryPageHeader.is_sorted` and `DataPageHeaderV2.is_compressed` headers to ensure corrupt input returns an error instead of panicking. # Which issue does this PR close? - Closes apache#9839. # Rationale for this change See apache#9839 # What changes are included in this PR? Proper error handling instead of using `unwrap`. # Are these changes tested? Yes. Added 2 UTs. # Are there any user-facing changes? An error string that can find its way all the way to the end user.
Return a Parquet error when a compact thrift field is expected to be a
boolbut the field header contains a non-bool wire type. This replaces the remainingbool_valunwraps in generated thrift readers and the hand-expandedDataPageHeaderV2reader.Add regression tests for malformed
DictionaryPageHeader.is_sortedandDataPageHeaderV2.is_compressedheaders to ensure corrupt input returns an error instead of panicking.Which issue does this PR close?
Rationale for this change
See #9839
What changes are included in this PR?
Proper error handling instead of using
unwrap.Are these changes tested?
Yes. Added 2 UTs.
Are there any user-facing changes?
An error string that can find its way all the way to the end user.