Add get_byte_ranges method to AsyncFileReader trait #2115
tustvold merged 4 commits into apache:master
Conversation
Codecov Report

@@            Coverage Diff             @@
##           master    #2115      +/-   ##
==========================================
- Coverage   83.76%   83.74%    -0.03%
==========================================
  Files         225      225
  Lines       59457    59473       +16
==========================================
- Hits        49806    49805        -1
- Misses       9651     9668       +17
parquet/src/arrow/async_reader.rs (outdated)

    .get_byte_ranges(fetch_ranges)
    .await?
    .into_iter()
    .enumerate()

.zip(update_chunks.iter_mut()) might be cleaner?
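The suggestion replaces index bookkeeping with direct pairing: `.zip` walks the fetched results and their destination slots in lockstep. A minimal synchronous sketch of the difference (the names `fetched` and `update_chunks` are illustrative stand-ins, not the PR's actual bindings):

```rust
fn main() {
    // Chunks fetched for the requested byte ranges, in request order.
    let fetched = vec![vec![1u8, 2], vec![3, 4]];
    // Destination slots to be filled with the fetched data.
    let mut update_chunks: Vec<Option<Vec<u8>>> = vec![None, None];

    // Instead of .enumerate() and indexing back into update_chunks,
    // pair each fetched chunk with its destination slot directly.
    for (data, slot) in fetched.into_iter().zip(update_chunks.iter_mut()) {
        *slot = Some(data);
    }

    assert_eq!(update_chunks[0].as_deref(), Some(&[1u8, 2][..]));
    assert_eq!(update_chunks[1].as_deref(), Some(&[3u8, 4][..]));
}
```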
parquet/src/arrow/async_reader.rs (outdated)

    let mut fetch_ranges =
        Vec::with_capacity(column_chunks.len());

    let mut update_chunks: Vec<(

My gut says that it would be cleaner to just iterate through the column_chunks and use filter_map to extract the ranges, pass this to AsyncFileReader, convert the result to an iterator, and then iterate the column_chunks again, popping the next element from the iterator for each included column.
Not a big deal though
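The suggested two-pass shape can be sketched as follows. This is a synchronous sketch under assumptions: `ColumnChunk` is a hypothetical stand-in for the PR's per-column state, and `get_byte_ranges` here is a plain function standing in for the async trait method.

```rust
use std::ops::Range;

// Hypothetical stand-in for the PR's per-column state: a column is
// "included" when it still has a byte range to fetch.
struct ColumnChunk {
    range: Option<Range<usize>>,
    data: Option<Vec<u8>>,
}

// Synchronous stand-in for AsyncFileReader::get_byte_ranges.
fn get_byte_ranges(file: &[u8], ranges: Vec<Range<usize>>) -> Vec<Vec<u8>> {
    ranges.into_iter().map(|r| file[r].to_vec()).collect()
}

fn main() {
    let file: Vec<u8> = (0..10).collect();
    let mut column_chunks = vec![
        ColumnChunk { range: Some(0..2), data: None },
        ColumnChunk { range: None, data: None }, // excluded column
        ColumnChunk { range: Some(4..7), data: None },
    ];

    // Pass 1: filter_map extracts the ranges of the included columns.
    let fetch_ranges: Vec<_> = column_chunks
        .iter()
        .filter_map(|c| c.range.clone())
        .collect();

    // Fetch, then pass 2: pop the next result for each included column.
    let mut results = get_byte_ranges(&file, fetch_ranges).into_iter();
    for chunk in column_chunks.iter_mut().filter(|c| c.range.is_some()) {
        chunk.data = results.next();
    }

    assert_eq!(column_chunks[0].data.as_deref(), Some(&[0u8, 1][..]));
    assert!(column_chunks[1].data.is_none());
    assert_eq!(column_chunks[2].data.as_deref(), Some(&[4u8, 5, 6][..]));
}
```

The design point is that results arrive in the same order the ranges were requested, so the second pass over the same predicate stays aligned with the results iterator.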
yeah, I think this was cleaner
Benchmark runs are scheduled for baseline = 3096591 and contender = be0d34d. be0d34d is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Which issue does this PR close?
Closes #2110
Rationale for this change
In certain cases it is better from a performance perspective to fetch data in parallel, such as when reading from object storage. This adds a hook into the `AsyncFileReader` trait to allow upstream consumers of this API to do that.
What changes are included in this PR?
Add a `get_byte_ranges(&mut self, ranges: Vec<Range<usize>>)` method to the `AsyncFileReader` trait, with a default implementation that falls back to calling `get_bytes` serially for the provided ranges.
Are there any user-facing changes?
No
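The shape of the default method can be sketched with a synchronous analog. This is an assumption-laden sketch, not the PR's code: the real `AsyncFileReader` methods are async and return boxed futures, and `InMemory` is a hypothetical implementor added here for illustration.

```rust
use std::ops::Range;

// Synchronous analog of the trait change; the real AsyncFileReader
// returns futures, but the default-method shape is the same.
trait FileReader {
    fn get_bytes(&mut self, range: Range<usize>) -> Vec<u8>;

    // Default implementation: fetch each range serially via get_bytes.
    // Implementations backed by object storage can override this to
    // issue the requests in parallel.
    fn get_byte_ranges(&mut self, ranges: Vec<Range<usize>>) -> Vec<Vec<u8>> {
        ranges.into_iter().map(|r| self.get_bytes(r)).collect()
    }
}

// Hypothetical implementor that only provides the required method,
// inheriting the serial fallback.
struct InMemory(Vec<u8>);

impl FileReader for InMemory {
    fn get_bytes(&mut self, range: Range<usize>) -> Vec<u8> {
        self.0[range].to_vec()
    }
}

fn main() {
    let mut reader = InMemory((0..8).collect());
    let chunks = reader.get_byte_ranges(vec![0..2, 5..8]);
    assert_eq!(chunks, vec![vec![0, 1], vec![5, 6, 7]]);
}
```

Because the new method has a default body, existing `AsyncFileReader` implementors keep compiling unchanged, which is why there are no user-facing changes.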