Docs: add example of how to read parquet row groups in parallel #9396

Merged
alamb merged 5 commits into apache:main from alamb:alamb/parallel_parquet_reading on Apr 22, 2026

Contributor

@alamb alamb commented Feb 11, 2026

Which issue does this PR close?

- Closes #9381

Rationale for this change

It is already possible to read a Parquet file in parallel with the arrow-rs APIs by creating a separate reader for each part of the file. However, it is not always obvious how to do so, as @pmarks observes in #9381.
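
The pattern itself (one independent reader per part of the file, each with its own position in the same underlying source) can be sketched without any Parquet dependency. The following is a std-only illustration under stated assumptions, not the arrow-rs API: `Part`, `read_part`, and `read_parts_in_parallel` are hypothetical names, and byte ranges over an in-memory buffer stand in for row groups.

```rust
use std::io::{Cursor, Read, Seek, SeekFrom};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a row group: a byte range in the source.
#[derive(Clone, Copy)]
struct Part {
    offset: u64,
    len: usize,
}

/// Reads one part through a reader that owns its own cursor, so its seeks
/// cannot interfere with any other reader of the same source.
fn read_part(source: Arc<Vec<u8>>, part: Part) -> std::io::Result<Vec<u8>> {
    let mut reader = Cursor::new(source.as_slice());
    reader.seek(SeekFrom::Start(part.offset))?;
    let mut buf = vec![0u8; part.len];
    // Fails with UnexpectedEof if the range runs past the end of the source.
    reader.read_exact(&mut buf)?;
    Ok(buf)
}

/// Spawns one thread per part and returns the results in part order.
fn read_parts_in_parallel(
    source: Arc<Vec<u8>>,
    parts: &[Part],
) -> std::io::Result<Vec<Vec<u8>>> {
    let handles: Vec<_> = parts
        .iter()
        .map(|&part| {
            // Each reader gets its own handle on the shared source.
            let source = Arc::clone(&source);
            thread::spawn(move || read_part(source, part))
        })
        .collect();

    // Joining in spawn order keeps the output order deterministic,
    // regardless of which thread finishes first.
    handles
        .into_iter()
        .map(|handle| handle.join().expect("reader thread panicked"))
        .collect()
}
```

Giving every reader its own cursor is the key design choice here: readers that share one file position would race on seeks. In arrow-rs the analogous step is building one reader per row group, as the doc example added by this PR shows with the real API.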

What changes are included in this PR?

Add documentation explaining how to read files in parallel, along with a doc example.

Here is an example of what it looks like rendered:

(Image: Screenshot 2026-02-11 at 5 33 02 PM)

Are these changes tested?

By CI

Are there any user-facing changes?

More docs; no functional changes.

@alamb alamb added the documentation label (Improvements or additions to documentation) Feb 11, 2026
@github-actions github-actions Bot added the parquet label (Changes to the parquet crate) Feb 11, 2026
@alamb alamb mentioned this pull request Feb 11, 2026
@alamb alamb marked this pull request as ready for review February 11, 2026 22:33
Comment thread parquet/src/arrow/async_reader/mod.rs Outdated
/// Each [`ParquetRecordBatchStream`] is independent and can be used to read
/// from the same underlying source in parallel. Use
/// [`ParquetRecordBatchStream::next_row_group`] with a single stream to
/// begin prefetching the next Row Group. To read a read in parallel, create
Contributor

read a file

Comment thread parquet/src/arrow/async_reader/mod.rs Outdated
Contributor

@etseidl etseidl left a comment

Looks like a nice addition. Thanks @alamb

Comment thread parquet/src/arrow/async_reader/mod.rs Outdated
/// )?;
/// let mut streams = vec![];
/// for row_group_index in 0..10 {
/// // each stream needs its own source instance to issue parallel IO requests, so we clone the file for each stream
Contributor

nit: could break this line, it gets pretty wide when rendered

Contributor Author

eased the eyes in b9e9de8 hopefully

@alamb alamb changed the title from "Docs: add exmaple of how to read parquet row groups in parallel" to "Docs: add example of how to read parquet row groups in parallel" Apr 21, 2026
@alamb alamb merged commit 98f8450 into apache:main Apr 22, 2026
16 checks passed
@alamb alamb deleted the alamb/parallel_parquet_reading branch April 22, 2026 04:21
Contributor Author

alamb commented Apr 22, 2026

Thanks again @etseidl

Rich-T-kid pushed a commit to Rich-T-kid/arrow-rs that referenced this pull request Jun 2, 2026
…he#9396)

# Which issue does this PR close?

- Closes apache#9381

Labels

documentation Improvements or additions to documentation parquet Changes to the parquet crate

Development

Successfully merging this pull request may close these issues.

Parallel Parquet Reading

3 participants