Parquet Reader #576
Merged
norberttech
merged 2 commits into
flow-php:1.x
from
norberttech:feature/parquet-reader-writer
Oct 13, 2023
b2d4455 to
7860195
Compare
stloyd
reviewed
Oct 13, 2023
src/adapter/etl-adapter-parquet/src/Flow/ETL/Adapter/Parquet/ParquetExtractor.php
e559f6f to
9adcc1b
Compare
9adcc1b to
d9d883c
Compare
stloyd
reviewed
Oct 13, 2023
- Added reading MAP logical types
- Added reading LIST type
- Fixed nullable lists handling
- First implementation of Dremel encoding
- Added logging to Dremel algorithm
- Reconstruction of nested data structures based on schema definition
- Performance optimization
- Attempt to read map with values as lists
- Rebuilding structures
- Extracted rebuilding columns logic to ColumnBuilder class
- Reading nested structures
- Restored usage of flow array functions
- Added dremel/parquet to test suite
- Updated github workflows
- Avoid calculating remaining length/current position in BinaryReader on the fly
- Make DataSize value object mutable
- Move reading multiple values from Buffer into BufferReader
- Allow to read from stream
- Retrieve column chunks as generator
- Moved reading flat columns into generics
- Read parquet files struct columns through generators
- Fixed reading column chunks
- Reduced number of iterations over generators
- Keep stream offset to avoid generators overlapping
- Read all column chunks from a row group at once to avoid dealing with rows split between pages
- Added notes for performance optimizations
- Added PageByPage ChunkReader implementation
- Fixed reading bytes of array when it's not a string
- Adjusted schema ddl generation
- Allow to limit numbers of returned rows
- Fixed limit when there is more than one column chunk
- Adjusted composer.json files in all subrepos
- Added Parquet Reader options: handling INT96 as DateTime, reading byte arrays as strings, converting nanos to micros timestamps
- Marked codename parquet extractor as deprecated
- Added snappy extension detection
- Converted testsuite fixtures into gzip from snappy
- Fixed issues related to missing snappy_uncompress function
- Added python scripts used to generate test/fixtures data for reader
- Added resources folder into gitattributes as export-ignore
- Close stream on ParquetFile destructor
- Static analysis fixes
- Detached Thrift from Flow Parquet Schema in order to reuse objects by writer
- CR Fixes
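The reader options mentioned above (INT96 as DateTime, nanos to micros) can be illustrated with a minimal sketch. It is written in Python for brevity and is not the code from this PR. It assumes the common Parquet INT96 timestamp layout: 8 little-endian bytes of nanoseconds within the day, followed by 4 little-endian bytes holding the Julian day number. The function name `int96_to_datetime` is hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian day number of 1970-01-01, used to rebase onto the Unix epoch.
JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588


def int96_to_datetime(raw: bytes) -> datetime:
    """Decode a 12-byte INT96 timestamp into a UTC datetime.

    Hypothetical helper: assumes 8 bytes of nanoseconds-of-day followed by
    a 4-byte Julian day, both little-endian.
    """
    if len(raw) != 12:
        raise ValueError(f"INT96 value must be 12 bytes, got {len(raw)}")
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    # Python datetimes hold microseconds, so nanoseconds are truncated here
    # (the "nanos to micros" conversion).
    micros_of_day = nanos_of_day // 1_000
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(
        days=julian_day - JULIAN_DAY_OF_UNIX_EPOCH,
        microseconds=micros_of_day,
    )
```

Truncating rather than rounding keeps the result from ever landing in the following day; whether the reader in this PR truncates or rounds is not stated in the commit message.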
d9d883c to
968b5e3
Compare
Change Log
Added
Fixed
Changed
Removed
Deprecated
Security
Description
Refs: #506