Fixed bug in RLE/bitpacking hybrid algorithm #640
Merged
norberttech merged 2 commits into flow-php:1.x on Oct 24, 2023
norberttech
commented
Oct 24, 2023
$columnChunkContainers = [];
$previousChunkData = null;

foreach (\array_chunk($this->data, 1000) as $dataChunk) {
Member
Author
This is hardcoded for now, but the idea is to take a chunk of data, build a data page, and check whether it is bigger than 8 KB (if not, drop it, merge in the data from the next chunk, and try again).
Of course, the data page size and the chunk size will come from the configuration, with default values:
- data page size = 8 KB
- data page probe rows count = 1_000
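The probing idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `splitIntoPages()` and the `$encodePage` callback are hypothetical names, not part of the flow-php API, and the defaults simply mirror the values listed above.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not flow-php API): groups rows into data pages by
 * probing fixed-size chunks until the encoded page exceeds the target size.
 *
 * @param array<int, mixed> $rows
 * @param callable(array): string $encodePage assumed encoder returning the page bytes
 *
 * @return array<int, string> encoded pages
 */
function splitIntoPages(
    array $rows,
    callable $encodePage,
    int $pageSizeBytes = 8 * 1024,
    int $probeRowsCount = 1_000
): array {
    $pages = [];
    $pending = [];

    foreach (\array_chunk($rows, $probeRowsCount) as $chunk) {
        // Merge the new chunk into rows left over from a page that was too small.
        $pending = \array_merge($pending, $chunk);
        $page = $encodePage($pending);

        if (\strlen($page) > $pageSizeBytes) {
            // Big enough: keep the page and start collecting rows for the next one.
            $pages[] = $page;
            $pending = [];
        }
        // Otherwise the undersized page is dropped and rebuilt with the next chunk.
    }

    if ($pending !== []) {
        // Whatever is left forms the last, possibly undersized, page.
        $pages[] = $encodePage($pending);
    }

    return $pages;
}
```

For example, with an assumed encoder that writes each row as a 4-byte integer (`fn (array $r): string => \pack('N*', ...$r)`), 5,000 rows would be grouped into one page of 3,000 rows and a final page of 2,000 rows. Re-encoding the pending rows on every probe is wasteful; a real writer would more likely track the encoded size incrementally.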
norberttech
commented
Oct 24, 2023
foreach ($floatBytes as $bytes) {
-    $floats[] = \unpack($this->byteOrder === ByteOrder::LITTLE_ENDIAN ? 'g' : 'G', \pack('C*', ...$bytes))[1];
+    $floats[] = \round(\unpack($this->byteOrder === ByteOrder::LITTLE_ENDIAN ? 'g' : 'G', \pack('C*', ...$bytes))[1], 7);
Member
Author
Yeah, PHP is a bit awkward with floats; I can't find a way to write/read floats without losing precision.
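A small sketch of why the rounding helps: the `g`/`G` formats read a 32-bit float, which PHP then widens to its native 64-bit float, so the single-precision representation error becomes visible. The `readFloat32()` helper below is hypothetical and only illustrates the workaround from the diff; it is not the library's reader.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not flow-php API): decodes a 4-byte little-endian
 * IEEE 754 float and rounds it to hide single-precision noise.
 *
 * @param array<int, int> $bytes four byte values (0-255)
 */
function readFloat32(array $bytes, int $precision = 7): float
{
    // 'g' = little-endian 32-bit float; the result is widened to a 64-bit float.
    $value = \unpack('g', \pack('C*', ...$bytes))[1];

    return \round($value, $precision);
}

// Write 0.1 as a 32-bit float, then read it back with and without rounding.
$bytes = \array_values(\unpack('C*', \pack('g', 0.1)));
$raw = \unpack('g', \pack('C*', ...$bytes))[1];
$rounded = readFloat32($bytes);

\var_dump($raw === 0.1);     // bool(false): the widened value is about 0.10000000149
\var_dump($rounded === 0.1); // bool(true)
```

Rounding to 7 decimal places is a workaround rather than a lossless round trip: a 32-bit float carries only about 7 significant decimal digits in total, so values with a large integer part can still come back with visible noise after rounding.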
Change Log
Added
Fixed
Changed
Removed
Deprecated
Security
Description
Ref: #575
While progressing on the writer, I'm discovering some bugs. This PR started as an approach to split rows into multiple data pages when the size of a single one becomes too large according to Parquet recommendations.