branch-4.0: [enhancement](parquet) Optimize the performance of parquet reader when decode RLE_DICTIONARY encoding #57208 #57563

Merged
yiguolei merged 1 commit into branch-4.0 from auto-pick-57208-branch-4.0
Nov 10, 2025

@github-actions
Contributor

Cherry-picked from #57208

… decode RLE_DICTIONARY encoding (#57208)

### What problem does this PR solve?
Problem Summary:
When decoding RLE_DICTIONARY encoding, the parquet reader uses memcpy
uniformly for every value. However, for fixed-width types such as INT32
and INT64, direct assignment is faster than memcpy.

In Parquet dictionary encoding, the values referenced by consecutive
indices are not stored contiguously in the dictionary, so each memcpy
copies only a single small value. Looking at the implementation of
`memcpy`, we can see that for such small sizes `__builtin_memcpy` is
used instead, and `__builtin_memcpy` essentially behaves like a series
of simple assignments. The corresponding assembly code can be seen here:
https://godbolt.org/z/r9Ma1ozvd.
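
To illustrate the difference described above, here is a minimal sketch of the two decoding strategies. It is not the actual Doris code: `decode_dict_memcpy` and `decode_dict_assign` are hypothetical names, and the dictionary is assumed to be a flat array of fixed-width values. The generic path passes the value width as a runtime argument, while the typed path knows the width at compile time and uses plain assignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical generic path: every value is copied with memcpy, and the
// value width (type_length) is only known at run time.
inline void decode_dict_memcpy(const uint8_t* dict, size_t type_length,
                               const std::vector<uint32_t>& indices, uint8_t* out) {
    for (size_t i = 0; i < indices.size(); ++i) {
        // Each copy moves just type_length bytes from a scattered dictionary slot.
        std::memcpy(out + i * type_length, dict + indices[i] * type_length, type_length);
    }
}

// Hypothetical typed path: T (e.g. int32_t, int64_t) fixes the width at
// compile time, so each value is moved with a single assignment.
template <typename T>
void decode_dict_assign(const T* dict, size_t dict_size,
                        const std::vector<uint32_t>& indices, T* out) {
    for (size_t i = 0; i < indices.size(); ++i) {
        assert(indices[i] < dict_size); // dictionary index must be in range
        out[i] = dict[indices[i]];
    }
}
```

Both functions produce the same output for the same dictionary and indices; the typed version simply gives the compiler a fixed-size load and store per value. How much faster it is in practice depends on the compiler, the flags, and the data, so it should be measured rather than assumed.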

@github-actions github-actions bot requested a review from yiguolei as a code owner October 31, 2025 05:06
@Thearas
Contributor

Thearas commented Oct 31, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring reopened this Oct 31, 2025
@Thearas
Contributor

Thearas commented Oct 31, 2025

run buildall

@yiguolei yiguolei merged commit 2dbbc6c into branch-4.0 Nov 10, 2025
24 of 27 checks passed
@github-actions github-actions bot deleted the auto-pick-57208-branch-4.0 branch November 10, 2025 02:10