perf(c++): optimize primitive struct fields read performance#2960
Merged
chaokunyang merged 3 commits intoapache:mainfrom Dec 2, 2025
Merged
perf(c++): optimize primitive struct fields read performance#2960chaokunyang merged 3 commits intoapache:mainfrom
chaokunyang merged 3 commits intoapache:mainfrom
Conversation
LiangliangSui
approved these changes
Dec 2, 2025
Collaborator
Author
|
The benchmark is unfair, fory is actually same performance for Sample serialization, but in current benchmark, fory didn't wirte to a preallocsated buffer, it allocate a buffer every time. This is fixed in #2963 |
2 tasks
pandalee99
added a commit
that referenced
this pull request
Dec 2, 2025
## Why? The previouse benchmark is not fair: - Protobuf encode negative varint use 5 bytes, but fory may only use one bytes. And for small varint, fory has zigzag cost. this is not a fair compare - When serialize Sample, Fory allocate a vector every time, but protobuf serialize to a buffer instead. ## What does this PR do? - Make NumericStruct contains int32 of all kinds size, and positive and negative - Make fory serialize to a buffer to for sample With this fair compare, fory is similair performance as protobuf ## Related issues #2958 #2960 ## Does this PR introduce any user-facing change? <!-- If any user-facing interface changes, please [open an issue](https://github.com/apache/fory/issues/new/choose) describing the need to do so and update the document if necessary. Delete section if not applicable. --> - [ ] Does this PR introduce any public API change? - [ ] Does this PR introduce any binary protocol compatibility change? ## Benchmark | Datatype | Operation | Fory (ns) | Protobuf (ns) | Faster | |----------|-----------|-----------|---------------|--------| | Sample | Serialize | 345.6 | 316.4 | Protobuf (1.1x) | | Sample | Deserialize | 1376.4 | 1374.6 | Protobuf (1.0x) | | Struct | Serialize | 129.4 | 157.0 | Fory (1.2x) | | Struct | Deserialize | 207.5 | 154.4 | Protobuf (1.3x) | --------- Co-authored-by: Pan Li <1162953505@qq.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why?
What does this PR do?
- Compute field offsets at compile time with compute_fixed_field_offset<T, I>()
- Read all fixed fields at absolute offsets without per-field reader_index updates
- Single reader_index update after all fixed fields
- Track offset locally during batch reading
- Removed overly conservative max-varint-bytes pre-check (varints are variable-length)
- Single reader_index update after all varints
- Phase 1: Batch read leading fixed-size primitives
- Phase 2: Batch read consecutive varint primitives
- Phase 3: Read remaining fields normally
Related issues
#2958
#2906
Does this PR introduce any user-facing change?
Benchmark