perf(c++): optimize primitive struct fields read performance#2960

chaokunyang · 2025-12-02T08:22:26Z

Why?

What does this PR do?

Type-based encoding detection: Added compile-time helpers that classify each primitive field by its wire encoding, fixed-width or zigzag varint:
- field_is_fixed_primitive() - bool, int8, uint8, int16, uint16, uint32, uint64, float, double
- field_is_varint_primitive() - int32_t, int, int64_t, long long (zigzag varint)
Optimized fixed field reading:
- Compute field offsets at compile time with compute_fixed_field_offset<T, I>()
- Read all fixed fields at absolute offsets without per-field reader_index updates
- Single reader_index update after all fixed fields
Optimized varint field reading:
- Track offset locally during batch reading
- Removed overly conservative max-varint-bytes pre-check (varints are variable-length)
- Single reader_index update after all varints
Three-phase deserialization:
- Phase 1: Batch read leading fixed-size primitives
- Phase 2: Batch read consecutive varint primitives
- Phase 3: Read remaining fields normally
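
The batching idea above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `is_fixed_primitive`, `is_varint_primitive`, `fixed_field_offset`, `Reader`, `Decoded`, and `read_struct` are hypothetical names, the fixed fields are assumed to be stored back to back in host byte order, and the varint layout (7-bit groups, low group first, zigzag-encoded) is an assumption. Bounds checks are omitted for brevity; a real reader must still validate the buffer length.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical classification helpers, modeled on the lists in the description.
template <typename T>
constexpr bool is_fixed_primitive() {
  return std::is_same_v<T, bool> || std::is_same_v<T, int8_t> ||
         std::is_same_v<T, uint8_t> || std::is_same_v<T, int16_t> ||
         std::is_same_v<T, uint16_t> || std::is_same_v<T, uint32_t> ||
         std::is_same_v<T, uint64_t> || std::is_same_v<T, float> ||
         std::is_same_v<T, double>;
}

template <typename T>
constexpr bool is_varint_primitive() {
  return std::is_same_v<T, int32_t> || std::is_same_v<T, int64_t>;
}

// Byte offset of field I: the sum of the sizes of fields 0..I-1,
// evaluated entirely at compile time.
template <typename Tuple, std::size_t I>
constexpr std::size_t fixed_field_offset() {
  if constexpr (I == 0) {
    return 0;
  } else {
    return fixed_field_offset<Tuple, I - 1>() +
           sizeof(std::tuple_element_t<I - 1, Tuple>);
  }
}

// Phase 1 helper: copy every fixed field from its absolute offset.
// No cursor is advanced inside this function.
template <typename Tuple, std::size_t... Is>
void read_fixed_fields(const uint8_t* base, Tuple& out,
                       std::index_sequence<Is...>) {
  (std::memcpy(&std::get<Is>(out), base + fixed_field_offset<Tuple, Is>(),
               sizeof(std::tuple_element_t<Is, Tuple>)),
   ...);
}

// Phase 2 helper: decode one zigzag varint, advancing a local offset.
inline int32_t read_zigzag_varint32(const uint8_t* data, std::size_t& offset) {
  uint32_t raw = 0;
  for (int shift = 0; shift < 35; shift += 7) {
    const uint8_t byte = data[offset++];
    raw |= static_cast<uint32_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) break;
  }
  return static_cast<int32_t>((raw >> 1) ^ (0u - (raw & 1u)));
}

struct Reader {
  const uint8_t* data;
  std::size_t reader_index;
};

using FixedFields = std::tuple<uint8_t, uint16_t, uint32_t>;

struct Decoded {
  FixedFields fixed;
  int32_t a;
  int32_t b;
};

static_assert(is_fixed_primitive<uint16_t>() && is_varint_primitive<int32_t>());
static_assert(fixed_field_offset<FixedFields, 2>() == 3);

inline Decoded read_struct(Reader& r) {
  Decoded out{};
  // Phase 1: batch read fixed fields, then bump reader_index once.
  read_fixed_fields(r.data + r.reader_index, out.fixed,
                    std::make_index_sequence<std::tuple_size_v<FixedFields>>{});
  r.reader_index += fixed_field_offset<FixedFields, 3>();
  // Phase 2: batch read varints with a local offset, then bump once.
  std::size_t offset = r.reader_index;
  out.a = read_zigzag_varint32(r.data, offset);
  out.b = read_zigzag_varint32(r.data, offset);
  r.reader_index = offset;
  // Phase 3 (remaining fields, read normally) is not shown.
  return out;
}
```

Because the fixed-field offsets are compile-time constants, the compiler can turn phase 1 into a handful of loads at known displacements, and `reader_index` is written only twice for the whole struct.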

Related issues

#2958
#2906

Does this PR introduce any user-facing change?

Does this PR introduce any public API change?
Does this PR introduce any binary protocol compatibility change?

Benchmark

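| Datatype | Operation | Fory (ns) | Protobuf (ns) | Faster |
|----------|-----------|-----------|---------------|--------|
| Sample | Serialize | 103.9 | 59.2 | Protobuf (1.8x) |
| Sample | Deserialize | 329.3 | 478.1 | Fory (1.5x) |
| Struct | Serialize | 10.3 | 20.1 | Fory (1.9x) |
| Struct | Deserialize | 19.1 | 16.0 | Protobuf (1.2x) |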

chaokunyang · 2025-12-02T17:19:13Z

The benchmark is unfair: Fory actually has the same performance as Protobuf for Sample serialization, but in the current benchmark Fory does not write to a preallocated buffer; it allocates a new buffer on every call.

This is fixed in #2963
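
To illustrate why the allocation matters for the measurement, here is a small sketch of the two call shapes. It is an assumption-laden illustration: `Sample`, `append_le32`, `serialize_alloc`, and `serialize_into` are hypothetical and are not the Fory or benchmark API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical message type, for illustration only.
struct Sample {
  int32_t id;
  int32_t value;
};

// Append a 32-bit value as four little-endian bytes.
inline void append_le32(std::vector<uint8_t>& out, int32_t v) {
  const auto u = static_cast<uint32_t>(v);
  for (int shift = 0; shift < 32; shift += 8) {
    out.push_back(static_cast<uint8_t>(u >> shift));
  }
}

// Variant A: a fresh vector per call, so every iteration of a benchmark
// loop also pays for a heap allocation and deallocation.
inline std::vector<uint8_t> serialize_alloc(const Sample& s) {
  std::vector<uint8_t> out;
  append_le32(out, s.id);
  append_le32(out, s.value);
  return out;
}

// Variant B: the caller owns the buffer. clear() resets the size but keeps
// the capacity, so repeated calls reuse the same storage.
inline void serialize_into(const Sample& s, std::vector<uint8_t>& out) {
  out.clear();
  append_le32(out, s.id);
  append_le32(out, s.value);
}
```

Comparing variant A for one library against variant B for the other measures the allocator as much as the serializer, which is the imbalance described above.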

## Why?

The previous benchmark was not fair:
- Protobuf encodes a negative varint in 5 bytes, while Fory may use only one byte. For small varints, Fory also pays the zigzag cost. This is not a fair comparison.
- When serializing Sample, Fory allocated a vector every time, while Protobuf serialized into a buffer.

## What does this PR do?

- Make NumericStruct contain int32 values of all sizes, both positive and negative
- Make Fory serialize Sample into a buffer

With this fairer comparison, Fory has similar performance to Protobuf.

## Related issues

#2958
#2960

## Does this PR introduce any user-facing change?

- [ ] Does this PR introduce any public API change?
- [ ] Does this PR introduce any binary protocol compatibility change?

## Benchmark

| Datatype | Operation | Fory (ns) | Protobuf (ns) | Faster |
|----------|-----------|-----------|---------------|--------|
| Sample | Serialize | 345.6 | 316.4 | Protobuf (1.1x) |
| Sample | Deserialize | 1376.4 | 1374.6 | Protobuf (1.0x) |
| Struct | Serialize | 129.4 | 157.0 | Fory (1.2x) |
| Struct | Deserialize | 207.5 | 154.4 | Protobuf (1.3x) |

---------

Co-authored-by: Pan Li <1162953505@qq.com>
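
As a rough illustration of how the encoded size of an integer depends on the scheme, the sketch below counts varint bytes with and without zigzag. `varint_size` and `zigzag32` are hypothetical helpers written for this example. They assume the common 7-bits-per-byte varint layout and are not taken from either library.

```cpp
#include <cstddef>
#include <cstdint>

// Number of bytes a 7-bits-per-byte varint needs for an unsigned value.
constexpr std::size_t varint_size(uint64_t v) {
  std::size_t n = 1;
  while (v >= 0x80) {
    v >>= 7;
    ++n;
  }
  return n;
}

// Zigzag maps small magnitudes of either sign to small unsigned values:
// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
constexpr uint32_t zigzag32(int32_t v) {
  return (static_cast<uint32_t>(v) << 1) ^ (v < 0 ? 0xFFFFFFFFu : 0u);
}

// -1 with zigzag fits in one byte.
static_assert(varint_size(zigzag32(-1)) == 1);
// -1 as a raw 32-bit two's-complement pattern needs five bytes.
static_assert(varint_size(static_cast<uint32_t>(-1)) == 5);
// Zigzag halves the one-byte range for positive values: 64 needs two bytes.
static_assert(varint_size(zigzag32(64)) == 2);
static_assert(varint_size(64) == 1);
```

A benchmark struct that holds only small positive or only negative values therefore favors one scheme, which is why mixing magnitudes and signs gives a more balanced comparison.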

optimize primitive fields read

dfb8f38

chaokunyang requested review from LiangliangSui, pandalee99, theweipeng and urlyy December 2, 2025 08:22

chaokunyang requested a review from PragmaTwice as a code owner December 2, 2025 08:22

separate fields write for primitives too

57225c7

LiangliangSui approved these changes Dec 2, 2025

remove lambda from hotpath

a204602

chaokunyang merged commit f384e4f into apache:main Dec 2, 2025
56 checks passed

chaokunyang mentioned this pull request Dec 2, 2025

RoadMap for 1.0 #1017

Open

16 tasks

chaokunyang mentioned this pull request Dec 2, 2025

perf(c++): fair benchmark for cpp #2963

Merged

2 tasks
