feat(parquet): add sparse-column writer benchmarks #9654

Merged: alamb merged 1 commit into apache:main from HippoBaro:sparse_benchmarks on Apr 7, 2026
@HippoBaro HippoBaro commented Apr 2, 2026

Contributor

Which issue does this PR close?

Rationale for this change

Measure sparse and all-null cases in benchmarks.

What changes are included in this PR?

Add three new benchmark cases to the arrow_writer benchmark suite for evaluating write performance on sparse and all-null data:

  • primitive_sparse_99pct_null: a flat primitive column with 99% nulls, exercising long RLE runs in definition levels.
  • list_primitive_sparse_99pct_null: a list-of-primitive column with 99% nulls, exercising null batching in the list level builder.
  • primitive_all_null: a flat primitive column with 100% nulls, exercising the uniform_levels fast path for entirely-null columns.
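To make the "long RLE runs" and "uniform levels" wording concrete, here is a minimal std-only sketch. It is not the benchmark code from this PR: `XorShift`, `definition_levels`, and `runs` are hypothetical helpers invented for illustration, and the real Parquet level encoding is an RLE/bit-packing hybrid rather than the plain run list shown here. It only demonstrates how the null ratio shapes the definition levels of a flat nullable column (0 = null, 1 = present).

```rust
/// Tiny deterministic xorshift generator so the sketch needs no external crates.
/// (Hypothetical helper, not part of arrow-rs.)
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Definition levels for a flat nullable column: 0 = null, 1 = value present.
/// `null_pct` is the approximate percentage of null slots (0..=100).
fn definition_levels(len: usize, null_pct: u64, seed: u64) -> Vec<i16> {
    let mut rng = XorShift(seed); // seed must be non-zero for xorshift
    (0..len)
        .map(|_| if rng.next() % 100 < null_pct { 0 } else { 1 })
        .collect()
}

/// Collapse levels into (level, run_length) pairs, roughly what a
/// run-length encoder gets to work with.
fn runs(levels: &[i16]) -> Vec<(i16, usize)> {
    let mut out: Vec<(i16, usize)> = Vec::new();
    for &level in levels {
        if let Some(last) = out.last_mut() {
            if last.0 == level {
                last.1 += 1;
                continue;
            }
        }
        out.push((level, 1));
    }
    out
}

fn main() {
    let len = 1_000_000;
    for (name, null_pct) in [("sparse_99pct_null", 99), ("all_null", 100)] {
        let levels = definition_levels(len, null_pct, 42);
        let encoded = runs(&levels);
        let longest = encoded.iter().map(|r| r.1).max().unwrap_or(0);
        println!("{}: {} runs, longest run = {}", name, encoded.len(), longest);
    }
}
```

With 100% nulls every level is 0, so the whole column collapses into a single run; that is the shape a uniform-levels fast path can exploit. At 99% nulls the levels break into many runs of zeros separated by isolated ones.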

Are these changes tested?

N/A

Are there any user-facing changes?

None.

Baseline results (Apple M1 Max):

  primitive_sparse_99pct_null/default          40.3 ms
  primitive_sparse_99pct_null/parquet_2        43.5 ms
  primitive_sparse_99pct_null/zstd_parquet_2   44.4 ms
  list_primitive_sparse_99pct_null/default     39.9 ms
  list_primitive_sparse_99pct_null/parquet_2   39.9 ms
  list_primitive_sparse_99pct_null/zstd_p2     40.7 ms
  primitive_all_null/default                   38.0 ms
  primitive_all_null/parquet_2                 36.9 ms
  primitive_all_null/zstd_parquet_2            36.1 ms

Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>

@etseidl etseidl left a comment

Contributor

LGTM

etseidl commented Apr 2, 2026

Contributor

I'm holding off merging in case we want to add more cases.

alamb commented Apr 6, 2026

Contributor

Are we still waiting on edge cases?

etseidl commented Apr 6, 2026

Contributor

> Are we still waiting on edge cases?

Yes, but I don't think we need to hold this up for them. They can be a separate PR if necessary.

alamb merged commit f5d6dc3 into apache:main on Apr 7, 2026
17 checks passed
alamb commented Apr 7, 2026

Contributor

🚀

Rich-T-kid pushed a commit to Rich-T-kid/arrow-rs that referenced this pull request Jun 2, 2026
# Which issue does this PR close?

- None, but relates to apache#9652

Labels

parquet Changes to the parquet crate
