Skip to content

Shared canonical Huffman builder (hash.huffman) for compress.deflate and net.http HPACK #27358

Description

@quaesitor-scientiam

Summary

compress.deflate and the new HTTP/2 HPACK code (#27353) both implement
canonical Huffman coding. Per discussion on #27353 (with @JalonSolov), the
shared piece should be factored into a common builder — tentatively
hash.huffman — that both modules use, rather than duplicating the logic.

This issue tracks the design and the follow-up PR. PR #27353 lands as-is; this
is deliberately separate so the cross-cutting refactor of compress.deflate
(which is stable and used by zlib/gzip) gets its own review.

What is actually shared

The canonical code-assignment step: given an array of per-symbol bit lengths,
produce the per-symbol codes (and a decode structure). Verified that HPACK's
RFC 7541 Appendix B table is canonical — all 257 codes rebuild exactly from the
lengths alone using the same bl_count/next_code algorithm compress.deflate
already uses. So HPACK could ship only the lengths and derive its codes via the
shared builder.

What differs (and must be parameters, not assumptions)

  • Max code length: DEFLATE caps at 15 bits and decodes via a single flat
    2^max_bits table (~32K entries); HPACK codes go up to 30 bits, where a flat
    2^30 table is infeasible (needs a tree / multi-level table / bit-at-a-time
    decode). So the builder must choose its decode representation based on the
    bit-width.
  • Bit order: DEFLATE is LSB-first (it bit_reverses every code); HPACK is
    MSB-first.
  • Not shared: the actual bit I/O loops, and codec-specific semantics
    (HPACK EOS/all-ones padding; DEFLATE end-of-block, extra-bits, distance
    alphabet, length-code RLE).

Proposed shape (per @JalonSolov)

A hash.huffman builder parameterized by:

  • number of bits (max code length), and
  • bit order (LSB vs MSB).

Open design question raised on the PR: whether those params should have
defaults. Current lean is no defaults, with the config marked required /
@[noinit] so callers must specify them explicitly. Decide as part of the PR.

Acceptance criteria

  • A shared canonical-Huffman builder in hash.huffman (or agreed location).
  • compress.deflate migrated to it with no regression to its flat-table
    decode performance (benchmark before/after).
  • net.http HPACK migrated to build its codes from the lengths array via the
    shared builder.
  • Tests for the shared module plus the existing deflate/HPACK suites stay green.

References

Note

You can use the 👍 reaction to increase the issue's priority for developers.

Please note that only the 👍 reaction to the issue itself counts as a vote.
Other reactions and those to comments will not be taken into account.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions