Osimons/gpu sampling equivalence checks #2
Spun up a script to investigate differences between backend samplers and the samplers running inside llama-cli, i.e. outside of ggml's opset. Main investigation insights (short illustrative sketches for each point follow below):
- `std::uniform_real_distribution<double>` returns different values than `std::uniform_real_distribution<float>` when seeded with the same rng.
- With `top_k` not having to return sorted outputs, the main differences come from `soft_max` and `cumsum`, which behave differently in ggml than in the top-p/dist samplers because they are parallelized across cores/SMs.
- Thresholding on `cumsum(exp(logits))` vs. on `cumsum(probs)` did not yield that many differences, so I did not push that code.
- The llama.cpp samplers use `std::partial_sort` for sorting.

After this PR, plus fixing warmup (which currently advances the rng for backend samplers by one call), outputs from backend-based sampling are much closer to llama.cpp-based sampling.
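To illustrate the first point, here is a minimal standalone sketch (not from the PR's script) of how the two distribution types drain a same-seeded engine differently. On common standard-library implementations the `float` distribution consumes one 32-bit draw per sample while the `double` one consumes two, so the streams desynchronize on top of the precision difference:

```cpp
#include <cstdio>
#include <random>

int main() {
    std::mt19937 rng_f(42);
    std::mt19937 rng_d(42);
    std::uniform_real_distribution<float>  dist_f(0.0f, 1.0f);
    std::uniform_real_distribution<double> dist_d(0.0, 1.0);

    for (int i = 0; i < 4; ++i) {
        // Same seed, same distribution parameters, yet the sequences
        // diverge because each sample consumes a different number of
        // raw engine calls depending on the real type.
        printf("float: %.9f   double: %.17f\n",
               (double) dist_f(rng_f), dist_d(rng_d));
    }
    return 0;
}
```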
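On the `soft_max`/`cumsum` point: float addition is not associative, so a reduction split across cores/SMs can land on slightly different values than a sequential scan, which is enough to flip a decision right at a top-p cutoff. A minimal sketch of the effect (the chunking below is only a stand-in for how a GPU backend might partition the work, not ggml's actual schedule):

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    std::vector<float> probs(10000);
    for (size_t i = 0; i < probs.size(); ++i) {
        probs[i] = std::exp(-0.001f * (float) i); // decaying fake probability mass
    }

    // Sequential accumulation, as a CPU-side sampler would do it.
    float seq = 0.0f;
    for (float p : probs) seq += p;

    // Chunked accumulation: each chunk summed independently (as a GPU
    // block would), then partials combined. Different rounding order.
    float par = 0.0f;
    const size_t chunk = 128;
    for (size_t b = 0; b < probs.size(); b += chunk) {
        float partial = 0.0f;
        for (size_t i = b; i < b + chunk && i < probs.size(); ++i) {
            partial += probs[i];
        }
        par += partial;
    }

    printf("sequential: %.8f\nchunked:    %.8f\ndelta:      %.3g\n",
           seq, par, (double) (seq - par));
    return 0;
}
```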
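On the thresholding comparison: the two schemes are mathematically equivalent and differ only in where the normalization happens, which is presumably why the comparison did not surface many differences. A sketch of both (function names are hypothetical, not the PR's code); in exact arithmetic they always pick the same token, in float they can occasionally disagree near the threshold:

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Variant A: normalize first, then scan cumsum(probs) against r.
int sample_via_probs(const std::vector<float> & logits, float r) {
    float z = 0.0f;
    for (float l : logits) z += std::exp(l);
    float acc = 0.0f;
    for (size_t i = 0; i < logits.size(); ++i) {
        acc += std::exp(logits[i]) / z;   // cumsum of softmax probabilities
        if (acc >= r) return (int) i;
    }
    return (int) logits.size() - 1;
}

// Variant B: scan cumsum(exp(logits)) against a scaled threshold r * z.
int sample_via_expsum(const std::vector<float> & logits, float r) {
    float z = 0.0f;
    for (float l : logits) z += std::exp(l);
    float acc = 0.0f;
    for (size_t i = 0; i < logits.size(); ++i) {
        acc += std::exp(logits[i]);       // cumsum of unnormalized mass
        if (acc >= r * z) return (int) i; // scale the threshold instead
    }
    return (int) logits.size() - 1;
}

int main() {
    std::vector<float> logits = {1.0f, 0.5f, 0.25f, -1.0f};
    for (float r : {0.1f, 0.5f, 0.9f}) {
        printf("r=%.2f  probs:%d  expsum:%d\n",
               r, sample_via_probs(logits, r), sample_via_expsum(logits, r));
    }
    return 0;
}
```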
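For reference, the `std::partial_sort` pattern the CPU-side samplers rely on looks roughly like this (illustrative values, not llama.cpp code). It sorts only the first k positions and leaves the tail in unspecified order, which is cheaper than a full sort when k is much smaller than the vocabulary:

```cpp
#include <algorithm>
#include <cstdio>
#include <functional>
#include <vector>

int main() {
    std::vector<float> logits = {0.2f, 3.1f, -0.5f, 2.4f, 1.7f, 0.9f};
    const size_t k = 3;
    // Place the k largest values, sorted descending, at the front;
    // elements past logits.begin() + k end up in unspecified order.
    std::partial_sort(logits.begin(), logits.begin() + k, logits.end(),
                      std::greater<float>());
    for (size_t i = 0; i < k; ++i) {
        printf("top[%zu] = %.2f\n", i, logits[i]);
    }
    return 0;
}
```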
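And a sketch of why the warmup behavior matters: one stray draw on a same-seeded engine permanently offsets the stream, so every subsequent sample differs between the two paths even though everything else matches.

```cpp
#include <cstdio>
#include <random>

int main() {
    std::mt19937 cpu_rng(1234);
    std::mt19937 gpu_rng(1234);
    gpu_rng(); // the one extra engine call made during warmup

    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    for (int i = 0; i < 3; ++i) {
        // Every pair disagrees: the streams are offset by one forever.
        printf("cpu: %.6f   gpu: %.6f\n", dist(cpu_rng), dist(gpu_rng));
    }
    return 0;
}
```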
Filed this as a separate PR, as I'm not sure we want the vibe-coded script in the main llama.cpp repo.