speculative: Ensure draft and target model vocab matches #3812

KerfuffleV2 · 2023-10-27T12:55:17Z

It's currently possible to shoot yourself in the foot by speculating with a draft model whose vocab doesn't match the target's, and weird stuff happens in that case. Naturally the draft model will fail 100% of the time, but from the logs it looks like the draft is just generating random unrelated stuff (including draft candidates with NaN as the probability).

When there's a mismatch you'll now get an error like:

main: error: draft model vocab must match target model to use speculation but target vocab size 152064 does not match draft vocab size 32002

or

main: error: draft model vocab must match target model to use speculation but token 0 content differs

This approach may be too strict, since one might possibly want to use a draft model with a few special tokens that differ. One way to deal with that might be to just say there can be X mismatches at most.

strcmp on every entry might also be overkill. On my system, checking 32,000 entries is instant but doing something like looping with a step of 10 would probably be fine also.
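As a rough illustration of the trade-off described above, here is a minimal, self-contained sketch of a strict check with an optional sampling step. It is not the code from this PR: `vocab_t` and `vocab_matches_strict` are hypothetical names, and a plain vector of strings stands in for the model vocab instead of the llama.cpp API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a model vocab: token id -> token text.
using vocab_t = std::vector<std::string>;

// Returns true only if both vocabs have the same size and every `step`-th
// token (starting at id 0) has identical text. step == 1 checks every entry.
static bool vocab_matches_strict(const vocab_t & tgt, const vocab_t & dft, std::size_t step = 1) {
    assert(step > 0); // a step of 0 would never advance the loop
    if (tgt.size() != dft.size()) {
        return false;
    }
    for (std::size_t i = 0; i < tgt.size(); i += step) {
        if (tgt[i] != dft[i]) {
            return false;
        }
    }
    return true;
}
```

A step larger than 1 does less work but can miss a mismatch that falls between sampled ids, so it only makes sense if mismatched vocabs are expected to differ in many places.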

ggerganov · 2023-10-27T13:10:15Z

examples/speculative/speculative.cpp


+    {
+        int n_vocab_tgt = llama_n_vocab(model_tgt);
+        if (n_vocab_tgt != llama_n_vocab(model_dft)) {


This is not ideal. Codellama 7B and 13B have a vocab size of 32000, while Codellama 34B has a vocab size of 32016. It's the same vocab but with some extra tokens.

We should not disallow such cases. Maybe just print errors / warnings, but still continue?

KerfuffleV2 · 2023-10-27

How about we error if the size differs by more than 100, and also check the content of min(n_vocab_tgt, n_vocab_dft) tokens? Maybe even start at token 5 or so, to allow for cases where something like BOS or EOS has different content.

100 and 5 are just random numbers I plucked out of the air. The actual values can be whatever you prefer.

ggerganov · 2023-10-27

OK, I think that would work.

KerfuffleV2 · 2023-10-27

I changed it. I also made the token content mismatch message a bit more helpful. For example, trying to use Orca 3B to draft Mistral 7B:

main: error: draft model vocab must match target model to use speculation but token 259 content differs - target ' ', draft ' t'
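The tolerant check agreed on in this thread can be sketched roughly as follows. This is a standalone illustration under stated assumptions, not the merged code: `vocab_t`, `check_vocab_compat`, and the two constants are hypothetical names, the thresholds 100 and 5 are the arbitrary values floated in the discussion, and a vector of strings stands in for the model vocab.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a model vocab: token id -> token text.
using vocab_t = std::vector<std::string>;

// Illustrative thresholds from the discussion; the thread calls them arbitrary.
constexpr std::size_t MAX_SIZE_DIFFERENCE  = 100;
constexpr std::size_t CHECK_START_TOKEN_ID = 5;

// Returns an empty string when the draft vocab is close enough to the target,
// otherwise a description of the first problem found.
static std::string check_vocab_compat(const vocab_t & tgt, const vocab_t & dft) {
    const std::size_t n_tgt = tgt.size();
    const std::size_t n_dft = dft.size();
    const std::size_t diff  = n_tgt > n_dft ? n_tgt - n_dft : n_dft - n_tgt;

    // Small size differences (e.g. a few extra tokens) are tolerated.
    if (diff > MAX_SIZE_DIFFERENCE) {
        return "target vocab size " + std::to_string(n_tgt) +
               " does not match draft vocab size " + std::to_string(n_dft);
    }

    // Compare only the ids both vocabs share, skipping the first few so that
    // differing special tokens such as BOS or EOS do not trigger an error.
    const std::size_t n_common = std::min(n_tgt, n_dft);
    for (std::size_t i = CHECK_START_TOKEN_ID; i < n_common; ++i) {
        if (tgt[i] != dft[i]) {
            return "token " + std::to_string(i) + " content differs - target '" +
                   tgt[i] + "', draft '" + dft[i] + "'";
        }
    }
    return "";
}
```

With these values, a 32000-token draft against a 32016-token target passes as long as the shared tokens from id 5 onward agree, while a 32002 vs 152064 pairing is rejected on size alone.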

speculative: Ensure draft and target model vocab matches

a089765

ggerganov reviewed Oct 27, 2023

Tolerate small differences when checking dft vs tgt vocab

41f5d2a

ggerganov approved these changes Oct 27, 2023

ggerganov merged commit 41aee4d into ggml-org:master Oct 27, 2023

KerfuffleV2 deleted the fix-speculate-mismatched-models branch November 17, 2023 03:11
