Conversation

Collaborator

@JohannesGaessler JohannesGaessler commented Jun 30, 2023

As described in #2019, the current fixed size of the VRAM scratch buffer becomes an issue once the context is increased beyond 2048. This PR adjusts the VRAM scratch size based on measured values; it seems that roughly 1 MiB of additional scratch space is needed per 512 tokens of extra context per attention head (n_head). A rough sketch of this rule follows the tables below:

Details
| Model | Context size | Min. VRAM scratch size [MiB] | Delta [MiB] |
|---|---|---|---|
| 7b q6_K | 512 | 136 | - |
| 7b q6_K | 1024 | 171 | 35 |
| 7b q6_K | 1536 | 243 | 72 |
| 7b q6_K | 2048 | 318 | 75 |
| 7b q6_K | 2560 | 350 | 32 |
| 7b q6_K | 3072 | 382 | 32 |
| 7b q6_K | 3584 | 414 | 32 |
| 7b q6_K | 4096 | 446 | 32 |
| 7b q6_K | 4608 | 478 | 32 |
| 7b q6_K | 5120 | 510 | 32 |
| 7b q6_K | 5632 | 542 | 32 |
| 7b q6_K | 6144 | 574 | 32 |
| 7b q6_K | 6656 | 606 | 32 |
| 7b q6_K | 7168 | 638 | 32 |
| 7b q6_K | 7680 | 670 | 32 |
| 7b q6_K | 8192 | 702 | 32 |

| Model | Context size | Min. VRAM scratch size [MiB] | Delta [MiB] |
|---|---|---|---|
| 13b q4_0 | 512 | 170 | - |
| 13b q4_0 | 1024 | 214 | 44 |
| 13b q4_0 | 1536 | 251 | 37 |
| 13b q4_0 | 2048 | 398 | 147 |
| 13b q4_0 | 2560 | 438 | 40 |
| 13b q4_0 | 3072 | 478 | 40 |
| 13b q4_0 | 3584 | 518 | 40 |
| 13b q4_0 | 4096 | 558 | 40 |
| 13b q4_0 | 4608 | 598 | 40 |
| 13b q4_0 | 5120 | 638 | 40 |
| 13b q4_0 | 5632 | 678 | 40 |
| 13b q4_0 | 6144 | 718 | 40 |
| 13b q4_0 | 6656 | 758 | 40 |
| 13b q4_0 | 7168 | 798 | 40 |
| 13b q4_0 | 7680 | 838 | 40 |
| 13b q4_0 | 8192 | 878 | 40 |

| Model | Context size | Min. VRAM scratch size [MiB] | Delta [MiB] |
|---|---|---|---|
| 33b q2_k | 512 | 226 | - |
| 33b q2_k | 1024 | 278 | 52 |
| 33b q2_k | 1536 | 395 | 117 |
| 33b q2_k | 2048 | 517 | 122 |
| 33b q2_k | 2560 | 569 | 52 |
| 33b q2_k | 3072 | 621 | 52 |
| 33b q2_k | 3584 | 673 | 52 |
| 33b q2_k | 4096 | 725 | 52 |
| 33b q2_k | 4608 | 777 | 52 |
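
The per-512-context deltas match the head counts of the respective models (32, 40, 52), which suggests the scaling rule named above. A minimal sketch of that rule (hypothetical code, not the exact implementation in this PR; `min_at_2048_mib` and the +25% margin follow the description in this comment):

```cpp
// Rough model of the measured scaling; a sketch, not the code merged in this PR.
#include <cstddef>
#include <cstdio>

constexpr size_t MiB = 1024 * 1024;

// Start from the measured minimum at 2048 context plus a +25% margin, then add
// ~1 MiB per attention head for each extra 512 tokens of context.
static size_t vram_scratch_size(size_t min_at_2048_mib, int n_head, int n_ctx) {
    size_t size = (min_at_2048_mib + min_at_2048_mib / 4) * MiB; // +25% margin
    if (n_ctx > 2048) {
        size += (size_t)((n_ctx - 2048) / 512) * n_head * MiB;
    }
    return size;
}

int main() {
    // 13b q4_0: measured minimum at 2048 context is 398 MiB, n_head = 40
    printf("13b @ 8192 ctx: ~%zu MiB\n", vram_scratch_size(398, 40, 8192) / MiB);
}
```

For 13b at 8192 context this yields ~977 MiB, comfortably above the measured minimum of 878 MiB.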

I did not test 3b; I am using the 7b values for it instead. Testing for 33b was limited by the available VRAM, and I was not able to test 65b at all, so I am using double the 33b values to be conservative. The base size of the VRAM scratch buffers is chosen so that there is a +25% margin at 2048 context; a sketch of this extrapolation follows below. This PR does not fix the issue of the RAM scratch buffers being considered too small at high context.
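
The extrapolation for the untested sizes could look like this (a hypothetical sketch building on `vram_scratch_size` from the block above; the minima at 2048 context come from the tables, while the 3b and 65b entries follow the substitutions described in this comment rather than measurements, and the constants actually chosen in the PR may differ):

```cpp
// Hypothetical per-model lookup following the extrapolation described above.
struct scratch_params {
    size_t min_mib; // measured (or substituted) minimum scratch at 2048 context
    int    n_head;  // per-512-context growth in MiB, equal to the head count
};

static scratch_params scratch_params_for(int n_params_billion) {
    switch (n_params_billion) {
        case 3:  return { 318,     32     }; // not tested: reuse the 7b values
        case 7:  return { 318,     32     };
        case 13: return { 398,     40     };
        case 33: return { 517,     52     };
        case 65:
        default: return { 2 * 517, 2 * 52 }; // not tested: double the 33b values
    }
}

// Usage:
//   scratch_params p = scratch_params_for(65);
//   size_t bytes = vram_scratch_size(p.min_mib, p.n_head, 4096);
```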

Contributor

jxy commented Jun 30, 2023

I have used your numbers posted in #2019 to make changes to the scaling of scratch0 in #2054, though I didn't test large models.

@jxy jxy mentioned this pull request Jun 30, 2023