Description
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
Finetuning Llama 2 70B should succeed.
Current Behavior
Finetuning Llama 2 70B fails with:
ggml_allocr_alloc: not enough space in the buffer (needed 1048576000, largest block available 939524096)
GGML_ASSERT: ggml-alloc.c:148: !"not enough space in the buffer"
Aborted (core dumped)
I should add that finetuning Llama 2 13B works.
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.
System Memory
(base) user@server:/srv/shared$ free -h
total used free shared buff/cache available
Mem: 377Gi 2.1Gi 8.4Gi 1.0Mi 367Gi 373Gi
Swap: 0B 0B 0B
Physical (or virtual) hardware you are using, e.g. for Linux:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 96
On-line CPU(s) list: 0-95
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU @ 2.20GHz
CPU family: 6
Model: 85
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 2
Stepping: 7
BogoMIPS: 4400.36
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat avx512_vnni md_clear arch_capabilities
Virtualization features:
Hypervisor vendor: KVM
Virtualization type: full
Caches (sum of all):
L1d: 1.5 MiB (48 instances)
L1i: 1.5 MiB (48 instances)
L2: 48 MiB (48 instances)
L3: 77 MiB (2 instances)
NUMA:
NUMA node(s): 2
NUMA node0 CPU(s): 0-23,48-71
NUMA node1 CPU(s): 24-47,72-95
Vulnerabilities:
Gather data sampling: Unknown: Dependent on hypervisor status
Itlb multihit: Not affected
L1tf: Not affected
Mds: Mitigation; Clear CPU buffers; SMT Host state unknown
Meltdown: Not affected
Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Retbleed: Mitigation; Enhanced IBRS
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Srbds: Not affected
Tsx async abort: Mitigation; Clear CPU buffers; SMT Host state unknown
Operating System, e.g. for Linux:
Linux 6.2.0-1016-gcp #18~22.04.1-Ubuntu SMP Fri Sep 29 04:56:44 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
SDK version, e.g. for Linux:
Python 3.10.9
GNU Make 4.3
Built for x86_64-pc-linux-gnu
Copyright (C) 1988-2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Failure Information (for bugs)
ggml_allocr_alloc: not enough space in the buffer (needed 1048576000, largest block available 939524096)
GGML_ASSERT: ggml-alloc.c:148: !"not enough space in the buffer"
Aborted (core dumped)
Steps to Reproduce
The following steps assume that:
- You have miniconda installed and the base environment is loaded.
- You have access to the Llama 2 base model from Meta.
- In addition, you have already downloaded the Llama 2 70B model and placed it in ./models/llama-2-70b.
- You have downloaded shakespeare.txt and placed it in the root of the git repository.
- Clone llama.cpp.
- cd into the directory where llama.cpp was cloned.
- Run make.
- Create a miniconda environment called llama.
conda create -yn llama python=3.10.9
- Switch to the miniconda environment you just created.
conda activate llama
- Install dependencies.
pip install --upgrade --requirement requirements.txt
pip install --upgrade torch transformers
- Convert the Llama 2 70B model into the GGUF format.
python convert.py ./models/llama-2-70b
- Quantize the converted model.
./quantize ./models/llama-2-70b/ggml-model-f16.gguf ./models/llama-2-70b/ggml-model-q8_0.gguf q8_0
- Attempt to finetune the model. For testing, I used the shakespeare.txt file mentioned above.
./finetune \
--model-base ./models/llama-2-70b/ggml-model-q8_0.gguf \
--checkpoint-in llama-2-70b-shakespeare-LATEST.gguf \
--checkpoint-out llama-2-70b-shakespeare-ITERATION.gguf \
--lora-out llama-2-70b-shakespeare-ITERATION.bin \
--train-data "./shakespeare.txt" \
--save-every 10 \
--threads 48 --adam-iter 150 --batch 4 --ctx 64 \
--use-checkpointing
Failure Logs
The logs are too long to include as a comment, so I am attaching them here. You'll also find a finetune run on Llama 2 13B, included to demonstrate that finetuning works at that size.