[User] Unable to Finetune Llama 2 70B #3644

@mrroll

Description

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

Finetuning Llama 2 70B should succeed.

Current Behavior

Finetuning Llama 2 70B fails with:

ggml_allocr_alloc: not enough space in the buffer (needed 1048576000, largest block available 939524096)
GGML_ASSERT: ggml-alloc.c:148: !"not enough space in the buffer"
Aborted (core dumped)

I should add that finetuning Llama 2 13B works.
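For reference, both byte counts in the assert are exact binary sizes: 1048576000 bytes is 1000 MiB and 939524096 bytes is 896 MiB, so the allocator is 104 MiB short of the request. This is easy to check:

echo $((1048576000 / 1024 / 1024)) $((939524096 / 1024 / 1024))
# prints "1000 896" (MiB needed vs. largest free block)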

Environment and Context

Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.

System Memory
(base) user@server:/srv/shared$ free -h
               total        used        free      shared  buff/cache   available
Mem:           377Gi       2.1Gi       8.4Gi       1.0Mi       367Gi       373Gi
Swap:             0B          0B          0B
Physical (or virtual) hardware you are using, e.g. for Linux:
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  96
  On-line CPU(s) list:   0-95
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) CPU @ 2.20GHz
    CPU family:          6
    Model:               85
    Thread(s) per core:  2
    Core(s) per socket:  24
    Socket(s):           2
    Stepping:            7
    BogoMIPS:            4400.36
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36
                         clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc
                         rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq
                         ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c
                         rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb
                         stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms
                         invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd
                         avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat avx512_vnni md_clear
                         arch_capabilities
Virtualization features: 
  Hypervisor vendor:     KVM
  Virtualization type:   full
Caches (sum of all):     
  L1d:                   1.5 MiB (48 instances)
  L1i:                   1.5 MiB (48 instances)
  L2:                    48 MiB (48 instances)
  L3:                    77 MiB (2 instances)
NUMA:                    
  NUMA node(s):          2
  NUMA node0 CPU(s):     0-23,48-71
  NUMA node1 CPU(s):     24-47,72-95
Vulnerabilities:         
  Gather data sampling:  Unknown: Dependent on hypervisor status
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Mitigation; Clear CPU buffers; SMT Host state unknown
  Meltdown:              Not affected
  Mmio stale data:       Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
  Retbleed:              Mitigation; Enhanced IBRS
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
  Srbds:                 Not affected
  Tsx async abort:       Mitigation; Clear CPU buffers; SMT Host state unknown
Operating System, e.g. for Linux:
Linux 6.2.0-1016-gcp #18~22.04.1-Ubuntu SMP Fri Sep 29 04:56:44 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
SDK version, e.g. for Linux:
Python 3.10.9
GNU Make 4.3
g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

Failure Information (for bugs)

ggml_allocr_alloc: not enough space in the buffer (needed 1048576000, largest block available 939524096)
GGML_ASSERT: ggml-alloc.c:148: !"not enough space in the buffer"
Aborted (core dumped)
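Note that ggml-alloc.c:148 refers to the commit I built from; the line number may differ on other commits, but the assert can be located in any checkout with:

grep -n "not enough space" ggml-alloc.c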

Steps to Reproduce

The following steps assume that:

  • You have miniconda installed and the base environment loaded.
  • You have access to the Llama 2 base model from Meta.
  • You have already downloaded the Llama 2 70B model and placed it in ./models/llama-2-70b.
  • You have downloaded shakespeare.txt and placed it in the root of the git repository.
  1. Clone llama.cpp.
  2. cd into the directory where llama.cpp was cloned.
  3. Run make.
  4. Create a miniconda environment called llama.
conda create -yn llama python=3.10.9
  5. Switch to the miniconda environment you just created.
conda activate llama
  6. Install the dependencies.
pip install --upgrade --requirement requirements.txt
pip install --upgrade torch transformers
  7. Convert the Llama 2 70B model to the GGUF format.
python convert.py ./models/llama-2-70b
  8. Quantize the converted model.
./quantize ./models/llama-2-70b/ggml-model-f16.gguf ./models/llama-2-70b/ggml-model-q8_0.gguf q8_0
  9. Attempt to finetune the model, using shakespeare.txt as the training data (a reduced variant of this command is shown below).
./finetune \
    --model-base ./models/llama-2-70b/ggml-model-q8_0.gguf \
    --checkpoint-in llama-2-70b-shakespeare-LATEST.gguf \
    --checkpoint-out llama-2-70b-shakespeare-ITERATION.gguf \
    --lora-out llama-2-70b-shakespeare-ITERATION.bin \
    --train-data "./shakespeare.txt" \
    --save-every 10 \
    --threads 48 --adam-iter 150 --batch 4 --ctx 64 \
    --use-checkpointing
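In case it helps narrow this down, the same command with smaller --batch and --ctx values (only those two numbers changed, everything else as above) could show whether the failing allocation scales with them; I have not verified that it does:

./finetune \
    --model-base ./models/llama-2-70b/ggml-model-q8_0.gguf \
    --checkpoint-in llama-2-70b-shakespeare-LATEST.gguf \
    --checkpoint-out llama-2-70b-shakespeare-ITERATION.gguf \
    --lora-out llama-2-70b-shakespeare-ITERATION.bin \
    --train-data "./shakespeare.txt" \
    --save-every 10 \
    --threads 48 --adam-iter 150 --batch 1 --ctx 32 \
    --use-checkpointing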

Failure Logs

The logs are too long to include inline, so I am attaching them here. They also include a finetune run on Llama 2 13B, to demonstrate that finetuning works at that size.

error.log
