Llama.cpp on AWS G5 demo
From the course: Large Language Models on AWS: Building and Deploying Open-Source LLMs
- [Instructor] All right, so we're on an AWS machine that's a monster: a g5.12xlarge with GPUs attached to it. This would be a perfect type of machine for doing inference with an open-source large language model. You don't need to use any managed service. You can just SCP over a model binary that you compiled somewhere, or even compile it yourself. More likely, you would compile it on a build server, optimize it for this particular architecture, copy it to S3, and then pull it in from there. But in our case, we're going to take more of a kick-the-tires approach. So, the first thing I'll mention is, let's take a look at what this machine actually has. If I type htop, you can see a lot of cores available right here, and you can see that it's got 187 GB of RAM as well. Now, in terms of the GPUs, we have lots and lots of GPUs as well. We have four different GPUs, NVIDIA A10Gs. These are also pretty powerful, each with about 23…
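As a rough sketch of the workflow the instructor describes (build elsewhere, stage in S3, pull onto the G5 instance, then inspect the hardware), the commands might look like the following. The bucket name, file names, and CMake flag are assumptions for illustration; in particular, the CUDA flag and binary name have changed across llama.cpp versions, so check the project's current build docs.

    # On the build server: compile llama.cpp with CUDA support
    # (GGML_CUDA is the flag in recent llama.cpp; older builds used LLAMA_CUBLAS).
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    cmake -B build -DGGML_CUDA=ON
    cmake --build build --config Release -j

    # Stage the binary and a GGUF model in S3 (hypothetical bucket/file names).
    aws s3 cp build/bin/llama-cli s3://my-llm-artifacts/llama-cli
    aws s3 cp models/model-q4_k_m.gguf s3://my-llm-artifacts/model-q4_k_m.gguf

    # On the g5.12xlarge: pull the artifacts down and inspect the hardware.
    aws s3 cp s3://my-llm-artifacts/llama-cli .
    aws s3 cp s3://my-llm-artifacts/model-q4_k_m.gguf .
    chmod +x llama-cli

    htop          # CPU cores and total RAM (~187 GB on this instance)
    nvidia-smi    # lists the four NVIDIA A10G GPUs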
Contents

- Implications of Amdahl's law: A walkthrough (4m 5s)
- Compiling llama.cpp demo (4m 17s)
- GGUF file format (3m 18s)
- Python UV scripting (3m 55s)
- Python UV packaging overview (1m 59s)
- Key concepts in llama.cpp walkthrough (4m 37s)
- GGUF quantized llama.cpp end-to-end demo (4m 3s)
- Llama.cpp on AWS G5 demo (4m 20s)