

Llama.cpp on AWS G5 demo

- [Instructor] All right, so we're on an AWS machine that's a monster machine: a g5.12xlarge with GPUs attached to it. This is a perfect type of machine for doing inference with an open-source large language model. You don't need to use any managed service. You can just SCP over a model binary that you compiled somewhere, or compile it yourself on the box. More likely, what you would do is compile it on a build server, optimize it for this particular architecture, copy it to S3, and then pull it onto the instance. But in our case, we're going to take more of a kick-the-tires approach here. So, the first thing that I'll mention is, let's go ahead and take a look at what this machine actually has going on. If I type htop, you can see a lot of cores available right here, and you can see that it's got 187 gigs of RAM as well. Now in terms of the GPUs, we have lots and lots of GPUs as well: four NVIDIA A10Gs. These are also pretty powerful, each with about 23…
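As a rough sketch of the build-and-stage workflow the instructor describes (the bucket name, model file, and exact build flags are placeholders I've assumed; llama.cpp's build options and binary names vary by version, so check the project's README for your checkout):

```bash
# --- on a build server matching the G5 instance's architecture ---
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON            # enable CUDA for the A10G GPUs
cmake --build build --config Release -j  # compile optimized binaries

# stage the binary and a quantized GGUF model in S3
# (bucket and model names are hypothetical)
aws s3 cp build/bin/llama-cli s3://my-llm-bucket/bin/llama-cli
aws s3 cp models/llama-2-7b.Q4_K_M.gguf s3://my-llm-bucket/models/

# --- on the g5.12xlarge inference instance ---
htop         # inspect the available cores and RAM
nvidia-smi   # confirm the four A10G GPUs are visible

# pull the staged artifacts down and run inference,
# offloading as many layers as possible to the GPUs
aws s3 cp s3://my-llm-bucket/bin/llama-cli .
aws s3 cp s3://my-llm-bucket/models/llama-2-7b.Q4_K_M.gguf .
chmod +x llama-cli
./llama-cli -m llama-2-7b.Q4_K_M.gguf -ngl 99 -p "Hello"
```

Compiling on a separate build server and copying through S3 keeps the inference box clean and makes the artifact reusable across instances; SCPing the files directly, as mentioned above, works just as well for a one-off test.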
