Llama.cpp on AWS G5 demo
From the course: Large Language Models on AWS: Building and Deploying Open-Source LLMs
- [Instructor] All right, so we're on an AWS machine that's a monster: a g5.12xlarge with GPUs attached to it. This would be a perfect type of machine for doing inference with an open-source large language model. You don't need to use any managed service. You can just SCP over a model binary that you compiled somewhere, or even compile it yourself. More likely, you would compile it on a build server, optimize it for this particular architecture, copy it to S3, and then pull it in from there. But in our case, we're going to take more of a kick-the-tires approach. So, the first thing I'll mention is, let's take a look at what this machine actually has. If I type htop, you can see a lot of cores available right here, and you can see that it's got 187 GB of RAM as well. Now, in terms of the GPUs, we have lots and lots of GPUs as well. We have four different GPUs, NVIDIA A10Gs. These are also pretty powerful, each with about 23…
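As a rough sketch of the workflow the instructor describes (build elsewhere, stage in S3, pull onto the G5 instance, then inspect the hardware), the commands might look like the following. The bucket name, file names, and CMake flag are assumptions for illustration; in particular, the CUDA flag and binary name have changed across llama.cpp versions, so check the project's current build docs.

    # On the build server: compile llama.cpp with CUDA support
    # (GGML_CUDA is the flag in recent llama.cpp; older builds used LLAMA_CUBLAS).
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    cmake -B build -DGGML_CUDA=ON
    cmake --build build --config Release -j

    # Stage the binary and a GGUF model in S3 (hypothetical bucket/file names).
    aws s3 cp build/bin/llama-cli s3://my-llm-artifacts/llama-cli
    aws s3 cp models/model-q4_k_m.gguf s3://my-llm-artifacts/model-q4_k_m.gguf

    # On the g5.12xlarge: pull the artifacts down and inspect the hardware.
    aws s3 cp s3://my-llm-artifacts/llama-cli .
    aws s3 cp s3://my-llm-artifacts/model-q4_k_m.gguf .
    chmod +x llama-cli

    htop          # CPU cores and total RAM (~187 GB on this instance)
    nvidia-smi    # lists the four NVIDIA A10G GPUs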
Contents

- Implications of Amdahl's law: A walkthrough (4m 5s)
- Compiling llama.cpp demo (4m 17s)
- GGUF file format (3m 18s)
- Python UV scripting (3m 55s)
- Python UV packaging overview (1m 59s)
- Key concepts in llama.cpp walkthrough (4m 37s)
- GGUF quantized llama.cpp end-to-end demo (4m 3s)
- Llama.cpp on AWS G5 demo (4m 20s)