From the course: Large Language Models on AWS: Building and Deploying Open-Source LLMs

GGUF quantized llama.cpp end-to-end demo

- [Instructor] It can be a little confusing to figure out how to use a research model in a local environment. You may have heard about someone fine-tuning some kind of foundation model, but then you were like, "Wait, I can't run this." And even tools like Ollama or Llamafile don't have access to it. So, what do you do? Well, you need to use llama.cpp. So first up here, what are we going to do? We're going to make sure that we have uv installed. The way you would do this is to run this command right here, which is the uv installer. We can go ahead and run it. You can see, oh, it's already installed. It's a very tiny utility that is going to help us with a lot of stuff. Next, we also want to make sure we've cloned llama.cpp. In this case, if we go ahead and say git remote -v, we can see it's also there, right? So when we run make for our architecture, we also set the GGML_CUDA flag. And I also say, "Hey, go ahead and spawn a bunch of threads here." And what…
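
The setup steps described so far (install uv, clone llama.cpp, build with CUDA enabled) can be sketched as a small Python helper that only assembles the shell commands and prints them, without running anything. This is an illustrative sketch, not the instructor's actual script. The function name `setup_commands`, the repository URL, the CMake-based build, the `llama.cpp/build` directory, and reading "spawn a bunch of threads" as parallel build jobs (`-j`) are all assumptions that do not appear in the transcript.

```python
import shlex

# Assumption: upstream repository URL; the transcript only says "cloning llama.cpp".
LLAMA_CPP_REPO = "https://github.com/ggml-org/llama.cpp"


def setup_commands(use_cuda: bool = True, jobs: int = 8) -> list[str]:
    """Return the setup commands in order, as strings, without running them.

    Hypothetical helper for illustration; `jobs` is an assumed reading of
    "spawn a bunch of threads" as parallel build jobs.
    """
    cuda_flag = "ON" if use_cuda else "OFF"
    return [
        # Install uv with Astral's standalone installer script.
        "curl -LsSf https://astral.sh/uv/install.sh | sh",
        # Clone llama.cpp, then list its remotes as in the demo.
        shlex.join(["git", "clone", LLAMA_CPP_REPO]),
        shlex.join(["git", "-C", "llama.cpp", "remote", "-v"]),
        # Configure the build; GGML_CUDA toggles the CUDA backend.
        shlex.join(["cmake", "-S", "llama.cpp", "-B", "llama.cpp/build",
                    f"-DGGML_CUDA={cuda_flag}"]),
        # Compile using several parallel jobs.
        shlex.join(["cmake", "--build", "llama.cpp/build",
                    "--config", "Release", "-j", str(jobs)]),
    ]


if __name__ == "__main__":
    for command in setup_commands():
        print(command)
```

Returning the commands as data rather than executing them keeps the sketch safe to run anywhere. On a machine without an NVIDIA GPU, calling `setup_commands(use_cuda=False)` produces a configure step with the CUDA backend turned off.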
