llama.cpp
Overview
Version & Lifecycle
Community Notes
llama.cpp HTTP server deployment note
For managed llama.cpp deployments that expose a local model endpoint, the official server tool can run a GGUF model with an explicit bind address and port, for example llama-server -m models\model.gguf --host 127.0.0.1 --port 8080. Use a non-loopback host only when you intentionally want other systems to reach the service.
The project documents OpenAI-compatible chat-completions, responses, and embeddings routes in the HTTP server, so endpoint tests can validate both the Windows process and the API surface after packaging. Source: official llama.cpp HTTP server documentation.
Release Notes & Updates
Help us match vulnerabilities
No vulnerability match yet. Pick the right product:
Don’t see it? Paste a CPE
Also known as
Other names people use for this app — helps search and matching.
Sign in to suggest a name.
Packaging Notes
Build from source using make; supports CPU and GPU backends (CUDA, Metal). Uses GGML for efficient tensor operations.