Icon

llama.cpp

llama.cpp is an efficient C++ implementation leveraging the GGML library to run large language model (LLM) inference locally, focusing on CPU efficiency with SIMD support and optional GPU acceleration.
Latest: 9102 Winget
Last checked: Jun 9, 2026 12:10am
Rank: 243/15140
Also monitored via:
GitHub Releases Site Monitor
Follow to track new versions in your feed.
Report

Overview

0
License: MIT LicenseWinget: Available

Version & Lifecycle

0
Current: 9102 N-2: 9085 Avg cadence: Every 1 day

Top Contributors

Top sitewide contributors:

  1. Anbarasan
  2. nico_k
  3. Bob
  4. Vigneshwaran

Community Notes

Deployment note • May 15, 2026
0

llama.cpp HTTP server deployment note

For managed llama.cpp deployments that expose a local model endpoint, the official server tool can run a GGUF model with an explicit bind address and port, for example llama-server -m models\model.gguf --host 127.0.0.1 --port 8080. Use a non-loopback host only when you intentionally want other systems to reach the service.

The project documents OpenAI-compatible chat-completions, responses, and embeddings routes in the HTTP server, so endpoint tests can validate both the Windows process and the API surface after packaging. Source: official llama.cpp HTTP server documentation.

Release Notes & Updates

0
Avg cadence:
Updates • 0

Help us match vulnerabilities

No vulnerability match yet. Pick the right product:

Looking for matching products…
Don’t see it? Paste a CPE

Also known as

Other names people use for this app — helps search and matching.

llamacppggml llamacpp

Packaging Notes

0

Build from source using make; supports CPU and GPU backends (CUDA, Metal). Uses GGML for efficient tensor operations.

Notes

0

Requires model weights converted to GGML or GGUF format. Supports quantized models for faster inference and lower memory usage.