Labels: good first issue, help wanted, roadmap
Project: ggml-org : tutorials
List:
- guide : running gpt-oss with llama.cpp
- guide : adding new model architectures
- guide : using the new WebUI of llama.cpp
- tutorial : offline agentic coding with llama-server
- tutorial : compute embeddings using llama.cpp
- tutorial : parallel inference using Hugging Face dedicated endpoints
- tutorial : KV cache reuse with llama-server
- tutorial : measuring time to first token (TTFT) and time between tokens (TBT)
- tutorial : reusing multiple prompt prefixes with slots (-np) in llama-server (see the sketch after this list)
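
As a starting point for the slot / prefix-reuse items above, here is a minimal client-side sketch. It assumes llama-server is running locally with multiple slots (e.g. started with `llama-server -m model.gguf -np 2 --port 8080`) and that the `/completion` endpoint accepts the `cache_prompt` and `id_slot` fields described in the server README; field names can change between versions, so please verify against your build before turning this into a tutorial.

```python
# Sketch: reuse a shared prompt prefix by pinning requests to one slot.
# Assumes llama-server is reachable at http://127.0.0.1:8080 and was
# started with more than one slot (-np); not a definitive implementation.
import requests

SERVER = "http://127.0.0.1:8080"

def complete(prompt: str, slot: int) -> str:
    """Send a completion request pinned to a specific slot so that the
    slot's cached KV (the shared prompt prefix) can be reused next time."""
    r = requests.post(f"{SERVER}/completion", json={
        "prompt": prompt,
        "n_predict": 64,
        "cache_prompt": True,  # keep the prompt's KV cache in the slot
        "id_slot": slot,       # pin this prefix to one slot
    })
    r.raise_for_status()
    return r.json()["content"]

prefix = "You are a helpful assistant.\n\n"
# Both requests share the same prefix and the same slot, so the second one
# should only need to process the tokens that follow the cached prefix.
print(complete(prefix + "Q: What is llama.cpp?\nA:", slot=0))
print(complete(prefix + "Q: What is GGUF?\nA:", slot=0))
```
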
TODO:
- Is there a way to cache multiple prompt prefixes? #13488
- How to use function calls? #13134
- how to measure time to first token (TTFT) and time between tokens (TBT) #13251 (see the sketch after this list)
- Apple A-chipsets; how to estimate a suitable model size ? #12742
- How to get started with webui development (ref: tutorials : list for llama.cpp #13523 (comment))
- etc.
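
For the TTFT/TBT item above, a rough client-side measurement can be made by streaming from the server's OpenAI-compatible endpoint and timestamping each chunk. This is only a sketch under a few assumptions: llama-server is listening on port 8080, the timing is wall-clock and includes network overhead, and each streamed chunk is treated as one token, which is an approximation.

```python
# Sketch: measure time to first token (TTFT) and average time between
# tokens (TBT) by streaming from llama-server's /v1/chat/completions.
import json
import time
import requests

SERVER = "http://127.0.0.1:8080"

def measure(prompt: str):
    t_start = time.perf_counter()
    chunk_times = []
    with requests.post(f"{SERVER}/v1/chat/completions", json={
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
        "stream": True,
    }, stream=True) as r:
        r.raise_for_status()
        for line in r.iter_lines():
            # Server-sent events: each payload line starts with "data: "
            if not line or not line.startswith(b"data: "):
                continue
            payload = line[len(b"data: "):]
            if payload == b"[DONE]":
                break
            chunk = json.loads(payload)
            if chunk["choices"][0]["delta"].get("content"):
                chunk_times.append(time.perf_counter())

    ttft = chunk_times[0] - t_start
    # average gap between streamed chunks (approximates time between tokens)
    tbt = (chunk_times[-1] - chunk_times[0]) / max(len(chunk_times) - 1, 1)
    return ttft, tbt

ttft, tbt = measure("Explain KV cache reuse in one paragraph.")
print(f"TTFT: {ttft * 1000:.1f} ms, avg TBT: {tbt * 1000:.1f} ms")
```
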
Simply search for "How to" in the Discussions: https://github.com/ggml-org/llama.cpp/discussions?discussions_q=is%3Aopen+How+to
Contributions for writing tutorials are welcome!