
Conversation

@ngxson
Collaborator

@ngxson ngxson commented Dec 6, 2025

Ref: #17618

Fix: #11202

We are moving to a new CLI experience with the main code built on top of llama-server. This brings many additional features into llama-cli, making the experience feel mostly like a smaller version of the web UI:

  • Multimodal support
  • Regenerate last message
  • Speculative decoding
  • Full Jinja support (including some edge cases that the old llama-cli doesn't support)
[image]

TODO:

Features planned for next versions:

  • Allow hiding/showing timings and reasoning content (saving user preferences to disk and reusing them later)
  • Allow exporting/importing conversations
  • Support raw completion
  • Support remote media URLs (downloaded from the internet)
  • Show progress (as a percentage) if prompt processing takes too long

TODO for console system:

  • Auto-completion for commands and file paths
  • Support "temporary display", for example clear loading messages when it's done

@github-actions github-actions bot added the script, testing, examples, devops, and server labels Dec 6, 2025
@ngxson ngxson merged commit 6c21317 into ggml-org:master Dec 10, 2025
73 of 75 checks passed
@ggerganov
Member

When we read a file, maybe we can start processing the prompt immediately? Just post a task with n_predict == 0, stream == false.
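
For illustration, here is a minimal sketch of that idea at the HTTP level, assuming a llama-server instance already running at http://localhost:8080. The new llama-cli drives the server core directly, so the real change would post the task internally rather than over HTTP; the helper name preprocess_prompt and the file path below are hypothetical. A completion task with n_predict: 0 and stream: false lets the server ingest the prompt and fill the KV cache without generating any tokens:

    # Hypothetical sketch: warm the prompt cache by posting a completion task that
    # generates zero tokens. Assumes a llama-server instance at localhost:8080.
    import json
    import urllib.request

    def preprocess_prompt(prompt: str, base_url: str = "http://localhost:8080") -> dict:
        payload = {
            "prompt": prompt,
            "n_predict": 0,        # process the prompt, generate nothing
            "stream": False,       # no streaming; we only want the final (empty) result
            "cache_prompt": True,  # keep the processed tokens in the KV cache for reuse
        }
        req = urllib.request.Request(
            f"{base_url}/completion",
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())

    if __name__ == "__main__":
        # Example: kick off prompt processing right after reading a large file, so the
        # first real generation request hits an already-warm cache
        # ("large_context.txt" is a placeholder path).
        with open("large_context.txt", "r", encoding="utf-8") as f:
            preprocess_prompt(f.read())

With cache_prompt enabled, a later generation request that reuses the same prefix can start from the already-processed context instead of re-evaluating the whole file.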

@ngxson

This comment was marked as outdated.

@ngxson
Collaborator Author

ngxson commented Dec 10, 2025

Ah nevermind, I see what you mean. Hmm yeah that could also be a good idea

@bandoti
Collaborator

bandoti commented Dec 10, 2025

@ngxson great work on this. You're lightning fast ⚡️!! 😊

@MB7979

MB7979 commented Dec 11, 2025

I hope that llama-completion will remain available long term. It’s not possible to output to a file with this new CLI experience, or to do raw completions, or to do a bunch of other things. Not everyone wants a chat experience. Would also be super helpful if llama-completion was documented somewhere.

Thank you for your work.

@andrew-aladev
Contributor

Well, we are missing llama-completion in the Docker image; for now, we only have a way to build a full image with the .devops/tools.sh entrypoint. But .devops/tools.sh doesn't have llama-completion included.

Meanwhile, there is no way to disable this chat experience or interactive mode in order to get just raw output.

@andrew-aladev
Contributor

andrew-aladev commented Dec 11, 2025

Hello @ngxson, what does it mean that --no-conversation is not supported by llama-cli? Do you mean that the llama.cpp project itself is dropping regular inference capabilities and only keeping interactive chats (like the web UI)? Why have you moved regular inference to llama-completion and marked it as legacy? Is this the expected behavior of the llama.cpp project, and should users who don't want interactive mode leave llama.cpp? Or is this just a temporary issue, and do you plan to implement a non-interactive mode later?

@ngxson
Collaborator Author

ngxson commented Dec 11, 2025

Friendly reminder that this is an open-source project and missing features can be added by contributors.

I won't comment further on missing features; there are already TODOs in the code for this purpose.

@andrew-aladev
Contributor

I don't want to offend anyone, but I predict that "I won't comment further" won't be possible, because you have just completely dropped the ability for users to capture LLM output (!). So users will create issues referencing this pull request, and you will have to comment anyway.

@bandoti
Collaborator

bandoti commented Dec 11, 2025

@andrew-aladev please see my comments on the referenced issue. It is fine to reference this PR in issues but it is best to keep the conversation in those issues.

It is good that folks speak up if there's a need and file bug reports. These changes are needed to move forward because of technical debt in llama-cli that built up during the evolution of its capabilities.

If there are any further issues regarding this please copy me on there and I will work on cataloging user needs.

@CISC
Collaborator

CISC commented Dec 11, 2025

Well, we are missing llama-completion in the Docker image; for now, we only have a way to build a full image with the .devops/tools.sh entrypoint. But .devops/tools.sh doesn't have llama-completion included.

That is obviously an oversight that needs to be fixed; llama-completion should be included in the Docker images.

@tildebyte

Guys (@andrew-aladev, @MB7979, others who complain first and listen later);

If you have the time to post here (multiple posts, even), you certainly have the time for due diligence, E.G.!! #17618, which starts off with

Important

For people coming here to complain about this breaking your workflow:

    - llama-completion is there and we won't remove it
    - Read https://github.com/ggml-org/llama.cpp/discussions/17618#discussioncomment-15233169 to understand why this move is needed 

This kind of thing is why open source maintainers[1] lose all of their hair, drop out, and decide to go to nursing school.

[1] Yes, I am one.

@MB7979

MB7979 commented Dec 11, 2025

Guys (@andrew-aladev, @MB7979, others who complain first and listen later);

If you have the time to post here (multiple posts, even), you certainly have the time for due diligence, E.G.!! #17618, which starts off with

Important

For people coming here to complain about this breaking your workflow:

    - llama-completion is there and we won't remove it
    - Read https://github.com/ggml-org/llama.cpp/discussions/17618#discussioncomment-15233169 to understand why this move is needed 

This kind of thing is why open source maintainers[1] lose all of their hair, drop out, and decide to go to nursing school.

[1] Yes, I am one.

That message was edited to add that clarification 30 minutes ago. So you are chastising us for not finding something that didn’t exist when I posted and is in fact a direct response to said questions being raised.

For what it’s worth I did check PRs, issues, and discussions yesterday. I missed that discussion (it’s a few weeks old) and as soon as I found it I took my feedback there.

@ngxson
Collaborator Author

ngxson commented Dec 11, 2025

@MB7979 Didn't you ignore this message that was added a whole 2 weeks ago?

https://github.com/ngxson/llama.cpp/blob/3632271eb98541bd6fd726f4cd2a973d89a195ed/tools/main/main.cpp#L524-L528

    LOG_WRN("*****************************\n");
    LOG_WRN("IMPORTANT: The current llama-cli will be moved to llama-completion in the near future\n");
    LOG_WRN("  New llama-cli will have enhanced features and improved user experience\n");
    LOG_WRN("  More info: https://github.com/ggml-org/llama.cpp/discussions/17618\n");
    LOG_WRN("*****************************\n");

If you missed it, you have just implicitly proved that llama-cli UX was not very good.

@MB7979

MB7979 commented Dec 11, 2025

I had not updated llama.cpp for a few weeks. I only did so on reading the new cli commit, as I was concerned the old functionality would be removed, which it was.

I really don’t understand the defensive tone being taken here. I’m not sure about the other poster, but my only intent was to ascertain whether llama-completion would be an ongoing part of the project, and to suggest some documentation to go with the changes to avoid people like me wasting your time with such questions.

I take it feedback is unwelcome here and I will not participate further.

@ngxson
Collaborator Author

ngxson commented Dec 11, 2025

I really don’t understand the defensive tone being taken here

I am just speaking the truth.

and to suggest some documentation to go with the changes to avoid people like me wasting your time with such questions.

Then what is your suggestion? There was already a discussion and a notice inside llama-cli

I had not updated llama.cpp for a few weeks

I missed that discussion (it’s a few weeks old)

If we had had better documentation, what would have prevented you from missing it again?

(Again, I'm just speaking the truth here)

@MB7979

MB7979 commented Dec 11, 2025

Include it with all the other examples, listed on the front page of this repository.

@ngxson
Collaborator Author

ngxson commented Dec 11, 2025

Include it with all the other examples

you said earlier:

I had not updated llama.cpp for a few weeks


listed on the front page of this repository.

Probably fair, I haven't touched the main README.md for a long time - even the last 2 big changes in llama-server and llama-cli weren't on the list

@tildebyte

@MB7979 TBH I was much more addressing @andrew-aladev than you. Apologies if I offended; I knew about these changes at least from the beginning of this week, but I was mistaken in how I knew 😬 (not unusual for me :) )

@bandoti
Collaborator

bandoti commented Dec 11, 2025

I think for the many folks who rely on the old llama-cli features, there is no need to panic for now: llama-completion isn't going anywhere. It is the same old (legacy) llama-cli application with a new name. Just because it is deemed "legacy" does not mean it will be deleted 🙂

Lots of changes happen quickly in this codebase compared to other projects because of the interest in AI. It is sometimes hard to track everything happening, but it is important for everyone to try their best.

I will be creating a discussion in the following week to establish some of the user journeys and see if I can't come up with a roadmap of sorts.

Ethan-a2 pushed a commit to Ethan-a2/llama.cpp that referenced this pull request Dec 12, 2025
* wip

* wip

* fix logging, add display info

* handle commands

* add args

* wip

* move old cli to llama-completion

* rm deprecation notice

* move server to a shared library

* move ci to llama-completion

* add loading animation

* add --show-timings arg

* add /read command, improve LOG_ERR

* add args for speculative decoding, enable show timings by default

* add arg --image and --audio

* fix windows build

* support reasoning_content

* fix llama2c workflow

* color default is auto

* fix merge conflicts

* properly fix color problem

Co-authored-by: bandoti <[email protected]>

* better loading spinner

* make sure to clean color on force-exit

* also clear input files on "/clear"

* simplify common_log_flush

* add warning in mtmd-cli

* implement console writter

* fix data race

* add attribute

* fix llama-completion and mtmd-cli

* add some notes about console::log

* fix compilation

---------

Co-authored-by: bandoti <[email protected]>

@jsjtxietian jsjtxietian Dec 13, 2025

https://github.com/ggml-org/llama.cpp/blame/master/tools/llama-bench/README.md has an outdated link to main/README.md. IMHO it should be updated too? Happy to help if needed.

Collaborator

Sure that would be great. If you find any broken links like that please feel free to submit a PR. Thank you!

Contributor

@andrew-aladjev andrew-aladjev Dec 13, 2025

@bandoti Hello, I've added PR #17993; it fixes dead links, including the link in llama-bench. Please review, thank you 😊

PS: I am walking to the shop, so I accidentally made a double post. Sorry!

I've added PR #17993; it fixes dead links, including the link in llama-bench

That's great!

wangqi pushed a commit to wangqi/llama.cpp that referenced this pull request Dec 14, 2025
Notable changes:
- Fix race conditions in threadpool (ggml-org#17748)
- New CLI experience (ggml-org#17824)
- Vision model improvements (clip refactor, new models)
- Performance fixes (CUDA MMA, Vulkan improvements)
- tools/main renamed to tools/completion

Conflict resolution:
- ggml-cpu.c: Use new threadpool->n_threads API (replaces n_threads_max),
  keep warning suppressed to reduce log noise

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
