
Conversation

@ngxson (Collaborator) commented on Apr 21, 2024

Resolve #6391

The core idea is to use llama_chat_apply_template to apply the template twice: once with and once without the last user message. We then take the diff between the two output strings and feed only that new part into inference, as sketched after the example below.

Example:

<start_of_turn>user
You are a helpful assistant

Hello<end_of_turn>
<start_of_turn>model
Hi there<end_of_turn>
<start_of_turn>user
Who are you<end_of_turn>
<start_of_turn>model
I am an assistant<end_of_turn>
<start_of_turn>user
Another question<end_of_turn>
<start_of_turn>model

-----
chat_get_added_part(): <start_of_turn>user
Another question<end_of_turn>
<start_of_turn>model

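Below is a minimal C++ sketch of this idea (illustrative, not the PR's actual code). It assumes the llama_chat_apply_template API as it exists in llama.h at the time of this PR, passes nullptr as the template to use the model's built-in one, and assumes the template only appends text, so the shorter output is a prefix of the longer one:

```cpp
// Minimal sketch of the "apply twice and diff" idea (illustrative only).
#include <string>
#include <vector>
#include "llama.h"

static std::string apply_template(const llama_model * model,
                                  const std::vector<llama_chat_message> & msgs,
                                  bool add_ass) {
    std::vector<char> buf(msgs.size() * 256 + 256); // initial size guess
    int32_t res = llama_chat_apply_template(model, nullptr, msgs.data(),
                                            msgs.size(), add_ass,
                                            buf.data(), buf.size());
    if (res > (int32_t) buf.size()) {
        buf.resize(res); // buffer was too small: retry with the exact size
        res = llama_chat_apply_template(model, nullptr, msgs.data(),
                                        msgs.size(), add_ass,
                                        buf.data(), buf.size());
    }
    return res < 0 ? std::string() : std::string(buf.data(), res);
}

// Format the history with and without the last user message and return only
// the newly added suffix -- the part that gets tokenized and fed to inference.
static std::string chat_get_added_part(const llama_model * model,
                                       std::vector<llama_chat_message> & msgs) {
    std::string with_last = apply_template(model, msgs, true);
    llama_chat_message last = msgs.back();
    msgs.pop_back();
    std::string without_last = apply_template(model, msgs, false);
    msgs.push_back(last);
    if (without_last.size() > with_last.size()) {
        return with_last; // fallback: template did not behave as a pure prefix
    }
    // assuming the template only appends, the "without last" output is a
    // prefix of the full output, so the diff is the remaining suffix
    return with_last.substr(without_last.size());
}
```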
This approach requires minimal effort to maintain the chat template infrastructure, while using the exact same logic for main and server (reminder: server also has the notion of a "prompt cache", which works the same way).

Having to re-format the whole chat history each time may seem inefficient at first glance, but it is needed because the template output for the newest turn can depend on earlier messages. Applying the template to the full history, both with and without the last user message, and then taking the diff between the two strings guarantees the added part is formatted correctly.

  • Implement chat_get_added_part to get the diff part with / without the last user message
  • main must keep track of the list of messages (see the usage sketch after this list)
  • Update the arguments for main: deprecate -cml (but do not remove it) and add a --chat-template argument
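Here is a hypothetical sketch of how main could keep track of the messages and feed only the diff to inference, building on the chat_get_added_part sketch above (the function name on_user_input and the storage layout are illustrative, not the PR's actual code):

```cpp
// std::deque keeps pointers to stored strings stable on push_back, so the
// const char * views held by llama_chat_message stay valid.
#include <deque>

static std::deque<std::string>         chat_contents; // owned message storage
static std::vector<llama_chat_message> chat_msgs;     // non-owning view

static std::string on_user_input(const llama_model * model,
                                 const std::string & user_input) {
    chat_contents.push_back(user_input);
    chat_msgs.push_back({"user", chat_contents.back().c_str()});
    // only the diff needs to be tokenized and evaluated; everything before
    // it is already in the KV cache, mirroring the server's prompt cache
    return chat_get_added_part(model, chat_msgs);
}

// after generation finishes, the assistant's reply is appended the same way:
//   chat_contents.push_back(reply);
//   chat_msgs.push_back({"assistant", chat_contents.back().c_str()});
```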

@mofosyne added the labels enhancement (New feature or request) and Review Complexity: Low (trivial changes to code that most beginner devs, or those who want a break, can tackle, e.g. a UI fix) on May 9, 2024



Development

Successfully merging this pull request may close the following issue: Implement (properly) different chat templates in main.cpp (#6391)
