
Conversation

@krasserm (Contributor) commented Aug 6, 2023

Description

This PR extends server.cpp to support submitting a grammar with completion requests (equivalent to the --grammar command line option for main.cpp).

Usage example

Start a server with a Llama-2-13B-GGML model:

./server -m llama-2-13b.ggmlv3.q4_0.bin -eps 1e-5 --n-gpu-layers 43 --host 0.0.0.0 --port 8080
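
Before wiring up the full function-calling example below, the new "grammar" field can be sanity-checked with a minimal request that constrains the output to an inline GBNF grammar (a sketch; the yes/no grammar and prompt here are only illustrative):

import requests

# Trivial GBNF grammar: the model may only answer "yes" or "no".
grammar = 'root ::= "yes" | "no"'

resp = requests.post(
    url="http://127.0.0.1:8080/completion",
    headers={"Content-Type": "application/json"},
    json={"prompt": "Is the sky blue? Answer:", "n_predict": 4, "grammar": grammar},
)
print(resp.json()["content"])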

Define the schema of two functions in functions.json:

{
    "oneOf": [
        {
            "type": "object",
            "properties": {
                "function": {"const": "create_event"},
                "arguments": {
                    "type": "object",
                    "properties": {
                        "title": {"type": "string"},
                        "date": {"type": "string"},
                        "time": {"type": "string"}
                    }
                }
            }
        },
        {
            "type": "object",
            "properties": {
                "function": {"const": "image_search"},
                "arguments": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string"}
                    }
                }
            }
        }
    ]
}

A Python program that uses the served model for grammar-based sampling:

import importlib
import json

import requests

# import examples/json-schema-to-grammar.py
schema = importlib.import_module("examples.json-schema-to-grammar")

with open("functions.json", "r") as f:
    functions_schema = json.load(f)

# Convert functions schema to grammar
prop_order = {name: idx for idx, name in enumerate(["function", "arguments"])}
converter = schema.SchemaConverter(prop_order)
converter.visit(functions_schema, '')
grammar = converter.format_grammar()

prompts = [
    "Schedule a birthday party on Aug 14th 2023 at 8pm.",
    "Find an image of a dog."
]

for prompt in prompts:
    data_json = { "prompt": prompt, "temperature": 0.1, "n_predict": 512, "stream": False, "grammar": grammar }

    resp = requests.post(
        url="http://127.0.0.1:8080/completion",
        headers={"Content-Type": "application/json"},
        json=data_json,
    )

    # Escape raw newlines in the generated text so json.loads can parse it.
    result = json.loads(resp.json()["content"].strip().replace("\n", "\\n"))

    print(f"Prompt: {prompt}")
    print(f"Result: {result}\n")

When run, it produces the following output:

Prompt: Schedule a birthday party on Aug 14th 2023 at 8pm.
Result: {'function': 'create_event', 'arguments': {'date': '2023-08-14T20:00:00Z', 'time': '20:00', 'title': 'Birthday Party'}}

Prompt: Find an image of a dog.
Result: {'function': 'image_search', 'arguments': {'query': 'dog'}}
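
The parsed result can then be used to dispatch to application code. A minimal sketch (not part of this PR; create_event and image_search here stand in for whatever handlers the application actually provides, and result is the dict parsed from the response above):

def create_event(title=None, date=None, time=None):
    # Hypothetical application handler.
    print(f"Creating event '{title}' on {date} at {time}")

def image_search(query=None):
    # Hypothetical application handler.
    print(f"Searching images for '{query}'")

dispatch = {"create_event": create_event, "image_search": image_search}
dispatch[result["function"]](**result["arguments"])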

@krasserm (Contributor, Author) commented Aug 7, 2023

@SlyEcho thanks for reviewing my PR. I just added a second commit that should cover your review comments. I'm not a C++ programmer but still hope these changes make sense. Any guidance on how to further improve this PR, if necessary, is highly appreciated!

@SlyEcho (Contributor) commented Aug 8, 2023

Looks good now.

@krasserm (Contributor, Author) commented Aug 8, 2023

Thanks again for reviewing @SlyEcho. I just fixed the reported trailing whitespace error in another commit. Sorry for the extra round, checks should now pass.

@tobi (Collaborator) commented Aug 8, 2023

Very nice

@SlyEcho SlyEcho merged commit f5bfea0 into ggml-org:master Aug 8, 2023
@razorback16 commented Aug 8, 2023

I modified the example code to use just one schema and one prompt, and I am getting an error.

The schema (functions.json):

{
    "type": "object",
    "properties": {
        "function": {"const": "create_event"},
        "arguments": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "date": {"type": "string"},
                "time": {"type": "string"}
            }
        }
    }
}

The Python code is:

import requests
import json
import importlib

schema = importlib.import_module("json-schema-to-grammar")

with open("functions.json", "r") as f:
    functions_schema = json.load(f)

# Convert functions schema to grammar
prop_order = {name: idx for idx, name in enumerate(["function", "arguments"])}
converter = schema.SchemaConverter(prop_order)
converter.visit(functions_schema, '')
grammar = converter.format_grammar()

prompt = "Schedule a birthday party on Aug 14th 2023 at 8pm."

data_json = { "prompt": prompt, "temperature": 0.1, "n_predict": 512, "stream": False, "grammar": grammar }

resp = requests.post(
    url="http://localhost:7860/completion",
    headers={"Content-Type": "application/json"},
    json=data_json,
)

result = json.loads(resp.json()['content'].strip().replace("\n", "\\n"))

print(f"Prompt: {prompt}")
print(f"Result: {result}\n")

The error I get on the llama.cpp server is:

{"timestamp":1691531624,"level":"INFO","function":"log_server_request","line":1144,"message":"request","remote_addr":"17.190.192.114","remote_port":60190,"status":200,"method":"POST","path":"/completion","params":{}}
{"timestamp":1691531689,"level":"INFO","function":"log_server_request","line":1144,"message":"request","remote_addr":"17.190.192.114","remote_port":60211,"status":404,"method":"POST","path":"/completion","params":{}}
space ::= space_1
space_1 ::= [ ] |
0-function ::= ["] [c] [r] [e] [a] [t] [e] [_] [e] [v] [e] [n] [t] ["]
string ::= ["] string_6 ["] space
string_4 ::= [^"\] | [\] string_5
string_5 ::= ["\/bfnrt] | [u] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F]
string_6 ::= string_4 string_6 |
0-arguments ::= [{] space ["] [d] [a] [t] [e] ["] space [:] space string [,] space ["] [t] [i] [m] [e] ["] space [:] space string [,] space ["] [t] [i] [t] [l] [e] ["] space [:] space string [}] space
0 ::= [{] space ["] [f] [u] [n] [c] [t] [i] [o] [n] ["] space [:] space 0-function [,] space ["] [a] [r] [g] [u] [m] [e] [n] [t] [s] ["] space [:] space 0-arguments [}] space
1-function ::= ["] [i] [m] [a] [g] [e] [_] [s] [e] [a] [r] [c] [h] ["]
1-arguments ::= [{] space ["] [q] [u] [e] [r] [y] ["] space [:] space string [}] space
1 ::= [{] space ["] [f] [u] [n] [c] [t] [i] [o] [n] ["] space [:] space 1-function [,] space ["] [a] [r] [g] [u] [m] [e] [n] [t] [s] ["] space [:] space 1-arguments [}] space
root ::= 0 | 1

llama_print_timings:        load time =   221.62 ms
llama_print_timings:      sample time =   171.83 ms /    50 runs   (    3.44 ms per token,   290.98 tokens per second)
llama_print_timings: prompt eval time =   189.00 ms /    22 tokens (    8.59 ms per token,   116.40 tokens per second)
llama_print_timings:        eval time =   915.24 ms /    49 runs   (   18.68 ms per token,    53.54 tokens per second)
llama_print_timings:       total time =  1308.12 ms
{"timestamp":1691531717,"level":"INFO","function":"log_server_request","line":1144,"message":"request","remote_addr":"17.190.192.114","remote_port":60220,"status":200,"method":"POST","path":"/completion","params":{}}
space ::= space_1
space_1 ::= [ ] |
function ::= ["] [c] [r] [e] [a] [t] [e] [_] [e] [v] [e] [n] [t] ["]
string ::= ["] string_6 ["] space
string_4 ::= [^"\] | [\] string_5
string_5 ::= ["\/bfnrt] | [u] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F]
string_6 ::= string_4 string_6 |
arguments ::= [{] space ["] [d] [a] [t] [e] ["] space [:] space string [,] space ["] [t] [i] [m] [e] ["] space [:] space string [,] space ["] [t] [i] [t] [l] [e] ["] space [:] space string [}] space
root ::= [{] space ["] [f] [u] [n] [c] [t] [i] [o] [n] ["] space [:] space function [,] space ["] [a] [r] [g] [u] [m] [e] [n] [t] [s] ["] space [:] space arguments [}] space
LLAMA_ASSERT: llama.cpp:2135: is_positive_char || pos->type == LLAMA_GRETYPE_CHAR_NOT
Aborted (core dumped)

@krasserm (Contributor, Author) commented Aug 9, 2023

@razorback16 Thanks for reporting. I can reproduce the issue and will investigate.

@krasserm (Contributor, Author) commented Aug 9, 2023

@razorback16 I just submitted #2566 which fixes the issue you reported.

@krasserm krasserm deleted the wip-server-grammar branch August 9, 2023 13:38
@razorback16 commented Aug 9, 2023 via email
