docker-compose config not always considered #166

Description

@updiversity

When running an embedding model via Docker Model Runner, the runtime configuration (`--ubatch-size`, `context_size`, etc.) is applied inconsistently depending on how the request is sent.

- Using the Docker Desktop GUI: the model starts with the custom configuration (e.g., `ubatch-size=2048`).
- Using `curl` from inside a container: the model falls back to default values (`ubatch-size=512`, `n_ctx=4096`).

This makes it impossible to reliably control the physical batch size.
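The compose configuration in the repro steps below encodes the expected values. As a purely illustrative check (the `models` section is embedded as a plain dict rather than parsed from YAML, and the helper function is hypothetical, not part of Model Runner):

```python
# Expected runtime configuration, mirroring docker-compose.yaml below.
expected_model = {
    "model": "ai/embeddinggemma:300M-Q8_0",
    "context_size": 2048,
    "runtime_flags": ["--ubatch-size", "2048"],
}

def expected_ubatch_size(model_cfg: dict) -> int:
    """Read the --ubatch-size value out of runtime_flags, defaulting to 512."""
    flags = model_cfg.get("runtime_flags", [])
    if "--ubatch-size" in flags:
        return int(flags[flags.index("--ubatch-size") + 1])
    return 512  # default observed in the logs when the flags are ignored

print(expected_ubatch_size(expected_model))  # 2048
print(expected_model["context_size"])        # 2048
```

With this config, every request path should report `n_ubatch = 2048` and `n_ctx = 2048`; only the GUI path does.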

Steps to Reproduce

1. `docker-compose.yaml`:

```yaml
models:
  embedding:
    model: ai/embeddinggemma:300M-Q8_0
    context_size: 2048
    runtime_flags:
      - "--ubatch-size"
      - "2048"

services:
  curl-tester:
    image: curlimages/curl:8.11.1
    command: ["sh", "-lc", "sleep 1000000"]
    models:
      embedding:
        endpoint_var: EMBEDDING_ENDPOINT
        model_var: EMBEDDING_MODEL
```

2. Trigger a request via the Docker Desktop GUI and observe the logs:

```
[2025-09-23T17:19:35.420131000Z] llama_context: constructing llama_context
[2025-09-23T17:19:35.420164000Z] llama_context: n_seq_max     = 1
[2025-09-23T17:19:35.420178000Z] llama_context: n_ctx         = 2048
[2025-09-23T17:19:35.420190000Z] llama_context: n_ctx_per_seq = 2048
[2025-09-23T17:19:35.420200000Z] llama_context: n_batch       = 2048
[2025-09-23T17:19:35.420210000Z] llama_context: n_ubatch      = 2048
```
3. Trigger a request via `curl` inside the container:

```sh
docker exec -it tailscale-curl-tester-1 sh -lc '
echo "EMBED:" $EMBEDDING_ENDPOINT $EMBEDDING_MODEL

curl -sS "$EMBEDDING_ENDPOINT/embeddings" \
  -H "Authorization: Bearer dummy" \
  -H "Content-Type: application/json" \
  -d "{\"model\":\"$EMBEDDING_MODEL\",\"input\":\"hello world\"}" \
  | head -c 300; echo
'
```

Observe logs:

```
[2025-09-23T17:21:57.905751000Z] llama_context: constructing llama_context
[2025-09-23T17:21:57.905780000Z] llama_context: n_ctx         = 4096
[2025-09-23T17:21:57.905798000Z] llama_context: n_ctx_per_seq = 4096
[2025-09-23T17:21:57.905807000Z] llama_context: n_batch       = 2048
[2025-09-23T17:21:57.905815000Z] llama_context: n_ubatch      = 512
```
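To make the mismatch concrete, the two log excerpts above can be diffed with a short parsing script (illustrative only; it operates on the `llama_context` lines quoted in this issue):

```python
import re

# Relevant lines from the two runs quoted above.
gui_log = """\
llama_context: n_ctx         = 2048
llama_context: n_ubatch      = 2048
"""

curl_log = """\
llama_context: n_ctx         = 4096
llama_context: n_ubatch      = 512
"""

def parse(log: str) -> dict:
    """Extract `name = value` pairs from llama_context log lines."""
    return {m.group(1): int(m.group(2))
            for m in re.finditer(r"llama_context: (\w+)\s*=\s*(\d+)", log)}

gui, curl = parse(gui_log), parse(curl_log)
for key in gui:
    if gui[key] != curl[key]:
        print(f"{key}: GUI={gui[key]} vs curl={curl[key]}")
```

This prints a mismatch for both `n_ctx` (2048 vs 4096) and `n_ubatch` (2048 vs 512), i.e. both `context_size` and `runtime_flags` are ignored on the curl path.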
