docker-compose config not always considered #166

Description

@updiversity

When running an embedding model via Docker Model Runner, the runtime configuration (`--ubatch-size`, `context_size`, etc.) is applied inconsistently depending on how the request is sent.

- Using the Docker Desktop GUI: the model starts with the custom configuration (e.g., `ubatch-size=2048`).
- Using `curl` from inside a container: the model falls back to default values (`ubatch-size=512`, `n_ctx=4096`).

This makes it impossible to reliably control the physical batch size.
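The compose configuration in the repro steps below encodes the expected values. As a purely illustrative check (the `models` section is embedded as a plain dict rather than parsed from YAML, and the helper function is hypothetical, not part of Model Runner):

```python
# Expected runtime configuration, mirroring docker-compose.yaml below.
expected_model = {
    "model": "ai/embeddinggemma:300M-Q8_0",
    "context_size": 2048,
    "runtime_flags": ["--ubatch-size", "2048"],
}

def expected_ubatch_size(model_cfg: dict) -> int:
    """Read the --ubatch-size value out of runtime_flags, defaulting to 512."""
    flags = model_cfg.get("runtime_flags", [])
    if "--ubatch-size" in flags:
        return int(flags[flags.index("--ubatch-size") + 1])
    return 512  # default observed in the logs when the flags are ignored

print(expected_ubatch_size(expected_model))  # 2048
print(expected_model["context_size"])        # 2048
```

With this config, every request path should report `n_ubatch = 2048` and `n_ctx = 2048`; only the GUI path does.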

Steps to Reproduce

1. `docker-compose.yaml`:

```yaml
models:
  embedding:
    model: ai/embeddinggemma:300M-Q8_0
    context_size: 2048
    runtime_flags:
      - "--ubatch-size"
      - "2048"

services:
  curl-tester:
    image: curlimages/curl:8.11.1
    command: ["sh", "-lc", "sleep 1000000"]
    models:
      embedding:
        endpoint_var: EMBEDDING_ENDPOINT
        model_var: EMBEDDING_MODEL
```

2. Trigger a request via the Docker Desktop GUI and observe the logs:

```
[2025-09-23T17:19:35.420131000Z] llama_context: constructing llama_context
[2025-09-23T17:19:35.420164000Z] llama_context: n_seq_max     = 1
[2025-09-23T17:19:35.420178000Z] llama_context: n_ctx         = 2048
[2025-09-23T17:19:35.420190000Z] llama_context: n_ctx_per_seq = 2048
[2025-09-23T17:19:35.420200000Z] llama_context: n_batch       = 2048
[2025-09-23T17:19:35.420210000Z] llama_context: n_ubatch      = 2048
```
3. Trigger a request via `curl` inside the container:

```sh
docker exec -it tailscale-curl-tester-1 sh -lc '
echo "EMBED:" $EMBEDDING_ENDPOINT $EMBEDDING_MODEL

curl -sS "$EMBEDDING_ENDPOINT/embeddings" \
  -H "Authorization: Bearer dummy" \
  -H "Content-Type: application/json" \
  -d "{\"model\":\"$EMBEDDING_MODEL\",\"input\":\"hello world\"}" \
  | head -c 300; echo
'
```

Observe logs:

```
[2025-09-23T17:21:57.905751000Z] llama_context: constructing llama_context
[2025-09-23T17:21:57.905780000Z] llama_context: n_ctx         = 4096
[2025-09-23T17:21:57.905798000Z] llama_context: n_ctx_per_seq = 4096
[2025-09-23T17:21:57.905807000Z] llama_context: n_batch       = 2048
[2025-09-23T17:21:57.905815000Z] llama_context: n_ubatch      = 512
```
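To make the mismatch concrete, the two log excerpts above can be diffed with a short parsing script (illustrative only; it operates on the `llama_context` lines quoted in this issue):

```python
import re

# Relevant lines from the two runs quoted above.
gui_log = """\
llama_context: n_ctx         = 2048
llama_context: n_ubatch      = 2048
"""

curl_log = """\
llama_context: n_ctx         = 4096
llama_context: n_ubatch      = 512
"""

def parse(log: str) -> dict:
    """Extract `name = value` pairs from llama_context log lines."""
    return {m.group(1): int(m.group(2))
            for m in re.finditer(r"llama_context: (\w+)\s*=\s*(\d+)", log)}

gui, curl = parse(gui_log), parse(curl_log)
for key in gui:
    if gui[key] != curl[key]:
        print(f"{key}: GUI={gui[key]} vs curl={curl[key]}")
```

This prints a mismatch for both `n_ctx` (2048 vs 4096) and `n_ubatch` (2048 vs 512), i.e. both `context_size` and `runtime_flags` are ignored on the curl path.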
