A practical guide to serving multiple Qwen3 models from a single llama-server instance using model routing, covering embedding, reranking, and chat/vision models.
Tested on Windows with an RTX 3090 (24 GB VRAM), using a llama-server build from the llama.cpp master branch. Last updated: 2025-03-09.
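To make the routing idea concrete, here is a minimal client-side sketch: each task type maps to an endpoint and a model name, and the chosen name goes into the `model` field of the request. The endpoint paths follow llama-server's OpenAI-compatible API; the model aliases (`qwen3-embedding`, `qwen3-reranker`, `qwen3-vl-chat`) are hypothetical placeholders, not names the server defines.

```python
# Sketch of client-side model selection for a multi-model llama-server
# setup. Model aliases are hypothetical; substitute whatever names your
# server actually registers for each loaded model.

TASK_MODELS = {
    "embedding": "qwen3-embedding",   # hypothetical alias
    "rerank": "qwen3-reranker",       # hypothetical alias
    "chat": "qwen3-vl-chat",          # hypothetical alias
}

ENDPOINTS = {
    "embedding": "/v1/embeddings",
    "rerank": "/v1/rerank",
    "chat": "/v1/chat/completions",
}

def route(task: str) -> tuple[str, str]:
    """Return (endpoint path, model alias) for a given task type."""
    if task not in TASK_MODELS:
        raise ValueError(f"unknown task: {task}")
    return ENDPOINTS[task], TASK_MODELS[task]
```

In use, a client would POST to `http://localhost:8080` + the returned endpoint, with the returned alias in the request body's `model` field, and the server dispatches to the matching loaded model.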
