In LM Studio 0.3.23, you can enable a setting similar to llama.cpp's `--cpu-moe`.
If you're low on VRAM and need to split the model between GPU and CPU, give this setting a try.
If you're using gpt-oss via LM Studio's /v1/chat/completions API, try to turn this setting on to separate reasoning from regular content.
Need to update to v0.3.22 first!
LM Studio 0.3.22 b2 ships with Harmony, OpenAI's prompt formatting library for gpt-oss.
LM Studio uses Harmony to construct the input to gpt-oss models, for both the llama.cpp as well as the MLX engine. Both through the app's chat UI and API.
We ship it as an updatable "Framework", which is an extension to the LM Studio runtime.
See more details:
> Mac: Cmd + Shift + R
> PC: Ctrl + Shift + R