Cloud Models
Ollama’s cloud models are a new kind of model in Ollama that can run without a powerful GPU. Instead, cloud models are automatically offloaded to Ollama’s cloud service while offering the same capabilities as local models, making it possible to keep using your local tools while running larger models that wouldn’t fit on a personal computer.Supported models
For a list of supported models, see Ollama’s model library.Running Cloud models
Ollama’s cloud models require an account on ollama.com. To sign in or create an account, run:- CLI
- Python
- JavaScript
- cURL
To run a cloud model, open the terminal and run:
Cloud API access
Cloud models can also be accessed directly on ollama.com’s API. In this mode, ollama.com acts as a remote Ollama host.Authentication
For direct access to ollama.com’s API, first create an API key. Then, set theOLLAMA_API_KEY environment variable to your API key.
Listing models
For models available directly via Ollama’s API, models can be listed via:Generating a response
- Python
- JavaScript
- cURL
Local only
Ollama can run in local-only mode by disabling Ollama’s cloud features.Deprecations
Ollama will occasionally deprecate and retire older cloud models as newer and better open-source models are released. Tools and applications relying on Ollama Cloud models may need to be updated to keep working. Impacted users will be notified in advance of model deprecation and retirement. Deprecations will be communicated through email and on the Ollama website. Ollama Cloud model retirement does not affect local models.Upcoming deprecations
| Retirement date | Model | Recommended alternative |
|---|---|---|
| June 16, 2026 | kimi-k2-thinking | kimi-k2.6 |
| June 16, 2026 | kimi-k2:1t | kimi-k2.6 |
| June 16, 2026 | minimax-m2 | minimax-m3 |
| June 16, 2026 | glm-4.6 | glm-5.1 |
| June 16, 2026 | qwen3-next:80b | qwen3.5 |
| June 16, 2026 | qwen3-vl:235b | qwen3.5 |
| June 16, 2026 | qwen3-vl:235b-instruct | qwen3.5 |
| June 16, 2026 | cogito-2.1:671b | deepseek-v4-flash |

