Kimi K2.6 can now run on CPU, GPU, and SSD setups! 🔥 We shrank the SOTA 1T-parameter model to 340GB via Dynamic GGUFs, where important layers are upcast. Run at >40 tok/s on 350GB RAM/VRAM setups, or run full precision on 610GB. Guide: https://lnkd.in/dQvDqXDV GGUF: https://lnkd.in/d42-UqGK
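If you haven't run a GGUF before, here is a minimal sketch using llama-cpp-python, one of several llama.cpp frontends; the file name below is a hypothetical placeholder for the quant linked above.

```python
# Minimal sketch of loading a Dynamic GGUF with llama-cpp-python.
# The file name is a hypothetical placeholder; use the actual GGUF from the
# link above. llama.cpp memory-maps the weights, which is what makes mixed
# RAM/VRAM/SSD setups workable.
from llama_cpp import Llama

llm = Llama(
    model_path="Kimi-K2.6-UD-Q2_K_XL.gguf",  # hypothetical file name
    n_gpu_layers=40,  # tune to however many layers fit in your VRAM
    n_ctx=8192,       # context window; raise if you have memory headroom
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```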
Unsloth AI
San Francisco, California · 36,407 followers
Making AI accessible for everyone! 🦥
About us
Making open-source AI more accessible.
- Website
- https://unsloth.ai
- Industry
- Technology, Information and Internet
- Company size
- 11-50 employees
- Headquarters
- San Francisco, California
- Type
- Privately Held
- Founded
- 2023
- Specialties
- artificial intelligence, ai, llms, language models, finetuning, and open-source
Locations
- Primary
- San Francisco, California 94114, US
Updates
-
We benchmarked Gemma 4 26B-A4B GGUFs to identify the best-performing quants. Unsloth ranks first in ALL 22 of 22 quant sizes on mean KL divergence, making them SOTA. GGUFs: https://lnkd.in/gW3BebDi HQ graphs: https://lnkd.in/gdUvHAKJ
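For context on the metric: mean KL divergence compares the quantized model's next-token distribution against the full-precision model's, averaged over an evaluation set; lower means the quant tracks the original more closely. A minimal sketch, assuming raw logits from both models are available:

```python
# Hedged sketch of the metric behind these rankings: mean KL divergence
# between the full-precision model's next-token distribution and the quant's,
# averaged over evaluation tokens. Lower is better. Arrays here are stand-ins.
import numpy as np

def mean_kl(logits_fp, logits_quant):
    """logits_*: (num_tokens, vocab_size) raw logits from each model."""
    def log_softmax(x):
        # subtract the max for numerical stability before exponentiating
        x = x - x.max(axis=-1, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))
    log_p = log_softmax(logits_fp)      # reference (full precision)
    log_q = log_softmax(logits_quant)   # candidate (quantized)
    kl_per_token = (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)
    return kl_per_token.mean()

# toy example with random logits standing in for real model outputs
rng = np.random.default_rng(0)
fp = rng.normal(size=(4, 32))
print(mean_kl(fp, fp + 0.05 * rng.normal(size=fp.shape)))
```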
-
We ran Qwen3.6-35B-A3B GGUF performance benchmarks to help you choose the best quant. 🚀 Unsloth ranks first in 21 of 22 quant sizes on mean KL divergence, making them SOTA. GGUFs: https://lnkd.in/gkG_a9vm
-
2-bit Qwen3.6-35B-A3B did a complete repo bug hunt with evidence, repro, fixes, tests, and a PR writeup. 🔥 Run it locally in Unsloth Studio with just 13GB RAM. The 2-bit Qwen3.6 GGUF made 30+ tool calls, searched 20 sites, and executed Python code. GitHub: https://lnkd.in/dcqhW9Vv GGUF: https://lnkd.in/gkG_a9vm Guide: https://lnkd.in/gWsiUmQh
-
Qwen3.6-35B-A3B can now be run locally! 💜 The model is the strongest mid-sized LLM on nearly all benchmarks. Run Qwen on 23GB RAM via Unsloth Dynamic GGUFs. GGUFs: https://lnkd.in/gkG_a9vm Guide: https://lnkd.in/gWsiUmQh
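To fetch only the quant you need rather than the whole repo, here is a sketch with huggingface_hub; the repo id and file pattern are hypothetical placeholders for the repo linked above.

```python
# Minimal download sketch using huggingface_hub. The repo id and quant
# pattern are hypothetical placeholders; use the actual repo from the
# GGUFs link above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="unsloth/Qwen3.6-35B-A3B-GGUF",  # hypothetical repo id
    allow_patterns=["*UD-Q4_K_XL*"],         # fetch only one quant size
    local_dir="models/qwen3.6-35b-a3b",
)
```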
-
MiniMax-M2.7 can now be run locally! 🔥 Run the new 230B open model with Dynamic 4-bit on a 128GB Mac or 128GB RAM/VRAM setups. MiniMax-M2.7 achieves SOTA performance on SWE-Pro and Terminal Bench 2. Guide: https://lnkd.in/g6XUiPae GGUFs on Hugging Face: https://lnkd.in/g3qZVfcu
-
GLM-5.1 is out now! ⚡ GLM-5.1 is a new open model for SOTA agentic coding & chat, and you can now run it locally on a 256GB Mac or RAM/VRAM setups. We shrank the 744B model from 1.65TB to 220GB (-86%) via Dynamic 2-bit. Guide: https://lnkd.in/g6XvkEtt GGUF: https://lnkd.in/gCbdfsNH
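As a quick sanity check, the quoted shrink is plain arithmetic:

```python
# Quick check of the quoted size reduction: 1.65 TB down to 220 GB.
original_gb = 1650
quantized_gb = 220
print(f"{1 - quantized_gb / original_gb:.1%} smaller")  # ~86.7%, i.e. the -86%
```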
-
Gemma 4 E4B is able to search and cite 10+ websites and execute code to find the best answer (4-bit GGUF)! 🔥 You only need 6GB RAM to try this in Unsloth Studio. Unsloth Studio GitHub repo: https://lnkd.in/dcqhW9Vv
-
We shipped 50+ updates to Unsloth Studio in just one week! 🚀
- Unsloth Studio now installs in just 2 mins, 10x faster via pre-compiled llama.cpp binaries
- New desktop app icon shortcuts
- 50% less disk space
- Update via `unsloth studio update`
- Upload multiple files to Data Recipes
- Context length is now adjustable
- Inference token and context observability
- Windows, CPU, and GPU now work great
- Improved tool calling: better parsing, no raw tool markup in chat, faster inference, a new Tool Outputs panel, and timers
Full Changelog: https://lnkd.in/gKyqCRCj GitHub: https://lnkd.in/dcqhW9Vv