Google DeepMind announced on Thursday the release of Gemma 4, the latest generation of its open large language models (LLMs) that break from the restrictive licensing of previous versions.

Google released the new family under the Apache 2.0 license, officially moving the project from “open weights” to “open source” artificial intelligence (AI) models.

While Google’s flagship Gemini remains a closed, subscription-based service, Gemma 4 utilizes the same core research and technology.

By adopting the Apache 2.0 license, Google grants developers the freedom to use, modify, and redistribute the models for personal or commercial purposes without royalties. The shift addresses long-standing criticism from the developer community of Google’s previous “permissive but controlled” terms.

The release is structured into four distinct models designed to scale from pocket-sized hardware to massive data centers.

The heavyweight models, a 31B dense model and a 26B mixture-of-experts (MoE) model, are built for high-end servers equipped with NVIDIA H100 GPUs. Despite their smaller footprint compared with industry giants, the pair currently holds the third and sixth spots on the Arena AI leaderboard, reportedly outperforming models 20 times their size.

Lightweight models E2B (2 billion parameters) and E4B (4 billion parameters) target edge devices such as smartphones, Raspberry Pis, and IoT sensors. These versions are optimized for near-zero latency and can operate entirely offline.

“This provides a foundation for complete developer flexibility and digital sovereignty,” Google DeepMind leadership said in a statement, noting that offline capabilities are critical for industries like healthcare, where data confidentiality and regulatory restrictions often prohibit cloud-based AI processing.

Gemma 4 is not merely a text processor. All four models are natively multimodal, capable of processing video and images for tasks like optical character recognition (OCR) and complex chart analysis. The smaller edge models (E2B and E4B) go a step further by including native audio input for speech recognition.

Additional technical leaps include:

Massive context windows: The larger models support context windows of up to 256K tokens, allowing users to process an entire code repository or a massive document in a single prompt.

Agentic workflows: The models can now power autonomous agents that call external APIs and execute multi-step logic.

Linguistic breadth: Trained on text in more than 140 languages, the models’ web-scraped training data likely includes enough material to handle rudimentary translation even for niche languages like Klingon.

Since the original Gemma launch in February 2024, the models have been downloaded more than 400 million times and have inspired more than 100,000 variants.

By removing the leash of proprietary licensing, Google expects adoption to accelerate as companies begin bundling these AI engines directly into consumer hardware and local enterprise software.

Gemma 4 model weights are available for immediate download via Hugging Face, Kaggle, and Ollama.
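For readers who want to try the edge models locally, a minimal sketch of the Ollama route might look like the following. Note that the model tag `gemma4:e4b` is an assumption for illustration; check Ollama’s model library for the actual registry name.

```shell
# Hypothetical model tag -- the real registry name may differ.
MODEL_TAG="gemma4:e4b"

# Pull and query the edge model locally via Ollama, if installed.
if command -v ollama >/dev/null 2>&1; then
  ollama pull "$MODEL_TAG"
  ollama run "$MODEL_TAG" "Summarize the Apache 2.0 license in one sentence."
else
  echo "ollama not found; install it from ollama.com first"
fi
```

Because the model runs entirely on local hardware, no prompt data leaves the machine, which is the offline, data-sovereignty scenario Google highlights for regulated industries.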