SMALL SIZE, SUPER POWER

Edge Model for Everyone, Everyday, Everywhere

MiniCPM Inside Phones

MiniCPM Inside AIPC

MiniCPM Inside Intelligent Cabins

MiniCPM Inside Embodied Robots

MiniCPM Inside Wearable Devices

Put ChatGPT- and GPT-4V-Level LLMs on Your Phone, Pad, and PC

MiniCPM is a series of lightweight, high-performance edge LLMs. Since its release in February 2024, it has been widely tested and praised by the global open-source community for its "achieving more with less" efficiency and its on-device performance. It has repeatedly topped the GitHub and Hugging Face trending charts and became one of the most popular LLMs on Hugging Face in 2024. MiniCPM has partnered with leading companies across industries and is helping drive innovation in sectors such as AIPC, AI phones, intelligent cabins, and embodied robots.

High Efficiency, Low Cost, Achieving More with Less
Foundation Model MiniCPM

4B / 2.4B / 1.2B

The On-Device ChatGPT Moment

8B Lightning Edition + 0.5B: Small But Powerful
100x+ Speed Boost, Proficient in Long-Form On-Device Text

Fast!
Up to 220x inference speedup at peak
5x speedup in regular use

Smooth!
Efficient dual-stream sliding window switching
Sparse computation for long texts
Dense computation for short texts

Powerful!
Punches above its weight with flagship-level performance
Requires only 22% of the training data to reach comparable quality

Compact!
25% ultra-low storage footprint
90% slimmed-down quantized version
Optimized for on-device deployment

Unbelievably Strong for a 4B Edge Model on Your Device!
ChatGPT-Level Basic Performance, Surpassing GPT-3.5, Qwen2-7B, GLM4-9B

New Architecture, New Benchmark of LLM Knowledge Density

Light! Fast! On-Device Friendly
Only 2GB of memory after quantization
Versatile and Sharp as a Swiss Army Knife
Surpassing Kimi! Infinite Long Text
32, 128, 256, 512K... Unlimited Context Expansion

GPT-4o-level Function Calling
Surpassing GPT-3.5, GLM4-9B, Close to GPT-4o

Superior RAG Add-On Suite
Number one in Chinese retrieval, with generation results surpassing Llama3-8B

Leapfrogging Global Benchmark Models
Surpassing Mistral-7B, Llama2-13B, Gemma-7B-it, ChatGLM3-6B, and other small open-source models.

High Efficiency and Low Cost
Supports CPU inference and fine-tuning on consumer GPUs
Inference speed up to 33 tokens/s
Inference cost as low as $1 = 12,500,000 tokens
*Inference speed is the measured performance on an Intel Core Ultra 9 processor
*Inference cost calculation: Snapdragon 855 chip (costing about $82) at 7.5 tokens/s

Smaller Size, for Scenarios Everywhere
Half the Size, Performance Surpassing Llama2-13B
Inference speed of 25 tokens/s, 25 times human speaking speed
60% reduction in inference cost
$1 = 30,300,000 tokens
*The Apple A17 Pro costs $130. If using meta, the maximum speed is 25 tokens/s. Assuming the chip is used for 5 years, the inference cost is (25 × 3600 × 24 × 365 × 5) / 130 ≈ 30.3 million tokens per dollar.
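The footnote's arithmetic can be reproduced in a few lines. This is a minimal sketch that assumes, as the footnote does, that the chip generates tokens continuously for its whole service life and that the chip price is the only cost. The helper name `tokens_per_dollar` and its parameters are illustrative and not part of any MiniCPM tooling.

```python
SECONDS_PER_YEAR = 3600 * 24 * 365  # non-leap year, matching the footnote


def tokens_per_dollar(tokens_per_second: float, chip_cost_usd: float, years: float) -> float:
    """Tokens generated per dollar of chip cost under continuous use (hypothetical helper)."""
    total_tokens = tokens_per_second * SECONDS_PER_YEAR * years
    return total_tokens / chip_cost_usd


# Figures quoted for the Apple A17 Pro: 25 tokens/s, $130, 5 years of use.
a17 = tokens_per_dollar(25, 130, 5)
print(f"{a17 / 1e6:.1f} million tokens per dollar")  # 30.3 million tokens per dollar
```

The result is an upper bound on tokens per dollar: it ignores electricity, idle time, and the rest of the device, so real-world cost per token would be higher.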

View the detailed features of each version

GPT-4o-Level Omni Model Running On-Device
Multimodal Model MiniCPM-V

8B Full-Modal / 8B Live Video / 8B / 2.8B

The On-Device GPT-4o New Era

Edge-Side GPT-4o
Real-time streaming, end-to-end
Full-modal, all SOTA
The best edge visual general model
The best audio general model
Continuous watching, real videos
Not just a single frame-based model
Real-time listening, truly smooth
Hear clearly, understand distinctly
Natural speaking, emotional engagement
Real-time interruptions without confusion

Full Capability, End-to-End
High performance, low latency
More natural, more coherent
Context understanding
Interruptible at any time
Noise resistance
Easy deployment and maintenance

Top On-Device Multimodal, Comprehensive Performance
Competitive with GPT-4V, with SOTA performance in real-time video understanding, multi-image understanding, and single-image understanding among models below 20B. It is the first edge model to offer real-time video understanding on-device.

Light! Fast! On-Device Friendly!
Only 6GB of memory on-device after quantization
On-device inference speed up to 18 tokens/s, 33% faster
Supports llama.cpp, Ollama, and vLLM inference

Others
Extremely low hallucination rate, better than GPT-4o and GPT-4V, based on the self-developed RLHF-V efficient alignment technology.

Strongest on-device multimodal general capability
Surpassing Gemini Pro, GPT-4V

Strongest on-device OCR
Comparable to the GPT-4V benchmark model; surpasses GPT-4o, GPT-4V, Claude 3V Opus, Gemini Pro, and other benchmark models on the OCRBench ranking

9 times clearer pixels
Precise recognition of difficult images and long text
Our self-developed high-definition image decoding technology enables lossless on-device recognition of high-definition images of up to 1.8 million pixels at any aspect ratio. It excels at interpreting and reasoning about difficult images, and at precise OCR recognition and extraction of long image and text content.

Multimodal acceleration of on-device system
Image encoding is 150 times faster! Efficient operation on mobile phones at 6-8 tokens/s

Supports 30+ languages
Added mainstream languages such as German, French, Spanish, Italian, Russian, and more

Strongest on-device OCR
Breakthrough in multimodal model capability
Image and text recognition comparable to GPT-4V and Gemini Pro.
Hallucination level on par with GPT-4V

Compare the functionalities of various versions

Global Partner

On-Device Native

Personalized for All Scenarios

Chip-Level Fit

AI Native OA

Technical Blog

Large Model

Agent

Infra

Ultra Alignment

Others

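New UltraEval-Audio Release Open-Sourced: One-Click Reproduction and Isolated Inference Make Model Evaluation More Efficient and Controllable
Audio models have become important productivity tools, and for researchers, improving their performance is a top priority. As audio models evolve rapidly, researchers commonly face several problems that slow development. First, metrics from papers and public reports are often hard to reproduce reliably on local machines: reproduction depends on scattered scripts, hidden parameters, and complex pre- and post-processing, so results do not match and pipelines fail to run. Second, models differ greatly in their runtime requirements (framework versions, CUDA dependencies, mutually exclusive third-party libraries), so evaluating several models in parallel on one machine easily leads to dependency conflicts and repeated environment reinstalls. Third, beyond general-purpose speech LLMs, specialized audio models such as TTS, ASR, and codec models are drawing growing attention from the community and industry, popular models and approaches are emerging quickly, and demand for reproducible evaluation of these models has grown markedly. To address these pain points, Tsinghua, OpenBMB, and ModelBest jointly released UltraEval-Audio v1.1. Building on the existing one-click evaluation of audio models, it adds one-click reproduction for popular audio models, extends support for specialized models such as TTS/ASR/Codec and for dedicated evaluations, and introduces an isolated inference mechanism to lower the engineering barrier to reproduction and make the evaluation workflow more controllable and portable.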

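Tsinghua Debuts Speech LLM Evaluation Work: Evaluation Framework, Arena, and Leaderboard All Open
In recent years, large-model technology has been expanding rapidly from the single text modality into multimodal domains. In May 2024 in particular, OpenAI released the multimodal GPT-4o, once again setting the research direction and technical trend for large models and spurring the growth of multimodal models. Against this backdrop, the Tsinghua NLP Lab, OpenBMB, and ModelBest jointly launched UltraEval-Audio, the first full-modal evaluation framework for speech LLMs, along with AudioArena, a head-to-head comparison platform for speech LLMs. Using professional, quantitative, and scientific evaluation methods, they comprehensively assessed the speech LLMs currently on the market and published UltraEval-Audio-Leaderboard, the first authoritative speech evaluation leaderboard, giving the speech LLM industry a ranking of its top performers.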

VoxCPM Reaches No. 1 on HuggingFace
VoxCPM, the first speech generation model in the ModelBest MiniCPM family, was widely praised upon release and reached No. 1 on HuggingFace.

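A New Member of the ModelBest MiniCPM Family: VoxCPM Generates Human-Like Speech and Highly Faithful Voice Clones!
Today we are proud to introduce VoxCPM, a new member of the ModelBest MiniCPM family and a 0.5B-parameter speech generation base model. It was jointly developed by ModelBest and the Human-Computer Speech Interaction Lab at Tsinghua Shenzhen International Graduate School (THUHCSI).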

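MiniCPM-V 4.5 Technical Report Officially Released
The MiniCPM-V 4.5 technical report is out today. It explores how to build efficient multimodal LLMs along three dimensions (model architecture, training data, and training strategy) in order to address the efficiency bottlenecks of multimodal training and inference. It proposes three techniques: a unified 3D-Resampler architecture for high-density video compression, a unified document-oriented paradigm for OCR and knowledge learning, and multimodal reinforcement learning with controllable hybrid fast/deep thinking. Built on these techniques, MiniCPM-V 4.5 achieves significant gains in video understanding, image understanding, OCR, document parsing, and other tasks. At only 8B parameters it surpasses GPT-4o-latest and Qwen2.5-VL-72B, and it also holds a clear advantage in inference speed.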

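MiniCPM-V 4.5 Reaches No. 2 on HuggingFace
MiniCPM-V 4.5, ModelBest's latest release, packs very high performance into a small 8B model, and its high-refresh-rate video understanding has been described as "eagle-eyed". As soon as it was open-sourced it drew an enthusiastic response from developers and reached No. 2 on HuggingFace Trending.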

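New Multimodal Flagship MiniCPM-V 4.5: 8B Outperforms 72B, with Accurate and Fast High-Refresh-Rate Video Understanding
Today we officially open-source MiniCPM-V 4.5, the 8B-parameter multimodal flagship of the ModelBest MiniCPM family. It is the industry's first multimodal model with "high-refresh-rate" video understanding: it sees accurately, sees fast, and sees long. Its high-refresh-rate video understanding, long-video understanding, OCR, and document parsing are SOTA for its size class, and its performance exceeds that of Qwen2.5-VL 72B, making it arguably the strongest on-device multimodal model.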

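MiniCPM-V 4.0 Open-Sourced: Evolved Multimodal Capabilities, Usable on Phones, Plus the Most Complete CookBook!
With 4B parameters, MiniCPM-V 4.0 runs stably and responds quickly, and it sustains long continuous use on phones, tablets, and similar devices without overheating or stuttering.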

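ModelBest's "Little Steel Cannon" (MiniCPM) Lands in a Nature Portfolio Journal: 8B Multimodal Overall Performance Surpasses GPT-4V and Gemini Pro
On July 1, Nature Communications, a journal in the Nature portfolio, published the core research results of MiniCPM-V, an efficient on-device multimodal LLM jointly developed by research teams from Tsinghua, ModelBest, and others.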

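MiniCPM-o 2.6 Technical Blog
To encourage exploration in the open-source community, we are releasing MiniCPM-o 2.6, the latest and best-performing on-device multimodal LLM, upgraded from the MiniCPM-V series. The model accepts image, video, text, and audio input and generates high-quality text and speech output end to end. With only 8B parameters in total, MiniCPM-o 2.6 reaches GPT-4o-202405 level in vision, speech, and multimodal streaming, and it is among the open-source models with the richest modality support and the best performance.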

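H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs
A recent joint work by Tsinghua University THUNLP, the Tsinghua University School of Journalism and Communication, OpenBMB, and ModelBest, "H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs", systematically studies the hallucination mechanism in LLMs from the perspective of individual neurons. It not only identifies a very small set of neurons associated with hallucination (H-Neurons) but also reveals a surprising finding: hallucination is not a random generation error but the model's "over-compliance" in trying to go along with the user.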

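Base Model Update: MiniCPM 4.1 Brings "Efficient Deep Thinking" to the Edge
Today we release MiniCPM 4.1, a new version of the ModelBest MiniCPM base model. Building on MiniCPM 4.0, MiniCPM 4.1 adds an 8B-parameter deep-thinking model, the industry's first with a native sparse architecture. Its SOTA performance for its size class brings very fast and accurate deep thinking, making efficient deep thinking practical on edge devices.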

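Tech Blog-9 | ArkInfer: An Efficient Cross-Platform Deployment System for On-Device AI
How can we break through hardware barriers and achieve "develop once, run everywhere"? The core of the problem is decoupling and code reuse: decoupling algorithmic innovation from the underlying hardware implementation so that a single engineering effort can be applied automatically and efficiently across multiple platforms. To address these pain points, we propose ArkInfer, an innovative, forward-looking cross-platform AI model deployment system. By delivering high inference efficiency and broad platform compatibility, ArkInfer aims to overcome the obstacles created by the fragmentation of edge chips and to provide a solid deployment foundation for all kinds of model applications.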

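Tech Blog-8 | Chunk-wise Rollout: A High-Throughput Reinforcement Learning Acceleration Strategy Based on Chunked Sampling
Chunk-wise Rollout is a strategy for accelerating RL training. It makes the RL training pipeline asynchronous, caps the maximum generation length of each trajectory in every sampling round, and reuses trajectories left unfinished in the current round during the next one. This reduces the compute bubbles caused by a few very long trajectories during sampling and raises GPU utilization. To limit the impact on training quality and stability of the off-policy samples that Chunk-wise Rollout introduces, we add several measures: chunk-level importance sampling, two-sided clipping, KL regularization with dynamic reference-model updates, and filtering of abnormal content. Together these make RL training both stable and efficient, delivering a 2.05x training speedup with results close to or better than conventional RL training.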
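The scheduling idea behind Chunk-wise Rollout (cap each trajectory's generation per round and carry unfinished trajectories into the next round) can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the authors' implementation: it models only the scheduling, with no model, importance sampling, clipping, or KL terms, and the function name `chunkwise_rollout` and all of its parameters are hypothetical.

```python
from collections import deque


def chunkwise_rollout(target_lengths, chunk_size, batch_size):
    """Toy scheduler: return {trajectory_id: round in which it finished}.

    target_lengths[i] is the number of tokens trajectory i needs in total
    (known here only because this is a simulation).
    """
    pending = deque((i, 0) for i in range(len(target_lengths)))  # (id, tokens so far)
    carried = deque()  # unfinished trajectories from earlier rounds
    finished_round = {}
    round_idx = 0
    while pending or carried:
        round_idx += 1
        # Fill the batch with carried-over trajectories first, then fresh prompts.
        batch = []
        while len(batch) < batch_size and carried:
            batch.append(carried.popleft())
        while len(batch) < batch_size and pending:
            batch.append(pending.popleft())
        for traj_id, done in batch:
            # Generate at most chunk_size new tokens for this trajectory.
            done = min(done + chunk_size, target_lengths[traj_id])
            if done == target_lengths[traj_id]:
                finished_round[traj_id] = round_idx
            else:
                carried.append((traj_id, done))
    return finished_round


schedule = chunkwise_rollout([3, 10, 4, 2], chunk_size=4, batch_size=2)
print(schedule)
```

In this example the 10-token trajectory takes three rounds, but each round costs at most 4 generation steps, so the short trajectories are never stalled waiting for it.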

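Tech Blog-7 | BitCPM: Extremely Low-Bit Model Quantization
Extremely low-bit model quantization has been an active research topic in academia in recent years, and many effective works have emerged. We ran experiments on the MiniCPM model series to explore how to train extremely low-bit models efficiently with QAT, and we are releasing our low-bit model series, BitCPM.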

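Tech Blog-6 | Wind Tunnel 2.0: An Efficient Method for Searching Optimal Hyperparameters for Model Training
As early as the training of MiniCPM1, the ModelBest team used model wind-tunnel experiments to guide model training. This time we use a more effective model evaluation method and have upgraded the wind-tunnel workflow to make our hyperparameter search more efficient. We also compare the wind-tunnel approach with representative hyperparameter search approaches from industry.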

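Tech Blog-5 | MiniCPM4-Survey: Trustworthy Long-Form Survey Generation Based on Multi-Step Agent Reinforcement Learning
As OpenAI and other organizations roll out DeepResearch systems, a new and important task has emerged in practical LLM use: generating long, heavily cited, structured survey reports. However, most solutions currently on the market rely on closed-source models with large parameter counts, which are costly to use and limited in the number of calls. To address this and make survey report generation available to more users in more scenarios, we focused on Multi-Step Agent Reinforcement Learning and built MiniCPM4-Survey, which generates trustworthy long-form surveys at an on-device parameter scale. The quality of the surveys it generates is comparable to OpenAI Deep Research across several dimensions.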

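MiniCPM 4.0 Deployment and Fine-Tuning Tutorial
MiniCPM 4.0 introduces a big-and-small pair of on-device models at 8B and 0.5B parameters, continuing the family's "achieving more with less" character and delivering best-in-class performance for their sizes. MiniCPM 4.0-8B is the first native sparse model: its very high 5% sparsity, combined with a burst of system-level innovations, makes long-text processing and deep thinking practical on-device. With only 22% of the training cost, its performance matches Qwen-3-8B and exceeds Gemma-3-12B. MiniCPM 4.0-0.5B achieves SOTA for its size class and, through native QAT, supports int4 quantization with almost no accuracy loss, reaching an inference speed of 600 tokens/s.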

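Tech Blog-4 | Next-Generation InfLLM: A Trainable Sparse Attention Mechanism
MiniCPM4 introduces InfLLM v2, a trainable sparse attention mechanism that greatly reduces computation cost. Combined with the customized inference kernels in CPM.cu, MiniCPM4 achieves real speedups in both the prefill and decoding stages.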

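Tech Blog-3 | CPM.cu: A Lightweight and Efficient On-Device LLM Inference Framework
CPM.cu is an efficient LLM inference framework built on CUDA. It integrates several acceleration strategies, including sparse attention, speculative sampling, and model quantization, and supports combining them for even higher inference performance. The framework is lightweight and easy to read.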

Efficiency First
We believe the best model is the one with superior power, faster speed, and lower cost.
Efficiency comes from mastering the science of large language models (LLMs), with knowledge density as the key principle. As knowledge density grows, it becomes a core competitive advantage, unlocking vast potential for edge intelligence and applications.

ModelBest Law / Moore’s Law

Model capability density increases exponentially over time, with the number of parameters required to reach a certain intelligence level halving every 3.3 months.
Capability density: the ratio of effective parameter size to actual parameter size. Effective parameter size refers to the minimum number of parameters required for the reference model (e.g., MiniCPM) to achieve performance equivalent to the given target model.
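The two definitions above translate directly into a short calculation. The sketch below is one reading of the statement, assuming a clean exponential with a 3.3-month halving period. The function names and the parameter counts in the example are illustrative, not measured values.

```python
HALVING_MONTHS = 3.3  # halving period stated above


def capability_density(effective_params: float, actual_params: float) -> float:
    """Ratio of effective parameter size to actual parameter size."""
    return effective_params / actual_params


def params_needed(initial_params: float, months_elapsed: float) -> float:
    """Parameters needed for a fixed capability level after months_elapsed,
    assuming the requirement halves every HALVING_MONTHS."""
    return initial_params * 0.5 ** (months_elapsed / HALVING_MONTHS)


# Illustrative only: a capability that needs 8B parameters today.
after_one_period = params_needed(8e9, HALVING_MONTHS)       # one halving
after_two_periods = params_needed(8e9, 2 * HALVING_MONTHS)  # two halvings

# Illustrative only: a 4B model that matches an 8B reference model has density 2.
density = capability_density(8e9, 4e9)
```

Under this reading, one year corresponds to roughly 12 / 3.3 ≈ 3.6 halvings, so the parameter count needed for a fixed capability level would shrink by a factor of about 12.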

News

AGI FOR LIVES