<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by SiliconFlow on Medium]]></title>
        <description><![CDATA[Stories by SiliconFlow on Medium]]></description>
        <link>https://medium.com/@SiliconFlowAI?source=rss-91ba7914dfb6------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*yFRYHEEib9y9ZiQ3FGy3xA.png</url>
            <title>Stories by SiliconFlow on Medium</title>
            <link>https://medium.com/@SiliconFlowAI?source=rss-91ba7914dfb6------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Mon, 06 Apr 2026 20:02:21 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@SiliconFlowAI/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[GLM-4.7 Now on SiliconFlow: Advanced Coding, Reasoning & Tool Use Capabilities]]></title>
            <link>https://medium.com/@SiliconFlowAI/glm-4-7-now-on-siliconflow-advanced-coding-reasoning-tool-use-capabilities-40967c6c3e4d?source=rss-91ba7914dfb6------2</link>
            <guid isPermaLink="false">https://medium.com/p/40967c6c3e4d</guid>
            <category><![CDATA[coding]]></category>
            <category><![CDATA[tool-use]]></category>
            <category><![CDATA[zhipu-ai]]></category>
            <category><![CDATA[reasoning]]></category>
            <category><![CDATA[siliconflow]]></category>
            <dc:creator><![CDATA[SiliconFlow]]></dc:creator>
            <pubDate>Thu, 25 Dec 2025 14:22:21 GMT</pubDate>
            <atom:updated>2025-12-25T14:22:21.335Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*zzK4eUN6YkNs3Mi-Fm7cng.png" /></figure><p>We’re excited to announce that <a href="https://www.siliconflow.com/models/glm-4-7"><strong>GLM-4.7</strong></a>, Z.ai’s latest flagship model, is now available on SiliconFlow with Day 0 support. Compared with its predecessor <a href="https://www.siliconflow.com/models/glm-4-6">GLM-4.6</a>, this release brings significant advancements across coding, complex reasoning, and tool utilization — delivering performance that rivals or even outperforms industry leaders like <a href="https://www.anthropic.com/news/claude-sonnet-4-5">Claude Sonnet 4.5</a> and <a href="https://openai.com/index/gpt-5-1/">GPT-5.1</a>.</p><p>Currently, SiliconFlow supports the entire GLM model series, including <a href="https://www.siliconflow.com/models/glm-4-5">GLM-4.5</a>, <a href="https://www.siliconflow.com/models/glm-4-5-air">GLM-4.5-Air</a>, <a href="https://www.siliconflow.com/models/glm-4-5v">GLM-4.5V</a>, <a href="https://www.siliconflow.com/models/glm-4-6">GLM-4.6</a>, <a href="https://www.siliconflow.com/models/glm-4-6v">GLM-4.6V</a>, and now <a href="https://www.siliconflow.com/models/glm-4-7">GLM-4.7</a>.</p><h3>SiliconFlow Day 0 support with:</h3><ul><li><strong>Competitive Pricing</strong>: GLM-4.7 $0.6/M tokens (input) and $2.2/M tokens (output)</li><li><strong>205K Context Window</strong>: Tackle complex coding tasks, deep document analysis, and extended agentic workflows.</li><li><strong>Anthropic &amp; OpenAI-Compatible APIs:</strong> Deploy via SiliconFlow with seamless integration into <a href="https://claude.com/product/claude-code">Claude Code</a>, <a href="https://kilo.ai/">Kilo Code</a>, <a href="https://cline.bot/">Cline</a>, <a href="https://roocode.com/">Roo Code</a>, and other mainstream agent workflows with significant improvements on complex tasks.</li></ul><h3>What Makes GLM-4.7 
Special</h3><p><strong>GLM-4.7</strong>, your new coding partner, comes with the following features:</p><h3>Core Coding Excellence</h3><p>GLM-4.7 sets a new standard for multilingual agentic coding and terminal-based tasks. Compared to its predecessor, the improvements are substantial:</p><ul><li><strong>73.8% (+5.8%)</strong> on SWE-bench Verified</li><li><strong>66.7% (+12.9%)</strong> on SWE-bench Multilingual</li><li><strong>41% (+16.5%)</strong> on Terminal Bench 2.0</li></ul><p>The model now supports “thinking before acting,” enabling more reliable performance on complex tasks across mainstream agent frameworks, including Claude Code, Kilo Code, Cline, and Roo Code.</p><h3>Vibe Coding</h3><p>GLM-4.7 takes a major leap forward in UI quality. It produces cleaner, more modern webpages and generates better-looking slides with more accurate layout and sizing. Whether you’re prototyping interfaces or creating presentations, the visual output quality is noticeably enhanced.</p><h3>Advanced Tool Use</h3><p>Tool utilization has been significantly enhanced. On multi-step benchmarks like τ²-Bench and web-browsing tasks via BrowseComp, GLM-4.7 surpasses both Claude Sonnet 4.5 and GPT-5.1 High, demonstrating superior capability for complex, real-world workflows.</p><h3>Complex Reasoning Capabilities</h3><p>Mathematical and reasoning abilities see a substantial boost, with GLM-4.7 achieving <strong>42.8% (+12.4%)</strong> on the HLE (Humanity’s Last Exam) benchmark compared to GLM-4.6. 
Moreover, you can see significant improvements in many other areas, such as chat, creative writing, and role-play.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*cbQMsm-9Od8ZcspU" /></figure><p>Whether it’s coding, creativity, or complex reasoning — get started now to see what GLM-4.7 brings to your workflow.</p><h3>Get Started Immediately</h3><ol><li><strong>Explore:</strong> Try <a href="https://www.siliconflow.com/models/glm-4-7">GLM-4.7</a> in the SiliconFlow playground.</li><li><strong>Integrate:</strong> Use our OpenAI/Anthropic-compatible API. Explore the full API specifications in the <a href="https://docs.siliconflow.com/en/api-reference/chat-completions/chat-completions">SiliconFlow API documentation</a>.</li></ol><pre>import requests<br>url = &quot;https://api.siliconflow.com/v1/chat/completions&quot;<br>payload = {<br>    &quot;model&quot;: &quot;zai-org/GLM-4.7&quot;,<br>    &quot;messages&quot;: [<br>        {<br>            &quot;role&quot;: &quot;system&quot;,<br>            &quot;content&quot;: &quot;You are an assistant&quot;<br>        },<br>        {<br>            &quot;role&quot;: &quot;user&quot;,<br>            &quot;content&quot;: &quot;What&#39;s the weather like in America?&quot;<br>        }<br>    ],<br>    &quot;stream&quot;: True,<br>    &quot;max_tokens&quot;: 4096,<br>    &quot;enable_thinking&quot;: True,<br>    &quot;temperature&quot;: 1,<br>    &quot;top_p&quot;: 0.95<br>}<br>headers = {<br>    &quot;Authorization&quot;: &quot;Bearer &lt;token&gt;&quot;,<br>    &quot;Content-Type&quot;: &quot;application/json&quot;<br>}<br>response = requests.post(url, json=payload, headers=headers)<br>print(response.text)</pre><ul><li><a href="https://siliconflow.com/contact">Business or Sales Inquiries →</a></li><li><a href="https://discord.com/invite/siliconflow">Join our Discord community now →</a></li><li><a href="https://x.com/saborrolab">Follow us on X for the latest updates →</a></li><li><a 
href="https://cloud.siliconflow.com/models">Explore all available models on SiliconFlow →</a></li></ul><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=40967c6c3e4d" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[GLM-4.6V Now on SiliconFlow: Native Multimodal Tool Use Meets SoTA Visual Intelligence]]></title>
            <link>https://medium.com/@SiliconFlowAI/glm-4-6v-now-on-siliconflow-native-multimodal-tool-use-meets-sota-visual-intelligence-b638150246fc?source=rss-91ba7914dfb6------2</link>
            <guid isPermaLink="false">https://medium.com/p/b638150246fc</guid>
            <category><![CDATA[zhipu-ai]]></category>
            <category><![CDATA[vlm]]></category>
            <category><![CDATA[siliconflow]]></category>
            <dc:creator><![CDATA[SiliconFlow]]></dc:creator>
            <pubDate>Thu, 18 Dec 2025 14:22:26 GMT</pubDate>
            <atom:updated>2025-12-18T14:22:26.086Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*0HugPmqYXUsmp3ZwgwxLTA.png" /></figure><blockquote><em>TL;DR: <strong>GLM-4.6V</strong>, Z.ai’s latest multimodal large language model, is now <strong>available on SiliconFlow</strong>. Featuring a <strong>131K</strong> multimodal context window and native <strong>function calling</strong> integration, it delivers <strong>SoTA</strong> performance in <strong>visual understanding and reasoning</strong> — seamlessly bridging the gap between “<strong>visual perception</strong>” and “<strong>executable action</strong>”. The GLM-4.6V series provides a unified technical foundation for multimodal agents in real-world business scenarios. Try <strong>GLM-4.6V</strong> now and level up your <strong>multimodal agents</strong> with <strong>SiliconFlow APIs</strong>.</em></blockquote><p>We are thrilled to announce that <a href="https://www.siliconflow.com/models/glm-4-6v"><strong>GLM-4.6V</strong></a>, <a href="https://huggingface.co/zai-org/GLM-4.6V">Z.ai</a>’s latest multimodal foundation model designed for cloud and enterprise-grade scenarios, is now available on <a href="https://www.siliconflow.com/models"><strong>SiliconFlow</strong></a>. 
It integrates <strong>native multimodal function calling capability</strong> and excels in <strong>long-context visual reasoning</strong>, directly closing the loop from<strong> perception to understanding to execution.</strong></p><p>Now, through SiliconFlow’s <strong>GLM-4.6V</strong> API, you can expect:</p><ul><li><strong>Budget-friendly Pricing:</strong> GLM-4.6V $0.30/M tokens (input) and $0.90/M tokens (output)</li><li><strong>131K Context Window:</strong> Enables processing lengthy industry reports, extensive slide decks, or long-form video content</li><li><strong>Seamless Integration:</strong> Instantly deploy via SiliconFlow’s OpenAI-compatible API, or plug into your existing agentic frameworks, automation tools, or workflows.</li></ul><p>Whether you are building agents, workflows, or tools for:</p><ul><li><strong>Rich-Text Content Creation:</strong> Convert papers, reports, and slides into polished posts for social media and knowledge bases</li><li><strong>Design-to-Code Automation:</strong> Upload screenshots/designs for pixel-level HTML/CSS/JS code generation</li><li><strong>Business Document Processing: </strong>Process reports to extract metrics and synthesize comparative tables</li><li><strong>Video Content Operations: </strong>Summarize, tag, and extract insights at scale</li></ul><p>Through SiliconFlow’s production-ready API, you can leverage GLM-4.6V to power your multimodal agents in minutes — no cost concerns, no engineering overhead.</p><p>Let’s dive into the key capabilities with live demos from the SiliconFlow Platform.</p><h3>Key Features &amp; Benchmark Performance</h3><p>In most LLM pipelines, tool calling is still text-only: even for image or document tasks, everything must be converted into text first, then back again. This process potentially leads to information loss and increases system complexity. 
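To make that round-trip concrete, here is a minimal sketch of the text-only pattern just described, where an image must be flattened to text before the model ever sees it. The `ocr_to_text` helper is hypothetical, standing in for whatever OCR or captioning step such a pipeline would use:

```python
# Sketch of the text-only tool-calling pattern described above: every
# image is flattened to text before the model sees it, so layout and
# visual cues are lost. `ocr_to_text` is a hypothetical placeholder.
def ocr_to_text(image_bytes: bytes) -> str:
    # A real pipeline would run OCR or captioning here, discarding
    # layout, color, and other visual information in the process.
    return "extracted text from image"

def build_text_only_messages(image_bytes: bytes, question: str) -> list:
    extracted = ocr_to_text(image_bytes)
    # The model receives only the lossy text rendering of the image.
    return [{"role": "user", "content": f"{extracted}\n\n{question}"}]
```

Anything the conversion step drops is invisible to the model from that point on, which is exactly the information loss described above.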
GLM-4.6V changes this with <strong>native multimodal tool calling</strong> capability:</p><ul><li>Multimodal Input: Images, UI screenshots, and document pages can be passed directly as tool arguments, avoiding manual text conversion and preserving layout and visual cues.</li><li>Multimodal Output: The model can directly interpret tool results such as search pages, charts, rendered web screenshots, or product images, and feed them back into its reasoning and final response.</li></ul><p>By closing the loop from <strong>perception → understanding → execution</strong>, GLM-4.6V supports the following key features:</p><ul><li><strong>Rich-Text Content Understanding and Creation: </strong>Accurately understands complex text, charts, tables, and formulas, then autonomously invokes visual tools to crop key visuals during generation, and audits image quality to compose publication-ready content perfect for social media &amp; knowledge bases.</li><li><strong>Visual Web Search: </strong>Recognizes search intent and autonomously triggers appropriate search tools, then comprehends and aligns the mixed visual-textual results to identify relevant information, and finally performs reasoning to deliver structured, visually-rich answers.</li><li><strong>Frontend Replication &amp; Visual Interaction: </strong>Achieves <strong>pixel-level </strong>replication by identifying layouts, components, and color schemes from screenshots to generate high-fidelity <strong>HTML/CSS/JS code</strong>, then lets you refine it interactively — just circle an element and tell it what you want, like “make this button bigger and change it to green.”</li><li><strong>Long-Context Understanding: </strong>Processes ~150 pages of documents, 200 slides, or a one-hour video in a single pass with its 131K context window, enabling tasks like analyzing financial reports or summarizing an entire football match while pinpointing specific goal events and timestamps.</li></ul><p>GLM-4.6V has also been evaluated 
across <strong>20+</strong> mainstream multimodal benchmarks including <strong>MMBench</strong>, <strong>MathVista</strong>, and <strong>OCRBench</strong>, achieving SoTA performance among open-source models. It matches or outperforms comparable-scale models like<strong> Qwen3-VL-235B</strong>, <strong>Kimi-VL-A3B-Thinking-2506</strong>, and <strong>Step3–321B</strong> in key capabilities: multimodal understanding, multimodal agentic tasks, and long-context processing.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*bVIyVUbT0tRuEPlT" /></figure><h3>Techniques</h3><p><strong>GLM-4.6V sets the technical foundation for multimodal agents in real-world business scenarios.</strong> To achieve this performance, GLM-4.6V introduces a comprehensive suite of innovations:</p><ul><li><strong>Model architecture &amp; long-sequence modeling: </strong>GLM-4.6V is continually pre-trained on long-context image–text data, with visual–language compression alignment (inspired by Glyph) to better couple visual encoding with linguistic semantics.</li><li><strong>Multimodal world knowledge: </strong>A <strong>billion-scale multimodal perception and world-knowledge corpus was introduced</strong> to enhance both basic visual understanding and the accuracy and completeness of cross-modal QA.</li><li><strong>Agentic data &amp; MCP extensions: </strong>Through large-scale synthetic <strong>agentic training</strong>, GLM-4.6V extends <strong>Model Context Protocol (MCP)</strong> with URL-based multimodal handling and end-to-end <strong>interleaved text–image output</strong> using a “Draft → Image Selection → Final Polish” workflow.</li><li><strong>RL for multimodal agents: </strong>Tool-calling behaviors are integrated into a unified <strong>RL objective</strong>, and a <strong>visual feedback loop</strong> (building on UI2Code^N) lets the model use rendered results to self-correct its code and actions, pushing toward self-improving multimodal agents.</li></ul><h3>Get 
Started Immediately</h3><ol><li><strong>Explore:</strong> Try <a href="https://cloud.siliconflow.com/me/playground/chat/17885302910">GLM-4.6V</a> in the SiliconFlow playground.</li><li><strong>Integrate:</strong> Use our OpenAI-compatible API. Explore the full API specifications in the <a href="https://docs.siliconflow.com/en/api-reference/chat-completions/chat-completions">SiliconFlow API documentation</a>.</li></ol><pre>import requests<br>url = &quot;https://api.siliconflow.com/v1/chat/completions&quot;<br>payload = {<br>    &quot;model&quot;: &quot;zai-org/GLM-4.6V&quot;,<br>    &quot;messages&quot;: [<br>        {<br>            &quot;content&quot;: [<br>                {<br>                    &quot;type&quot;: &quot;image_url&quot;,<br>                    &quot;image_url&quot;: {<br>                        &quot;detail&quot;: &quot;auto&quot;,<br>                        &quot;url&quot;: &quot;https://tse4.mm.bing.net/th/id/OIP.mDDGH4uc_a7tmLFLJvKXrQHaEo?rs=1&amp;pid=ImgDetMain&amp;o=7&amp;rm=3&quot;<br>                    }<br>                },<br>                {<br>                    &quot;type&quot;: &quot;text&quot;,<br>                    &quot;text&quot;: &quot;What is in the picture?&quot;<br>                }<br>            ],<br>            &quot;role&quot;: &quot;user&quot;<br>        }<br>    ],<br>    &quot;stream&quot;: True,<br>    &quot;temperature&quot;: 1<br>}<br>headers = {<br>    &quot;Authorization&quot;: &quot;Bearer &lt;token&gt;&quot;,<br>    &quot;Content-Type&quot;: &quot;application/json&quot;<br>}<br>response = requests.post(url, json=payload, headers=headers)<br>print(response.text)</pre><ul><li><a href="https://siliconflow.com/contact">Business or Sales Inquiries →</a></li><li><a href="https://discord.com/invite/siliconflow">Join our Discord community now →</a></li><li><a href="https://x.com/saborrolab">Follow us on X for the latest updates →</a></li><li><a 
href="https://cloud.siliconflow.com/models">Explore all available models on SiliconFlow →</a></li></ul><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=b638150246fc" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Z-Image-Turbo Now on SiliconFlow: Photorealistic & Bilingual Text Rendering]]></title>
            <link>https://medium.com/@SiliconFlowAI/z-image-turbo-now-on-siliconflow-photorealistic-bilingual-text-rendering-d557128562f8?source=rss-91ba7914dfb6------2</link>
            <guid isPermaLink="false">https://medium.com/p/d557128562f8</guid>
            <category><![CDATA[z-image]]></category>
            <category><![CDATA[z-image-turbo]]></category>
            <category><![CDATA[siliconflow]]></category>
            <category><![CDATA[tongyi-qianwen]]></category>
            <category><![CDATA[t2i]]></category>
            <dc:creator><![CDATA[SiliconFlow]]></dc:creator>
            <pubDate>Wed, 17 Dec 2025 14:16:48 GMT</pubDate>
            <atom:updated>2025-12-17T14:16:48.691Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*pX04O_uYaIOzXjKomrYNwg.png" /></figure><p>Today, <a href="https://www.siliconflow.com/models/z-image-turbo">Z-Image-Turbo</a> — <a href="https://github.com/Tongyi-MAI/Z-Image">Alibaba Tongyi</a>’s latest lightweight 6B-parameter text-to-image model — is now available on SiliconFlow. Through systematic optimization and a Single-Stream Diffusion Transformer architecture, it delivers photorealistic image generation and bilingual text rendering on par with leading commercial models, proving that top-tier performance doesn’t require massive model sizes.</p><p>Whether you’re building creative tools, marketing assets, or visual AI applications, Z-Image-Turbo delivers the speed and precision to bring your workflow to the next level.</p><p>With SiliconFlow’s Z-Image-Turbo API, you can expect:</p><ul><li>Budget-Friendly Pricing: Z-Image-Turbo at just $0.005/image.</li><li>Extreme Efficiency: As a distilled model, it delivers top-tier performance in only 8 steps, matching or exceeding leading competitors.</li><li>Photorealistic &amp; Bilingual: Excels in both photorealistic image generation and accurate English &amp; Chinese text rendering, with robust adherence to complex instructions.</li><li>SOTA Performance: Powered by a Single-Stream Diffusion Transformer architecture, it achieves state-of-the-art results among open-source models on the <a href="http://aiarena.alibaba-inc.com/">Alibaba AI Arena</a> (Elo-based evaluation).</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*0NuFeC71IfUaQxwl.png" /></figure><h3>Key Capabilities &amp; Real-world Performance</h3><p>Unlike traditional foundation models that rely on massive parameters for quality or struggle with specific cultural nuances, Z-Image redefines efficiency and is designed to support:</p><ul><li><strong>Efficient Photorealistic Quality</strong></li></ul><p>Z-Image-Turbo excels at 
producing images with photography-level realism, demonstrating fine control over details, lighting, and textures. It balances high fidelity with strong aesthetic quality in composition and overall mood.</p><p>As shown in the examples below, the model handles complex visual phenomena with remarkable accuracy — from the intricate light refraction inside ice cubes, to lifelike human features, to the subtle sheen and flowing folds of silk fabric.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*yEKwbRYZyhhSNXfxqCoUkA.png" /><figcaption>All images were generated using Z-Image-Turbo on the SiliconFlow platform</figcaption></figure><ul><li><strong>Excellent Bilingual Text Rendering</strong></li></ul><p>It can also accurately render English and Chinese text while preserving facial realism and overall aesthetic composition, with results comparable to top-tier closed-source models. In poster design, it demonstrates strong compositional skills and a good sense of typography. It can render high-quality text even in challenging scenarios with small font sizes, delivering designs that are both textually precise and visually compelling.</p><p>As shown in the posters generated with Z-Image-Turbo on the SiliconFlow platform, the model renders text with impressive clarity and style, delivering layouts that combine accurate typography with strong artistic aesthetics across editorial, realistic, and cartoon-like designs.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*_REJupBundo8dQDUqGcFlg.png" /></figure><ul><li><strong>Rich World Knowledge and Cultural Understanding</strong></li></ul><p>Z-Image possesses a vast understanding of world knowledge and diverse cultural concepts. 
This allows it to accurately generate a wide array of subjects, including famous landmarks, well-known characters, and specific real-world objects.</p><p>As demonstrated in our examples, the model captures cultural elements such as the costumes and atmosphere of the Venice Carnival, iconic objects like the Venetian gondola, as well as world-famous landmarks like the Eiffel Tower — all with impressive accuracy and stylistic fidelity.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*_REJupBundo8dQDUqGcFlg.png" /></figure><h3>Get Started Immediately</h3><ul><li><strong>Explore:</strong> Try <a href="https://www.siliconflow.com/models/z-image-turbo">Z-Image</a> in the <a href="https://cloud.siliconflow.com/me/playground/image/17885302909">SiliconFlow playground</a>.</li><li><strong>Integrate:</strong> Use our OpenAI-compatible API. Explore the full API specifications in the <a href="https://docs.siliconflow.com/en/api-reference/images/images-generations">SiliconFlow API documentation</a>.</li></ul><pre>import requests<br>url = &quot;https://api.siliconflow.com/v1/images/generations&quot;<br>payload = {<br>    &quot;model&quot;: &quot;Tongyi-MAI/Z-Image-Turbo&quot;,<br>    &quot;prompt&quot;: &quot;A small, adorable green frog with big round eyes gently swims through the clear blue ocean water. Sunlight beams down from above, creating shimmering ripples on the frog’s skin and the sandy ocean floor. The frog paddles its tiny legs gracefully, leaving soft trails of bubbles behind. Colorful tropical fish and coral reefs surround it, adding a vibrant and lively atmosphere. 
The overall style is bright, whimsical, and cinematic, with smooth, fluid motion and a playful, heartwarming mood.&quot;,<br>    &quot;image_size&quot;: &quot;1024x1024&quot;,<br>    &quot;seed&quot;: 1,<br>}<br>headers = {<br>    &quot;Authorization&quot;: &quot;Bearer &lt;token&gt;&quot;,<br>    &quot;Content-Type&quot;: &quot;application/json&quot;<br>}<br>response = requests.post(url, json=payload, headers=headers)<br>print(response.text)</pre><ul><li><a href="https://www.siliconflow.com/contact">Business or Sales Inquiries →</a></li><li><a href="https://discord.com/invite/7Ey3dVNFpT">Join our Discord community now →</a></li><li><a href="https://x.com/SiliconFlowAI">Follow us on X for the latest updates →</a></li><li><a href="https://cloud.siliconflow.com/models">Explore all available models on SiliconFlow →</a></li></ul><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=d557128562f8" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[DeepSeek-V3.2 Now on SiliconFlow: Reasoning-first model built for agents]]></title>
            <link>https://medium.com/@SiliconFlowAI/deepseek-v3-2-now-on-siliconflow-reasoning-first-model-built-for-agents-efbb333fec8a?source=rss-91ba7914dfb6------2</link>
            <guid isPermaLink="false">https://medium.com/p/efbb333fec8a</guid>
            <category><![CDATA[siliconflow]]></category>
            <category><![CDATA[deepseek]]></category>
            <category><![CDATA[deepseek-v3]]></category>
            <category><![CDATA[reasoning-model]]></category>
            <category><![CDATA[ai-agent]]></category>
            <dc:creator><![CDATA[SiliconFlow]]></dc:creator>
            <pubDate>Tue, 16 Dec 2025 14:36:39 GMT</pubDate>
            <atom:updated>2025-12-16T14:36:39.704Z</atom:updated>
            <content:encoded><![CDATA[<p>TL;DR: DeepSeek-V3.2 (official version of V3.2-Exp) is now live on SiliconFlow. As a reasoning-first model built for agents, it combines high efficiency with GPT-5-level reasoning performance and a 164K context window. It also features tool-use capabilities in thinking mode, validated across 85K+ complex instructions and 1,800+ environments. Start building today with SiliconFlow’s API to supercharge your agentic workflows.</p><p>We are thrilled to unlock access to DeepSeek’s latest model on SiliconFlow, <a href="https://www.siliconflow.com/models/deepseek-v3-2">DeepSeek-V3.2</a>, a new series that harmonizes computational efficiency with superior reasoning and agentic performance. As the first DeepSeek model to integrate thinking directly into tool-use, DeepSeek-V3.2 delivers <a href="https://openai.com/gpt-5/">GPT-5 </a>level reasoning with significantly shorter outputs.</p><p>Meanwhile, DeepSeek-V3.2-Speciale pushes open-source boundaries of theorem proving and coding to rival <a href="https://deepmind.google/models/gemini/pro/">Gemini 3 Pro</a>. 
Together, they set a new benchmark for developers building next-generation AI agents.</p><p>Now, through SiliconFlow’s DeepSeek-V3.2 API, you can expect:</p><ul><li>Cost-effective Pricing: DeepSeek-V3.2 at $0.27/M tokens (input) and $0.42/M tokens (output); DeepSeek-V3.2-Speciale is coming soon, so stay tuned for first-hand updates.</li><li>164K Context Window: Perfect for long documents, complex multi-turn conversations, and extended agentic tasks.</li><li>Seamless Integration: Instantly deploy via SiliconFlow’s OpenAI-compatible API, or plug into your existing stack through Claude Code, Gen-CLI, and Cline.</li></ul><p>Whether you’re building agents, coding assistants, or complex reasoning pipelines, SiliconFlow’s DeepSeek-V3.2 API delivers the performance you need at a fraction of the expected cost and latency.</p><h3>Why it matters</h3><p>For developers building agents, multi-step reasoning pipelines, or any AI system that needs to think and act, the DeepSeek-V3.2 series finally delivers the combination the industry has been waiting for: frontier-grade reasoning, integrated tool-use during thinking, and real-world efficiency.</p><p><strong>World-Leading Reasoning Capabilities</strong></p><p><strong>DeepSeek-V3.2: The Efficient “Daily Driver” for Agents</strong></p><p>Engineered to strike the perfect balance between reasoning capabilities and output length, DeepSeek-V3.2 is your go-to choice for production workflows, such as advanced Q&amp;A and general agent tasks.</p><ul><li>Performance: Delivers reasoning capabilities on par with GPT-5.</li><li>Efficiency: Compared to <a href="https://www.siliconflow.com/models/kimi-k2-thinking"><em>Kimi-K2-Thinking</em></a>, V3.2 has significantly shorter output lengths, translating to lower computational overhead and reduced overall generation time.</li></ul><p><strong>DeepSeek-V3.2-Speciale: Maxed-Out Reasoning Capabilities (Research Preview)</strong></p><p>As the enhanced long-thinking variant of V3.2, V3.2-Speciale aims to 
push the boundaries of open-source reasoning capabilities, integrating the theorem-proving capabilities of DeepSeek-Math-V2.</p><ul><li>Gold-Medal Performance: V3.2-Speciale attains gold-level results in IMO, CMO, ICPC World Finals &amp; IOI 2025.</li><li>Benchmarks: It excels in complex instruction following, rigorous mathematical reasoning, and logical verification, effectively rivaling Gemini 3 Pro on mainstream reasoning leaderboards.</li></ul><p><strong>Thinking in Tool-Use</strong></p><p>DeepSeek-V3.2 breaks the barrier between “reasoning” and “acting.” Unlike previous versions, where tool usage was restricted during the thinking process, DeepSeek-V3.2 is the first to <strong>seamlessly integrate thinking directly into tool-use</strong>, supporting tool invocation in both Thinking and Non-Thinking modes.</p><p>To deliver this level of agentic reliability, DeepSeek introduces a massive-scale training synthesis method:</p><ul><li><strong>Robust Generalization:</strong> The model was forged through <strong>“hard-to-solve, easy-to-verify”</strong> reinforcement learning tasks.</li><li><strong>Extensive Coverage:</strong> Training spanned <strong>1,800+ distinct environments</strong> and over <strong>85,000 complex instructions</strong>, significantly enhancing the model’s generalization and instruction-following capability in the agent context.</li></ul><h3>What makes it powerful</h3><p>The DeepSeek-V3.2 series’ performance is enabled by three core technical breakthroughs:</p><ul><li><strong>DeepSeek Sparse Attention (DSA):</strong></li></ul><p>To tackle the challenge of long-context processing, the model introduces <strong>DeepSeek Sparse Attention (DSA)</strong>. 
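The general idea behind this family of mechanisms can be sketched in a few lines: score every key cheaply, keep only the top-k positions, and run softmax attention over that subset instead of the full sequence. The toy single-query sketch below illustrates generic top-k sparse attention only; it is not DeepSeek's actual learned indexer:

```python
import math

def topk_sparse_attention(q, keys, values, k=4):
    """Toy top-k sparse attention for one query vector.

    Scores all keys with a plain dot product, keeps the k highest-scoring
    positions, and attends over only that subset. Illustrative sketch of
    the sparse-attention idea, not DeepSeek's DSA implementation.
    """
    # Cheap relevance score for every key position.
    scores = [sum(qi * ki for qi, ki in zip(q, key)) for key in keys]
    # Indices of the k highest-scoring positions.
    keep = sorted(range(len(keys)), key=lambda i: scores[i])[-k:]
    # Scaled softmax over the selected subset only.
    scaled = [scores[i] / math.sqrt(len(q)) for i in keep]
    m = max(scaled)
    w = [math.exp(s - m) for s in scaled]
    z = sum(w)
    w = [x / z for x in w]
    # Weighted sum of the selected value vectors.
    return [sum(wi * values[i][d] for wi, i in zip(w, keep))
            for d in range(len(values[0]))]
```

Because the softmax runs over k positions rather than the whole sequence, the per-query cost no longer grows with context length, which is the property that makes long-context inference cheaper.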
This efficient attention mechanism substantially reduces computational complexity without compromising performance, specifically optimized for long-context scenarios.</p><ul><li><strong>Scalable Reinforcement Learning:</strong></li></ul><p>DeepSeek-V3.2 leverages a robust Reinforcement Learning (RL) protocol combined with scaled post-training compute. This advanced training framework is the key driver behind the model’s exceptional reasoning capabilities.</p><ul><li><strong>Large-Scale Agentic Task Synthesis Pipeline:</strong></li></ul><p>DeepSeek has revolutionized agent capability through a novel <strong>Large-Scale Agentic Task Synthesis Pipeline</strong>. By systematically generating training data at scale, the model integrates reasoning directly into tool-use scenarios. This results in superior compliance and generalization, ensuring that your agents can reliably navigate <strong>complex, multi-step interactive environments</strong> with precision.</p><h3>Developer-Ready Integration</h3><p>Beyond DeepSeek-V3.2’s industry-leading agentic performance, SiliconFlow delivers instant compatibility with your existing development ecosystem:</p><ul><li><strong>OpenAI-Compatible Tools</strong>: Seamless integration with <a href="https://marketplace.visualstudio.com/items?itemName=saoudrizwan.claude-dev">Cline</a>, <a href="https://github.com/QwenLM/Qwen-Agent">Qwen Code</a>, <a href="https://github.com/generative-ai-cli/gen-cli">Gen-CLI</a>, and other standard development environments — just plug in your SiliconFlow API key.</li><li><strong>Anthropic-Compatible API</strong>: Works with <a href="https://claude.com/code">Claude Code</a> and any Anthropic-compatible tools for code reviews, debugging, and architectural refactoring.</li><li><strong>Platform Integrations</strong>: Ready-to-use in <a href="https://docs.siliconflow.com/docs/dify">Dify</a>, <a href="https://chathub.gg">ChatHub</a>, <a href="https://chatboxai.app">Chatbox</a>, <a 
href="https://sider.ai">Sider</a>, <a href="https://github.com/InternLM/MindSearch">MindSearch</a>, <a href="https://github.com/eosphoros-ai/DB-GPT">DB-GPT</a>, and also available through <a href="https://openrouter.ai">OpenRouter</a>.</li></ul><p>With powerful models, seamless integrations, and competitive pricing, SiliconFlow transforms how you build — letting you ship faster and scale smarter.</p><h3>Get Started Immediately</h3><ol><li><strong>Explore:</strong> Try <a href="https://www.siliconflow.com/models/deepseek-v3-2">DeepSeek-V3.2</a> in the SiliconFlow Playground.</li><li><strong>Integrate:</strong> Use our OpenAI-compatible API. Explore the full API specifications in the <a href="https://docs.siliconflow.com">SiliconFlow API documentation</a>.</li></ol><pre>import requests<br>url = &quot;https://api.siliconflow.com/v1/chat/completions&quot;<br>payload = {<br>    &quot;model&quot;: &quot;deepseek-ai/DeepSeek-V3.2&quot;,<br>    &quot;messages&quot;: [<br>        {<br>            &quot;role&quot;: &quot;user&quot;,<br>            &quot;content&quot;: &quot;An island near the sea, with seagulls, the moon shining over the sea, a lighthouse, boats in the background, fish flying over the sea&quot;<br>        }<br>    ],<br>    &quot;stream&quot;: True,<br>    &quot;max_tokens&quot;: 4096,<br>    &quot;enable_thinking&quot;: False,<br>    &quot;thinking_budget&quot;: 4096,<br>    &quot;min_p&quot;: 0,<br>    &quot;stop&quot;: &quot;1&quot;,<br>    &quot;temperature&quot;: 0.7,<br>    &quot;top_p&quot;: 0.7,<br>    &quot;top_k&quot;: 50,<br>    &quot;frequency_penalty&quot;: 0.5,<br>    &quot;n&quot;: 1,<br>    &quot;response_format&quot;: { &quot;type&quot;: &quot;json_object&quot; },<br>    &quot;tools&quot;: [<br>        {<br>            &quot;type&quot;: &quot;function&quot;,<br>            &quot;function&quot;: {<br>                &quot;name&quot;: &quot;&lt;string&gt;&quot;,<br>                &quot;description&quot;: &quot;&lt;string&gt;&quot;,<br>                &quot;parameters&quot;: {},<br>                &quot;strict&quot;: False<br>            }<br>        }<br>    ]<br>}<br>headers = {<br>    &quot;Authorization&quot;: &quot;Bearer &lt;token&gt;&quot;,<br>    &quot;Content-Type&quot;: &quot;application/json&quot;<br>}<br>response = requests.post(url, json=payload, headers=headers)<br>print(response.text)</pre><ul><li><a href="https://siliconflow.com/contact">Business or Sales Inquiries →</a></li><li><a href="https://discord.com/invite/siliconflow">Join our Discord community now →</a></li><li><a href="https://x.com/SiliconFlowAI">Follow us on X for the latest updates →</a></li><li><a href="https://cloud.siliconflow.com/models">Explore all available models on SiliconFlow →</a></li></ul><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=efbb333fec8a" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Kimi K2 Thinking Now on SiliconFlow: Thinking Agent That Reasons and Acts]]></title>
            <link>https://medium.com/@SiliconFlowAI/kimi-k2-thinking-now-on-siliconflow-thinking-agent-that-reasons-and-acts-f70fd43da53a?source=rss-91ba7914dfb6------2</link>
            <guid isPermaLink="false">https://medium.com/p/f70fd43da53a</guid>
            <category><![CDATA[kimi-k2]]></category>
            <category><![CDATA[moonshot-thinking]]></category>
            <category><![CDATA[kimi-k2-thinking]]></category>
            <category><![CDATA[kimi]]></category>
            <category><![CDATA[siliconflow]]></category>
            <dc:creator><![CDATA[SiliconFlow]]></dc:creator>
            <pubDate>Mon, 24 Nov 2025 14:03:00 GMT</pubDate>
            <atom:updated>2025-11-24T14:03:51.394Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hCA4gN1gfB24D6gb2LgL1Q.png" /></figure><blockquote><strong><em>TL;DR:</em></strong><em> </em><strong><em>Kimi K2 Thinking</em></strong><em>, Moonshot AI’s latest and most advanced open-source thinking model, is now available on SiliconFlow. Designed as a reasoning agent, it thinks step by step and can execute up to</em><strong><em> 200–300</em></strong><em> sequential tool calls without human intervention, reasoning coherently across hundreds of steps to solve complex problems. It excels in </em><strong><em>reasoning</em></strong><em>, </em><strong><em>agentic search</em></strong><em>, </em><strong><em>coding</em></strong><em>, </em><strong><em>writing</em></strong><em>, and </em><strong><em>general capabilities</em></strong><em>. Get started with Kimi K2 Thinking on SiliconFlow with OpenAI/Anthropic-compatible APIs for seamless integration into your agents and workflows.</em></blockquote><p>We’re excited to welcome <a href="https://www.siliconflow.com/models/kimi-k2-thinking">Kimi K2 Thinking</a>, <a href="https://www.moonshot.ai/">Moonshot AI</a>’s most advanced open-source thinking model, now available on SiliconFlow. Unlike traditional reasoning models that only think, it reasons <strong>and acts</strong>, autonomously chaining up to 300 tool calls — search, code, data tools — to solve complex problems end-to-end. 
This marks Moonshot’s breakthrough in test-time scaling: simultaneously extending both reasoning depth and agentic capabilities to unlock new levels of problem-solving power.</p><p>With SiliconFlow’s Kimi K2 Thinking API, you can expect:</p><ul><li><strong>Budget-friendly Pricing</strong>: Kimi K2 Thinking at $1.1/M tokens (input) and $4.5/M tokens (output).</li><li><strong>262K Context Window:</strong> Perfect for long documents, complex reasoning, and extended agentic tasks.</li><li><strong>Outperforms GPT-5 &amp; Claude Sonnet 4.5</strong> across key reasoning, coding, and agent benchmarks.</li></ul><p>Whether you’re building reasoning agents, coding copilots, or research assistants, Kimi K2 Thinking is now accessible through SiliconFlow’s OpenAI/Anthropic-compatible API — ready to plug into your existing workflows.</p><h3>Key Features</h3><p>Kimi K2 Thinking, now available on SiliconFlow, features the following key capabilities:</p><ul><li><strong>Deep Thinking &amp; Tool Orchestration</strong>: End-to-end trained to interleave chain-of-thought reasoning with function calls, enabling autonomous research, coding, and writing workflows that last hundreds of steps without drift. For example, when building interactive visual simulations, it coordinates reasoning with tool calls to convert high-level instructions into runnable code — greatly improving automation and reliability in complex development tasks.</li><li><strong>Production-Ready Speed</strong>: Native INT4 quantization achieves 2x inference speed with no quality loss — important when you’re running tasks that involve hundreds of operations.</li><li><strong>Reliable Over Long Sessions</strong>: Handles <strong>200–300</strong> consecutive actions through adaptive reasoning cycles: <strong><em>Plan → Reason → Execute → Adapt → Refine</em></strong>. 
Unlike typical models that lose focus after 30–50 steps, it decomposes complex problems into clear subtasks and completes end-to-end workflows.</li><li><strong>Strong General Writing:</strong> Handles creative, analytical, and personalized writing with coherent logic, vivid detail, and empathetic tone — adapting smoothly across styles without losing quality.</li></ul><h3>Benchmark Performance</h3><p>Kimi K2 Thinking sets new records across benchmarks assessing reasoning, coding, and agent capabilities, outperforming leading models like <a href="https://openai.com/gpt-5/">GPT-5</a> and <a href="https://www.anthropic.com/news/claude-sonnet-4-5">Claude Sonnet 4.5</a>:</p><ul><li><strong>Agentic Reasoning:</strong> Achieves <strong>44.9% on HLE</strong>, a rigorous benchmark of thousands of expert-level questions across 100+ subjects.</li><li><strong>Agentic Coding:</strong> Scores <strong>71.3% on SWE-Bench Verified</strong> and <strong>61.1% on SWE-Multilingual</strong>, showcasing strong generalization across programming languages and agent scaffolds. 
Also delivers notable improvements on HTML, React, and component-intensive front-end tasks.</li><li><strong>Agentic Search and Browsing:</strong> Reaches <strong>60.2% on BrowseComp, </strong>double the human baseline of 29.2%.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*NnnySuexaLlowNnhEOFNPQ.png" /></figure><h3>Developer-Ready Integration</h3><p>Beyond Kimi K2 Thinking’s industry-leading performance, SiliconFlow delivers instant compatibility with your existing development ecosystem:</p><ul><li><strong>OpenAI-Compatible Tools</strong>: Seamless integration with <a href="https://marketplace.visualstudio.com/items?itemName=saoudrizwan.claude-dev">Cline</a>, <a href="https://github.com/topics/qwen-code">Qwen Code</a>, <a href="https://github.com/gen-cli/gen-cli/">Gen-CLI</a>, and other standard development environments — just plug in your SiliconFlow API key.</li><li><strong>Anthropic-Compatible API</strong>: Works with <a href="https://www.claude.com/product/claude-code">Claude Code</a> and any Anthropic-compatible tools for code reviews, debugging, and architectural refactoring.</li><li><strong>Platform Integrations</strong>: Ready-to-use in <a href="https://docs.siliconflow.com/en/usercases/use-siliconcloud-in-dify">Dify</a>, <a href="https://chathub.gg/">ChatHub</a>, <a href="https://chatboxai.app/">Chatbox</a>, <a href="https://sider.ai/en/">Sider</a>, <a href="https://github.com/InternLM/MindSearch/blob/main/README.md">MindSearch</a>, <a href="https://github.com/eosphoros-ai/DB-GPT">DB-GPT</a>, and also available through <a href="https://openrouter.ai/provider/siliconflow">OpenRouter</a>.</li></ul><p>With powerful models, seamless integrations, and cost-effective pricing, SiliconFlow transforms how you build — letting you ship faster and scale smarter.</p><h3>Get Started Immediately</h3><ol><li><strong>Explore:</strong> Try <a href="https://www.siliconflow.com/models/kimi-k2-thinking">Kimi K2 Thinking</a> in the <a 
href="https://cloud.siliconflow.com/me/playground/chat/17885302907">SiliconFlow Playground</a>.</li><li><strong>Integrate:</strong> Use our OpenAI-compatible API. Explore the full API specifications in the <a href="https://docs.siliconflow.com/en/api-reference/chat-completions/chat-completions">SiliconFlow API documentation</a>.</li></ol><pre>import requests<br><br>url = &quot;https://api.siliconflow.com/v1/chat/completions&quot;<br><br>payload = {<br>    &quot;model&quot;: &quot;moonshotai/Kimi-K2-Thinking&quot;,<br>    &quot;messages&quot;: [<br>        {<br>            &quot;role&quot;: &quot;user&quot;,<br>            &quot;content&quot;: &quot;Please provide information about a person in the following JSON format: {   \&quot;name\&quot;: \&quot;string\&quot;,   \&quot;age\&quot;: \&quot;number\&quot;,   \&quot;occupation\&quot;: \&quot;string\&quot;,   \&quot;hobbies\&quot;: [\&quot;string\&quot;] }  Generate a realistic example.&quot;<br>        }<br>    ],<br>    &quot;max_tokens&quot;: 4096,<br>    &quot;temperature&quot;: 0.7,<br>    &quot;response_format&quot;: {&quot;type&quot;: &quot;json_object&quot;}<br>}<br>headers = {<br>    &quot;Authorization&quot;: &quot;Bearer &lt;token&gt;&quot;,<br>    &quot;Content-Type&quot;: &quot;application/json&quot;<br>}<br><br>response = requests.post(url, json=payload, headers=headers)<br><br>print(response.text)</pre><ul><li><a href="https://www.siliconflow.com/contact">Business or Sales Inquiries →</a></li><li><a href="https://discord.com/invite/7Ey3dVNFpT">Join our Discord community now →</a></li><li><a href="https://x.com/SiliconFlowAI">Follow us on X for the latest updates →</a></li><li><a href="https://cloud.siliconflow.com/models">Explore all available models on SiliconFlow →</a></li></ul><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=f70fd43da53a" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[8 Key Insights on AI Infra from the co-founder of SiliconFlow]]></title>
            <link>https://medium.com/@SiliconFlowAI/8-key-insights-on-ai-infra-from-the-co-founder-of-siliconflow-366fbd8964b1?source=rss-91ba7914dfb6------2</link>
            <guid isPermaLink="false">https://medium.com/p/366fbd8964b1</guid>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[ama]]></category>
            <category><![CDATA[ai-infrastructure]]></category>
            <category><![CDATA[siliconflow]]></category>
            <category><![CDATA[ai-agent]]></category>
            <dc:creator><![CDATA[SiliconFlow]]></dc:creator>
            <pubDate>Wed, 12 Nov 2025 14:11:02 GMT</pubDate>
            <atom:updated>2025-11-12T14:11:02.079Z</atom:updated>
            <content:encoded><![CDATA[<p>Pan Yang, co-founder of SiliconFlow, delivered a speech entitled “AI Infra: For Whom and Why?” at the “Real-Time AI Infra Session” of Convo AI &amp; RTE 2025. His talk offered 8 core insights into the field of AI Infra.</p><h3>TL;DR</h3><p>8 key insights from Pan Yang’s speech on AI Infrastructure:</p><ol><li><strong>Inference First</strong> — The shift toward inference computing is driven by exponential growth in AI customers and computation needs.</li><li><strong>Open-Source Opportunities</strong> — Open-source models are catching up with a 3–5 month gap, with breakthrough potential in multimodal areas.</li><li><strong>The Calling for MaaS</strong> — One-stop platforms providing single-API access to multiple models.</li><li><strong>Three Major MaaS Challenges</strong> — Availability issues, performance variations, and the cost-reduction illusion.</li><li><strong>Do the Difficult but Right Thing</strong> — SiliconFlow’s commitment to delivering faster, better, and more cost-effective AI Infra.</li><li><strong>Four AI Scenarios for 2025</strong> — Content generation, Agentic AI (Year of the Agent), Coding, and Multimodal applications.</li><li><strong>AI is Work, Not Tool</strong> — Jensen Huang’s paradigm shift emphasizing building for Agents rather than humans.</li><li><strong>AI Infra — No Bubble</strong> — Massive unfulfilled demand shows there is no bubble, only a supply shortage.</li></ol><h3>Inference first</h3><p>In 2023, SiliconFlow predicted that “in the future, the vast majority of computing power will be used for inference, rather than training.” 
This trend is becoming a reality in 2025, mainly driven by two factors: the exponential growth in the number and usage of AI customers, and the exponential growth in the amount of computation required to complete a single task.</p><h3>The opportunities of open-source models</h3><p>Open-source models are rapidly catching up with closed-source models at a dynamic gap of 3–5 months. Currently, the open-source ecosystem for LLMs is close to state-of-the-art (SOTA), while for multimodal models such as image, audio, and video, there are still significant opportunities for breakthroughs.</p><h3>The calling for Model as a Service (MaaS)</h3><p>This year has brought frequent model updates, diverse specifications, varied architectures, and multiple modalities; no single company can independently deploy and maintain all models. Therefore, a one-stop MaaS platform capable of integrating various models has become an indispensable entry point for developers. This is precisely the direction that SiliconFlow continues to focus on, allowing users to quickly experience various models with just one API.</p><h3>MaaS platforms currently face three major challenges</h3><ul><li><strong>Availability and reliability challenges:</strong> Providers have experienced issues such as insufficient resources and 429/503 errors.</li><li><strong>Performance and quality vary significantly</strong>: The same open-source model provided by different service providers exhibits significant differences in actual performance, reflecting varying levels of model quantization and optimization, which directly affect the model’s final capabilities.</li><li><strong>The illusion of decreasing costs</strong>: Although the cost of a single model may decrease tenfold annually, users always seek the latest and most powerful state-of-the-art (SOTA) models, while the invocation prices of these top-tier models remain relatively stable. 
Meanwhile, the number of tokens consumed to complete a task increases exponentially, resulting in no significant decrease in actual application costs.</li></ul><h3>Do the difficult but right thing</h3><p>SiliconFlow has always been deeply rooted in the AI Infra field, understands the challenges involved, and remains committed to delivering faster, better-performing, and lower-cost AI Infra services.</p><h3>Four high-consensus AI scenarios for 2025</h3><ul><li><strong>Content generation</strong>: Whether generating an article, providing customer service via chatbot, or building a knowledge base, everything revolves around language.</li><li><strong>Agentic AI</strong>: This year has been called the Year of the Agent. Understandings of the Agent concept vary and continue to evolve; Manus, for example, has worked hard to shape how Agents are defined.</li><li><strong>Coding</strong>: The mainstream models released this year aligned first with Agent and Coding capabilities. The industry generally agrees that Agent and Coding are the areas that consume the most tokens.</li><li><strong>Multimodal</strong>: Especially in the Chinese Internet environment, consumption of multimodal models far exceeds that of other forms.</li></ul><h3>“AI is Work, Not Tool”</h3><p>Jensen Huang proposed that “AI is Work, Not Tool”. AI will proactively operate tools to complete tasks, rather than passively responding to instructions. This triggers a paradigm shift: building for agents, rather than for humans. Humans will increasingly delegate tasks to agents, operating less directly on software interfaces.</p><h3>AI Infra — No Bubble</h3><p>The AI infrastructure industry is not in a bubble; on the contrary, supply is far from sufficient. 
The world’s top technology companies have committed to purchasing hundreds of billions of dollars’ worth of infrastructure that has not yet been delivered. The current bottlenecks are chip production capacity and energy supply. Demand far exceeds supply capacity, demonstrating the market’s authenticity and enormous potential.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*ms4PjANr96FexNLN" /></figure><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=366fbd8964b1" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[MiniMax-M2 Now on SiliconFlow: Frontier-Style Coding and Agentic Intelligence]]></title>
            <link>https://medium.com/@SiliconFlowAI/minimax-m2-now-on-siliconflow-frontier-style-coding-and-agentic-intelligence-3d83f63b41b2?source=rss-91ba7914dfb6------2</link>
            <guid isPermaLink="false">https://medium.com/p/3d83f63b41b2</guid>
            <category><![CDATA[minimax-m2]]></category>
            <category><![CDATA[minimax]]></category>
            <category><![CDATA[ai-agent]]></category>
            <category><![CDATA[coding]]></category>
            <category><![CDATA[siliconflow]]></category>
            <dc:creator><![CDATA[SiliconFlow]]></dc:creator>
            <pubDate>Tue, 11 Nov 2025 09:14:07 GMT</pubDate>
            <atom:updated>2025-11-11T09:14:07.526Z</atom:updated>
            <content:encoded><![CDATA[<blockquote><strong><em>TL;DR: MiniMax-M2</em></strong><em>, the latest open-source MoE model from MiniMax AI, is now available on </em><strong><em>SiliconFlow</em></strong><em>. With </em><strong><em>230B total parameters and 10B active</em></strong><em>, it delivers </em><strong><em>frontier-level reasoning, coding, and agentic performance</em></strong><em> in a compact, efficient form. M2 strikes the perfect balance between </em><strong><em>intelligence, speed, and cost</em></strong><em>, achieving top benchmark results while offering </em><strong><em>fast inference and affordable pricing</em></strong><em> through SiliconFlow’s API. </em><strong><em>Try MiniMax-M2 on SiliconFlow</em></strong><em> — explore frontier-grade intelligence at a fraction of the cost.</em></blockquote><p>SiliconFlow is excited to introduce <strong>MiniMax-M2</strong>, a compact yet powerful model designed for advanced coding and agentic workflows, now available on our platform. This efficient MoE model (230B total parameters with 10B active) delivers strong performance in coding and agentic tasks while maintaining robust general intelligence. 
With 10B active parameters, it delivers advanced reasoning and tool-use capabilities comparable to larger models.</p><p>Through SiliconFlow’s MiniMax-M2 API, you can expect:</p><ul><li><strong>Budget-friendly Pricing</strong>: MiniMax-M2 $0.3/M tokens (input) and $1.2/M tokens (output).</li><li><strong>192K Context Window: </strong>Perfect for long documents, complex reasoning, and extended agentic tasks.</li><li><strong>Proven Real-World Performance:</strong> Ranked <strong>#1 among open-source models</strong> on <a href="https://artificialanalysis.ai/"><em>Artificial Analysis</em></a> benchmarks, excelling in math, science, instruction following, coding, and agentic tasks.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*cV6tjK316yMSRf19Gkxk0Q.jpeg" /></figure><h3>Key Features &amp; Benchmark Performance</h3><p>In today’s fast-moving era of intelligent Agents, most teams still face a familiar dilemma: no single model truly balances <strong>performance, cost, and speed</strong>. Top-tier models deliver frontier-level results but are <strong>expensive and slow</strong>, while lighter alternatives are <strong>affordable yet limited</strong> in reasoning depth and responsiveness.</p><p>MiniMax M2 is positioned to break this trade-off, delivering frontier-level coding, tool-use, and reasoning with fast inference and exceptional cost efficiency. Based on SiliconFlow’s API pricing, running M2 costs around <strong>92% less than </strong><a href="https://www.anthropic.com/news/claude-sonnet-4-5"><strong>Claude Sonnet 4.5</strong></a> — while delivering <strong>comparable coding and reasoning capabilities</strong>.</p><ul><li><strong>Superior Intelligence: </strong>According to benchmarks from <a href="https://artificialanalysis.ai/">Artificial Analysis</a>, MiniMax-M2 demonstrates highly competitive general intelligence across mathematics, science, instruction following, coding, and agentic tool use. 
Its composite score <strong>ranks #1 among open-source models globally</strong>.</li><li><strong>Advanced Coding: </strong>Engineered for end-to-end developer workflows, MiniMax-M2 excels at multi-file edits, code-run-fix loops, and test-validated repairs. Strong performance on Terminal-Bench and (Multi-)SWE-Bench–style tasks demonstrates practical effectiveness in terminals, IDEs, and CI across languages.</li><li><strong>Agent Performance: </strong>Plans and executes long-horizon toolchains across shell, browser, retrieval, and code runners. In BrowseComp-style evaluations, it consistently locates hard-to-surface sources, keeps evidence traceable, and gracefully recovers from flaky steps.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*8N57mav4iU7ns_PS" /></figure><h3>Use SiliconFlow’s MiniMax-M2 API</h3><p>Let’s take a look at <strong>MiniMax-M2 in action</strong> — running on <strong>SiliconFlow’s API</strong> via <strong>Claude Code</strong>, tackling a real-world coding task:</p><p><em>“Create a space-themed brick breaker game in React using HTML5 Canvas. Spaceship paddle moves with arrow keys, glowing ball bounces to destroy alien bricks. Include dark starry background, 5 rows of colorful bricks, score display, and 3 lives. Game over when lives run out, win when all bricks cleared. 
Add simple particle effects when bricks break and use Vite for setup.”</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*g4DaL9KqqgsCMEeR" /></figure><h3>Claude Code</h3><p>Now, you can easily integrate <strong>SiliconFlow’s MiniMax-M2 API into </strong><a href="https://www.anthropic.com/claude-code"><strong>Claude Code</strong></a>.</p><h4>Step 1: Get Your SiliconFlow API Key</h4><ol><li>Log in to your SiliconFlow dashboard.</li><li>Navigate to the API Keys section.</li><li>Generate a new API key for MiniMax-M2 access.</li><li>Copy and secure your API key.</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*bFHKdtA8IZvyEb3J" /></figure><h4>Step 2: Configure Environment Variables</h4><p>Open your terminal and set the following environment variables:</p><pre>export ANTHROPIC_BASE_URL=&quot;https://api.siliconflow.com/&quot;  # API root; Claude Code appends the Anthropic-compatible paths itself<br>export ANTHROPIC_MODEL=&quot;MiniMaxAI/MiniMax-M2&quot;  # You can modify this to use other models as needed<br>export ANTHROPIC_API_KEY=&quot;YOUR_SILICONFLOW_API_KEY&quot; # Please replace with your actual API Key</pre><h4>Step 3: Start Using Claude Code with MiniMax-M2</h4><p>Navigate to your project directory and launch Claude Code:</p><pre>cd your-project-directory<br>claude</pre><p>Claude Code will now use MiniMax-M2 via SiliconFlow’s API service for all your coding assistance needs!</p><p>What’s more, you can also access SiliconFlow’s MiniMax-M2 model through Gen-CLI and Cline.</p><h3>Gen-CLI</h3><p><a href="https://github.com/gen-cli/gen-cli/">Gen-CLI</a> is based on the open-source Gemini-CLI and is now available on GitHub. 
Install using the following steps:</p><ol><li>Ensure your system has Node.js 18+ installed.</li><li>Set the API key environment variable:</li></ol><pre>export SILICONFLOW_API_KEY=&quot;YOUR_API_KEY&quot;</pre><p>Run Gen-CLI:</p><p>Via npx:</p><pre>npx https://github.com/gen-cli/gen-cli</pre><p>Or install via npm:</p><pre>npm install -g @gen-cli/gen-cli<br>gen</pre><h3>Get Started Immediately</h3><ol><li><strong>Explore:</strong> Try <a href="https://cloud.siliconflow.com/models?target=MiniMaxAI/MiniMax-M2">MiniMax-M2</a> in the <a href="https://cloud.siliconflow.com/me/models">SiliconFlow Playground</a>.</li><li><strong>Integrate:</strong> Use our OpenAI-compatible API. Explore the full API specifications in the <a href="https://docs.siliconflow.com/en/userguide/capabilities/vision">SiliconFlow API documentation</a>.</li></ol><pre>import requests<br><br>url = &quot;https://api.siliconflow.com/v1/chat/completions&quot;<br><br>payload = {<br>    &quot;model&quot;: &quot;MiniMaxAI/MiniMax-M2&quot;,<br>    &quot;messages&quot;: [<br>        {<br>            &quot;role&quot;: &quot;user&quot;,<br>            &quot;content&quot;: &quot;Please provide information about a person in the following JSON format:&quot;<br>        }<br>    ],<br>    &quot;max_tokens&quot;: 4096,<br>    &quot;stream&quot;: True,<br>    &quot;enable_thinking&quot;: False,<br>    &quot;temperature&quot;: 0.1,<br>    &quot;response_format&quot;: {&quot;type&quot;: &quot;json_object&quot;}<br>}<br>headers = {<br>    &quot;Authorization&quot;: &quot;Bearer &lt;token&gt;&quot;,<br>    &quot;Content-Type&quot;: &quot;application/json&quot;<br>}<br><br>response = requests.request(&quot;POST&quot;, url, json=payload, headers=headers)<br><br>print(response.text)</pre><p><a href="https://www.siliconflow.com/contact">Business or Sales Inquiries →</a></p><p><a href="https://discord.com/invite/7Ey3dVNFpT">Join our Discord community now →</a></p><p><a href="https://x.com/SiliconFlowAI">Follow us on X for the 
latest updates →</a></p><p><a href="https://cloud.siliconflow.com/models">Explore all available models on SiliconFlow →</a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=3d83f63b41b2" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[OneDiff 1.0 is out!]]></title>
            <link>https://medium.com/@SiliconFlowAI/onediff-1-0-is-out-470d8e46fffe?source=rss-91ba7914dfb6------2</link>
            <guid isPermaLink="false">https://medium.com/p/470d8e46fffe</guid>
            <category><![CDATA[stable-diffusion]]></category>
            <category><![CDATA[stable-video-diffusion]]></category>
            <category><![CDATA[onediff]]></category>
            <category><![CDATA[inference-engine]]></category>
            <category><![CDATA[generative-ai-tools]]></category>
            <dc:creator><![CDATA[SiliconFlow]]></dc:creator>
            <pubDate>Fri, 19 Apr 2024 09:00:31 GMT</pubDate>
            <atom:updated>2024-04-19T09:00:31.166Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*y2b-5qCE17X-4BVm" /><figcaption>(With OneDiff, an RTX 3090 can even surpass the performance of an A100 GPU, helping save on A100 costs.)</figcaption></figure><p><a href="https://github.com/siliconflow/onediff/releases/tag/1.0.0"><strong>OneDiff 1.0</strong></a> accelerates Stable Diffusion and Stable Video Diffusion models (UNet/VAE/CLIP based). We have received a lot of support and feedback from the community (<a href="https://github.com/siliconflow/onediff/wiki">https://github.com/siliconflow/onediff/wiki</a>); big thanks!</p><p>The upcoming version 2.0 will focus on DiT/Sora-like models.</p><p>OneDiff 1.0 mainly covers the issues in milestone <a href="https://github.com/siliconflow/onediff/milestone/2?closed=1">0.13</a>, which include the following new features and several bug fixes:</p><ul><li><a href="https://github.com/siliconflow/OneDiffGenMetrics">OneDiff Quality Evaluation</a></li><li>Reuse compiled graph</li><li><a href="https://github.com/siliconflow/onediff/issues/703">Refine support for Playground v2.5</a></li><li>Support ComfyUI-AnimateDiff-Evolved</li><li><a href="https://github.com/siliconflow/onediff/tree/main/onediff_comfy_nodes/modules/oneflow/hijack_ipadapter_plus">Support ComfyUI_IPAdapter_plus</a></li><li><a href="https://github.com/siliconflow/onediff/pull/659">Support Stable Cascade</a></li><li>Improvements: <a href="https://github.com/siliconflow/onediff/issues/667">improved VAE performance</a></li><li>Quantize tools for the enterprise edition: <a href="https://github.com/siliconflow/onediff/tree/main/src/onediff/quantization">quantization tools</a> and the <a href="https://github.com/siliconflow/onediff/blob/main/README_ENTERPRISE.md#onediff-enterprise">OneDiff Enterprise README</a></li><li>SD-WebUI support for offline quantized models</li></ul><h3>State-of-the-art performance</h3><h4>SDXL E2E time</h4><ul><li>Model stabilityai/stable-diffusion-xl-base-1.0</li><li>Image size 1024*1024, batch size 1, steps 30</li><li>NVIDIA A100 80G SXM4</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*RevV75Oims84gGnI" /></figure><h4>SVD E2E time</h4><ul><li>Model stabilityai/stable-video-diffusion-img2vid-xt</li><li>Image size 576*1024, batch size 1, steps 25, decoder chunk size 5</li><li>NVIDIA A100 80G SXM4</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*KO0eG_IRFvs1mBV2" /></figure><p><strong>More about OneDiff</strong>: <a href="https://github.com/siliconflow/onediff?tab=readme-ov-file#about-onediff">https://github.com/siliconflow/onediff?tab=readme-ov-file#about-onediff</a></p><p>We look forward to feedback from the SD community! Join the <a href="https://discord.com/invite/RKJTjZMcPQ"><strong><em>OneDiff Discord group</em></strong></a> to discuss related questions.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=470d8e46fffe" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[OneDiff v0.12.1 is released (Stable acceleration of SD and SVD for production environments)]]></title>
            <link>https://medium.com/@SiliconFlowAI/onediff-v0-12-1-is-released-stable-acceleration-of-sd-and-svd-for-production-environment-c33268f8783c?source=rss-91ba7914dfb6------2</link>
            <guid isPermaLink="false">https://medium.com/p/c33268f8783c</guid>
            <category><![CDATA[onediff]]></category>
            <category><![CDATA[stable-video-diffusion]]></category>
            <category><![CDATA[stable-diffusion]]></category>
            <category><![CDATA[generative-ai-tools]]></category>
            <category><![CDATA[ai-art]]></category>
            <dc:creator><![CDATA[SiliconFlow]]></dc:creator>
            <pubDate>Fri, 08 Mar 2024 07:09:22 GMT</pubDate>
            <atom:updated>2024-03-08T07:09:22.751Z</atom:updated>
            <content:encoded><![CDATA[<p><a href="https://github.com/siliconflow/onediff"><strong>OneDiff v0.12.1</strong></a> is now released! This update includes the following highlights; we encourage you to install the new version for a better experience:</p><ul><li><a href="https://github.com/siliconflow/onediff/tree/main?tab=readme-ov-file#state-of-the-art-performance">SOTA performance updates for SDXL and SVD</a></li><li><a href="https://github.com/siliconflow/onediff/blob/55627d50157d4a0c4b484ba76b088c90f39179ff/onediff_diffusers_extensions/examples/text_to_image_sdxl.py#L96">Full support for dynamic-resolution runs of SD and SVD</a></li><li><a href="https://github.com/siliconflow/onediff/tree/main/onediff_diffusers_extensions#compile-save-and-load-pipeline">Compile/Save/Load pipeline for HF diffusers</a></li><li><a href="https://github.com/siliconflow/onediff/tree/main/onediff_diffusers_extensions#fast-lora-loading-and-switching">Fast LoRA loading and switching for HF diffusers</a></li><li><a href="https://www.reddit.com/r/StableDiffusion/comments/1al19ek/instantid_can_run_18x_faster_with_onediff/">Accelerated InstantID (1.8x faster)</a></li><li><a href="https://github.com/siliconflow/onediff/blob/main/onediff_diffusers_extensions/examples/text_to_image_sdxl_light.py">Accelerated SDXL Lightning</a></li></ul><p>Here are the new performance numbers:</p><p><strong>SDXL E2E time</strong></p><ul><li>Model: stabilityai/stable-diffusion-xl-base-1.0</li><li>Image size 1024*1024, batch size 1, steps 30</li><li>NVIDIA A100 80G SXM4</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*QJVejYWEC_jh1Xdf" /></figure><p><strong>SVD E2E time</strong></p><ul><li>Model: stabilityai/stable-video-diffusion-img2vid-xt</li><li>Image size 576*1024, batch size 1, steps 25, decoder chunk size 5</li><li>NVIDIA A100 80G SXM4</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*Ay-BpsRFtMTK9Ees" /></figure><p>Furthermore, we would like to quote an excellent guide written by <a href="https://www.felixsanz.dev/articles/ultimate-guide-to-optimizing-stable-diffusion-xl">Felix Sanz</a> for those interested in optimizing SDXL:</p><blockquote><strong><em>&#8220;It (OneDiff) improves the visual quality of the result, almost halves inference time and the only penalty is a small wait at compilation time. What a great job!&#8221;</em></strong></blockquote><p>The guide provides a very comprehensive and clear analysis of SDXL inference engines, with OneDiff among those surveyed. Enjoy: <a href="https://www.felixsanz.dev/articles/ultimate-guide-to-optimizing-stable-diffusion-xl"><strong>https://www.felixsanz.dev/articles/ultimate-guide-to-optimizing-stable-diffusion-xl</strong></a></p><p>Check below for more details:</p><ul><li>OneDiff 0.12.1 release log: <a href="https://github.com/siliconflow/onediff/releases/tag/0.12.1">https://github.com/siliconflow/onediff/releases/tag/0.12.1</a></li><li>OneDiff roadmap and feedback: <a href="https://github.com/siliconflow/onediff/wiki">https://github.com/siliconflow/onediff/wiki</a></li></ul><p>Follow us on <a href="https://twitter.com/SiliconFlowAI"><strong>X</strong></a>, or join the <a href="https://discord.com/invite/RKJTjZMcPQ"><strong><em>OneDiff Discord group</em></strong></a> to discuss related questions.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Unlock the Potential of AI and Accelerate into the Future with Us!]]></title>
            <link>https://medium.com/@SiliconFlowAI/unlock-the-potential-of-ai-and-accelerate-into-the-future-with-us-a2868705dc7e?source=rss-91ba7914dfb6------2</link>
            <guid isPermaLink="false">https://medium.com/p/a2868705dc7e</guid>
            <category><![CDATA[text-to-video]]></category>
            <category><![CDATA[onediff]]></category>
            <category><![CDATA[stable-diffusion]]></category>
            <category><![CDATA[text-to-image-generation]]></category>
            <category><![CDATA[generative-ai-tools]]></category>
            <dc:creator><![CDATA[SiliconFlow]]></dc:creator>
            <pubDate>Thu, 29 Feb 2024 10:30:33 GMT</pubDate>
            <atom:updated>2024-02-29T10:30:33.096Z</atom:updated>
            <content:encoded><![CDATA[<p>As we step into the vibrant year of 2024, we cordially invite you to participate in a groundbreaking AI Inference Acceleration Challenge. Whether you are a seasoned AI professional or just starting your journey, we are eager to hear about your innovative applications and breakthroughs powered by the <a href="https://github.com/siliconflow/onediff"><strong>OneDiff</strong></a> image/video generation inference engine.</p><p><strong>GitHub: </strong><a href="https://github.com/siliconflow/onediff"><strong>https://github.com/siliconflow/onediff</strong></a></p><p>This challenge offers you not only the chance to showcase your technical prowess but also the opportunity to win substantial prizes, <strong>including OneDiff Enterprise Edition licenses and exquisite gifts</strong>. Let us work together to advance AI technology and witness how AI shapes our future.</p><p>Get ready to take action, prepare your case, and join this AI extravaganza! For more details about the event, please refer to the poster below.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/900/0*nsi_d3EQtWRBzN2B" /></figure><p>Join the <a href="https://discord.com/invite/RKJTjZMcPQ"><strong><em>OneDiff Discord group</em></strong></a> to discuss related questions.</p>]]></content:encoded>
        </item>
    </channel>
</rss>