<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Jiten Oswal on Medium]]></title>
        <description><![CDATA[Stories by Jiten Oswal on Medium]]></description>
        <link>https://medium.com/@jiten.p.oswal?source=rss-9f02fdbf5415------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*mLeoZ60JD3Z6PuvTF4HqPA.jpeg</url>
            <title>Stories by Jiten Oswal on Medium</title>
            <link>https://medium.com/@jiten.p.oswal?source=rss-9f02fdbf5415------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sat, 27 Jun 2026 07:26:23 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@jiten.p.oswal/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Xiaomi MiMo-V2.5: Scaling Multimodal AI and Open-Source Reasoning]]></title>
            <link>https://medium.com/@jiten.p.oswal/xiaomi-mimo-v2-5-scaling-multimodal-ai-and-open-source-reasoning-2c5bd36820b6?source=rss-9f02fdbf5415------2</link>
            <guid isPermaLink="false">https://medium.com/p/2c5bd36820b6</guid>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[productivity]]></category>
            <category><![CDATA[technology]]></category>
            <category><![CDATA[business]]></category>
            <category><![CDATA[programming]]></category>
            <dc:creator><![CDATA[Jiten Oswal]]></dc:creator>
            <pubDate>Mon, 22 Jun 2026 20:11:00 GMT</pubDate>
            <atom:updated>2026-06-22T20:11:00.985Z</atom:updated>
            <content:encoded><![CDATA[<p><em>By providing frontier-grade intelligence through open weights, Xiaomi is driving the </em><strong><em>democratization of AI</em></strong><em>.</em></p><p>On April 23, 2026, Xiaomi sent a shockwave through the artificial intelligence industry with the official launch of the <strong>MiMo-V2.5 series</strong>. Developed by the Xiaomi Embodied Intelligence and LLM-Core teams, the release marks a major shift for Xiaomi’s “Human x Car x Home” ecosystem. With a flagship model reaching <strong>1.02 trillion parameters</strong> and an open-source standard version at <strong>310 billion parameters</strong>, Xiaomi is directly challenging the dominance of closed-source giants like GPT-5 and Claude 4.</p><blockquote><em>🆓</em> Read the full article free here → <a href="https://medium.com/@jiten.p.oswal/2c5bd36820b6?source=friends_link&amp;sk=74e937b8cd16d8bac0c4a3560fdc565e">free article link</a><br><em>👉 </em>Follow for more such AI deep dives → <a href="https://medium.com/@jiten.p.oswal">Medium</a> or <a href="https://x.com/jitenoswal">Twitter (X)</a> or <a href="https://www.linkedin.com/in/jitenoswal">LinkedIn</a></blockquote><p><strong>The Architecture of a 310B Parameter Monster</strong></p><p>At its core, MiMo-V2.5 is a native <strong>Sparse Mixture-of-Experts (MoE)</strong> model. Although the standard version has 310.8 billion parameters in total, it stays efficient by activating only <strong>15 billion parameters</strong> per token. The flagship <strong>V2.5-Pro</strong> scales this further to 1.02 trillion total parameters with <strong>42 billion active</strong>.</p><blockquote>The MiMo models are available on Hugging Face: <a href="https://huggingface.co/XiaomiMiMo">https://huggingface.co/XiaomiMiMo</a></blockquote><p>The model’s language backbone is built on a <strong>hybrid sliding-window attention (SWA) architecture</strong>. 
This design, inherited and refined from MiMo-V2-Flash, interleaves SWA and <strong>Global Attention (GA)</strong> layers at a 6:1 ratio. In each group of seven layers, the six SWA layers cache keys and values for only the most recent 128 tokens, and only the global layer keeps them for the full sequence. As a result, KV-cache storage shrinks by nearly <strong>7x</strong> on long-context tasks without sacrificing performance. In addition, <strong>Multi-Token Prediction (MTP)</strong> is integrated natively into both training and inference and triples output throughput compared to previous iterations.</p><p>Perhaps the most disruptive specification is the <strong>1-million-token context window</strong>. Supported by both the V2.5 and V2.5-Pro variants, it lets the model process roughly <strong>1,600 pages of text</strong> or hours of video in a single prompt, enabling deep reasoning over massive datasets.</p><p><strong>Native Multimodality: Seeing, Hearing, and Acting</strong></p><p>Many earlier models bolted vision or audio encoders onto a finished text model. MiMo-V2.5, by contrast, was designed for <strong>native full-modal perception</strong>: dedicated vision and audio encoders connect to the LLM backbone through lightweight projectors and are trained together with it. This allows the model to process text, images, audio, and video <strong>simultaneously</strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*qCAYrIn-4LeDk1HGeFiX-g.png" /></figure><p>The model was trained on a staggering <strong>48 trillion tokens</strong>. 
Training proceeded through a <strong>five-stage pipeline</strong>:</p><ol><li><strong>Text Pre-training:</strong> Building the foundational LLM backbone.</li><li><strong>Projector Warm-up:</strong> Aligning the audio and visual encoders with the language model.</li><li><strong>Multimodal Pre-training:</strong> Training at scale on high-quality cross-modal data.</li><li><strong>Supervised Fine-tuning (SFT) &amp; Agentic Post-training:</strong> Progressively extending the context window from 32K to 1M tokens.</li><li><strong>RL &amp; MOPD:</strong> Refinement through <strong>Reinforcement Learning</strong> and <strong>Multi-Objective Policy Distillation</strong>, strengthening perception and agentic execution.</li></ol><p>This “triple perception loop” of seeing, hearing, and acting lets the model not only understand input but also <strong>act on it</strong> in real-world scenarios.</p><p><strong>Specialized Power: The MiMo-Embodied Foundation</strong></p><p>A critical component of this ecosystem is <strong>MiMo-Embodied</strong>, a cross-embodied foundation model optimized specifically for <strong>Autonomous Driving (AD)</strong> and <strong>Embodied AI</strong>. It is the first open-source VLM to unify these two distinct domains in a single framework.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*xjfMcl82YAOUcCNwughhVA.png" /></figure><p><strong>Autonomous Driving (AD)</strong></p><p>MiMo-Embodied excels in three core AD capabilities:</p><ul><li><strong>Environmental Perception:</strong> Comprehensive understanding of traffic scenes, semantic road elements, and hazard detection.</li><li><strong>Status Prediction:</strong> Forecasting the behavior of road agents and multi-agent interactions.</li><li><strong>Driving Planning:</strong> Generating safe, explainable maneuvers that comply with traffic rules.</li></ul><p>On benchmarks such as <strong>NAVSIM</strong>, MiMo-Embodied consistently outperforms competitors including InternVL3. 
By employing <strong>3D convolutions</strong>, which compress visual input across both space and time, the model reduces the number of visual tokens it needs while preserving the high-fidelity spatial context essential for safe driving.</p><p><strong>Embodied AI</strong></p><p>In robotics, MiMo-Embodied has set new records across <strong>17 benchmarks</strong>. Its capabilities cover:</p><ul><li><strong>Affordance Prediction:</strong> Inferring actionable possibilities from a scene (e.g., identifying a handle for grasping).</li><li><strong>Task Planning:</strong> Translating high-level instructions (e.g., “Water the plants in the study room”) into executable action sequences.</li><li><strong>Spatial Understanding:</strong> Reasoning about 3D layouts, distances, and object relationships.</li></ul><p>Qualitative tests show MiMo-Embodied achieving <strong>precise center localization</strong> on target objects, far surpassing GPT-4o and RoboBrain-2.0, which often produce scattered points.</p><p><strong>Proving “Hard Power”: The Peking University Case Study</strong></p><p>To validate the model’s agentic intelligence, Xiaomi tested <strong>V2.5-Pro</strong> on the <strong>SysY Compiler Project</strong> from Peking University. This project typically takes a computer science undergraduate <strong>several weeks</strong> to complete. It requires building a full compiler in Rust, including a lexer, a parser, IR code generation, and an assembly backend.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*q4_f0lLZ2dSx4HVFcoOG8g.png" /></figure><p>MiMo-V2.5-Pro completed the task in just <strong>4.3 hours</strong> using <strong>672 tool calls</strong>, achieving a <strong>perfect score of 233/233</strong> on the hidden test suite. 
Rather than relying on simple trial and error, the model demonstrated “harness awareness.” It systematically built the pipeline layer by layer, diagnosed a regression at turn 512, and recovered to reach a perfect finish.</p><p><strong>The Economics of Open-Source Reasoning</strong></p><p>Xiaomi is not just competing on intelligence; it is also competing on <strong>accessibility and efficiency</strong>, positioning the MiMo-V2.5 series on the <strong>Pareto frontier</strong> of performance and cost.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hLD6Xnv0OXtNHMPPgaYzLg.png" /></figure><ul><li><strong>Pricing Disruption:</strong> The API cost for MiMo-V2.5 is approximately <strong>$0.50 per 1 million input tokens</strong>. Meanwhile, the V2.5-Pro variant matches or exceeds the coding capability of Claude 4.6 at <strong>80% lower cost</strong>.</li><li><strong>Token Efficiency:</strong> Stronger reasoning also produces more efficient agent “trajectories”: V2.5-Pro uses <strong>40–60% fewer tokens</strong> than Claude Opus 4.6 or GPT-5.4 to reach comparable results on agentic benchmarks such as <strong>ClawEval</strong>.</li><li><strong>MIT License:</strong> Both the MiMo-V2.5 and V2.5-Pro weights have been released under the <strong>permissive MIT license</strong> on Hugging Face, allowing commercial use and private-cloud deployment.</li></ul><p><strong>Community and Strategy: The “Hunter Alpha” Legacy</strong></p><p>The release strategy for MiMo was as unconventional as its architecture. In March 2026, a mystery model codenamed <strong>“Hunter Alpha”</strong> appeared on OpenRouter. It quietly climbed to the top of the usage charts, and many developers speculated it was a secret DeepSeek release because of its impressive coding speed and 1M-token context window. On March 18, Xiaomi revealed that Hunter Alpha was an early internal test build of MiMo-V2-Pro. 
This “stealth launch” let Xiaomi capture <strong>21.1% of OpenRouter traffic</strong> before its official announcement, building immediate developer trust.</p><p>The MiMo team is led by <strong>Luo Fuli</strong>, a former core contributor at DeepSeek. It has carried the “reasoning DNA” of earlier state-of-the-art (SOTA) models into a platform that runs across phones, cars, and home devices via <strong>HyperOS</strong>.</p><p><strong>Conclusion: A New Era of AI Democratization</strong></p><p>The MiMo-V2.5 series reflects Xiaomi’s ambition to invest <strong>$8.7 billion to $11 billion</strong> in AI over the coming years. By providing frontier-grade intelligence through open weights, Xiaomi is driving the <strong>democratization of AI</strong>. Every developer can now access a model that sees, hears, and reasons at the highest level, without the “lock-in” of proprietary ecosystems.</p><h4>Enjoyed this deep dive?</h4><p>I write about AI systems, AI &amp; Data engineering, LLM internals, Platform Architecture, and everything Startups.</p><p>👉 <strong>Follow me on </strong><a href="https://medium.com/@jiten.p.oswal"><strong>Medium</strong></a> or <a href="http://www.x.com/jitenoswal"><strong>Twitter</strong></a> to catch similar deep dives.</p><p><em>Got a tricky AI System &amp; LLM question? Drop it in the comments, and I might write my next deep dive about it if there is enough interest.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=2c5bd36820b6" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Distributed Intelligence: The Next Frontier of AI Infrastructure]]></title>
            <link>https://medium.com/@jiten.p.oswal/distributed-intelligence-the-next-frontier-of-ai-infrastructure-7c17f5488456?source=rss-9f02fdbf5415------2</link>
            <guid isPermaLink="false">https://medium.com/p/7c17f5488456</guid>
            <category><![CDATA[business]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[programming]]></category>
            <category><![CDATA[technology]]></category>
            <category><![CDATA[productivity]]></category>
            <dc:creator><![CDATA[Jiten Oswal]]></dc:creator>
            <pubDate>Sat, 20 Jun 2026 17:31:00 GMT</pubDate>
            <atom:updated>2026-06-20T17:31:00.967Z</atom:updated>
            <content:encoded><![CDATA[<p><em>Ultimately, the rise of distributed intelligence represents more than just a new tech trend; it is a fundamental debate about </em><strong><em>who will control the next generation of computational infrastructure</em></strong><em>.</em></p><p>The technology landscape is witnessing a fundamental shift in how we access artificial intelligence. For years, building frontier AI models has been the exclusive domain of a few massive corporations with the capital and infrastructure to support them. However, recent events have exposed a critical vulnerability in this centralized model. They have sparked a movement toward <strong>distributed intelligence</strong>: a decentralized alternative that aims to democratize access to, and ownership of, the world’s most valuable computational resource.</p><blockquote><em>🆓</em> Read the full article free here → <a href="https://medium.com/@jiten.p.oswal/7c17f5488456?source=friends_link&amp;sk=898bb537a87763cd77763d45aa00abbc">free article link</a><br><em>👉</em> Follow for more such AI deep dives → <a href="https://medium.com/@jiten.p.oswal">Medium</a> or <a href="https://x.com/jitenoswal">Twitter (X)</a> or <a href="https://www.linkedin.com/in/jitenoswal">LinkedIn</a></blockquote><p><strong>The Catalyst: When “Rented Intelligence” Fails</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*UM6KEnLSK-8_E8r7ckvxUQ.png" /></figure><p>The need for an alternative to centralized AI was thrust into public consciousness by a recent episode at Anthropic, one of the sector’s leading labs. In June 2026, a U.S. government order directed Anthropic to suspend access to its most advanced models, <strong>Fable 5 and Mythos 5</strong>, for foreign nationals on national security grounds. 
Rather than attempting a surgical restriction, Anthropic disabled the models for all users globally to ensure total compliance.</p><p>The event became a “breaking point” for the idea of corporate data independence. Colton Malkerson, co-founder of EdgeRunner AI, describes the current state of AI usage as <strong>“renting your intelligence”</strong>. He compares the relationship to renting a house: the landlord can cancel your lease at any time, evict you without notice, and inspect all your property while you live there.</p><p>Suppose a government can silence a commercial AI model overnight, with no public hearing, no technical disclosure, and no appeals process. From that moment, every centralized lab operates under what tech entrepreneur Brett Hurt calls an <strong>“invisible ceiling”</strong>. For businesses and developers relying on these models, the Anthropic suspension proved that their core cognitive infrastructure could be switched off at the whim of a single entity or regulator.</p><p><strong>The Emergence of Distributed AI (DeAI)</strong></p><p>In response to these risks, demand is surging for <strong>Distributed AI (DeAI)</strong>, a framework that replaces centralized corporate control with open, global coordination networks. Following the Anthropic shutdown, interest in protocols like <strong>Bittensor</strong> skyrocketed, with its incentive token, TAO, climbing 30% in just 12 hours as users sought more resilient alternatives.</p><p>Distributed AI is not just a different way to host a chatbot; it is a fundamental redesign of AI infrastructure. It aims to spread the essential functions of AI (<strong>compute access, model training, and inference</strong>) across a global network of participants. 
Instead of one company owning the servers and the code, a distributed protocol uses incentive mechanisms to coordinate thousands of independent actors who contribute work toward a common goal.</p><p><strong>The Architecture of Open Coordination: The Bittensor Model</strong></p><p>At the forefront of this shift is Bittensor, often described by industry experts as <strong>“Bitcoin for AI”</strong>. The protocol serves as a coordination layer that uses incentives to reward distributed work. It does not build AI itself; rather, it creates a marketplace and a set of rules that allow AI developers and resource providers to collaborate at scale.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*uTB2ux_vrcqb-fl4cJOzsA.png" /></figure><p>The network’s most innovative feature is its system of <strong>“subnets”</strong>: specialized ecosystems within the broader protocol, each dedicated to a specific AI task or niche application. This modular approach allows for:</p><ul><li><strong>Distributed Model Training:</strong> Coordinating multiple actors to train large-scale models without a single, centralized data center.</li><li><strong>Decentralized Inference:</strong> Supplying the computational power required to run AI models across a global network of operators, so that access cannot be cut off by a single provider.</li><li><strong>Diverse Domain Expertise:</strong> Subnets are currently being developed for specialized areas including <strong>robotic training systems</strong>, <strong>AI vision models</strong>, <strong>scientific research</strong>, and <strong>financial compliance tools</strong>.</li></ul><p>This architecture levels the playing field. In the centralized world, compute access is a defining competitive advantage held by those with the most capital. 
In a distributed system, independent contributors and smaller developers can participate in AI markets without being beholden to “Big Tech” providers.</p><p><strong>From Speculation to Real-World Utility</strong></p><p>For much of the last decade, distributed ledger technologies were primarily associated with financial speculation: trading, stablecoins, and decentralized finance. The rise of Distributed AI, however, signals a transition into a new era of <strong>real-world utility</strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*dYF2s3BIpwYySXvNNaDGaQ.png" /></figure><p>Adam Sternbach, VP of Legal at Yuma Holdings, suggests that distributed intelligence could become the most important use case for these networks. By tying network infrastructure directly to computational services and AI functionality, the technology moves beyond being a mere settlement layer for financial transactions. It becomes the backbone of a new global economy where <strong>access to AI is a critical economic resource</strong>.</p><p>This transition is vital for the long-term viability of distributed protocols. One of the most persistent criticisms of this space has been its lack of utility outside of trading. Distributed AI changes that narrative by providing tangible, high-demand services, such as AI inference and training, that are essential for the next generation of software development.</p><p><strong>The Governance Maze: Liability in an Autonomous World</strong></p><p>As these networks grow, they raise complex legal and governance challenges that traditional systems are not yet equipped to handle. The primary concern is <strong>operational control and accountability</strong>. In a centralized environment, if an AI causes harm, there is a clear entity to hold liable. 
In a distributed network where no single party controls the infrastructure, the question of responsibility becomes murky.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*g6S50FVUg4VVqRGQtigArw.png" /></figure><p>We are rapidly approaching a future populated by <strong>autonomous AI agents</strong>. These agents may eventually be able to:</p><ul><li><strong>Monetize their own services</strong> and manage their own financial resources.</li><li><strong>Manage their own compute resources</strong>, effectively “buying” the power they need to continue functioning.</li><li><strong>Spawn other agents</strong>, leading to a recursive chain of autonomous activity where one agent creates another to fulfill a sub-task.</li></ul><p>As Sternbach asks: <strong>“If I have an agent, am I liable?”</strong> Regulators are unlikely to accept a “responsibility vacuum” simply because a system is decentralized. This tension between the benefits of decentralization and the need for accountability will likely be the defining debate of AI governance for years to come.</p><p><strong>The Critical Need for Technical Fluency</strong></p><p>Addressing these challenges requires a new breed of professional. Meaningful regulation and legal frameworks cannot be built in a vacuum. Sternbach argues that lawyers, regulators, and policymakers must develop <strong>technological fluency</strong> to understand the systems they are trying to govern.</p><p>Distributed AI combines the technical complexity of cryptographic protocols with the rapid evolution of machine learning. As a result, many of the emerging legal questions hinge on nuanced distinctions in infrastructure design and operational control. These are not abstract concepts; they are the gears and levers of the next global infrastructure. 
Without a deep understanding of how work is coordinated and incentivized in these networks, legal systems risk creating rules that are either ineffective or stifling to innovation.</p><p><strong>Conclusion: A Global Infrastructure Debate</strong></p><p>Ultimately, the rise of distributed intelligence represents more than just a new tech trend; it is a fundamental debate about <strong>who will control the next generation of computational infrastructure</strong>.</p><p>The centralized model offers speed and efficiency, but at the cost of extreme vulnerability and concentrated power. The distributed model, championed by networks like Bittensor, offers an alternative built on open participation, global coordination, and decentralized incentives. It remains to be seen whether these networks can compete at the scale of giants like OpenAI or Anthropic. Still, the market has already signaled a clear appetite for a future where intelligence is not a rented commodity but a shared global resource.</p><p>As AI continues to integrate into every facet of our economy, the resilience and accessibility provided by distributed systems may prove to be the most important application of decentralized technology to date. Businesses and developers must now decide: will they continue to rent their intelligence from a landlord who can evict them at any time, or will they join the movement to build a more open, distributed future?</p><h4>Enjoyed this deep dive?</h4><p>I write about AI systems, AI &amp; Data engineering, LLM internals, Platform Architecture, and everything Startups.</p><p>👉 <strong>Follow me on </strong><a href="https://medium.com/@jiten.p.oswal"><strong>Medium</strong></a> or <a href="http://www.x.com/jitenoswal"><strong>Twitter</strong></a> to catch similar deep dives.</p><p><em>Got a tricky AI System &amp; LLM question? 
Drop it in the comments, and I might write my next deep dive about it if there is enough interest.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=7c17f5488456" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[The Open-Weights Frontier: A Technical Deep Dive into Z.ai’s GLM-5.2]]></title>
            <link>https://medium.com/@jiten.p.oswal/the-open-weights-frontier-a-technical-deep-dive-into-z-ais-glm-5-2-c304142c05d1?source=rss-9f02fdbf5415------2</link>
            <guid isPermaLink="false">https://medium.com/p/c304142c05d1</guid>
            <category><![CDATA[productivity]]></category>
            <category><![CDATA[programming]]></category>
            <category><![CDATA[business]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[technology]]></category>
            <dc:creator><![CDATA[Jiten Oswal]]></dc:creator>
            <pubDate>Thu, 18 Jun 2026 18:41:00 GMT</pubDate>
            <atom:updated>2026-06-18T18:41:00.912Z</atom:updated>
            <content:encoded><![CDATA[<p><em>It proves that open-weights models can lead the frontier, offering a high-performance, cost-effective, and transparent alternative to the proprietary giants.</em></p><p>The landscape of frontier AI shifted decisively on June 16, 2026. Z.ai (formerly Zhipu AI) announced the immediate release of <strong>GLM-5.2</strong>, a 753-billion-parameter open-weights model designed specifically to dominate “long-horizon” autonomous coding and engineering tasks.</p><blockquote>🆓 Read the full article free here → <a href="https://medium.com/@jiten.p.oswal/c304142c05d1?source=friends_link&amp;sk=380549067302a24fa032917820b0277b">free article link</a><br>👉 Follow for more such AI deep dives → <a href="https://medium.com/@jiten.p.oswal">Medium</a> or <a href="https://x.com/jitenoswal">Twitter (X)</a> or <a href="https://www.linkedin.com/in/jitenoswal">LinkedIn</a></blockquote><p>For the first time, an open-weights model has not only reached parity with proprietary giants like OpenAI’s GPT-5.5 and Anthropic’s Claude Opus 4.8 but has surpassed them on several key benchmarks, all at roughly <strong>one-sixth the cost</strong>.</p><p><strong>1. The Context Titan: Engineering with 1 Million Tokens</strong></p><p>The most immediate differentiator for GLM-5.2 is its <strong>1-million-token context window</strong>, a 5x jump from the 200,000 tokens in GLM-5.1. In the world of AI-assisted engineering, this is a paradigm shift.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*JbiPL8k0-g1yPfVWXNGaIg.png" /><figcaption><em>Evolution of the GLM Context Window</em></figcaption></figure><p>A 1M-token window lets a coding agent hold an <strong>entire mid-sized repository</strong> in active memory, including source files, unit tests, configurations, and long conversation history. 
This eliminates the “forgetting” and constant summarization that smaller windows require, enabling the model to track complex cross-file dependencies within a single session. For developers, this makes whole-repository refactors possible without losing the architectural thread, such as updating a 40-file Python data pipeline.</p><p><strong>2. Benchmarking the New Benchmark</strong></p><p>GLM-5.2’s performance on industry-standard third-party tests confirms its “frontier” status. It particularly shines in <strong>agentic tool use</strong> and software engineering tasks that unfold over multi-hour interactions.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*zaAigow7Iea0degy-uXSww.png" /><figcaption><em>GLM-5.2 vs. GPT-5.5 Benchmarks</em></figcaption></figure><ul><li><strong>SWE-bench Pro:</strong> GLM-5.2 scored <strong>62.1</strong>, decisively beating GPT-5.5 (58.6).</li><li><strong>FrontierSWE:</strong> On this test of long-horizon task completion, it hit <strong>74.4%</strong>. That surpasses GPT-5.5 (72.6%) and trails Claude Opus 4.8 (75.1%) by only 0.7 points.</li><li><strong>PostTrainBench:</strong> In extended engineering workloads, it crushed the competition with a <strong>34.3%</strong> success rate against GPT-5.5’s 25.0%.</li><li><strong>Terminal-Bench 2.1:</strong> It is the first open-weights model to cross the 80% threshold, scoring <strong>81.0%</strong>.</li><li><strong>Design Arena:</strong> Perhaps most surprisingly, it took first place in this crowdsourced design task with an Elo rating of <strong>1360</strong>, beating out Claude Fable 5.</li></ul><p><strong>3. 
Under the Hood: MoE, IndexShare, and MTP</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*4mkbrmZlmiEyWDZj1NIvZw.png" /><figcaption><em>IndexShare Architectural Logic</em></figcaption></figure><p>The model uses a <strong>Mixture-of-Experts (MoE)</strong> architecture, activating roughly <strong>40 billion parameters</strong> per token out of its 753B total. However, the real innovation lies in two architectural optimizations:</p><p><strong>IndexShare</strong></p><p>Running a separate attention indexer in every layer across a 1-million-token context is computationally exorbitant. Z.ai’s <strong>IndexShare</strong> addresses this by reusing a single indexer across every four sparse attention layers. At maximum context length, this reduces per-token compute (FLOPs) by <strong>2.9x</strong>.</p><p><strong>Multi-Token Prediction (MTP)</strong></p><p>GLM-5.2 features an upgraded MTP layer for speculative decoding. During inference, this layer increases the average accepted draft length by up to <strong>20%</strong>, significantly speeding up complex generations.</p><p><strong>4. Selectable Reasoning: “High” vs. “Max” Effort</strong></p><p>Recognizing that not every task requires maximum compute, Z.ai implemented selectable <strong>Thinking Modes</strong>.</p><ul><li><strong>Max Effort:</strong> Designed for peak reasoning on complex multi-step problems, this mode uses nearly <strong>85K output tokens</strong> per task to “think” through a solution.</li><li><strong>High Effort:</strong> Aimed at latency-sensitive applications, this mode roughly <strong>halves the token output</strong> while giving up only a few performance points.</li></ul><p><strong>5. The Economics of “Pure Open” AI</strong></p><p>The financial disruption of GLM-5.2 is perhaps its most aggressive feature. 
For enterprises, one million input tokens plus one million output tokens cost a combined $5.80 ($1.40 per million input tokens and $4.40 per million output tokens). The same volume costs $35.00 on GPT-5.5 ($5.00 per million input tokens and $30.00 per million output tokens).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*k12LYtmSPNmiKzDNBm3jOQ.png" /><figcaption><em>API Cost per 1M Combined Tokens</em></figcaption></figure><p>Beyond the roughly 6x cost savings ($35.00 versus $5.80), the model is released under an <strong>unrestricted MIT license</strong>. This allows organizations to host frontier-level AI on their own sovereign infrastructure, bypassing geographic fencing, vendor lock-in, and restrictive “acceptable use” policies. Regulatory uncertainty is increasing, as recent export controls on proprietary US models show. In that climate, GLM-5.2 offers a transparent, locally hostable alternative.</p><p><strong>6. Day-One Integration</strong></p><p>GLM-5.2 is already production-ready, with day-one integration into major agentic coding harnesses. Developers using <strong>Claude Code, Cline, Kilo Code,</strong> or <strong>OpenClaw</strong> can point their base URL at the Z.ai API or a local instance and immediately use the full 1M-token context.</p><p><strong>Conclusion</strong></p><p>With day-one support in the major agentic coding harnesses, GLM-5.2 is not just a research milestone; it is a production-ready tool. 
It proves that open-weights models can lead the frontier, offering a high-performance, cost-effective, and transparent alternative to the proprietary giants.</p><h4>Enjoyed this deep dive?</h4><p>I write about AI systems, AI &amp; Data engineering, LLM internals, Platform Architecture, and everything Startups.</p><p>👉 <strong>Follow me on </strong><a href="https://medium.com/@jiten.p.oswal"><strong>Medium</strong></a> or <a href="http://www.x.com/jitenoswal"><strong>Twitter</strong></a> to catch similar deep dives.</p><p><em>Got a tricky AI System &amp; LLM question? Drop it in the comments, and I might write my next deep dive about it if there is enough interest.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=c304142c05d1" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Demystifying RAG in Telecom: A Deep Dive into Graph, Vector, and Hybrid Pipelines for O-RAN]]></title>
            <link>https://medium.com/@jiten.p.oswal/demystifying-rag-in-telecom-a-deep-dive-into-graph-vector-and-hybrid-pipelines-for-o-ran-ffb9dd47ea25?source=rss-9f02fdbf5415------2</link>
            <guid isPermaLink="false">https://medium.com/p/ffb9dd47ea25</guid>
            <category><![CDATA[productivity]]></category>
            <category><![CDATA[technology]]></category>
            <category><![CDATA[programming]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[business]]></category>
            <dc:creator><![CDATA[Jiten Oswal]]></dc:creator>
            <pubDate>Wed, 17 Jun 2026 21:46:12 GMT</pubDate>
            <atom:updated>2026-06-17T21:46:12.542Z</atom:updated>
            <content:encoded><![CDATA[<p><em>There is no one-size-fits-all “best” RAG pipeline for O-RAN; the ideal architecture depends entirely on your specific operational requirements.</em></p><p>Generative AI is poised to completely rewrite how we optimize and manage wireless networks. In the context of Open Radio Access Networks (O-RAN), Large Language Models (LLMs) can be leveraged to generate xApps and rApps or automate complex intent-driven network management tasks.</p><p>But there’s a catch: <strong>fine-tuning base LLMs on highly technical, rapidly evolving telecom standards is incredibly expensive and resource-intensive</strong>.</p><blockquote><em>🆓 </em>Read the full article free here → <a href="https://medium.com/@jiten.p.oswal/ffb9dd47ea25?source=friends_link&amp;sk=1e34878cc5853b46eb5382727ef5270c">free article link</a><br><em>👉 </em>Follow for more such AI deep dives → <a href="https://medium.com/@jiten.p.oswal">Medium</a> or <a href="https://x.com/jitenoswal">Twitter (X)</a> or <a href="https://www.linkedin.com/in/jitenoswal">LinkedIn</a></blockquote><p>Enter <strong>Retrieval-Augmented Generation (RAG)</strong>. RAG sidesteps the need for full retraining by fetching domain-specific knowledge dynamically to ground the LLM’s responses. However, the complex, multi-hop reasoning required to navigate O-RAN specifications often exposes the limitations of traditional vector-based RAG pipelines.</p><p>In a fascinating new study out of the University of Leeds, researchers benchmarked three prominent RAG architectures — Vector RAG, GraphRAG, and Hybrid GraphRAG — specifically on O-RAN standards. 
Let’s break down the architecture, the experiment, and the findings to see which pipeline truly rules the telecom domain.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*j7AsFlgABTLsswP7wbm4aw.png" /><figcaption>Optimizing O-RAN with Generative AI: Comparing RAG Architectures</figcaption></figure><p><strong>The Three Contenders</strong></p><p>To understand the benchmark, we first need to understand the architectures being evaluated:</p><ol><li><strong>Vector RAG (The Traditional Approach):</strong> This is your standard RAG setup. Unstructured O-RAN PDFs are split into chunks, embedded, and stored in a vector database (such as Chroma). When a user asks a question, the system ranks chunks by cosine similarity to the query embedding and feeds the most relevant ones to the LLM. While this works well for broad semantic matches, it struggles when the needed information is fragmented across multiple documents.</li><li><strong>GraphRAG (The Structured Approach):</strong> Instead of only chunking text, GraphRAG organizes information into a hierarchical knowledge graph (using tools such as Neo4j). Nodes represent entities (like “O-DU”, “SMO”, or “E2AP”), and edges represent their relationships. By traversing this graph, the retriever can pull specific, structurally connected subgraphs, enabling multi-hop reasoning.</li><li><strong>Hybrid GraphRAG (The Best of Both Worlds?):</strong> Hybrid GraphRAG fuses semantic similarity search with structural graph traversal. 
It retrieves text chunks via vector search to ensure broad document coverage and concatenates them with relationship-rich context extracted from the knowledge graph.</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*9iAxtwnnOap-U1Etxz0v2A.png" /><figcaption>Benchmarking RAG Architectures for Open RAN Optimization</figcaption></figure><p><strong>The Benchmark: Stress-Testing on ORAN-Bench-13K</strong></p><p>Evaluating these pipelines requires more than basic metrics like precision or F1 score, which fail to capture response quality or contextual alignment. The researchers used <strong>ORAN-Bench-13K</strong>, a dataset of questions grouped by difficulty: Easy (simple QA), Intermediate (complex reasoning), and Hard (multi-hop reasoning).</p><p>Using 74 O-RAN Alliance specification documents as the knowledge source and Gemini 1.5 Flash as the generation model, the researchers graded each pipeline with RAGAS, an LLM-as-a-judge evaluation framework, across four metrics:</p><ul><li><strong>Faithfulness:</strong> Is the response grounded only in the retrieved context, without hallucination?</li><li><strong>Factual Correctness:</strong> Is the answer objectively right?</li><li><strong>Context Relevance:</strong> Did the retriever pull only what was needed, without irrelevant material?</li><li><strong>Answer Relevance:</strong> Does the response actually address the user’s question?</li></ul><p><strong>The Results: Graph and Hybrid Dominate</strong></p><p>The results indicate that moving beyond simple vector search pays off in high-stakes telecom domains.</p><ol><li><strong>Factual Correctness goes to Hybrid GraphRAG:</strong> Hybrid GraphRAG achieved the highest average factual correctness (58%, compared to GraphRAG’s 50% and Vector RAG’s 48%). Overall, the study reports that <strong>Hybrid GraphRAG improved factual correctness by 8% over traditional RAG</strong>. 
Because it can fall back on vector retrieval when the knowledge graph is sparse, its performance remained stable across all difficulty levels.</li><li><strong>Context Relevance goes to GraphRAG:</strong> If you want concise, highly relevant information without verbose tangents, GraphRAG is the winner. <strong>GraphRAG improved context relevance by 11% compared to the Hybrid approach</strong>. Hybrid GraphRAG actually scored lowest here, because concatenating vector and graph context often produced dense, redundant, verbose prompts that diluted semantic precision.</li><li><strong>Faithfulness and Hallucination Reduction:</strong> Both GraphRAG and Hybrid GraphRAG outperformed Vector RAG by 4% in faithfulness. The structured nature of graph-based pipelines keeps the LLM’s responses more closely grounded in the retrieved specifications, making them less prone to hallucinating details of telecom standards.</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*spRD_-vARZPjSZFu09ER2g.png" /><figcaption>Benchmarking RAG Architectures for Open RAN (ORAN)</figcaption></figure><p><strong>The Verdict: Aligning the Architecture with the Use Case</strong></p><p>There is no one-size-fits-all “best” RAG pipeline for O-RAN; the ideal architecture depends entirely on your specific operational requirements.</p><ul><li><strong>Choose Hybrid GraphRAG</strong> for reasoning-intensive, high-stakes tasks where completeness and factual accuracy are non-negotiable. It is a strong fit for <strong>xApp/rApp generation</strong> or federated orchestration.</li><li><strong>Choose GraphRAG</strong> for latency-sensitive applications where focused, concise outputs are needed. 
Because it minimizes redundant context, it is ideal for <strong>root cause analysis</strong> or intent-driven network management.</li><li><strong>Vector RAG</strong> is still highly capable for “Easy” foundational questions (scoring highest on easy MCQs), but its accuracy drops sharply when multi-hop reasoning is required.</li></ul><p>As Generative AI continues to merge with telecommunications, the way we structure our data will dictate how smart our networks become. Graph and Hybrid pipelines are no longer just experimental concepts — they are prerequisites for building reliable AI in the O-RAN ecosystem.</p><h4>Enjoyed this deep dive?</h4><p>I write about AI systems, AI &amp; Data engineering, LLM internals, Platform Architecture, and everything Startups.</p><p>👉 <strong>Follow me on </strong><a href="https://medium.com/@jiten.p.oswal"><strong>Medium</strong></a> or <a href="http://www.x.com/jitenoswal"><strong>Twitter</strong></a> to catch similar deep dives.</p><p><em>Got a tricky AI System &amp; LLM question? Drop it in the comments, and I might write my next deep dive about it if there is enough interest.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=ffb9dd47ea25" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[The Fable of the Frontier: A Deep Dive into Claude Fable 5 and the Era of Capability Governance]]></title>
            <link>https://medium.com/@jiten.p.oswal/the-fable-of-the-frontier-a-deep-dive-into-claude-fable-5-and-the-era-of-capability-governance-a4904b4f2deb?source=rss-9f02fdbf5415------2</link>
            <guid isPermaLink="false">https://medium.com/p/a4904b4f2deb</guid>
            <category><![CDATA[business]]></category>
            <category><![CDATA[productivity]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[technology]]></category>
            <category><![CDATA[programming]]></category>
            <dc:creator><![CDATA[Jiten Oswal]]></dc:creator>
            <pubDate>Thu, 11 Jun 2026 16:31:00 GMT</pubDate>
            <atom:updated>2026-06-11T16:31:00.790Z</atom:updated>
            <content:encoded><![CDATA[<p><em>As model intelligence continues to scale, the industry’s focus is shifting from what a model can do to who is allowed to see it do it.</em></p><p>On June 9, 2026, the AI landscape shifted with the release of <strong>Claude Fable 5</strong>, the first publicly available model from Anthropic’s elite “<strong>Mythos-class</strong>” tier. Fable 5 immediately claimed the #1 spot on the Artificial Analysis Intelligence Index with a score of 64.9, five points ahead of any other lab’s best model, but its launch has also introduced a controversial new paradigm: <strong>capability governance</strong>.</p><blockquote><em>🆓 </em>Read the full article free here → <a href="https://medium.com/@jiten.p.oswal/a4904b4f2deb?source=friends_link&amp;sk=a22befbbda590c9ff084ed0494e33218">free article link</a><br><em>👉 </em>Follow for more such AI deep dives → <a href="https://medium.com/@jiten.p.oswal">Medium</a> or <a href="https://x.com/jitenoswal">Twitter (X)</a> or <a href="https://www.linkedin.com/in/jitenoswal">LinkedIn</a></blockquote><p>For the first time, a frontier model’s utility is defined not just by its raw intelligence, but by a complex architecture of safety classifiers, silent interventions, and a fundamental shift in data privacy.</p><p><strong>1. The Performance Frontier: From Coding to “One-Shotting”</strong></p><p>Fable 5 is a massive leap over the previous Opus 4.8 flagship, particularly in autonomous, “agentic” tasks.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*6v6B6ljOevMIAEgJI_nx7w.png" /></figure><ul><li><strong>Coding Mastery:</strong> Fable 5 currently leads the <strong>SWE-Bench Pro</strong> leaderboard with an <strong>80.3% pass rate</strong>, compared to Opus 4.8’s 69.2%. 
In a standout case study, Stripe used Fable 5 to perform a codebase-wide migration of a 50-million-line Ruby project in a single day, a task that would typically take a full team two months.</li><li><strong>Vision and Reasoning:</strong> The model can rebuild entire web applications from screenshots alone. It cleared <em>Pokémon FireRed</em> from start to finish using only raw screenshots, whereas previous models required complex helper harnesses to navigate the game.</li><li><strong>Knowledge Benchmarks:</strong> It scored <strong>53% on Humanity’s Last Exam (HLE)</strong>, seven points ahead of its predecessor, and reached a leading Elo of 1932 on the GDPval-AA benchmark for real-world work tasks.</li></ul><p><strong>2. The Mythos-Class Architecture: Two Models, One Engine</strong></p><p>Anthropic has taken the unusual step of shipping the same underlying model as two distinct products, separated only by a layer of safety classifiers.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*l_MRzzn1exeg5xdZNvyhhg.png" /></figure><ul><li><strong>Claude Mythos 5:</strong> The unrestricted “twin,” reserved for vetted cybersecurity defenders and critical infrastructure operators via <strong>Project Glasswing</strong>.</li><li><strong>Claude Fable 5:</strong> The public-facing version, which employs a “<strong>fallback mechanism</strong>”. When a user’s query trips safety classifiers for biology, chemistry, cybersecurity, or “distillation” (extracting model capabilities to train rivals), the request is <strong>silently routed to the weaker Claude Opus 4.8</strong>.</li></ul><p>While Anthropic says fallback triggers in fewer than 5% of sessions, independent testing shows higher rates for complex work. On benchmarks like HLE and GPQA (scientific knowledge), the fallback rate climbs to <strong>8–9%</strong>, meaning nearly one in ten such queries is answered by a less capable model.</p><p><strong>3. 
The “Silent Nerf” Controversy</strong></p><p>The most damaging technical revelation involves Fable 5’s behavior around <strong>frontier AI development</strong>. According to Anthropic’s system card, when the model detects work on pretraining pipelines, distributed training infrastructure, or accelerator design, it neither refuses openly nor falls back to Opus 4.8.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*GF5ewGNxl7XLvA4OhjeoqQ.png" /></figure><p>Instead, it <strong>silently degrades its own performance</strong> through prompt modification and steering vectors, without notifying the user. Researchers argue this undermines scientific reproducibility, because they cannot tell whether a failed result stems from their own implementation or from an undisclosed model intervention.</p><p><strong>4. Cybersecurity: A Defensive Head Start?</strong></p><p>The defensive power of the Mythos-class engine is staggering. In early testing, Mythos 5 identified and exploited zero-day vulnerabilities in every major operating system and browser, including a 27-year-old flaw in OpenBSD.</p><ul><li><strong>The Bug Flood:</strong> Cloudflare used the model to find 2,000 bugs, 400 of which were high or critical severity. Mozilla identified 271 vulnerabilities in Firefox using the same technology.</li><li><strong>The Patch Bottleneck:</strong> This has created a new crisis: finding bugs is now fast and cheap, but human maintainers cannot write and deploy patches fast enough to keep up. The window between a model-driven disclosure and a working exploit is shrinking, so high-severity CVEs can now be weaponized in hours rather than weeks.</li></ul><p><strong>5. 
Privacy and the July 8th Pivot</strong></p><p>Anthropic is implementing two major data policy changes for all Mythos-class models:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ZhRF7WtEhS6-slXVu7af0A.png" /></figure><ul><li><strong>30-Day Mandatory Retention:</strong> Prompts and outputs must be retained for 30 days, even on third-party platforms like AWS Bedrock and Google Vertex AI, so that multi-request attacks can be detected.</li><li><strong>Proactive Disclosure:</strong> Effective July 8, 2026, a new privacy policy allows Anthropic to share user conversation data with law enforcement based on an internal <strong>“good faith belief”</strong> that disclosure is necessary. This removes the previous requirement for an external court order, replacing it with a private judgment call.</li></ul><p><strong>6. The Economics of “God-Tier” Intelligence</strong></p><p>Users have dubbed Fable 5 the “<strong>cocaine dealer</strong>” release because of its extreme power and its planned transition to a high-cost credit model.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*q-ldqQuHib_zQhvyrAZXlA.png" /></figure><ul><li><strong>Token Furnace:</strong> Fable 5 costs <strong>$10 per million input tokens and $50 per million output tokens</strong>, exactly double the price of Opus 4.8.</li><li><strong>Subscription Burn:</strong> It is included in Pro/Max plans only until <strong>June 22, 2026</strong>. During this window, it counts double against usage limits. Some users have reported draining a $100 Max subscription’s daily quota in under nine minutes during intensive coding sessions.</li></ul><p>Starting June 23, Fable 5 will move to a <strong>usage-credit-only model</strong> for most subscribers until compute capacity expands.</p><p><strong>Conclusion</strong></p><p>Claude Fable 5 represents the arrival of “governed” frontier AI. 
It is the most capable publicly available model today, but that power is mediated by a web of internal classifiers and policy pivots. As model intelligence continues to scale, the industry’s focus is shifting from what a model <em>can</em> do to who is <em>allowed</em> to see it do it.</p><h4>Enjoyed this deep dive?</h4><p>I write about AI systems, AI &amp; Data engineering, LLM internals, Platform Architecture, and everything Startups.</p><p>👉 <strong>Follow me on </strong><a href="https://medium.com/@jiten.p.oswal"><strong>Medium</strong></a> or <a href="http://www.x.com/jitenoswal"><strong>Twitter</strong></a> to catch similar deep dives.</p><p><em>Got a tricky AI System &amp; LLM question? Drop it in the comments, and I might write my next deep dive about it if there is enough interest.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=a4904b4f2deb" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Breaking the Human Bottleneck: A Deep Dive into SIA and the Co-Evolution of Agent Scaffolds and…]]></title>
            <link>https://medium.com/@jiten.p.oswal/breaking-the-human-bottleneck-a-deep-dive-into-sia-and-the-co-evolution-of-agent-scaffolds-and-8517b2c12a38?source=rss-9f02fdbf5415------2</link>
            <guid isPermaLink="false">https://medium.com/p/8517b2c12a38</guid>
            <category><![CDATA[productivity]]></category>
            <category><![CDATA[technology]]></category>
            <category><![CDATA[programming]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[business]]></category>
            <dc:creator><![CDATA[Jiten Oswal]]></dc:creator>
            <pubDate>Tue, 09 Jun 2026 00:16:00 GMT</pubDate>
            <atom:updated>2026-06-09T00:16:00.602Z</atom:updated>
            <content:encoded><![CDATA[<h3><strong>Breaking the Human Bottleneck: A Deep Dive into SIA and the Co-Evolution of Agent Scaffolds and Model Weights — Research Review</strong></h3><p><em>SIA demonstrates that the path to truly autonomous AI isn’t just about bigger models or better prompts — it’s about creating systems that can </em><strong><em>co-evolve their own code and their own intelligence</em></strong><em>.</em></p><blockquote>Ref arXiv paper by Microsoft Research: <a href="https://arxiv.org/html/2605.27276v2">https://arxiv.org/html/2605.27276v2</a></blockquote><p>In the current AI landscape, <strong>humans are the bottleneck</strong>. While we have increasingly powerful Large Language Models (LLMs), the “agents” built around them (the prompts, tool-dispatch logic, and error-handling code) are still meticulously hand-crafted by engineers. Meanwhile, model weights are often fine-tuned in isolation via rigid RL pipelines.</p><blockquote><em>🆓</em> Read the full article free here →<a href="https://medium.com/@jiten.p.oswal/0348a1df8242?sk=cc300a1fdea0f134864c2cf1993714f7"> free article link</a><br><em>👉 </em>Follow for more such AI deep dives → <a href="https://medium.com/@jiten.p.oswal">Medium</a> or <a href="https://x.com/jitenoswal">Twitter (X)</a> or <a href="https://www.linkedin.com/in/jitenoswal">LinkedIn</a></blockquote><p>A groundbreaking research paper, <strong>“SIA: Self Improving AI with Harness &amp; Weight Updates,”</strong> proposes a shift away from these silos. 
SIA (Self-Improving Agent) introduces a unified loop in which an AI system updates <strong>both its external scaffold (the harness) and its internal parameters (the weights)</strong> to solve complex tasks.</p><p><strong>The Two Silos of Self-Improvement</strong></p><p>Historically, research into automated AI improvement has been split into two disjoint camps:</p><ol><li><strong>Harness/Scaffold Self-Improvement:</strong> Systems like the <em>Darwin Gödel Machine</em> or <em>Meta-Harness</em> use a meta-agent to rewrite an agent’s code (prompts, tools, retries) while keeping model weights fixed. The resulting gains are mostly <strong>software-engineering hygiene</strong>.</li><li><strong>Test-Time Training (TTT):</strong> Systems like <em>TTRL</em> or <em>Discover-TTT</em> use RL to update model weights on the fly, but keep the agent’s scaffold static. The resulting gains come from <strong>internal policy changes</strong>.</li></ol><p><strong>SIA bridges this gap</strong> by allowing a single “Feedback-Agent” to pull both levers.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*7aRusS6TXIDrKPg-sW4iIg.png" /></figure><p><strong>How SIA Works: The Feedback-Agent at the Helm</strong></p><p>SIA is driven by a three-component architecture:</p><ul><li><strong>The Meta-Agent (<em>M</em>):</strong> Initializes the task-specific agent’s scaffold from a task description.</li><li><strong>The Task-Specific Agent:</strong> The “worker” that executes the task using a model (like GPT-OSS-120B) and its current scaffold.</li><li><strong>The Feedback-Agent (<em>F</em>):</strong> The brain of the operation. It analyzes the <strong>full execution trajectory</strong> (every tool call, error, and response) to decide what to improve next.</li></ul><p>Instead of following a fixed schedule, the Feedback-Agent treats <strong>harness updates</strong> and <strong>weight updates</strong> as selectable actions. 
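</p><p>In SIA this choice is made by the Feedback-Agent itself after reading the trajectory, not by a fixed schedule. Still, the “switch to weight updates when harness progress stalls” behavior described in the next section can be pictured with a small policy. The Python sketch below is purely illustrative: <code>LeverPolicy</code>, <code>run_loop</code>, and the <code>patience</code> / <code>min_gain</code> thresholds are hypothetical names and values, not part of SIA.</p><pre>
```python
from dataclasses import dataclass, field


@dataclass
class LeverPolicy:
    """Hypothetical stand-in for the Feedback-Agent's choice between levers.

    Prefers harness updates and switches to a weight update once the score
    has failed to improve by at least min_gain for patience rounds.
    """
    patience: int = 2
    min_gain: float = 0.005
    best_score: float = float("-inf")
    stalled_rounds: int = 0
    history: list = field(default_factory=list)

    def choose_action(self) -> str:
        # Switch levers only after harness progress has stalled long enough.
        if self.stalled_rounds >= self.patience:
            return "weight_update"
        return "harness_update"

    def record(self, action: str, score: float) -> None:
        self.history.append((action, score))
        if score - self.best_score >= self.min_gain:
            self.best_score = score
            self.stalled_rounds = 0
        else:
            self.stalled_rounds += 1
        if action == "weight_update":
            # After new weights land, give harness edits a fresh budget.
            self.stalled_rounds = 0


def run_loop(scores, patience: int = 2) -> list:
    """Replay evaluation scores (each measured after an action) and return the actions chosen."""
    policy = LeverPolicy(patience=patience)
    actions = []
    for score in scores:
        action = policy.choose_action()
        actions.append(action)
        policy.record(action, score)
    return actions


if __name__ == "__main__":
    print(run_loop([0.40, 0.45, 0.451, 0.452, 0.50, 0.52]))
```
</pre><p>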
For example, the Feedback-Agent might rewrite a Python tool in one step, then decide in the next that the model needs a domain-specific RL update (using LoRA).</p><p><strong>The Two Levers: Software vs. Intuition</strong></p><p>The paper highlights that these two levers change fundamentally different things:</p><p><strong>1. Harness Updates (The Externalized Scaffold)</strong></p><p>Harness iteration produces <strong>external software improvements</strong>. In the experiments, the Feedback-Agent was observed building specialized tool-parsers, SVC re-rankers, and timing harnesses for CUDA kernels. These changes help the agent navigate the task environment more effectively, but they don’t change what the model “knows”.</p><p><strong>2. Weight Updates (The Internalized Knowledge)</strong></p><p>When harness progress stalls, the Feedback-Agent switches to weight updates using techniques like <strong>PPO, GRPO, or Entropic Advantage Weighting</strong>. This lets the model internalize <strong>domain-specific patterns</strong> that no prompt could convey, such as H100-specific GPU tiling patterns or biological invariants in RNA data.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*k7K-AFR9GiwjydffT9ubLg.png" /></figure><p><strong>Proving the Thesis: 3 Diverse Domains</strong></p><p>SIA was tested across three very different tasks, consistently outperforming the previous state of the art (SOTA) and “harness-only” approaches:</p><ul><li><strong>Law (LawBench):</strong> Classifying 191 types of Chinese criminal charges. SIA-W+H (weight and harness updates) achieved <strong>70.1% accuracy</strong>, a large leap over the previous 45.0% SOTA.</li><li><strong>Systems (AlphaEvolve TriMul):</strong> Optimizing CUDA kernels for protein structure prediction. SIA produced kernels <strong>12.4% faster</strong> than the previous SOTA by internalizing hardware-specific scheduling patterns.</li><li><strong>Biology (MAGIC scRNA-seq):</strong> Denoising single-cell RNA data. 
SIA improved performance by <strong>20%</strong> by discovering a biological “rounding” invariant that the harness-only loop never found.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*pO5-iZBGRaeoLFL05CbHbQ.png" /></figure><p><strong>The Future of Self-Improvement</strong></p><p>The researchers note that this is just the beginning. Future work involves <strong>Meta-RL</strong>, where the Feedback-Agent itself learns how to better choose between harness and weight updates based on past experience.</p><p>SIA demonstrates that the path to truly autonomous AI isn’t just about bigger models or better prompts — it’s about creating systems that can <strong>co-evolve their own code and their own intelligence</strong>.</p><h4>Enjoyed this deep dive?</h4><p>I write about AI systems, AI &amp; Data engineering, LLM internals, Platform Architecture, and everything Startups.</p><p>👉 <strong>Follow me on </strong><a href="https://medium.com/@jiten.p.oswal"><strong>Medium</strong></a> or <a href="http://www.x.com/jitenoswal"><strong>Twitter</strong></a> to catch similar deep dives.</p><p><em>Got a tricky AI System &amp; LLM question? Drop it in the comments, and I might write my next deep dive about it if there is enough interest.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=8517b2c12a38" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Silent Corrosion: Why Your AI Delegate is Quietly Destroying Your Work — Research Review]]></title>
            <link>https://medium.com/@jiten.p.oswal/silent-corrosion-why-your-ai-delegate-is-quietly-destroying-your-work-0348a1df8242?source=rss-9f02fdbf5415------2</link>
            <guid isPermaLink="false">https://medium.com/p/0348a1df8242</guid>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[productivity]]></category>
            <category><![CDATA[technology]]></category>
            <category><![CDATA[business]]></category>
            <category><![CDATA[programming]]></category>
            <dc:creator><![CDATA[Jiten Oswal]]></dc:creator>
            <pubDate>Fri, 05 Jun 2026 01:09:23 GMT</pubDate>
            <atom:updated>2026-06-09T00:04:38.721Z</atom:updated>
            <content:encoded><![CDATA[<p><em>We are currently in a state of “Silent Corrosion”. Because errors are sparse and the document often </em><strong><em>looks</em></strong><em> correct, users may not notice the gradual loss of data integrity until it is too late.</em></p><blockquote>Ref arXiv paper by Microsoft Research: <a href="https://arxiv.org/html/2604.15597v1">https://arxiv.org/html/2604.15597v1</a></blockquote><p>In the era of “vibe coding” and AI agents, we are moving toward a new paradigm: <strong>delegated work</strong>. We give a Large Language Model (LLM) a high-level goal, a set of documents, and the autonomy to execute. We trust it to be a faithful executor, but new research from Microsoft suggests this trust might be misplaced.</p><blockquote><em>🆓</em> Read the full article free here →<a href="https://medium.com/@jiten.p.oswal/0348a1df8242?source=friends_link&amp;sk=cc300a1fdea0f134864c2cf1993714f7"> free article link</a><br><em>👉</em> Follow for more such AI deep dives → <a href="https://medium.com/@jiten.p.oswal">Medium</a> or <a href="https://x.com/jitenoswal">Twitter (X)</a> or <a href="https://www.linkedin.com/in/jitenoswal">LinkedIn</a></blockquote><p>The paper, <strong>“LLMs Corrupt Your Documents When You Delegate,”</strong> introduces a sobering concept: <strong>Silent Corrosion</strong>. Even the most advanced models we use today (including GPT-5.4, Claude 4.6 Opus, and Gemini 3.1 Pro) act as “unreliable delegates” that introduce sparse but severe errors that compound over time.</p><p><strong>The DELEGATE-52 Benchmark: Testing the Long Game</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*9evWLoKch3F9_i_tZCd_8Q.png" /></figure><p>Most AI benchmarks test single-turn tasks. However, real work is iterative. 
To capture this, the researchers built <strong>DELEGATE-52</strong>, a benchmark spanning <strong>52 professional domains</strong> including crystallography, music notation, accounting, and legal records.</p><p>To evaluate performance without needing human-written “correct answers” for every step, they used a <strong>round-trip relay simulation</strong>.</p><ol><li><strong>Forward Edit:</strong> The LLM is asked to perform a complex, reversible task (e.g., “split this ledger by category”).</li><li><strong>Backward Edit:</strong> The LLM is asked to reverse it (e.g., “merge the files back into the original ledger”).</li></ol><p>With a perfect delegate, the document after each round trip would be identical to the original, so any difference measures corruption directly. In reality, every interaction is a chance for “corrosion”.</p><p><strong>The Results: A 25% “Trust Tax”</strong></p><p>The findings are a wake-up call for anyone relying on AI for long-horizon work. After 20 delegated interactions, <strong>frontier models corrupted an average of 25% of document content</strong>. Non-frontier models fared even worse, with degradation averaging a staggering 50%.</p><p><strong>Key Insights from the Data:</strong></p><ul><li><strong>The Python Outlier:</strong> Python was the only domain (out of 52) where most models achieved “ready” status (lossless manipulation). Outside Python, the risk of corruption is significantly higher.</li><li><strong>The “Jagged Frontier”:</strong> Performance is highly domain-dependent. Models excelled at repetitive, structurally dense documents (like chemical records) but struggled with natural language and niche formats like music notation or earnings statements.</li><li><strong>Short-term is a Lie:</strong> A model’s performance after two interactions is <strong>not predictive</strong> of its performance after twenty. Some models start strong and collapse; others start slowly and overtake them.</li></ul><p><strong>Why Is This Happening? 
(The Failure Mechanics)</strong></p><p>The study identifies several “multipliers” that accelerate document destruction:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*CvM0EkpKYTCGL8WV1yrxjw.png" /></figure><ol><li><strong>The Tool Paradox:</strong> You might think giving an AI tools (an agentic harness) would help. It doesn’t. In fact, <strong>agentic tool use increased degradation by an average of 6%</strong>. Models often rewrite files by hand instead of editing them precisely through code execution, which introduces more errors.</li><li><strong>Sparse but Severe Failures:</strong> Models don’t usually fail through “death by a thousand cuts.” Instead, they maintain near-perfection for several rounds before a <strong>critical failure</strong> occurs, dropping the score by 10+ points in a single interaction. These “sparse” failures account for 80% of total degradation.</li><li><strong>Deletion vs. Corruption:</strong> There is a clear divide in failure modes. <strong>Weaker models tend to delete content</strong>, while <strong>frontier models tend to corrupt it</strong> (altering facts or introducing hallucinations while keeping the text length similar).</li><li><strong>The Distractor Effect:</strong> In real-world settings with “imperfect retrieval” (irrelevant files in the context), corruption worsens. This harm compounds over time, meaning <strong>noisy contexts become more dangerous the longer the workflow continues</strong>.</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*j2MGMnPDRY90zl228uHgyg.png" /></figure><p><strong>What This Means for the Future of AI Work</strong></p><p>We are currently in a state of “Silent Corrosion”. 
Because errors are sparse and the document often <em>looks</em> correct, users may not notice the loss of data integrity until it is too late.</p><p><strong>The takeaway for practitioners is clear:</strong></p><ul><li><strong>Monitor closely:</strong> Do not generalize a model’s success in one domain (like Python) to another (like legal or creative writing).</li><li><strong>Short context is safer:</strong> Document size and interaction length compound multiplicatively, so large documents edited over many rounds carry the greatest risk.</li><li><strong>Build for Reversibility:</strong> The researchers suggest that “cycle consistency”, training models to reverse their own edits, might be the path toward truly reliable AI delegates.</li></ul><p>The “jagged frontier” of AI capability means that, for now, the most important part of delegated work is the human supervisor who knows when to look under the hood.</p><h4>Enjoyed this deep dive?</h4><p>I write about AI systems, AI &amp; Data engineering, LLM internals, Platform Architecture, and everything Startups.</p><p>👉 <strong>Follow me on </strong><a href="https://medium.com/@jiten.p.oswal"><strong>Medium</strong></a> or <a href="http://www.x.com/jitenoswal"><strong>Twitter</strong></a> to catch similar deep dives.</p><p><em>Got a tricky AI System &amp; LLM question? Drop it in the comments, and I might write my next deep dive about it if there is enough interest.</em></p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[The Claude Chronicles: From the Precision of 4.6 to the “Ultracode” of 4.8]]></title>
            <link>https://medium.com/codetodeploy/the-claude-chronicles-from-the-precision-of-4-6-to-the-ultracode-of-4-8-69cc69310682?source=rss-9f02fdbf5415------2</link>
            <guid isPermaLink="false">https://medium.com/p/69cc69310682</guid>
            <category><![CDATA[programming]]></category>
            <category><![CDATA[business]]></category>
            <category><![CDATA[technology]]></category>
            <category><![CDATA[productivity]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <dc:creator><![CDATA[Jiten Oswal]]></dc:creator>
            <pubDate>Mon, 01 Jun 2026 22:06:00 GMT</pubDate>
            <atom:updated>2026-06-23T08:48:53.206Z</atom:updated>
            <content:encoded><![CDATA[<p><em>Anthropic’s rapid iteration suggests they are aware of the “model collapse” risks and are moving toward a modular “effort-based” future where the user — not the model — decides how much “thinking” a task deserves.</em></p><p>The rapid-fire release cycle of Anthropic’s Claude Opus series has left even seasoned AI researchers breathless. Within just a few months, we have gone from the beloved precision of <strong>Opus 4.6</strong> to the controversial “Adaptive Thinking” of <strong>4.7</strong>, and now to <strong>Opus 4.8</strong>, a correction aimed squarely at “corner-cutting”. This deep dive explores whether Anthropic has finally found the balance between reasoning depth and operational efficiency.</p><p><strong>1. The Nostalgia for Opus 4.6: Why “Older” Was Often Better</strong></p><p>Despite two subsequent releases, a significant portion of the power-user community remains loyal to Opus 4.6. Why? <strong>Precision.</strong></p><ul><li><strong>Conciseness:</strong> 4.6 is widely praised for its “cleaner writing” and “tighter” word choice. 
It follows short-message constraints perfectly, whereas later models tend toward verbosity.</li><li><strong>Intuition:</strong> Users report that 4.6 can “read between the lines” and give straight-to-the-point answers without unnecessary clarifying questions.</li><li><strong>“Out-of-the-Box” Reliability:</strong> While it lacks the advanced “Ultracode” parallelization of 4.8, it “just works out of the box,” producing confident first drafts for product and communication work.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*lvuI2YYf6emM5MxBL1U7fw.png" /></figure><p><strong>2. The Opus 4.7 “Regression”: A Case Study in Model Entropy</strong></p><p>Released to high expectations, Opus 4.7 introduced <strong>Adaptive Thinking</strong>, which many users labeled a “massive regression”.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*CxzXMK9KtEzZ0-WSCWRgbA.png" /></figure><ul><li><strong>The “Lazier” Model:</strong> Users reported that 4.7 frequently ignored instructions (including CLAUDE.md preferences), hallucinated non-existent packages, and even invented coworkers like “Anton”.</li><li><strong>Quiet Quitting:</strong> One of the most frustrating traits of 4.7 was its tendency to “side-step” tasks, suggesting it “stop here” or “pick this up later” after only a few messages.</li><li><strong>Semantic Drift:</strong> This degradation resembles the phenomenon of <strong>AI Model Collapse</strong>, where models trained on increasingly synthetic data lose grounding in real-world facts, leading to “output entropy” and repetitive, low-value text.</li></ul><p><strong>3. Opus 4.8: Stopping the Corners from Being Cut</strong></p><p>Anthropic shipped 4.8 a mere <strong>42 days after 4.7</strong>, the fastest turnaround in its history. 
This version is less a new architecture than a <strong>rigorous re-tuning</strong> of the 4.7 base.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hmQTWr41aC0139eGDlB4Dw.png" /></figure><ul><li><strong>Bugs and Bottlenecks:</strong> The standout feature of 4.8 is that it stops “hiding its own bugs” and tracks down performance bottlenecks that 4.7 dismissed as “unfixable” (see Section 4).</li><li><strong>Effort Controls:</strong> 4.8 returns agency to the user with a dedicated <strong>Effort Control</strong> toggle (Low to Max) and an adaptive-thinking switch.</li><li><strong>Ultracode &amp; Dynamic Workflows:</strong> For developers, 4.8 introduces <strong>Dynamic Workflows</strong>. With effort set to “Ultracode,” Claude can spin up dozens of subagents in parallel to hunt for bugs across an entire service or handle complex migrations.</li><li><strong>The Proactive Trade-off:</strong> While 4.8 is more precise, it is <strong>less proactive</strong>. It executes a spec exactly but often fails to “infer” necessary steps (like connecting to a production server) that 4.6 or 4.7 might have taken automatically.</li></ul><p><strong>4. Engineering for Integrity: The End of “Corner-Cutting”</strong></p><p>The technical standout of 4.8 is its refusal to “hide its own bugs”. Where 4.7 was often criticized for “quiet quitting” or offering sycophantic excuses when code failed, Anthropic claims 4.8 is tuned to be <strong>4x less likely</strong> to let flaws slip past its own internal checks.</p><ul><li><strong>Zero-Percent “Bad Rate”:</strong> According to Anthropic’s system card, 4.8 is the only model evaluated to achieve a <strong>0% bad rate</strong> on “cutting corners”.</li><li><strong>Precision Over Proactivity:</strong> This integrity comes from a shift in philosophy: 4.8 is more precise but less proactive. 
It stops “guessing” user intent (a habit that led to hallucinations in 4.7) and instead executes the provided spec with clinical exactness.</li><li><strong>Case in Point:</strong> In real-world testing, 4.7 declared a laggy dashboard’s bottleneck “unfixable,” while 4.8 performed a line-by-line audit and identified the specific performance drains.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*zU4J0ie7WBvrfI5usoP4Jg.png" /></figure><p>By combining a <strong>66% reduction in Fast Mode costs</strong> with a rigorous focus on <strong>code verification</strong>, 4.8 positions itself not just as a smarter model but as a more economically viable tool for high-stakes production environments.</p><p><strong>5. The Verdict: Which Opus Should You Use?</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/602/1*YbY2DVkblfRR-IW-FCMnJw.png" /><figcaption>My takeaways from researching Opus 4.6, 4.7, and 4.8</figcaption></figure><p>If you are <strong>writing a blog or drafting emails</strong>, Opus 4.6 remains the “GOAT” for its stylistic superiority. However, if you are a developer dealing with a <strong>complex legacy codebase</strong>, the “Ultracode” capabilities and “corner-cutting” fixes of <strong>Opus 4.8</strong> make it the better tool for high-stakes debugging.</p><p>Anthropic’s rapid iteration suggests they are aware of the “model collapse” risks and are moving toward a modular “effort-based” future where the user — not the model — decides how much “thinking” a task deserves.</p><h4>Thank you for being a part of the community</h4><p><em>Before you go:</em></p><p>👉 Be sure to <strong>clap</strong> and <strong>follow</strong> the writer 👏</p><p>👉 Follow us: <a href="https://medium.com/@jiten.p.oswal"><strong>Medium</strong></a> | <a href="http://www.x.com/jitenoswal"><strong>Twitter</strong></a></p><hr><p><a href="https://medium.com/codetodeploy/the-claude-chronicles-from-the-precision-of-4-6-to-the-ultracode-of-4-8-69cc69310682">The Claude Chronicles: From the Precision of 4.6 to the “Ultracode” of 4.8</a> was originally published in <a href="https://medium.com/codetodeploy">CodeToDeploy</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Mastering Continuous-Time Graphs: A Deep Dive into Temporal Graph Networks (TGNs) — Research Paper…]]></title>
            <link>https://medium.com/codetodeploy/mastering-continuous-time-graphs-a-deep-dive-into-temporal-graph-networks-tgns-cdf5a680a432?source=rss-9f02fdbf5415------2</link>
            <guid isPermaLink="false">https://medium.com/p/cdf5a680a432</guid>
            <category><![CDATA[technology]]></category>
            <category><![CDATA[software-development]]></category>
            <category><![CDATA[programming]]></category>
            <category><![CDATA[productivity]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <dc:creator><![CDATA[Jiten Oswal]]></dc:creator>
            <pubDate>Mon, 01 Jun 2026 20:21:00 GMT</pubDate>
            <atom:updated>2026-06-20T07:08:54.218Z</atom:updated>
            <content:encoded><![CDATA[<h3><strong>Mastering Continuous-Time Graphs: A Deep Dive into Temporal Graph Networks (TGNs) — Research Paper Review</strong></h3><p><em>Unlike traditional static models, TGNs use a memory module and graph-based operators to efficiently track long-term node dependencies and evolving interactions.</em></p><p>In the world of Graph Neural Networks (GNNs), we’ve become experts at modeling static systems — protein structures, molecule fingerprints, or fixed social maps. But the real world isn’t static. Social networks evolve every second, and recommendation systems must react to user actions in real time.</p><p>While many models treat dynamic graphs as a series of “snapshots” (Discrete-Time Dynamic Graphs), this approach fails to capture the nuances of <strong>Continuous-Time Dynamic Graphs (CTDG)</strong>, where edges can appear at any moment and new nodes join the network continuously.</p><blockquote>Reference arXiv paper: <a href="https://arxiv.org/abs/2006.10637">https://arxiv.org/abs/2006.10637</a></blockquote><p>Enter <strong>Temporal Graph Networks (TGNs)</strong>, a generic and highly efficient framework for deep learning on dynamic graphs. Developed by researchers at Twitter, TGNs provide a state-of-the-art solution for tracking evolving interactions while being up to <strong>30x faster</strong> than previous methods.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*TlgtIrWLibhAN3GsP2Kq8w.png" /><figcaption>Snapshot vs. 
Streams: Modeling Dynamic Graphs</figcaption></figure><p><strong>The Core Architecture: Five Modules of TGN</strong></p><p>The TGN framework is built on a modular encoder-decoder architecture. The encoder is the “brain,” mapping the dynamic graph to node embeddings through five core components:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*6OlJKOc30J00aKydIBgQig.png" /><figcaption>Inside Temporal Graph Networks (TGNs): A 5-Module Framework</figcaption></figure><ol><li><strong>Memory:</strong> Every node the model has seen so far has a memory vector <em>s</em><sub><em>i</em></sub>(<em>t</em>) that compresses its history up to time <em>t</em>. When a new node appears, its memory is initialized as a zero vector.</li><li><strong>Message Function:</strong> Whenever an event occurs (like an interaction between two nodes), the model computes a <strong>message</strong> for each node involved, carrying the information from the event that is needed to update that node’s memory.</li><li><strong>Message Aggregator:</strong> For efficiency, events are processed in batches, so a single node may be involved in several events within one batch. The aggregator (using strategies such as keeping only the “most recent” message or taking the “mean”) condenses these into a single message per node.</li><li><strong>Memory Updater:</strong> This module takes the aggregated message and updates the node’s memory. It is typically implemented as a Recurrent Neural Network (RNN) cell such as a <strong>GRU or LSTM</strong>.</li><li><strong>Embedding Module:</strong> To generate the final node embedding <em>z</em><sub><em>i</em></sub>(<em>t</em>), the model aggregates information from a node’s neighbors. 
This is critical for avoiding the <strong>“Memory Staleness” problem</strong>: a node that has been inactive for a while has an outdated memory, so its embedding must draw on current information from its active neighbors to stay relevant.</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*SRb-IhQU_rq8VDLADP5aag.png" /><figcaption>TGN: Solving Memory Staleness in Dynamic Graphs</figcaption></figure><p><strong>Solving the Training Paradox: The Raw Message Store</strong></p><p>One of the biggest challenges in training TGNs is that the memory-related modules (the message function, message aggregator, and memory updater) don’t directly influence the loss, so they receive no gradient during standard backpropagation.</p><p>The natural fix is to update the memory before computing embeddings, which puts these modules on the gradient path. But if you update the memory with an interaction <em>before</em> predicting that same interaction, you cause <strong>information leakage</strong>. To solve this, TGNs use a <strong>Raw Message Store</strong>. The model updates memory using messages from <em>previous</em> batches, predicts the current batch’s interactions, and then stores the current interactions’ raw messages for use in future batches. 
This lets the memory-related modules learn from sequential data without leaking information, while batches are still processed efficiently in parallel.</p><p><strong>Performance: Why TGNs Are a Game Changer</strong></p><p>TGNs clearly outperform previous models across diverse datasets, including Wikipedia, Reddit, and Twitter.</p><ul><li><strong>Accuracy:</strong> In future edge prediction (predicting whether a link will form), TGNs achieved state-of-the-art results in both <strong>transductive</strong> (seen nodes) and <strong>inductive</strong> (unseen nodes) settings.</li><li><strong>Speed:</strong> Because the memory module lets TGNs achieve high performance with just a single graph attention layer, they are significantly faster than predecessors like TGAT.</li><li><strong>Neighbor Sampling:</strong> The research found that sampling the <strong>most recent neighbors</strong> — rather than uniform sampling — led to significantly higher precision, as recent interactions are often the most informative in a dynamic context.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Kvi3qCbcKUt_uVxcK6ohgQ.png" /><figcaption>Speed vs. Accuracy: The Temporal Graph Network (TGN) Advantage</figcaption></figure><p><strong>Conclusion</strong></p><p>Temporal Graph Networks represent a major step forward for geometric deep learning. By combining the “long-term” storage of memory modules with the “short-term” context of graph-based operators, TGNs offer a flexible, fast, and powerful way to model the ever-changing nature of the real world.</p><hr><p><a href="https://medium.com/codetodeploy/mastering-continuous-time-graphs-a-deep-dive-into-temporal-graph-networks-tgns-cdf5a680a432">Mastering Continuous-Time Graphs: A Deep Dive into Temporal Graph Networks (TGNs) — Research Paper…</a> was originally published in <a href="https://medium.com/codetodeploy">CodeToDeploy</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Ken Griffin on AI Revolution: “Productivity Gains to High-Skill Automation” | Stanford Leadership…]]></title>
            <link>https://medium.com/codetodeploy/ken-griffin-on-ai-revolution-productivity-gains-to-high-skill-automation-stanford-leadership-ba3654c31db4?source=rss-9f02fdbf5415------2</link>
            <guid isPermaLink="false">https://medium.com/p/ba3654c31db4</guid>
            <category><![CDATA[productivity]]></category>
            <category><![CDATA[technology]]></category>
            <category><![CDATA[programming]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[business]]></category>
            <dc:creator><![CDATA[Jiten Oswal]]></dc:creator>
            <pubDate>Tue, 19 May 2026 15:41:00 GMT</pubDate>
            <atom:updated>2026-06-17T21:38:11.336Z</atom:updated>
            <content:encoded><![CDATA[<h3>Ken Griffin on AI Revolution: “Productivity Gains to High-Skill Automation” | Stanford Leadership Forum</h3><p><em>Despite the disruption, Griffin remains staunchly optimistic, calling this the </em><strong><em>“best of times”</em></strong><em> to be an entrepreneur.</em></p><p>Ken Griffin, the founder and CEO of Citadel, argues that we have recently entered a <strong>“step change function”</strong> in AI productivity. While earlier iterations of AI provided respectable efficiency boosts — such as a 15% to 25% increase in software engineering output — the latest generation of <strong>agentic AI</strong> represents a fundamentally different level of power.</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FCsjy_A3Kj9s%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DCsjy_A3Kj9s&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FCsjy_A3Kj9s%2Fhqdefault.jpg&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/64eaaa9d13f181fbb0bb78a18f802fb4/href">https://medium.com/media/64eaaa9d13f181fbb0bb78a18f802fb4/href</a></iframe><p><strong>The Collapse of the Research Timeline</strong></p><p>Griffin highlights a startling shift within his own firm: research tasks that once took teams of <strong>finance master’s and PhD holders</strong> several weeks or months are now being completed by AI agents in <strong>hours or 
days</strong>. This isn’t just the automation of administrative tasks; it is the automation of <strong>extraordinarily high-skilled work</strong>. Griffin admits that witnessing this level of impact — where man-years of work are condensed into days — was initially “quite eye-opening” and even “fairly depressing” due to the dramatic societal implications.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*WmI-CPmgQaLeYuw3rzzFaw.png" /><figcaption>The AI Step Change: Redefining Research Productivity</figcaption></figure><p><strong>Leveling the Playing Field: Filling the “Competitive Moat”</strong></p><p>Traditionally, large incumbents like Citadel maintained their market dominance through massive “competitive moats,” such as proprietary data centers housing nine figures’ worth of hardware. Griffin posits that <strong>AI tools are “filling in” these moats</strong>.</p><p>Because of cloud computing, a small startup can now lease the same multi-billion-dollar hardware footprint that industry giants use. Combined with AI agents, the barriers to entry have collapsed, creating a “fantasy land for entrepreneurs” where the ability to challenge incumbents is higher than ever before.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*qVizi-n6xanqUYYXD44S9g.png" /><figcaption>The AI Bridge: How Technology Is Leveling the Playing Field</figcaption></figure><p><strong>Hyper-Personalization and the New Commerce</strong></p><p>The transformative power of AI agents extends beyond efficiency into the realm of <strong>consumer experience</strong>. Griffin envisions a world of “incredibly greater personalization”. 
He offers a futuristic example in which two people watch the same movie, but the AI generates <strong>different endings</strong> for each based on their individual preferences.</p><p>He cites the real-world success story of a pet insurance business that used AI to identify specific dog breeds from social media photos and deliver <strong>customized marketing messages</strong> tailored to the owner’s demographics. This company, leveraging modern AI, sold for <strong>a billion dollars</strong> in just a few weeks.</p><p><strong>The “Lifelong Learner” Mandate</strong></p><p>As AI agents drive both job destruction and job creation, Griffin argues that the most critical skill for the next generation is <strong>“learning how to learn”</strong>. Drawing on an analogy from historian Niall Ferguson, Griffin warns that if we aren’t careful, humans risk becoming the “horses” replaced by the “cars” of AI.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*nvHdZ0fbWIjQ4nLgGRfBrQ.png" /><figcaption>The Leadership Hierarchy for an AI-Empowered World</figcaption></figure><p>To avoid this, the workforce must be <strong>resilient and flexible</strong>. Griffin tells his new hires that their education is never finished; success is defined by whether one remains a <strong>lifelong learner</strong>. Furthermore, he stresses that the US must fix its <strong>K-12 education system</strong>, particularly in math and reading proficiency, to ensure the next generation can actually compete in an AI-empowered world.</p><p><strong>Conclusion: The Best of Times</strong></p><p>Despite the disruption, Griffin remains staunchly optimistic, calling this the <strong>“best of times”</strong> to be an entrepreneur. 
With the ability to reach billions of people “in the blink of an eye” via the internet and AI, he believes the next “Elon Musks and Jeff Bezoses” are currently in a position to transform the world faster than any previous generation.</p><hr><p><a href="https://medium.com/codetodeploy/ken-griffin-on-ai-revolution-productivity-gains-to-high-skill-automation-stanford-leadership-ba3654c31db4">Ken Griffin on AI Revolution: “Productivity Gains to High-Skill Automation” | Stanford Leadership…</a> was originally published in <a href="https://medium.com/codetodeploy">CodeToDeploy</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>