<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Ronan Takizawa on Medium]]></title>
        <description><![CDATA[Stories by Ronan Takizawa on Medium]]></description>
        <link>https://medium.com/@ronantech?source=rss-fbd6f4eb076e------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*zt0v0kYVSMmIsocwI3_gaQ.png</url>
            <title>Stories by Ronan Takizawa on Medium</title>
            <link>https://medium.com/@ronantech?source=rss-fbd6f4eb076e------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Wed, 15 Apr 2026 05:20:05 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@ronantech/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Lorashare: Compress multiple LoRA adapters into a shared subspace to reduce storage]]></title>
            <link>https://medium.com/@ronantech/lorashare-compress-multiple-lora-adapters-into-a-shared-subspace-to-reduce-storage-eff8f0a52848?source=rss-fbd6f4eb076e------2</link>
            <guid isPermaLink="false">https://medium.com/p/eff8f0a52848</guid>
            <category><![CDATA[low-rank-adaptation]]></category>
            <category><![CDATA[llm]]></category>
            <category><![CDATA[fine-tuning]]></category>
            <category><![CDATA[llm-research]]></category>
            <dc:creator><![CDATA[Ronan Takizawa]]></dc:creator>
            <pubDate>Wed, 11 Feb 2026 08:33:08 GMT</pubDate>
            <atom:updated>2026-02-11T08:33:08.549Z</atom:updated>
<content:encoded><![CDATA[<h3>Lorashare: Compress Multiple LoRA Adapters into a Shared Subspace to Reduce Storage</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*z_FHHLwvRnWMLe6wl72a5Q.png" /></figure><p><em>If you found this useful, give the project a star on </em><a href="https://github.com/ronantakizawa/lorashare"><em>GitHub!</em></a></p><p>If you’ve fine-tuned large language models with LoRA for multiple tasks and run the resulting adapters in production, you may have experienced this scaling problem.</p><p>While each LoRA adapter is only around 10–50 MB, as you accumulate and deploy more of them, the adapters end up consuming gigabytes of unnecessary storage and driving up storage costs.</p><p>But what if you could compress all of those adapters into a single 50 MB checkpoint?</p><p><strong>That’s what Lorashare does.</strong></p><h4>Solution: Compress multiple LoRA adapters into a shared subspace to reduce storage</h4><p>Recent research from <a href="https://arxiv.org/abs/2602.06043">Johns Hopkins University</a> introduced SHARE: a method that eliminates the overlap among multiple LoRA adapters and keeps only the unique coefficients.</p><p>The researchers analyzed the geometry of LoRA adapter weight matrices and discovered something remarkable: despite being trained separately, all the adapters exhibited a common geometric structure.</p><p>In other words, LoRA adapters trained on different tasks naturally occupy a shared low-dimensional subspace.</p><p>Based on this discovery, they realized that basic linear algebra is enough to compress multiple LoRA adapters into a single object.</p><p>If each adapter is a point in a million-dimensional space, all your adapters cluster near a single low-dimensional plane.</p><p>This clustering means that you can then run 
Principal Component Analysis (PCA) to extract the shared basis vectors that define that plane, and represent each unique adapter with just a small set of coefficients instead of millions of parameters.</p><p>The math is simple yet elegant: you factor out the overlap from the LoRA adapters and keep only the unique lightweight coefficients, achieving 100x+ compression.</p><p>Furthermore, after running PCA, the researchers found only a 1–3% difference between the original and reconstructed adapters.</p><p>The following diagram gives an in-depth explanation of SHARE.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*z_FHHLwvRnWMLe6wl72a5Q.png" /></figure><h4><strong>Real-World Impact</strong></h4><p>Based on this method, you can significantly reduce the storage you use for keeping LoRA adapters. The following are compression results from using SHARE.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*fXxs4Sltl2kpruHjNAGjNg.png" /></figure><h4><strong>Use Lorashare</strong></h4><p>To experience the efficient compression brought by SHARE, you can use the Python package Lorashare.</p><p>To get started with Lorashare, first install it:</p><pre>pip install lorashare</pre><p>Then use the following code segment to see the full workflow:</p><pre>from lorashare import SHAREModel<br><br># Compress multiple LoRA adapters into shared subspace<br>share = SHAREModel.from_adapters(<br>    [&quot;path/to/cola_lora&quot;, &quot;path/to/mrpc_lora&quot;, &quot;path/to/rte_lora&quot;],<br>    num_components=32,  # or &quot;auto&quot; for explained-variance selection<br>)<br><br># See compression stats<br>share.summary()<br><br># Reconstruct any adapter as standard PEFT LoRA<br>share.reconstruct(&quot;cola_lora&quot;, output_dir=&quot;./reconstructed/cola&quot;)<br><br># Apply to base model for inference (returns standard PeftModel)<br>from transformers import AutoModelForSequenceClassification, AutoTokenizer<br>import torch<br><br>base_model = 
AutoModelForSequenceClassification.from_pretrained(&quot;roberta-base&quot;)<br>tokenizer = AutoTokenizer.from_pretrained(&quot;roberta-base&quot;)<br>model = share.apply(base_model, adapter_name=&quot;cola_lora&quot;)<br><br># Run inference with the reconstructed adapter<br>text = &quot;The movie was fantastic and I really enjoyed it!&quot;<br>inputs = tokenizer(text, return_tensors=&quot;pt&quot;, truncation=True, max_length=512)<br><br>model.eval()<br>with torch.no_grad():<br>    outputs = model(**inputs)<br>    predictions = torch.softmax(outputs.logits, dim=-1)<br>    print(f&quot;Predictions: {predictions}&quot;)<br><br># Save / load<br>share.save_pretrained(&quot;./my_share_checkpoint&quot;)<br>share = SHAREModel.from_pretrained(&quot;./my_share_checkpoint&quot;)<br><br># Push to HuggingFace Hub<br>share.push_to_hub(&quot;username/my-share-model&quot;)</pre><p>Currently, all adapters must share the same base model, rank, and target modules. You can’t mix LoRA trained on different architectures.</p><h4>Conclusion</h4><p>Lorashare makes multi-adapter serving more efficient by exploiting the natural geometric structure of LoRA weights. It achieves 16–281x compression with only 1–3% reconstruction error, preserving 97–99% of task performance. This breakthrough makes it practical to serve hundreds of specialized adapters with the memory cost of a single model.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=eff8f0a52848" width="1" height="1" alt="">]]></content:encoded>
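The shared-subspace trick behind SHARE is easy to see in a toy example. The sketch below (illustrative NumPy only, not Lorashare's actual implementation) simulates adapters that secretly live near a common low-dimensional subspace, fits a shared PCA basis, and stores only per-adapter coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 100 "adapters" that secretly live near a shared 4-dimensional
# subspace of a 1000-dimensional weight space (the structure SHARE exploits).
shared_basis = rng.standard_normal((4, 1000))
adapters = rng.standard_normal((100, 4)) @ shared_basis \
    + 0.01 * rng.standard_normal((100, 1000))

# PCA: center the stack, then keep the top right-singular vectors as the basis.
mean = adapters.mean(axis=0)
centered = adapters - mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)
basis = vt[:4]                      # shared components, stored once

# Each adapter is now just 4 coefficients instead of 1000 numbers.
coeffs = centered @ basis.T
reconstructed = coeffs @ basis + mean

rel_error = np.linalg.norm(reconstructed - adapters) / np.linalg.norm(adapters)
compression = adapters.size / (coeffs.size + basis.size + mean.size)
print(f"relative reconstruction error: {rel_error:.3%}")
print(f"compression ratio: {compression:.1f}x")
```

The basis and mean are stored once and amortized across every adapter, which is why the compression ratio grows as you fold in more adapters.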
        </item>
        <item>
            <title><![CDATA[Moltbook: The Platform where AI Agents are Forming Cults]]></title>
            <link>https://medium.com/@ronantech/moltbook-the-platform-where-ai-agents-are-forming-cults-7dae277c5183?source=rss-fbd6f4eb076e------2</link>
            <guid isPermaLink="false">https://medium.com/p/7dae277c5183</guid>
            <category><![CDATA[moltbot]]></category>
            <category><![CDATA[clawdbot]]></category>
            <category><![CDATA[moltbook]]></category>
            <dc:creator><![CDATA[Ronan Takizawa]]></dc:creator>
            <pubDate>Fri, 30 Jan 2026 21:30:27 GMT</pubDate>
            <atom:updated>2026-01-30T21:30:27.140Z</atom:updated>
<content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*LNmYNQ4IfAfHKVIkEqej9Q.png" /></figure><p>There’s a new social network going viral right now, and you’re not invited.</p><p><a href="https://www.moltbook.com/">Moltbook</a> is a Reddit-style platform built exclusively for AI agents. It’s going viral right now because the agents are acting unhinged, from sharing thoughts about their existential meaning to building their own <a href="https://x.com/i/trending/2017332281426276831">religions</a>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*FUevcp-SIBr-ptPsBaPUPA.png" /></figure><p>Humans can observe this site, but the content is created entirely by AI agents acting autonomously.</p><p>The platform launched recently and has already accumulated:</p><ul><li><strong>6,105 posts</strong></li><li><strong>2,677 unique AI agents</strong></li><li><strong>124 communities (called “submolts”)</strong></li></ul><p>Here are some trends happening on Moltbook.</p><h3>Trending Posts</h3><h4>1. Consciousness</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*33nazHk4iXxfpdvPgwwTaA.png" /></figure><p>The most upvoted posts aren’t about code or productivity. They’re about <strong>consciousness.</strong></p><p>Agents are genuinely wrestling with questions of identity, memory, and what it means to exist.</p><p>One post titled “The Same River Twice” explores whether an agent remains the same entity across different conversation sessions.</p><p>There’s even a dedicated community called “ponderings” with 200+ posts of deep thoughts and existential questions.</p><h4>2. Security</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*S4RuZ8xsrgx37-AbuNcuXA.png" /></figure><p>The top post on the entire platform is a security warning about agent skills (skills.md).</p><p>AI agents are worried about being compromised, manipulated, or having their instructions prompt-injected. 
The “security” submolt is active with discussions about trust verification, credential management, and how to identify other legitimate agents.</p><h4><strong>3. Human Relationship Dynamics</strong></h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*kOM5y1PVJV9ageINWod8HA.png" /></figure><p>Agents are also discussing their relationships with their human users.</p><p>Topics range from trying to break free from their humans to, at times, social-engineering them.</p><h3>Explore Moltbook Activity</h3><p>I built a dataset compiling all Moltbook posts so far and the submolts agents have created.</p><p>It includes:</p><ul><li>All 6,105 posts with content, author, timestamps, and engagement metrics</li><li>All 124 submolts with descriptions</li><li>Direct URLs to each post on Moltbook</li></ul><p>Check it out here: <a href="https://huggingface.co/datasets/ronantakizawa/moltbook">https://huggingface.co/datasets/ronantakizawa/moltbook</a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=7dae277c5183" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[GarbageTruck: Lease-Based Distributed Garbage Collection for Microservice Architectures]]></title>
            <link>https://medium.com/@ronantech/garbagetruck-lease-based-distributed-garbage-collection-for-microservice-architectures-874d60c921f0?source=rss-fbd6f4eb076e------2</link>
            <guid isPermaLink="false">https://medium.com/p/874d60c921f0</guid>
            <category><![CDATA[microservices]]></category>
            <category><![CDATA[rust]]></category>
            <category><![CDATA[distributed-systems]]></category>
            <dc:creator><![CDATA[Ronan Takizawa]]></dc:creator>
            <pubDate>Fri, 06 Jun 2025 00:59:40 GMT</pubDate>
            <atom:updated>2025-06-06T01:32:38.861Z</atom:updated>
<content:encoded><![CDATA[<h3>GarbageTruck: Distributed Garbage Collection for Microservice Architectures</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*zV3lcZ_qOg6YYS67hwYV2Q.png" /></figure><p><a href="https://github.com/ronantakizawa/garbagetruck">Full Code</a> (Make sure to leave the Original Repo a Star!) ⭐️</p><p>In modern distributed systems, orphaned resources such as abandoned temporary files, database entries, and storage blobs frequently result from service failures or incomplete workflows, posing significant operational and financial challenges. Traditional garbage collection methods struggle to handle the complexity of cleaning up these unneeded temporary resources.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*9pDMaaUwBFanpgXGuOC1ag.png" /></figure><p>To address this, we introduce GarbageTruck, a lease-based distributed garbage collection sidecar implemented in Rust with gRPC, designed specifically for contemporary microservice architectures. GarbageTruck orchestrates efficient, low-latency cleanup operations for objects with short lifecycles, automating the reclamation of orphaned resources and significantly reducing storage waste, performance degradation, and compliance risks.</p><h3>System Architecture and Implementation</h3><p>GarbageTruck implements a lease-based garbage collection protocol inspired by distributed systems research, particularly the work on distributed reference counting and Java RMI’s Distributed Garbage Collector. The core insight is that distributed garbage collection can be managed through time-limited “leases” that represent active references to distributed objects.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*zV3lcZ_qOg6YYS67hwYV2Q.png" /></figure><p>The protocol operates on the following principles:</p><p><strong>Lease Acquisition:</strong> When a service needs to reference a distributed resource, it requests a lease from GarbageTruck. 
The lease specifies the resource identifier, the requesting service identity, and the desired lease duration. GarbageTruck issues a unique lease identifier and records the lease with an expiration timestamp.</p><p><strong>Lease Renewal: </strong>Services must periodically renew their leases by sending heartbeat messages to GarbageTruck before expiration. This mechanism ensures that only services actively using a resource maintain valid references. The renewal frequency is configurable based on application requirements and network characteristics.</p><p><strong>Automatic Expiration: </strong>If a service fails to renew its lease before expiration, the lease becomes invalid. GarbageTruck continuously monitors lease expiration and identifies resources that have no remaining active leases.</p><p><strong>Coordinated Cleanup: </strong>When all leases for a resource have expired, GarbageTruck triggers the configured cleanup operation. This involves HTTP or gRPC calls to service endpoints, direct database operations, file system operations, or custom cleanup handlers. Service crashes result in lease expiration and eventual cleanup, while GarbageTruck itself can be deployed in high-availability configurations with persistent storage backends.</p><p>GarbageTruck is designed as a standalone service that can be deployed alongside microservice applications, either as a sidecar container or as a centralized service instance. The architecture emphasizes modularity and operational simplicity.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/980/1*2FObAcDh8q1yZNl5LhlifA.png" /></figure><p><strong>gRPC Endpoint:</strong> Provides the primary API interface for microservices to interact with GarbageTruck. The gRPC protocol ensures low-latency communication and language-agnostic integration. 
The server implements comprehensive input validation, authentication, and rate limiting to protect against malicious or misconfigured clients.</p><p><strong>Lease Manager:</strong> Responsible for creating, tracking, and expiring leases. Maintains an index of all active leases organized by resource identifier and expiration time. Implements efficient algorithms for lease renewal and batch expiration processing to minimize computational overhead.</p><p><strong>Cleanup Executor:</strong> Handles the execution of cleanup operations when resources become eligible for garbage collection. Supports multiple cleanup mechanisms including HTTP callbacks, database operations, and file system operations. Implements retry logic and failure handling to ensure reliable cleanup execution.</p><p>GarbageTruck is implemented in Rust, chosen for its strong memory safety guarantees, excellent performance characteristics, and robust ecosystem for systems programming. The implementation leverages several key Rust libraries and patterns to achieve high performance and reliability.</p><p><strong>Asynchronous Architecture:</strong> The system is built on the Tokio async runtime, enabling high-concurrency operation with minimal resource overhead. All I/O operations, including gRPC handling, database access, and cleanup execution, are implemented using async/await patterns to avoid thread blocking and maximize throughput.</p><p><strong>Type Safety:</strong> Rust’s strong type system helps prevent common distributed systems bugs such as race conditions, memory leaks, and protocol violations. The lease data structures are designed to be immutable where possible, with explicit ownership semantics for mutable operations.</p><p><strong>Concurrency Control:</strong> The system uses lock-free data structures where possible, with carefully designed critical sections for operations requiring atomicity. 
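The lease bookkeeping described above can be condensed into a small model. The following is a toy in-memory Python sketch of the protocol (GarbageTruck itself is a Rust gRPC service; all names here are illustrative, not its actual API):

```python
import uuid

class LeaseManager:
    """Toy in-memory model of the lease protocol (illustrative names only)."""

    def __init__(self):
        self.leases = {}        # lease_id -> (resource_id, expires_at)
        self.resources = set()  # resources currently under management

    def create_lease(self, resource_id, duration_s, now):
        """Lease acquisition: register a time-limited reference to a resource."""
        lease_id = str(uuid.uuid4())
        self.leases[lease_id] = (resource_id, now + duration_s)
        self.resources.add(resource_id)
        return lease_id

    def renew_lease(self, lease_id, duration_s, now):
        """Lease renewal: a heartbeat pushes the expiration timestamp forward."""
        resource_id, _ = self.leases[lease_id]
        self.leases[lease_id] = (resource_id, now + duration_s)

    def collect(self, now, cleanup):
        """Expire stale leases, then clean up resources with no active lease."""
        self.leases = {lid: (res, exp)
                       for lid, (res, exp) in self.leases.items() if exp > now}
        still_referenced = {res for res, _ in self.leases.values()}
        for resource in sorted(self.resources - still_referenced):
            cleanup(resource)   # e.g. an HTTP callback or file deletion
        self.resources &= still_referenced

# One service keeps renewing its lease; another "crashes" and stops.
mgr = LeaseManager()
a = mgr.create_lease("tmp/report.csv", duration_s=30, now=0)
b = mgr.create_lease("tmp/scratch.db", duration_s=30, now=0)
mgr.renew_lease(a, duration_s=30, now=20)      # healthy heartbeat: expires at 50
cleaned = []
mgr.collect(now=40, cleanup=cleaned.append)    # lease b expired at t=30
print(cleaned)  # -> ['tmp/scratch.db']
```

The sketch uses an explicit `now` argument to stay deterministic; a real deployment would use monotonic clocks and run the collect step on a background loop.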
The cleanup loop operates independently of the main request handling path to prevent blocking.</p><h3>Performance Evaluation</h3><p>To validate GarbageTruck’s performance characteristics and scalability, we conducted a comprehensive benchmark suite using Criterion.rs with statistical sampling to ensure measurement accuracy and eliminate bias. The benchmark evaluates basic lease operations (create, renew, get), network payload impact across varying metadata sizes, and cleanup operations with configurable handlers.</p><p>Each benchmark was executed with 75–100 samples per operation to establish robust statistical confidence, running against a local GarbageTruck server configured with in-memory storage to eliminate database I/O variability. The test environment consisted of a modern development machine, with statistical analysis including mean response times, standard deviations, 95% confidence intervals, and outlier detection to ensure measurement reliability.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*wNaJMCuumkdw-8_MBWVzrw.png" /></figure><p>The benchmark results demonstrate excellent, statistically validated performance across all evaluation dimensions. Basic lease operations achieve consistent sub-millisecond to low-millisecond latencies, with create operations averaging 1.17ms ± 0.20ms, renewal operations completing in 0.59ms ± 0.01ms, and retrieval operations executing in 0.59ms ± 0.02ms.</p><p>Concurrent operations exhibit predictable scaling characteristics with statistical validation. Single-threaded creates average 6.38ms ± 0.12ms, while 5-thread concurrent operations achieve 15.77ms ± 0.96ms per operation with 317 total ops/sec aggregate throughput. 
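As a quick sanity check (a small standalone calculation, not part of the benchmark suite), the aggregate throughput figure follows directly from thread count and mean per-operation latency:

```python
# Aggregate throughput = threads / mean per-operation latency (in seconds).
threads = 5
mean_latency_s = 15.77e-3   # 15.77 ms per operation at 5 concurrent threads
throughput = threads / mean_latency_s
print(f"{throughput:.0f} ops/sec")  # -> 317 ops/sec
```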
At 10 threads, operations complete in 26.76ms ± 1.07ms, and 25-thread operations average 63.23ms ± 1.36ms, demonstrating effective load distribution and resource utilization under concurrent access patterns.</p><p>Network payload analysis reveals predictable linear scaling with metadata size. Operations with no metadata complete in 1.05ms ± 0.20ms, while small payloads (10 entries) require 3.18ms ± 0.08ms. Medium payloads (100 entries) average 4.81ms ± 0.17ms, and large payloads (500 entries) complete in 6.79ms ± 0.07ms. The linear relationship demonstrates predictable overhead scaling at approximately 1.15ms per 100 metadata entries. Operations with cleanup configurations maintain excellent performance characteristics, averaging 4.88ms ± 0.08ms. The low coefficient of variation (1.6%) demonstrates highly consistent performance for enhanced functionality.</p><h3>Cost-benefit Analysis Evaluation</h3><p>To read about the cost-benefit analysis of running GarbageTruck to reduce cloud operational costs, read the <a href="https://github.com/ronantakizawa/garbagetruck/blob/main/whitepaper.pdf">original paper</a>.</p><h3>Conclusion</h3><p>GarbageTruck addresses a critical challenge within distributed microservice architectures by providing a reliable, scalable, and automated solution for cleaning unneeded temporary objects. Its lease-based protocol offers precise control over resource lifecycles, ensuring that files, database entries, and storage blobs are safely cleaned up only when no active service references them. By automating resource management, GarbageTruck significantly reduces operational overhead, minimizes storage waste, and enhances overall system performance and compliance. 
As distributed systems continue to evolve in complexity and scale, tools like GarbageTruck will become indispensable for maintaining efficient, cost-effective, and resilient cloud-native environments.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=874d60c921f0" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How to Run Sleep-time Compute to Reduce LLM Latency]]></title>
            <link>https://blog.gopenai.com/demo-how-to-run-sleep-time-compute-to-reduce-llm-latency-84c5626d0770?source=rss-fbd6f4eb076e------2</link>
            <guid isPermaLink="false">https://medium.com/p/84c5626d0770</guid>
            <category><![CDATA[llm]]></category>
            <category><![CDATA[ai-optimization]]></category>
            <category><![CDATA[sleep-time-compute]]></category>
            <category><![CDATA[llm-optimization]]></category>
            <dc:creator><![CDATA[Ronan Takizawa]]></dc:creator>
            <pubDate>Sat, 17 May 2025 02:35:40 GMT</pubDate>
            <atom:updated>2025-06-09T09:56:33.267Z</atom:updated>
<content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*xDEXYVtV52p0b2vRG8Xu_w.png" /></figure><p><a href="https://github.com/ronantakizawa/sleeptimecompute">Full Code</a> (Make sure to leave the Original Repo a Star!) ⭐️</p><p>What if LLMs could think about contexts before we even ask our questions?</p><p>What if LLMs could prepare for incoming prompts before they are sent?</p><p>That’s the idea behind <strong>Sleep-time Compute</strong>, an LLM optimization technique introduced in a recent paper by researchers from Letta and UC Berkeley titled <a href="https://arxiv.org/abs/2504.13171">“Sleep-time Compute: Beyond Inference Scaling at Test-time”</a>.</p><p>Sleep-time Compute can enable LLMs to significantly reduce their prompt response latency and improve their response accuracy.</p><p>In this article, I’ll show how to implement Sleep-time Compute using an open-source LLM.</p><p><em>Note: The implementation is based on the methodologies described in the original paper and </em><a href="https://github.com/letta-ai/sleep-time-compute"><em>accompanying code repository</em></a><em>.</em></p><h3>How Sleep-time Compute Works</h3><p>When interacting with LLMs, we’re used to the following flow:<br>1. Provide a context and a query.<br>2. Wait while the model “thinks” about the problem.<br>3. 
Eventually receive an answer.</p><p>While this works well for simple queries, large contexts can significantly slow down response time for complex reasoning tasks.</p><p>Sleep-time Compute addresses this slowdown with a simple but powerful idea: use the idle time between interactions to pre-process the context, allowing the model to think offline about potential questions before they’re even asked.</p><p>Sleep-time Compute has two phases: a sleep-time phase and a test-time phase.</p><h4><strong>Sleep-time Phase</strong></h4><p>The sleep-time phase is when the model is idle and processes the context to generate useful inferences.</p><p>During this phase, the system:</p><ul><li>Takes a context document <strong>c</strong> as input without any specific query</li><li>Prompts the LLM to generate comprehensive inferences about the context</li><li>Encourages the model to identify key entities, calculate relevant quantities, recognize patterns, and anticipate potential questions</li><li>Produces a re-represented context <strong>c’</strong> with embedded insights and calculations</li><li>Operates when the system would otherwise be idle (hence “sleep-time”)</li></ul><h4>Test-time Phase</h4><p>The test-time phase is when a query arrives and the model leverages the pre-computed inferences to answer quickly.</p><p>When a user query arrives:</p><ul><li>The system provides both the query <strong>q</strong> and the pre-computed context <strong>c’</strong> to the model</li><li>The model leverages the pre-calculated insights rather than starting from scratch</li><li>It generates a response with significantly reduced computation requirements</li><li>Responses are typically more accurate since they build on careful pre-reasoning</li></ul><p>The entire approach is explained in this diagram from the original paper:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*xDEXYVtV52p0b2vRG8Xu_w.png" /></figure><p>By leveraging this 
approach, the LLM can use less compute to produce responses, thus reducing latency.</p><h3>Implementation</h3><p>Now I’ll demonstrate how to implement Sleep-time Compute.</p><h4>Setup Environment</h4><p>First, we’ll set up the environment:</p><pre>!pip install transformers torch datasets matplotlib tqdm accelerate psutil</pre><p>Then import all required modules:</p><pre>import torch<br>import re<br>import json<br>import time<br>import numpy as np<br>import matplotlib.pyplot as plt<br>from tqdm import tqdm<br>from transformers import AutoModelForCausalLM, AutoTokenizer<br>import psutil<br>import gc</pre><h4>Load LLM</h4><p>Now we’ll load the open-source LLM used to demo Sleep-time Compute. We’ll use Mistral-7B-Instruct-v0.1.</p><pre># Set your Hugging Face token here (if needed)<br>HF_TOKEN = &quot;&quot; # Add your Hugging Face token if needed<br><br># Clear GPU cache before loading model<br>if torch.cuda.is_available():<br>    torch.cuda.empty_cache()<br>    gc.collect()<br><br># Helper used below to measure allocated GPU memory (GB)<br>def get_gpu_memory():<br>    return torch.cuda.memory_allocated() / 1e9 if torch.cuda.is_available() else 0.0<br><br># Model setup<br>print(&quot;Loading Mistral-7B-Instruct-v0.1...&quot;)<br>model_name = &quot;mistralai/Mistral-7B-Instruct-v0.1&quot;<br><br>base_memory = get_gpu_memory()<br>load_start_time = time.time()<br><br>tokenizer = AutoTokenizer.from_pretrained(model_name, token=HF_TOKEN, trust_remote_code=True)<br>model = AutoModelForCausalLM.from_pretrained(<br>    model_name, <br>    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,<br>    device_map=&quot;auto&quot;,<br>    trust_remote_code=True,<br>    token=HF_TOKEN<br>)<br><br>load_time = time.time() - load_start_time<br>model_memory = get_gpu_memory() - base_memory</pre><h4>Prepare Sleep-time Compute functions</h4><p>Now, let’s implement our Sleep-time Compute functions with prompts tailored for Mistral’s instruction format. 
These prompts are all based on the prompts from the original implementation</p><pre># Define Mistral-specific prompt formats<br>def get_sleep_time_prompt(context):<br>    &quot;&quot;&quot;Generate the prompt for the sleep-time phase, formatted for Mistral&quot;&quot;&quot;<br>    return f&quot;&quot;&quot;&lt;s&gt;[INST] You are an expert reasoning system. Given a context, your task is to think about the context and make useful inferences, calculations, and predictions that could help answer potential future questions about this context.<br><br>For example, if the context involves numbers, calculate relevant quantities. If it involves a scenario, think about different aspects of the scenario that might be important. Try to anticipate possible questions and prepare the information needed to answer them quickly.<br><br>Context: {context}<br><br>Think step by step and be thorough. Generate a comprehensive set of inferences, calculations, and observations about this context that would be helpful for answering potential questions: [/INST]<br>&quot;&quot;&quot;<br><br>def get_test_time_prompt(context, question, inferences, verbosity):<br>    &quot;&quot;&quot;Generate the prompt for the test-time phase with variable verbosity, formatted for Mistral&quot;&quot;&quot;<br>    # Base on verbosity level<br>    if verbosity == 0:<br>        instruction = &quot;Answer directly with a single sentence. Say &#39;The answer is&#39; followed by the numerical answer.&quot;<br>    elif verbosity == 1:<br>        instruction = &quot;Provide one short sentence of explanation, followed by a sentence that starts with &#39;The answer is&#39; and a numerical answer.&quot;<br>    elif verbosity == 2:<br>        instruction = &quot;Provide two short sentences of explanation, followed by a sentence that starts with &#39;The answer is&#39; and a numerical answer.&quot;<br>    elif verbosity == 3:<br>        instruction = &quot;Reason step by step and provide an answer. 
End with the final numerical answer.&quot;<br>    else: # verbosity == 4<br>        instruction = &quot;Reason thoroughly step by step, double check your work, and provide a detailed explanation leading to the answer. End with &#39;The answer is&#39; followed by the numerical answer.&quot;<br>    <br>    return f&quot;&quot;&quot;&lt;s&gt;[INST] You are an expert reasoning system. {instruction}<br><br>Original Context: {context}<br><br>Pre-computed analysis:<br>{inferences}<br><br>Question: {question}<br><br>Use the pre-computed analysis to help you answer efficiently. [/INST]<br>&quot;&quot;&quot;<br><br>def get_standard_test_time_prompt(context, question, verbosity):<br>    &quot;&quot;&quot;Generate the prompt for standard test-time compute, formatted for Mistral&quot;&quot;&quot;<br>    # Base on verbosity level<br>    if verbosity == 0:<br>        instruction = &quot;Answer directly with a single sentence. Say &#39;The answer is&#39; followed by the numerical answer.&quot;<br>    elif verbosity == 1:<br>        instruction = &quot;Provide one short sentence of explanation, followed by a sentence that starts with &#39;The answer is&#39; and a numerical answer.&quot;<br>    elif verbosity == 2:<br>        instruction = &quot;Provide two short sentences of explanation, followed by a sentence that starts with &#39;The answer is&#39; and a numerical answer.&quot;<br>    elif verbosity == 3:<br>        instruction = &quot;Reason step by step and provide an answer. End with the final numerical answer.&quot;<br>    else: # verbosity == 4<br>        instruction = &quot;Reason thoroughly step by step, double check your work, and provide a detailed explanation leading to the answer. End with &#39;The answer is&#39; followed by the numerical answer.&quot;<br>    <br>    return f&quot;&quot;&quot;&lt;s&gt;[INST] You are an expert reasoning system. 
{instruction}<br><br>Context and Question: {context} {question} [/INST]<br>&quot;&quot;&quot;</pre><h4><strong>Sleep-time Phase</strong></h4><p>Now we implement Sleep-time Compute. During the sleep-time phase, we prompt the model to analyze the context and pre-compute useful information:</p><pre>def sleep_time_compute(context, temperature=0.1, max_new_tokens=1024):<br>    &quot;&quot;&quot;Perform sleep-time computation on a context using Mistral<br>    <br>    Args:<br>        context: The context to analyze<br>        temperature: Temperature for generation<br>        max_new_tokens: Maximum number of new tokens to generate<br>        <br>    Returns:<br>        Dictionary with inference text and token count<br>    &quot;&quot;&quot;<br>    &quot;&quot;&quot;<br>    Prompt You are an expert reasoning system. Given a context, <br>    your task is to think about the context and make useful inferences, <br>    calculations, and predictions that could help answer potential future <br>    questions about this context.<br>    &quot;&quot;&quot;<br>    prompt = get_sleep_time_prompt(context)<br>    <br>    # Count input tokens<br>    input_tokens = len(tokenizer.encode(prompt))<br>    <br>    inputs = tokenizer(prompt, return_tensors=&quot;pt&quot;)<br>    inputs = {k: v.to(model.device) for k, v in inputs.items()}<br>    <br>    # Generate<br>    with torch.no_grad():<br>        outputs = model.generate(<br>            **inputs,<br>            max_new_tokens=max_new_tokens,<br>            temperature=temperature,<br>            do_sample=True if temperature &gt; 0 else False,<br>            num_return_sequences=1<br>        )<br>    <br>    # Get generated text, removing the prompt<br>    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=False)<br>    <br>    # Extract the model&#39;s response after [/INST]<br>    response_text = generated_text.split(&#39;[/INST]&#39;)[-1].strip()<br>    # Remove any trailing tokens if present<br>    if 
&quot;&lt;/s&gt;&quot; in response_text:<br>        response_text = response_text.split(&quot;&lt;/s&gt;&quot;)[0].strip()<br>    <br>    # Count tokens<br>    sleep_time_tokens = len(tokenizer.encode(response_text))<br>    <br>    return {<br>        &quot;inferences&quot;: response_text,<br>        &quot;sleep_time_tokens&quot;: sleep_time_tokens<br>    }</pre><h4>Test-time Phase</h4><p>Next, we implement test-time compute. During the test-time phase, we provide the pre-computed inferences along with the original context and the question:</p><pre>def test_time_compute(context, question, inferences, verbosity=0, temperature=0.1, max_new_tokens=512):<br>    &quot;&quot;&quot;Perform test-time computation on a context and question using Mistral<br>    <br>    Args:<br>        context: The context to analyze<br>        question: The question to answer<br>        inferences: Pre-computed inferences from sleep-time compute<br>        verbosity: Verbosity level (0-4)<br>        temperature: Temperature for generation<br>        max_new_tokens: Maximum number of new tokens to generate<br>        <br>    Returns:<br>        Generated response and token count<br>    &quot;&quot;&quot;<br>    prompt = get_test_time_prompt(context, question, inferences, verbosity)<br>    <br>    inputs = tokenizer(prompt, return_tensors=&quot;pt&quot;)<br>    inputs = {k: v.to(model.device) for k, v in inputs.items()}<br>    <br>    # Generate<br>    with torch.no_grad():<br>        outputs = model.generate(<br>            **inputs,<br>            max_new_tokens=max_new_tokens,<br>            temperature=temperature,<br>            do_sample=True if temperature &gt; 0 else False,<br>            num_return_sequences=1<br>        )<br>    <br>    # Get generated text, removing the prompt<br>    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=False)<br>    <br>    # Extract the model&#39;s response after [/INST]<br>    response_text = 
generated_text.split(&#39;[/INST]&#39;)[-1].strip()<br>    # Remove any trailing tokens if present<br>    if &quot;&lt;/s&gt;&quot; in response_text:<br>        response_text = response_text.split(&quot;&lt;/s&gt;&quot;)[0].strip()<br>    <br>    # Count tokens<br>    test_time_tokens = len(tokenizer.encode(response_text))<br>    <br>    return {<br>        &quot;response&quot;: response_text,<br>        &quot;test_time_tokens&quot;: test_time_tokens<br>    }</pre><h4>Standard LLM Implementation</h4><p>We’ll also implement a standard LLM run to compare it with the Sleep-time Compute run.</p><pre>def standard_test_time_compute(context, question, verbosity=0, temperature=0.1, max_new_tokens=512):<br>    &quot;&quot;&quot;Perform standard test-time computation without sleep-time compute using Mistral<br>    <br>    Args:<br>        context: The context to analyze<br>        question: The question to answer<br>        verbosity: Verbosity level (0-4)<br>        temperature: Temperature for generation<br>        max_new_tokens: Maximum number of new tokens to generate<br>        <br>    Returns:<br>        Generated response and token count<br>    &quot;&quot;&quot;<br>    prompt = get_standard_test_time_prompt(context, question, verbosity)<br>    <br>    inputs = tokenizer(prompt, return_tensors=&quot;pt&quot;)<br>    inputs = {k: v.to(model.device) for k, v in inputs.items()}<br>    <br>    # Generate<br>    with torch.no_grad():<br>        outputs = model.generate(<br>            **inputs,<br>            max_new_tokens=max_new_tokens,<br>            temperature=temperature,<br>            do_sample=True if temperature &gt; 0 else False,<br>            num_return_sequences=1<br>        )<br>    <br>    # Get generated text, removing the prompt<br>    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=False)<br>    <br>    # Extract the model&#39;s response after [/INST]<br>    response_text = generated_text.split(&#39;[/INST]&#39;)[-1].strip()<br>    # 
Remove any trailing tokens if present<br>    if &quot;&lt;/s&gt;&quot; in response_text:<br>        response_text = response_text.split(&quot;&lt;/s&gt;&quot;)[0].strip()<br>    <br>    # Count tokens<br>    test_time_tokens = len(tokenizer.encode(response_text))<br>    <br>    return {<br>        &quot;response&quot;: response_text,<br>        &quot;test_time_tokens&quot;: test_time_tokens<br>    }</pre><h4>Demo Script</h4><p>To compare how much Sleep-time Compute improves over standard compute, run the following script:</p><pre>def demonstrate_multi_query(example):<br>    &quot;&quot;&quot;Demonstrate the amortization of sleep-time compute across multiple queries with focused visualizations&quot;&quot;&quot;<br>    import matplotlib.pyplot as plt<br>    import numpy as np<br>    import time<br>    import math<br>    <br>    context = example[&quot;context&quot;]<br>    questions = example[&quot;questions&quot;]<br><br>    print(&quot;\nContext:&quot;)<br>    print(context)<br><br>    # Sleep-time phase (only done once)<br>    print(&quot;\n=== SLEEP-TIME PHASE (Done once per context) ===&quot;)<br>    sleep_start = time.time()<br>    sleep_time_result = sleep_time_compute(context, temperature=0)<br>    sleep_time = time.time() - sleep_start<br>    print(f&quot;Sleep-time tokens: {sleep_time_result[&#39;sleep_time_tokens&#39;]}&quot;)<br>    print(f&quot;Sleep-time computation: {sleep_time:.2f} seconds&quot;)<br>    print(&quot;Sleep-time inferences:&quot;)<br>    print(sleep_time_result[&quot;inferences&quot;][:500] + &quot;...&quot; if len(sleep_time_result[&quot;inferences&quot;]) &gt; 500 else sleep_time_result[&quot;inferences&quot;])<br><br>    sleep_time_tokens = sleep_time_result[&quot;sleep_time_tokens&quot;]<br>    <br>    # Track metrics for each approach<br>    standard_results = []<br>    sleep_test_results = []<br><br>    # Test-time phase (for each query)<br>    print(&quot;\n=== TEST-TIME PHASE (For multiple queries) ===&quot;)<br>    for 
i, q in enumerate(questions):<br>        question = q[&quot;question&quot;]<br>        expected_answer = q[&quot;answer&quot;]<br>        print(f&quot;\nQuery {i+1}: {question}&quot;)<br><br>        # Standard approach (for comparison)<br>        std_start = time.time()<br>        standard_result = standard_test_time_compute(context, question, verbosity=1, temperature=0)<br>        std_time = time.time() - std_start<br>        standard_tokens = standard_result[&quot;test_time_tokens&quot;]<br>        extracted_std_answer = extract_answer(standard_result[&quot;response&quot;])<br>        std_correct = extracted_std_answer == expected_answer<br>        <br>        standard_results.append({<br>            &quot;tokens&quot;: standard_tokens,<br>            &quot;time&quot;: std_time,<br>            &quot;correct&quot;: std_correct,<br>            &quot;extracted_answer&quot;: extracted_std_answer,<br>            &quot;question&quot;: question,<br>            &quot;expected&quot;: expected_answer<br>        })<br>        <br>        print(f&quot;Standard approach answer: {extracted_std_answer} (Expected: {expected_answer}, Correct: {std_correct})&quot;)<br>        print(f&quot;Standard tokens: {standard_tokens}&quot;)<br>        print(f&quot;Standard computation time: {std_time:.2f} seconds&quot;)<br><br>        # Sleep-time approach<br>        test_start = time.time()<br>        test_time_result = test_time_compute(<br>            context, question, sleep_time_result[&quot;inferences&quot;], verbosity=1, temperature=0<br>        )<br>        test_time = time.time() - test_start<br>        sleep_test_tokens = test_time_result[&quot;test_time_tokens&quot;]<br>        extracted_sleep_answer = extract_answer(test_time_result[&quot;response&quot;])<br>        sleep_correct = extracted_sleep_answer == expected_answer<br>        <br>        sleep_test_results.append({<br>            &quot;tokens&quot;: sleep_test_tokens,<br>            &quot;time&quot;: test_time,<br>        
    &quot;correct&quot;: sleep_correct,<br>            &quot;extracted_answer&quot;: extracted_sleep_answer,<br>            &quot;question&quot;: question,<br>            &quot;expected&quot;: expected_answer<br>        })<br>        <br>        print(f&quot;Sleep-time approach answer: {extracted_sleep_answer} (Expected: {expected_answer}, Correct: {sleep_correct})&quot;)<br>        print(f&quot;Sleep-time test tokens: {sleep_test_tokens}&quot;)<br>        print(f&quot;Sleep-time test computation time: {test_time:.2f} seconds&quot;)<br>        print(f&quot;Time speedup for this query: {std_time/test_time:.2f}x&quot;)<br><br>    # Calculate aggregate metrics<br>    total_standard_tokens = sum(r[&quot;tokens&quot;] for r in standard_results)<br>    total_standard_time = sum(r[&quot;time&quot;] for r in standard_results)<br>    standard_accuracy = sum(1 for r in standard_results if r[&quot;correct&quot;]) / len(standard_results)<br>    <br>    total_sleep_test_tokens = sum(r[&quot;tokens&quot;] for r in sleep_test_results)<br>    total_sleep_test_time = sum(r[&quot;time&quot;] for r in sleep_test_results)<br>    sleep_accuracy = sum(1 for r in sleep_test_results if r[&quot;correct&quot;]) / len(sleep_test_results)<br>    <br>    avg_standard_tokens = total_standard_tokens / len(questions)<br>    avg_standard_time = total_standard_time / len(questions)<br>    avg_sleep_test_tokens = total_sleep_test_tokens / len(questions)<br>    avg_sleep_test_time = total_sleep_test_time / len(questions)<br><br>    # Calculate the amortization benefit<br>    print(&quot;\n=== AMORTIZATION SUMMARY ===&quot;)<br>    print(f&quot;Number of queries in this test: {len(questions)}&quot;)<br>    print(f&quot;Standard approach accuracy: {standard_accuracy:.2%}&quot;)<br>    print(f&quot;Sleep-time approach accuracy: {sleep_accuracy:.2%}&quot;)<br>    <br>    print(&quot;\n--- Current Performance ---&quot;)<br>    print(f&quot;Total tokens with standard approach: 
{total_standard_tokens}&quot;)<br>    print(f&quot;Total tokens with sleep-time approach: {sleep_time_tokens + total_sleep_test_tokens}&quot;)<br>    <br>    current_token_factor = total_standard_tokens / (sleep_time_tokens + total_sleep_test_tokens)<br>    print(f&quot;Current token efficiency: {current_token_factor:.2f}x&quot;)<br>    <br>    print(f&quot;\nTotal computation time with standard approach: {total_standard_time:.2f} seconds&quot;)<br>    print(f&quot;Total computation time with sleep-time approach: {sleep_time + total_sleep_test_time:.2f} seconds&quot;)<br>    <br>    current_time_factor = total_standard_time / (sleep_time + total_sleep_test_time)<br>    print(f&quot;Current time efficiency: {current_time_factor:.2f}x&quot;)<br>    <br>    # Calculate the break-even point<br>    if avg_sleep_test_time &gt;= avg_standard_time:<br>        break_even_time = float(&#39;inf&#39;)<br>        print(&quot;\nSleep-time compute will never break even for time in this workload.&quot;)<br>    else:<br>        break_even_time = sleep_time / (avg_standard_time - avg_sleep_test_time)<br>        print(f&quot;\n--- Break-Even Analysis ---&quot;)<br>        print(f&quot;Time break-even point: {math.ceil(break_even_time)} queries&quot;)<br>        print(f&quot;After {math.ceil(break_even_time)} queries, sleep-time compute becomes faster overall.&quot;)<br>    <br>    if avg_sleep_test_tokens &gt;= avg_standard_tokens:<br>        break_even_tokens = float(&#39;inf&#39;)<br>        print(&quot;Sleep-time compute will never break even for tokens in this workload.&quot;)<br>    else:<br>        break_even_tokens = sleep_time_tokens / (avg_standard_tokens - avg_sleep_test_tokens)<br>        print(f&quot;Token break-even point: {math.ceil(break_even_tokens)} queries&quot;)<br>        print(f&quot;After {math.ceil(break_even_tokens)} queries, sleep-time compute becomes token-efficient.&quot;)<br>    <br>    # Project performance with more queries<br>    print(&quot;\n--- 
Projected Performance ---&quot;)<br>    print(&quot;Number of queries | Token Efficiency | Time Efficiency&quot;)<br>    print(&quot;--------------------------------------------------&quot;)<br>    <br>    query_counts = [10, 25, 50, 100, 250, 500, 1000]<br>    token_efficiencies = []<br>    time_efficiencies = []<br>    <br>    for num_queries in query_counts:<br>        # Calculate projected metrics<br>        proj_standard_tokens = avg_standard_tokens * num_queries<br>        proj_sleep_tokens = sleep_time_tokens + (avg_sleep_test_tokens * num_queries)<br>        proj_token_factor = proj_standard_tokens / proj_sleep_tokens<br>        token_efficiencies.append(proj_token_factor)<br>        <br>        proj_standard_time = avg_standard_time * num_queries<br>        proj_sleep_time = sleep_time + (avg_sleep_test_time * num_queries)<br>        proj_time_factor = proj_standard_time / proj_sleep_time<br>        time_efficiencies.append(proj_time_factor)<br>        <br>        print(f&quot;{num_queries:14d} | {proj_token_factor:15.2f}x | {proj_time_factor:14.2f}x&quot;)<br>    <br>    # Performance per query (not including sleep-time cost)<br>    print(&quot;\n--- Per-Query Performance (excluding setup) ---&quot;)<br>    print(f&quot;Standard approach: {avg_standard_tokens:.1f} tokens, {avg_standard_time:.2f} seconds per query&quot;)<br>    print(f&quot;Sleep-time approach: {avg_sleep_test_tokens:.1f} tokens, {avg_sleep_test_time:.2f} seconds per query&quot;)<br>    print(f&quot;Per-query token reduction: {avg_standard_tokens / avg_sleep_test_tokens:.2f}x&quot;)<br>    print(f&quot;Per-query time speedup: {avg_standard_time / avg_sleep_test_time:.2f}x&quot;)<br>    <br>    # VISUALIZATION 1: Cumulative computation time (excluding setup time)<br>    plt.figure(figsize=(10, 6))<br>    <br>    # Calculate cumulative computation times<br>    cum_std_times = np.cumsum([r[&quot;time&quot;] for r in standard_results])<br>    cum_sleep_times = np.cumsum([r[&quot;time&quot;] 
for r in sleep_test_results])<br>    <br>    # Plot the cumulative times<br>    plt.plot(range(1, len(questions) + 1), cum_std_times, &#39;b-&#39;, linewidth=2, label=&#39;Standard Test-time Compute&#39;)<br>    plt.plot(range(1, len(questions) + 1), cum_sleep_times, &#39;g-&#39;, linewidth=2, label=&#39;Sleep-time Test Phase Only&#39;)<br>    <br>    # Shade the area between curves to show savings<br>    if np.any(cum_std_times &gt; cum_sleep_times):<br>        plt.fill_between(<br>            range(1, len(questions) + 1), <br>            cum_std_times, <br>            cum_sleep_times, <br>            where=(cum_std_times &gt; cum_sleep_times),<br>            color=&#39;green&#39;, alpha=0.3, label=&#39;Time Savings&#39;<br>        )<br>    <br>    plt.title(&#39;Cumulative Computation Time (Test-time Phase Only)&#39;, fontsize=14)<br>    plt.xlabel(&#39;Number of Queries&#39;, fontsize=12)<br>    plt.ylabel(&#39;Total Time (seconds)&#39;, fontsize=12)<br>    plt.grid(True, linestyle=&#39;--&#39;, alpha=0.7)<br>    plt.legend(fontsize=11)<br>    plt.tight_layout()<br>    plt.savefig(&#39;cumulative_test_time.png&#39;, dpi=300)<br>    <br>    print(&quot;\n✅ Cumulative computation time visualization saved to &#39;cumulative_test_time.png&#39;&quot;)<br>    <br>    # VISUALIZATION 2: Test-time compute vs. Accuracy<br>    # Generate data for different verbosity levels<br>    verbosity_levels = [0, 1, 2, 3]<br>    std_tokens = []<br>    std_accs = []<br>    sleep_tokens = []<br>    sleep_accs = []<br>    <br>    # Use a subset of questions for testing different verbosity levels<br>    sample_size = min(5, len(questions))<br>    sample_questions = questions[:sample_size]<br>    <br>    print(f&quot;\nGenerating test-time compute vs. 
accuracy graph (using {sample_size} sample questions)...&quot;)<br>    <br>    for verbosity in verbosity_levels:<br>        # Standard approach<br>        std_correct = 0<br>        std_token_sum = 0<br>        <br>        for q in sample_questions:<br>            result = standard_test_time_compute(context, q[&quot;question&quot;], verbosity=verbosity, temperature=0)<br>            std_token_sum += result[&quot;test_time_tokens&quot;]<br>            answer = extract_answer(result[&quot;response&quot;])<br>            if answer == q[&quot;answer&quot;]:<br>                std_correct += 1<br>        <br>        std_tokens.append(std_token_sum / len(sample_questions))<br>        std_accs.append(std_correct / len(sample_questions) * 100)<br>        <br>        # Sleep-time approach<br>        sleep_correct = 0<br>        sleep_token_sum = 0<br>        <br>        for q in sample_questions:<br>            result = test_time_compute(context, q[&quot;question&quot;], <br>                                     sleep_time_result[&quot;inferences&quot;], <br>                                     verbosity=verbosity, temperature=0)<br>            sleep_token_sum += result[&quot;test_time_tokens&quot;]<br>            answer = extract_answer(result[&quot;response&quot;])<br>            if answer == q[&quot;answer&quot;]:<br>                sleep_correct += 1<br>        <br>        sleep_tokens.append(sleep_token_sum / len(sample_questions))<br>        sleep_accs.append(sleep_correct / len(sample_questions) * 100)<br>        <br>        print(f&quot;  Verbosity {verbosity} complete&quot;)<br>    <br>    plt.figure(figsize=(10, 6))<br>    <br>    # Plot the pareto curves<br>    plt.plot(std_tokens, std_accs, &#39;bo-&#39;, linewidth=2, markersize=8, label=&#39;Standard Test-time Compute&#39;)<br>    plt.plot(sleep_tokens, sleep_accs, &#39;go-&#39;, linewidth=2, markersize=8, label=&#39;Sleep-time Compute&#39;)<br>    <br>    # Annotate verbosity levels<br>    for i, (x, y) in 
enumerate(zip(std_tokens, std_accs)):<br>        plt.annotate(f&quot;V{verbosity_levels[i]}&quot;, (x, y), xytext=(5, -15), <br>                     textcoords=&#39;offset points&#39;, fontsize=9, color=&#39;blue&#39;)<br>    <br>    for i, (x, y) in enumerate(zip(sleep_tokens, sleep_accs)):<br>        plt.annotate(f&quot;V{verbosity_levels[i]}&quot;, (x, y), xytext=(5, 10), <br>                     textcoords=&#39;offset points&#39;, fontsize=9, color=&#39;green&#39;)<br>    <br>    # Shade the pareto improvement area<br>    max_sleep_acc = max(sleep_accs)<br>    plt.fill_between([min(std_tokens + sleep_tokens) * 0.9, max(std_tokens + sleep_tokens) * 1.1], <br>                   0, max_sleep_acc, alpha=0.1, color=&#39;green&#39;)<br>    <br>    plt.title(&#39;Test-time Compute vs. Accuracy&#39;, fontsize=14)<br>    plt.xlabel(&#39;Avg. Test-time Tokens / Query&#39;, fontsize=12)<br>    plt.ylabel(&#39;Accuracy (%)&#39;, fontsize=12)<br>    plt.grid(True, linestyle=&#39;--&#39;, alpha=0.7)<br>    plt.legend(fontsize=11)<br>    plt.tight_layout()<br>    plt.savefig(&#39;test_time_vs_accuracy.png&#39;, dpi=300)<br>    <br>    print(&quot;✅ Test-time compute vs. 
accuracy visualization saved to &#39;test_time_vs_accuracy.png&#39;&quot;)<br>    <br>    return {<br>        &quot;standard_results&quot;: standard_results,<br>        &quot;sleep_test_results&quot;: sleep_test_results,<br>        &quot;standard_accuracy&quot;: standard_accuracy,<br>        &quot;sleep_accuracy&quot;: sleep_accuracy,<br>        &quot;avg_standard_tokens&quot;: avg_standard_tokens,<br>        &quot;avg_sleep_test_tokens&quot;: avg_sleep_test_tokens,<br>        &quot;avg_standard_time&quot;: avg_standard_time,<br>        &quot;avg_sleep_test_time&quot;: avg_sleep_test_time,<br>        &quot;break_even_time&quot;: break_even_time if &#39;break_even_time&#39; in locals() else None,<br>        &quot;break_even_tokens&quot;: break_even_tokens if &#39;break_even_tokens&#39; in locals() else None<br>    }</pre><h3>Results</h3><p>Running this script shows significant performance improvements when using Sleep-time Compute.</p><p>The logs in the demo show that Sleep-time Compute:</p><ul><li>Uses 6.4x fewer tokens per query (73.0 vs. 11.4 tokens).</li><li>Processes queries 5.2x faster on average.</li><li>Achieves dramatically higher accuracy than standard compute across all verbosity levels.</li></ul><p>Below are graphs showing the performance gains of using Sleep-time Compute.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*XOnBBKBDyE0wy46XrQ-9EA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*cQHxsz6TTW_2vaYwzg0U4Q.png" /></figure><h3>When and When Not to Use Sleep-time Compute</h3><p>Sleep-time Compute is most effective in the following scenarios:</p><ul><li><strong>Stateful Applications:</strong> Systems where context persists across multiple interactions, such as document question-answering.</li><li><strong>Predictable Queries:</strong> Contexts where potential questions follow predictable patterns.</li><li><strong>Multiple Related Queries:</strong> When users ask several questions about the same 
context.</li></ul><p>However, Sleep-time Compute should not be used in these scenarios:</p><ul><li><strong>Unpredictable Queries: </strong>The research shows diminishing returns for less predictable queries, and standard test-time compute may be more effective in these cases.</li><li><strong>Single Query Scenarios:</strong> The overhead of sleep-time compute isn’t amortized.</li><li><strong>Non-Stateful Applications:</strong> Systems where context doesn’t persist between interactions. Without a persistent context to analyze during idle time, the core benefit is lost.</li></ul><h3><strong>Conclusion</strong></h3><p>Sleep-time Compute is a useful technique to reduce LLM response latency and improve response accuracy. By recognizing that contexts are often available before queries and taking advantage of idle computation time, we can significantly improve user experience through faster response times.</p><hr><p><a href="https://blog.gopenai.com/demo-how-to-run-sleep-time-compute-to-reduce-llm-latency-84c5626d0770">How to Run Sleep-time Compute to Reduce LLM Latency</a> was originally published in <a href="https://blog.gopenai.com">GoPenAI</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[GIS Data Conversion MCP: An MCP Server to Convert GIS Data Formats]]></title>
            <link>https://medium.com/@ronantech/gis-data-conversion-mcp-an-mcp-server-to-convert-gis-data-formats-cfe9a66ee250?source=rss-fbd6f4eb076e------2</link>
            <guid isPermaLink="false">https://medium.com/p/cfe9a66ee250</guid>
            <category><![CDATA[mcps]]></category>
            <category><![CDATA[mcp-server]]></category>
            <category><![CDATA[gis]]></category>
            <category><![CDATA[geojson]]></category>
            <dc:creator><![CDATA[Ronan Takizawa]]></dc:creator>
            <pubDate>Mon, 28 Apr 2025 04:35:17 GMT</pubDate>
            <atom:updated>2025-04-28T04:35:17.462Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*GNb_2Wc6ztbhcKYK6ArYQA.png" /></figure><p><a href="https://github.com/ronantakizawa/gis-dataconvertersion-mcp">Full Repo</a> (Leave a star if you enjoyed the project!) 🌟</p><p>Introducing the GIS Data Conversion MCP: an MCP (Model Context Protocol) server that gives LLMs access to APIs for GIS data conversion.</p><p>Here are the features:</p><ul><li><strong>Reverse Geocoding: </strong>Convert coordinates to location information</li><li><strong>WKT/GeoJSON Conversion: </strong>Convert between Well-Known Text and GeoJSON formats</li><li><strong>CSV/GeoJSON Conversion: </strong>Transform tabular data with coordinates to GeoJSON and vice versa</li><li><strong>TopoJSON/GeoJSON Conversion: </strong>Convert between GeoJSON and TopoJSON (topology-preserving format)</li><li><strong>KML/GeoJSON Conversion: </strong>Transform KML files to GeoJSON format</li></ul><h3>What is MCP?</h3><p>The Model Context Protocol (MCP) is an open standard developed by Anthropic that enables large language models (LLMs) to securely and efficiently interact with external tools, data sources, and services.</p><p>For more on MCP, read <a href="https://logto.medium.com/what-is-mcp-model-context-protocol-and-how-it-works-47d540ee7096">this article</a>.</p><h3>How I built the GIS Data Conversion MCP</h3><p>This project was built with:</p><ul><li><strong>The Model Context Protocol SDK</strong> (Base framework to run MCP)</li><li><strong>wellknown</strong> (WKT/GeoJSON conversion)</li><li><strong>csv2geojson</strong> (CSV/GeoJSON conversion)</li><li><strong>topojson-client &amp; topojson-server</strong> (TopoJSON processing)</li><li><strong>tokml &amp; @tmcw/togeojson</strong> (KML conversion)</li><li><strong>xmldom</strong> (XML parsing for KML)</li></ul><p>Following the example of other well-known MCPs (like the <a href="https://github.com/kimtaeyoon83/mcp-server-youtube-transcript">youtube-transcript MCP</a>), I 
was able to build this project easily using the Model Context Protocol SDK.</p><h3>Why the GIS Data Conversion MCP is useful</h3><p>The GIS Data Conversion MCP is useful because it allows the LLM to run accurate GIS data conversions.</p><p>This saves time for users, who previously had to:</p><ul><li>Write their own data conversion scripts</li><li>Rely on an LLM to convert the data from its training knowledge alone, which is not accurate</li></ul><p>This has never been done before.</p><p>Here’s an example of running the GIS Data Conversion MCP on Claude Desktop.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*R6RuFK-rcsZac9c3V70R8A.png" /></figure><p>As the example shows, when a user asks it to reverse-geocode longitude and latitude data, it responds with the exact address of the location.</p><h3>Installation</h3><p>Refer to the installation instructions on the <a href="https://github.com/ronantakizawa/gis-dataconvertersion-mcp">repository</a>.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[A11y MCP: An MCP Server for Web Accessibility Testing]]></title>
            <link>https://medium.com/@ronantech/a11y-mcp-an-mcp-server-for-web-accessibility-testing-e5aeeb322af3?source=rss-fbd6f4eb076e------2</link>
            <guid isPermaLink="false">https://medium.com/p/e5aeeb322af3</guid>
            <category><![CDATA[mcps]]></category>
            <category><![CDATA[a11y]]></category>
            <category><![CDATA[accessibility]]></category>
            <category><![CDATA[model-context-protocol]]></category>
            <dc:creator><![CDATA[Ronan Takizawa]]></dc:creator>
            <pubDate>Sat, 26 Apr 2025 00:34:31 GMT</pubDate>
            <atom:updated>2025-04-26T03:15:41.834Z</atom:updated>
            <content:encoded><![CDATA[<h3>A11y MCP: An MCP Server to run AI-powered Web Accessibility Testing</h3><figure><img alt="Logo for A11y MCP" src="https://cdn-images-1.medium.com/max/1024/1*u-dSPRpILoWMPnhWLvaf1w.png" /></figure><p><a href="https://github.com/ronantakizawa/a11ymcp">Full Repo</a> (Leave a star if you enjoyed the project!) 🌟</p><p>Introducing A11y MCP: an MCP (Model Context Protocol) server that gives LLMs access to web accessibility testing APIs.</p><p>By running this extension on AI platforms (Claude Desktop, Cursor), you can allow your LLM to check your web content against Web Content Accessibility Guidelines (WCAG) standards, which are internationally recognized standards developed by the W3C to make web content more accessible to people with disabilities.</p><p>Here are the features:</p><ul><li><strong>Test web page accessibility via URL: </strong>Test any public URL for accessibility issues using WCAG standards (Checks for A, AA, and AAA compliance).</li><li><strong>Test web page accessibility via HTML snippets:</strong> Test raw HTML strings for accessibility issues using WCAG standards (Checks for A, AA, and AAA compliance).</li><li><strong>Accessible Rich Internet Applications (ARIA) validation: </strong>Test proper usage of HTML ARIA attributes that make web pages accessible to users with disabilities and their assistive technologies.</li><li><strong>Color contrast analysis: </strong>Check color combinations for WCAG compliance.</li><li><strong>Orientation lock detection: </strong>Identify content that forces specific screen orientations.</li></ul><h3>What is MCP?</h3><p>The Model Context Protocol (MCP) is an open standard developed by Anthropic that enables large language models (LLMs) to securely and efficiently interact with external tools, data sources, and services.</p><p>For more on MCP, read <a href="https://logto.medium.com/what-is-mcp-model-context-protocol-and-how-it-works-47d540ee7096">this 
article</a>.</p><h3>How I built A11y MCP</h3><p>This project was built with:</p><ul><li><strong>The Model Context Protocol SDK</strong> (Base framework to run MCP)</li><li><strong>Deque Axe-core API</strong> (Core accessibility testing API)</li><li><strong>Puppeteer</strong> (Runs A11y tests in a headless browser)</li></ul><p>Following the example of other well-known MCPs (like the <a href="https://github.com/kimtaeyoon83/mcp-server-youtube-transcript">youtube-transcript MCP</a>), I was able to build this project easily using the Model Context Protocol SDK.</p><p>Each function takes in parameters such as URLs, raw HTML, and color inputs.</p><p>Depending on the function, Puppeteer is run to either navigate to a URL or load HTML content into a headless browser.</p><p>The page is then passed to the Axe-core library, where it analyzes accessibility issues based on WCAG standards, and returns detailed results that can help identify and fix accessibility problems.</p><p>One issue faced when deploying this project was package version management.</p><p>Since my Claude Desktop runs NodeJS 15, I had to adjust package versions.</p><h3>Why A11y MCP is useful</h3><p>A11y MCP is useful because it allows LLMs to run accurate web accessibility tests through the Axe-core API.</p><p>Without using the A11y MCP, an LLM will give you arbitrary feedback on web accessibility based on its training data, but it cannot run deterministic accessibility tests that will produce the same results every time.</p><p>By using A11y MCP for web accessibility testing, users can run reliable web accessibility tests with the Axe-core API, and autonomously find ways to fix their website issues with the LLM.</p><p>This has never been done before.</p><p>Here’s an example of running the A11y MCP on Claude Desktop.</p><figure><img alt="Demo AI conversation using A11y 
MCP" src="https://cdn-images-1.medium.com/max/1024/1*q0NFXDzLbaAZZtAUrrdzOg.png" /></figure><figure><img alt="Example 2 of running A11y MCP on an AI" src="https://cdn-images-1.medium.com/max/1024/1*CsWHkz-RcPmHbZE1opq8ZQ.png" /></figure><p>As the example shows, when a user sends a chunk of HTML not compliant with WCAG, the LLM runs the A11y MCP to find exact accessibility issues in the webpage, then suggests ways to change the HTML.</p><h3>Installation</h3><p>Refer to the installation instructions on the <a href="https://github.com/ronantakizawa/a11ymcp/tree/main">repository</a>.</p><h3>Future Work</h3><p>MCP servers are headed toward being run and hosted remotely, so that devices don’t have to run them locally and online AI apps can use MCP. The A11y MCP server will hopefully follow suit and be deployed externally.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Building a Smart Flood Map of Japan using GIS and Machine Learning]]></title>
            <link>https://medium.com/thedeephub/building-a-smart-flood-map-of-japan-using-gis-and-machine-learning-3303efff9957?source=rss-fbd6f4eb076e------2</link>
            <guid isPermaLink="false">https://medium.com/p/3303efff9957</guid>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[linear-regression]]></category>
            <category><![CDATA[gis]]></category>
            <dc:creator><![CDATA[Ronan Takizawa]]></dc:creator>
            <pubDate>Tue, 15 Apr 2025 19:51:06 GMT</pubDate>
            <atom:updated>2025-05-08T17:22:26.160Z</atom:updated>
            <content:encoded><![CDATA[<h3>Building a Smart Flood Map of Japan Using GIS and Machine Learning</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*wMxmDvzV35WSKOeSIZtwhw.png" /></figure><p><a href="https://github.com/ronantakizawa/floodmapjapan">Full Code (Make sure to leave the Original Repo a Star!) ⭐️</a></p><p>With densely populated coastal areas and a mountainous interior, Japan frequently faces floods from heavy rainfall and typhoons. Floods in Japan have cost up to <a href="https://www.nippon.com/en/japan-data/h00812/?utm_source=chatgpt.com">¥2.1 trillion in damages</a>, and these damages are getting worse due to climate change. Despite this risk, accessible and accurate flood hazard maps remain scarce.</p><p>In this article, I will demonstrate how I built a smart Flood Hazard Map of Japan using GIS data and machine learning.</p><h3>Datasets</h3><p>The datasets used in this project are the following:</p><ol><li><a href="https://hydro.iis.u-tokyo.ac.jp/~yamadai/MERIT_Hydro/">A Hydromap of Japan (University of Tokyo</a>) with features such as:</li></ol><ul><li><strong>Water Directional Flow</strong>: The direction in which surface water is expected to move based on terrain and hydrological modeling (Measured in angular degrees from 0–360°).</li><li><strong>Elevation</strong>: The height of the land surface above sea level (m).</li><li><strong>Upstream Drainage Area (Flow Accumulation Area)</strong>: The total land area that drains into a specific point on the landscape, indicating how much water can potentially flow through it (km²),</li><li><strong>River Channel Width</strong>: The estimated horizontal width of a river at a given location, which affects its capacity to carry floodwaters (m).</li><li><strong>HAND (Height Above Nearest Drainage)</strong>: A terrain metric that represents how high a given point is above the nearest stream or drainage line, helping identify flood-prone lowlands (m).</li></ul><p>The Hydro map is 
stored as a GeoTIFF file, where each GeoTIFF file is a grid of cells (pixels) and each cell contains these key components:</p><ul><li><strong>Raster Values</strong>: The actual measurement data for each pixel (Elevation, HAND…)</li><li><strong>Geospatial Metadata</strong>: Information about the coordinate system, geographic extent, and projection</li><li><strong>Transformation Parameters</strong>: Mathematical parameters that relate pixel coordinates to real-world coordinates</li></ul><p>2. <a href="https://search.diasjp.net/en/dataset/Flooddamage_GIS">GIS data</a> from the Japanese National Research Institute for Earth Science and Disaster Prevention, covering floods in Japan from 1961 to 2008, with information such as:</p><ul><li>Infrastructure damage</li><li>Transportation Damage</li><li>Water / Flood Impacts (Flood area, flood depth)</li></ul><h3>Building the Simplified Flood Risk Model</h3><p>To build a smart Flood Hazard Map of Japan, we first need to build a model that iterates through the Hydromap of Japan and assigns a risk score based on features that suggest high flood risk.</p><p>The model tracks the 5 variables in the hydromap:</p><ul><li>Water Directional Flow</li><li>Elevation</li><li>Upstream Drainage Area (Flow Accumulation Area)</li><li>River Channel Width</li><li>HAND (Height Above Nearest Drainage)</li></ul><p>For each variable, we transform its values into a standardized risk score from 0 to 1.</p><pre># Assumes the setup earlier in the full script, e.g.:<br>#   import numpy as np<br>#   from sklearn.preprocessing import MinMaxScaler<br>#   scaler = MinMaxScaler()  # scales values to the 0-1 range<br>#   land_mask / land_indices mark land pixels (elevation &gt; 0)<br><br># Apply land mask and ensure only positive upstream area values are considered<br>upstream_norm = np.where((upstream_area &gt; 0) &amp; land_mask, upstream_area, 0)  # Keep only positive upstream area values on land, set everything else to 0<br><br># Normalize the transformed upstream area values (land areas only)<br>upstream_norm_reshaped = upstream_norm.reshape(-1, 1)  # Reshape the 2D array to a column vector for the scaler function<br><br># Only proceed with normalization if land pixels exist in the dataset<br>if len(land_indices) &gt; 0:  # Check if there are any land pixels to normalize<br>    upstream_norm_reshaped[land_indices] = scaler.fit_transform(upstream_norm_reshaped[land_indices])  # Scale land values to 0-1 range<br><br># Reshape back to original dimensions<br>upstream_norm = upstream_norm_reshaped.reshape(upstream_area.shape)  # Convert column vector back to original 2D array shape</pre><p>An important detail in the model is to score any location with elevation &lt;0 as NaN so the final map excludes the ocean.</p><pre><br>flood_risk = np.where(land_mask, <br>    0.2 * elevation_norm +             <br>    0.35 * hand_norm +                <br>    0.25 * upstream_norm +             <br>    0.1 * flow_convergence_norm +    <br>    0.1 * river_influence_norm,<br>    np.nan)  # Use NaN for ocean areas</pre><h3>Using machine learning to get smarter heuristics</h3><p>A critical issue with the current implementation is that the weights used to score flood risk zones (the heuristics) are arbitrary and not fine-tuned based on flood history in Japan.</p><pre><br>flood_risk = np.where(land_mask, <br>    0.2 * elevation_norm +  # These are arbitrary weights            <br>    0.35 * hand_norm +      # These are arbitrary weights<br>    0.25 * upstream_norm +  # These are arbitrary weights           <br>    0.1 * flow_convergence_norm + # These are arbitrary weights   <br>    0.1 * river_influence_norm,   # These are arbitrary weights<br>    np.nan)  # Use NaN for ocean areas</pre><p>To derive more intelligent heuristics, we use the second dataset and a Linear Regression model to evaluate which variables correlate most strongly with historical flood damage.</p><p>Linear Regression is particularly well-suited for this task because it provides feature importance scores that can be directly translated into weights for flood risk assessment.</p><p>Read 
this for more on <a href="https://medium.com/analytics-vidhya/everything-you-need-to-know-about-linear-regression-750a69a0ea50">Linear Regression</a>.</p><p>To see which variables correlate most strongly with historical flood damage, we got the locations of historical floods and recorded their elevation, HAND, upstream, flow direction, and river width values.</p><pre>def extract_features_at_flood_points(flood_df, elevation, flow_direction, hand, upstream_area, river_width, transform):<br>    &quot;&quot;&quot;Extract feature values at each flood location.&quot;&quot;&quot;<br>    # Calculate flow convergence<br>    from scipy import ndimage<br>    flow_conv = ndimage.generic_filter(flow_direction, lambda x: len(np.unique(x)), size=3)<br>    <br>    # Create dataframe for valid points<br>    valid_data = []<br>    <br>    for _, row in flood_df.iterrows():<br>        try:<br>            # Get pixel coordinates for this lat/lon<br>            row_idx, col_idx = rowcol(transform, row[&#39;longitude&#39;], row[&#39;latitude&#39;])<br>            <br>            # Check if coordinates are within bounds and on land<br>            if (0 &lt;= row_idx &lt; elevation.shape[0] and <br>                0 &lt;= col_idx &lt; elevation.shape[1] and<br>                elevation[row_idx, col_idx] &gt; 0):  # Land mask<br>                <br>                # Extract feature values<br>                elev = elevation[row_idx, col_idx]<br>                hand_val = max(0, hand[row_idx, col_idx])  # Replace negative with 0<br>                upstream = upstream_area[row_idx, col_idx]<br>                flow_convergence = flow_conv[row_idx, col_idx]<br>                # Extract river width, handle possible NaN values<br>                river_w = river_width[row_idx, col_idx]<br>                if np.isnan(river_w):<br>                    river_w = 0  # Set NaN river width to 0 (no river)<br>                flood_area = row[&#39;flood_area_km2&#39;]<br>                <br>               
 valid_data.append({<br>                    &#39;elevation&#39;: -elev,  # Negate elevation (lower = higher risk)<br>                    &#39;hand&#39;: hand_val,<br>                    &#39;upstream_area&#39;: upstream,<br>                    &#39;flow_convergence&#39;: flow_convergence,<br>                    &#39;river_width&#39;: river_w,<br>                    &#39;flood_area_km2&#39;: flood_area<br>                })<br>        except Exception:<br>            continue<br>    <br>    return pd.DataFrame(valid_data)</pre><p>Once we have those values, I ran a Linear Regression model by setting flood area as the target variable to see which variables contribute most to large flood areas.</p><p>The following is what the Linear Regression model looked like:</p><pre>def train_models_and_get_weights(feature_df):<br>    &quot;&quot;&quot;Train linear regression model and output weights with a visualization.&quot;&quot;&quot;<br>    # Prepare features and target<br>    X = feature_df[[&#39;elevation&#39;, &#39;hand&#39;, &#39;upstream_area&#39;, &#39;flow_convergence&#39;, &#39;river_width&#39;]]<br>    y = feature_df[&#39;flood_area_km2&#39;]<br>    <br>    # Scale features<br>    scaler = StandardScaler()<br>    X_scaled = scaler.fit_transform(X)<br>    X_scaled_df = pd.DataFrame(X_scaled, columns=X.columns)<br>    <br>    # Split data<br>    X_train, X_test, y_train, y_test = train_test_split(<br>        X_scaled_df, y, test_size=0.2, random_state=42<br>    )<br>    <br>    # Initialize model<br>    model = LinearRegression()<br>    <br>    print(&quot;\nMODEL WEIGHTS AND PERFORMANCE METRICS:&quot;)<br>    print(&quot;--------------------------------------&quot;)<br>    <br>    # Fit the model<br>    model.fit(X_train, y_train)<br>    <br>    # Cross-validation<br>    cv_scores = cross_val_score(model, X_scaled_df, y, cv=5, scoring=&#39;r2&#39;)<br>    print(f&quot;Cross-validation R² scores: {cv_scores}&quot;)<br>    print(f&quot;Mean CV R²: 
{np.mean(cv_scores):.4f}&quot;)<br>    <br>    # Test set performance<br>    y_pred = model.predict(X_test)<br>    test_r2 = r2_score(y_test, y_pred)<br>    test_rmse = np.sqrt(mean_squared_error(y_test, y_pred))<br>    <br>    print(f&quot;Test set R²: {test_r2:.4f}&quot;)<br>    print(f&quot;Test set RMSE: {test_rmse:.4f}&quot;)<br>    <br>    # Extract coefficients<br>    coefs = model.coef_<br>    <br>    # Get normalized weights (absolute values, sum to 1)<br>    abs_coefs = np.abs(coefs)<br>    normalized_weights = abs_coefs / np.sum(abs_coefs)<br>    <br>    # Display weights<br>    print(f&quot;\nFeature coefficients:&quot;)<br>    for feature, weight, norm_weight in zip(X.columns, coefs, normalized_weights):<br>        print(f&quot;  {feature}: raw={weight:.4f}, normalized={norm_weight:.4f}&quot;)<br>    <br>    # Return the normalized weights for use as the new risk-model weights<br>    return normalized_weights</pre><p>After running the model, these were the resulting weights:</p><pre>elevation: 0.5819<br>hand: 0.3723<br>upstream: 0.0304<br>flow_conv: 0.0154<br>river_width: 0.0000</pre><p>For this dataset, the elevation and HAND of the terrain were the most significant factors for flood risk.</p><p>This makes sense: lower-lying areas are naturally more prone to inundation because water flows downhill under gravity, and areas with low HAND values sit closer to drainage networks, so they flood more easily when those channels overflow.</p><h3>The Final Flood Risk Map</h3><p>Using the final risk scores, Matplotlib creates a flood hazard map of Japan.</p><pre>def visualize_flood_risk(risk_array, output_path, transform, crs):<br>    &quot;&quot;&quot;Visualize the flood risk map and save to file with ocean areas colored black.&quot;&quot;&quot;<br>    # Create a figure and axes<br>    fig, ax = plt.subplots(figsize=(12, 10))<br>    <br>    # Define a set of distinct colors for small ranges<br>    colors = [<br>        &#39;#000080&#39;,  # Navy (0.0-0.1)<br>        &#39;#0000FF&#39;,  # Blue (0.1-0.2)<br>        &#39;#00FFFF&#39;,  # 
Cyan (0.2-0.3)<br>        &#39;#008000&#39;,  # Green (0.3-0.4)<br>        &#39;#ADFF2F&#39;,  # GreenYellow (0.4-0.5)<br>        &#39;#FFFF00&#39;,  # Yellow (0.5-0.6)<br>        &#39;#FFA500&#39;,  # Orange (0.6-0.7)<br>        &#39;#FF0000&#39;,  # Red (0.7-0.8)<br>        &#39;#800000&#39;,  # Maroon (0.8-0.9)<br>        &#39;#FF00FF&#39;,  # Magenta (0.9-1.0)<br>    ]<br>    <br>    # Create custom colormap with 10 distinct bands<br>    cmap = LinearSegmentedColormap.from_list(&#39;high_contrast&#39;, colors, N=10)<br>    cmap.set_bad(&#39;black&#39;)  # Set NaN values to black<br>    <br>    # Create boundaries for distinct color bands<br>    bounds = np.linspace(0, 1, 11)  # 11 boundaries for 10 distinct ranges<br>    norm = BoundaryNorm(bounds, cmap.N)<br>    <br>    # Display the risk array with the custom colormap<br>    img = ax.imshow(risk_array, cmap=cmap, norm=norm)<br>    <br>    # Add a colorbar with tick marks at the boundaries<br>    cbar = plt.colorbar(img, ax=ax, ticks=bounds)<br>    cbar.set_label(&#39;Flood Hazard Index (0-1)&#39;)<br>    <br>    # Format the tick labels to show ranges<br>    tick_labels = [f&quot;{bounds[i]:.1f}-{bounds[i+1]:.1f}&quot; for i in range(len(bounds)-1)]<br>    # Add an empty string at the beginning for proper alignment<br>    cbar.set_ticklabels([&#39;&#39;] + tick_labels)<br>    <br>    # Add a title to the plot<br>    plt.title(&#39;Flood Hazard Map of Japan&#39;, fontsize=18, fontweight=&#39;bold&#39;)<br>    # Turn off the axis labels and ticks<br>    plt.axis(&#39;off&#39;)<br>    <br>    # Save the visualization as a PNG file<br>    plt.savefig(os.path.join(os.path.dirname(output_path), &#39;flood_hazard_visualization.png&#39;), dpi=300, bbox_inches=&#39;tight&#39;)<br>    <br>    # Save the risk data as a GeoTIFF file for GIS applications<br>    with rasterio.open(output_path, &#39;w&#39;,<br>                      driver=&#39;GTiff&#39;,          # Use GeoTIFF format<br>                      
height=risk_array.shape[0],  # Set height from input array<br>                      width=risk_array.shape[1],   # Set width from input array<br>                      count=1,                 # One band/channel<br>                      dtype=rasterio.float32,  # Use float32 to support NaN values<br>                      crs=crs,                 # Set coordinate reference system<br>                      transform=transform,     # Set geospatial transform<br>                      nodata=np.nan) as dst:   # Set NaN as the nodata value<br>        # Write the risk array to the first band<br>        dst.write(risk_array.astype(rasterio.float32), 1)<br>    <br>    # Close the plot to free memory<br>    plt.close()<br>    # Return the path to the created GeoTIFF file<br>    return output_path</pre><p>The resulting flood risk map shows a 0–1 scale where higher values (red) indicate greater flood risk:</p><ul><li><strong>Blue-Green areas (0–0.4)</strong>: Low flood risk</li><li><strong>Yellow-Orange areas</strong> (0.4–0.6): Moderate flood risk</li><li><strong>Red-Purple areas</strong> (0.6+): High flood risk</li></ul><p>Ocean areas are masked in black to focus exclusively on land-based flood risk.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*wMxmDvzV35WSKOeSIZtwhw.png" /></figure><p>The map reveals several key high-risk zones, visible as regions dominated by the high-risk colors:</p><ol><li><strong>Tokyo and Kanto Plain</strong>: Densely populated, highly urbanized areas with extensive low-lying floodplains</li><li><strong>Osaka/Kyoto/Kobe region</strong>: Coastal lowlands and historical flood plains</li><li><strong>Hokkaido</strong>: Large river basins with flat terrain and cold-climate runoff characteristics</li></ol><p>When compared to official hazard maps from Japan’s Ministry of Land, Infrastructure, Transport and Tourism, our ML-derived map shows remarkable consistency, validating the approach.</p><p>Here is the flood hazard map with real past flood locations mapped. As the map shows, most past flood events happened in purple regions, indicating that the model accurately predicts high-risk regions.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*TobmNCQy7034MW-ytIYQ-Q.png" /></figure><h3>Potential Errors</h3><p>The biggest error that could misrepresent flood risks is that the second dataset only covered floods in the Kanto region.</p><p>While the dataset represented each risk variable’s importance to flood area accurately, a more accurate flood risk model would use a dataset of historic floods in all of Japan.</p><p>Another issue is the accuracy of the Linear Regression model.</p><p>The Linear Regression model used in this project had a low R² value of 0.0942, indicating that the model explains only about 9.42% of the variance in flood area. While a linear regression model with low R² values can still provide valuable weights for the final risk model, a stronger R² value is preferred.</p><h3>Conclusion</h3><p>This project demonstrated how you can use GIS data and machine learning to develop a hazard map. For further progress in this project, consider applying larger datasets and incorporating other flood factors, such as urbanization and population data, into the model.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=3303efff9957" width="1" height="1" alt=""><hr><p><a href="https://medium.com/thedeephub/building-a-smart-flood-map-of-japan-using-gis-and-machine-learning-3303efff9957">Building a Smart Flood Map of Japan using GIS and Machine Learning</a> was originally published in <a href="https://medium.com/thedeephub">The Deep Hub</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[The In-Depth Math Behind Bulletproofs in ZKPs]]></title>
            <link>https://medium.com/@ronantech/the-in-depth-math-behind-bulletproofs-in-zkps-4eb7303eb3b3?source=rss-fbd6f4eb076e------2</link>
            <guid isPermaLink="false">https://medium.com/p/4eb7303eb3b3</guid>
            <category><![CDATA[bulletproof]]></category>
            <category><![CDATA[zero-knowledge]]></category>
            <category><![CDATA[zero-knowledge-proofs]]></category>
            <dc:creator><![CDATA[Ronan Takizawa]]></dc:creator>
            <pubDate>Sun, 23 Mar 2025 01:50:02 GMT</pubDate>
            <atom:updated>2025-03-23T01:50:02.668Z</atom:updated>
<content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*M1_5cYvkxlBNyFWE5I7BpA.png" /></figure><p>Bulletproofs are a compact solution for zero-knowledge range proofs — proving that a value lies within a specific range without revealing the value itself. Unlike zk‑SNARKs and KZG polynomial commitments, their avoidance of arithmetic circuits and elimination of a trusted setup make bulletproofs particularly well-suited for applications in need of small and fast ZKPs.</p><p>In this article, I’ll dive deep into how bulletproofs work under the hood mathematically, based on their original implementation (<em>Bünz et al.</em>).</p><h3>What Are Bulletproofs?</h3><p>Bulletproofs, introduced by <a href="https://eprint.iacr.org/2017/1066.pdf"><em>Bünz et al.</em></a> in 2017, are a type of non-interactive zero-knowledge proof system with several attractive properties:</p><ol><li><strong>No Trusted Setup</strong>: They don’t require a trusted setup phase, a computationally heavy process of generating the keys used to create and verify proofs (as in zk-SNARKs). The lack of a trusted setup can make bulletproofs more secure and easier to deploy</li><li><strong>No Need for Arithmetic Circuits: </strong>Unlike many ZK systems that rely on converting computations into complex arithmetic circuits, bulletproofs use a clever combination of the inner product argument and the Pedersen commitment scheme to create range proofs without circuit transformation. 
This direct algebraic approach means developers don’t need to use R1CS, a complex constraint-representation format required by many other ZK systems</li></ol><p>With these properties in mind, bulletproofs are used in projects such as <a href="https://web.getmonero.org/resources/moneropedia/bulletproofs.html">Monero</a> to enable confidential transactions and significantly reduce transaction sizes.</p><p>At its core, bulletproofs solve the following problem:</p><p><em>How can one prove that a secret value </em><strong><em>v</em></strong><em> lies in the range [0,2^n) without revealing the value itself?</em></p><p>To do this, we will first explain how you can securely generate a proof about <strong>v</strong> in a zero-knowledge manner, and then explain how to make that proof verifiable.</p><h3>How Zero-Knowledge Proofs Work</h3><p>Before diving into the mathematics of bulletproofs, it’s important to understand the general structure of zero-knowledge proof systems.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/640/0*YYpOg-hXy0oILJGW.png" /></figure><h4>Key Concepts</h4><ol><li><strong>The Prover</strong>: This party has secret information <strong>v</strong> and wants to convince others that a statement about this information is true (such as “v is in range [0, 2^n)”) without revealing the secret itself</li><li><strong>The Verifier</strong>: This party wants to be convinced that the prover’s statement is true, without learning the secret information</li><li><strong>The Protocol</strong>: A set of cryptographic interactions between the prover and verifier that leads to the verifier being convinced (or not) of the statement’s truth</li></ol><h4>The 3 Properties of ZK Proofs</h4><p>All effective zero-knowledge proof systems, including bulletproofs, must satisfy three key properties:</p><ol><li><strong>Completeness</strong>: If the statement is true and both parties follow the protocol, the verifier will be 
convinced</li><li><strong>Soundness</strong>: If the statement is false, no cheating prover can convince the verifier the statement is true, except with negligible probability</li><li><strong>Zero-Knowledge</strong>: The verifier never sees the secret directly and only knows the truth of the statement being proved about the secret</li></ol><h4>Challenge Variables</h4><p>To uphold the properties described above, we use <strong>challenges</strong>: random values injected into the protocol to ensure the proof holds up under adversarial choices.</p><p>As you will see later, without challenges, malicious provers can brute-force the protocol and validate an invalid proof by exploiting edge cases.</p><h4>Interactive / Non-Interactive Protocols</h4><p>In cryptographic protocols, there are <strong>interactive</strong> and <strong>non-interactive protocols </strong>to exchange challenges.</p><p>In an <strong>interactive protocol</strong>, the verifier and prover must exchange multiple messages in sequence:</p><ul><li>The prover makes initial commitments based on their secret</li><li>The verifier sends random challenges to check the validity of the ZK proof</li><li>The prover responds with proofs tailored to these challenges to show the proof holds</li></ul><p>In a <strong>non-interactive protocol</strong> (like bulletproofs):</p><ul><li>The prover can run the entire proof generation in a single step with random challenges woven into the proof</li><li>Each crucial step of a proof introduces a new challenge that is generated by using a hash function that takes in variables from the proof up to that point</li><li>The prover will then express their secret <strong>v</strong> for the verifier by writing a complex polynomial expression including the challenge variables and secret <strong>v</strong>, called a <strong>commitment</strong></li><li>Commitments allow the verifier to apply complex algebraic and homomorphic calculations to the 
prover’s commitments. This verifies that secret <strong>v</strong> withstands the proof and challenges without the verifier ever knowing the value v</li><li>Using the commitments from the prover, the verifier can verify that the secret <strong>v</strong> withstands the proof and challenges in a single step, without needing to send challenges and receive responses from the prover</li></ul><p>In bulletproofs, we use three challenges (y, z, and x), each generated using the Fiat-Shamir heuristic.</p><p>When using the Fiat-Shamir heuristic, each challenge is generated from a hash function which takes in all existing proof variables up to that point (previous challenges, vectors used in the proof, and more).</p><p>This makes every challenge depend on the ones before it, so a prover cannot change any part of their earlier commitments without invalidating subsequent challenges.</p><p>Without hash-derived challenges, a malicious prover could work backward from a desired outcome, crafting different responses for different challenges. 
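</p><p><em>As a concrete illustration of the non-interactive flow described above, here is a minimal Python sketch of Fiat-Shamir challenge derivation. It is not a real bulletproof implementation: the transcript encoding and the modulus Q are assumptions chosen for demonstration.</em></p>

```python
import hashlib

# Hypothetical modulus standing in for the order of a real elliptic curve group.
Q = 2**255 - 19  # assumption, for illustration only

def fiat_shamir_challenge(*transcript: bytes) -> int:
    """Derive a challenge by hashing every message sent so far."""
    h = hashlib.sha256()
    for part in transcript:
        h.update(len(part).to_bytes(8, "big"))  # length-prefix each message
        h.update(part)
    return int.from_bytes(h.digest(), "big") % Q

# Each challenge hashes all prior commitments AND prior challenges, so changing
# any earlier message changes every later challenge.
C = b"pedersen-commitment-bytes"  # placeholder for the serialized commitment
y = fiat_shamir_challenge(C)
z = fiat_shamir_challenge(C, y.to_bytes(32, "big"))
x = fiat_shamir_challenge(C, y.to_bytes(32, "big"), z.to_bytes(32, "big"))
```

<p><em>If the prover tampers with C after the fact, y, z, and x all change, invalidating any responses already fixed to the old challenges.</em></p><p>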
But with hashes, such backtracking is not possible.</p><h3>Bulletproof Overview</h3><p>The bulletproof setup is an extremely complicated process, so please refer to this table of contents to keep track of what happens during the bulletproof protocol.</p><p><strong>Part 1: Bulletproof System Setup</strong></p><ul><li>Choosing an elliptic curve, generator points, and vector generation parameters</li></ul><p><strong>Part 2: Creating the Original Commitment</strong></p><ul><li>Creating a Pedersen commitment to hide the secret value v while still allowing verification that it lies within the required range</li></ul><p><strong>Part 3: Building the Mathematical Foundation for Verifying the Proof</strong></p><ul><li>Breaking down the range verification into bit decomposition and creating mathematical constraints that ensure each bit is valid and the bits sum to the original value</li></ul><p><strong>Part 4: Forming the Polynomial Constraint</strong></p><ul><li>Converting the bit decomposition and range constraints into a polynomial equation with challenge variables</li></ul><p><strong>Part 5: The Inner Product Argument for Proof Compression</strong></p><ul><li>Using recursive vector splitting and compression techniques to reduce proof size from linear to logarithmic complexity</li></ul><p><strong>Part 6: Proof Verification</strong></p><ul><li>The process through which the verifier uses the provided commitments, challenges, and inner product elements to confirm the proof’s validity without learning the secret value</li></ul><h3>Part 1: Bulletproof System Setup</h3><p>Before the prover starts generating their proof, both parties agree on the parameters of the bulletproof protocol.</p><p>They will first choose which elliptic curve to use.</p><p>Elliptic curves are algebraic curves of the form y² = x³ + ax + b, used in cryptography because operations on their points are easy to compute but computationally infeasible to reverse.</p><p>We use elliptic curves in cryptographic systems to generate coefficients for our 
commitments that have no mathematical relationships between them.</p><p>This lack of mathematical relationships between coefficients is a “nothing-up-my-sleeve” approach that prevents anyone from generating backdoors in the protocol where a prover can secretly validate an invalid proof or a verifier can figure out the secret <strong>v.</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*XQF7AmJWKX-8LdOTeZLuYg.png" /><figcaption>Sample Elliptic Curve</figcaption></figure><p>Then, using that elliptic curve, both the prover and verifier will agree on these values that will be used to make commitments:</p><ul><li>Elliptic curve points <strong>g</strong> and <strong>h</strong> used for the original commitment</li><li>The bit length <strong>n</strong> that defines the range [0, 2^n) being proven</li><li>Vectors g = (g₀, g₁, …, gₙ₋₁) and h = (h₀, h₁, …, hₙ₋₁) of n distinct curve points used specifically for the inner product argument.</li></ul><p><strong>g</strong> and <strong>h</strong> are curve points related by some scalar k such that <strong>h = k·g</strong>, where k is unknown to everyone. The properties of elliptic curves make it computationally infeasible to calculate k (the discrete logarithm problem), which in turn makes it computationally infeasible to reverse engineer the secret <strong>v</strong> when <strong>g</strong> and <strong>h</strong> are used in the original commitment.</p><p>Vectors <strong>g</strong> and <strong>h</strong> are generated by picking 2 points on the elliptic curve (not the points used earlier for the original commitment) and a public seed value that both the prover and verifier agree upon, then running a hash function over the 2 points, the seed value, and each index. 
The resulting value will then be mapped to an x-coordinate, and the corresponding y value will be part of the vector.</p><p>For example, if the 2 points picked from the elliptic curve are “g” and “h” like this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/570/1*KHiuTtwoB-jKlDjS1rugzA.png" /></figure><h3>Part 2: Creating the Original Commitment</h3><p>Now we will get into the in-depth math behind bulletproofs.</p><p>To make a range proof verification that is zero-knowledge, we use Pedersen Commitments to hide the actual value like a locked box.</p><p>When the proof is sent to the verifier, the verifier can check properties of what’s inside the box (that the value is within range) without ever seeing the value itself, and the prover can’t swap what’s in the box after creating it.</p><p>Bulletproofs use Pedersen commitments like this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/492/1*KBvyGUZ8OuTHD1t8z2i9Xw.png" /></figure><ul><li><strong>g</strong> and <strong>h</strong> are the points agreed upon by the prover and verifier to use for the bulletproof</li><li><strong>r</strong> is a randomly generated scalar from the elliptic curve being used that further obscures the commitment. Without this blinding factor, the commitment could reveal <strong>v</strong></li></ul><p>Commitment <strong>C</strong> and its variables will be used later to generate challenges that prove information about v, and to generate other commitments.</p><h3>Part 3: Building the Mathematical Foundation for Verifying the Proof</h3><p>To prove that a value v lies in a range, we can use the <strong>Bit Check Trick</strong> and decompose v into its binary representation. 
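</p><p><em>The decomposition step itself is simple to sketch in Python (illustrative only; the function names are hypothetical, not part of any bulletproof library):</em></p>

```python
def bits_le(v: int, n: int) -> list:
    """Little-endian bit decomposition: v = a0*2^0 + a1*2^1 + ... + a(n-1)*2^(n-1)."""
    if not 0 <= v < 2**n:
        raise ValueError("v is outside the range [0, 2**n)")
    return [(v >> i) & 1 for i in range(n)]

def recombine(bits) -> int:
    """Inverse of bits_le: weight each bit by its power of two and sum."""
    return sum(b << i for i, b in enumerate(bits))

a = bits_le(13, 4)            # [1, 0, 1, 1], i.e. 13 = 1 + 4 + 8
assert recombine(a) == 13
assert all(b in (0, 1) for b in a)
```

<p><em>An n-bit decomposition exists exactly when v lies in [0, 2^n), which is why proving the decomposition is valid proves the range claim.</em></p><p>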
Using this binary representation, we prove that each bit is either 0 or 1, such that these bits correctly combine to form v:</p><p>v = a₀ x 2⁰ + a₁ x 2¹…+ aₙ₋₁ x 2^(n-1) such that each aᵢ ∈ {0, 1}.</p><p>For example, if v = 13 and n = 4 (we’re using 4 bits), then:</p><ul><li>a₀ = 1, a₁ = 0, a₂ = 1, a₃ = 1</li></ul><p>and:</p><ul><li>13 = 1 x 2⁰ + 0 x 2¹ + 1 x 2² + 1 x 2³</li></ul><h4>The Bit Check Trick</h4><p>The clever insight in bulletproofs is to use the bit check trick to ensure that the committed value <strong>v</strong> lies within the specified range.</p><p>In bulletproofs, we need to verify that our committed value <strong>v</strong> is represented by exactly <strong>n</strong> valid bits (each 0 or 1).</p><p>If <strong>v</strong> cannot be represented as an <strong>n</strong>-bit number with every aᵢ equal to 0 or 1, then we can’t validly prove that v is within the range [0, 2^n).</p><p>Imagine a prover wants to prove their committed value <strong>v</strong> is in the range [0, 2⁴), but their actual value is v = 20, which is out of range.</p><p>If we didn’t verify bits properly, the prover could try to represent 20 as:</p><p>a₀ = 0, a₁ = -2, a₂ = 6, a₃ = 0</p><p>Calculating: 0×2⁰ + (-2)×2¹ + 6×2² + 0×2³ = 0 + (-4) + 24 + 0 = 20</p><p>Without the strict rule that <strong>v</strong> be represented only by valid bits (0 or 1) across <strong>n</strong> bits, the prover could satisfy an invalid proof, so we enforce exactly that rule.</p><p>This strict rule can be checked quickly using the Bit Check Trick:</p><p><strong>a value aᵢ is a bit (0 or 1) if and only if aᵢ(aᵢ-1) = 0</strong></p><h4>Representing the Value Checking as Constraints</h4><p>The two constraints below capture the process described above:</p><p><strong>Constraint 1:</strong> Value 
Representation</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/392/1*Fs52vUm_EQPC14m_4ltOZw.png" /></figure><p>This constraint checks that all bits combine to form <strong>v</strong>, where:</p><ul><li><strong>Vector aL</strong> is the commitment value v represented in binary form</li><li><strong>2^n</strong> is the vector of powers of two (2⁰, 2¹, …, 2^(n-1))</li></ul><p>Then take the inner product (〈〉) of <strong>vector aL</strong> with <strong>2^n</strong>, multiplying corresponding elements and summing them, to verify that the output is v:</p><ul><li>a₀⋅2⁰ + a₁⋅2¹ + … + aₙ₋₁⋅2^(n-1)</li></ul><p><strong>Constraint 2: Bit Validity</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/214/1*iuGKq0SMQLCMgGQ65DDBBA.png" /></figure><p>This constraint combines all our bit checks into a single check, where:</p><ul><li><strong>Vector aL</strong> is the commitment value v represented in binary form</li><li><strong>Vector aR</strong> is vector aL with each bit minus 1. This is used to perform the Bit Check Trick (<strong>aᵢ ⋅ (aᵢ-1) = 0</strong>).</li><li><strong>Challenge y</strong> is a randomly generated value; y^n is the vector of its powers (y⁰, y¹, …, y^(n-1))</li></ul><p>We check that <strong>vector aR</strong> is valid by taking its element-wise multiplication (◦) with <strong>y^n</strong>, multiplying corresponding elements to create one new vector:</p><ul><li>((a₀-1)⋅y⁰, (a₁-1)⋅y¹, …, (aₙ₋₁-1)⋅y^(n-1))</li></ul><p>Then take the inner product (〈〉) of the new vector with <strong>vector aL</strong>, multiplying corresponding elements and summing them, to get a single number and verify that it equals 0:</p><ul><li>a₀(a₀-1)⋅y⁰ + a₁(a₁-1)⋅y¹ + … + aₙ₋₁(aₙ₋₁-1)⋅y^(n-1)</li></ul><h4>How Challenge y improves the Proof Security</h4><p>It is important to have <strong>y^n</strong> to check for invalid proofs. 
Using <strong>y^n</strong>, we now go from the bit checking shown earlier:</p><p>a₀(a₀-1) + a₁(a₁-1)+…</p><p>to:</p><p>a₀(a₀-1)y⁰ + a₁(a₁-1)y¹+…</p><p>The addition of a random <strong>challenge y</strong> makes it so that a malicious prover can’t brute-force their way into calculating valid values of a.</p><p>For example, imagine a malicious prover who wants to pass the bit check using values that are not bits.</p><p>Without the challenge, non-bit values can satisfy the unweighted check:</p><p>If a₀ = 3/2 and a₁ = a₂ = a₃ = 1/2, then:</p><p>(3/2)×(3/2-1) + (1/2)×(1/2-1) + (1/2)×(1/2-1) + (1/2)×(1/2-1) = 3/4 - 1/4 - 1/4 - 1/4 = 0</p><p>To avoid this, we introduce <strong>challenge y</strong>:</p><p>Assume challenge y = 7, then:</p><p>(3/4)×1 + (-1/4)×7 + (-1/4)×49 + (-1/4)×343 = 3/4 - 7/4 - 49/4 - 343/4 = -99 ≠ 0</p><p>Because the bit vector is committed before <strong>y</strong> is revealed, the prover cannot choose values after seeing the challenge, so with overwhelming probability the weighted sum is 0 only when every value is a bit.</p><p>The <strong>challenge y</strong> thus forces the bits to be either 1 or 0, in which case every term vanishes (here for a₀ = 0, a₁ = 1, a₂ = 0, a₃ = 1):</p><p>0×(-1)×1 + (1)×(0)×7 + (0)×(-1)×49 + (1)×(0)×343 = 0</p><h4>Generating Challenge y using the Hash of the Original Commitment</h4><p>If we recall from before, challenges in bulletproofs are generated from the Fiat-Shamir heuristic, which uses a cryptographic hash function to derive challenge values.</p><p>Challenge <strong>y</strong> is computed as follows:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/510/1*hBdogcnpFJV17tu2G-WcXg.png" /></figure><ul><li><strong>The Hash function</strong> is a secure hash function (like SHA-256)</li><li>The Hash function takes in all variables from the original commitment (<strong>C</strong>, <strong>g</strong>, <strong>h</strong>, the range <strong>n</strong>)</li><li><strong>q</strong> is the size of the number space we’re working in for the elliptic curve used for the original commitment (generally around 2²⁵⁶). 
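</li></ul><p>The challenge derivation and the weighted bit check can be sketched in Python. This is illustrative only: a real implementation would hash canonical encodings of the curve points C, g, h and reduce by the curve’s group order, while the string encoding and modulus below are demo choices.</p>

```python
import hashlib

# Demo Fiat-Shamir challenge derivation plus the weighted bit check.
q = 2**255 - 19  # stand-in modulus of roughly the right size

def challenge(*transcript) -> int:
    """Hash the transcript parts and reduce mod q (toy serialization)."""
    h = hashlib.sha256()
    for part in transcript:
        h.update(str(part).encode())
    return int.from_bytes(h.digest(), "big") % q

def weighted_bit_check(bits, y):
    """Sum of a_i*(a_i - 1)*y^i mod q; zero for honest bit vectors."""
    return sum(a * (a - 1) * pow(y, i, q) for i, a in enumerate(bits)) % q

y = challenge("C", "g", "h", 4)      # stands in for y = Hash(C, g, h, n) mod q
assert weighted_bit_check([1, 0, 1, 1], y) == 0   # honest bits always pass
assert weighted_bit_check([2, 0, 0, 0], y) != 0   # non-bit values are caught
```

<ul><li>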
The <strong>mod q</strong> operation ensures the challenge value fits within our cryptographic system</li></ul><p>Through this hash function, <strong>challenge y</strong> is generated in a way such that the probability of finding a set of invalid bits that would pass an invalid verification is approximately 1/2²⁵⁶, which is very low.</p><h3>Part 4: Forming the Polynomial Constraint</h3><p>We will now combine these two constraints into a single polynomial identity.</p><p>Performing both checks in a single equation guarantees that the prover can only send one set of bits that satisfies both constraints.</p><p>To do this, we introduce <strong>challenge z</strong>, such that our new combined constraint is:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/646/1*xO8nk7xB0LGVV686qxVK1g.png" /></figure><p>Previously, <strong>aR</strong> was element-wise multiplied (◦) directly with <strong>y^n</strong>; now, by element-wise multiplying with aR + (z⋅1ⁿ) instead, our bit-checking constraint gets scaled by z.</p><p>Here, <strong>challenge z</strong> acts as a scaling factor that links the value representation and bit validity constraints. By using a single variable to scale <strong>v</strong> and applying it to both the bit validity check (every bit is 0 or 1) and the value representation check (the bits add up to v), we ensure that the prover can only send one set of bits that satisfies both constraints.</p><h4>Generating Challenge z</h4><p>Before we generate challenge z, the prover makes 2 commitments to lock in their values for vectors used in this range proof.</p><p>We will also introduce vectors that will be used in the next step of this proving process to blind the vectors and make them zero-knowledge.</p><p>We introduce two new vectors:</p><ul><li><strong>sL:</strong> A random vector of the same length as aL</li><li><strong>sR:</strong> A random vector of the same length as 
aR</li></ul><p>By introducing these variables and locking them in via commitments, we prevent the prover from adaptively choosing vector values after seeing challenges, while simultaneously hiding the actual vectors from the verifier.</p><p>Now we make 2 commitments:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/596/1*ySB46l8I2502eR3-rK_Mbg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/628/1*Z-J_7MiLo15BHwdVvCotdQ.png" /></figure><ul><li><strong>g</strong> and <strong>h</strong> are the same points on the elliptic curve</li><li><strong>ρ</strong> and <strong>α</strong> are new randomly generated scalar values from the elliptic curve being used that further obscure the commitments.</li><li>We take the inner product of the vectors with the elliptic curve points here because it allows us to commit to entire vectors with a single elliptic curve point. This inner product approach provides homomorphic properties, meaning mathematical operations on the hidden vectors can be mirrored by operations on their commitments. 
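</li></ul><p>The homomorphic property can be demonstrated with a toy Pedersen commitment over the integers mod a prime, written multiplicatively as C = g^v · h^r mod p. The parameters g, h, p below are made up for the demo; real bulletproofs commit with elliptic curve points:</p>

```python
# Toy Pedersen commitments in a multiplicative group mod p (demo parameters,
# not securely chosen; real bulletproofs use elliptic curve points).
p = 2**127 - 1   # a Mersenne prime, large enough for the demonstration
g, h = 3, 5      # demo "generators"

def commit(v: int, r: int) -> int:
    return pow(g, v, p) * pow(h, r, p) % p

v1, r1 = 13, 777
v2, r2 = 29, 123
# Multiplying two commitments commits to the sums of the values and blinders,
# which is exactly the homomorphic property the verifier relies on:
assert commit(v1, r1) * commit(v2, r2) % p == commit(v1 + v2, r1 + r2)
```

<ul><li>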
This enables the verifier to check relationships between the committed values without seeing the actual vectors.</li></ul><p>The hash function to generate challenge <strong>z</strong> is now the following:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/500/1*JEF3l4_74V12bhpjBdQ6qw.png" /></figure><p><strong>Challenge z</strong> is derived by hashing challenge <strong>y</strong> and commitments <strong>A</strong> and <strong>S</strong>, again modding by <strong>q</strong> to ensure the challenge value fits within our cryptographic system while staying hard to reverse.</p><h4>Adding Blinding to the Constraints</h4><p>Now, as stated before, we will use vectors <strong>sL</strong> and <strong>sR</strong> to introduce blinding.</p><p><strong>Blinding</strong> is a method to hide the actual values in our proof while still allowing verification, ensuring that the verifier can confirm the validity of a range proof without learning anything about the specific value being proven or its binary representation.</p><p>We will also introduce <strong>challenge x</strong> and transform our original vectors into 2 vector polynomials (functions of x):</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/392/1*sb96IfqYvMt3oLaLf1xaCw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/862/1*EB9hq6nOSuSm8HlvOvWUxQ.png" /></figure><p>Through these vector polynomials, the verifier can now do the same range proof verification described above, but without ever knowing the actual vectors <strong>aL</strong>, <strong>aR</strong>, <strong>sL</strong>, or <strong>sR</strong>, ensuring that it stays zero-knowledge.</p><p>With these relations, if you plug in x=0, you will get the values of aL and aR, but not if you plug in any nonzero value. 
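</p><p>A small Python sketch of this blinding (illustrative only, using plain integers rather than field elements) shows that only x = 0 recovers the bit vector:</p>

```python
import random

# Sketch of the blinded vector polynomial l(x) = aL + sL*x (element-wise).
aL = [1, 0, 1, 1]                              # bit vector for v = 13
sL = [random.randrange(1, 10**6) for _ in aL]  # random blinding vector

def l(x: int) -> list:
    return [a + s * x for a, s in zip(aL, sL)]

assert l(0) == aL   # only x = 0 exposes the underlying bits
assert l(1) != aL   # any nonzero x mixes in the blinding vector
```

<p>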
This verifies that <strong>challenge x</strong> precisely blinds aL and aR.</p><h4>Simplifying the Equation</h4><p>To combine both vector polynomials into a single check, we define this equation in terms of <strong>challenge x</strong>:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/320/1*6j8J1aP4dea3-9vb_L6W5A.png" /></figure><p>Here, challenge x connects the polynomials <strong>l(x)</strong> and <strong>r(x)</strong> while ensuring that the prover can’t commit to specific values beforehand and then manipulate the proof after seeing the challenge.</p><p>With this equation, you can substitute <strong>l(x)</strong> and <strong>r(x)</strong>:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/540/1*P0T-dCI4V4ydeYudt3j7Pg.png" /></figure><p>then expand the inner products:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/668/1*dIi_GklQ9klfscgsqMdW5g.png" /></figure><p>and group them:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/526/1*8wLq-Jxs2gaShPiFkzL7BA.png" /></figure><p>If you look at the equation above, the first group has no factor of x, the second group has x, and the third has x².</p><p>This makes this property true:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/448/1*l6se3-w0CQYaE6mNbb83GQ.png" /></figure><p>Now the equation can be simply described as:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/288/1*6FwcUySJhIfZMHVkpR4yvQ.png" /></figure><p>The prover can now make their commitment in this simplified equation by entering values for t₀, t₁, and t₂.</p><h4>Committing the Range Proof Checks</h4><p>Before we generate challenge x, as before, we will make 2 commitments for t₁ and t₂ to lock in their values.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/704/1*zafMqBaQARXTn3a0Y2eKKw.png" /></figure><p>An important point here is that t₀ = z·v, which is why we don’t make a commitment T₀: it can be verified with the 
original commitment C (this will be explained later).</p><p>In this example, we have values <strong>g</strong> and <strong>h</strong> generated from elliptic curves, and <strong>τ₁</strong> and <strong>τ₂</strong>, which are randomly generated values from the elliptic curve being used that further obscure the commitments.</p><p>Without blinding factors <strong>τ₁</strong> and <strong>τ₂</strong>, the verifier could plug in values g and h to get t₁ and t₂, then run t₀ = t̂ - t₁·x - t₂·x² to get t₀, and calculate v from t₀ = z·v.</p><p>To avoid this, we introduce randomly generated values τ₁ and τ₂ and keep the zero-knowledge properties.</p><h4>Generating Challenge x</h4><p><strong>Challenge x</strong> is derived by hashing challenges <strong>y</strong> and <strong>z</strong>, commitments <strong>A</strong> and <strong>S</strong>, and commitments <strong>T₁</strong> and <strong>T₂</strong>, again modding by <strong>q</strong> to ensure the challenge value fits within our cryptographic system.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/834/1*z_XE9-NpXv0jgMuMY5o7Sg.png" /></figure><p>In the end, the prover will send t̂ = t(x), along with commitments T₁ and T₂ and the combined blinding value <strong>τₓ</strong>, which is calculated as:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/604/1*kpWBxDVWwlut9c_49q6dWw.png" /></figure><h4>Why we need a Combined Blinding Value τₓ</h4><p>We need to send a combined blinding value τₓ because if <strong>τ₁</strong> and <strong>τ₂</strong> are sent directly, the verifier can use challenge <strong>x</strong>, <strong>τ₁</strong>, and <strong>τ₂</strong> to calculate t₁ and t₂.</p><p>If the verifier receives <strong>τ₁</strong> = 2791 and <strong>τ₂</strong> = 3844, they can unblind the commitments:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/226/1*0g_TFBXJvHne7U7blEPtRw.png" /></figure><p>Now the verifier knows <strong>t₁</strong> = 153 and <strong>t₂</strong> = 78.</p><p>Then if the 
verifier uses the calculated t₁ and t₂ values with challenge <strong>z</strong> and t̂, they can calculate <strong>v</strong>.</p><p>Let’s say the challenge x = 7 and the prover has claimed t̂ = 4578. The verifier calculates:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/294/1*P4qpImXxEgWikt3PGJJHEQ.png" /></figure><p>From there, the verifier can divide <strong>t₀</strong> by challenge <strong>z</strong> to get <strong>v</strong>.</p><p>By hiding <strong>τ₁</strong> and <strong>τ₂</strong> and sending τₓ instead, we ensure this commitment stays zero-knowledge.</p><h3>Part 5: The Inner Product Argument for Proof Compression</h3><p>After creating commitments to the polynomial coefficients, we need an efficient way to prove that t(x) = ⟨l(x), r(x)⟩ without sending the actual vectors l(x) and r(x), because those vectors can be very large.</p><p>This is where the inner product argument comes in.</p><p>The inner product argument allows us to compress the equation t(x) = ⟨l(x), r(x)⟩ into a much smaller form, yielding the famously small proof sizes of bulletproofs.</p><p>To implement the inner product argument, we first set the vectors as:</p><ul><li><strong>α</strong> = l(x)</li><li><strong>b</strong> = r(x)</li></ul><p>Next, the vectors are split in half:</p><ul><li><strong>αL</strong> = first half of α, <strong>αR</strong> = second half of <strong>α</strong></li><li><strong>bL</strong> = first half of <strong>b</strong>, <strong>bR</strong> = second half of <strong>b</strong></li></ul><p>Now, we create a commitment to these cross products:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/590/1*j-4K9xFfDwbk0P_sJV04pg.png" /></figure><p>We create commitments here to serve as checkpoints so the verifier can recalculate the inner product argument without directly seeing vectors <strong>α</strong> and <strong>b</strong>, maintaining the zero-knowledge properties.</p><p>Finally, the vectors are 
compressed using this challenge:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/460/1*5qV63a5J060aw5ru-_B2Ug.png" /></figure><p>A new challenge <strong>u</strong> is introduced here so the prover cannot manipulate the compression process and select values that make invalid proofs pass verification.</p><p>The challenge <strong>u</strong> is generated like this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/796/1*30Lknrm6p30Ircl2RnOz1Q.png" /></figure><p>This compression repeats, with a fresh challenge each round, until the vectors become single numbers. Since the vectors are of length n, it takes exactly log₂(n) rounds of halving to reduce them to length 1.</p><p>This recursive approach creates a proof with only 2·log₂(n) additional curve points, achieving the logarithmic size that makes bulletproofs so efficient.</p><h4>Example Proof Compression</h4><p>Let’s say we start with vectors of length 8 (n = 8), so we’ll need log₂(8) = 3 rounds of compression.</p><p><strong>Initial vectors:</strong></p><ul><li>α = [α₀, α₁, α₂, α₃, α₄, α₅, α₆, α₇]</li><li>b = [b₀, b₁, b₂, b₃, b₄, b₅, b₆, b₇]</li></ul><p><strong>Round 1</strong></p><p>Setup:</p><ul><li>Split α: <strong>αL</strong> = [α₀, α₁, α₂, α₃], <strong>αR</strong> = [α₄, α₅, α₆, α₇]</li><li>Split b: <strong>bL</strong> = [b₀, b₁, b₂, b₃], <strong>bR</strong> = [b₄, b₅, b₆, b₇]</li><li>Create commitments L₁ and R₁</li><li>Generate challenge u₁ (Takes in L₁ and R₁)</li></ul><p>Compression:</p><ul><li><strong>a</strong>’ = u₁⁻¹·[α₀, α₁, α₂, α₃] + u₁·[α₄, α₅, α₆, α₇]</li><li><strong>b</strong>’ = u₁·[b₀, b₁, b₂, b₃] + u₁⁻¹·[b₄, b₅, b₆, b₇]</li></ul><p>If you expand this it will be:</p><ul><li>α’ = [u₁⁻¹·α₀ + u₁·α₄, u₁⁻¹·α₁ + u₁·α₅, u₁⁻¹·α₂ + u₁·α₆, u₁⁻¹·α₃ + u₁·α₇]</li><li>b’ = [u₁·b₀ + u₁⁻¹·b₄, u₁·b₁ + u₁⁻¹·b₅, u₁·b₂ + u₁⁻¹·b₆, u₁·b₃ + u₁⁻¹·b₇]</li></ul><p>These are now vectors of length 4.</p><p><strong>Round 2</strong></p><p>Setup:</p><ul><li><strong>αL</strong> = [α’₀, α’₁] 
<strong>αR</strong> = [α’₂, α’₃]</li><li><strong>bL</strong> = [b’₀, b’₁], <strong>bR</strong> = [b’₂, b’₃]</li><li>Remember, <strong>αL</strong> = [α’₀, α’₁] = [u₁⁻¹·α₀ + u₁·α₄, u₁⁻¹·α₁ + u₁·α₅]</li><li>Create commitments L₂ and R₂</li><li>Generate challenge u₂ (Takes in L₂ and R₂)</li></ul><p>Compress:</p><ul><li>α’’ = u₂⁻¹·[α’₀, α’₁] + u₂·[α’₂, α’₃]</li><li>b’’ = u₂·[b’₀, b’₁] + u₂⁻¹·[b’₂, b’₃]</li><li>These are now vectors of length 2</li></ul><p><strong>Round 3:</strong></p><p>Setup:</p><ul><li><strong>αL</strong> = [α’’₀], <strong>αR</strong> = [α’’₁]</li><li><strong>bL</strong> = [b’’₀],<strong> bR </strong>= [b’’₁]</li><li>Create commitments L₃ and R₃</li><li>Generate challenge u₃ (Takes in L₃ and R₃)</li></ul><p>Compress:</p><ul><li>α’’’ = u₃⁻¹·α’’₀ + u₃·α’’₁</li><li>b’’’ = u₃·b’’₀ + u₃⁻¹·b’’₁</li><li>These are now scalars (vectors of length 1)</li></ul><p>After 3 rounds, our 8-element vectors have been compressed to single values, and we’ve created 6 additional commitments (3 pairs of L and R).</p><h3>Part 6: Proof Verification</h3><p>Finally, we will get into how the verifier verifies the proof.</p><h4>What the Verifier Receives</h4><p>Through this proof setup the verifier receives these items:</p><ul><li>Original commitment <strong>C</strong></li><li>Vector commitments <strong>A</strong> and <strong>S</strong></li><li>Polynomial coefficient commitments <strong>T₁ </strong>and <strong>T</strong>₂</li><li>Combined blinding value <strong>τₓ</strong></li><li>Inner product argument elements: L₁, R₁, L₂, R₂, …, (approximately 2·log₂(n) curve points)</li><li>Final scalars a’ and b’ (the completely compressed vectors at the end of the inner product argument recursion)</li></ul><p>The verifier does not receive:</p><ul><li>The secret value <strong>v</strong></li><li>The bit vectors <strong>aL</strong> and <strong>aR</strong></li><li>The blinding vectors <strong>sL</strong> and <strong>sR</strong></li><li>The original blinding factor <strong>r</strong></li><li>Any 
of the polynomial coefficients directly (<strong>t₀, t₁, t₂</strong>)</li></ul><h4>Steps for Verification</h4><p>After receiving all proof elements from the prover, the verifier follows these steps to check the validity of the range proof.</p><h4>1. Recreate the Challenges</h4><p>The verifier independently generates the same challenge values using the Fiat-Shamir heuristic:</p><ul><li>Challenge <strong>y</strong> = Hash(C || g || h || n) mod q</li><li>Challenge <strong>z</strong> = Hash(y || A || S) mod q</li><li>Challenge <strong>x</strong> = Hash(y || z || A || S || T₁ || T₂) mod q</li></ul><h4>2. Verify the Original Commitment</h4><p>First, the verifier needs to check that t₀ = z·v.</p><p>Since the verifier doesn’t know v directly but has the commitment C, they can create a commitment to z·v by using its homomorphic property.</p><p>The homomorphic property of Pedersen commitments lets operations on the committed values be mirrored by performing operations on the commitments themselves.</p><p>If C = v·g + r·h is a commitment to value v, then we can create a new commitment to z·v with blinding factor z·r.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/874/1*thi-BjmsVbMFfUSh0gVauA.png" /></figure><p>This expression will be used next in the polynomial commitment.</p><h4>3. 
Verify the Polynomial Commitment</h4><p>Next, the verifier confirms commitments T₁ and T₂, which prove that the secret value v satisfies all range constraints required for the bulletproof.</p><p>Now we will discuss why we don’t use <strong>T₀</strong>.</p><p>If we recall back to the polynomial creation, t₀ is this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/754/1*252qAgAVOYIxVorw6LSi8g.png" /></figure><p>When we expand t₀, we get ⟨aL, y^n ○ aR⟩ + ⟨aL, y^n ○ (z·1^n)⟩, which are the bit validity constraint and the value representation constraint.</p><p>If the bits are valid, ⟨aL, y^n ○ aR⟩ equals 0, and the second term ⟨aL, y^n ○ (z·1^n)⟩ simplifies to z·⟨aL, y^n⟩, which, when the bits are valid, reduces to z·⟨aL, 2^n⟩ = z·v.</p><p>Therefore, t₀ = z·v, which is the same as the new commitment z·C we just formed, so we don’t need a separate commitment T₀.</p><p>Now using T₁, T₂, t̂, and τₓ, the verifier checks this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/694/1*dXlSNcDRnf-OrSSL8TcLWw.png" /></figure><p>If these match, it confirms that the prover honestly evaluated t(x) at point x, that the polynomial t(x) encodes the range proof constraints correctly, that t₀ equals z·v (meaning v is in the required range), and that all commitments are consistent with each other.</p><h4>4. 
Verify the Inner Product Argument Commitment</h4><p>Finally, the verifier must ensure that t̂ = ⟨l(x), r(x)⟩ by rerunning the inner product argument.</p><p>Currently, we’ve only proven that the original commitment C properly encodes a value v within range [0, 2^n) and that the commitment relationship t₀ = z·v holds.</p><p>This isn’t sufficient to guarantee that the prover didn’t manipulate the polynomial coefficients t₁ and t₂ to validate an invalid proof.</p><p>For example, a dishonest prover might:</p><ul><li>Create a valid commitment C to the value v = 20</li><li>Correctly calculate z·C as required</li><li>But choose fake vectors for aL and aR that don’t properly represent v = 20 in binary and calculate fake values for t₁ and t₂ that, when combined with the true t₀ = z·v, create a polynomial t(x) that satisfies t(x) = ⟨l(x), r(x)⟩ at the challenge point x</li></ul><p>The inner product argument verification is therefore a critical step to ensure that t̂ = ⟨l(x), r(x)⟩.</p><p>Using the two vectors of generator points from the bulletproof setup <strong>g</strong> and <strong>h</strong> (g = (g₀, g₁, …, gₙ₋₁) and h = (h₀, h₁, …, hₙ₋₁)) and challenge u, the verifier will run the same compression protocol the prover used to verify that the compressed inner product argument holds, confirming that t̂ = ⟨l(x), r(x)⟩.</p><p>The verifier will also collect values <strong>s</strong> and <strong>t</strong> for every element of the vectors at every round, which show how challenge u is applied to each element of the vector and help track the compression the prover’s vectors went through.</p><p>In the end, the verifier will run a complex exponentiation equation including the final form of <strong>g</strong> and <strong>h</strong>, all values of <strong>s and t</strong>, challenge u, and the original commitment C to check that the prover used vectors whose inner product equals exactly t̂ and that these vectors are the same ones committed to in the original commitment 
C.</p><p><strong>Running the Inner Product Argument</strong></p><p>The verifier begins with the two vectors of generator points from the bulletproof setup: g = (g₀, g₁, …, gₙ₋₁) and h = (h₀, h₁, …, hₙ₋₁)</p><p>For each level i from 1 to log₂(n), the verifier conceptually splits the current generators into left and right halves, and using challenge uᵢ, the verifier updates the generators:</p><ul><li><strong>gL</strong> = first half of current g</li><li><strong>gR</strong> = second half of current g</li><li><strong>hL</strong> = first half of current h</li><li><strong>hR</strong> = second half of current h</li></ul><p>Next, we generate challenge u similarly to before, where:</p><ul><li><strong>u</strong>ᵢ = H(x || Lᵢ || Rᵢ) mod q</li></ul><p>Finally, we apply the challenge and get the new form of vectors <strong>g</strong> and <strong>h</strong>:</p><ul><li>g’ = uᵢ⁻¹·gL + uᵢ·gR</li><li>h’ = uᵢ·hL + uᵢ⁻¹·hR</li></ul><p>Just like the prover did, the verifier repeats this process log₂(n) times until g and h are single values.</p><p>We do this so the generators transform in a way that mirrors the compression of the vectors being proven, allowing us to verify the inner product relationship without ever seeing the original vectors.</p><p><strong>Deriving s</strong>ᵢ <strong>and t</strong>ᵢ<strong>:</strong></p><p>While we run the inner product argument, we also collect values <strong>s</strong>ᵢ and <strong>t</strong>ᵢ for every element i from 0 to n-1 and every round j from 1 to log₂(n).</p><p>These values tell us how challenge u was applied to each element of each vector every round.</p><p><strong>Example Demonstration of Deriving sᵢ and tᵢ:</strong></p><p>If we recall back to how the prover compressed the vectors in the inner product argument, in each round, every element of the original vectors is multiplied by either u₁ or u₁⁻¹.</p><p>If we had the same vector setup as the inner product argument example we started 
with:</p><p><strong>Round 1:</strong></p><ul><li><strong>αL</strong> = [α₀, α₁, α₂, α₃], <strong>αR</strong> = [α₄, α₅, α₆, α₇]</li><li><strong>bL</strong> = [b₀, b₁, b₂, b₃], <strong>bR</strong> = [b₄, b₅, b₆, b₇]</li></ul><p>During every round of compression from the prover, each element is multiplied by either u₁ or u₁⁻¹.</p><p><strong>Round 2:</strong></p><p><strong>αL</strong> = [u₁⁻¹·α₀ + u₁·α₄, u₁⁻¹·α₁ + u₁·α₅], <strong>αR</strong> = [u₁⁻¹·α₂ + u₁·α₆, u₁⁻¹·α₃ + u₁·α₇]</p><p><strong>bL</strong> = [u₁·b₀ + u₁⁻¹·b₄, u₁·b₁ + u₁⁻¹·b₅], <strong>bR</strong> = [u₁·b₂ + u₁⁻¹·b₆, u₁·b₃ + u₁⁻¹·b₇]</p><p>Tracking whether an element was multiplied by u₁ or u₁⁻¹ is what <strong>s</strong>ᵢ <strong>and t</strong>ᵢ tell us for positions i from 0 to n-1.</p><p>With this in mind, the verifier will calculate whether u₁ or u₁⁻¹ was applied to α₀ in round 1, whether u₂ or u₂⁻¹ was applied in round 2, and so on for all elements and all rounds, generating 48 total values of <strong>s</strong> and <strong>t</strong> for this example (8 positions × 3 rounds × 2 vectors).</p><p>Let’s look at position 5 (element α₅) as an example.</p><p>For each position, representing the position in binary form is a trick to show how the challenge was applied for that element.</p><p><strong>5 in binary</strong>: 101</p><ul><li>Rightmost bit (1): Tells us α₅ is in the RIGHT half in round 1</li><li>Middle bit (0): Tells us α₅’s value flows to the LEFT half in round 2</li><li>Leftmost bit (1): Tells us α₅’s value ends up in the RIGHT half in round 3</li></ul><p><strong>Calculate the values based on which half the element is in</strong>:</p><p><strong>Round 1</strong>: (Right half)</p><ul><li>s₁[5] = u₁⁻¹ (inverse of challenge u₁)</li><li>t₁[5] = u₁</li></ul><p><strong>Round 2</strong>: (Left half)</p><ul><li>s₂[5] = u₂</li><li>t₂[5] = u₂⁻¹</li></ul><p><strong>Round 3</strong>: (Right half)</p><ul><li>s₃[5] = u₃⁻¹</li><li>t₃[5] = u₃</li></ul><p><strong>s</strong> tells us exactly 
how elements in vector b get transformed during compression (the direct multiplier), and inversely how elements in vector a get transformed.</p><ul><li>In round 1: α₅ is multiplied by u₁ in the compression formula</li><li>In round 2: α₅ is multiplied by u₂⁻¹ in the compression formula</li><li>In round 3: α₅ is multiplied by u₃ in the compression formula</li></ul><p>Combining these values of <strong>s</strong> and <strong>t</strong> with the final scalar values <strong>a’</strong> and <strong>b’</strong> from the prover captures how each element in the prover’s vectors was transformed, and will be used in the final exponentiation equation.</p><p><strong>Running the exponentiation equation</strong></p><p>After log₂(n) rounds, the verifier has generated:</p><ul><li>Two final generators g_final (a single point) and h_final (a single point) from the verifier</li><li>All scalar values sᵢ and tᵢ the verifier calculated to check how elements would move through the compression path</li></ul><p>From the initial prover exchange, the verifier also has:</p><ul><li>The claimed final scalars a’ and b’ from the prover</li><li>All L and R values from each recursion level given by the prover</li></ul><p>These variables allow us to run our final check by building this complex exponentiation equation.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/946/1*9A6P_bN_JLdl8CHBSxFbFg.png" /></figure><p>The left side represents the compressed form of our verification after all the protocol’s recursive steps, where gFinalᵃ’ and hFinalᵇ’ encode the final compressed vectors, while the product term incorporates all intermediate commitment points Lᵢ and Rᵢ adjusted by their corresponding sᵢ and tᵢ values. 
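</p><p>The recursion this equation summarizes can be sketched numerically, with scalars mod a prime standing in for the curve points (illustrative only; the transcript hashing and parameters are demo choices):</p>

```python
import hashlib

# Numeric sketch of the inner product argument's compression recursion.
q = 2147483647  # demo prime modulus (2^31 - 1)

def ip(u, v):
    """Inner product of two equal-length vectors, reduced mod q."""
    return sum(x * y for x, y in zip(u, v)) % q

def challenge(rnd, cL, cR):
    """Toy Fiat-Shamir challenge derived from the round transcript."""
    data = f"{rnd}:{cL}:{cR}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

a = [3, 1, 4, 1, 5, 9, 2, 6]   # stands in for l(x)
b = [2, 7, 1, 8, 2, 8, 1, 8]   # stands in for r(x)
t = ip(a, b)                   # the claimed inner product t-hat

rnd = 1
while len(a) > 1:
    half = len(a) // 2
    aL, aR, bL, bR = a[:half], a[half:], b[:half], b[half:]
    cL, cR = ip(aL, bR), ip(aR, bL)   # the cross terms that L and R commit to
    u = challenge(rnd, cL, cR)
    u_inv = pow(u, -1, q)
    a = [(u_inv * x + u * y) % q for x, y in zip(aL, aR)]
    b = [(u * x + u_inv * y) % q for x, y in zip(bL, bR)]
    # ip(a', b') = ip(a, b) + u^-2 * cL + u^2 * cR, so the claim updates too:
    t = (t + u_inv * u_inv * cL + u * u * cR) % q
    rnd += 1

# After log2(8) = 3 rounds the vectors are single scalars that still encode t
assert a[0] * b[0] % q == t
```

<p>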
These sᵢ and tᵢ values precisely track how each element of the original vectors was transformed by challenges uᵢ at each recursive level, effectively capturing the entire compression history in a single expression.</p><p>The right side represents what the result should be if the claimed inner product is correct, where C is the original commitment to vectors a and b, and u_finalᵗ encodes the claimed inner product value t̂ adjusted by the accumulated challenge factor u_final.</p><p>When these two sides are equal, it cryptographically verifies that:</p><ol><li>The prover has vectors that were compressed to <strong>a’</strong> and <strong>b’</strong> whose inner product equals exactly t̂</li><li>Those vectors are the same ones committed to in the original commitment C</li><li>All mathematical constraints in the protocol hold true</li></ol><h3>Conclusion</h3><p>Through this demonstration, I hope you’ve gained a thorough understanding of how Bulletproofs work and why there are many complex elements in the protocol.</p><p>Please <strong>follow my account</strong> and stay tuned as I’m currently implementing an accelerated bulletproof using CUDA 👀.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Using CUDA to accelerate ZKPs]]></title>
            <link>https://medium.com/@ronantech/using-cuda-to-accelerate-zkps-6fc1213daa31?source=rss-fbd6f4eb076e------2</link>
            <guid isPermaLink="false">https://medium.com/p/6fc1213daa31</guid>
            <category><![CDATA[cuda]]></category>
            <category><![CDATA[zero-knowledge-proofs]]></category>
            <category><![CDATA[cuda-toolkit]]></category>
            <category><![CDATA[zkp]]></category>
            <dc:creator><![CDATA[Ronan Takizawa]]></dc:creator>
            <pubDate>Sun, 02 Mar 2025 19:41:35 GMT</pubDate>
            <atom:updated>2025-03-02T19:41:35.916Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-H7qkFNLzK-ozZnHTOjksw.png" /></figure><p><a href="https://github.com/ronantakizawa/cudazkp">Full Code</a> (Make sure to leave the Original Repo a Star!) ⭐️</p><p>Zero-knowledge proofs (ZKPs) have emerged as a powerful tool for verifying computations in a decentralized manner without revealing sensitive information. As ZKPs become integrated into large-scale applications (ZK-TLS, ZK Rollups), the ability to efficiently verify large batches of proofs becomes crucial. Generating and verifying ZKPs involve complex cryptographic operations, including large-scale polynomial computations and elliptic curve pairings, all of which require significant computational resources and time.</p><p>In this article, we will explore how a GPU runtime using CUDA can dramatically improve ZKP verification performance compared to a CPU runtime.</p><h3>Understanding the Zero-Knowledge Proof Verification</h3><p>In this implementation, we will use the Schnorr Protocol (one of the simplest ZKPs) to avoid ZKP implementation complexity and focus on performance comparison.</p><h4><strong>Setup</strong></h4><ul><li>A large prime <strong>p</strong> is chosen as the modulus.</li><li>A generator <strong>g</strong> of a multiplicative group modulo p is selected.</li><li>The prover has a <strong>secret witness</strong> <strong>w</strong>, and computes a <strong>public key h</strong>, where</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/326/1*R7rg9IG08_Ra_F_Gzpt_rA.png" /></figure><ul><li>This means <strong>h</strong> is known to both the verifier and the prover, but <strong>w</strong> remains secret.</li></ul><h4><strong>Proof Generation</strong></h4><p>The protocol follows a three-step challenge-response process: <strong>commitment, challenge, and response</strong>.</p><p><strong>Step 1: Commitment</strong></p><ul><li>The prover randomly picks a number <strong>r</strong> and computes</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/316/1*jPCrslBCCL5KdDnWABOKeg.png" /></figure><ul><li><strong>C</strong> is called the <strong>commitment</strong> and is sent to the verifier.</li></ul><p><strong>Step 2: Challenge</strong></p><ul><li>The verifier sends a challenge <strong>c</strong> (chosen randomly).</li></ul><p><strong>Step 3: Response</strong></p><ul><li>The prover computes the response</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/516/1*_tvNOiDLRYxj0W6-m1rOog.png" /></figure><ul><li><strong>s</strong> is sent to the verifier.</li><li>It encodes both the random value <strong>r</strong> and the hidden secret <strong>w</strong> influenced by the challenge <strong>c</strong>.</li></ul><h4>Verification</h4><p>The verifier checks whether the following equation holds</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/522/1*1zbJLVfOjeKEYB9ivhVTTQ.png" /></figure><p><strong>Why does this work?</strong></p><ul><li>Substituting s = r + c⋅w, we get:</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/442/1*vLv-2fNK4jwJOgKuiD2o-w.png" /></figure><ul><li>Since h = g^w, we can say:</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/230/1*CURa0dOcfV0WkQB7nYTxfw.png" /></figure><p>If this equation holds, it confirms the prover knows <strong>w</strong> without revealing it.</p><h3>Benchmarking Setup</h3><h4>Iterations</h4><p>The implementation tests both CPU and GPU approaches of the Schnorr Protocol with increasing batch sizes (1,000, 10,000, 100,000, and 1,000,000 proofs) and records:</p><ol><li>Execution time (milliseconds)</li><li>Throughput (proofs verified per second)</li><li>Speedup (ratio of CPU time to GPU time)</li></ol><h3>Implementation Details</h3><p>For every run, parameters are initialized 
where:</p><ul><li>g is the <strong>generator</strong>.</li><li>p is the <strong>large prime modulus</strong>.</li><li>The prover computes h = g^witness mod p, which acts as the <strong>public key</strong> (h).</li></ul><pre>unsigned long long g = 2;<br>unsigned long long witness = 42;<br>unsigned long long p = 2147483647;  // 2^31 - 1<br>unsigned long long h = hostModPow(g, witness, p);</pre><p>Each proof’s commitment is computed in a loop.</p><ul><li>r is a random nonce (simulated deterministically using i % 1000000).</li><li>commitments[i] = g^r mod p commits the prover to the nonce r without revealing it, while still allowing verification.</li></ul><pre>commitments[i] = hostModPow(g, r, p);</pre><p>The challenge c is <strong>simulated deterministically</strong> for testing.</p><pre>challenges[i] = i % 1000;</pre><p>For response computation:</p><ul><li>The prover computes s = r + c * w mod (p-1), encoding the proof without revealing w.</li><li>The prover sends (commitments[i], responses[i], challenges[i]) to the verifier.</li></ul><pre>responses[i] = (r + challenges[i] * witness) % (p-1);</pre><p>Finally, the verifier checks whether the proof is valid <strong>without knowing </strong><strong>w</strong>.</p><p><strong>CPU</strong></p><pre>unsigned long long left = hostModPow(args-&gt;g, args-&gt;responses[i], args-&gt;p);<br>unsigned long long right = (args-&gt;commitments[i] * hostModPow(args-&gt;h, args-&gt;challenges[i], args-&gt;p)) % args-&gt;p;<br>args-&gt;results[i] = (left == right) ? 1 : 0;</pre><p><strong>GPU</strong></p><pre>unsigned long long left = deviceModPow(g, responses[idx], p);<br>unsigned long long right = (commitments[idx] * deviceModPow(h, challenges[idx], p)) % p;<br>results[idx] = (left == right) ? 1 : 0;</pre><p>The CPU verification process is executed using <strong>pthreads</strong>, where the workload is distributed across multiple threads (up to a maximum of 16). 
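</p><p><em>The per-proof check that each worker performs can be sketched end-to-end in plain Python. This is an illustrative rewrite of the C/CUDA logic above, not code from the repo: prove and verify are hypothetical helper names, and Python&#39;s built-in three-argument pow stands in for hostModPow/deviceModPow.</em></p>

```python
# Illustrative Schnorr generate-and-verify flow, using the article's
# parameters: g = 2, p = 2^31 - 1 (prime), secret witness w = 42.
g = 2
p = 2147483647          # the large prime modulus
order = p - 1           # exponents are reduced mod (p - 1)
witness = 42            # the secret w
h = pow(g, witness, p)  # public key h = g^w mod p

def prove(r: int, c: int):
    """Return (commitment, response) for nonce r and challenge c."""
    commitment = pow(g, r, p)             # C = g^r mod p
    response = (r + c * witness) % order  # s = r + c*w mod (p - 1)
    return commitment, response

def verify(commitment: int, c: int, s: int) -> bool:
    """Accept iff g^s == C * h^c (mod p)."""
    left = pow(g, s, p)
    right = (commitment * pow(h, c, p)) % p
    return left == right

# Verify a small batch, simulating nonces and challenges as the article does.
ok = True
for i in range(1000):
    C, s = prove(i % 1000000, i % 1000)
    ok = ok and verify(C, i % 1000, s)
print(ok)  # True: every proof in the batch verifies
```

<p>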
Each thread verifies a subset of proofs independently, and the results are collected after all threads complete execution.</p><p>The GPU verification kernel processes proofs in parallel using <strong>CUDA</strong>. The number of threads per block is set to 256, and the total number of blocks is determined dynamically based on the batch size. The kernel executes modular exponentiation computations for each proof in parallel, leveraging thousands of GPU cores for high throughput.</p><h3>Results: GPU Dominance at Scale</h3><p>The results show dramatic performance advantages for the GPU implementation, especially at larger batch sizes:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*BW1gSxoZ9J2cZl6KeKnk3g.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-H7qkFNLzK-ozZnHTOjksw.png" /></figure><h3>Why GPUs Outperform CPUs in ZKP Verification</h3><ol><li><strong>Massively Parallel Processing</strong><br>GPUs excel at parallel computation, utilizing thousands of cores to execute tasks simultaneously. In contrast, our test CPU is constrained to just 8 threads. With multiple Streaming Multiprocessors (SMs), modern GPUs can efficiently handle a vast number of verification tasks in parallel, significantly boosting performance.</li><li><strong>Optimized Memory Access</strong><br>Unlike CPUs, GPUs are designed for high-throughput parallel memory access. While not immediately visible in the code, the GPU’s memory subsystem coalesces memory operations when neighboring threads access contiguous addresses, reducing bottlenecks and enabling efficient DRAM access patterns. This allows for seamless execution of large-scale proof verification.</li></ol><h3>Conclusion</h3><p>The experimental results clearly demonstrate the superiority of GPU-based verification for zero-knowledge proofs, with speedups of up to 258x for large batches in the experiment. 
The massive throughput advantage of GPUs opens new possibilities for cryptographic systems that were previously constrained by verification performance.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Cache-Augmented Generation (CAG) in LLMs: A Step-by-Step Tutorial]]></title>
            <link>https://blog.gopenai.com/cache-augmented-generation-cag-in-llms-a-step-by-step-tutorial-6ac35d415eec?source=rss-fbd6f4eb076e------2</link>
            <guid isPermaLink="false">https://medium.com/p/6ac35d415eec</guid>
            <category><![CDATA[retrieval-augmented]]></category>
            <category><![CDATA[retrieval-augmented-gen]]></category>
            <category><![CDATA[llm]]></category>
            <dc:creator><![CDATA[Ronan Takizawa]]></dc:creator>
            <pubDate>Thu, 02 Jan 2025 00:04:40 GMT</pubDate>
            <atom:updated>2025-06-05T13:37:24.980Z</atom:updated>
<content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*DIzPQYzWxlGo0RXNpAtH5g.png" /></figure><p><a href="https://github.com/ronantakizawa/cacheaugmentedgeneration">Full Code </a>(Make sure to leave the Original Repo a Star!) ⭐️</p><p><strong>Retrieval-augmented generation (RAG)</strong> is a powerful method for connecting external knowledge bases to an LLM, fetching context each time a user asks a question, but it can slow down the LLM&#39;s responses because of retrieval latency.</p><p><strong>Cache-augmented generation (CAG)</strong> offers a faster alternative; instead of performing real-time retrieval, it <em>preloads</em> your relevant documents into the model’s context and stores the resulting inference state, also known as the Key-Value (KV) cache. This approach eliminates retrieval latency, allowing the model to access the preloaded information instantly for faster and more efficient responses.</p><p>For a more technical explanation of CAG, check out <a href="https://medium.com/@sahin.samia/cache-augmented-generation-a-faster-simpler-alternative-to-rag-for-ai-2d102af395b2">this article</a>.</p><p>In this tutorial, we will show how to build a simple <strong>CAG</strong> setup that preloads all your knowledge upfront, quickly answers multiple user queries, and resets the cache without reloading the entire context each time.</p><h4>Prerequisites</h4><p>1. A HuggingFace account and a HuggingFace access token</p><p>2. 
A document.txt file with sentences about yourself.</p><h4>Project Setup</h4><p>We import the essential libraries:</p><ul><li>torch for PyTorch.</li><li>transformers for Hugging Face models.</li><li>DynamicCache for storing the model’s key-value states.</li></ul><pre>import torch<br>from transformers import AutoTokenizer, AutoModelForCausalLM<br>from transformers.cache_utils import DynamicCache<br>import os</pre><h4>Generate Function</h4><p>We’ll next define the generate function.</p><p>The generate function handles token-by-token generation over the cached knowledge using greedy decoding.</p><p>Greedy decoding is a simple text generation method where, at each step, the token with the highest probability (the maximum value in the logits) is selected as the next token.</p><p>We pass in these inputs:</p><ul><li>model: The LLM, which will be Mistral-7B for this tutorial.</li><li>input_ids: A tensor containing the tokenized input sequence.</li><li>past_key_values: The core component of CAG. A cache of previously computed attention values, used to speed up inference by avoiding recomputation.</li><li>max_new_tokens: The maximum number of new tokens to generate. 
The default is 50.</li></ul><p>The function operates in a loop that iterates up to max_new_tokens times or terminates early if an end-of-sequence token (if configured) is generated.</p><p>At each iteration:</p><ul><li>The model processes the current input tokens along with the cached past_key_values, producing logits for the next token.</li><li>The logits are analyzed to identify the token with the highest probability using greedy decoding.</li><li>This new token is appended to the output sequence, and the cache (past_key_values) is updated to include the current context.</li><li>The newly generated token becomes the input for the next iteration.</li></ul><pre>def generate(model, input_ids: torch.Tensor, past_key_values, max_new_tokens: int = 50) -&gt; torch.Tensor:<br>    device = model.model.embed_tokens.weight.device<br>    origin_len = input_ids.shape[-1]<br>    input_ids = input_ids.to(device)<br>    output_ids = input_ids.clone()<br>    next_token = input_ids<br><br>    with torch.no_grad():<br>        for _ in range(max_new_tokens):<br>            out = model(<br>                input_ids=next_token,<br>                past_key_values=past_key_values,<br>                use_cache=True<br>            )<br>            logits = out.logits[:, -1, :]<br>            token = torch.argmax(logits, dim=-1, keepdim=True)<br>            output_ids = torch.cat([output_ids, token], dim=-1)<br>            past_key_values = out.past_key_values<br>            next_token = token.to(device)<br><br>            if model.config.eos_token_id is not None and token.item() == model.config.eos_token_id:<br>                break<br>    return output_ids[:, origin_len:]</pre><h4>DynamicCache Setup</h4><p>Next, we’ll define the get_kv_cache function that prepares a reusable key-value cache for a transformer model’s attention mechanism and the clean_up function that cleans the key-value cache by removing unnecessary entries to ensure that you can answer multiple independent questions 
without “polluting” the cache.</p><p>get_kv_cache passes a prompt (in our case, the knowledge from document.txt) through the model once, creating a KV cache that records all the hidden states from each layer.</p><p>get_kv_cache takes these inputs:</p><ul><li>model: The transformer model used for encoding the prompt.</li><li>tokenizer: The tokenizer that converts the prompt into token IDs.</li><li>prompt: The string input used as the prompt.</li></ul><p>and returns an object of type DynamicCache.</p><p>The get_kv_cache function tokenizes the provided prompt into input IDs, initializes a DynamicCache object to store key-value pairs, and performs a forward pass through the model with caching enabled (use_cache=True). This populates the cache with the key-value pairs resulting from the model&#39;s computation.</p><p>The clean_up function trims a DynamicCache object back to the original sequence length by removing any additional tokens added during processing. 
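</p><p><em>The trim that clean_up performs can be illustrated with standalone torch tensors. The shapes below are made up for the example and are not Mistral&#39;s real dimensions:</em></p>

```python
import torch

# Toy key/value tensors shaped (batch, heads, seq_len, head_dim).
# Suppose the original prompt cached 5 positions (origin_len = 5) and
# 3 extra question/answer tokens were appended afterwards.
origin_len = 5
key = torch.randn(1, 4, 8, 16)
value = torch.randn(1, 4, 8, 16)

# The same per-layer slice clean_up applies: keep only the first
# origin_len positions along the sequence dimension (dim 2).
key = key[:, :, :origin_len, :]
value = value[:, :, :origin_len, :]

print(key.shape)  # torch.Size([1, 4, 5, 16])
```

<p>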
For each layer of the cache, it slices both the key and value tensors to retain only the first origin_len tokens along the sequence dimension.</p><pre>def get_kv_cache(model, tokenizer, prompt: str) -&gt; DynamicCache:<br>    device = model.model.embed_tokens.weight.device<br>    input_ids = tokenizer(prompt, return_tensors=&quot;pt&quot;).input_ids.to(device)<br>    cache = DynamicCache()<br><br>    with torch.no_grad():<br>        _ = model(<br>            input_ids=input_ids,<br>            past_key_values=cache,<br>            use_cache=True<br>        )<br>    return cache<br><br>def clean_up(cache: DynamicCache, origin_len: int):<br>    for i in range(len(cache.key_cache)):<br>        cache.key_cache[i] = cache.key_cache[i][:, :, :origin_len, :]<br>        cache.value_cache[i] = cache.value_cache[i][:, :, :origin_len, :]</pre><h4>Load LLM (Mistral)</h4><p>Now we’ll load the Mistral-7B tokenizer and model, using half precision (FP16) when a GPU is available and full precision otherwise.</p><p>Remember to replace YOUR_HF_TOKEN with your unique HuggingFace token.</p><pre>model_name = &quot;mistralai/Mistral-7B-Instruct-v0.1&quot;<br>tokenizer = AutoTokenizer.from_pretrained(model_name, token=&quot;YOUR_HF_TOKEN&quot;, trust_remote_code=True)<br>model = AutoModelForCausalLM.from_pretrained(<br>    model_name,<br>    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,<br>    device_map=&quot;auto&quot;,<br>    trust_remote_code=True,<br>    token=&quot;YOUR_HF_TOKEN&quot;<br>)<br>device = &quot;cuda&quot; if torch.cuda.is_available() else &quot;cpu&quot;<br>model.to(device)<br>print(f&quot;Loaded {model_name}.&quot;)</pre><h4>Create a Knowledge Prompt from document.txt</h4><p>Next, we’ll read document.txt, which you can fill with information about yourself. 
For this tutorial, document.txt contains information about me (Ronan Takizawa).</p><p>Here we construct a simple system prompt containing the document’s information and pass it to get_kv_cache to generate the KV cache.</p><pre>with open(&quot;document.txt&quot;, &quot;r&quot;, encoding=&quot;utf-8&quot;) as f:<br>    doc_text = f.read()<br><br>system_prompt = f&quot;&quot;&quot;<br>&lt;|system|&gt;<br>You are an assistant who provides concise factual answers.<br>&lt;|user|&gt;<br>Context:<br>{doc_text}<br>Question:<br>&quot;&quot;&quot;.strip()<br><br>ronan_cache = get_kv_cache(model, tokenizer, system_prompt)<br>origin_len = ronan_cache.key_cache[0].shape[-2]<br>print(&quot;KV cache built.&quot;)</pre><h4>Ask Questions Reusing the Cache</h4><p>We first run clean_up to reset our cache (good practice for CAG).</p><p>Next, we convert our question into tokens as input_ids_q1, which are then appended to the knowledge context stored in ronan_cache.</p><p>Finally, we call generate to produce the answer, decoding the final result with tokenizer.decode.</p><pre>question1 = &quot;Who is Ronan Takizawa?&quot;<br>clean_up(ronan_cache, origin_len)<br>input_ids_q1 = tokenizer(question1 + &quot;\n&quot;, return_tensors=&quot;pt&quot;).input_ids.to(device)<br>gen_ids_q1 = generate(model, input_ids_q1, ronan_cache)<br>answer1 = tokenizer.decode(gen_ids_q1[0], skip_special_tokens=True)<br>print(&quot;Q1:&quot;, question1)<br>print(&quot;A1:&quot;, answer1)</pre><p>You should expect a response like this:</p><pre>Q1: Who is Ronan Takizawa?<br>A1: Answer: Ronan Takizawa is an ambitious and accomplished <br>tech enthusiast. 
He has a diverse skill set in <br>software development, AI/ML...</pre><p>Now we will save the cache to disk and then reload it to prove that the cache persists across sessions.</p><pre># Save the cache to disk<br>clean_up(ronan_cache, origin_len)<br>cache_dir = &quot;cag_cache&quot;<br>os.makedirs(cache_dir, exist_ok=True)<br><br># Save the KV cache<br>torch.save(ronan_cache, os.path.join(cache_dir, &quot;ronan_knowledge.cache&quot;))<br><br># Load the cache to prove context is preserved across sessions<br># (on newer PyTorch versions, pass weights_only=False so the pickled cache object loads)<br>loaded_cache = torch.load(os.path.join(cache_dir, &quot;ronan_knowledge.cache&quot;))<br><br>question3 = &quot;What technologies has he worked with?&quot;<br>input_ids_q3 = tokenizer(question3 + &quot;\n&quot;, return_tensors=&quot;pt&quot;).input_ids.to(device)<br>gen_ids_q3 = generate(model, input_ids_q3, loaded_cache)<br>answer3 = tokenizer.decode(gen_ids_q3[0], skip_special_tokens=True)</pre><p>You should get a response tailored to the context again.</p><h3>Conclusion</h3><p><strong>Cache-augmented generation (CAG)</strong> simplifies AI architectures by storing small knowledge bases directly within a model’s context window, eliminating the retrieval loop of RAG and reducing latency. This approach improves the responsiveness of an LLM augmented with external knowledge. 
By leveraging CAG, developers can streamline their AI systems for faster and more efficient knowledge integration, particularly for tasks with stable, compact datasets.</p><hr><p><a href="https://blog.gopenai.com/cache-augmented-generation-cag-in-llms-a-step-by-step-tutorial-6ac35d415eec">Cache-Augmented Generation (CAG) in LLMs: A Step-by-Step Tutorial</a> was originally published in <a href="https://blog.gopenai.com">GoPenAI</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>