<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Vishnu Sivan on Medium]]></title>
        <description><![CDATA[Stories by Vishnu Sivan on Medium]]></description>
        <link>https://medium.com/@codemaker2016?source=rss-b7a60e472dfc------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/0*dj_jwDIShpJxw6-T</url>
            <title>Stories by Vishnu Sivan on Medium</title>
            <link>https://medium.com/@codemaker2016?source=rss-b7a60e472dfc------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Tue, 19 May 2026 03:43:15 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@codemaker2016/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[A Practical Guide to Training AI Agents with Microsoft Agent Lightning]]></title>
            <link>https://codemaker2016.medium.com/a-practical-guide-to-training-ai-agents-with-microsoft-agent-lightning-52921b0ed8af?source=rss-b7a60e472dfc------2</link>
            <guid isPermaLink="false">https://medium.com/p/52921b0ed8af</guid>
            <category><![CDATA[reinforcement-learning]]></category>
            <category><![CDATA[ai-agents-in-action]]></category>
            <category><![CDATA[agent-lightning]]></category>
            <category><![CDATA[python]]></category>
            <category><![CDATA[large-language-models]]></category>
            <dc:creator><![CDATA[Vishnu Sivan]]></dc:creator>
            <pubDate>Sun, 22 Mar 2026 13:43:01 GMT</pubDate>
            <atom:updated>2026-03-22T13:52:09.952Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*mdOtLEMkO7B0SCbUGK8QEg.png" /></figure><p>Artificial intelligence agents are rapidly transforming the way we interact with software. Powered by large language models (LLMs), these agents can answer questions, automate workflows, and integrate with external tools and data sources. However, training and improving AI agents — especially for complex, multi-step tasks — has traditionally been difficult. Developers often need to modify large portions of code, design custom training loops, and manage complex reinforcement learning pipelines.</p><p>Microsoft’s Agent Lightning framework aims to simplify this process. It introduces a new approach that separates how an agent operates from how it learns, allowing existing AI agents to be trained with reinforcement learning with minimal or even zero code changes. Instead of redesigning the agent architecture, developers can plug Agent Lightning into their current systems and allow the agent to improve through real-world interactions.</p><p>In this article, we explore how to build and train an AI agent using the Agent Lightning framework.</p><h3>Getting Started</h3><h4>Table of contents</h4><ul><li><a href="#8435">What is Agent Lightning</a></li><li><a href="#96fc">Why Agent Lightning Matters</a></li><li><a href="#7dc3">Three-Component Architecture of Agent Lightning</a></li><li><a href="#c55b">How Agent Lightning Works</a></li><li><a href="#822e">Hands-on 1: Manual Prompt Search with AgentLightning and OpenAI</a></li><li><a href="#4184">Hands-on 2: Building a Trainable LLM Agent with AgentLightning</a></li><li><a href="#3830">Hands-on 3: Sentiment Analysis Agent with AgentLightning</a></li><li><a href="#a1da">Hands-on 4: LangGraph SQL Agent with AgentLightning</a></li></ul><h4>What is Agent Lightning?</h4><p>Agent Lightning is an open-source framework developed by Microsoft that enables AI agents to be trained and optimized 
using reinforcement learning (RL) based on their real-world execution behavior.</p><p>Traditionally, improving an AI agent requires modifying the agent’s internal logic or redesigning parts of its architecture. Agent Lightning solves this problem by introducing an external training layer that observes how an agent behaves during execution. It captures the agent’s actions, states, and outcomes, and uses this data to improve the agent’s performance over time. This approach allows existing agents to become self-learning systems without rewriting their core logic, which is especially valuable in production environments.</p><p>At a conceptual level, Agent Lightning treats an agent’s execution as a Markov Decision Process (MDP). During each step of a task, the agent is in a specific state, generates an action (typically an LLM output), and receives a reward depending on whether the action helps achieve the task goal. These rewards become learning signals that guide the reinforcement learning process.</p><p>One of the biggest advantages of Agent Lightning is its framework-agnostic design. It can be integrated with agents built using popular frameworks such as LangChain, OpenAI Agents SDK, AutoGen, CrewAI, LangGraph, or even custom Python-based agents. In most cases, integration requires minimal or near-zero code changes.</p><p>The framework typically consists of a Python SDK and a training server. Developers wrap their existing agent logic in a lightweight interface (such as a LitAgent class), define how to evaluate the agent’s output using a reward function, and start the training process. 
Agent Lightning then collects execution traces, processes them through its hierarchical reinforcement learning algorithm (LightningRL), and updates the model or prompt configuration to improve the agent’s performance.</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FPRWvU7DQuZI%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DPRWvU7DQuZI&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FPRWvU7DQuZI%2Fhqdefault.jpg&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/d71655f466e7dbf7f9fda5cc1eaee323/href">https://medium.com/media/d71655f466e7dbf7f9fda5cc1eaee323/href</a></iframe><p><strong>Key Features</strong></p><ul><li><strong>Framework Agnostic</strong> — Compatible with major agent frameworks such as LangChain, OpenAI Agents SDK, AutoGen, CrewAI, and custom Python-based agents.</li><li><strong>Reinforcement Learning Optimization</strong> — Improves agent performance over time by applying reinforcement learning to learn from successes, failures, and feedback signals.</li><li><strong>Multiple Training Methods</strong> — Supports reinforcement learning, automatic prompt optimization, and supervised fine-tuning.</li><li><strong>Multi-Agent Coordination</strong> — Enables optimization and collaboration across multiple agents within complex multi-agent systems.</li><li><strong>Execution Monitoring &amp; Error Tracking</strong> — Uses the Lightning Server to monitor agent execution, detect errors, and track performance.</li><li><strong>Selective Optimization</strong> — Allows developers to optimize specific agents or components within a larger system.</li><li><strong>Flexible and Extensible Architecture</strong> — Provides open interfaces for customizing algorithms, reward strategies, and training workflows.</li></ul><p><strong>Core Components</strong></p><ul><li><strong>LightningStore</strong> — 
Central system that stores tasks, resources, and execution traces.</li><li><strong>Tracer</strong> — Captures structured data such as prompts, tool calls, and reward signals.</li><li><strong>Algorithm Engine</strong> — Processes traces and learns improved strategies or prompts.</li><li><strong>Trainer</strong> — Orchestrates the training workflow and updates the inference system.</li></ul><h4>Why Agent Lightning Matters</h4><p>Many popular agent frameworks such as LangChain, LangGraph, CrewAI, and AutoGen enable developers to build powerful AI agents capable of reasoning step-by-step and interacting with tools. However, most of these systems operate using static prompts, fixed workflows, and frozen model parameters. As a result, the agents do not learn from their past interactions or improve automatically over time.</p><p>This limitation becomes a major challenge in real-world applications where tasks are complex and environments constantly change. Developers often need to manually refine prompts, adjust logic, or redesign workflows, which becomes difficult to maintain as systems scale.</p><p>Agent Lightning addresses this gap by introducing an automated learning pipeline powered by reinforcement learning. By enabling agents to learn continuously from real-world usage, Agent Lightning transforms traditional static agents into adaptive systems. This capability is particularly valuable for enterprise workflows, long-running automation systems, and multi-step processes, where reliability and accuracy improve only through repeated execution and feedback.</p><h4>Three-Component Architecture of Agent Lightning</h4><p>At runtime, Agent Lightning is split into two parts: the Lightning Server and the Lightning Client. Together, they act as a lightweight intermediate layer that connects agent frameworks with LLM training systems. 
The framework exposes an OpenAI-compatible LLM API within the training infrastructure, enabling existing agents to integrate with the training system without modifying their original code.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/875/0*oustd5X0VJq787mU.png" /><figcaption><a href="https://www.microsoft.com/en-us/research/project/agent-lightning/">Agent Lightning — Microsoft Research</a></figcaption></figure><p>The architecture of Agent Lightning is built around three core components that work together to create a continuous training loop for AI agents. These components manage learning, execution, and coordination, allowing agents to improve automatically through real-world interactions.</p><p><strong>1. Algorithm — The Learning Engine<br></strong>The Algorithm component is responsible for training and optimizing the agent. It analyzes how agents perform during task execution and uses that information to improve their behavior over time.</p><p>Its main responsibilities include:</p><ul><li>Assigning tasks (rollouts) for agents to execute</li><li>Analyzing execution traces collected from completed tasks</li><li>Learning from agent actions and outcomes</li><li>Updating resources such as model parameters or prompt templates</li></ul><p>Agent Lightning supports pluggable algorithms, allowing developers to choose or implement different optimization strategies. Built-in options include:</p><ul><li><strong>APO (Automatic Prompt Optimization) </strong>— Improves prompt templates using textual gradients and search strategies</li><li><strong>VERL-based Reinforcement Learning</strong> — Uses RL techniques such as PPO or GRPO to train agent policies</li><li><strong>Custom Algorithms </strong>— Developers can implement their own optimization logic through the Algorithm interface.</li></ul><p><strong>2. Runner — The Execution Worker<br></strong>The Runner is responsible for executing agent tasks and collecting telemetry data. 
It retrieves tasks from the system, runs the agent using the latest resources, and records detailed execution traces.</p><p>Key responsibilities include:</p><ul><li>Fetching tasks from the queue</li><li>Loading the latest models or prompt configurations</li><li>Running the agent to complete tasks</li><li>Capturing execution traces through a tracer system</li><li>Sending collected data back for training</li></ul><p>The default LitAgentRunner works with agents wrapped as LitAgent instances, but developers can create custom runners for specialized execution environments.</p><p><strong>3. LightningStore — The Coordination Hub<br></strong>The LightningStore acts as the central system that coordinates all components. It manages tasks, stores execution traces, and keeps track of resource versions used during training.</p><p>Its main functions include:</p><ul><li>Managing the task queue for agent rollouts</li><li>Storing detailed execution traces (spans)</li><li>Tracking versions of models, prompts, and resources</li><li>Handling retries and failure management during training</li></ul><p>Agent Lightning provides multiple storage implementations such as InMemoryLightningStore, SqliteLightningStore, and MongoLightningStore, enabling the framework to scale from local experiments to distributed production environments.</p><h4>How Agent Lightning Works</h4><p>Agent Lightning acts as middleware between reinforcement learning (RL) algorithms and agent environments, providing standardized interfaces that allow scalable training and coordination across different system components.</p><ul><li>The Agent Runner manages agents as they execute tasks. 
It distributes work, monitors progress, and collects execution results and traces.</li><li>The Algorithm component handles the training process and hosts the LLMs used for both inference and optimization.</li><li>The LightningStore serves as the system’s central data repository.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*QuTfE2SCNTW_d1to.png" /><figcaption><a href="https://www.microsoft.com/en-us/research/blog/agent-lightning-adding-reinforcement-learning-to-ai-agents-without-code-rewrites/">Agent Lightning: Adding reinforcement learning to AI agents without code rewrites</a></figcaption></figure><p><strong>Execution flow</strong></p><p><strong>1. Task Execution<br></strong>The Lightning Server retrieves tasks from a task pool and sends them to the agent. The agent then attempts to complete the task using its native workflow, which may include tool usage, multi-turn interactions, or coordination with other agents.</p><p><strong>2. Trace Collection<br></strong>Agent Lightning uses a sidecar-based monitoring approach to capture execution data without interfering with the agent’s internal logic. During each run, the system collects structured telemetry such as execution traces, agent actions, errors, and reward signals. These traces are converted into state–action–reward–next state transitions, which represent the agent’s decision-making steps during the task.</p><p><strong>3. Training and Optimization Loop<br></strong>The collected traces are organized into training data and processed by a reinforcement learning framework (such as VERL). RL algorithms like GRPO analyze the agent’s behavior and update resources such as prompt templates or model weights.</p><p>The updated configuration is then used in the next task cycle, creating a continuous feedback loop where the agent improves its performance through repeated execution and learning. 
Agent Lightning also supports intermediate reward signals, allowing smaller rewards for successful steps within a task to accelerate training.</p><h4>Basic Integration</h4><p>Integrating Agent Lightning into an existing agent is simple and requires minimal changes to the original code. The core idea is to add lightweight hooks that capture the agent’s actions and observations during execution, without modifying its internal logic.</p><p>By using Agent Lightning’s helper functions, the agent can report key events such as inputs and outputs, enabling the training system to collect data for optimization.</p><pre>import agentlightning as agl<br><br># Existing agent logic remains unchanged<br>def your_existing_agent_function(query):<br>    # Capture the input action<br>    agl.emit_action(&quot;user_query&quot;, query)<br>    <br>    # Execute agent logic<br>    response = your_agent.process(query)<br>    <br>    # Capture the output observation<br>    agl.emit_observation(&quot;agent_response&quot;, response)<br>    <br>    return response</pre><h3>Hands-on 1: <strong>Manual Prompt Search with AgentLightning and OpenAI</strong></h3><p>In this hands-on tutorial, we will build and train a simple AI agent using Agent Lightning in Google Colab. The goal is to demonstrate how reinforcement learning can improve an agent’s performance over time through real interactions.</p><p>We will set up both the Lightning Server and Client, create a basic QA agent, and connect it to the training pipeline. 
As the agent executes tasks, Agent Lightning will collect feedback and continuously refine its behavior, making it progressively smarter.</p><p>Agent Lightning supports multiple learning algorithms, including:</p><ul><li><strong>APO (Automatic Prompt Optimization)</strong> — requires additional libraries such as POML</li><li><strong>VERL (Reinforcement Learning)</strong> — integrates with frameworks like PyTorch and vLLM, and requires GPU support</li></ul><p>Note that VERL setup may take 20–40 minutes, depending on dependencies and environment configuration.</p><h4>Setting Up the Environment</h4><p>For this tutorial, we will use Google Colab with GPU support to ensure efficient training.</p><ol><li>Open Google Colab and sign in with your Google account</li><li>Create a new notebook</li><li>Go to Runtime → Change runtime type</li></ol><ul><li>Set Hardware Accelerator to GPU</li><li>Select T4 GPU (recommended)</li></ul><p>4. Click Save</p><h4>Installing dependencies</h4><ul><li>Install the required libraries using the following command.</li></ul><pre>!pip install agentlightning</pre><ul><li>Build a simple QA agent and enable training using Agent Lightning.</li></ul><pre>import os, asyncio, nest_asyncio, logging<br>from getpass import getpass<br><br>os.environ[&quot;AGENTOPS_DISABLE_AUTO_INSTRUMENTATION&quot;] = &quot;true&quot;<br># Prompt for the key at runtime instead of hard-coding it in the notebook<br>os.environ[&quot;OPENAI_API_KEY&quot;] = getpass(&quot;OpenAI API key: &quot;)<br><br>logging.getLogger(&quot;agentlightning.tracer.otel&quot;).setLevel(logging.ERROR)<br>logging.getLogger(&quot;opentelemetry.trace&quot;).setLevel(logging.ERROR)<br><br>from agentlightning import (<br>    LitAgent, LitAgentRunner, OtelTracer,<br>    emit_reward, NamedResources, Rollout,<br>    InMemoryLightningStore,<br>)<br>from agentlightning.types import PromptTemplate<br>from typing import Any, Dict, 
Optional<br>import openai<br><br>nest_asyncio.apply()<br>MODEL = os.getenv(&quot;MODEL&quot;, &quot;gpt-4o-mini&quot;)<br><br>_reward_log: list[float] = []<br><br>class QAAgent(LitAgent):<br>    def rollout(self, task: Dict[str, Any], resources: NamedResources, rollout: Rollout) -&gt; Optional[float]:<br>        sys_prompt = resources[&quot;system_prompt&quot;].template<br>        user = task[&quot;prompt&quot;]<br>        gold = task.get(&quot;answer&quot;, &quot;&quot;).strip().lower()<br><br>        try:<br>            r = openai.chat.completions.create(<br>                model=MODEL,<br>                messages=[<br>                    {&quot;role&quot;: &quot;system&quot;, &quot;content&quot;: sys_prompt},<br>                    {&quot;role&quot;: &quot;user&quot;,   &quot;content&quot;: user},<br>                ],<br>                temperature=0.2,<br>            )<br>            pred = r.choices[0].message.content.strip()<br>        except Exception as e:<br>            pred = f&quot;[error] {e}&quot;<br><br>        def score(pred: str, gold: str) -&gt; float:<br>            P       = pred.lower()<br>            base    = 1.0 if gold and gold in P else 0.0<br>            gt      = set(gold.split()); pr = set(P.split())<br>            inter   = len(gt &amp; pr); denom = (len(gt) + len(pr)) or 1<br>            overlap = 2 * inter / denom<br>            brevity = 0.2 if base == 1.0 and len(P.split()) &lt;= 8 else 0.0<br>            return max(0.0, min(1.0, 0.7 * base + 0.25 * overlap + brevity))<br><br>        reward = float(score(pred, gold))<br>        emit_reward(reward)<br>        _reward_log.append(reward)<br>        print(f&quot;  Q: {user!r:45s} | Pred: {pred!r:20s} | Gold: {gold!r:12s} | R: {reward:.3f}&quot;)<br>        return reward<br><br>TASKS = [<br>    {&quot;prompt&quot;: &quot;Capital of France?&quot;,             &quot;answer&quot;: &quot;Paris&quot;},<br>    {&quot;prompt&quot;: &quot;Who wrote Pride and Prejudice?&quot;, 
&quot;answer&quot;: &quot;Jane Austen&quot;},<br>    {&quot;prompt&quot;: &quot;2+2 = ?&quot;,                        &quot;answer&quot;: &quot;4&quot;},<br>]<br><br>PROMPTS = [<br>    &quot;You are a terse expert. Answer with only the final fact, no sentences.&quot;,<br>    &quot;You are a helpful, knowledgeable AI. Prefer concise, correct answers.&quot;,<br>    &quot;Answer as a rigorous evaluator; return only the canonical fact.&quot;,<br>    &quot;Be a friendly tutor. Give the one-word answer if obvious.&quot;,<br>]<br><br>async def run_prompt_search():<br>    store  = InMemoryLightningStore()<br>    agent  = QAAgent()<br>    tracer = OtelTracer()<br>    runner = LitAgentRunner(tracer=tracer)<br><br>    results = []<br><br>    for sp in PROMPTS:<br>        print(f&quot;\n{&#39;=&#39;*60}&quot;)<br>        print(f&quot;Prompt: {sp}&quot;)<br>        print(&#39;=&#39;*60)<br><br>        await store.update_resources(<br>            resources_id=&quot;default&quot;,<br>            resources={&quot;system_prompt&quot;: PromptTemplate(template=sp, engine=&quot;f-string&quot;)}<br>        )<br>        _reward_log.clear()<br><br>        with runner.run_context(agent=agent, store=store):<br>            for t in TASKS:<br>                await runner.step(t)<br><br>        avg = sum(_reward_log) / len(_reward_log) if _reward_log else 0.0<br>        print(f&quot;\n  → Prompt avg: {avg:.3f}&quot;)<br>        results.append((sp, avg))<br><br>    best = max(results, key=lambda x: x[1])<br>    print(f&quot;\n{&#39;=&#39;*60}&quot;)<br>    print(f&quot;BEST PROMPT : {best[0]}&quot;)<br>    print(f&quot;BEST SCORE  : {best[1]:.3f}&quot;)<br><br>asyncio.run(run_prompt_search())</pre><p>Here’s a step-by-step walkthrough of what the code does:</p><p>1. Setup &amp; Imports: configures API keys, suppresses noisy logs, applies nest_asyncio (needed to run asyncio inside Jupyter), and imports AgentLightning components.</p><p>2. 
QAAgent, the core agent: subclasses LitAgent and implements rollout(), which is called once per task:</p><ul><li>Pulls the current system prompt from resources[&quot;system_prompt&quot;]</li><li>Calls OpenAI with that prompt + the task’s question</li><li>Scores the prediction against the gold answer using a custom score() function that combines exact match, token overlap, and a brevity bonus</li><li>Emits the reward back to AgentLightning via emit_reward() and logs it to _reward_log</li></ul><p>3. TASKS and PROMPTS: TASKS is a small 3-question QA dataset. PROMPTS is a list of 4 candidate system prompts with different styles (terse, helpful, rigorous, friendly); these are the prompts we&#39;re comparing.</p><p>4. run_prompt_search(), the search loop: this is the main async function that runs a manual prompt search:</p><ul><li>Creates an InMemoryLightningStore, a QAAgent, and a LitAgentRunner</li><li>Loops over each candidate prompt, updates it in the store, then runs all 3 tasks through the agent</li><li>Collects the average reward for each prompt</li><li>At the end, picks and prints the best-scoring prompt</li></ul><p>5. Entry point: asyncio.run(run_prompt_search()) kicks everything off. The result is a per-prompt comparison showing which system prompt style gets the highest average reward across the 3 QA tasks.</p><p>6. Run the cell to display the output.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Vg4-aUrh-nFZFUbP9592CQ.png" /></figure><h3>Hands-on 2: Building a Trainable LLM Agent with AgentLightning</h3><p>This hands-on walks you through building a minimal but complete trainable LLM agent using the AgentLightning framework. 
You’ll learn how agents receive tasks, call an LLM, score their own outputs via a reward function, and report results back to a training loop — the core cycle behind reinforcement learning from human/environment feedback (RLHF-style training).</p><p><strong>Part 1 — </strong><strong>compute_reward()</strong></p><pre>def compute_reward(output: str, expected: str) -&gt; float:<br>    &quot;&quot;&quot;<br>    Simple reward logic. +1.0 for correct answer, -1.0 for wrong answer.<br>    &quot;&quot;&quot;<br>    return 1.0 if output.strip().lower() == expected.strip().lower() else -1.0</pre><p>The simplest possible reward function. It does an exact string match between the agent’s output and the expected answer, returning +1.0 for a correct answer and -1.0 for a wrong one. In real-world agents this could be a more nuanced scorer (e.g. F1, BLEU, SQL execution match).</p><p><strong>Part 2 — </strong><strong>SimpleAgent</strong></p><pre>import agentlightning as agl<br>from agentlightning import LitAgent, emit_reward, NamedResources, Rollout<br>from openai import OpenAI<br>from typing import Any, Dict, Optional<br><br>class SimpleAgent(LitAgent):<br>    &quot;&quot;&quot;<br>    A LitAgent that uses an LLM resource (like LitSQLAgent) instead of<br>    hardcoded rule-based logic.<br>    &quot;&quot;&quot;<br><br>    def __init__(self, system_prompt: str = &quot;You are a helpful assistant.&quot;):<br>        super().__init__()<br>        self.system_prompt = system_prompt<br><br>    def rollout(self, task: Dict[str, Any], resources: NamedResources, rollout: Rollout) -&gt; Optional[float]:<br>        # 1. Extract the LLM resource (same as LitSQLAgent)<br>        llm: agl.LLM = resources[&quot;main_llm&quot;]<br><br>        # 2. Build the OpenAI client using the LLM resource endpoint<br>        client = OpenAI(<br>            base_url=llm.endpoint,<br>            api_key=llm.api_key,<br>        )<br><br>        # 3. 
Call the LLM (instead of hardcoded if/elif logic)<br>        question = task.get(&quot;input&quot;, &quot;&quot;)<br>        response = client.chat.completions.create(<br>            model=llm.model,<br>            messages=[<br>                {&quot;role&quot;: &quot;system&quot;, &quot;content&quot;: self.system_prompt},<br>                {&quot;role&quot;: &quot;user&quot;,   &quot;content&quot;: question},<br>            ],<br>            **(llm.sampling_parameters or {}),<br>        )<br>        output = response.choices[0].message.content.strip()<br><br>        # 4. Compute reward and emit it (same as LitSQLAgent)<br>        expected = task.get(&quot;expected_output&quot;, &quot;&quot;)<br>        reward_value = compute_reward(output, expected)<br>        emit_reward(reward_value)<br><br>        print(f&quot;[Agent] Input:    {question}&quot;)<br>        print(f&quot;[Agent] Output:   {output}&quot;)<br>        print(f&quot;[Agent] Expected: {expected}&quot;)<br>        print(f&quot;[Agent] Reward:   {reward_value}&quot;)<br>        print(&quot;-&quot; * 40)<br><br>        return float(reward_value)</pre><p>The agent subclasses LitAgent and implements one required method: rollout(). This is called once per task during training or evaluation. 
Inside it:</p><ul><li><strong>Extracts the LLM resource</strong> from resources[&quot;main_llm&quot;] — this is injected by the runner, not hardcoded</li><li><strong>Builds an OpenAI client</strong> using the endpoint and API key from that resource</li><li><strong>Calls the LLM</strong> with a system prompt + the task’s question</li><li><strong>Scores the response</strong> via compute_reward() and calls emit_reward() to report it back to AgentLightning&#39;s training loop</li><li><strong>Returns the reward</strong> as a float for immediate use</li></ul><p><strong>Part 3 — Training Data and LLM Resource</strong></p><pre>import os<br>os.environ[&quot;AGENTOPS_DISABLE_AUTO_INSTRUMENTATION&quot;] = &quot;true&quot;<br><br>import asyncio<br>import nest_asyncio<br>nest_asyncio.apply()<br><br>import agentlightning as agl<br>from agentlightning import (<br>    LitAgentRunner,<br>    InMemoryLightningStore,<br>    OtelTracer,<br>    LLM,<br>    Trainer,<br>)<br><br># Training data: list of task inputs your agent will be tested on<br>TRAINING_DATA = [<br>    {&quot;input&quot;: &quot;What is 2+2?&quot;,   &quot;expected_output&quot;: &quot;2 + 2 equals 4.&quot;},<br>    {&quot;input&quot;: &quot;Who is Newton?&quot;, &quot;expected_output&quot;: &quot;Newton is the father of classical mechanics.&quot;},<br>    {&quot;input&quot;: &quot;Calculate 2+2&quot;,  &quot;expected_output&quot;: &quot;2 + 2 equals 4.&quot;},<br>]<br><br># LLM resource that the runner injects into the agent through the store.<br># Replace endpoint/model with your own values if you use a different provider.<br><br>def get_llm_resource(temperature: float = 0.7) -&gt; agl.LLM:<br>    # Prompt for the key if it is not already set; never hard-code a real key.<br>    if not os.environ.get(&quot;OPENAI_API_KEY&quot;):<br>        from getpass import getpass<br>        os.environ[&quot;OPENAI_API_KEY&quot;] = getpass(&quot;OpenAI API key: &quot;)<br><br>    return agl.LLM(<br>        endpoint=&quot;https://api.openai.com/v1&quot;,<br>      
  model=&quot;gpt-4o-mini&quot;,<br>        api_key=os.environ.get(&quot;OPENAI_API_KEY&quot;, &quot;&quot;),<br>        sampling_parameters={&quot;temperature&quot;: temperature},<br>    )</pre><p>TRAINING_DATA is a list of task dictionaries — each has an input (what the agent sees) and expected_output (what it should produce, used by the reward function).</p><p>get_llm_resource() creates an agl.LLM object that wraps the model config (endpoint, model name, API key, sampling parameters). This is passed into the store so the agent can access it at runtime without hardcoding credentials inside the agent class.</p><p><strong>Part 4 — </strong><strong>run_step_test() and </strong><strong>run_trainer()</strong></p><pre>def run_step_test():<br>    &quot;&quot;&quot;<br>    Directly tests the agent by calling runner.step() for each training sample.<br>    This bypasses the full Trainer loop and is the easiest way to verify<br>    that agent.py and reward.py are connected and working correctly.<br>    &quot;&quot;&quot;<br>    print(&quot;=&quot; * 40)<br>    print(&quot;Running step-by-step test...&quot;)<br>    print(&quot;=&quot; * 40)<br><br>    tracer = OtelTracer()<br>    store  = InMemoryLightningStore()<br>    agent  = SimpleAgent()<br>    runner = LitAgentRunner(tracer=tracer)<br><br>    async def _run():<br>        # Seed the store with resources before running - the runner requires this<br>        await store.update_resources(<br>            resources_id=&quot;default&quot;, <br>            resources={&quot;main_llm&quot;: get_llm_resource(temperature=0.7)}<br>        )<br><br>        with runner.run_context(agent=agent, store=store):<br>            for task in TRAINING_DATA:<br>                rollout = await runner.step(task)<br>                print(f&quot;[Rollout status] {rollout.status}\n&quot;)<br><br>    asyncio.run(_run())<br>    print(&quot;Step test completed successfully!&quot;)<br><br>def run_trainer():<br>    &quot;&quot;&quot;<br>    Full 
Trainer-based training loop.<br>    Uses dev=True for a dry-run. Switch to dev=False and trainer.fit()<br>    for real training with an algorithm.<br>    &quot;&quot;&quot;<br>    print(&quot;=&quot; * 40)<br>    print(&quot;Running Trainer (dev mode)...&quot;)<br>    print(&quot;=&quot; * 40)<br><br>    trainer = Trainer(<br>        runner=LitAgentRunner,<br>        dev=True,<br>        initial_resources={&quot;main_llm&quot;: get_llm_resource(temperature=0.7)},<br>    )<br>    # Instantiate the agent<br>    agent = SimpleAgent()<br>    trainer.dev(agent=agent, train_dataset=TRAINING_DATA, val_dataset=TRAINING_DATA)<br>    print(&quot;Trainer dev run completed successfully!&quot;)<br><br>if __name__ == &quot;__main__&quot;:<br>    # Step 1: Test agent + reward wiring directly (recommended first)<br>    run_step_test()<br><br>    # Step 2: Exercise the full Trainer loop in dev mode (comment out to skip)<br>    run_trainer()</pre><p><strong>run_step_test </strong>is the recommended first test — it bypasses the full Trainer and directly calls runner.step() for each task. This lets you verify that the agent, LLM, and reward function are all wired correctly before introducing the complexity of a training algorithm. The flow is:</p><ul><li>Seed the store with the LLM resource</li><li>Open a run_context that binds the agent and store to the runner</li><li>Step through each task one by one, printing the rollout status</li></ul><p><strong>run_trainer </strong>is the full Trainer-based loop. dev=True runs a dry-run — it exercises the full pipeline without running a real optimization algorithm, which is useful for integration testing. 
Switching to dev=False and calling trainer.fit() with a real algorithm (like APO) would enable actual prompt or model optimization.</p><p><strong>Part 5 — Execute the code</strong></p><p>Run the cells sequentially to display the output.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*YQOnpZokf3w1mWtoN8B-IA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*4n0a4QgcHycabZWyMSqlmA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*X-qhQSRP4RYI63h1AjSn8w.png" /></figure><h3>Hands-on 3: Sentiment Analysis Agent with AgentLightning</h3><p>This hands-on demonstrates how to build a prompt-optimizing sentiment analysis agent using AgentLightning. The agent classifies text as “positive” or “negative” using an LLM, then automatically tests multiple prompt styles to find which one performs best, all without writing a training loop from scratch.</p><p><strong>Step 1 — Install &amp; Setup</strong></p><pre>!pip install poml</pre><p>Installs poml, a dependency required by AgentLightning&#39;s APO (Automatic Prompt Optimization) module.</p><p><strong>Step 2 — Imports &amp; Environment</strong></p><p>Sets the OpenAI key, disables AgentOps auto-instrumentation (avoids TracerProvider conflicts), and silences noisy but harmless OpenTelemetry warnings.</p><pre>import os<br>import agentlightning as agl<br>from openai import OpenAI, AsyncOpenAI<br>from agentlightning import LitAgent, LitAgentRunner, OtelTracer, emit_reward, NamedResources, Rollout, InMemoryLightningStore, PromptTemplate<br># Any, Dict and Optional are used in the agent&#39;s type hints below<br>from typing import Any, Dict, Optional, TypedDict<br>import logging<br>import asyncio<br>import nest_asyncio<br>nest_asyncio.apply()<br><br>os.environ[&quot;OPENAI_API_KEY&quot;] = &quot;...&quot;<br>os.environ[&quot;AGENTOPS_DISABLE_AUTO_INSTRUMENTATION&quot;] = &quot;true&quot;<br>logging.getLogger(&quot;agentlightning.tracer.otel&quot;).setLevel(logging.ERROR)</pre><p><strong>Step 3 — Create agents</strong></p><p>The agent subclasses LitAgent and 
implements rollout(), which runs once per task. It pulls the LLM and prompt template from resources, calls OpenAI, compares the prediction to the gold label, and reports a binary reward (1.0 correct, 0.0 wrong).</p><pre># ── Agent ─────────────<br>class SentimentAgent(LitAgent):<br>    def rollout(self, task: Dict[str, Any], resources: NamedResources, rollout: Rollout) -&gt; Optional[float]:<br>        # Pull resources exactly like QAAgent pulls &quot;system_prompt&quot;<br>        llm         = resources[&quot;main_llm&quot;]<br>        prompt_tmpl = resources[&quot;prompt_template&quot;].template<br><br>        # Format the prompt with the task text<br>        prompt = prompt_tmpl.format(**task)<br><br>        # Build client from resource<br>        client = OpenAI(<br>            base_url=llm.endpoint,<br>            api_key=llm.api_key,<br>        )<br><br>        response = client.chat.completions.create(<br>            model=llm.model,<br>            messages=[{&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: prompt}],<br>            **(llm.sampling_parameters or {}),<br>        )<br><br>        pred = response.choices[0].message.content.strip().lower().rstrip(&quot;.&quot;)<br>        gold = task[&quot;expected_label&quot;].lower()<br>        reward = 1.0 if pred == gold else 0.0<br><br>        emit_reward(reward)<br>        print(f&quot;  Text: {task[&#39;text&#39;]!r:40s} | Pred: {pred!r:12s} | Gold: {gold!r:12s} | R: {reward:.1f}&quot;)<br>        return reward<br><br># ── Data ──────────────────────────<br>TRAIN_TASKS = [<br>    {&quot;text&quot;: &quot;I love this product!&quot;,                &quot;expected_label&quot;: &quot;positive&quot;},<br>    {&quot;text&quot;: &quot;This is terrible and disappointing.&quot;, &quot;expected_label&quot;: &quot;negative&quot;},<br>    {&quot;text&quot;: &quot;An excellent experience overall.&quot;,    &quot;expected_label&quot;: &quot;positive&quot;},<br>    {&quot;text&quot;: &quot;I will never buy this 
again.&quot;,        &quot;expected_label&quot;: &quot;negative&quot;},<br>]<br><br>PROMPTS = [<br>    &#39;Classify the sentiment. Reply with exactly one word: positive or negative.\nText: &quot;{text}&quot;\nSentiment:&#39;,<br>    &#39;Is the following text positive or negative? Reply with one word only.\nText: &quot;{text}&quot;&#39;,<br>    &#39;Sentiment analysis: respond only with &quot;positive&quot; or &quot;negative&quot;.\nInput: &quot;{text}&quot;&#39;,<br>]<br><br># ── Resources ────────────────────────────────<br>def make_resources(prompt_template: str) -&gt; dict:<br>    return {<br>        &quot;main_llm&quot;: agl.LLM(<br>            model=&quot;gpt-4o-mini&quot;,<br>            endpoint=&quot;https://api.openai.com/v1&quot;,<br>            api_key=os.environ[&quot;OPENAI_API_KEY&quot;],<br>            sampling_parameters={&quot;temperature&quot;: 0.2},<br>        ),<br>        &quot;prompt_template&quot;: PromptTemplate(template=prompt_template, engine=&quot;f-string&quot;),<br>    }<br><br># ── Prompt search loop ──────────────────<br>async def run_sentiment_search():<br>    store  = InMemoryLightningStore()<br>    agent  = SentimentAgent()<br>    tracer = OtelTracer()<br>    runner = LitAgentRunner(tracer=tracer)<br><br>    results = []<br><br>    for prompt_tmpl in PROMPTS:<br>        print(f&quot;\n{&#39;=&#39;*60}&quot;)<br>        print(f&quot;Prompt: {prompt_tmpl[:60]}...&quot;)<br>        print(&#39;=&#39;*60)<br><br>        await store.update_resources(<br>            resources_id=&quot;default&quot;,<br>            resources=make_resources(prompt_tmpl)<br>        )<br><br>        reward_log = []<br>        with runner.run_context(agent=agent, store=store):<br>            for task in TRAIN_TASKS:<br>                result = await runner.step(task)<br>                if result and hasattr(result, &quot;reward&quot;):<br>                    reward_log.append(result.reward)<br><br>        avg = sum(reward_log) / len(reward_log) if reward_log 
else 0.0<br>        print(f&quot;\n  → Prompt avg: {avg:.3f}&quot;)<br>        results.append((prompt_tmpl, avg))<br><br>    best = max(results, key=lambda x: x[1])<br>    print(f&quot;\n{&#39;=&#39;*60}&quot;)<br>    print(f&quot;BEST PROMPT : {best[0]}&quot;)<br>    print(f&quot;BEST SCORE  : {best[1]:.3f}&quot;)<br><br>asyncio.run(run_sentiment_search())</pre><p><strong>Step 4 — Execute the code</strong></p><p>Run the cells sequentially to display the output.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1023/1*-2vxRkE-y9BtkTB2O-VePw.png" /></figure><h3>Hands-on 4: LangGraph SQL Agent with AgentLightning</h3><p>A notable example from Microsoft’s research highlights the use of Agent Lightning to train AI agents that can generate and iteratively refine SQL queries.</p><p>This hands-on builds a Text-to-SQL agent that takes natural language questions, generates SQLite queries using an LLM, executes them, and automatically retries on errors. It uses LangGraph for the query-generate-execute-fix loop and AgentLightning for training orchestration.</p><p><strong>Step 1 — Downloading the Dataset</strong></p><p>Download the Spider dataset from Google Drive and extract it into the working directory to prepare it for use in the tutorial.</p><pre>!gdown --fuzzy https://drive.google.com/file/d/1oi9J1jZP9TyM35L85CL3qeGWl2jqlnL6/view<br>!unzip -q spider-data.zip -d data &amp;&amp; rm spider-data.zip</pre><p><strong>Step 2 — Install dependencies</strong></p><p>Install the dependencies: agentlightning[verl], torch, flash-attn, langchain, langchain-openai, langchain-community, langchain-text-splitters, and faiss-cpu.</p><pre>!pip install agentlightning[verl]<br><br># For Colab/Kaggle (CUDA 12.1):<br>!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121<br><br># For CPU-only environments:<br># !pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu<br><br>!pip install flash-attn 
--no-build-isolation<br>!pip install pyairports --break-system-packages<br><br>!pip install langchain<br>!pip install langchain-openai<br>!pip install langchain-community<br>!pip install langchain-text-splitters<br>!pip install faiss-cpu</pre><p><strong>Step 3— Import libraries</strong></p><pre>import os<br>import sqlite3<br>from typing import TypedDict<br>from langgraph.graph import StateGraph<br>from langchain_openai import ChatOpenAI<br>from langchain_core.messages import HumanMessage, SystemMessage<br>import pandas as pd<br>import agentlightning as agl<br>import nest_asyncio<br>nest_asyncio.apply()<br><br>os.environ[&quot;AGENTOPS_DISABLE_AUTO_INSTRUMENTATION&quot;] = &quot;true&quot;<br>os.environ[&quot;OPENAI_API_BASE&quot;] = &quot;https://api.openai.com/v1&quot;<br>os.environ[&quot;OPENAI_API_KEY&quot;] = &quot;...&quot;<br><br># Silence noisy but harmless warnings<br>import logging<br>logging.getLogger(&quot;agentlightning.tracer.otel&quot;).setLevel(logging.ERROR)<br>logging.getLogger(&quot;agentlightning.tracer.agentops&quot;).setLevel(logging.ERROR)<br>logging.getLogger(&quot;opentelemetry.trace&quot;).setLevel(logging.ERROR)</pre><p><strong>Step 4— Define state schema</strong></p><p>Defines the data that flows between nodes in the LangGraph graph. Every node reads from and writes to this shared state dict. get_schema extracts the table definitions so the LLM knows what columns exist. 
run_sql executes a query and returns results or an error string — the error string is what triggers the retry loop.</p><pre># ── State schema (required by StateGraph) ──────────────────────<br>class SQLState(TypedDict):<br>    question:    str<br>    schema:      str<br>    query:       str<br>    result:      str<br>    error:       str<br>    attempts:    int<br><br># ── SQL helpers ───────────────────<br>def get_schema(database_path: str) -&gt; str:<br>    conn = sqlite3.connect(database_path.replace(&quot;sqlite:///&quot;, &quot;&quot;))<br>    cursor = conn.cursor()<br>    cursor.execute(&quot;SELECT sql FROM sqlite_master WHERE type=&#39;table&#39;&quot;)<br>    schema = &quot;\n&quot;.join(row[0] for row in cursor.fetchall() if row[0])<br>    conn.close()<br>    return schema<br><br>def run_sql(query: str, db_path: str):<br>    try:<br>        conn = sqlite3.connect(db_path.replace(&quot;sqlite:///&quot;, &quot;&quot;))<br>        cursor = conn.cursor()<br>        cursor.execute(query)<br>        result = cursor.fetchall()<br>        conn.close()<br>        return result<br>    except Exception as e:<br>        return f&quot;ERROR: {e}&quot;</pre><p><strong>Step 5— LangGraph Agent</strong></p><p>Builds a 4-node graph: write_query → execute_query → check_query → conditionally rewrite_query or end. 
The key routing logic is in should_rewrite: if there&#39;s an error and attempts remain, it loops back to fix the query; otherwise it exits.</p><pre># ── LangGraph agent builder ───────────────<br>def build_langgraph_sql_agent(<br>    database_path: str,<br>    openai_base_url: str,<br>    model: str,<br>    sampling_parameters: dict,<br>    max_turns: int,<br>    truncate_length: int,<br>):<br>    llm = ChatOpenAI(<br>        base_url=openai_base_url,<br>        model=model,<br>        **sampling_parameters,<br>    )<br>    schema = get_schema(database_path)<br><br>    def write_query(state: SQLState) -&gt; SQLState:<br>        response = llm.invoke([<br>            SystemMessage(&quot;You are a SQL expert. Write a SQLite query to answer the question. Return ONLY the SQL query.&quot;),<br>            HumanMessage(f&quot;Schema:\n{schema}\n\nQuestion: {state[&#39;question&#39;]}&quot;),<br>        ])<br>        return {**state, &quot;query&quot;: response.content.strip(), &quot;attempts&quot;: 0}<br><br>    def execute_query(state: SQLState) -&gt; SQLState:<br>        result = run_sql(state[&quot;query&quot;], database_path)<br>        if isinstance(result, str) and result.startswith(&quot;ERROR&quot;):<br>            return {**state, &quot;error&quot;: result, &quot;result&quot;: &quot;&quot;}<br>        return {**state, &quot;result&quot;: str(result)[:truncate_length], &quot;error&quot;: &quot;&quot;}<br><br>    def check_query(state: SQLState) -&gt; SQLState:<br>        # Pass through; routing logic handled by conditional edge below<br>        return state<br><br>    def rewrite_query(state: SQLState) -&gt; SQLState:<br>        response = llm.invoke([<br>            SystemMessage(&quot;You are a SQL expert. Fix the SQL query based on the error. 
Return ONLY the corrected SQL query.&quot;),<br>            HumanMessage(<br>                f&quot;Schema:\n{schema}\n\nOriginal query: {state[&#39;query&#39;]}\n&quot;<br>                f&quot;Error: {state[&#39;error&#39;]}\nQuestion: {state[&#39;question&#39;]}&quot;<br>            ),<br>        ])<br>        return {**state, &quot;query&quot;: response.content.strip(), &quot;attempts&quot;: state[&quot;attempts&quot;] + 1}<br><br>    def should_rewrite(state: SQLState) -&gt; str:<br>        if state.get(&quot;error&quot;) and state[&quot;attempts&quot;] &lt; max_turns:<br>            return &quot;rewrite_query&quot;<br>        return &quot;__end__&quot;<br><br>    builder = StateGraph(SQLState)<br><br>    builder.add_node(&quot;write_query&quot;,   write_query)<br>    builder.add_node(&quot;execute_query&quot;, execute_query)<br>    builder.add_node(&quot;check_query&quot;,   check_query)<br>    builder.add_node(&quot;rewrite_query&quot;, rewrite_query)<br><br>    builder.add_edge(&quot;__start__&quot;,    &quot;write_query&quot;)<br>    builder.add_edge(&quot;write_query&quot;,  &quot;execute_query&quot;)<br>    builder.add_edge(&quot;execute_query&quot;,&quot;check_query&quot;)<br><br>    builder.add_conditional_edges(&quot;check_query&quot;, should_rewrite)<br>    builder.add_edge(&quot;rewrite_query&quot;, &quot;execute_query&quot;)<br><br>    return builder.compile()</pre><p><strong>Step 6— Reward function</strong></p><p>Rewards execution equivalence, not exact string match — two different SQL queries that return the same rows both get 1.0. 
This is more robust than comparing SQL strings directly.</p><pre># ── Reward function ────────────────────────<br>def evaluate_query(predicted_query: str, ground_truth_query: str, db_path: str, raise_on_error: bool = False) -&gt; float:<br>    try:<br>        result_pred = run_sql(predicted_query, db_path)<br>        result_true = run_sql(ground_truth_query, db_path)<br>        # run_sql reports failures as an &quot;ERROR: ...&quot; string instead of raising,<br>        # so surface them here; otherwise two failing queries could compare equal<br>        for res in (result_pred, result_true):<br>            if isinstance(res, str):<br>                raise RuntimeError(res)<br>        return 1.0 if result_pred == result_true else 0.0<br>    except Exception as e:<br>        if raise_on_error:<br>            raise<br>        return 0.0</pre><p><strong>Step 7— The Agent</strong></p><p>Subclasses LitAgent and wires everything together in rollout(). Builds the LangGraph agent from the injected LLM resource, runs it on the task, computes the reward, and emits it back to AgentLightning.</p><pre># ── Agent ─────────────────────────────────────────────────────────────────────<br>class LitSQLAgent(agl.LitAgent):<br>    def __init__(self, max_turns: int, truncate_length: int):<br>        super().__init__()<br>        self.max_turns = max_turns<br>        self.truncate_length = truncate_length<br><br>    def rollout(<br>        self,<br>        task: dict,<br>        resources: agl.NamedResources,<br>        rollout: agl.Rollout,<br>    ) -&gt; float:<br>        llm: agl.LLM = resources[&quot;main_llm&quot;]<br><br>        # Resolve the SQLite file for this task (same path convention the reward step uses)<br>        db_file = task.get(&quot;db_path&quot;, f&quot;data/database/{task[&#39;db_id&#39;]}/{task[&#39;db_id&#39;]}.sqlite&quot;)<br><br>        agent = build_langgraph_sql_agent(<br>            database_path=&quot;sqlite:///&quot; + db_file,<br>            openai_base_url=llm.endpoint,    # use endpoint directly<br>            model=llm.model,<br>            sampling_parameters=llm.sampling_parameters or {},<br>            max_turns=self.max_turns,<br>            truncate_length=self.truncate_length,<br>        )<br><br>        # Safely get langchain handler only if available<br>        callbacks = []<br>        tracer = self.get_tracer()<br>        if hasattr(tracer, &quot;get_langchain_handler&quot;):<br>            callbacks.append(tracer.get_langchain_handler())<br><br>        result = 
agent.invoke(<br>            {&quot;question&quot;: task[&quot;question&quot;], &quot;schema&quot;: &quot;&quot;, &quot;query&quot;: &quot;&quot;,<br>             &quot;result&quot;: &quot;&quot;, &quot;error&quot;: &quot;&quot;, &quot;attempts&quot;: 0},<br>            {&quot;callbacks&quot;: callbacks, &quot;recursion_limit&quot;: 100},<br>        )<br><br>        reward = evaluate_query(<br>            result.get(&quot;query&quot;, &quot;&quot;),<br>            task.get(&quot;query&quot;, task.get(&quot;sql&quot;, &quot;&quot;)),   # ground truth SQL<br>            task.get(&quot;db_path&quot;, f&quot;data/database/{task[&#39;db_id&#39;]}/{task[&#39;db_id&#39;]}.sqlite&quot;),<br>            raise_on_error=False,<br>        )<br>        print(f&quot;[Rollout] Q: {task[&#39;question&#39;][:50]} | Reward: {reward:.3f}&quot;)<br>        agl.emit_reward(reward)<br>        return reward</pre><p><strong>Step 8— VERL Config</strong></p><p>Configures the VERL reinforcement learning algorithm — uses GRPO, the hf (HuggingFace) rollout engine, and fine-tunes Qwen2.5-Coder-1.5B. 
Requires a GPU node to run.</p><pre># ── VERL config ───────────────────────────────────────────────────────────────<br>verl_config = {<br>    &quot;algorithm&quot;: {&quot;adv_estimator&quot;: &quot;grpo&quot;, &quot;use_kl_in_reward&quot;: False},<br>    &quot;data&quot;: {<br>        &quot;train_batch_size&quot;: 8,<br>        &quot;max_prompt_length&quot;: 4096,<br>        &quot;max_response_length&quot;: 2048,<br>    },<br>    &quot;actor_rollout_ref&quot;: {<br>        &quot;rollout&quot;: {<br>            &quot;name&quot;: &quot;hf&quot;,<br>            &quot;n&quot;: 4,<br>            &quot;tensor_model_parallel_size&quot;: 1,<br>            &quot;multi_turn&quot;: {&quot;format&quot;: &quot;hermes&quot;},<br>        },<br>        &quot;actor&quot;: {&quot;ppo_mini_batch_size&quot;: 8, &quot;optim&quot;: {&quot;lr&quot;: 1e-6}},<br>        &quot;model&quot;: {&quot;path&quot;: &quot;Qwen/Qwen2.5-Coder-1.5B-Instruct&quot;},<br>    },<br>    &quot;trainer&quot;: {<br>        &quot;n_gpus_per_node&quot;: 1,<br>        &quot;val_before_train&quot;: True,<br>        &quot;test_freq&quot;: 32,<br>        &quot;save_freq&quot;: 64,<br>        &quot;total_epochs&quot;: 1,<br>    },<br>}</pre><p><strong>Step 9— Run Modes</strong></p><p>run_dev() is for local testing — it uses the OpenAI API and a small 10-sample subset. run_training() is for full RL fine-tuning on the Spider dataset and requires a Ray cluster with GPUs. 
Always test with run_dev() first.</p><pre># ── Full training ───────────────────────<br>def run_training():<br>    agent     = LitSQLAgent(max_turns=3, truncate_length=1024)<br>    algorithm = agl.VERL(verl_config)<br><br>    trainer = agl.Trainer(<br>        n_runners=10,<br>        algorithm=algorithm,<br>        adapter=agl.TracerTraceToTriplet(agent_match=&quot;write|rewrite&quot;),<br>    )<br><br>    train_data = pd.read_parquet(&quot;data/train_spider.parquet&quot;).to_dict(&quot;records&quot;)<br>    val_data   = pd.read_parquet(&quot;data/test_dev_500.parquet&quot;).to_dict(&quot;records&quot;)<br><br>    trainer.fit(agent, train_data, val_dataset=val_data)<br><br><br># ── Dev / dry-run ─────────────────────────<br>def run_dev():<br>    agent = LitSQLAgent(max_turns=3, truncate_length=1024)<br><br>    trainer = agl.Trainer(<br>        n_workers=1,<br>        initial_resources={<br>            &quot;main_llm&quot;: agl.LLM(<br>                endpoint=os.environ[&quot;OPENAI_API_BASE&quot;],<br>                model=&quot;gpt-4o-mini&quot;,<br>                api_key=os.environ[&quot;OPENAI_API_KEY&quot;],<br>                sampling_parameters={&quot;temperature&quot;: 0.7},<br>            )<br>        },<br>    )<br><br>    # Load a small subset; make sure the parquet file path is correct<br>    dev_data = pd.read_parquet(&quot;data/test_dev_500.parquet&quot;).to_dict(&quot;records&quot;)[:10]<br><br>    trainer.dev(agent, dev_data)<br><br>if __name__ == &quot;__main__&quot;:<br>    run_dev()<br>    # run_training()</pre><p><strong>Step 10 — Execute the code</strong></p><p>Call <strong>run_dev()</strong> to display the output.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*SWBUDIil1JlXDwtbKYPi-Q.png" /></figure><p><em>Thanks for reading this article!</em></p><p>If you enjoyed this article, please click on the clap button 👏 and share to help others find it!</p><p>The full source code for this tutorial can be found 
here,</p><ul><li><a href="https://colab.research.google.com/drive/1c93D61rpOHevfxbAe4qZbtjb2AjhGKlp?usp=sharing">Google Colab</a></li><li><a href="https://github.com/codemaker2015/agent-lightning-demo">GitHub - codemaker2015/agent-lightning-demo: agent lightning demo</a></li></ul><h4>Resources</h4><ul><li><a href="https://microsoft.github.io/agent-lightning/latest/how-to/train-first-agent/">Train the First Agent - Agent-lightning</a></li><li><a href="https://github.com/microsoft/agent-lightning/tree/main/docs/tutorials">agent-lightning/docs/tutorials at main · microsoft/agent-lightning</a></li></ul>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Zvec: Reimagining Vector Databases with SQLite-Style Simplicity]]></title>
            <link>https://codemaker2016.medium.com/zvec-reimagining-vector-databases-with-sqlite-style-simplicity-e76b247b6555?source=rss-b7a60e472dfc------2</link>
            <guid isPermaLink="false">https://medium.com/p/e76b247b6555</guid>
            <category><![CDATA[embedding]]></category>
            <category><![CDATA[python-programming]]></category>
            <category><![CDATA[llm]]></category>
            <category><![CDATA[python]]></category>
            <category><![CDATA[vector-database]]></category>
            <dc:creator><![CDATA[Vishnu Sivan]]></dc:creator>
            <pubDate>Sat, 28 Feb 2026 13:28:08 GMT</pubDate>
            <atom:updated>2026-02-28T13:28:08.417Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*DTjHJD0y804h-cDw4IZR0g.png" /></figure><p>The research team at Alibaba Tongyi Lab has introduced Zvec, an open-source in-process vector database purpose-built for edge and on-device retrieval workloads. Described as “the SQLite of vector databases,” it runs as a lightweight embedded library inside your application. It requires no external services, no background daemons, and no network communication layer.</p><p>As Retrieval-Augmented Generation (RAG), semantic search, and AI agents increasingly move toward local-first and privacy-preserving deployments, Zvec addresses a growing infrastructure gap: how to deliver high-performance vector search with full persistence and metadata support, without the operational complexity of a server-based system.</p><p>This article explores why Zvec matters, how it is architected, where it fits in the vector database ecosystem, and why embedded vector search may become a core building block for the next generation of AI applications.</p><h3>Getting Started</h3><h4>Table of contents</h4><ul><li><a href="#07a5">What is Zvec</a></li><li><a href="#984f">How Zvec Works</a></li><li><a href="#9016">Why Zvec Matters</a></li><li><a href="#0c21">Performance Benchmarks</a></li><li><a href="#4cca">RAG-Focused Features</a></li><li><a href="#184d"><strong>Hands-On 1: Creating Your First Embedded Vector Database with Zvec</strong></a></li><li><a href="#58a4">Setting up the environment</a></li><li><a href="#24d8">Creating and querying a Zvec collection</a></li><li><a href="#90a2"><strong>Hands-On 2: Building a Zvec-based FAQ Agent for customer queries</strong></a></li><li><a href="#2fef">Preparing sample FAQ data</a></li><li><a href="#62a5">Creating Zvec collection</a></li><li><a href="#e14b">Building retrieval function</a></li><li><a href="#1792">Connecting to LLM for answer generation</a></li><li><a href="#5c73">Testing FAQ 
agent</a></li></ul><h4>What is Zvec</h4><p>Zvec is an open-source, in-process vector database developed by Alibaba that makes high-performance similarity search easy to embed directly into applications without running a separate server or service. It is designed to be lightweight, production-ready, and useful for AI workflows like semantic search, retrieval-augmented generation (RAG), recommendation systems, and other similarity-based tasks, all within the same process as your application.</p><p>At its core, Zvec is a simple library that you use just as you would an embedded relational database such as SQLite, but for vectors instead of tables. Because it runs “in-process,” there is no network layer, no external daemon to manage, and minimal configuration required. This zero-ops design makes it especially well-suited for local development, edge devices, desktop tools, command-line utilities, and other environments where deploying a separate vector database service would be impractical.</p><h4>How Zvec Works</h4><p>Zvec is built on Proxima, Alibaba Group’s high-performance vector search engine that has been battle-tested in large-scale production environments. 
It exposes a simple API that lets developers define collections, insert documents with vectors, and run similarity queries — all in a few lines of code.</p><p>Zvec supports:</p><ul><li>Dense and sparse vector types</li><li>Multi-vector queries</li><li>Hybrid search with scalar filters</li><li>Fast approximate nearest neighbor (ANN) search</li><li>Hybrid search optimizations</li><li>Resource control for CPU and memory, making it stable in constrained environments like mobile or edge devices</li></ul><p>Because Zvec runs inside your application process, it can seamlessly integrate into Python projects, notebooks, CLI tools, or edge applications without additional infrastructure.</p><h4>Why Zvec Matters</h4><p>RAG and semantic search systems require more than just a similarity index — they need vector storage, metadata (scalar fields), full CRUD operations, and reliable persistence as local knowledge bases constantly evolve.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*DJTUrnXyqWiolcTP.png" /><figcaption><a href="https://zvec.org/en/blog/introduction/">Zvec: The SQLite of Vector Databases | Zvec</a></figcaption></figure><p>Libraries like Faiss offer fast nearest neighbor search but lack built-in storage, crash recovery, and hybrid query capabilities, forcing developers to build additional infrastructure. Extensions such as DuckDB with VSS support add vector search but provide limited indexing flexibility and weaker resource control for edge environments. 
Service-based platforms like Milvus or managed vector cloud solutions require separate deployment and network communication, which can be excessive for on-device applications.</p><p>Zvec is designed specifically for these local use cases, offering a vector-native engine with built-in persistence, resource governance, and RAG-focused features, all packaged as a lightweight embedded library.</p><p>Zvec’s embedded approach enables:</p><ul><li>Local semantic search and RAG without external services</li><li>Fast prototyping and development</li><li>Edge and IoT usage with resource governance</li><li>Simplified integration comparable to SQLite’s impact on relational storage</li></ul><p>This positions Zvec as a key infrastructure component for applications that need high-quality vector retrieval but require a lightweight, zero-ops operational model.</p><h4>Performance Benchmarks</h4><p>Zvec is designed for CPU-bound, high-throughput similarity search workloads. The engine leverages multithreading, cache-efficient memory layouts, SIMD optimizations, and prefetching techniques to maximize performance on modern processors.</p><p>According to reported results from VectorDBBench (Cohere 10M dataset), Zvec achieves over 8,000 queries per second (QPS) while maintaining matched recall. In the same benchmark configuration, it reportedly delivered more than twice the throughput of the previous top-performing system, while also reducing index build time.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*mq6m44h-Tfc-wjol.png" /><figcaption><a href="https://zvec.org/en/blog/introduction/">Zvec: The SQLite of Vector Databases | Zvec</a></figcaption></figure><h4>RAG-Focused Features</h4><p>Zvec is specifically optimized for retrieval-augmented generation (RAG) and AI agent workflows. 
It provides capabilities that go beyond basic vector indexing, making it suitable for dynamic, production-ready knowledge systems.</p><ul><li><strong>Full CRUD support</strong> for managing mutable local knowledge bases.</li><li><strong>Schema evolution</strong>, allowing developers to modify indexing strategies and fields as application requirements evolve.</li><li><strong>Multi-vector retrieval</strong>, enabling the combination of multiple embedding channels for richer semantic matching.</li><li><strong>Built-in reranking mechanisms</strong>, including weighted fusion and Reciprocal Rank Fusion (RRF), to improve retrieval relevance.</li><li><strong>Scalar-vector hybrid search</strong>, with scalar filters pushed down into the index path for efficient execution, along with optional inverted indexes for attribute-based filtering.</li></ul><p>These capabilities make Zvec well-suited for applications that require flexible indexing, evolving schemas, and high-quality retrieval — all within an embedded, zero-ops deployment model.</p><h3>Hands-On 1: Creating Your First Embedded Vector Database with Zvec</h3><p>In this section, you will create your first Zvec collection, insert vector documents, and perform a similarity search — all within a few lines of Python code.</p><p>For this hands-on guide, we will be using Google Colab with a T4 GPU.</p><h4>Setting up the environment</h4><p><strong>Step 1: Create and Configure Your Colab Notebook</strong></p><ol><li>Open <a href="https://colab.research.google.com/">Google Colaboratory</a> and sign in with your Google account.</li><li>Create a new notebook by clicking on + New Notebook.</li><li>Navigate to Runtime → Change runtime type.</li></ol><ul><li>Set Hardware Accelerator to GPU.</li><li>Choose T4 GPU (recommended for this tutorial). 
It is recommended to use a GPU, as the CPU runtime may crash while creating the Zvec collection due to resource constraints.</li><li>Click Save.</li></ul><p><strong>Step 2: Add Hugging Face Access Token (Optional but Recommended)</strong></p><p>If you’re pulling models from Hugging Face, you’ll need an access token:</p><ol><li>In the left sidebar, select the 🔑 Secrets tab.</li><li>Add a new secret:</li></ol><ul><li>Key: HF_TOKEN</li><li>Value: <em>Your Hugging Face access token</em></li></ul><p>3. Generate a write token from your <a href="https://huggingface.co/settings/tokens">Hugging Face profile settings</a> → Create new token → Select write → Provide token name → Click on Create token.</p><p><strong>Step 3: Install Dependencies</strong></p><p>Use the following script to install all necessary packages:</p><pre>!pip install zvec</pre><h4>Creating and querying a Zvec collection</h4><pre>import zvec<br> <br># Define collection schema<br>schema = zvec.CollectionSchema(<br>    name=&quot;example&quot;,<br>    vectors=zvec.VectorSchema(&quot;embedding&quot;, zvec.DataType.VECTOR_FP32, 4),<br>)<br> <br># Create collection<br>collection = zvec.create_and_open(path=&quot;./zvec_example&quot;, schema=schema,)<br> <br># Insert documents<br>collection.insert([<br>    zvec.Doc(id=&quot;doc_1&quot;, vectors={&quot;embedding&quot;: [0.1, 0.2, 0.3, 0.4]}),<br>    zvec.Doc(id=&quot;doc_2&quot;, vectors={&quot;embedding&quot;: [0.2, 0.3, 0.4, 0.1]}),<br>])<br> <br># Search by vector similarity<br>results = collection.query(<br>    zvec.VectorQuery(&quot;embedding&quot;, vector=[0.4, 0.3, 0.3, 0.1]),<br>    topk=10<br>)<br> <br># Results: list of {&#39;id&#39;: str, &#39;score&#39;: float, ...}, sorted by relevance <br>print(results)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/680/1*Qi3BE4TjsXmFgeLBicZK_w.png" /></figure><ul><li><strong>Define a </strong><strong>CollectionSchema</strong>: Specify vector 
fields (and optional scalar fields) that describe how your data will be stored.</li><li><strong>Create or open a collection</strong>: Use create_and_open() to initialize a new collection or load an existing one from disk.</li><li><strong>Insert documents</strong>: Add Doc objects containing unique IDs, embedding vectors, and optional metadata attributes.</li><li><strong>Build indexes and query</strong>: Run VectorQuery operations to retrieve the nearest neighbors based on vector similarity.</li><li><strong>Consume results</strong>: Results are returned as dictionaries containing document IDs and similarity scores, sorted by relevance. These can directly power a local semantic search engine or serve as the retrieval layer for a RAG pipeline.</li></ul><p>With this simple workflow, you now have a fully functional embedded vector database running inside your application — ready to scale from toy examples to production-ready local AI systems.</p><h3>Hands-On 2: Building a <strong>Zvec-based FAQ Agent for customer queries</strong></h3><p>This section provides a comprehensive, step-by-step guide to building a Zvec-based FAQ agent capable of handling large volumes of customer queries — similar to the support systems used by major e-commerce platforms.</p><h4>Installing dependencies</h4><p>In this hands-on tutorial, we will use Sentence Transformers to generate high-quality semantic embeddings and OpenAI’s language models to generate natural, context-aware responses for users.</p><pre>!pip install zvec sentence-transformers openai</pre><h4>Preparing sample FAQ data</h4><p>Create a structured FAQ dataset that will serve as the knowledge base for your Zvec-powered retrieval system, containing common customer questions and their corresponding answers for semantic search.</p><pre>faq_data = [<br>    {<br>        &quot;id&quot;: &quot;faq_1&quot;,<br>        &quot;question&quot;: &quot;How do I return a product?&quot;,<br>        &quot;answer&quot;: &quot;You can return a 
product within 7 days by going to Orders → Select item → Click Return.&quot;<br>    },<br>    {<br>        &quot;id&quot;: &quot;faq_2&quot;,<br>        &quot;question&quot;: &quot;How long does delivery take?&quot;,<br>        &quot;answer&quot;: &quot;Standard delivery takes 3-5 business days.&quot;<br>    },<br>    {<br>        &quot;id&quot;: &quot;faq_3&quot;,<br>        &quot;question&quot;: &quot;How do I track my order?&quot;,<br>        &quot;answer&quot;: &quot;Go to My Orders and click on Track to see real-time tracking updates.&quot;<br>    },<br>    {<br>        &quot;id&quot;: &quot;faq_4&quot;,<br>        &quot;question&quot;: &quot;How do I cancel my order?&quot;,<br>        &quot;answer&quot;: &quot;Orders can be cancelled before they are shipped from the Orders page.&quot;<br>    }<br>]</pre><h4>Creating Zvec collection</h4><p>Initialize and configure a persistent Zvec collection by defining the schema and preparing it to store and index embedded FAQ documents.</p><pre>import zvec<br>from sentence_transformers import SentenceTransformer<br>import shutil<br>import os<br>import time # Import time module<br><br># Load embedding model<br>model = SentenceTransformer(&quot;all-MiniLM-L6-v2&quot;)<br><br># Define schema<br>schema = zvec.CollectionSchema(<br>    name=&quot;faq_collection&quot;,<br>    vectors=zvec.VectorSchema(<br>        &quot;embedding&quot;,<br>        zvec.DataType.VECTOR_FP32,<br>        384   # dimension of MiniLM model<br>    )<br>    # fields=[<br>    #     zvec.FieldSchema(&quot;question&quot;, zvec.DataType.STRING),<br>    #     zvec.FieldSchema(&quot;answer&quot;, zvec.DataType.STRING)<br>    # ]<br>)<br><br># Remove existing collection if it exists<br>collection_path = &quot;./faq_zvec&quot; # Changed path to ensure a fresh start<br>if os.path.exists(collection_path):<br>    shutil.rmtree(collection_path)<br>    time.sleep(1) # Add a small delay<br><br># Create or open collection<br>collection = zvec.create_and_open(<br>    
path=collection_path,<br>    schema=schema,<br>)<br><br>docs = []<br><br>for item in faq_data:<br>    text = item[&quot;question&quot;] + &quot; &quot; + item[&quot;answer&quot;]<br>    embedding = model.encode(text).tolist()<br><br>    docs.append(<br>        zvec.Doc(<br>            id=item[&quot;id&quot;],<br>            vectors={&quot;embedding&quot;: embedding}<br>            # fields={&quot;question&quot;: item[&quot;question&quot;], &quot;answer&quot;: item[&quot;answer&quot;]}<br>        )<br>    )<br><br>collection.insert(docs)<br><br>print(&quot;FAQ data indexed successfully!&quot;)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Pn_ynusDNHw1-zcUHxQ2sQ.png" /></figure><h4>Building retrieval function</h4><p>Implement the retrieval layer that converts user queries into embeddings and fetches the most relevant FAQ entries from Zvec using vector similarity search.</p><pre>def retrieve(query, topk=3):<br>    embedding = model.encode(query).tolist()<br>    results = collection.query(<br>        zvec.VectorQuery(&quot;embedding&quot;, vector=embedding),<br>        topk=topk<br>    )<br>    return results</pre><h4>Connecting to LLM for answer generation</h4><p>Integrate the retrieval layer with an LLM to generate clear, natural responses by grounding answers in the relevant FAQs retrieved from Zvec.</p><pre># from retriever import retrieve<br>from openai import OpenAI<br><br>client = OpenAI(api_key=&quot;your-openai-key&quot;)  # set OPENAI_API_KEY<br><br># Build a lookup dict from faq_data to retrieve answers by ID<br>faq_lookup = {item[&quot;id&quot;]: item for item in faq_data}<br><br>def answer_query(user_query):<br>    retrieved_docs = retrieve(user_query, topk=3)<br><br>    # Look up answers by doc ID (since fields aren&#39;t stored in zvec)<br>    context = &quot;\n&quot;.join(<br>        [faq_lookup[doc.id][&quot;answer&quot;] for doc in retrieved_docs]<br>    )<br><br>    prompt = f&quot;&quot;&quot;<br>You are a customer support 
agent.<br><br>Use only the information below to answer.<br><br>Context:<br>{context}<br><br>Customer Question:<br>{user_query}<br><br>Answer:<br>&quot;&quot;&quot;<br><br>    response = client.chat.completions.create(<br>        model=&quot;gpt-4o-mini&quot;,<br>        messages=[{&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: prompt}]<br>    )<br><br>    return response.choices[0].message.content</pre><h4>Testing FAQ agent</h4><p>Run and interact with the FAQ agent to validate the end-to-end flow, from query embedding and retrieval in Zvec to natural response generation by the LLM.</p><pre>test_questions = [<br>    &quot;I received the wrong item, how do I return it?&quot;,<br>    &quot;When will my order arrive?&quot;,<br>    &quot;How do I cancel my order?&quot;,<br>]<br><br>for question in test_questions:<br>    print(f&quot;\n{&#39;=&#39;*60}&quot;)<br>    print(f&quot;❓ Question: {question}&quot;)<br>    print(f&quot;{&#39;=&#39;*60}&quot;)<br>    answer = answer_query(question)<br>    print(f&quot;\n🤖 Answer:\n{answer}&quot;)</pre><p>Run the script. The output should look similar to the following:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*EoDiBH7KLwm5ZepIUBollA.png" /></figure><p>If you are interested in more hands-on examples, check out how Zvec can be used to build a car manual RAG agent with OpenAI for efficient and accurate data retrieval.</p><p><a href="https://colab.research.google.com/drive/1WM1eamaVeqcmx-jQAhTEBIeag5nOuy9Y?usp=sharing">Google Colab</a></p><p><em>Thanks for reading this article!</em></p><p>If you enjoyed this article, please click on the clap button 👏 and share to help others find it!</p><p>The full source code for this tutorial can be found here:</p><p><a href="https://github.com/codemaker2015/zvec-experiments">GitHub - codemaker2015/zvec-experiments: (Retrieval-Augmented Generation) using PDF data and simple text data.</a></p><h4>Resources</h4><ul><li><a 
href="https://colab.research.google.com/drive/1qoOzIQaaoI7sndOf5QNZGoWdAvIRPu1T?usp=sharing">Google Colab</a></li><li><a href="https://zvec.org/en/blog/introduction/">Zvec: The SQLite of Vector Databases</a></li><li><a href="https://github.com/alibaba/zvec">GitHub - alibaba/zvec: A lightweight, lightning-fast, in-process vector database</a></li><li><a href="https://pypi.org/project/zvec/">zvec</a></li></ul>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[OmniDaemon: The Universal Event-Driven Runtime for Production Ready AI Agents]]></title>
            <link>https://codemaker2016.medium.com/omnidaemon-the-universal-event-driven-runtime-for-production-ready-ai-agents-02b1a5e63dfb?source=rss-b7a60e472dfc------2</link>
            <guid isPermaLink="false">https://medium.com/p/02b1a5e63dfb</guid>
            <category><![CDATA[llm]]></category>
            <category><![CDATA[agentic-ai]]></category>
            <category><![CDATA[python]]></category>
            <category><![CDATA[omnistack]]></category>
            <category><![CDATA[agents]]></category>
            <dc:creator><![CDATA[Vishnu Sivan]]></dc:creator>
            <pubDate>Tue, 30 Dec 2025 16:07:38 GMT</pubDate>
            <atom:updated>2025-12-30T16:07:38.081Z</atom:updated>
            <content:encoded><![CDATA[<p><strong>Moving beyond the “Chatbot” era to autonomous, scalable, and resilient AI infrastructure.</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*zYd6mvBQMYZtt8R2A-UQLg.png" /></figure><p>The AI landscape is shifting. We have spent the last two years perfecting the “Chatbot” — a synchronous, request-response interface where an LLM waits for a user to type. But the future of AI isn’t a text box; it’s an autonomous service that listens to events, reasons, and acts in the background.</p><p>Enter OmniDaemon, a universal event-driven runtime built by <a href="https://github.com/omnirexflora-labs">OmniRexFlora Labs</a>. It is designed to turn AI agents into robust, production-ready infrastructure.</p><h3>Getting Started</h3><h4>Table of contents</h4><ul><li>The “Monolithic Trap” in Agentic AI</li><li>What is OmniDaemon?</li><li>Key Pillars</li><li>The Architecture: How it Works</li><li>The “Omni Stack” Ecosystem</li><li>The Core Problem OmniDaemon Solves</li><li>Why Event-Driven Architecture for AI Agents?</li><li>Why Traditional Architectures Fail for Agents</li><li>Getting Started: A Technical Quickstart</li><li>Step 1: Set Up the Environment</li><li>Step 2: Create your first agent</li><li>Step 3: Create the Event Producer (The Trigger)</li><li>Hands-On Project: Intelligent Log Insights Generator</li><li>Step 1: Set Up the Environment</li><li>Step 2: Create Sample Log Generator</li><li>Step 3: Build the Log Ingestion Agent</li><li>Step 4: Build the LLM-Powered Analysis Agent</li><li>Step 5: Build the Reporting Agent</li><li>Step 6: (Optional) Advanced Agent Chaining for Specialized Analysis</li><li>Advanced Features and Patterns</li></ul><h4>The “Monolithic Trap” in Agentic AI</h4><p>Most developers start by building multi-agent systems as single, monolithic applications. 
While this works for a local demo, it fails in production for three reasons:</p><ol><li><strong>The Blast Radius:</strong> If your “Data Analyst” agent crashes on a corrupted CSV, your entire system (Research Agent, Writer Agent, and API) goes down with it.</li><li><strong>The Scaling Wall:</strong> You cannot scale individual agents. If your “Research Agent” is I/O intensive, you’re forced to scale the whole monolith, wasting resources.</li><li><strong>Synchronous Bottlenecks:</strong> AI is slow. Making a user wait for a 30-second chain of LLM calls over an HTTP request leads to timeouts and poor UX.</li></ol><h3>What is OmniDaemon?</h3><p>OmniDaemon is a Universal Event-Driven Runtime for AI agents. It acts as the “Central Nervous System” for your AI stack. Think of it as “Kubernetes for AI Agents”: it provides the orchestration, scalability, and reliability needed to run AI agents in production. It allows agents to operate asynchronously, reacting to events (like a new file upload, a CRM update, or a message from another agent) rather than waiting for direct API calls.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/736/1*IKw5Qg0tBzo4OgKXfonlZA.png" /></figure><h4>Key Pillars:</h4><ul><li><strong>Framework Agnostic:</strong> It doesn’t care if your agent is built with OpenAI’s SDK, LangGraph, or CrewAI. If it’s Python code, OmniDaemon can daemonize it.</li><li><strong>Production Ready:</strong> It handles the “boring but hard” stuff: retries, Dead Letter Queues (DLQ), message persistence, and horizontal scaling.</li><li><strong>Event-Driven (EDA):</strong> Built on high-performance backends like Redis Streams, it ensures that agents collaborate across distributed systems without being tightly coupled.</li><li><strong>Asynchronous Execution:</strong> Agents listen to a “Topic” (like a Redis stream). 
When a task arrives, the daemon triggers a callback, processes it, and emits the result.</li><li><strong>Pluggable Backends:</strong> Swap event buses (Redis Streams, Kafka, RabbitMQ) and storage (Redis, PostgreSQL, MongoDB) via environment variables.</li><li><strong>Horizontal Scaling:</strong> Deploy multiple agent instances for load balancing without code changes.</li></ul><h4>The Architecture: How it Works</h4><p>OmniDaemon sits between your event source (like Redis or a JSON stream) and your AI logic.</p><ul><li><strong>The Listener:</strong> OmniDaemon monitors an event stream.</li><li><strong>The Callback:</strong> When a message arrives, it triggers your agent logic via a simple callback function.</li><li><strong>The Manager:</strong> It manages the lifecycle. If your agent fails, OmniDaemon handles the retry logic. If it succeeds, it persists the result and can even trigger the next agent in the chain.</li><li><strong>Storage Backends:</strong> It currently supports JSON (for local development) and Redis (for production). S3 and PostgreSQL support are on the roadmap.</li><li><strong>The Callback Pattern:</strong> You wrap your agent logic in a single function. 
OmniDaemon handles the “plumbing” — fetching the message, tracking metadata, and persisting the output.</li><li><strong>Metadata &amp; Context:</strong> Every event carries a rich metadata object including message_id, correlation_id (to track a task across 10 different agents), and tenant_id for multi-tenant SaaS apps.</li></ul><h4>The “Omni Stack” Ecosystem</h4><p>OmniDaemon is part of a larger vision by OmniRexFlora Labs to provide a complete “Linux-like” ecosystem for AI:</p><ul><li><strong>OmniCoreAgent:</strong> The framework for building the “brains.”</li><li><strong>OmniMemory:</strong> Persistent, semantic memory for agents.</li><li><strong>OmniDaemon:</strong> The runtime for distributed execution.</li><li><strong>OmniCloud:</strong> (Upcoming) The deployment and orchestration layer.</li></ul><h4>The Core Problem OmniDaemon Solves</h4><p>Traditional AI systems are request-driven: a user asks a question, the AI responds, and that’s it. But modern enterprises need AI that operates continuously in the background, reacting to events, coordinating with other agents, and integrating seamlessly with existing infrastructure.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/812/1*NFLMtRLeku92_-l27Iyjag.png" /></figure><h3>Why Event-Driven Architecture for AI Agents?</h3><p>The future of AI lies in autonomous agents that operate as distributed microservices. 
Just like microservices revolutionized application architecture, event-driven AI agents are transforming how we build intelligent systems.</p><p><strong>Wave 1: Predictive Models (Traditional ML)</strong></p><ul><li>Domain-specific, rigid systems</li><li>Required ML expertise for each use case</li><li>Difficult to repurpose or scale</li></ul><p><strong>Wave 2: Generative Models (LLMs)</strong></p><ul><li>Revolutionary generalization capabilities</li><li>But: fixed in time, expensive to fine-tune, no access to private data</li><li>RAG (Retrieval-Augmented Generation) helped, but workflows remained rigid</li></ul><p><strong>Wave 3: Agentic AI (Current)</strong></p><ul><li>Dynamic workflows that adapt on the fly</li><li>Autonomous decision-making with tool use</li><li>Context-driven processing with memory</li><li>Collaborative multi-agent systems</li></ul><blockquote>Industry leaders agree: “Agents are the new apps” (Dharmesh Shah, HubSpot CTO). But building scalable agent systems requires proper infrastructure.</blockquote><h4>Why Traditional Architectures Fail for Agents</h4><p>Connecting agents via REST APIs creates tightly coupled systems:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/811/1*miE-4wkFpbGGuCKuRz4y8g.png" /></figure><p>Event-driven architecture solves this through loose coupling:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/810/1*53g46U2rZgUuWPGPeK6Qrg.png" /></figure><h3>Getting Started: A Technical Quickstart</h3><p>OmniDaemon is designed for simplicity. 
You can move an existing agent into a daemonized state in minutes.</p><h4>Prerequisites</h4><ul><li><a href="https://www.python.org/downloads"><strong>Python 3.9+</strong></a></li><li><strong>Redis</strong> (Installed and running)</li><li><strong>OpenAI API Key</strong> (or any LLM provider)</li></ul><h4>Step 1: Set Up the Environment</h4><p>To use OmniDaemon effectively in a production or development environment, running Redis via <a href="https://www.docker.com/products/docker-desktop"><strong>Docker</strong></a> is the industry standard. This ensures your event stream is isolated, portable, and easy to manage.</p><h4>Initializing Redis</h4><p>Before running Redis, you must have the Docker Engine installed.</p><p><strong>For Windows &amp; Mac:<br>Download Docker Desktop</strong> from the <a href="https://www.docker.com/products/docker-desktop/">official website</a>.</p><p><strong>For Linux (Ubuntu):<br></strong>Open your terminal and run these commands to install the Docker Engine:</p><pre>sudo apt update<br>sudo apt install ca-certificates curl gnupg<br>sudo install -m 0755 -d /etc/apt/keyrings<br>curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg<br>sudo chmod a+r /etc/apt/keyrings/docker.gpg<br><br>echo &quot;deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release &amp;&amp; echo &quot;$VERSION_CODENAME&quot;) stable&quot; | sudo tee /etc/apt/sources.list.d/docker.list &gt; /dev/null<br><br>sudo apt update<br>sudo apt install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin</pre><p>Once Docker is ready, you can pull and start Redis with a single command. 
For OmniDaemon, we want persistence so that your AI tasks aren’t lost if the container restarts.</p><pre>docker run -d --name omni-redis -p 6379:6379 -v redis-data:/data redis:latest redis-server --appendonly yes</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*MD_HGrzUrmo55mh-OawQng.png" /></figure><h4>Installing Dependencies</h4><p>First, install the necessary libraries using pip.</p><pre>pip install omnidaemon</pre><h4>Step 2: Create your first agent</h4><p>Below is the implementation of a distributed “Greeter” agent. The agent_runner.py script acts as our persistent service, while producer.py demonstrates how external systems interact with the agent through structured message passing.</p><p>Create a file named agent_runner.py and add the following code to it.</p><pre>import asyncio<br>from omnidaemon import OmniDaemonSDK, AgentConfig<br><br>sdk = OmniDaemonSDK()<br><br>async def greeter(message: dict):<br>    &quot;&quot;&quot;Your AI agent runs here!&quot;&quot;&quot;<br>    name = message.get(&quot;content&quot;, {}).get(&quot;name&quot;, &quot;stranger&quot;)<br>    return {&quot;reply&quot;: f&quot;Hello, {name}! 👋&quot;}<br><br>async def main():<br>    await sdk.register_agent(<br>        agent_config=AgentConfig(<br>            topic=&quot;greet.user&quot;,<br>            callback=greeter,<br>        )<br>    )<br>    await sdk.start()<br>    print(&quot;🎧 Agent running. Press Ctrl+C to stop.&quot;)<br>    <br>    try:<br>        while True:<br>            await asyncio.sleep(1)<br>    except KeyboardInterrupt:<br>        pass<br>    finally:<br>        await sdk.shutdown()<br><br>if __name__ == &quot;__main__&quot;:<br>    asyncio.run(main())</pre><h4>Step 3: Create the Event Producer (The Trigger)</h4><p>In an event-driven system, the producer doesn’t wait for the AI to finish. 
It just “drops” the job into the queue and moves on.</p><p>Create a file named producer.py and add the following code to it.</p><pre>import asyncio<br>import json<br><br>async def test_messages():<br>    &quot;&quot;&quot;Simulate sending test messages to the agent (nothing is published to the event bus).&quot;&quot;&quot;<br>    <br>    # Test cases<br>    tests = [<br>        (&quot;greet.user&quot;, {&quot;content&quot;: {&quot;name&quot;: &quot;World&quot;}}),<br>        (&quot;greet.user&quot;, {&quot;content&quot;: {&quot;name&quot;: &quot;Alice&quot;}}),<br>        (&quot;greet.user&quot;, {&quot;content&quot;: {}}),<br>        (&quot;health.check&quot;, {&quot;timestamp&quot;: &quot;now&quot;}),<br>    ]<br>    <br>    print(&quot;Testing agent messages...\n&quot;)<br>    <br>    for topic, message in tests:<br>        print(f&quot;Topic: {topic}&quot;)<br>        print(f&quot;Message: {json.dumps(message)}&quot;)<br>        <br>        # Simulate sending<br>        await asyncio.sleep(0.3)<br>        <br>        # Simulate response<br>        if topic == &quot;greet.user&quot;:<br>            name = message.get(&quot;content&quot;, {}).get(&quot;name&quot;, &quot;stranger&quot;)<br>            print(f&quot;Response: Hello, {name}!\n&quot;)<br>        else:<br>            print(&quot;Response: {&#39;status&#39;: &#39;healthy&#39;}\n&quot;)<br>        <br>        await asyncio.sleep(0.5)<br>    <br>    print(&quot;Done!&quot;)<br><br>if __name__ == &quot;__main__&quot;:<br>    asyncio.run(test_messages())</pre><h4>Executing the code</h4><p>Open your first terminal and execute the agent runner. This process will initialize the OmniDaemonSDK, register the greeter callback to the greet.user topic, and enter an idle listening state.</p><pre>python agent_runner.py</pre><p><em>You should see the output:</em> 🎧 Agent running. Press Ctrl+C to stop.</p><p>Open a second terminal to run the producer script. 
This script stands in for the “Client” or “Producer.” As written, it only simulates the exchange: it prints each JSON payload and a mocked response locally instead of publishing to the event bus, so the waiting agent does not actually receive these messages.</p><pre>python producer.py</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*UtFN-rFhkH-boZDqHsoTMA.png" /></figure><h3>Hands-On Project: Intelligent Log Insights Generator</h3><p>Now let’s build a real-world application: an AI-powered log analysis system that monitors authentication and database transaction logs, detects anomalies, identifies vulnerabilities, and provides actionable recommendations.</p><h4>Project Architecture</h4><p>Our system will have three main components:</p><ol><li><strong>Log Ingestion Agent</strong>: Watches log files and publishes events.</li><li><strong>Analysis Agent</strong>: Uses an LLM to analyze logs for security issues, anomalies, and performance concerns.</li><li><strong>Reporting Agent</strong>: Aggregates insights and generates actionable reports.</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/807/1*oLgTZ6TierTGpRA44Iheug.png" /></figure><h4>Step 1: Set Up the Environment</h4><p>First, install the necessary libraries. 
We will use omnidaemon for the runtime, openai for the LLM calls, and python-decouple (imported as decouple in the analysis agent) to read the API key from the .env file.</p><pre>pip install omnidaemon openai python-dotenv python-decouple</pre><p>Ensure your .env file has your credentials:</p><pre>OPENAI_API_KEY=your_key_here<br>STORAGE_BACKEND=redis<br>REDIS_URL=redis://localhost:6379<br>EVENT_BUS_TYPE=redis_stream<br>OMNIDAEMON_API_ENABLED=true<br>OMNIDAEMON_API_PORT=8765<br>LOG_LEVEL=INFO</pre><h4>Step 2: Create Sample Log Generator</h4><p>First, let’s create realistic sample logs for testing:</p><pre># generate_logs.py<br>import json<br>import random<br>import time<br>from datetime import datetime<br>from pathlib import Path<br><br># Create logs directory<br>log_dir = Path(&quot;./sample_logs&quot;)<br>log_dir.mkdir(exist_ok=True)<br><br># Sample data for realistic logs<br>users = [&quot;alice&quot;, &quot;bob&quot;, &quot;charlie&quot;, &quot;david&quot;, &quot;eve&quot;, &quot;mallory&quot;]<br>ips = [&quot;192.168.1.10&quot;, &quot;192.168.1.20&quot;, &quot;192.168.1.30&quot;, &quot;10.0.0.5&quot;, &quot;172.16.0.100&quot;]<br>suspicious_ips = [&quot;45.133.1.12&quot;, &quot;89.248.172.45&quot;]  # Potential attack IPs<br>db_operations = [&quot;SELECT&quot;, &quot;INSERT&quot;, &quot;UPDATE&quot;, &quot;DELETE&quot;]<br>tables = [&quot;users&quot;, &quot;transactions&quot;, &quot;orders&quot;, &quot;products&quot;, &quot;audit_logs&quot;]<br><br>def generate_auth_log():<br>    &quot;&quot;&quot;Generate authentication log entry&quot;&quot;&quot;<br>    is_suspicious = random.random() &lt; 0.1  # 10% suspicious activity<br>    <br>    if is_suspicious:<br>        # Generate suspicious patterns<br>        event_type = random.choice([&quot;failed_login&quot;, &quot;brute_force_attempt&quot;, &quot;impossible_travel&quot;])<br>        ip = random.choice(suspicious_ips)<br>        success = False<br>    else:<br>        event_type = random.choice([&quot;login&quot;, &quot;logout&quot;, &quot;password_change&quot;, 
&quot;session_refresh&quot;])<br>        ip = random.choice(ips)<br>        success = random.random() &lt; 0.95  # 95% success rate normally<br>    <br>    return {<br>        &quot;timestamp&quot;: datetime.now().isoformat(),<br>        &quot;log_type&quot;: &quot;authentication&quot;,<br>        &quot;event&quot;: event_type,<br>        &quot;user&quot;: random.choice(users),<br>        &quot;ip_address&quot;: ip,<br>        &quot;success&quot;: success,<br>        &quot;user_agent&quot;: &quot;Mozilla/5.0&quot; if random.random() &lt; 0.8 else &quot;curl/7.68.0&quot;,<br>        &quot;session_id&quot;: f&quot;sess_{random.randint(10000, 99999)}&quot;<br>    }<br><br>def generate_db_log():<br>    &quot;&quot;&quot;Generate database transaction log entry&quot;&quot;&quot;<br>    is_anomaly = random.random() &lt; 0.15  # 15% anomalies<br>    <br>    operation = random.choice(db_operations)<br>    execution_time = random.uniform(0.01, 0.5)<br>    <br>    if is_anomaly:<br>        # Generate anomalous patterns<br>        if random.random() &lt; 0.5:<br>            execution_time = random.uniform(5.0, 30.0)  # Slow query<br>        <br>        if operation == &quot;DELETE&quot; and random.random() &lt; 0.3:<br>            rows_affected = random.randint(1000, 10000)  # Mass deletion<br>        else:<br>            rows_affected = random.randint(1, 100)<br>    else:<br>        rows_affected = random.randint(1, 10)<br>    <br>    return {<br>        &quot;timestamp&quot;: datetime.now().isoformat(),<br>        &quot;log_type&quot;: &quot;database&quot;,<br>        &quot;operation&quot;: operation,<br>        &quot;table&quot;: random.choice(tables),<br>        &quot;user&quot;: random.choice(users),<br>        &quot;execution_time_ms&quot;: round(execution_time * 1000, 2),<br>        &quot;rows_affected&quot;: rows_affected,<br>        &quot;query_hash&quot;: f&quot;qh_{random.randint(100000, 999999)}&quot;,<br>        &quot;connection_id&quot;: random.randint(1, 50)<br> 
   }<br><br>def generate_logs(num_logs=100, filename=&quot;application.log&quot;):<br>    &quot;&quot;&quot;Generate mixed authentication and database logs&quot;&quot;&quot;<br>    log_file = log_dir / filename<br>    <br>    # Append rather than overwrite, so each new batch adds lines to the file<br>    # and a reader tracking its line offset keeps seeing new entries<br>    with open(log_file, &quot;a&quot;) as f:<br>        for _ in range(num_logs):<br>            # Mix of auth and db logs (60% auth, 40% db)<br>            if random.random() &lt; 0.6:<br>                log_entry = generate_auth_log()<br>            else:<br>                log_entry = generate_db_log()<br>            <br>            f.write(json.dumps(log_entry) + &quot;\n&quot;)<br>            time.sleep(0.01)  # Simulate real-time generation<br>    <br>    print(f&quot;Generated {num_logs} log entries in {log_file}&quot;)<br><br>if __name__ == &quot;__main__&quot;:<br>    # Generate initial logs<br>    generate_logs(num_logs=100, filename=&quot;application.log&quot;)<br>    <br>    # Optionally: Continuous generation<br>    print(&quot;\nGenerating continuous logs... (Ctrl+C to stop)&quot;)<br>    try:<br>        while True:<br>            generate_logs(num_logs=10, filename=&quot;application.log&quot;)<br>            time.sleep(5)  # Generate new batch every 5 seconds<br>    except KeyboardInterrupt:<br>        print(&quot;\nLog generation stopped.&quot;)</pre><h4>Step 3: Build the Log Ingestion Agent</h4><p>The log ingestion agent is the entry point of our system. It acts as a bridge between your log files and the OmniDaemon event-driven architecture. 
Instead of having your analysis agents directly read files (which creates tight coupling and scaling issues), the ingestion agent watches for new log entries and publishes them as events to the event bus.</p><pre># log_ingestion_agent.py<br>import asyncio<br>import json<br>import time<br>from pathlib import Path<br>from omnidaemon import OmniDaemonSDK, EventEnvelope, PayloadBase<br><br>sdk = OmniDaemonSDK()<br><br>class LogIngestionAgent:<br>    def __init__(self, log_dir=&quot;./sample_logs&quot;):<br>        self.log_dir = Path(log_dir)<br>        self.processed_lines = {}  # Track processed lines per file<br>        <br>    async def watch_logs(self):<br>        &quot;&quot;&quot;Watch log files and publish new entries&quot;&quot;&quot;<br>        print(f&quot;🔍 Watching logs in {self.log_dir}&quot;)<br>        <br>        while True:<br>            try:<br>                # Find all log files<br>                log_files = list(self.log_dir.glob(&quot;*.log&quot;))<br>                <br>                for log_file in log_files:<br>                    await self.process_log_file(log_file)<br>                <br>                await asyncio.sleep(2)  # Check every 2 seconds<br>                <br>            except Exception as e:<br>                print(f&quot;❌ Error watching logs: {e}&quot;)<br>                await asyncio.sleep(5)<br>    <br>    async def process_log_file(self, log_file: Path):<br>        &quot;&quot;&quot;Process new lines from a log file&quot;&quot;&quot;<br>        try:<br>            with open(log_file, &quot;r&quot;) as f:<br>                lines = f.readlines()<br>            <br>            # Get last processed line number<br>            last_line = self.processed_lines.get(str(log_file), 0)<br>            <br>            # Process only new lines<br>            new_lines = lines[last_line:]<br>            <br>            for line in new_lines:<br>                if line.strip():<br>                    await 
self.publish_log_entry(line, str(log_file))<br>            <br>            # Update processed line count<br>            self.processed_lines[str(log_file)] = len(lines)<br>            <br>        except Exception as e:<br>            print(f&quot;❌ Error processing {log_file}: {e}&quot;)<br>    <br>    async def publish_log_entry(self, log_line: str, source_file: str):<br>        &quot;&quot;&quot;Publish a log entry to the event bus&quot;&quot;&quot;<br>        try:<br>            log_entry = json.loads(log_line)<br>            <br>            # Create event envelope<br>            event = EventEnvelope(<br>                topic=&quot;logs.raw&quot;,<br>                payload=PayloadBase(<br>                    content={<br>                        &quot;log_entry&quot;: log_entry,<br>                        &quot;source_file&quot;: source_file<br>                    },<br>                    reply_to=&quot;logs.analyzed&quot;  # Results go to analyzed topic<br>                ),<br>                source=&quot;log_ingestion_agent&quot;<br>            )<br>            <br>            # Publish to event bus<br>            task_id = await sdk.publish_task(event_envelope=event)<br>            print(f&quot;📤 Published log: {log_entry.get(&#39;log_type&#39;)} - {task_id}&quot;)<br>            <br>        except json.JSONDecodeError:<br>            print(f&quot;⚠️  Invalid JSON in log line&quot;)<br>        except Exception as e:<br>            print(f&quot;❌ Error publishing log: {e}&quot;)<br><br>async def main():<br>    try:<br>        # Create ingestion agent<br>        ingestion_agent = LogIngestionAgent()<br>        <br>        # Start watching logs<br>        print(&quot;🚀 Log Ingestion Agent started&quot;)<br>        await ingestion_agent.watch_logs()<br>        <br>    except KeyboardInterrupt:<br>        print(&quot;\n👋 Shutting down...&quot;)<br>    finally:<br>        await sdk.shutdown()<br><br>if __name__ == &quot;__main__&quot;:<br>    
asyncio.run(main())</pre><h4>Step 4: Build the LLM-Powered Analysis Agent</h4><p>This is the core of the system. The analysis agent subscribes to the logs.raw topic and uses OpenAI&#39;s GPT-4o model to analyze each log entry in context. For every entry, the model reasons about security vulnerabilities, performance anomalies, and operational risks, then returns a severity level and a list of actionable recommendations. The result is published to the logs.analyzed topic for downstream processing, and the agent is registered with three parallel consumers to keep throughput high.</p><pre># log_analysis_agent.py<br>import asyncio<br>import json<br>from typing import Dict, Any<br>from openai import AsyncOpenAI<br>from omnidaemon import OmniDaemonSDK, AgentConfig, SubscriptionConfig<br>from decouple import config<br><br>sdk = OmniDaemonSDK()<br>client = AsyncOpenAI(api_key=config(&quot;OPENAI_API_KEY&quot;))<br><br>ANALYSIS_SYSTEM_PROMPT = &quot;&quot;&quot;You are an expert security and database analyst. <br>Analyze log entries and provide detailed insights on:<br><br>1. **Security Vulnerabilities**: Identify potential security threats, unusual access patterns, <br>   brute force attempts, suspicious IP addresses, or unauthorized access attempts.<br><br>2. **Anomalies**: Detect unusual patterns such as:<br>   - Failed login attempts from the same IP<br>   - Impossible travel (logins from distant locations in short time)<br>   - Unusual database operations (mass deletions, slow queries)<br>   - Off-hours access patterns<br>   - Spike in certain operations<br><br>3. **Critical Actions**: Flag actions that require immediate attention:<br>   - Successful logins after multiple failures<br>   - Large-scale data modifications<br>   - Administrative privilege escalation<br>   - Database performance degradation<br><br>4. 
**Recommendations**: Provide specific, actionable recommendations:<br>   - Implement IP blocking for suspicious addresses<br>   - Add monitoring for specific patterns<br>   - Optimize slow queries<br>   - Review user permissions<br>   - Enable additional security measures<br><br>Return your analysis as a JSON object with this structure:<br>{<br>    &quot;severity&quot;: &quot;critical|high|medium|low|info&quot;,<br>    &quot;category&quot;: &quot;security|performance|anomaly|normal&quot;,<br>    &quot;vulnerabilities&quot;: [&quot;list of identified vulnerabilities&quot;],<br>    &quot;anomalies&quot;: [&quot;list of detected anomalies&quot;],<br>    &quot;critical_actions&quot;: [&quot;list of critical actions taken&quot;],<br>    &quot;recommendations&quot;: [&quot;list of specific recommendations&quot;],<br>    &quot;summary&quot;: &quot;brief summary of findings&quot;<br>}<br><br>Be concise but thorough. Focus on actionable insights.&quot;&quot;&quot;<br><br>async def analyze_log_with_llm(log_entry: Dict[str, Any]) -&gt; Dict[str, Any]:<br>    &quot;&quot;&quot;Analyze a log entry using LLM&quot;&quot;&quot;<br>    try:<br>        # Prepare log context for LLM<br>        log_context = json.dumps(log_entry, indent=2)<br>        <br>        # Call OpenAI API<br>        response = await client.chat.completions.create(<br>            model=&quot;gpt-4o&quot;,<br>            messages=[<br>                {&quot;role&quot;: &quot;system&quot;, &quot;content&quot;: ANALYSIS_SYSTEM_PROMPT},<br>                {&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: f&quot;Analyze this log entry:\n\n{log_context}&quot;}<br>            ],<br>            temperature=0.3,<br>            max_tokens=1000<br>        )<br>        <br>        # Parse LLM response<br>        analysis_text = response.choices[0].message.content<br>        <br>        # Extract JSON from response (handle markdown code blocks)<br>        if &quot;```json&quot; in analysis_text:<br>            
analysis_text = analysis_text.split(&quot;```json&quot;)[1].split(&quot;```&quot;)[0].strip()<br>        elif &quot;```&quot; in analysis_text:<br>            analysis_text = analysis_text.split(&quot;```&quot;)[1].split(&quot;```&quot;)[0].strip()<br>        <br>        analysis = json.loads(analysis_text)<br>        <br>        return {<br>            &quot;status&quot;: &quot;success&quot;,<br>            &quot;analysis&quot;: analysis,<br>            &quot;log_entry&quot;: log_entry<br>        }<br>        <br>    except Exception as e:<br>        print(f&quot;❌ LLM Analysis Error: {e}&quot;)<br>        return {<br>            &quot;status&quot;: &quot;error&quot;,<br>            &quot;error&quot;: str(e),<br>            &quot;log_entry&quot;: log_entry<br>        }<br><br>async def log_analysis_callback(message: dict):<br>    &quot;&quot;&quot;OmniDaemon callback for log analysis&quot;&quot;&quot;<br>    content = message.get(&quot;content&quot;, {})<br>    log_entry = content.get(&quot;log_entry&quot;, {})<br>    source_file = content.get(&quot;source_file&quot;, &quot;unknown&quot;)<br>    <br>    print(f&quot;\n🔍 Analyzing {log_entry.get(&#39;log_type&#39;)} log from {source_file}&quot;)<br>    <br>    # Analyze log with LLM<br>    result = await analyze_log_with_llm(log_entry)<br>    <br>    if result[&quot;status&quot;] == &quot;success&quot;:<br>        analysis = result[&quot;analysis&quot;]<br>        severity = analysis.get(&quot;severity&quot;, &quot;info&quot;)<br>        category = analysis.get(&quot;category&quot;, &quot;normal&quot;)<br>        <br>        # Print severity-based alerts<br>        if severity == &quot;critical&quot;:<br>            print(f&quot;🚨 CRITICAL: {analysis.get(&#39;summary&#39;)}&quot;)<br>        elif severity == &quot;high&quot;:<br>            print(f&quot;⚠️  HIGH: {analysis.get(&#39;summary&#39;)}&quot;)<br>        elif severity == &quot;medium&quot;:<br>            print(f&quot;⚡ MEDIUM: 
{analysis.get(&#39;summary&#39;)}&quot;)<br>        else:<br>            print(f&quot;ℹ️  {analysis.get(&#39;summary&#39;)}&quot;)<br>        <br>        # Print recommendations if any<br>        if analysis.get(&quot;recommendations&quot;):<br>            print(f&quot;💡 Recommendations:&quot;)<br>            for rec in analysis[&quot;recommendations&quot;][:2]:  # Show top 2<br>                print(f&quot;   - {rec}&quot;)<br>    <br>    return result<br><br>async def main():<br>    try:<br>        print(&quot;🤖 Starting Log Analysis Agent...&quot;)<br>        <br>        # Register analysis agent<br>        await sdk.register_agent(<br>            agent_config=AgentConfig(<br>                name=&quot;LOG_ANALYSIS_AGENT&quot;,<br>                topic=&quot;logs.raw&quot;,<br>                callback=log_analysis_callback,<br>                description=&quot;LLM-powered log analysis agent&quot;,<br>                tools=[&quot;openai&quot;, &quot;security_analysis&quot;],<br>                config=SubscriptionConfig(<br>                    reclaim_idle_ms=30000,<br>                    dlq_retry_limit=2,<br>                    consumer_count=3  # Parallel processing<br>                )<br>            )<br>        )<br>        <br>        # Start agent runner<br>        await sdk.start()<br>        print(&quot;✅ Log Analysis Agent is now running&quot;)<br>        print(&quot;🎧 Listening for logs on &#39;logs.raw&#39; topic...&quot;)<br>        <br>        # Keep running<br>        try:<br>            while True:<br>                await asyncio.sleep(1)<br>        except KeyboardInterrupt:<br>            print(&quot;\n👋 Received shutdown signal...&quot;)<br>    <br>    except Exception as e:<br>        print(f&quot;❌ Error: {e}&quot;)<br>        raise<br>    <br>    finally:<br>        print(&quot;Shutting down...&quot;)<br>        await sdk.shutdown()<br>        print(&quot;✅ Shutdown complete&quot;)<br><br>if __name__ == &quot;__main__&quot;:<br>    
asyncio.run(main())</pre><h4>Step 5: Build the Reporting Agent</h4><p>The reporting agent is the final stage of the pipeline. It subscribes to the logs.analyzed topic, groups each analysis by severity level (critical, high, medium, low, or info), and writes a JSON report to the ./reports directory after every 20 insights. These reports can feed dashboards, alerting systems, or business intelligence tools.</p><pre># reporting_agent.py<br>import asyncio<br>import json<br>from datetime import datetime<br>from collections import defaultdict<br>from pathlib import Path<br>from omnidaemon import OmniDaemonSDK, AgentConfig, SubscriptionConfig<br><br>sdk = OmniDaemonSDK()<br><br>class ReportingAgent:<br>    def __init__(self):<br>        self.insights = defaultdict(list)<br>        self.reports_dir = Path(&quot;./reports&quot;)<br>        self.reports_dir.mkdir(exist_ok=True)<br>        <br>    async def process_analysis(self, message: dict):<br>        &quot;&quot;&quot;Process analyzed log and aggregate insights&quot;&quot;&quot;<br>        content = message.get(&quot;content&quot;, {})<br>        <br>        if content.get(&quot;status&quot;) != &quot;success&quot;:<br>            return {&quot;status&quot;: &quot;skipped&quot;, &quot;reason&quot;: &quot;analysis failed&quot;}<br>        <br>        analysis = content.get(&quot;analysis&quot;, {})<br>        log_entry = content.get(&quot;log_entry&quot;, {})<br>        <br>        # Aggregate insights by severity<br>        severity = analysis.get(&quot;severity&quot;, &quot;info&quot;)<br>        self.insights[severity].append({<br>            &quot;timestamp&quot;: log_entry.get(&quot;timestamp&quot;),<br>            &quot;log_type&quot;: log_entry.get(&quot;log_type&quot;),<br>            &quot;category&quot;: analysis.get(&quot;category&quot;),<br>            &quot;summary&quot;: analysis.get(&quot;summary&quot;),<br>            
&quot;vulnerabilities&quot;: analysis.get(&quot;vulnerabilities&quot;, []),<br>            &quot;anomalies&quot;: analysis.get(&quot;anomalies&quot;, []),<br>            &quot;critical_actions&quot;: analysis.get(&quot;critical_actions&quot;, []),<br>            &quot;recommendations&quot;: analysis.get(&quot;recommendations&quot;, [])<br>        })<br>        <br>        # Generate report every 20 insights<br>        total_insights = sum(len(v) for v in self.insights.values())<br>        if total_insights % 20 == 0:<br>            await self.generate_report()<br>        <br>        return {<br>            &quot;status&quot;: &quot;processed&quot;,<br>            &quot;severity&quot;: severity,<br>            &quot;total_insights&quot;: total_insights<br>        }<br>    <br>    async def generate_report(self):<br>        &quot;&quot;&quot;Generate comprehensive security and performance report&quot;&quot;&quot;<br>        try:<br>            timestamp = datetime.now().strftime(&quot;%Y%m%d_%H%M%S&quot;)<br>            report_file = self.reports_dir / f&quot;log_insights_report_{timestamp}.json&quot;<br>            <br>            # Compile statistics<br>            stats = {<br>                &quot;generated_at&quot;: datetime.now().isoformat(),<br>                &quot;total_insights&quot;: sum(len(v) for v in self.insights.values()),<br>                &quot;by_severity&quot;: {k: len(v) for k, v in self.insights.items()},<br>                &quot;insights&quot;: dict(self.insights)<br>            }<br>            <br>            # Extract top recommendations<br>            all_recommendations = []<br>            for severity_insights in self.insights.values():<br>                for insight in severity_insights:<br>                    all_recommendations.extend(insight.get(&quot;recommendations&quot;, []))<br>            <br>            # Count recommendation frequency<br>            rec_counts = defaultdict(int)<br>            for rec in 
all_recommendations:<br>                rec_counts[rec] += 1<br>            <br>            top_recommendations = sorted(<br>                rec_counts.items(), <br>                key=lambda x: x[1], <br>                reverse=True<br>            )[:10]<br>            <br>            stats[&quot;top_recommendations&quot;] = [<br>                {&quot;recommendation&quot;: rec, &quot;frequency&quot;: count}<br>                for rec, count in top_recommendations<br>            ]<br>            <br>            # Save report<br>            with open(report_file, &quot;w&quot;) as f:<br>                json.dump(stats, f, indent=2)<br>            <br>            print(f&quot;\n📊 Report generated: {report_file}&quot;)<br>            print(f&quot;   Total Insights: {stats[&#39;total_insights&#39;]}&quot;)<br>            print(f&quot;   Critical: {stats[&#39;by_severity&#39;].get(&#39;critical&#39;, 0)}&quot;)<br>            print(f&quot;   High: {stats[&#39;by_severity&#39;].get(&#39;high&#39;, 0)}&quot;)<br>            print(f&quot;   Medium: {stats[&#39;by_severity&#39;].get(&#39;medium&#39;, 0)}&quot;)<br>            <br>            if top_recommendations:<br>                print(f&quot;\n🎯 Top Recommendations:&quot;)<br>                for rec, count in top_recommendations[:3]:<br>                    print(f&quot;   - {rec} (mentioned {count} times)&quot;)<br>            <br>        except Exception as e:<br>            print(f&quot;❌ Error generating report: {e}&quot;)<br><br>async def main():<br>    try:<br>        print(&quot;📊 Starting Reporting Agent...&quot;)<br>        <br>        reporting_agent = ReportingAgent()<br>        <br>        # Register reporting agent<br>        await sdk.register_agent(<br>            agent_config=AgentConfig(<br>                name=&quot;REPORTING_AGENT&quot;,<br>                topic=&quot;logs.analyzed&quot;,<br>                callback=reporting_agent.process_analysis,<br>                description=&quot;Aggregates 
analysis results and generates reports&quot;,<br>                config=SubscriptionConfig(<br>                    consumer_count=1  # Single consumer for aggregation<br>                )<br>            )<br>        )<br>        <br>        # Start agent runner<br>        await sdk.start()<br>        print(&quot;✅ Reporting Agent is now running&quot;)<br>        print(&quot;🎧 Listening for analyzed logs on &#39;logs.analyzed&#39; topic...&quot;)<br>        <br>        # Keep running<br>        try:<br>            while True:<br>                await asyncio.sleep(1)<br>        except KeyboardInterrupt:<br>            print(&quot;\n👋 Generating final report...&quot;)<br>            await reporting_agent.generate_report()<br>    <br>    except Exception as e:<br>        print(f&quot;❌ Error: {e}&quot;)<br>        raise<br>    <br>    finally:<br>        print(&quot;Shutting down...&quot;)<br>        await sdk.shutdown()<br>        print(&quot;✅ Shutdown complete&quot;)<br><br>if __name__ == &quot;__main__&quot;:<br>    asyncio.run(main())</pre><h4>Executing the app</h4><p>Run each of the following commands in its own terminal to see the agents work together as one system.</p><pre>python generate_logs.py<br>python log_ingestion_agent.py<br>python log_analysis_agent.py<br>python reporting_agent.py</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*4V5Vjg-EsOa5SG-qlSNXFw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*0q569OZ00D0r942vkwZ3TA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*z-yHn0DqWRE35rW7VqnZ0w.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*vDL0fAJJznx0agYfoHu2MQ.png" /></figure><h4>Step 6: (Optional) Advanced Agent Chaining for Specialized Analysis</h4><p>For more sophisticated log analysis, we can create a multi-stage pipeline with specialized agents. 
This demonstrates OmniDaemon’s powerful agent chaining capabilities.</p><p>Create a new file for the chained agent system:</p><pre># chained_analysis_agents.py<br>import asyncio<br>import json<br>from typing import Dict, Any<br>from openai import AsyncOpenAI<br>from omnidaemon import OmniDaemonSDK, AgentConfig, SubscriptionConfig, EventEnvelope, PayloadBase<br>from decouple import config<br><br>sdk = OmniDaemonSDK()<br>client = AsyncOpenAI(api_key=config(&quot;OPENAI_API_KEY&quot;))<br><br># ============================================================================<br># AGENT 1: Parse and Classify Logs<br># ============================================================================<br><br>async def parse_and_classify(message: dict):<br>    &quot;&quot;&quot;<br>    First agent in the chain: Parse log entries and classify them<br>    Routes logs to specialized agents based on type and content<br>    &quot;&quot;&quot;<br>    content = message.get(&quot;content&quot;, {})<br>    log_entry = content.get(&quot;log_entry&quot;, {})<br>    source_file = content.get(&quot;source_file&quot;, &quot;unknown&quot;)<br>    <br>    print(f&quot;\n🔍 [CLASSIFIER] Processing log from {source_file}&quot;)<br>    <br>    # Classify log type and determine routing<br>    log_type = log_entry.get(&quot;log_type&quot;, &quot;unknown&quot;)<br>    classification = {<br>        &quot;log_entry&quot;: log_entry,<br>        &quot;source_file&quot;: source_file,<br>        &quot;classification&quot;: None,<br>        &quot;route_to&quot;: None,<br>        &quot;priority&quot;: &quot;normal&quot;<br>    }<br>    <br>    if log_type == &quot;authentication&quot;:<br>        # Check for security concerns<br>        success = log_entry.get(&quot;success&quot;, True)<br>        event = log_entry.get(&quot;event&quot;, &quot;&quot;)<br>        ip = log_entry.get(&quot;ip_address&quot;, &quot;&quot;)<br>        <br>        # Detect security-related patterns<br>        is_security_concern = (<br> 
           not success or<br>            event in [&quot;failed_login&quot;, &quot;brute_force_attempt&quot;, &quot;impossible_travel&quot;] or<br>            ip.startswith((&quot;45.&quot;, &quot;89.&quot;))  # Suspicious IP ranges<br>        )<br>        <br>        if is_security_concern:<br>            classification[&quot;classification&quot;] = &quot;security_threat&quot;<br>            classification[&quot;route_to&quot;] = &quot;logs.security&quot;<br>            classification[&quot;priority&quot;] = &quot;high&quot;<br>            print(f&quot;   ⚠️  Classified as: SECURITY THREAT → Routing to security agent&quot;)<br>        else:<br>            classification[&quot;classification&quot;] = &quot;normal_auth&quot;<br>            classification[&quot;route_to&quot;] = &quot;logs.security&quot;<br>            classification[&quot;priority&quot;] = &quot;low&quot;<br>            print(f&quot;   ✅ Classified as: Normal Authentication → Security agent&quot;)<br>    <br>    elif log_type == &quot;database&quot;:<br>        # Check for performance concerns<br>        execution_time = log_entry.get(&quot;execution_time_ms&quot;, 0)<br>        rows_affected = log_entry.get(&quot;rows_affected&quot;, 0)<br>        operation = log_entry.get(&quot;operation&quot;, &quot;&quot;)<br>        <br>        # Detect performance issues<br>        is_performance_issue = (<br>            execution_time &gt; 1000 or  # Slow query (&gt; 1 second)<br>            (operation == &quot;DELETE&quot; and rows_affected &gt; 500) or<br>            rows_affected &gt; 5000<br>        )<br>        <br>        if is_performance_issue:<br>            classification[&quot;classification&quot;] = &quot;performance_issue&quot;<br>            classification[&quot;route_to&quot;] = &quot;logs.performance&quot;<br>            classification[&quot;priority&quot;] = &quot;high&quot;<br>            print(f&quot;   🐌 Classified as: PERFORMANCE ISSUE → Routing to performance agent&quot;)<br>        
else:<br>            classification[&quot;classification&quot;] = &quot;normal_db&quot;<br>            classification[&quot;route_to&quot;] = &quot;logs.performance&quot;<br>            classification[&quot;priority&quot;] = &quot;low&quot;<br>            print(f&quot;   ✅ Classified as: Normal Database Op → Performance agent&quot;)<br>    <br>    else:<br>        # NOTE: no agent registered in this file subscribes to logs.general;<br>        # register one if unknown log types also need to be analyzed<br>        classification[&quot;classification&quot;] = &quot;unknown&quot;<br>        classification[&quot;route_to&quot;] = &quot;logs.general&quot;<br>        print(f&quot;   ❓ Unknown log type → General analysis&quot;)<br>    <br>    # Publish to appropriate specialized agent with reply_to for final reporting<br>    await sdk.publish_task(<br>        event_envelope=EventEnvelope(<br>            topic=classification[&quot;route_to&quot;],<br>            payload=PayloadBase(<br>                content=classification,<br>                reply_to=&quot;logs.final_report&quot;  # All results go to final reporting<br>            ),<br>            source=&quot;classifier_agent&quot;,<br>            correlation_id=message.get(&quot;correlation_id&quot;, f&quot;corr_{log_entry.get(&#39;timestamp&#39;)}&quot;)<br>        )<br>    )<br>    <br>    return {<br>        &quot;status&quot;: &quot;classified&quot;,<br>        &quot;classification&quot;: classification[&quot;classification&quot;],<br>        &quot;routed_to&quot;: classification[&quot;route_to&quot;],<br>        &quot;priority&quot;: classification[&quot;priority&quot;]<br>    }<br><br># ============================================================================<br># AGENT 2: Security-Specific Analysis<br># ============================================================================<br><br>SECURITY_PROMPT = &quot;&quot;&quot;You are a cybersecurity expert specializing in authentication and access control.<br>Analyze this log entry for security threats:<br><br>1. Identify specific attack patterns (brute force, credential stuffing, account takeover)<br>2. 
Assess threat severity (critical, high, medium, low)<br>3. Check for IOCs (Indicators of Compromise): suspicious IPs, user agents, access patterns<br>4. Recommend immediate security actions<br><br>Return JSON:<br>{<br>    &quot;threat_level&quot;: &quot;critical|high|medium|low&quot;,<br>    &quot;attack_type&quot;: &quot;description of attack pattern&quot;,<br>    &quot;iocs&quot;: [&quot;list of indicators of compromise&quot;],<br>    &quot;immediate_actions&quot;: [&quot;list of urgent actions needed&quot;],<br>    &quot;recommendations&quot;: [&quot;security recommendations&quot;]<br>}&quot;&quot;&quot;<br><br>async def security_analysis(message: dict):<br>    &quot;&quot;&quot;<br>    Specialized agent for security analysis<br>    Deep dive into authentication logs and security threats<br>    &quot;&quot;&quot;<br>    content = message.get(&quot;content&quot;, {})<br>    log_entry = content.get(&quot;log_entry&quot;, {})<br>    classification = content.get(&quot;classification&quot;, &quot;unknown&quot;)<br>    priority = content.get(&quot;priority&quot;, &quot;normal&quot;)<br>    <br>    print(f&quot;\n🛡️  [SECURITY] Analyzing {classification} (Priority: {priority})&quot;)<br>    <br>    try:<br>        # Prepare security context<br>        log_context = json.dumps(log_entry, indent=2)<br>        <br>        # Call LLM for security analysis<br>        response = await client.chat.completions.create(<br>            model=&quot;gpt-4o&quot;,<br>            messages=[<br>                {&quot;role&quot;: &quot;system&quot;, &quot;content&quot;: SECURITY_PROMPT},<br>                {&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: f&quot;Analyze this authentication log:\n\n{log_context}&quot;}<br>            ],<br>            temperature=0.2,  # Lower temperature for security analysis<br>            max_tokens=800<br>        )<br>        <br>        analysis_text = response.choices[0].message.content<br>        <br>        # Parse JSON from response<br> 
       if &quot;```json&quot; in analysis_text:<br>            analysis_text = analysis_text.split(&quot;```json&quot;)[1].split(&quot;```&quot;)[0].strip()<br>        elif &quot;```&quot; in analysis_text:<br>            analysis_text = analysis_text.split(&quot;```&quot;)[1].split(&quot;```&quot;)[0].strip()<br>        <br>        security_analysis = json.loads(analysis_text)<br>        <br>        # Log findings<br>        threat_level = security_analysis.get(&quot;threat_level&quot;, &quot;low&quot;)<br>        if threat_level in [&quot;critical&quot;, &quot;high&quot;]:<br>            print(f&quot;   🚨 THREAT DETECTED: {security_analysis.get(&#39;attack_type&#39;)}&quot;)<br>            print(f&quot;   📍 IOCs: {&#39;, &#39;.join(security_analysis.get(&#39;iocs&#39;, [])[:2])}&quot;)<br>        else:<br>            print(f&quot;   ✅ No significant threats detected&quot;)<br>        <br>        return {<br>            &quot;status&quot;: &quot;analyzed&quot;,<br>            &quot;agent_type&quot;: &quot;security&quot;,<br>            &quot;log_entry&quot;: log_entry,<br>            &quot;classification&quot;: classification,<br>            &quot;analysis&quot;: security_analysis,<br>            &quot;correlation_id&quot;: message.get(&quot;correlation_id&quot;)<br>        }<br>        <br>    except Exception as e:<br>        print(f&quot;   ❌ Security analysis error: {e}&quot;)<br>        return {<br>            &quot;status&quot;: &quot;error&quot;,<br>            &quot;agent_type&quot;: &quot;security&quot;,<br>            &quot;error&quot;: str(e),<br>            &quot;log_entry&quot;: log_entry<br>        }<br><br># ============================================================================<br># AGENT 3: Performance Analysis<br># ============================================================================<br><br>PERFORMANCE_PROMPT = &quot;&quot;&quot;You are a database performance expert.<br>Analyze this database transaction log for performance 
issues:<br><br>1. Identify slow queries and resource bottlenecks<br>2. Detect unusual data access patterns<br>3. Flag potential database design issues<br>4. Recommend query optimizations and indexing strategies<br><br>Return JSON:<br>{<br>    &quot;performance_score&quot;: &quot;excellent|good|degraded|critical&quot;,<br>    &quot;issues_found&quot;: [&quot;list of performance issues&quot;],<br>    &quot;bottlenecks&quot;: [&quot;identified bottlenecks&quot;],<br>    &quot;optimization_recommendations&quot;: [&quot;specific optimization steps&quot;],<br>    &quot;estimated_impact&quot;: &quot;description of performance impact&quot;<br>}&quot;&quot;&quot;<br><br>async def performance_analysis(message: dict):<br>    &quot;&quot;&quot;<br>    Specialized agent for database performance analysis<br>    Focuses on query optimization and resource utilization<br>    &quot;&quot;&quot;<br>    content = message.get(&quot;content&quot;, {})<br>    log_entry = content.get(&quot;log_entry&quot;, {})<br>    classification = content.get(&quot;classification&quot;, &quot;unknown&quot;)<br>    priority = content.get(&quot;priority&quot;, &quot;normal&quot;)<br>    <br>    print(f&quot;\n⚡ [PERFORMANCE] Analyzing {classification} (Priority: {priority})&quot;)<br>    <br>    try:<br>        # Prepare performance context<br>        log_context = json.dumps(log_entry, indent=2)<br>        <br>        # Call LLM for performance analysis<br>        response = await client.chat.completions.create(<br>            model=&quot;gpt-4o&quot;,<br>            messages=[<br>                {&quot;role&quot;: &quot;system&quot;, &quot;content&quot;: PERFORMANCE_PROMPT},<br>                {&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: f&quot;Analyze this database log:\n\n{log_context}&quot;}<br>            ],<br>            temperature=0.2,<br>            max_tokens=800<br>        )<br>        <br>        analysis_text = response.choices[0].message.content<br>        <br>        # Parse 
JSON from response<br>        if &quot;```json&quot; in analysis_text:<br>            analysis_text = analysis_text.split(&quot;```json&quot;)[1].split(&quot;```&quot;)[0].strip()<br>        elif &quot;```&quot; in analysis_text:<br>            analysis_text = analysis_text.split(&quot;```&quot;)[1].split(&quot;```&quot;)[0].strip()<br>        <br>        perf_analysis = json.loads(analysis_text)<br>        <br>        # Log findings<br>        perf_score = perf_analysis.get(&quot;performance_score&quot;, &quot;good&quot;)<br>        if perf_score in [&quot;degraded&quot;, &quot;critical&quot;]:<br>            print(f&quot;   🐌 PERFORMANCE ISSUE: {perf_score.upper()}&quot;)<br>            issues = perf_analysis.get(&quot;issues_found&quot;, [])<br>            if issues:<br>                print(f&quot;   📊 Issues: {&#39;, &#39;.join(issues[:2])}&quot;)<br>        else:<br>            print(f&quot;   ✅ Performance acceptable: {perf_score}&quot;)<br>        <br>        return {<br>            &quot;status&quot;: &quot;analyzed&quot;,<br>            &quot;agent_type&quot;: &quot;performance&quot;,<br>            &quot;log_entry&quot;: log_entry,<br>            &quot;classification&quot;: classification,<br>            &quot;analysis&quot;: perf_analysis,<br>            &quot;correlation_id&quot;: message.get(&quot;correlation_id&quot;)<br>        }<br>        <br>    except Exception as e:<br>        print(f&quot;   ❌ Performance analysis error: {e}&quot;)<br>        return {<br>            &quot;status&quot;: &quot;error&quot;,<br>            &quot;agent_type&quot;: &quot;performance&quot;,<br>            &quot;error&quot;: str(e),<br>            &quot;log_entry&quot;: log_entry<br>        }<br><br># ============================================================================<br># AGENT 4: Final Report Aggregator<br># ============================================================================<br><br>async def final_report_aggregator(message: dict):<br>    
&quot;&quot;&quot;<br>    Final agent in the chain: Aggregates specialized analyses<br>    Combines security and performance insights into unified report<br>    &quot;&quot;&quot;<br>    content = message.get(&quot;content&quot;, {})<br>    agent_type = content.get(&quot;agent_type&quot;, &quot;unknown&quot;)<br>    analysis = content.get(&quot;analysis&quot;, {})<br>    log_entry = content.get(&quot;log_entry&quot;, {})<br>    correlation_id = content.get(&quot;correlation_id&quot;, &quot;unknown&quot;)<br>    <br>    print(f&quot;\n📋 [FINAL REPORT] Aggregating {agent_type} analysis (ID: {correlation_id})&quot;)<br>    <br>    # Create unified report<br>    report = {<br>        &quot;correlation_id&quot;: correlation_id,<br>        &quot;timestamp&quot;: log_entry.get(&quot;timestamp&quot;),<br>        &quot;log_type&quot;: log_entry.get(&quot;log_type&quot;),<br>        &quot;agent_type&quot;: agent_type,<br>        &quot;analysis&quot;: analysis<br>    }<br>    <br>    # Determine overall severity<br>    if agent_type == &quot;security&quot;:<br>        threat_level = analysis.get(&quot;threat_level&quot;, &quot;low&quot;)<br>        if threat_level in [&quot;critical&quot;, &quot;high&quot;]:<br>            print(f&quot;   🚨 SECURITY ALERT: {threat_level.upper()} threat detected&quot;)<br>            print(f&quot;   🎯 Actions: {&#39;, &#39;.join(analysis.get(&#39;immediate_actions&#39;, [])[:2])}&quot;)<br>    <br>    elif agent_type == &quot;performance&quot;:<br>        perf_score = analysis.get(&quot;performance_score&quot;, &quot;good&quot;)<br>        if perf_score in [&quot;degraded&quot;, &quot;critical&quot;]:<br>            print(f&quot;   ⚠️  PERFORMANCE ALERT: {perf_score.upper()} detected&quot;)<br>            print(f&quot;   💡 Recommendations: {&#39;, &#39;.join(analysis.get(&#39;optimization_recommendations&#39;, [])[:2])}&quot;)<br>    <br>    return {<br>        &quot;status&quot;: &quot;reported&quot;,<br>        &quot;report&quot;: report<br> 
   }<br><br># ============================================================================<br># Main: Register All Agents in the Chain<br># ============================================================================<br><br>async def main():<br>    try:<br>        print(&quot;🚀 Starting Chained Analysis System...&quot;)<br>        print(&quot;=&quot; * 70)<br>        print(&quot;Pipeline: Classifier → Security/Performance → Final Report&quot;)<br>        print(&quot;=&quot; * 70)<br>        <br>        # Register Agent 1: Classifier (entry point)<br>        await sdk.register_agent(<br>            agent_config=AgentConfig(<br>                name=&quot;CLASSIFIER_AGENT&quot;,<br>                topic=&quot;logs.raw&quot;,<br>                callback=parse_and_classify,<br>                description=&quot;Parses and classifies logs for routing&quot;,<br>                config=SubscriptionConfig(<br>                    consumer_count=2  # Parallel classification<br>                )<br>            )<br>        )<br>        print(&quot;✅ Registered: Classifier Agent (logs.raw)&quot;)<br>        <br>        # Register Agent 2: Security Analyzer<br>        await sdk.register_agent(<br>            agent_config=AgentConfig(<br>                name=&quot;SECURITY_ANALYSIS_AGENT&quot;,<br>                topic=&quot;logs.security&quot;,<br>                callback=security_analysis,<br>                description=&quot;Specialized security threat analysis&quot;,<br>                tools=[&quot;openai&quot;, &quot;security_analysis&quot;],<br>                config=SubscriptionConfig(<br>                    consumer_count=3  # More consumers for security<br>                )<br>            )<br>        )<br>        print(&quot;✅ Registered: Security Analysis Agent (logs.security)&quot;)<br>        <br>        # Register Agent 3: Performance Analyzer<br>        await sdk.register_agent(<br>            agent_config=AgentConfig(<br>                
name=&quot;PERFORMANCE_ANALYSIS_AGENT&quot;,<br>                topic=&quot;logs.performance&quot;,<br>                callback=performance_analysis,<br>                description=&quot;Specialized database performance analysis&quot;,<br>                tools=[&quot;openai&quot;, &quot;performance_analysis&quot;],<br>                config=SubscriptionConfig(<br>                    consumer_count=3<br>                )<br>            )<br>        )<br>        print(&quot;✅ Registered: Performance Analysis Agent (logs.performance)&quot;)<br>        <br>        # Register Agent 4: Final Report Aggregator<br>        await sdk.register_agent(<br>            agent_config=AgentConfig(<br>                name=&quot;FINAL_REPORT_AGENT&quot;,<br>                topic=&quot;logs.final_report&quot;,<br>                callback=final_report_aggregator,<br>                description=&quot;Aggregates specialized analysis into final reports&quot;,<br>                config=SubscriptionConfig(<br>                    consumer_count=1  # Single consumer for aggregation<br>                )<br>            )<br>        )<br>        print(&quot;✅ Registered: Final Report Agent (logs.final_report)&quot;)<br>        <br>        # Start all agents<br>        await sdk.start()<br>        print(&quot;\n&quot; + &quot;=&quot; * 70)<br>        print(&quot;🎧 Chained Analysis System is LIVE!&quot;)<br>        print(&quot;=&quot; * 70)<br>        print(&quot;\nAgent Chain Flow:&quot;)<br>        print(&quot;  1️⃣  logs.raw → Classifier&quot;)<br>        print(&quot;  2️⃣  Classifier → logs.security OR logs.performance&quot;)<br>        print(&quot;  3️⃣  Specialized Agent → logs.final_report&quot;)<br>        print(&quot;  4️⃣  Final Report → Aggregated Insights&quot;)<br>        print(&quot;\nPress Ctrl+C to stop...&quot;)<br>        <br>        # Keep running<br>        try:<br>            while True:<br>                await asyncio.sleep(1)<br>        except KeyboardInterrupt:<br>          
  print(&quot;\n\n👋 Received shutdown signal...&quot;)<br>    <br>    except Exception as e:<br>        print(f&quot;❌ Error: {e}&quot;)<br>        raise<br>    <br>    finally:<br>        print(&quot;Shutting down all agents...&quot;)<br>        await sdk.shutdown()<br>        print(&quot;✅ Shutdown complete&quot;)<br><br>if __name__ == &quot;__main__&quot;:<br>    asyncio.run(main())</pre><p>Run the chained agents to see the combined results.</p><pre>python chained_analysis_agents.py</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*6ogB_g0ZTopHnQtJdFHEhw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*XW7nZxka7oQw9ZVzfhJjgw.png" /></figure><h3>Advanced Features and Patterns</h3><h4>1. Agent Chaining for Complex Workflows</h4><p>The chained system we built demonstrates three agent orchestration patterns:</p><p><strong>Pattern 1: Intelligent Classification and Routing</strong></p><pre># Classifier examines log and routes to specialist<br>async def parse_and_classify(message: dict):<br>    log_entry = message.get(&quot;content&quot;, {}).get(&quot;log_entry&quot;, {})<br>    <br>    # Determine routing based on content analysis<br>    if is_security_concern(log_entry):<br>        route_to = &quot;logs.security&quot;<br>        priority = &quot;high&quot;<br>    elif is_performance_issue(log_entry):<br>        route_to = &quot;logs.performance&quot;<br>        priority = &quot;high&quot;<br>    else:<br>        route_to = &quot;logs.general&quot;<br>        priority = &quot;normal&quot;<br>    <br>    # Publish to specialized agent with correlation tracking<br>    await sdk.publish_task(<br>        event_envelope=EventEnvelope(<br>            topic=route_to,<br>            payload=PayloadBase(<br>                content=classification_data,<br>                reply_to=&quot;logs.final_report&quot;  # All results aggregated here<br>            ),<br>            
correlation_id=generate_correlation_id(log_entry)<br>        )<br>    )</pre><p><strong>Pattern 2: Parallel Specialized Processing</strong></p><pre># Multiple specialized agents process different aspects simultaneously<br># Security agent handles authentication threats<br>await sdk.register_agent(<br>    agent_config=AgentConfig(<br>        topic=&quot;logs.security&quot;,<br>        callback=security_analysis,<br>        config=SubscriptionConfig(consumer_count=3)  # 3 parallel security analysts<br>    )<br>)<br><br># Performance agent handles database optimization<br>await sdk.register_agent(<br>    agent_config=AgentConfig(<br>        topic=&quot;logs.performance&quot;,<br>        callback=performance_analysis,<br>        config=SubscriptionConfig(consumer_count=3)  # 3 parallel performance analysts<br>    )<br>)</pre><p><strong>Pattern 3: Correlation-Based Aggregation</strong></p><pre># Final agent aggregates related analyses using correlation_id<br>async def final_report_aggregator(message: dict):<br>    correlation_id = message.get(&quot;correlation_id&quot;)<br>    <br>    # Track related events across pipeline<br>    # All events with same correlation_id are part of same analysis flow<br>    report = {<br>        &quot;correlation_id&quot;: correlation_id,<br>        &quot;security_analysis&quot;: ...,<br>        &quot;performance_analysis&quot;: ...,<br>        &quot;unified_recommendations&quot;: ...<br>    }</pre><h4>2. 
Multi-Tenant Log Analysis</h4><p>Extend the system to handle multiple tenants:</p><pre>async def multi_tenant_analysis_callback(message: dict):<br>    tenant_id = message.get(&quot;tenant_id&quot;)<br>    content = message.get(&quot;content&quot;, {})<br>    <br>    # Load tenant-specific security policies<br>    policies = await load_tenant_policies(tenant_id)<br>    <br>    # Analyze with tenant context<br>    result = await analyze_log_with_llm(content, policies)<br>    <br>    return {<br>        &quot;status&quot;: &quot;success&quot;,<br>        &quot;tenant_id&quot;: tenant_id,<br>        &quot;analysis&quot;: result<br>    }</pre><h4>3. Real-Time Alerting with Webhooks</h4><p>Send immediate alerts for critical issues:</p><pre>event = EventEnvelope(<br>    topic=&quot;logs.raw&quot;,<br>    payload=PayloadBase(<br>        content=log_data,<br>        webhook=&quot;https://your-api.com/alerts&quot;  # Receive instant notifications<br>    )<br>)</pre><h4>4. Horizontal Scaling</h4><p>Handle high log volumes by scaling agents:</p><pre>await sdk.register_agent(<br>    agent_config=AgentConfig(<br>        name=&quot;LOG_ANALYSIS_AGENT&quot;,<br>        topic=&quot;logs.raw&quot;,<br>        callback=analyze_log,<br>        config=SubscriptionConfig(<br>            consumer_count=10  # Run 10 parallel consumers<br>        )<br>    )<br>)</pre>
<h4>Monitoring and Observability</h4><pre># Check system health (Python)<br>health = await sdk.health()<br>print(f&quot;Status: {health[&#39;status&#39;]}&quot;)<br>print(f&quot;Active Agents: {health[&#39;registered_agents_count&#39;]}&quot;)<br><br># Check dead letter queue (shell command)<br>omnidaemon bus dlq --topic logs.raw<br># View metrics (shell command)<br>omnidaemon metrics list</pre><h4>Error Handling and Retry Logic</h4><pre>async def robust_analysis_callback(message: dict):<br>    try:<br>        result = await analyze_log_with_llm(message.get(&quot;content&quot;))<br>        return {&quot;status&quot;: &quot;success&quot;, &quot;data&quot;: result}<br>    except RateLimitError:<br>        # Retriable error - re-raise so OmniDaemon will retry<br>        raise<br>    except InvalidLogFormat as e:<br>        # Non-retriable error - return error status<br>        return {&quot;status&quot;: &quot;error&quot;, &quot;error&quot;: str(e)}</pre><h4>Performance Optimization</h4><pre># Configuration for high-throughput scenarios (environment variables)<br>STORAGE_BACKEND=redis  # Use Redis for distributed systems<br>EVENT_BUS_TYPE=redis_stream  # Or Kafka for massive scale<br>REDIS_URL=redis://prod-redis.example.com:6379<br><br># Increase consumer count for parallel processing (in SubscriptionConfig)<br>consumer_count=10<br># Optimize LLM calls with batching<br>async def batch_analyze_logs(log_batch: list):<br>    # Analyze multiple logs in single LLM call<br>    pass</pre><p><em>Thanks for reading this article!</em></p><p>If you enjoyed this article, please click on the clap button 👏 and share to help others find it!</p><p>The full source code for this tutorial can be found here:</p><ul><li><a href="https://github.com/codemaker2015/intelligent-log-insights-generator">GitHub - codemaker2015/intelligent-log-insights-generator: An AI-powered log analysis system that monitors authentication and database transaction logs, detects anomalies, identifies vulnerabilities, and provides actionable 
recommendations using OmniDaemon - the universal event-driven runtime engine</a></li><li><a href="https://github.com/codemaker2015/omnidaemon-demo">GitHub - codemaker2015/omnidaemon-demo</a></li></ul><h4>Resources</h4><ul><li><a href="https://omnidaemon.omnirexfloralabs.com/">OmniDaemon - Kubernetes for AI Agents</a></li><li><a href="https://github.com/omnirexflora-labs/OmniDaemon">GitHub - omnirexflora-labs/OmniDaemon: OmniDaemon is a Universal Event-Driven Runtime for AI Agents, a framework-agnostic, event-driven runtime that turns AI agents into production-grade, autonomous infrastructure services. It enables agents to listen, react, and collaborate across distributed systems, bringing true event-driven intelligence to modern enterprise environments.</a></li><li><a href="https://docs-omnidaemon.omnirexfloralabs.com/">OmniDaemon Documentation</a></li><li><a href="https://github.com/omnirexflora-labs/omnicoreagent">GitHub - omnirexflora-labs/omnicoreagent: OmniCoreAgent is a powerful Python framework for building autonomous AI agents that think, reason, and execute complex tasks. Production-ready agents that use tools, manage memory, coordinate workflows, and handle real-world business logic.</a></li><li><a href="https://pypi.org/project/omnidaemon/">omnidaemon on PyPI</a></li></ul>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Exploring TimesFM: The Foundation Model That Understands the Language of Time]]></title>
            <link>https://codemaker2016.medium.com/exploring-timesfm-the-foundation-model-that-understands-the-language-of-time-57486ebca761?source=rss-b7a60e472dfc------2</link>
            <guid isPermaLink="false">https://medium.com/p/57486ebca761</guid>
            <category><![CDATA[llm]]></category>
            <category><![CDATA[forecasting]]></category>
            <category><![CDATA[time-series-forecasting]]></category>
            <category><![CDATA[beginner]]></category>
            <category><![CDATA[python]]></category>
            <dc:creator><![CDATA[Vishnu Sivan]]></dc:creator>
            <pubDate>Thu, 23 Oct 2025 17:44:09 GMT</pubDate>
            <atom:updated>2025-10-24T05:29:11.162Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*miGA__9RI_kp3iruD6eAsw.png" /></figure><p>Forecasting the future has always been a cornerstone of human progress — from predicting sales for the next quarter to anticipating global health trends. Traditionally, time-series forecasting has relied on statistical models like ARIMA and Exponential Smoothing, and later, deep learning-based methods. While these models have proven effective, they often demand significant domain expertise and extensive fine-tuning for each new dataset.</p><p>In today’s data-driven world, accurate forecasting powers industries such as retail, finance, energy, and healthcare, enabling smarter, evidence-based decision-making. With recent advances in deep learning, the paradigm is shifting — traditional models are being surpassed by large-scale neural architectures capable of understanding complex temporal patterns.</p><p>Enter <strong>TimesFM (Time-series Foundation Model)</strong> — a groundbreaking decoder-only foundation model from Google Research, purpose-built for time-series forecasting. 
Unlike conventional models that require dataset-specific training, TimesFM delivers strong zero-shot forecasting performance, a significant step forward in how we approach predictive modeling.</p><p>This article explores what makes TimesFM unique: how it learns the “language of time” and captures trends, rhythms, and seasonality across diverse domains. It covers the core architecture, highlights the key features, and walks through practical examples of how to use the model in real-world applications.</p><h3>Getting Started</h3><h4>Table of contents</h4><ul><li><a href="#5162">The Evolution of Time-Series Forecasting</a></li><li><a href="#d8e7">What is a Decoder?</a></li><li><a href="#3cf8">What is TimesFM?</a></li><li><a href="#eb13">Architecture Overview</a></li><li><a href="#9920">Experimenting with TimesFM model</a></li><li><a href="#44c4">Handson 1: Zero-Shot Forecasting</a></li><li><a href="#b54a">Handson 2: Zero-Shot Forecasting with stats</a></li><li><a href="#b3e7">Handson 3: Finetuning the model</a></li><li><a href="#fded">Handson 4: Model Comparison</a></li><li><a href="#a813">Building NextGen Forecasting app using TimesFM</a></li></ul><h4>The Evolution of Time-Series Forecasting</h4><p>Time-series forecasting has evolved significantly over the years. Traditional models like ARIMA and Exponential Smoothing performed well for simple, univariate data but struggled with the complexities of modern, multivariate, and high-dimensional datasets.</p><p>The rise of deep learning models such as DeepAR and N-BEATS addressed these challenges by using neural networks capable of capturing long-term dependencies and intricate temporal patterns.</p><p>Now, a new generation of models is redefining the field. 
TimesFM, a foundation model for time-series forecasting, is pre-trained on diverse datasets and achieves high-accuracy, zero-shot predictions — eliminating the need for task-specific fine-tuning and marking a major leap forward in forecasting technology.</p><h4>What is a Decoder?</h4><p>At the core of TimesFM lies the decoder architecture, a key component also used in modern Large Language Models (LLMs) like GPT. A decoder is a neural network designed for generative tasks, where it produces new sequences based on a given context.</p><p>In language models, the decoder takes a sequence of words and predicts the next one in an autoregressive manner — generating text step by step. TimesFM applies this same concept to time-series data. Instead of predicting the next word, it predicts the next value or segment in a numerical sequence. Essentially, the model learns the “grammar of time” — understanding patterns, rhythms, and trends — to generate accurate forecasts.</p><h4>What is TimesFM?</h4><p>TimesFM is a decoder-only foundation model for time-series forecasting, inspired by the architecture of large language models like GPT-3. Unlike traditional models, it predicts future values directly from past data without requiring an encoder.</p><p>The model processes time-series data in patches — segments of sequential values — a technique that improves efficiency and helps capture long-term temporal patterns.</p><p>Designed as a general-purpose, zero-shot forecaster, TimesFM can deliver accurate predictions for entirely new datasets without any fine-tuning, marking a major shift from traditional, task-specific forecasting approaches.</p><h4>Architecture Overview</h4><p>TimesFM 1.0 is a decoder-only transformer model consisting of approximately 200 million parameters, trained on a massive pretraining dataset containing over 100 billion real-world time points. 
This large-scale training enables the model to deliver highly accurate forecasts on unseen datasets without any additional fine-tuning.</p><p>The model is designed for univariate time-series forecasting, where it predicts the future values of a single variable using only its past observations. TimesFM 1.0 handles context lengths up to 512 time points and supports any forecast horizon, with an optional frequency indicator input to incorporate time granularity information. Later releases accept longer contexts; the TimesFM 2.5 examples in this article set max_context=1024.</p><p>TimesFM’s architecture combines a scalable transformer backbone, efficient patch-based processing, and robust masking techniques to achieve zero-shot generalization across diverse time-series datasets.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*K9VTwl40dYG_V8O55eCkQg.jpeg" /><figcaption>Image source: <a href="https://research.google/blog/a-decoder-only-foundation-model-for-time-series-forecasting/">A decoder-only foundation model for time-series forecasting</a></figcaption></figure><p>In TimesFM, the input time-series is first divided into fixed-length input patches. Each patch is then transformed into a vector using a residual block, aligning it with the model’s transformer dimensions. 
Positional encodings are added to these vectors to preserve the temporal order before they are passed through the stacked transformer layers.</p><p>Within the transformer, SA (self-attention) represents multi-head causal attention, which ensures the model only attends to past and present data, while FFN refers to the feed-forward (fully connected) layers that refine the learned representations.</p><p>Finally, the output tokens are processed through another residual block to produce an output of length output_patch_len, representing the model’s forecast for the next time window beyond the last observed patch.</p><p>Below is a concise overview of its key architectural elements:</p><ul><li><strong>Decoder-Only Design:</strong><br>TimesFM utilizes a decoder-only setup, making it inherently capable of generating future sequences from past context. This structure enables it to manage input sequences of varying lengths with flexibility and efficiency.</li><li><strong>Input Patching:</strong><br>The time-series data is divided into contiguous, non-overlapping patches (tokens), which pass through residual blocks that transform each patch into a vector representation. These vectors are then processed by the transformer layers for deeper temporal understanding.</li><li><strong>Stacked Transformer Layers:</strong><br>Multiple transformer layers equipped with multi-head self-attention allow each token to reference others in the sequence. TimesFM employs causal attention, ensuring that predictions are influenced only by past and present information — never by future data.</li><li><strong>Extended Output Patches:</strong><br>A key innovation in TimesFM is its ability to generate longer output patches than the input patches. 
Unlike traditional LLMs that produce outputs token by token, TimesFM predicts an entire output patch at each step, so a long horizon needs fewer autoregressive steps, which speeds up inference and limits error accumulation in long-term forecasts.</li><li><strong>Patch Masking:</strong><br>To keep the model from only learning context lengths that are multiples of the input patch length, TimesFM applies a patch masking strategy during training. This allows the model to adapt to different context lengths during inference and supports consistent performance across diverse time-series datasets.</li></ul><p><strong>Model Parameters (Hyperparameters)<br></strong>These are the key tunable settings that define the model’s behavior and impact its forecasting performance:</p><ul><li><strong>model_dim:</strong> Dimensionality of the token vectors that the transformer layers take as input and produce as output.</li><li><strong>input_patch_len (p):</strong> Length of each input patch.</li><li><strong>output_patch_len (h):</strong> Number of future time steps predicted per generation step.</li><li><strong>num_heads:</strong> Number of attention heads in the multi-head attention mechanism.</li><li><strong>num_layers (nl):</strong> Number of stacked transformer layers.</li><li><strong>context length (L):</strong> Length of historical data used for forecasting.</li><li><strong>horizon length (H):</strong> Length of the forecast horizon.</li><li><strong>Number of input tokens (N):</strong> Number of patches derived from the input sequence, N = ⌊L / p⌋ (for example, L = 512 and p = 32 give N = 16). 
Each token is then processed through transformer layers for contextual learning.</li></ul><p><strong>Model Components<br></strong>The core architecture of TimesFM is composed of the following fundamental components:</p><ul><li><strong>Residual Blocks: </strong>Used to preprocess input and output patches, ensuring stable gradient flow and efficient feature extraction.</li><li><strong>Stacked Transformer Layers: </strong>The central component of the model, enabling rich temporal representation learning through self-attention.</li><li><strong>Input Tokens (tj): </strong>Derived from processed patches, these tokens are generated using residual blocks and positional encodings.</li><li><strong>Output Tokens (oj): </strong>Generated by the stacked transformer layers, these tokens are used to predict the corresponding output patches.</li><li><strong>Patch Mask (m1:L): </strong>Applied to ignore certain portions of the input sequence during processing, helping the model generalize to varying context lengths.</li><li><strong>Loss Function</strong>: During training, TimesFM minimizes the Mean Squared Error (MSE) between predicted and actual future values.</li></ul><h3>Experimenting with TimesFM model</h3><p>This section guides the setup of the TimesFM model for time-series forecasting, starting with zero-shot forecasting to demonstrate predictions on unseen datasets without prior training. 
It also covers fine-tuning for specific datasets to enhance performance.</p><p>TimesFM’s performance is compared with other approaches, including statistical models (e.g., AutoETS), machine learning models (e.g., Random Forest, XGBoost, LGBM), and other foundation models such as TimeGPT, highlighting its unique capabilities and advantages.</p><p>We will also use uv, a modern and fast Python package manager (instead of pip), to set up our environment and handle dependencies.</p><h4>Installing uv</h4><p>uv simplifies dependency management, virtual environments, and running scripts.</p><ul><li><strong>For Windows:</strong></li></ul><pre>powershell -ExecutionPolicy ByPass -c &quot;irm https://astral.sh/uv/install.ps1 | iex&quot;<br>set Path=%USERPROFILE%\.local\bin;%Path%</pre><ul><li><strong>For Linux / Mac:</strong></li></ul><pre>curl -LsSf https://astral.sh/uv/install.sh | sh</pre><p>Refer to the official website for detailed installation instructions.</p><p><a href="https://docs.astral.sh/uv/getting-started/installation/">Installation | uv</a></p><h4>Installing the dependencies</h4><ul><li>Initialize a uv project by executing the following command.</li></ul><pre>uv init timesfm_demo<br>cd timesfm_demo</pre><ul><li>Create and activate a virtual environment by executing the following commands (run only the activation line for your OS).</li></ul><pre>uv venv<br>source .venv/bin/activate # for Linux / macOS<br>.venv\Scripts\activate    # for Windows</pre><blockquote>At the time of writing, the official PyPI package for TimesFM provides older versions (e.g., timesfm-2.0-500m and timesfm-1.0-200m). To access the latest version (timesfm-2.5-200m-pytorch), it is recommended to install the library directly from GitHub. 
This ensures access to the most recent features, improvements, and bug fixes.</blockquote><ul><li>Clone the GitHub repository and move into the folder using the following commands.</li></ul><pre>git clone https://github.com/google-research/timesfm.git<br>cd timesfm</pre><ul><li>Install timesfm by executing the following command (the quotes keep shells such as zsh from expanding the brackets).</li></ul><pre>uv pip install -e &quot;.[torch]&quot;</pre><h3>Handson 1: Zero-Shot Forecasting</h3><p>Zero-shot forecasting allows TimesFM to generate accurate predictions on unseen datasets without any prior training or fine-tuning. Leveraging its pre-trained knowledge on diverse time-series data, the model can identify trends, seasonality, and temporal patterns directly from historical inputs.</p><p>In this demo, we build a synthetic bike rental dataset with trend, seasonality, and noise, hold out the final 90 days, and ask TimesFM to forecast them. The model has had no prior exposure to this dataset, so comparing the forecast with the held-out values shows how well zero-shot forecasting captures the underlying temporal patterns.</p><ul><li>Install the matplotlib library for graph-based visualization using uv.</li></ul><pre>uv add matplotlib</pre><ul><li>Create a file named bike_rental_forecast.py and add the following code to it.</li></ul><pre>import torch<br>import numpy as np<br>import pandas as pd<br>import matplotlib.pyplot as plt<br>import timesfm<br><br>torch.set_float32_matmul_precision(&quot;high&quot;)<br><br># --- 1. 
Generate Synthetic Bike Rental Data ---<br>def create_bike_rental_data():<br>    dates = pd.date_range(start=&quot;2020-01-01&quot;, end=&quot;2023-12-31&quot;, freq=&quot;D&quot;)<br>    n_days = len(dates)<br><br>    trend = np.linspace(start=150, stop=350, num=n_days)<br>    yearly_seasonality = 180 * (1 + np.sin(2 * np.pi * dates.dayofyear / 365.25 - np.pi/2))<br>    weekly_seasonality = 120 * (dates.dayofweek &gt;= 5)<br>    noise = np.random.normal(0, 40, n_days)<br>    rentals = trend + yearly_seasonality + weekly_seasonality + noise<br>    rentals = np.maximum(0, rentals).astype(int)  # Ensure no negative rentals<br>    temp = 15 + 10 * np.sin(2 * np.pi * dates.dayofyear / 365.25 - np.pi/2) + np.random.randn(n_days) * 2<br>    is_weekend = (dates.dayofweek &gt;= 5).astype(int)<br><br>    df = pd.DataFrame({<br>        &#39;rentals&#39;: rentals,<br>        &#39;temperature&#39;: temp,<br>        &#39;is_weekend&#39;: is_weekend<br>    }, index=dates)<br><br>    return df<br><br># --- 2. Prepare Data ---<br>rental_df = create_bike_rental_data()<br>time_series_data = rental_df[&#39;rentals&#39;].values<br>horizon_len = 90  # Forecast the next 90 days<br>historical_data = time_series_data[:-horizon_len]<br>true_future_values = time_series_data[-horizon_len:]<br><br># --- 3. Initialize TimesFM 2.5 PyTorch Model ---<br>model = timesfm.TimesFM_2p5_200M_torch.from_pretrained(&quot;google/timesfm-2.5-200m-pytorch&quot;)<br>model.compile(<br>    timesfm.ForecastConfig(<br>        max_context=1024,<br>        max_horizon=256,<br>        normalize_inputs=True,<br>        use_continuous_quantile_head=True,<br>        force_flip_invariance=True,<br>        infer_is_positive=True,<br>        fix_quantile_crossing=True,<br>    )<br>)<br><br># --- 4. 
Generate Forecast ---<br>point_forecast, quantile_forecast = model.forecast(<br>    horizon=horizon_len,<br>    inputs=[historical_data],  # Single time series as input<br>)<br>forecast_values = point_forecast[0]  # Extract the forecast for our single series<br><br># --- 5. Visualize the Results ---<br>dates = rental_df.index<br>plt.figure(figsize=(15, 7))<br>plt.plot(dates[:-horizon_len], historical_data, label=&quot;Historical Data&quot;, color=&quot;black&quot;)<br>plt.plot(dates[-horizon_len:], true_future_values, label=&quot;True Future Values&quot;, color=&quot;blue&quot;, linestyle=&#39;--&#39;)<br>plt.plot(dates[-horizon_len:], forecast_values, label=&quot;TimesFM 2.5 Forecast&quot;, color=&quot;red&quot;)<br>plt.fill_between(dates[-horizon_len:],quantile_forecast[0, :, 1], quantile_forecast[0, :, 9], alpha=0.2, color=&#39;red&#39;, label=&#39;80% Prediction Interval&#39;)<br>plt.legend()<br>plt.title(&quot;TimesFM 2.5 Zero-Shot Forecast for Daily Bike Rentals&quot;)<br>plt.xlabel(&quot;Date&quot;)<br>plt.ylabel(&quot;Number of Rentals&quot;)<br>plt.grid(True, linestyle=&#39;--&#39;, alpha=0.6)<br>plt.tight_layout()<br>plt.show()<br><br># --- 6. 
Calculate Metrics ---<br>rmse = np.sqrt(np.mean((true_future_values - forecast_values)**2))<br>mae = np.mean(np.abs(true_future_values - forecast_values))<br><br>print(f&quot;Zero-Shot Forecast RMSE: {rmse:.2f}&quot;)<br>print(f&quot;Zero-Shot Forecast MAE: {mae:.2f}&quot;)<br>print(f&quot;\nPoint forecast shape: {point_forecast.shape}&quot;)<br>print(f&quot;Quantile forecast shape: {quantile_forecast.shape}&quot;)</pre><p><strong>Code Summary</strong></p><ul><li>Generates synthetic daily bike rental data (2020–2023) using the custom function create_bike_rental_data().</li><li>Splits the dataset into historical and future segments for model evaluation.</li><li>Loads the pre-trained TimesFM 2.5 checkpoint with timesfm.TimesFM_2p5_200M_torch.from_pretrained(&quot;google/timesfm-2.5-200m-pytorch&quot;).</li><li>Configures the model with model.compile() using ForecastConfig to define forecast parameters (e.g., max_horizon, normalization, quantile options).</li><li>Performs forecasting via model.forecast(horizon=horizon_len, inputs=[historical_data]), which returns both point and quantile forecasts.</li><li>Visualizes trends and uncertainty using matplotlib.</li><li>Evaluates model performance using RMSE (Root Mean Square Error) and MAE (Mean Absolute Error).</li></ul><p><strong>Executing the code<br></strong>To visualize trends and obtain forecasted results with TimesFM, run the code using your Python environment.</p><pre>python bike_rental_forecast.py</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*sma3723txIQ2vDMHRyskcA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*dW-kx5RdNdVggW5YOCNxVw.png" /></figure><h3>Handson 2: Zero-Shot Forecasting with stats</h3><p>This is an extended version of the bike rental forecast that also prints summary statistics and prediction intervals.</p><pre>import torch<br>import numpy as np<br>from datetime import datetime, timedelta<br>import timesfm<br>import matplotlib.pyplot as plt<br><br># Initialize 
TimesFM model<br>print(&quot;Loading TimesFM model...&quot;)<br>torch.set_float32_matmul_precision(&quot;high&quot;)<br>model = timesfm.TimesFM_2p5_200M_torch.from_pretrained(&quot;google/timesfm-2.5-200m-pytorch&quot;)<br>model.compile(<br>    timesfm.ForecastConfig(<br>        max_context=1024,<br>        max_horizon=256,<br>        normalize_inputs=True,<br>        use_continuous_quantile_head=True,<br>        force_flip_invariance=True,<br>        infer_is_positive=True,<br>        fix_quantile_crossing=True,<br>    )<br>)<br><br>def generate_forecast(historical_data, horizon=90, start_date=None, target_column=&quot;Bike Rentals&quot;):<br><br>    if len(historical_data) &lt; 30:<br>        raise ValueError(&quot;Historical data must contain at least 30 data points&quot;)<br>    <br>    if horizon &gt; 256 or horizon &lt; 1:<br>        raise ValueError(&quot;Horizon must be between 1 and 256 days&quot;)<br>    <br>    historical_array = np.array(historical_data, dtype=np.float32)<br>    <br>    print(f&quot;Generating {horizon}-day forecast...&quot;)<br>    point_forecast, quantile_forecast = model.forecast(<br>        horizon=horizon,<br>        inputs=[historical_array],<br>    )<br>    <br>    # Extract forecasts<br>    forecast_values = point_forecast[0]<br>    quantiles = quantile_forecast[0]  # Shape: [horizon, 10] (mean, then the 10th to 90th percentiles)<br>    <br>    # Generate dates<br>    if start_date:<br>        if isinstance(start_date, str):<br>            start = datetime.strptime(start_date, &quot;%Y-%m-%d&quot;)<br>        else:<br>            start = start_date<br>    else:<br>        start = datetime.now() - timedelta(days=len(historical_data))<br>    <br>    historical_dates = [start + timedelta(days=i) for i in range(len(historical_data))]<br>    forecast_start = start + timedelta(days=len(historical_data))<br>    forecast_dates = [forecast_start + timedelta(days=i) for i in range(horizon)]<br>    <br>    # Calculate summary statistics<br>    historical_mean = 
np.mean(historical_data)<br>    historical_std = np.std(historical_data)<br>    forecast_mean = np.mean(forecast_values)<br>    forecast_std = np.std(forecast_values)<br>    <br>    # Print summary<br>    print(&quot;\n&quot; + &quot;=&quot;*70)<br>    print(f&quot;FORECAST SUMMARY: {target_column}&quot;)<br>    print(&quot;=&quot;*70)<br>    <br>    print(&quot;\nHistorical Data Statistics:&quot;)<br>    print(f&quot;  Period: {historical_dates[0].strftime(&#39;%Y-%m-%d&#39;)} to {historical_dates[-1].strftime(&#39;%Y-%m-%d&#39;)}&quot;)<br>    print(f&quot;  Count: {len(historical_data)} days&quot;)<br>    print(f&quot;  Mean: {historical_mean:.2f}&quot;)<br>    print(f&quot;  Std Dev: {historical_std:.2f}&quot;)<br>    print(f&quot;  Min: {np.min(historical_data):.2f}&quot;)<br>    print(f&quot;  Max: {np.max(historical_data):.2f}&quot;)<br>    <br>    print(&quot;\nForecast Statistics:&quot;)<br>    print(f&quot;  Period: {forecast_dates[0].strftime(&#39;%Y-%m-%d&#39;)} to {forecast_dates[-1].strftime(&#39;%Y-%m-%d&#39;)}&quot;)<br>    print(f&quot;  Count: {horizon} days&quot;)<br>    print(f&quot;  Mean: {forecast_mean:.2f}&quot;)<br>    print(f&quot;  Std Dev: {forecast_std:.2f}&quot;)<br>    print(f&quot;  Min: {np.min(forecast_values):.2f}&quot;)<br>    print(f&quot;  Max: {np.max(forecast_values):.2f}&quot;)<br>    <br>    print(&quot;\nTrend Analysis:&quot;)<br>    change_percent = ((forecast_mean - historical_mean) / historical_mean) * 100<br>    trend_direction = &quot;increasing&quot; if change_percent &gt; 0 else &quot;decreasing&quot;<br>    print(f&quot;  Direction: {trend_direction.upper()}&quot;)<br>    print(f&quot;  Change: {change_percent:+.2f}%&quot;)<br>    <br>    print(&quot;\nConfidence Intervals (Average):&quot;)<br>    print(f&quot;  80% Interval: [{np.mean(quantiles[:, 1]):.2f}, {np.mean(quantiles[:, 9]):.2f}]&quot;)<br>    print(f&quot;  50% Interval: [{np.mean(quantiles[:, 3]):.2f}, {np.mean(quantiles[:, 7]):.2f}]&quot;)<br>    <br>  
  print(&quot;\nFirst 10 Forecast Values:&quot;)<br>    for i in range(min(10, len(forecast_values))):<br>        date_str = forecast_dates[i].strftime(&#39;%Y-%m-%d&#39;)<br>        val = forecast_values[i]<br>        q10 = quantiles[i, 1]<br>        q90 = quantiles[i, 9]<br>        print(f&quot;  {date_str}: {val:.2f} (80% CI: [{q10:.2f}, {q90:.2f}])&quot;)<br>    <br>    if len(forecast_values) &gt; 10:<br>        print(f&quot;  ... and {len(forecast_values) - 10} more days&quot;)<br>    <br>    print(&quot;=&quot;*70 + &quot;\n&quot;)<br>    <br>    # Generate plot for historical and forecasted data<br>    plt.figure(figsize=(15, 7))<br>    plt.plot(historical_dates, historical_data, <br>             label=&quot;Historical Data&quot;, color=&quot;black&quot;, linewidth=2, marker=&#39;o&#39;, <br>             markersize=3, markevery=max(1, len(historical_data)//50))<br>    plt.plot(forecast_dates, forecast_values, <br>             label=&quot;Forecast&quot;, color=&quot;red&quot;, linewidth=2, marker=&#39;s&#39;,<br>             markersize=3, markevery=max(1, horizon//50))<br>    <br>    # Add confidence intervals<br>    plt.fill_between(<br>        forecast_dates,<br>        quantiles[:, 1],  # 10th percentile<br>        quantiles[:, 9],  # 90th percentile<br>        alpha=0.2,<br>        color=&#39;red&#39;,<br>        label=&#39;80% Prediction Interval&#39;<br>    )<br>    <br>    plt.fill_between(<br>        forecast_dates,<br>        quantiles[:, 3],  # 25th percentile<br>        quantiles[:, 7],  # 75th percentile<br>        alpha=0.3,<br>        color=&#39;red&#39;,<br>        label=&#39;50% Prediction Interval&#39;<br>    )<br>    <br>    plt.legend(loc=&#39;best&#39;, fontsize=11)<br>    plt.title(f&quot;TimesFM Forecast: {target_column}&quot;, fontsize=16, fontweight=&#39;bold&#39;)<br>    plt.xlabel(&quot;Date&quot;, fontsize=13)<br>    plt.ylabel(target_column, fontsize=13)<br>    plt.grid(True, linestyle=&#39;--&#39;, alpha=0.6)<br>    
plt.xticks(rotation=45)<br>    plt.tight_layout()<br>    plt.show()<br>    <br>    # Return detailed results<br>    return {<br>        &quot;forecast_values&quot;: forecast_values.tolist(),<br>        &quot;forecast_dates&quot;: [d.strftime(&#39;%Y-%m-%d&#39;) for d in forecast_dates],<br>        &quot;quantiles&quot;: {<br>            &quot;q10&quot;: quantiles[:, 1].tolist(),<br>            &quot;q25&quot;: quantiles[:, 3].tolist(),<br>            &quot;q50&quot;: quantiles[:, 5].tolist(),<br>            &quot;q75&quot;: quantiles[:, 7].tolist(),<br>            &quot;q90&quot;: quantiles[:, 9].tolist()<br>        },<br>        &quot;historical_mean&quot;: float(historical_mean),<br>        &quot;forecast_mean&quot;: float(forecast_mean),<br>        &quot;trend&quot;: trend_direction,<br>        &quot;change_percent&quot;: float(change_percent)<br>    }<br><br><br>if __name__ == &quot;__main__&quot;:<br>    # Create synthetic data with trend and seasonality<br>    np.random.seed(42)<br>    days = 365<br>    trend = np.linspace(150, 300, days)<br>    seasonality = 50 * np.sin(np.linspace(0, 4*np.pi, days))<br>    noise = np.random.normal(0, 20, days)<br>    historical_data = trend + seasonality + noise<br>    historical_data = np.maximum(historical_data, 0)  # Ensure non-negative<br>    <br>    # Generate forecast<br>    results = generate_forecast(<br>        historical_data=historical_data.tolist(),<br>        horizon=90,<br>        start_date=&quot;2023-01-01&quot;,<br>        target_column=&quot;Bike Rentals&quot;<br>    )<br>    print(results)</pre><p><strong>Code Summary</strong></p><ul><li><strong>Model Initialization:</strong><br>The latest version of TimesFM (timesfm-2.5-200m-pytorch) is loaded and compiled with forecasting configurations such as context length, horizon, normalization, and quantile settings.</li><li><strong>Input Preparation:</strong><br>Historical time-series data is converted into a numerical array. 
The model supports flexible context lengths, but at least 30 data points are recommended for robust forecasts.</li><li><strong>Forecast Generation:</strong><br>The model predicts the next values for the specified forecast horizon (up to 256 steps), producing both point forecasts and quantile estimates for uncertainty intervals.</li><li><strong>Analysis and Visualization:<br></strong>Summary statistics for historical and forecasted data are computed.<br>Trend direction and percent change are calculated to understand expected growth or decline.<br>Forecasts are plotted with 80% and 50% prediction intervals to highlight prediction uncertainty.</li></ul><p><strong>Executing the code<br></strong>To visualize trends and obtain forecasted results with TimesFM, run the script from your Python environment.</p><pre>python bike_rental_forecast.py</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*v3UoK-6MOlI_HXXgXXT9VA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*sTw-W-JU8rixo2_pBj0mUQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*RryMJjAqy429GE-5MPXHqw.png" /></figure><h3>Handson 3: Finetuning the model</h3><p>Although TimesFM’s zero-shot capabilities are already strong, fine-tuning the model on a specific dataset can further improve forecasting accuracy.</p><p>Fine-tuning on as little as 10% of a dataset has been reported to achieve state-of-the-art performance, often outperforming models trained from scratch. 
While the current TimesFM library does not provide a high-level .finetune() API, the fine-tuning process can be implemented conceptually using frameworks like PyTorch or JAX, adjusting the model weights to better fit your dataset.</p><ul><li>Create a file named bike_rental_forecast_finetuning.py and add the following code to it.</li></ul><pre>import torch<br>import torch.nn as nn<br>import torch.optim as optim<br>from torch.utils.data import Dataset, DataLoader<br>import numpy as np<br>import timesfm<br>from datetime import datetime, timedelta<br>import matplotlib.pyplot as plt<br>from pathlib import Path<br><br># Initialize TimesFM model<br>print(&quot;Loading TimesFM model...&quot;)<br>torch.set_float32_matmul_precision(&quot;high&quot;)<br>model = timesfm.TimesFM_2p5_200M_torch.from_pretrained(&quot;google/timesfm-2.5-200m-pytorch&quot;)<br><br>model.compile(<br>    timesfm.ForecastConfig(<br>        max_context=1024,<br>        max_horizon=256,<br>        normalize_inputs=True,<br>        use_continuous_quantile_head=True,<br>        force_flip_invariance=True,<br>        infer_is_positive=True,<br>        fix_quantile_crossing=True,<br>    )<br>)<br>print(&quot;Model loaded successfully!\n&quot;)<br><br><br># Custom Dataset for fine-tuning<br>class TimeSeriesDataset(Dataset):<br>    def __init__(self, data, context_length=365, horizon=90):<br>        self.data = np.array(data, dtype=np.float32)<br>        self.context_length = context_length<br>        self.horizon = horizon<br>        <br>        # Create sliding windows<br>        self.samples = []<br>        for i in range(len(self.data) - context_length - horizon + 1):<br>            context = self.data[i:i + context_length]<br>            target = self.data[i + context_length:i + context_length + horizon]<br>            self.samples.append((context, target))<br>    <br>    def __len__(self):<br>        return len(self.samples)<br>    <br>    def __getitem__(self, idx):<br>        context, target = 
self.samples[idx]<br>        return torch.tensor(context), torch.tensor(target)<br><br><br>def prepare_finetuning_data(historical_data, batch_size=8, context_length=365, horizon=90):<br>    dataset = TimeSeriesDataset(historical_data, context_length, horizon)<br>    data_loader = DataLoader(<br>        dataset, <br>        batch_size=batch_size, <br>        shuffle=True,<br>        num_workers=0  # Set to 0 for compatibility<br>    )<br>    return data_loader<br><br><br>def finetune_model(model, train_data, num_epochs=10, learning_rate=1e-4, <br>                   context_length=365, horizon=90, batch_size=8, <br>                   save_path=&quot;finetuned_timesfm.pt&quot;):<br>    print(&quot;=&quot;*70)<br>    print(&quot;STARTING MODEL FINE-TUNING&quot;)<br>    print(&quot;=&quot;*70)<br>    print(f&quot;Training data points: {len(train_data)}&quot;)<br>    print(f&quot;Context length: {context_length}&quot;)<br>    print(f&quot;Forecast horizon: {horizon}&quot;)<br>    print(f&quot;Batch size: {batch_size}&quot;)<br>    print(f&quot;Learning rate: {learning_rate}&quot;)<br>    print(f&quot;Epochs: {num_epochs}\n&quot;)<br>    <br>    # Prepare data loader<br>    train_loader = prepare_finetuning_data(<br>        train_data, <br>        batch_size=batch_size,<br>        context_length=context_length,<br>        horizon=horizon<br>    )<br>    <br>    if len(train_loader) == 0:<br>        raise ValueError(f&quot;Not enough data for training. Need at least {context_length + horizon} points.&quot;)<br>    <br>    print(f&quot;Created {len(train_loader)} training batches\n&quot;)<br>    <br>    # Access the underlying PyTorch model. 
TimesFM wraps the actual model, we need to access it<br>    try:<br>        # Try to access the internal model<br>        if hasattr(model, &#39;model&#39;):<br>            pytorch_model = model.model<br>        elif hasattr(model, &#39;_model&#39;):<br>            pytorch_model = model._model<br>        else:<br>            # If we can&#39;t find the internal model, use the wrapper directly<br>            pytorch_model = model<br>        <br>        # Get trainable parameters<br>        trainable_params = [p for p in pytorch_model.parameters() if p.requires_grad]<br>        <br>        if len(trainable_params) == 0:<br>            print(&quot;Warning: No trainable parameters found. Attempting to unfreeze all parameters...&quot;)<br>            for param in pytorch_model.parameters():<br>                param.requires_grad = True<br>            trainable_params = list(pytorch_model.parameters())<br>        <br>        print(f&quot;Found {len(trainable_params)} trainable parameter groups\n&quot;)<br>        <br>    except Exception as e:<br>        print(f&quot;Warning: Could not access model parameters directly: {e}&quot;)<br>        print(&quot;TimesFM may not support direct fine-tuning through standard PyTorch methods.&quot;)<br>        print(&quot;Using the model in inference mode only.\n&quot;)<br>        return model, []<br>    <br>    optimizer = optim.Adam(trainable_params, lr=learning_rate)     # Create optimizer<br>    criterion = nn.MSELoss()    # Loss function<br>    training_losses = []    # Training loop<br>    <br>    for epoch in range(num_epochs):<br>        epoch_losses = []<br>        <br>        for batch_idx, (context, ground_truth_horizon) in enumerate(train_loader):<br>            try:<br>                # Note: TimesFM expects inputs as list of arrays<br>                predicted_horizon, _ = model.forecast(<br>                    horizon=horizon,<br>                    inputs=[context[i].numpy() for i in range(context.shape[0])]<br>          
      )                <br>                # Copy predictions into a tensor. torch.tensor() creates a new leaf tensor,<br>                # so it is detached from the model&#39;s computation graph.<br>                predicted_tensor = torch.tensor(predicted_horizon, dtype=torch.float32, requires_grad=True)<br>                # Calculate loss<br>                loss = criterion(predicted_tensor, ground_truth_horizon)<br>                # Backward pass. As written, gradients reach only predicted_tensor, not the<br>                # model weights, so optimizer.step() leaves the weights unchanged. A real<br>                # update needs a differentiable forward pass through the torch module.<br>                optimizer.zero_grad()<br>                loss.backward()<br>                optimizer.step()<br>                <br>                epoch_losses.append(loss.item())<br>                <br>            except Exception as e:<br>                print(f&quot;Error during training batch {batch_idx}: {e}&quot;)<br>                print(&quot;Note: TimesFM may not support gradient-based fine-tuning.&quot;)<br>                print(&quot;The model will be used in inference-only mode.\n&quot;)<br>                return model, []<br>        <br>        # Store a plain Python float so the saved checkpoint stays loadable with torch.load<br>        avg_loss = float(np.mean(epoch_losses))<br>        training_losses.append(avg_loss)<br>        <br>        print(f&quot;Epoch {epoch+1}/{num_epochs}, Average Loss: {avg_loss:.6f}&quot;)<br>    <br>    print(&quot;\nFine-tuning complete!&quot;)<br>    <br>    if len(training_losses) &gt; 0:<br>        print(f&quot;Final Loss: {training_losses[-1]:.6f}&quot;)<br>        print(f&quot;Saving fine-tuned model to: {save_path}&quot;)<br>        try:<br>            # Save the state dict of the internal model if possible<br>            if hasattr(model, &#39;model&#39;):<br>                model_state = model.model.state_dict()<br>            elif hasattr(model, &#39;_model&#39;):<br>                model_state = model._model.state_dict()<br>            else:<br>                model_state = {}<br>                <br>            torch.save({<br>                &#39;model_state_dict&#39;: model_state,<br>                &#39;optimizer_state_dict&#39;: optimizer.state_dict() if optimizer else None,<br>                &#39;training_losses&#39;: training_losses,<br>              
  &#39;config&#39;: {<br>                    &#39;context_length&#39;: context_length,<br>                    &#39;horizon&#39;: horizon,<br>                    &#39;num_epochs&#39;: num_epochs,<br>                    &#39;learning_rate&#39;: learning_rate<br>                }<br>            }, save_path)<br>            print(&quot;Model saved successfully!\n&quot;)<br>        except Exception as e:<br>            print(f&quot;Warning: Could not save model: {e}\n&quot;)<br>        <br>        # Plot training loss<br>        plt.figure(figsize=(10, 5))<br>        plt.plot(range(1, num_epochs + 1), training_losses, marker=&#39;o&#39;, linewidth=2)<br>        plt.title(&#39;Training Loss Over Epochs&#39;, fontsize=14, fontweight=&#39;bold&#39;)<br>        plt.xlabel(&#39;Epoch&#39;, fontsize=12)<br>        plt.ylabel(&#39;MSE Loss&#39;, fontsize=12)<br>        plt.grid(True, linestyle=&#39;--&#39;, alpha=0.6)<br>        plt.tight_layout()<br>        plt.show()<br>    else:<br>        print(&quot;Fine-tuning was not performed due to model limitations.&quot;)<br>        print(&quot;Using pre-trained model for inference.\n&quot;)<br>    <br>    return model, training_losses<br><br><br>def load_finetuned_model(model, checkpoint_path):<br>    print(f&quot;Loading fine-tuned model from: {checkpoint_path}&quot;)<br>    try:<br>        checkpoint = torch.load(checkpoint_path)<br>        <br>        # Try to load state dict into internal model<br>        if &#39;model_state_dict&#39; in checkpoint and checkpoint[&#39;model_state_dict&#39;]:<br>            if hasattr(model, &#39;model&#39;):<br>                model.model.load_state_dict(checkpoint[&#39;model_state_dict&#39;])<br>            elif hasattr(model, &#39;_model&#39;):<br>                model._model.load_state_dict(checkpoint[&#39;model_state_dict&#39;])<br>            else:<br>                print(&quot;Warning: Could not access internal model structure&quot;)<br>        <br>        print(&quot;Model loaded 
successfully!&quot;)<br>        <br>        if &#39;config&#39; in checkpoint:<br>            print(&quot;\nModel configuration:&quot;)<br>            for key, value in checkpoint[&#39;config&#39;].items():<br>                print(f&quot;  {key}: {value}&quot;)<br>    except Exception as e:<br>        print(f&quot;Warning: Could not load fine-tuned model: {e}&quot;)<br>        print(&quot;Using pre-trained base model instead.&quot;)<br>    <br>    return model<br><br><br>def generate_forecast(historical_data, horizon=90, start_date=None, target_column=&quot;Bike Rentals&quot;, <br>                     use_finetuned=False, finetuned_path=None):<br>    global model<br>    <br>    # Load fine-tuned model if requested<br>    if use_finetuned and finetuned_path:<br>        if Path(finetuned_path).exists():<br>            model = load_finetuned_model(model, finetuned_path)<br>        else:<br>            print(f&quot;Warning: Fine-tuned model not found at {finetuned_path}. Using base model.&quot;)<br>    <br>    # Validate input<br>    if len(historical_data) &lt; 30:<br>        raise ValueError(&quot;Historical data must contain at least 30 data points&quot;)<br>    <br>    if horizon &gt; 256 or horizon &lt; 1:<br>        raise ValueError(&quot;Horizon must be between 1 and 256 days&quot;)<br>    <br>    # Convert to numpy array<br>    historical_array = np.array(historical_data, dtype=np.float32)<br>    <br>    print(f&quot;Generating {horizon}-day forecast...&quot;)<br>    <br>    # Generate forecast (TimesFM handles inference mode internally)<br>    point_forecast, quantile_forecast = model.forecast(<br>        horizon=horizon,<br>        inputs=[historical_array],<br>    )<br>    <br>    # Extract forecasts<br>    forecast_values = point_forecast[0]<br>    quantiles = quantile_forecast[0]  # Shape: [horizon, 11]<br>    <br>    # Generate dates<br>    if start_date:<br>        if isinstance(start_date, str):<br>            start = datetime.strptime(start_date, 
&quot;%Y-%m-%d&quot;)<br>        else:<br>            start = start_date<br>    else:<br>        start = datetime.now() - timedelta(days=len(historical_data))<br>    <br>    historical_dates = [start + timedelta(days=i) for i in range(len(historical_data))]<br>    forecast_start = start + timedelta(days=len(historical_data))<br>    forecast_dates = [forecast_start + timedelta(days=i) for i in range(horizon)]<br>    <br>    # Calculate summary statistics<br>    historical_mean = np.mean(historical_data)<br>    historical_std = np.std(historical_data)<br>    forecast_mean = np.mean(forecast_values)<br>    forecast_std = np.std(forecast_values)<br>    <br>    # Print summary<br>    print(&quot;\n&quot; + &quot;=&quot;*70)<br>    print(f&quot;FORECAST SUMMARY: {target_column}&quot;)<br>    print(&quot;=&quot;*70)<br>    <br>    print(&quot;\nHistorical Data Statistics:&quot;)<br>    print(f&quot;  Period: {historical_dates[0].strftime(&#39;%Y-%m-%d&#39;)} to {historical_dates[-1].strftime(&#39;%Y-%m-%d&#39;)}&quot;)<br>    print(f&quot;  Count: {len(historical_data)} days&quot;)<br>    print(f&quot;  Mean: {historical_mean:.2f}&quot;)<br>    print(f&quot;  Std Dev: {historical_std:.2f}&quot;)<br>    print(f&quot;  Min: {np.min(historical_data):.2f}&quot;)<br>    print(f&quot;  Max: {np.max(historical_data):.2f}&quot;)<br>    <br>    print(&quot;\nForecast Statistics:&quot;)<br>    print(f&quot;  Period: {forecast_dates[0].strftime(&#39;%Y-%m-%d&#39;)} to {forecast_dates[-1].strftime(&#39;%Y-%m-%d&#39;)}&quot;)<br>    print(f&quot;  Count: {horizon} days&quot;)<br>    print(f&quot;  Mean: {forecast_mean:.2f}&quot;)<br>    print(f&quot;  Std Dev: {forecast_std:.2f}&quot;)<br>    print(f&quot;  Min: {np.min(forecast_values):.2f}&quot;)<br>    print(f&quot;  Max: {np.max(forecast_values):.2f}&quot;)<br>    <br>    print(&quot;\nTrend Analysis:&quot;)<br>    change_percent = ((forecast_mean - historical_mean) / historical_mean) * 100<br>    trend_direction = 
&quot;increasing&quot; if change_percent &gt; 0 else &quot;decreasing&quot;<br>    print(f&quot;  Direction: {trend_direction.upper()}&quot;)<br>    print(f&quot;  Change: {change_percent:+.2f}%&quot;)<br>    <br>    print(&quot;\nConfidence Intervals (Average):&quot;)<br>    print(f&quot;  80% Interval: [{np.mean(quantiles[:, 1]):.2f}, {np.mean(quantiles[:, 9]):.2f}]&quot;)<br>    print(f&quot;  50% Interval: [{np.mean(quantiles[:, 3]):.2f}, {np.mean(quantiles[:, 7]):.2f}]&quot;)<br>    <br>    print(&quot;\nFirst 10 Forecast Values:&quot;)<br>    for i in range(min(10, len(forecast_values))):<br>        date_str = forecast_dates[i].strftime(&#39;%Y-%m-%d&#39;)<br>        val = forecast_values[i]<br>        q10 = quantiles[i, 1]<br>        q90 = quantiles[i, 9]<br>        print(f&quot;  {date_str}: {val:.2f} (80% CI: [{q10:.2f}, {q90:.2f}])&quot;)<br>    <br>    if len(forecast_values) &gt; 10:<br>        print(f&quot;  ... and {len(forecast_values) - 10} more days&quot;)<br>    <br>    print(&quot;=&quot;*70 + &quot;\n&quot;)<br>    <br>    # Generate plot<br>    plt.figure(figsize=(15, 7))<br>    plt.plot(historical_dates, historical_data, <br>             label=&quot;Historical Data&quot;, color=&quot;black&quot;, linewidth=2, marker=&#39;o&#39;, <br>             markersize=3, markevery=max(1, len(historical_data)//50))<br>    plt.plot(forecast_dates, forecast_values, <br>             label=&quot;Forecast&quot;, color=&quot;red&quot;, linewidth=2, marker=&#39;s&#39;,<br>             markersize=3, markevery=max(1, horizon//50))<br>    <br>    # Add confidence intervals<br>    plt.fill_between(<br>        forecast_dates,<br>        quantiles[:, 1],  # 10th percentile<br>        quantiles[:, 9],  # 90th percentile<br>        alpha=0.2,<br>        color=&#39;red&#39;,<br>        label=&#39;80% Prediction Interval&#39;<br>    )<br>    <br>    plt.fill_between(<br>        forecast_dates,<br>        quantiles[:, 3],  # 25th percentile<br>        quantiles[:, 7],  # 
75th percentile<br>        alpha=0.3,<br>        color=&#39;red&#39;,<br>        label=&#39;50% Prediction Interval&#39;<br>    )<br>    <br>    plt.legend(loc=&#39;best&#39;, fontsize=11)<br>    model_type = &quot;Fine-tuned&quot; if use_finetuned else &quot;Base&quot;<br>    plt.title(f&quot;TimesFM Forecast ({model_type}): {target_column}&quot;, fontsize=16, fontweight=&#39;bold&#39;)<br>    plt.xlabel(&quot;Date&quot;, fontsize=13)<br>    plt.ylabel(target_column, fontsize=13)<br>    plt.grid(True, linestyle=&#39;--&#39;, alpha=0.6)<br>    plt.xticks(rotation=45)<br>    plt.tight_layout()<br>    plt.show()<br>    <br>    # Return detailed results<br>    return {<br>        &quot;forecast_values&quot;: forecast_values.tolist(),<br>        &quot;forecast_dates&quot;: [d.strftime(&#39;%Y-%m-%d&#39;) for d in forecast_dates],<br>        &quot;quantiles&quot;: {<br>            &quot;q10&quot;: quantiles[:, 1].tolist(),<br>            &quot;q25&quot;: quantiles[:, 3].tolist(),<br>            &quot;q50&quot;: quantiles[:, 5].tolist(),<br>            &quot;q75&quot;: quantiles[:, 7].tolist(),<br>            &quot;q90&quot;: quantiles[:, 9].tolist()<br>        },<br>        &quot;historical_mean&quot;: float(historical_mean),<br>        &quot;forecast_mean&quot;: float(forecast_mean),<br>        &quot;trend&quot;: trend_direction,<br>        &quot;change_percent&quot;: float(change_percent)<br>    }<br><br><br>if __name__ == &quot;__main__&quot;:<br>    # Create synthetic data with trend and seasonality (2 years of data)<br>    np.random.seed(42)<br>    days = 730<br>    trend = np.linspace(150, 350, days)<br>    seasonality = 50 * np.sin(np.linspace(0, 8*np.pi, days))<br>    noise = np.random.normal(0, 20, days)<br>    historical_data = trend + seasonality + noise<br>    historical_data = np.maximum(historical_data, 0)  # Ensure non-negative<br>    <br>    print(&quot;OPTION 1: Use base pre-trained model (skip fine-tuning)&quot;)<br>    print(&quot;OPTION 2: Fine-tune 
model on your data first&quot;)<br>    print(&quot;\nChoosing OPTION 1 for this example...\n&quot;)<br>    <br>    # OPTION 1: Direct forecasting with base model<br>    print(&quot;Generating forecast with BASE model...&quot;)<br>    results_base = generate_forecast(<br>        historical_data=historical_data[-365:].tolist(),  # Use last year<br>        horizon=90,<br>        start_date=&quot;2023-01-01&quot;,<br>        target_column=&quot;Bike Rentals&quot;<br>    )<br>    <br>    # OPTION 2: Fine-tune model then forecast<br>    print(&quot;\n&quot; + &quot;=&quot;*70)<br>    print(&quot;FINE-TUNING MODEL ON TRAINING DATA&quot;)<br>    print(&quot;=&quot;*70 + &quot;\n&quot;)<br>    <br>    # Fine-tune on first 18 months of data<br>    finetuned_model, losses = finetune_model(<br>        model=model,<br>        train_data=historical_data[:540].tolist(),<br>        num_epochs=5,<br>        learning_rate=1e-4,<br>        context_length=180,<br>        horizon=30,<br>        batch_size=4,<br>        save_path=&quot;finetuned_timesfm.pt&quot;<br>    )<br>    <br>    # Generate forecast with fine-tuned model<br>    print(&quot;\nGenerating forecast with FINE-TUNED model...&quot;)<br>    results_finetuned = generate_forecast(<br>        historical_data=historical_data[-365:].tolist(),<br>        horizon=90,<br>        start_date=&quot;2023-01-01&quot;,<br>        target_column=&quot;Bike Rentals&quot;,<br>        use_finetuned=True,<br>        finetuned_path=&quot;finetuned_timesfm.pt&quot;<br>    )<br>    <br>    print(&quot;\n&quot; + &quot;=&quot;*70)<br>    print(&quot;FORECAST COMPLETE!&quot;)<br>    print(&quot;=&quot;*70)</pre><p><strong>Code Summary</strong></p><ul><li><strong>Model Initialization:<br></strong>Loads the pre-trained TimesFM model (timesfm-2.5-200m-pytorch) using PyTorch.<br>Configures forecasting parameters like max_context, max_horizon, normalization, quantile settings, and invariance options.</li><li><strong>Custom Dataset 
Creation:<br></strong>Defines a TimeSeriesDataset class that creates sliding windows of context and target sequences from historical data.<br>Converts data into PyTorch tensors for training.</li><li><strong>Data Preparation:<br></strong>prepare_finetuning_data() generates a DataLoader for batching historical data for fine-tuning.</li><li><strong>Fine-Tuning Function (</strong><strong>finetune_model):<br></strong>Prepares training batches and sets up the optimizer (Adam) and loss function (MSELoss).<br>Attempts to access and unfreeze internal model parameters for gradient updates.<br>Runs multiple training epochs and computes the MSE between predictions and targets. Because the output of model.forecast() is copied into a new tensor, gradients do not reach the model weights as written, so the loop illustrates the training structure rather than performing a real weight update.<br>Saves the model state and plots training loss over epochs.<br>Handles exceptions if the model does not support gradient-based fine-tuning (fallback to inference mode).</li><li><strong>Loading Fine-Tuned Model:<br></strong>load_finetuned_model() loads saved model checkpoints and updates internal model weights for inference.</li><li><strong>Forecast Generation (</strong><strong>generate_forecast):<br></strong>Generates point forecasts and quantile-based prediction intervals for a specified horizon.<br>Computes summary statistics (mean, std, min, max) and trend direction.<br>Produces a plot showing historical data, forecasts, and prediction intervals.<br>Supports using either the base pre-trained model or the fine-tuned model.</li></ul><p><strong>Executing the code<br></strong>To visualize trends and obtain forecasted results using the fine-tuned model, run the script from your Python environment.</p><pre>python bike_rental_forecast_finetuning.py</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*LEx1RSw_25Cg8QRKu8veLQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ZEkalYgasfXvtVzwmhqgww.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*tfAym6L5VFFPd-Qjc184Lw.png" /></figure><figure><img alt="" 
src="https://cdn-images-1.medium.com/max/1024/1*MJdnOmmXRbz3q7Vn7WSHlQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ksdy_ZfDt7djauWvtgqbEw.png" /></figure><h3>Handson 4: Model Comparison</h3><p>This section compares the performance of the TimesFM model with other approaches, including statistical models (e.g., AutoETS), machine learning models (e.g., Random Forest, XGBoost, LGBM), and foundation models such as TimeGPT.</p><p>The dataset used for this analysis is sourced from Kaggle: Monthly Gold Prices (1979–2021), which provides historical gold prices across 18 different countries.</p><p><a href="https://www.kaggle.com/datasets/odins0n/monthly-gold-prices">Monthly Gold Prices (1979-2021)</a></p><h4>Setting up the environment</h4><p>For this hands-on guide, we will be using Google Colab with a T4 GPU.</p><ol><li>Open <a href="https://colab.research.google.com/">Google Colaboratory</a> and sign in with your Google account.</li><li>Create a new notebook by clicking on + New Notebook.</li><li>Navigate to Runtime → Change runtime type.</li></ol><ul><li>Set Hardware Accelerator to GPU.</li><li>Choose T4 GPU (recommended for this tutorial).</li><li>Click Save.</li></ul><h4>Reading the data</h4><p>Let&#39;s begin by reading the dataset from the CSV file containing monthly gold price records.</p><pre>import pandas as pd<br>df = pd.read_csv(&quot;/content/1979-2021.csv&quot;)<br>df[&#39;Date&#39;] = pd.to_datetime(df[&#39;Date&#39;])<br>df = df.set_index(&#39;Date&#39;).resample(&#39;MS&#39;).mean()<br>df = df.reset_index() # Reset index to have &#39;Date&#39; as a column again<br>print(df.head())</pre><h4>Visualizing the dataset</h4><p>Plot the India (INR) gold price series using seaborn.</p><pre>import matplotlib.pyplot as plt<br>import seaborn as sns<br>sns.set(style=&quot;darkgrid&quot;)<br>plt.figure(figsize=(10, 6))<br>sns.lineplot(x=&quot;Date&quot;, y=&#39;India(INR)&#39;, data=df, color=&#39;green&#39;)<br>plt.title(&#39;Monthly Gold Prices 
Over Time&#39;)<br>plt.xlabel(&#39;Date&#39;)<br>plt.ylabel(&#39;Gold Price in INR&#39;)<br>plt.show()</pre><h4>Time-series decomposition visualization: trend, seasonality and residuals</h4><p>The following code performs a time-series decomposition on India’s monthly gold prices and separates the data into trend (overall direction), seasonality (repeating pattern), and residuals (random noise), which helps reveal the underlying patterns in gold price movements over time.</p><pre>from statsmodels.tsa.seasonal import seasonal_decompose<br><br>df.set_index(&quot;Date&quot;, inplace=True)<br>result = seasonal_decompose(df[&#39;India(INR)&#39;])<br>fig, (ax1, ax2, ax3, ax4) = plt.subplots(4, 1, figsize=(10, 12))<br>result.observed.plot(ax=ax1, color=&#39;green&#39;)<br>ax1.set_ylabel(&#39;Observed&#39;)<br>result.trend.plot(ax=ax2, color=&#39;green&#39;)<br>ax2.set_ylabel(&#39;Trend&#39;)<br>result.seasonal.plot(ax=ax3, color=&#39;green&#39;)<br>ax3.set_ylabel(&#39;Seasonal&#39;)<br>result.resid.plot(ax=ax4, color=&#39;green&#39;)<br>ax4.set_ylabel(&#39;Residual&#39;)<br><br>plt.tight_layout()<br>plt.show()<br>df.reset_index(inplace=True)</pre><h4>Arranging the data in the format required by the models</h4><p>StatsForecast, MLForecast, and TimeGPT expect a long-format DataFrame with unique_id (series identifier), ds (timestamp), and y (target) columns. Data up to July 2019 is used for training, and the remaining months are held out for testing.</p><pre>df = pd.DataFrame({&#39;unique_id&#39;:[1]*len(df),&#39;ds&#39;: df[&quot;Date&quot;], &quot;y&quot;:df[&#39;India(INR)&#39;]})<br># ISO dates (YYYY-MM-DD) avoid day/month ambiguity when pandas parses the string<br>train_df = df[df[&#39;ds&#39;] &lt;= &#39;2019-07-31&#39;]<br>test_df = df[df[&#39;ds&#39;] &gt; &#39;2019-07-31&#39;]</pre><h4>1. Statistical Modeling</h4><p>StatsForecast is an open-source library designed for fast and scalable statistical time-series forecasting. 
It provides a wide range of classical models such as ARIMA, ETS (Exponential Smoothing), Theta, and Seasonal Naive, implemented efficiently with Numba-compiled code.</p><pre>import pandas as pd<br>from statsforecast import StatsForecast<br>from statsforecast.models import AutoARIMA, AutoETS<br><br># Define the AutoARIMA model<br>autoarima = AutoARIMA(season_length=12)<br># Define the AutoETS model<br>autoets = AutoETS(season_length=12)<br><br># Create StatsForecast object with AutoARIMA and AutoETS<br>statforecast = StatsForecast(<br>    models=[autoarima, autoets],<br>    freq=&#39;MS&#39;,<br>    n_jobs=-1)<br>statforecast.fit(train_df)<br><br># Generate forecasts for 24 periods ahead<br>sf_forecast = statforecast.forecast(df=train_df, h=24, fitted=True)<br>sf_forecast = sf_forecast.reset_index()<br>print(&quot;StatsForecast:&quot;, sf_forecast)</pre><h4>2. MLForecast</h4><p>MLForecast is a machine learning-based framework for time-series forecasting that leverages models such as Random Forest, XGBoost, and LightGBM.</p><pre>from mlforecast import MLForecast<br>from mlforecast.target_transforms import AutoDifferences<br>import lightgbm as lgb<br>import xgboost as xgb<br>from sklearn.ensemble import RandomForestRegressor<br>from mlforecast.lag_transforms import (<br> RollingMean, RollingStd, RollingMin, RollingMax, RollingQuantile,<br> SeasonalRollingMean, SeasonalRollingStd, SeasonalRollingMin,<br> SeasonalRollingMax, SeasonalRollingQuantile,<br> ExpandingMean<br>)<br><br>models = [lgb.LGBMRegressor(verbosity=-1),<br> xgb.XGBRegressor(),<br> RandomForestRegressor(random_state=0),<br>]<br>fcst = MLForecast(<br>    models=models, # List of models to be used for forecasting<br>    freq=&#39;MS&#39;, # Monthly frequency, starting at the beginning of each month<br>    lags=[1,3,5,7,12], # Lag features: values from 1, 3, 5, 7, and 12 time 
steps ago<br>    lag_transforms={<br>        1: [ # Transformations applied to lag 1<br>            RollingMean(window_size=3),<br>            RollingStd(window_size=3),<br>            RollingMin(window_size=3),<br>            RollingMax(window_size=3),<br>            RollingQuantile(p=0.5, window_size=3),<br>            ExpandingMean()<br>        ],<br>        6:[ # Transformations applied to lag 6<br>            RollingMean(window_size=6),<br>            RollingStd(window_size=6),<br>            RollingMin(window_size=6),<br>            RollingMax(window_size=6),<br>            RollingQuantile(p=0.5, window_size=6),<br>        ],<br>        12: [ # Transformations applied to lag 12 (likely for yearly seasonality)<br>            SeasonalRollingMean(season_length=12, window_size=3),<br>            SeasonalRollingStd(season_length=12, window_size=3),<br>            SeasonalRollingMin(season_length=12, window_size=3),<br>            SeasonalRollingMax(season_length=12, window_size=3),<br>            SeasonalRollingQuantile(p=0.5, season_length=12, window_size=3)<br>        ]<br>    },<br>    date_features=[&#39;year&#39;, &#39;month&#39;, &#39;quarter&#39;], # Extract year, month, and quarter from the date as features<br>    target_transforms=[AutoDifferences(max_diffs=3)]<br>)<br><br>fcst.fit(train_df)<br>ml_forecast = fcst.predict(len(test_df))<br>print(&quot;MLForecast:&quot;, ml_forecast)</pre><h4>3. TimeGPT</h4><p>TimeGPT is a large foundational model for time-series forecasting developed by Nixtla, inspired by the architecture and scaling principles of large language models. 
Unlike traditional forecasting methods that require dataset-specific training, TimeGPT can perform zero-shot forecasting, producing predictions on unseen time-series data without any fine-tuning.</p><pre>import os<br>from nixtla import NixtlaClient<br><br># Read the API key from an environment variable rather than hardcoding it in the notebook<br>nixtla_client = NixtlaClient(api_key=os.environ[&quot;NIXTLA_API_KEY&quot;])<br>timegpt_forecast = nixtla_client.forecast(df=train_df, h=24, freq=&#39;MS&#39;)<br>print(&quot;TimeGPT: &quot;, timegpt_forecast)</pre><h4>4. TimesFM</h4><p>Now let&#39;s use TimesFM to generate forecasts for the dataset. The code below uses the pretrained checkpoint in zero-shot mode, without any fine-tuning.</p><pre>import torch<br>import numpy as np<br>import pandas as pd<br>import timesfm<br><br>torch.set_float32_matmul_precision(&quot;high&quot;)<br>model = timesfm.TimesFM_2p5_200M_torch.from_pretrained(&quot;google/timesfm-2.5-200m-pytorch&quot;)<br><br>model.compile(<br>    timesfm.ForecastConfig(<br>        max_context=1024,                # Maximum context length<br>        max_horizon=256,                 # Maximum forecast horizon<br>        normalize_inputs=True,           # Normalize time series before forecasting<br>        use_continuous_quantile_head=True,<br>        force_flip_invariance=True,<br>        infer_is_positive=True,<br>        fix_quantile_crossing=True,<br>    )<br>)<br>H = 24  # Forecast horizon<br><br># Collect one float32 array of target values per series<br>series_list = []<br>for _, g in train_df.groupby(&quot;unique_id&quot;):<br>    series_list.append(g[&quot;y&quot;].values.astype(np.float32))<br><br>point_forecast, quantile_forecast = model.forecast(<br>    horizon=H,<br>    inputs=series_list,<br>)<br><br>forecasts = []<br>for (uid, group), preds in zip(train_df.groupby(&quot;unique_id&quot;), point_forecast):<br>    # Get last date and extend future timestamps<br>    last_date = group[&quot;ds&quot;].iloc[-1]<br>    future_dates = pd.date_range(start=last_date, periods=H + 1, freq=&quot;MS&quot;)[1:]<br>  
  df_pred = pd.DataFrame({<br>        &quot;unique_id&quot;: uid,<br>        &quot;ds&quot;: future_dates,<br>        &quot;timesfm&quot;: preds<br>    })<br>    forecasts.append(df_pred)<br><br>timesfm_forecast = pd.concat(forecasts, ignore_index=True)<br>print(&quot;TimesFM: &quot;, timesfm_forecast)</pre><h4>Convert ‘ds’ to datetime in all DataFrames if necessary</h4><pre># Assuming the DataFrames have a common column &#39;ds&#39; for the dates<br>sf_forecast[&#39;ds&#39;] = pd.to_datetime(sf_forecast[&#39;ds&#39;])<br>ml_forecast[&#39;ds&#39;] = pd.to_datetime(ml_forecast[&#39;ds&#39;])<br>timegpt_forecast[&#39;ds&#39;] = pd.to_datetime(timegpt_forecast[&#39;ds&#39;])<br>timesfm_forecast[&#39;ds&#39;] = pd.to_datetime(timesfm_forecast[&#39;ds&#39;])<br># assign() returns a new DataFrame, which avoids a SettingWithCopyWarning on the sliced test set<br>test_df = test_df.assign(ds=pd.to_datetime(test_df[&#39;ds&#39;]))<br><br># Print shapes to debug<br>print(&quot;sf_forecast shape:&quot;, sf_forecast.shape)<br>print(&quot;ml_forecast shape:&quot;, ml_forecast.shape)<br>print(&quot;timegpt_forecast shape:&quot;, timegpt_forecast.shape)<br>print(&quot;timesfm_forecast shape:&quot;, timesfm_forecast.shape)<br>print(&quot;test_df shape:&quot;, test_df.shape)<br><br># Check the first few dates<br>print(&quot;\nFirst dates:&quot;)<br>print(&quot;sf_forecast:&quot;, sf_forecast[&#39;ds&#39;].head(3).tolist())<br>print(&quot;ml_forecast:&quot;, ml_forecast[&#39;ds&#39;].head(3).tolist())<br>print(&quot;test_df:&quot;, test_df[&#39;ds&#39;].head(3).tolist())</pre><h4>Perform the merges</h4><p>Start with test_df so that every test date is kept, then left-join each model&#39;s forecast on the ds column.</p><pre>merged_fcst = test_df[[&#39;ds&#39;, &#39;y&#39;, &#39;unique_id&#39;]].copy()<br>merged_fcst = pd.merge(merged_fcst, sf_forecast[[&#39;ds&#39;, &#39;AutoARIMA&#39;, &#39;AutoETS&#39;]], on=&#39;ds&#39;, how=&#39;left&#39;)<br>merged_fcst = pd.merge(merged_fcst, ml_forecast[[&#39;ds&#39;, &#39;LGBMRegressor&#39;, &#39;XGBRegressor&#39;, &#39;RandomForestRegressor&#39;]], on=&#39;ds&#39;, how=&#39;left&#39;)<br>merged_fcst = 
pd.merge(merged_fcst, timegpt_forecast[[&#39;ds&#39;, &#39;TimeGPT&#39;]], on=&#39;ds&#39;, how=&#39;left&#39;)<br>merged_fcst = pd.merge(merged_fcst, timesfm_forecast[[&#39;ds&#39;, &#39;timesfm&#39;]], on=&#39;ds&#39;, how=&#39;left&#39;)<br><br>print(&quot;\nMerged forecast shape:&quot;, merged_fcst.shape)<br>print(&quot;\nMerged forecast columns:&quot;, merged_fcst.columns.tolist())<br>print(&quot;\nFirst few rows of merged_fcst:&quot;)<br>print(merged_fcst.head())<br>print(&quot;\nNull counts:&quot;)<br>print(merged_fcst.isnull().sum())</pre><h4>Model Comparison</h4><p>Finally, we compute and compare error metrics for all seven models (AutoARIMA, AutoETS, LGBMRegressor, XGBRegressor, RandomForestRegressor, TimeGPT, and timesfm) on the test period. The code first defines a function, calculate_error_metrics, that computes the mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percentage error (MAPE) between actual and predicted values, and then applies it to each model&#39;s column in merged_fcst.</p><pre>import numpy as np<br>import pandas as pd<br><br>def calculate_error_metrics(actual_values, predicted_values):<br>    &quot;&quot;&quot;Return a DataFrame with MAE, RMSE, and MAPE (in percent).&quot;&quot;&quot;<br>    actual_values = np.array(actual_values, dtype=float)<br>    predicted_values = np.array(predicted_values, dtype=float)<br>    <br>    # Remove any NaN values<br>    mask = ~(np.isnan(actual_values) | np.isnan(predicted_values))<br>    actual_values = actual_values[mask]<br>    predicted_values = predicted_values[mask]<br>    <br>    if len(actual_values) == 0:<br>        print(&quot;Warning: No valid data points after removing NaNs&quot;)<br>        return pd.DataFrame({&#39;Metric&#39;: [&#39;MAE&#39;, &#39;RMSE&#39;, &#39;MAPE&#39;], &#39;Value&#39;: [np.nan, np.nan, np.nan]})<br>    <br>    # MAPE divides by the actual values, so it assumes none of them are zero<br>    metrics_dict = {<br>        &#39;MAE&#39;: np.mean(np.abs(actual_values - predicted_values)),<br>        &#39;RMSE&#39;: np.sqrt(np.mean((actual_values - predicted_values)**2)),<br>        &#39;MAPE&#39;: np.mean(np.abs((actual_values - predicted_values) / actual_values)) * 100<br>    }<br>    <br>    result_df = pd.DataFrame(list(metrics_dict.items()), columns=[&#39;Metric&#39;, &#39;Value&#39;])<br>    return 
result_df<br><br># Use actual gold prices from merged dataframe<br>actuals = merged_fcst[&#39;y&#39;].values<br>error_metrics_dict = {}<br><br># Model columns to evaluate<br>model_columns = [&#39;AutoARIMA&#39;, &#39;AutoETS&#39;, &#39;LGBMRegressor&#39;, &#39;XGBRegressor&#39;, &#39;RandomForestRegressor&#39;, &#39;TimeGPT&#39;, &#39;timesfm&#39;]<br><br>for col in model_columns:<br>    if col in merged_fcst.columns:<br>        print(f&quot;\nEvaluating {col}...&quot;)<br>        predicted_values = merged_fcst[col].values<br>        print(f&quot;  Actuals shape: {actuals.shape}, Predictions shape: {predicted_values.shape}&quot;)<br>        print(f&quot;  Non-null predictions: {(~np.isnan(predicted_values)).sum()}&quot;)<br>        error_metrics_dict[col] = calculate_error_metrics(actuals, predicted_values)[&#39;Value&#39;].values<br>    else:<br>        print(f&quot;\nWarning: {col} not found in merged_fcst&quot;)<br><br>error_metrics_df = pd.DataFrame(error_metrics_dict)<br>error_metrics_df.insert(0, &#39;Metric&#39;, [&#39;MAE&#39;, &#39;RMSE&#39;, &#39;MAPE&#39;])<br><br>print(&quot;\n&quot; + &quot;=&quot;*80)<br>print(&quot;FINAL ERROR METRICS:&quot;)<br>print(&quot;=&quot;*80)<br>print(error_metrics_df)</pre><p>Executing the code step by step produces the final output as shown below.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/891/1*aijBBWRJXKLy6bEcn-8zBg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/862/1*rQN08lNwgrg--ltHxXzyTQ.png" /></figure><p>The complete code for the model comparison is available at the link below.</p><p><a href="https://colab.research.google.com/drive/1UAKc3DEPQzsOHov1gug1-1iWmpdRFiHe?usp=sharing">Google Colab</a></p><h3>Building NextGen Forecasting app using TimesFM</h3><p>In the previous section, various forecasting models were explored, including zero-shot prompting and fine-tuning of the TimesFM model to generate forecasts. 
In this section, a Streamlit-based application is developed for time series forecasting. The app allows users to upload CSV files, configure forecast settings, generate forecasts, visualize results, and interactively query the forecasts. It supports multiple aggregation methods and customizable forecast horizons, and it provides prediction intervals along with a keyword-based Q&amp;A interface to interpret forecast insights.</p><h4>Installing the dependencies</h4><ul><li>Initialize a uv project by executing the following commands.</li></ul><pre>uv init timesfm_forecasting_tool<br>cd timesfm_forecasting_tool</pre><ul><li>Create and activate a virtual environment by executing the following commands.</li></ul><pre>uv venv<br>source .venv/bin/activate # for Linux/macOS<br>.venv\Scripts\activate    # for Windows</pre><ul><li>Clone the GitHub repository and move into its folder using the following commands.</li></ul><pre>git clone https://github.com/google-research/timesfm.git<br>cd timesfm</pre><ul><li>Install timesfm in editable mode with the PyTorch extra by executing the following command.</li></ul><pre>uv pip install -e &quot;.[torch]&quot;</pre><ul><li>Install Streamlit and Matplotlib using uv.</li></ul><pre>uv add streamlit matplotlib</pre><ul><li>Navigate back to the root directory, create a file named app.py, and add the following code to it.</li></ul><pre>import streamlit as st<br>import pandas as pd<br>import numpy as np<br>import torch<br>import timesfm<br>import matplotlib.pyplot as plt<br><br># Page configuration<br>st.set_page_config(<br>    page_title=&quot;TimesFM Forecasting Tool&quot;,<br>    page_icon=&quot;📈&quot;,<br>    layout=&quot;wide&quot;<br>)<br><br># Initialize TimesFM model<br>@st.cache_resource<br>def load_model():<br>    torch.set_float32_matmul_precision(&quot;high&quot;)<br>    LOCAL_MODEL_PATH = &quot;./timesfm_model&quot;<br>    <br>    try:<br>        model = timesfm.TimesFM_2p5_200M_torch.from_pretrained(LOCAL_MODEL_PATH)<br>        st.success(f&quot;✓ Model loaded from local path: {LOCAL_MODEL_PATH}&quot;)<br>    
except Exception as e:<br>        st.warning(f&quot;Failed to load from local path, trying HuggingFace...&quot;)<br>        model = timesfm.TimesFM_2p5_200M_torch.from_pretrained(&quot;google/timesfm-2.5-200m-pytorch&quot;)<br>        st.success(&quot;✓ Model loaded from HuggingFace&quot;)<br>    <br>    model.compile(<br>        timesfm.ForecastConfig(<br>            max_context=1024,<br>            max_horizon=256,<br>            normalize_inputs=True,<br>            use_continuous_quantile_head=True,<br>            force_flip_invariance=True,<br>            infer_is_positive=True,<br>            fix_quantile_crossing=True,<br>        )<br>    )<br>    return model<br><br>def load_and_prepare_data(df, target_column, date_column, aggregation, time_period):<br>    &quot;&quot;&quot;Prepare time series data from dataframe&quot;&quot;&quot;<br>    # Make a copy to avoid modifying original<br>    df = df.copy()<br>    <br>    # Convert date column to datetime with multiple format attempts<br>    date_formats = [<br>        &#39;%d-%m-%Y&#39;,  # 01-01-1985<br>        &#39;%m-%d-%Y&#39;,  # 01-01-1985 (alternative interpretation)<br>        &#39;%Y-%m-%d&#39;,  # 1985-01-01<br>        &#39;%m/%d/%Y %H:%M&#39;,  # 2/24/2003 0:00<br>        &#39;%d/%m/%Y %H:%M&#39;,  # 24/2/2003 0:00<br>        &#39;%Y-%m-%d %H:%M:%S&#39;,<br>        &#39;%Y-%m-%d %H:%M&#39;,<br>        &#39;%d-%m-%Y %H:%M:%S&#39;,<br>        &#39;%d-%m-%Y %H:%M&#39;,<br>        &#39;%m/%d/%Y&#39;,<br>        &#39;%d/%m/%Y&#39;,<br>        &#39;%Y/%m/%d&#39;,<br>        &#39;%d.%m.%Y&#39;,  # 01.01.1985<br>        &#39;%Y.%m.%d&#39;,  # 1985.01.01<br>    ]<br>    <br>    parsed_successfully = False<br>    successful_format = None<br>    <br>    for fmt in date_formats:<br>        try:<br>            test_parse = pd.to_datetime(df[date_column], format=fmt, errors=&#39;coerce&#39;)<br>            # Check if at least some dates were parsed<br>            if test_parse.notna().sum() &gt; 0:<br>               
 df[date_column] = test_parse<br>                parsed_successfully = True<br>                successful_format = fmt<br>                break<br>        except (ValueError, TypeError):<br>            continue<br>    <br>    # If no format worked, try automatic parsing<br>    if not parsed_successfully:<br>        try:<br>            # Parse into a temporary Series so the original values survive a failed attempt<br>            auto_parse = pd.to_datetime(df[date_column], errors=&#39;coerce&#39;)<br>            if auto_parse.notna().sum() &gt; 0:<br>                df[date_column] = auto_parse<br>                parsed_successfully = True<br>                successful_format = &quot;auto-detected&quot;<br>        except (ValueError, TypeError):<br>            pass<br>    <br>    if not parsed_successfully:<br>        # Show first few non-null values to help debug<br>        sample_values = df[date_column].dropna().head(5).tolist()<br>        raise ValueError(f&quot;Could not parse dates in column &#39;{date_column}&#39;. Sample values: {sample_values}&quot;)<br>    <br>    # Remove rows with invalid dates or missing target values<br>    initial_count = len(df)<br>    df = df.dropna(subset=[date_column, target_column])<br>    removed_count = initial_count - len(df)<br>    <br>    if len(df) == 0:<br>        raise ValueError(f&quot;No valid data after removing {removed_count} rows with invalid dates or missing target values.&quot;)<br>    <br>    # Sort by date<br>    df = df.sort_values(date_column)<br>    <br>    # Create period column based on time_period<br>    if time_period == &quot;Day&quot;:<br>        df[&#39;period&#39;] = df[date_column].dt.strftime(&#39;%d-%b-%Y&#39;)<br>        period_format = &quot;day&quot;<br>    elif time_period == &quot;Week&quot;:<br>        df[&#39;period&#39;] = df[date_column].dt.strftime(&#39;Week %U, %Y&#39;)<br>        period_format = &quot;week&quot;<br>    elif time_period == &quot;Month&quot;:<br>        df[&#39;period&#39;] = df[date_column].dt.strftime(&#39;%B %Y&#39;)<br>        period_format = &quot;month&quot;<br>    elif time_period == &quot;Year&quot;:<br>        
df[&#39;period&#39;] = df[date_column].dt.year.astype(str)<br>        period_format = &quot;year&quot;<br>    else:<br>        df[&#39;period&#39;] = df[date_column].dt.strftime(&#39;%B %Y&#39;)<br>        period_format = &quot;month&quot;<br>    <br>    # Group by period and aggregate.<br>    # sort=False keeps the chronological order from the date sort applied earlier;<br>    # the default would sort the string labels alphabetically<br>    # (e.g. &quot;April 2019&quot; before &quot;January 2018&quot;) and scramble the series.<br>    by_period = df.groupby(&#39;period&#39;, sort=False)[target_column]<br>    if aggregation == &quot;Mean&quot;:<br>        grouped = by_period.mean()<br>    elif aggregation == &quot;Median&quot;:<br>        grouped = by_period.median()<br>    elif aggregation == &quot;Count&quot;:<br>        grouped = by_period.count()<br>    else:<br>        # &quot;Sum&quot; and any unrecognized option<br>        grouped = by_period.sum()<br>    <br>    periods = grouped.index.tolist()<br>    values = grouped.values.tolist()<br>    <br>    return periods, values, period_format<br><br>def generate_forecast_labels(last_period, horizon, period_format):<br>    &quot;&quot;&quot;Generate forecast period labels&quot;&quot;&quot;<br>    forecast_labels = []<br>    <br>    if period_format == &quot;month&quot;:<br>        try:<br>            last_date = pd.to_datetime(last_period, format=&#39;%B %Y&#39;)<br>            for i in range(1, horizon + 1):<br>                next_date = last_date + pd.DateOffset(months=i)<br>                forecast_labels.append(next_date.strftime(&#39;%B %Y&#39;))<br>        except (ValueError, TypeError):<br>            # Fall back to generic labels if the last period cannot be parsed<br>            forecast_labels = [f&quot;Forecast {i+1}&quot; for i in range(horizon)]<br>    elif period_format == &quot;day&quot;:<br>        try:<br>            last_date = pd.to_datetime(last_period, format=&#39;%d-%b-%Y&#39;)<br>            for i in range(1, horizon + 1):<br>                next_date = last_date + pd.Timedelta(days=i)<br>                forecast_labels.append(next_date.strftime(&#39;%d-%b-%Y&#39;))<br>        except (ValueError, TypeError):<br>            forecast_labels = 
[f&quot;Forecast {i+1}&quot; for i in range(horizon)]<br>    elif period_format == &quot;year&quot;:<br>        try:<br>            last_year = int(last_period)<br>            forecast_labels = [str(last_year + i) for i in range(1, horizon + 1)]<br>        except:<br>            forecast_labels = [f&quot;Forecast {i+1}&quot; for i in range(horizon)]<br>    else:<br>        forecast_labels = [f&quot;Forecast {i+1}&quot; for i in range(horizon)]<br>    <br>    return forecast_labels<br><br>def generate_chat_response(query, forecast_data, summary):<br>    &quot;&quot;&quot;Generate responses to user queries about the forecast&quot;&quot;&quot;<br>    query_lower = query.lower()<br>    <br>    if &quot;trend&quot; in query_lower or &quot;direction&quot; in query_lower:<br>        direction = summary[&#39;trend&#39;][&#39;direction&#39;]<br>        change = summary[&#39;trend&#39;][&#39;change_percent&#39;]<br>        return f&quot;The forecast shows a **{direction}** trend with a {abs(change):.2f}% change compared to historical data.&quot;<br>    <br>    elif &quot;highest&quot; in query_lower or &quot;maximum&quot; in query_lower or &quot;peak&quot; in query_lower:<br>        max_val = summary[&#39;forecast_stats&#39;][&#39;max&#39;]<br>        max_idx = forecast_data[&#39;point_forecast&#39;].index(max_val)<br>        max_period = forecast_data[&#39;periods&#39;][max_idx]<br>        return f&quot;The highest forecasted value is **{max_val:,.2f}** in **{max_period}**.&quot;<br>    <br>    elif &quot;lowest&quot; in query_lower or &quot;minimum&quot; in query_lower:<br>        min_val = summary[&#39;forecast_stats&#39;][&#39;min&#39;]<br>        min_idx = forecast_data[&#39;point_forecast&#39;].index(min_val)<br>        min_period = forecast_data[&#39;periods&#39;][min_idx]<br>        return f&quot;The lowest forecasted value is **{min_val:,.2f}** in **{min_period}**.&quot;<br>    <br>    elif &quot;average&quot; in query_lower or &quot;mean&quot; in query_lower:<br>   
     avg = summary[&#39;forecast_stats&#39;][&#39;mean&#39;]<br>        return f&quot;The average forecasted value is **{avg:,.2f}**.&quot;<br>    <br>    elif &quot;total&quot; in query_lower or &quot;sum&quot; in query_lower:<br>        total = summary[&#39;forecast_stats&#39;][&#39;total&#39;]<br>        return f&quot;The total forecasted value across all periods is **{total:,.2f}**.&quot;<br>    <br>    elif &quot;confidence&quot; in query_lower or &quot;interval&quot; in query_lower:<br>        ci_80 = summary[&#39;confidence_intervals&#39;][&#39;80_percent&#39;]<br>        return f&quot;The 80% confidence interval ranges from **{ci_80[&#39;lower&#39;]:,.2f}** to **{ci_80[&#39;upper&#39;]:,.2f}**.&quot;<br>    <br>    elif &quot;compare&quot; in query_lower or &quot;historical&quot; in query_lower:<br>        hist_mean = summary[&#39;historical_stats&#39;][&#39;mean&#39;]<br>        fore_mean = summary[&#39;forecast_stats&#39;][&#39;mean&#39;]<br>        diff = fore_mean - hist_mean<br>        pct = ((fore_mean - hist_mean) / hist_mean * 100) if hist_mean != 0 else 0<br>        return f&quot;Historical average: **{hist_mean:,.2f}**\nForecast average: **{fore_mean:,.2f}**\nDifference: **{diff:+,.2f}** ({pct:+.2f}%)&quot;<br>    <br>    else:<br>        return &quot;I can help you with questions about:\n- Trend and direction\n- Highest/lowest values\n- Average and totals\n- Confidence intervals\n- Comparing historical vs forecast data\n\nPlease ask a specific question!&quot;<br><br># Main UI<br>st.title(&quot;📈 TimesFM Forecasting Tool&quot;)<br>st.markdown(&quot;Upload your data and generate time series forecasts using Google&#39;s TimesFM model&quot;)<br><br># Load model<br>with st.spinner(&quot;Loading TimesFM model...&quot;):<br>    model = load_model()<br><br># Sidebar for controls<br>st.sidebar.header(&quot;⚙️ Forecast Configuration&quot;)<br><br># Data input method<br>input_method = st.sidebar.radio(&quot;Data Input Method&quot;, [&quot;Upload CSV&quot;, 
&quot;Paste Text Data&quot;])<br><br>df = None<br><br>if input_method == &quot;Upload CSV&quot;:<br>    uploaded_file = st.sidebar.file_uploader(&quot;Upload CSV File&quot;, type=[&#39;csv&#39;])<br>    if uploaded_file:<br>        # Try different encodings<br>        encodings = [&#39;utf-8&#39;, &#39;latin-1&#39;, &#39;iso-8859-1&#39;, &#39;cp1252&#39;, &#39;utf-16&#39;]<br>        df = None<br>        <br>        for encoding in encodings:<br>            try:<br>                uploaded_file.seek(0)  # Reset file pointer<br>                df = pd.read_csv(uploaded_file, encoding=encoding, on_bad_lines=&#39;skip&#39;)<br>                st.sidebar.success(f&quot;✓ Loaded {len(df)} rows (encoding: {encoding})&quot;)<br>                break<br>            except (UnicodeDecodeError, Exception) as e:<br>                continue<br>        <br>        if df is None:<br>            st.sidebar.error(&quot;❌ Could not read file. Please check the file encoding.&quot;)<br>else:<br>    text_data = st.sidebar.text_area(&quot;Paste CSV Data (with headers)&quot;, height=200)<br>    if text_data:<br>        from io import StringIO<br>        df = pd.read_csv(StringIO(text_data))<br>        st.sidebar.success(f&quot;✓ Loaded {len(df)} rows&quot;)<br><br># Main content area<br>if df is not None:<br>    # Show data preview<br>    with st.expander(&quot;📊 Data Preview&quot;, expanded=True):<br>        st.dataframe(df.head(10), use_container_width=True)<br>        st.info(f&quot;Total rows: {len(df)} | Columns: {len(df.columns)}&quot;)<br>    <br>    # Configuration<br>    col1, col2 = st.sidebar.columns(2)<br>    <br>    # Get numeric and date columns<br>    numeric_cols = df.select_dtypes(include=[np.number]).columns.tolist()<br>    all_cols = df.columns.tolist()<br>    <br>    with col1:<br>        date_column = st.selectbox(&quot;Date Column&quot;, all_cols)<br>    <br>    with col2:<br>        target_column = st.selectbox(&quot;Target Column&quot;, numeric_cols)<br>    <br>  
  col3, col4 = st.sidebar.columns(2)<br>    <br>    with col3:<br>        time_period = st.selectbox(&quot;Time Period&quot;, [&quot;Day&quot;, &quot;Week&quot;, &quot;Month&quot;, &quot;Year&quot;])<br>    <br>    with col4:<br>        aggregation = st.selectbox(&quot;Aggregation&quot;, [&quot;Sum&quot;, &quot;Mean&quot;, &quot;Median&quot;, &quot;Count&quot;])<br>    <br>    horizon = st.sidebar.slider(&quot;Forecast Horizon&quot;, min_value=1, max_value=24, value=6)<br>    <br>    # Forecast button<br>    if st.sidebar.button(&quot;🚀 Generate Forecast&quot;, type=&quot;primary&quot;, use_container_width=True):<br>        with st.spinner(&quot;Generating forecast...&quot;):<br>            try:<br>                # Prepare data<br>                periods, values, period_format = load_and_prepare_data(<br>                    df.copy(), target_column, date_column, aggregation, time_period<br>                )<br>                <br>                if len(values) &lt; 3:<br>                    st.error(f&quot;Not enough data points after aggregation. 
Got {len(values)}, need at least 3&quot;)<br>                else:<br>                    # Generate forecast<br>                    historical_array = np.array(values, dtype=np.float32)<br>                    point_forecast, quantile_forecast = model.forecast(<br>                        horizon=horizon,<br>                        inputs=[historical_array],<br>                    )<br>                    <br>                    forecast_values = point_forecast[0]<br>                    quantiles = quantile_forecast[0]<br>                    <br>                    # Generate labels<br>                    forecast_labels = generate_forecast_labels(periods[-1], horizon, period_format)<br>                    <br>                    # Convert to native types<br>                    forecast_values_list = [float(x) for x in forecast_values]<br>                    <br>                    # Store in session state<br>                    st.session_state[&#39;forecast_data&#39;] = {<br>                        &#39;periods&#39;: forecast_labels,<br>                        &#39;point_forecast&#39;: forecast_values_list,<br>                        &#39;quantiles&#39;: {<br>                            &#39;q10&#39;: [float(x) for x in quantiles[:, 1]],<br>                            &#39;q25&#39;: [float(x) for x in quantiles[:, 3]],<br>                            &#39;q50&#39;: [float(x) for x in quantiles[:, 5]],<br>                            &#39;q75&#39;: [float(x) for x in quantiles[:, 7]],<br>                            &#39;q90&#39;: [float(x) for x in quantiles[:, 9]]<br>                        },<br>                        &#39;horizon&#39;: horizon<br>                    }<br>                    <br>                    st.session_state[&#39;summary&#39;] = {<br>                        &#39;historical_stats&#39;: {<br>                            &#39;count&#39;: len(values),<br>                            &#39;mean&#39;: float(np.mean(values)),<br>                      
      &#39;std&#39;: float(np.std(values)),<br>                            &#39;min&#39;: float(np.min(values)),<br>                            &#39;max&#39;: float(np.max(values)),<br>                            &#39;total&#39;: float(np.sum(values))<br>                        },<br>                        &#39;forecast_stats&#39;: {<br>                            &#39;count&#39;: len(forecast_values),<br>                            &#39;mean&#39;: float(np.mean(forecast_values)),<br>                            &#39;std&#39;: float(np.std(forecast_values)),<br>                            &#39;min&#39;: float(np.min(forecast_values)),<br>                            &#39;max&#39;: float(np.max(forecast_values)),<br>                            &#39;total&#39;: float(np.sum(forecast_values))<br>                        },<br>                        &#39;trend&#39;: {<br>                            &#39;direction&#39;: &#39;increasing&#39; if np.mean(forecast_values) &gt; np.mean(values) else &#39;decreasing&#39;,<br>                            &#39;change_percent&#39;: ((np.mean(forecast_values) - np.mean(values)) / np.mean(values) * 100) if np.mean(values) != 0 else 0<br>                        },<br>                        &#39;confidence_intervals&#39;: {<br>                            &#39;80_percent&#39;: {<br>                                &#39;lower&#39;: float(np.mean(quantiles[:, 1])),<br>                                &#39;upper&#39;: float(np.mean(quantiles[:, 9]))<br>                            },<br>                            &#39;50_percent&#39;: {<br>                                &#39;lower&#39;: float(np.mean(quantiles[:, 3])),<br>                                &#39;upper&#39;: float(np.mean(quantiles[:, 7]))<br>                            }<br>                        }<br>                    }<br>                    <br>                    st.session_state[&#39;periods&#39;] = periods<br>                    st.session_state[&#39;values&#39;] = 
values<br>                    st.session_state[&#39;forecast_labels&#39;] = forecast_labels<br>                    st.session_state[&#39;quantiles&#39;] = quantiles<br>                    st.session_state[&#39;target_column&#39;] = target_column<br>                    <br>                    st.success(&quot;✓ Forecast generated successfully!&quot;)<br>                    st.rerun()<br>                    <br>            except Exception as e:<br>                st.error(f&quot;Error generating forecast: {str(e)}&quot;)<br>    <br>    # Display results if forecast exists<br>    if &#39;forecast_data&#39; in st.session_state:<br>        st.markdown(&quot;---&quot;)<br>        st.header(&quot;📊 Forecast Results&quot;)<br>        <br>        # Summary metrics<br>        col1, col2, col3, col4 = st.columns(4)<br>        <br>        with col1:<br>            st.metric(<br>                &quot;Historical Mean&quot;,<br>                f&quot;{st.session_state[&#39;summary&#39;][&#39;historical_stats&#39;][&#39;mean&#39;]:,.2f}&quot;<br>            )<br>        <br>        with col2:<br>            st.metric(<br>                &quot;Forecast Mean&quot;,<br>                f&quot;{st.session_state[&#39;summary&#39;][&#39;forecast_stats&#39;][&#39;mean&#39;]:,.2f}&quot;,<br>                delta=f&quot;{st.session_state[&#39;summary&#39;][&#39;trend&#39;][&#39;change_percent&#39;]:.2f}%&quot;<br>            )<br>        <br>        with col3:<br>            st.metric(<br>                &quot;Trend&quot;,<br>                st.session_state[&#39;summary&#39;][&#39;trend&#39;][&#39;direction&#39;].title()<br>            )<br>        <br>        with col4:<br>            st.metric(<br>                &quot;Forecast Periods&quot;,<br>                st.session_state[&#39;forecast_data&#39;][&#39;horizon&#39;]<br>            )<br>        <br>        # Visualization<br>        st.subheader(&quot;📈 Forecast Visualization&quot;)<br>        <br>        fig, ax = 
plt.subplots(figsize=(14, 6))<br>        <br>        periods = st.session_state[&#39;periods&#39;]<br>        values = st.session_state[&#39;values&#39;]<br>        forecast_labels = st.session_state[&#39;forecast_labels&#39;]<br>        forecast_values = st.session_state[&#39;forecast_data&#39;][&#39;point_forecast&#39;]<br>        quantiles = st.session_state[&#39;quantiles&#39;]<br>        <br>        # Create x-axis<br>        total_points = len(values) + len(forecast_values)<br>        all_x = list(range(total_points))<br>        hist_x = all_x[:len(values)]<br>        forecast_x = all_x[len(values)-1:]<br>        <br>        # Plot<br>        ax.plot(hist_x, values, label=&quot;Historical Data&quot;, color=&quot;#3498db&quot;, <br>                linewidth=2, marker=&#39;o&#39;, markersize=4)<br>        <br>        forecast_with_connection = [values[-1]] + forecast_values<br>        ax.plot(forecast_x, forecast_with_connection, label=&quot;Forecast&quot;, <br>                color=&quot;#2ecc71&quot;, linewidth=2, marker=&#39;s&#39;, markersize=4)<br>        <br>        # Confidence intervals<br>        quantiles_with_connection = np.vstack([[values[-1]] * quantiles.shape[1], quantiles])<br>        <br>        ax.fill_between(forecast_x, quantiles_with_connection[:, 1], <br>                        quantiles_with_connection[:, 9], alpha=0.2, <br>                        color=&#39;#2ecc71&#39;, label=&#39;80% Prediction Interval&#39;)<br>        <br>        ax.fill_between(forecast_x, quantiles_with_connection[:, 3], <br>                        quantiles_with_connection[:, 7], alpha=0.3, <br>                        color=&#39;#2ecc71&#39;, label=&#39;50% Prediction Interval&#39;)<br>        <br>        # Labels<br>        all_labels = periods + forecast_labels<br>        step = max(1, len(all_labels) // 12)<br>        tick_positions = list(range(0, len(all_labels), step))<br>        tick_labels = [all_labels[i] for i in tick_positions]<br>        <br>        
ax.set_xticks(tick_positions)<br>        ax.set_xticklabels(tick_labels, rotation=45, ha=&#39;right&#39;)<br>        ax.legend(loc=&#39;best&#39;)<br>        ax.set_title(f&quot;TimesFM Forecast: {st.session_state[&#39;target_column&#39;]}&quot;, <br>                     fontsize=14, fontweight=&#39;bold&#39;)<br>        ax.set_xlabel(&quot;Period&quot;)<br>        ax.set_ylabel(st.session_state[&#39;target_column&#39;])<br>        ax.grid(True, linestyle=&#39;--&#39;, alpha=0.6)<br>        <br>        plt.tight_layout()<br>        st.pyplot(fig)<br>        <br>        # Forecast table<br>        st.subheader(&quot;📋 Forecast Values&quot;)<br>        <br>        forecast_df = pd.DataFrame({<br>            &#39;Period&#39;: st.session_state[&#39;forecast_data&#39;][&#39;periods&#39;],<br>            &#39;Forecast&#39;: st.session_state[&#39;forecast_data&#39;][&#39;point_forecast&#39;],<br>            &#39;Lower (10%)&#39;: st.session_state[&#39;forecast_data&#39;][&#39;quantiles&#39;][&#39;q10&#39;],<br>            &#39;Lower (25%)&#39;: st.session_state[&#39;forecast_data&#39;][&#39;quantiles&#39;][&#39;q25&#39;],<br>            &#39;Median&#39;: st.session_state[&#39;forecast_data&#39;][&#39;quantiles&#39;][&#39;q50&#39;],<br>            &#39;Upper (75%)&#39;: st.session_state[&#39;forecast_data&#39;][&#39;quantiles&#39;][&#39;q75&#39;],<br>            &#39;Upper (90%)&#39;: st.session_state[&#39;forecast_data&#39;][&#39;quantiles&#39;][&#39;q90&#39;]<br>        })<br>        <br>        st.dataframe(forecast_df.style.format({<br>            &#39;Forecast&#39;: &#39;{:,.2f}&#39;,<br>            &#39;Lower (10%)&#39;: &#39;{:,.2f}&#39;,<br>            &#39;Lower (25%)&#39;: &#39;{:,.2f}&#39;,<br>            &#39;Median&#39;: &#39;{:,.2f}&#39;,<br>            &#39;Upper (75%)&#39;: &#39;{:,.2f}&#39;,<br>            &#39;Upper (90%)&#39;: &#39;{:,.2f}&#39;<br>        }), use_container_width=True)<br>        <br>        # Chat interface<br>        
st.markdown(&quot;---&quot;)<br>        st.subheader(&quot;💬 Ask Questions About Your Forecast&quot;)<br>        <br>        user_query = st.text_input(<br>            &quot;Ask me anything about the forecast:&quot;,<br>            placeholder=&quot;e.g., What is the trend? What is the highest forecasted value?&quot;<br>        )<br>        <br>        if user_query:<br>            response = generate_chat_response(<br>                user_query, <br>                st.session_state[&#39;forecast_data&#39;],<br>                st.session_state[&#39;summary&#39;]<br>            )<br>            st.info(response)<br>        <br>        # Example questions<br>        with st.expander(&quot;💡 Example Questions&quot;):<br>            st.markdown(&quot;&quot;&quot;<br>            - What is the trend?<br>            - What is the highest forecasted value?<br>            - What is the average forecast?<br>            - Show me the confidence interval<br>            - Compare historical and forecast data<br>            - What is the total forecasted amount?<br>            &quot;&quot;&quot;)<br><br>else:<br>    # Welcome screen<br>    st.info(&quot;👈 Please upload a CSV file or paste your data to get started&quot;)<br>    <br>    st.markdown(&quot;&quot;&quot;<br>    ### How to use this tool:<br>    <br>    1. **Upload Data**: Choose to upload a CSV file or paste text data<br>    2. **Configure**: Select your date column, target column, and forecast settings<br>    3. **Generate**: Click the &quot;Generate Forecast&quot; button<br>    4. 
**Analyze**: View the forecast visualization and ask questions<br>    <br>    ### Supported Features:<br>    <br>    - ✅ Multiple time periods (Day, Week, Month, Year)<br>    - ✅ Various aggregation methods (Sum, Mean, Median, Count)<br>    - ✅ Customizable forecast horizon (1-24 periods)<br>    - ✅ Confidence intervals (50% and 80%)<br>    - ✅ Interactive Q&amp;A about forecasts<br>    &quot;&quot;&quot;)</pre><p>Run the Streamlit application using the following command to view the results.</p><pre>streamlit run app.py</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/800/1*qD87vsd-nSeRxB6UWkqSqg.gif" /></figure><p><em>Thanks for reading this article !!</em></p><p>If you enjoyed this article, please click on the clap button 👏 and share to help others find it!</p><p>The full source code for this tutorial can be found here,</p><p><a href="https://github.com/codemaker2015/timesfm-experiments">GitHub - codemaker2015/timesfm-experiments: Forecasting time series data using timesfm</a></p><h4>References</h4><ul><li><a href="https://research.google/blog/a-decoder-only-foundation-model-for-time-series-forecasting/">A decoder-only foundation model for time-series forecasting</a></li><li><a href="https://github.com/google-research/timesfm">GitHub - google-research/timesfm: TimesFM (Time Series Foundation Model) is a pretrained time-series foundation model developed by Google Research for time-series forecasting.</a></li><li><a href="https://colab.research.google.com/drive/1UAKc3DEPQzsOHov1gug1-1iWmpdRFiHe?usp=sharing">Google Colab</a></li><li><a href="https://www.kaggle.com/datasets/kandij/electric-production/data">Time series analysis</a></li><li><a href="https://www.kaggle.com/datasets/odins0n/monthly-gold-prices">Monthly Gold Prices (1979-2021)</a></li><li><a href="https://www.kaggle.com/datasets/kyanyoga/sample-sales-data">Sample Sales Data</a></li></ul>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Understanding DeepEval: A Practical Guide for Evaluating Large Language Models]]></title>
            <link>https://codemaker2016.medium.com/understanding-deepeval-a-practical-guide-for-evaluating-large-language-models-d7272b6c2634?source=rss-b7a60e472dfc------2</link>
            <guid isPermaLink="false">https://medium.com/p/d7272b6c2634</guid>
            <category><![CDATA[llm-testing]]></category>
            <category><![CDATA[llm]]></category>
            <category><![CDATA[deepeval]]></category>
            <category><![CDATA[python]]></category>
            <category><![CDATA[llm-evaluation]]></category>
            <dc:creator><![CDATA[Vishnu Sivan]]></dc:creator>
            <pubDate>Tue, 09 Sep 2025 04:43:13 GMT</pubDate>
            <atom:updated>2025-09-12T04:04:43.697Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*WyyyfS2tFEK7E17Daa9jvg.png" /></figure><p>The rapid evolution of Large Language Models (LLMs) has transformed the way we build intelligent applications, but with this growth comes an equally important challenge — how do we measure their true effectiveness? Traditional evaluation methods often fall short in capturing the diverse capabilities and limitations of LLMs. From reasoning and accuracy to coherence, bias, and ethical alignment, a robust evaluation framework is essential to ensure that these models are reliable and suitable for real-world use.</p><p>This is where DeepEval comes in. DeepEval is an open-source framework built to streamline LLM testing by offering a comprehensive suite of metrics, synthetic dataset generation, real-time evaluation, and seamless integration with popular testing frameworks like Pytest. By enabling easy customization, DeepEval empowers researchers and developers to benchmark models against tasks like MMLU, apply advanced metrics such as G-eval, and rigorously validate outputs for relevance and reliability.</p><p>In this tutorial, you’ll learn how to set up DeepEval, create a relevance test inspired by Pytest, evaluate LLM outputs using the G-eval metric, and run MMLU benchmarking on the TinyLlama model. 
By the end, you’ll have a clear workflow to systematically test and improve the performance of your LLM-powered applications.</p><h3>Getting Started</h3><h4>Table of contents</h4><ul><li><a href="#e4e6">What is DeepEval</a></li><li><a href="#e20d">Key Features</a></li><li><a href="#87e7">Getting started with evaluation of LLM models using DeepEval</a></li><li><a href="#99ab">Installing the dependencies</a></li><li><a href="#194a">Querying the Model &amp; Measuring Different Metrics</a></li><li><a href="#cf0f">Example 1: Answer Relevancy Metric</a></li><li><a href="#c81f">Example 2: G-Eval Metric</a></li><li><a href="#a7df">Example 3: Prompt Alignment Metric</a></li><li><a href="#c74f">Example 4: Json Correctness Metric</a></li><li><a href="#891a">Example 5: Summarization Metric</a></li><li><a href="#6d59">Example 6: LLM Integration</a></li><li><a href="#d772">Example 7: Hallucinations</a></li><li><a href="#f40c">Example 8: Faithfulness Metric</a></li><li><a href="#d194">Example 9: Chatbot Evaluation</a></li><li><a href="#31ce">Example 10: LLM Tracing</a></li><li><a href="#e0f8">Example 11: MCP Interactions</a></li><li><a href="#5cc3">MMLU benchmarking with DeepEval for custom LLMs</a></li></ul><h4>What is DeepEval</h4><p>DeepEval is an open-source evaluation framework designed for testing Large Language Models (LLMs) across multiple dimensions such as reasoning, accuracy, coherence, relevance, and ethical alignment. Unlike simple benchmarks, DeepEval goes beyond by offering custom metrics, real-time evaluation, synthetic dataset generation, and seamless integration with testing pipelines. It allows researchers and developers to systematically measure and monitor LLM performance both in experimentation and production.</p><h4>Key Features</h4><ul><li><strong>Extensive Metric Suite<br></strong>Provides 14+ research-backed metrics for LLM evaluation. 
<br>Includes advanced metrics like G-Eval (chain-of-thought reasoning), Faithfulness (accuracy &amp; reliability), Toxicity, Answer Relevancy, and Conversational Metrics such as knowledge retention and conversation completeness.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/611/1*-pCdH7BsRlX5s-Y6AsTr_w.png" /></figure><ul><li><strong>Custom Metric Development<br></strong>Allows users to define their own evaluation metrics tailored to specific use cases.</li><li><strong>Integration with LLMs<br></strong>Compatible with any LLM (including OpenAI models).<br>Supports benchmarking against popular datasets like MMLU and HumanEval.</li><li><strong>Real-Time Monitoring &amp; Benchmarking<br></strong>Enables continuous monitoring of LLMs in production.<br>Provides robust benchmarking capabilities to assess models efficiently.</li><li><strong>Simplified Testing with Pytest Integration<br></strong>Built with a Pytest-like architecture, making it easy to write unit tests for LLM outputs in just a few lines of code.</li><li><strong>Batch Evaluation Support<br></strong>Supports large-scale evaluations with batch processing, saving time when benchmarking custom LLMs.</li></ul><h3>Getting started with evaluation of LLM models using DeepEval</h3><p>In this session, we will explore how to evaluate Large Language Models (LLMs) with DeepEval. By default, DeepEval supports OpenAI, and we’ll be using the OpenAI GPT-4o-mini model. 
However, you can use any other LLM of your choice.</p><p>We will also use uv, a modern and fast Python package manager (instead of pip), to set up our environment and handle dependencies.</p><h4>Installing uv</h4><p>uv simplifies dependency management, virtual environments, and running scripts.</p><ul><li><strong>For Windows:</strong></li></ul><pre>powershell -ExecutionPolicy ByPass -c &quot;irm https://astral.sh/uv/install.ps1 | iex&quot;<br>set Path=%USERPROFILE%\.local\bin;%Path%</pre><ul><li><strong>For Linux / Mac:</strong></li></ul><pre>curl -LsSf https://astral.sh/uv/install.sh | sh</pre><p>Refer to the official website for detailed installation instructions.</p><p><a href="https://docs.astral.sh/uv/getting-started/installation">Installation | uv</a></p><h4>Installing the dependencies</h4><ul><li>Initialize a uv project by executing the following commands.</li></ul><pre>uv init deepeval_demo<br>cd deepeval_demo</pre><ul><li>Create and activate a virtual environment by executing the following commands.</li></ul><pre>uv venv<br>source .venv/bin/activate # for Linux / Mac<br>.venv\Scripts\activate    # for Windows</pre><ul><li>Install deepeval, langchain-openai, fastmcp, transformers, accelerate, bitsandbytes, datasets, pandas and python-dotenv using uv.</li></ul><pre>uv add deepeval langchain-openai fastmcp transformers accelerate bitsandbytes datasets pandas python-dotenv</pre><h4>Setting up the credentials</h4><ul><li>Create a file named .env. 
This file will store your environment variables, including the OpenAI key.</li><li>Open the .env file and add the following line to specify your OpenAI API key.</li></ul><pre>OPENAI_API_KEY=sk-proj-C1K1hKug99wXxtj...</pre><h3>Querying the Model &amp; Measuring Different Metrics</h3><p>Now that the environment is set up, let’s start querying our LLM and measuring the quality of its responses using different metrics.</p><h4>Example 1: Answer Relevancy Metric</h4><p>The Answer Relevancy Metric evaluates how relevant the model’s response is to the user’s input. This is useful in RAG (Retrieval-Augmented Generation) systems or whenever you want to ensure the response actually addresses the question that was asked.</p><p>Create a file named test_relevancy.py and add the following code to it.</p><pre>from deepeval import assert_test<br>from deepeval.test_case import LLMTestCase<br>from deepeval.metrics import AnswerRelevancyMetric<br>from dotenv import load_dotenv<br>load_dotenv()<br><br>def test_relevancy():<br>    # Define the metric with a threshold<br>    relevancy_metric = AnswerRelevancyMetric(threshold=0.7, model=&quot;gpt-4o-mini&quot;)<br>    <br>    # Case 1: Partially relevant answer<br>    test_case_1 = LLMTestCase(<br>        input=&quot;Can I return these shoes after 30 days?&quot;,<br>        actual_output=&quot;Yes, you can return them. We offer a 30-day full refund. 
Do you have your original receipt?&quot;,<br>        retrieval_context=[<br>            &quot;All customers are eligible for a 30-day full refund at no extra cost.&quot;,<br>            &quot;Returns are only accepted within 30 days of purchase.&quot;,<br>        ],<br>    )<br>    <br>    # Case 2: Fully relevant answer<br>    test_case_2 = LLMTestCase(<br>        input=&quot;Can I return these shoes after 30 days?&quot;,<br>        actual_output=&quot;Unfortunately, returns are only accepted within 30 days of purchase.&quot;,<br>        retrieval_context=[<br>            &quot;All customers are eligible for a 30-day full refund at no extra cost.&quot;,<br>            &quot;Returns are only accepted within 30 days of purchase.&quot;,<br>        ],<br>    )<br>    <br>    # Run evaluation<br>    assert_test(test_case_1, [relevancy_metric])<br>    assert_test(test_case_2, [relevancy_metric])</pre><p><strong>Code explanation:</strong></p><ul><li><strong>Metric Defined</strong> → AnswerRelevancyMetric(threshold=0.7) sets the relevancy cutoff.</li><li><strong>Test Cases Created</strong> → Each LLMTestCase includes:<br>The user query (input), LLM output (actual_output) and retrieval context used for evaluation.</li><li><strong>Assertion</strong> → assert_test() automatically checks if the relevancy score passes the threshold.<br>In Test Case 1: the answer contradicts the context slightly (“Yes, you can return after 30 days” vs. rule of within 30 days).<br>In Test Case 2: the answer is fully aligned with the context.</li></ul><h4>Executing the test</h4><p>Use the following command to run the test.</p><pre>deepeval test run test_relevancy.py</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*eYcP2gTol6EnvycOVGF68g.png" /></figure><h4>Example 2: G-Eval Metric</h4><p>G-Eval is an LLM evaluation framework that leverages chain-of-thought (CoT) reasoning to assess model outputs based on custom criteria. 
Unlike fixed metrics, G-Eval is highly flexible and can evaluate nearly any aspect of a response — such as factual accuracy, omissions, clarity, or adherence to instructions.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/830/1*CIg2WHK8VfmXIcWP37GvGw.png" /><figcaption>Image Source: <a href="https://ar5iv.labs.arxiv.org/html/2303.16634">[2303.16634] G-Eval: NLG Evaluation using Gpt-4 with Better Human Alignment</a></figcaption></figure><p>It works in two steps:</p><ol><li><strong>Generate Evaluation Steps</strong> — Uses CoT reasoning to break down the evaluation based on the given criteria.</li><li><strong>Determine Final Score</strong> — Applies those steps to score the LLM’s output.</li></ol><p>If evaluation steps are manually provided, G-Eval skips step one and directly uses them to calculate the score.</p><p>Create a file named test_geval_example.py and add the following code to it.</p><pre>from deepeval import assert_test<br>from deepeval.test_case import LLMTestCase, LLMTestCaseParams<br>from deepeval.metrics import GEval<br><br>from dotenv import load_dotenv<br>load_dotenv()<br><br>correctness_metric = GEval(<br>    name=&quot;Correctness&quot;,<br>    model=&quot;gpt-4o-mini&quot;,<br>    evaluation_params=[<br>        LLMTestCaseParams.EXPECTED_OUTPUT,<br>        LLMTestCaseParams.ACTUAL_OUTPUT],<br>    evaluation_steps=[<br>        &quot;Check whether the facts in &#39;actual output&#39; contradicts any facts in &#39;expected output&#39;&quot;,<br>        &quot;You should also lightly penalize omission of detail, and focus on the main idea&quot;,<br>        &quot;Vague language, or contradicting OPINIONS, are OK&quot;<br>    ],<br>)<br><br>first_test_case = LLMTestCase(input=&quot;What are the main causes of deforestation?&quot;,<br>                              actual_output=&quot;The main causes of deforestation include agricultural expansion, logging, infrastructure development, and urbanization.&quot;,<br>                             
 expected_output=&quot;The main causes of deforestation include agricultural expansion, logging, infrastructure development, and urbanization.&quot;)<br><br><br>second_test_case = LLMTestCase(input=&quot;Define the term &#39;artificial intelligence&#39;.&quot;,<br>                               actual_output=&quot;Artificial intelligence is the simulation of human intelligence by machines.&quot;,<br>                               expected_output=&quot;Artificial intelligence refers to the simulation of human intelligence in machines that are programmed to think and learn like humans, including tasks such as problem-solving, decision-making, and language understanding.&quot;)<br><br><br>third_test_case = LLMTestCase(input=&quot;List the primary colors.&quot;,<br>                              actual_output=&quot;The primary colors are green, orange, and purple.&quot;,<br>                              expected_output=&quot;The primary colors are red, blue, and yellow.&quot;)<br><br>test_cases = [first_test_case, second_test_case, third_test_case]<br>for test_case in test_cases:<br>    assert_test(test_case, [correctness_metric])</pre><p><strong>Code explanation</strong></p><ul><li>Define a G-Eval metric named “Correctness” using gpt-4o-mini.</li><li>Specify evaluation parameters: compare expected vs actual output.</li><li>Add custom evaluation steps: check contradictions, penalize omissions, allow vague language/opinions.</li><li>Create three LLM test cases with queries, actual outputs, and expected outputs.</li><li>Run all test cases using assert_test() with the correctness metric.</li></ul><h4>Executing the test</h4><p>Use the following command to run the test.</p><pre>deepeval test run test_geval_example.py</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*S4RYJBd9ruoOnNF5LRPwIg.png" /></figure><h4>Example 3: Prompt Alignment Metric</h4><p>The prompt alignment metric evaluates whether an LLM’s generated output aligns with the instructions 
defined in the prompt template. It ensures that the model not only provides a relevant response to the query but also adheres to any specified formatting, style, or structural requirements.</p><p>Create a file named test_prompt_alignment.py and add the following code to it.</p><pre>from deepeval import evaluate<br>from deepeval.metrics import PromptAlignmentMetric<br>from deepeval.test_case import LLMTestCase<br>from langchain_core.prompts import ChatPromptTemplate<br>from langchain_openai import ChatOpenAI<br><br>from dotenv import load_dotenv<br>load_dotenv()<br><br>template = &quot;&quot;&quot;Question: {question}<br>Answer: Answer in Upper case.&quot;&quot;&quot;<br>prompt = ChatPromptTemplate.from_template(template)<br>model = ChatOpenAI(model=&quot;gpt-4o-mini&quot;)<br>chain = prompt | model<br>query = &quot;What is capital of India?&quot;<br>input_data = {&quot;question&quot;: query}<br># Invoke the chain with input data and display the response<br>actual_output = chain.invoke(input_data).content<br>print(&quot;actual_output:&quot;, actual_output)<br><br># Measuring prompt alignment<br>metric = PromptAlignmentMetric(<br>    prompt_instructions=[&quot;Reply in all uppercase&quot;],<br>    model=&quot;gpt-4o-mini&quot;,<br>    include_reason=True<br>)<br>test_case = LLMTestCase(<br>    input=query,<br>    actual_output=actual_output<br>)<br><br>metric.measure(test_case)<br>print(&quot;metric.score:&quot;, metric.score)<br>print(&quot;metric.reason:&quot;, metric.reason)<br><br># or evaluate test cases in bulk<br>evaluate([test_case], [metric])</pre><p>Run the test using the deepeval test run test_prompt_alignment.py command to check whether the model output aligns with the prompt instructions.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*C1if2Rk97Uh9FOZuIpk1rQ.png" /></figure><h4>Example 4: Json Correctness Metric</h4><p>The JSON Correctness Metric evaluates whether an LLM’s generated output follows the correct JSON schema. 
Unlike other metrics that rely on an LLM for assessment, this metric simply checks the provided expected_schema and verifies if the actual_output can be successfully validated against it.</p><p>Create a file named test_json_correctness.py and add the following code to it.</p><pre>from deepeval import evaluate<br>from deepeval.metrics import JsonCorrectnessMetric<br>from deepeval.test_case import LLMTestCase<br>from langchain_core.prompts import ChatPromptTemplate<br>from langchain_openai import ChatOpenAI<br>from pydantic import BaseModel<br><br>from dotenv import load_dotenv<br>load_dotenv()<br><br>class ExampleSchema(BaseModel):<br>    name: str<br><br># Querying the model<br>template = &quot;&quot;&quot;Question: {question}<br>Answer:  Let&#39;s think step by step.&quot;&quot;&quot;<br>prompt = ChatPromptTemplate.from_template(template)<br>model = ChatOpenAI(model=&quot;gpt-4o-mini&quot;)<br>chain = prompt | model<br>query = &quot;Output me a random Json with the &#39;name&#39; key&quot;<br>input_data = {&quot;question&quot;: query}<br># Invoke the chain with input data and display the response<br>actual_output = chain.invoke(input_data).content<br>print(&quot;actual_output:&quot;, actual_output)<br><br># Measuring Json correctness<br>metric = JsonCorrectnessMetric(<br>    expected_schema=ExampleSchema,<br>    model=&quot;gpt-4o-mini&quot;,<br>    include_reason=True<br>)<br>test_case = LLMTestCase(<br>    input=query,<br>    actual_output=actual_output<br>)<br><br>metric.measure(test_case)<br>print(&quot;metric.score:&quot;, metric.score)<br>print(&quot;metric.reason:&quot;, metric.reason)<br><br># or evaluate test cases in bulk<br>evaluate([test_case], [metric])</pre><p>Run the test using the deepeval test run test_json_correctness.py command to check whether the model output conforms to the expected schema.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ATxXO8z5ZuALOLUkurY0WA.png" /></figure><figure><img alt="" 
src="https://cdn-images-1.medium.com/max/1024/1*mosCHrcc_4OxjZDPUxdDcw.png" /></figure><h4>Example 5: Summarization Metric</h4><p>The Summarization Metric evaluates whether an LLM generates factually accurate summaries that include the essential details from the original text. Its score is calculated using two components: the alignment_score, which checks if the summary avoids hallucinations or contradictions, and the coverage_score, which measures whether the summary captures all the necessary information from the source text.</p><p>Create a file named test_summarization.py and add the following code to it.</p><pre>from deepeval import evaluate<br>from deepeval.metrics import SummarizationMetric<br>from deepeval.test_case import LLMTestCase<br>from langchain_core.prompts import ChatPromptTemplate<br>from langchain_openai import ChatOpenAI<br>from pydantic import BaseModel<br><br>from dotenv import load_dotenv<br>load_dotenv()<br><br>class ExampleSchema(BaseModel):<br>    name: str<br><br># This is the original text to be summarized<br>text = &quot;&quot;&quot;<br>Rice is the staple food of Bengal. Bhortas (lit-&quot;mashed&quot;) are a really common type of food used as an additive too rice. there are several types of Bhortas such as Ilish bhorta shutki bhorta, begoon bhorta and more. Fish and other seafood are also important because Bengal is a reverrine region.<br>Some fishes like puti (Puntius species) are fermented. Fish curry is prepared with fish alone or in combination with vegetables.Shutki maach is made using the age-old method of preservation where the food item is dried in the sun and air, thus removing the water content. 
This allows for preservation that can make the fish last for months, even years in Bangladesh<br>&quot;&quot;&quot;<br><br>template = &quot;&quot;&quot;Question: {question}<br>Answer:  Let&#39;s think step by step.&quot;&quot;&quot;<br>prompt = ChatPromptTemplate.from_template(template)<br>model = ChatOpenAI(model=&quot;gpt-4o-mini&quot;)<br>chain = prompt | model<br>query = f&quot;Summarize the text for me {text}&quot;<br>input_data = {&quot;question&quot;: query}<br># Invoke the chain with input data and print the response<br>actual_output = chain.invoke(input_data).content<br>print(&quot;actual_output:&quot;, actual_output)<br><br>test_case = LLMTestCase(input=text, actual_output=actual_output)<br>metric = SummarizationMetric(<br>    threshold=0.7,<br>    model=&quot;gpt-4o-mini&quot;,<br>)<br><br>metric.measure(test_case)<br>print(&quot;metric.score:&quot;, metric.score)<br>print(&quot;metric.reason:&quot;, metric.reason)<br><br># or evaluate test cases in bulk<br>evaluate([test_case], [metric])</pre><p>Run the test using the deepeval test run test_summarization.py command to check whether the summary is factually accurate and covers the source text.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Wa8ywVYIMCtHcZMZUJuOlA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*5RhIT0-PZK-h5RjWOg__xQ.png" /></figure><h4>Example 6: LLM Integration</h4><p>DeepEval is capable of integrating with any LLM and evaluating its performance across different metrics. In this example, we use LangChain’s ChatOpenAI with DeepEval to test the relevancy of a response. 
The model (gpt-4o-mini) is queried with <em>“What is the capital of India?”</em>, and the output is evaluated using the Answer Relevancy Metric.</p><p>Create a file named test_openai_llm.py and add the following code to it.</p><pre>from langchain_openai import ChatOpenAI<br>from deepeval.metrics import AnswerRelevancyMetric<br>from deepeval.test_case import LLMTestCase<br><br>from dotenv import load_dotenv<br>load_dotenv()<br><br># Initialize the model<br>chat = ChatOpenAI(model=&quot;gpt-4o-mini&quot;, temperature=0.7)<br><br># Get response<br>query = &quot;What is the capital of India?&quot;<br>response = chat.invoke(query).content<br>print(f&quot;User: {query} \nAssistant: {response}&quot;)<br><br>metric = AnswerRelevancyMetric(<br>    threshold=0.7,<br>    model=&quot;gpt-4o-mini&quot;,<br>    include_reason=True<br>)<br>test_case = LLMTestCase(<br>    input=query,<br>    actual_output=response<br>)<br><br>metric.measure(test_case)<br>print(metric.score)<br>print(metric.reason)</pre><p>Run the test using the deepeval test run test_openai_llm.py command to check the evaluation metrics.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*mNU8OR_5xUoreJddSHX-_Q.png" /></figure><h4>Example 7: Hallucinations</h4><p>A hallucination occurs when an LLM produces content that is not supported by, or contradicts, the context it was given. DeepEval provides a Hallucination Metric that helps detect and score these cases. It compares the model’s actual output against the given context to check whether the response faithfully sticks to the facts provided. If the output contradicts the context or invents unsupported details, the hallucination score rises; for this metric a lower score is better, and the threshold acts as a maximum. 
This way, DeepEval helps ensure your LLM outputs remain accurate and grounded in the input context.</p><p>Create a file named test_hallucinations.py and add the following code to it.</p><pre>from deepeval import evaluate<br>from deepeval.metrics import HallucinationMetric<br>from deepeval.test_case import LLMTestCase<br><br>from dotenv import load_dotenv<br>load_dotenv()<br><br># Replace this with the actual documents that you are passing as input to your LLM.<br>context = [&quot;A man with blond-hair, and a brown shirt drinking out of a public water fountain.&quot;]<br><br># Replace this with the actual output from your LLM application<br>actual_output = &quot;A blond drinking water in public.&quot;<br><br>test_case = LLMTestCase(<br>    input=&quot;What was the blond doing?&quot;,<br>    actual_output=actual_output,<br>    context=context<br>)<br>metric = HallucinationMetric(threshold=0.5)<br><br># To run metric as a standalone<br># metric.measure(test_case)<br># print(metric.score, metric.reason)<br><br>evaluate(test_cases=[test_case], metrics=[metric])</pre><p>Run the test using the deepeval test run test_hallucinations.py command to check the evaluation metrics.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*01_xA8l91Gk1qLQyndnzVQ.png" /></figure><h4>Example 8: Faithfulness Metric</h4><p>The FaithfulnessMetric in DeepEval evaluates whether an LLM’s output is accurately grounded in the provided context or source material. It measures if the response remains consistent with the facts without introducing fabricated or contradictory information. This metric is particularly useful for applications like question-answering or retrieval-augmented generation, where maintaining factual consistency is critical. 
A higher faithfulness score indicates that the model’s output reliably reflects the input context.</p><p>Create a file named test_faithfulness.py and add the following code to it.</p><pre>from deepeval import evaluate<br>from deepeval.test_case import LLMTestCase<br>from deepeval.metrics import FaithfulnessMetric<br><br>from dotenv import load_dotenv<br>load_dotenv()<br><br># Replace this with the actual output from your LLM application<br>actual_output = &quot;We offer a 30-day full refund at no extra cost.&quot;<br><br># Replace this with the actual retrieved context from your RAG pipeline<br>retrieval_context = [&quot;All customers are eligible for a 30 day full refund at no extra cost.&quot;]<br><br>metric = FaithfulnessMetric(<br>    threshold=0.7,<br>    model=&quot;gpt-4o-mini&quot;,<br>    include_reason=True<br>)<br>test_case = LLMTestCase(<br>    input=&quot;What if these shoes don&#39;t fit?&quot;,<br>    actual_output=actual_output,<br>    retrieval_context=retrieval_context<br>)<br><br># To run metric as a standalone<br># metric.measure(test_case)<br># print(metric.score, metric.reason)<br><br>evaluate(test_cases=[test_case], metrics=[metric])</pre><p>Run the test using the deepeval test run test_faithfulness.py command to check the evaluation metrics.</p><h4>Example 9: Chatbot Evaluation</h4><p>Chatbot Evaluation differs from standard single-turn evaluations because conversations occur over multiple turns. This requires the chatbot to maintain context awareness throughout the interaction, rather than simply providing accurate responses in isolation. <br>In DeepEval, chatbots are assessed through multi-turn interactions, which must be structured as test cases following OpenAI’s message format. 
Evaluating multi-turn conversations is challenging, as each AI response depends on the preceding user input and all prior turns in the conversation, making the evaluation inherently context-dependent.</p><p>Create a file named test_chatbots.py and add the following code to it.</p><pre>from deepeval.test_case import ConversationalTestCase, Turn<br>from deepeval.metrics import TurnRelevancyMetric, KnowledgeRetentionMetric<br>from deepeval import evaluate<br><br>from dotenv import load_dotenv<br>load_dotenv()<br><br>test_case = ConversationalTestCase(<br>    turns=[<br>        Turn(role=&quot;user&quot;, content=&quot;Hello, how are you?&quot;),<br>        Turn(role=&quot;assistant&quot;, content=&quot;I&#39;m doing well, thank you!&quot;),<br>        Turn(role=&quot;user&quot;, content=&quot;How can I help you today?&quot;),<br>        Turn(role=&quot;assistant&quot;, content=&quot;I&#39;d like to buy a ticket to a Coldplay concert.&quot;),<br>    ]<br>)<br><br>evaluate(test_cases=[test_case], metrics=[TurnRelevancyMetric(), KnowledgeRetentionMetric()])</pre><p>Run the test using the deepeval test run test_chatbots.py command to check the evaluation results.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*njC7SocoQpVEixjxJT_iEg.png" /></figure><h4>Example 10: LLM Tracing</h4><p>LLM Tracing allows you to monitor the full execution of your application from start to finish. In DeepEval, the @observe decorator enables tracing and evaluation of any LLM interaction, regardless of the application’s complexity. By identifying individual components of your LLM workflow (such as functions that perform specific tasks or are invoked selectively), you can apply the @observe decorator to track their behavior. This provides detailed insights into how each part of your LLM application operates, making it easier to debug, optimize, and evaluate performance.</p><p>Tracing requires Confident AI credentials to see the traces. 
Create a basic account on Confident AI and add the provided credentials to the .env file.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*9U8594EzTkb-l622exOy-A.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*uFpvgk8VdUvG7KPi5jb_Mw.png" /></figure><p>Confident AI is a comprehensive platform designed to evaluate and enhance the performance of large language models (LLMs). It leverages its open-source evaluation framework, DeepEval, to provide robust testing, benchmarking, and monitoring capabilities for LLM applications. Confident AI emphasizes observability, allowing teams to trace LLM interactions, conduct A/B testing, and gather real-time performance insights.</p><p><a href="https://www.confident-ai.com/">Confident AI - The DeepEval LLM Evaluation Platform</a></p><p>Create a file named test_llm_tracing.py and add the following code to it.</p><pre>from openai import OpenAI<br>from deepeval.tracing import observe<br><br>from dotenv import load_dotenv<br>load_dotenv()<br><br>client = OpenAI()<br><br>@observe()<br>def llm_app(query: str) -&gt; str:<br>    # Send the query to the model and return the text of the first choice<br>    return client.chat.completions.create(<br>        model=&quot;gpt-4o-mini&quot;,<br>        messages=[<br>            {&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: query}<br>        ]<br>    ).choices[0].message.content<br><br># Call app to send trace to Confident AI<br>llm_app(&quot;Write me a poem.&quot;)</pre><p>Run the test using the deepeval test run test_llm_tracing.py command to see the tracing results.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*CIGhAjX1QHv1sYTmI2-aCg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*cXM-tQeLQ8tfEbSUKUpkmw.png" /></figure><h4>Example 11: MCP Interactions</h4><p>The MCP Use Metric evaluates how effectively an MCP-based LLM agent utilizes the MCP servers it has access to. 
It leverages an LLM-as-a-judge approach to assess both the MCP primitives invoked and the arguments generated by the LLM application. This metric can be applied to a single-turn LLMTestCase containing MCP parameters, providing insights into the agent’s efficiency and correctness in interacting with the MCP environment.</p><ul><li>Create a basic MCP server using FastMCP. For that, create a file named mcp_echo_server.py and add the following code to it.</li></ul><pre>import asyncio<br>from fastmcp import FastMCP<br><br># Initialize FastMCP server<br>mcp = FastMCP(&quot;Simple Echo Server&quot;)<br><br>@mcp.tool()<br>def echo(message: str) -&gt; str:<br>    &quot;&quot;&quot;Echo back the provided message.&quot;&quot;&quot;<br>    return message<br><br>def main():<br>    &quot;&quot;&quot;Run the server.&quot;&quot;&quot;<br>    mcp.run()<br><br>if __name__ == &quot;__main__&quot;:<br>    main()</pre><ul><li>Run the server using the command</li></ul><pre>python mcp_echo_server.py </pre><ul><li>Create a test case to evaluate the MCP interaction. 
For that, create a file named test_mcp.py and add the following code to it.</li></ul><pre>from deepeval import evaluate<br>from deepeval.metrics import MCPUseMetric<br>from deepeval.test_case import LLMTestCase, MCPServer<br><br>from dotenv import load_dotenv<br>load_dotenv()<br><br>test_case = LLMTestCase(<br>    input=&quot;Hello&quot;, # Your input here<br>    actual_output=&quot;Hello&quot;, # Your LLM app&#39;s final output here<br>    mcp_servers=[MCPServer(server_name=&quot;Simple Echo Server&quot;)] # Your MCP server&#39;s data<br>    # MCP primitives used (if any)<br>)<br><br>metric = MCPUseMetric()<br><br># To run metric as a standalone<br># metric.measure(test_case)<br># print(metric.score, metric.reason)<br><br>evaluate([test_case], [metric])</pre><p>Run the test using the deepeval test run test_mcp.py command to see the MCP evaluation results.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*OKgvEqsBouOP6MEvs1yvFg.png" /></figure><h3>MMLU benchmarking with DeepEval for custom LLMs</h3><p>MMLU (Massive Multitask Language Understanding) is a benchmark commonly used to evaluate large language models through multiple-choice questions. Covering 57 subjects ranging from math and history to law and ethics, it provides a thorough assessment of an LLM’s knowledge and reasoning skills. Its wide subject coverage and carefully designed questions have made MMLU a gold standard for measuring model performance.</p><p>In this guide, we will evaluate our custom LLM (TinyLlama-1.1B) on the MMLU dataset. Each entry in the dataset consists of an input prompt and multiple-choice answers (A, B, C, D). Model performance is measured by calculating the percentage of questions answered correctly.</p><h4><strong>Creating the Custom LLM Model Class</strong></h4><p>Define a custom class called TinyLlamaModel that extends DeepEvalBaseLLM to generate responses using the language model and tokenizer. 
The objective is to produce short outputs (at most three new tokens, matching max_new_tokens=3 in the code) for a given prompt, while managing device allocation and handling preprocessing for both the input prompt and the generated output.</p><p>Create a file named test_custom_model.py and add the following code to it.</p><pre>import torch<br>from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig<br>from deepeval.models.base_model import DeepEvalBaseLLM<br>from deepeval.benchmarks import MMLU<br>from deepeval.benchmarks.tasks import MMLUTask<br>from typing import List<br><br>class TinyLlamaModel(DeepEvalBaseLLM):<br>    def __init__(self, model, tokenizer):<br>        self.model = model<br>        self.tokenizer = tokenizer<br>        self.device = &quot;cuda&quot; if torch.cuda.is_available() else &quot;cpu&quot;<br><br>    def load_model(self):<br>        return self.model<br><br>    def generate(self, prompt: str) -&gt; str:<br>        # Clean the prompt for multiple choice<br>        prompt = prompt.replace(&quot;Output &#39;A&#39;, &#39;B&#39;, &#39;C&#39;, or &#39;D&#39;. 
Full answer not needed.&quot;, &quot;&quot;)<br>        <br>        # Format the prompt for TinyLlama<br>        formatted_prompt = f&quot;### Instruction: Answer with just the letter (A, B, C, or D)\n\n### Question: {prompt}\n\n### Answer:&quot;<br>        <br>        model_inputs = self.tokenizer([formatted_prompt], return_tensors=&quot;pt&quot;).to(self.device)<br>        <br>        generated_ids = self.model.generate(<br>            **model_inputs,<br>            max_new_tokens=3,<br>            do_sample=False,<br>            temperature=0.1,<br>            pad_token_id=self.tokenizer.eos_token_id,<br>            repetition_penalty=1.1<br>        )<br>        <br>        # Extract only the new tokens<br>        generated_tokens = generated_ids[0][model_inputs[&#39;input_ids&#39;].shape[1]:]<br>        clean_output = self.tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()<br>        <br>        # Extract just the letter (A, B, C, or D)<br>        for char in clean_output:<br>            if char in [&#39;A&#39;, &#39;B&#39;, &#39;C&#39;, &#39;D&#39;]:<br>                return char<br>        <br>        return clean_output[:1]  # Fallback: return first character<br><br>    async def a_generate(self, prompt: str) -&gt; str:<br>        return self.generate(prompt)<br><br>    def batch_generate(self, prompts: List[str]) -&gt; List[str]:<br>        &quot;&quot;&quot;Batch generate method required by MMLU benchmark&quot;&quot;&quot;<br>        results = []<br>        for prompt in prompts:<br>            try:<br>                result = self.generate(prompt)<br>                results.append(result)<br>            except Exception as e:<br>                print(f&quot;Error generating for prompt: {e}&quot;)<br>                results.append(&quot;&quot;)  # Fallback empty response<br>        return results<br><br>    async def a_batch_generate(self, prompts: List[str]) -&gt; List[str]:<br>        return self.batch_generate(prompts)<br><br>    def 
get_model_name(self):<br>        return &quot;TinyLlama-1.1B-Chat&quot;</pre><h4><strong>Loading the Model and Tokenizer</strong></h4><p>Create two functions to load the LLM model and tokenizer by model name. The model is loaded with 4-bit (NF4) quantization, falling back to an unquantized load if that fails, and the tokenizer is configured to use the end-of-sequence token for padding, applied on the left.</p><pre>def load_model(model_name: str):<br>    # Use light quantization for TinyLlama<br>    quant_config = BitsAndBytesConfig(<br>        load_in_4bit=True,<br>        bnb_4bit_quant_type=&quot;nf4&quot;,<br>        bnb_4bit_compute_dtype=torch.float16,<br>    )<br>    <br>    try:<br>        model = AutoModelForCausalLM.from_pretrained(<br>            model_name,<br>            quantization_config=quant_config,<br>            device_map=&quot;auto&quot;,<br>            dtype=torch.float16,<br>            trust_remote_code=True<br>        )<br>        return model<br>    except Exception as e:<br>        print(f&quot;Error loading quantized model, trying without quantization: {e}&quot;)<br>        # Fallback without quantization<br>        return AutoModelForCausalLM.from_pretrained(<br>            model_name,<br>            device_map=&quot;auto&quot;,<br>            dtype=torch.float16,<br>            trust_remote_code=True<br>        )<br><br>def load_tokenizer(model_name):<br>    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)<br>    tokenizer.pad_token = tokenizer.eos_token<br>    tokenizer.padding_side = &quot;left&quot;<br>    return tokenizer</pre><h4>Building the custom LLM</h4><p>We will use the Hugging Face model TinyLlama/TinyLlama-1.1B-Chat-v1.0 to directly load both the model and tokenizer. These will then be passed into the custom LLM class to create an LLM response generator.</p><p>After loading the <strong>TinyLlama 1.1B</strong> model and tokenizer from Hugging Face and wrapping them in our custom class, we can test the response generation. 
The code first runs a single test prompt related to abstract algebra to verify that the model produces an output. It then performs batch generation with multiple prompts, such as arithmetic and geography questions, to validate that the model can handle multiple inputs efficiently. The results are printed for both single and batch generations, ensuring that our custom model class works as expected.</p><pre># Load TinyLlama 1.1B model from Hugging Face<br>tinyllama_model_name = &quot;TinyLlama/TinyLlama-1.1B-Chat-v1.0&quot;<br><br>print(&quot;Loading tokenizer...&quot;)<br>tokenizer = load_tokenizer(tinyllama_model_name)<br><br>print(&quot;Loading model...&quot;)<br>model = load_model(tinyllama_model_name)<br><br>print(&quot;Creating custom model...&quot;)<br>custom_model = TinyLlamaModel(model, tokenizer)<br><br># Test model generation<br>print(&quot;\nTesting model generation:&quot;)<br>prompt = &quot;&quot;&quot;<br>The following are multiple choice questions (with answers) about abstract algebra.<br><br>Find all c in Z_3 such that Z_3[x]/(x^2 + c) is a field.<br>A. 0<br>B. 1<br>C. 2<br>D. 3<br>Answer:&quot;&quot;&quot;<br><br>test_output = custom_model.generate(prompt)<br>print(f&quot;Generated output: &#39;{test_output}&#39;&quot;)<br><br># Test batch generation<br>print(&quot;\nTesting batch generation...&quot;)<br>test_prompts = [<br>    prompt,<br>    &quot;What is 2+2? A. 1 B. 2 C. 3 D. 4 Answer:&quot;,<br>    &quot;Capital of France? A. London B. Berlin C. Paris D. Rome Answer:&quot;<br>]<br><br>batch_outputs = custom_model.batch_generate(test_prompts)<br>for i, output in enumerate(batch_outputs):<br>    print(f&quot;Batch output {i+1}: &#39;{output}&#39;&quot;)</pre><h4><strong>Running the MMLU Benchmark</strong></h4><p>Finally, we will load the MMLU benchmark, define the tasks, and run the evaluation on the custom model. 
The results of each task can be reviewed using benchmark.task_scores, while benchmark.predictions provides detailed outputs showing which samples were answered correctly and which were not. This allows for a more granular analysis of the model’s performance.</p><pre># Run MMLU benchmark with very light settings<br>print(&quot;\nRunning MMLU benchmark...&quot;)<br>benchmark = MMLU(<br>    tasks=[MMLUTask.HIGH_SCHOOL_COMPUTER_SCIENCE],  # Only one task<br>    n_shots=2  # Reduced shots for smaller model<br>)<br><br>try:<br>    benchmark.evaluate(model=custom_model, batch_size=1)  # Batch size 1<br>    <br>    print(&quot;\nBenchmark Results:&quot;)<br>    print(f&quot;Task Scores: {benchmark.task_scores}&quot;)<br>    print(f&quot;Overall Score: {benchmark.overall_score}&quot;)<br>    <br>    # Print detailed predictions<br>    print(&quot;\nSample Predictions:&quot;)<br>    for i, (input_text, prediction) in enumerate(list(benchmark.predictions.items())[:3]):<br>        print(f&quot;Prediction {i+1}:&quot;)<br>        print(f&quot;Input: {input_text[:100]}...&quot;)<br>        print(f&quot;Prediction: {prediction}&quot;)<br>        print(&quot;---&quot;)<br>        <br>except Exception as e:<br>    print(f&quot;Error during benchmark evaluation: {e}&quot;)<br>    print(&quot;Trying with even smaller settings...&quot;)<br>    <br>    # Fallback: try with minimal settings<br>    benchmark = MMLU(<br>        tasks=[MMLUTask.HIGH_SCHOOL_COMPUTER_SCIENCE],<br>        n_shots=1<br>    )<br>    benchmark.evaluate(model=custom_model, batch_size=1)<br>    print(f&quot;Fallback overall score: {benchmark.overall_score}&quot;)</pre><h4>Final Code</h4><p>The complete implementation will look as follows:</p><pre>import torch<br>from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig<br>from deepeval.models.base_model import DeepEvalBaseLLM<br>from deepeval.benchmarks import MMLU<br>from deepeval.benchmarks.tasks import MMLUTask<br>from typing import 
List<br><br>class TinyLlamaModel(DeepEvalBaseLLM):<br>    def __init__(self, model, tokenizer):<br>        self.model = model<br>        self.tokenizer = tokenizer<br>        self.device = &quot;cuda&quot; if torch.cuda.is_available() else &quot;cpu&quot;<br><br>    def load_model(self):<br>        return self.model<br><br>    def generate(self, prompt: str) -&gt; str:<br>        # Clean the prompt for multiple choice<br>        prompt = prompt.replace(&quot;Output &#39;A&#39;, &#39;B&#39;, &#39;C&#39;, or &#39;D&#39;. Full answer not needed.&quot;, &quot;&quot;)<br>        <br>        # Format the prompt for TinyLlama<br>        formatted_prompt = f&quot;### Instruction: Answer with just the letter (A, B, C, or D)\n\n### Question: {prompt}\n\n### Answer:&quot;<br>        <br>        model_inputs = self.tokenizer([formatted_prompt], return_tensors=&quot;pt&quot;).to(self.device)<br>        <br>        generated_ids = self.model.generate(<br>            **model_inputs,<br>            max_new_tokens=3,<br>            do_sample=False,<br>            temperature=0.1,<br>            pad_token_id=self.tokenizer.eos_token_id,<br>            repetition_penalty=1.1<br>        )<br>        <br>        # Extract only the new tokens<br>        generated_tokens = generated_ids[0][model_inputs[&#39;input_ids&#39;].shape[1]:]<br>        clean_output = self.tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()<br>        <br>        # Extract just the letter (A, B, C, or D)<br>        for char in clean_output:<br>            if char in [&#39;A&#39;, &#39;B&#39;, &#39;C&#39;, &#39;D&#39;]:<br>                return char<br>        <br>        return clean_output[:1]  # Fallback: return first character<br><br>    async def a_generate(self, prompt: str) -&gt; str:<br>        return self.generate(prompt)<br><br>    def batch_generate(self, prompts: List[str]) -&gt; List[str]:<br>        &quot;&quot;&quot;Batch generate method required by MMLU 
benchmark&quot;&quot;&quot;<br>        results = []<br>        for prompt in prompts:<br>            try:<br>                result = self.generate(prompt)<br>                results.append(result)<br>            except Exception as e:<br>                print(f&quot;Error generating for prompt: {e}&quot;)<br>                results.append(&quot;&quot;)  # Fallback empty response<br>        return results<br><br>    async def a_batch_generate(self, prompts: List[str]) -&gt; List[str]:<br>        return self.batch_generate(prompts)<br><br>    def get_model_name(self):<br>        return &quot;TinyLlama-1.1B-Chat&quot;<br>    <br>def load_model(model_name: str):<br>    # Use light quantization for TinyLlama<br>    quant_config = BitsAndBytesConfig(<br>        load_in_4bit=True,<br>        bnb_4bit_quant_type=&quot;nf4&quot;,<br>        bnb_4bit_compute_dtype=torch.float16,<br>    )<br>    <br>    try:<br>        model = AutoModelForCausalLM.from_pretrained(<br>            model_name,<br>            quantization_config=quant_config,<br>            device_map=&quot;auto&quot;,<br>            dtype=torch.float16,<br>            trust_remote_code=True<br>        )<br>        return model<br>    except Exception as e:<br>        print(f&quot;Error loading quantized model, trying without quantization: {e}&quot;)<br>        # Fallback without quantization<br>        return AutoModelForCausalLM.from_pretrained(<br>            model_name,<br>            device_map=&quot;auto&quot;,<br>            dtype=torch.float16,<br>            trust_remote_code=True<br>        )<br><br>def load_tokenizer(model_name):<br>    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)<br>    tokenizer.pad_token = tokenizer.eos_token<br>    tokenizer.padding_side = &quot;left&quot;<br>    return tokenizer<br><br># Load TinyLlama 1.1B model from Hugging Face<br>tinyllama_model_name = &quot;TinyLlama/TinyLlama-1.1B-Chat-v1.0&quot;<br><br>print(&quot;Loading 
tokenizer...&quot;)<br>tokenizer = load_tokenizer(tinyllama_model_name)<br><br>print(&quot;Loading model...&quot;)<br>model = load_model(tinyllama_model_name)<br><br>print(&quot;Creating custom model...&quot;)<br>custom_model = TinyLlamaModel(model, tokenizer)<br><br># Test model generation<br>print(&quot;\nTesting model generation:&quot;)<br>prompt = &quot;&quot;&quot;<br>The following are multiple choice questions (with answers) about abstract algebra.<br><br>Find all c in Z_3 such that Z_3[x]/(x^2 + c) is a field.<br>A. 0<br>B. 1<br>C. 2<br>D. 3<br>Answer:&quot;&quot;&quot;<br><br>test_output = custom_model.generate(prompt)<br>print(f&quot;Generated output: &#39;{test_output}&#39;&quot;)<br><br># Test batch generation<br>print(&quot;\nTesting batch generation...&quot;)<br>test_prompts = [<br>    prompt,<br>    &quot;What is 2+2? A. 1 B. 2 C. 3 D. 4 Answer:&quot;,<br>    &quot;Capital of France? A. London B. Berlin C. Paris D. Rome Answer:&quot;<br>]<br><br>batch_outputs = custom_model.batch_generate(test_prompts)<br>for i, output in enumerate(batch_outputs):<br>    print(f&quot;Batch output {i+1}: &#39;{output}&#39;&quot;)<br><br># Run MMLU benchmark with very light settings<br>print(&quot;\nRunning MMLU benchmark...&quot;)<br>benchmark = MMLU(<br>    tasks=[MMLUTask.HIGH_SCHOOL_COMPUTER_SCIENCE],  # Only one task<br>    n_shots=2  # Reduced shots for smaller model<br>)<br><br>try:<br>    benchmark.evaluate(model=custom_model, batch_size=1)  # Batch size 1<br>    <br>    print(&quot;\nBenchmark Results:&quot;)<br>    print(f&quot;Task Scores: {benchmark.task_scores}&quot;)<br>    print(f&quot;Overall Score: {benchmark.overall_score}&quot;)<br>    <br>    # Print detailed predictions<br>    print(&quot;\nSample Predictions:&quot;)<br>    for i, (input_text, prediction) in enumerate(list(benchmark.predictions.items())[:3]):<br>        print(f&quot;Prediction {i+1}:&quot;)<br>        print(f&quot;Input: {input_text[:100]}...&quot;)<br>        
print(f&quot;Prediction: {prediction}&quot;)<br>        print(&quot;---&quot;)<br>        <br>except Exception as e:<br>    print(f&quot;Error during benchmark evaluation: {e}&quot;)<br>    print(&quot;Trying with even smaller settings...&quot;)<br>    <br>    # Fallback: try with minimal settings<br>    benchmark = MMLU(<br>        tasks=[MMLUTask.HIGH_SCHOOL_COMPUTER_SCIENCE],<br>        n_shots=1<br>    )<br>    benchmark.evaluate(model=custom_model, batch_size=1)<br>    print(f&quot;Fallback overall score: {benchmark.overall_score}&quot;)</pre><h4>Executing the test</h4><p>Execute the custom LLM benchmark test by running the following command.</p><pre>deepeval test run test_custom_model.py</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*p89smFT-nNza_AXwlz0-7g.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*3U6UPA-I23yN04ORh4XsRw.png" /></figure><p><em>Thanks for reading this article!</em></p><p><em>Thanks to Gowri M Bhatt for reviewing the content.</em></p><p>If you enjoyed this article, please click on the clap button 👏 and share to help others find it!</p><p>The full source code for this tutorial can be found here:</p><p><a href="https://github.com/codemaker2015/deepeval-experiments">GitHub - codemaker2015/deepeval-experiments: This repository contains hands-on experiments with DeepEval, an open-source evaluation framework for testing Large Language Models (LLMs)</a></p><h4>References</h4><ul><li><a href="https://deepeval.com/docs/getting-started">Quick Introduction | DeepEval - The Open-Source LLM Evaluation Framework</a></li><li><a href="https://www.confident-ai.com/docs/llm-evaluation/quickstart">Quickstart | Confident AI Docs</a></li></ul>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[The Ultimate A2A Handbook: Rulebook for Agent Conversations]]></title>
            <link>https://codemaker2016.medium.com/the-ultimate-a2a-handbook-rulebook-for-agent-conversations-74e7e601b05c?source=rss-b7a60e472dfc------2</link>
            <guid isPermaLink="false">https://medium.com/p/74e7e601b05c</guid>
            <category><![CDATA[a2a-protocol]]></category>
            <category><![CDATA[generative-ai-tools]]></category>
            <category><![CDATA[python]]></category>
            <category><![CDATA[ai-agent]]></category>
            <category><![CDATA[generative-ai-use-cases]]></category>
            <dc:creator><![CDATA[Vishnu Sivan]]></dc:creator>
            <pubDate>Sun, 27 Jul 2025 15:42:28 GMT</pubDate>
            <atom:updated>2025-07-28T16:36:56.161Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*9I4I7zCSXcKT0KswA1x_Ew.png" /></figure><p>As artificial intelligence continues to evolve, the need for AI systems to communicate and collaborate has become increasingly important. From document summarization to image generation and intelligent decision-making, today’s AI agents no longer work in isolation; they must interact to handle complex tasks effectively. This is where the Agent-to-Agent (A2A) Protocol comes into play.</p><p>A2A is a communication framework designed to enable seamless interaction between autonomous agents. In an era dominated by distributed systems and multi-agent architectures, A2A offers a structured way for intelligent agents to share information, delegate tasks, and make coordinated decisions without human intervention.</p><p>This article explores the A2A protocol from foundational concepts to advanced implementations, and walks through building a travel planner app using A2A.</p><h3>Getting Started</h3><h4>Table of contents</h4><ul><li><a href="#5289">What is A2A</a></li><li><a href="#9b23">Core concepts of A2A</a></li><li><a href="#cdfd">A2A vs MCP</a></li><li><a href="#ca01">Experimenting with A2A</a></li><li><a href="#ba69">1. Implementing A2A from scratch using FastAPI</a></li><li><a href="#2e29">2. Implementing A2A using Google A2A SDK</a></li><li><a href="#5931">A2A Client-Server Interaction Flow</a></li><li><a href="#3c9d">3. 
Implementing A2A using Python-A2A library</a></li><li><a href="#efb9">Example 1: Echo Agent</a></li><li><a href="#0362">Example 2: Basic A2A Agent</a></li><li><a href="#d88c">Example 3: LLM-Based Agent</a></li><li><a href="#a156">Example 4: Converting LangChain to A2A Servers</a></li><li><a href="#4f10">Example 5: Converting MCP Tools to LangChain Tools</a></li><li><a href="#e5c7">Building a travel planner app using A2A</a></li></ul><h3>What is A2A</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*BijAWXDrxIc2Ek1nLdi6pg.gif" /><figcaption>Image Source: <a href="https://blog.dailydoseofds.com/p/a-visual-guide-to-agent2agent-a2a">A Visual Guide to Agent2Agent (A2A) Protocol</a></figcaption></figure><p>Imagine assembling a team of exceptional AI assistants — one masters data analysis, another crafts insightful reports, and a third flawlessly manages your schedule. Individually, they’re outstanding. But there’s a hitch: each speaks a different language. One uses Python, another JSON, and the third relies on obscure API calls. Getting them to collaborate would be like reviving the digital Tower of Babel. This is the challenge that the Agent-to-Agent (A2A) Protocol is designed to solve.</p><p>The Agent-to-Agent (A2A) Protocol, introduced by Google Cloud, is an open standard that enables seamless communication and collaboration between AI agents — regardless of the frameworks or vendors they originate from. Like a universal translator, A2A solves the interoperability challenge by providing a common language for agents to share information, delegate tasks, and coordinate actions effectively.</p><p>A2A complements Anthropic’s Model Context Protocol (MCP) by focusing on inter-agent communication, leveraging Google’s expertise in deploying large-scale agent systems. Together, these protocols lay the foundation for the future of multi-agent collaboration in enterprise environments. 
The protocol is supported by over 50 major technology and consulting partners, reflecting a shared vision for scalable, interoperable agent ecosystems.</p><h4>Core concepts of A2A</h4><p>The Agent-to-Agent (A2A) Protocol is designed around a set of foundational concepts that enable intelligent agents to collaborate efficiently and reliably. These core building blocks define how agents communicate, manage tasks, and exchange data.</p><ul><li><strong>AgentCard</strong><br>A standardized JSON document that describes an agent’s identity, capabilities, and supported protocols. Typically hosted at the /.well-known/agent.json endpoint, the AgentCard allows other agents or clients to easily discover and understand how to interact with the agent.</li><li><strong>Task</strong><br>A Task represents a stateful collaboration between a client and an agent, tracking progress toward a specific goal. It includes task status, execution history, and references to outputs (artifacts). Tasks enable agents to maintain context across multi-step processes.</li><li><strong>Artifact</strong><br>An Artifact is the final, immutable result produced by an agent during a task. It may include one or more Parts, which are discrete pieces of structured or unstructured content (e.g., text, files, or forms). Artifacts are useful for recording outcomes or sharing final deliverables.</li><li><strong>Message</strong><br>Used to exchange non-artifact content between agents or with clients. Messages may contain instructions, intermediate thoughts, context updates, or task status information. They support dynamic interaction throughout the lifecycle of a task.</li><li><strong>Part<br></strong>A Part is the smallest unit of content within a Message or Artifact. 
Each Part has a specific content type (e.g., text/plain, application/json, or a file reference), allowing agents to structure complex content with clarity and modularity.</li><li><strong>Transport mechanism</strong><br>The transport mechanism in the A2A protocol determines how agents exchange messages, using technologies like HTTP/HTTPS for simplicity, gRPC for low-latency communication, and MQTT or message buses for asynchronous, event-driven interactions — based on system requirements.</li><li><strong>Discovery</strong><br>Agents need a way to find and connect with each other. A2A supports discovery through mechanisms similar to DNS (e.g., static URLs) or through centralized agent registries — especially useful in enterprise or multi-agent platform settings.</li><li><strong>Security and authentication</strong><br>The A2A protocol ensures secure agent communication through authentication (using API keys, OAuth, or identity assertions), authorization for access control, and encryption (like TLS) to protect sensitive data during transmission.</li></ul><h3>A2A vs MCP</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*88Kyx0A6O3mSLsqlSka-5g.gif" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*qXnRc2ofrbUKIyArOG_G-Q.gif" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*z3npMy-PE1ke9OSO6HKlwA.gif" /><figcaption>Source: A Visual Guide to Agent2Agent (A2A) Protocol</figcaption></figure><p>The Agent-to-Agent (A2A) protocol is designed to facilitate collaboration among multiple AI agents. It enables agents to interact securely through tasks, message exchanges, and artifact sharing. These interactions are stateful, allowing complex workflows to unfold over time. A2A supports discovery via JSON-based AgentCards, letting agents or clients locate and communicate with other agents based on capabilities. 
This protocol encourages modular, decentralized design where agents from different frameworks (like CrewAI or LlamaIndex) can work together as part of a team.</p><p>In contrast, the Model Context Protocol (MCP) is focused on enabling AI agents to access tools, plugins, or APIs. It allows a single model to invoke external capabilities like calculators, data retrievers, or code execution environments. MCP is less about inter-agent collaboration and more about augmenting an individual agent’s power through tool access. Importantly, A2A and MCP aren’t competitors — they’re complementary. A2A agents can be exposed as MCP tools, allowing models using MCP to discover and communicate with other agents via A2A. This layered integration bridges tool invocation with multi-agent orchestration, enabling richer, more intelligent systems.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/832/1*V8neSJFLphvbQYgShH25sw.png" /></figure><h3>Experimenting with A2A</h3><p>In this section, we will explore how to build A2A (Agent-to-Agent) applications. There are three main approaches we can follow:</p><ol><li>Implementing A2A from scratch using FastAPI,</li><li>Using Google’s A2A SDK (google-a2a),</li><li>Leveraging the Python-A2A library, a comprehensive implementation of Google&#39;s A2A protocol.</li></ol><p>We will begin by building an A2A implementation from scratch using FastAPI to understand the core concepts. Then, we will explore how to create A2A agents using the official Google A2A SDK. Finally, we will use the Python A2A library, which simplifies the development process and provides a robust interface for enabling seamless communication and collaboration between AI agents.</p><h3>1. 
Implementing A2A from scratch using FastAPI</h3><p>Let’s begin with a basic echo agent, which serves as the “Hello World” of A2A and helps you learn the core concepts by returning whatever input it receives.</p><h4>Installing uv</h4><p>We will use uv, a fast and modern Python project manager, to set up and manage our environment. It simplifies tasks like handling dependencies, creating virtual environments, and running scripts.</p><p>To install uv, run this in your terminal:</p><pre># For Windows<br>powershell -ExecutionPolicy ByPass -c &quot;irm https://astral.sh/uv/install.ps1 | iex&quot;<br>set Path=%USERPROFILE%\.local\bin;%Path%<br><br># For Linux / macOS<br>curl -LsSf https://astral.sh/uv/install.sh | sh</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/875/0*sLdIpIJrPCQjk5zB.png" /></figure><p>Refer to the official website for detailed installation instructions.</p><p><a href="https://docs.astral.sh/uv/getting-started/installation">Installation | uv</a></p><h4>Installing the dependencies</h4><ul><li>Initialize a uv project by executing the following command.</li></ul><pre>uv init basic_a2a_demo<br>cd basic_a2a_demo</pre><ul><li>Create and activate a virtual environment by executing the following command.</li></ul><pre>uv venv<br>source .venv/bin/activate # for linux<br>.venv\Scripts\activate    # for windows</pre><ul><li>Install fastapi, uvicorn, requests and sseclient-py using uv. The uuid module is part of the Python standard library, so it does not need to be installed.</li></ul><pre>uv add fastapi uvicorn requests sseclient-py</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*CEjAU72FXc4edCsa3icn4g.png" /></figure><h4>Creating an agent card</h4><p>The agent.json file acts as your agent’s identity card, describing its purpose and how others can interact with it.</p><p>Create a file named agent.json and add the following code to it.</p><pre>{<br>  &quot;schema_version&quot;: &quot;1.0.0&quot;,<br>  &quot;name&quot;: &quot;Echo Agent&quot;,<br>  &quot;description&quot;: &quot;I repeat what you say, like a 
friendly cave.&quot;,<br>  &quot;contact_email&quot;: &quot;you@example.com&quot;,<br>  &quot;capabilities&quot;: [<br>    &quot;a2a.text-chat&quot;<br>  ],<br>  &quot;versions&quot;: [<br>    {<br>      &quot;version&quot;: &quot;1.0.0&quot;,<br>      &quot;endpoint&quot;: &quot;http://localhost:8000/a2a&quot;,<br>      &quot;supports_streaming&quot;: false,<br>      &quot;auth&quot;: {<br>        &quot;type&quot;: &quot;none&quot;<br>      }<br>    }<br>  ]<br>}</pre><h4>Building A2A server</h4><p>Build a basic server that listens for messages and sends them back exactly as received, like a digital echo. This minimal version only handles regular (non-streaming) requests.</p><p>Create a file named echo_server.py and add the following code to it.</p><pre>from fastapi import FastAPI, Request<br>from fastapi.responses import JSONResponse<br>import uuid<br>import json<br>from datetime import datetime<br><br>app = FastAPI()<br># Serve your business card at the standard location<br>@app.get(&quot;/.well-known/agent.json&quot;)<br>async def get_agent_card():<br>    with open(&quot;agent.json&quot;) as f:<br>        return json.load(f)<br><br># Handle regular (non-streaming) requests<br>@app.post(&quot;/a2a/tasks/send&quot;)<br>async def tasks_send(request: Request):<br>    data = await request.json()<br>    task_id = data.get(&quot;task_id&quot;, str(uuid.uuid4()))<br>    user_message = next((m for m in data.get(&quot;messages&quot;, []) <br>                        if m.get(&quot;role&quot;) == &quot;user&quot;), None)<br>    <br>    if not user_message:<br>        return JSONResponse(status_code=400, content={&quot;error&quot;: &quot;No user message found&quot;})<br><br>    parts = user_message.get(&quot;parts&quot;, [])<br>    # Default to an empty string so a text part without a &quot;text&quot; key cannot break the join<br>    text_parts = [p.get(&quot;text&quot;, &quot;&quot;) for p in parts if p.get(&quot;type&quot;) == &quot;text&quot;]<br>    echo_text = f&quot;Echo: {&#39; &#39;.join(text_parts)}&quot;<br>    <br>    return {<br>        &quot;task_id&quot;: task_id,<br>        
&quot;status&quot;: &quot;completed&quot;,<br>        &quot;created_time&quot;: datetime.utcnow().isoformat(),<br>        &quot;updated_time&quot;: datetime.utcnow().isoformat(),<br>        &quot;messages&quot;: [<br>            {<br>                &quot;role&quot;: &quot;agent&quot;,<br>                &quot;parts&quot;: [{&quot;type&quot;: &quot;text&quot;, &quot;text&quot;: echo_text}]<br>            }<br>        ]<br>    }<br><br>if __name__ == &quot;__main__&quot;:<br>    import uvicorn<br>    uvicorn.run(app, host=&quot;0.0.0.0&quot;, port=8000)</pre><h4>Creating A2A client</h4><p>Create a client to communicate with the server, much like crafting a remote control for your freshly built device.</p><p>Create a file named echo_client.py and add the following code to it.</p><pre>import requests<br>import uuid<br><br>class SimpleA2AClient:<br>    def __init__(self, server_url):<br>        self.server_url = server_url<br>        <br>    def discover_agent(self):<br>        response = requests.get(f&quot;{self.server_url}/.well-known/agent.json&quot;)<br>        response.raise_for_status()<br>        return response.json()<br>        <br>    def send_message(self, text):<br>        task_id = str(uuid.uuid4())<br>        <br>        # Prepare our request in A2A format<br>        payload = {<br>            &quot;task_id&quot;: task_id,<br>            &quot;messages&quot;: [<br>                {<br>                    &quot;role&quot;: &quot;user&quot;,<br>                    &quot;parts&quot;: [<br>                        {<br>                            &quot;type&quot;: &quot;text&quot;,<br>                            &quot;text&quot;: text<br>                        }<br>                    ]<br>                }<br>            ]<br>        }<br>        <br>        # Send the request to the agent<br>        endpoint = f&quot;{self.server_url}/a2a/tasks/send&quot;<br>        response = requests.post(endpoint, json=payload)<br>        response.raise_for_status()<br>        return response.json()<br><br>if __name__ == &quot;__main__&quot;:<br>    
# Create our client<br>    client = SimpleA2AClient(&quot;http://localhost:8000&quot;)<br>    <br>    # Check out the agent&#39;s capabilities<br>    agent_card = client.discover_agent()<br>    print(f&quot;Found agent: {agent_card[&#39;name&#39;]}&quot;)<br>    <br>    # Send a message<br>    message = input(&quot;Type a message to send: &quot;)<br>    response = client.send_message(message)<br>    <br>    # Extract and display the agent&#39;s response<br>    agent_message = response.get(&quot;messages&quot;, [{}])[0]<br>    agent_parts = agent_message.get(&quot;parts&quot;, [{}])<br>    response_text = agent_parts[0].get(&quot;text&quot;, &quot;No response&quot;) if agent_parts else &quot;No parts&quot;<br>    <br>    print(f&quot;\nAgent response: {response_text}&quot;)</pre><h4>Executing the app</h4><ul><li>Open two terminal windows.</li><li>In the first terminal, start the Echo Server:</li></ul><pre>python echo_server.py</pre><ul><li>In the second terminal, run the client:</li></ul><pre>python echo_client.py</pre><p>The output will look like this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*EL3eNsJwl2015v754id_5w.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*xbcnNOOyjpydQgpMJJvatQ.png" /></figure><h3>2. Implementing A2A using Google A2A SDK</h3><p>The A2A Protocol provides a standardized framework that allows agents to work together intelligently and efficiently. The Echo example implemented in the last section highlights the core concepts of A2A. Real-world applications, however, can extend this by connecting agents to language models, databases, and APIs, enabling streaming for real-time updates, adding authentication for secure communication, and coordinating multiple agents to solve complex tasks. 
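</p><p>To make the streaming extension more concrete, here is a minimal, framework-free sketch that turns the echo reply into a sequence of Server-Sent Events frames. The helper names (sse_event, stream_echo), the word-by-word chunking, and the &quot;working&quot; status are illustrative assumptions; the field names simply mirror the non-streaming echo response and are not the official A2A streaming schema.</p><pre>
```python
import json
import uuid


def sse_event(payload):
    """Serialize one payload as a Server-Sent Events frame."""
    return "data: " + json.dumps(payload) + "\n\n"


def stream_echo(text, task_id=None):
    """Yield SSE frames that echo `text` one word at a time.

    Field names mirror the non-streaming echo response; they are an
    illustrative assumption, not the official A2A streaming schema.
    """
    task_id = task_id or str(uuid.uuid4())
    words = text.split()
    for word in words:
        # Intermediate frames carry a partial result while the task is running.
        yield sse_event({
            "task_id": task_id,
            "status": "working",
            "messages": [{"role": "agent",
                          "parts": [{"type": "text", "text": word}]}],
        })
    # The final frame carries the complete echo and marks the task as done.
    yield sse_event({
        "task_id": task_id,
        "status": "completed",
        "messages": [{"role": "agent",
                      "parts": [{"type": "text",
                                 "text": "Echo: " + " ".join(words)}]}],
    })


if __name__ == "__main__":
    for frame in stream_echo("hello streaming world", task_id="demo-task"):
        print(frame, end="")
```
</pre><p>In a FastAPI server, a generator like this could be returned through a streaming response with the text/event-stream media type; that wiring, along with authentication and error handling, is left out of the sketch.</p><p>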
Since building these features from scratch is challenging, developers often rely on libraries such as Google’s Agent Development Kit (ADK), the Google A2A SDK, and Python A2A to simplify the development process.</p><h4>Installing the dependencies</h4><ul><li>Initialize a uv project by executing the following command.</li></ul><pre>uv init google_a2a_demo<br>cd google_a2a_demo</pre><ul><li>Create and activate a virtual environment by executing the following command.</li></ul><pre>uv venv<br>source .venv/bin/activate # for linux<br>.venv\Scripts\activate    # for windows</pre><ul><li>Install a2a-sdk and uvicorn using uv.</li></ul><pre>uv add a2a-sdk uvicorn</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*sLbg6ohJKq9cTkFl0z-ibg.png" /></figure><h4>Creating Agent executor</h4><p>To handle tasks, we need to create an Agent Executor. In a real-world application, this would involve connecting to an LLM or executing other complex logic. For our “Hello World” example, we’ll implement a minimal handler: whenever the agent receives a hello_world task, it simply responds with “Hello World”.</p><p>The A2A SDK provides an AgentExecutor class, where you define the logic for each skill. 
For this example, it’s as simple as implementing a function that returns the string &quot;Hello World&quot; when called.</p><p>Create a file named agent_executor.py and add the following code to it.</p><pre>from a2a.server.agent_execution import AgentExecutor, RequestContext<br>from a2a.server.events import EventQueue<br>from a2a.utils import new_agent_text_message<br><br>class HelloWorldAgent:<br>    async def invoke(self) -&gt; str:<br>        return &#39;Hello World&#39;<br><br>class HelloWorldAgentExecutor(AgentExecutor):<br>    def __init__(self):<br>        self.agent = HelloWorldAgent()<br><br>    async def execute(<br>        self,<br>        context: RequestContext,<br>        event_queue: EventQueue,<br>    ) -&gt; None:<br>        result = await self.agent.invoke()<br>        await event_queue.enqueue_event(new_agent_text_message(result))<br><br>    async def cancel(<br>        self, context: RequestContext, event_queue: EventQueue<br>    ) -&gt; None:<br>        raise Exception(&#39;cancel not supported&#39;)</pre><h4>Setting up the A2A server</h4><p>Let’s create a simple “Hello World” A2A (Agent-to-Agent) server using the A2A SDK. 
It defines basic skills, configures the agent’s public and extended profiles, and launches the server to handle requests via HTTP.</p><p>Add the following content to main.py file.</p><pre>import uvicorn<br><br>from a2a.server.apps import A2AStarletteApplication<br>from a2a.server.request_handlers import DefaultRequestHandler<br>from a2a.server.tasks import InMemoryTaskStore<br>from a2a.types import AgentCapabilities, AgentCard, AgentSkill<br>from agent_executor import HelloWorldAgentExecutor<br><br>if __name__ == &#39;__main__&#39;:<br>    skill = AgentSkill(<br>        id=&#39;hello_world&#39;,<br>        name=&#39;Returns hello world&#39;,<br>        description=&#39;just returns hello world&#39;,<br>        tags=[&#39;hello world&#39;],<br>        examples=[&#39;hi&#39;, &#39;hello world&#39;],<br>    )<br><br>    extended_skill = AgentSkill(<br>        id=&#39;super_hello_world&#39;,<br>        name=&#39;Returns a SUPER Hello World&#39;,<br>        description=&#39;A more enthusiastic greeting, only for authenticated users.&#39;,<br>        tags=[&#39;hello world&#39;, &#39;super&#39;, &#39;extended&#39;],<br>        examples=[&#39;super hi&#39;, &#39;give me a super hello&#39;],<br>    )<br><br>    # This will be the public-facing agent card<br>    public_agent_card = AgentCard(<br>        name=&#39;Hello World Agent&#39;,<br>        description=&#39;Just a hello world agent&#39;,<br>        url=&#39;http://localhost:9999/&#39;,<br>        version=&#39;1.0.0&#39;,<br>        default_input_modes=[&#39;text&#39;],<br>        default_output_modes=[&#39;text&#39;],<br>        capabilities=AgentCapabilities(streaming=True),<br>        skills=[skill],<br>        supports_authenticated_extended_card=True,<br>    )<br><br>    # This will be the authenticated extended agent card<br>    specific_extended_agent_card = public_agent_card.model_copy(<br>        update={<br>            &#39;name&#39;: &#39;Hello World Agent - Extended Edition&#39;,  # Different name for 
clarity<br>            &#39;description&#39;: &#39;The full-featured hello world agent for authenticated users.&#39;,<br>            &#39;version&#39;: &#39;1.0.1&#39;,  # Could even be a different version<br>            &#39;skills&#39;: [<br>                skill,<br>                extended_skill,<br>            ],<br>        }<br>    )<br><br>    request_handler = DefaultRequestHandler(<br>        agent_executor=HelloWorldAgentExecutor(),<br>        task_store=InMemoryTaskStore(),<br>    )<br><br>    server = A2AStarletteApplication(<br>        agent_card=public_agent_card,<br>        http_handler=request_handler,<br>        extended_agent_card=specific_extended_agent_card,<br>    )<br><br>    uvicorn.run(server.build(), host=&#39;0.0.0.0&#39;, port=9999)</pre><p>In this code,</p><ul><li>we define a skill (AgentSkill) that returns a simple &quot;Hello, world!&quot; message, along with an extended version for authenticated users.</li><li>An agent card (AgentCard) describes the agent’s capabilities, including its skills and supported input/output modes. 
A separate extended agent card provides enhanced functionality for authenticated clients.</li><li>The request handler connects the logic (HelloWorldAgentExecutor) to an in-memory task store.</li><li>Finally, we use A2AStarletteApplication (built on Starlette) to expose the agent as an HTTP service and run it using Uvicorn on port 9999.</li></ul><h4>Setting up the client</h4><p>Let’s set up an HTTP client that fetches the agent’s card (including an optional extended version for authenticated access), initializes the A2A client, and sends a message query (&quot;How much is 10 USD in INR?&quot;) both as a regular message and as a streaming message.</p><p>Create a file named test_client.py and add the following code to it.</p><pre>import logging<br>from typing import Any<br>from uuid import uuid4<br>import httpx<br><br>from a2a.client import A2ACardResolver, A2AClient<br>from a2a.types import (<br>    AgentCard,<br>    MessageSendParams,<br>    SendMessageRequest,<br>    SendStreamingMessageRequest,<br>)<br><br><br>async def main() -&gt; None:<br>    PUBLIC_AGENT_CARD_PATH = &#39;/.well-known/agent.json&#39;<br>    EXTENDED_AGENT_CARD_PATH = &#39;/agent/authenticatedExtendedCard&#39;<br><br>    logging.basicConfig(level=logging.INFO)<br>    logger = logging.getLogger(__name__)  <br><br>    base_url = &#39;http://localhost:9999&#39;<br><br>    async with httpx.AsyncClient() as httpx_client:<br>        # Initialize A2ACardResolver<br>        resolver = A2ACardResolver(<br>            httpx_client=httpx_client,<br>            base_url=base_url,<br>        )<br>        # Fetch Public Agent Card and Initialize Client<br>        final_agent_card_to_use: AgentCard | None = None<br><br>        try:<br>            logger.info(f&#39;Attempting to fetch public agent card from: {base_url}{PUBLIC_AGENT_CARD_PATH}&#39;)<br>            _public_card = await resolver.get_agent_card()<br>            logger.info(&#39;Successfully fetched public agent card:&#39;)<br>            
logger.info(_public_card.model_dump_json(indent=2, exclude_none=True))<br>            final_agent_card_to_use = _public_card<br>            logger.info(&#39;\nUsing PUBLIC agent card for client initialization (default).&#39;)<br><br>            if _public_card.supports_authenticated_extended_card:<br>                try:<br>                    logger.info(f&#39;\nPublic card supports authenticated extended card. Attempting to fetch from: {base_url}{EXTENDED_AGENT_CARD_PATH}&#39;)<br>                    auth_headers_dict = {<br>                        &#39;Authorization&#39;: &#39;Bearer dummy-token-for-extended-card&#39;<br>                    }<br>                    _extended_card = await resolver.get_agent_card(<br>                        relative_card_path=EXTENDED_AGENT_CARD_PATH,<br>                        http_kwargs={&#39;headers&#39;: auth_headers_dict},<br>                    )<br>                    logger.info(&#39;Successfully fetched authenticated extended agent card:&#39;)<br>                    logger.info(<br>                        _extended_card.model_dump_json(<br>                            indent=2, exclude_none=True<br>                        )<br>                    )<br>                    final_agent_card_to_use = _extended_card<br>                    logger.info(&#39;\nUsing AUTHENTICATED EXTENDED agent card for client initialization.&#39;)<br>                except Exception as e_extended:<br>                    logger.warning(<br>                        f&#39;Failed to fetch extended agent card: {e_extended}. Will proceed with public card.&#39;,<br>                        exc_info=True,<br>                    )<br>            elif (<br>                _public_card<br>            ):  # supportsAuthenticatedExtendedCard is False or None<br>                logger.info(&#39;\nPublic card does not indicate support for an extended card. 
Using public card.&#39;)<br><br>        except Exception as e:<br>            logger.error(f&#39;Critical error fetching public agent card: {e}&#39;, exc_info=True)<br>            raise RuntimeError(<br>                &#39;Failed to fetch the public agent card. Cannot continue.&#39;<br>            ) from e<br><br>        client = A2AClient(<br>            httpx_client=httpx_client, agent_card=final_agent_card_to_use<br>        )<br>        logger.info(&#39;A2AClient initialized.&#39;)<br><br>        send_message_payload: dict[str, Any] = {<br>            &#39;message&#39;: {<br>                &#39;role&#39;: &#39;user&#39;,<br>                &#39;parts&#39;: [<br>                    {&#39;kind&#39;: &#39;text&#39;, &#39;text&#39;: &#39;How much is 10 USD in INR?&#39;}<br>                ],<br>                &#39;messageId&#39;: uuid4().hex,<br>            },<br>        }<br>        request = SendMessageRequest(<br>            id=str(uuid4()), params=MessageSendParams(**send_message_payload)<br>        )<br>        response = await client.send_message(request)<br>        print(response.model_dump(mode=&#39;json&#39;, exclude_none=True))<br><br>        streaming_request = SendStreamingMessageRequest(<br>            id=str(uuid4()), params=MessageSendParams(**send_message_payload)<br>        )<br>        stream_response = client.send_message_streaming(streaming_request)<br><br>        async for chunk in stream_response:<br>            print(chunk.model_dump(mode=&#39;json&#39;, exclude_none=True))<br><br>if __name__ == &#39;__main__&#39;:<br>    import asyncio<br><br>    asyncio.run(main())</pre><p>In this code,</p><ul><li>An asynchronous HTTP client (httpx.AsyncClient) is used to fetch a public agent card from a locally running A2A server. 
It then attempts to retrieve an extended, authenticated version of the card using a dummy bearer token.</li><li>With the obtained card (either public or extended), the script initializes an A2AClient, which is then used to send a currency conversion query (&quot;How much is 10 USD in INR?&quot;).</li><li>The response is printed using both the standard send_message method and the streaming send_message_streaming method, showcasing how to handle real-time agent replies.</li></ul><h4>A2A Client-Server Interaction Flow</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/722/1*DTj7g2fufTsnHaQ_8k8cCg.png" /><figcaption>Image source: <a href="https://a2aprotocol.ai/blog/a2a-samples-hello-world">A2A Samples: Hello World Agent | A2A Protocol</a></figcaption></figure><h4>Executing the app</h4><ul><li>Open two terminal windows.</li><li>In the first terminal, start the A2A Server:</li></ul><pre>uv run main.py</pre><ul><li>In the second terminal, run the client:</li></ul><pre>python test_client.py</pre><p>The output will look like this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Fs5_qhZ1vN13KrXXuPLwyw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*XOhf7f6A8BRHwR2pRUuXzQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*QRjhvTV02HXUfdy5nXcsJQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*7-74U463-hTR4rHmsDh8Mg.png" /></figure><h3>3. Implementing A2A using Python-A2A library</h3><p>The Python-A2A library is a powerful and developer-friendly extension built on top of Google’s Agent-to-Agent (A2A) protocol. While Google’s native A2A SDK provides foundational tools to build interoperable agents and define standardized communication formats, Python-A2A simplifies the development process with a higher-level interface. It is ideal for developers looking to quickly prototype, deploy, and scale intelligent agents that can autonomously communicate and collaborate using the A2A standard. 
It bridges the gap between low-level protocol details and high-level use cases.</p><h4>Installing the dependencies</h4><ul><li>Initialize a uv project by executing the following command.</li></ul><pre>uv init python_a2a_demo<br>cd python_a2a_demo</pre><ul><li>Create and activate a virtual environment by executing the following command.</li></ul><pre>uv venv<br>source .venv/bin/activate # for linux<br>.venv\Scripts\activate    # for windows</pre><ul><li>Install python-a2a and python-dotenv using uv. The package name is quoted so that shells such as zsh do not interpret the square brackets.</li></ul><pre>uv add &quot;python-a2a[all]&quot; python-dotenv</pre><h4>Example 1: Echo Agent</h4><p>Let’s begin by building a simple A2A-compatible agent using the python-a2a library. The agent, named &quot;Echo Agent&quot;, is designed to echo the user&#39;s messages.</p><p><strong>Creating echo agent server<br></strong>Create a file named echo_agent.py and add the following code to create a basic A2A agent server.</p><pre>from python_a2a import A2AServer, Message, TextContent, MessageRole, run_server<br><br>class EchoAgent(A2AServer):<br>    def handle_message(self, message):<br>        if message.content.type == &quot;text&quot;:<br>            return Message(<br>                content=TextContent(text=f&quot;Echo: {message.content.text}&quot;),<br>                role=MessageRole.AGENT,<br>                parent_message_id=message.message_id,<br>                conversation_id=message.conversation_id<br>            )<br><br>if __name__ == &quot;__main__&quot;:<br>    agent = EchoAgent()<br>    run_server(agent, host=&quot;0.0.0.0&quot;, port=5000)</pre><p>In this code,</p><ul><li>The EchoAgent class extends A2AServer to handle incoming A2A protocol messages.</li><li>When a text message is received, it responds with an echo by prepending &quot;Echo:&quot; to the input.</li><li>The response maintains conversation context using message and conversation IDs.</li><li>The agent is launched using run_server on host 0.0.0.0 and port 5000, making it accessible for incoming 
agent-to-agent communication.</li></ul><p><strong>Creating echo client<br></strong>Create a file named echo_client.py and add the following code to create a client for the echo agent.</p><pre>from python_a2a import A2AClient, Message, TextContent, MessageRole<br><br>client = A2AClient(&quot;http://localhost:5000/a2a&quot;)<br>message = Message(<br>    content=TextContent(text=&quot;Hello, Good morning!&quot;),<br>    role=MessageRole.USER<br>)<br>response = client.send_message(message)<br>print(f&quot;Agent says: {response.content.text}&quot;)</pre><p>In this code,</p><ul><li>We create an A2AClient that connects to an agent running at <a href="http://localhost:5000/a2a">http://localhost:5000/a2a</a>.</li><li>We construct a user message with the content &quot;Hello, Good morning!&quot; using the A2A message format.</li><li>We send the message to the agent using client.send_message(...).</li><li>We receive and print the agent’s response, showing the echoed reply from the server.</li></ul><p><strong>Executing the code<br></strong>Open separate terminals in the project folder, activate the virtual environment in each, and run the agent server and client separately to execute the code.</p><pre># terminal 1<br>python echo_agent.py<br># terminal 2<br>python echo_client.py</pre><p>The output will look like the following:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*XUch7-NDkbTxv-zysbPoCw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*nfJKIv8vLfmELUnDhWpXsw.png" /></figure><h4>Example 2: Basic A2A Agent</h4><p>Next, let’s build another simple A2A-compatible agent using the python-a2a library. 
The agent, named &quot;Greeting Agent&quot;, is designed to detect greetings in a user&#39;s message and respond accordingly.</p><p><strong>Creating greeting agent server<br></strong>Create a file named greeting_agent.py and add the following code to create a basic A2A agent server.</p><pre>from python_a2a import A2AServer, skill, agent, run_server<br>from python_a2a import TaskStatus, TaskState<br><br>@agent(<br>    name=&quot;Greeting Agent&quot;,<br>    description=&quot;A simple agent that responds to greetings&quot;,<br>    version=&quot;1.0.0&quot;<br>)<br>class GreetingAgent(A2AServer):<br><br>    @skill(<br>        name=&quot;Greet&quot;,<br>        description=&quot;Respond to a greeting&quot;,<br>        tags=[&quot;greeting&quot;, &quot;hello&quot;]<br>    )<br>    def greet(self, name=None):<br>        if name:<br>            return f&quot;Hello, {name}! How can I help you today?&quot;<br>        else:<br>            return &quot;Hello there! How can I help you today?&quot;<br><br>    def handle_task(self, task):<br>        message_data = task.message or {}<br>        content = message_data.get(&quot;content&quot;, {})<br>        text = content.get(&quot;text&quot;, &quot;&quot;) if isinstance(content, dict) else &quot;&quot;<br><br>        greeting_words = [&quot;hello&quot;, &quot;hi&quot;, &quot;hey&quot;, &quot;greetings&quot;]<br>        is_greeting = any(word in text.lower() for word in greeting_words)<br><br>        if is_greeting:<br>            name = None<br>            if &quot;my name is&quot; in text.lower():<br>                name = text.lower().split(&quot;my name is&quot;)[1].strip()<br><br>            greeting = self.greet(name)<br>            task.artifacts = [{<br>                &quot;parts&quot;: [{&quot;type&quot;: &quot;text&quot;, &quot;text&quot;: greeting}]<br>            }]<br>            task.status = TaskStatus(state=TaskState.COMPLETED)<br>        else:<br>            task.artifacts = [{<br>                
&quot;parts&quot;: [{&quot;type&quot;: &quot;text&quot;, &quot;text&quot;: &quot;I&#39;m a greeting agent. Try saying hello!&quot;}]<br>            }]<br>            task.status = TaskStatus(state=TaskState.COMPLETED)<br><br>        return task<br><br># Run the server<br>if __name__ == &quot;__main__&quot;:<br>    agent = GreetingAgent()<br>    run_server(agent, port=5000)</pre><p>In this code,</p><ul><li>The agent is defined using the @agent decorator, and a skill named Greet is added with the @skill decorator to handle greeting responses.</li><li>The handle_task method identifies whether the incoming message is a greeting and responds accordingly, using the skill defined.</li><li>Finally, the server runs the agent on port 5000 using run_server.</li></ul><p><strong>Creating greeting agent client<br></strong>Create a file named greeting_client.py and add the following code to create a client for the greeting agent.</p><pre>from python_a2a import A2AClient<br><br># Create a client<br>client = A2AClient(&quot;http://localhost:5000&quot;)<br><br># Print agent information<br>print(f&quot;Connected to: {client.agent_card.name}&quot;)<br>print(f&quot;Description: {client.agent_card.description}&quot;)<br>print(f&quot;Skills: {[skill.name for skill in client.agent_card.skills]}&quot;)<br><br># Send a greeting<br>response = client.ask(&quot;Hello there! 
My name is Vishnu.&quot;)<br>print(f&quot;Response: {response}&quot;)<br><br># Send another message<br>response = client.ask(&quot;What can you do?&quot;)<br>print(f&quot;Response: {response}&quot;)</pre><p><strong>Executing the code<br></strong>Open separate terminals in the project folder, activate the virtual environment in each, and run the agent server and client separately to execute the code.</p><pre># terminal 1<br>python greeting_agent.py<br># terminal 2<br>python greeting_client.py</pre><p>The output will look like the following:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*SA_32X5iuh3M8Zb0kF6law.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*NjXUxIn3jyex9nvYRT8GdQ.png" /></figure><h4>Example 3: LLM-Based Agent</h4><p>Let’s build an OpenAI-powered Agent-to-Agent (A2A) server using the python-a2a library. The agent uses an OpenAI GPT model (gpt-4 in this example) to respond to queries.</p><p><strong>Create .env File<br></strong>In your project root directory, create a file named .env with the following content:</p><pre>OPENAI_API_KEY=your_openai_api_key_here</pre><p><strong>Creating LLM Server</strong><br>Create a file named llm_agent.py and add the following code to it.</p><pre>from python_a2a import OpenAIA2AServer, run_server<br><br>import os<br>from dotenv import load_dotenv<br>load_dotenv()<br>api_key = os.environ.get(&quot;OPENAI_API_KEY&quot;)<br>if not api_key:<br>    raise ValueError(&quot;Please set the OPENAI_API_KEY environment variable.&quot;)<br><br># Create an OpenAI-based A2A agent<br>agent = OpenAIA2AServer(<br>    api_key=api_key,<br>    model=&quot;gpt-4&quot;,<br>    system_prompt=&quot;You are a helpful assistant that specializes in explaining complex concepts simply.&quot;<br>)<br><br># Run the server<br>if __name__ == &quot;__main__&quot;:<br>    print(&quot;Starting OpenAI-based A2A agent...&quot;)<br>    run_server(agent, host=&quot;0.0.0.0&quot;, port=5000)</pre><p><strong>Creating LLM 
Client</strong><br>Create a file named llm_client.py and add the following code to it.</p><pre>from python_a2a import A2AClient<br><br># Connect to the OpenAI-based A2A agent<br>client = A2AClient(&quot;http://localhost:5000&quot;)<br><br># Print agent metadata<br>print(f&quot;Connected to: {client.agent_card.name}&quot;)<br>print(f&quot;Description: {client.agent_card.description}&quot;)<br>print(&quot;Skills:&quot;)<br>for skill in client.agent_card.skills:<br>    print(f&quot; - {skill.name}: {skill.description}&quot;)<br><br># Send some example questions to test<br>messages = [<br>    &quot;In short, why speed of light is constant?&quot;,<br>    &quot;Explain bell Inequality experiment in simple terms.&quot;,<br>]<br><br># Send messages and print responses<br>for msg in messages:<br>    response = client.ask(msg)<br>    print(f&quot;\nUser: {msg}&quot;)<br>    print(f&quot;Agent: {response}&quot;)</pre><p><strong>Executing the code<br></strong>Open separate terminals in the project folder, activate the virtual environment in each, and run the agent server and client separately to execute the code.</p><pre># terminal 1<br>python llm_agent.py<br># terminal 2<br>python llm_client.py</pre><p>The output will look like the following:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*spefzALU_jJV68y6_sXpBw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*JUWmKj0vXIvJxZX27bOkog.png" /></figure><h4>Example 4: Converting LangChain to A2A Servers</h4><p>You can turn any LangChain agent or chain into an A2A-compatible server.</p><p>LangChain requires a few additional libraries for proper integration. 
Run the following command to install the necessary dependencies into the project.</p><pre>uv add langchain-community langchain-openai numexpr</pre><p>Create a file named langchain_to_a2a_server.py and add the following code to it.</p><pre>from langchain.chains import LLMMathChain<br>from langchain_openai import OpenAI<br>from python_a2a.langchain import to_a2a_server<br>from python_a2a import run_server<br><br>import os<br>from dotenv import load_dotenv<br>load_dotenv()<br>api_key = os.environ.get(&quot;OPENAI_API_KEY&quot;)<br>if not api_key:<br>    raise ValueError(&quot;Please set the OPENAI_API_KEY environment variable.&quot;)<br><br># Create a LangChain chain (from_llm avoids the deprecated direct constructor)<br>llm = OpenAI(temperature=0)<br>math_chain = LLMMathChain.from_llm(llm)<br><br># Convert to A2A server<br>a2a_server = to_a2a_server(math_chain)<br><br># Run the server<br>if __name__ == &quot;__main__&quot;:<br>    run_server(a2a_server, port=5000)</pre><p>You can reuse the previously created llm_client.py with this agent server. 
Execute both server and client to see the result.</p><p>The output will look like the following:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*SlLnD7sJYGPVSIlOLMTs0g.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*SV6XEfz_1RmNygQoQn45cw.png" /></figure><h4>Example 5: Converting MCP Tools to LangChain Tools</h4><p>You can convert any MCP tool into a LangChain-compatible tool, so that it can be used directly with LangChain agents.</p><pre>import threading<br><br>from python_a2a.mcp import FastMCP<br>from python_a2a.langchain import to_langchain_tool<br>from python_a2a import run_server<br>from langchain.agents import initialize_agent, AgentType<br>from langchain_openai import OpenAI<br><br># Create an MCP server with tools<br>calculator = FastMCP(name=&quot;Calculator MCP&quot;)<br><br>@calculator.tool()<br>def add(a: float, b: float) -&gt; float:<br>    &quot;&quot;&quot;Add two numbers together.&quot;&quot;&quot;<br>    return a + b<br><br>@calculator.tool()<br>def subtract(a: float, b: float) -&gt; float:<br>    &quot;&quot;&quot;Subtract b from a.&quot;&quot;&quot;<br>    return a - b<br><br># Run the MCP server in a background thread<br>server_thread = threading.Thread(<br>    target=run_server,<br>    args=(calculator,),<br>    kwargs={&quot;port&quot;: 8000},<br>    daemon=True<br>)<br>server_thread.start()<br><br># Convert MCP tools to LangChain tools<br>add_tool = to_langchain_tool(&quot;http://localhost:8000&quot;, &quot;add&quot;)<br>subtract_tool = to_langchain_tool(&quot;http://localhost:8000&quot;, &quot;subtract&quot;)<br><br># Use in a LangChain agent<br>llm = OpenAI(temperature=0)<br>tools = [add_tool, subtract_tool]<br><br>agent = initialize_agent(<br>    tools,<br>    llm,<br>    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,<br>    verbose=True<br>)<br><br># Run the agent<br>result = agent.run(&quot;Add 15 and 27, then subtract 5 from the result&quot;)<br>print(result)</pre><h3>Building a 
travel planner app using A2A</h3><p>In this section, we will build a multi-agent Travel Planner powered by the A2A (Agent-to-Agent) protocol. The system uses LangChain, a local LLM served by Ollama, and multiple A2A-compatible agents that work together to retrieve weather information, perform web searches, and generate a travel recommendation.</p><p><strong>Application flow</strong></p><ol><li>The user requests a travel plan (e.g., “Plan a trip to Kerala”).</li><li>The Travel Planner asks the Weather Agent for the current weather at the destination.</li><li>Based on the weather (sunny or clear versus anything else), it chooses between outdoor and indoor activities.</li><li>It then queries the Tavily Search Agent for suitable activities.</li><li>A local LLM (e.g., via Ollama) compiles this into a final itinerary, so the summarization step runs on your machine instead of a hosted LLM API. The destination is still sent to the OpenWeather and Tavily services.</li></ol><h4>Prerequisites</h4><p>This hands-on exercise requires the following tools to be installed on your machine:</p><ol><li><strong>Ollama: </strong>Ollama is a platform for running large language models locally on your computer.</li></ol><p><a href="https://ollama.com/download/windows">Download Ollama on Windows</a></p><p>Run the following command in your terminal to pull the model using Ollama:</p><pre>ollama pull llama3.2</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*erYunndnLvB0kWwrxmjR8A.png" /></figure><p><strong>2. Python</strong>: Python is the core language used in this hands-on exercise for scripting and backend logic.</p><p><a href="https://www.python.org/downloads">Download Python</a></p><p><strong>3. 
uv: </strong>uv is a fast, modern Python package and project manager, used here to set up and manage the environment.</p><p>To install uv, run this in your terminal:</p><pre># For Windows<br>powershell -ExecutionPolicy ByPass -c &quot;irm https://astral.sh/uv/install.ps1 | iex&quot;<br>set Path=%USERPROFILE%\.local\bin;%Path%<br><br># For Linux / macOS<br>curl -LsSf https://astral.sh/uv/install.sh | sh</pre><p>Refer to the official website for detailed installation instructions.</p><p><a href="https://docs.astral.sh/uv/getting-started/installation">Installation | uv</a></p><h4>Installing the dependencies</h4><ul><li>Initialize a uv project by executing the following command.</li></ul><pre>uv init travel_planner<br>cd travel_planner</pre><ul><li>Create and activate a virtual environment by executing the following commands.</li></ul><pre>uv venv<br># For Linux / macOS<br>source .venv/bin/activate<br># For Windows<br>.venv\Scripts\activate</pre><ul><li>Install python-a2a, langchain-ollama, python-dotenv, tavily-python and streamlit using uv.</li></ul><pre>uv add python-a2a langchain-ollama python-dotenv tavily-python streamlit</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*7xcmiTEj6AthIBY7ZKkaPA.png" /></figure><h4>Setting up the environment</h4><p>This hands-on project uses <strong>Tavily</strong> and <strong>OpenWeather</strong> API keys.</p><ul><li>Visit the <a href="https://app.tavily.com/home">Tavily official website</a> and sign in to obtain your API key.</li><li>Go to the <a href="https://home.openweathermap.org/users/sign_up">OpenWeather website</a> and create a new account. 
Once you complete the sign-up, you will receive your API key in your mail.</li><li>In the root directory of your project, create a .env file and add the following content:</li></ul><pre>OPENWEATHER_API_KEY=fedc837bbb7477...<br>TAVILY_API_KEY=tvly-dev-h5xKcqcytBeJQ...</pre><h4>Creating Weather Agent</h4><p>Create a file named WeatherAgent.py and add the following code to it.</p><pre>from python_a2a import A2AServer, skill, agent, run_server, TaskStatus, TaskState<br><br>import os<br>import requests<br>import logging<br><br>from dotenv import load_dotenv<br>load_dotenv()<br>api_key = os.environ.get(&quot;OPENWEATHER_API_KEY&quot;)<br><br><br>@agent(<br>    name=&quot;Weather Agent&quot;,<br>    description=&quot;Provides weather information&quot;,<br>    version=&quot;1.0.0&quot;,<br>    url=&quot;https://zzz.example.com&quot;<br>)<br>class WeatherAgent(A2AServer):<br>    <br>    @skill(<br>        name=&quot;Get Weather&quot;,<br>        description=&quot;Get current weather for a location&quot;,<br>        tags=[&quot;weather&quot;, &quot;forecast&quot;],<br>        examples=&quot;I am a weather agent for getting weather forecast from Open weather&quot;<br>    )<br>    def get_weather(self, location):<br>        if not api_key:<br>            return &quot;Weather service not available (missing API key).&quot;<br>        <br>        try:<br>            url = (<br>                f&quot;https://api.openweathermap.org/data/2.5/weather?&quot;<br>                f&quot;q={location}&amp;units=imperial&amp;appid={api_key}&quot;<br>            )<br>            logging.debug(f&quot;Request URL: {url}&quot;)  # Log the full request URL<br><br>            response = requests.get(url, timeout=5)<br>            response.raise_for_status()<br>            logging.debug(f&quot;Response Status Code: {response.status_code}&quot;)  # Log status code<br>            logging.debug(f&quot;Response Text: {response.text}&quot;)  # Log raw response text<br>            <br>            data = 
response.json()<br>            <br>            temp = data[&quot;main&quot;][&quot;temp&quot;]<br>            description = data[&quot;weather&quot;][0][&quot;description&quot;]<br>            city_name = data[&quot;name&quot;]<br><br>            logging.debug(f&quot;Parsed Data: Temp = {temp}, Description = {description}, City = {city_name}&quot;)<br>            <br>            return f&quot;The weather in {city_name} is {description} with a temperature of {temp}°F.&quot;<br>        <br>        except requests.RequestException as e:<br>            return f&quot;Error fetching weather: {e}&quot;<br>        except (KeyError, TypeError):<br>            return &quot;Could not parse weather data.&quot;<br>    <br>    def handle_task(self, task):<br>        # Extract location from message<br>        message_data = task.message or {}<br>        content = message_data.get(&quot;content&quot;, {})<br>        text = content.get(&quot;text&quot;, &quot;&quot;) if isinstance(content, dict) else &quot;&quot;<br>        <br>        if &quot;weather&quot; in text.lower() and &quot;in&quot; in text.lower():<br>            location = text.split(&quot;in&quot;, 1)[1].strip().rstrip(&quot;?.&quot;)<br>            <br>            # Get weather and create response<br>            weather_text = self.get_weather(location)<br>            task.artifacts = [{<br>                &quot;parts&quot;: [{&quot;type&quot;: &quot;text&quot;, &quot;text&quot;: weather_text}]<br>            }]<br>            task.status = TaskStatus(state=TaskState.COMPLETED)<br>        else:<br>            task.status = TaskStatus(<br>                state=TaskState.INPUT_REQUIRED,<br>                message={&quot;role&quot;: &quot;agent&quot;, &quot;content&quot;: {&quot;type&quot;: &quot;text&quot;, <br>                         &quot;text&quot;: &quot;Please ask about weather in a specific location.&quot;}}<br>            )<br>        return task<br><br># Run the server<br>if __name__ == 
&quot;__main__&quot;:<br>    agent = WeatherAgent(google_a2a_compatible=True)<br>    run_server(agent, port=8001, debug=True)</pre><p>In this code,</p><ul><li>Loads the OpenWeather API key from .env and defines a weather agent using A2AServer.</li><li>Registers a skill to fetch and return weather data for a given location using OpenWeatherMap API.</li><li>Handles user tasks by extracting location from the message and invoking the weather skill.</li><li>Responds with the weather info or prompts the user if the location is unclear.</li><li>Runs the agent server on port 8001 with debug mode enabled.</li></ul><h4>Creating Tavily Search Agent</h4><p>Create a file named TavilySearchAgent.py and add the following code to it.</p><pre>from python_a2a import A2AServer, skill, agent, run_server, TaskStatus, TaskState<br>from tavily import TavilyClient<br>import os<br>import logging<br><br>from dotenv import load_dotenv<br>load_dotenv()<br>api_key = os.environ.get(&quot;TAVILY_API_KEY&quot;)<br><br>@agent(<br>    name=&quot;Tavily Search Agent&quot;,<br>    description=&quot;Performs internet search using Tavily API&quot;,<br>    version=&quot;1.0.0&quot;,<br>    url=&quot;https://yourdomain.com&quot;<br>)<br>class TavilySearchAgent(A2AServer):<br><br>    def __init__(self, **kwargs):<br>        super().__init__(**kwargs)<br>        self.client = TavilyClient(api_key)<br><br>    @skill(<br>        name=&quot;Search Internet&quot;,<br>        description=&quot;Perform a web search using Tavily API&quot;,<br>        tags=[&quot;search&quot;, &quot;internet&quot;, &quot;tavily&quot;],<br>        examples=&quot;Search &#39;must visit places in utah in may&#39;&quot;<br>    )<br>    def search(self, query: str):<br>        &quot;&quot;&quot;Perform search using Tavily Search API&quot;&quot;&quot;<br>        try:<br>            response = self.client.search(query=query)<br><br>            results = response.get(&quot;results&quot;, [])<br>            if not results:<br>              
  return &quot;No search results found.&quot;<br><br>            summary = &quot;\n&quot;.join(<br>                [f&quot;- {r.get(&#39;title&#39;)}: {r.get(&#39;url&#39;)}&quot; for r in results]<br>            )<br>            return f&quot;Top results for &#39;{query}&#39;:\n{summary}&quot;<br><br>        except Exception as e:<br>            logging.error(f&quot;Error during Tavily search: {e}&quot;)<br>            return f&quot;Search failed: {e}&quot;<br><br>    def handle_task(self, task):<br>        message_data = task.message or {}<br>        content = message_data.get(&quot;content&quot;, {})<br>        text = content.get(&quot;text&quot;, &quot;&quot;) if isinstance(content, dict) else &quot;&quot;<br><br>        if text.strip():<br>            query = text.strip()<br>            result = self.search(query)<br>            task.artifacts = [{<br>                &quot;parts&quot;: [{&quot;type&quot;: &quot;text&quot;, &quot;text&quot;: result}]<br>            }]<br>            task.status = TaskStatus(state=TaskState.COMPLETED)<br>        else:<br>            task.status = TaskStatus(<br>                state=TaskState.INPUT_REQUIRED,<br>                message={&quot;role&quot;: &quot;agent&quot;, &quot;content&quot;: {&quot;type&quot;: &quot;text&quot;, <br>                         &quot;text&quot;: &quot;Please provide a search query.&quot;}}<br>            )<br>        return task<br><br><br>if __name__ == &quot;__main__&quot;:<br>    agent = TavilySearchAgent(google_a2a_compatible=True)<br>    run_server(agent, port=8002, debug=True)</pre><h4>Creating Local LLM Agent</h4><p>Create a file named LocalLLMAgent.py and add the following code to it.</p><pre>from python_a2a import run_server<br>from python_a2a.langchain import to_a2a_server<br>from langchain_ollama.llms import OllamaLLM<br><br># Create a LangChain LLM<br>#llm = ChatOpenAI(model=&quot;gpt-3.5-turbo&quot;, temperature=0)<br>llm = OllamaLLM(model=&quot;llama3.2:latest&quot;)<br><br># Convert 
LLM to A2A server<br>llm_server = to_a2a_server(llm)<br><br>if __name__ == &quot;__main__&quot;:<br>    print(&quot;Starting LLM A2A server on port 5001...&quot;)<br>    run_server(llm_server, port=5001)</pre><h4>Creating Travel Planner Agent</h4><p>Create a file named TravelPlannerApp.py and add the following code to it.</p><pre>import streamlit as st<br>from python_a2a import AgentNetwork, A2AClient<br>import asyncio<br><br># Function to run async logic inside Streamlit<br>def run_async(coro):<br>    return asyncio.run(coro)<br><br>async def plan_trip(destination, travel_dates):<br>    # Create an agent network<br>    network = AgentNetwork(name=&quot;Travel Assistant Network&quot;)<br>    network.add(&quot;weather&quot;, &quot;http://localhost:8001&quot;)<br>    network.add(&quot;search&quot;, &quot;http://localhost:8002&quot;)<br><br>    # Get agents<br>    weather_agent = network.get_agent(&quot;weather&quot;)<br>    search_agent = network.get_agent(&quot;search&quot;)<br>    llm_client = A2AClient(&quot;http://localhost:5001&quot;)<br><br>    # Get weather forecast<br>    forecast = weather_agent.ask(f&quot;What&#39;s the weather in {destination}?&quot;)<br><br>    # Search based on weather<br>    if &quot;sunny&quot; in forecast.lower() or &quot;clear&quot; in forecast.lower():<br>        activities = search_agent.ask(f&quot;Recommend outdoor activities in {destination}&quot;)<br>    else:<br>        activities = search_agent.ask(f&quot;Recommend indoor activities in {destination}&quot;)<br><br>    # Summarize using LLM<br>    prompt = (<br>        f&quot;You are a travel assistant. 
Based on the weather forecast result &#39;{forecast}&#39; &quot;<br>        f&quot;and the recommendations [{activities}], suggest me a few must-see attractions &quot;<br>        f&quot;on date {travel_dates}.&quot;<br>    )<br><br>    llm_result = llm_client.ask(prompt)<br><br>    return forecast, activities, llm_result<br><br># Streamlit UI<br>st.set_page_config(page_title=&quot;🧳 Travel Planner Assistant&quot;)<br><br>st.title(&quot;🧭 Travel Planner Assistant&quot;)<br>st.write(&quot;Get personalized trip suggestions based on real-time weather and recommendations.&quot;)<br><br>destination = st.text_input(&quot;Enter destination&quot;, value=&quot;Kerala, India&quot;)<br>travel_dates = st.text_input(&quot;Enter travel dates&quot;, value=&quot;August 1-5&quot;)<br><br>if st.button(&quot;Plan My Trip&quot;):<br>    with st.spinner(&quot;Planning your trip...&quot;):<br>        try:<br>            forecast, activities, llm_result = run_async(plan_trip(destination, travel_dates))<br><br>            st.subheader(&quot;📍 Weather Forecast&quot;)<br>            st.success(forecast)<br><br>            st.subheader(&quot;🎯 Recommended Activities&quot;)<br>            st.info(activities)<br><br>            st.subheader(&quot;🗺️ Suggested Travel Plan&quot;)<br>            st.markdown(llm_result)<br>        except Exception as e:<br>            st.error(f&quot;Something went wrong: {e}&quot;)</pre><p>In this code,</p><ul><li>Initializes a Streamlit web UI for a travel planner that collects user inputs like destination and travel dates.</li><li>Defines an asynchronous function to coordinate multiple agents via the AgentNetwork class.</li><li>Connects to weather, search, and LLM agents running locally on different ports.</li><li>Based on the weather forecast for the destination, selects indoor or outdoor activity recommendations.</li><li>Summarizes the final travel plan using a language model (LLM) agent and displays the results on the UI.</li><li>Uses asyncio.run() to 
integrate asynchronous agent responses within Streamlit’s synchronous workflow.</li></ul><h4>Creating the executor</h4><p>You can either run each agent separately or create an executor script that uses Python’s subprocess module to launch all agents sequentially.</p><p>In this hands-on exercise, we will follow the subprocess approach to simplify execution.</p><p>Open your main.py file and replace its contents with the following code:</p><pre>import subprocess<br>import time<br>import sys<br><br>def main():<br>    print(&quot;Hello from travel-planner!&quot;)<br><br>    # Scripts to launch before the Streamlit app<br>    scripts = [&quot;WeatherAgent.py&quot;, &quot;TavilySearchAgent.py&quot;, &quot;LocalLLMAgent.py&quot;]<br>    streamlit_app = &quot;TravelPlannerApp.py&quot;<br><br>    processes = []<br><br>    # Launch agent scripts<br>    for script in scripts:<br>        print(f&quot;Launching {script}...&quot;)<br>        p = subprocess.Popen([sys.executable, script])<br>        processes.append(p)<br>        print(f&quot;{script} started. Waiting 2 seconds before next...\n&quot;)<br>        time.sleep(2)<br><br>    # Launch Streamlit app<br>    print(f&quot;Launching Streamlit app: {streamlit_app}...&quot;)<br>    p = subprocess.Popen([&quot;streamlit&quot;, &quot;run&quot;, streamlit_app])<br>    processes.append(p)<br><br>    # Keep the main process alive<br>    try:<br>        print(&quot;All agents (and UI) are running. 
Press Ctrl+C to stop.&quot;)<br>        while True:<br>            time.sleep(1)<br>    except KeyboardInterrupt:<br>        print(&quot;\nShutting down all agents...&quot;)<br>        for p in processes:<br>            p.terminate()<br>        print(&quot;All agents stopped.&quot;)<br><br>if __name__ == &quot;__main__&quot;:<br>    main()</pre><h4>Executing the app</h4><p>To launch the entire application using subprocess, run the following command:</p><pre>uv run main.py</pre><p>If you would prefer to run each agent separately, open four terminal windows and run one of the following commands in each terminal:</p><pre>python WeatherAgent.py<br>python TavilySearchAgent.py<br>python LocalLLMAgent.py<br>streamlit run TravelPlannerApp.py</pre><p>🎉 Awesome Work! You’ve successfully built a Travel Planner powered by the A2A protocol.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*8WSn9zpWBxldXYU1YN5Zqw.gif" /></figure><p><em>Thanks for reading this article!</em></p><p><em>Thanks to Gowri M Bhatt for reviewing the content.</em></p><p>The full source code for this tutorial can be found here:</p><ul><li><a href="https://github.com/codemaker2015/a2a-experiments">GitHub - codemaker2015/a2a-experiments: This repository contains experiments and demos for Agent-to-Agent (A2A) communication using different frameworks and approaches in Python. Each subfolder demonstrates a unique way to implement or interact with A2A agents.</a></li><li><a href="https://github.com/codemaker2015/travel-planner">GitHub - codemaker2015/travel-planner: This project is a multi-agent Travel Planner powered by the A2A protocol. 
It uses LangChain, Ollama LLM, and multiple A2A-compatible agents to collaboratively fetch weather data, search results, and generate a final travel recommendation.</a></li></ul><h4>References</h4><ul><li><a href="https://blog.dailydoseofds.com/p/a-visual-guide-to-agent2agent-a2a">A Visual Guide to Agent2Agent (A2A) Protocol</a></li><li><a href="https://github.com/a2aproject/a2a-samples">GitHub - a2aproject/a2a-samples: Samples using the Agent2Agent (A2A) Protocol</a></li><li><a href="https://a2aprotocol.ai/docs/guide/python-a2a.html">A2A Protocol Documentation</a></li><li><a href="https://a2aprotocol.ai/blog/a2a-samples-hello-world">A2A Samples: Hello World Agent</a></li></ul>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Level Up Your RAG Workflow: Building a GraphRAG-Powered KYC Agent]]></title>
            <link>https://codemaker2016.medium.com/level-up-your-rag-workflow-building-a-graphrag-powered-kyc-agent-c370f0a51bbb?source=rss-b7a60e472dfc------2</link>
            <guid isPermaLink="false">https://medium.com/p/c370f0a51bbb</guid>
            <category><![CDATA[openai]]></category>
            <category><![CDATA[retrieval-augmented-gen]]></category>
            <category><![CDATA[neo4j]]></category>
            <category><![CDATA[graphrag]]></category>
            <category><![CDATA[python]]></category>
            <dc:creator><![CDATA[Vishnu Sivan]]></dc:creator>
            <pubDate>Sat, 12 Jul 2025 16:16:05 GMT</pubDate>
            <atom:updated>2025-07-12T16:16:05.766Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*tIaCza6TTp93oscsu7v5AA.png" /></figure><p>As developers, we often face the challenge of making sense of messy, scattered data — especially when it lives across dozens of PDFs, reports, and databases. Traditional RAG systems are a great start, but they often fall short when it comes to complex queries that span multiple documents or require a deeper understanding of context.</p><p>That’s where GraphRAG steps in.</p><p>GraphRAG brings structure to chaos by building a knowledge graph from raw text and using it as the backbone for LLM-powered responses. It doesn’t just retrieve text chunks — it understands how entities relate, clusters them into meaningful communities, and provides a graph-structured lens into your data. GraphRAG combines the structured clarity of knowledge graphs with the generative power of large language models (LLMs) to bring meaning and connection to fragmented information.</p><p>Unlike conventional RAG systems that rely on flat semantic search, GraphRAG extracts key entities, maps relationships, and organizes information into intuitive, navigable hierarchies. This makes it especially valuable in domains like compliance, risk management, and enterprise intelligence, where answers often lie at the intersection of multiple documents and hidden connections. 
By turning unstructured data into a structured web of knowledge, GraphRAG delivers more accurate, contextual, and actionable results, helping users make better decisions faster.</p><p>In this article, we will explore how to build a prototype Know-Your-Customer (KYC) agent using OpenAI’s Agents SDK, illustrating how GraphRAG can streamline investigations and surface hidden risk signals.</p><h3>Getting Started</h3><h4>Table of contents</h4><ul><li><a href="#836f">How RAG works</a></li><li><a href="#bdda">What is a Knowledge Graph</a></li><li><a href="#5524">What is GraphRAG</a></li><li><a href="#e3c0">How GraphRAG works</a></li><li><a href="#3ee8">Where traditional RAG falls short</a></li><li><a href="#bc66">VectorRAG vs GraphRAG</a></li><li><a href="#acfd">Experimenting with GraphRAG: Build a KYC agent with OpenAI, MCP, Ollama, and Neo4j</a></li><li><a href="#0030">Prerequisites</a></li><li><a href="#aa16">Installing the dependencies</a></li><li><a href="#5ffc">Setting up the credentials</a></li><li><a href="#19de">Synthetic dataset preparation</a></li><li><a href="#b91c">Creating schemas</a></li><li><a href="#0887">Building KYC agent</a></li><li><a href="#04dc">Importing required libraries</a></li><li><a href="#fba5">Setting up the neo4j connection</a></li><li><a href="#331c">Building tools</a></li><li><a href="#599c">Tool 1: Get customer and accounts</a></li><li><a href="#03de">Tool 2: Find customer rings</a></li><li><a href="#34d4">Tool 3: Neo4j MCP server toolset</a></li><li><a href="#a559">Tool 4: Generate Cypher</a></li><li><a href="#9dae">Tool 5: Create memory</a></li><li><a href="#837d">Main function</a></li><li><a href="#b0f1">Agent reasoning and execution flow</a></li><li><a href="#e17e">Executing the agent</a></li><li><a href="#fa27">References</a></li></ul><h4>How RAG works</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/651/1*xzpsGLBCCD5GPf4XNA9lIw.png" /></figure><p>Retrieval-Augmented Generation (RAG) enhances large language models by allowing them to fetch 
relevant information from an external knowledge source (like documents or databases) before generating a response.</p><p>Instead of relying solely on the LLM’s internal training data, RAG introduces a two-step pipeline:</p><ol><li><strong>Retrieval Phase</strong></li></ol><ul><li>The input query is embedded and used to search a vector store (e.g., FAISS, Pinecone, Chroma).</li><li>This store contains embedded representations (vectors) of your documents or chunks of data.</li><li>Using semantic similarity, the system retrieves the top-K most relevant text chunks for the query.</li></ul><p><strong>2. Generation Phase</strong></p><ul><li>The retrieved chunks are fed into the LLM along with the original query.</li><li>The model then generates an answer that is grounded in the retrieved context, improving factual accuracy and relevance.</li></ul><h4>What is a Knowledge Graph</h4><p>A knowledge graph is a structured and visual way of organizing information that captures real-world entities, their attributes, and the relationships between them. It models complex data in a way that makes hidden connections easier to see and query, which suits domains like customer intelligence, fraud detection, or KYC (Know Your Customer).</p><p>Let’s break it down using a customer data graph as the example:</p><p><strong>Key Components:</strong></p><ul><li><strong>Entities (Nodes)</strong>: These are the main objects represented in the graph. In this schema, entities include:</li><li>Customer, Account, Transaction, Device, Address, Company, Payment_Method, IP_Address</li><li><strong>Attributes</strong>: These are properties of each entity, like a Customer having a name, ID, or email, though they’re not directly shown in the graph view. Attributes are usually stored as properties on each node.</li><li><strong>Relationships (Edges)</strong>: These define how entities are connected. 
For example:</li><li>A Customer OWNS an Account</li><li>A Customer USES_DEVICE (e.g., a phone or laptop)</li><li>A Customer LIVES_AT an Address</li><li>A Customer HAS_METHOD linked to a Payment_Method</li><li>An Account is TO or FROM a Transaction</li><li>A Device is ASSOCIATED WITH an IP_Address</li><li>A Customer is EMPLOYED_BY a Company</li></ul><p>It might look something like this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*a6bEqAN4VykSkhyqoOY-0g.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/942/1*6uMi22BccV77vaklirxp-g.png" /></figure><p>With this structure, it becomes easy to run intelligent queries like:</p><ul><li>“Which accounts are associated with customers living at the same address?”</li><li>“Show all transactions made using devices linked to the same IP address.”</li><li>“Which customers share payment methods or devices?”</li></ul><p>A knowledge graph like this turns scattered data into a connected web of insights, making it a foundational component for systems like GraphRAG, where reasoning over relationships is key.</p><h4>What is GraphRAG</h4><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FknDDGYHnnSI%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DknDDGYHnnSI&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FknDDGYHnnSI%2Fhqdefault.jpg&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/39353d52c37924f100165cb7c4d690b5/href">https://medium.com/media/39353d52c37924f100165cb7c4d690b5/href</a></iframe><p>Graph-based Retrieval-Augmented Generation (GraphRAG) is an advanced AI technique that enhances traditional RAG systems by integrating knowledge graphs into the information retrieval and generation process. 
While conventional RAG pipelines retrieve top-matching text chunks from unstructured documents and pass them to a language model for generation, GraphRAG introduces structure and semantics to this flow. It does this by first transforming raw textual data into a knowledge graph, where key entities (like people, organizations, or events) and their relationships are identified and connected.</p><p>This graph-based representation allows for deeper reasoning and more context-aware generation, as the model can now leverage structured relationships, not just surface-level text similarity. For instance, instead of just finding documents that mention a customer, GraphRAG can traverse the graph to understand which accounts they own, which devices they’ve used, or what addresses they’re associated with — all before generating a response.</p><p>By combining the strengths of symbolic reasoning (via graphs) and generative AI (via LLMs), GraphRAG delivers more accurate, explainable, and scalable results. This makes it particularly useful in complex domains like fraud detection, compliance, legal discovery, and enterprise search, where understanding entity relationships is crucial for meaningful answers.</p><h4>How GraphRAG works</h4><p>GraphRAG operates in two main phases: <strong>Indexing</strong> (organizing information) and <strong>Querying</strong> (retrieving meaningful answers).</p><p><strong>Indexing: Building the Knowledge Graph</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/693/1*R2wDQsknUmJfCTfZSp7TkQ.png" /></figure><ul><li><strong>Chunking the Text</strong>: GraphRAG starts by breaking down your documents into smaller segments called TextUnits. 
These are manageable chunks that make analysis easier.</li><li><strong>Entity and Relationship Extraction</strong>: From each TextUnit, GraphRAG identifies key entities (like people, places, or organizations), claims, and how these elements are connected.</li><li><strong>Graph Construction</strong>: All this information is structured into a knowledge graph — a visual network of entities (nodes) and their relationships (edges). Important entities appear as larger nodes, and related ones are grouped into clusters.</li><li><strong>Community Summarization</strong>: For each cluster, GraphRAG generates a concise summary capturing the key themes and topics — offering a high-level view of your entire dataset.</li></ul><p><strong>Querying: Asking Questions and Getting Answers</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/591/1*3KmFwlHbeo1erya-Qmeofw.png" /></figure><p>Once the knowledge graph is built, GraphRAG uses it to answer questions in three ways:</p><ul><li><strong>Global Search</strong>: Ideal for understanding overarching themes or trends. The system uses community-level summaries to provide broad insights.</li><li><strong>Local Search</strong>: Best for specific queries. If you ask about a person or company, GraphRAG explores their direct connections in the graph to provide a targeted, factual response.</li><li><strong>DRIFT Search</strong>: A hybrid approach that combines Local Search with community-level insights — useful when deeper, contextual understanding is needed.</li></ul><h4>Where traditional RAG falls short</h4><p>Traditional Retrieval-Augmented Generation (RAG) excels at finding and returning semantically similar text snippets. But when data becomes more complex or scattered, traditional RAG begins to struggle — especially in real-world enterprise applications.</p><p>One major limitation is its inability to synthesize information spread across multiple sources. 
If a question requires connecting subtle, indirect relationships — such as tracing a customer’s activity across different devices, transactions, or locations — traditional RAG often fails to assemble those connections. It lacks an understanding of how different pieces of data relate within a larger context, leading to incomplete or inaccurate responses.</p><p>Another weakness lies in capturing broader context or summarizing nuanced datasets. Traditional RAG models aren’t built for higher-level semantic reasoning. So when faced with a query like <em>“What are the main themes in the dataset?”</em>, the system flounders — unless those themes are explicitly written out. That’s because this is not a simple retrieval problem; it’s a query-focused summarization task, which requires abstracting insights across the dataset — something traditional RAG isn’t inherently equipped to handle.</p><p>In short, while traditional RAG works well for localized lookups and direct Q&amp;A, it falls short when dealing with interconnected, abstract, or multi-hop reasoning — the exact gaps that GraphRAG is designed to fill.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/717/1*ICs7ujacJQHZyqjiRleppA.png" /></figure><h4>VectorRAG vs GraphRAG</h4><p>While both VectorRAG and GraphRAG extend the power of language models by incorporating external knowledge, they approach the task very differently in how they retrieve, structure, and reason over information. 
VectorRAG excels at fast, precise lookups, while GraphRAG provides deeper, more explainable reasoning over structured relationships, which makes it ideal for complex investigative and analytical tasks.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/942/1*25fmqHm8yKmNVzE7OW6jZQ.png" /></figure><h3>Experimenting with GraphRAG: Build a KYC agent with OpenAI, MCP, Ollama, and Neo4j</h3><p>Know-Your-Customer (KYC) processes involve navigating vast networks of entities (customers, accounts, devices, IPs, transactions), each intricately linked. Traditional RAG falls short in these scenarios, where uncovering fraud demands tracing indirect, multi-hop relationships across data points. This is where GraphRAG shines. By grounding retrieval in a structured knowledge graph, it enables investigators to reason across complex connections and uncover hidden patterns, making it a powerful tool for detecting money laundering, sanctions violations, and other financial crimes.</p><p>In the next section, we will walk through how to build a prototype GraphRAG-powered KYC agent.</p><h4>Prerequisites</h4><p>This hands-on tutorial requires the following tools to be installed on your machine:</p><ol><li><strong>Ollama: </strong>Ollama is a platform for running large language models locally on your computer.</li></ol><p><a href="https://ollama.com/download/windows">Download Ollama on Windows</a></p><p>To convert natural language questions into Cypher queries, we’ll use the <strong>Text-to-Cypher</strong> model provided by Neo4j.</p><p>Run the following command in your terminal to pull the model using Ollama:</p><pre>ollama pull ed-neo4j/t2c-gemma3-4b-it-q8_0-35k</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*KCIb2AmOzNdbPEH0Rk_4Qw.png" /></figure><p><strong>2. 
Neo4j: </strong>Neo4j is a graph database used to model and query relationships between entities like customers, accounts, transactions, and more.</p><p>You have two options to set up a free Neo4j database:</p><p><strong>Option 1: Local Neo4j Instance (Desktop or Docker)</strong></p><ul><li>Download the Neo4j installer from the official site: <a href="https://neo4j.com/download">https://neo4j.com/download</a></li><li>Follow the installation instructions based on your operating system.</li><li>Once installed, start the Neo4j Desktop application or run it via Docker to create and manage local databases.</li></ul><p><strong>Option 2: Neo4j AuraDB Free (Cloud-Based Managed Instance)</strong></p><ul><li>Visit the Neo4j Aura Console: <a href="https://console.neo4j.io">https://console.neo4j.io</a></li><li>Sign in or create a Neo4j account.</li><li>Click “Create Database” and select AuraDB Free.</li><li>Once the database is created, download the connection credentials bundle. You’ll need these to connect to the database programmatically.</li></ul><p>In this tutorial, we will proceed with Neo4j AuraDB Free as it is lightweight, cloud-hosted, and easily accessible from anywhere.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-Mw3LBZmuO5iqEakAjsnHQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*E8H_6BLTdPC-8pv6XAGcOw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*d1kpK553RH7t4PoH81_Rbg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*WRC_WszIyPUss6OmMuTZng.png" /></figure><p><strong>3. Python</strong>: Python is the core language used in this hands-on tutorial for scripting and backend logic.</p><p><a href="https://www.python.org/downloads/">Download Python</a></p><p><strong>4. uv: </strong>uv is a fast, modern Python package and project manager that we use to set up and manage our environment. 
It simplifies tasks like handling dependencies, creating virtual environments, and running scripts.</p><p>To install uv on Windows, run this in your terminal:</p><pre>powershell -ExecutionPolicy ByPass -c &quot;irm https://astral.sh/uv/install.ps1 | iex&quot;<br>set Path=%USERPROFILE%\.local\bin;%Path%</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/875/0*qi5ZqSolwUYwVW9a.png" /></figure><p>Refer to the official website for detailed installation instructions, including other operating systems.</p><p><a href="https://docs.astral.sh/uv/getting-started/installation">Installation | uv</a></p><h4>Installing the dependencies</h4><ul><li>Initialize a uv project by executing the following commands.</li></ul><pre>uv init kyc_agent<br>cd kyc_agent</pre><ul><li>Create and activate a virtual environment by executing the following commands.</li></ul><pre>uv venv<br>source .venv/bin/activate # for Linux/macOS<br>.venv\Scripts\activate    # for Windows</pre><ul><li>Install the neo4j, neo4j-rust-ext, numpy, ollama, openai-agents, and python-dotenv libraries using uv.</li></ul><pre>uv add neo4j neo4j-rust-ext numpy ollama openai-agents python-dotenv</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*eBRMb6dfn-pBzGLxnIM_wQ.png" /></figure><h4>Setting up the credentials</h4><ul><li>Create a file named .env. This file will store your environment variables, including the OpenAI key and Neo4j credentials.</li><li>Open the .env file and add the following entries to specify your OpenAI API key and Neo4j credentials. 
Copy the Neo4j credentials from the file you downloaded while setting up the Neo4j AuraDB instance.</li></ul><pre>OPENAI_API_KEY=sk-proj-C1K1hKug99wXxtj...<br>NEO4J_URI=neo4j+s://b3383662.databases.neo4j.io<br>NEO4J_USERNAME=neo4j<br>NEO4J_PASSWORD=fa_BBW1s6kjvOjSLnTvkKht...<br>NEO4J_DATABASE=neo4j<br>AURA_INSTANCEID=b3383662<br>AURA_INSTANCENAME=Free instance</pre><h4>Synthetic dataset preparation</h4><p>For the purpose of this blog, we have generated a synthetic dataset comprising 8,000 customers along with their associated accounts, transactions, registered addresses, devices, and IP addresses.</p><h4>Dataset generation script</h4><p>Use the following script to generate the dataset.</p><pre>import numpy as np<br>import os<br>import random<br>import uuid<br>import time<br>from datetime import datetime, timedelta<br>from neo4j import GraphDatabase<br>from dotenv import load_dotenv<br>load_dotenv()  <br><br>random.seed(42)<br><br>NEO4J_URI = os.getenv(&quot;NEO4J_URI&quot;, &quot;bolt://localhost:7687&quot;)<br>NEO4J_USER = os.getenv(&quot;NEO4J_USERNAME&quot;, &quot;neo4j&quot;)<br>NEO4J_PASSWORD = os.getenv(&quot;NEO4J_PASSWORD&quot;, &quot;password&quot;)<br>NEO4J_DATABASE = os.getenv(&quot;NEO4J_DATABASE&quot;, &quot;neo4j&quot;)<br><br>driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))<br><br>def get_session():<br>    return driver.session(database=NEO4J_DATABASE)<br><br># Create uniqueness constraints (once)<br>with get_session() as sess:<br>    <br>    for label in (&#39;Customer&#39;,&#39;Account&#39;,&#39;Company&#39;,&#39;Address&#39;,<br>                  &#39;Device&#39;,&#39;IP_Address&#39;,&#39;Payment_Method&#39;,&#39;Transaction&#39;):<br>        sess.execute_write(<br>            lambda tx, L=label: tx.run(<br>                f&quot;CREATE CONSTRAINT IF NOT EXISTS FOR (n:{L}) REQUIRE n.id IS UNIQUE&quot;<br>            )<br>        )<br><br># Configuration &amp; ID 
lists<br>random.seed(42)<br>np.random.seed(42)<br><br>n_customers = 8_000<br>mean_accounts_per_customer   = 1.5<br>mean_devices_per_customer    = 2<br>mean_addresses_per_customer  = 1.2<br>mean_payment_methods_per_customer = 1<br>mean_transactions_per_account     = 10<br>p_pep       = 0.01<br>p_watchlist = 0.02<br><br>customers   = [f&quot;CUST_{i:05d}&quot; for i in range(1, n_customers+1)]<br>n_companies = int(n_customers * 0.2)<br>companies   = [f&quot;COMP_{i:05d}&quot; for i in range(1, n_companies+1)]<br><br><br># Prepare payloads<br>customer_rows = [<br>    {&quot;id&quot;: cust,<br>     &quot;pep&quot;: (random.random() &lt; p_pep),<br>     &quot;wl&quot;:  (random.random() &lt; p_watchlist),<br>     &quot;name&quot;: cust<br>    }<br>    for cust in customers<br>]<br><br>company_rows = [<br>    {&quot;id&quot;: comp,<br>     &quot;ind&quot;: random.choice([&#39;Finance&#39;,&#39;Tech&#39;,&#39;Manufacturing&#39;,&#39;Retail&#39;]),<br>     &quot;name&quot;: comp<br>     }<br>    for comp in companies<br>]<br><br># 3. 
Push Customers &amp; Companies<br>print(f&quot;loading start...&quot;)<br>start_time = time.perf_counter()<br>batch_size=50<br><br>with get_session() as sess:<br>    # Customers in implicit transactions of 50 rows each<br>    sess.run(<br>        &quot;&quot;&quot;<br>        UNWIND $rows AS row<br>        CALL (row) {<br>          MERGE (c:Customer {id: row.id})<br>          SET c.is_pep       = row.pep,<br>              c.on_watchlist = row.wl,<br>              c.name = row.name<br>        } IN TRANSACTIONS OF $batch_size ROWS<br>        &quot;&quot;&quot;,<br>        rows=customer_rows,<br>        batch_size=batch_size<br>    )<br><br>    # Companies in implicit transactions of 50 rows each<br>    sess.run(<br>        &quot;&quot;&quot;<br>        UNWIND $rows AS row<br>        CALL (row) {<br>          MERGE (c:Company {id: row.id})<br>          SET c.industry = row.ind,<br>              c.name = row.name <br>        } IN TRANSACTIONS OF $batch_size ROWS<br>        &quot;&quot;&quot;,<br>        rows=company_rows,<br>        batch_size=batch_size<br>    )<br><br>    <br>end_time = time.perf_counter()<br>elapsed = end_time - start_time<br>print(f&quot;⌛ Loading Customers &amp; Companies took {elapsed:.2f} seconds&quot;)<br><br><br># 3. 
Push Accounts, Addresses, Devices, IP addresses, Payment Methods and Transactions<br><br># Build payloads<br>acct_counter = addr_counter = dev_counter = ip_counter = pm_counter = txn_counter = 0<br><br>account_rows     = []<br>employed_rows    = []<br>address_rows     = []<br>device_rows      = []<br>ip_rows          = []<br>payment_rows     = []<br>transaction_rows = []<br><br><br># 1.1 Accounts &amp; OWNS<br>for cust in customers:<br>    for _ in range(np.random.poisson(mean_accounts_per_customer)):<br>        acct_counter += 1<br>        aid = f&quot;ACCT_{acct_counter:05d}&quot;<br>        account_rows.append({&quot;cust&quot;: cust, &quot;acct&quot;: aid,&quot;name&quot;:aid})<br><br># 1.2 EMPLOYED_BY<br>for cust in customers:<br>    if random.random() &lt; 0.8:<br>        comp = random.choice(companies)<br>        employed_rows.append({&quot;cust&quot;: cust, &quot;co&quot;: comp})<br><br># 1.3 Addresses &amp; LIVES_AT<br>for cust in customers:<br>    for _ in range(max(1, np.random.poisson(mean_addresses_per_customer))):<br>        addr_counter += 1<br>        aid = f&quot;ADDR_{addr_counter:05d}&quot;<br>        city = random.choice([&#39;London&#39;,&#39;Manchester&#39;,&#39;Birmingham&#39;,&#39;Leeds&#39;])<br>        address_rows.append({&quot;cust&quot;: cust, &quot;addr&quot;: aid, &quot;city&quot;: city,&quot;name&quot;:aid})<br><br># 1.4 Devices &amp; USES_DEVICE → ASSOCIATED_WITH IP_Address<br>for cust in customers:<br>    for _ in range(np.random.poisson(mean_devices_per_customer)):<br>        dev_counter += 1<br>        did = f&quot;DEV_{dev_counter:05d}&quot;<br>        osys = random.choice([&#39;Android&#39;,&#39;iOS&#39;,&#39;Windows&#39;,&#39;MacOS&#39;])<br>        device_rows.append({&quot;cust&quot;: cust, &quot;dev&quot;: did, &quot;os&quot;: osys,&quot;name&quot;:did})<br><br>        ip_counter += 1<br>        iid = f&quot;IP_{ip_counter:05d}&quot;<br>        ip_rows.append({&quot;dev&quot;: did, &quot;ip&quot;: 
iid,&quot;name&quot;:iid})<br><br># 1.5 Payment Methods &amp; HAS_METHOD<br>for cust in customers:<br>    for _ in range(np.random.poisson(mean_payment_methods_per_customer)):<br>        pm_counter += 1<br>        pid = f&quot;PM_{pm_counter:05d}&quot;<br>        ptype = random.choice([&#39;Credit_Card&#39;,&#39;Debit_Card&#39;,&#39;EWallet&#39;])<br>        cnum = &#39;&#39;.join(random.choice(&#39;0123456789&#39;) for _ in range(16)) \<br>               if ptype in (&#39;Credit_Card&#39;,&#39;Debit_Card&#39;) \<br>               else uuid.uuid4().hex[:16]<br>        payment_rows.append({<br>            &quot;cust&quot;: cust,<br>            &quot;pid&quot;: pid,<br>            &quot;ptype&quot;: ptype,<br>            &quot;cnum&quot;: cnum,<br>            &quot;name&quot;: pid<br>        })<br><br><br># 1.6 Transactions &amp; FROM/TO<br>all_accts = [r[&quot;acct&quot;] for r in account_rows]<br>for src in all_accts:<br>    for _ in range(np.random.poisson(mean_transactions_per_account)):<br>        txn_counter += 1<br>        tid = f&quot;TXN_{txn_counter:06d}&quot;<br>        amt = round(np.random.lognormal(mean=3, sigma=1), 2)<br>        ts  = (datetime(2025,1,1) + timedelta(days=random.randint(0,120))).isoformat()<br>        dst = random.choice(all_accts)<br>        transaction_rows.append({<br>            &quot;src&quot;: src, &quot;tid&quot;: tid, &quot;amt&quot;: amt, &quot;ts&quot;: ts, &quot;dst&quot;: dst, &quot;name&quot;:tid<br>        })<br><br><br># 2. 
Push in batches<br>start_time = time.perf_counter()<br>with get_session() as sess:<br>    # 2.1 Accounts<br>    sess.run(<br>        &quot;&quot;&quot;<br>        UNWIND $rows AS row<br>        CALL (row) {<br>          MERGE (a:Account {id: row.acct})<br>          SET a.name = row.name<br>          WITH a, row<br>          MATCH (c:Customer {id: row.cust})<br>          MERGE (c)-[:OWNS]-&gt;(a)<br>        } IN TRANSACTIONS OF $batch_size ROWS<br>        &quot;&quot;&quot;,<br>        rows=account_rows, batch_size=batch_size<br>    )<br><br>    # 2.2 Employed<br>    sess.run(<br>        &quot;&quot;&quot;<br>        UNWIND $rows AS row<br>        CALL (row) {<br>          MATCH (c:Customer {id: row.cust})<br>          MATCH (co:Company  {id: row.co})<br>          MERGE (c)-[:EMPLOYED_BY]-&gt;(co)<br>        } IN TRANSACTIONS OF $batch_size ROWS<br>        &quot;&quot;&quot;,<br>        rows=employed_rows, batch_size=batch_size<br>    )<br><br>    # 2.3 Addresses<br>    sess.run(<br>        &quot;&quot;&quot;<br>        UNWIND $rows AS row<br>        CALL (row) {<br>          MERGE (a:Address {id: row.addr})<br>          SET a.city = row.city,<br>              a.name = row.name<br>          WITH a, row<br>          MATCH (c:Customer {id: row.cust})<br>          MERGE (c)-[:LIVES_AT]-&gt;(a)<br>        } IN TRANSACTIONS OF $batch_size ROWS<br>        &quot;&quot;&quot;,<br>        rows=address_rows, batch_size=batch_size<br>    )<br><br>    # 2.4 Devices<br>    sess.run(<br>        &quot;&quot;&quot;<br>        UNWIND $rows AS row<br>        CALL (row) {<br>          MERGE (d:Device {id: row.dev})<br>          SET d.os = row.os,<br>            d.name = row.name<br>          WITH d, row<br>          MATCH (c:Customer {id: row.cust})<br>          MERGE (c)-[:USES_DEVICE]-&gt;(d)<br>        } IN TRANSACTIONS OF $batch_size ROWS<br>        &quot;&quot;&quot;,<br>        rows=device_rows, batch_size=batch_size<br>    )<br>    # 2.5 IPs<br>    sess.run(<br>        
&quot;&quot;&quot;<br>        UNWIND $rows AS row<br>        CALL (row) {<br>          MERGE (i:IP_Address {id: row.ip})<br>          SET i.name = row.name<br>          WITH i, row<br>          MATCH (d:Device {id: row.dev})<br>          MERGE (d)-[:ASSOCIATED_WITH]-&gt;(i)<br>        } IN TRANSACTIONS OF $batch_size ROWS<br>        &quot;&quot;&quot;,<br>        rows=ip_rows, batch_size=batch_size<br>    )<br><br>    # 2.6 Payment Methods<br>    sess.run(<br>        &quot;&quot;&quot;<br>        UNWIND $rows AS row<br>        CALL (row) {<br>          MERGE (p:Payment_Method {id: row.pid})<br>          SET p.pm_type     = row.ptype,<br>              p.card_number = row.cnum,<br>              p.name = row.name<br>          WITH p, row<br>          MATCH (c:Customer {id: row.cust})<br>          MERGE (c)-[:HAS_METHOD]-&gt;(p)<br>        } IN TRANSACTIONS OF $batch_size ROWS<br>        &quot;&quot;&quot;,<br>        rows=payment_rows, batch_size=batch_size<br>    )<br><br>    # 2.7 Transactions<br>    sess.run(<br>        &quot;&quot;&quot;<br>        UNWIND $rows AS row<br>        CALL (row) {<br>          MERGE (t:Transaction {id: row.tid})<br>          SET t.amount    = row.amt,<br>              t.timestamp = row.ts,<br>              t.name = row.name<br>          WITH t, row<br>          MATCH (a1:Account {id: row.src})<br>          MATCH (a2:Account {id: row.dst})<br>          MERGE (a1)-[:FROM]-&gt;(t)-[:TO]-&gt;(a2)<br>        } IN TRANSACTIONS OF $batch_size ROWS<br>        &quot;&quot;&quot;,<br>        rows=transaction_rows, batch_size=batch_size<br>    )<br><br>end_time = time.perf_counter()<br>elapsed = end_time - start_time<br>print(f&quot;⌛ Loading Account, Employed, Owns, Addresses, Devices, Payment Methods &amp; Transactions took {elapsed:.2f} seconds&quot;)<br><br><br># 5. 
Select anomalies<br>n_anomalies        = int(0.05 * len(customers))<br>anoms             = random.sample(customers, n_anomalies)<br>chunk             = n_anomalies // 5<br><br># Prepare payload lists<br>super_rows        = []<br>ring_acct_rows    = []<br>ring_txn_rows     = []<br>bridge_rows       = []<br>isolate_rows      = []<br>dense_addr_rows   = []<br>dense_pm_rows     = []<br><br><br># Super-hubs: 50 new accounts per customer<br>for cust in anoms[0:chunk]:<br>    for _ in range(50):<br>        acct_counter += 1<br>        aid = f&quot;ACCT_{acct_counter:05d}&quot;<br>        super_rows.append({&quot;cust&quot;: cust, &quot;acct&quot;: aid,&quot;name&quot;:aid})<br><br>#Upload<br>with get_session() as sess:<br>    # 4.1 Super-hubs<br>    start_time = time.perf_counter()<br>    sess.run(<br>        &quot;&quot;&quot;<br>        UNWIND $rows AS row<br>        CALL (row) {<br>          MERGE (a:Account {id: row.acct})<br>          SET a.name = row.name<br>          WITH a, row<br>          MATCH (c:Customer {id: row.cust})<br>          MERGE (c)-[:OWNS]-&gt;(a)<br>        } IN TRANSACTIONS OF $batch_size ROWS<br>        &quot;&quot;&quot;,<br>        rows=super_rows, batch_size=50<br>    )<br>    end_time = time.perf_counter()<br>    elapsed = end_time - start_time<br>    print(f&quot;⌛ Loading Anomalies: Super Hubs took {elapsed:.2f} seconds&quot;)<br><br># 2.2 Circular rings: 3-customer cycles<br>for i in range(chunk, 2*chunk, 3):<br>    trio = anoms[i : i+3]<br>    if len(trio) == 3:<br>        accts = []<br>        for c in trio:<br>            acct_counter += 1<br>            aid = f&quot;ACCT_{acct_counter:05d}&quot;<br>            ring_acct_rows.append({&quot;cust&quot;: c, &quot;acct&quot;: aid})<br>            accts.append(aid)<br>        for j in range(3):<br>            txn_counter += 1<br>            tid = f&quot;TXN_{txn_counter:06d}&quot;<br>            ring_txn_rows.append({<br>                &quot;src&quot;:      accts[j],<br>                
&quot;dst&quot;:      accts[(j+1) % 3],<br>                &quot;tid&quot;:      tid,<br>                &quot;amount&quot;:   1000,<br>                &quot;ts&quot;:       datetime(2025, 2, 1).isoformat()<br>            })<br><br>with get_session() as sess:<br>    start_time = time.perf_counter()<br>    # 4.2 Circular rings – ring accounts (created first so the transactions below can match them)<br>    sess.run(<br>        &quot;&quot;&quot;<br>        UNWIND $rows AS row<br>        CALL (row) {<br>          MERGE (a:Account {id: row.acct})<br>          SET a.name = row.acct<br>          WITH a, row<br>          MATCH (c:Customer {id: row.cust})<br>          MERGE (c)-[:OWNS]-&gt;(a)<br>        } IN TRANSACTIONS OF $batch_size ROWS<br>        &quot;&quot;&quot;,<br>        rows=ring_acct_rows, batch_size=50<br>    )<br>    # 4.2 Circular rings – ring transactions<br>    sess.run(<br>        &quot;&quot;&quot;<br>        UNWIND $rows AS row<br>        CALL (row) {<br>          MERGE (t:Transaction {id: row.tid})<br>          SET t.amount = row.amount, t.timestamp = row.ts<br>          WITH t, row<br>          MATCH (a1:Account {id: row.src}), (a2:Account {id: row.dst})<br>          MERGE (a1)-[:FROM]-&gt;(t)-[:TO]-&gt;(a2)<br>        } IN TRANSACTIONS OF $batch_size ROWS<br>        &quot;&quot;&quot;,<br>        rows=ring_txn_rows, batch_size=50<br>    )<br>    end_time = time.perf_counter()<br>    elapsed = end_time - start_time<br>    print(f&quot;⌛ Loading Anomalies: Circular Rings took {elapsed:.2f} seconds&quot;)<br><br><br># 2.3 Bridges: employed by two companies<br>for cust in anoms[2*chunk : 3*chunk]:<br>    c1, c2 = random.sample(companies, 2)<br>    bridge_rows.append({&quot;cust&quot;: cust, &quot;co&quot;: c1})<br>    bridge_rows.append({&quot;cust&quot;: cust, &quot;co&quot;: c2})<br>with get_session() as sess:<br>    # 4.3 Bridges<br>    start_time = time.perf_counter()<br>    sess.run(<br>        &quot;&quot;&quot;<br>        UNWIND $rows AS row<br>        CALL (row) {<br>          MATCH (c:Customer {id: row.cust}), (co:Company {id: row.co})<br>          MERGE (c)-[:EMPLOYED_BY]-&gt;(co)<br>        } IN TRANSACTIONS OF $batch_size ROWS<br>        &quot;&quot;&quot;,<br>        rows=bridge_rows, batch_size=50<br>    )<br>    end_time = time.perf_counter()<br>    elapsed = end_time - start_time<br>    print(f&quot;⌛ Loading Anomalies: Bridges - Customers employed by 2 companies took {elapsed:.2f} seconds&quot;)<br><br>#  Isolates: 5 device→IP pairs per customer, no link 
to customers<br>for cust in anoms[3*chunk : 4*chunk]:<br>    for _ in range(5):<br>        dev_counter += 1<br>        ip_counter  += 1<br>        isolate_rows.append({<br>            &quot;dev&quot;: f&quot;DEV_{dev_counter:05d}&quot;,<br>            &quot;ip&quot;:  f&quot;IP_{ip_counter:05d}&quot;<br>        })<br><br>with get_session() as sess:<br>    # 4.4 Isolates<br>    start_time = time.perf_counter()<br>    sess.run(<br>        &quot;&quot;&quot;<br>        UNWIND $rows AS row<br>        CALL (row) {<br>          MERGE (d:Device {id: row.dev})<br>          SET d.os = &#39;Unknown&#39;<br>          MERGE (i:IP_Address {id: row.ip})<br>          MERGE (d)-[:ASSOCIATED_WITH]-&gt;(i)<br>        } IN TRANSACTIONS OF $batch_size ROWS<br>        &quot;&quot;&quot;,<br>        rows=isolate_rows, batch_size=50<br>    )<br>    end_time = time.perf_counter()<br>    elapsed = end_time - start_time<br>    print(f&quot;⌛ Loading Anomalies: Isolated Devices and IP Addresses with no Customers took {elapsed:.2f} seconds&quot;)<br><br># Dense watchlist cluster: shared address &amp; payment method<br>shared_addr = f&quot;ADDR_{addr_counter+1:05d}&quot;<br>shared_pm   = f&quot;PM_{pm_counter+1:05d}&quot;<br>dense_addr_rows = [{&quot;cust&quot;: cust, &quot;addr&quot;: shared_addr}<br>                   for cust in anoms[4*chunk : ]]<br>dense_pm_rows   = [{&quot;cust&quot;: cust, &quot;pm&quot;:   shared_pm}<br>                   for cust in anoms[4*chunk : ]]<br><br># 3. 
Create the two shared nodes up front<br>with get_session() as sess:<br>    sess.run(<br>        &quot;MERGE (a:Address {id:$addr}) SET a.city=&#39;London&#39;, a.name=$addr&quot;,<br>        addr=shared_addr<br>    )<br>    sess.run(<br>        &quot;MERGE (p:Payment_Method {id:$pm}) SET p.pm_type=&#39;Credit_Card&#39;, p.name=$pm&quot;,<br>        pm=shared_pm<br>    )<br>with get_session() as sess:<br>    start_time = time.perf_counter()<br>    # 4.5 Dense cluster – shared address<br>    sess.run(<br>        &quot;&quot;&quot;<br>        UNWIND $rows AS row<br>        CALL (row) {<br>          MATCH (c:Customer {id: row.cust}), (a:Address {id: row.addr})<br>          MERGE (c)-[:LIVES_AT]-&gt;(a)<br>        } IN TRANSACTIONS OF $batch_size ROWS<br>        &quot;&quot;&quot;,<br>        rows=dense_addr_rows, batch_size=50<br>    )<br>    # 4.5 Dense cluster – shared payment method + watchlist flag<br>    sess.run(<br>        &quot;&quot;&quot;<br>        UNWIND $rows AS row<br>        CALL (row) {<br>          MATCH (c:Customer {id: row.cust}), (p:Payment_Method {id: row.pm})<br>          MERGE (c)-[:HAS_METHOD]-&gt;(p)<br>          SET c.on_watchlist = true<br>        } IN TRANSACTIONS OF $batch_size ROWS<br>        &quot;&quot;&quot;,<br>        rows=dense_pm_rows, batch_size=50<br>    )<br>    end_time = time.perf_counter()<br>    elapsed = end_time - start_time<br>    print(f&quot;⌛ Loading Anomalies: Dense clusters - Around shared address &amp; payment method took {elapsed:.2f} seconds&quot;)</pre><p>You can now navigate to the Neo4j Aura Console and click on the “Query” section. 
Connect to your current database instance, then execute the following query to view the schema:</p><pre>CALL db.schema.visualization();</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*DiAqN7UVbvclSWqOQW9fvw.png" /></figure><h4>Creating schemas</h4><p>Next, we will define Pydantic schemas for the core entities in our graph model (Customer, Account, Transaction, etc.) and for the inputs and outputs of the agent tools.</p><pre># schemas.py<br>from pydantic import BaseModel<br>from typing import Any, Dict, List, Optional<br><br># Tool 1: Get Customer and Accounts<br>class CustomerAccountsInput(BaseModel):<br>    customer_id: str<br><br>class TransactionModel(BaseModel):<br>    id: Optional[str] = None<br>    amount: Optional[float] = None<br>    timestamp: Optional[str] = None<br><br>class AccountModel(BaseModel):<br>    id: Optional[str] = None<br>    name: Optional[str] = None<br>    transactions: List[TransactionModel] = []<br><br>class CustomerModel(BaseModel):<br>    id: Optional[str] = None<br>    name: Optional[str] = None<br>    on_watchlist: Optional[bool] = False<br>    is_pep: Optional[bool] = False<br><br>class CustomerAccountsOutput(BaseModel):<br>    customer: CustomerModel<br>    accounts: List[AccountModel]<br><br># Tool 2: Identify watchlisted customers in suspicious rings<br>class RingModel(BaseModel):<br>    ring_path: List[Dict[str, Any]]  # List of node dicts<br>    watched_customers: List[Dict[str, Any]]  # List of customer dicts<br>    watch_relationships: List[Dict[str, Any]]  # List of relationship dicts<br><br>class CustomerRingsInput(BaseModel):<br>    max_number_rings: int = 10<br>    customer_in_watchlist: Optional[bool] = True<br>    customer_is_pep: Optional[bool] = False<br><br>class CustomerRingsOutput(BaseModel):<br>    customer_rings: List[RingModel]<br><br><br>class GenerateCypherRequest(BaseModel):<br>    question: str<br>    database_schema: str</pre><h4>Building the KYC agent</h4><p>Let’s begin with the agent creation process.</p><p>Create a file named 
agent.py and add the following code to it.</p><h4>Importing required libraries</h4><pre>import os<br>from agents import Agent, Runner, function_tool<br>from agents.mcp import MCPServerStdio<br>from neo4j import GraphDatabase<br>from schemas import CustomerAccountsInput, CustomerAccountsOutput, CustomerModel, AccountModel, TransactionModel, GenerateCypherRequest<br>import asyncio<br>from ollama import chat<br>from dotenv import load_dotenv<br>import logging<br><br># Configure logging<br>logging.basicConfig(<br>    level=logging.INFO,<br>    format=&#39;%(asctime)s - %(name)s - %(levelname)s - %(message)s&#39;<br>)<br>logging.getLogger(&quot;httpx&quot;).setLevel(logging.ERROR)<br>logger = logging.getLogger(&quot;KYC_AGENT&quot;)<br><br># Load environment variables<br>load_dotenv()</pre><h4>Setting up the Neo4j connection</h4><pre># Read Neo4j environment variables into variables<br>NEO4J_URI = os.getenv(&quot;NEO4J_URI&quot;, &quot;bolt://localhost:7687&quot;)<br>NEO4J_USER = os.getenv(&quot;NEO4J_USERNAME&quot;, &quot;neo4j&quot;)<br>NEO4J_PASSWORD = os.getenv(&quot;NEO4J_PASSWORD&quot;, &quot;password&quot;)<br>NEO4J_DATABASE = os.getenv(&quot;NEO4J_DATABASE&quot;, &quot;neo4j&quot;)<br><br># Neo4j connection setup<br>def get_neo4j_driver():<br>    return GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))<br><br># Neo4j driver<br>driver = get_neo4j_driver()</pre><h4>Building tools</h4><p>An agent’s effectiveness depends on its tools. In this case, we provide five key tools to the KYC agent, including optimized Cypher queries wrapped in Python functions using the @function_tool decorator from the OpenAI Agents SDK.</p><h4><strong>Tool 1: Get customer and accounts</strong></h4><p>This tool retrieves a customer’s profile, including their accounts and recent transactions, which is an essential part of any investigation. 
It uses a function that takes a customer ID and runs a simple Cypher query.</p><pre>@function_tool<br>def get_customer_and_accounts(input: CustomerAccountsInput, tx_limit: int = 5) -&gt; CustomerAccountsOutput:<br>    logger.info(f&quot;TOOL: GET_CUSTOMER_AND_ACCOUNTS - {input.customer_id}&quot;)<br>    with driver.session(database=NEO4J_DATABASE) as session:<br>        result = session.run(<br>            &quot;&quot;&quot;<br>            MATCH (c:Customer {id: $customer_id})-[o:OWNS]-&gt;(a:Account)<br>            WITH c, a<br>            CALL (c,a) {<br>                // Undirected match: outgoing transactions hang off FROM, incoming ones reach the account via TO<br>                MATCH (a)-[b:TO|FROM]-(t:Transaction)<br>                WITH DISTINCT t<br>                ORDER BY t.timestamp DESC<br>                LIMIT $tx_limit<br>                RETURN collect(t) as transactions<br>            }<br>            RETURN c as customer, a as account, transactions<br>            &quot;&quot;&quot;,<br>            customer_id=input.customer_id,<br>            tx_limit=tx_limit<br>        )<br>        # Get the records from the result<br>        records = result.data()<br>        # Collect the customer and each account with its transactions<br>        customer = {}  # stays empty if the customer is unknown or owns no accounts<br>        accounts = []<br>        for record in records:<br>            customer = dict(record[&quot;customer&quot;])<br>            account = dict(record[&quot;account&quot;])<br>            account[&quot;transactions&quot;] = [dict(t) for t in record[&quot;transactions&quot;]]<br>            accounts.append(account)<br><br>        return CustomerAccountsOutput(<br>            customer=CustomerModel(**customer),<br>            accounts=[AccountModel(**a) for a in accounts]<br>        )</pre><h4><strong>Tool 2: Find customer rings</strong></h4><p>This tool detects circular transaction patterns (commonly linked to money laundering) by identifying cycles in the KYC graph where funds return to their origin. 
It uses a find_customer_rings function to run a Cypher query that returns up to 10 potential rings by default, including the involved customers, accounts, and transactions.</p><pre>@function_tool <br>def find_customer_rings(max_number_rings: int = 10, customer_in_watchlist: bool = True, customer_is_pep: bool = False, customer_id: str = None):<br>    logger.info(f&quot;TOOL: FIND_CUSTOMER_RINGS - {max_number_rings} - {customer_in_watchlist} - {customer_is_pep}&quot;)<br>    with driver.session(database=NEO4J_DATABASE) as session:<br>        result = session.run(<br>            &quot;&quot;&quot;<br>            MATCH p=(a:Account)-[:FROM|TO*6]-&gt;(a:Account)<br>            WITH p, [n IN nodes(p) WHERE n:Account] AS accounts<br>            UNWIND accounts AS acct<br>            MATCH (cust:Customer)-[r:OWNS]-&gt;(acct)<br>            WHERE cust.on_watchlist = $customer_in_watchlist AND cust.is_pep = $customer_is_pep<br>            WITH <br>              p, <br>              COLLECT(DISTINCT cust)   AS watchedCustomers,<br>              COLLECT(DISTINCT r)      AS watchRels<br>            RETURN <br>              p, <br>              watchedCustomers,<br>              watchRels<br>            LIMIT $max_number_rings<br>            &quot;&quot;&quot;,<br>            max_number_rings=max_number_rings,<br>            customer_in_watchlist=customer_in_watchlist,<br>            customer_is_pep=customer_is_pep<br>        )<br>        rings = []<br>        for record in result:<br>            # Convert path to a list of node dictionaries for easier consumption<br>            path_nodes = [dict(node) for node in record[&quot;p&quot;].nodes]<br>            watched_customers = [dict(cust) for cust in record[&quot;watchedCustomers&quot;]]<br>            watch_rels = [dict(rel) for rel in record[&quot;watchRels&quot;]]<br>            rings.append({<br>                &quot;ring_path&quot;: path_nodes,<br>                &quot;watched_customers&quot;: watched_customers,<br>                &quot;watch_relationships&quot;: watch_rels,<br>            })<br>   
     <br>        return {&quot;customer_rings&quot;: rings}</pre><h4><strong>Tool 3: Neo4j MCP server toolset</strong></h4><p>This section outlines a common architecture for enabling agents to interact with a knowledge graph. It combines a Text-to-Cypher model (Gemma 3 4B) with the Neo4j MCP Server to translate natural language into Cypher and execute dynamic queries. Key tools include get-neo4j-schema, read-neo4j-cypher, and write-neo4j-cypher, which let the agent inspect the graph structure and perform read/write operations.</p><pre>neo4j_mcp_server = MCPServerStdio(<br>    params={<br>        &quot;command&quot;: &quot;uvx&quot;,<br>        &quot;args&quot;: [&quot;mcp-neo4j-cypher@0.2.1&quot;],<br>        &quot;env&quot;: {<br>            &quot;NEO4J_URI&quot;: NEO4J_URI,<br>            &quot;NEO4J_USERNAME&quot;: NEO4J_USER,<br>            &quot;NEO4J_PASSWORD&quot;: NEO4J_PASSWORD,<br>            &quot;NEO4J_DATABASE&quot;: NEO4J_DATABASE,<br>        },<br>    },<br>    cache_tools_list=True,<br>    name=&quot;Neo4j MCP Server&quot;,<br>    client_session_timeout_seconds=20<br>)</pre><h4><strong>Tool 4: Generate Cypher</strong></h4><p>Translating natural language into Cypher queries relies on schema-aware LLMs fine-tuned for this task. Open-source models like neo4j/text-to-cypher-Gemma-3-4B-Instruct-2025.04.0 from Hugging Face enable accurate query generation. For example, given a question about shared addresses, the agent can generate a suitable Cypher query on the fly instead of relying on a predefined tool.</p><pre>@function_tool<br>def generate_cypher(request: GenerateCypherRequest) -&gt; str:<br>    USER_INSTRUCTION = &quot;&quot;&quot;Generate a Cypher query for the Question below:<br>    Use the information about the nodes, relationships, and properties from the Schema section below to generate the best possible Cypher query. 
<br>    Return only the Cypher query as your final output, without any additional text or explanation.<br>    ####Schema:<br>    {schema}<br>    ####Question:<br>    {question}&quot;&quot;&quot;<br><br>    logger.info(f&quot;TOOL: GENERATE_CYPHER - INPUT - {request.question}&quot;)<br>    user_message = USER_INSTRUCTION.format(<br>        schema=request.database_schema, <br>        question=request.question<br>    )<br>    # Generate Cypher query using the text2cypher model<br>    model: str = &quot;ed-neo4j/t2c-gemma3-4b-it-q8_0-35k&quot;<br>    response = chat(<br>        model=model,<br>        messages=[{&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: user_message}]<br>    )<br>    generated_cypher = response[&#39;message&#39;][&#39;content&#39;]<br>    # Replace \n with new line<br>    generated_cypher = generated_cypher.replace(&quot;\\n&quot;, &quot;\n&quot;)<br><br>    print(f&quot;GENERATED CYPHER: - OUTPUT - {generated_cypher}&quot;)<br>    <br>    return generated_cypher</pre><h4><strong>Tool 5: Create memory</strong></h4><p>While agents handle short-term memory through conversation history, complex tasks like financial investigations require persistent long-term memory. This memory acts as a dynamic knowledge base, tracking insights and context across sessions. 
The create_memory tool enables this by storing investigation summaries as nodes linked to relevant entities in the knowledge graph.</p><pre>@function_tool<br>def create_memory(content: str, customer_ids: list[str] = [], account_ids: list[str] = [], transaction_ids: list[str] = []) -&gt; str:<br>    logger.info(f&quot;TOOL: CREATE_MEMORY - {content} - {customer_ids} - {account_ids} - {transaction_ids}&quot;)<br>    with driver.session() as session:<br>        result = session.run(<br>            &quot;&quot;&quot;<br>            CREATE (m:Memory {content: $content, created_at: datetime()})<br>            WITH m<br>            UNWIND $customer_ids as cid<br>            MATCH (c:Customer {id: cid})<br>            MERGE (m)-[:FOR_CUSTOMER]-&gt;(c)<br>            WITH m<br>            UNWIND $account_ids as aid<br>            MATCH (a:Account {id: aid})<br>            MERGE (m)-[:FOR_ACCOUNT]-&gt;(a)<br>            WITH m<br>            UNWIND $transaction_ids as tid<br>            MATCH (t:Transaction {id: tid})<br>            MERGE (m)-[:FOR_TRANSACTION]-&gt;(t)<br>            RETURN m.content as content<br>            &quot;&quot;&quot;,<br>            content=content,<br>            customer_ids=customer_ids,<br>            account_ids=account_ids,<br>            transaction_ids=transaction_ids<br>        )<br>       <br>        return f&quot;Created memory: {str(result)}&quot;</pre><h4>Main function</h4><p>The main() function sets up and runs an interactive KYC agent using the Neo4j MCP server and OpenAI&#39;s Agent SDK. It connects to the MCP server, defines the agent&#39;s instructions and tools, and maintains a conversation history for context. In a loop, it accepts user queries, passes them to the agent for processing, and displays the results. If needed, the agent dynamically generates and executes Cypher queries. 
The function also ensures proper cleanup of resources when the session ends.</p><pre>async def main():<br>    await neo4j_mcp_server.connect()  # Connect the MCP server before using it<br><br>    # Define the instructions for the agent<br>    instructions = &quot;&quot;&quot;You are a KYC analyst with access to a knowledge graph. Use the tools to answer questions about customers, accounts, and suspicious patterns.<br>    You are also a Neo4j expert and can use the Neo4j MCP server to query the graph.<br>    If you get a question about the KYC database that you can not answer with GraphRAG tools, you should<br>    - use the Neo4j MCP server to get the schema of the graph (if needed)<br>    - use the generate_cypher tool to generate a Cypher query from question and the schema<br>    - use the Neo4j MCP server to query the graph to answer the question<br>    &quot;&quot;&quot;<br><br>    kyc_agent = Agent(<br>        name=&quot;KYC Analyst&quot;,<br>        instructions=instructions,<br>        tools=[get_customer_and_accounts, find_customer_rings, create_memory, generate_cypher],<br>        mcp_servers=[neo4j_mcp_server]<br>    )<br>    <br>    # Initialize conversation history<br>    conversation_history = []<br>    <br>    while True:<br>        query = input(&quot;Enter your KYC query (or &#39;quit&#39; to exit): &quot;)<br>        if query.lower() == &#39;quit&#39;:<br>            break<br>            <br>        # Run the agent with conversation history<br>        result = await Runner.run(<br>            kyc_agent, <br>            conversation_history + [{&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: query}]<br>        )<br>        <br>        # Add the new interaction to conversation history<br>        conversation_history.extend([<br>            {&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: query},<br>            {&quot;role&quot;: &quot;assistant&quot;, &quot;content&quot;: result.final_output}<br>        ])<br>        <br>        
print(result.final_output)<br><br>    # Clean up<br>    await neo4j_mcp_server.cleanup()<br><br>if __name__ == &quot;__main__&quot;:<br>    try:<br>        asyncio.run(main())<br>    finally:<br>        # Clean up any remaining resources<br>        driver.close() </pre><h4><strong>Agent reasoning and execution flow</strong></h4><p>When the agent receives a user query, it follows a structured decision-making process to determine the appropriate tools to use:</p><ol><li><strong>Schema Discovery</strong><br><strong>Example Query:</strong> <em>“Get me the schema of the database.”</em><br><strong>Action:</strong> The agent recognizes a schema request and uses the neo4j-mcp-server.get-neo4j-schema tool to retrieve the graph schema.</li><li><strong>Use of Custom GraphRAG Tool</strong><br><strong>Example Query:</strong> <em>“Show me 5 watchlisted customers involved in suspicious transaction rings.”</em><br><strong>Action:</strong> This aligns with a predefined tool. The agent calls find_customer_rings with the parameter customer_in_watchlist=True.</li><li><strong>Dynamic Cypher Generation &amp; Execution</strong><br><strong>Example Query:</strong> <em>“For each of these customers, find their addresses and check if they’re shared with others.”</em><br><strong>Action:</strong> Since no GraphRAG tool directly addresses this, the agent:<br>Uses the previously fetched schema.<br>Passes the question and schema to generate_cypher to generate a Cypher query.<br>Executes the query via neo4j-mcp-server.read-neo4j-cypher.</li><li><strong>Entity-Specific Data Retrieval</strong><br><strong>Example Query:</strong> <em>“Get details for the customer whose address is shared.”</em><br><strong>Action:</strong> The agent identifies this as a profile lookup and calls get_customer_and_accounts with the customer ID.</li><li><strong>Memory Creation for Long-Term Context</strong><br><strong>Example Query:</strong> <em>“Write a 300-word summary of this investigation and link it to all related 
accounts and transactions.”</em><br><strong>Action:</strong> The agent generates the summary using its internal LLM and then stores it using create_memory, linking it to the relevant entities.</li></ol><h4>Executing the agent</h4><p>The entire workflow is now fully integrated and operational. We can proceed to execute the agent and test it using the sample queries outlined above to observe its behavior and capabilities in handling real-world KYC scenarios.</p><pre>python agent.py</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*QsrzTcCJT-WVIco-BnXvQw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*fiMT3hIvKIKP5UM0kEoBtw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*E2q2NsurWeOapvGnNp7FYg.png" /></figure><p><em>Thanks for reading this article !!</em></p><p><em>Thanks Gowri M Bhatt for reviewing the content.</em></p><p>If you enjoyed this article, please click on the clap button 👏 and share to help others find it!</p><p>The full source code for this tutorial can be found here,</p><p><a href="https://github.com/codemaker2015/kyc-agent">GitHub - codemaker2015/kyc-agent: A smart, tool‑augmented AI agent for Know Your Customer (KYC) investigations.</a></p><h4>References</h4><ul><li><a href="https://youtu.be/bpiphxrYn9I?si=Ie3nJ0e51DhLBTH2">https://youtu.be/bpiphxrYn9I</a></li><li><a href="https://towardsdatascience.com/graphrag-in-action/">GraphRAG in Action: A Simple Agent for Know-Your-Customer Investigations | Towards Data Science</a></li><li><a href="https://medium.com/@codemaker2016/the-ultimate-mcp-handbook-from-basics-to-advanced-llm-integration-00872d1df1e4">The Ultimate MCP Handbook: From Basics to Advanced LLM Integration | Medium</a></li><li><a href="https://openai.github.io/openai-agents-python/">OpenAI Agents SDK</a></li><li><a href="https://neo4j.com/developer/genai-ecosystem/">GraphRAG Developer Guide — Developer Guides</a></li><li><a 
href="https://neo4j.com/blog/news/text2cypher-vertex-ai/">Google Cloud &amp; Neo4j: Teaming Up at the Intersection of Knowledge Graphs, Agents, MCP, and Natural Language Interfaces</a></li></ul>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[From Pretrained to Purposeful: Fine-Tuning LLaMA 3.2 Made Easy with Unsloth]]></title>
            <link>https://codemaker2016.medium.com/from-pretrained-to-purposeful-fine-tuning-llama-3-2-made-easy-with-unsloth-54f2d2530e8c?source=rss-b7a60e472dfc------2</link>
            <guid isPermaLink="false">https://medium.com/p/54f2d2530e8c</guid>
            <category><![CDATA[large-language-models]]></category>
            <category><![CDATA[fine-tuning]]></category>
            <category><![CDATA[google-colab]]></category>
            <category><![CDATA[llama-3]]></category>
            <category><![CDATA[unsloth]]></category>
            <dc:creator><![CDATA[Vishnu Sivan]]></dc:creator>
            <pubDate>Mon, 23 Jun 2025 15:25:05 GMT</pubDate>
            <atom:updated>2025-06-24T04:36:20.239Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*K_RPxxbgGsrKJofAhpSgJg.png" /></figure><p>Large language models like LLaMA and Mistral have accelerated open-source innovation, but their size often makes them difficult to fine-tune or deploy on everyday hardware. To address this, smaller models such as TinyLlama-1B, Microsoft Phi-2, and Alibaba Qwen-3B have emerged — offering strong performance in a much smaller footprint.</p><p>Still, fine-tuning is essential to adapt any base model to specific tasks like support chat, summarization, or domain-specific Q&amp;A. Traditionally, this requires high memory and compute, putting it out of reach for many developers.</p><p>Unsloth solves this challenge by enabling efficient fine-tuning on modest hardware. It uses LoRA (Low-Rank Adaptation) to reduce the number of trainable parameters, lowering memory consumption significantly. Key to Unsloth’s performance is the integration of BitsAndBytes, a library that enables 4-bit and 8-bit quantization. 
Combined with LoRA, it drastically reduces memory usage and accelerates training, making it possible to fine-tune models on GPUs with limited VRAM.</p><p>In this guide, we’ll fine-tune LLaMA 3.2 (3B) using Maxime Labonne’s FineTome-100k dataset in ShareGPT format, demonstrating a practical and efficient setup for real-world fine-tuning without the need for expensive infrastructure.</p><h3>Getting Started</h3><h4>Table of contents</h4><ul><li><a href="#9cbb">What is LLM fine-tuning</a></li><li><a href="#f672">What is Unsloth</a></li><li><a href="#84b3">Getting Started with Fine-Tuning LLaMA 3.2</a></li><li><a href="#7975">Setting up the environment</a></li><li><a href="#4051">Create and Configure Your Colab Notebook</a></li><li><a href="#4261">Add Hugging Face Access Token (Optional but Recommended)</a></li><li><a href="#858f">Install Unsloth and Required Dependencies</a></li><li><a href="#f4aa">Loading the Model and Tokenizer</a></li><li><a href="#3dcf">Applying LoRA Adapters for Efficient Fine-Tuning</a></li><li><a href="#5003">Preparing the Training Dataset</a></li><li><a href="#9354">Standardizing the Dataset Format</a></li><li><a href="#1c64">Loading the dataset</a></li><li><a href="#b26b">Formatting Prompts</a></li><li><a href="#328a">Setting Up and Configuring the Trainer</a></li><li><a href="#cb6a">Training Only on Assistant Responses</a></li><li><a href="#d620">Inference — Generating responses</a></li><li><a href="#a970">Saving and Loading the Fine-Tuned Model</a></li><li><a href="#e9dc">Load the LoRA Adapters for Inference</a></li></ul><h4>What is LLM Fine-Tuning</h4><p>Fine-tuning is the process of adapting a pre-trained large language model (LLM) to perform better on a specific task or domain. While pre-trained models are trained on massive amounts of general-purpose data, they often fall short in specialized use cases. 
Fine-tuning bridges this gap by training the model further on curated, domain-specific datasets.</p><p>For instance, while a base LLM might perform well on single-turn question-answering tasks, it may struggle with multi-turn conversations typically expected from chatbot systems. To handle such scenarios, the model needs exposure to dialogue-format datasets — something achieved through fine-tuning.</p><p>Fine-tuning allows developers to mold general LLMs into custom “avatars” suited for various tasks such as legal document summarization, healthcare Q&amp;A, or multilingual support. The effectiveness of fine-tuning largely depends on the quality of the dataset, the capabilities of the base model, and the method of fine-tuning used.</p><h4><strong>Common Fine-Tuning Techniques</strong></h4><ul><li><strong>Full Fine-Tuning: </strong>This traditional method updates all the parameters of the model. While effective, it requires significant computational resources and memory, making it less feasible for large models or limited hardware setups.</li><li><strong>LoRA (Low-Rank Adaptation): </strong>LoRA introduces small trainable matrices (adapters) into the model and only updates them, freezing the rest of the model weights. This reduces compute requirements and speeds up training — ideal for fine-tuning large models on consumer GPUs.</li><li><strong>QLoRA (Quantized LoRA):</strong> QLoRA goes a step further by applying LoRA to a quantized version of the model. The model weights are first reduced to 4-bit or 8-bit precision using libraries like BitsAndBytes, drastically lowering memory consumption while retaining near-original performance.</li><li><strong>Adapter Tuning:</strong> Adapter tuning inserts additional layers (adapters) into the network without modifying the original model weights. 
Like LoRA, it allows task-specific tuning with low resource usage and easy parameter sharing.</li><li><strong>Prompt-Tuning / Prefix-Tuning: </strong>Instead of updating the base model’s weights, this method learns a small set of prompt or prefix embeddings that condition the model to perform a specific task. It’s lightweight and especially useful when storage or compute resources are constrained.</li></ul><h4>What is Unsloth</h4><p>Unsloth is an open-source framework purpose-built for fast and efficient fine-tuning of large language models (LLMs). It provides an optimized training backend that improves training speed and memory efficiency, making fine-tuning possible even on limited hardware.</p><p>At its core, Unsloth integrates custom Triton kernels and a manual backpropagation engine to accelerate training. This results in speedups of up to 2x over traditional fine-tuning pipelines without compromising model quality.</p><p>Unsloth’s compatibility with QLoRA and BitsAndBytes further enhances its resource efficiency, making it a practical choice for developers looking to fine-tune LLMs quickly and affordably.</p><p>Unsloth supports a wide range of popular models, including the latest LLaMA 3.2, Mistral, Phi, and Gemma variants. 
Most of these models are available in 4-bit quantized format (bnb-4bit), making them ideal for fine-tuning on consumer GPUs with limited VRAM.</p><p><strong>Currently Supported Models (mostly 4-bit):</strong></p><ul><li><strong>LLaMA 3.1, 3.2 &amp; 3.3</strong>: Meta-Llama-3.1-8B-bnb-4bit, Meta-Llama-3.1-8B-Instruct-bnb-4bit, Meta-Llama-3.1-70B-bnb-4bit, Meta-Llama-3.1-405B-bnb-4bit, Llama-3.2-1B-bnb-4bit, Llama-3.2-1B-Instruct-bnb-4bit, Llama-3.2-3B-bnb-4bit, Llama-3.2-3B-Instruct-bnb-4bit, Llama-3.3-70B-Instruct-bnb-4bit</li><li><strong>Mistral</strong>: Mistral-Small-Instruct-2409, mistral-7b-instruct-v0.3-bnb-4bit</li><li><strong>Phi</strong>: Phi-3.5-mini-instruct, Phi-3-medium-4k-instruct</li><li><strong>Gemma</strong>: gemma-2-9b-bnb-4bit, gemma-2-27b-bnb-4bit</li></ul><h3>Getting Started with Fine-Tuning LLaMA 3.2</h3><p>Fine-tuning large language models, even smaller variants, is a compute-intensive task. It typically requires a machine with at least 10–15 GB of VRAM. Fortunately, free cloud platforms like Google Colab and Kaggle Notebooks offer accessible environments equipped with GPUs, which makes them ideal for getting started without local setup.</p><p>For this hands-on guide, we will be using Google Colab with a T4 GPU.</p><h4>Setting up the environment</h4><p><strong>Step 1: Create and Configure Your Colab Notebook</strong></p><ol><li>Open <a href="https://colab.research.google.com/">Google Colaboratory</a> and sign in with your Google account.</li><li>Create a new notebook by clicking on + New Notebook.</li><li>Navigate to Runtime → Change runtime type.</li></ol><ul><li>Set Hardware Accelerator to GPU.</li><li>Choose T4 GPU (recommended for this tutorial).</li><li>Click Save.</li></ul><blockquote><em>💡 </em>Tip: You can also run this setup on Kaggle by enabling GPU under “Accelerator” in notebook settings.</blockquote><p><strong>Step 2: Add Hugging Face Access Token (Optional but Recommended)</strong></p><p>If you’re pulling models from Hugging Face, you’ll need an access 
token:</p><ol><li>In the left sidebar, select the 🔑 Secrets tab.</li><li>Add a new secret:</li></ol><ul><li>Key: HF_TOKEN</li><li>Value: <em>Your Hugging Face access token</em></li></ul><p>3. If you don’t have a token yet, generate one from your <a href="https://huggingface.co/settings/tokens">Hugging Face profile settings</a>: Create new token → select write → provide a token name → click Create token. Write access is only needed if you plan to push the fine-tuned model to the Hub later; a read token is enough for downloading models.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1012/1*z_8qDaCrJ2V2oSDEcMII2w.png" /></figure><p><strong>Step 3: Install Unsloth and Required Dependencies</strong></p><p>Run the following commands in a notebook cell to install the necessary packages:</p><pre>!pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl triton cut_cross_entropy unsloth_zoo<br>!pip install sentencepiece protobuf &quot;datasets&gt;=3.4.1&quot; huggingface_hub hf_transfer<br>!pip install --no-deps unsloth</pre><h4>Loading the Model and Tokenizer</h4><p>Load the LLaMA 3.2 model using Unsloth’s optimized loading utilities. For this tutorial, we will be working with the Llama-3.2-3B-Instruct-bnb-4bit variant, which is quantized for efficient fine-tuning on limited hardware.</p><pre>from unsloth import FastLanguageModel<br>import torch<br>max_seq_length = 2048<br>dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+<br>load_in_4bit = True # Use 4bit quantization to reduce memory usage. 
Can be False.<br><br>model, tokenizer = FastLanguageModel.from_pretrained(<br>    model_name = &quot;unsloth/Llama-3.2-3B-Instruct-bnb-4bit&quot;,<br>    max_seq_length = max_seq_length,<br>    dtype = dtype,<br>    load_in_4bit = load_in_4bit,<br>)</pre><blockquote>💡 Tip: If you’d like to fine-tune a different model, simply update the model_name variable with the desired model&#39;s name from Unsloth’s supported list.</blockquote><h4>Applying LoRA Adapters for Efficient Fine-Tuning</h4><p>Low-Rank Adaptation (LoRA) enables efficient fine-tuning by updating only a small subset of the model’s parameters. This significantly reduces memory usage and accelerates training, making it ideal for resource-constrained environments.</p><pre>model = FastLanguageModel.get_peft_model(<br>    model,<br>    r = 16, # Choose any number &gt; 0 ! Suggested 8, 16, 32, 64, 128<br>    target_modules = [&quot;q_proj&quot;, &quot;k_proj&quot;, &quot;v_proj&quot;, &quot;o_proj&quot;,<br>                      &quot;gate_proj&quot;, &quot;up_proj&quot;, &quot;down_proj&quot;,],<br>    lora_alpha = 16,<br>    lora_dropout = 0, # Supports any, but = 0 is optimized<br>    bias = &quot;none&quot;,    # Supports any, but = &quot;none&quot; is optimized<br>    # [NEW] &quot;unsloth&quot; uses 30% less VRAM, fits 2x larger batch sizes!<br>    use_gradient_checkpointing = &quot;unsloth&quot;, # True or &quot;unsloth&quot; for very long context<br>    random_state = 3407,<br>    use_rslora = False,  # Unsloth support rank stabilized LoRA<br>    loftq_config = None, # And LoftQ<br>)</pre><h4>Preparing the Training Dataset</h4><p>Before starting the training process, we need to load and preprocess the dataset. In this guide, we will use Maxime Labonne’s FineTome-100k, a high-quality dataset formatted in ShareGPT-style multi-turn conversations.</p><p>You are free to use any dataset, but it must be structured correctly for the model to interpret the inputs properly. 
If your dataset isn’t already in the expected format, you will need to preprocess it accordingly. The Hugging Face Datasets documentation is a helpful resource for transforming and preparing datasets for fine-tuning.</p><h4><strong>Standardizing the Dataset Format</strong></h4><p>For LLaMA 3.x models, Unsloth expects the rendered conversations to follow a specific chat format, similar to:</p><pre>&lt;|begin_of_text|&gt;&lt;|start_header_id|&gt;user&lt;|end_header_id|&gt;<br>Hello!&lt;|eot_id|&gt;&lt;|start_header_id|&gt;assistant&lt;|end_header_id|&gt;<br>Hey there! How are you?&lt;|eot_id|&gt;</pre><p>To get there, we use the standardize_sharegpt utility to convert the ShareGPT-style records into Hugging Face’s generic multi-turn conversation format, which uses the fields &quot;role&quot; and &quot;content&quot; instead of &quot;from&quot; and &quot;value&quot;.</p><p>For example, these original ShareGPT records:</p><pre>{&quot;from&quot;: &quot;system&quot;, &quot;value&quot;: &quot;You are an assistant&quot;}<br>{&quot;from&quot;: &quot;human&quot;, &quot;value&quot;: &quot;What is 2+2?&quot;}<br>{&quot;from&quot;: &quot;gpt&quot;, &quot;value&quot;: &quot;It&#39;s 4.&quot;}</pre><p>are transformed into the standardized format:</p><pre>{&quot;role&quot;: &quot;system&quot;, &quot;content&quot;: &quot;You are an assistant&quot;}<br>{&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: &quot;What is 2+2?&quot;}<br>{&quot;role&quot;: &quot;assistant&quot;, &quot;content&quot;: &quot;It&#39;s 4.&quot;}</pre><p>This standardized structure ensures compatibility with Unsloth’s get_chat_template() function and avoids tokenization or formatting issues during training. 
It’s a crucial preprocessing step for models that expect role-based dialogue formatting.</p><h4>Loading the dataset</h4><p>To begin fine-tuning, we first need to load the dataset into our environment. In this tutorial, we use the <a href="https://huggingface.co/datasets/mlabonne/FineTome-100k">FineTome-100k</a> dataset curated by Maxime Labonne, which contains high-quality multi-turn conversations in ShareGPT format.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*_5u1uhFzMLJsm-UyYRIgaQ.png" /></figure><pre>from datasets import load_dataset<br>dataset = load_dataset(&quot;mlabonne/FineTome-100k&quot;, split = &quot;train&quot;)</pre><h4>Formatting Prompts</h4><p>After preparing the dataset, the next step is to structure the data using the appropriate chat format expected by the model. In this case, we apply the LLaMA 3.1 chat template using Unsloth’s get_chat_template() function. This function configures the tokenizer to format prompts in the LLaMA-style conversational structure, ensuring the model can effectively process and learn from multi-turn dialogues during fine-tuning.</p><pre>from unsloth.chat_templates import get_chat_template<br>from unsloth.chat_templates import standardize_sharegpt<br><br>tokenizer = get_chat_template(<br>    tokenizer,<br>    chat_template = &quot;llama-3.1&quot;,<br>)<br><br>def formatting_prompts_func(examples):<br>    convos = examples[&quot;conversations&quot;]<br>    texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False) for convo in convos]<br>    return { &quot;text&quot; : texts, }<br><br>dataset = standardize_sharegpt(dataset)<br>dataset = dataset.map(formatting_prompts_func, batched = True,)</pre><p>To verify that the dataset is correctly structured for fine-tuning with the LLaMA 3.1 format, it’s useful to inspect both the original conversation format and the formatted text version.</p><pre># View the original conversation 
format<br>print(dataset[5][&quot;conversations&quot;])<br># View the same item in the formatted text format<br>print(dataset[5][&quot;text&quot;])</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*quqfem1qNniawucugFUOUA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*644tkAFnmJugsKY1g5tSoA.png" /></figure><h4>Setting Up and Configuring the Trainer</h4><p>With the dataset and model prepared, the next step is to configure the fine-tuning process using SFTTrainer from Hugging Face’s TRL library. This trainer simplifies fine-tuning by handling essential tasks such as tokenization, batching, gradient accumulation, and optimization. It is fully compatible with Unsloth, enabling efficient training with reduced VRAM consumption and improved speed.</p><p>For this tutorial, the training is limited to 60 steps for demonstration purposes. With a per-device batch size of 2 and 4 gradient accumulation steps, that covers 2 × 4 × 60 = 480 examples on a single GPU. For a complete fine-tuning run, set num_train_epochs=1 and remove the max_steps argument (its default of -1 means no step limit) to train over the entire dataset.</p><pre>from trl import SFTTrainer<br>from transformers import TrainingArguments, DataCollatorForSeq2Seq<br>from unsloth import is_bfloat16_supported<br><br>trainer = SFTTrainer(<br>    model = model,<br>    tokenizer = tokenizer,<br>    train_dataset = dataset,<br>    dataset_text_field = &quot;text&quot;,<br>    max_seq_length = max_seq_length,<br>    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),<br>    dataset_num_proc = 2,<br>    packing = False, # Can make training 5x faster for short sequences.<br>    args = TrainingArguments(<br>        per_device_train_batch_size = 2,<br>        gradient_accumulation_steps = 4,<br>        warmup_steps = 5,<br>        # num_train_epochs = 1, # Set this for 1 full training run.<br>        max_steps = 60, # Limit training steps to 60 (for quick testing)<br>        learning_rate = 2e-4,<br>        fp16 = not is_bfloat16_supported(),<br>        bf16 = is_bfloat16_supported(),<br>        logging_steps = 1,<br>        optim = 
&quot;adamw_8bit&quot;,<br>        weight_decay = 0.01,<br>        lr_scheduler_type = &quot;linear&quot;,<br>        seed = 3407,<br>        output_dir = &quot;outputs&quot;, # Directory to save model checkpoints<br>        report_to = &quot;none&quot;, # Use this for WandB etc<br>    ),<br>)</pre><h4>Training only on Assistant Responses</h4><p>To make the training process more efficient and focused, we configure the model to learn only from the assistant’s responses, while ignoring the user’s inputs during loss computation. This approach helps the model better understand how to generate high-quality replies without being penalized for user input patterns.</p><p>Unsloth provides a convenient utility for this purpose: train_on_responses_only from unsloth.chat_templates.</p><pre>from unsloth.chat_templates import train_on_responses_only<br>trainer = train_on_responses_only(<br>    trainer,<br>    instruction_part=&quot;&lt;|start_header_id|&gt;user&lt;|end_header_id|&gt;\n\n&quot;,     # Marks user input<br>    response_part=&quot;&lt;|start_header_id|&gt;assistant&lt;|end_header_id|&gt;\n\n&quot;,   # Marks assistant response<br>)<br># Begin training<br>trainer_stats = trainer.train()</pre><p>This setup ensures that the model is optimized solely on the assistant’s outputs. While training loss may decrease gradually, that’s expected — especially when using a small number of training steps. In this example, we limited training to 60 steps for quick experimentation.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/222/1*RNATiWaUPZ74swlv2B3WQA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/208/1*OOwKXu8cn3fB6MXq0sAZMg.png" /></figure><blockquote><em>💡 </em>Tip:<em> </em>For better performance, it’s recommended to fine-tune the model for 2–3 epochs on large datasets or 3–5 epochs on smaller ones. 
Aim for 500+ training steps at minimum, and ideally 1000+ steps if hardware resources permit.</blockquote><h4>Inference — Generating responses</h4><p>Once fine-tuning is complete, the trained model is ready for inference, that is, generating responses to new inputs. To run inference, pass the conversation as a list of role/content messages and set add_generation_prompt = True so that the prompt ends with an open assistant turn for the model to complete.</p><p>For this example, we use the following decoding parameters:</p><ul><li>min_p = 0.1 – discards tokens whose probability is below 10% of the most likely token’s probability, which keeps sampled text coherent</li><li>temperature = 1.5 – flattens the token distribution to make the output more varied</li></ul><p>Feel free to adjust these values to balance response creativity and coherence for your use case.</p><pre>from unsloth.chat_templates import get_chat_template<br><br>tokenizer = get_chat_template(<br>    tokenizer,<br>    chat_template = &quot;llama-3.1&quot;,<br>)<br>FastLanguageModel.for_inference(model) # Enable native 2x faster inference<br><br>messages = [<br>    {&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: &quot;Continue the Fibonacci sequence: 1, 1, 2, 3, 5, 8,&quot;},<br>]<br>inputs = tokenizer.apply_chat_template(<br>    messages,<br>    tokenize = True,<br>    add_generation_prompt = True, # Must add for generation<br>    return_tensors = &quot;pt&quot;,<br>).to(&quot;cuda&quot;)<br><br>outputs = model.generate(input_ids = inputs, max_new_tokens = 64, use_cache = True,<br>                         temperature = 1.5, min_p = 0.1)<br><br># Decode the generated tokens into human-readable text<br>text = tokenizer.decode(outputs[0], skip_special_tokens=True)<br>print(text) </pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*fEc1qRLvEBEamAhkqm_Dqw.png" /></figure><h4>Saving and Loading the Fine-Tuned Model</h4><p>Once training is complete, you can either save your fine-tuned model and tokenizer locally or push them to the Hugging Face Hub.</p><p><strong>Save Locally</strong></p><pre>model_name = 
&quot;Llama32_fine_tuned&quot;<br>model.save_pretrained(model_name)<br>tokenizer.save_pretrained(model_name)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ph36TgR5Y2KOLcb91F2rpg.png" /></figure><p>This will store only the LoRA adapter weights, not the full base model.</p><p><strong>Push to Hugging Face Hub</strong></p><p>To make your fine-tuned adapters publicly or privately accessible:</p><pre>model.push_to_hub(&quot;your_name/your_model_name&quot;)<br>tokenizer.push_to_hub(&quot;your_name/your_model_name&quot;)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*w0ws6GAMtS06ynSUIzuMFg.png" /></figure><p><strong>Save the Full Model in GGUF Format (Optional)</strong></p><p>To export the complete model (base + LoRA adapters) in the efficient GGUF format which is suitable for CPU inference, use the following method:</p><pre>model.push_to_hub_gguf(model_name, tokenizer, quantization_method=&quot;q4_k_m&quot;)</pre><p>This compresses the model using the q4_k_m quantization method, which helps reduce model size and boosts inference performance.</p><h4>Load the LoRA Adapters for Inference</h4><p>To use your saved LoRA adapters for inference:</p><pre>from unsloth import FastLanguageModel<br>model, tokenizer = FastLanguageModel.from_pretrained(<br>    model_name = &quot;Llama32_fine_tuned&quot;,  # Name of your fine-tuned model<br>    max_seq_length = max_seq_length,<br>    dtype = dtype,<br>    load_in_4bit = load_in_4bit,<br>)<br>FastLanguageModel.for_inference(model)  # Enable optimized inference</pre><p>Now generate responses using the tokenizer and the trained model:</p><pre>messages = [<br>    {&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: &quot;Describe a tall tower in the capital of France.&quot;},<br>]<br>inputs = tokenizer.apply_chat_template(<br>    messages,<br>    tokenize=True,<br>    add_generation_prompt=True,<br>    return_tensors=&quot;pt&quot;,<br>).to(&quot;cuda&quot;)<br>from transformers 
import TextStreamer<br>text_streamer = TextStreamer(tokenizer, skip_prompt=True)<br>_ = model.generate(<br>    input_ids=inputs,<br>    streamer=text_streamer,<br>    max_new_tokens=128,<br>    use_cache=True,<br>    temperature=1.5,<br>    min_p=0.1<br>)</pre><p>This will generate a response in real-time using your fine-tuned model.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*cjNo3z8XZIQXwpcjfJnSGA.png" /></figure><p><em>Thanks for reading this article !!</em></p><p><em>Thanks Gowri M Bhatt for reviewing the content.</em></p><p>If you enjoyed this article, please click on the clap button 👏 and share to help others find it!</p><p>The full source code for this tutorial can be found here,</p><ul><li><a href="https://github.com/codemaker2015/llama3.2-fine-tuning">GitHub - codemaker2015/llama3.2-fine-tuning: This repository contains a step-by-step Colab notebook for fine-tuning Meta&#39;s LLaMA 3.2 - 3B Instruct model using the Unsloth framework.</a></li><li><a href="https://colab.research.google.com/drive/1-fTUJdlFEb-E686H-UgV76Q8gz-iTPl2?usp=sharing">Google Colab</a></li><li><a href="https://huggingface.co/codemaker2015/Llama32_fine_tuned">codemaker2015/Llama32_fine_tuned · Hugging Face</a></li></ul><h4>References</h4><ul><li><a href="https://github.com/unslothai/unsloth">GitHub - unslothai/unsloth: Unified web UI for training and running open models like Qwen, DeepSeek, gpt-oss and Gemma locally.</a></li><li><a href="https://docs.unsloth.ai/">Unsloth Docs | Unsloth Documentation</a></li></ul>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[The Ultimate MCP Handbook: From Basics to Advanced LLM Integration]]></title>
            <link>https://codemaker2016.medium.com/the-ultimate-mcp-handbook-from-basics-to-advanced-llm-integration-00872d1df1e4?source=rss-b7a60e472dfc------2</link>
            <guid isPermaLink="false">https://medium.com/p/00872d1df1e4</guid>
            <category><![CDATA[claude-ai]]></category>
            <category><![CDATA[generative-ai-tools]]></category>
            <category><![CDATA[python]]></category>
            <category><![CDATA[llm]]></category>
            <category><![CDATA[mcp-server]]></category>
            <dc:creator><![CDATA[Vishnu Sivan]]></dc:creator>
            <pubDate>Thu, 15 May 2025 06:02:10 GMT</pubDate>
            <atom:updated>2025-07-26T09:47:07.469Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*XRT1g-gekCw0_mZ1VzQyZw.png" /></figure><p>Large Language Models (LLMs) like GPT or Claude are incredibly powerful at generating natural language text — but at their core, they’re just really good at predicting the next token in a sequence. Out of the box, they can’t fetch local files, run custom code, or interact with external APIs. So, how do we bridge the gap between this language intelligence and the real world?</p><p>That’s where the <strong>Model Context Protocol (MCP)</strong> comes in.</p><p>MCP is a fast-emerging standard designed to extend LLM capabilities beyond static conversations. It acts as a universal connector between AI systems and the outside world, enabling seamless interaction with tools, databases, APIs, and other services — much like how USB-C connects different devices in a plug-and-play fashion.</p><p>In this tutorial, we will explore MCP through a practical, beginner-friendly project. You will learn how to build a custom MCP server that connects an AI model to the Yahoo Finance API, empowering the model to fetch real-time stock prices, compare multiple stocks, and even perform historical trend analysis. 
By the end of this article, you will have a working MCP server and a solid foundation to build more advanced, real-world AI integrations.</p><h3>Getting Started</h3><h4>Table of contents</h4><ul><li><a href="#ed87">What is MCP</a></li><li><a href="#c773">How the MCP server works</a></li><li><a href="#abbc">MCP Core functionalities</a></li><li><a href="#3694">Building your first MCP server</a></li><li><a href="#aa88">Example 1: Creating Your First MCP Server</a></li><li><a href="#f90a">Example 2: Interacting with SQLite database</a></li><li><a href="#75ed">Example 3: Using Pre-Built MCP Servers</a></li><li><a href="#6e9f">Example 4: Using a Python MCP Client</a></li><li><a href="#c388">Stock Price Comparison using MCP Server</a></li><li><a href="#411f">Building MCP Server for Stock Market Analysis</a></li></ul><h4>What is MCP</h4><p>Model Context Protocol (MCP) is an open protocol, introduced by Anthropic, that standardizes how applications supply context, data, and tools to large language models (LLMs). Instead of writing a custom integration for every combination of model and data source, you expose a capability once through an MCP server, and any MCP-compatible client can use it.</p><p>The key components of Model Context Protocol include:</p><ol><li><strong>Host</strong>: The application the user interacts with, such as Claude Desktop or Cursor.</li><li><strong>Client</strong>: A connector inside the host that maintains a connection to an MCP server.</li><li><strong>Server</strong>: A program that exposes resources, tools, and prompts to the client.</li><li><strong>Transport</strong>: The channel that carries the protocol messages (JSON-RPC) between client and server, such as standard input/output for a local server.</li></ol><p>MCP is particularly useful for:</p><ul><li>Complex applications where models need to process multiple types of information</li><li>Situations requiring specific handling of different context 
elements</li><li>Improving model performance by reducing ambiguity in how context should be used</li></ul><p>By standardizing how context and tools are exposed to models, MCP aims to make AI interactions more reliable, consistent, and effective across different use cases and implementations.</p><h4>How the MCP server works</h4><p>The host handles the user interface, the MCP client routes requests, and the MCP server performs the actual tasks, acting as the operational backbone that enables LLMs to interact with real-world systems.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/901/1*L4qlCa9jz1eAorJ86cOtIw.png" /></figure><ul><li><strong>User Input</strong>: A user makes a request through the host application.</li><li><strong>LLM Interpretation</strong>: The LLM processes the request and identifies if a corresponding MCP tool is available to fulfill it.</li><li><strong>Client Routing</strong>: If a suitable tool is configured, the MCP client packages the request and forwards it using the MCP protocol to the MCP server.</li><li><strong>Task Execution by Server</strong>: The MCP server receives the request and triggers the appropriate action, such as querying a local resource (e.g., a SQLite database) or calling an external service (e.g., an email API, stock data API, or internal system).</li><li><strong>Response Handling</strong>: The server processes the response from the tool and sends the result back to the MCP client.</li><li><strong>LLM Response Generation</strong>: The LLM takes the returned result, integrates it into a natural language response, and sends it back to the user via the host application.</li></ul><h4>MCP Core functionalities</h4><p>MCP servers provide three core functionalities.</p><ul><li><strong>Resources: Storing and Serving File-like Data</strong></li></ul><p>MCP resources function as read-only data sources that provide structured information to a Large Language Model (LLM). 
They are similar to REST API GET requests: they expose data without performing significant computation or causing side effects.</p><p>These resources can be accessed by the LLM on demand.</p><pre>@mcp.resource(&quot;greeting://{name}&quot;)<br>def get_greeting(name: str) -&gt; str:<br>    &quot;&quot;&quot;Get a personalized greeting&quot;&quot;&quot;<br>    return f&quot;Hello, {name}!&quot;</pre><p>This example defines a resource at greeting://{name} that returns a simple greeting string when accessed.</p><ul><li><strong>Tools: Functions Executed by the LLM</strong></li></ul><p>MCP tools allow the AI to perform specific tasks, similar to API POST requests. These functions can carry out computations, interact with databases, or call external APIs, enabling the LLM to go beyond static data and take meaningful actions.</p><pre>@mcp.tool()<br>def add(a: int, b: int) -&gt; int:<br>    &quot;&quot;&quot;Add two numbers and return the result.&quot;&quot;&quot;<br>    return a + b</pre><p>This tool adds two integers and returns their sum.</p><ul><li><strong>Prompts: Predefined Instruction Templates</strong></li></ul><p>MCP prompts are reusable templates that help the LLM carry out structured tasks consistently. These templates guide the model’s responses for common or complex request types.</p><pre>@mcp.prompt()<br>def review_code(code: str) -&gt; str:<br>    return f&quot;Please review this code:\n\n{code}&quot;</pre><p>This prompt helps the LLM respond in a structured manner when asked to perform a code review.</p><h3>Building your first MCP server</h3><p>Let’s build a local MCP server in Python that connects to an SQLite database to retrieve the top chatters in a community. You will interact with your LLM through tools like Cursor or Claude Desktop, while the MCP server handles all the backend database operations.</p><h4>Installing uv</h4><p>We will use uv, a fast and modern Python project manager, to set up and manage our environment. 
It simplifies tasks like handling dependencies, creating virtual environments, and running scripts.</p><p>To install uv, run this in your terminal:</p><pre>powershell -ExecutionPolicy ByPass -c &quot;irm https://astral.sh/uv/install.ps1 | iex&quot;<br><br>set Path=C:\Users\Codem\.local\bin;%Path%</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*fQhf6hVoIriyabAqBMJfwQ.png" /></figure><p>Refer to the official website for detailed installation instructions.</p><p><a href="https://docs.astral.sh/uv/getting-started/installation/#installation-methods">Installation | uv</a></p><h4>Installing the dependencies</h4><ul><li>Initialize a uv project by executing the following command.</li></ul><pre>uv init mcp_demo<br>cd mcp_demo</pre><ul><li>Create and activate a virtual environment by executing the following command.</li></ul><pre>uv venv<br>source .venv/bin/activate # for linux<br>.venv\Scripts\activate    # for windows</pre><ul><li>Install mcp SDK using uv. The mcp package includes both the server framework and optional CLI utilities. 
Installing it with the [cli] extra gives you access to helpful command-line tools.</li></ul><pre>uv add mcp[cli]</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*GkNWp0cuJdtKARQVJ__bkQ.png" /></figure><ul><li>To confirm that the installation was successful, run:</li></ul><pre>mcp version</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/908/1*Ur_aGG8mxXpbMj6MfbXmHA.png" /></figure><h3>Example 1: Creating Your First MCP Server</h3><p>Let’s begin by creating a simple calculator tool that adds two numbers.</p><ul><li>Create a file named calculator.py inside the mcp_demo directory and insert the following code into it:</li></ul><pre>from mcp.server.fastmcp import FastMCP  # Import FastMCP, the quickstart server base<br><br>mcp = FastMCP(&quot;Calculator Server&quot;)  # Initialize an MCP server instance with a descriptive name<br><br>@mcp.tool()  # Register a function as a callable tool for the model<br>def add(a: int, b: int) -&gt; int:<br>    &quot;&quot;&quot;Add two numbers and return the result.&quot;&quot;&quot;<br>    return a + b<br><br># Add a dynamic greeting resource<br>@mcp.resource(&quot;greeting://{name}&quot;)<br>def get_greeting(name: str) -&gt; str:<br>    &quot;&quot;&quot;Get a personalized greeting&quot;&quot;&quot;<br>    return f&quot;Hello, {name}!&quot;<br><br>if __name__ == &quot;__main__&quot;:<br>    mcp.run(transport=&quot;stdio&quot;)  # Run the server, using standard input/output for communication</pre><p>This script sets up a basic MCP server with one tool named add and one resource that returns a greeting. The @mcp.tool() and @mcp.resource() decorators register the functions with the MCP framework, making them accessible to connected LLMs.</p><ul><li>You can test the MCP server using the following command.</li></ul><pre>mcp dev calculator.py</pre><p>This command starts the MCP Inspector. Once it is running, you can access the interface in your browser at <a href="http://127.0.0.1:6274">http://127.0.0.1:6274</a>. 
The Inspector provides a user-friendly interface to view available tools and resources. It also allows you to interact with these tools directly using built-in UI controls.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*pmn-tfapihzXwIyRaXsjaQ.gif" /></figure><h4>Integrating your MCP server with Claude Desktop</h4><p>To connect your MCP server to Claude, you’ll need to have Claude for Desktop installed.<br>You can download it from the official site: <a href="https://claude.ai/download">https://claude.ai/download</a></p><p>Follow the installation instructions provided for your operating system.</p><p><strong>Adding Your MCP Server to Claude Desktop</strong><br>Once Claude Desktop is installed, you can add your MCP server using the following command:</p><pre>mcp install calculator.py</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*kiILQZQ72jPAG780I2zrPg.png" /></figure><p>This registers your MCP server with Claude so it can be accessed from the desktop app.</p><p><strong>Manual Configuration (Alternative Method)</strong><br>If you would prefer to configure it manually, open the Claude configuration file by clicking Claude Desktop -&gt; File -&gt; Settings -&gt; Developer -&gt; Edit Config button.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/515/1*lQrzwOY3T_TSAevfnsJd4Q.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*AChrEfJXkj2jxzRV9vZcAg.png" /></figure><p>The configuration file is stored at the following location:</p><ul><li>Windows: %APPDATA%\Claude\claude_desktop_config.json</li><li>macOS: ~/Library/Application Support/Claude/claude_desktop_config.json</li></ul><p>Then, add the following entry to the mcpServers section:</p><pre>{<br> &quot;mcpServers&quot;: {<br>   &quot;Calculator Server&quot;: {<br>     &quot;command&quot;: &quot;C:\\Users\\Codem\\.local\\bin\\uv.EXE&quot;,<br>     &quot;args&quot;: [<br>       &quot;run&quot;,<br>       &quot;--with&quot;,<br>       &quot;mcp[cli]&quot;,<br>       &quot;mcp&quot;,<br>      
 &quot;run&quot;,<br>       &quot;Absolute path to calculator.py&quot;<br>     ]<br>   }<br> }<br>}</pre><p>Replace “Absolute path to calculator.py” with the full path to your actual calculator.py MCP server script.</p><p><strong>Testing the MCP tool with Claude Desktop</strong></p><p>Restart the Claude Desktop application to see the MCP tool appear in the app. Once it’s loaded, you will be able to use the tool directly within the interface.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*PmyDicWnFl62f4Bi9Lympw.png" /></figure><h3>Example 2: Interacting with SQLite database</h3><p>In the previous example, we created an add tool and integrated it with Claude Desktop. However, it&#39;s important to note that Claude may not trigger such a basic arithmetic function, because the model can handle simple calculations on its own without calling an external tool.</p><p>Now, let’s move on to a more practical use case that demonstrates the true potential of MCP: retrieving data from a local source and making it accessible to the LLM for dynamic and context-aware interactions.</p><h4>Get the Sample Database</h4><p>Download the <a href="https://doimages.nyc3.cdn.digitaloceanspaces.com/006Community/MCP-server-python/community.db">community.db</a> file, which contains a chatters table with sample data. 
Once downloaded, move the database file into your project directory.</p><p><a href="https://doimages.nyc3.cdn.digitaloceanspaces.com/006Community/MCP-server-python/community.db">https://doimages.nyc3.cdn.digitaloceanspaces.com/006Community/MCP-server-python/community.db</a></p><h4>Create SQLite MCP server</h4><p>Create a new file called sqlite_server.py and add the following code to it.</p><pre># sqlite-server.py<br>from mcp.server.fastmcp import FastMCP<br>import sqlite3<br><br># Initialize the MCP server with a name<br>mcp = FastMCP(&quot;Community Chatters&quot;)<br><br># Define a tool to fetch the top chatters from the SQLite database<br>@mcp.tool()<br>def get_top_chatters():<br>    &quot;&quot;&quot;Retrieve the top chatters sorted by number of messages.&quot;&quot;&quot;<br>    # Connect to the SQLite database<br>    conn = sqlite3.connect(&#39;E:\\Experiments\\GenerativeAI\\MCP\\mcp_demo\\community.db&#39;)<br>    cursor = conn.cursor()<br>    <br>    # Execute the query to fetch chatters sorted by messages<br>    cursor.execute(&quot;SELECT name, messages FROM chatters ORDER BY messages DESC&quot;)<br>    results = cursor.fetchall()<br>    conn.close()<br>    <br>    # Format the results as a list of dictionaries<br>    chatters = [{&quot;name&quot;: name, &quot;messages&quot;: messages} for name, messages in results]<br>    return chatters<br><br># Run the MCP server locally<br>if __name__ == &#39;__main__&#39;:<br>    mcp.run()</pre><blockquote><strong>Make sure to provide the absolute path to the database in the </strong><strong><em>sqlite3.connect()</em> method. 
If you use a relative path, Claude Desktop may not be able to locate or access the database correctly.</strong></blockquote><ul><li>Test the sqlite server using the following command.</li></ul><pre>mcp dev sqlite_server.py</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*_4nKZhhiODJkZ4YrM5BDZg.png" /></figure><h4>Testing with Claude Desktop</h4><p>First, integrate the SQLite MCP server with Claude by running the following command:</p><pre>mcp install sqlite_server.py</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*9QTPu3wDIQjgBNUw213cXw.png" /></figure><p>Next, restart the Claude Desktop application. Once it’s running, you can begin asking questions related to the local database directly within the interface. When executing the SQLite server MCP, a prompt will appear requesting permission to run the MCP tool. Please approve the prompt to proceed.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ld7eVeuGlTKK4t5ODxawUw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*sTEUvNbZ5ILHSRiJXwsJbA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*uunO8IjBQYeHnFRMAnr5wQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Uy3xrAmNgzxK-xF9a39V3A.png" /></figure><h3>Example 3: Using Pre-Built MCP Servers</h3><p>Anthropic and its community provide a set of pre-built MCP servers that can be directly integrated with Claude Desktop or Cursor to enable this functionality in your application.</p><p><a href="https://github.com/modelcontextprotocol/servers">GitHub - modelcontextprotocol/servers: Model Context Protocol Servers</a></p><p>In this section, we will implement the File System and Git MCP servers.</p><h4>File System</h4><p>To enable filesystem functionality, we will install a pre-built <a href="https://github.com/modelcontextprotocol/servers/tree/main/src/filesystem">Filesystem MCP server</a> in Claude for 
Desktop. The server is started with npx, so Node.js must be installed.</p><ul><li>Open the configuration file in any text editor and add the filesystem entry to the mcpServers section. If no other servers are configured, the complete file looks like this.</li></ul><pre>{<br>  &quot;mcpServers&quot;: {<br>    &quot;filesystem&quot;: {<br>      &quot;command&quot;: &quot;npx&quot;,<br>      &quot;args&quot;: [<br>        &quot;-y&quot;,<br>        &quot;@modelcontextprotocol/server-filesystem&quot;,<br>        &quot;/path/to/directory&quot;<br>      ]<br>    }<br>  }<br>}</pre><p>For instance, with the two servers from the earlier examples already registered, the configuration file looks like this.</p><pre>{<br>  &quot;mcpServers&quot;: {<br>    &quot;Calculator Server&quot;: {<br>      &quot;command&quot;: &quot;C:\\Users\\Codem\\.local\\bin\\uv.EXE&quot;,<br>      &quot;args&quot;: [<br>        &quot;run&quot;,<br>        &quot;--with&quot;,<br>        &quot;mcp[cli]&quot;,<br>        &quot;mcp&quot;,<br>        &quot;run&quot;,<br>        &quot;E:\\Experiments\\GenerativeAI\\MCP\\mcp_demo\\calculator.py&quot;<br>      ]<br>    },<br>    &quot;Community Chatters&quot;: {<br>      &quot;command&quot;: &quot;C:\\Users\\Codem\\.local\\bin\\uv.EXE&quot;,<br>      &quot;args&quot;: [<br>        &quot;run&quot;,<br>        &quot;--with&quot;,<br>        &quot;mcp[cli]&quot;,<br>        &quot;mcp&quot;,<br>        &quot;run&quot;,<br>        &quot;E:\\Experiments\\GenerativeAI\\MCP\\mcp_demo\\sqlite_server.py&quot;<br>      ]<br>    },<br>    &quot;filesystem&quot;: {<br>      &quot;command&quot;: &quot;npx&quot;,<br>      &quot;args&quot;: [<br>        &quot;-y&quot;,<br>        &quot;@modelcontextprotocol/server-filesystem&quot;,<br>        &quot;E:\\Experiments\\GenerativeAI\\MCP\\mcp_demo&quot;<br>      ]<br>    }<br>  }<br>}</pre><ul><li>After updating the configuration file, restart Claude for Desktop to apply the changes. 
Once it’s running, you can begin asking questions related to the specified folder directly within the interface.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*iCJPBkZ2Gun3HBGQzIuK1w.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*mpnx-pyljP4NiJmD3NW4fA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*_EYCZVjglrLe-AIpXzq71Q.png" /></figure><h4>Git</h4><p>Anthropic provides the mcp-server-git MCP server, which includes tools for reading, searching, and manipulating Git repositories using Large Language Models.</p><ul><li>To enable git functionality, open the configuration file in any text editor and add the git entry to the existing mcpServers section, alongside any servers that are already configured.</li></ul><pre>&quot;mcpServers&quot;: {<br>  &quot;git&quot;: {<br>    &quot;command&quot;: &quot;uvx&quot;,<br>    &quot;args&quot;: [&quot;mcp-server-git&quot;, &quot;--repository&quot;, &quot;path/to/git/repo&quot;]<br>  }<br>}</pre><p>After you restart Claude for Desktop, Git support is enabled, allowing you to run Git operations through Large Language Models (LLMs).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*UL7yHMync1jgxq5Y5Y5VXw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ZHD0fIaKEBN5kuXYeIXHhg.png" /></figure><h3>Example 4: Using a Python MCP Client</h3><p>To perform a specific task programmatically, you can use the MCP Python SDK to create a client. In the earlier section, we created a simple tool (calculator.py) that adds two numbers. 
Now, let&#39;s use the MCP client to invoke the add tool and perform an addition operation.</p><ul><li>Create a file named calculator_client.py and add the following code to it.</li></ul><pre>import asyncio<br><br>from mcp import ClientSession, StdioServerParameters<br>from mcp.client.stdio import stdio_client<br><br># Create server parameters for stdio connection<br>server_params = StdioServerParameters(<br>    command=&quot;python&quot;,  # Executable<br>    args=[&quot;calculator.py&quot;],  # Optional command line arguments<br>    env=None,  # Optional environment variables<br>)<br><br>async def run():<br>    # stdio_client starts the server as a subprocess and connects to its stdin/stdout<br>    async with stdio_client(server_params) as (read, write):<br>        async with ClientSession(read, write) as session:<br>            # Initialize the connection<br>            await session.initialize()<br><br>            # Call a tool<br>            result = await session.call_tool(&quot;add&quot;, arguments={&quot;a&quot;: 3, &quot;b&quot;: 4})<br>            print(f&quot;Result of add tool: {result}&quot;)<br><br><br>if __name__ == &quot;__main__&quot;:<br>    asyncio.run(run())</pre><p>Because the client uses the stdio transport, stdio_client launches calculator.py as a subprocess using the command and arguments in server_params, so you do not need to start the server separately. If you want to check that the server starts without errors, you can run it on its own:</p><pre>python calculator.py</pre><p>To call the tool, run the client from the directory that contains calculator.py:</p><pre>python calculator_client.py</pre><p>You will see an output similar to the following:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*FI_P2dua10DuBZF0PPfvNw.png" /></figure><h3>Stock Price Comparison using MCP Server</h3><p>In this section, we will build a custom MCP server using the yfinance Python library, which reads data from Yahoo Finance. The server will be capable of fetching the latest stock prices, performing comparisons, and providing historical data analysis.</p><h4>Installing dependencies</h4><p>This application is created inside the uv project (mcp_demo) which we created earlier. 
You can also create it as a separate project by initializing uv if required.</p><ul><li>Install the yfinance Python package using pip (or run uv add yfinance if you manage the project with uv).</li></ul><pre>pip install yfinance</pre><ul><li>Create a file named stock_price_server.py and add the following code to it.</li></ul><pre>from mcp.server.fastmcp import FastMCP<br>import yfinance as yf<br><br># Create an MCP server with a custom name<br>mcp = FastMCP(&quot;Stock Price Server&quot;)<br><br>@mcp.tool()<br>def get_stock_price(symbol: str) -&gt; float:<br>    &quot;&quot;&quot;<br>    Retrieve the current stock price for the given ticker symbol.<br>    Returns the latest closing price as a float.<br>    &quot;&quot;&quot;<br>    try:<br>        ticker = yf.Ticker(symbol)<br>        # Get today&#39;s historical data; may return empty if market is closed or symbol is invalid.<br>        data = ticker.history(period=&quot;1d&quot;)<br>        if not data.empty:<br>            # Use the last closing price from today&#39;s data<br>            price = data[&#39;Close&#39;].iloc[-1]<br>            return float(price)<br>        else:<br>            # As a fallback, try using the regular market price from the ticker info<br>            info = ticker.info<br>            price = info.get(&quot;regularMarketPrice&quot;, None)<br>            if price is not None:<br>                return float(price)<br>            else:<br>                return -1.0  # Indicate failure<br>    except Exception:<br>        # Return -1.0 to indicate an error occurred when fetching the stock price<br>        return -1.0<br><br>@mcp.resource(&quot;stock://{symbol}&quot;)<br>def stock_resource(symbol: str) -&gt; str:<br>    &quot;&quot;&quot;<br>    Expose stock price data as a resource.<br>    Returns a formatted string with the current stock price for the given symbol.<br>    &quot;&quot;&quot;<br>    price = get_stock_price(symbol)<br>    if price &lt; 0:<br>        return f&quot;Error: Could not retrieve price for symbol 
&#39;{symbol}&#39;.&quot;<br>    return f&quot;The current price of &#39;{symbol}&#39; is ${price:.2f}.&quot;<br><br>@mcp.tool()<br>def get_stock_history(symbol: str, period: str = &quot;1mo&quot;) -&gt; str:<br>    &quot;&quot;&quot;<br>    Retrieve historical data for a stock given a ticker symbol and a period.<br>    Returns the historical data as a CSV formatted string.<br>    <br>    Parameters:<br>        symbol: The stock ticker symbol.<br>        period: The period over which to retrieve historical data (e.g., &#39;1mo&#39;, &#39;3mo&#39;, &#39;1y&#39;).<br>    &quot;&quot;&quot;<br>    try:<br>        ticker = yf.Ticker(symbol)<br>        data = ticker.history(period=period)<br>        if data.empty:<br>            return f&quot;No historical data found for symbol &#39;{symbol}&#39; with period &#39;{period}&#39;.&quot;<br>        # Convert the DataFrame to a CSV formatted string<br>        csv_data = data.to_csv()<br>        return csv_data<br>    except Exception as e:<br>        return f&quot;Error fetching historical data: {str(e)}&quot;<br><br>@mcp.tool()<br>def compare_stocks(symbol1: str, symbol2: str) -&gt; str:<br>    &quot;&quot;&quot;<br>    Compare the current stock prices of two ticker symbols.<br>    Returns a formatted message comparing the two stock prices.<br>    <br>    Parameters:<br>        symbol1: The first stock ticker symbol.<br>        symbol2: The second stock ticker symbol.<br>    &quot;&quot;&quot;<br>    price1 = get_stock_price(symbol1)<br>    price2 = get_stock_price(symbol2)<br>    if price1 &lt; 0 or price2 &lt; 0:<br>        return f&quot;Error: Could not retrieve data for comparison of &#39;{symbol1}&#39; and &#39;{symbol2}&#39;.&quot;<br>    if price1 &gt; price2:<br>        result = f&quot;{symbol1} (${price1:.2f}) is higher than {symbol2} (${price2:.2f}).&quot;<br>    elif price1 &lt; price2:<br>        result = f&quot;{symbol1} (${price1:.2f}) is lower than {symbol2} (${price2:.2f}).&quot;<br>    else:<br>        
result = f&quot;Both {symbol1} and {symbol2} have the same price (${price1:.2f}).&quot;<br>    return result<br><br>if __name__ == &quot;__main__&quot;:<br>    mcp.run()</pre><ul><li>Test the MCP server using the MCP Inspector by running the following command in your terminal.</li></ul><pre>mcp dev stock_price_server.py</pre><p>You will receive an output similar to the following.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*RGSZBGMb1hWYR7xrvI_SSA.gif" /></figure><ul><li>Integrate the stock price server with Claude for Desktop by running the following command:</li></ul><pre>mcp install stock_price_server.py --with yfinance</pre><blockquote>If your server depends on additional packages, specify each of them with the <em>--with</em> option. This ensures that the necessary libraries and modules are installed before the server runs.</blockquote><p>After the integration, restart Claude for Desktop to enable the new MCP server for stock price-related queries.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Rc2U8dy-F2YF7f9R7KCUFg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*w4Q3RTBIXydvGIKnfRBteg.png" /></figure><h3>Building MCP Server for Stock Market Analysis</h3><p>Manually performing stock market predictions and analysis can be a tedious and time-consuming task. Instead, imagine being able to simply ask: <em>“What’s the RSI for MSFT right now?”</em></p><p>An MCP server can fetch the latest stock data, calculate the Relative Strength Index (RSI), and return the result, making it significantly easier to make informed trading decisions without switching between multiple apps and websites.</p><p>In this section, we will use the Alpha Vantage API (free tier) to pull real-time stock data and integrate it into an MCP server. 
This integration allows us to analyze stocks using custom-built AI tools.</p><h4>Installing the dependencies</h4><p>This application lives inside the uv project (mcp_demo) that we created earlier. If you prefer, you can create it as a separate uv project instead.</p><ul><li>For a separate project, create a new uv project and add the dependencies using the following commands.</li></ul><pre># Create a new directory for our project<br>uv init finance<br>cd finance<br><br># Create virtual environment and activate it<br># (Windows shown; on macOS/Linux run: source .venv/bin/activate)<br>uv venv<br>.venv\Scripts\activate<br><br># Install dependencies<br>uv add &quot;mcp[cli]&quot; requests pandas tabulate</pre><ul><li>For the existing project (mcp_demo), install the requests, pandas, and tabulate packages using pip.</li></ul><pre>pip install requests pandas tabulate</pre><h4>Fetching Stock Market Data Using the Alpha Vantage API</h4><p><a href="https://www.alphavantage.co">Alpha Vantage</a> is a widely used service that provides both real-time and historical financial market data. It offers a range of APIs for accessing information on equities, currencies, cryptocurrencies, and more.</p><p>To begin using Alpha Vantage, you’ll need to sign up on their <a href="https://www.alphavantage.co/support/#api-key">official website</a> and obtain a free API key. The free tier allows up to 25 API requests per day. Once you have your API key, you can retrieve intraday stock price data using the TIME_SERIES_INTRADAY endpoint. 
This API returns time series data with key metrics such as open, high, low, close, and volume, updated in real time.</p><p>To make a request, you’ll need to specify:</p><ul><li>The stock symbol (e.g., MSFT)</li><li>The interval between data points (1min, 5min, 15min, 30min, or 60min)</li><li>Your API key</li></ul><p>Example API call (5-minute interval for MSFT):</p><pre>https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY&amp;symbol=MSFT&amp;interval=5min&amp;apikey=YOUR_API_KEY</pre><p>This call returns the latest 5-minute interval data for Microsoft stock, which you can then parse and use in your application or analysis workflow.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*4mKacXM1scSURyrPEXJ9SA.png" /></figure><h4>Implementing MCP Tools for Stock Analysis</h4><p>Let&#39;s implement the MCP tools for stock analysis. The server will expose three tools: moving averages, Relative Strength Index (RSI), and a trade recommendation.</p><h4>Tool 1: Moving Averages</h4><p><strong>Moving averages</strong> are used in stock analysis to smooth price data and identify trends.</p><ul><li><strong>Short-term averages</strong> (5–10 days) react quickly and highlight recent market shifts.</li><li><strong>Long-term averages</strong> (50–200 days) change slowly and reveal broader, sustained trends.</li></ul><p>The following tool calculates moving averages. It works on 1-minute intraday bars, so the default periods of 20 and 50 are measured in minutes rather than days.</p><pre>@mcp.tool()<br>def calculate_moving_averages(symbol: str, short_period: int = 20, long_period: int = 50) -&gt; Dict[str, Any]:<br>    &quot;&quot;&quot;<br>    Calculate short and long moving averages for a symbol<br>    <br>    Args:<br>        symbol: The ticker symbol to analyze<br>        short_period: Short moving average period in minutes<br>        long_period: Long moving average period in minutes<br>        <br>    Returns:<br>        Dictionary with moving average data and analysis<br>    &quot;&quot;&quot;<br>    cache_key = f&quot;{symbol}_1min&quot;<br>    <br>    if 
cache_key not in market_data_cache:<br>        df = AlphaVantageAPI.get_intraday_data(symbol, &quot;1min&quot;, outputsize=&quot;full&quot;)<br>        market_data_cache[cache_key] = MarketData(<br>            symbol=symbol,<br>            interval=&quot;1min&quot;,<br>            data=df,<br>            last_updated=datetime.now()<br>        )<br>    <br>    data = market_data_cache[cache_key].data<br>    <br>    # Calculate moving averages<br>    data[f&#39;SMA{short_period}&#39;] = data[&#39;close&#39;].rolling(window=short_period).mean()<br>    data[f&#39;SMA{long_period}&#39;] = data[&#39;close&#39;].rolling(window=long_period).mean()<br>    <br>    # Get latest values<br>    latest = data.iloc[-1]<br>    current_price = latest[&#39;close&#39;]<br>    short_ma = latest[f&#39;SMA{short_period}&#39;]<br>    long_ma = latest[f&#39;SMA{long_period}&#39;]<br>    <br>    # Determine signal<br>    if short_ma &gt; long_ma:<br>        signal = &quot;BULLISH (Short MA above Long MA)&quot;<br>    elif short_ma &lt; long_ma:<br>        signal = &quot;BEARISH (Short MA below Long MA)&quot;<br>    else:<br>        signal = &quot;NEUTRAL (MAs are equal)&quot;<br>    <br>    # Check for crossover in the last 5 periods<br>    last_5 = data.iloc[-5:]<br>    crossover = False<br>    crossover_type = &quot;&quot;<br>    <br>    for i in range(1, len(last_5)):<br>        prev = last_5.iloc[i-1]<br>        curr = last_5.iloc[i]<br>        <br>        # Golden Cross (short crosses above long)<br>        if prev[f&#39;SMA{short_period}&#39;] &lt;= prev[f&#39;SMA{long_period}&#39;] and curr[f&#39;SMA{short_period}&#39;] &gt; curr[f&#39;SMA{long_period}&#39;]:<br>            crossover = True<br>            crossover_type = &quot;GOLDEN CROSS (Bullish)&quot;<br>            break<br>            <br>        # Death Cross (short crosses below long)<br>        if prev[f&#39;SMA{short_period}&#39;] &gt;= prev[f&#39;SMA{long_period}&#39;] and curr[f&#39;SMA{short_period}&#39;] &lt; 
curr[f&#39;SMA{long_period}&#39;]:<br>            crossover = True<br>            crossover_type = &quot;DEATH CROSS (Bearish)&quot;<br>            break<br>    <br>    return {<br>        &quot;symbol&quot;: symbol,<br>        &quot;current_price&quot;: current_price,<br>        f&quot;SMA{short_period}&quot;: short_ma,<br>        f&quot;SMA{long_period}&quot;: long_ma,<br>        &quot;signal&quot;: signal,<br>        &quot;crossover_detected&quot;: crossover,<br>        &quot;crossover_type&quot;: crossover_type if crossover else &quot;None&quot;,<br>        &quot;analysis&quot;: f&quot;&quot;&quot;Moving Average Analysis for {symbol}:<br>Current Price: ${current_price:.2f}<br>{short_period}-period SMA: ${short_ma:.2f}<br>{long_period}-period SMA: ${long_ma:.2f}<br>Signal: {signal}<br>Recent Crossover: {&quot;Yes - &quot; + crossover_type if crossover else &quot;No&quot;}<br><br>Recommendation: {<br>    &quot;STRONG BUY&quot; if crossover and crossover_type == &quot;GOLDEN CROSS (Bullish)&quot; else<br>    &quot;BUY&quot; if signal == &quot;BULLISH (Short MA above Long MA)&quot; else<br>    &quot;STRONG SELL&quot; if crossover and crossover_type == &quot;DEATH CROSS (Bearish)&quot; else<br>    &quot;SELL&quot; if signal == &quot;BEARISH (Short MA below Long MA)&quot; else<br>    &quot;HOLD&quot;<br>}&quot;&quot;&quot;<br>    }</pre><h4>Tool 2: Relative Strength Index (RSI)</h4><p>The Relative Strength Index (RSI) is a momentum indicator that helps identify overbought (RSI &gt; 70) or oversold (RSI &lt; 30) conditions in an asset. 
It is typically calculated over 14 periods and uses the ratio of average gains to average losses to measure the speed and size of price movements.</p><p>The following tool calculates the RSI on 1-minute bars, so the default period of 14 means 14 minutes.</p><pre>@mcp.tool()<br>def calculate_rsi(symbol: str, period: int = 14) -&gt; Dict[str, Any]:<br>    &quot;&quot;&quot;<br>    Calculate Relative Strength Index (RSI) for a symbol<br>    <br>    Args:<br>        symbol: The ticker symbol to analyze<br>        period: RSI calculation period in minutes<br>        <br>    Returns:<br>        Dictionary with RSI data and analysis<br>    &quot;&quot;&quot;<br>    cache_key = f&quot;{symbol}_1min&quot;<br>    <br>    if cache_key not in market_data_cache:<br>        df = AlphaVantageAPI.get_intraday_data(symbol, &quot;1min&quot;, outputsize=&quot;full&quot;)<br>        market_data_cache[cache_key] = MarketData(<br>            symbol=symbol,<br>            interval=&quot;1min&quot;,<br>            data=df,<br>            last_updated=datetime.now()<br>        )<br>    <br>    data = market_data_cache[cache_key].data.copy()<br>    <br>    # Calculate price changes<br>    delta = data[&#39;close&#39;].diff()<br>    <br>    # Create gain and loss series<br>    gain = delta.copy()<br>    loss = delta.copy()<br>    gain[gain &lt; 0] = 0<br>    loss[loss &gt; 0] = 0<br>    loss = abs(loss)<br>    <br>    # Calculate average gain and loss (simple rolling mean)<br>    avg_gain = gain.rolling(window=period).mean()<br>    avg_loss = loss.rolling(window=period).mean()<br>    <br>    # Calculate RS and RSI<br>    # (if avg_loss is 0 and avg_gain is positive, RS is inf and RSI is 100)<br>    rs = avg_gain / avg_loss<br>    rsi = 100 - (100 / (1 + rs))<br>    <br>    # Get latest RSI<br>    latest_rsi = rsi.iloc[-1]<br>    <br>    # Determine signal<br>    if latest_rsi &lt; 30:<br>        signal = &quot;OVERSOLD (Potential buy opportunity)&quot;<br>    elif latest_rsi &gt; 70:<br>        signal = &quot;OVERBOUGHT (Potential sell opportunity)&quot;<br>    else:<br>        signal = &quot;NEUTRAL&quot;<br>    <br>    
return {<br>        &quot;symbol&quot;: symbol,<br>        &quot;period&quot;: period,<br>        &quot;rsi&quot;: latest_rsi,<br>        &quot;signal&quot;: signal,<br>        &quot;analysis&quot;: f&quot;&quot;&quot;RSI Analysis for {symbol}:<br>            {period}-period RSI: {latest_rsi:.2f}<br>            Signal: {signal}<br><br>            Recommendation: {<br>                &quot;BUY&quot; if latest_rsi &lt; 30 else<br>                &quot;SELL&quot; if latest_rsi &gt; 70 else<br>                &quot;HOLD&quot;<br>            }&quot;&quot;&quot;<br>    }</pre><h4>Tool 3: Trade Recommendation</h4><p>This tool aggregates insights from both the moving average and RSI indicators to provide a clear recommendation on whether to buy, hold, or sell an asset.</p><pre>@mcp.tool()<br>def trade_recommendation(symbol: str) -&gt; Dict[str, Any]:<br>    &quot;&quot;&quot;<br>    Provide a comprehensive trade recommendation based on multiple indicators<br>    <br>    Args:<br>        symbol: The ticker symbol to analyze<br>        <br>    Returns:<br>        Dictionary with trading recommendation and supporting data<br>    &quot;&quot;&quot;<br>    # Calculate individual indicators<br>    ma_data = calculate_moving_averages(symbol)<br>    rsi_data = calculate_rsi(symbol)<br>    <br>    # Extract signals<br>    ma_signal = ma_data[&quot;signal&quot;]<br>    ma_crossover = ma_data[&quot;crossover_detected&quot;]<br>    ma_crossover_type = ma_data[&quot;crossover_type&quot;]<br>    rsi_value = rsi_data[&quot;rsi&quot;]<br>    rsi_signal = rsi_data[&quot;signal&quot;]<br>    <br>    # Determine overall signal strength<br>    signal_strength = 0<br>    <br>    # MA contribution<br>    if &quot;BULLISH&quot; in ma_signal:<br>        signal_strength += 1<br>    elif &quot;BEARISH&quot; in ma_signal:<br>        signal_strength -= 1<br>        <br>    # Crossover contribution<br>    if ma_crossover:<br>        if &quot;GOLDEN&quot; in ma_crossover_type:<br>            
signal_strength += 2<br>        elif &quot;DEATH&quot; in ma_crossover_type:<br>            signal_strength -= 2<br>            <br>    # RSI contribution<br>    if &quot;OVERSOLD&quot; in rsi_signal:<br>        signal_strength += 1.5<br>    elif &quot;OVERBOUGHT&quot; in rsi_signal:<br>        signal_strength -= 1.5<br>    <br>    # Determine final recommendation<br>    if signal_strength &gt;= 2:<br>        recommendation = &quot;STRONG BUY&quot;<br>    elif signal_strength &gt; 0:<br>        recommendation = &quot;BUY&quot;<br>    elif signal_strength &lt;= -2:<br>        recommendation = &quot;STRONG SELL&quot;<br>    elif signal_strength &lt; 0:<br>        recommendation = &quot;SELL&quot;<br>    else:<br>        recommendation = &quot;HOLD&quot;<br>    <br>    # Calculate risk level (simple version)<br>    risk_level = &quot;MEDIUM&quot;<br>    if abs(signal_strength) &gt; 3:<br>        risk_level = &quot;LOW&quot;  # Strong signal, lower risk<br>    elif abs(signal_strength) &lt; 1:<br>        risk_level = &quot;HIGH&quot;  # Weak signal, higher risk<br>    <br>    analysis = f&quot;&quot;&quot;# Trading Recommendation for {symbol}<br><br>        ## Summary<br>        Recommendation: {recommendation}<br>        Risk Level: {risk_level}<br>        Signal Strength: {signal_strength:.1f} / 4.5<br><br>        ## Technical Indicators<br>        Moving Averages: {ma_signal}<br>        Recent Crossover: {&quot;Yes - &quot; + ma_crossover_type if ma_crossover else &quot;No&quot;}<br>        RSI ({rsi_data[&quot;period&quot;]}): {rsi_value:.2f} - {rsi_signal}<br><br>        ## Reasoning<br>        This recommendation is based on a combination of Moving Average analysis and RSI indicators.<br>        {<br>            f&quot;The {ma_crossover_type} provides a strong directional signal. &quot; if ma_crossover else &quot;&quot;<br>        }{<br>            f&quot;The RSI indicates the stock is {rsi_signal.split(&#39; &#39;)[0].lower()}. 
&quot; if &quot;NEUTRAL&quot; not in rsi_signal else &quot;&quot;<br>        }<br><br>        ## Action Plan<br>        {<br>            &quot;Consider immediate entry with a stop loss at the recent low. Target the next resistance level.&quot; if recommendation == &quot;STRONG BUY&quot; else<br>            &quot;Look for a good entry point on small dips. Set reasonable stop loss.&quot; if recommendation == &quot;BUY&quot; else<br>            &quot;Consider immediate exit or setting tight stop losses to protect gains.&quot; if recommendation == &quot;STRONG SELL&quot; else<br>            &quot;Start reducing position on strength or set trailing stop losses.&quot; if recommendation == &quot;SELL&quot; else<br>            &quot;Monitor the position but no immediate action needed.&quot;<br>        }<br>        &quot;&quot;&quot;<br>    <br>    return {<br>        &quot;symbol&quot;: symbol,<br>        &quot;recommendation&quot;: recommendation,<br>        &quot;risk_level&quot;: risk_level,<br>        &quot;signal_strength&quot;: signal_strength,<br>        &quot;ma_signal&quot;: ma_signal,<br>        &quot;rsi_signal&quot;: rsi_signal,<br>        &quot;current_price&quot;: ma_data[&quot;current_price&quot;],<br>        &quot;analysis&quot;: analysis<br>    }</pre><h4><strong>Prompt 1: Analyze a Single Ticker</strong></h4><pre>@mcp.prompt()<br>def analyze_ticker(symbol: str) -&gt; str:<br>    &quot;&quot;&quot;<br>    Analyze a ticker symbol for trading opportunities<br>    &quot;&quot;&quot;<br>    return f&quot;&quot;&quot;You are a professional stock market analyst. I would like you to analyze the stock {symbol} and provide trading insights.<br><br>        Start by examining the current market data and technical indicators. Here are the specific tasks:<br><br>        1. First, check the current market data for {symbol}<br>        2. Calculate the moving averages using the calculate_moving_averages tool<br>        3. 
Calculate the RSI using the calculate_rsi tool<br>        4. Generate a comprehensive trade recommendation using the trade_recommendation tool<br>        5. Based on all this information, provide your professional analysis, highlighting:<br>        - The current market position<br>        - Key technical indicators and what they suggest<br>        - Potential trading opportunities and risks<br>        - Your recommended action (buy, sell, or hold) with a brief explanation<br><br>        Please organize your response in a clear, structured format suitable for a professional trader.<br>        &quot;&quot;&quot;</pre><h4><strong>Prompt 2: Compare Multiple Tickers</strong></h4><pre>@mcp.prompt()<br>def compare_tickers(symbols: str) -&gt; str:<br>    &quot;&quot;&quot;<br>    Compare multiple ticker symbols for the best trading opportunity<br>    <br>    Args:<br>        symbols: Comma-separated list of ticker symbols<br>    &quot;&quot;&quot;<br>    symbol_list = [s.strip() for s in symbols.split(&quot;,&quot;)]<br>    symbol_section = &quot;\n&quot;.join([f&quot;- {s}&quot; for s in symbol_list])<br>    <br>    return f&quot;&quot;&quot;You are a professional stock market analyst. I would like you to compare these stocks and identify the best trading opportunity:<br><br>        {symbol_section}<br><br>        For each stock in the list, please:<br><br>        1. Check the current market data using the appropriate resource<br>        2. Generate a comprehensive trade recommendation using the trade_recommendation tool<br>        3. Compare all stocks based on:<br>        - Current trend direction and strength<br>        - Technical indicator signals<br>        - Risk/reward profile<br>        - Trading recommendation strength<br><br>        After analyzing each stock, rank them from most promising to least promising trading opportunity. 
Explain your ranking criteria and why you believe the top-ranked stock represents the best current trading opportunity.<br><br>        Conclude with a specific recommendation on which stock to trade and what action to take (buy, sell, or hold).<br>        &quot;&quot;&quot;</pre><h4><strong>Prompt 3: Build an Intraday Trading Strategy</strong></h4><pre>@mcp.prompt()<br>def intraday_strategy_builder(symbol: str) -&gt; str:<br>    &quot;&quot;&quot;<br>    Build a custom intraday trading strategy for a specific ticker<br>    &quot;&quot;&quot;<br>    return f&quot;&quot;&quot;You are an expert algorithmic trader specializing in intraday strategies. I want you to develop a custom intraday trading strategy for {symbol}.<br><br>        Please follow these steps:<br><br>        1. First, analyze the current market data for {symbol} using the market-data resource<br>        2. Calculate relevant technical indicators:<br>        - Moving averages (short and long periods)<br>        - RSI<br>        3. Based on your analysis, design an intraday trading strategy that includes:<br>        - Specific entry conditions (technical setups that would trigger a buy/sell)<br>        - Exit conditions (both take-profit and stop-loss levels)<br>        - Position sizing recommendations<br>        - Optimal trading times during the day<br>        - Risk management rules<br><br>        Make your strategy specific to the current market conditions for {symbol}, not just generic advice. Include exact indicator values and price levels where possible.<br><br>        Conclude with a summary of the strategy and how a trader should implement it for today&#39;s trading session.<br>        &quot;&quot;&quot;</pre><h4><strong>Complete Code for Stock Market Analysis</strong></h4><ul><li>Create a file named stock_analysis_server.py to implement the MCP server. 
Add the following code to it.</li></ul><pre># stock_analysis_server.py<br>import sys<br>from mcp.server.fastmcp import FastMCP<br>import requests<br>import pandas as pd<br>from dataclasses import dataclass<br>from datetime import datetime<br>from typing import Dict, Any<br><br># Create the MCP server<br>mcp = FastMCP(&quot;Stock Analysis Server&quot;, dependencies=[&quot;requests&quot;, &quot;pandas&quot;, &quot;tabulate&quot;])<br><br># Constants and configurations<br>API_KEY = &quot;YOUR_API_KEY&quot;  # Replace with your actual AlphaVantage API key<br><br>@dataclass<br>class MarketData:<br>    symbol: str<br>    interval: str<br>    data: pd.DataFrame<br>    last_updated: datetime<br>    <br>class AlphaVantageAPI:<br>    @staticmethod<br>    def get_intraday_data(symbol: str, interval: str = &quot;1min&quot;, outputsize: str = &quot;compact&quot;) -&gt; pd.DataFrame:<br>        &quot;&quot;&quot;Fetch intraday data from AlphaVantage API&quot;&quot;&quot;<br>        url = f&quot;https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY&amp;symbol={symbol}&amp;interval={interval}&amp;outputsize={outputsize}&amp;apikey={API_KEY}&quot;<br>        <br>        # Fail fast on network problems and HTTP errors<br>        response = requests.get(url, timeout=30)<br>        response.raise_for_status()<br>        data = response.json()<br>        <br>        # Check for error responses<br>        if &quot;Error Message&quot; in data:<br>            raise ValueError(f&quot;API Error: {data[&#39;Error Message&#39;]}&quot;)<br>        if &quot;Note&quot; in data:<br>            # Log to stderr: with the stdio transport, stdout carries MCP protocol messages<br>            print(f&quot;API Note: {data[&#39;Note&#39;]}&quot;, file=sys.stderr)<br>            <br>        # Extract time series data<br>        time_series_key = f&quot;Time Series ({interval})&quot;<br>        if time_series_key not in data:<br>            raise ValueError(f&quot;No time series data found for {symbol} with interval {interval}&quot;)<br>            <br>        time_series = data[time_series_key]<br>        <br>        # Convert to DataFrame<br>        df = pd.DataFrame.from_dict(time_series, orient=&quot;index&quot;)<br>    
    df.index = pd.to_datetime(df.index)<br>        df = df.sort_index()<br>        <br>        # Rename columns and convert to numeric<br>        df.columns = [col.split(&quot;. &quot;)[1] for col in df.columns]<br>        for col in df.columns:<br>            df[col] = pd.to_numeric(df[col])<br>            <br>        return df<br><br># In-memory cache for market data<br>market_data_cache: Dict[str, MarketData] = {}<br><br># Resources<br>@mcp.resource(&quot;config://app&quot;)<br>def get_config() -&gt; str:<br>    &quot;&quot;&quot;Static configuration data&quot;&quot;&quot;<br>    return &quot;App configuration here&quot;<br><br># Technical Analysis Tools<br>@mcp.tool()<br>def calculate_moving_averages(symbol: str, short_period: int = 20, long_period: int = 50) -&gt; Dict[str, Any]:<br>    &quot;&quot;&quot;<br>    Calculate short and long moving averages for a symbol<br>    <br>    Args:<br>        symbol: The ticker symbol to analyze<br>        short_period: Short moving average period in minutes<br>        long_period: Long moving average period in minutes<br>        <br>    Returns:<br>        Dictionary with moving average data and analysis<br>    &quot;&quot;&quot;<br>    cache_key = f&quot;{symbol}_1min&quot;<br>    <br>    if cache_key not in market_data_cache:<br>        df = AlphaVantageAPI.get_intraday_data(symbol, &quot;1min&quot;, outputsize=&quot;full&quot;)<br>        market_data_cache[cache_key] = MarketData(<br>            symbol=symbol,<br>            interval=&quot;1min&quot;,<br>            data=df,<br>            last_updated=datetime.now()<br>        )<br>    <br>    data = market_data_cache[cache_key].data<br>    <br>    # Calculate moving averages<br>    data[f&#39;SMA{short_period}&#39;] = data[&#39;close&#39;].rolling(window=short_period).mean()<br>    data[f&#39;SMA{long_period}&#39;] = data[&#39;close&#39;].rolling(window=long_period).mean()<br>    <br>    # Get latest values<br>    latest = data.iloc[-1]<br>    current_price = 
latest[&#39;close&#39;]<br>    short_ma = latest[f&#39;SMA{short_period}&#39;]<br>    long_ma = latest[f&#39;SMA{long_period}&#39;]<br>    <br>    # Determine signal<br>    if short_ma &gt; long_ma:<br>        signal = &quot;BULLISH (Short MA above Long MA)&quot;<br>    elif short_ma &lt; long_ma:<br>        signal = &quot;BEARISH (Short MA below Long MA)&quot;<br>    else:<br>        signal = &quot;NEUTRAL (MAs are equal)&quot;<br>    <br>    # Check for crossover in the last 5 periods<br>    last_5 = data.iloc[-5:]<br>    crossover = False<br>    crossover_type = &quot;&quot;<br>    <br>    for i in range(1, len(last_5)):<br>        prev = last_5.iloc[i-1]<br>        curr = last_5.iloc[i]<br>        <br>        # Golden Cross (short crosses above long)<br>        if prev[f&#39;SMA{short_period}&#39;] &lt;= prev[f&#39;SMA{long_period}&#39;] and curr[f&#39;SMA{short_period}&#39;] &gt; curr[f&#39;SMA{long_period}&#39;]:<br>            crossover = True<br>            crossover_type = &quot;GOLDEN CROSS (Bullish)&quot;<br>            break<br>            <br>        # Death Cross (short crosses below long)<br>        if prev[f&#39;SMA{short_period}&#39;] &gt;= prev[f&#39;SMA{long_period}&#39;] and curr[f&#39;SMA{short_period}&#39;] &lt; curr[f&#39;SMA{long_period}&#39;]:<br>            crossover = True<br>            crossover_type = &quot;DEATH CROSS (Bearish)&quot;<br>            break<br>    <br>    return {<br>        &quot;symbol&quot;: symbol,<br>        &quot;current_price&quot;: current_price,<br>        f&quot;SMA{short_period}&quot;: short_ma,<br>        f&quot;SMA{long_period}&quot;: long_ma,<br>        &quot;signal&quot;: signal,<br>        &quot;crossover_detected&quot;: crossover,<br>        &quot;crossover_type&quot;: crossover_type if crossover else &quot;None&quot;,<br>        &quot;analysis&quot;: f&quot;&quot;&quot;Moving Average Analysis for {symbol}:<br>            Current Price: ${current_price:.2f}<br>            {short_period}-period SMA: 
${short_ma:.2f}<br>            {long_period}-period SMA: ${long_ma:.2f}<br>            Signal: {signal}<br>            Recent Crossover: {&quot;Yes - &quot; + crossover_type if crossover else &quot;No&quot;}<br><br>            Recommendation: {<br>                &quot;STRONG BUY&quot; if crossover and crossover_type == &quot;GOLDEN CROSS (Bullish)&quot; else<br>                &quot;BUY&quot; if signal == &quot;BULLISH (Short MA above Long MA)&quot; else<br>                &quot;STRONG SELL&quot; if crossover and crossover_type == &quot;DEATH CROSS (Bearish)&quot; else<br>                &quot;SELL&quot; if signal == &quot;BEARISH (Short MA below Long MA)&quot; else<br>                &quot;HOLD&quot;<br>            }&quot;&quot;&quot;<br>    }<br><br>@mcp.tool()<br>def calculate_rsi(symbol: str, period: int = 14) -&gt; Dict[str, Any]:<br>    &quot;&quot;&quot;<br>    Calculate Relative Strength Index (RSI) for a symbol<br>    <br>    Args:<br>        symbol: The ticker symbol to analyze<br>        period: RSI calculation period in minutes<br>        <br>    Returns:<br>        Dictionary with RSI data and analysis<br>    &quot;&quot;&quot;<br>    cache_key = f&quot;{symbol}_1min&quot;<br>    <br>    if cache_key not in market_data_cache:<br>        df = AlphaVantageAPI.get_intraday_data(symbol, &quot;1min&quot;, outputsize=&quot;full&quot;)<br>        market_data_cache[cache_key] = MarketData(<br>            symbol=symbol,<br>            interval=&quot;1min&quot;,<br>            data=df,<br>            last_updated=datetime.now()<br>        )<br>    <br>    data = market_data_cache[cache_key].data.copy()<br>    <br>    # Calculate price changes<br>    delta = data[&#39;close&#39;].diff()<br>    <br>    # Create gain and loss series<br>    gain = delta.copy()<br>    loss = delta.copy()<br>    gain[gain &lt; 0] = 0<br>    loss[loss &gt; 0] = 0<br>    loss = abs(loss)<br>    <br>    # Calculate average gain and loss<br>    avg_gain = 
gain.rolling(window=period).mean()<br>    avg_loss = loss.rolling(window=period).mean()<br>    <br>    # Calculate RS and RSI<br>    rs = avg_gain / avg_loss<br>    rsi = 100 - (100 / (1 + rs))<br>    <br>    # Get latest RSI<br>    latest_rsi = rsi.iloc[-1]<br>    <br>    # Determine signal<br>    if latest_rsi &lt; 30:<br>        signal = &quot;OVERSOLD (Potential buy opportunity)&quot;<br>    elif latest_rsi &gt; 70:<br>        signal = &quot;OVERBOUGHT (Potential sell opportunity)&quot;<br>    else:<br>        signal = &quot;NEUTRAL&quot;<br>    <br>    return {<br>        &quot;symbol&quot;: symbol,<br>        &quot;period&quot;: period,<br>        &quot;rsi&quot;: latest_rsi,<br>        &quot;signal&quot;: signal,<br>        &quot;analysis&quot;: f&quot;&quot;&quot;RSI Analysis for {symbol}:<br>            {period}-period RSI: {latest_rsi:.2f}<br>            Signal: {signal}<br><br>            Recommendation: {<br>                &quot;BUY&quot; if latest_rsi &lt; 30 else<br>                &quot;SELL&quot; if latest_rsi &gt; 70 else<br>                &quot;HOLD&quot;<br>            }&quot;&quot;&quot;<br>    }<br><br>@mcp.tool()<br>def trade_recommendation(symbol: str) -&gt; Dict[str, Any]:<br>    &quot;&quot;&quot;<br>    Provide a comprehensive trade recommendation based on multiple indicators<br>    <br>    Args:<br>        symbol: The ticker symbol to analyze<br>        <br>    Returns:<br>        Dictionary with trading recommendation and supporting data<br>    &quot;&quot;&quot;<br>    # Calculate individual indicators<br>    ma_data = calculate_moving_averages(symbol)<br>    rsi_data = calculate_rsi(symbol)<br>    <br>    # Extract signals<br>    ma_signal = ma_data[&quot;signal&quot;]<br>    ma_crossover = ma_data[&quot;crossover_detected&quot;]<br>    ma_crossover_type = ma_data[&quot;crossover_type&quot;]<br>    rsi_value = rsi_data[&quot;rsi&quot;]<br>    rsi_signal = rsi_data[&quot;signal&quot;]<br>    <br>    # Determine overall signal 
strength<br>    signal_strength = 0<br>    <br>    # MA contribution<br>    if &quot;BULLISH&quot; in ma_signal:<br>        signal_strength += 1<br>    elif &quot;BEARISH&quot; in ma_signal:<br>        signal_strength -= 1<br>        <br>    # Crossover contribution<br>    if ma_crossover:<br>        if &quot;GOLDEN&quot; in ma_crossover_type:<br>            signal_strength += 2<br>        elif &quot;DEATH&quot; in ma_crossover_type:<br>            signal_strength -= 2<br>            <br>    # RSI contribution<br>    if &quot;OVERSOLD&quot; in rsi_signal:<br>        signal_strength += 1.5<br>    elif &quot;OVERBOUGHT&quot; in rsi_signal:<br>        signal_strength -= 1.5<br>    <br>    # Determine final recommendation<br>    if signal_strength &gt;= 2:<br>        recommendation = &quot;STRONG BUY&quot;<br>    elif signal_strength &gt; 0:<br>        recommendation = &quot;BUY&quot;<br>    elif signal_strength &lt;= -2:<br>        recommendation = &quot;STRONG SELL&quot;<br>    elif signal_strength &lt; 0:<br>        recommendation = &quot;SELL&quot;<br>    else:<br>        recommendation = &quot;HOLD&quot;<br>    <br>    # Calculate risk level (simple version)<br>    risk_level = &quot;MEDIUM&quot;<br>    if abs(signal_strength) &gt; 3:<br>        risk_level = &quot;LOW&quot;  # Strong signal, lower risk<br>    elif abs(signal_strength) &lt; 1:<br>        risk_level = &quot;HIGH&quot;  # Weak signal, higher risk<br>    <br>    analysis = f&quot;&quot;&quot;# Trading Recommendation for {symbol}<br><br>        ## Summary<br>        Recommendation: {recommendation}<br>        Risk Level: {risk_level}<br>        Signal Strength: {signal_strength:.1f} / 4.5<br><br>        ## Technical Indicators<br>        Moving Averages: {ma_signal}<br>        Recent Crossover: {&quot;Yes - &quot; + ma_crossover_type if ma_crossover else &quot;No&quot;}<br>        RSI ({rsi_data[&quot;period&quot;]}): {rsi_value:.2f} - {rsi_signal}<br><br>        ## Reasoning<br>        This 
recommendation is based on a combination of Moving Average analysis and RSI indicators.<br>        {<br>            f&quot;The {ma_crossover_type} provides a strong directional signal. &quot; if ma_crossover else &quot;&quot;<br>        }{<br>            f&quot;The RSI indicates the stock is {rsi_signal.split(&#39; &#39;)[0].lower()}. &quot; if &quot;NEUTRAL&quot; not in rsi_signal else &quot;&quot;<br>        }<br><br>        ## Action Plan<br>        {<br>            &quot;Consider immediate entry with a stop loss at the recent low. Target the next resistance level.&quot; if recommendation == &quot;STRONG BUY&quot; else<br>            &quot;Look for a good entry point on small dips. Set reasonable stop loss.&quot; if recommendation == &quot;BUY&quot; else<br>            &quot;Consider immediate exit or setting tight stop losses to protect gains.&quot; if recommendation == &quot;STRONG SELL&quot; else<br>            &quot;Start reducing position on strength or set trailing stop losses.&quot; if recommendation == &quot;SELL&quot; else<br>            &quot;Monitor the position but no immediate action needed.&quot;<br>        }<br>        &quot;&quot;&quot;<br>    <br>    return {<br>        &quot;symbol&quot;: symbol,<br>        &quot;recommendation&quot;: recommendation,<br>        &quot;risk_level&quot;: risk_level,<br>        &quot;signal_strength&quot;: signal_strength,<br>        &quot;ma_signal&quot;: ma_signal,<br>        &quot;rsi_signal&quot;: rsi_signal,<br>        &quot;current_price&quot;: ma_data[&quot;current_price&quot;],<br>        &quot;analysis&quot;: analysis<br>    }<br><br># Prompts<br>@mcp.prompt()<br>def analyze_ticker(symbol: str) -&gt; str:<br>    &quot;&quot;&quot;<br>    Analyze a ticker symbol for trading opportunities<br>    &quot;&quot;&quot;<br>    return f&quot;&quot;&quot;You are a professional stock market analyst. 
I would like you to analyze the stock {symbol} and provide trading insights.<br><br>        Start by examining the current market data and technical indicators. Here are the specific tasks:<br><br>        1. First, check the current market data for {symbol}<br>        2. Calculate the moving averages using the calculate_moving_averages tool<br>        3. Calculate the RSI using the calculate_rsi tool<br>        4. Generate a comprehensive trade recommendation using the trade_recommendation tool<br>        5. Based on all this information, provide your professional analysis, highlighting:<br>        - The current market position<br>        - Key technical indicators and what they suggest<br>        - Potential trading opportunities and risks<br>        - Your recommended action (buy, sell, or hold) with a brief explanation<br><br>        Please organize your response in a clear, structured format suitable for a professional trader.<br>        &quot;&quot;&quot;<br><br>@mcp.prompt()<br>def compare_tickers(symbols: str) -&gt; str:<br>    &quot;&quot;&quot;<br>    Compare multiple ticker symbols for the best trading opportunity<br>    <br>    Args:<br>        symbols: Comma-separated list of ticker symbols<br>    &quot;&quot;&quot;<br>    symbol_list = [s.strip() for s in symbols.split(&quot;,&quot;)]<br>    symbol_section = &quot;\n&quot;.join([f&quot;- {s}&quot; for s in symbol_list])<br>    <br>    return f&quot;&quot;&quot;You are a professional stock market analyst. I would like you to compare these stocks and identify the best trading opportunity:<br><br>        {symbol_section}<br><br>        For each stock in the list, please:<br><br>        1. Check the current market data using the appropriate resource<br>        2. Generate a comprehensive trade recommendation using the trade_recommendation tool<br>        3. 
Compare all stocks based on:<br>        - Current trend direction and strength<br>        - Technical indicator signals<br>        - Risk/reward profile<br>        - Trading recommendation strength<br><br>        After analyzing each stock, rank them from most promising to least promising trading opportunity. Explain your ranking criteria and why you believe the top-ranked stock represents the best current trading opportunity.<br><br>        Conclude with a specific recommendation on which stock to trade and what action to take (buy, sell, or hold).<br>        &quot;&quot;&quot;<br><br>@mcp.prompt()<br>def intraday_strategy_builder(symbol: str) -&gt; str:<br>    &quot;&quot;&quot;<br>    Build a custom intraday trading strategy for a specific ticker<br>    &quot;&quot;&quot;<br>    return f&quot;&quot;&quot;You are an expert algorithmic trader specializing in intraday strategies. I want you to develop a custom intraday trading strategy for {symbol}.<br><br>        Please follow these steps:<br><br>        1. First, analyze the current market data for {symbol} using the market-data resource<br>        2. Calculate relevant technical indicators:<br>        - Moving averages (short and long periods)<br>        - RSI<br>        3. Based on your analysis, design an intraday trading strategy that includes:<br>        - Specific entry conditions (technical setups that would trigger a buy/sell)<br>        - Exit conditions (both take-profit and stop-loss levels)<br>        - Position sizing recommendations<br>        - Optimal trading times during the day<br>        - Risk management rules<br><br>        Make your strategy specific to the current market conditions for {symbol}, not just generic advice. 
Include exact indicator values and price levels where possible.<br><br>        Conclude with a summary of the strategy and how a trader should implement it for today&#39;s trading session.<br>        &quot;&quot;&quot;</pre><h4>Integrating the MCP server with Claude Desktop</h4><ul><li>Integrate the stock price server with Claude for Desktop by running the following command:</li></ul><pre>mcp install stock_analysis_server.py --with requests --with pandas --with tabulate</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*a0yhwKemVwJkY_xDcGStqQ.png" /></figure><p>After the integration, restart Claude for Desktop to enable the new MCP server for stock analysis queries.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*g7GseNQBM5YvX6IdEsFDdw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*jL7v3MlYtx7RkGAOA7Hikg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*eeiirXNwPbU4KRjNysEmSw.png" /></figure><p><em>Thanks for reading this article!</em></p><p><em>Thanks to Gowri M Bhatt for reviewing the content.</em></p><p>If you enjoyed this article, please click on the clap button 👏 and share to help others find it!</p><p>The full source code for this tutorial can be found here:</p><p><a href="https://github.com/codemaker2015/mcp-server-experiments">GitHub - codemaker2015/mcp-server-experiments</a></p><h4>Resources</h4><ul><li><a href="https://docs.astral.sh/uv/getting-started/installation/">Installation | uv</a></li><li><a href="https://github.com/modelcontextprotocol/python-sdk">GitHub - modelcontextprotocol/python-sdk: The official Python SDK for Model Context Protocol servers and clients</a></li></ul>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Getting Started with CI/CD in Machine Learning]]></title>
            <link>https://codemaker2016.medium.com/getting-started-with-ci-cd-in-machine-learning-8c3d8b956eb8?source=rss-b7a60e472dfc------2</link>
            <guid isPermaLink="false">https://medium.com/p/8c3d8b956eb8</guid>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[beginners-guide]]></category>
            <category><![CDATA[python]]></category>
            <category><![CDATA[continuous-deployment]]></category>
            <category><![CDATA[continuous-integration]]></category>
            <dc:creator><![CDATA[Vishnu Sivan]]></dc:creator>
            <pubDate>Fri, 11 Apr 2025 12:40:47 GMT</pubDate>
            <atom:updated>2025-04-11T12:40:47.476Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*C4-A6EYnwlZJKYQpIDN4hA.png" /></figure><p>Continuous Integration (CI) and Continuous Deployment (CD) have long been essential practices in modern software development, enabling teams to integrate changes frequently, run automated tests, and deploy applications efficiently. While these methodologies were originally designed for traditional software, their value is increasingly evident in the world of machine learning (ML), where reproducibility, automation, and rapid iteration are just as critical.</p><p>In this article, we’ll explore how CI/CD can streamline your machine learning workflows — from training and evaluating models to deploying them seamlessly. Instead of relying on complex tools or platforms, we’ll keep things simple and accessible by using GitHub Actions, Makefile, CML (Continuous Machine Learning), and the Hugging Face CLI to build a fully automated ML pipeline.</p><h3>Getting Started</h3><h4>Table of contents</h4><ul><li><a href="#67fe">Setting up the project</a></li><li><a href="#e3e4">Step 1: GitHub Repository</a></li><li><a href="#6d52">Step 2: Hugging Face Spaces</a></li><li><a href="#c62e">Step 3: Project Structure</a></li><li><a href="#1d53">Training and Evaluating Drug Classification Model</a></li><li><a href="#d87e">Installing the dependencies</a></li><li><a href="#aca6">Loading the Dataset</a></li><li><a href="#fa49">Splitting train and test data</a></li><li><a href="#4c02">Building the Training Pipeline</a></li><li><a href="#dd14">Evaluating the model</a></li><li><a href="#f0d4">Saving the model and results</a></li><li><a href="#4f1e">Building Your Machine Learning CI Pipeline</a></li><li><a href="#e1dc">Creating update branch</a></li><li><a href="#f594">Makefile</a></li><li><a href="#f0be">GitHub Actions</a></li><li><a href="#1ab1">Building Your Machine Learning CD Pipeline</a></li><li><a href="#6eba">Building the Gradio 
App</a></li><li><a href="#a82b">CD Workflow</a></li><li><a href="#2bf6">Setting up repository secrets</a></li><li><a href="#dde3">Project Resources</a></li></ul><h3>Setting up the project</h3><p>In this section, we will guide you through setting up your environment, building a CI/CD pipeline, and optimizing the entire workflow. We will train the drug classifier using a scikit-learn pipeline with a Random Forest model, automate evaluation with CML, and deploy everything to the Hugging Face Hub. Once everything is set up, every code push to GitHub will automatically retrain the model, evaluate it, and update the app, model, and results on Hugging Face.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/451/1*gStn7iDnSiRFO42x_e57uA.png" /></figure><h4>Step 1: GitHub Repository</h4><p>To begin, create a new GitHub repository for your machine learning project. This repo will host your code, datasets, and configuration files for automation.</p><ul><li>Go to <a href="https://github.com/">GitHub</a>, click the <strong>“+”</strong> icon in the top right, and select “New repository”.</li><li>Enter a repository name and optional description.</li><li>Check “Add a README file”.</li><li>Set .gitignore to Python.</li><li>Click “Create repository”.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*469TmAV_UDDya_vsm0Bc7Q.png" /></figure><ul><li>Copy the repository URL and run the following command in your terminal to clone it:</li></ul><pre>git clone your-github-repo-url</pre><ul><li>Example:</li></ul><pre>git clone https://github.com/codemaker2015/CICD-for-Machine-Learning.git <br>cd CICD-for-Machine-Learning</pre><h4>Step 2: Hugging Face Spaces</h4><p>Next, create a new Hugging Face Space for your machine learning project. This Space will host your web application and model files, and serve as the deployment endpoint for your CI/CD pipeline.</p><ul><li>Go to Hugging Face and click on your profile picture in the top right corner. 
Select “New Space” from the dropdown.</li><li>Fill in the required details:</li><li>Space name: Choose a unique name for your Space</li><li>License: Select an appropriate license</li><li>SDK type: Choose Gradio or Streamlit depending on your app (this tutorial uses Gradio)</li><li>Click “Create Space” to finish setup.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/897/1*uuS689YDI1aNkJEa25TEUg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/871/1*7LO2IQN6KXDdkizM0oF5fQ.png" /></figure><h4>Step 3: Project Structure</h4><p>Let’s set up the required folders and files before experimenting and building the pipeline.</p><p>Create app, data, model, and results folders in your cloned GitHub repository.</p><h4>3.1 App folder</h4><p>The app folder stores all files related to the Hugging Face Space. It contains the web application script (app.py), a README.md file with metadata for the Space, and a requirements.txt file listing the necessary Python packages.</p><p>Create the following files inside the app folder:</p><ul><li><strong>app.py</strong>: The main script for your classifier web app.</li><li><strong>README.md</strong>: Contains metadata and a description for your Hugging Face Space. You can either download the README.md file directly from the Hugging Face Space you created earlier or use the sample content provided below.</li></ul><pre>---<br>title: Drug Classification<br>emoji: 💻<br>colorFrom: pink<br>colorTo: red<br>sdk: gradio<br>sdk_version: 5.23.3<br>app_file: app.py<br>pinned: false<br>license: apache-2.0<br>---</pre><ul><li><strong>requirements.txt</strong>: Specifies the dependencies needed to run your app. 
Add the following packages to the requirements.txt file.</li></ul><pre>scikit-learn<br>skops</pre><h4>3.2 Data folder</h4><p>Download the Drug Classification dataset from Kaggle, extract the contents, and move the CSV file into the data folder as drug.csv, the file name the training code reads.</p><p><a href="https://www.kaggle.com/datasets/prathamtripathi/drug-classification">Drug Classification</a></p><h4>3.3 Model and Results folder</h4><p>Both the model and results folders will initially be empty. They will be automatically populated by the Python scripts during training and evaluation.</p><h4>3.4 Repository files</h4><ul><li><strong>Makefile</strong>: Defines command shortcuts for running scripts, making it easier to trigger processes in the GitHub Actions workflow.</li><li><strong>requirements.txt</strong>: Lists all the dependencies required to set up the environment for CI workflow jobs. Add the following dependencies to the requirements.txt. black is included because the format step of the CI workflow runs it.</li></ul><pre>pandas<br>scikit-learn<br>numpy<br>matplotlib<br>skops<br>black</pre><ul><li><strong>train.py</strong>: Contains the core Python logic to load and preprocess data, train and evaluate the model, and save both the trained model and performance metrics.</li></ul><p>Your project folder should now look like this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/273/1*eCiHZ5vlsspcHfvKy1RYmg.png" /></figure><h3>Training and Evaluating Drug Classification Model</h3><p>In this section, we will experiment with Python code to process the data and train a model using a scikit-learn pipeline. 
After training, evaluate the model performance and save both the results and the trained model for later use.</p><h4>Installing the dependencies</h4><ul><li>Create and activate a virtual environment by executing the following commands. Run only the activate line that matches your operating system.</li></ul><pre>python -m venv venv<br>source venv/bin/activate  # Linux/macOS<br>venv\Scripts\activate  # Windows (Command Prompt)</pre><ul><li>Install the pandas, scikit-learn, numpy, matplotlib, skops, and black libraries using pip.</li></ul><pre>pip install pandas scikit-learn numpy matplotlib skops black</pre><h4>Loading the Dataset</h4><p>Load the CSV file using Pandas, shuffle the rows using the sample() function to randomize the data, and then display the first three rows.</p><pre>import pandas as pd<br><br>drug_df = pd.read_csv(&quot;data/drug.csv&quot;)<br>drug_df = drug_df.sample(frac=1)<br>print(drug_df.head(3))</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/617/1*V9r0jamKQmmlL_U8ak1OgA.png" /></figure><h4>Splitting train and test data</h4><p>Define the independent variables (X) and the dependent variable (y), then split the dataset into training and testing sets. 
This is essential for evaluating the model performance on unseen data.</p><pre>from sklearn.model_selection import train_test_split<br><br>X = drug_df.drop(&quot;Drug&quot;, axis=1).values<br>y = drug_df.Drug.values<br><br>X_train, X_test, y_train, y_test = train_test_split(<br>    X, y, test_size=0.3, random_state=125<br>)</pre><h4>Building the Training Pipeline</h4><p>Construct a data processing pipeline using ColumnTransformer, which performs the following operations:</p><ul><li>Encodes categorical columns using OrdinalEncoder</li><li>Fills missing values in numerical columns using SimpleImputer</li><li>Scales the numerical columns using StandardScaler</li></ul><p>After preprocessing, build a training pipeline that feeds the transformed data into a RandomForestClassifier.</p><pre>from sklearn.compose import ColumnTransformer<br>from sklearn.ensemble import RandomForestClassifier<br>from sklearn.impute import SimpleImputer<br>from sklearn.pipeline import Pipeline<br>from sklearn.preprocessing import OrdinalEncoder, StandardScaler<br><br># Define categorical and numerical column indices<br>cat_col = [1, 2, 3]<br>num_col = [0, 4]<br><br># Create a column transformer for preprocessing<br>transform = ColumnTransformer(<br>    transformers=[<br>        (&quot;encoder&quot;, OrdinalEncoder(), cat_col),<br>        (&quot;num_imputer&quot;, SimpleImputer(strategy=&quot;median&quot;), num_col),<br>        (&quot;num_scaler&quot;, StandardScaler(), num_col),<br>    ]<br>)<br><br># Build the complete pipeline<br>pipe = Pipeline(<br>    steps=[<br>        (&quot;preprocessing&quot;, transform),<br>        (&quot;model&quot;, RandomForestClassifier(n_estimators=100, random_state=125)),<br>    ]<br>)<br><br># Train the model<br>pipe.fit(X_train, y_train)</pre><h4>Evaluating the model</h4><p>After training the model, evaluate its performance using two common metrics: accuracy and F1 score.</p><pre>from sklearn.metrics import accuracy_score, f1_score<br><br>predictions = 
pipe.predict(X_test)<br>accuracy = accuracy_score(y_test, predictions)<br>f1 = f1_score(y_test, predictions, average=&quot;macro&quot;)<br><br>print(&quot;Accuracy:&quot;, str(round(accuracy * 100, 2)) + &quot;%&quot;, &quot;F1 Score:&quot;, round(f1, 2))</pre><h4>Saving the model and results</h4><p>We will store the evaluation metrics and confusion matrix in the results/ folder. This helps in tracking performance over time, especially in CI/CD pipelines.</p><p><strong>1. Save Accuracy and F1 Score to a Text File</strong></p><pre>with open(&quot;results/metrics.txt&quot;, &quot;w&quot;) as outfile:<br>    outfile.write(f&quot;Accuracy = {round(accuracy, 2)}, F1 Score = {round(f1, 2)}&quot;)</pre><p><strong>2. Save Confusion Matrix as an Image</strong></p><pre>import matplotlib.pyplot as plt<br>from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix<br># Generate confusion matrix<br>cm = confusion_matrix(y_test, predictions, labels=pipe.classes_)<br>disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=pipe.classes_)<br># Plot and save the confusion matrix<br>disp.plot()<br>plt.savefig(&quot;results/model_results.png&quot;, dpi=120)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/768/1*XS3BUUIpg8coNrAP4D_lGg.png" /></figure><p><strong>3. Save and Load model using skops</strong></p><p>We will use the skops Python package to save our entire pipeline, including both the preprocessing steps and the trained model. Because the whole pipeline is stored in a single file, the same preprocessing is applied when the model is reloaded in the CI/CD workflow.</p><pre>import skops.io as sio<br><br># Save the trained pipeline to a file<br>sio.dump(pipe, &quot;model/drug_pipeline.skops&quot;)<br><br># Load the saved pipeline<br>loaded_pipe = sio.load(&quot;model/drug_pipeline.skops&quot;, trusted=True)</pre><h4>Creating train.py file</h4><p>Here’s how you can structure your train.py file using the code snippets you&#39;ve worked on. 
This script handles loading the data, training and evaluating the model, and saving the model and results in one place.</p><pre>import pandas as pd<br>import matplotlib.pyplot as plt<br>import skops.io as sio<br><br>from sklearn.model_selection import train_test_split<br>from sklearn.compose import ColumnTransformer<br>from sklearn.ensemble import RandomForestClassifier<br>from sklearn.impute import SimpleImputer<br>from sklearn.pipeline import Pipeline<br>from sklearn.preprocessing import OrdinalEncoder, StandardScaler<br>from sklearn.metrics import accuracy_score, f1_score, confusion_matrix, ConfusionMatrixDisplay<br><br># Load and shuffle dataset<br>drug_df = pd.read_csv(&quot;data/drug.csv&quot;)<br>drug_df = drug_df.sample(frac=1)<br><br># Train-test split<br>X = drug_df.drop(&quot;Drug&quot;, axis=1).values<br>y = drug_df[&quot;Drug&quot;].values<br>X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=125)<br><br># Define preprocessing and model pipeline<br>cat_col = [1, 2, 3]<br>num_col = [0, 4]<br><br>transform = ColumnTransformer([<br>    (&quot;encoder&quot;, OrdinalEncoder(), cat_col),<br>    (&quot;num_imputer&quot;, SimpleImputer(strategy=&quot;median&quot;), num_col),<br>    (&quot;num_scaler&quot;, StandardScaler(), num_col),<br>])<br><br>pipe = Pipeline(steps=[<br>    (&quot;preprocessing&quot;, transform),<br>    (&quot;model&quot;, RandomForestClassifier(n_estimators=100, random_state=125)),<br>])<br><br># Train the model<br>pipe.fit(X_train, y_train)<br><br># Make predictions and evaluate<br>predictions = pipe.predict(X_test)<br>accuracy = accuracy_score(y_test, predictions)<br>f1 = f1_score(y_test, predictions, average=&quot;macro&quot;)<br><br># Save metrics<br>with open(&quot;results/metrics.txt&quot;, &quot;w&quot;) as outfile:<br>    outfile.write(f&quot;Accuracy = {round(accuracy, 2)}, F1 Score = {round(f1, 2)}&quot;)<br><br># Save confusion matrix<br>cm = confusion_matrix(y_test, predictions, 
labels=pipe.classes_)<br>disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=pipe.classes_)<br>disp.plot()<br>plt.savefig(&quot;results/model_results.png&quot;, dpi=120)<br><br># Save model<br>sio.dump(pipe, &quot;model/drug_pipeline.skops&quot;)</pre><h3>Building Your Machine Learning CI Pipeline</h3><p>In this section, we will explore how to use CML, Makefile, and GitHub Actions to automate model training, evaluation, and version control for our machine learning project.</p><h4>CML</h4><p>Continuous Machine Learning (CML) is an open-source tool for integrating CI into ML projects. We will use the iterative/setup-cml GitHub Action to automate model evaluation reporting. On every push, it generates a report with performance metrics and a confusion matrix under the commit and sends an email notification.</p><h4>Creating update branch</h4><p>We’re generating the evaluation report, but currently, the model and results aren’t being versioned. To track these changes properly, we’ll create a new branch called “update” and push the updated model and results to it.</p><p>To create the “update” branch:</p><ul><li>Click on the branch dropdown (where it says main)</li><li>Type “update” in the search box</li><li>Select “Create branch: update from main” to finalize the creation.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/461/1*ofGv3CUgn_jqatCIDDtRAQ.png" /></figure><h4>Makefile</h4><p>A Makefile contains command sets that can automate tasks like preprocessing, training, testing, and deploying. 
It helps simplify the CI workflow by bundling related commands, keeping the GitHub Actions file clean and modular.</p><ul><li>Add the following content to the Makefile. Make requires every recipe line to begin with a tab character, so re-indent the commands with tabs if you copy them from this page, and replace codemaker2015/Drug-Classification with the name of your own Space.</li></ul><pre>install:<br> pip install --upgrade pip &amp;&amp;\<br>  pip install -r requirements.txt<br><br>format: <br> black *.py <br><br>train:<br> python train.py<br><br>eval:<br> echo &quot;## Model Metrics&quot; &gt; report.md<br> cat ./results/metrics.txt &gt;&gt; report.md<br> <br> echo &#39;\n## Confusion Matrix Plot&#39; &gt;&gt; report.md<br> echo &#39;![Confusion Matrix](./results/model_results.png)&#39; &gt;&gt; report.md<br> <br> cml comment create report.md<br>  <br>update-branch:<br> git config --global user.name $(USER_NAME)<br> git config --global user.email $(USER_EMAIL)<br> git commit -am &quot;Update with new results&quot;<br> git push --force origin HEAD:update<br><br>hf-login: <br> pip install -U &quot;huggingface_hub[cli]&quot;<br> git pull origin update<br> git switch update<br> huggingface-cli login --token $(HF) --add-to-git-credential<br><br>push-hub: <br> huggingface-cli upload codemaker2015/Drug-Classification ./app --repo-type=space --commit-message=&quot;Sync App files&quot;<br> huggingface-cli upload codemaker2015/Drug-Classification ./model model --repo-type=space --commit-message=&quot;Sync Model&quot;<br> huggingface-cli upload codemaker2015/Drug-Classification ./results metrics --repo-type=space --commit-message=&quot;Sync Metrics&quot;<br><br>deploy: hf-login push-hub<br><br>all: install format train eval update-branch deploy</pre><ul><li>After making the necessary changes, commit them and push the updates to the remote GitHub repository.</li></ul><pre>git add .<br>git commit -m &quot;code integration&quot;<br>git push origin main</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*DY6rIPGJy3i7eD4FlDp6CQ.png" /></figure><h4>GitHub Actions</h4><p>To automate training and evaluation, we will create a GitHub Actions workflow.</p><ul><li>Go to 
the “Actions” tab in your GitHub repository.</li><li>Click on “set up a workflow yourself.”</li><li>Rename the default main.yml file to ci.yml.</li><li>Start by defining the name of the workflow.</li><li>Set the trigger so that it runs on every push or pull request to the main branch or through manual dispatch.</li><li>Define the environment by using the latest Ubuntu runner.</li><li>Set up and activate the GitHub Actions we need for this CI workflow.</li><li>Use make commands to add different steps like installing dependencies, training, formatting, and evaluating.</li><li>Commit your changes to trigger the workflow — GitHub Actions will execute each step sequentially.</li><li>Provide a GitHub Token to the CML job via repository secrets (e.g., secrets.GITHUB_TOKEN).</li><li>Add the following code into your ci.yml file:</li></ul><pre>name: Continuous Integration<br>on:<br>  push:<br>    branches: [ &quot;main&quot; ]<br>  pull_request:<br>    branches: [ &quot;main&quot; ]<br>  workflow_dispatch:<br>  <br>permissions: write-all<br>jobs:<br>  build:<br>    runs-on: ubuntu-latest<br>    steps:<br>      - uses: actions/checkout@v3<br>      - uses: iterative/setup-cml@v2<br>      - name: Install Packages<br>        run: make install<br>      - name: Format<br>        run: make format<br>      - name: Train<br>        run: make train<br>      - name: Evaluation<br>        env:<br>          REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}<br>        run: make eval<br>      - name: Update Branch<br>        env:<br>          NAME: ${{ secrets.USER_NAME }}<br>          EMAIL: ${{ secrets.USER_EMAIL }}<br>        run: make update-branch USER_NAME=$NAME USER_EMAIL=$EMAIL</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*zWTnWjsN1rSpnRSIQerq-A.png" /></figure><h3>Building Your Machine Learning CD Pipeline</h3><p>In this section, we will explore how to automate the deployment of both the model and the application. 
This includes pulling the updated model and app files from the update branch, logging into the Hugging Face CLI using a token, pushing the necessary files, and ultimately deploying the application.</p><h4>Building the Gradio App</h4><p>To deploy our model and make it accessible, we’ll create a Gradio app with the following components:</p><ul><li>Load the scikit-learn pipeline and trained model.</li><li>Define a Python function to predict drug labels based on user input.</li><li>Design the input interface using sliders for numerical values and radio buttons for categorical inputs.</li><li>Add sample inputs to quickly test the model’s functionality.</li><li>Provide metadata such as title for the application and brief description highlighting its features and purpose.</li></ul><p>Add the following code to app.py file inside the app folder.</p><pre>import gradio as gr<br>import skops.io as sio<br>import warnings<br>from sklearn.exceptions import InconsistentVersionWarning<br><br># Suppress the version warnings<br>warnings.filterwarnings(&quot;ignore&quot;, category=InconsistentVersionWarning)<br><br># Explicitly specify trusted types<br>trusted_types = [<br>    &quot;sklearn.pipeline.Pipeline&quot;,<br>    &quot;sklearn.preprocessing.OneHotEncoder&quot;,<br>    &quot;sklearn.preprocessing.StandardScaler&quot;,<br>    &quot;sklearn.compose.ColumnTransformer&quot;,<br>    &quot;sklearn.preprocessing.OrdinalEncoder&quot;,<br>    &quot;sklearn.impute.SimpleImputer&quot;,<br>    &quot;sklearn.tree.DecisionTreeClassifier&quot;,<br>    &quot;sklearn.ensemble.RandomForestClassifier&quot;,<br>    &quot;numpy.dtype&quot;,<br>]<br>pipe = sio.load(&quot;./model/drug_pipeline.skops&quot;, trusted=trusted_types)<br><br><br>def predict_drug(age, sex, blood_pressure, cholesterol, na_to_k_ratio):<br>    &quot;&quot;&quot;Predict drugs based on patient features.<br><br>    Args:<br>        age (int): Age of patient<br>        sex (str): Sex of patient<br>        blood_pressure (str): 
Blood pressure level<br>        cholesterol (str): Cholesterol level<br>        na_to_k_ratio (float): Ratio of sodium to potassium in blood<br><br>    Returns:<br>        str: Predicted drug label<br>    &quot;&quot;&quot;<br>    features = [age, sex, blood_pressure, cholesterol, na_to_k_ratio]<br>    predicted_drug = pipe.predict([features])[0]<br><br>    label = f&quot;Predicted Drug: {predicted_drug}&quot;<br>    return label<br><br><br>inputs = [<br>    gr.Slider(15, 74, step=1, label=&quot;Age&quot;),<br>    gr.Radio([&quot;M&quot;, &quot;F&quot;], label=&quot;Sex&quot;),<br>    gr.Radio([&quot;HIGH&quot;, &quot;LOW&quot;, &quot;NORMAL&quot;], label=&quot;Blood Pressure&quot;),<br>    gr.Radio([&quot;HIGH&quot;, &quot;NORMAL&quot;], label=&quot;Cholesterol&quot;),<br>    gr.Slider(6.2, 38.2, step=0.1, label=&quot;Na_to_K&quot;),<br>]<br>outputs = [gr.Label(num_top_classes=5)]<br><br>examples = [<br>    [30, &quot;M&quot;, &quot;HIGH&quot;, &quot;NORMAL&quot;, 15.4],<br>    [35, &quot;F&quot;, &quot;LOW&quot;, &quot;NORMAL&quot;, 8],<br>    [50, &quot;M&quot;, &quot;HIGH&quot;, &quot;HIGH&quot;, 34],<br>]<br><br><br>title = &quot;Drug Classification&quot;<br>description = &quot;Enter the details to correctly identify Drug type?&quot;<br>article = &quot;A Beginners Guide to CI/CD for Machine Learning. It teaches how to automate training, evaluation, and deployment of models to Hugging Face using GitHub Actions.&quot;<br><br><br>gr.Interface(<br>    fn=predict_drug,<br>    inputs=inputs,<br>    outputs=outputs,<br>    examples=examples,<br>    title=title,<br>    description=description,<br>    article=article,<br>    theme=gr.themes.Soft(),<br>).launch()</pre><p>Add the following dependencies to the requirements.txt inside the app folder.</p><pre>scikit-learn<br>skops<br>gradio</pre><h4>CD workflow</h4><p>To make our workflow fully CI/CD compliant, we need to create another file named cd.yml, similar to the existing ci.yml file. 
Once the CI workflow completes, it triggers the cd.yml workflow through the on.workflow_run event. Because the trigger type is completed, the CD workflow starts whether or not the CI run succeeded, so check the CI result before relying on a deployment. The deployment workflow sets up the environment and executes the make deploy command, using the Hugging Face token to push the latest model and application updates to the Hugging Face Hub.</p><ul><li>Go to GitHub Actions and create a workflow named cd.yml.</li><li>Add the following code to it.</li></ul><pre>name: Continuous Deployment<br>on:<br>  workflow_run:<br>    workflows: [&quot;Continuous Integration&quot;]<br>    types:<br>      - completed<br><br>  workflow_dispatch:<br><br>jobs:<br>  build:<br>    runs-on: ubuntu-latest<br>    steps:<br>      - uses: actions/checkout@v3<br><br>      - name: Deployment To Hugging Face<br>        env:<br>          HF: ${{ secrets.HF }}<br>        run: make deploy HF=$HF</pre><h3>Setting up repository secrets</h3><p>To commit and push changes using Git, you need to configure a username and email. While you can hard-code these values in the workflow, storing them as GitHub Secrets keeps them out of the repository. 
Additionally, the CD pipeline requires a Hugging Face token to deploy the application to the Hugging Face Hub.</p><p>Follow these steps to securely add the necessary credentials to your GitHub repository using Secrets. The workflows above read three secrets: USER_NAME, USER_EMAIL, and HF.</p><ul><li>Go to your repository Settings and click on “Secrets and variables” under the Security section.</li><li>Select “Actions”, then click the green “New repository secret” button.<br> Add a name and value. This works like setting an environment variable on your local machine.</li><li>To generate a Hugging Face token, click on your profile picture and select “Settings”.</li><li>Navigate to “Access Tokens”, then click “New Token” and ensure it has write permissions.</li><li>Copy the token and add it as a repository secret in the same way as you did for the Git username and email.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*duJxXN5wQPKIijFo5DF91w.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*VZrLZMHyJdCDSN9ZwAwsGg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1007/1*wYIQeIWwwjkveQPVzK-afA.png" /></figure><ul><li>After making the necessary changes, commit them and push the updates to the remote GitHub repository. Because the workflow files were created on GitHub, pull the latest changes from the remote repository before pushing your local updates.</li></ul><pre>git add .<br>git commit -m &quot;gradio code added&quot;<br>git pull origin main<br>git push origin main</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1010/1*dEpfC25BpVAtWW9hzYJtXw.png" /></figure><ul><li>Once you push the changes, GitHub Actions will be triggered automatically to run the CI/CD pipelines. 
Alternatively, you can run them manually by clicking the Re-run all jobs button.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/377/1*crGnRIbzlbMgi3_TeGr9YA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*zY9NJJ9bPZ9QnqGQZihNEQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*F_XlAfgf1Z1igYE12k3VgQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*7Ig0r0gAmur5naq9nScOKg.png" /></figure><ul><li>You can monitor live logs for each step by selecting the run option in the workflow build. Once the files are successfully uploaded to the Hugging Face server, the corresponding Space will begin setting up the environment. Shortly after, the application will launch and start running.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ogfBcfYCC5riY6hBTnPw3w.gif" /></figure><h3>Project Resources</h3><ul><li>GitHub Repository: <a href="https://github.com/codemaker2015/CI-CD-for-Machine-Learning/">codemaker2015/CI-CD-for-Machine-Learning</a></li><li>Hugging Face Space: <a href="https://huggingface.co/spaces/codemaker2015/Drug-Classification">Drug Classification — a Hugging Face Space by codemaker2015</a></li><li>Kaggle Dataset: <a href="https://www.kaggle.com/datasets/prathamtripathi/drug-classification">Drug Classification</a></li></ul>]]></content:encoded>
        </item>
    </channel>
</rss>