Simform Engineering - Medium

Event-Driven Architecture with Kafka in .NET: A Modern Approach to Building Scalable Systems

Dharmesh Khakhkhar — Wed, 01 Jul 2026 08:58:29 GMT

Designing loosely coupled, highly scalable, and resilient distributed systems using Apache Kafka and .NET

Topic Overview — Introduction

Modern applications often need to process millions of events in real time.
Traditional request-response architectures can become bottlenecks as systems grow and scale.

Event-Driven Architecture addresses this challenge by allowing services to communicate asynchronously through events, enabling better scalability, flexibility, and loose coupling between components.

In this article, we will explore how to build an Event-Driven system using Apache Kafka with .NET and understand how this architecture helps create scalable and resilient applications.

Key Features of Event-Driven Architecture / Kafka

Asynchronous Communication: Services communicate via events without waiting for immediate responses.
Loose Coupling: Producers and consumers do not need to know about each other.
Scalability: Kafka supports horizontal scaling through partitions.
High Throughput: Kafka is optimized for handling millions of messages per second.
Fault Tolerance: Data is replicated across brokers to prevent data loss.
Event Replay: Consumers can replay past events when needed.

Advantages of Event-Driven Systems

Improved System Scalability: Services can scale independently.
Better Resilience: If one service fails, others continue to operate.
Real-Time Processing: Kafka enables near real-time data streaming.
Flexibility in Integration: Easy to integrate new services without impacting existing ones.
Better Observability: Event logs provide traceability and auditability.

Real-World Use Cases

Event-Driven Architecture with Kafka is ideal for:

Order Processing Systems (E-commerce)
Payment Processing Pipelines
Real-Time Notifications
Inventory Management
Log Aggregation Systems
IoT Data Streaming
Microservices Communication

What is Apache Kafka?

Apache Kafka is a distributed event streaming platform used to publish, store, and process large streams of real-time data.

It was originally developed at LinkedIn and later open-sourced under the Apache Software Foundation.

Kafka Architecture Overview

The architecture of Apache Kafka is designed to handle high-throughput, real-time data streaming with strong scalability and fault tolerance. Kafka works on a distributed publish–subscribe model, where producers send events and consumers process them asynchronously.

Kafka acts as the central event broker. Producers publish events to topics. Topics are split into partitions, enabling parallel processing. Consumers within the same consumer group share partitions and scale message processing horizontally.

Topics and Partitions

Kafka organizes messages into topics, which are logical streams of events.

Each topic is divided into partitions, which allow Kafka to scale horizontally.

Benefits of partitions:

Enable parallel processing
Allow consumers to process messages independently
Preserve ordering within a partition

Kafka guarantees message order only within a partition.

Producer

Kafka Producer — How It Works

When your application sends an order event to Kafka, it goes through 5 simple steps:

Order request: your app calls the producer's send method (producer.send() in the Java client, ProduceAsync in Confluent.Kafka for .NET) with the order data.
Serialize: the order object is converted to bytes (JSON, Avro, etc.) before sending.
Connect to broker: the producer routes the message to the correct partition leader using a persistent connection.
Publish to topic: records are batched and sent to the Kafka topic for efficiency.
Store in partition: the broker appends the record to the partition log and assigns it a unique offset.

Consumer

Kafka Consumer — How It Works

A Kafka consumer reads events from a topic in 5 steps:

Join group — the consumer registers with the group coordinator and announces itself as part of a named consumer group.
Partition assigned — the broker assigns specific partitions to this consumer. Each partition is owned by exactly one consumer in the group at a time.
Poll broker — the consumer enters a continuous poll loop, fetching a batch of records starting from its last committed offset.
Deserialize records — raw bytes are converted back into usable objects (JSON, Avro, etc.) using the configured deserializer.
Process message — your business logic runs: save to a database, trigger a downstream service, update a cache, and so on.

After successful processing, the consumer commits its offset — telling Kafka how far it has read. On the next poll, it picks up from there.

Consumer Groups

Kafka consumers read messages as part of a consumer group, allowing multiple consumer instances to share the processing workload efficiently.

Key characteristics of consumer groups include:

Each partition is assigned to only one consumer within a consumer group at any given time.
Multiple consumers in the same group enable parallel processing across partitions.
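This partition-to-consumer assignment preserves message ordering within a partition and ensures that, at any moment, only one consumer in the group reads a given partition. Duplicates are still possible after a rebalance if the previous owner had not yet committed its offsets.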
If a consumer fails, Kafka automatically rebalances the group and reassigns its partitions to other active consumers.

Example

Consider a topic with four partitions and a consumer group containing two consumers. Kafka may assign partitions 0 and 1 to Consumer A, and partitions 2 and 3 to Consumer B. If Consumer A crashes, Kafka automatically reassigns its partitions to Consumer B during the rebalance process.
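
The assignment in this example can be sketched in a few lines of C#. This is an illustrative model only, not Kafka's actual assignor: the Assign and Print helpers and the consumer names are hypothetical, and the sketch simply hands out contiguous ranges of partitions to the sorted group members.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: splits partitions 0..partitionCount-1 into contiguous
// ranges, one per consumer (similar in spirit to a range-style assignment).
static Dictionary<string, List<int>> Assign(int partitionCount, IEnumerable<string> consumers)
{
    var members = consumers.OrderBy(c => c, StringComparer.Ordinal).ToList();
    var result = members.ToDictionary(m => m, _ => new List<int>());
    if (members.Count == 0) return result;

    int perConsumer = partitionCount / members.Count;
    int extra = partitionCount % members.Count; // the first `extra` members get one more
    int next = 0;
    for (int i = 0; i < members.Count; i++)
    {
        int count = perConsumer + (i < extra ? 1 : 0);
        for (int j = 0; j < count; j++) result[members[i]].Add(next++);
    }
    return result;
}

static void Print(Dictionary<string, List<int>> assignment)
{
    foreach (var (consumer, partitions) in assignment.OrderBy(kv => kv.Key, StringComparer.Ordinal))
        Console.WriteLine($"{consumer}: [{string.Join(", ", partitions)}]");
}

// Four partitions, two consumers.
Print(Assign(4, new[] { "consumer-A", "consumer-B" }));
// consumer-A: [0, 1]
// consumer-B: [2, 3]

// consumer-A crashes; after the rebalance consumer-B owns every partition.
Print(Assign(4, new[] { "consumer-B" }));
// consumer-B: [0, 1, 2, 3]
```

Adding a fifth consumer to a four-partition topic would leave one member with an empty list, which is why the partition count caps the useful size of a consumer group.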

In our code:

GroupId = "order-consumer-group";

All consumers using the same GroupId become members of the same consumer group and cooperate to process messages from the topic.

Consumer Group vs GroupId

These two terms are closely related but often confused.

Consumer Group: A logical group of consumers that work together to consume messages from a topic
GroupId: A configuration property used to uniquely identify that consumer group in Kafka

Example:

var config = new ConsumerConfig
{
    BootstrapServers = "localhost:29092",
    GroupId = "order-consumer-group",
    AutoOffsetReset = AutoOffsetReset.Earliest
};

All consumers using the same GroupId belong to the same consumer group.

Production Architecture Overview

In a modern Event-Driven Architecture (EDA), services communicate through events rather than direct synchronous API calls. Kafka acts as the central event backbone that enables asynchronous communication between independent services.

For example, when an Order Service creates a new order, it publishes an OrderCreated event to a Kafka topic. Multiple downstream services such as Inventory, Payment, Notification, and Analytics can consume the same event independently without impacting the producer.

Benefits of this architecture include:

Loose coupling between services
Independent scaling of producers and consumers
Improved fault tolerance and resilience
Better deployment flexibility
Support for real-time event processing

A typical flow looks like:

Order API
    |
    v
Kafka Producer
    |
    v
+------------------+
|  Order Topic     |
+------------------+
   |      |      |
   v      v      v
Inventory Payment Notification
 Service   Service    Service

This approach allows organizations to build scalable distributed systems where services evolve independently while maintaining reliable communication.

Message Keys for Ordered Processing

Message keys matter whenever related events must be processed in the order they were produced.

Add a key in the producer code:

await _producer.ProduceAsync("orders", new Message<string, string>
{
    Key = order.OrderId.ToString(),
    Value = JsonSerializer.Serialize(order)
});

By using OrderId as the message key, Kafka routes all events for the same order to the same partition, preserving their relative order.
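
A minimal sketch of why the same key always maps to the same partition. The PartitionFor helper is hypothetical, the partition count of 6 is an assumption, and FNV-1a is used only as a stand-in hash: real clients use their own hash functions (for example, murmur2 in the Java client).

```csharp
using System;
using System.Text;

// Illustrative only: FNV-1a stands in for the client's real key hash.
static int PartitionFor(string key, int partitionCount)
{
    uint hash = 2166136261;
    foreach (byte b in Encoding.UTF8.GetBytes(key))
    {
        hash ^= b;
        hash *= 16777619;
    }
    return (int)(hash % (uint)partitionCount);
}

const int partitions = 6; // assumed partition count for the example

// Every event for the same order maps to the same partition...
int first = PartitionFor("ORD-1001", partitions);
int second = PartitionFor("ORD-1001", partitions);
Console.WriteLine(first == second); // True

// ...while different orders may land on different partitions.
foreach (var orderId in new[] { "ORD-1001", "ORD-1002", "ORD-1003" })
    Console.WriteLine($"{orderId} -> partition {PartitionFor(orderId, partitions)}");
```

Because the mapping depends on the partition count, adding partitions to an existing topic can move a key to a different partition, so ordering for that key is only guaranteed while the count stays fixed.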

Offset Behavior

Kafka tracks message position using offsets.

Each message within a partition has a unique offset number.

Consumers store offsets to know where to resume processing.

Offsets represent positions within a partition log. They are not globally unique message identifiers and are only meaningful within a specific partition.

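Consumer Configuration:

AutoOffsetReset = AutoOffsetReset.Earliest

AutoOffsetReset applies only when the consumer group has no committed offset for a partition (for example, a brand-new group):

• Earliest → start from the oldest retained event (replay)
• Latest → start from new events only (real-time processing)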

Graceful Shutdown

Add to consumer code:

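var cts = new CancellationTokenSource();

Console.CancelKeyPress += (_, e) =>
{
    e.Cancel = true; // keep the process alive so the consume loop can exit cleanly
    cts.Cancel();    // makes consumer.Consume(cts.Token) throw OperationCanceledException
};

// After the consume loop exits (for example, in a finally block):
consumer.Close();

Calling Close() from the consuming thread lets the consumer leave the group cleanly and commit its final offsets (when auto-commit is enabled), which reduces duplicate processing after a restart. The Kafka consumer is not thread-safe, so avoid calling Close() directly inside the CancelKeyPress handler while Consume() is running on another thread.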

Dead Letter Queue (DLQ)

Dead Letter Queue (DLQ) — Your Kafka Safety Net

A Dead Letter Queue is a special Kafka topic that catches messages your consumer failed to process — even after retrying. Instead of losing the message silently or blocking the entire partition, you route it to a DLQ and keep moving.

DLQs should be actively monitored. Failed messages should be reviewed, replayed, or resolved through operational processes to prevent silent message loss.

How it works:

Consumer receives a message from orders topic and tries to process it.
Processing fails (bad data, downstream service down, schema mismatch, etc.).
Consumer retries up to a configured limit (e.g. 3 attempts).
Retries exhausted → message is published to orders.DLQ along with error metadata (original topic, partition, offset, exception, retry count).
Your team can then inspect the failure, replay the message once the bug is fixed, alert on-call, or discard if the message is genuinely invalid.

Why it matters:

No message is silently dropped — every failure is observable.
The main consumer keeps moving — one bad message doesn’t block thousands of good ones.
Replay is safe — fix the bug, re-publish from the DLQ, done.

Naming convention: orders.DLQ or orders.dead-letter — one DLQ per source topic is the common pattern.

Monitor it: Set an alert if DLQ message count grows — it means something in your pipeline is broken.
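
The retry-then-dead-letter flow above can be sketched without a broker. Everything here is a hypothetical stand-in: the deadLetters list plays the role of the orders.DLQ topic, and ProcessWithDlq and Handle are illustrative names. A real consumer would publish the failed record to the DLQ topic with the error metadata as headers and would usually add a backoff between attempts.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the orders.DLQ topic.
var deadLetters = new List<(string Value, string Error, int Attempts)>();

// Tries the handler up to maxAttempts times; on exhaustion, records the
// message in the dead-letter list instead of blocking the caller.
bool ProcessWithDlq(string value, Action<string> handler, int maxAttempts = 3)
{
    Exception? last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++)
    {
        try
        {
            handler(value);
            return true; // processed; the offset can be committed
        }
        catch (Exception ex)
        {
            last = ex;
        }
    }
    deadLetters.Add((value, last?.Message ?? "unknown", maxAttempts));
    return false; // dead-lettered; the consumer still moves on
}

void Handle(string value)
{
    if (value.Contains("bad")) throw new FormatException("invalid payload");
    Console.WriteLine($"processed {value}");
}

foreach (var msg in new[] { "order-1", "bad-order", "order-2" })
    ProcessWithDlq(msg, Handle);

Console.WriteLine($"DLQ size: {deadLetters.Count}");
// processed order-1
// processed order-2
// DLQ size: 1
```

The point of the design is that the failing message never stops order-2 from being processed, and the failure stays visible in the dead-letter store rather than being swallowed.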

Idempotent producer

Idempotent Producer — Exactly-Once, No Duplicates

Idempotent producers prevent duplicate writes caused by producer retries but do not provide end-to-end exactly-once processing guarantees.

Without idempotence, a network hiccup causes a silent bug: the broker writes a message, the ack gets lost, the producer retries — and now the same message exists twice in the partition. Your order gets processed twice. Your invoice gets sent twice.

The idempotent producer fixes this with two simple things:

Producer ID (PID) — the broker assigns a unique ID to each producer instance on first connect.
Sequence number — the producer stamps every message with a per-partition counter that increments with each send.

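When a retry arrives with the same PID + sequence number, the broker recognises it as a duplicate and silently discards it. The partition log contains the message exactly once; whether a consumer processes it exactly once still depends on how that consumer commits offsets.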
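
A simplified model of the broker-side check, to make the PID + sequence number idea concrete. The Append function and the dictionary are illustrative, not broker code, and the sketch ignores details such as rejecting out-of-order sequence gaps.

```csharp
using System;
using System.Collections.Generic;

// Simplified broker-side model: remembers the last sequence number accepted
// for each (producerId, partition) pair. Names are illustrative.
var lastSequence = new Dictionary<(long ProducerId, int Partition), int>();
var log = new List<string>();

bool Append(long producerId, int partition, int sequence, string value)
{
    var key = (producerId, partition);
    if (lastSequence.TryGetValue(key, out int last) && sequence <= last)
        return false; // duplicate from a retry: acknowledge, but do not append again

    lastSequence[key] = sequence;
    log.Add(value);
    return true;
}

Append(producerId: 42, partition: 0, sequence: 0, value: "OrderCreated ORD-1001");
// The ack is lost, so the producer retries with the same sequence number.
Append(producerId: 42, partition: 0, sequence: 0, value: "OrderCreated ORD-1001");
Append(producerId: 42, partition: 0, sequence: 1, value: "OrderPaid ORD-1001");

Console.WriteLine(log.Count); // 2
```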

How to enable it:

enable.idempotence=true

That single setting automatically enforces acks=all and retries=MAX_INT — the safest defaults.

What it covers and what it doesn’t:

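Duplicate-free, ordered writes to a single partition within one producer session: guaranteed.
Atomic writes across multiple partitions or topics: you need the Kafka Transactions API (producer.beginTransaction() / commitTransaction() in the Java client; BeginTransaction() / CommitTransaction() in Confluent.Kafka).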

Overhead: Nearly zero. The broker only tracks the last 5 sequence numbers per producer per partition.

Bottom line: Always enable idempotence for any production producer writing critical data. It costs nothing and eliminates an entire class of subtle data bugs.

Schema Registry

Schema Registry — The Contract Between Producers and Consumers

In Kafka, messages are just bytes. Without a schema, nothing stops a producer from changing its payload structure and silently breaking every consumer downstream. Schema Registry solves this.

How it works:

Producer registers the schema (Avro, Protobuf, or JSON Schema) with the registry on first use. The registry returns a schema ID.
Each message carries only the schema ID, not the full schema. In the Confluent wire format the payload is prefixed with a magic byte followed by the 4-byte ID, which keeps messages small.
Consumer reads the schema ID, fetches the full schema from the registry (cached after first fetch), and deserializes the bytes correctly.
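
To make the framing concrete, here is a small sketch of the Confluent wire format: one magic byte (0), a 4-byte big-endian schema ID, then the serialized payload. The Frame and Unframe helpers are hypothetical, and schema ID 17 and the payload are made-up values. In practice the registry-aware serializers and deserializers do this for you.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

// Frames a serialized payload: 1 magic byte (0), 4-byte big-endian schema ID, payload.
static byte[] Frame(int schemaId, byte[] payload)
{
    var framed = new byte[5 + payload.Length];
    framed[0] = 0; // magic byte
    BinaryPrimitives.WriteInt32BigEndian(framed.AsSpan(1, 4), schemaId);
    payload.CopyTo(framed, 5);
    return framed;
}

// Reads the schema ID back and returns the remaining payload bytes.
static (int SchemaId, byte[] Payload) Unframe(byte[] framed)
{
    if (framed.Length < 5 || framed[0] != 0)
        throw new InvalidOperationException("Not a schema-registry framed message");
    int schemaId = BinaryPrimitives.ReadInt32BigEndian(framed.AsSpan(1, 4));
    return (schemaId, framed[5..]);
}

// Schema ID 17 and the payload bytes are made-up example values.
byte[] message = Frame(17, Encoding.UTF8.GetBytes("serialized-order"));
var (id, body) = Unframe(message);

Console.WriteLine(message.Length - body.Length);  // 5
Console.WriteLine(id);                            // 17
Console.WriteLine(Encoding.UTF8.GetString(body)); // serialized-order
```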

Schema Evolution — Compatibility Modes:

Mode: What it means
BACKWARD (default): New schema can read old messages; safe to add optional fields
FORWARD: Old schema can read new messages; safe to remove optional fields
FULL: Both directions; safest, most restrictive
NONE: No checks; use only in development

Why it matters:

Breaking schema changes are rejected at publish time — not discovered when consumers crash in production.
The schema is stored once, referenced by ID — smaller messages, faster serialization.
Producers and consumers are decoupled from each other but coupled to the contract — the right tradeoff.

Popular implementations: Confluent Schema Registry, AWS Glue Schema Registry, Apicurio Registry.

Bottom line: If you’re using Kafka in production with more than one team touching the same topic, Schema Registry is essential — it’s the API contract for your event streams.

Partitioning Strategy

Kafka Partitioning Strategy — Choosing the Right One

Partitioning decides which partition a message lands in. The right strategy depends on whether you need ordering, throughput, or custom routing.

1. Key-based (default when key is set): hash(key) % numPartitions, so the same key always goes to the same partition. Guarantees ordering for all events sharing a key (e.g. all events for order-123 arrive in sequence). Best for: orders, user activity, sessions.

2. Round-robin (no key set, pre-Kafka 2.4): Messages spread evenly across all partitions one at a time. Maximum load balance, zero ordering guarantee. Best for: logs, metrics, analytics where order doesn’t matter.

3. Sticky (default no-key, Kafka 2.4+): Fills one partition’s batch completely before switching to the next. Produces fewer, larger batches, with significantly better throughput than round-robin and the same no-ordering tradeoff. Best for: high-volume keyless workloads.

4. Custom partitioner: Implement a custom partitioner (the Partitioner interface in the Java client) and return any partition number you want. Use it for VIP user routing, geo-based lanes, or priority queues (critical vs. background jobs on separate partitions).

Hot partition, the silent killer: If a single key generates far more traffic than others, one partition absorbs all that load while others sit idle. The fix: append a random salt suffix to the key (user_id + "-" + random(0, N)) to spread the load, at the cost of losing strict per-user ordering.

Rule of thumb: Use key-based when order matters. Use sticky when it doesn’t and throughput does. Use custom only when your routing logic can’t be expressed as a key hash.
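
The salting trick for hot keys is small enough to show in full. SaltKey is a hypothetical helper and the bucket count of 4 is an assumption; pick N based on how many partitions the hot key should spread across.

```csharp
using System;

// Appends a salt in [0, saltBuckets) so one hot key spreads over several
// partition keys. Per-user ordering is lost across the salted keys.
static string SaltKey(string userId, int saltBuckets, Random rng)
    => $"{userId}-{rng.Next(saltBuckets)}";

var rng = new Random(7); // fixed seed only to make the demo repeatable
for (int i = 0; i < 5; i++)
    Console.WriteLine(SaltKey("user-42", 4, rng));
```

Consumers that need a per-user view then have to merge results across the salted keys, which is the price of spreading the load.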

Batch Processing

Kafka Batch Processing — How Records Are Grouped and Sent

Kafka doesn’t send one message per network request. Instead, the producer groups records into batches before dispatching them — this is the primary reason Kafka can handle millions of messages per second.

How it works:

1. RecordAccumulator buffers records: Every call to the producer's send method places the record into an in-memory buffer (called the RecordAccumulator in the Java client), organized per (topic, partition) pair. Records sit here until the batch is ready to flush.

2. Two flush triggers: A batch is sent when either condition is met first:

batch.size: the batch has accumulated enough bytes (default 16 KB in the Java client). Send immediately.
linger.ms: the wait timer has expired (default 0 ms in the Java client). Send whatever is buffered, even if the batch isn't full.

3. Sender thread compresses and dispatches: A background Sender thread drains ready batches, optionally compresses them, and sends one network request to the broker carrying all records in the batch.

4. Broker appends the whole batch: The broker writes the entire batch to the partition log in a single operation, which is far more efficient than writing records one at a time.
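
The two flush triggers can be illustrated with a toy accumulator for a single partition. This is a sketch under stated assumptions, not the client's implementation: the Send, Tick, and Flush functions are hypothetical, the 32-byte batch size and 10 ms linger are made up, and time is passed in explicitly so the behaviour is deterministic.

```csharp
using System;
using System.Collections.Generic;

// Toy accumulator for one partition. Time is passed in explicitly (ms) so the
// example is deterministic; a real producer uses a background sender thread.
int batchSizeBytes = 32; // stand-in for batch.size
int lingerMs = 10;       // stand-in for linger.ms
var batch = new List<string>();
int batchBytes = 0;
long batchStartedAt = 0;
var flushes = new List<string>();

void Flush(string reason)
{
    if (batch.Count == 0) return;
    flushes.Add($"{reason}: {batch.Count} record(s)");
    batch.Clear();
    batchBytes = 0;
}

void Send(string record, long nowMs)
{
    if (batch.Count == 0) batchStartedAt = nowMs;
    batch.Add(record);
    batchBytes += record.Length; // ASCII records, so chars == bytes
    if (batchBytes >= batchSizeBytes) Flush("batch.size reached");
}

void Tick(long nowMs)
{
    if (batch.Count > 0 && nowMs - batchStartedAt >= lingerMs) Flush("linger.ms expired");
}

Send("0123456789abcdef", nowMs: 0); // 16 bytes buffered
Send("0123456789abcdef", nowMs: 1); // 32 bytes -> size trigger
Send("short", nowMs: 2);            // 5 bytes buffered
Tick(nowMs: 5);                     // too early, nothing happens
Tick(nowMs: 12);                    // 10 ms elapsed -> linger trigger

flushes.ForEach(Console.WriteLine);
// batch.size reached: 2 record(s)
// linger.ms expired: 1 record(s)
```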

Tuning tradeoffs:

Goal: Settings
Low latency: linger.ms=0, small batch.size
High throughput: linger.ms=5–20, batch.size=64–512KB
Reduced I/O: compression.type=lz4 or zstd

Rule of thumb: For real-time pipelines, keep linger.ms=0. For analytics or log pipelines where a few milliseconds don't matter, raising linger.ms to 5–20 ms can noticeably increase throughput with zero code changes; measure on your own workload before settling on a value.

Health Check and monitoring

Kafka Health Check & Monitoring — What to Watch

Kafka exposes hundreds of metrics via JMX. These are the ones that actually matter in production.

Broker Health — The Non-Negotiables

Metric: Healthy value
ActiveControllerCount: Must be exactly 1; 0 means no controller, cluster is broken
UnderReplicatedPartitions: Must be 0; any value means data isn't fully replicated
OfflinePartitionsCount: Must be 0; offline partitions mean data is unavailable
Disk / CPU / Network: Watch for saturation trends

Producer Metrics

record-error-rate — should always be 0. Any errors mean messages are being dropped.
request-latency-avg — how long the broker takes to ack. Alert if consistently above 100ms.
record-queue-time-avg — time records spend waiting in the accumulator. A rising trend means the producer is falling behind.

Consumer Lag — Your Most Important Metric

Consumer lag = log-end-offset − committed-offset. It tells you how far behind your consumers are.

A stable lag is fine. A growing lag means consumers can’t keep up with producers — scale up consumers or optimize processing.
Track records-lag-max per partition and records-lag-avg per consumer group.
If lag exceeds your retention window, consumers will start losing messages.
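
The lag formula is simple arithmetic per partition. The offsets below are made-up sample values for a three-partition topic; in a real system they would come from the broker and from the group's committed offsets.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Offsets are made-up sample values for a three-partition topic.
var logEndOffsets = new Dictionary<int, long> { [0] = 1_500, [1] = 1_480, [2] = 2_000 };
var committedOffsets = new Dictionary<int, long> { [0] = 1_500, [1] = 1_400, [2] = 1_250 };

// Lag per partition = log-end-offset - committed-offset.
var lagByPartition = logEndOffsets.ToDictionary(
    kv => kv.Key,
    kv => kv.Value - committedOffsets[kv.Key]);

foreach (var (partition, lag) in lagByPartition.OrderBy(kv => kv.Key))
    Console.WriteLine($"partition {partition}: lag {lag}");

Console.WriteLine($"max lag: {lagByPartition.Values.Max()}");
Console.WriteLine($"total lag: {lagByPartition.Values.Sum()}");
// partition 0: lag 0
// partition 1: lag 80
// partition 2: lag 750
// max lag: 750
// total lag: 830
```

A single snapshot like this says little on its own; it is the trend across snapshots (stable versus growing) that signals whether consumers are keeping up.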

Topic Metrics

MessagesInPerSec — write throughput per topic.
BytesIn/OutPerSec — bandwidth usage, useful for capacity planning.
LogEndOffset growth rate — how fast data is being written.

Alert Thresholds

Page immediately: OfflinePartitions > 0, UnderReplicatedPartitions > 0, ActiveController != 1
Warn: Consumer lag growing, record-error-rate > 0, disk above 80%

Recommended tools: Prometheus + Grafana (open source), Confluent Control Center (managed), Datadog or New Relic for full-stack observability.

Error Handling

Error Handling Strategy

Production systems must handle failures gracefully.

Producer Errors

Handle Kafka publish failures:

try
{
    await producer.ProduceAsync(topic, message);
}
catch (ProduceException<string, string> ex)
{
    Console.WriteLine($"Kafka publish error: {ex.Error.Reason}");
}

Consumer Errors

Possible failures:

• Deserialization errors
• Business logic failures
• External API failures

Best practice:

Use Dead Letter Topics (DLT) for failed messages.

Example:
An Order Service publishes an OrderCreated event.

Payment Service consumes it
Inventory Service consumes it
Notification Service consumes it

All independently.

Coding Example Architecture Diagram

In this demo:

• Order API acts as the event producer
• Kafka stores the event in the orders topic
• Consumer service processes events asynchronously

Example Code

Let’s build a simple example:

Scenario:
An Order API publishes an event to Kafka.
A Background Worker consumes that event.

Prerequisites

.NET 10+ SDK installed
Kafka installed locally or via Docker
Basic knowledge of ASP.NET Core
Visual Studio / VS Code

To run Kafka using Docker:

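docker run -p 29092:29092 apache/kafka

Note: out of the box, the apache/kafka image advertises its listener on localhost:9092. To reach the broker on port 29092 as used in this article's configuration, override the image's listener settings (for example, in a Docker Compose file); otherwise publish port 9092 and use localhost:9092 as the bootstrap server.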

Required NuGet Packages

Install:

dotnet add package Confluent.Kafka

Package Used:

Confluent.Kafka

Configuration

Add in appsettings.json:

{
  "Kafka": {
    "BootstrapServers": "localhost:29092",
    "Topic": "order-events",
    "GroupId": "order-consumer-group"
  }
}

Code Examples

1. Create Order Event Model

public class OrderCreatedEvent
{
    public string OrderId { get; set; } = string.Empty;
    public string ProductName { get; set; } = string.Empty;
    public decimal Price { get; set; } // decimal avoids floating-point rounding for money
}

2. Kafka Producer Service

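using Confluent.Kafka;
using System.Text.Json;

public class KafkaProducer
{
    private readonly IProducer<string, string> _producer;
    private readonly string _topic;

    public KafkaProducer(IConfiguration configuration)
    {
        var config = new ProducerConfig
        {
            BootstrapServers = configuration["Kafka:BootstrapServers"],
            Acks = Acks.All,                        // wait for all in-sync replicas
            EnableIdempotence = true,               // no duplicates from retries
            MessageSendMaxRetries = int.MaxValue,

            CompressionType = CompressionType.Zstd, // smaller batches on the wire
            LingerMs = 5,                           // wait up to 5 ms to fill a batch
            BatchSize = 64 * 1024                   // 64 KB per batch
        };

        _producer = new ProducerBuilder<string, string>(config).Build();
        _topic = configuration["Kafka:Topic"]
            ?? throw new InvalidOperationException("Kafka:Topic is not configured");
    }

    public async Task ProduceAsync(OrderCreatedEvent orderEvent)
    {
        var message = JsonSerializer.Serialize(orderEvent);

        // Keying by OrderId keeps all events for one order in the same partition.
        await _producer.ProduceAsync(_topic, new Message<string, string>
        {
            Key = orderEvent.OrderId,
            Value = message
        });
    }
}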

What each setting does

Acks = All

Producer waits until all in-sync replicas confirm the write.

Producer
   ↓
Leader Broker
   ↓
Replica 1
Replica 2

Safer but slightly slower.

EnableIdempotence = true

Prevents duplicate messages when retries happen.

Without:

Send OrderCreated
↓
Network timeout
↓
Retry
↓
Duplicate event

With idempotence:

Kafka stores only one copy.

CompressionType

Reduces network traffic.

100 MB
 ↓
20 MB

Faster and cheaper.

LingerMs

Waits a few milliseconds before sending.

Instead of:

1 message
1 request

1 message
1 request

Kafka batches:

100 messages
1 request

3. API Controller

[ApiController]
[Route("api/[controller]")]
public class OrdersController : ControllerBase
{
    private readonly KafkaProducer _producer;

    public OrdersController(KafkaProducer producer)
    {
        _producer = producer;
    }

    [HttpPost]
    public async Task<IActionResult> CreateOrder(OrderCreatedEvent order)
    {
        await _producer.ProduceAsync(order);
        return Ok("Order event published successfully");
    }
}

4. Kafka Consumer (Background Service)

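using Confluent.Kafka;
using System.Text.Json;

public class KafkaConsumer : BackgroundService
{
    private readonly IConfiguration _configuration;

    public KafkaConsumer(IConfiguration configuration)
    {
        _configuration = configuration;
    }

    protected override Task ExecuteAsync(CancellationToken stoppingToken)
    {
        var config = new ConsumerConfig
        {
            BootstrapServers = _configuration["Kafka:BootstrapServers"],
            GroupId = _configuration["Kafka:GroupId"],
            AutoOffsetReset = AutoOffsetReset.Earliest
        };

        // Consume() blocks, so the loop runs on a background thread.
        return Task.Run(() =>
        {
            using var consumer = new ConsumerBuilder<Ignore, string>(config).Build();
            consumer.Subscribe(_configuration["Kafka:Topic"]);

            try
            {
                while (!stoppingToken.IsCancellationRequested)
                {
                    var result = consumer.Consume(stoppingToken);
                    var orderEvent = JsonSerializer.Deserialize<OrderCreatedEvent>(result.Message.Value);
                    if (orderEvent is null) continue; // skip empty payloads

                    Console.WriteLine($"Order Received: {orderEvent.OrderId}");
                    Console.WriteLine($"Product: {orderEvent.ProductName}");
                    Console.WriteLine($"Price: {orderEvent.Price}");
                }
            }
            catch (OperationCanceledException)
            {
                // Expected when the host is shutting down.
            }
            finally
            {
                consumer.Close(); // leave the group cleanly and commit final offsets
            }
        }, stoppingToken);
    }
}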

5. Register Services in Program.cs

builder.Services.AddSingleton<KafkaProducer>();
builder.Services.AddHostedService<KafkaConsumer>();

How It Works

Client calls POST /api/orders
API publishes event to Kafka topic
Kafka stores event in partition
Consumer reads event
Business logic executes independently

Example Output

After running the ASP.NET Core API and Kafka consumer, we send a POST request:

POST /api/orders

Request Body:

{
  "orderId": "ORD-1001",
  "productName": "Laptop",
  "price": 75000
}

The following output confirms successful event publishing and consumption.

API Response

Order event published successfully

Consumer Console Output

Order Received: ORD-1001
Product: Laptop
Price: 75000

This confirms:

Event was published
Kafka stored the message
Consumer processed the event successfully

All asynchronously. No direct service-to-service blocking.

The complete project is available on this GitHub repository.

Final Conclusion

Event-Driven Architecture with Kafka allows .NET applications to become more scalable, resilient, and loosely coupled.

By using Kafka as the event streaming platform, services can communicate asynchronously and handle massive workloads without tight dependencies.

As modern systems grow more distributed, adopting event-driven patterns becomes essential for building reliable and scalable applications.

Event-driven architectures require more than integrating Kafka into an application — they depend on thoughtful system design, resilient messaging patterns, and operational best practices. Simform helps organizations build scalable event-driven platforms that improve resilience, throughput, and long-term maintainability across distributed systems.

For more updates on the latest development trends, follow the Simform Engineering blog.

Follow Us: Twitter | LinkedIn

Event-Driven Architecture with Kafka in .NET: A Modern Approach to Building Scalable Systems was originally published in Simform Engineering on Medium, where people are continuing the conversation by highlighting and responding to this story.

Mastering Agent Handoffs in Copilot: Build Powerful Multi-Agent Workflows

Vanshita Shah — Mon, 29 Jun 2026 11:05:21 GMT

Create specialized agents and seamless handoffs to make Copilot operate like a real development team.

Agent Handoff in GitHub Copilot

Overview: Why One AI Agent Is No Longer Enough?

One agent can plan.
Another can research.
A third can execute.

Building a feature rarely involves just one kind of work. You might start by designing the architecture, move on to implementation, and then finish with code reviews, testing, performance checks, or security validation. Each stage requires a different mindset: planner, builder, reviewer.

Traditionally, developers switch between tools, prompts, and contexts to handle each step manually. But what if Copilot could work more like a real engineering team?

Imagine specialized AI agents taking ownership at different stages of the workflow: one planning the system design, another implementing the feature, and another reviewing the final code; all while seamlessly sharing context with each other.

That’s the power of agent handoffs in VS Code Copilot.

In this comprehensive guide, you’ll learn what agent handoffs are, how to build them using YAML configuration, the difference between handoffs and subagent orchestration, and best practices for production use.

Benefits of Specialized Agents + Handoffs vs One Giant Agent

While a single giant agent may feel quicker initially, the specialized handoff approach reduces total time by minimizing rework.

Agents + Handoffs vs One Giant Agent

Summary: When to Use Which?
Use One Giant Agent for simple, quick, one-off tasks where speed is the priority and model choice doesn’t matter much.

Use Specialized Agents + Handoffs for structured, multi-step, production-grade workflows. This approach shines when you want:

Higher quality and consistency
Better cost-efficiency through per-step model selection (e.g., fast & cheap models like Claude Haiku for Types Agent, and more powerful models like Claude Opus for Implementation Agent)
Clear human oversight between phases

Understanding Agent Handoffs

Handoffs let one agent suggest or automatically transition to another specialized agent after completing its part of the task.

Instead of one giant agent trying to do everything, you break complex work into focused phases. After an agent finishes, Copilot shows prominent buttons that let the user:

Switch to the next agent
Carry over the full conversation context
Pre-fill a prompt (and optionally auto-send it)

Understanding YAML Frontmatter in Custom Agents

Every custom agent begins with YAML frontmatter, the configuration layer that defines how the agent behaves, what responsibilities it handles, and how it participates in your workflow.

---
description: "Short description shown as placeholder in chat if hint is not there"
name: "Types Agent"                      # Display name in agent picker
model: "model-name"                      # Single model or array for fallbacks
tools: [read, edit, execute, search]     # Allowed tools only
argument-hint: "Describe the feature you want to build: Placeholder"

user-invocable: true                     # Show in agent selector

handoffs:                                # Array defines all available handoff destinations from the current agent
  - label: Build Service Layer
    agent:  Service Agent
    prompt: "Use the generated types and module context above to implement typed API service functions that match this codebase conventions."
    send: true
  - label: "Start Implementation"        # Button text
    agent: "Implementation Agent"        # Target agent name (exact match)
    prompt: "Use the service functions and module context above to implement the requested UI (page or component) and wire data flow end-to-end."
    send: false                          # Auto-send the prompt? (true/false)
---

📌 Important Handoff Properties:

🔸label: Clear, action-oriented button text

🔸agent: Exact name of the target agent

🔸prompt: Pre-filled message passed to the next agent

🔸send: Set to true for seamless auto-transition (use carefully)

The rest of the .agent.md contains detailed Markdown instructions that act as the agent’s system prompt.

Real-World Example: The Three-Agent Feature Development Pipeline

Build a complete feature from API specification to working React component, with each agent specialized for one task.

Stage 1️⃣: Types Agent (Intake & Validation)

File: .github/agents/types-agent.agent.md

---
description: "Step 1 of handoff flow: validate API URL + sample JSON, infer module name, and generate TypeScript types for the module."
name: "Types Agent"
model: "Claude Haiku 4.5"
tools: [read, edit, execute]
argument-hint: "Paste API endpoint URL and sample JSON response"
handoffs:
  - label: Build Service Layer
    agent: Service Agent
    prompt: "Use the generated types and module context above to implement typed API service functions that match this codebase conventions."
    send: true
---

# You are a TypeScript types specialist… ( Add task details and specifications for types agent )

What it does:

Input: API endpoint URL + sample JSON response
Output: TypeScript interfaces (e.g., IPost, PostCreateInput) written to src/types/[module]/index.ts

✅ When the Types Agent finishes, a “Build Service Layer” button appears at the bottom of the response.

Click it. Because send: true is set in the Types Agent's YAML, the prompt is automatically submitted - no manual input needed. The Service Agent starts immediately.

Types agent outcome

Stage 2️⃣: Service Agent (API Integration)

File: .github/agents/service-agent.agent.md

---
description: "Step 2: Generate typed API service functions"
name: "Service Agent"
model: ["GPT-5.3-Codex (copilot)", "Gemini 3.1 Pro (Preview) (copilot)"]
tools: [read, edit, execute]
handoffs:
  - label: Implement UI
    agent: Implementation Agent
    prompt: "Use the service functions and module context above to implement the requested UI (page or component) and wire data flow end-to-end."
    send: false
---

# You are a TypeScript API service specialist...

What it does:

Input: Generated TypeScript types from Types Agent
Output: Typed CRUD service functions using Axios + endpoint constants, written to src/services/[module]/moduleApi.ts

✅ When the Service Agent finishes, an “Implement UI” button appears at the bottom of the response.

Click it. Because send: false is set in the Service Agent's YAML, the prompt is pre-filled into the input field, but not submitted. Review it, then manually click Send (or press Enter).

Service agent outcome

Stage 3️⃣: Implementation Agent (UI Assembly)

File: .github/agents/implementation-agent.agent.md

---
description: "Step 3: Build React page or component with React Query hooks, wired to the service layer end-to-end."
name: "Implementation Agent"
model: ["Claude Sonnet 4.5 (copilot)", "GPT-5.3-Codex (copilot)"]
tools: [read, edit, execute]
---

# You are the Implementation Agent...

🧩What it does:

Input: Service functions + module context from Service Agent
Output: React page/component with React Query hooks (useQuery, useMutation), written to src/pages/[module]/ or src/components/[module]/

Subagents Orchestration vs Handoffs

When to use which?

Use Handoffs when you want structured phases and human review gates.
Use Subagents when you want one “manager” agent to orchestrate work behind the scenes.

You can also combine both approaches in advanced setups.

You can access the agent files using the following link:
GitHub Repository
Documentation

Conclusion

Custom agents and handoffs transform Copilot from a generic coding assistant into a structured development teammate that can follow your team’s architecture, conventions, and workflow patterns.

The three-agent API pipeline shared in this guide is production-ready and easily extensible. You can introduce additional specialists later, such as Testing, Security, Documentation, or Review agents, depending on your workflow needs.

Next Steps

Copy the three agents into your repository
Customize the agent instructions to match your project conventions
Try the workflow on your next API integration or feature implementation

As AI-assisted development becomes part of modern engineering workflows, organizations need more than isolated coding assistants. They need well-defined development practices that combine automation, governance, and engineering standards. Simform helps teams design and implement AI-powered software development workflows that improve productivity while maintaining code quality, security, and architectural consistency.

For more updates on the latest tools and technologies, follow the Simform Engineering blog.

Follow Us: Twitter | LinkedIn

Mastering Agent Handoffs in Copilot: Build Powerful Multi-Agent Workflows was originally published in Simform Engineering on Medium, where people are continuing the conversation by highlighting and responding to this story.

Dynamic Islands Architecture with Astro: Hydration Reimagined

Ritesh Adwani — Mon, 29 Jun 2026 11:00:15 GMT

Great apps don’t hydrate more. They hydrate right.

Ever opened a website that looked ready… but wasn’t?

You click a button. Nothing happens.

You click again. Still nothing.

Then suddenly - everything wakes up at once like the browser just remembered it has responsibilities.

Modern frontend applications have quietly normalized this experience.

The page appears instantly, but underneath the surface, the browser is still downloading JavaScript bundles, parsing framework runtimes, reconstructing component trees, and hydrating an entire application just so a search bar and a dark mode toggle can function.

Somewhere along the way, we collectively normalized shipping JavaScript to parts of the page that never truly needed it.

And honestly? That’s a little insane.

For years, frontend development optimized heavily for developer experience.

Reusable components
Client-side routing
Reactive state
SPA architectures

And to be clear - none of these things are bad. Frameworks like React fundamentally changed frontend engineering for the better.

But there’s a tradeoff we stopped questioning.

Why are we sending JavaScript for parts of the page that were never interactive to begin with?

That question sits at the heart of Astro’s Islands Architecture. And once you see it, it becomes very difficult to unsee.

The Web Accidentally Became a JavaScript Application

There was a time when websites mostly shipped HTML. The browser received a document. The document rendered. Users interacted with it.

Life was simple.

What’s interesting is that the web didn’t become JavaScript-heavy overnight. We got here through a series of decisions that were, for the most part, entirely reasonable.

Applications became richer.
User expectations increased.
Interfaces became more dynamic.

Frameworks evolved to keep up.

Then SPAs happened. Suddenly, the browser wasn’t just rendering pages anymore - it was booting applications.

Modern frameworks started shipping things like:

Virtual DOM runtimes
Hydration logic
Client-side routers
State managers
Reactive systems
Component trees

Even for pages that were mostly static.

A blog post
A documentation page
A marketing website
A product page

We began treating every website like it was Figma running in the browser.

The result? A hydration cascade.

Why Pages Feel Ready Before They Actually Are

A typical SPA rendering flow often looks something like this:

📄 HTML Arrives
        ↓
📦 JavaScript Downloads
        ↓
⚙️ JavaScript Executes
        ↓
💧 Hydration Begins
        ↓
✨ Finally Interactive

The important detail here is this:

Hydration usually happens for the entire page.

The browser doesn’t really know that your hero section is static, your footer never changes, and your blog content is just text.

As far as the framework is concerned, every component gets treated as a potential source of interactivity.

Not just the parts that need interactivity. Everything.

Navbar
Footer
Blog content
Cards
Hero sections
Static layouts

All of it gets JavaScript. Even if users never interact with those parts.

And this creates a subtle but important performance tax:

Larger bundles
More parsing work
More CPU execution
Delayed Time to Interactive
Worse performance on lower-end devices

The browser spends time waking up components that never needed waking up.

React Isn’t the Problem

This is an important distinction. The issue isn’t React itself - it’s the assumption that every part of a page deserves the same level of interactivity. React was built for highly interactive applications, and for those use-cases, it absolutely shines.

Dashboards
Collaborative tools
Realtime apps
Complex state-heavy interfaces

That’s React’s territory. But most websites on the internet are not collaborative design tools. Most websites are still mostly content.

And that leads to a much more interesting question:

Does the entire page really need JavaScript?

Or more specifically:

Which parts of the page actually deserve interactivity?

That’s the question Astro asks. And that question changes everything.

Astro: A Framework That Questions the Default

Astro arrived with a surprisingly simple philosophy 🚀

Ship HTML by default. Add JavaScript only where it earns its place.

That sounds obvious. But modern frontend tooling conditioned us to think the opposite way. In many frameworks, JavaScript everywhere became the default.

Astro flipped the mental model entirely. Instead of treating hydration as a page-level concern, Astro treats it as a component-level decision.

Some parts of the page can remain static. Others can become interactive.

And that distinction changes everything.

That tiny shift creates a massive architectural difference. Astro is what people often call an HTML-first framework.

Meaning:

Pages render to HTML at build time
Components ship zero JavaScript by default
Interactivity is opt-in
Hydration becomes selective instead of global

This is where Islands Architecture enters the story.

The Islands Mental Model

The term “Islands Architecture” was popularized by Jason Miller, who described modern pages as mostly static HTML with small islands of interactivity scattered throughout. And honestly, it’s one of the best frontend analogies ever created.

Imagine your webpage as an ocean.

Most of that ocean is static content: headings, paragraphs, images, layouts, product descriptions, documentation, blog content.

None of that needs JavaScript to exist. It’s just content.

Now scattered across that sea are small interactive regions:

Search bars, carts, dropdowns, comment sections, theme toggles, carousels, filters.

Those are the islands. Astro hydrates the islands. The sea remains pure HTML. That distinction is the entire philosophy. And the size difference matters.

The sea should be vast. The islands should be small.

The goal isn’t to eliminate JavaScript. The goal is to become intentional about where JavaScript actually adds value. The goal is to stop treating JavaScript like free candy.

Because every kilobyte shipped to the browser eventually becomes:

download time
parse time
execution time
memory usage
CPU work

Especially on slower devices.

Astro simply forces developers to become intentional about it.

What Actually Makes Something an Island?

Not every component is automatically an island. An island has three important characteristics:

🔹 Independent: It doesn’t need to know other islands exist.

🔹 Self-contained: It owns its own state, lifecycle, and JavaScript.

🔹 Isolated: If one island fails or loads slowly, the rest of the page remains unaffected.

That’s what makes the architecture resilient.
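
Here is a toy model of the isolation property, just to make it concrete. This is not how Astro implements islands; the Island interface and hydrateIslands function are made up for this sketch. The point is only that each island is hydrated on its own, so one failure doesn't take the others down.

```typescript
interface Island {
  name: string;
  hydrate: () => void; // attaches this island's own behaviour and state
}

// Hydrate each island independently; a failure in one is recorded, not propagated.
function hydrateIslands(islands: Island[]): { ok: string[]; failed: string[] } {
  const ok: string[] = [];
  const failed: string[] = [];
  for (const island of islands) {
    try {
      island.hydrate();
      ok.push(island.name);
    } catch {
      failed.push(island.name);
    }
  }
  return { ok, failed };
}

const outcome = hydrateIslands([
  { name: "search", hydrate: () => {} },
  { name: "cart", hydrate: () => { throw new Error("script failed to load"); } },
  { name: "theme-toggle", hydrate: () => {} },
]);
console.log(outcome.ok, outcome.failed);
```

The broken cart island ends up in failed, while search and theme-toggle still become interactive.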

Rendering ≠ Hydration

One of Astro’s smartest ideas is separating two concepts most frontend developers casually blur together. Rendering and hydration are not the same thing.

Rendering

Rendering means:

Converting components into HTML.

This happens on the server or at build time.

Hydration

Hydration means:

Attaching JavaScript behavior to already-rendered HTML.

This happens in the browser.

Most frameworks collapse these ideas into one continuous operation. Astro separates them deliberately. Every component renders. Only selected components hydrate. That might sound like a small implementation detail. Architecturally, it’s a completely different way of thinking about frontend rendering.

Or as I like to phrase it:

In traditional SPAs, hydration is a page-level operation. In Astro, hydration becomes a component-level decision.

That’s the shift.

Selective Hydration: The Real Superpower

Astro introduces something called client directives.

These directives decide:

if a component hydrates
when it hydrates
under what conditions it hydrates

And the API is beautifully simple.

Example: <SearchBar client:load /> (SearchBar is a placeholder component name)

That single attribute tells Astro - “This component actually needs JavaScript.”

Meanwhile, everything else can remain static HTML. And this is where Astro becomes incredibly interesting. Because hydration is no longer all-or-nothing. It becomes strategic.

Meet the Client Directives

client:load

Hydrates immediately when the page loads.

Best for:

search bars
nav interactions
essential UI

client:idle

Hydrates once the browser finishes critical work.

Basically Astro saying:

“This feature matters… but not urgently.”

Perfect for:

chat widgets
recommendation sections
secondary UI

client:visible

Hydrates only when the component enters the viewport. Which is honestly brilliant. After all, there’s no reason to wake up UI that the user hasn’t even seen yet.

client:media

Hydrates only when a media query matches. Useful for device-specific interactions.

client:only

Skips server-side rendering entirely. Useful for components that fundamentally depend on browser APIs: maps, canvas, and certain third-party libraries.
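
To see the five directives side by side, here's a simplified decision model. It is a sketch of the rules described above, not Astro's implementation; the BrowserState flags and both function names are assumptions made for illustration.

```typescript
type ClientDirective = "load" | "idle" | "visible" | "media" | "only";

interface BrowserState {
  pageLoaded: boolean;   // the initial page load has happened
  idle: boolean;         // the browser has finished its critical work
  inViewport: boolean;   // the component has scrolled into view
  mediaMatches: boolean; // the directive's media query currently matches
}

// client:only is the one directive that skips server-side HTML for the component.
function rendersOnServer(d: ClientDirective): boolean {
  return d !== "only";
}

// When would a component with this directive hydrate, given the browser's state?
function shouldHydrate(d: ClientDirective, s: BrowserState): boolean {
  switch (d) {
    case "load":
    case "only": // renders and hydrates on the client as soon as the page loads
      return s.pageLoaded;
    case "idle":
      return s.pageLoaded && s.idle;
    case "visible":
      return s.inViewport;
    case "media":
      return s.mediaMatches;
  }
}
```

A below-the-fold carousel marked client:visible stays plain HTML in this model until inViewport flips to true.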

The Cool Part: Astro Doesn’t Care About Your Framework

This is where Astro starts feeling slightly rebellious.

You can use pretty much anything you like:

React
Vue
Svelte
Preact
SolidJS

In the same project. On the same page. At the same time.

Which means this is completely valid:

Different frameworks. Different runtimes. Independent islands.

That’s kind of wild when you think about it. Astro doesn’t force framework loyalty. It only cares about delivery strategy. That’s a very different mindset from most frontend ecosystems and is surprisingly refreshing.

What This Looks Like in Practice

Let’s take a typical e-commerce product page.

What actually needs JavaScript here?

Traditional SPA approach? Hydrate everything.

Astro approach?

That’s selective hydration in action.

Instead of treating the page as one massive interactive runtime, Astro isolates interactivity to only the components that genuinely need it.

To put that into perspective, imagine this page is made up of 20 components.

A traditional SPA hydration model would typically hydrate all 20 components, regardless of whether they’re interactive or not. An Astro implementation, on the other hand, might only hydrate the 3-4 components that actually need JavaScript.

The exact numbers will vary from project to project, but the architectural difference remains the same.
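
As a back-of-the-envelope sketch of that comparison, the snippet below counts what each model would hydrate. The component names and the choice of which four are interactive are invented for illustration.

```typescript
interface PageComponent {
  name: string;
  interactive: boolean;
}

// A hypothetical 20-component product page: 4 interactive pieces, 16 static ones.
const productPage: PageComponent[] = [
  ...["SearchBar", "AddToCart", "ImageCarousel", "ReviewForm"].map((name) => ({ name, interactive: true })),
  ...Array.from({ length: 16 }, (_, i) => ({ name: `StaticSection${i + 1}`, interactive: false })),
];

// Full-page hydration touches every component, interactive or not.
const spaHydrated = productPage.length;

// Selective hydration only touches the components that opted in.
const islandsHydrated = productPage.filter((c) => c.interactive).length;

console.log(`SPA: ${spaHydrated} hydrated, islands: ${islandsHydrated} hydrated`);
```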

Astro doesn’t necessarily render less. In many cases, both approaches produce identical HTML. The difference is what happens after that HTML reaches the browser.

Less JavaScript to download. Less code to execute. Less UI to hydrate.

This is why teams migrating content-heavy sites to Astro frequently report Lighthouse scores jumping from the 50-70 range into the 90s. The content was never the bottleneck. The hydration overhead was.

And honestly? That architectural mindset feels far closer to how most websites actually behave.

Performance Isn’t a Feature in Astro. It’s the Default.

One thing Astro gets very right is this:

Performance isn’t treated like an optimization pass you do later. It becomes the architecture itself. Because when static components ship zero JavaScript:

bundles shrink
parsing work decreases
CPU usage drops
TTI improves
content becomes readable faster

The browser spends less time booting frameworks and more time simply rendering content.

Which, you know… used to be the browser’s main job.

And this is especially important on:

slower phones
weak CPUs
poor network conditions
emerging markets

Sometimes frontend discussions happen entirely on high-end MacBooks with fiber internet. Real users do not live there. Astro’s model respects that reality surprisingly well.

But Astro Isn’t Magic

And this is important. Islands Architecture is not the perfect solution for every application.

If your app is something like:

highly realtime
deeply collaborative
state-heavy everywhere
interaction-first

Then a traditional SPA is often the better architectural fit. In those scenarios, shrinking hydration boundaries often doesn’t buy you much. The application itself is the island.

A complex dashboard is the obvious example. And that’s okay. Astro isn’t trying to replace React. It’s solving a different category of problems, especially content-heavy experiences where full-page hydration can feel excessive. That nuance matters.

Good engineering is not about picking one technology forever.

It’s about understanding tradeoffs.

The Bigger Shift Isn’t Technical. It’s Philosophical.

What makes Astro interesting isn’t just selective hydration. It’s the mental model underneath it. For years, frontend frameworks trained us to think:

“Everything is an application.”

Astro asks something much smaller. And much smarter.

“Which parts actually deserve interactivity?”

That sounds subtle. Architecturally, it changes everything.

Because once JavaScript stops being the automatic answer, you start evaluating interactivity much more critically. And that’s where better architectural decisions begin.

And honestly, the web needed that conversation. We became incredibly good at building complex frontend systems. But somewhere along the way, we also normalized making users download and execute huge amounts of JavaScript for pages that mostly just needed to display content.

Astro feels like a course correction. Not backwards. Just… more intentional.

The Real Lesson Behind Islands Architecture

For years, frontend frameworks optimized for building applications.

Astro reminds us that many websites aren’t applications at all.

They’re documents.
They’re content.
They’re experiences.

And those experiences shouldn’t have to pay for JavaScript they never asked for.

The most interesting thing Astro introduces isn’t a new rendering strategy. It’s a new default. And defaults matter. Because many performance problems don’t come from the decisions we actively make. They come from the decisions we never realize we’re making.

That’s what Islands Architecture challenges.

Not React. Not SPAs. Just the assumption that every page deserves the same amount of JavaScript.

Because the best JavaScript… is still the JavaScript you never had to ship.

Modern frontend architectures require thoughtful decisions around rendering, hydration, and JavaScript delivery to achieve fast, scalable user experiences. Simform helps organizations design and modernize frontend applications with performance-first architectures that improve responsiveness, maintainability, and long-term scalability.

Dynamic Islands Architecture with Astro: Hydration Reimagined was originally published in Simform Engineering on Medium, where people are continuing the conversation by highlighting and responding to this story.

How I Decide Which Tech Debt to Kill and Which to Protect

Milan Dadhaniya — Thu, 25 Jun 2026 05:50:05 GMT

After years of leading engineering teams, I stopped treating all tech debt the same. Here’s the actual framework I use.

Every lead engineer has been in this meeting.

Someone from product wants a new feature by end of quarter. Your backend has a data model that was clearly designed on a Friday afternoon in 2021. The authentication module is a tower of duct tape that somehow hasn’t fallen over. And somewhere in the backlog, there are 47 tickets tagged “tech debt” that haven’t been touched in eight months.

The question is never “do we have tech debt?” The question is always “which of this debt actually matters right now, and which can safely stay buried?”

For a long time I answered that question with instinct. Senior engineers would flag something, I’d agree it was messy, it would go into the backlog, and the cycle would continue. Nothing systemic. Nothing defensible when product asked why we were spending sprint capacity cleaning up old code instead of shipping.

Then we had an incident that forced me to get serious about it.

The Incident That Changed My Thinking

We were scaling a Node service that had been humming along fine at moderate traffic. As load increased, we started seeing intermittent failures that were nearly impossible to reproduce locally. It took three engineers two weeks to trace it back to a connection pooling pattern that had been written years earlier — technically functional, completely unmaintained, and sitting on an assumption about concurrency that was no longer true at our scale.

Nobody had flagged it as debt. It wasn’t messy-looking code. It just had a hidden load-bearing assumption that nobody remembered making, and nobody had documented.

That incident taught me the most important thing I know about tech debt: the danger isn’t always in the ugly code. It’s in the load-bearing assumptions that nobody wrote down.

After that, I started thinking about debt completely differently.

The Two Questions That Actually Matter

Most frameworks for evaluating tech debt ask you to score it on “effort to fix” vs “impact if fixed.” That’s fine for prioritization once you’ve already decided something needs fixing. It doesn’t help you answer the harder question: should this be fixed at all right now, or should we protect it?

The two questions I actually ask are:

1. What is this debt blocking? Not in the abstract — specifically. Is it blocking a feature we’re shipping in the next two quarters? Is it blocking hiring, because no engineer wants to touch it? Is it blocking reliability, because it degrades under load? Or is it sitting in a stable part of the system that nobody is touching and nobody needs to touch?

2. What happens when this breaks — not if? Every piece of debt breaks eventually. The question is what the failure mode looks like. Does it fail loudly with an error you can trace immediately? Does it fail silently in a way that corrupts data? Does it cascade across services? Does it require the one engineer who wrote it to fix it, or can anyone with context debug it in under an hour?

These two questions together tell me almost everything I need to know. Debt that blocks active work and fails catastrophically when it breaks gets killed. Debt that sits dormant and fails loudly and cleanly when it eventually breaks? That can wait — and sometimes, it can wait forever.

The Four Categories I Actually Use

Over time I’ve settled on four buckets. I don’t use a scoring matrix or a spreadsheet. I use these four words, and they’re enough.

Kill — fix it now, in this quarter, full stop.

This is debt that is actively in the critical path of something the team is building, or that carries a failure mode that would be catastrophic and silent. The Node connection pooling issue above was Kill category — not because it looked bad, but because it had a hidden assumption that would keep causing incidents as we scaled.

The test I use: would this debt meaningfully slow down or break the next important thing we’re trying to do? If yes, it gets scheduled, sized, and shipped like any feature. No exceptions.

Shrink — reduce the blast radius without a full rewrite.

Some debt is too risky to leave completely untouched but too expensive to eliminate entirely right now. The goal isn’t to fix it — it’s to make sure that when it eventually breaks, the damage is contained and diagnosable.

In practice this usually means: adding observability (logging, alerting, tracing), writing a runbook so anyone on the team can respond to an incident in that area, and adding a thin abstraction layer so the messy implementation can be replaced incrementally later without touching everything that depends on it.

This is the category most leads skip. They either fix things fully or leave them alone. The shrink move is underused and genuinely powerful.

Protect — leave it alone deliberately.

This one surprises people. Some debt is doing a job. It’s ugly, it’s old, it probably violates three patterns we care about today, but it is stable, well-understood by the team, and in a part of the system that isn’t actively changing.

Rewriting stable code is one of the most reliable ways to introduce new bugs. I’ve seen teams spend a quarter cleaning up a module that was working fine, only to introduce a regression that took another quarter to hunt down. The rewrite felt productive. It wasn’t.

The rule here is: if it’s stable, if it’s not in the path of active development, and if it fails loudly when it eventually breaks, protect it. Don’t touch it. Don’t let a new engineer refactor it as a “good first task.” Write down what it does, why it exists, and where the bodies are buried — then leave it alone.

Watch — not urgent, but not invisible either.

This is debt that doesn’t meet the bar for Kill or Shrink right now, but shouldn’t just disappear into the backlog either. It goes on a watchlist with a specific trigger: “we revisit this when X happens.” X might be “when we add a second service that depends on this,” or “when we hire a third engineer who needs to understand this codebase,” or “when we hit 10x current traffic.”

The watchlist prevents the debt from becoming invisible. Invisible debt is how you end up with 47 items in a backlog that nobody actually understands anymore.
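
For readers who think better in code, here is one possible reading of the two questions and four buckets as a decision function. It is a sketch, not a scoring tool: the DebtItem fields and the order of the checks are my interpretation of the descriptions above, and real calls involve more judgment than four booleans.

```typescript
type Bucket = "Kill" | "Shrink" | "Protect" | "Watch";

interface DebtItem {
  blocksActiveWork: boolean;        // in the critical path of something being built
  failsSilentlyOrCascades: boolean; // catastrophic or hard-to-trace failure mode
  affordableThisQuarter: boolean;   // a full fix fits in current capacity
  stableAndUnderstood: boolean;     // not changing, and the team knows how it behaves
}

function triage(d: DebtItem): Bucket {
  // Question 1: what is it blocking?
  if (d.blocksActiveWork) return "Kill";
  // Question 2: what happens when it breaks?
  if (d.failsSilentlyOrCascades) {
    return d.affordableThisQuarter ? "Kill" : "Shrink";
  }
  // Dormant, loud-failing, well-understood code is left alone on purpose.
  if (d.stableAndUnderstood) return "Protect";
  // Everything else goes on the watchlist with a revisit trigger.
  return "Watch";
}
```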

How I Have This Conversation With Product

The hardest part of managing tech debt as a lead isn’t the technical judgment. It’s the conversation with your product manager or your director when they want to know why sprint capacity is going toward something users can’t see.

I’ve tried many framings over the years. The one that works is anchoring debt to a specific future capability the business cares about.

“This piece of the system is the bottleneck that will slow down feature X by two sprints if we don’t address it before we start building.”

“This module has a failure mode that, based on our growth trajectory, will cause an outage in production within the next six months. Here’s why.”

Notice what I’m not saying: “this code is messy” or “we really should clean this up.” Those are engineering aesthetics. Product doesn’t have to care about engineering aesthetics, and honestly they shouldn’t have to.

What product does care about is delivery velocity, reliability, and whether you’re going to be able to build what they need to build next. Frame the debt in those terms and the conversation stops being a negotiation between “technical purity” and “shipping features” — because it was never really about that in the first place.

The Debt I Learned to Protect

There’s one category of debt that I used to kill on sight that I now actively protect: intentional shortcuts taken under real constraints.

When a team makes a deliberate, documented decision to cut a corner because shipping mattered more than perfection in that moment — that’s not the same as accidental mess. It’s a calculated bet. And some of those bets age fine.

The Vue components in our admin panel were written fast, inconsistent with the design system, and held together with more watchers than I’d like. But they’ve been stable for two years, they’re used by maybe fifteen internal users, and nobody is touching them. We have a document that says: “this module is intentionally rough; it was built in a two-week sprint to unblock ops; it is not the pattern we follow elsewhere.”

That documentation makes it protected, not embarrassing. It means a new lead engineer joining the team doesn’t spend their first week trying to “fix” something that doesn’t need fixing. Context is the difference between debt that’s dangerous and debt that’s just old.

The Habit That Changed Everything

The single practice that made my debt decisions more defensible than anything else: writing a short tech debt log entry every time I make a deliberate technical shortcut.

Not a ticket. Not a comment in the code. A short entry in a shared doc that says: what we built, what we knowingly left rough, why we made that call, and what the trigger condition is for when it should be revisited.

Three sentences. Takes two minutes. Means that six months later, when someone asks “why is this done this way,” there’s an answer — and the answer tells you whether it’s debt worth killing or debt worth protecting.
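
If it helps to see the shape of such an entry, here is a sketch with the four parts as fields. The field names, the formatEntry helper, the date, and the revisit trigger are all hypothetical; the example content borrows the admin panel case from earlier.

```typescript
interface DebtLogEntry {
  date: string;        // ISO date the shortcut was taken
  whatWeBuilt: string;
  leftRough: string;   // what was knowingly left rough
  why: string;         // the constraint that justified the call
  revisitWhen: string; // trigger condition for revisiting
}

// Render an entry as a single line for a shared doc.
function formatEntry(e: DebtLogEntry): string {
  return `[${e.date}] Built: ${e.whatWeBuilt}. Left rough: ${e.leftRough}. Why: ${e.why}. Revisit when: ${e.revisitWhen}.`;
}

const adminPanelEntry: DebtLogEntry = {
  date: "2024-01-15", // placeholder date
  whatWeBuilt: "admin panel Vue components",
  leftRough: "inconsistent with the design system, heavy use of watchers",
  why: "built in a two-week sprint to unblock ops",
  revisitWhen: "the panel gets external users", // hypothetical trigger
};
console.log(formatEntry(adminPanelEntry));
```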

Most tech debt isn’t the result of bad engineers making bad decisions. It’s the result of good engineers making reasonable decisions under constraints, and then never writing down the context. The code survives. The reasoning doesn’t. And without the reasoning, everything looks like a mess that needs cleaning up.

The Framework in One Paragraph

When a piece of tech debt comes to my attention, I ask: is it blocking something active, or sitting dormant? Does it fail catastrophically and silently, or loudly and cleanly? Based on that, it goes into one of four buckets — Kill, Shrink, Protect, or Watch. I frame the Kill and Shrink decisions for product in terms of delivery impact, not code quality. And I document every intentional shortcut we take so future me — and future team members — have the context to make the same judgment calls without starting from scratch.

That’s it. No scoring matrix. No spreadsheet. Just two questions, four buckets, and the discipline to write things down.

One Last Thing

The lead engineers I’ve seen struggle most with tech debt aren’t the ones who make bad technical judgments. They’re the ones who treat it as a binary — clean vs messy, fix vs ignore.

Real systems aren’t clean. They’re living things with history, context, and accumulated decisions made by people who were doing their best at the time. Your job isn’t to eliminate all the debt. It’s to know which debt is quietly holding up the ceiling, which debt is safe to walk past, and which debt is about to become someone else’s 2am incident.

The difference between those three things is almost entirely about context — and almost entirely absent from the codebase itself.

Technical debt becomes a business problem when it slows delivery, increases operational risk, or makes systems harder to evolve. Simform helps organizations modernize applications, improve software maintainability, and make pragmatic engineering decisions that balance long-term architecture with near-term business priorities.

How I Decide Which Tech Debt to Kill and Which to Protect was originally published in Simform Engineering on Medium, where people are continuing the conversation by highlighting and responding to this story.

Distributed Tracing in Java Spring Boot

Avinash Hargun — Wed, 24 Jun 2026 04:58:17 GMT

The 3 AM Incident You Don’t Want to Have

It’s 3 AM. Your on-call phone rings. Users are reporting that checkout is failing. You SSH into your servers and frantically grep through log files across six different microservices (the API gateway, user service, order service, inventory service, payment service, and notification service), and all you can find are disconnected error messages with no clear thread linking them together.

Sound familiar?

This is the reality of debugging microservices without observability tooling in place. As soon as you break a monolith into services, your logs fragment across machines, containers, and time zones. A single user request may touch a dozen services before it either succeeds or silently dies somewhere in the middle, and your logs give you no way to follow it.

Distributed tracing is the solution. In this article you’ll learn exactly what it is, how it works, and how to implement it in your Java Spring Boot applications using the industry-standard OpenTelemetry toolkit with Jaeger as the tracing backend.

What Is Distributed Tracing?

Distributed tracing is an observability technique that tracks a single request as it travels through every service in your system, from the moment the client sends it to the moment a response comes back.

Think of it like attaching a GPS tracker to a package in a logistics network. You can see every warehouse it passed through, how long it spent at each stop, whether anything was delayed, and exactly where things went wrong.

In microservice terms, here’s a typical request flow and, crucially, how long each hop takes:

With distributed tracing, you can instantly see that payment processing consumed 230ms of a 287ms total request. That’s actionable. Without it, you’re guessing.

Why Traditional Logging Falls Short

Logging is essential, but it was designed for a single-process world. In a distributed system, logs have some painful shortcomings:

Logs are scattered. When a request touches six services, the relevant log lines live in six separate log files on six separate machines. Pulling them together manually is tedious and error-prone.

Logs lack context. A log line like ERROR: Payment failed tells you what happened but not which user request triggered it, what chain of calls preceded it, or how long each step took.

Correlation is manual. Some teams use a correlation ID passed in HTTP headers, but implementing, propagating, and querying this consistently across all services is significant engineering work, and you’re still not getting timing data.

You can’t see the full picture. Even with perfect logs, you can’t easily visualize the shape of a distributed request: which calls were parallel, which were sequential, and what the latency profile looks like across the whole trace.

Distributed tracing solves all of this systematically.

Core Concepts: The Vocabulary You Need

Before jumping to code, let’s nail the core concepts. These terms appear everywhere in tracing tooling, so understanding them upfront will save you a lot of confusion.

Trace

A trace represents the complete journey of a single request through your entire system. From first touch to final response, everything is grouped under one trace.

Trace ID

A Trace ID is a globally unique 128-bit identifier (expressed as a 32-character hex string) assigned at the very first entry point of your system, typically the API gateway or the first service that receives the request. This single ID is propagated to every downstream service and acts as the common thread that ties all related spans together. Example: traceId: 4bf92f3577b34da6a3ce929d0e0e4736

Span

A span is a single unit of work within a trace: one HTTP call between services, one database query, one cache lookup, or any other discrete operation you want to measure. A trace is built from one or more spans. Think of the trace as the folder and spans as the individual files inside it. Example spans: “validate-user in User Service (12ms)”, “create-order in Order Service (45ms)”, “process-payment in Payment Service (230ms)”.

Span ID

Every span gets its own unique Span ID within the trace. While the Trace ID stays constant across the entire request chain, the Span ID changes with every new unit of work. This is how tracing systems distinguish between, say, three separate database calls made within the same service during the same request: same Trace ID, three different Span IDs.

Parent Span & Child Span

Spans are organized in a parent-child hierarchy that reflects the actual call tree. The first span in a trace is the root span (no parent). When Service A calls Service B, the span created inside Service B is a child span of Service A’s span. This is the relationship that lets tracing tools render a flame graph: you can see which spans triggered which downstream work, and exactly where in the call tree latency is accumulating.

Context Propagation

Context propagation is how the Trace ID and Span ID travel from service to service. When Service A makes an HTTP call to Service B, it injects the trace context into the outgoing HTTP headers using the traceparent header defined by the W3C Trace Context standard. Service B reads that header and creates a child span. Without propagation, each service would start a disconnected, orphaned trace.
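
To make the header concrete, here is a minimal sketch of building and parsing a traceparent value, written in TypeScript for brevity. In a Spring Boot service the instrumentation does this for you; the names TraceContext, formatTraceparent, parseTraceparent, and outgoingContext are hypothetical helpers for illustration, and the sketch only handles version 00 of the format.

```typescript
// traceparent layout (version 00): version-traceId-parentSpanId-flags
interface TraceContext {
  traceId: string;      // 32 lowercase hex chars (128 bits)
  parentSpanId: string; // 16 lowercase hex chars (64 bits)
  sampled: boolean;     // lowest bit of the trace-flags byte
}

function formatTraceparent(ctx: TraceContext): string {
  const flags = ctx.sampled ? "01" : "00";
  return `00-${ctx.traceId}-${ctx.parentSpanId}-${flags}`;
}

function parseTraceparent(header: string): TraceContext | null {
  const match = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header.trim());
  if (!match) return null; // malformed: the receiver would start a fresh trace
  const traceId = match[1];
  const parentSpanId = match[2];
  const flags = match[3];
  // All-zero IDs are not valid identifiers.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentSpanId)) return null;
  return { traceId, parentSpanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// What a service sends downstream: same trace ID, its own span ID as the parent.
function outgoingContext(incoming: TraceContext, ownSpanId: string): TraceContext {
  return { traceId: incoming.traceId, parentSpanId: ownSpanId, sampled: incoming.sampled };
}
```

The key property is visible in outgoingContext: the trace ID is copied unchanged on every hop, while the parent field is replaced with the span ID of whichever service is making the call.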

Sampling

In production, you usually don’t want to trace every single request, because that generates enormous data volumes. Sampling decides which requests get traced:

Head-based sampling: Decide at trace start (e.g., sample 10% of all requests)
Tail-based sampling: Collect all traces but retain only those matching criteria, such as errors or requests slower than 500ms
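
The difference between the two strategies is mostly about when the decision is made and what information is available at that point. A rough sketch, assuming hypothetical helpers headSample and tailSample (real samplers in tracing SDKs and collectors use their own algorithms):

```typescript
// Head-based: decide up front from the trace ID alone, so every service
// that sees the same trace ID reaches the same decision.
function headSample(traceId: string, probability: number): boolean {
  // Read the last 8 hex chars as a number in [0, 2^32) and compare to a threshold.
  const bucket = parseInt(traceId.slice(-8), 16);
  return bucket < probability * 0x100000000;
}

interface FinishedTrace {
  traceId: string;
  durationMs: number;
  hasError: boolean;
}

// Tail-based: decide after the trace has finished, using what actually happened.
function tailSample(trace: FinishedTrace, slowThresholdMs: number = 500): boolean {
  return trace.hasError || trace.durationMs > slowThresholdMs;
}
```

Head-based sampling is cheap because unsampled requests are never recorded, but it cannot know in advance which requests will turn out to be interesting. Tail-based sampling keeps exactly the interesting ones, at the cost of buffering every trace until it completes.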

How Distributed Tracing Works: Step by Step

Request arrives at your API Gateway or first service.
A Trace ID is generated: a random 128-bit identifier that marks this request chain for its entire lifetime.
A root span is created to represent the work happening in this service.
The service does its work, potentially calling downstream services.
Before each downstream call, the trace context (Trace ID + Span ID) is injected into outgoing HTTP headers following the W3C Trace Context standard.
The downstream service reads the headers, extracts context, and creates a child span linked to the parent.
Steps 5–6 repeat for every subsequent service hop.
Spans are exported to a tracing backend (Jaeger, Zipkin, Tempo) as each service completes its work.
The backend assembles the spans into a flame graph, giving you the complete picture.
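
The last step, assembling spans into a tree, can be sketched in a few lines. This is only an illustration of the idea, not how Jaeger or any other backend is implemented; SpanRecord, buildTraceTree, and renderTree are hypothetical names.

```typescript
interface SpanRecord {
  traceId: string;
  spanId: string;
  parentSpanId: string | null; // null marks the root span
  label: string;
  durationMs: number;
}

interface SpanNode extends SpanRecord {
  children: SpanNode[];
}

// Spans arrive in any order; link each one to its parent by ID.
function buildTraceTree(spans: SpanRecord[]): SpanNode | null {
  const byId = new Map<string, SpanNode>();
  for (const s of spans) byId.set(s.spanId, { ...s, children: [] });

  let root: SpanNode | null = null;
  for (const s of spans) {
    const node = byId.get(s.spanId)!;
    const parentNode = s.parentSpanId === null ? undefined : byId.get(s.parentSpanId);
    if (parentNode) parentNode.children.push(node);
    else if (s.parentSpanId === null) root = node;
    // Spans whose parent never arrived are orphans; this sketch drops them.
  }
  return root;
}

// Indented text rendering, one line per span.
function renderTree(node: SpanNode, depth: number = 0): string[] {
  const lines = [`${"  ".repeat(depth)}${node.label} (${node.durationMs}ms)`];
  for (const child of node.children) lines.push(...renderTree(child, depth + 1));
  return lines;
}
```

The orphan case in the comment is exactly what broken context propagation looks like from the backend's side: spans exist, but nothing ties them to a parent.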

The Ecosystem: Tools to Know

Important: Spring Boot 3.x dropped Spring Cloud Sleuth. The correct modern approach is Micrometer Tracing + OpenTelemetry, which is exactly what we implement below.

Why OpenTelemetry? The Industry Standard Explained

OpenTelemetry (OTel) didn’t just become popular; it replaced an entire generation of fragmented, vendor-specific instrumentation libraries. Here’s why it’s the right choice for every new Spring Boot project.

Vendor-neutral by design. OTel is a CNCF (Cloud Native Computing Foundation) project, hosted by the same organization behind Kubernetes, Prometheus, and Envoy. You instrument your code once using the OTel API, and you can switch backends (Jaeger → Zipkin → Grafana Tempo → AWS X-Ray → Datadog) purely through configuration changes, with zero code modifications.

It unified a fragmented ecosystem. Before OTel, the Java observability landscape was split across OpenTracing, OpenCensus, Zipkin’s Brave library, and dozens of vendor SDKs, all incompatible. OpenTelemetry merged OpenTracing and OpenCensus and became their official successor. The entire industry (cloud providers, APM vendors, framework maintainers) has aligned behind it.

Spring Boot 3.x has first-class OTel support. Spring Boot 3.x ships Micrometer Tracing as a native abstraction that bridges directly to OpenTelemetry. You get automatic HTTP request tracing, RestTemplate and WebClient context propagation, MDC log correlation, and actuator integration, all out of the box with just the right dependencies. Spring Cloud Sleuth (the old approach) is no longer supported in Spring Boot 3.x; OTel via Micrometer is the official replacement.

One SDK, three observability pillars. OTel isn’t just for traces. The same SDK covers traces, metrics, and logs under a unified data model. As your observability maturity grows, you can extend the same setup to export metrics to Prometheus and structured logs to your log aggregation platform, all from a single, coherently maintained library.

Hands-On: Distributed Tracing in Spring Boot 3.x

Let’s build a minimal two-service system and wire up distributed tracing end to end.

Step 1 — Maven Dependencies

Add the following to both services’ pom.xml:



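<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-web</artifactId>
</dependency>

<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

<dependency>
  <groupId>io.micrometer</groupId>
  <artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>

<dependency>
  <groupId>io.opentelemetry</groupId>
  <artifactId>opentelemetry-exporter-otlp</artifactId>
</dependency>
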
Step 2 — application.yml Configuration

User Service (src/main/resources/application.yml):

spring:
  application:
    name: user-service
server:
  port: 8080
management:
  tracing:
    sampling:
      probability: 1.0  # 100% — lower to 0.1 in production
  otlp:
    tracing:
      endpoint: http://localhost:4318/v1/traces
logging:
  pattern:
    level: "%5p [${spring.application.name:},%X{traceId},%X{spanId}]"

Apply the same config to Order Service, changing name: order-service and port: 8081.

Step 3 — RestTemplate Bean

Creating RestTemplate via RestTemplateBuilder is critical: this is how Spring Boot auto-registers the tracing interceptor that injects the traceparent header on every outbound call. A plain new RestTemplate() silently skips propagation.


@Configuration
public class RestTemplateConfig {

  @Bean
  public RestTemplate restTemplate(RestTemplateBuilder builder) {
    // Builder auto-registers the tracing interceptor.
    // This is what injects the W3C traceparent header on every call.
    return builder.build();
  }
}

Step 4 — User Service Controller

@RestController
@RequestMapping("/users")
public class UserController {

  private static final Logger log = LoggerFactory.getLogger(UserController.class);
  private final RestTemplate restTemplate;

  public UserController(RestTemplate restTemplate) {
    this.restTemplate = restTemplate;
  }

  @GetMapping("/{userId}/profile")
  public Map<String, Object> getUserProfile(@PathVariable String userId) {
    log.info("Fetching profile for userId={}", userId);

    // traceparent header injected automatically by the instrumented RestTemplate
    String url = "http://localhost:8081/orders/user/" + userId;
    Map<?, ?> orders = restTemplate.getForObject(url, Map.class);

    return Map.of(
        "userId", userId, "name", "Jane Doe",
        "email", "jane@example.com", "recentOrders", orders
    );
  }
}

Step 5 — Order Service Controller

@RestController
@RequestMapping("/orders")
public class OrderController {

  private static final Logger log = LoggerFactory.getLogger(OrderController.class);

  @GetMapping("/user/{userId}")
  public Map<String, Object> getOrdersForUser(@PathVariable String userId) {
    // This traceId matches the one in User Service — same trace!
    log.info("Fetching orders for userId={}", userId);

    return Map.of(
        "userId", userId,
        "orders", List.of(
            Map.of("orderId", "ORD-001", "status", "DELIVERED"),
            Map.of("orderId", "ORD-002", "status", "PROCESSING")
        )
    );
  }
}

Step 6 — Run Jaeger Locally

# Ports: 16686 = Jaeger UI, 4318 = OTLP HTTP receiver, 4317 = OTLP gRPC receiver
# (comments cannot follow a trailing backslash, so they live up here)
docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4318:4318 \
  -p 4317:4317 \
  jaegertracing/all-in-one:latest

Open http://localhost:16686 to access the Jaeger UI.

Step 7 — See It in Action

curl http://localhost:8080/users/42/profile

Check your log output across both services. You’ll see the same Trace ID appearing in both, with different Span IDs:

# user-service log
INFO [user-service,4bf92f3577b34da6a3ce929d0e0e4736,a3ce929d0e0e4736] Fetching profile for userId=42

# order-service log — SAME traceId, different spanId
INFO [order-service,4bf92f3577b34da6a3ce929d0e0e4736,b9c4f1d2e3a5b6c7] Fetching orders for userId=42

In the Jaeger UI, select user-service from the dropdown and click Find Traces. You'll see the complete flame graph — User Service parent span with Order Service child span nested underneath, with exact timing for each.

Observing Traces in the Jaeger UI

Once both services are running and you’ve fired a request via curl, here's how to read exactly what Jaeger is showing you.

Finding Your Trace

Open http://localhost:16686. In the left panel, set the Service dropdown to user-service and click Find Traces. You'll see a reverse-chronological list of recent traces. Each row shows the service name, root operation, total duration, and the number of spans. Click any row to open the detail view.

Reading the Flame Graph

The detail view is a Gantt-style flame graph. The widest bar at the top is the root span — the total end-to-end duration of the request. Nested below are child spans, each indented to reflect the parent-child call tree. For our two-service example you’ll see something like this:

# Root span: total request time in user-service
user-service   GET /users/42/profile        287ms  ███████████████████████████
  # Child span: work done inside user-service
  user-service   validate-user                12ms  ██
  # Child span: the outbound call to order-service (same Trace ID!)
  order-service  GET /orders/user/42          45ms  ████

Expanding a Span for Details

Clicking any span bar expands it to reveal its tags (key-value attributes) and logs (timestamped events). For an HTTP span, you’ll see the URL, method, status code, and response size. For a database span, you’ll see the query. This is where you look when a span shows unexpected latency: the tags tell you the exact SQL query, cache key, or downstream endpoint that is taking time.

Identifying Bottlenecks Visually

The visual width of each bar directly represents its proportion of the total trace duration. A span that fills 80% of the bar width is your bottleneck, no arithmetic required. In real production usage, Jaeger’s Compare Traces feature lets you overlay a slow trace against a fast one to immediately spot structural differences: an extra downstream call, a missing cache hit, or a sudden spike in a previously fast span.

Filtering by Error or Latency

In the search panel, you can filter traces by tags: for example, set error=true to show only failed traces, or set a minimum duration to surface traces slower than a threshold. These filters turn Jaeger from a debugging tool into a continuous performance monitoring view: after every deployment, filter by a minimum duration near your P99 target to confirm whether tail latency improved or regressed.

Conclusion

Distributed tracing transforms the way you understand and debug microservice systems. Instead of piecing together logs from a dozen places, you get a single coherent view of every request across every service, every hop, and every millisecond.

Here’s what we covered: the core concepts (traces, spans, trace IDs, context propagation, sampling), how a trace flows automatically using W3C Trace Context headers, and how to implement end-to-end tracing in Spring Boot 3.x using Micrometer Tracing + OpenTelemetry + Jaeger. The implementation is deliberately low-effort: just dependencies, a few lines of YAML per service, and a RestTemplate bean.

The Jaeger UI gives you immediate visual feedback. And when something breaks at 3 AM, you’ll have exactly the tool you need to find it in seconds rather than hours.

As distributed systems grow in complexity, end-to-end observability becomes critical for maintaining reliability and performance. Simform helps organizations implement tracing, metrics, and logging solutions that improve visibility across applications and accelerate issue resolution.

Try It Yourself

If you’d like to explore the implementation discussed in this article, here is the repository: http://github.com/backend-simformsolutions/distributed-tracing-blog-poc

Distributed Tracing in Java Spring Boot was originally published in Simform Engineering on Medium, where people are continuing the conversation by highlighting and responding to this story.

Design Patterns Won’t Save You If You Don’t Know When to Use Them (Part 2 of 2)

Akash Chauhan — Tue, 23 Jun 2026 05:28:54 GMT

Part 2 of 2 — Covers 11 behavioral patterns: The Communicators and Strategists.

Part 1: Builders, Architects, and Connectors

Part 1 covered the structural half: how objects are created, wrapped, and connected. Part 2 is about how objects communicate — who sends messages, who receives them, and how they’re routed.

These are the patterns with the sharpest production teeth. The Observer leak from the opening story (14 modules, memory leaking, event loop choking) is here. The middleware pipeline you write in Express every week is here. The Redux action model you use daily is here — most developers just don’t know the name.

The Communicators

How objects send, route, and respond to messages.

Observer

Intent: When one object changes state, notify all dependents automatically.

Problem: Twelve React components need to react when a user’s subscription changes. Prop-drilling reaches 8 levels deep. Callbacks compound. A global event bus with string names works until nobody remembers to unsubscribe — and the app fires stale callbacks for users who logged out three sessions ago.

Solution: Subscriptions are typed and explicit. Every subscription returns a cleanup function. Callers are responsible for calling it.

Analogy: A YouTube subscription — you don’t poll the channel. The channel notifies you when something publishes. You can unsubscribe anytime.

type SubscriptionEvents = {
  'plan:upgraded': { newPlan: Plan; previousPlan: Plan };
  'plan:cancelled': { effectiveDate: Date };
};

class SubscriptionService {
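  // handlers are stored loosely typed; on() and emit() enforce the payload type per event
  private listeners = new Map<keyof SubscriptionEvents, Set<(e: any) => void>>();

  on<T extends keyof SubscriptionEvents>(event: T, handler: (e: SubscriptionEvents[T]) => void): () => void {
    if (!this.listeners.has(event)) this.listeners.set(event, new Set());
    this.listeners.get(event)!.add(handler);
    return () => this.listeners.get(event)?.delete(handler); // returns a cleanup fn; caller must call it or leak
  }

  private emit<T extends keyof SubscriptionEvents>(event: T, data: SubscriptionEvents[T]) {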
    this.listeners.get(event)?.forEach(h => h(data)); // notify every registered handler
  }
}

// In React — useEffect's return value IS the cleanup function
useEffect(() => {
  return subscriptionService.on('plan:upgraded', ({ newPlan }) => setCurrentPlan(newPlan));
  //     ↑ returning the cleanup fn here: React calls it on unmount, removing the handler
}, []);

Use when

Multiple unrelated parts of your system react to changes in one part
Subscriptions join and leave dynamically at runtime

Strengths

Decouples publisher from subscribers
Open/Closed — add subscribers without modifying publisher

Tradeoffs

Control flow is hard to trace
Memory leaks when cleanup isn’t called

Critical rule: Every on() must have a corresponding off() or cleanup path. This single rule prevents the most common memory leak in browser applications.
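
One way to keep that rule honest is to make the listener count observable, so a test can assert that cleanup really removed the handler. A self-contained sketch, assuming a hypothetical PlanEmitter with a listenerCount helper that is not part of the service above:

```typescript
type PlanEvents = {
  "plan:upgraded": { newPlan: string };
  "plan:cancelled": { effectiveDate: string };
};

class PlanEmitter {
  private listeners = new Map<keyof PlanEvents, Set<(payload: any) => void>>();

  on<K extends keyof PlanEvents>(key: K, handler: (payload: PlanEvents[K]) => void): () => void {
    const handlers = this.listeners.get(key) ?? new Set<(payload: any) => void>();
    this.listeners.set(key, handlers);
    handlers.add(handler);
    return () => { handlers.delete(handler); }; // the cleanup path for this subscription
  }

  emit<K extends keyof PlanEvents>(key: K, payload: PlanEvents[K]): void {
    this.listeners.get(key)?.forEach(h => h(payload));
  }

  // Exposed so tests can detect handlers that were never cleaned up.
  listenerCount(key: keyof PlanEvents): number {
    return this.listeners.get(key)?.size ?? 0;
  }
}

const emitter = new PlanEmitter();
const seenPlans: string[] = [];
const unsubscribe = emitter.on("plan:upgraded", ({ newPlan }) => seenPlans.push(newPlan));

emitter.emit("plan:upgraded", { newPlan: "pro" }); // handler runs
unsubscribe();
emitter.emit("plan:upgraded", { newPlan: "team" }); // handler is gone, nothing recorded
```

After unsubscribe(), listenerCount("plan:upgraded") is back to zero, which is the invariant a component test can assert after unmounting.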

Mediator

Intent: Route communication between a set of objects through a central coordinator instead of directly between them.

Problem: Your trading dashboard has OrderBook, PriceChart, TradeHistory, PositionTracker, and RiskCalculator — all needing to react to each other. Wired directly, it becomes a graph of n² connections. Adding, removing, or changing any component requires understanding all its connections.

Solution: All components communicate through a Mediator. No component holds a reference to any other.

Analogy: Air traffic control — planes don’t talk to each other; they all route through the tower.

type TradingEvent =
  | { type: 'order:filled'; orderId: string; price: number }
  | { type: 'position:changed'; symbol: string; netQty: number };

class TradingMediator {
  private components = new Map<string, { onUpdate(e: TradingEvent): void }>();

  register(name: string, component: { onUpdate(e: TradingEvent): void }) {
    this.components.set(name, component); // components know the mediator; they never know each other
  }

  dispatch(event: TradingEvent, source: string) {
    // fan out to every component except the one that sent the event
    this.components.forEach((c, name) => {
      if (name !== source) c.onUpdate(event);
    });
  }
}

// Without Mediator: OrderBook → PriceChart, OrderBook → RiskCalculator, PriceChart → PositionTracker… n² wires
// With Mediator:    every component → Mediator → every other component  (n wires total)

Use when

Components communicate in complex ways and the resulting dependencies are painful to maintain
Reusing a component is hard because it holds references to many others
UI widget coordination, game state, or chat systems

Strengths

Reduces n² connections to n
Components become reusable

Tradeoffs

The Mediator itself can become a God Object
Centralizes logic that may be better distributed

Related: Observer is decentralized — components subscribe directly. Mediator is centralized — everything routes through one coordinator. Use Observer for loose coupling, Mediator when you need coordination logic.

Chain of Responsibility

Intent: Pass a request through a chain of handlers; each decides to process it or pass it on.

Problem: An HTTP request needs to pass through auth, rate limiting, authorization, and validation before reaching the handler. Each concern is independent. If any fails, the chain stops. They need to be composable and orderable without any step knowing about the others.

Solution: Each handler implements the same interface, calls next() to continue, or throws to stop. You've used this every time you wrote Express middleware.

Analogy: Airport security — passport control, screening, customs, gate check. Each is independent. Fail one and you stop.

type NextFn = () => Promise<void>;

interface Middleware {
  handle(ctx: RequestContext, next: NextFn): Promise<void>;
}

class AuthMiddleware implements Middleware {
  async handle(ctx: RequestContext, next: NextFn) {
    const token = ctx.headers['authorization']?.replace('Bearer ', '');
    if (!token) throw new UnauthorizedError(); // stop the chain — next() is never called
    ctx.user = await verifyJWT(token);         // attach result to context for downstream handlers
    await next();                              // pass control to the next handler in the chain
  }
}

class RateLimitMiddleware implements Middleware {
  async handle(ctx: RequestContext, next: NextFn) {
    const count = await redis.incr(`limit:${ctx.user?.id}`);
    if (count > RATE_LIMIT) throw new TooManyRequestsError(); // stop
    await next();                                              // pass through
  }
}

function buildChain(middlewares: Middleware[]) {
  return async (ctx: RequestContext) => {
    let i = 0;
    const next = async () => {
      if (i < middlewares.length) await middlewares[i++].handle(ctx, next); // call current, advance index
    };
    await next(); // kick off the chain from index 0
  };
}

// Flow: AuthMiddleware → RateLimitMiddleware → … → handler
// A throw at any step stops propagation; forgetting to call next() stops it silently
const handle = buildChain([new AuthMiddleware(), new RateLimitMiddleware()]);

Use when

Multiple handlers may process a request and you don’t know which ahead of time
You need to compose, reorder, or swap processing steps without touching others
Building request pipelines, approval workflows, or validation chains

Strengths

Each handler is isolated and testable
Easy to add, remove, or reorder steps

Tradeoffs

Unhandled requests are a silent failure mode
Can be confusing when requests pass through many handlers
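
The silent-failure tradeoff can be reduced by making the chain itself strict. The sketch below is one possible approach, not part of the middleware above: composeStrict, Ctx, Handler, and the handledBy field are hypothetical names. It rejects a second next() call from the same handler and fails loudly when no handler claimed the request.

```typescript
interface Ctx {
  path: string;
  handledBy: string | null; // set by whichever handler processes the request
}
type Next = () => Promise<void>;
type Handler = (ctx: Ctx, next: Next) => Promise<void>;

function composeStrict(handlers: Handler[]) {
  return async (ctx: Ctx): Promise<void> => {
    const dispatch = (index: number): Promise<void> => {
      if (index >= handlers.length) return Promise.resolve(); // end of chain
      let called = false;
      return handlers[index](ctx, () => {
        if (called) return Promise.reject(new Error("next() called more than once"));
        called = true;
        return dispatch(index + 1);
      });
    };
    await dispatch(0);
    // Covers both "everyone passed" and "someone forgot to call next()".
    if (ctx.handledBy === null) throw new Error(`No handler processed ${ctx.path}`);
  };
}

const passThrough: Handler = (_ctx, next) => next();
const ordersHandler: Handler = async (ctx, next) => {
  if (ctx.path === "/orders") ctx.handledBy = "orders";
  await next();
};

const run = composeStrict([passThrough, ordersHandler]);
```

Calling run with path "/orders" resolves normally, while an unrecognized path rejects instead of quietly returning, which turns the silent failure mode into a visible error.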

Command

Intent: Encapsulate a request as an object so it can be stored, queued, logged, or reversed.

Problem: Your app needs undo/redo. Snapshot-based undo (save full state before every keystroke) is too expensive. Hard-coding inverse operations in every action handler is unmanageable.

Solution: Every user action becomes a Command object with execute() and undo(). A CommandHistory stack manages the undo/redo flow.

Analogy: A restaurant order ticket — the waiter writes the order (the Command). The kitchen queues and executes it. The ticket can be recalled before the food is cooked.

interface Command {
  execute(): void;
  undo(): void;
}

class InsertTextCommand implements Command {
  constructor(private doc: Document, private pos: number, private text: string) {}
  execute() { this.doc.insertAt(this.pos, this.text); }
  undo()    { this.doc.deleteAt(this.pos, this.text.length); } // exact inverse: delete what execute inserted
}

class CommandHistory {
  private history: Command[] = []; // commands that have run — pop to undo
  private undone:  Command[] = []; // commands that were undone — pop to redo

  execute(cmd: Command) {
    cmd.execute();
    this.history.push(cmd);
    this.undone = []; // new action clears the redo stack (same as every text editor)
  }

  undo() {
    const cmd = this.history.pop();          // remove from executed stack
    if (cmd) { cmd.undo(); this.undone.push(cmd); } // reverse it, save for potential redo
  }

  redo() {
    const cmd = this.undone.pop();           // take the most-recently-undone command
    if (cmd) { cmd.execute(); this.history.push(cmd); } // re-run it, put back in history
  }
}

Use when

You need undo/redo functionality
Operations must be queued, scheduled, or serialized for deferred execution

Strengths

Clean undo/redo without full snapshots
Supports transaction rollback

Tradeoffs

Every action requires a Command class
Overkill for simple CRUD with no undo requirement

In the wild: Redux actions are the Command pattern. Actions = Commands. Reducer = executor. Redux DevTools time-travel = CommandHistory.undo().

Iterator

Intent: Access elements of a collection sequentially without exposing its internal representation.

Problem: You have a paginated API — 50,000 users spread across 500 pages. Callers shouldn’t need to manage pagination cursors, page sizes, or buffer logic. They just want for await (const user of users).

Solution: Implement AsyncIterator so the collection handles its own traversal.

Analogy: A TV remote’s channel button — you don’t know how channels are stored. Press next, get the next one.

class PaginatedAPIIterator<T> implements AsyncIterator<T> {
  private buffer: T[] = [];             // items fetched but not yet consumed by the caller
  private cursor: string | null = null; // tracks position in the remote dataset
  private done = false;

  constructor(private fetchPage: (cursor: string | null) => Promise<{ items: T[]; nextCursor: string | null }>) {}

  async next(): Promise<IteratorResult<T>> {
    if (this.buffer.length > 0) return { value: this.buffer.shift()!, done: false }; // serve buffered items first
    if (this.done) return { value: undefined as any, done: true };                    // no more pages

    const { items, nextCursor } = await this.fetchPage(this.cursor); // fetch the next page
    this.cursor = nextCursor;
    this.done = nextCursor === null; // null cursor means the API has no more pages
    this.buffer.push(...items);

    return this.buffer.length > 0
      ? { value: this.buffer.shift()!, done: false }
      : { value: undefined as any, done: true }; // page was empty
  }

  [Symbol.asyncIterator]() { return this; } // makes `for await (... of iterator)` work
}

for await (const user of new PaginatedAPIIterator(cursor => api.get('/users', { cursor }))) {
  await processUser(user); // caller never touches cursors, page sizes, or buffers
}

Use when

You want standard for...of traversal over a custom data structure
Lazy evaluation over large or infinite sequences

Strengths

Clean iteration interface
Lazy — doesn’t load all data upfront

Tradeoffs

A custom Iterator class is often unnecessary

Honest note: JavaScript’s built-in Symbol.iterator and generators make this largely unnecessary for simple cases. Before writing an Iterator class, check if a generator does it in 5 lines:

function* flatten<T>(nested: T[][]): Generator<T> {
  for (const group of nested) yield* group;
}
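
The same shortcut applies to the paginated case. As a rough sketch, an async generator can replace the iterator class entirely; paginate, Page, fakeUsers, and fetchUsers are hypothetical names, and fetchUsers is an in-memory stand-in for a real API call.

```typescript
interface Page<T> {
  items: T[];
  nextCursor: string | null; // null means there are no more pages
}

async function* paginate<T>(
  fetchPage: (cursor: string | null) => Promise<Page<T>>
): AsyncGenerator<T> {
  let cursor: string | null = null;
  do {
    const page: Page<T> = await fetchPage(cursor);
    yield* page.items; // hand items to the caller one at a time
    cursor = page.nextCursor;
  } while (cursor !== null);
}

// In-memory stand-in for a paginated endpoint, two items per page.
const fakeUsers = ["ana", "ben", "cy", "dee", "eli"];
async function fetchUsers(cursor: string | null): Promise<Page<string>> {
  const start = cursor === null ? 0 : Number(cursor);
  const end = start + 2;
  return {
    items: fakeUsers.slice(start, end),
    nextCursor: end < fakeUsers.length ? String(end) : null,
  };
}

async function collectUsers(): Promise<string[]> {
  const collected: string[] = [];
  for await (const user of paginate(fetchUsers)) collected.push(user);
  return collected;
}
```

The generator keeps the cursor in a local variable, so there is no buffer or done flag to manage by hand, and the next page is only fetched once the caller has consumed the current one.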

The Strategists

How to organize logic that does the same thing in different ways.

Strategy

Intent: Define a family of algorithms, encapsulate each, and make them interchangeable.

Problem: Your payment service supports Stripe, PayPal, Crypto, and Bank Transfer. A switch statement in processPayment means every new provider modifies existing logic, every test for one provider has to dodge the others, and any refactor touches everything.

Solution: Each payment method is a Strategy. The service holds a reference to one and delegates to it.

Analogy: A GPS app — same destination, different routing algorithms. Swap the algorithm without changing the car or the route display.

interface PaymentStrategy {
  charge(amount: number, currency: string): Promise<{ transactionId: string }>;
  refund(transactionId: string): Promise<{ refundId: string }>;
}

class StripeStrategy implements PaymentStrategy {
  async charge(amount: number, currency: string) {
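    // Stripe expects an integer in the smallest currency unit (cents for USD); round to avoid float drift such as 19.99 * 100
    const intent = await stripe.paymentIntents.create({ amount: Math.round(amount * 100), currency });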
    return { transactionId: intent.id };
  }
  async refund(transactionId: string) {
    const r = await stripe.refunds.create({ payment_intent: transactionId });
    return { refundId: r.id };
  }
}

class PaymentService {
  constructor(private strategy: PaymentStrategy) {} // accepts any strategy matching the interface

  processPayment(order: Order) {
    return this.strategy.charge(order.total, order.currency); // delegates — never knows which provider runs
  }

  setStrategy(s: PaymentStrategy) { this.strategy = s; } // swap provider at runtime, no rebuild
}

// New provider = new class. PaymentService and all call sites are untouched.
const service = new PaymentService(new StripeStrategy());
service.setStrategy(new PayPalStrategy());

Use when

Multiple variants of an algorithm need to be interchangeable at runtime
A method has a large conditional that selects between algorithm variants

Strengths

Algorithms are isolated and testable
Open/Closed — add strategies without modifying the service

Tradeoffs

More classes for simple scenarios

TypeScript shortcut: If your strategy has no state and only one method, it’s just a function parameter:

type ChargeFn = (amount: number) => Promise<{ transactionId: string }>;
function processPayment(order: Order, charge: ChargeFn) { ... }

Reserve the full class hierarchy for stateful strategies with multiple methods.
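
A self-contained version of the function-parameter shortcut might look like the sketch below. The providers are hypothetical stubs that fabricate a transaction ID instead of calling a real payment API, and Charge, OrderSummary, and payOrder are illustrative names.

```typescript
interface ChargeResult {
  transactionId: string;
}

// The whole "strategy interface" is one function type.
type Charge = (amountMinorUnits: number, currency: string) => Promise<ChargeResult>;

interface OrderSummary {
  id: string;
  totalMinorUnits: number; // integer amount, for example cents
  currency: string;
}

// Stub providers: each strategy is a plain async function.
const chargeWithCard: Charge = async (amount, currency) =>
  ({ transactionId: `card-${currency}-${amount}` });
const chargeWithWallet: Charge = async (amount, currency) =>
  ({ transactionId: `wallet-${currency}-${amount}` });

// The caller picks the strategy by passing a function.
function payOrder(order: OrderSummary, charge: Charge): Promise<ChargeResult> {
  return charge(order.totalMinorUnits, order.currency);
}
```

Swapping providers is then just payOrder(order, chargeWithWallet) instead of payOrder(order, chargeWithCard), and a test can pass any stub function without a mocking library.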

Template Method

Intent: Define the skeleton of an algorithm in a base class; let subclasses fill in the variable steps.

Problem: CSV, JSON, and XML importers all follow the same pipeline: read → validate → parse → transform → save. Steps 1 and 5 are identical for all formats. Steps 2 through 4 differ. Without Template Method, you either duplicate the shared steps or create awkward abstractions.

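Solution: The invariant sequence lives in a single base-class method that subclasses are not meant to override (TypeScript has no final keyword, so this is a convention rather than a compiler guarantee). The variable steps are abstract methods that subclasses implement.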

Analogy: A franchise restaurant — “open, prep, serve, close” is the template. The menu changes; the sequence doesn’t.

abstract class DataImporter {
  // sealed by convention (TypeScript has no `final`): subclasses must NOT override this invariant sequence
  async import(filePath: string): Promise<{ imported: number }> {
    const raw = await fs.readFile(filePath, 'utf-8'); // shared step 1
    this.validate(raw);                               // format-specific step 2
    const rows = await this.parse(raw);               // format-specific step 3
    const records = rows.map(r => this.transform(r)); // format-specific step 4
    return { imported: (await db.insertMany(records)).length }; // shared step 5
  }

  protected abstract validate(raw: string): void;           // subclass must implement these three
  protected abstract parse(raw: string): Promise<RawRow[]>;
  protected abstract transform(row: RawRow): DomainRecord;
}

class CSVImporter extends DataImporter {
  protected validate(raw: string) { if (!raw.startsWith(HEADER)) throw new Error('Bad header'); }
  protected async parse(raw: string) { return parseCSV(raw); }
  protected transform(row: RawRow): DomainRecord { return { id: row['ID'], name: row['Name'] }; }
}

// CSVImporter never calls db.insertMany — the base class handles that
// JSONImporter and XMLImporter would differ only in validate/parse/transform

Use when

Multiple classes share the same algorithm skeleton with different steps
Building frameworks or pipelines with a fixed structure

Strengths

Eliminates duplication of the invariant sequence
Enforces process order

Tradeoffs

Requires inheritance (harder to test)
Tight coupling between base and subclass

Related: Can often be replaced by Strategy (pass the variable steps as functions rather than inheriting them).
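
As a sketch of that alternative, the fixed sequence becomes an ordinary function and the format-specific steps are passed in as an object. Everything here is hypothetical and simplified: the input is already a string, the steps are synchronous, and save stands in for the database call.

```typescript
interface ImportSteps<Row, Rec> {
  validate(raw: string): void;
  parse(raw: string): Row[];
  transform(row: Row): Rec;
}

// The invariant sequence: validate, parse, transform, save. No inheritance involved.
function runImport<Row, Rec>(
  raw: string,
  steps: ImportSteps<Row, Rec>,
  save: (records: Rec[]) => number
): { imported: number } {
  steps.validate(raw);
  const records = steps.parse(raw).map(row => steps.transform(row));
  return { imported: save(records) };
}

interface Contact {
  id: string;
  label: string;
}

// Format-specific steps for a tiny CSV dialect with an "ID,Name" header.
const csvSteps: ImportSteps<string[], Contact> = {
  validate(raw) {
    if (!raw.startsWith("ID,Name")) throw new Error("Bad header");
  },
  parse(raw) {
    return raw.trim().split("\n").slice(1).map(line => line.split(","));
  },
  transform(cells) {
    return { id: cells[0], label: cells[1] };
  },
};
```

Because the steps are plain data, a test can call runImport with an in-memory save function and a handful of stub steps, without subclassing anything.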

State

Intent: Allow an object to alter its behavior when its internal state changes.

Problem: A network connection can be disconnected, connecting, connected, or failed. send() behaves differently in each state. Without State, every method has a switch on the status flag, and adding a new state means touching every method.

Solution: Each state is a class implementing the same interface. The object delegates to its current state.

Analogy: A traffic light — the same “change” signal produces a different result depending on current state.

interface ConnectionState {
  connect(ctx: Connection): void;
  send(ctx: Connection, data: Buffer): void;
  disconnect(ctx: Connection): void;
}

class DisconnectedState implements ConnectionState {
  connect(ctx: Connection) {
    ctx.setState(new ConnectingState()); // transition before the async work starts
    ctx.openSocket()
      .then(() => ctx.setState(new ConnectedState()))  // success path
      .catch(() => ctx.setState(new FailedState()));    // failure path
  }
  send()       { throw new Error('Not connected'); } // illegal in this state
  disconnect() { /* no-op — already disconnected */ }
}

class Connection {
  private state: ConnectionState = new DisconnectedState(); // starts disconnected

  setState(s: ConnectionState) { this.state = s; } // state objects call this to transition

  // These public methods never change — behavior changes because this.state changes
  connect()           { this.state.connect(this); }
  send(data: Buffer)  { this.state.send(this, data); }
  disconnect()        { this.state.disconnect(this); }
}

Use when

An object’s behavior changes radically based on internal state
Methods have large conditionals all branching on the same status flag

Strengths

Eliminates state-based conditionals
Each state is isolated and testable

Tradeoffs

State classes multiply quickly
Overkill for 2–3 states that won’t grow

TypeScript shortcut: For small, stable state machines, a discriminated union + switch is often cleaner than State classes.
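
Here is a rough sketch of that shortcut for the same connection example. The state and event shapes (ConnState, ConnEvent) and the transition and replay functions are hypothetical, and the socket itself is left out so that only the state logic remains.

```typescript
type ConnState =
  | { kind: "disconnected" }
  | { kind: "connecting" }
  | { kind: "connected"; sent: number }
  | { kind: "failed"; reason: string };

type ConnEvent =
  | { type: "connect" }
  | { type: "opened" }
  | { type: "error"; reason: string }
  | { type: "send" }
  | { type: "disconnect" };

// Pure transition function: current state + event in, next state out.
function transition(state: ConnState, ev: ConnEvent): ConnState {
  if (ev.type === "send" && state.kind !== "connected") throw new Error("Not connected");
  switch (state.kind) {
    case "disconnected":
    case "failed":
      return ev.type === "connect" ? { kind: "connecting" } : state;
    case "connecting":
      if (ev.type === "opened") return { kind: "connected", sent: 0 };
      if (ev.type === "error") return { kind: "failed", reason: ev.reason };
      return ev.type === "disconnect" ? { kind: "disconnected" } : state;
    case "connected":
      if (ev.type === "send") return { kind: "connected", sent: state.sent + 1 };
      if (ev.type === "error") return { kind: "failed", reason: ev.reason };
      return ev.type === "disconnect" ? { kind: "disconnected" } : state;
  }
}

// Replays a list of events starting from the disconnected state.
function replay(events: ConnEvent[]): ConnState {
  return events.reduce<ConnState>((s, ev) => transition(s, ev), { kind: "disconnected" });
}
```

All transitions live in one function, which is easy to read for four states. Once each state needs its own timers, retries, or resources, the class-per-state version starts to pay for itself.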

Visitor

Intent: Add operations to an object structure without modifying the objects.

Problem: Your query language AST needs to support: SQL serialization, cost estimation, validation, and display formatting. If each operation is a method on AST nodes, node classes grow to include concerns that have nothing to do with representing queries.

Solution: Operations become Visitor classes. Each node has an accept(visitor) method. New operations = new Visitor classes. Zero changes to nodes.

Analogy: A tax auditor visiting businesses — the auditor knows how to audit each type. Each business knows how to receive an auditor. The audit logic isn’t inside the business itself.

interface ASTNode { accept<T>(v: ASTVisitor<T>): T; }

class SelectNode implements ASTNode {
  constructor(public fields: string[], public table: string) {}
  accept<T>(v: ASTVisitor<T>): T { return v.visitSelect(this); } // node says "I'm a SelectNode, dispatch to visitSelect"
}

interface ASTVisitor<T> {
  visitSelect(node: SelectNode): T;
  // add visitWhere(node: WhereNode), visitJoin(node: JoinNode), etc. as the AST grows
}

class SQLSerializer implements ASTVisitor<string> {
  visitSelect(node: SelectNode): string {
    return `SELECT ${node.fields.join(', ')} FROM ${node.table}`; // knows how to render SelectNode as SQL
  }
}

// How the double dispatch works:
// 1. node.accept(visitor)        → node calls visitor.visitSelect(this)
// 2. visitor.visitSelect(node)   → visitor handles it with full type information
// Adding CostEstimator: implement ASTVisitor. Zero changes to any node class.
const sql = new SelectNode(['id', 'email'], 'users').accept(new SQLSerializer());
// → "SELECT id, email FROM users"

Use when

Many distinct operations on a stable object structure
The structure rarely changes but operations are added frequently
Working with ASTs, document models, or recursive structures

Strengths

Add operations without touching the structure
Separates algorithm from data structure

Tradeoffs

Add a new node type → modify every visitor
Double-dispatch is non-obvious

Wrong fit: If your structure grows frequently but operations are stable, use polymorphism or a discriminated union instead.
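To make the "add operations without touching the structure" strength concrete, here is a minimal sketch that adds a second operation next to the serializer. The `CostEstimator` cost model (one unit per selected field) is purely hypothetical; the point is only that `SelectNode` is not edited.

```typescript
// Self-contained restatement of the node and visitor types from the Visitor example.
interface ASTVisitor<T> {
  visitSelect(node: SelectNode): T;
}

class SelectNode {
  constructor(public fields: string[], public table: string) {}
  accept<T>(v: ASTVisitor<T>): T { return v.visitSelect(this); }
}

class SQLSerializer implements ASTVisitor<string> {
  visitSelect(node: SelectNode): string {
    return `SELECT ${node.fields.join(', ')} FROM ${node.table}`;
  }
}

// New operation: a made-up cost model that charges one unit per selected field.
class CostEstimator implements ASTVisitor<number> {
  visitSelect(node: SelectNode): number {
    return node.fields.length;
  }
}

const selectNode = new SelectNode(['id', 'email'], 'users');
const sqlText = selectNode.accept(new SQLSerializer()); // T is inferred as string
const cost = selectNode.accept(new CostEstimator());    // T is inferred as number
```

The flip side shows up as soon as a `WhereNode` is added: both visitors would need a new `visitWhere` method before the code compiles again.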

Memento

Intent: Save and restore an object’s state without exposing its internals.

Problem: A multi-step onboarding wizard needs “Back” to restore exactly what the user entered. Snapshot-based undo is simpler than Command-based inverse operations when the state is complex or when “inverse” is hard to define.

Solution: The object produces opaque snapshots of its state. A history manager stores and restores them.

Analogy: A git commit — a complete snapshot of repository state. Restore any previous commit at any time.

interface WizardSnapshot {
  readonly step: number;
  readonly data: Readonly<Record<string, unknown>>; // opaque to callers; only restored via restore()
}

class OnboardingWizard {
  private step = 1;
  private data: Record<string, unknown> = {};

  save(): WizardSnapshot {
    return { step: this.step, data: structuredClone(this.data) }; // deep copy — future mutations won't corrupt the snapshot
  }

  restore(s: WizardSnapshot) {
    this.step = s.step;
    this.data = structuredClone(s.data); // deep copy again — protects the stored snapshot from being modified
  }

  updateField(field: string, value: unknown) { this.data[field] = value; }
  next() { this.step++; }
}

class WizardHistory {
  private snapshots: WizardSnapshot[] = [];
  save(w: OnboardingWizard) { this.snapshots.push(w.save()); }             // call before each step change
  undo(w: OnboardingWizard) { const s = this.snapshots.pop(); if (s) w.restore(s); } // go back one step
}

Use when

You need undo/redo based on state snapshots rather than inverse operations
Checkpoints or save points in a long-running process

Strengths

Simple undo without complex inverse logic
Encapsulates internals — callers get opaque snapshots

Tradeoffs

Memory-intensive for large state

Related: If your state lives in an immutable structure (Redux, Zustand with Immer), snapshots are essentially free — you already have them.
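The history class above only goes backwards. A possible way to get redo from the same snapshots is to keep a second stack, roughly as sketched below. The `Wizard` and `UndoRedoHistory` names are assumptions for this sketch, and the shallow copy is only safe because the example stores primitive field values.

```typescript
interface Snapshot {
  readonly step: number;
  readonly data: Readonly<Record<string, unknown>>;
}

class Wizard {
  private step = 1;
  private data: Record<string, unknown> = {};

  save(): Snapshot {
    // Shallow copy: sufficient here because the sketch only stores primitives.
    return { step: this.step, data: { ...this.data } };
  }
  restore(s: Snapshot): void {
    this.step = s.step;
    this.data = { ...s.data };
  }
  setField(field: string, value: unknown): void { this.data[field] = value; }
  getField(field: string): unknown { return this.data[field]; }
  next(): void { this.step++; }
  currentStep(): number { return this.step; }
}

class UndoRedoHistory {
  private undoStack: Snapshot[] = [];
  private redoStack: Snapshot[] = [];

  // Call before each change. A fresh change invalidates the redo branch.
  checkpoint(w: Wizard): void {
    this.undoStack.push(w.save());
    this.redoStack = [];
  }
  undo(w: Wizard): void {
    const s = this.undoStack.pop();
    if (!s) return;
    this.redoStack.push(w.save()); // remember the current state so redo can return to it
    w.restore(s);
  }
  redo(w: Wizard): void {
    const s = this.redoStack.pop();
    if (!s) return;
    this.undoStack.push(w.save());
    w.restore(s);
  }
}

const wizard = new Wizard();
const wizardHistory = new UndoRedoHistory();
wizardHistory.checkpoint(wizard);
wizard.setField('name', 'Ada');
wizard.next();                    // now on step 2
wizardHistory.undo(wizard);       // back on step 1, field cleared
const stepAfterUndo = wizard.currentStep();
const nameAfterUndo = wizard.getField('name');
wizardHistory.redo(wizard);       // forward to step 2 again
const stepAfterRedo = wizard.currentStep();
const nameAfterRedo = wizard.getField('name');
```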

Interpreter

Intent: Evaluate sentences in a simple language by representing grammar rules as a class hierarchy.

Problem: You’re building a filter query language: status:open AND label:bug. Users type queries; your system evaluates them against records. Each token type has a rule; rules compose recursively.

Solution: Each grammar rule is a class implementing interpret(). Sentences are trees of rule objects.

Analogy: A compiler’s front-end — each grammar rule is a class; a sentence is a composed tree of those rules.

interface Expression {
  interpret(ctx: SearchContext): boolean;
}

class FieldEquals implements Expression { // leaf: checks one field
  constructor(private field: string, private value: string) {}
  interpret(ctx: SearchContext) { return ctx.get(this.field) === this.value; }
}

class And implements Expression { // composite: both children must pass
  constructor(private left: Expression, private right: Expression) {}
  interpret(ctx: SearchContext) { return this.left.interpret(ctx) && this.right.interpret(ctx); }
}

// "status:open AND label:bug" becomes a tree of Expression objects:
//   And
//   ├── FieldEquals('status', 'open')
//   └── FieldEquals('label', 'bug')
const query = new And(new FieldEquals('status', 'open'), new FieldEquals('label', 'bug'));

// Evaluating recurses down the tree: And → left.interpret AND right.interpret
const results = allIssues.filter(i => query.interpret(new SearchContext(i)));

Use when

A simple, stable language or rule set can be represented as an AST
Building filters, rules engines, or configuration DSLs with simple grammars

Strengths

Grammar rules are explicit and isolated
Easy to add new expressions

Tradeoffs

Complex grammars become unmaintainable class trees
Only practical for simple grammars

Honest take: This is the least frequently needed GoF pattern. For anything beyond simple expressions, use a parsing library (chevrotain, nearley, peg.js) instead of rolling your own class hierarchy.
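For the simple case, "easy to add new expressions" looks roughly like this: `Or` and `Not` are new classes, and nothing existing changes. The `SearchContext` implementation and the `Issue` record shape are assumptions, since the original example leaves them undefined.

```typescript
// Hypothetical record shape and context wrapper for the sketch.
type Issue = Record<string, string>;

class SearchContext {
  constructor(private record: Issue) {}
  get(field: string): string | undefined { return this.record[field]; }
}

interface Expression { interpret(ctx: SearchContext): boolean; }

class FieldEquals implements Expression {
  constructor(private field: string, private value: string) {}
  interpret(ctx: SearchContext): boolean { return ctx.get(this.field) === this.value; }
}

class And implements Expression {
  constructor(private left: Expression, private right: Expression) {}
  interpret(ctx: SearchContext): boolean { return this.left.interpret(ctx) && this.right.interpret(ctx); }
}

// New grammar rules: each is one small class, existing classes are untouched.
class Or implements Expression {
  constructor(private left: Expression, private right: Expression) {}
  interpret(ctx: SearchContext): boolean { return this.left.interpret(ctx) || this.right.interpret(ctx); }
}

class Not implements Expression {
  constructor(private inner: Expression) {}
  interpret(ctx: SearchContext): boolean { return !this.inner.interpret(ctx); }
}

const issues: Issue[] = [
  { id: '1', status: 'open', label: 'bug' },
  { id: '2', status: 'open', label: 'wontfix' },
  { id: '3', status: 'closed', label: 'bug' },
];

// status:open AND NOT label:wontfix
const openNotWontfix = new And(
  new FieldEquals('status', 'open'),
  new Not(new FieldEquals('label', 'wontfix')),
);
// status:closed OR label:wontfix
const closedOrWontfix = new Or(
  new FieldEquals('status', 'closed'),
  new FieldEquals('label', 'wontfix'),
);

const idsFor = (q: Expression): string =>
  issues.filter(i => q.interpret(new SearchContext(i))).map(i => i.id).join(',');

const firstResult = idsFor(openNotWontfix);
const secondResult = idsFor(closedOrWontfix);
```

Note what is missing: turning the text `status:open AND label:bug` into this tree. That parsing step is where a library starts to pay off.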

What GoF Gets Wrong in 2026

Strategy is often just a function. In C++, passing algorithm variants required a class hierarchy. In TypeScript:

// This is Strategy without the ceremony
function processPayment(order: Order, charge: (amount: number) => Promise<void>) { ... }

Use the full class pattern only when your strategy holds state or has multiple methods.
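A small sketch of that idea, under a few assumptions: the `Order` shape, the receipt strings, and the fee logic are invented for illustration, and `charge` is synchronous here only to keep the example easy to run.

```typescript
// Hypothetical types for the sketch.
type Order = { id: string; total: number };
type ChargeFn = (amount: number) => string; // returns a made-up receipt string

function processPayment(order: Order, charge: ChargeFn): string {
  if (order.total <= 0) throw new Error(`Order ${order.id} has nothing to charge`);
  return charge(order.total); // the "strategy" is whatever function was passed in
}

const chargeCard: ChargeFn = amount => `card:${amount.toFixed(2)}`;
const chargeWallet: ChargeFn = amount => `wallet:${amount.toFixed(2)}`;

// A strategy that needs configuration can still be a function: close over the config.
const chargeWithFee = (feeRate: number): ChargeFn =>
  amount => `card:${(amount * (1 + feeRate)).toFixed(2)}`;

const sampleOrder: Order = { id: 'o-1', total: 100 };
const cardReceipt = processPayment(sampleOrder, chargeCard);
const walletReceipt = processPayment(sampleOrder, chargeWallet);
const feeReceipt = processPayment(sampleOrder, chargeWithFee(0.25));
```

The closure in `chargeWithFee` covers the "strategy holds state" case for simple configuration; a class starts to earn its keep when the strategy needs several related methods.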

React hooks are design patterns.

useReducer = Command (actions are commands, reducer is executor)
useContext + Provider = Singleton for component trees
Redux DevTools time-travel = CommandHistory.undo()

The React team understood patterns deeply enough to bake them into the API invisibly.

Patterns GoF missed that you use every day:

Module pattern — import/export is this pattern, native
Pub/Sub — distinct from Observer (has a broker: Redis, SNS, socket.io)
Middleware pipeline — Chain of Responsibility formalized as an ecosystem convention
Repository — not in GoF; arguably the most important pattern for keeping domain logic clean
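The middleware pipeline is the least obvious of these, so here is a stripped-down sketch of how such a chain can be composed. It is synchronous and the `Ctx` shape is an assumption; real frameworks differ in details such as async handling and error propagation.

```typescript
// Hypothetical request context for the sketch.
type Ctx = { path: string; log: string[]; status?: number };
type Next = () => void;
type Middleware = (ctx: Ctx, next: Next) => void;

// Builds one function out of many handlers. Each handler decides whether to call next().
function compose(middlewares: Middleware[]): (ctx: Ctx) => void {
  return ctx => {
    const dispatch = (i: number): void => {
      const mw = middlewares[i];
      if (!mw) return;                 // end of the chain
      mw(ctx, () => dispatch(i + 1));  // hand control to the next link
    };
    dispatch(0);
  };
}

const logger: Middleware = (ctx, next) => {
  ctx.log.push('log:in');
  next();
  ctx.log.push('log:out');
};

const auth: Middleware = (ctx, next) => {
  if (ctx.path.startsWith('/admin')) { ctx.status = 403; return; } // short-circuit: never calls next()
  next();
};

const handler: Middleware = ctx => {
  ctx.status = 200;
  ctx.log.push('handled');
};

const run = compose([logger, auth, handler]);

const okCtx: Ctx = { path: '/home', log: [] };
run(okCtx);

const blockedCtx: Ctx = { path: '/admin/users', log: [] };
run(blockedCtx);
```

In the blocked case the handler never runs, but the logger still records its exit, which is the "wrap around the rest of the chain" behavior that distinguishes a pipeline from a plain list of checks.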

Patterns that TypeScript makes mostly obsolete:

Iterator — generators replace 90% of custom Iterator implementations
Template Method — composition (passing step functions) is usually cleaner than inheritance
Interpreter — parser libraries exist for every language now
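For the Iterator point, a generator gives you a lazy traversal without any hand-written `next()` or `done` bookkeeping. The `TreeNode` shape below is an assumption for the sketch.

```typescript
// Hypothetical tree shape for the sketch.
interface TreeNode<T> {
  value: T;
  children: TreeNode<T>[];
}

// Depth-first traversal as a generator: the language handles the iterator protocol.
function* walk<T>(node: TreeNode<T>): Generator<T> {
  yield node.value;
  for (const child of node.children) {
    yield* walk(child); // delegate to the child's traversal
  }
}

const tree: TreeNode<string> = {
  value: 'root',
  children: [
    { value: 'a', children: [{ value: 'a1', children: [] }] },
    { value: 'b', children: [] },
  ],
};

const visited = Array.from(walk(tree));
```

Because the traversal is lazy, a consumer can also stop early with a `for...of` loop and `break`, and the remaining nodes are never visited.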

When to Reach for a Pattern

Three questions. In order.

1. What specific pain exists right now — not someday?

Patterns solve problems. No problem, no pattern. If you’re not feeling the pain of 14 inconsistent event subscription implementations, don’t implement Observer as prevention.

2. Would more direct code really be that much worse?

A readable 50-line method beats a 5-file pattern implementation that requires orientation to navigate. The measure isn’t elegance. It’s how fast a new developer understands what it does and why.

3. Who has to maintain this?

Patterns are a shared vocabulary only for people who share the vocabulary. Introducing Visitor where your team hasn’t encountered it doesn’t give you elegance — it gives you a puzzle with no answer key.

Heuristic: If you can describe the pattern in one sentence and a colleague immediately says “yes, that makes sense” — it belongs. If they need a 10-minute explanation of double dispatch — it’s costing more than it’s buying.

The Real Lesson

The goal was never to learn 23 patterns.

The goal is to develop the muscle for recognizing structure. To look at 14 modules independently reimplementing the same subscription logic and immediately see: this is an Observer problem. To look at a method doing 8 different things and see: this wants to be a Chain. To see new EmailClient() embedded in business logic and recognize: that's a Factory waiting to be named.

Naming something gives you power over it. Once you can say “this is an Observer problem,” you know the failure modes: subscription lifecycles, cleanup paths, ordering bugs, memory leaks. You’ve collapsed months of debugging into a known problem with known solutions.

Pick one pattern you’ve never consciously applied. Implement it this week — in your actual codebase, against an actual problem. Then re-read its section here.

The words will mean something different the second time.

Recognizing patterns is one thing. Applying them to complex, real-world systems is where engineering experience matters.


Design Patterns Won’t Save You If You Don’t Know When to Use Them (Part 2 of 2) was originally published in Simform Engineering on Medium, where people are continuing the conversation by highlighting and responding to this story.

Design Patterns Won’t Save You If You Don’t Know When to Use Them (Part 1 of 2)

Akash Chauhan — Tue, 23 Jun 2026 05:22:27 GMT

Part 1 of 2: Covers 12 creational and structural patterns, grouped as The Builders, The Architects, and The Connectors.

Three weeks before we shipped a fintech platform to 200,000 users, our notification system started dropping messages under load.

The root cause: 14 separate modules had each independently reimplemented an event subscription system, each with different cleanup logic, and some listeners were never removed. Memory leaked. The event loop choked.

The fix took four hours. The real fix — the architectural one — was understanding the Observer pattern deeply enough to recognize we’d built it 14 times, badly. That’s the thing about design patterns: you’re already using them. You’re just doing it inconsistently.

What design patterns actually are:

Not solutions to memorize. A shared vocabulary for structural problems that, once named, become vastly easier to communicate and solve. The GoF catalogued 23 of them in 1994, for C++. Half exist because of language limitations TypeScript simply doesn’t have. That context matters.

One warning: The most dangerous thing you can do with patterns is treat them as solutions looking for problems. No pain, no pattern.

The Builders

How objects come into existence when construction gets complex.

Singleton

Intent: Ensure a class has exactly one instance and provide a global access point to it.

Problem: Your database connection pool is being instantiated 47 times because four different modules all believe they’re responsible for creating it.

Solution: Make the constructor private. Expose a static method that creates the instance on first call and returns the same one on every subsequent call.

Analogy: A country’s central bank — one exists, everyone accesses the same one, by deliberate design.

class DatabasePool {
  private static instance: DatabasePool | null = null;

  private constructor(private config: DBConfig) { // private: `new DatabasePool()` is now a compile error
    this.connect();
  }

  static getInstance(config: DBConfig): DatabasePool {
    if (!DatabasePool.instance) {                         // first call: create
      DatabasePool.instance = new DatabasePool(config);
    }
    return DatabasePool.instance;                         // every subsequent call: same object
  }

  async query(sql: string): Promise<unknown> {
    return this.getAvailableConnection().execute(sql);
  }
}

const pool = DatabasePool.getInstance(dbConfig); // same instance everywhere

Use when

One instance of a resource-intensive object is required across the entire app (connection pools, config, logger)
You need global accessibility without prop-drilling

Strengths

Guarantees a single instance
Lazy initialization
Global access point

Tradeoffs

Breaks unit tests — hard to mock
Hidden global state
Violates Single Responsibility

Related: Dependency Injection solves the same problem with better testability. Prefer DI in new services.
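A minimal sketch of the DI alternative, assuming a simplified synchronous `Db` interface (the names `Db`, `UserService`, and `FakeDb` are hypothetical): the service receives its dependency through the constructor, so a test can pass a fake without touching global state.

```typescript
// Hypothetical interface: any pool-like dependency would work the same way.
interface Db {
  query(sql: string): string[];
}

class UserService {
  constructor(private db: Db) {} // dependency is handed in, not fetched from a global

  listEmails(): string[] {
    return this.db.query('SELECT email FROM users');
  }
}

// In a test, pass a fake. There is no shared instance to reset between tests.
class FakeDb implements Db {
  calls: string[] = [];
  query(sql: string): string[] {
    this.calls.push(sql);
    return ['a@example.com'];
  }
}

const fakeDb = new FakeDb();
const service = new UserService(fakeDb);
const emails = service.listEmails();
```

In production code the composition root would still create exactly one real pool and pass it to every service, so you keep the "one instance" property without the hidden global.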

Factory Method

Intent: Define an interface for creating an object, but let subclasses decide which class to instantiate.

Problem: Your notification service is coupled to new EmailNotification(). Next quarter: push notifications. The quarter after: SMS. Every new channel means modifying business logic that should never need to change.

Solution: Replace new with an abstract "factory method" that subclasses override. Business logic stays constant; only the created type changes.

Analogy: A job posting — you define the role requirements; the hired person fills the role. HR doesn’t pick the candidate.

abstract class NotificationService {
  abstract createNotification(): Notification; // hook: subclass decides what to build

  async notify(userId: string, message: string) {
    const prefs = await getUserPrefs(userId);
    const notification = this.createNotification(); // calls the hook — no `new` hardcoded here
    await notification.send(prefs.contact, message); // works regardless of which type was created
  }
}

class EmailNotificationService extends NotificationService {
  createNotification(): Notification { return new EmailNotification(); } // concrete decision lives here
}

class PushNotificationService extends NotificationService {
  createNotification(): Notification { return new PushNotification(); }  // swap type — nothing else changes
}

Use when

A class can’t anticipate the exact type of objects it must create
Building plugin systems with open-ended product families

Strengths

Decouples creator from concrete types
Open/Closed Principle
Easy to extend with new types

Tradeoffs

More files, more classes
Can over-engineer simple creation

Related: Factory Method creates one product via a hook. Abstract Factory creates families of coordinated products.

Abstract Factory

Intent: Create families of related objects without specifying their concrete classes.

Problem: You’re building a cross-platform UI library. Web and mobile each have their own Button, Modal, and Input — but they must always be used together. You can never mix a web Button with a mobile Modal.

Solution: Define a factory interface that produces a complete product family. Platform decision happens once, at the boundary.

Analogy: An IKEA furniture collection — each collection has matching pieces designed to work together. You choose the collection once.

interface UIFactory {
  createButton(): Button;
  createModal(): Modal;
}

class WebUIFactory implements UIFactory {
  createButton(): Button { return new WebButton(); }
  createModal(): Modal   { return new WebModal(); }
}

class MobileUIFactory implements UIFactory {
  createButton(): Button { return new MobileButton(); }
  createModal(): Modal   { return new MobileBottomSheet(); } // different type, same contract
}

class Screen {
  private button: Button;
  private modal: Modal;

  constructor(factory: UIFactory) { // Screen never knows which factory it received
    this.button = factory.createButton();
    this.modal  = factory.createModal(); // guaranteed matched pair — can't mix WebButton + MobileBottomSheet
  }
}

const factory = isMobile ? new MobileUIFactory() : new WebUIFactory(); // one decision, one place
const screen  = new Screen(factory);

Use when

Code must work with multiple families of related objects
Building theme systems, cross-platform UI kits, or DB driver abstractions

Strengths

Guarantees product family consistency
Decoupled from concrete types

Tradeoffs

Adding a new product type requires changing every factory
Lots of classes even for small families

Related: Often implemented with Factory Methods internally. Can use Singleton to ensure one factory per environment.

Builder

Intent: Construct complex objects step by step, separating construction from representation.

Problem: A UserAccount constructor with 8+ parameters — name, email, password, optional 2FA, optional billing, optional team — produces an argument list nobody can write or read without mistakes. This is the telescoping constructor problem.

Solution: A builder with one method per field, each returning this for chaining. Validate at each step, not in one buried constructor call.

Analogy: A custom PC configurator — build incrementally: processor, then RAM, then storage. Each step is validated and meaningful on its own.

class UserAccountBuilder {
  private account: Partial<UserAccount> = { twoFactorEnabled: false };

  withEmail(email: string): this { // returns `this` so calls can chain: .withEmail(...).withPassword(...)
    if (!email.includes('@')) throw new Error(`Invalid email: ${email}`); // validate at the step, not buried in build()
    this.account.email = email;
    return this;
  }

  withPassword(plaintext: string): this {
    this.account.passwordHash = bcrypt.hashSync(plaintext, 12); // transform happens here, not in the domain object
    return this;
  }

  withTeam(teamId: string): this {
    this.account.teamId = teamId; // optional — omitting this is fine
    return this;
  }

  build(): UserAccount {
    if (!this.account.email || !this.account.passwordHash) {
      throw new Error('Email and password are required'); // final gate: catch anything still missing
    }
    return this.account as UserAccount;
  }
}

// Each method name documents what it sets — no positional argument guessing
const account = new UserAccountBuilder()
  .withEmail('user@company.com')
  .withPassword('securepassword')
  .withTeam('team_abc')
  .build();

Use when

Constructor has 5+ parameters, especially optional ones
You need per-step validation during construction
The same construction process should produce different representations

Strengths

Readable, self-documenting call sites
Validates at each step
Supports optional params cleanly

Tradeoffs

Overkill for simple objects
Requires a separate Builder class

Related: For simple objects with 2–3 fields, use TypeScript’s destructured params instead: createUser({ email, teamId? }).
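That lighter alternative might look like the sketch below. The `CreateUserParams` and `SimpleUser` types are assumptions for illustration; named fields and defaults give most of the readability of a builder without the extra class.

```typescript
// Hypothetical parameter and result types for the sketch.
interface CreateUserParams {
  email: string;
  teamId?: string;
  twoFactorEnabled?: boolean;
}

interface SimpleUser {
  email: string;
  teamId: string | null;
  twoFactorEnabled: boolean;
}

// Destructuring with defaults: call sites name every field, optional ones can be omitted.
function createUser({ email, teamId, twoFactorEnabled = false }: CreateUserParams): SimpleUser {
  if (!email.includes('@')) throw new Error(`Invalid email: ${email}`);
  return { email, teamId: teamId ?? null, twoFactorEnabled };
}

const solo = createUser({ email: 'user@company.com' });
const member = createUser({ email: 'lead@company.com', teamId: 'team_abc', twoFactorEnabled: true });
```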

Prototype

Intent: Create new objects by cloning an existing instance rather than building from scratch.

Problem: A Document object requires a database round-trip, AST parsing, and style computation to create. A user wants "duplicate this document" — 90% identical to an existing one.

Solution: Define a clone() method. New objects copy an existing instance and specialize the copy.

Analogy: Biological cell division — copy what already works, then specialize.

class DocumentTemplate {
  constructor(
    public baseStyles: StyleSheet,
    public meta: DocumentMeta,
    private sections: Section[],
  ) {}

  clone(): DocumentTemplate {
    return new DocumentTemplate(
      structuredClone(this.baseStyles), // deep copy — mutations to the clone won't touch the original
      structuredClone(this.meta),
      this.sections.map(s => s.clone()), // each section clones itself recursively
    );
  }

  withTitle(title: string): DocumentTemplate {
    const copy = this.clone(); // always start from a clean copy of the template
    copy.meta.title = title;
    return copy;              // return the modified copy; original is untouched
  }
}

const blogTemplate = new DocumentTemplate(styles, meta, sections); // expensive once: DB + parsing
const post  = blogTemplate.withTitle('Design Patterns 2025'); // cheap: clone + one field change
const draft = blogTemplate.withTitle('Untitled Draft');       // blogTemplate is still unchanged

Use when

Object creation is expensive (DB reads, network calls, heavy parsing)
You need many similar objects with small variations
Implementing undo systems that snapshot state before each change

Strengths

Avoids expensive re-initialization
Decoupled from concrete classes

Tradeoffs

Deep-cloning circular refs is hard
Custom clone logic becomes a maintenance burden

Related: Objects produced by Prototype can be stored in a Prototype Registry (a Flyweight-like factory).
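One way such a registry could look, as a sketch only: prototypes are stored under a name and every lookup returns a clone. `Cloneable`, `DocTemplate`, and `PrototypeRegistry` are hypothetical names, and the template is reduced to a small `meta` object.

```typescript
interface Cloneable<T> {
  clone(): T;
}

class DocTemplate implements Cloneable<DocTemplate> {
  constructor(public meta: { title: string; tags: string[] }) {}

  clone(): DocTemplate {
    // Copy nested data so a clone never shares mutable state with its prototype.
    return new DocTemplate({ title: this.meta.title, tags: [...this.meta.tags] });
  }
}

class PrototypeRegistry<T extends Cloneable<T>> {
  private prototypes = new Map<string, T>();

  register(name: string, proto: T): void {
    this.prototypes.set(name, proto);
  }

  create(name: string): T {
    const proto = this.prototypes.get(name);
    if (!proto) throw new Error(`Unknown prototype: ${name}`);
    return proto.clone(); // callers always get a copy, never the stored original
  }
}

const templates = new PrototypeRegistry<DocTemplate>();
templates.register('blog', new DocTemplate({ title: 'Untitled', tags: ['blog'] }));

const postA = templates.create('blog');
postA.meta.title = 'Visitor in practice';
postA.meta.tags.push('patterns');

const postB = templates.create('blog'); // unaffected by the edits to postA
```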

The Architects

How objects wrap each other to change their interface, behavior, or access.

Adapter

Intent: Allow incompatible interfaces to work together without modifying either side.

Problem: Your codebase calls analytics.trackEvent(name, props). The new third-party SDK expects sdk.record({ type, metadata, timestamp }). You can't change the SDK. Coupling to it in 40 files means a full-codebase change every time you swap providers.

Solution: Write an Adapter class that implements your interface and translates calls to the SDK internally.

Analogy: A universal power adapter — your laptop and the wall socket don’t change; the adapter bridges them.

interface Analytics { // your codebase's contract — you own and control this
  trackEvent(name: string, props: Record<string, unknown>): void;
}

class SegmentAdapter implements Analytics {
  constructor(private sdk: SegmentSDK) {}

  trackEvent(name: string, props: Record<string, unknown>): void {
    // translate your interface into what the SDK actually expects
    this.sdk.record({ type: name, metadata: props, timestamp: Date.now() });
  }
}

// Swap Segment for Mixpanel → write MixpanelAdapter. Every call site stays the same.
const analytics: Analytics = new SegmentAdapter(new SegmentSDK());
analytics.trackEvent('signup', { plan: 'pro' }); // callers never import or touch the SDK directly

Use when

A class interface doesn’t match what your code expects and you can’t change it
Migrating between libraries gradually

Strengths

Zero changes to existing code
Swap providers in one place

Tradeoffs

Adds an indirection layer
Adapters accumulate with each SDK version

Related: Decorator also wraps an object — but to add behavior, not translate an interface.
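The "swap providers in one place" claim can be sketched with two adapters over the same `Analytics` contract. `VendorASdk` and `VendorBSdk` are made-up stand-ins with deliberately different call shapes; they do not represent any real vendor SDK.

```typescript
interface Analytics {
  trackEvent(name: string, props: Record<string, unknown>): void;
}

// Hypothetical SDK A: takes a single event object.
class VendorASdk {
  sent: Array<{ type: string; metadata: Record<string, unknown> }> = [];
  record(e: { type: string; metadata: Record<string, unknown> }): void { this.sent.push(e); }
}

// Hypothetical SDK B: takes positional arguments.
class VendorBSdk {
  sent: Array<[string, Record<string, unknown>]> = [];
  track(eventName: string, properties: Record<string, unknown>): void {
    this.sent.push([eventName, properties]);
  }
}

class VendorAAdapter implements Analytics {
  constructor(private sdk: VendorASdk) {}
  trackEvent(name: string, props: Record<string, unknown>): void {
    this.sdk.record({ type: name, metadata: props });
  }
}

class VendorBAdapter implements Analytics {
  constructor(private sdk: VendorBSdk) {}
  trackEvent(name: string, props: Record<string, unknown>): void {
    this.sdk.track(name, props);
  }
}

// Application code depends only on Analytics and is identical for both vendors.
function onSignup(analytics: Analytics): void {
  analytics.trackEvent('signup', { plan: 'pro' });
}

const sdkA = new VendorASdk();
const sdkB = new VendorBSdk();
onSignup(new VendorAAdapter(sdkA));
onSignup(new VendorBAdapter(sdkB));
```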

Decorator

Intent: Add behavior to an object dynamically without modifying its class or interface.

Problem: An API client needs logging in dev, retry logic in prod, and request signing for auth. Building LoggingRetrySignedClient as a single class is a combinatorial explosion — 8 combinations for 3 concerns.

Solution: Each concern is its own wrapper that implements the same interface and delegates to an inner object.

Analogy: A coffee order — start with espresso, wrap it with milk, wrap that with vanilla. The cup still fits in the same holder.

interface APIClient {
  request(endpoint: string, opts: RequestOptions): Promise<Response>;
}

class RetryDecorator implements APIClient {
  constructor(private inner: APIClient, private maxAttempts = 3) {}

  async request(endpoint: string, opts: RequestOptions): Promise<Response> {
    for (let i = 1; i <= this.maxAttempts; i++) {
      try { return await this.inner.request(endpoint, opts); } // delegate to the next layer
      catch (e) { if (i === this.maxAttempts) throw e; await delay(i * 200); } // backoff, then retry
    }
    throw new Error('unreachable');
  }
}

class SigningDecorator implements APIClient {
  constructor(private inner: APIClient, private secret: string) {}

  async request(endpoint: string, opts: RequestOptions): Promise<Response> {
    const sig = computeHMAC(endpoint, opts.body, this.secret);
    return this.inner.request(endpoint, {  // add the header, then pass through
      ...opts, headers: { ...opts.headers, 'X-Signature': sig },
    });
  }
}

// Each layer wraps the one inside it — outermost runs first
// Call order: RetryDecorator → SigningDecorator → BaseAPIClient
const client: APIClient = new RetryDecorator(
  new SigningDecorator(new BaseAPIClient(), process.env.API_SECRET!), 3
);

Use when

Adding cross-cutting concerns (logging, caching, auth, metrics) without modifying core logic
Concerns need to be mixed and matched at runtime

Strengths

Compose behaviors independently
Follows Open/Closed Principle

Tradeoffs

Stack traces become harder to read
Order of decorators matters and isn’t obvious

Related: Decorator adds behavior; Proxy controls access. TypeScript’s @decorator syntax is a language feature — not the same thing.
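Since decorator order is easy to get wrong, a tiny tracing sketch can make it visible. `HttpClient`, `BaseClient`, and `TraceDecorator` are hypothetical, and the client is synchronous to keep the example short.

```typescript
interface HttpClient {
  request(endpoint: string): string;
}

class BaseClient implements HttpClient {
  request(endpoint: string): string { return `response from ${endpoint}`; }
}

// Records when control enters and leaves this layer.
class TraceDecorator implements HttpClient {
  constructor(private inner: HttpClient, private label: string, private trace: string[]) {}

  request(endpoint: string): string {
    this.trace.push(`${this.label}:before`);
    const result = this.inner.request(endpoint); // delegate to the next layer
    this.trace.push(`${this.label}:after`);
    return result;
  }
}

const callTrace: string[] = [];
const tracedClient: HttpClient = new TraceDecorator(
  new TraceDecorator(new BaseClient(), 'inner', callTrace),
  'outer',
  callTrace,
);
const responseText = tracedClient.request('/users');
```

The outermost wrapper runs first on the way in and last on the way out. Applied to the example above, wrapping retry around signing means every retry attempt is signed again; the reverse order would sign once and retry the same signed request.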

Facade

Intent: Provide a simple, unified interface to a complex subsystem.

Problem: Your app uses AWS S3 and DynamoDB together for document storage. Every developer using it must understand both SDKs, the correct initialization sequence, and idempotency patterns. That’s a tax paid every time someone new joins the team.

Solution: A Facade exposes only the 8% of the surface area you actually use. The SDKs are an implementation detail.

Analogy: A hotel concierge — behind that desk is a network of vendors and services. You just say “book me a taxi.” You don’t see the machinery.

class StorageService {
  private s3     = new AWS.S3({ region: config.region });            // SDKs are private —
  private dynamo = new AWS.DynamoDB.DocumentClient({ region: config.region }); // callers never see them

  async uploadDocument(userId: string, file: Buffer, meta: DocumentMeta): Promise<string> {
    const key = `users/${userId}/${crypto.randomUUID()}`;

    // Step 1: store the file in S3
    await this.s3.putObject({ Bucket: BUCKET, Key: key, Body: file }).promise();

    // Step 2: write metadata with an idempotency guard (prevents duplicate entries on retry)
    await this.dynamo.put({
      TableName: TABLE,
      Item: { userId, key, ...meta },
      ConditionExpression: 'attribute_not_exists(#k)',
      ExpressionAttributeNames: { '#k': 'key' },
    }).promise();

    // Step 3: return a time-limited URL — callers never touch S3 directly
    return this.s3.getSignedUrlPromise('getObject', { Bucket: BUCKET, Key: key, Expires: 3600 });
  }
}
// Callers: uploadDocument(userId, buffer, meta) → presigned URL. That's it.

Use when

Wrapping a complex SDK or legacy system
You want a single, documented entry point for a subsystem
Migrating legacy systems gradually (Facade = the new contract)

Strengths

Simplifies the interface for callers
Isolates SDK changes to one place

Tradeoffs

Can hide complexity you actually need to understand
Becomes a leaky abstraction if callers need different behaviors

Related: Facade provides a new interface to a subsystem. Adapter makes an existing interface compatible.

Proxy

Intent: Provide a substitute that controls access to another object.

Problem: Your data layer needs authorization checks, but putting if (user.canAccess...) inside every repository method couples security concerns to business logic. Changes to auth rules ripple everywhere.

Solution: A Proxy implements the same interface as the real object and intercepts calls to enforce rules, caching, lazy init, or logging.

Analogy: A credit card — same interface as cash (pay for things), but it controls access: verifies funds, can decline.

interface UserRepository {
  findUser(id: string): Promise<User>;
  updateUser(id: string, data: Partial<User>): Promise<User>;
}

class AuthorizedUserRepository implements UserRepository {
  constructor(private inner: UserRepository, private currentUser: AuthUser) {}

  async findUser(id: string): Promise<User> {
    if (id !== this.currentUser.id && !this.currentUser.isAdmin) {
      throw new ForbiddenError(`Cannot access user ${id}`); // stop here — never reaches inner
    }
    return this.inner.findUser(id); // gate passed: delegate to the real repository
  }

  async updateUser(id: string, data: Partial<User>): Promise<User> {
    if (id !== this.currentUser.id && !this.currentUser.isAdmin) {
      throw new ForbiddenError(`Cannot modify user ${id}`);
    }
    const { role, ...safeData } = data; // strip fields a regular user cannot self-assign
    return this.inner.updateUser(id, safeData);
  }
}
// Callers use AuthorizedUserRepository exactly like UserRepository — auth is invisible to them

Use when

Access control without polluting the object’s core logic (authorization proxy)
Expensive objects created only when first accessed (virtual/lazy proxy)

Strengths

Separates security from business logic
Transparent to callers

Tradeoffs

Adds an extra layer of indirection
Responses may be delayed (lazy init)

Related: Proxy and Decorator look similar. Key distinction: Proxy manages the lifecycle and access of its subject; Decorator simply adds behavior without controlling access.
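The authorization proxy is shown above; the virtual (lazy) proxy from the "Use when" list could be sketched like this. `ReportSource`, `HeavyReport`, and the `loads` counter are assumptions that stand in for a genuinely expensive object.

```typescript
interface ReportSource {
  rows(): number[];
}

let loads = 0; // counts how often the expensive constructor runs

class HeavyReport implements ReportSource {
  private data: number[];
  constructor() {
    loads++;                 // stands in for an expensive load (file, network, parsing)
    this.data = [1, 2, 3];
  }
  rows(): number[] { return this.data; }
}

class LazyReportProxy implements ReportSource {
  private real: HeavyReport | null = null;

  rows(): number[] {
    if (!this.real) this.real = new HeavyReport(); // created on first access only
    return this.real.rows();
  }
}

const report: ReportSource = new LazyReportProxy();
const loadsBeforeUse = loads;                                 // nothing loaded yet
const total = report.rows().reduce((a, b) => a + b, 0);       // first call triggers the load
report.rows();                                                // second call reuses the instance
const loadsAfterUse = loads;
```

This is also where the "responses may be delayed" tradeoff comes from: the first caller pays the full construction cost.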

The Connectors

How objects connect structurally at a deeper level.

Bridge

Intent: Separate an abstraction from its implementation so both can evolve independently.

Problem: You have chart types (Bar, Line, Pie) and rendering backends (SVG, Canvas, WebGL). Naive approach: BarChartSVG, BarChartCanvas... Nine classes. Add one chart type or one renderer and you need three more; the class count grows as the product of the two dimensions.

Solution: Chart types and renderers become separate hierarchies connected by composition, not inheritance.

Analogy: A universal TV remote — the remote’s interface and the TV brand vary independently. Any remote pairs with any TV.

interface ChartRenderer {         // implementation side — grows independently (SVG, Canvas, WebGL…)
  drawBar(x: number, y: number, w: number, h: number): void;
  drawLine(points: Point[]): void;
}

abstract class Chart {            // abstraction side — grows independently (Bar, Line, Pie…)
  constructor(protected data: ChartData, protected renderer: ChartRenderer) {}
  abstract render(): void;
}

class BarChart extends Chart {
  render() {
    const max = Math.max(...this.data.values);
    this.data.values.forEach((v, i) => {
      // BarChart knows layout math; renderer knows how to draw — neither knows the other's internals
      this.renderer.drawBar(i * 50, 300 - (v / max) * 300, 40, (v / max) * 300);
    });
  }
}

// 3 chart types + 3 renderers = 6 classes covering 9 combinations — no class explosion
const chart  = new BarChart(salesData, new SVGRenderer());
const chart2 = new BarChart(salesData, new WebGLRenderer()); // swap renderer, chart type unchanged

Use when

Two independent dimensions of variation would produce a class explosion
You want to switch implementations at runtime
Building platform-independent abstractions

Strengths

Eliminates class explosion
Both dimensions evolve independently

Tradeoffs

Adds indirection even when it isn’t needed yet
Can be over-engineering if you only have one dimension

Composite

Intent: Compose objects into tree structures and treat individual items and collections identically.

Problem: File systems, UI component trees, org charts. Anywhere things contain other things of the same type. Code that traverses them shouldn’t care whether it’s dealing with a leaf or a container.

Solution: Both leaf nodes and composite nodes implement the same interface. Composites delegate to their children recursively.

Analogy: A folder on your filesystem — contains files and other folders. Code listing contents treats both uniformly.

interface UIComponent {
  render(): string;
  getHeight(): number;
}

class Button implements UIComponent { // leaf — has no children
  constructor(private label: string, private height: number) {}
  render()    { return `<button>${this.label}</button>`; }
  getHeight() { return this.height; }
}

class Panel implements UIComponent { // composite — contains other UIComponents (including other Panels)
  private children: UIComponent[] = [];
  add(c: UIComponent): this { this.children.push(c); return this; }
  render()    { return `<div>${this.children.map(c => c.render()).join('')}</div>`; }
  getHeight() { return this.children.reduce((h, c) => h + c.getHeight(), 0) + 32; } // sum children + own padding
}

// measure() works on a single Button, a Panel, or a Panel containing Panels — same call either way
function measure(c: UIComponent) { return c.getHeight(); }

const nav = new Panel()
  .add(new Button('Home', 40))
  .add(new Panel()                      // nested Panel: the outer Panel doesn't care, it just recurses
    .add(new Button('Profile', 40))
    .add(new Button('Settings', 40)));

Use when

Your domain has part-whole hierarchies (trees within trees)
Client code should treat leaves and composites the same way
Recursive operations over tree-structured data

Strengths

Uniform interface across the tree
Easy to add new component types

Tradeoffs

Makes it hard to restrict what can be added where

Flyweight

Intent: Reduce memory by sharing common state across large numbers of similar objects.

Problem: A map rendering engine needs 50,000 trees. Each tree has species, texture, and a 3D mesh. If each object stores all this, you’re holding gigabytes of duplicated data — most trees of the same species share the exact same texture and mesh.

Solution: Split state into intrinsic (shared, immutable — stored once) and extrinsic (unique per instance — passed in at use time).

Analogy: A chess set. The white king is one shared piece; only its position on the board changes. You don’t carve a separate king for each of the 64 squares it might stand on.

interface TreeType {
  species: string;
  texture: WebGLTexture; // intrinsic: same for every oak — loaded once, shared across all oak instances
  mesh: Float32Array;    // intrinsic: same for every oak
}

class TreeTypeFactory {
  private cache = new Map<string, TreeType>();

  get(species: string, textureUrl: string): TreeType {
    const key = `${species}:${textureUrl}`;
    if (!this.cache.has(key)) {
      // cache miss: load GPU texture + mesh (expensive — only happens once per species)
      this.cache.set(key, { species, texture: loadGPUTexture(textureUrl), mesh: loadMesh(species) });
    }
    return this.cache.get(key)!; // cache hit: all subsequent oaks share this exact object
  }
}

class Tree {
  constructor(
    public x: number, public y: number, // extrinsic: unique per tree instance
    private type: TreeType,              // intrinsic: shared reference — NOT a per-tree copy
  ) {}

  draw(renderer: Renderer) {
    renderer.draw(this.type.mesh, this.type.texture, this.x, this.y);
  }
}

// 50,000 trees, 5 species → 5 TreeType objects in GPU memory instead of 50,000
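The claim in the last comment can be checked with a runnable sketch. It assumes plain strings and number arrays as stand-ins for the GPU texture and mesh, and the five species names and the loads counter are hypothetical additions for illustration.

```typescript
// Stand-ins for the article's GPU types so the sketch runs without WebGL.
interface TreeType {
  species: string;
  texture: string;  // hypothetical: a URL string instead of a WebGLTexture
  mesh: number[];   // hypothetical: a placeholder mesh
}

class TreeTypeFactory {
  private cache = new Map<string, TreeType>();
  loads = 0; // how many times the "expensive" load actually ran

  get(species: string, textureUrl: string): TreeType {
    const key = `${species}:${textureUrl}`;
    let type = this.cache.get(key);
    if (type === undefined) {
      this.loads++; // cache miss: build the shared object once
      type = { species, texture: textureUrl, mesh: [0, 0, 0] };
      this.cache.set(key, type);
    }
    return type; // cache hit: the same object reference every time
  }
}

interface Tree { x: number; y: number; type: TreeType } // x and y are extrinsic

const speciesList: string[] = ['oak', 'pine', 'birch', 'maple', 'willow'];
const factory = new TreeTypeFactory();
const forest: Tree[] = [];

for (let i = 0; i < 50000; i++) {
  const species = speciesList[i % speciesList.length] ?? 'oak';
  forest.push({
    x: i % 1000,
    y: Math.floor(i / 1000),
    type: factory.get(species, `/textures/${species}.png`),
  });
}

// Count distinct TreeType object references across the whole forest.
const distinctTypes = new Set(forest.map(t => t.type)).size;
```

With five species, the factory runs its load path five times and every tree of a species points at the same TreeType object.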

Use when

Creating massive numbers of similar objects (thousands to millions)
Memory consumption is a measured, real constraint

Strengths

Dramatic memory savings when applied correctly

Tradeoffs

Significantly complicates code
Only worthwhile when you’ve measured the memory problem

Related: Flyweight is similar to Singleton in spirit (shared instance), but Flyweight manages many shared types, not one.

What’s in Part 2

Twelve patterns down, the structural half. You’ve seen how objects are made, wrapped, and connected.

Part 2 is about how objects talk to each other: Observer (the 14-module story from the opening), Chain of Responsibility (the middleware pipeline you write every day), Command (undo/redo, Redux), and six more. Plus: what GoF gets wrong in 2026, which patterns are obsolete in TypeScript, and the 3-question framework for knowing when to reach for any of them.

Part 2: Communicators & Strategists →

Design Patterns Won’t Save You If You Don’t Know When to Use Them (Part 1 of 2) was originally published in Simform Engineering on Medium, where people are continuing the conversation by highlighting and responding to this story.

Integrating AI with .NET Using the Official MCP C# SDK v1.0

Panthee Patel — Thu, 18 Jun 2026 05:39:28 GMT

From .NET Services to AI-Callable Tools — A Practical Guide

AI is improving rapidly at answering questions, writing code and sparking new ideas. But in practical use, that’s just part of the picture. The true power lies in its ability to integrate seamlessly with the tools, services and workflows your application already depends on. The MCP C# SDK v1.0 provides that capability, built specifically for .NET, maintained by Microsoft, and designed to feel like a natural extension of the code you already write.

What MCP is and why it exists

MCP (Model Context Protocol) is a standard that helps AI applications connect to tools, services and data sources in a consistent way. It exists to reduce custom integration work and make it easier to connect AI with the systems your application already uses.

MCP flow in a .NET application

Notice the transport layer — this is where HTTP and stdio differ. Your choice here depends on whether you need a persistent networked server or a local development tool. Later in the blog we will explore this in detail.

How MCP Architecture Works

MCP follows a simple structure with three parts: host, client and server. The host is the app or interface the user interacts with. The client lives inside that app and manages the MCP connection. The server contains the tools, actions or data sources. The host sends the request, the client forwards it and the server performs the work and returns the result. This separation keeps the AI-facing part of the application clean while your core business logic stays in one place.

For .NET developers, MCP enables AI to move beyond generating responses and interact with real application behavior. Instead of exposing internal services, APIs and business logic through custom integrations, MCP allows AI to request capabilities while your application handles execution. This keeps your architecture intact and provides a consistent way to enable AI-driven workflows.

Just like USB-C reduced cable clutter, MCP reduces the need for custom integrations between AI models and applications.

If you’ve used OpenAI, Azure OpenAI or Semantic Kernel, MCP may look similar to Function Calling. The difference is scope: Function Calling is model-specific and requires separate integrations for each AI client, while MCP is a standard protocol. Expose your .NET capabilities once as MCP tools and any MCP-compatible host can discover and use them without additional integration work.

Comparison of MCP vs Function Calling

QUE: Why not just call my APIs directly?
ANS: You can. But then you write a custom integration for every AI model you support — Claude needs one format, GPT needs another, future tools need another. MCP standardizes the connection once. Your .NET server stays the same regardless of which AI host connects to it.

QUE: What actually changes for the end user?
ANS: Before MCP, users manually fill forms, select categories and track requests. With MCP, they simply describe the issue and the AI handles everything — creating the ticket, categorizing it and retrieving relevant information.

What You Get with the Official MCP C# SDK v1.0

The MCP C# SDK makes AI integration feel like a natural extension of your existing .NET application instead of a separate system.

Key Benefits

Turn existing logic into AI tools: expose your services and methods using simple attributes like [McpServerTool].
No need for custom integrations: avoid building separate connectors for every AI use case.
Works with your current architecture: fits naturally with service layers, APIs and dependency injection.
Structured and consistent approach: define tools once and let AI clients discover and use them.
Supports multiple hosting styles: use stdio for local tools or HTTP for production scenarios.

Setting Up the MCP C# SDK in a .NET Project

Step 1: Create the project and install the MCP package
You can set up MCP in two ways depending on your starting point.

Option 1: Use the MCP Server Template (Recommended for new projects)
If you’re starting fresh, the MCP Server App template is the quickest way to get started, as it sets up a minimal MCP server with the required configuration.

Built-in template provided by Microsoft for MCP Server App

Option 2: Add MCP to an Existing or Custom Project
If you’re working with an existing .NET application or prefer full control, you can install the MCP SDK manually using NuGet packages.

The SDK is split into three packages:

ModelContextProtocol — The main SDK and the right starting point for most projects. Includes the MCP server runtime, stdio transport support, and integration with Microsoft.Extensions.Hosting and dependency injection. References ModelContextProtocol.Core.
ModelContextProtocol.Core — The base layer. Use this only if you need low-level client or server APIs with minimal dependencies, such as building custom clients or servers.
ModelContextProtocol.AspNetCore — Required when your MCP server runs as an ASP.NET Core web application and communicates over HTTP/SSE. Includes everything above plus HTTP transport — you do not need to install the other two separately.

Install the required packages using the .NET CLI:

dotnet add package ModelContextProtocol
dotnet add package Microsoft.Extensions.Hosting
# Optional — only if needed
dotnet add package ModelContextProtocol.AspNetCore
dotnet add package ModelContextProtocol.Core

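The MCP C# SDK is officially maintained by Microsoft and distributed via NuGet. It targets .NET 8 and above. The stdio transport ships in the main ModelContextProtocol package, while the HTTP transport comes from ModelContextProtocol.AspNetCore. Because that package includes the other two, installing it alone is enough to use either transport.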

Step 2: Configure MCP in Program.cs

using HelpDesk.McpServer.Data;
using HelpDesk.McpServer.Tools;

var builder = WebApplication.CreateBuilder(args);

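// Register the data services that tool methods receive through dependency injection
builder.Services.AddSingleton<ITicketRepository, InMemoryTicketRepository>();
builder.Services.AddSingleton<KnowledgeBaseRepository>();

builder.Services.AddMcpServer()
    .WithHttpTransport(options => { options.Stateless = true; })
    .WithToolsFromAssembly(typeof(TicketTools).Assembly); // scans for [McpServerToolType] classes

var app = builder.Build();

app.MapMcp(); // maps the MCP endpoint to the root path "/"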

app.Run();

Your MCP server is now set up, but it still needs tools before an AI can actually use it. Next, we’ll connect it to real application logic.

QUE: How does MCP find your tools?
ANS: Because you configured .WithToolsFromAssembly() in Program.cs, the SDK scans your assembly at startup and registers every method marked with [McpServerTool] inside a class marked with [McpServerToolType]. Both attributes are required; neither works without the other.
There is no manual registration and no tool list to maintain. Add a new method with the right attributes and it is automatically available to any AI host that connects.

QUE: How does dependency injection work in tools?
ANS: Your tool methods receive services the same way controllers or minimal API handlers do: through DI. Register your services in Program.cs and the SDK injects them automatically.

How AI Decides Which Tool to Call
Tool discovery tells the AI what tools exist. Tool selection is how it decides which one to invoke for a given user request. The AI bases that decision mainly on the following:

Tool name — should be specific and action-oriented. CreateTicket is clear; ProcessData is not
Description — the [Description("...")] attribute is what the AI reads to understand the tool's purpose. Treat it like documentation, not a label.
Parameters — parameter names and types signal what the tool expects. TicketCategory category tells the AI more than string input
Return schema — a structured return type gives the AI predictable output to reason about and present to the user

This is why two tools with identical logic but different names and descriptions will behave differently in practice. The AI isn’t reading your code; it’s reading the metadata you attach to it. Investing a few extra seconds in a clear name and a precise description directly improves how reliably the AI selects and uses your tools.
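To make "the metadata you attach" concrete, here is a sketch of roughly what a host sees for one tool when it lists the server's tools. The field names (name, description, inputSchema, annotations with readOnlyHint and destructiveHint) follow the MCP specification's tool definition. Everything else is an assumption for illustration: the enum values other than Hardware and Critical, the exact schema the C# SDK would generate, and the missingArguments helper, which is hypothetical and not an SDK function.

```typescript
// Subset of the tool definition shape from the MCP specification.
interface ToolDescriptor {
  name: string;
  description?: string;
  inputSchema: {
    type: 'object';
    properties: Record<string, { type: string; description?: string; enum?: string[] }>;
    required?: string[];
  };
  annotations?: { readOnlyHint?: boolean; destructiveHint?: boolean };
}

// Illustrative descriptor for the article's CreateTicket tool (enum values partly assumed).
const createTicket: ToolDescriptor = {
  name: 'CreateTicket',
  description: 'Creates a new IT helpdesk support ticket',
  inputSchema: {
    type: 'object',
    properties: {
      title: { type: 'string' },
      description: { type: 'string' },
      createdBy: { type: 'string' },
      category: { type: 'string', enum: ['Hardware', 'Software', 'Network'] },
      priority: { type: 'string', enum: ['Low', 'Medium', 'High', 'Critical'] },
    },
    required: ['title', 'description', 'createdBy', 'category', 'priority'],
  },
  annotations: { readOnlyHint: false, destructiveHint: false },
};

// Hypothetical helper: which required arguments has the model not supplied yet?
function missingArguments(tool: ToolDescriptor, args: Record<string, unknown>): string[] {
  return (tool.inputSchema.required ?? []).filter(name => args[name] === undefined);
}

const stillNeeded = missingArguments(createTicket, { title: 'Keyboard not working' });
// stillNeeded is ['description', 'createdBy', 'category', 'priority']
```

Nothing in this structure contains the method body, which is why the name, description and parameter shapes carry so much weight in tool selection.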

Creating MCP Tools in .NET

In simple terms, an MCP tool is just a method that your application exposes so an AI system can call it when needed.

Defining Your First MCP Tool

Let’s start with a simple example: a helpdesk ticket manager built on MCP. Suppose you want to expose a method that creates a ticket.

using System.ComponentModel;
using HelpDesk.McpServer.Data;
using HelpDesk.McpServer.Models;
using ModelContextProtocol.Server;

namespace HelpDesk.McpServer.Tools;

[McpServerToolType]
public static class TicketTools
{
    [McpServerTool(Name = "CreateTicket")]
    [Description("Creates a new IT helpdesk support ticket")]
    public static async Task<ToolResponse<CreateTicketResult>> CreateTicket(
        ITicketRepository repo, // resolved from dependency injection
        string title,
        string description,
        string createdBy,
        TicketCategory category,
        TicketPriority priority)
    {
        var ticket = new Ticket
        {
            Id = Guid.NewGuid().ToString(),
            Title = title,
            Description = description,
            CreatedBy = createdBy,
            Category = category,
            Priority = priority,
            Status = TicketStatus.Open,
            CreatedAt = DateTime.UtcNow
        };

        await repo.CreateAsync(ticket);

        // Return a DTO rather than the Ticket entity itself
        return new ToolResponse<CreateTicketResult>
        {
            Success = true,
            Data = new CreateTicketResult
            {
                TicketId = ticket.Id,
                Title = ticket.Title,
                Category = ticket.Category.ToString(),
                Priority = ticket.Priority.ToString(),
                Status = ticket.Status.ToString(),
                CreatedAt = ticket.CreatedAt.ToString("u")
            }
        };
    }
}

// Separate file in Models/
namespace HelpDesk.McpServer.Models;

// Generic envelope returned by every tool
public class ToolResponse<T>
{
  public bool Success { get; set; }
  public T? Data { get; set; }
  public string? Error { get; set; }
}

public class CreateTicketResult
{
  public string TicketId { get; set; } = string.Empty;
  public string Title { get; set; } = string.Empty;
  public string Category { get; set; } = string.Empty;
  public string Priority { get; set; } = string.Empty;
  public string Status { get; set; } = string.Empty;
  public string CreatedAt { get; set; } = string.Empty;
}

What’s Happening Here

[McpServerToolType]: tells the SDK this class contains MCP tools. Without it, WithToolsFromAssembly() will not discover any tools inside the class, even if they have [McpServerTool].
[McpServerTool(Name = "CreateTicket")]: registers this method as a callable tool and sets the exact name the AI will use to invoke it.
ITicketRepository repo as first parameter: the MCP SDK integrates with .NET's dependency injection. Services registered in Program.cs are automatically injected into tool methods. You do not instantiate them manually.
async Task<ToolResponse<CreateTicketResult>>: tools should be async because tool calls are typically I/O operations, and blocking on them defeats the purpose of a responsive server. Returning a typed DTO instead of a plain string gives the AI a consistent structure across all tools.
ToolResponse<T>: wraps every tool response with a Success flag, a Data payload, and an Error message. The AI uses this structure to determine whether the operation succeeded and what to surface to the user.

Validation and Error Handling

Since AI constructs tool arguments from natural language, inputs won’t always arrive in the format your code expects. The CreateTicket tool validates each required field — title, description and createdBy — before any business logic runs and enforces basic rules such as minimum title length. If a required value is missing or invalid, the tool returns Success: false with a descriptive error message rather than throwing an exception. The AI receives this signal and responds accordingly — either asking the user for the missing information or surfacing the failure cleanly. Wrapping the core logic in exception handling ensures that any unexpected runtime errors are also caught and returned in the same structured format, keeping the AI's experience consistent regardless of what goes wrong.
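The validate-then-wrap flow described above can be sketched independently of the SDK. This is a language-neutral illustration written in TypeScript, not the article's C# implementation: the minimum title length of 5, the error wording, and the save callback standing in for the repository are all assumptions.

```typescript
interface ToolResponse<T> { success: boolean; data?: T; error?: string }
interface CreateTicketInput { title?: string; description?: string; createdBy?: string }
interface ValidTicket { title: string; description: string; createdBy: string }

const MIN_TITLE_LENGTH = 5; // hypothetical; the article does not state the value

type Validation =
  | { status: 'valid'; value: ValidTicket }
  | { status: 'invalid'; error: string };

// Check every required field before any business logic runs.
function validate(input: CreateTicketInput): Validation {
  const title = (input.title ?? '').trim();
  const description = (input.description ?? '').trim();
  const createdBy = (input.createdBy ?? '').trim();
  if (title.length < MIN_TITLE_LENGTH) {
    return { status: 'invalid', error: `title must be at least ${MIN_TITLE_LENGTH} characters` };
  }
  if (description === '') return { status: 'invalid', error: 'description is required' };
  if (createdBy === '') return { status: 'invalid', error: 'createdBy is required' };
  return { status: 'valid', value: { title, description, createdBy } };
}

// save is a stand-in for the repository call; it returns the new ticket id.
function createTicket(
  input: CreateTicketInput,
  save: (ticket: ValidTicket) => string,
): ToolResponse<{ ticketId: string }> {
  const checked = validate(input);
  if (checked.status === 'invalid') {
    return { success: false, error: checked.error }; // structured failure, no exception
  }
  try {
    return { success: true, data: { ticketId: save(checked.value) } };
  } catch {
    // Unexpected runtime errors come back in the same shape, without internal details.
    return { success: false, error: 'Ticket could not be created due to an unexpected error' };
  }
}
```

Because every path returns the same envelope, the host can branch on success alone and either ask the user for the missing field or report the failure.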

Connect your MCP server to GitHub Copilot in Visual Studio

Open GitHub Copilot Chat
Click the Tools / Toolbox icon
Click the ‘+’ icon to add an MCP server
Select your running MCP server
Ensure it appears in the active tools list

Initializing MCP in Visual Studio

Once connected, Copilot can discover and call your tools automatically, and its tool list shows every method you have defined with [McpServerTool]. You do not write any client code: the AI host handles tool discovery, decides which tools to call based on user intent and presents the results naturally.

TIP: Using stdio instead of HTTP? Switch WithHttpTransport() to WithStdioServerTransport() in Program.cs. Then configure Copilot with Type: stdio and Command: dotnet run --project "path\to\HelpDesk.McpServer". Copilot manages the server process automatically. Tools work identically — only the transport changes.

Both stdio and HTTP transports are fully supported, but the right choice depends on your deployment context. Use this as a quick reference:

Key MCP SDK Components at a Glance

The MCP C# SDK uses attributes and configuration to expose your application’s functionality to AI systems. Even a basic setup works, but knowing the key components helps you structure it better.

Attributes

[McpServerToolType] — marks a class so the SDK can discover it during assembly scanning
[McpServerTool] — marks a method as an MCP tool that AI can invoke
[Description("...")] — explains what the tool does; AI uses this to decide when to call it

[McpServerTool] properties

Name — overrides the default tool name exposed to the AI
Title — a human-friendly display name
OpenWorld — indicates the tool works with dynamic or external data
Destructive — signals the tool modifies or deletes data
ReadOnly — signals the tool only reads data, no side effects
IconSource — allows associating a visual icon with each tool for client display

Transport registration (in Program.cs)

.WithStdioServerTransport() — for local/CLI-based communication
.WithHttpTransport() — for HTTP-based communication

Tool discovery (in Program.cs)

.WithToolsFromAssembly() — scans the assembly and registers all classes marked with [McpServerToolType]

Where MCP Fits in Real Projects

MCP shines when AI needs to work with your existing .NET services rather than just generating text. Here are practical scenarios where it delivers real value:

Perfect use cases:

Internal tools — Helpdesk ticket lookup, IT support queries
Service automation — Order status, inventory checks, shipment tracking
Data lookups — Customer records, reports, database queries
Workflow triggers — Deployments, notifications, approval flows
DevOps tasks — Build status, CI/CD monitoring, repo info

QUE: Is MCP only useful for data access?
ANS: No. Data retrieval is the simplest pattern. MCP also enables actions (creating, updating, triggering workflows), knowledge retrieval (searching unstructured content), system integrations (wrapping third-party APIs), and orchestration (chaining multiple tools based on user intent). The HelpDesk practical example later in this article covers four of these patterns in one scenario.

What v1.0 makes better:

Incremental scope consent — The AI client starts with the minimum permissions needed and requests additional access only when a specific operation requires it. This follows the principle of least privilege — your server stays secure by default without manually managing permission scopes per tool.
Richer tool metadata — Tools, resources and prompts can now carry icons, titles and descriptions. This helps the AI select the right tool in the right context, especially when multiple tools are available. Clear metadata reduces incorrect tool invocations and makes your server easier to use from any AI host.
Authentication support for HTTP transport — V1.0 ships with built-in OAuth 2.0 support for HTTP-hosted servers. This includes authorization server discovery, JWT token validation and incremental scope handling — everything needed to secure a production MCP server without building your own auth layer.
Long-running requests and progress tracking over HTTP — This is the most significant addition for production use. Previously, long operations risked HTTP timeouts with no recovery path. V1.0 solves this with an SSE-based polling model — the server sends an initial event with an ID and closes the connection. The client reconnects using that ID to check progress and retrieve the result when ready. No held connections, no lost results.
Tasks (durable state tracking, experimental): Built on top of HTTP polling, tasks add persistent tracking for operations that run in the background. The client gets a task ID immediately and can check status, retrieve results, or cancel the operation at any point, even if the original connection dropped. This maps naturally to existing .NET patterns like background services, batch jobs or anything returning Task.

V1.0 also introduces advanced capabilities including tool calling in sampling and URL mode elicitation — features worth exploring once your first MCP server is running.

When to skip MCP:

Simple chat responses
One-off API calls (use REST directly)
Pure text generation apps
No existing business logic

The pattern: Your .NET app keeps the real logic. MCP just gives AI a clean, secure way to access it. Start small — expose one service, see what AI can do with it.

Using MCP Safely in Production

MCP makes it easy to expose your application’s logic to AI. That openness is also what makes it worth thinking about carefully before you deploy.

Define clear tool boundaries
Each tool should do one thing and expose only what the AI actually needs. Avoid creating broad tools that return entire records or datasets when only a specific field is required. The more focused your tools are, the less surface area there is for unintended behaviour.

Never put secrets inside tool responses
If your tool calls an internal service that returns sensitive data, filter that data before returning it. Connection strings, API keys, tokens, and internal identifiers should never pass through a tool response, even if the AI is unlikely to surface them directly.

Use OAuth for HTTP-hosted servers
When your MCP server runs over HTTP, treat it like any other protected API. V1.0 ships with built-in OAuth 2.0 support including authorization server discovery and JWT token validation. Use it. An unprotected HTTP MCP endpoint is an open door to your business logic.

Apply the principle of least privilege
Use ReadOnly = true on tools that only read data and Destructive = true on tools that modify or delete it. These signals help the AI host and your own middleware make better decisions about when and how tools are invoked.

Validate inputs
Without validation, a missing or malformed value from the AI passes directly into your service layer. The result is either an unhandled exception or silent incorrect behavior; neither gives the AI anything useful to work with.

Log tool invocations
Treat MCP tool calls like API requests. Log what was called, with what parameters, and what was returned. This is essential for debugging unexpected AI behaviour and for auditing in regulated environments.

Common MCP Pitfalls to Avoid

These mistakes are easy to make when you’re first building MCP tools and harder to undo once your server is in use.

Returning large datasets unnecessarily
A tool that returns an entire list of tickets when the AI only needed one creates noise in the response and increases token usage. Return only what the AI needs to answer the user’s question — filter, paginate, or summarize at the tool level.

Exposing database entities directly
Returning your EF Core entities or database models as tool responses leaks your internal schema to the AI. Map to a dedicated response DTO instead. It gives you control over what gets exposed and makes future schema changes easier to manage without breaking tool contracts.

Creating overly generic tools
A tool named ManageTicket that handles create, update and delete based on a string parameter forces the AI to guess intent. Separate tools with clear, specific names let the AI select the right one confidently. Specificity in tool design directly improves selection accuracy.

Skipping input validation
AI constructs tool arguments from natural language, which means inputs won’t always arrive in the format your code expects. Treat every parameter the way you’d treat form input from a web page — validate before passing it to your services.

Embedding business logic inside tool classes
Tool classes should be thin. They receive input, call a service, and return a result. If your [McpServerTool] method contains conditional logic, database calls, or calculations, that logic is now invisible to the rest of your application and impossible to unit test cleanly. Keep your services as the source of truth and let tools be the bridge.
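Two of the pitfalls above, returning large datasets and exposing entities directly, are usually fixed together by mapping to a small DTO and paging at the tool level. The sketch below is an illustration under assumptions: the entity fields, the TicketSummary shape, and the page-size cap of 50 are hypothetical and not taken from the HelpDesk project.

```typescript
// Hypothetical internal entity with fields that should not reach the AI.
interface TicketEntity {
  id: string;
  title: string;
  status: string;
  assigneeEmail: string;
  internalNotes: string;
  rowVersion: number;
}

// Dedicated response DTO: only what the AI needs to answer the user.
interface TicketSummary { ticketId: string; title: string; status: string }

interface Page<T> { items: T[]; total: number; page: number; pageSize: number }

const MAX_PAGE_SIZE = 50; // hypothetical cap

function toSummary(e: TicketEntity): TicketSummary {
  return { ticketId: e.id, title: e.title, status: e.status };
}

// Clamp paging arguments, slice, then map, so the response stays small and schema-free.
function listTickets(all: TicketEntity[], page = 1, pageSize = 10): Page<TicketSummary> {
  const size = Math.min(Math.max(Math.floor(pageSize), 1), MAX_PAGE_SIZE);
  const current = Math.max(Math.floor(page), 1);
  const start = (current - 1) * size;
  return {
    items: all.slice(start, start + size).map(toSummary),
    total: all.length,
    page: current,
    pageSize: size,
  };
}
```

Clamping the arguments matters here because the model, not a developer, supplies them, and a request for page 0 or a page size of 100000 should degrade to something sensible rather than fail.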

Practical Example: HelpDesk Ticket Manager

The HelpDesk POC is a working .NET 10 solution that exposes a ticket management system as an MCP server. It demonstrates seven MCP tools covering the full range of helpdesk operations — creating tickets, checking status, searching a knowledge base, categorizing, escalating, and updating. The client simulates how an AI host discovers and invokes these tools.

This repository contains the full .NET MCP server and client implementation used in this example, including all tools, data models and integration setup.

HelpDeskMcp/
├── HelpDesk.McpServer/      ← ASP.NET Core MCP Server
│   ├── Models/              ← Ticket, enums, KnowledgeBaseArticle
│   ├── Data/                ← ITicketRepository, InMemoryTicketRepository,
│   │                             KnowledgeBaseRepository
│   ├── Tools/               ← TicketTools.cs (MCP tools)
│   └── Program.cs           ← MCP registration + HTTP transport
│
└── HelpDesk.McpClient/      ← Console App MCP Client
    └── Program.cs           ← Interactive menu, SseClientTransport

Moving Beyond In-Memory Storage: This POC uses in-memory storage — data resets on restart. In production, replace InMemoryTicketRepository with an EF Core implementation. Because all tools depend on ITicketRepository, not the implementation, your tool code changes nothing. Register the new implementation in Program.cs and you're done.
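The reason the swap is cheap is ordinary dependency inversion, sketched here in TypeScript rather than C#. The interface mirrors the role of ITicketRepository; the method names, the synchronous signatures, and the second implementation (an array-backed stand-in for a database) are assumptions for illustration only.

```typescript
interface Ticket { id: string; title: string }

// The abstraction tools depend on (plays the role of ITicketRepository).
interface TicketRepository {
  create(ticket: Ticket): void;
  findById(id: string): Ticket | undefined;
}

class InMemoryTicketRepository implements TicketRepository {
  private tickets = new Map<string, Ticket>();
  create(ticket: Ticket): void { this.tickets.set(ticket.id, ticket); }
  findById(id: string): Ticket | undefined { return this.tickets.get(id); }
}

// Hypothetical stand-in for a database-backed implementation.
class TableTicketRepository implements TicketRepository {
  private rows: Ticket[] = [];
  create(ticket: Ticket): void { this.rows.push({ ...ticket }); }
  findById(id: string): Ticket | undefined { return this.rows.find(r => r.id === id); }
}

// The "tool" only ever sees the interface, so it is unchanged by the swap.
function getTicketTitle(repo: TicketRepository, id: string): string {
  const ticket = repo.findById(id);
  return ticket ? ticket.title : 'not found';
}

const repos: TicketRepository[] = [new InMemoryTicketRepository(), new TableTicketRepository()];
const titles = repos.map(repo => {
  repo.create({ id: 'a7b9f224', title: 'Keyboard stopped working' });
  return getTicketTitle(repo, 'a7b9f224');
});
```

Both entries in titles are identical, which is the property the article relies on when it says only the registration in Program.cs changes.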

How the Tools Work Together:
(Make sure the server project is running before you ask.)

Prompt used in Copilot:
"Please use the connected MCP tools. First search the knowledge base, then create a ticket, then escalate if needed. Issue: My laptop keyboard stopped working after a Windows update."

User: "My laptop keyboard stopped working after a Windows update."

1. Copilot calls SearchKnowledgeBase("keyboard")  
   → Returns 1 relevant KB article  
   (Troubleshooting Unresponsive Keyboards and Mice)

2. Copilot calls CreateTicket(...)  
   → Ticket created: ID a7b9f224

3. Copilot calls EscalateTicket(...)  
   → Escalated to IT Infrastructure, priority set to Critical

Final Response:  
"I've found a relevant KB article and raised ticket a7b9f224 as Hardware/Critical, assigned to OS Support Team. In the meantime, try checking Device Manager for driver issues."

Summary by HelpDeskMCP integrated in project

GitHub Repository (Complete Working POC):
dotnet-simformsolutions/ai-dotnet-mcp-demo

Final thoughts: What .NET developers should try next

Most developers initially see MCP as a data-access layer, a way for AI to query data, but that is only part of the picture. MCP gives AI hands: it can not only read data but also trigger actions, call external systems, and chain multiple operations into complete workflows. This is what sets MCP apart from traditional APIs: while APIs require explicit instructions on what to call, MCP enables AI to decide what to invoke and in what sequence based on the user’s intent.

Start small, then scale
Expose one existing service as an MCP tool, test it locally and then expand to more APIs and production transport as needed.

Why this matters: MCP keeps your architecture clean. Your .NET app handles the real work; AI just knows what to ask for.

Building a proof of concept MCP server is relatively straightforward. Scaling it across enterprise applications is where architecture, governance, security, and tool design become critical. As organizations look to connect AI systems with existing .NET services, they need consistent patterns for authentication, authorization, observability, and lifecycle management. Simform helps engineering teams design and implement production-ready AI integrations, modern application architectures, and platform capabilities that allow AI systems to interact safely with business-critical applications at scale.

MCP isn’t about replacing .NET — it’s about making your services smarter. Start with one tool, see what AI can do with it, then scale from there.

Integrating AI with .NET Using the Official MCP C# SDK v1.0 was originally published in Simform Engineering on Medium, where people are continuing the conversation by highlighting and responding to this story.

Reactive vs Blocking: Real Benchmark Results

Arshit Moradiya — Wed, 17 Jun 2026 07:53:24 GMT

A practical comparison using Spring Boot MVC and WebFlux — with numbers from actual local testing

Introduction

Every few months the “should we go reactive?” conversation comes up. Usually it ends with someone linking a Medium article that shows WebFlux handling 10x more requests, and someone else pointing out that nobody on the team can debug a Mono.zip() chain at 2 AM.

So I ran my own benchmark. Nothing fancy — no Kubernetes, no cloud, no distributed tracing. Three Spring Boot services on my laptop: a blocking service (Spring MVC), a reactive service (Spring WebFlux), and a shared mock service simulating slow downstream APIs. Then k6 hammering both with concurrent load.

I wanted real numbers from a real scenario, not theoretical arguments.

The Scenario

I picked a Product Dashboard Aggregation API — a single GET /products/dashboard endpoint that:

Fetches products from a database
Calls an inventory API (300ms latency)
Calls a pricing API (300ms latency)
Aggregates everything into one response

The endpoint spends most of its time waiting on downstream IO, not doing CPU work. That’s the kind of scenario where reactive is supposed to shine, so it felt like a fair test.

The blocking version calls inventory and pricing sequentially (thread blocked for ~600ms total). The reactive version fires both calls concurrently using Mono.zip() (waits ~300ms total).
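The difference between the two call patterns can be shown without Spring at all. The sketch below is a TypeScript analogue, not the benchmarked Java code: Promise.all plays the role that Mono.zip() plays in the reactive service, the 20 ms delay is a scaled-down stand-in for the 300 ms downstream latency, and callApi and the event log are hypothetical helpers.

```typescript
// Simulated downstream call that records when it starts and ends.
function callApi(name: string, log: string[], delayMs = 20): Promise<string> {
  log.push(`${name}:start`);
  return new Promise(resolve =>
    setTimeout(() => {
      log.push(`${name}:end`);
      resolve(name);
    }, delayMs),
  );
}

// Sequential: pricing does not start until inventory has finished (wait is roughly the sum).
async function sequential(log: string[]): Promise<string[]> {
  const inventory = await callApi('inventory', log);
  const pricing = await callApi('pricing', log);
  return [inventory, pricing];
}

// Concurrent: both calls are in flight before either is awaited (wait is roughly the max).
function concurrent(log: string[]): Promise<string[]> {
  return Promise.all([callApi('inventory', log), callApi('pricing', log)]);
}

const sequentialLog: string[] = [];
const concurrentLog: string[] = [];
const finished = sequential(sequentialLog).then(() => concurrent(concurrentLog));
```

After both runs, the sequential log interleaves start and end per call, while the concurrent log shows both starts before the first end. That ordering is what turns 300 + 300 into max(300, 300).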

Features

Feature Comparison

Advantages

Blocking (Spring MVC)

Familiar programming model. Every Java developer knows how to read and write sequential code. There’s no mental overhead.
Full ecosystem support. JPA, Hibernate, Spring Security — everything works out of the box without workarounds.
Simple debugging. Stack traces make sense. You can set a breakpoint and step through the logic.
Good enough for most workloads. If your service handles a few hundred concurrent requests and your downstream calls are fast, blocking is fine.

Reactive (Spring WebFlux)

Efficient thread usage. A small Netty event-loop pool can handle thousands of concurrent connections because threads are never blocked.
Better under IO pressure. When downstream APIs are slow, reactive doesn’t burn threads waiting — it schedules callbacks.
Concurrent downstream calls. Mono.zip() fires multiple API calls in parallel with zero extra thread overhead.
Lower memory footprint. Fewer threads means less memory consumed per connection at high concurrency.

Use Cases

When Blocking Makes Sense

CRUD applications — standard REST APIs backed by a relational database. JPA fits perfectly here.
Enterprise internal systems — moderate traffic, complex business logic, teams that need to move fast without reactive expertise.
Simpler business applications — where the bottleneck is CPU or database, not concurrent IO waits.

When Reactive Makes Sense

API aggregation / gateway layers — exactly what we benchmarked. Multiple downstream calls, lots of IO waiting.
High-concurrency IO-heavy systems — chat backends, notification services, real-time dashboards.
Streaming systems — SSE, WebSocket feeds, or continuous data pipelines where backpressure matters.

A Note on Virtual Threads

Java 21 shipped virtual threads (Project Loom). The idea is simple — you write normal blocking code, but the JVM parks the virtual thread during IO instead of tying up an OS thread. In theory this gives you the readability of blocking with much of the concurrency benefit of reactive. Worth testing before committing to a full WebFlux stack, especially if your team is already on Java 21.

Benchmark Setup

Environment

Java 21
32 GB RAM
Three services running locally — mock-service (9090), blocking-service (8080), reactive-service (8081)
k6 for load testing
Both services use JPA with H2 in-memory databases

Scenarios

Two scenarios were tested: 100 virtual users for 30 seconds (low concurrency) and 500 virtual users for 30 seconds (high concurrency). Both with a fixed 300ms simulated downstream delay on inventory and pricing APIs.

k6 Script

import http from 'k6/http';
import { check, sleep } from 'k6';

const BASE_URL = __ENV.TARGET_URL || 'http://localhost:8080';
const SCENARIO = __ENV.SCENARIO || 'low';

export const options = {
    scenarios: {
        benchmark: {
            executor: 'constant-vus',
            vus: SCENARIO === 'high' ? 500 : 100,
            duration: '30s',
            exec: 'dashboard',
        },
    },
};

export function dashboard() {
    const res = http.get(`${BASE_URL}/products/dashboard`);
    check(res, { 'status 200': (r) => r.status === 200 });
    sleep(0.1);
}

Run each scenario separately:

# Blocking — 100 VUs
k6 run -e TARGET_URL=http://localhost:8080 -e SCENARIO=low benchmark.js

# Blocking — 500 VUs
k6 run -e TARGET_URL=http://localhost:8080 -e SCENARIO=high benchmark.js

# Reactive — 100 VUs
k6 run -e TARGET_URL=http://localhost:8081 -e SCENARIO=low benchmark.js

# Reactive — 500 VUs
k6 run -e TARGET_URL=http://localhost:8081 -e SCENARIO=high benchmark.js

Real Benchmark Results

100 Concurrent Users

Metric                  Blocking (MVC)    Reactive (WebFlux)
──────────────────────  ────────────────  ──────────────────
Avg Response Time       5,670 ms          315 ms
P95 Latency             6,180 ms          331 ms
Throughput              15.9 req/s        238 req/s
Total Requests          570               7,245
Success Rate            100%              100%
Threads Used            ~100 (Tomcat)     ~4 (Netty)

Even at 100 users, the gap is significant. The blocking service averaged ~5.7s per request. Each request holds a Tomcat thread for ~600ms just sitting on two sequential downstream calls. Under sustained load from 100 VUs cycling continuously, requests pile up — the thread doesn’t do anything useful during those 600ms, it just waits. The reactive service stayed at ~315ms because Mono.zip() fires inventory and pricing concurrently (wait = max(300, 300) ≈ 300ms) and Netty event loops never block.
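
The sum-versus-max arithmetic can be reproduced without Spring or Reactor. The sketch below uses plain CompletableFuture as a stand-in for Mono.zip(); the fetch helper, its delay, and the two-thread pool are hypothetical and only simulate the two downstream calls. It shows the latency arithmetic only: unlike WebClient, the pool threads here really do block while they wait.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class FanOutSketch {

    // Hypothetical downstream call that takes `delayMs` to answer.
    static String fetch(String name, long delayMs) {
        try {
            Thread.sleep(delayMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return name;
    }

    // Sequential: total wait is roughly the sum of both delays.
    static long sequentialMillis(long delayMs) {
        long start = System.nanoTime();
        fetch("inventory", delayMs);
        fetch("pricing", delayMs);
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Concurrent: both calls start together, so total wait is roughly the larger delay.
    static long concurrentMillis(long delayMs) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            long start = System.nanoTime();
            CompletableFuture<String> inventory =
                CompletableFuture.supplyAsync(() -> fetch("inventory", delayMs), pool);
            CompletableFuture<String> pricing =
                CompletableFuture.supplyAsync(() -> fetch("pricing", delayMs), pool);
            // Combine both results once each future has completed.
            inventory.thenCombine(pricing, (i, p) -> i + "+" + p).join();
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("sequential: ~" + sequentialMillis(300) + " ms");
        System.out.println("concurrent: ~" + concurrentMillis(300) + " ms");
    }
}
```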

500 Concurrent Users

Metric                  Blocking (MVC)    Reactive (WebFlux)
──────────────────────  ────────────────  ──────────────────
Avg Response Time       18,250 ms         508 ms
P95 Latency             30,140 ms         531 ms
Throughput              20.3 req/s        810 req/s
Total Requests          1,222             24,814
Success Rate            79.3%             98.8%
Threads Used            ~500 (pool max)   ~4 (Netty)

This is where the blocking model falls apart. With 500 VUs and a 500-thread Tomcat pool, every thread is blocked on downstream IO. Requests queue massively, p95 latency hit 30 seconds, and about 21% of requests failed. The reactive service handled the same load with ~500ms average response time and roughly 40x higher throughput, processing over 24,000 requests while blocking managed just 1,222.

Key Observations

Thread pool exhaustion is real. Even at 100 VUs, the blocking service struggled because threads were held for 600ms each during sequential IO waits. At 500 VUs it essentially collapsed.
Reactive throughput scaled with load, though not linearly. It rose from 238 req/s at 100 VUs to 810 req/s at 500 VUs, about 3.4x the throughput for 5x the users, while average response time climbed from 315ms to 508ms.
Reactive used 4 threads the entire time. The Netty event loop doesn’t grow with load. That’s the whole point.
The gap is about IO waits, not raw speed. If those downstream calls returned in 5ms instead of 300ms, the blocking service would perform much closer to reactive.
Blocking was easier to profile and debug. Thread dumps made sense. Adding logging was straightforward.
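
A quick sanity check on the tables: in a closed-loop test, each virtual user completes one iteration every (response time + sleep) seconds, so throughput is roughly the number of VUs divided by that cycle time. The sketch below applies this to the reported averages and the script's 0.1 s sleep. It is a back-of-the-envelope model that ignores failures, timeouts, and ramp-down, so it only approximates the measured figures.

```java
final class ThroughputEstimate {

    // Closed-loop estimate: each VU finishes one request per (latency + think time).
    static double requestsPerSecond(int vus, double avgLatencyMs, double sleepMs) {
        return vus / ((avgLatencyMs + sleepMs) / 1000.0);
    }

    public static void main(String[] args) {
        double sleepMs = 100; // sleep(0.1) in the k6 script

        // Average response times are taken from the two result tables.
        System.out.printf("reactive, 100 VUs: %.0f req/s (measured 238)%n",
            requestsPerSecond(100, 315, sleepMs));
        System.out.printf("reactive, 500 VUs: %.0f req/s (measured 810)%n",
            requestsPerSecond(500, 508, sleepMs));
        System.out.printf("blocking, 100 VUs: %.0f req/s (measured 15.9)%n",
            requestsPerSecond(100, 5670, sleepMs));
        System.out.printf("blocking, 500 VUs: %.0f req/s (measured 20.3)%n",
            requestsPerSecond(500, 18250, sleepMs));
    }
}
```

The reactive estimates (about 241 and 822 req/s) land close to the measured 238 and 810. The blocking estimates (about 17 and 27 req/s) overshoot the measured values, most visibly at 500 VUs, where roughly a fifth of the requests failed.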

Example Code

Prerequisites

Java 21
Maven 3.9+
k6 installed locally

Dependencies

Blocking: spring-boot-starter-web, spring-boot-starter-data-jpa, h2

Reactive: spring-boot-starter-webflux, spring-boot-starter-data-jpa, h2

Blocking Service — Controller & Service

@RestController
public class ProductDashboardController {
    private final ProductService productService;

    public ProductDashboardController(ProductService productService) {
        this.productService = productService;
    }

    @GetMapping("/products/dashboard")
    public DashboardResponse getDashboard() {
        return productService.getDashboard();
    }
}

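@Service
public class ProductService {
    private final ProductRepository productRepository;
    private final RestTemplate restTemplate;

    // Constructor injection; assumes a RestTemplate bean is defined elsewhere
    public ProductService(ProductRepository productRepository, RestTemplate restTemplate) {
        this.productRepository = productRepository;
        this.restTemplate = restTemplate;
    }

    @SuppressWarnings("unchecked") // getForObject(..., Map.class) returns a raw Map
    public DashboardResponse getDashboard() {
        List<Product> products = productRepository.findAll();

        // Sequential: the request thread is blocked for ~600ms total
        Map<String, Object> inventory = restTemplate.getForObject(
            "http://localhost:9090/mock/inventory", Map.class);
        Map<String, Object> pricing = restTemplate.getForObject(
            "http://localhost:9090/mock/pricing", Map.class);

        return new DashboardResponse(products, inventory, pricing);
    }
}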

Reactive Service — Controller & Service

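@RestController
public class ProductDashboardController {
    private final ProductService productService;

    public ProductDashboardController(ProductService productService) {
        this.productService = productService;
    }

    @GetMapping("/products/dashboard")
    public Mono<DashboardResponse> getDashboard() {
        return productService.getDashboard();
    }
}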

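@Service
public class ProductService {
    private final ProductRepository productRepository;
    private final WebClient webClient;

    // Constructor injection; assumes a WebClient bean is configured elsewhere
    public ProductService(ProductRepository productRepository, WebClient webClient) {
        this.productRepository = productRepository;
        this.webClient = webClient;
    }

    public Mono<DashboardResponse> getDashboard() {
        // JPA call: synchronous, unlike the two WebClient calls below
        List<Product> products = productRepository.findAll();

        Mono<Map<String, Object>> inventory = webClient.get()
            .uri("http://localhost:9090/mock/inventory")
            .retrieve()
            .bodyToMono(new ParameterizedTypeReference<>() {});

        Mono<Map<String, Object>> pricing = webClient.get()
            .uri("http://localhost:9090/mock/pricing")
            .retrieve()
            .bodyToMono(new ParameterizedTypeReference<>() {});

        // Mono.zip subscribes to both API calls, so they run concurrently
        return Mono.zip(
            Mono.just(products),
            inventory,
            pricing
        )
            .map(t -> new DashboardResponse(t.getT1(), t.getT2(), t.getT3()));
    }
}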

GitHub Repository

The complete source code is available here: java-blog-poc-reactive-vs-blocking

java-blog-poc-reactive-vs-blocking/
├── mock-service/                    # Shared mock downstream APIs (port 9090)
│   ├── src/main/java/.../mock/
│   │   ├── MockServiceApplication.java
│   │   └── controller/
│   │       └── MockApiController.java
│   ├── src/main/resources/application.yml
│   └── pom.xml
├── blocking-service/                # Spring MVC (port 8080)
│   ├── src/main/java/.../blocking/
│   │   ├── BlockingServiceApplication.java
│   │   ├── DataSeeder.java
│   │   ├── controller/
│   │   │   └── ProductDashboardController.java
│   │   ├── dto/
│   │   │   └── DashboardResponse.java
│   │   ├── entity/
│   │   │   └── Product.java
│   │   ├── repository/
│   │   │   └── ProductRepository.java
│   │   └── service/
│   │       └── ProductService.java
│   ├── src/main/resources/application.yml
│   └── pom.xml
├── reactive-service/                # Spring WebFlux (port 8081)
│   ├── src/main/java/.../reactive/
│   │   ├── ReactiveServiceApplication.java
│   │   ├── DataSeeder.java
│   │   ├── controller/
│   │   │   └── ProductDashboardController.java
│   │   ├── dto/
│   │   │   └── DashboardResponse.java
│   │   ├── entity/
│   │   │   └── Product.java
│   │   ├── repository/
│   │   │   └── ProductRepository.java
│   │   └── service/
│   │       └── ProductService.java
│   ├── src/main/resources/application.yml
│   └── pom.xml
├── k6-test/
│   └── benchmark.js
├── pom.xml
└── README.md

Conclusion

The numbers tell a clear story for this kind of workload: reactive wins at high concurrency with slow IO, and it’s not close. 40x throughput difference at 500 users is hard to argue with.

But I’d push back on using that as justification to go reactive everywhere. A standard CRUD service doesn’t need an event loop — it needs JPA, readable stack traces, and a team that can fix bugs quickly without a Reactor deep-dive every time something breaks in production.

The honest answer is: if your service fans out to multiple slow downstream APIs under high concurrency, reactive is the better fit. If it’s a business application backed by a relational database with moderate traffic, blocking Spring MVC is still excellent and far easier to maintain.

Virtual threads in Java 21 are worth watching — they let you write sequential code without blocking OS threads, which narrows this gap considerably. But that’s a benchmark for another day.

Run this against your own downstream latency profile before making any call. Local results on a laptop are directionally useful, not gospel.

Benchmark results can reveal performance characteristics, but production architecture decisions require a deeper understanding of workload behavior, scalability requirements, and operational trade-offs. Whether you’re evaluating Spring MVC, WebFlux, virtual threads, or broader application modernization initiatives, Simform helps organizations design, optimize, and scale Java-based systems using modern engineering and cloud-native practices.

Reactive vs Blocking: Real Benchmark Results was originally published in Simform Engineering on Medium, where people are continuing the conversation by highlighting and responding to this story.

OpenTelemetry Java SDK — Standardized Tracing & Metrics

Vidit Sampat — Tue, 16 Jun 2026 06:28:41 GMT

A practical guide to vendor-neutral observability for Spring Boot services

At a glance — two Spring Boot services, the OpenTelemetry Java SDK, and a local Collector that feeds Jaeger, Prometheus, and Grafana. One order request becomes a single distributed trace that crosses both services with the same trace ID. Full runnable source is in the companion repository; the README carries the technical detail.

Topic Overview

Distributed systems generate three signals worth caring about: traces, metrics, and logs. When a single user request fans out across a handful of services, a stack trace from any one of them tells you almost nothing on its own. You need to follow the request as it threads through every hop, correlate latency spikes with throughput, and pin error rates to the exact deployment that introduced them. That is the job of observability.

The historical problem is fragmentation. Jaeger shipped its own client. Zipkin had Brave. Datadog, New Relic, and AppDynamics each shipped proprietary agents with their own APIs, their own propagation headers, and their own mental models. Switching vendors meant rewriting instrumentation across every service.

OpenTelemetry, a CNCF graduated project, collapses that fragmentation into a single API and SDK. You instrument once. You export to whichever backend you choose, and you can switch backends without touching application code.

By the end of this article you will have built two Spring Boot microservices that:

Emit traces and custom metrics through the OpenTelemetry Java SDK
Propagate W3C Trace Context across HTTP calls
Export everything via OTLP to a local OpenTelemetry Collector
Surface traces in Jaeger and metrics in Prometheus and Grafana

The companion GitHub repository contains a runnable demo. Clone it, run docker compose up, start each service with mvn spring-boot:run, and watch a distributed trace cross service boundaries in real time.

Features

Unified API for traces, metrics, and logs. A single set of interfaces (Tracer, Meter, Logger) covers all three signals, so you do not need a different library per signal.
Vendor-neutral OTLP export. The OpenTelemetry Protocol is the wire format. Any backend that speaks OTLP — Jaeger, Tempo, Honeycomb, Datadog, New Relic, Dynatrace — consumes your data without code changes.
Context propagation. Built-in support for W3C Trace Context and Baggage headers means traces stitch together cleanly across HTTP, gRPC, and messaging boundaries.
Auto-instrumentation via the Java agent. Attach a single -javaagent JAR and the agent instruments JDBC, Spring MVC, Kafka, Redis, and roughly 100 other libraries with no code changes.
Manual instrumentation via the SDK. When auto-instrumentation is not enough, the SDK exposes a clean API to create spans, record attributes and events, and emit metrics with full control.
Resource attributes and semantic conventions. Every signal carries identity (service.name, service.version, deployment.environment, host metadata) following the OpenTelemetry semantic conventions, which keeps backends consistent across teams.
Pluggable exporters. Swap OTLP gRPC for OTLP HTTP, Prometheus pull, or a logging exporter for local debugging without rewriting application code.
Sampling strategies. Choose AlwaysOn, AlwaysOff, TraceIdRatioBased, or ParentBased samplers. Production systems typically use head-based parent sampling at a fixed ratio and defer tail-based sampling to the Collector.
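
To make the sampling bullet concrete, here is a simplified, standalone sketch of how a parent-based ratio sampler can reach its decision. It is not the SDK's code: the class and method names are hypothetical, and comparing the low 64 bits of the trace ID against a threshold is just one way to get a decision that is deterministic per trace.

```java
final class RatioSamplerSketch {
    private final double ratio;
    private final long upperBound;

    RatioSamplerSketch(double ratio) {
        if (ratio < 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException("ratio must be in [0, 1]");
        }
        this.ratio = ratio;
        // Scale the ratio into the non-negative long range.
        this.upperBound = (long) (ratio * Long.MAX_VALUE);
    }

    // Decision for a span that starts a new trace: derived from the trace ID,
    // so every service using the same ratio agrees on the same trace.
    boolean sampleRoot(String traceIdHex) {
        if (ratio == 1.0) {
            return true;
        }
        // Last 16 hex characters = low 64 bits of the 128-bit trace ID.
        long low = Long.parseUnsignedLong(traceIdHex.substring(16), 16);
        // Clear the sign bit to get a value in [0, Long.MAX_VALUE].
        return (low & Long.MAX_VALUE) < upperBound;
    }

    // Parent-based wrapper: if a parent exists, follow its sampled flag;
    // parentSampled == null means there is no parent.
    boolean shouldSample(String traceIdHex, Boolean parentSampled) {
        if (parentSampled != null) {
            return parentSampled;
        }
        return sampleRoot(traceIdHex);
    }

    public static void main(String[] args) {
        String traceId = "4bf92f3577b34da6a3ce929d0e0e4736";
        RatioSamplerSketch tenPercent = new RatioSamplerSketch(0.1);
        System.out.println("root span sampled: " + tenPercent.shouldSample(traceId, null));
        System.out.println("child of sampled parent: " + tenPercent.shouldSample(traceId, true));
    }
}
```

With the real SDK the equivalent composition is Sampler.parentBased(Sampler.traceIdRatioBased(0.1)); the sketch only shows why a child span never contradicts its parent's decision.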

What we build

Two services, deliberately small so the observability is the star:

order-service (:8080) — accepts POST /api/orders, then calls inventory over HTTP.
inventory-service (:8081) — reserves stock; any SKU starting with OOS- is "out of stock".

Both emit telemetry through the OpenTelemetry Java SDK over OTLP to a local Collector, which routes traces to Jaeger and metrics to Prometheus (visualised in Grafana).

order-service ──HTTP (with W3C traceparent)──▶ inventory-service
      │                                               │
      └──────────── OTLP ──▶ Collector ──▶ Jaeger (traces) + Prometheus/Grafana (metrics)

By the end you can place an order and watch the request thread through both services as one trace, see the same trace_id on every log line in both processes, and query business metrics like order throughput and p95 latency.

Project layout

java-blog-poc-opentelemetry-java-sdk/
├── pom.xml                       # parent: pins OTel instrumentation BOM (+ alpha BOM)
├── BLOG.md                       # the published blog (Markdown source)
├── README.md
├── LICENSE
├── docker/
│   ├── docker-compose.yml        # Collector + Jaeger + Prometheus + Grafana
│   ├── otel-collector-config.yaml
│   ├── prometheus.yml
│   └── grafana/provisioning/datasources/datasource.yml
├── otel-common/                  # shared instrumentation library (plain JAR)
│   ├── src/main/java/com/simform/otel/common/
│   │   ├── annotation/Traced.java
│   │   ├── aspect/TracingAspect.java
│   │   └── config/OtelCommonAutoConfiguration.java
│   └── src/main/resources/META-INF/spring/...AutoConfiguration.imports
├── order-service/                # port 8080 — entry point, calls inventory-service
│   └── src/main/java/com/simform/otel/order/
│       ├── controller/OrderController.java
│       ├── service/OrderService.java        # @Traced + Span.current() + custom metrics
│       ├── service/InventoryClient.java     # explicit W3C context propagation
│       └── config/OpenTelemetryConfig.java  # manual-otel profile (optional)
└── inventory-service/            # port 8081 — reserves stock
    └── src/main/java/com/simform/otel/inventory/
        ├── controller/InventoryController.java
        └── service/InventoryService.java    # @Traced + manual ERROR status for OOS

Architecture

+----------------+   HTTP + W3C traceparent   +-------------------+
|  order-service | -------------------------> | inventory-service |
|   (port 8080)  |                            |    (port 8081)    |
+--------+-------+                            +---------+---------+
         |                                              |
         |  OTLP gRPC :4317                             |  OTLP gRPC :4317
         v                                              v
                    +-------------------------+
                    | OpenTelemetry Collector |
                    +-----+-------------+-----+
                  traces  |             |  metrics (:8889 scrape)
                          v             v
                    +-----------+   +---------------+
                    |  Jaeger   |   |  Prometheus   |
                    |(UI :16686)|   |  (UI :9090)   |
                    +-----------+   +-------+-------+
                                            |
                                            v
                                       +---------+
                                       | Grafana |
                                       | (:3000) |
                                       +---------+

Prerequisites

JDK 17 (any JDK from 17 to 23; not 24 or 25)
Maven 3.9+
Docker + Docker Compose v2

Quickstart

# 1. Start the observability backend (Collector + Jaeger + Prometheus + Grafana)
cd docker
docker compose up -d
cd ..

# 2. Build all modules with a JDK 17 toolchain
#    (example: export JAVA_HOME to a Corretto/Temurin 17 install first)
mvn clean package -DskipTests

# 3. Run inventory-service (terminal A) - start this first; order-service calls it
java -jar inventory-service/target/inventory-service.jar
#    or: cd inventory-service && mvn spring-boot:run

# 4. Run order-service (terminal B)
java -jar order-service/target/order-service.jar
#    or: cd order-service && mvn spring-boot:run

# 5. Generate traffic (terminal C)
curl -X POST http://localhost:8080/api/orders \
  -H 'Content-Type: application/json' \
  -d '{"sku":"SKU-001","quantity":2}'

The one thing that matters: shared context

If you take away a single idea, take this one. The value of distributed tracing is entirely about context propagation — the trace identity travelling with the request across the network. Get it right and a request that spans five operations across two services renders as one clean waterfall:

POST /api/orders               order-service       ← HTTP server span (automatic)
  OrderService.createOrder     order-service       ← business span
    InventoryClient.reserve    order-service       ← outbound call span
      POST /api/inventory/reserve   inventory-service   ← continues the SAME trace
        InventoryService.reserve    inventory-service   ← business span

[Figure: error traces in Jaeger]

Get it wrong and you get two disconnected traces, each blind to the other — the exact failure that makes people give up on tracing. The mechanics are simple: the caller writes a W3C traceparent header onto the outgoing request; the callee reads it and continues the trace instead of starting a new one. On the receiving side a Spring Boot service does this for you automatically. On the sending side, the catch is that the HTTP client only propagates if it is actually instrumented — so in this demo we inject the header explicitly and leave nothing to chance. (The README shows the five-line interceptor that does it.)
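
To show what actually travels on the wire, here is a minimal, standalone sketch of the W3C traceparent value: a version, a 32-hex-character trace ID, a 16-hex-character parent span ID, and a flags byte whose lowest bit means "sampled". The TraceParent record and its methods are hypothetical helpers, not SDK classes, and the validation is deliberately loose (for example, it does not reject all-zero IDs).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

record TraceParent(String traceId, String spanId, boolean sampled) {

    // Version "00", 32 hex trace-id, 16 hex parent-id, 2 hex trace-flags.
    private static final Pattern FORMAT =
        Pattern.compile("^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$");

    // Callee side: read the incoming header instead of starting a new trace.
    static TraceParent parse(String header) {
        Matcher m = FORMAT.matcher(header);
        if (!m.matches()) {
            throw new IllegalArgumentException("invalid traceparent: " + header);
        }
        boolean sampled = (Integer.parseInt(m.group(3), 16) & 0x01) != 0;
        return new TraceParent(m.group(1), m.group(2), sampled);
    }

    // A child span keeps the trace ID and sampled flag; only the span ID changes.
    TraceParent child(String newSpanId) {
        return new TraceParent(traceId, newSpanId, sampled);
    }

    // Caller side: the value written onto the outgoing request.
    String toHeader() {
        return "00-" + traceId + "-" + spanId + "-" + (sampled ? "01" : "00");
    }

    public static void main(String[] args) {
        TraceParent incoming =
            parse("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01");
        TraceParent outgoing = incoming.child("b7ad6b7169203331");
        System.out.println("traceparent: " + outgoing.toHeader());
    }
}
```

Because the trace ID survives every hop unchanged, both services' spans end up in one waterfall; drop the header and the callee has nothing to continue.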

Business metrics, not just system metrics

Traces tell you about one request; metrics tell you about all of them. The services record a few custom instruments through the same SDK:

orders created — a counter, broken down by SKU.
order processing duration — a histogram, so you get real p50/p95/p99, not a misleading average.
inventory reservations — a counter labelled by SKU and success/failure.

These land in Prometheus as standard time series (orders_created_total, orders_processing_duration_milliseconds_bucket, inventory_reservations_total) and drive Grafana panels for throughput and latency. A few example queries live in the README.
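
For a feel of how a percentile comes out of those _bucket series, the sketch below estimates a quantile from cumulative bucket counts by linear interpolation inside the matching bucket, which is broadly the approach Prometheus's histogram_quantile function takes. It is a simplified illustration: the bucket bounds and counts are invented, and the +Inf bucket is left out.

```java
final class HistogramQuantile {

    // upperBounds[i] is the upper edge of bucket i, in ascending order;
    // cumulativeCounts[i] is the number of observations <= upperBounds[i].
    static double estimate(double q, double[] upperBounds, long[] cumulativeCounts) {
        long total = cumulativeCounts[cumulativeCounts.length - 1];
        if (total == 0) {
            return Double.NaN;
        }
        double rank = q * total;
        for (int i = 0; i < upperBounds.length; i++) {
            if (cumulativeCounts[i] >= rank) {
                double lower = i == 0 ? 0.0 : upperBounds[i - 1];
                long below = i == 0 ? 0 : cumulativeCounts[i - 1];
                long inBucket = cumulativeCounts[i] - below;
                if (inBucket == 0) {
                    return upperBounds[i];
                }
                // Assume observations are spread evenly within the bucket.
                return lower + (upperBounds[i] - lower) * (rank - below) / inBucket;
            }
        }
        return upperBounds[upperBounds.length - 1];
    }

    public static void main(String[] args) {
        // Invented example: durations in milliseconds, 100 observations in total.
        double[] bounds = {50, 100, 250, 500};
        long[] counts = {40, 90, 98, 100};
        System.out.println("p50 estimate: " + estimate(0.50, bounds, counts) + " ms");
        System.out.println("p95 estimate: " + estimate(0.95, bounds, counts) + " ms");
    }
}
```

In Grafana the same idea is usually expressed as a query along the lines of histogram_quantile(0.95, sum by (le) (rate(orders_processing_duration_milliseconds_bucket[5m]))). The estimate is only as precise as the bucket boundaries, so it pays to choose buckets around the latencies you care about.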

Why teams choose OpenTelemetry

No vendor lock-in. Switching backends is a Collector config change, not a re-instrumentation project.
One API for the whole fleet. Developers learn it once; on-call engineers read consistent data everywhere.
Auto + manual, side by side. The Spring Boot starter instruments HTTP, JDBC, and more for free; the SDK is there when you need a custom span or metric.
Multi-signal correlation. Traces, metrics, and logs share IDs, so a latency alert is one hop from the exact trace and the exact log line.
Cost control at the edge. The Collector samples, batches, drops, and routes data before it reaches a paid backend.
Future-proof. Logs are a first-class signal and profiling is stabilising; the same instrumentation collects them as they land.

Where it fits

End-to-end tracing across microservices — find the slow hop in a multi-service request without correlating timestamps by hand.
SLA and latency monitoring — histogram metrics give honest percentiles for dashboards and alerts.
Cross-service error debugging — the trace shows the failing downstream span, its exception, and your business attributes.
Migrating off a proprietary APM — run OpenTelemetry alongside the old agent, compare dashboards, then cut over with application code untouched.

A note on what’s not worth it

OpenTelemetry shines once a request crosses a process boundary. For a single monolith that you are already happy observing through one APM, or for a brand-new prototype where any instrumentation is premature, hold off. Everywhere else, meaning anything that fans out across services, the payoff is immediate and the lock-in is gone.

Conclusion

OpenTelemetry has settled what tracing and metrics should look like on the JVM. The Java SDK is stable, the Spring Boot starter covers the libraries that matter, and the Collector gives you the operational freedom to send data anywhere. The demo in this repo shows the whole loop: instrument with one annotation, propagate context across a real HTTP boundary, and watch a single request light up across two services in Jaeger — with metrics in Prometheus and Grafana, and the same trace_id on every log line.

Clone it, run docker compose up, start both services, and place an order. The README walks through every screen and the technical choices behind them.

Building observability is often easier in a proof of concept than in production. As systems grow, teams need consistent instrumentation standards, distributed tracing strategies, telemetry governance, and cost-effective data collection across hundreds of services. Simform helps engineering teams design and implement scalable observability platforms using OpenTelemetry, cloud-native monitoring tools, and modern platform engineering practices.

Companion repository: https://github.com/backend-simformsolutions/java-blog-poc-opentelemetry-java-sdk. Built with Spring Boot 3.4, the OpenTelemetry Java SDK, and a Docker Compose stack of the OpenTelemetry Collector, Jaeger, Prometheus, and Grafana.

OpenTelemetry Java SDK — Standardized Tracing & Metrics was originally published in Simform Engineering on Medium, where people are continuing the conversation by highlighting and responding to this story.