<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Konstantin Tarkus on Medium]]></title>
        <description><![CDATA[Stories by Konstantin Tarkus on Medium]]></description>
        <link>https://medium.com/@koistya?source=rss-692b968dbc82------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*UAEzd2AY18TtTCjcOmc09w.jpeg</url>
            <title>Stories by Konstantin Tarkus on Medium</title>
            <link>https://medium.com/@koistya?source=rss-692b968dbc82------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Mon, 06 Apr 2026 09:22:35 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@koistya/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[From Prototype to Production: Change Control for AI Decisions]]></title>
            <link>https://medium.com/@koistya/from-prototype-to-production-change-control-for-ai-decisions-d1e0fc773e4d?source=rss-692b968dbc82------2</link>
            <guid isPermaLink="false">https://medium.com/p/d1e0fc773e4d</guid>
            <category><![CDATA[programming]]></category>
            <category><![CDATA[developer-tools]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[software-engineering]]></category>
            <dc:creator><![CDATA[Konstantin Tarkus]]></dc:creator>
            <pubDate>Mon, 09 Feb 2026 12:51:17 GMT</pubDate>
            <atom:updated>2026-02-09T12:51:17.087Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Pu4BsjUHQYyXIT0kR3_FJA.png" /></figure><p>You update a prompt. You test it on five examples. Looks good. You ship.</p><p>Three days later, a support ticket: a field that used to contain &quot;$4.2M in Q3 revenue&quot; now says &quot;strong revenue growth&quot;. Another ticket: a claim that listed a specific hire date is gone entirely. Your prompt change improved 95% of cases and silently broke 3%. The inputs that regressed weren&#39;t in your five test cases. They never are.</p><p>This is the gap between prototype and production. Prototypes need to work on a few examples. Production systems need to prove that nothing broke.</p><h3>Change control for AI decisions</h3><p>We’ve solved this problem before. Not for AI, but for code.</p><p><a href="https://medium.com/media/7fca0108b1cc960fbb6c78b58d96b4f9/href">https://medium.com/media/7fca0108b1cc960fbb6c78b58d96b4f9/href</a></p><p>The difference: with code, the diff exists automatically. With AI, it doesn’t exist unless you explicitly compute it. You wouldn’t merge a PR without reviewing the diff. Why would you ship a prompt change without seeing what decisions it would alter?</p><p>Verist applies this workflow to AI decisions. Define a step, capture baselines, change your prompt, recompute, and get a diff showing exactly what changed before you deploy.</p><h3>Try it: catch a prompt regression in 60 seconds</h3><pre>npm install verist @verist/llm zod openai</pre><p>verist is the kernel (steps, replay, diff). @verist/llm adds LLM provider adapters.</p><h3>Define a step</h3><p>A step wraps an LLM call with typed input and output. 
defineExtractionStep handles the boilerplate of calling the model, parsing the response, and validating against a Zod schema.</p><pre>import { defineExtractionStep, createOpenAI } from &quot;@verist/llm&quot;;<br>import { run, unwrap, recompute, formatDiff } from &quot;verist&quot;;<br>import OpenAI from &quot;openai&quot;;<br>import { z } from &quot;zod&quot;;<br><br>const ClaimsSchema = z.object({<br>  claims: z.array(z.string()),<br>});<br><br>const extractClaims = defineExtractionStep({<br>  name: &quot;extract-claims&quot;,<br>  input: z.object({ text: z.string() }),<br>  output: ClaimsSchema,<br>  request: (input) =&gt; ({<br>    model: &quot;gpt-4o-mini&quot;,<br>    temperature: 0,<br>    messages: [<br>      {<br>        role: &quot;system&quot;,<br>        content: `Extract specific, verifiable claims from the text.<br>Each claim must contain a concrete number, name, or date.<br>Return raw JSON only, no markdown: { &quot;claims&quot;: [&quot;claim1&quot;, ...] }`,<br>      },<br>      { role: &quot;user&quot;, content: input.text },<br>    ],<br>    responseFormat: &quot;json&quot;,<br>  }),<br>});</pre><p>This is a self-contained definition. No global registry, no side effects at definition time. The step declares what goes in, what comes out, and how to call the model.</p><h3>Run and capture a baseline</h3><pre>const adapters = { llm: createOpenAI({ client: new OpenAI() }) };<br><br>const text = `Acme Corp reported $4.2M in Q3 revenue, up 18% year-over-year.<br>CEO Jane Park announced 3 new enterprise clients and plans to expand<br>the engineering team from 45 to 60 people by March 2025.`;<br><br>const baseline = unwrap(await run(extractClaims, { text }, { adapters }));</pre><p>run() executes the step and captures the result as an artifact. unwrap() extracts the value from the Result type (Verist uses errors-as-values, not exceptions). 
The baseline now holds the output along with the artifacts needed to recompute later.</p><pre>Baseline: 4 claims<br>  - Acme Corp reported $4.2M in Q3 revenue<br>  - Revenue up 18% year-over-year<br>  - CEO Jane Park announced 3 new enterprise clients<br>  - Plans to expand engineering team from 45 to 60 by March 2025</pre><h3>Recompute with a new prompt</h3><p>Now change the prompt. Maybe you want something more concise, so you switch to a summarization prompt:</p><pre>const vagueStep = defineExtractionStep({<br>  name: &quot;extract-claims&quot;,<br>  input: z.object({ text: z.string() }),<br>  output: ClaimsSchema,<br>  request: (input) =&gt; ({<br>    model: &quot;gpt-4o-mini&quot;,<br>    temperature: 0,<br>    messages: [<br>      {<br>        role: &quot;system&quot;,<br>        content: `Summarize the key points from the text. Be concise.<br>Return raw JSON only, no markdown: { &quot;claims&quot;: [&quot;point1&quot;, ...] }`,<br>      },<br>      { role: &quot;user&quot;, content: input.text },<br>    ],<br>    responseFormat: &quot;json&quot;,<br>  }),<br>});<br><br>// Replay the new step logic against the baseline&#39;s input<br>const result = unwrap(await recompute(baseline, vagueStep, { adapters }));<br><br>console.log(formatDiff(result.outputDiff));</pre><p>recompute() runs the new step definition against the same input from the baseline, without re-running your application code. It returns a diff comparing the old output to the new one.</p><pre>  claims[0]: &quot;Acme Corp reported $4.2M in Q3 revenue&quot;<br>          -&gt; &quot;Acme Corp had strong Q3 revenue growth&quot;<br>- claims[3]: &quot;Plans to expand engineering team from 45 to 60 by March 2025&quot;</pre><p>That’s the regression. The vague prompt lost specificity in claims[0] and dropped claims[3] entirely. You caught it before shipping. No customer tickets. No silent regressions.</p><h3>Schema violations: catching what logs miss</h3><p>Back to the opening scenario. 
You changed a prompt, tested on five examples, and shipped. Three percent of cases broke. But what kind of breakage?</p><p>Sometimes the model doesn’t just change a value. It returns something structurally wrong: a string where you expected a number, a missing required field, an array that’s suddenly empty. Logs would show the raw response. Verist validates the output against your Zod schema on every recompute and surfaces violations explicitly.</p><pre>const result = unwrap(await recompute(baseline, updatedStep, { adapters }));<br><br>if (result.schemaViolations.length &gt; 0) {<br>  for (const v of result.schemaViolations) {<br>    console.log(`${v.path.join(&quot;.&quot;)}: ${v.kind} (${v.message})`);<br>  }<br>}<br>// claims.0: type (Expected string, received number)</pre><p>Each violation has a path, a kind (&quot;missing&quot;, &quot;type&quot;, &quot;refinement&quot;, or &quot;other&quot;), and a message. In CI, schema violations always fail the build, even with --no-fail-on-diff. Value changes are debatable. Structural breakage is not.</p><h3>Try it without API keys</h3><p>You don’t need OpenAI credentials to see the workflow in action. verist init scaffolds a deterministic step using regex extraction:</p><pre>npx verist init<br>npx verist capture --step parse-contact --input &quot;verist/inputs/*.json&quot;<br>npx verist test --step parse-contact</pre><p>This creates a step, captures baselines from sample inputs, and runs a regression test. No LLM calls, no API keys. Once you’re ready to see LLM diffs, add your key and run the full example:</p><pre>OPENAI_API_KEY=sk-... npx tsx examples/prompt-diff/quickstart.ts</pre><h3>Human corrections that survive</h3><p>In production, humans correct AI mistakes. A reviewer fixes a misclassified claim. A support agent overrides an extracted value. These corrections are expensive to make and easy to lose. 
Recompute with a new prompt and most systems wipe them out.</p><p>Verist separates the two concerns with a three-layer state model:</p><pre>computed  +  overlay  =  effective<br>(AI)        (human)      (what the app sees)</pre><ul><li><strong>Computed</strong>: AI-derived values, rewritten on every recompute</li><li><strong>Overlay</strong>: Human corrections, never touched by automation</li><li><strong>Effective</strong>: Shallow merge where overlay wins:<br>{ ...computed, ...overlay }</li></ul><pre>import { effectiveState } from &quot;@verist/storage&quot;;<br><br>// AI extracted: { amount: &quot;$4.2M&quot;, currency: &quot;USD&quot; }<br>// Human corrected currency to EUR<br><br>const state = {<br>  computed: { amount: &quot;$4.2M&quot;, currency: &quot;USD&quot; },<br>  overlay: { currency: &quot;EUR&quot; }, // human correction<br>};<br><br>const effective = effectiveState(state);<br>// -&gt; { amount: &quot;$4.2M&quot;, currency: &quot;EUR&quot; }</pre><p>When you recompute with a new prompt, the computed layer updates. The overlay stays. If the new prompt extracts { amount: &quot;$4.2M&quot;, currency: &quot;GBP&quot; }, the effective state is still { amount: &quot;$4.2M&quot;, currency: &quot;EUR&quot; } because the human correction takes precedence.</p><p>This isn’t a merge strategy you have to build. It’s a primitive in the storage layer.</p><h3>CI: block regressions before merge</h3><p>Once you have baselines captured, add a regression gate to your CI pipeline:</p><pre># .github/workflows/verist.yml<br>name: Verist regression check<br>on: [push, pull_request]<br><br>jobs:<br>  verist-test:<br>    runs-on: ubuntu-latest<br>    steps:<br>      - uses: actions/checkout@v4<br>      - uses: actions/setup-node@v4<br>        with:<br>          node-version: &quot;22&quot;<br>      - run: npm install<br>      - run: npx verist test --step extract-claims</pre><p>verist test recomputes every baseline for the step and exits with code 1 if anything changed. 
Schema violations always fail. Value changes fail by default but can be relaxed with --no-fail-on-diff for steps where some drift is acceptable.</p><p>For PR comments with a summary table:</p><pre>- run: npx verist test --step extract-claims --format markdown &gt; verist-report.md<br>  continue-on-error: true<br>- uses: marocchino/sticky-pull-request-comment@v2<br>  with:<br>    path: verist-report.md</pre><p>The JSON format (--format json) gives you machine-readable output with counts for passed, changed, schemaViolations, and failed, so you can build custom thresholds or notifications.</p><h3>What this is not</h3><ul><li><strong>Not an agent framework.</strong> No autonomous loops, no memory, no tool calling. Verist is the layer underneath that makes decisions reviewable.</li><li><strong>Not observability.</strong> Logs tell you what happened. Verist tells you what <em>would</em> change before you ship.</li><li><strong>Not a hosted platform.</strong> It’s a library. Your code, your infrastructure, your database.</li><li><strong>Not evals.</strong> Eval frameworks score outputs against a ground truth you label. Verist doesn’t require ground truth. 
It diffs old output against new output so you can review the delta.</li></ul><h3>When this fits</h3><ul><li><strong>Prompt iteration</strong> — You’re tuning prompts and need to know what breaks across your full input set, not just three cherry-picked examples.</li><li><strong>Model upgrades</strong> — You’re switching from GPT-4 to Claude or upgrading to a new version and want to quantify the impact before deploying.</li><li><strong>Safe recompute</strong> — You need to reprocess historical data with new logic without overwriting the human corrections your team has made.</li><li><strong>Decision audit</strong> — Regulated or high-stakes domains where you need to reproduce and explain any past AI decision.</li></ul><p>If your AI outputs are consumed by downstream code and a silent change would cause a bug, a bad customer experience, or a compliance issue, you need change control.</p><p>If you’ve ever hesitated to change a prompt because you couldn’t predict the blast radius, that’s what Verist solves.</p><h3>Try it</h3><pre>npm install verist @verist/cli zod<br>npx verist init</pre><p><a href="https://verist.dev">https://verist.dev</a></p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Designing Fair Token Bucket Policies for Real-Time Apps]]></title>
            <link>https://levelup.gitconnected.com/designing-fair-token-bucket-policies-for-real-time-apps-289b00eb4435?source=rss-692b968dbc82------2</link>
            <guid isPermaLink="false">https://medium.com/p/289b00eb4435</guid>
            <category><![CDATA[web-development]]></category>
            <category><![CDATA[software-engineering]]></category>
            <category><![CDATA[software-development]]></category>
            <category><![CDATA[programming]]></category>
            <category><![CDATA[software-architecture]]></category>
            <dc:creator><![CDATA[Konstantin Tarkus]]></dc:creator>
            <pubDate>Tue, 04 Nov 2025 23:47:31 GMT</pubDate>
            <atom:updated>2025-11-04T23:47:31.136Z</atom:updated>
            <content:encoded><![CDATA[<h4>Sizing burst capacity and refill rates for chat, gaming, and streaming without starving legitimate users or enabling abuse.</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*HDfR8GWRfK-_wXVHCN4WTA.jpeg" /></figure><h3>The Problem With One-Size-Fits-All Rate Limiting</h3><p>You launch your multiplayer game. The collision detection runs tight — good. Then a speedrunner discovers an exploit: spam move commands fast enough and the server’s position updates lag behind optimistic rendering. Your rate limiter catches them at 30 messages, which feels fair. But then legitimate players in heated moments hit the same limit after 20 rapid inputs during a firefight.</p><p>You dial the limit down to 15 to catch abuse earlier. Now everyone complains about input lag.</p><p>Sound familiar?</p><p>The problem isn’t that rate limiting is wrong. It’s essential for protecting backend resources and ensuring fair access. The problem is that <strong>most policies treat all traffic the same</strong>, missing the fact that different applications have radically different traffic signatures. A chat app needs to tolerate sudden bursts when users paste code blocks. A game needs predictable, high-frequency streams. A video stream needs one massive initial burst, then steady flow.</p><p>The token bucket algorithm is elegant enough to handle all three — but only if you size it correctly for your workload. This post shows you how to reason about capacity and refill rate, not as magic numbers, but as levers that directly model your application’s behavior. You’ll learn to distinguish between legitimate user behavior and abuse, then size your limits accordingly.</p><h3>How Token Buckets Work: A Quick Mental Model</h3><p>Token buckets are simple: tokens refill at a constant rate (r), you consume tokens per message, and requests queue when empty. 
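</p><p>A minimal sketch makes the mechanics concrete (illustrative only, not any particular library’s implementation):</p>

```typescript
// Minimal token bucket: tokens refill lazily based on elapsed time,
// and a request is allowed only if enough tokens remain.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,     // B_max: maximum burst size
    private refillPerSec: number, // r: sustained rate
    now: number = Date.now(),
  ) {
    this.tokens = capacity; // start full
    this.lastRefill = now;
  }

  tryConsume(cost = 1, now: number = Date.now()): boolean {
    // Lazy refill: credit tokens for the time elapsed since the last call.
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens < cost) return false; // over the limit: reject (or queue)
    this.tokens -= cost;
    return true;
  }
}
```

<p>Passing now explicitly keeps the sketch deterministic under test; a production limiter also needs shared, atomic state across servers.</p><p>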
The key parameters are:</p><ul><li><strong>Refill rate</strong> (r): tokens per second — your sustained message rate</li><li><strong>Burst capacity</strong> (B_max): maximum tokens — how many messages you can send instantly</li></ul><p>The magic: bursts are allowed without punishing inactive users. This is why token buckets beat leaky buckets for real-time apps.</p><h3>Capacity vs. Refill Rate: The Fundamental Trade-Off</h3><p>These two parameters work together. Sized independently, they create terrible user experiences or dangerous security holes.</p><h4>Capacity: Burst Tolerance</h4><p><strong>What it controls</strong>: How many tokens a user can consume in a single burst.</p><p><strong>If too low</strong>: Users feel input lag. A gamer can’t respond quickly. A chatter can’t paste a code block without hitting the limit.</p><p><strong>If too high</strong>: Attackers get a long window to flood before detection kicks in.</p><p><strong>Example comparison</strong> (chat app):</p><ul><li>Capacity 10, Rate 2/sec → allows 10 instant messages, then wait 5s. Feels slow.</li><li>Capacity 100, Rate 2/sec → allows 100 instant messages, then wait 50s. Easy attack vector.</li><li>Capacity 30, Rate 10/sec → allows 30 instant messages, then wait 3s. Good balance.</li></ul><h4>Refill Rate: Sustained Throughput</h4><p><strong>What it controls</strong>: The long-term average message rate.</p><p><strong>If too low</strong>: Legitimate users feel starved. App feels sluggish.</p><p><strong>If too high</strong>: Abusers run rampant and can saturate the system.</p><p><strong>Example comparison</strong> (gaming):</p><ul><li>Rate 10/sec, Capacity 20 → 10 messages/sec sustained, burst of 20. Too slow for 60 Hz game.</li><li>Rate 100/sec, Capacity 20 → 100 messages/sec sustained, but only 20-token burst. Drops packets on network jitter.</li><li>Rate 60/sec, Capacity 120 → 60 messages/sec sustained, 2-second burst window. 
Matches 60 Hz tick rate perfectly.</li></ul><h4>The Pairing Principle</h4><p>Set capacity and refill rate together, not independently. A useful formula:</p><pre>recovery_time = capacity / refill_rate</pre><p>If you want users to recover from a full burst in 3 seconds, size it as:</p><ul><li>refill_rate = 20 tokens/sec implies capacity = 60 tokens</li><li>refill_rate = 10 tokens/sec implies capacity = 30 tokens</li></ul><p>Both allow 3-second recovery, but the first sustains higher throughput while the second is more restrictive. Your choice depends on your app’s typical load and abuse patterns.</p><pre>| Use Case        | Capacity | Rate/sec | Sustained     | Recovery | Outcome                                      |<br>| --------------- | -------- | -------- | ------------- | -------- | -------------------------------------------- |<br>| Too Strict      | 10       | 2        | 2 msg/sec     | 5s       | Users churn from input lag                   |<br>| Chat (Balanced) | 100-200  | 1-2      | 1-2 msg/sec   | 50-200s  | Handles history load without false positives |<br>| Gaming (30 Hz)  | 10-15    | 35-40    | 35-40 msg/sec | &lt;1s      | Matches game tick rate plus jitter margin    |<br>| Too Loose       | 500      | 100      | 100 msg/sec   | 5s       | Abusers flood the system                     |</pre><h3>Domain-Specific Policies</h3><p>Real-time apps have distinct traffic signatures. 
Understanding yours is the key to sizing limits that work.</p><h4>Chat Applications</h4><p><strong>Traffic signature</strong>:</p><ul><li>Users type at 0.6–1.1 words per second (roughly one message every 10–20 seconds under normal typing)</li><li>Bursts occur: pasting code snippets, rapid-fire group conversation, media uploads</li><li>False positives hurt engagement; users churn if rate limiting feels excessive</li><li>Initial state: loading message history, fetching member lists, synchronizing presence</li></ul><p><strong>Recommended policy</strong>:</p><pre>Capacity: 100-200 messages<br>Rate: 1-2 messages/sec<br>Recovery time: 50-200 seconds</pre><p><strong>Rationale</strong>:</p><ul><li>Refill rate (1–2/sec) covers normal conversation: typical typing pace plus presence updates and typing indicators</li><li>Capacity (100–200) handles the most common legitimate burst: loading message history on channel entry or pasting code snippets</li><li>If a user loads 100 recent messages on entering a channel, they need burst capacity of at least 100</li><li>Recovery time of 50–200 seconds is acceptable — users expect a pause after heavy activity like bulk uploads or rapid pasting</li><li>Real test: User sends 50 fast messages (code paste or rapid conversation), 50–150 tokens remain, wait 25–150s, back to full bucket</li></ul><p><strong>What breaks this</strong>:</p><ul><li>Capacity 10: Users copying code blocks get blocked constantly</li><li>Rate 100/sec: Coordinated spam floods rooms before moderation</li><li>Rate 0.5/sec: Users feel starved in group conversations</li><li>No monitoring: You won’t know when the policy is too strict</li></ul><p><strong>Observability</strong>: Alert if more than 5% of users hit rate limits daily. 
This signals either a too-strict policy or a spam spike.</p><h4>Multiplayer Gaming</h4><p><strong>Traffic signature</strong>:</p><ul><li>Players send input commands at the <strong>game tick rate</strong> (e.g., 60 Hz = 60 messages/sec per player)</li><li>One limit too low makes the game unplayable; one too high enables desynchronization attacks</li><li>Different message types have different costs: position updates vs. chat messages</li><li>Network jitter causes packet batching; the bucket must absorb temporary spikes</li></ul><p><strong>Recommended policy</strong> (for a 30 Hz server):</p><pre>Per-player input (position, rotation):<br>  Capacity: 10-15 tokens (quick action &quot;combos&quot;)<br>  Rate: 35-40 tokens/sec (30 Hz tick rate + 20% safety margin)<br><br>Per-player chat (separate bucket):<br>  Capacity: 20 messages<br>  Rate: 5 messages/sec</pre><p><strong>Rationale</strong>:</p><ul><li>Refill rate must match or exceed the server’s tick rate, plus a 20% safety margin for network jitter</li><li>For a 30 Hz server, 35–40 tokens/sec ensures players can send and receive at the game’s natural frequency</li><li>Capacity (10–15 tokens) is small because gaming traffic is a continuous stream, not bursty — large bursts often signal abuse (spam scripts)</li><li>Separate buckets prevent chat spam from blocking critical movement commands</li><li>Scenario: Player executes a quick combo (5 actions). Bucket has tokens. Player spams chat. 
Chat bucket empties, but movement input still flows through</li></ul><p><strong>What breaks this</strong>:</p><ul><li>Single bucket for all messages: Attacker spams chat, blocks position updates, player appears frozen</li><li>Capacity 20: Temporary packet loss makes the game unplayable</li><li>Capacity 500+: Malicious clients send fake position updates, world desynchronizes</li><li>Rate 20/sec: Input lag on 60 Hz server feels terrible</li></ul><h4>Streaming (Live Video)</h4><p><strong>Traffic signature</strong>:</p><ul><li>Two-phase model: massive initial buffer-fill (0–10 seconds), then steady-state flow at video bitrate</li><li>Phase 1 (buffer-fill): High-speed transfer to fill client buffer (e.g., 10 seconds of content upfront)</li><li>Phase 2 (steady-state): Downloads at bitrate matched to playback speed (with minor pauses to prevent buffer overflow)</li><li>Control plane: Viewers send heartbeats, seek requests, quality changes (low volume); Streamers send metadata updates</li></ul><p><strong>The Critical Buffer-Fill Calculation</strong>:</p><p>Streaming policies must explicitly handle the initial massive burst. 
This is the most commonly missed piece.</p><pre># Example: 1080p stream at 6 Mbps, target 10-second buffer<br><br>Video bitrate:       750 KB/sec (6 Mbps = 750 kilobytes/second)<br>Buffer duration:     10 seconds<br>Initial burst size:  750 KB/sec × 10 sec = 7,500 KB = 7.5 MB<br><br>This is the minimum burst capacity your policy must allow.</pre><p><strong>Recommended policy</strong> (per-viewer):</p><pre>Initial buffer-fill (first connection or seek):<br>  Capacity: 7.5 MB (for 10s of 6 Mbps content)<br>  Rate: 750 KB/sec (6 Mbps)<br>  Recovery time: ~10 seconds<br><br>Steady-state playback (after buffer full):<br>  Capacity: Bitrate × 2-3 seconds (e.g., 1.5-2.25 MB for 6 Mbps stream)<br>  Rate: Video bitrate (e.g., 750 KB/sec)<br><br>Control plane (heartbeats, seeks, quality changes):<br>  Capacity: 10 actions<br>  Rate: 2 actions/sec (separate bucket from data transfer)</pre><p><strong>Rationale</strong>:</p><ul><li>Burst capacity must be sized to the actual initial buffer-fill requirement, not guessed</li><li>For a 6 Mbps stream with 10-second target buffer, capacity must be <strong>at least 7.5 MB</strong></li><li>Steady-state capacity slightly larger than refill rate prevents underflow during minor network jitter</li><li>Separate control plane prevents heartbeat timeouts or seek delays from being starved by data transfer</li><li>Recovery time for initial burst is acceptable (users expect a few seconds of buffering on start)</li></ul><p><strong>Calculating for Your Bitrate</strong>:</p><p>First, convert Mbps to KB/sec (divide by 8): 6 Mbps → 750 KB/sec. 
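</p><p>To make that arithmetic harder to get wrong, the whole calculation fits in a tiny helper (a hypothetical function, shown for illustration):</p>

```typescript
// Burst capacity needed to fill a client buffer at stream start.
// Decimal units: 1 Mbps = 1000 kilobits/sec, so KB/sec = Mbps * 1000 / 8.
function bufferFillKB(bitrateMbps: number, bufferSeconds: number): number {
  const kbPerSec = (bitrateMbps * 1000) / 8; // Mbps -> KB/sec
  return kbPerSec * bufferSeconds;           // KB needed for the initial burst
}

bufferFillKB(6, 10); // 7500 KB = 7.5 MB for a 6 Mbps stream with a 10s buffer
```

<p>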
Then multiply by buffer duration.</p><pre>| Bitrate         | KB/sec | 5s Buffer | 10s Buffer | 15s Buffer |<br>| --------------- | ------ | --------- | ---------- | ---------- |<br>| 2.5 Mbps (480p) | 312.5  | 1.56 MB   | 3.125 MB   | 4.7 MB     |<br>| 5 Mbps (720p)   | 625    | 3.13 MB   | 6.25 MB    | 9.4 MB     |<br>| 8 Mbps (1080p)  | 1,000  | 5 MB      | 10 MB      | 15 MB      |</pre><p><strong>What breaks this</strong>:</p><ul><li>Using a generic 100-token capacity: Can’t handle even 5 seconds of buffer-fill at real-world bitrates</li><li>Mixing data and control in one bucket: Heartbeats timeout while buffering, users see “connection lost”</li><li>Heartbeat rate 0.2/sec: Users seeking every few seconds hit the control limit</li><li>Not accounting for buffer-fill: Users experience long startup delays on first play or after seeking</li></ul><h3>Layering Limits: Per-User, Per-Route, Cost-Based</h3><p>A single global limit is a blunt instrument. Production systems need multiple layers of protection.</p><h4>The Pyramid of Protection</h4><pre>Global limit (all users, all messages)<br>    ↓<br>Per-IP / Per-Device limit (catch botnets)<br>    ↓<br>Per-User Per-Type limit (fairness between users)<br>    ↓<br>Per-Route Cost-Based limit (protect expensive operations)</pre><p><strong>Example</strong>: <strong>Chat room with 10,000 users</strong>:</p><pre>Global: 100,000 messages/sec<br>Per-IP: 100 messages/sec (catch credential stuffing)<br>Per-User: 20/sec per message type<br>Per-Route: TEXT=1 token, ADMIN_COMMAND=10 tokens</pre><h4>Per-User Per-Type (Most Common)</h4><p>One bucket per (user, message type) pair. 
Chat messages have a separate quota from presence updates.</p><pre>// Per-user per-type (in-memory adapter shown; swap for Redis/Durable Objects in production)<br>import { rateLimit, keyPerUserPerType } from &quot;@ws-kit/middleware&quot;;<br>import { memoryRateLimiter } from &quot;@ws-kit/adapters/memory&quot;;<br><br>const limiter = rateLimit({<br>  limiter: memoryRateLimiter({<br>    capacity: 30,<br>    tokensPerSecond: 10,<br>  }),<br>  key: keyPerUserPerType,<br>});<br><br>router.use(limiter);<br><br>router.on(ChatSendMessage, (ctx) =&gt; {<br>  ctx.send(ChatSentMessage, { message: ctx.payload.text });<br>});<br><br>router.on(PresenceUpdateMessage, (ctx) =&gt; {<br>  ctx.publish(&quot;presence&quot;, PresenceChangedMessage, {<br>    userId: ctx.ws.data?.userId,<br>  });<br>});</pre><p><strong>Why it works</strong>:</p><ul><li>Prevents one chatty user from monopolizing bandwidth and starving others</li><li>Different message types can have different tolerances based on their impact</li><li>Fair: each user gets an independent quota per message type</li><li>Simple to reason about and tune based on real-world usage</li></ul><h4>Cost-Based Limiting (Advanced)</h4><p>Different operations consume different amounts of resources. GitHub and Shopify use this for GraphQL APIs — it’s equally powerful for WebSockets.</p><pre>// Single unified rate limiter with custom cost per operation<br>import { rateLimit } from &quot;@ws-kit/middleware&quot;;<br>import { memoryRateLimiter } from &quot;@ws-kit/adapters/memory&quot;;<br><br>const limiter = rateLimit({<br>  limiter: memoryRateLimiter({<br>    capacity: 200,<br>    tokensPerSecond: 50,<br>  }),<br>  key: (ctx) =&gt; {<br>    const user = ctx.ws.data?.userId ?? 
&quot;anon&quot;;<br>    return `user:${user}`;<br>  },<br>  cost: (ctx) =&gt; {<br>    // All costs must be positive integers<br>    if (ctx.type === &quot;ChatSend&quot;) return 1;<br>    if (ctx.type === &quot;FileUpload&quot;) return 20;<br>    if (ctx.type === &quot;AdminBan&quot;) return 10;<br>    if (ctx.type === &quot;HistorySearch&quot;) return 15;<br>    return 1; // default<br>  },<br>});<br><br>router.use(limiter);</pre><p><strong>Why it works</strong>:</p><ul><li>Expensive operations (database queries, external APIs) cost more tokens</li><li>Cheap operations (presence updates, heartbeats) cost less, allowing higher frequency</li><li>Single shared bucket prevents any one operation type from monopolizing quota</li><li>Users have flexibility: spend budget on many cheap operations or a few expensive ones</li><li>Scales better than managing dozens of independent buckets</li></ul><p><strong>Trade-off</strong>: Costs must align with actual resource consumption. Misjudged costs will either starve legitimate users or enable abuse.</p><h4>Choosing a Rate Limiter Adapter</h4><p>Examples above use memoryRateLimiter. For production, choose your adapter based on deployment:</p><pre>| Adapter         | Best For                | Latency |<br>| --------------- | ----------------------- | ------- |<br>| In-Memory       | Single server, dev/test | &lt;1ms    |<br>| Redis           | Distributed fleets      | 2-5ms   |<br>| Durable Objects | Edge/global             | 10-50ms |</pre><p>Swap the adapter and the semantics remain identical:</p><pre>import { redisRateLimiter } from &quot;@ws-kit/adapters/redis&quot;;<br><br>const limiter = rateLimit({<br>  limiter: redisRateLimiter(redis, { capacity: 30, tokensPerSecond: 10 }),<br>  key: keyPerUserPerType,<br>});</pre><p><strong>Note on imports</strong>: Each adapter is available via a subpath export. Use @ws-kit/adapters/memory, @ws-kit/adapters/redis, or @ws-kit/adapters/cloudflare-do to import only the adapter you need. 
Importing from @ws-kit/adapters directly requires explicit adapter selection via the platform-specific factories.</p><h4>How to Calculate Costs</h4><p>Assigning costs requires understanding what each operation actually costs your backend:</p><pre>// Example cost calculation based on database operations<br>// All costs must be positive integers (no decimals, no zero/negative values)<br>const operationCosts = {<br>  // Simple, in-memory operations (cost = 1)<br>  PresenceUpdate: 1, // Just update local state<br>  TypingIndicator: 1, // Ephemeral, instant delivery<br><br>  // Database reads (cost = 2-5 depending on complexity)<br>  MessageGet: 3, // Single document read<br>  UserProfile: 2, // Cache-friendly read<br><br>  // Database writes (cost = 5-10 including indexing)<br>  MessageSend: 5, // Write + index update<br>  MessageEdit: 5, // Update + audit log<br><br>  // Expensive operations (cost = 20-50 for CPU-intensive work)<br>  MessageSearch: 20, // Full-text search across millions<br>  HistoryExport: 50, // Generates file, might send email<br>  AnalyticsQuery: 30, // Aggregates data across time range<br>};</pre><p>Start with these simple ratios:</p><ul><li>In-memory operations: 1 token</li><li>Database reads: 2–5 tokens (depends on complexity)</li><li>Database writes: 5–10 tokens (include indexing, replication)</li><li>External API calls: 10–20 tokens (includes latency uncertainty)</li><li>Aggregations/searches: 20–50 tokens (CPU-intensive)</li></ul><p>Then validate by measuring actual P95 latencies. If message.search takes 100ms and message.send takes 10ms, but both cost 5 tokens, you&#39;re underweighting search. Increase its cost to 50 tokens.</p><h3>Common Mistakes and Red Flags</h3><p>The most impactful rate-limiting failures in production boil down to three design errors. These are the battle scars from real systems.</p><h4>1. 
Capacity Way Higher Than Refill Rate (Most Common)</h4><p><strong>The Failure Mode</strong>: This is why most rate-limiting deployments fail in their first week.</p><p>❌ Bad: Capacity 1000, Rate 10/sec = 100-second burst window.</p><p>An attacker (or botnet) exploits the massive window: they send 1000 requests in 10 seconds while your monitoring system is still sleeping. The damage is done before typical monitoring thresholds are breached and alerts fire — by then, the database is already melting.</p><p><strong>Real-world impact</strong>: In early 2023, a major gaming platform experienced a DDoS that succeeded not because the attacks were fast, but because the burst window was so large that coordinated spam flooded the system before rate-limit signals propagated to edge nodes.</p><p>✅ Fix: Use the formula capacity ≈ refill_rate * desired_recovery_time. A 10/sec refill rate with a 3-second recovery window means capacity ≈ 30. An attacker can&#39;t do meaningful damage in 3 seconds.</p><h4>2. Confusing Per-User and Global Limits (Second Most Common)</h4><p><strong>The Failure Mode</strong>: Limits work independently instead of layered, creating security gaps or fairness problems.</p><p>❌ Bad approach 1: Per-user limit only → a botnet of many accounts, each staying within its personal quota, saturates the server’s total capacity in aggregate.</p><p>❌ Bad approach 2: Global limit only → one misbehaving user or network spike blocks all other users. Innocent mobile users in high-latency regions get starved.</p><p><strong>Real-world impact</strong>: A real-time collaboration platform launched with per-user limits but no global cap. During a product launch event, 50,000 concurrent users each stayed harmlessly under their personal limit… except in aggregate they exceeded the database’s actual throughput. The platform went down not from abuse, but from scale.</p><p>✅ Fix: Implement the pyramid structure.
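</p><p>To make the pyramid concrete, here is a self-contained toy sketch of the layered check — two hand-rolled buckets with illustrative numbers, standing in for the global and per-user middleware a real adapter would provide:</p>

```typescript
// Toy token bucket; a production deployment would use the rateLimit
// middleware with a real adapter instead of hand-rolled state.
class Bucket {
  private tokens: number;
  constructor(private capacity: number, private refillPerSec: number) {
    this.tokens = capacity;
  }
  refill(elapsedSec: number): void {
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
  }
  tryConsume(cost = 1): boolean {
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}

// Pyramid: the global bucket guards aggregate backend capacity,
// then each user's bucket enforces per-user fairness.
const globalBucket = new Bucket(5000, 1000); // illustrative aggregate numbers
const perUser = new Map<string, Bucket>();

function allow(userId: string): boolean {
  if (!globalBucket.tryConsume()) return false; // coordinated load trips this first
  let bucket = perUser.get(userId);
  if (!bucket) perUser.set(userId, (bucket = new Bucket(30, 10)));
  return bucket.tryConsume();
}
```

<p>(A production limiter would also refund the global token when the per-user check rejects; the sketch omits that for brevity.)</p><p>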
Global limits upstream catch coordinated attacks by restricting aggregate capacity across all users and IPs — even if every single user stays within their quota, the sum cannot overwhelm the database. Per-user limits downstream ensure fairness. Both must exist.</p><h4>3. Not Accounting for Network Jitter (Subtle but Frequent)</h4><p><strong>The Failure Mode</strong>: Policies work perfectly in the lab but fail mysteriously in production due to real-world network behavior.</p><p>❌ Bad: Capacity 5, Rate 10/sec. Looks reasonable on paper: 0.5-second recovery.</p><p>On a flaky mobile network, packet loss causes TCP to retransmit and batch messages. The client bursts 5 messages in rapid succession. Bucket emptied. Every message after that waits on the refill. It feels like input lag.</p><p>At scale, 10% of users experience this on Tuesday afternoons when cellular networks are congested. Support tickets flood in. You revert the rollout.</p><p>✅ Fix: Size capacity to hold a few seconds’ worth of refill — enough to absorb a burst lasting that long. For a 10/sec rate, use capacity 30–50 (representing 3–5 seconds of refill), not 5. This absorbs temporary network spikes without being a security hole (recovery is still fast: 3–5 seconds).</p><h4>Other Considerations</h4><p>If building cost-based limits (advanced), ensure operation costs match actual resource consumption — misjudged weights starve expensive queries or enable cheap-operation spam. Cost assignment is iterative: start with educated guesses (database reads cost more than presence updates), then continuously validate against actual latencies and user behavior, adjusting weights every few weeks. For services where a connection’s identity changes mid-session (anonymous to authenticated), tie buckets to stable identifiers (session or device ID, not just user ID) to avoid losing tokens on login.
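</p><p>As a sketch, a key function keyed to a device identifier (field names illustrative, not any library’s API) keeps the bucket stable across the login transition:</p>

```typescript
// Derive the rate-limit bucket key from the most stable identifier
// available, so logging in doesn't swap the client onto a fresh bucket.
type ConnData = { userId?: string; deviceId?: string; ip: string };

function stableKey(data: ConnData): string {
  if (data.deviceId) return `device:${data.deviceId}`; // survives login
  if (data.userId) return `user:${data.userId}`;
  return `ip:${data.ip}`; // last resort for anonymous clients
}
```

<p>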
Finally, always layer limits — global upstream, per-operation downstream — so bugs in one handler don’t cascade.</p><h3>Testing and Tuning Your Policy</h3><p>Choosing initial values is only half the battle. The other half is validating them against real-world usage patterns.</p><h4>Load Testing Strategy</h4><p>Before deploying to production, validate your policy with synthetic load:</p><ol><li><strong>Baseline test</strong>: Run legitimate usage at 10x typical load. Verify that normal users don’t hit limits.</li><li><strong>Burst test</strong>: Simulate network jitter by batching messages. Ensure capacity absorbs temporary spikes without false positives.</li><li><strong>Abuse test</strong>: Run coordinated spam from multiple IPs/users. Verify that global and per-IP limits catch coordinated attacks before they saturate the system.</li><li><strong>Edge case test</strong>: Run mixed workloads (some users light, some heavy). Verify that fair distribution works as expected.</li></ol><h4>Monitoring During Rollout</h4><p>When deploying to production, ramp up gradually:</p><ul><li><strong>Week 1</strong>: Deploy to 5% of users. Monitor metrics hourly. Look for unexpected spikes in rejected requests.</li><li><strong>Week 2</strong>: Expand to 25% of users. Watch for patterns (time of day? geographic? user type?).</li><li><strong>Week 3–4</strong>: Full rollout with continued monitoring.</li></ul><p>Track these metrics:</p><ul><li><strong>Rate limit hit rate</strong> (by message type, user tier): Should be &lt;1% for legitimate traffic</li><li><strong>Histogram of tokens remaining at rejection</strong>: If users always have 0 tokens, you’re too strict. If they have plenty, you’re too loose.</li><li><strong>Time since last refill</strong>: How long do users wait after hitting a limit? Should match your recovery_time.</li><li><strong>P95 latency of rate limit checks</strong>: Keep &lt;1ms. 
Slow checks block your event loop.</li></ul><h4>Tuning Based on Real Data</h4><p>After 2–4 weeks of data, adjust:</p><ul><li><strong>If &lt;0.1% hit the limit</strong>: Your policy is too loose. Abuse may slip through unnoticed until users complain about the resulting spam.</li><li><strong>If 0.1–1% hit the limit</strong>: Good zone. Some legitimate power users hit it, but most don’t.</li><li><strong>If 1–5% hit the limit</strong>: Getting strict. Watch user support tickets for complaints about input lag or sluggish feel.</li><li><strong>If &gt;5% hit the limit</strong>: Too strict. You’re harming legitimate users.</li></ul><p>Also consider:</p><ul><li><strong>Seasonal patterns</strong>: Games may need stricter limits during tournaments, chat apps during product launches.</li><li><strong>User cohorts</strong>: Free users might have stricter limits than paid. Mobile users might have more generous limits due to network variance.</li><li><strong>Abuse trends</strong>: If a particular message type is being attacked, tighten its limit without affecting others.</li></ul><h3>Real-World Case Study: RoomChat</h3><p><strong>Setup</strong>: Collaborative room editor with real-time cursor positions and code editing. 10,000 concurrent users, averaging 2–3 Mbps egress during peak hours. Launch week: everything seemed fine. Week two: support tickets spiked.</p><p><strong>Initial policy</strong> (launched naively):</p><pre>Per-user per-type: capacity 200, rate 100/sec</pre><p><strong>Why this choice</strong>: Backend could handle ~100k msg/sec aggregate. With 10k users, averaging 10 msg/sec each felt reasonable. The team picked 100/sec as a “burst allowance” without much thought — it came from dividing remaining capacity by active users at a single snapshot.</p><p><strong>Week 1–2</strong>: <strong>The Discovery Phase</strong></p><p>Support tickets: “Rate limit?
I just pasted a code snippet” and “My cursor keeps disappearing.”</p><p>First instinct: check whether it’s users on weird network conditions or everyone being blocked equally.</p><pre>Oct 15 — rate_limit_hit_total: 2.3% of active users hitting limits<br>Oct 16 — pulled per-type hit rates: TEXT_MESSAGE 12%, CODE_PASTE 31%, CURSOR_POSITION 89%<br>Oct 17 — chart of hits by time of day: spiky, not uniform (but no clear pattern yet)</pre><p>Initial hypothesis: “Bucket’s too small, users are naturally bursty.” Increased capacity to 500 and shipped it midweek, fingers crossed.</p><p><strong>Week 2–3</strong>: <strong>The False Start</strong></p><p>Capacity 500 helped… sort of. Some metrics improved dramatically, others barely moved:</p><pre>CURSOR_POSITION hits: 89% → 47% (huge win, network jitter absorbed)<br>TEXT_MESSAGE hits: 12% → 9% (minor improvement)<br>CODE_PASTE hits: 31% → 19% (only 12-point drop, still high)</pre><p>But in the support channel: “I pasted a 50-line snippet and got rate limited.” Still happening at 19% — not acceptable.</p><p>Plus, during testing, one engineer noticed: rapidly paste 5 code blocks, and the server logs don’t show 5 evenly spaced CODE_PASTE messages arriving. They show them landing in one tight clump, milliseconds apart. TCP batching on 4G was real.</p><p><strong>Root cause discovery</strong> (messy, took longer than expected):</p><p>One engineer pulled wire timings from production — looking at actual message arrival patterns. First observation: code pastes weren’t evenly spaced. A user would send one paste, and the server would receive it as 2–3 back-to-back chunks within 50ms. The token bucket counted each chunk as a separate message.</p><p>Hypothesis: “Network batching is compressing things.” But how much?</p><p>Checked message sizes: text messages averaged 80 bytes, code pastes 2–4 KB. On a congested 4G network, the TCP stack groups multiple frames together.
A single logical “paste code block” operation becomes 3–4 physical packets arriving in rapid succession.</p><p>More investigation: ran a load test with intentional packet loss. At 5% loss, CODE_PASTE hit rate jumped from 19% to 34%. At 15% loss (simulating congested networks), it hit 41%.</p><p>Additionally discovered during an unrelated audit: 10k users × 100/sec = 1M msg/sec theoretical capacity. Reality check against database: the backend maxed out around 150k/sec sustained load. We were giving users a budget that the infrastructure couldn&#39;t actually handle.</p><p><strong>Week 3–4</strong>: <strong>Iterating (with setbacks)</strong></p><p>Strategy 1: Separate buckets by message type, uniform token cost.</p><pre>TEXT_MESSAGE: capacity 50, rate 20/sec<br>CODE_PASTE: capacity 80, rate 8/sec<br>CURSOR_POSITION: capacity 300, rate 100/sec</pre><p>Deployed. After 2 days, CODE_PASTE hits were down to 15% — progress but still unacceptable. Cursor felt smooth. Text felt fine. But code paste users were still complaining.</p><p>Late-week realization (and this was annoying): the issue wasn’t rate — it was <em>capacity</em>. Users could hit their CODE_PASTE bucket almost immediately during packet batching. Once empty, they had to wait 10 seconds for recovery. That felt broken.</p><p>Strategy 2: Different approach. Maybe code pastes cost more than text messages because they’re actually more expensive on the backend.</p><p>Measured actual server resource consumption:</p><ul><li>TEXT_MESSAGE: ~0.5ms processing</li><li>CODE_PASTE: 2.5–4ms processing (syntax highlighting, diff calculation, conflict detection)</li><li>CURSOR_POSITION: ~0.1ms processing</li></ul><p>Cost weighting: code paste takes 5–8x longer. 
So cost them more tokens.</p><p><strong>Final tuned policy</strong> (end of week 4):</p><pre>Global: 80,000 msg/sec (150k/sec max minus headroom, with some buffer for spikes)<br><br>Per-user per-type:<br>  TEXT_MESSAGE: capacity 50, rate 20/sec (1 token each)<br>  CODE_PASTE: capacity 100, rate 3/sec (5 tokens each; accounts for actual backend overhead)<br>  CURSOR_POSITION: capacity 500, rate 100/sec (1 token each; its cheapness shows up in the far higher rate, since costs must stay positive integers)</pre><p><strong>Rollout</strong> (week 5, then week 6):</p><p>Friday deploy: Canary to 5% of traffic. Monitored hourly over the weekend — hit rate hovered around 1.6–1.9%. Not perfect, but better than 2.3%.</p><p>Monday morning: Expanded to 30% (skipped the “25%” step; ops team wanted to move faster). Hit rate climbed to 2.1% during peak hours. Huh. Global limit + network spike during Monday morning activity. Pulled the release back to 5% after 4 hours.</p><p>Mid-week investigation: realized the global 80k limit was too tight for genuine peaks. Bumped to 95k. Re-deployed to 30%. Steadier at 1.3–1.7%.</p><p>Friday: Full rollout to 100%. First week: 1.4%, 1.8%, 1.1%, 1.9%, 1.5% day-to-day. Not the clean 0.8–1.2% range. More like 1–2%, with occasional spikes to 2.1% during product launch days or when heavy European users come online.</p><p>Current state (2 months later): still hovering around 1.3% average, with weeks ranging 0.9–2.2% depending on user activity patterns.</p><p><strong>What we learned from actual data</strong>:</p><ol><li><strong>Initial assumptions were confidently wrong</strong>. Dividing backend capacity by user count sounds logical; it’s still wrong. Real traffic doesn’t distribute evenly. One 10,000-user instance isn’t the same as ten 1,000-user instances.</li><li><strong>Network batching isn’t a theoretical problem</strong>. It shows up in production the moment users are on congested networks. Staging with 500 synthetic users showed 0.2% hit rate. Real-world 10k users on 4G networks: 2.3%.
The difference is TCP’s Nagle algorithm and network variance, not your code.</li><li><strong>Metrics lie until you understand them</strong>. “CODE_PASTE 31% hit rate” sounds bad. But investigation revealed: CODE_PASTE messages were 25x larger than TEXT, and they triggered 5–8x more backend work. A hit rate of 31% on expensive operations might be more acceptable than 12% on cheap ones. You need to measure resource cost, not just message count.</li><li><strong>Rollouts reveal what testing missed</strong>. The Friday canary revealed the baseline. Monday’s 2.1% spike (hitting the global limit) told us the 80k value was optimistic. Without real traffic, you can’t tune effectively.</li><li><strong>Tuning isn’t a one-time event</strong>. Two months in, we’re still between 0.9%-2.2% depending on the week. That’s not failure; that’s normal. The policy absorbs user behavior variation and network conditions without catastrophic failures. We adjust the global limit once every 2–3 weeks if we see consistent drift, but the per-user per-type strategy handles most variance automatically.</li></ol><p><strong>Current monitoring setup</strong>:</p><ul><li>Alert if &gt;3% of users hit rate limits in an hour (sudden policy drift or attack)</li><li>Track global limit rejection rate separately; if &gt;0.5% of requests hit it, check for DDoS or planned load spike</li><li>Weekly histogram of per-type hit rates; CODE_PASTE at 1–2% is expected, TEXT at &gt;1% means investigation</li><li>Correlate with customer support tickets; if code-heavy teams complain about “lags,” the policy might be too strict</li></ul><p><strong>Lessons learned</strong> (revised after production experience):</p><ol><li><strong>Assumptions require validation</strong>. The textbook formula (capacity × refill_rate = recovery_time) is useful for thinking but meaningless without real traffic data. Always A/B test in canary before rolling out.</li><li><strong>Cost-based limiting works, but requires measurement</strong>. 
You can’t eyeball token costs. Measure actual backend latency per operation. CODE_PASTE costing 5 tokens wasn’t a guess; it came from production profiling.</li><li><strong>Network jitter is your biggest wildcard</strong>. All your lab testing happens on stable networks. Production users on 4G, airplane WiFi, and congested corporate networks behave differently. Add 20–30% headroom to burst capacity; it’s not a waste, it’s insurance.</li><li><strong>Global limits are the safety net you hope never activates</strong>. During normal operation, they shouldn’t be hit. When they are (coordinated spam, load spike), that’s the only thing standing between your backend and meltdown. Don’t skimp on them.</li><li><strong>Tuning is iterative forever</strong>. There’s no “final policy.” Seasonal patterns (back-to-school, holidays, product launches) mean you’ll adjust limits 4–6 times per year. Build monitoring that makes adjustments quick and low-risk.</li></ol><h3>Implementation Checklist</h3><ul><li>Analyze your application’s traffic signature (chat? gaming? streaming?)</li><li>Define capacity and rate for each message type or operation</li><li>Document reasoning for chosen numbers (recovery time, user behavior, abuse scenarios)</li><li>Implement observability: log when limits are hit, by whom, how often</li><li>Test with synthetic load: simulate network jitter, bursts, and coordinated spam</li><li>Monitor in production for 2 weeks before declaring success</li><li>Review every 3 months: are users complaining? Is abuse rising? 
Adjust as needed.</li><li>Set up alerts for anomalies (sudden spike in limit hits, unusual patterns)</li><li>Use separate buckets for different message types (or cost-weight a single bucket)</li></ul><p><strong>Key metrics to track</strong>:</p><pre>rate_limit_exceeded_total (counter)<br>  Break down by message type and user cohort<br><br>rate_limit_bucket_tokens (gauge)<br>  Distribution of remaining tokens at time of rejection<br><br>rate_limit_check_latency (histogram)<br>  Cost of checking limits (target: &lt;1ms)</pre><h3>Putting It Into Practice</h3><p>Token bucket rate limiting is deceptively simple: add tokens, consume tokens, reject when empty. But designing fair policies that protect your backend without starving users requires understanding the trade-offs between capacity, refill rate, buckets, and costs.</p><h4>The Three-Phase Deployment Strategy</h4><p><strong>Phase 1: Design (1–2 weeks)</strong></p><ul><li>Analyze your app’s traffic signature using real production logs</li><li>Calculate baseline rates from P95 user behavior</li><li>Choose strategy: single bucket, per-user per-type, or cost-based</li><li>Build a simple rate limit simulator to test policies without deploying</li></ul><p><strong>Phase 2: Testing (1 week)</strong></p><ul><li>Run load tests at 5x and 10x typical peak load</li><li>Simulate network jitter and packet batching</li><li>Run abuse scenarios: coordinated spam, individual floods, mixed workloads</li><li>Document what settings work and what breaks</li></ul><p><strong>Phase 3: Gradual Rollout (4 weeks)</strong></p><ul><li>Week 1: Deploy to 5% of users (or one region)</li><li>Week 2: Expand to 25% as you build confidence</li><li>Week 3–4: Roll out to 100% with continued monitoring</li><li>Keep tuning decisions lightweight — you can adjust rates without redeploying</li></ul><h4>Common Implementation Patterns</h4><p>(Examples use memoryRateLimiter — swap for redisRateLimiter or durableObjectRateLimiter per 
deployment.)</p><p><strong>Pattern 1: Per-User Limits Only</strong></p><p>Simple, good for early-stage apps. Protects against individual users monopolizing resources but doesn’t prevent coordinated attacks.</p><pre>// Per-user rate limit<br>import { rateLimit } from &quot;@ws-kit/middleware&quot;;<br>import { memoryRateLimiter } from &quot;@ws-kit/adapters/memory&quot;;<br><br>const limiter = rateLimit({<br>  limiter: memoryRateLimiter({<br>    capacity: 100,<br>    tokensPerSecond: 20,<br>  }),<br>  key: (ctx) =&gt; {<br>    const user = ctx.ws.data?.userId ?? &quot;anon&quot;;<br>    return `user:${user}`;<br>  },<br>});<br><br>router.use(limiter);</pre><p><strong>Pattern 2: Tiered by Message Type</strong></p><p>Better for apps with mixed message costs. Text chat gets higher limits than expensive operations.</p><pre>// Define different limits per operation type<br>import { rateLimit } from &quot;@ws-kit/middleware&quot;;<br>import { memoryRateLimiter } from &quot;@ws-kit/adapters/memory&quot;;<br><br>const chatLimiter = rateLimit({<br>  limiter: memoryRateLimiter({ capacity: 50, tokensPerSecond: 20 }),<br>  key: (ctx) =&gt; `user:${ctx.ws.data?.userId ?? &quot;anon&quot;}`,<br>});<br><br>const uploadLimiter = rateLimit({<br>  limiter: memoryRateLimiter({ capacity: 5, tokensPerSecond: 1 }),<br>  key: (ctx) =&gt; `user:${ctx.ws.data?.userId ?? &quot;anon&quot;}`,<br>});<br><br>const searchLimiter = rateLimit({<br>  limiter: memoryRateLimiter({ capacity: 10, tokensPerSecond: 2 }),<br>  key: (ctx) =&gt; `user:${ctx.ws.data?.userId ?? &quot;anon&quot;}`,<br>});<br><br>// Register each limiter with its specific message type<br>router.use(ChatSendMessage, chatLimiter);<br>router.use(FileUploadMessage, uploadLimiter);<br>router.use(SearchHistoryMessage, searchLimiter);</pre><p><strong>Pattern 3: Cost-Based (Most Sophisticated)</strong></p><p>Single bucket per user, costs scale with operation impact. 
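</p><p>The cost-based limiter shown earlier in this article is the real version of this pattern; as a self-contained toy (illustrative costs, refill omitted), the core mechanic is a single bucket with weighted debits:</p>

```typescript
// One bucket per user; each operation debits a weight proportional
// to its measured backend cost. Costs below are illustrative.
const COSTS: Record<string, number> = {
  ChatSend: 1,
  FileUpload: 20,
  HistorySearch: 15,
};

const CAPACITY = 30;
const remainingByUser = new Map<string, number>();

function consume(userId: string, op: string): boolean {
  const cost = COSTS[op] ?? 1; // unknown operations default to cheap
  const remaining = remainingByUser.get(userId) ?? CAPACITY;
  if (remaining < cost) return false; // expensive ops are rejected first
  remainingByUser.set(userId, remaining - cost);
  return true;
}
```

<p>In the library, the cost callback passed to rateLimit plays the role of COSTS here.</p><p>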
Best for mature apps where you’ve measured actual costs.</p><p><strong>Selecting Your Pattern</strong>:</p><ul><li><strong>Early stage</strong>: Start with Pattern 1 (per-user only)</li><li><strong>Multiple message types</strong>: Graduate to Pattern 2 (per-type limits)</li><li><strong>Mature, complex API</strong>: Pattern 3 (cost-based) provides the most control</li></ul><h4>Quick Start</h4><ol><li>Analyze your app’s traffic signature (chat? gaming? streaming?)</li><li>Pick initial capacity and rate from the domain-specific recommendations above</li><li>Choose your strategy: simple per-user, per-type, or cost-based</li><li>Deploy to a small cohort and monitor for 2 weeks</li><li>Tune based on real-world feedback: Are users complaining? Is abuse rising?</li></ol><p>The case study shows that one generic policy rarely survives contact with production. But by reasoning about capacity and refill rate as levers that model your workload, you move from reactive firefighting to proactive, confident tuning.</p><p>Fair rate limiting isn’t about saying “no” more often. It’s about saying “yes” predictably, protecting the system when necessary, and building a platform where both legitimate users and your backend infrastructure can thrive. 
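</p><p>As a closing sanity check, the sizing rule used throughout — capacity ≈ refill_rate × desired recovery time — takes only a few lines to encode:</p>

```typescript
// How long an emptied bucket needs to refill completely.
function recoverySeconds(capacity: number, tokensPerSecond: number): number {
  return capacity / tokensPerSecond;
}

// Derive capacity from a target recovery window instead of guessing.
function capacityFor(tokensPerSecond: number, targetRecoverySec: number): number {
  return Math.round(tokensPerSecond * targetRecoverySec);
}
```

<p>capacityFor(10, 3) reproduces the capacity-30 policy recommended earlier, while recoverySeconds(1000, 10) exposes the 100-second burst window from Mistake #1.</p><p>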
When done right, users won’t notice the limit is there — they’ll just experience a fast, fair, and stable service.</p><h3>Further Reading</h3><ul><li>Token bucket algorithm (<a href="https://en.wikipedia.org/wiki/Token_bucket">Wikipedia</a>)</li><li>Rate limiting patterns (<a href="https://discord.com/developers/docs/topics/rate-limits">Discord Developer Docs</a>)</li><li>Cost-based rate limiting (<a href="https://shopify.engineering/rate-limiting-graphql-apis-calculating-query-complexity">Shopify Engineering</a>)</li><li>Rate limiting middleware in WS-Kit (<a href="https://kriasoft.com/ws-kit/guides/rate-limiting">Guide</a>)</li></ul><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=289b00eb4435" width="1" height="1" alt=""><hr><p><a href="https://levelup.gitconnected.com/designing-fair-token-bucket-policies-for-real-time-apps-289b00eb4435">Designing Fair Token Bucket Policies for Real-Time Apps</a> was originally published in <a href="https://levelup.gitconnected.com">Level Up Coding</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Beyond the Lock: Why Fencing Tokens Are Essential]]></title>
            <link>https://levelup.gitconnected.com/beyond-the-lock-why-fencing-tokens-are-essential-5be0857d5a6a?source=rss-692b968dbc82------2</link>
            <guid isPermaLink="false">https://medium.com/p/5be0857d5a6a</guid>
            <category><![CDATA[programming]]></category>
            <category><![CDATA[javascript]]></category>
            <category><![CDATA[nodejs]]></category>
            <category><![CDATA[web-development]]></category>
            <category><![CDATA[typescript]]></category>
            <dc:creator><![CDATA[Konstantin Tarkus]]></dc:creator>
            <pubDate>Mon, 20 Oct 2025 03:35:49 GMT</pubDate>
            <atom:updated>2025-10-20T03:35:49.635Z</atom:updated>
            <content:encoded><![CDATA[<blockquote>A lock isn’t enough. Discover how fencing tokens prevent data corruption from stale locks and “zombie” processes.</blockquote><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*g_74YCdIXoAJTgkzgR8WFQ.png" /></figure><p>Your payment processor just charged a customer twice. Your inventory system thinks you have 47 widgets when there are only 23. Both disasters happened <em>despite</em> using distributed locks. The culprit? Your process held a lock that had already expired, became a “zombie,” and corrupted data while believing it was still protected.</p><p>This isn’t a rare edge case. It’s a fundamental property of distributed systems that most developers don’t discover until production.</p><h3>The Illusion of Safety</h3><p>Distributed locks feel safe. You acquire a lock from Redis or your database, perform your critical work, and release it. The pattern is simple:</p><pre>import { createLock } from &quot;syncguard/redis&quot;;<br>import Redis from &quot;ioredis&quot;;<br><br>const redis = new Redis();<br>const lock = createLock(redis);<br><br>// UNSAFE: No protection against stale locks<br>async function processPayment(orderId: string) {<br>  await lock(<br>    async () =&gt; {<br>      const order = await db.getOrder(orderId);<br>      await paymentGateway.charge(order.amount);<br>      await db.updateOrder(orderId, { status: &quot;paid&quot; });<br>    },<br>    { key: `payment:${orderId}`, ttlMs: 30000 },<br>  );<br>}</pre><p>This code <em>looks</em> correct. You’re using a proper distributed lock with a 30-second timeout. But there’s a critical flaw that becomes visible only under production conditions.</p><p>The problem is subtle: <strong>you’re treating a distributed lock as a binary state</strong> (locked/unlocked), just like an in-process mutex. But a distributed lock isn’t a mutex. 
It’s a <em>lease</em> — a time-bound grant of exclusive access that can expire while you’re still using it.</p><h3>When Locks Lie: The Zombie Process Problem</h3><p>Here’s what actually happens in production:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Y7KiYeSft2Ky84q-DyPjgw.png" /></figure><p><strong>Timeline</strong>:</p><p><strong>T=0s</strong>: Process A acquires the lock with a 30-second TTL and starts processing a payment.</p><p><strong>T=5s</strong>: Process A enters a stop-the-world garbage collection pause. In the JVM, these pauses can last for <em>minutes</em> in pathological cases. But even a 35-second pause is enough to break everything.</p><p><strong>T=30s</strong>: The lock expires in Redis. From the lock service’s perspective, Process A has died.</p><p><strong>T=31s</strong>: Process B successfully acquires the same lock and begins processing the payment.</p><p><strong>T=33s</strong>: Process B charges the customer’s card and updates the database.</p><p><strong>T=40s</strong>: Process A wakes up from its GC pause, completely unaware that 35 seconds have passed. From its perspective, only microseconds elapsed. It believes it still holds the lock.</p><p><strong>T=41s</strong>: Process A charges the customer’s card <em>again</em> and overwrites Process B’s database update.</p><p><strong>Result</strong>: The customer is charged twice. Data corruption. A production incident.</p><p>This isn’t a bug in your lock implementation. This isn’t a bug in Redis. <strong>This is how distributed systems work</strong>. A process can be paused at any moment by:</p><ul><li>Garbage collection (stop-the-world pauses lasting seconds or even minutes)</li><li>OS process preemption (your process gets swapped out)</li><li>Virtual memory page faults (requires slow disk I/O)</li><li>Network delays (requests hang for seconds or minutes)</li></ul><p>The fundamental issue: <strong>only two parties are involved</strong> — the client and the lock manager. 
The client thinks it holds the lock. The lock manager knows the lease expired. But there’s no one to stop the client from proceeding with stale authorization.</p><h3>The Three-Party Protocol: Enter Fencing Tokens</h3><p>The solution requires a mental model shift. We need a third party to validate whether operations should be accepted: the resource itself.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*NgCECmj1YcWD3iiIj-WXog.png" /></figure><p>A <strong>fencing token</strong> is a monotonically increasing number issued by the lock service with every successful lock acquisition. Each time any process acquires the lock for a given resource, the token increases. Process A gets token 33. When the lock expires and Process B acquires it, Process B gets token 34.</p><p>The protocol works like this:</p><ol><li><strong>Client acquires lock and receives a token</strong>: { ok: true, lockId: &quot;...&quot;, fence: &quot;000000000000033&quot; }</li><li><strong>Client includes the token in every write</strong>: All operations to the protected resource must carry the fence token</li><li><strong>Resource checks the token</strong>: Before executing any write, the resource compares the incoming token against the last token it saw</li><li><strong>Resource rejects stale tokens</strong>: If incoming_token &lt;= last_seen_token, reject the write</li><li><strong>Resource accepts and updates</strong>: If incoming_token &gt; last_seen_token, accept the write and store the new token</li></ol><p>Now let’s replay the zombie process scenario with fencing tokens:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Zx-HyURwZhHsV9TDhlDsng.png" /></figure><p><strong>Timeline</strong>:</p><p><strong>T=0s</strong>: Process A acquires the lock and receives token 33, then enters a GC pause.</p><p><strong>T=31s</strong>: Lock expires. 
Process B acquires the lock and receives token 34.</p><p><strong>T=33s</strong>: Process B charges the payment gateway (unfenced operation) and writes to the database with token 34. The database validates 34 &gt; null, accepts the write, and stores 34 as the last-seen token.</p><p><strong>T=40s</strong>: Process A wakes up from its GC pause. It charges the payment gateway again (creating a duplicate charge), then attempts to update the database with its stale token 33.</p><p><strong>T=41s</strong>: The database validates: 33 &gt; 34 is false. <strong>Write rejected</strong>. The database responds with an error.</p><p><strong>Result</strong>: Database integrity preserved — the zombie process cannot corrupt order state. However, the payment gateway was charged twice because it doesn’t participate in fencing. This demonstrates why idempotency keys are needed for external APIs (covered in “When Fencing Isn’t Possible”).</p><p>The key insight: <strong>the resource doesn’t trust the client’s claim of holding the lock</strong>. The resource validates the token against reality. Even a process with a stale view of the world cannot corrupt data.</p><h3>How SyncGuard Implements Fencing Tokens</h3><p><a href="https://kriasoft.com/syncguard/">SyncGuard</a> provides fencing tokens out-of-the-box for all its backends (Redis, PostgreSQL, Firestore). The implementation varies by backend, but the API is consistent.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*41Y5PTcT5BJnouJQGPITGQ.png" /></figure><h3>Backend Implementation</h3><p><strong>Redis</strong>: Uses atomic INCR on a per-key fence counter. 
The increment and lock acquisition happen in a single Lua script for atomicity:</p><pre>-- Within the atomic acquire script<br>local fenceKey = KEYS[3]  -- Per-resource counter key<br>local fence = string.format(&quot;%015d&quot;, redis.call(&#39;INCR&#39;, fenceKey))<br>-- Store fence in lock data and return it</pre><p><strong>PostgreSQL</strong>: Uses a dedicated fence_counters table with database-enforced atomicity. The counter increment happens within the same transaction as lock acquisition.</p><p><strong>Firestore</strong>: Uses Firestore transactions with per-key counter documents. The transaction ensures the counter increment and lock creation are atomic.</p><p>All backends return fence tokens as <strong>15-digit zero-padded strings</strong> (e.g., &quot;000000000000042&quot;):</p><ul><li>Monotonically increasing per resource key</li><li>Lexicographically comparable (use string comparison: fenceA &gt; fenceB)</li><li>Guaranteed unique even across crashes and restarts</li><li>No parsing needed — just compare strings directly</li></ul><h3>Resource-Side Implementation</h3><p>The resource (your database, file system, or API) must actively participate in the fencing protocol. This requires three steps:</p><p><strong>1. Add a fence token column to your data model</strong>:</p><pre>ALTER TABLE orders ADD COLUMN last_fence_token VARCHAR(15);</pre><p><strong>2. Validate fence tokens on every write</strong>:</p><pre>UPDATE orders<br>SET<br>  status = $1,<br>  last_fence_token = $2,<br>  updated_at = NOW()<br>WHERE<br>  order_id = $3<br>  -- CRITICAL: accept only strictly greater tokens (NULL = never fenced yet)<br>  AND (last_fence_token IS NULL OR last_fence_token &lt; $2)<br>RETURNING *;</pre><p><strong>3.
Check the result to detect fenced-out operations</strong>:</p><pre>async function updateOrderWithFencing(<br>  orderId: string,<br>  updates: { status: string },<br>  fence: string,<br>): Promise&lt;boolean&gt; {<br>  const result = await db.query(<br>    `UPDATE orders<br>     SET status = $1, last_fence_token = $2, updated_at = NOW()<br>     WHERE order_id = $3<br>       AND (last_fence_token IS NULL OR last_fence_token &lt; $2)<br>     RETURNING *`,<br>    [updates.status, fence, orderId],<br>  );<br><br>  // If no rows updated, our fence token was stale<br>  return result.rowCount &gt; 0;<br>}</pre><h3>Putting It All Together</h3><p>Here’s the safe pattern with <a href="https://kriasoft.com/syncguard/">SyncGuard</a>:</p><pre>import { createRedisBackend } from &quot;syncguard/redis&quot;;<br>import Redis from &quot;ioredis&quot;;<br><br>const redis = new Redis();<br>const backend = createRedisBackend(redis);<br><br>// SAFE: Database validates fence token before accepting writes<br>async function processPayment(orderId: string) {<br>  await using lock = await backend.acquire({<br>    key: `payment:${orderId}`,<br>    ttlMs: 30000,<br>  });<br><br>  if (!lock.ok) {<br>    throw new Error(&quot;Could not acquire lock&quot;);<br>  }<br><br>  // Extract the fence token - a zero-padded, monotonically increasing string<br>  const { fence } = lock; // e.g., &quot;000000000000042&quot;<br><br>  const order = await db.getOrder(orderId);<br>  await paymentGateway.charge(order.amount);<br><br>  // Database validates: only accepts writes with fence &gt; last_seen_fence<br>  const updated = await db.updateOrderWithFencing(<br>    orderId,<br>    { status: &quot;paid&quot; },<br>    fence,<br>  );<br><br>  if (!updated) {<br>    // Our fence token was stale - another process with a higher token won<br>    // This means our lock expired and we&#39;re a &quot;zombie process&quot;<br>    throw new Error(&quot;Stale lock - operation rejected by resource&quot;);<br>  }<br><br>  // Lock automatically released when exiting &#39;await 
using&#39; block<br>}</pre><p>If Process A pauses during payment processing and its lock expires, Process B will acquire a new lock with a higher fence token. When Process A wakes up and attempts to update the database with its stale fence token, the database rejects the write. The payment is processed exactly once.</p><h3>When Fencing Isn’t Possible</h3><p>Not all systems can participate in the fencing protocol. Third-party REST APIs, legacy systems, or external services may not support custom token validation. In these cases, you have several options:</p><p><strong>Idempotency Keys</strong>: Many payment gateways and external APIs support idempotency keys. Use a unique request ID (like {orderId}-{fence}) to prevent duplicate processing:</p><pre>await paymentGateway.charge({<br>  amount: order.amount,<br>  idempotencyKey: `order-${orderId}-${fence}`,<br>});</pre><p><strong>Optimistic Concurrency Control</strong>: Use version numbers or ETags if the external system supports them. Before updating, check that the version hasn’t changed since you read it.</p><p><strong>Move to a Fencing-Capable Resource</strong>: Use your own database as a proxy. Instead of writing directly to the external API, write to your database with fence token validation, then have a separate process (idempotent worker) sync to the external system.</p><p><strong>Compensating Transactions</strong>: Design operations to be reversible. If you detect a duplicate operation after the fact, have a process to undo it.</p><p>The key principle: <strong>if you can’t validate fence tokens at the resource, you must use another mechanism to ensure idempotency</strong>.</p><h3>When You Need Fencing Tokens</h3><p>Not every lock requires fencing tokens. 
The decision depends on what the lock is protecting.</p><p><strong>You NEED fencing tokens when</strong>:</p><ul><li>The lock is for <em>correctness</em>, not just efficiency</li><li>Failures would cause data corruption, not just duplicate work</li><li>Examples: financial transactions, inventory updates, order state machines, account balance modifications</li></ul><p><strong>You might NOT need fencing tokens when</strong>:</p><ul><li>The lock is for <em>efficiency</em> only (e.g., preventing duplicate cache computations)</li><li>Your system can tolerate occasional duplicates</li><li>Idempotency alone provides sufficient protection</li><li>Operations are commutative (order doesn’t matter)</li></ul><p><strong>Architectural alternatives to consider</strong>:</p><ul><li><strong>Idempotency keys</strong>: For external APIs that support them</li><li><strong>Optimistic concurrency control</strong>: Use version numbers or timestamps</li><li><strong>Event sourcing</strong>: Immutable append-only logs eliminate update conflicts</li><li><strong>CRDTs</strong>: For operations that are naturally commutative</li></ul><p>The rule of thumb: if a duplicate or out-of-order operation would corrupt your data, you need either fencing tokens or an equivalent mechanism.</p><h3>The Bigger Picture: Locks vs Leases</h3><p>The fundamental lesson is a shift in mental model. A distributed lock is not a mutex. It’s a <strong>lease</strong> — a time-bound, probabilistic grant of permission.</p><p><strong>Leases can expire while you’re using them</strong>. This is not a failure mode. This is normal operation in distributed systems. Process pauses, network delays, and clock skew are not bugs to be fixed — they are fundamental properties of the environment.</p><p>Fencing tokens upgrade this probabilistic safety to <strong>deterministic correctness</strong>. Instead of hoping your process doesn’t pause, you build a system where even a paused process cannot cause harm. 
The resource becomes the final arbiter of operation validity.</p><p>This is the essence of defensive programming in distributed systems: <strong>assume your view of the world is stale</strong>. Don’t trust the client’s claim of holding a lock. Validate at the resource level with monotonically increasing tokens.</p><h3>Conclusion</h3><p>If you’re using distributed locks for data correctness, and you’re not using fencing tokens (or an equivalent mechanism), you have a latent data corruption bug. It’s not a matter of “if” but “when.”</p><p>The zombie process problem is real. GC pauses, network delays, and process preemption happen in production. Your distributed lock will expire while your process is paused. Without fencing tokens, that process will wake up and corrupt your data.</p><p>Fencing tokens solve this problem by making the resource an active participant in the safety protocol. The resource doesn’t trust the client’s claim of authorization — it validates every operation against a monotonically increasing token.</p><p>The cost is modest: an extra column in your database, an extra check in your write queries. The benefit is enormous: <strong>deterministic correctness</strong> instead of probabilistic hope.</p><p>Build systems that are safe by design. Use fencing tokens.</p><p><strong><em>SyncGuard</em></strong><em> is a TypeScript library that provides distributed locking with built-in fencing token support for Redis, PostgreSQL, and Firestore. 
Learn more at </em><a href="https://kriasoft.com/syncguard/"><em>kriasoft.com/syncguard/</em></a><em> or check out the source code on </em><a href="https://github.com/kriasoft/syncguard"><em>GitHub</em></a><em>.</em></p><hr><p><a href="https://levelup.gitconnected.com/beyond-the-lock-why-fencing-tokens-are-essential-5be0857d5a6a">Beyond the Lock: Why Fencing Tokens Are Essential</a> was originally published in <a href="https://levelup.gitconnected.com">Level Up Coding</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[GitHub Security Notifications for Discord]]></title>
            <link>https://medium.com/@koistya/github-security-notifications-for-discord-3fd8ca627a97?source=rss-692b968dbc82------2</link>
            <guid isPermaLink="false">https://medium.com/p/3fd8ca627a97</guid>
            <category><![CDATA[discord]]></category>
            <category><![CDATA[devops]]></category>
            <category><![CDATA[software-development]]></category>
            <category><![CDATA[github]]></category>
            <category><![CDATA[security]]></category>
            <dc:creator><![CDATA[Konstantin Tarkus]]></dc:creator>
            <pubDate>Sun, 28 Sep 2025 10:07:01 GMT</pubDate>
            <atom:updated>2025-09-28T10:22:10.978Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Liz5MBsP2iug_z1ixO7ONg.png" /></figure><p>A practical guide for setting up automated security notifications from GitHub repositories to Discord channels.</p><h3>Why Security Notifications Matter</h3><p>Security events in your repositories need immediate attention. This guide helps you configure automated notifications so your team stays informed about:</p><ul><li>Vulnerability discoveries and fixes</li><li>Security feature bypasses or disabled protections</li><li>Unauthorized access changes</li><li>Secret leaks and scanning results</li></ul><h3>Important Security Limitations</h3><p>⚠️ Discord Webhook Security Notice:</p><ul><li>Discord webhooks have no built-in authentication mechanism</li><li>Anyone with the webhook URL can send messages to your channel</li><li>There is no way to verify that messages come from GitHub</li><li>Treat webhook URLs as secrets and never expose them publicly</li></ul><h3>Discord Setup</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*W9b9DoCVS13n5H9NqrISlw.png" /></figure><h3>1. Create a Private Security Channel</h3><pre>Right-click your Discord server → Create Channel<br>Name: #security (or #security-alerts)<br>Type: Text Channel<br>Private Channel: ✅ Enable<br>Permissions: Only security team members</pre><h3>2. Generate Webhook URL</h3><pre>Channel Settings → Integrations → Webhooks → New Webhook<br>Name: GitHub Security<br>Avatar: Upload GitHub logo (optional)<br>Copy Webhook URL → Save for GitHub configuration</pre><h3>GitHub Configuration</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*tuScGsmEz-XzfInLBKVv5w.png" /></figure><h3>1. 
Repository Webhooks</h3><p>Navigate to: Settings → Webhooks → Add webhook</p><pre>Payload URL: [Your Discord webhook URL]/github<br>Content type: application/json<br>Secret: [Optional - validates requests FROM GitHub, but Discord cannot verify signatures]<br>SSL verification: Enable SSL verification</pre><p>⚠️ Important: GitHub secrets only validate that webhooks come FROM GitHub to prevent spoofing. Discord webhooks have NO signature validation capability — Discord accepts any properly formatted request to the webhook URL. For signature validation, use a proxy service between GitHub and Discord.</p><h3>2. Critical Security Events</h3><p>Enable these events for immediate notification:</p><pre>☑️ Code scanning alerts - Code scanning alert created, fixed in branch, or closed<br>☑️ Secret scanning alerts - Secrets scanning alert created, resolved, reopened, validated, or publicly leaked<br>☑️ Secret scanning alert locations - Secrets scanning alert location created<br>☑️ Dependabot alerts - Dependabot alert auto_dismissed, auto_reopened, created, dismissed, reopened, fixed, or reintroduced<br>☑️ Repository vulnerability alerts - (Note: Being deprecated, use Dependabot alerts)<br>☑️ Security and analyses - When security features are enabled/disabled<br>☑️ Repository advisories - Security advisories published for the repo</pre><h3>3. Access Control Events</h3><p>Enable for access monitoring:</p><pre>☑️ Branch protection configurations - All protections enabled/disabled<br>☑️ Branch protection rules - Individual rules created/edited/deleted<br>☑️ Collaborator add, remove, or changed - Team member access changes<br>☑️ Deploy keys - Deployment key additions/removals<br>☑️ Visibility changes - Repository made public/private</pre><h3>4. 
Security Bypass Events</h3><p>Enable for policy compliance:</p><pre>☑️ Dismissal requests for code scanning alerts - Alert dismissal tracking<br>☑️ Dismissal requests for secret scanning alerts - Secret dismissal tracking<br>☑️ Bypass requests for push rulesets - Push rule bypass requests<br>☑️ Bypass requests for secret scanning push protections - Secret push bypass</pre><h3>Best Practices</h3><h3>Channel Management</h3><ul><li>Keep the security channel private and restricted</li><li>Add only security team members and repository maintainers</li><li>Use thread discussions for detailed investigation</li><li>Pin important security policies and contacts</li></ul><h3>Notification Hygiene</h3><ul><li>Start with critical events only, expand gradually</li><li>Review and tune notifications weekly for first month</li><li>Document response procedures for each alert type</li><li>Set up on-call rotation for critical alerts</li></ul><h3>Response Workflows</h3><ul><li>Acknowledge alerts within 15 minutes during business hours (organizational policy)</li><li>Assign owner for each security issue</li><li>Use GitHub issue templates for security incident tracking</li><li>Post resolution summaries back to the channel</li></ul><p>Note: Response times are organizational recommendations based on industry standards for critical security incidents, not technical requirements from GitHub or Discord.</p><h3>Testing Your Setup</h3><ol><li>Configure the webhook in your repository</li><li>Make a simple change (e.g., push a commit or create an issue)</li><li>Check Discord channel for the notification</li><li>Note: The GitHub “Test” button returns a 400 error because GitHub’s test payload format doesn’t match Discord’s expected message schema. 
This is normal — the webhook is working correctly.</li><li>For security events, try creating a test file with a fake API key to trigger secret scanning</li></ol><h3>Event Priority Levels</h3><h3>🚨 Critical (Immediate Response Required)</h3><ul><li>Secret scanning alerts (publicly leaked)</li><li>Code scanning alerts (high/critical severity)</li><li>Repository vulnerability alerts (high/critical CVEs)</li><li>Security features disabled</li></ul><h3>⚠️ High (Response Within Hours)</h3><ul><li>Dependabot alerts (high/critical)</li><li>Branch protection disabled</li><li>Unauthorized collaborator changes</li><li>Deploy key modifications</li></ul><h3>ℹ️ Medium (Monitor and Review)</h3><ul><li>Security bypass requests</li><li>Alert dismissal requests</li><li>Branch protection rule changes</li><li>Completed security scans</li></ul><h3>Advanced Configuration (Optional)</h3><h3>Multiple Repositories</h3><p>For teams managing multiple repositories:</p><ul><li>Use organization-level webhooks for centralized management</li><li>Create separate channels for different severity levels (#security-critical, #security-info)</li><li>Start simple: same webhook for all repos, then customize as needed</li></ul><h3>Simple Enhancements</h3><ul><li>GitHub Actions: Filter events before sending to Discord (example: only notify for public repos)</li><li>Discord Bots: Use bots for two-way communication (acknowledge alerts from Discord)</li><li>Monitoring: Set up a simple daily/weekly summary of security events</li></ul><h3>Troubleshooting</h3><p>Webhook not triggering:</p><ul><li>Verify webhook URL format includes /github suffix (required for GitHub integration)</li><li>Check GitHub webhook delivery logs in Settings → Webhooks → Recent Deliveries</li><li>Ensure Discord channel permissions allow webhook posts</li><li>Note: GitHub’s “test” payload will show a 400 error — this is normal</li></ul><p>Missing notifications:</p><ul><li>Review selected events in GitHub webhook configuration</li><li>Test 
with a simple push event first</li><li>Check Discord server notification settings</li></ul><p>Too many notifications:</p><ul><li>Start with critical events only</li><li>Use Discord thread mode for detailed discussions</li><li>Consider time-based filtering for non-critical events</li><li>Discord rate limit: ~5 requests per 2 seconds per webhook</li></ul><h3>Security Best Practices</h3><h3>Webhook URL Protection</h3><ul><li>Never expose webhook URLs in client-side code or public repositories</li><li>Store webhook URLs in environment variables or secure secret management systems</li><li>Rotate webhook URLs quarterly or immediately if exposed</li><li>Use .gitignore to exclude any files containing webhook URLs</li></ul><h3>Webhook Rotation Procedure</h3><ol><li>Create new webhook in Discord (keep old one active)</li><li>Update GitHub webhook configuration with new URL</li><li>Test new webhook with a commit or issue</li><li>Once confirmed working, delete old Discord webhook</li><li>Document rotation in security log</li></ol><h3>Additional Security Measures</h3><ul><li>Consider using a proxy service between GitHub and Discord for additional validation</li><li>Implement monitoring for unusual webhook activity patterns</li><li>Use Discord bots with proper authentication for sensitive operations</li><li>Restrict channel access to security team members only</li></ul><p>Security Reminder: Discord webhooks cannot authenticate senders. Any service or person with the webhook URL can post messages. This is a fundamental limitation of Discord’s webhook system.</p><p>🚨 Critical: If a webhook URL is ever exposed or leaked, rotate it immediately. Exposed webhook URLs are compromised security credentials.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Building a Localhost OAuth Callback Server in Node.js]]></title>
            <link>https://levelup.gitconnected.com/building-a-localhost-oauth-callback-server-in-node-js-866be0765d44?source=rss-692b968dbc82------2</link>
            <guid isPermaLink="false">https://medium.com/p/866be0765d44</guid>
            <category><![CDATA[automation]]></category>
            <category><![CDATA[oauth]]></category>
            <category><![CDATA[javascript]]></category>
            <category><![CDATA[typescript]]></category>
            <category><![CDATA[nodejs]]></category>
            <dc:creator><![CDATA[Konstantin Tarkus]]></dc:creator>
            <pubDate>Tue, 19 Aug 2025 17:15:14 GMT</pubDate>
            <atom:updated>2025-08-19T17:15:14.593Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*uSzhqFHJLNbYXfmr5BSliw.png" /><figcaption>npm: oauth-callback</figcaption></figure><p>When building CLI tools or desktop applications that integrate with OAuth providers, you face a unique challenge: how do you capture the authorization code when there’s no public-facing server to receive the callback? The answer lies in a clever technique that’s been right under our noses — spinning up a temporary localhost server to catch the OAuth redirect.</p><p>This tutorial walks through building a production-ready OAuth callback server that works across Node.js, Deno, and Bun. We’ll cover everything from the basic HTTP server setup to handling edge cases that trip up most implementations.</p><h3>Understanding the OAuth Callback Flow</h3><p>Before diving into code, let’s clarify what we’re building. In a typical OAuth 2.0 authorization code flow, your application redirects users to an authorization server (like Notion or Linear), where they grant permissions. The authorization server then redirects back to your application with an authorization code.</p><p>For web applications, this redirect goes to a public URL. But for CLI tools and desktop apps, we use a localhost URL — typically http://localhost:3000/callback. The OAuth provider redirects to this local address, and our temporary server captures the authorization code from the query parameters.</p><p>This approach is explicitly blessed by OAuth 2.0 for Native Apps (RFC 8252) and is used by major tools like the GitHub CLI and Google’s OAuth libraries.</p><h3>Setting Up the Basic HTTP Server</h3><p>The first step is creating an HTTP server that can listen on localhost. 
Modern JavaScript runtimes provide different APIs for this, but we can abstract them behind a common interface using Web Standards Request and Response objects.</p><pre>interface CallbackServer {<br>  start(options: ServerOptions): Promise&lt;void&gt;;<br>  waitForCallback(path: string, timeout: number): Promise&lt;CallbackResult&gt;;<br>  stop(): Promise&lt;void&gt;;<br>}<br><br>function createCallbackServer(): CallbackServer {<br>  // Runtime detection<br>  if (typeof Bun !== &quot;undefined&quot;) return new BunCallbackServer();<br>  if (typeof Deno !== &quot;undefined&quot;) return new DenoCallbackServer();<br>  return new NodeCallbackServer();<br>}</pre><p>Each runtime implementation follows the same pattern: create a server, listen for requests, and resolve a promise when the callback arrives. Here’s the Node.js version that bridges between Node’s http module and Web Standards:</p><pre>class NodeCallbackServer implements CallbackServer {<br>  private server?: http.Server;<br>  private callbackPromise?: {<br>    resolve: (result: CallbackResult) =&gt; void;<br>    reject: (error: Error) =&gt; void;<br>  };<br><br>  async start(options: ServerOptions): Promise&lt;void&gt; {<br>    const { createServer } = await import(&quot;node:http&quot;);<br><br>    return new Promise((resolve, reject) =&gt; {<br>      this.server = createServer(async (req, res) =&gt; {<br>        const request = this.nodeToWebRequest(req, options.port);<br>        const response = await this.handleRequest(request);<br><br>        res.writeHead(<br>          response.status,<br>          Object.fromEntries(response.headers.entries()),<br>        );<br>        res.end(await response.text());<br>      });<br><br>      this.server.listen(options.port, options.hostname, resolve);<br>      this.server.on(&quot;error&quot;, reject);<br>    });<br>  }<br><br>  private nodeToWebRequest(req: http.IncomingMessage, port: number): Request {<br>    const url = new URL(req.url!, 
`http://localhost:${port}`);<br>    const headers = new Headers();<br><br>    for (const [key, value] of Object.entries(req.headers)) {<br>      if (typeof value === &quot;string&quot;) {<br>        headers.set(key, value);<br>      }<br>    }<br><br>    return new Request(url.toString(), {<br>      method: req.method,<br>      headers,<br>    });<br>  }<br>}</pre><p>The beauty of this approach is that once we convert to Web Standards, the actual request handling logic is identical across all runtimes.</p><h3>Capturing the OAuth Callback</h3><p>The heart of our server is the callback handler. When the OAuth provider redirects back, we need to extract the authorization code (or error) from the query parameters:</p><pre>private async handleRequest(request: Request): Promise&lt;Response&gt; {<br>  const url = new URL(request.url);<br><br>  if (url.pathname === this.callbackPath) {<br>    const params: CallbackResult = {};<br><br>    // Extract all query parameters<br>    for (const [key, value] of url.searchParams) {<br>      params[key] = value;<br>    }<br><br>    // Resolve the waiting promise<br>    if (this.callbackPromise) {<br>      this.callbackPromise.resolve(params);<br>    }<br><br>    // Return success page to the browser<br>    return new Response(this.generateSuccessHTML(), {<br>      status: 200,<br>      headers: { &quot;Content-Type&quot;: &quot;text/html&quot; }<br>    });<br>  }<br><br>  return new Response(&quot;Not Found&quot;, { status: 404 });<br>}</pre><p>Notice how we capture all query parameters, not just the authorization code. OAuth providers send additional information like state for CSRF protection, and error responses include error and error_description fields. Our implementation preserves everything for maximum flexibility.</p><h3>Handling Timeouts and Cancellation</h3><p>Real-world OAuth flows can fail in numerous ways. Users might close the browser, deny permissions, or simply walk away. 
Our server needs robust timeout and cancellation handling:</p><pre>async waitForCallback(path: string, timeout: number): Promise&lt;CallbackResult&gt; {<br>  this.callbackPath = path;<br><br>  return new Promise((resolve, reject) =&gt; {<br>    let isResolved = false;<br><br>    // Set up timeout<br>    const timer = setTimeout(() =&gt; {<br>      if (!isResolved) {<br>        isResolved = true;<br>        reject(new Error(`OAuth callback timeout after ${timeout}ms`));<br>      }<br>    }, timeout);<br><br>    // Wrap resolve/reject to handle cleanup<br>    const wrappedResolve = (result: CallbackResult) =&gt; {<br>      if (!isResolved) {<br>        isResolved = true;<br>        clearTimeout(timer);<br>        resolve(result);<br>      }<br>    };<br><br>    this.callbackPromise = {<br>      resolve: wrappedResolve,<br>      reject: (error) =&gt; {<br>        if (!isResolved) {<br>          isResolved = true;<br>          clearTimeout(timer);<br>          reject(error);<br>        }<br>      }<br>    };<br>  });<br>}</pre><p>Supporting AbortSignal enables programmatic cancellation, essential for GUI applications where users might close a window mid-flow:</p><pre>if (signal) {<br>  if (signal.aborted) {<br>    throw new Error(&quot;Operation aborted&quot;);<br>  }<br><br>  const abortHandler = () =&gt; {<br>    this.stop();<br>    if (this.callbackPromise) {<br>      this.callbackPromise.reject(new Error(&quot;Operation aborted&quot;));<br>    }<br>  };<br><br>  signal.addEventListener(&quot;abort&quot;, abortHandler);<br>}</pre><h3>Providing User Feedback</h3><p>When users complete the OAuth flow, they see a browser page indicating success or failure. 
Instead of a blank page or cryptic message, provide clear feedback with custom HTML:</p><pre>function generateCallbackHTML(<br>  params: CallbackResult,<br>  templates: Templates,<br>): string {<br>  if (params.error) {<br>    // OAuth error - show error page<br>    return templates.errorHtml<br>      .replace(/{{error}}/g, params.error)<br>      .replace(/{{error_description}}/g, params.error_description || &quot;&quot;);<br>  }<br><br>  // Success - show confirmation<br>  return (<br>    templates.successHtml ||<br>    `<br>    &lt;html&gt;<br>      &lt;body style=&quot;font-family: system-ui; padding: 2rem; text-align: center;&quot;&gt;<br>        &lt;h1&gt;✅ Authorization successful!&lt;/h1&gt;<br>        &lt;p&gt;You can now close this window and return to your terminal.&lt;/p&gt;<br>      &lt;/body&gt;<br>    &lt;/html&gt;<br>  `<br>  );<br>}</pre><p>For production applications, consider adding CSS animations, auto-close functionality, or deep links back to your desktop application.</p><h3>Security Considerations</h3><p>While localhost servers are inherently more secure than public endpoints, several security measures are crucial:</p><p><strong>1. Bind to localhost only:</strong> Never bind to 0.0.0.0 or public interfaces. This prevents network-based attacks:</p><pre>this.server.listen(port, &quot;localhost&quot;); // NOT &quot;0.0.0.0&quot;</pre><p><strong>2. Validate the state parameter:</strong> OAuth’s state parameter prevents CSRF attacks. Generate it before starting the flow and validate it in the callback:</p><pre>import { randomBytes } from &quot;node:crypto&quot;;<br><br>const state = randomBytes(32).toString(&quot;base64url&quot;);<br>const authUrl = `${provider}/authorize?state=${state}&amp;...`;<br><br>// In callback handler<br>if (params.state !== expectedState) {<br>  throw new Error(&quot;State mismatch - possible CSRF attack&quot;);<br>}</pre><p><strong>3. 
Close the server immediately:</strong> Once you receive the callback, shut down the server to minimize the attack surface:</p><pre>const result = await server.waitForCallback(&quot;/callback&quot;, 30000);<br>await server.stop(); // Always cleanup</pre><p><strong>4. Use unpredictable ports when possible:</strong> If your OAuth provider supports dynamic redirect URIs, use random high ports to prevent port-squatting attacks.</p><h3>Putting It All Together</h3><p>Here’s a complete example that ties everything together:</p><pre>import { createCallbackServer } from &quot;./server&quot;;<br>import { spawn } from &quot;child_process&quot;;<br><br>export async function getAuthCode(authUrl: string): Promise&lt;string&gt; {<br>  const server = createCallbackServer();<br><br>  try {<br>    // Start the server<br>    await server.start({<br>      port: 3000,<br>      hostname: &quot;localhost&quot;,<br>      successHtml: &quot;&lt;h1&gt;Success! You can close this window.&lt;/h1&gt;&quot;,<br>      errorHtml: &quot;&lt;h1&gt;Error: {{error_description}}&lt;/h1&gt;&quot;,<br>    });<br><br>    // Open the browser<br>    const opener =<br>      process.platform === &quot;darwin&quot;<br>        ? &quot;open&quot;<br>        : process.platform === &quot;win32&quot;<br>          ? 
&quot;start&quot;<br>          : &quot;xdg-open&quot;;<br><br>    // &quot;start&quot; is a cmd.exe built-in, so on Windows it must run through cmd<br>    if (opener === &quot;start&quot;) {<br>      spawn(&quot;cmd&quot;, [&quot;/c&quot;, &quot;start&quot;, &quot;&quot;, authUrl], { detached: true });<br>    } else {<br>      spawn(opener, [authUrl], { detached: true });<br>    }<br><br>    // Wait for callback<br>    const result = await server.waitForCallback(&quot;/callback&quot;, 30000);<br><br>    if (result.error) {<br>      throw new Error(`OAuth error: ${result.error_description}`);<br>    }<br><br>    return result.code!;<br>  } finally {<br>    // Always cleanup<br>    await server.stop();<br>  }<br>}<br><br>// Usage<br>const code = await getAuthCode(<br>  &quot;https://github.com/login/oauth/authorize?&quot; +<br>    &quot;client_id=xxx&amp;redirect_uri=http://localhost:3000/callback&quot;,<br>);</pre><h3>Best Practices and Next Steps</h3><p>Building a robust OAuth callback server requires attention to detail, but the patterns are consistent across implementations. Key takeaways:</p><ul><li><strong>Use Web Standards APIs</strong> for cross-runtime compatibility</li><li><strong>Handle all error cases</strong> including timeouts and user cancellation</li><li><strong>Provide clear user feedback</strong> with custom success/error pages</li><li><strong>Implement security measures</strong> like state validation and localhost binding</li><li><strong>Clean up resources</strong> by always stopping the server after use</li></ul><p>This localhost callback approach has become the de facto standard for OAuth in CLI tools. Libraries like <a href="https://github.com/kriasoft/oauth-callback">oauth-callback</a> provide production-ready implementations with additional features like automatic browser detection, token persistence, and PKCE support.</p><p>Modern OAuth is moving toward even better solutions like Device Code Flow for headless environments and Dynamic Client Registration for eliminating pre-shared secrets. But for now, the localhost callback server remains the most widely supported and user-friendly approach for bringing OAuth to command-line tools.</p><p>Ready to implement OAuth in your CLI tool? 
Check out the complete <a href="https://github.com/kriasoft/oauth-callback">oauth-callback</a> library for a battle-tested implementation that handles all the edge cases discussed here.</p><p><em>This tutorial is part of a series on modern authentication patterns. Follow </em><a href="https://x.com/koistya"><em>@koistya</em></a><em> for more insights on building secure, user-friendly developer tools.</em></p><hr><p><a href="https://levelup.gitconnected.com/building-a-localhost-oauth-callback-server-in-node-js-866be0765d44">Building a Localhost OAuth Callback Server in Node.js</a> was originally published in <a href="https://levelup.gitconnected.com">Level Up Coding</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Why I Built MCP Client Generator (And Why You Should Care)]]></title>
            <link>https://medium.com/@koistya/why-i-built-mcp-client-generator-and-why-you-should-care-c860193ca902?source=rss-692b968dbc82------2</link>
            <guid isPermaLink="false">https://medium.com/p/c860193ca902</guid>
            <category><![CDATA[mcp-server]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[open-source]]></category>
            <category><![CDATA[openai]]></category>
            <category><![CDATA[llm]]></category>
            <dc:creator><![CDATA[Konstantin Tarkus]]></dc:creator>
            <pubDate>Wed, 13 Aug 2025 14:11:40 GMT</pubDate>
            <atom:updated>2025-08-13T14:11:40.080Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*z_M1F0tR20P4OybCreDEJw.png" /></figure><blockquote><strong>⚠️ Upfront disclaimer: This is an early prototype exploring new approaches to API integration. It’s not production-ready, but I’m sharing it to gather feedback from developers facing similar challenges.</strong></blockquote><p>Picture this: You’re building with AI coding assistants like Claude or GitHub Copilot, and they’re using MCP tools to interact with your services. But here’s the catch — <strong>inefficient tool usage can dramatically increase your token costs</strong>. A single poorly optimized loop calling APIs could burn through tokens faster than you expect.</p><p>I learned this the hard way. My AI assistant was making individual API calls in a loop, consuming tokens at an alarming rate. That’s when I realized: we need a better way to interact with MCP servers — one that’s both cost-effective and developer-friendly.</p><h3>The Hidden Cost of AI Tool Usage</h3><p>You know the drill — you need to connect to GitHub, Notion, Linear, and multiple other services. Each one has its own SDK quirks, authentication dance, and type definitions that may or may not be up-to-date. But there’s a bigger problem:</p><p><strong>When AI agents use MCP tools inefficiently, your costs can escalate quickly.</strong></p><p>Instead of letting AI assistants make hundreds of individual tool calls, what if we could generate efficient, batched automation scripts? What if we could have type-safe clients that encourage best practices? 
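</p><p>To make the cost point concrete, here’s a toy comparison. The callTool stub below is hypothetical and merely counts invocations, standing in for real MCP tool calls that each consume tokens:</p>

```typescript
// Toy model: every tool invocation an AI agent makes costs tokens for the
// request and the response, so call count is a rough proxy for token cost.
let toolCalls = 0;
function callTool(name: string, args: unknown) {
  toolCalls++; // hypothetical stub; a real call would hit an MCP server
  return { ok: true, name, args };
}

const items = Array.from({ length: 100 }, (_, i) => ({ id: i }));

// Naive pattern: the agent loops, issuing one tool call per item.
for (const item of items) callTool("update_issue", item);
const naiveCalls = toolCalls; // 100 invocations

// Batched pattern: a generated script sends everything in a single call.
toolCalls = 0;
callTool("update_issues_batch", { items });
const batchedCalls = toolCalls; // 1 invocation

console.log({ naiveCalls, batchedCalls });
```

<p>One hundred invocations versus one; the payload still travels, but you pay the per-call protocol overhead only once.</p><p>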
What if authentication could just… work?</p><p>That’s when I discovered the potential of the Model Context Protocol (MCP) and decided to build something to solve this problem.</p><h3>The Integration Challenge Many Developers Face</h3><p>Whether you’re a solo developer automating your workflow or part of a startup building integrations, you’ve likely encountered this scenario:</p><ul><li>Create GitHub issues from Notion pages</li><li>Sync tasks with Linear for bug tracking</li><li>Move data between multiple platforms</li><li>Do it all with proper TypeScript support</li></ul><p>The traditional approach? Install multiple SDKs, manage different authentication flows, deal with various error handling patterns, and hope everything stays in sync. Oh, and don’t forget to create OAuth applications for each service — complete with client IDs and secrets that you’ll need to manage securely.</p><p>The boilerplate accumulates quickly.</p><h3>Enter MCP Client Generator: A Prototype</h3><p>Here’s what I’m experimenting with:</p><pre>import { notion, github, linear } from &quot;./lib/mcp-client&quot;;<br><br>// One command will generate all of this with full type safety<br>const page = await notion.createPage({<br>  title: &quot;Bug Report: Login Issues&quot;,<br>  content: &quot;Users are reporting authentication failures...&quot;,<br>});<br>const issue = await github.createIssue({<br>  title: page.title,<br>  body: page.content,<br>});<br>await linear.createIssue({<br>  title: `Bug: ${page.title}`,<br>  description: `GitHub Issue: ${issue.html_url}`,<br>  teamId: &quot;engineering&quot;,<br>});</pre><p>That’s the goal. 
Three services, fully typed, with a single configuration file:</p><pre>{<br>  &quot;mcpServers&quot;: {<br>    &quot;notion&quot;: {<br>      &quot;type&quot;: &quot;http&quot;,<br>      &quot;url&quot;: &quot;https://mcp.notion.com/mcp&quot;<br>    },<br>    &quot;github&quot;: {<br>      &quot;type&quot;: &quot;http&quot;,<br>      &quot;url&quot;: &quot;https://api.githubcopilot.com/mcp/&quot;<br>    },<br>    &quot;linear&quot;: {<br>      &quot;type&quot;: &quot;sse&quot;,<br>      &quot;url&quot;: &quot;https://mcp.linear.app/sse&quot;<br>    }<br>  }<br>}</pre><h3>The Authentication Reality</h3><p>Here’s what I’m exploring: <strong>simplified OAuth setup where possible</strong>.</p><p>Traditional approach:</p><ol><li>Go to GitHub’s developer settings</li><li>Create a new OAuth app</li><li>Configure redirect URIs</li><li>Copy client ID and secret</li><li>Repeat for each service</li><li>Manage secrets securely</li><li>Handle token refresh logic</li><li>Deal with different OAuth flows per service</li></ol><p><strong>The idealized MCP approach</strong> (when fully implemented):</p><ol><li>Run npx mcp-client-gen</li><li>Configure your MCP server endpoints</li><li>Authenticate through browser redirect</li><li>Tokens stored locally for reuse</li></ol><p><strong>Reality check</strong>: This relies on RFC 7591 Dynamic Client Registration, which:</p><ul><li><strong>Isn’t universally supported</strong> — Many services don’t allow dynamic registration</li><li><strong>Has security implications</strong> — Some orgs prohibit dynamic client creation</li><li><strong>Still requires browser auth</strong> — You’ll be redirected to approve access</li><li><strong>Needs secure token storage</strong> — We handle refresh, but you manage security</li></ul><p>For services without dynamic registration, you’ll still need to create OAuth apps manually and provide credentials in your config.</p><h3>The Potential for Cost Optimization</h3><p><strong>The Efficiency Problem</strong>: When 
using AI assistants with APIs, inefficient patterns can increase costs. For example, making 100 individual API calls instead of batched requests means more tokens for tool invocations, responses, and error handling.</p><p><strong>The Proposed Solution</strong>: Generate automation scripts using MCP Client Generator that could:</p><ul><li>Batch operations efficiently</li><li>Use proper pagination</li><li>Cache responses when appropriate</li><li>Provide type safety to prevent errors that trigger retries</li></ul><p>While I don’t have exact metrics yet (the tool is still in development), the potential for cost reduction is significant based on the difference between individual vs batched API operations.</p><h3>Potential Use Cases</h3><p><strong>Cross-Platform Data Sync</strong>: Scripts that keep GitHub issues and Notion project pages in sync, automatically updating status changes across platforms.</p><p><strong>Automated Reporting</strong>: Weekly reports pulling data from GitHub (commits, PRs), Jira (completed tickets), and Slack (team activity), then creating summaries in Notion.</p><p><strong>Incident Response</strong>: When monitoring systems detect issues, automatically create Slack threads, open GitHub issues, and update status pages with proper context.</p><p><strong>Content Pipeline</strong>: Write technical content in one platform and cross-post to Dev.to, LinkedIn, and company blogs with platform-specific formatting.</p><h3>Current State &amp; Roadmap</h3><h3>Actually Implemented (You can test these today)</h3><ul><li>✅ CLI interface and configuration parsing</li><li>✅ Interactive prompts with smart defaults</li><li>✅ Basic scaffolding for client generation</li><li>✅ Multiple config format support (.mcp.json, .cursor/, .vscode/)</li></ul><h3>Not Yet Working (Under Development)</h3><ul><li>❌ MCP server introspection (cannot connect to servers yet)</li><li>❌ OAuth authentication (no auth flow implemented)</li><li>❌ Type generation from live servers (uses 
mocks currently)</li><li>❌ Error handling and retry logic</li><li>❌ Streaming support</li></ul><p><strong>Reality check</strong>: The core generation pipeline exists, but it doesn’t actually connect to MCP servers yet. Think of it as a foundation waiting for the protocol implementation.</p><h3>Prerequisites (Current)</h3><ul><li>MCP servers must support HTTP transport</li><li>Servers need proper schema exposure</li><li>Node.js 18+ or Bun runtime</li></ul><h3>Performance Considerations</h3><ul><li>Initial connection overhead (estimated ~200ms)</li><li>Type generation at build time (not runtime)</li><li>Tree-shakable output for optimal bundle size</li></ul><h3>Addressing Common Questions</h3><h3>“Why not just use existing SDKs?”</h3><p>This is a valid point. Traditional SDKs are mature and well-tested. However, they present challenges when you need:</p><ul><li>Unified patterns across multiple services</li><li>Consistent authentication handling</li><li>Type safety that stays in sync with API changes</li><li>Integration with AI coding assistants</li></ul><p>MCP Client Generator aims to complement, not replace, existing SDKs. 
It’s specifically designed for scenarios requiring unified access to multiple MCP-enabled services.</p><h3>When to stick with traditional SDKs</h3><ul><li><strong>Production systems</strong> requiring battle-tested reliability</li><li><strong>Complex operations</strong> that need SDK-specific optimizations</li><li><strong>Single service integration</strong> where you don’t need cross-platform consistency</li><li><strong>Teams with existing SDK expertise</strong> and established patterns</li></ul><h3>When this tool might help (once complete)</h3><ul><li><strong>Prototyping</strong> multi-service integrations quickly</li><li><strong>AI-assisted development</strong> where consistent patterns reduce token usage</li><li><strong>Small teams</strong> managing many integrations without dedicated expertise per SDK</li><li><strong>Exploratory projects</strong> testing MCP capabilities</li></ul><h3>“What’s the learning curve?”</h3><p>Fair question. While MCP itself is new, the generated clients use familiar JavaScript/TypeScript patterns. If you can use an SDK, you can use the generated clients. The main learning curve is understanding MCP configuration, which we’re working to simplify with better documentation and examples.</p><h3>“What about existing integration platforms?”</h3><p>Tools like Zapier, n8n, and Make excel at no-code workflows. 
MCP Client Generator targets a different need:</p><ul><li>Type safety in your code</li><li>Custom business logic</li><li>Direct API access without middleware</li><li>Integration with your existing codebase</li><li>Cost-effective high-volume operations</li></ul><h3>“Is dynamic client registration secure?”</h3><p>RFC 7591 Dynamic Client Registration is an OAuth 2.0 extension standard with both benefits and risks:</p><p><strong>Security benefits</strong>:</p><ul><li>Clients register with limited scopes</li><li>Short-lived, refreshable tokens</li><li>No hardcoded secrets in code</li><li>Unique registration per application</li></ul><p><strong>Security considerations</strong>:</p><ul><li><strong>Token storage</strong>: Generated clients store tokens locally — secure your development environment</li><li><strong>Scope creep</strong>: Dynamic registration might request broader permissions than needed</li><li><strong>Audit trails</strong>: Harder to track dynamically created clients in your OAuth provider</li><li><strong>Organizational policies</strong>: Many enterprises prohibit dynamic registration</li></ul><p><strong>Best practice</strong>: Use dynamic registration for development/prototyping. For production, consider traditional OAuth apps with proper secret management (environment variables, secret stores, etc.).</p><p>Always review security implications for your specific use case and comply with your organization’s security policies.</p><h3>“What if MCP doesn’t achieve widespread adoption?”</h3><p>A legitimate concern. 
MCP is emerging technology currently supported by:</p><ul><li><strong>Claude Desktop</strong> — Anthropic’s desktop app with MCP support</li><li><strong>Cline (formerly Claude Dev)</strong> — VS Code extension using MCP</li><li><strong>Continue.dev</strong> — Open-source AI code assistant</li><li><strong>Official MCP servers</strong> — <a href="https://github.com/modelcontextprotocol/servers">GitHub repo</a> includes filesystem, GitHub, GitLab, Slack, and Google Drive implementations</li></ul><p>The ecosystem is small but growing. Even if adoption remains limited, the code generation patterns have value beyond MCP. The project could adapt to other protocols if needed.</p><h3>“How do you handle API changes and maintenance?”</h3><p><strong>The maintenance challenge is real.</strong> Here’s the proposed approach:</p><ul><li><strong>Regeneration workflow</strong>: Run mcp-client-gen again to update types when APIs change</li><li><strong>Version pinning</strong>: Lock to specific MCP server versions in your config</li><li><strong>Git diff review</strong>: Generated code changes are reviewable like any dependency update</li><li><strong>Fallback strategy</strong>: Keep previous generated versions if servers break compatibility</li></ul><p><strong>Reality check</strong>: This adds a build step and requires you to manage regeneration. It’s a tradeoff between automation and control.</p><h3>Why I’m Sharing This Prototype Now</h3><p>AI-assisted development is changing how we build integrations, but the tooling hasn’t caught up. 
I’m sharing this early prototype to:</p><ol><li><strong>Validate the problem</strong>: Do others face similar multi-service integration challenges?</li><li><strong>Gather feedback</strong>: What would make this actually useful?</li><li><strong>Find collaborators</strong>: Who wants to help build this?</li></ol><p>This isn’t a launch — it’s an invitation to experiment together.</p><h3>What’s Next</h3><p>Immediate priorities:</p><ul><li>Complete MCP protocol implementation</li><li>Robust server introspection</li><li>Comprehensive error handling</li><li>Streaming support for real-time APIs</li><li>Plugin system for custom transformations</li></ul><p>But I’m most interested in community input on priorities.</p><h3>How You Can Help</h3><h3>For Developers</h3><ul><li><strong>Test the proof of concept</strong>: Try npx mcp-client-gen and share feedback</li><li><strong>Contribute to development</strong>:</li><li>MCP server introspection</li><li>Authentication flows</li><li>Error handling patterns</li><li>Test coverage</li></ul><h3>For Potential Users</h3><ul><li><strong>Share your use cases</strong>: What MCP servers do you need?</li><li><strong>Provide feedback</strong>: What would make this useful for your workflow?</li><li><strong>Help with documentation</strong>: Explain concepts to newcomers</li></ul><h3>For MCP Server Implementers</h3><ul><li><strong>Feedback on approach</strong>: What patterns work best?</li><li><strong>Schema standards</strong>: How should servers expose capabilities?</li></ul><h3>Get Started</h3><pre># Try the interactive mode<br>npx mcp-client-gen</pre><pre># Or quick mode with defaults<br>npx mcp-client-gen -y</pre><ul><li><strong>NPM</strong>: <a href="https://www.npmjs.com/package/mcp-client-gen">npmjs.com/package/mcp-client-gen</a></li><li><strong>GitHub</strong>: <a href="https://github.com/kriasoft/mcp-client-gen">github.com/kriasoft/mcp-client-gen</a></li><li><strong>Issues</strong>: <a 
href="https://github.com/kriasoft/mcp-client-gen/issues">Report bugs or request features</a></li><li><strong>Discussions</strong>: <a href="https://github.com/kriasoft/mcp-client-gen/discussions">Join the conversation</a></li></ul><h3>The Vision</h3><p>I believe we’re at an inflection point in API integration, especially with AI-assisted development. While traditional SDKs remain valuable, there’s room for new approaches that better serve modern development workflows.</p><p>The future I’m building toward:</p><ul><li>Configuration-driven service connections</li><li>Type safety as a default</li><li>Abstracted authentication complexity</li><li>Efficient clients for both AI and human developers</li></ul><p>This is an early-stage project with ambitious goals. If you’re interested in shaping how developers interact with MCP services, I’d love your input and collaboration.</p><p><em>The MCP Client Generator is MIT licensed and available on </em><a href="https://github.com/kriasoft/mcp-client-gen"><em>GitHub</em></a><em>. If this project helps you build amazing integrations, consider </em><a href="https://github.com/sponsors/koistya"><em>sponsoring the development</em></a><em> to support continued progress.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=c860193ca902" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Zero-Wait PR Previews: The Pre-Configured Slots Pattern]]></title>
            <link>https://levelup.gitconnected.com/zero-wait-pr-previews-the-pre-configured-slots-pattern-72d711bdd70d?source=rss-692b968dbc82------2</link>
            <guid isPermaLink="false">https://medium.com/p/72d711bdd70d</guid>
            <category><![CDATA[github-actions]]></category>
            <category><![CDATA[developer]]></category>
            <category><![CDATA[javascript]]></category>
            <category><![CDATA[devops]]></category>
            <category><![CDATA[programming]]></category>
            <dc:creator><![CDATA[Konstantin Tarkus]]></dc:creator>
            <pubDate>Fri, 25 Jul 2025 14:18:17 GMT</pubDate>
            <atom:updated>2025-07-25T16:26:01.107Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*rCTyD6ysm3fHT1KBW1MuBA.jpeg" /></figure><p>Ever waited for <strong>PR preview environments</strong> to spin up? Yeah, me too. Here’s a pattern that changed the game for our team: pre-configured <strong>deployment slots</strong> with deterministic routing.</p><h3>The Problem</h3><p>Traditional PR preview workflows go something like this:</p><p>1. Open PR<br> 2. CI/CD provisions a new environment<br> 3. Wait… ⏳<br> 4. Deploy code<br> 5. Wait some more… ⏳<br> 6. Finally get your preview URL</p><p>The provisioning step is the killer. Whether you’re using Kubernetes namespaces, cloud functions, or edge workers, creating resources takes time.</p><h3><strong>The Solution: Pre-Configured Slots</strong></h3><p>What if we flipped the script? Instead of creating environments on-demand, we pre-configure a fixed set of deployment slots:</p><pre>tokyo    🔗 https://tokyo.example.com<br>paris    🔗 https://paris.example.com<br>london   🔗 https://london.example.com<br>berlin   🔗 https://berlin.example.com<br>sydney   🔗 https://sydney.example.com<br>madrid   🔗 https://madrid.example.com<br>moscow   🔗 https://moscow.example.com<br>cairo    🔗 https://cairo.example.com<br>dubai    🔗 https://dubai.example.com<br>rome     🔗 https://rome.example.com</pre><p>Then use a deterministic hash to map PR numbers to slots:</p><pre>- uses: kriasoft/pr-codename@v1<br>  id: pr<br>  <br>- run: wrangler deploy --env ${{ steps.pr.outputs.codename }}</pre><p><strong>PR #1234</strong> always maps to <strong>tokyo</strong>. <strong>PR #1235</strong> always maps to <strong>paris</strong>. No provisioning, no waiting.</p><h3><strong>How It Works</strong></h3><p>The magic happens in three parts:</p><h4><strong>1. Pre-Configure Your Slots</strong></h4><p>First, set up your deployment slots. 
Here’s a Cloudflare Workers example:</p><pre># wrangler.toml<br>[env.tokyo]<br>name = &quot;preview-tokyo&quot;<br>route = &quot;tokyo.example.com/*&quot;<br><br>[env.paris]<br>name = &quot;preview-paris&quot;<br>route = &quot;paris.example.com/*&quot;<br><br>[env.london]<br>name = &quot;preview-london&quot;<br>route = &quot;london.example.com/*&quot;<br><br># ... repeat for all slots</pre><h4><strong>2. Deterministic Mapping</strong></h4><p>The <a href="https://github.com/marketplace/actions/pr-codename">PR Codename Action</a> uses a simple hash function to consistently map PR numbers to slot names:</p><pre>const words = [&quot;tokyo&quot;, &quot;paris&quot;, &quot;london&quot;, &quot;berlin&quot;, /* ... */];<br>const index = prNumber % words.length;<br>return words[index];</pre><p>The above is just an example; in reality it uses the <a href="https://github.com/kriasoft/codenames/blob/main/docs/adr/1-hash-function.md"><strong>FNV-1a hashing algorithm</strong></a>.</p><h4><strong>3. Deploy to the Slot</strong></h4><p>Your GitHub Action workflow becomes dead simple:</p><pre>name: Deploy PR Preview<br><br>on:<br>  pull_request:<br>    types: [opened, synchronize]<br><br>jobs:<br>  deploy:<br>    runs-on: ubuntu-latest<br>    steps:<br>      - uses: actions/checkout@v4<br>      <br>      - uses: kriasoft/pr-codename@v1<br>        id: pr<br>        <br>      - name: Deploy to slot<br>        run: |<br>          npm ci<br>          npm run build<br>          wrangler deploy --env ${{ steps.pr.outputs.codename }}<br>          <br>      - name: Comment PR<br>        uses: actions/github-script@v7<br>        with:<br>          script: |<br>            github.rest.issues.createComment({<br>              issue_number: context.issue.number,<br>              owner: context.repo.owner,<br>              repo: context.repo.repo,<br>              body: &#39;🚀 Preview deployed to https://${{ steps.pr.outputs.codename }}.example.com&#39;<br>            })</pre><h3><strong>The 
Benefits</strong></h3><p>This pattern isn’t just a neat trick; it fundamentally changes the rhythm of your development cycle.</p><p><strong>🚀 Zero-Wait Deploys<br></strong>The biggest win. By eliminating the on-demand provisioning step, deployments start <em>immediately</em>. What used to be a 2–3 minute coffee break is now a 30-second task. Your developers stay in the flow, and your pipeline gets a whole lot faster.</p><p><strong>🔗 URLs You Can Actually Share<br></strong>Forget long, ugly, auto-generated URLs. With this pattern, <strong>PR #1234</strong> always maps to <strong>https://tokyo.example.com</strong>. This URL is:</p><ul><li><strong>Memorable:</strong> You can actually remember it.</li><li><strong>Shareable:</strong> Perfect for dropping in a Slack channel, a Linear ticket, or even saying out loud during a Zoom call. No more “Hey, can you find that preview link for me?”</li><li><strong>Bookmarkable:</strong> QA testers and product managers can bookmark slots for features they’re tracking.</li></ul><p><strong>💰 No More Cloud Bill Surprises<br></strong>Dynamic environments are notorious for leaving behind orphaned resources that quietly drain your budget. With a fixed number of slots, your infrastructure costs become predictable. You know exactly what’s running, and you never have to hunt down forgotten preview apps again.</p><p><strong>🧹 Cleanup? What Cleanup?<br></strong>When a PR is merged or closed, there’s no complex teardown script to run. The slot simply becomes available for the next PR. You can even have a workflow that automatically deploys the <strong>main</strong> branch to the slot to keep it fresh. It’s a self-cleaning system.</p><h3><strong>Real-World Considerations</strong></h3><h4><strong>How Many Slots?</strong></h4><p>We’ve found 10–15 slots work well for most teams. 
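</p><p>If you want to sanity-check a slot count against your own PR volume, a few lines of simulation go a long way (FNV-1a as in the action; hashing the PR number’s decimal string is my assumption for illustration):</p>

```typescript
// Deterministic PR → slot mapping sketch. The action uses FNV-1a hashing;
// hashing the PR number's decimal string is an assumption for illustration.
const slots = [
  "tokyo", "paris", "london", "berlin", "sydney",
  "madrid", "moscow", "cairo", "dubai", "rome",
];

// 32-bit FNV-1a over a string's UTF-16 code units.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep uint32
  }
  return hash;
}

const slotFor = (pr: number) => slots[fnv1a(String(pr)) % slots.length];

// Simulate 50 open PRs across 10 slots and count the load per slot.
const load = new Map<string, number>();
for (let pr = 1; pr <= 50; pr++) {
  const slot = slotFor(pr);
  load.set(slot, (load.get(slot) ?? 0) + 1);
}
console.log(Object.fromEntries(load)); // how many PRs land on each slot
```

<p>Run it with different slot counts to see where collisions start to bite; the mapping is stable, so the same PR number always lands on the same slot.</p><p>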
The math:</p><ul><li>10 slots + 50 open PRs = each slot serves ~5 PRs</li><li>Only the latest deployment to each slot is accessible</li><li>Most teams only actively review a handful of PRs at once</li></ul><h4><strong>Collision Handling</strong></h4><p>Yes, PRs can map to the same slot. PR #1 and PR #11 both map to the same environment with 10 slots. <strong>This means newer deployments overwrite older ones</strong> — so if you’re reviewing PR #1 and someone pushes PR #11, your preview disappears.</p><p>In practice, this works for many teams because:</p><ol><li>Developers typically work on recent PRs</li><li>Old PR previews naturally expire</li><li>You can always trigger a redeploy to refresh</li></ol><p><strong>When slots don’t work well:</strong> Large teams, high PR velocity, or when multiple people need to review the same PR simultaneously.</p><h4>Database &amp; Stateful Services</h4><p>The biggest challenge with any preview environment is handling databases and stateful services. With slots, you have a few options:</p><ul><li><strong>Shared database:</strong> Fast and cheap, but schema migrations from one PR can break others</li><li><strong>Database per slot:</strong> Better isolation, but requires seeding data for each slot</li><li><strong>Database branching services:</strong> Tools like Neon offer instant database branches (premium option)</li></ul><p>For simple stateless apps, this isn’t an issue. 
For complex apps with databases, it’s the main implementation challenge.</p><h4><strong>Security Notes</strong></h4><ul><li>Use environment-specific secrets for each slot</li><li>Consider adding basic auth to preview domains</li><li>Implement automatic cleanup for stale deployments</li></ul><h4><strong>Beyond Basic Previews</strong></h4><p>This pattern unlocks some cool possibilities:</p><p><strong>Persistent Test Environments</strong>: QA can bookmark specific slots for testing.<br><strong>A/B Testing</strong>: Map feature flags to slots for instant switching.<br><strong>Geographic Testing</strong>: Actually deploy slots to different regions.</p><h3><strong>Try It Yourself</strong></h3><p>Getting started is pretty straightforward:</p><ol><li>Install the action:</li></ol><pre>- uses: kriasoft/pr-codename@v1<br>  id: pr</pre><p>2. Use the codename in your deploy:</p><pre>deploy --env ${{ steps.pr.outputs.codename }}</pre><p>3. Enjoy instant PR previews 🚀</p><p>The full source is on <a href="https://github.com/kriasoft/pr-codename">GitHub</a> if you want to customize the word list or hashing algorithm.</p><h3>Slots vs On-Demand: Quick Comparison</h3><p>Before you dive in, it’s worth understanding how the pre-configured slots pattern stacks up against the traditional on-demand ephemeral environments. 
While this post focuses on slots, knowing the trade-offs helps you make the right choice for your team.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Lz7NlXxxal02pqgPpic0xw.png" /><figcaption>Pre-Configured Deployment Slots vs On-Demand Ephemeral Deployments</figcaption></figure><p><strong>Best for Slots:</strong> Small teams, simple apps, tight budgets</p><p><strong>Best for On-Demand:</strong> Growing teams, complex apps, quality-focused</p><p><em>Nothing prevents you from mixing both approaches — use slots for rapid prototyping and on-demand for critical features.</em></p><h3><strong>Wrapping Up</strong></h3><p>Sometimes the best optimization is avoiding work altogether. By pre-configuring deployment slots and using deterministic routing, we eliminated the biggest bottleneck in our PR workflow.</p><p>Give it a shot and let me know how it works for your team. Happy deploying!</p><p><em>What patterns have you used for PR previews? Drop a comment below 👇 always curious to hear different approaches!</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=72d711bdd70d" width="1" height="1" alt=""><hr><p><a href="https://levelup.gitconnected.com/zero-wait-pr-previews-the-pre-configured-slots-pattern-72d711bdd70d">Zero-Wait PR Previews: The Pre-Configured Slots Pattern</a> was originally published in <a href="https://levelup.gitconnected.com">Level Up Coding</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Building Type-Safe WebSocket Applications with Bun and Zod]]></title>
            <link>https://levelup.gitconnected.com/building-type-safe-websocket-applications-with-bun-and-zod-f0aef259a53e?source=rss-692b968dbc82------2</link>
            <guid isPermaLink="false">https://medium.com/p/f0aef259a53e</guid>
            <category><![CDATA[web-development]]></category>
            <category><![CDATA[programming]]></category>
            <category><![CDATA[websocket]]></category>
            <category><![CDATA[typescript]]></category>
            <category><![CDATA[javascript]]></category>
            <dc:creator><![CDATA[Konstantin Tarkus]]></dc:creator>
            <pubDate>Sat, 03 May 2025 23:42:28 GMT</pubDate>
            <atom:updated>2025-11-06T20:58:07.966Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/950/1*tjyOr2QX-kbACvvJNx_mcw.png" /></figure><h3>Introduction</h3><p>In the ever-evolving world of web development, real-time interactions have become less of a luxury and more of an expectation. Whether you’re building a chat application, a collaborative document editor, or a multiplayer game, the need for bidirectional communication is clear, and WebSockets are often the technology of choice.</p><p>But let’s be honest: working with WebSockets can sometimes feel like trying to organize a party where guests randomly shout things at each other across the room. Messages fly back and forth, payload structures are inconsistent, and before you know it, your elegant application architecture looks more like a tangled ball of holiday lights that you’ve promised yourself you’ll sort out “next year.”</p><h4>Enter Bun and Zod</h4><p><a href="https://bun.sh/">Bun</a> has been making waves as a fast JavaScript runtime with built-in WebSocket support that’s both performant and easy to work with. Its native WebSocket implementation (based on uWebSockets) outperforms many alternatives, making it an excellent foundation for real-time applications.</p><p>Meanwhile, <a href="https://zod.dev/">Zod</a> has revolutionized runtime type validation in the JavaScript ecosystem. It provides a way to define schemas that guarantee the shape and type of your data, catching errors before they wreak havoc in your application.</p><h4>The Challenge of WebSocket Communication</h4><p>When building applications with WebSockets, several challenges typically arise:</p><ol><li><strong>Type safety across the wire</strong>: Unlike HTTP requests with well-defined endpoints and schemas, WebSocket messages can be a wild west of untyped JSON.</li><li><strong>Message routing complexity</strong>: As your application grows, so does the variety of messages you need to handle. 
Without a structured system, this often results in sprawling switch statements or complex conditionals.</li><li><strong>Error handling</strong>: When a message doesn’t match your expectations, how do you gracefully handle it and provide meaningful feedback?</li><li><strong>Connection lifecycle management</strong>: Who’s connected? What rooms are they in? How do you manage authentication state across a persistent connection?</li></ol><h4>Introducing WS-Kit</h4><p>To address these challenges, we’ve created <a href="https://kriasoft.com/ws-kit/"><strong>WS-Kit</strong></a> — a type-safe WebSocket router for Bun and other platforms. It combines pluggable validators (Zod, Valibot, custom) with Bun’s WebSocket implementation to create a structured, maintainable approach to real-time messaging.</p><p>At its core, <strong>WS-Kit</strong> gives you:</p><ul><li>A way to define message types with Zod schemas</li><li>A router that automatically validates incoming messages against these schemas</li><li>Handlers that receive only properly typed message payloads</li><li>Built-in support for broadcasting and room-based communication</li><li>Clean error handling patterns</li></ul><p>Instead of wrestling with raw WebSocket messages, you can think in terms of typed routes, similar to how you’d structure a REST API. This approach brings clarity and maintainability to what would otherwise be chaotic message passing.</p><p>In this tutorial, we’ll build a real-time application from the ground up using Bun and WS-Kit. We’ll start with the basics of WebSocket communication in Bun, then gradually introduce type safety with Zod, and finally implement more advanced patterns like authentication and room-based messaging.</p><p>By the end, you’ll have a solid foundation for building robust, type-safe real-time applications that can scale with your needs. 
No more digging through message payloads with console.log at 2 AM, wondering why your users are seeing gibberish on their screens instead of the latest game state.</p><p>So grab your favorite beverage, fire up your code editor, and let’s bring some order to the WebSocket chaos. Your future self — the one who has to maintain this code six months from now — will thank you.</p><h3>Part 1: WebSockets Fundamentals in Bun</h3><h4>What Are WebSockets and Why Use Them?</h4><p>Remember the days of polling a server every few seconds to check for updates? Like repeatedly asking “Are we there yet?” on a road trip, except the server is the increasingly annoyed parent. That’s the world WebSockets were designed to rescue us from.</p><p>Unlike traditional HTTP connections that follow a request-response pattern, WebSockets establish a persistent, two-way communication channel between clients and servers. Once established, both sides can send messages to each other at any time without the overhead of creating new connections. This makes WebSockets perfect for:</p><ul><li>Real-time chat applications</li><li>Live dashboards and data visualizations</li><li>Multiplayer games</li><li>Collaborative editing tools</li><li>Notification systems</li><li>Stock tickers and sports scores</li></ul><p>In essence, anywhere you need low-latency, bidirectional communication, WebSockets are your friend.</p><h4>Bun’s Native WebSocket Implementation</h4><p><a href="https://bun.sh/">Bun</a> comes with a blazing-fast, native WebSocket implementation built right in. No need to reach for additional packages like ws or socket.io (though they&#39;re excellent tools in their own right). 
Bun&#39;s implementation is:</p><ul><li><strong>Fast</strong>: Built on top of Bun’s optimized JavaScript runtime</li><li><strong>Memory-efficient</strong>: Uses less memory than Node.js alternatives</li><li><strong>Standards-compliant</strong>: Follows the WebSocket protocol (RFC 6455)</li><li><strong>Feature-rich</strong>: Includes built-in support for the PubSub pattern</li></ul><p>This native implementation means you can start building real-time applications immediately without any external dependencies for the WebSocket functionality itself.</p><h4>Setting Up a Basic WebSocket Server in Bun</h4><p>Let’s create a simple WebSocket echo server to demonstrate how easy it is to get started with Bun. Create a new file called server.ts:</p><pre>import { serve } from &quot;bun&quot;;<br><br>serve({<br>  port: 3000,<br><br>  fetch(req, server) {<br>    // Extract URL from the request<br>    const url = new URL(req.url);<br><br>    // Handle WebSocket upgrade requests<br>    if (url.pathname === &quot;/ws&quot;) {<br>      // Upgrade HTTP request to WebSocket connection<br>      const success = server.upgrade(req);<br><br>      // Return a fallback response if upgrade fails<br>      if (!success) {<br>        return new Response(&quot;WebSocket upgrade failed&quot;, { status: 400 });<br>      }<br><br>      // The connection is handled by the websocket handlers<br>      return undefined;<br>    }<br><br>    // Handle regular HTTP requests<br>    return new Response(<br>      &quot;Hello from Bun! Try connecting to /ws with a WebSocket client.&quot;,<br>    );<br>  },<br><br>  // Define what happens when a WebSocket connects<br>  websocket: {<br>    // Called when a WebSocket connection is established<br>    open(ws) {<br>      console.log(&quot;WebSocket connection opened&quot;);<br>      ws.send(<br>        &quot;Welcome to the echo server! 
Send me a message and I&#39;ll send it right back.&quot;,<br>      );<br>    },<br><br>    // Called when a message is received<br>    message(ws, message) {<br>      console.log(`Received: ${message}`);<br>      // Echo the message back<br>      ws.send(`You said: ${message}`);<br>    },<br><br>    // Called when the connection closes<br>    close(ws, code, reason) {<br>      console.log(`WebSocket closed with code ${code} and reason: ${reason}`);<br>    },<br><br>    // Called when there&#39;s an error<br>    error(ws, error) {<br>      console.error(`WebSocket error: ${error}`);<br>    },<br>  },<br>});<br><br>console.log(&quot;WebSocket echo server listening on ws://localhost:3000/ws&quot;);</pre><p>To run this example:</p><pre>bun run server.ts</pre><h3>Connecting from a Browser Client</h3><p>Now let’s create a simple HTML client to connect to our WebSocket server:</p><pre>&lt;!DOCTYPE html&gt;<br>&lt;html lang=&quot;en&quot;&gt;<br>  &lt;head&gt;<br>    &lt;meta charset=&quot;UTF-8&quot; /&gt;<br>    &lt;title&gt;WebSocket Test&lt;/title&gt;<br>    &lt;style&gt;<br>      body {<br>        font-family: system-ui, sans-serif;<br>        max-width: 800px;<br>        margin: 0 auto;<br>        padding: 20px;<br>      }<br>      #messages {<br>        height: 300px;<br>        border: 1px solid #ccc;<br>        margin-bottom: 10px;<br>        padding: 10px;<br>        overflow-y: auto;<br>      }<br>      #messageForm {<br>        display: flex;<br>        gap: 10px;<br>      }<br>      #messageInput {<br>        flex-grow: 1;<br>        padding: 8px;<br>      }<br>    &lt;/style&gt;<br>  &lt;/head&gt;<br>  &lt;body&gt;<br>    &lt;h1&gt;Bun WebSocket Echo Test&lt;/h1&gt;<br>    &lt;div id=&quot;status&quot;&gt;Disconnected&lt;/div&gt;<br>    &lt;div id=&quot;messages&quot;&gt;&lt;/div&gt;<br>    &lt;form id=&quot;messageForm&quot;&gt;<br>      &lt;input type=&quot;text&quot; id=&quot;messageInput&quot; placeholder=&quot;Type a message...&quot; /&gt;<br>      
&lt;button type=&quot;submit&quot;&gt;Send&lt;/button&gt;<br>    &lt;/form&gt;<br><br>    &lt;script&gt;<br>      const statusEl = document.getElementById(&quot;status&quot;);<br>      const messagesEl = document.getElementById(&quot;messages&quot;);<br>      const messageFormEl = document.getElementById(&quot;messageForm&quot;);<br>      const messageInputEl = document.getElementById(&quot;messageInput&quot;);<br><br>      // Create a WebSocket connection<br>      const socket = new WebSocket(&quot;ws://localhost:3000/ws&quot;);<br><br>      // Connection opened<br>      socket.addEventListener(&quot;open&quot;, (event) =&gt; {<br>        statusEl.textContent = &quot;Connected&quot;;<br>        statusEl.style.color = &quot;green&quot;;<br>        addMessage(&quot;System&quot;, &quot;Connected to server&quot;);<br>      });<br><br>      // Listen for messages<br>      socket.addEventListener(&quot;message&quot;, (event) =&gt; {<br>        addMessage(&quot;Server&quot;, event.data);<br>      });<br><br>      // Connection closed<br>      socket.addEventListener(&quot;close&quot;, (event) =&gt; {<br>        statusEl.textContent = &quot;Disconnected&quot;;<br>        statusEl.style.color = &quot;red&quot;;<br>        addMessage(&quot;System&quot;, `Disconnected: Code ${event.code}`);<br>      });<br><br>      // Connection error<br>      socket.addEventListener(&quot;error&quot;, (event) =&gt; {<br>        statusEl.textContent = &quot;Error&quot;;<br>        statusEl.style.color = &quot;red&quot;;<br>        addMessage(&quot;System&quot;, &quot;Connection error&quot;);<br>        console.error(&quot;WebSocket error:&quot;, event);<br>      });<br><br>      // Send message<br>      messageFormEl.addEventListener(&quot;submit&quot;, (e) =&gt; {<br>        e.preventDefault();<br>        const message = messageInputEl.value;<br>        if (message &amp;&amp; socket.readyState === WebSocket.OPEN) {<br>          socket.send(message);<br>          addMessage(&quot;You&quot;, 
message);<br>          messageInputEl.value = &quot;&quot;;<br>        }<br>      });<br><br>      // Helper to add message to the UI<br>      function addMessage(sender, content) {<br>        const messageEl = document.createElement(&quot;div&quot;);<br>        const senderEl = document.createElement(&quot;strong&quot;);<br>        senderEl.textContent = `${sender}: `;<br>        // Append content as plain text so echoed message data can&#39;t inject HTML<br>        messageEl.append(senderEl, content);<br>        messagesEl.appendChild(messageEl);<br>        messagesEl.scrollTop = messagesEl.scrollHeight;<br>      }<br>    &lt;/script&gt;<br>  &lt;/body&gt;<br>&lt;/html&gt;</pre><p>Open this HTML file in a browser, and you should be able to send messages to your Bun WebSocket server and see the echoed responses.</p><h3>Understanding the WebSocket Lifecycle</h3><p>WebSockets follow a specific lifecycle:</p><ol><li><strong>Connection</strong> — The client initiates a handshake by sending an HTTP request with an Upgrade: websocket header. If the server accepts, it responds with a 101 Switching Protocols status.</li><li><strong>Open</strong> — After a successful handshake, the WebSocket connection is established and the open event fires.</li><li><strong>Message Exchange</strong> — Both client and server can send messages at any time.</li><li><strong>Closing</strong> — Either side can initiate closing the connection with a close code and reason.</li><li><strong>Closed</strong> — The connection is terminated. No more messages can be sent.</li></ol><h3>The Challenge of Raw WebSocket Messages</h3><p>While our echo server is simple, real applications quickly become more complex. As soon as you start building a non-trivial application, you’ll encounter challenges:</p><ol><li><strong>Message Format</strong>: Should you use JSON? Binary? 
Some custom format?</li><li><strong>Message Types</strong>: How do you distinguish between different kinds of messages?</li><li><strong>Routing Logic</strong>: How do you direct messages to the appropriate handlers?</li><li><strong>Error Handling</strong>: What happens when a message isn’t formatted correctly?</li></ol><p>Let’s upgrade our example to handle JSON messages with a type field:</p><pre>import { serve } from &quot;bun&quot;;<br><br>type ChatMessage = {<br>  type: string;<br>  content?: any;<br>};<br><br>serve({<br>  port: 3000,<br><br>  fetch(req, server) {<br>    const url = new URL(req.url);<br>    if (url.pathname === &quot;/ws&quot;) {<br>      const success = server.upgrade(req);<br>      return success<br>        ? undefined<br>        : new Response(&quot;WebSocket upgrade failed&quot;, { status: 400 });<br>    }<br>    return new Response(&quot;Hello from Bun!&quot;);<br>  },<br><br>  websocket: {<br>    open(ws) {<br>      console.log(&quot;Connection opened&quot;);<br>    },<br><br>    message(ws, data) {<br>      try {<br>        // Parse the incoming message<br>        const message = JSON.parse(data as string) as ChatMessage;<br><br>        // Handle different message types<br>        switch (message.type) {<br>          case &quot;CHAT&quot;:<br>            console.log(`Chat message: ${message.content.text}`);<br>            // Echo back with a timestamp<br>            ws.send(<br>              JSON.stringify({<br>                type: &quot;CHAT_ECHO&quot;,<br>                content: {<br>                  original: message.content.text,<br>                  timestamp: new Date().toISOString(),<br>                },<br>              }),<br>            );<br>            break;<br><br>          case &quot;PING&quot;:<br>            ws.send(<br>              JSON.stringify({<br>                type: &quot;PONG&quot;,<br>                content: { timestamp: new Date().toISOString() },<br>              }),<br>            );<br>            
break;<br><br>          default:<br>            ws.send(<br>              JSON.stringify({<br>                type: &quot;ERROR&quot;,<br>                content: { message: `Unknown message type: ${message.type}` },<br>              }),<br>            );<br>            break;<br>        }<br>      } catch (error) {<br>        console.error(&quot;Error processing message:&quot;, error);<br>        ws.send(<br>          JSON.stringify({<br>            type: &quot;ERROR&quot;,<br>            content: { message: &quot;Could not parse message&quot; },<br>          }),<br>        );<br>      }<br>    },<br><br>    close(ws, code, reason) {<br>      console.log(`Connection closed: ${code} ${reason}`);<br>    },<br>  },<br>});<br><br>console.log(&quot;Improved WebSocket server running on ws://localhost:3000/ws&quot;);</pre><h4>The Problem with This Approach</h4><p>Even in this simple example, we’re already seeing issues:</p><ol><li><strong>Type Safety</strong>: The as ChatMessage cast doesn&#39;t guarantee the message actually has the right structure.</li><li><strong>Error Prone</strong>: It’s easy to typo a message type or forget a field.</li><li><strong>Scaling Issues</strong>: As you add more message types, the switch statement becomes unwieldy.</li><li><strong>Maintenance Burden</strong>: There’s no centralized definition of message structures.</li></ol><p>This is where <strong>WS-Kit</strong> comes in, providing a structured approach to handling WebSocket messages with pluggable validators. It turns our messy switch statement into clear, type-safe routes with validation baked in.</p><p>In the next section, we’ll explore how to solve these problems using <a href="https://kriasoft.com/ws-kit/">WS-Kit</a> with Zod schemas for type-safety.</p><h3>Part 2: The Type-Safety Challenge</h3><h4>The Wild West of WebSocket Messages</h4><p>If you’ve been following along, you now have a basic WebSocket server running in Bun. 
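</p><p>Before dissecting the challenges one by one, a minimal sketch shows how fragile string-keyed routing is. (The route function below is our own illustration, not code from the server above.) A single typo in the type string silently falls through to the default branch, and the compiler has nothing to say about it:</p>

```typescript
// Sketch: string-keyed switch routing. The string literals are the whole
// "protocol", so any typo in them becomes a silent runtime mismatch.
type Incoming = { type: string; content?: unknown };

function route(raw: string): string {
  const message = JSON.parse(raw) as Incoming;
  switch (message.type) {
    case "CHAT":
      return "handled chat";
    case "PING":
      return "pong";
    default:
      return `unknown type: ${message.type}`;
  }
}

console.log(route('{"type":"CHAT"}')); // handled chat
console.log(route('{"type":"Chat"}')); // unknown type: Chat — silent drift
```

<p>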
Messages are flying back and forth, connections are being established and closed — everything seems great! But then you try to build something real, and suddenly it feels like you’re trying to herd cats in the dark. With a blindfold on. While riding a unicycle.</p><p>The challenge with WebSocket communication is that, unlike REST APIs with their well-defined endpoints and request/response structures, WebSockets are essentially a continuous stream of messages. There’s no built-in mechanism to ensure that:</p><ol><li>Messages have the right structure</li><li>Required fields are present</li><li>Values have the correct types</li><li>Handlers receive only messages they’re designed to process</li></ol><p>This is where many WebSocket applications start to crumble under their own complexity. Let’s explore the key challenges in detail.</p><h4>The “What Did I Just Receive?” Problem</h4><p>Take a look at this common WebSocket message handler pattern:</p><pre>ws.addEventListener(&quot;message&quot;, (event) =&gt; {<br>  const data = JSON.parse(event.data);<br><br>  if (data.type === &quot;chat_message&quot;) {<br>    // Is data.content defined? Is it a string? Who knows!<br>    chatSystem.processMessage(data.content);<br>  } else if (data.type === &quot;user_joined&quot;) {<br>    // Does data.userId exist? 
Is it a number or string?<br>    notifyUserJoined(data.userId);<br>  } else if (data.type === &quot;typing_indicator&quot;) {<br>    // Is data.isTyping a boolean or something else?<br>    updateTypingStatus(data.userId, data.isTyping);<br>  }<br>  // And so on...<br>});</pre><p>This approach has several issues:</p><ol><li><strong>No guarantee of structure</strong>: Just because data.type is &#39;chat_message&#39; doesn&#39;t mean data.content exists.</li><li><strong>Type coercion traps</strong>: JavaScript’s loose typing means data.isTyping could be the string &quot;false&quot; instead of the boolean false.</li><li><strong>Typo landmines</strong>: Mistype &#39;chat_message&#39; as &#39;chat_mesage&#39; and your handler won&#39;t trigger.</li><li><strong>Implicit dependencies</strong>: It’s not clear what fields each message type requires.</li></ol><h4>The Evolution of Error Messages</h4><p>As your application grows, so does the sophistication (or desperation) of your error handling:</p><p><strong>Stage 1: Blissful Ignorance</strong></p><pre>ws.addEventListener(&#39;message&#39;, (event) =&gt; {<br>  const data = JSON.parse(event.data);<br>  processChatMessage(data.room, data.text); // What could go wrong?<br>});</pre><p><strong>Stage 2: The console.log Debugging Phase</strong></p><pre>ws.addEventListener(&#39;message&#39;, (event) =&gt; {<br>  const data = JSON.parse(event.data);<br>  console.log(&quot;Received:&quot;, data); // Let me see what I&#39;m dealing with<br>  if (data.room &amp;&amp; data.text) {<br>    processChatMessage(data.room, data.text);<br>  }<br>});</pre><p><strong>Stage 3: Trust Issues</strong></p><pre>ws.addEventListener(&#39;message&#39;, (event) =&gt; {<br>  try {<br>    const data = JSON.parse(event.data);<br>    if (!data || typeof data !== &#39;object&#39;) {<br>      throw new Error(&#39;Invalid message format&#39;);<br>    }<br>    <br>    if (!data.type || typeof data.type !== &#39;string&#39;) {<br>      throw new Error(&#39;Missing 
or invalid type field&#39;);<br>    }<br>    <br>    if (data.type === &#39;chat_message&#39;) {<br>      if (!data.room || typeof data.room !== &#39;string&#39;) {<br>        throw new Error(&#39;Missing or invalid room field&#39;);<br>      }<br>      <br>      if (!data.text || typeof data.text !== &#39;string&#39;) {<br>        throw new Error(&#39;Missing or invalid text field&#39;);<br>      }<br>      <br>      processChatMessage(data.room, data.text);<br>    }<br>    // And so on for EVERY message type...<br>  } catch (error) {<br>    console.error(&quot;Error processing message:&quot;, error);<br>    ws.send(JSON.stringify({<br>      type: &#39;error&#39;,<br>      message: error.message<br>    }));<br>  }<br>});</pre><p>By Stage 3, a third of your codebase is dedicated to validation, and you’re seriously considering a career change to something less frustrating… like herding actual cats.</p><h4>The TypeScript Mirage</h4><p>“But wait,” you might say, “I’m using TypeScript! I’ve defined interfaces for all my message types!”</p><pre>interface BaseChatMessage {<br>  type: string;<br>}<br><br>interface ChatMessage extends BaseChatMessage {<br>  type: &#39;chat_message&#39;;<br>  room: string;<br>  text: string;<br>}<br><br>interface UserJoinedMessage extends BaseChatMessage {<br>  type: &#39;user_joined&#39;;<br>  userId: string;<br>  username: string;<br>}<br><br>// More message types...<br><br>type AllMessageTypes = ChatMessage | UserJoinedMessage | /* ... 
*/;<br><br>ws.addEventListener(&#39;message&#39;, (event) =&gt; {<br>  const data = JSON.parse(event.data) as AllMessageTypes; // The infamous &quot;trust me&quot; cast<br><br>  switch (data.type) {<br>    case &#39;chat_message&#39;:<br>      // TypeScript now thinks data is ChatMessage<br>      processChatMessage(data.room, data.text);<br>      break;<br>    case &#39;user_joined&#39;:<br>      // TypeScript now thinks data is UserJoinedMessage<br>      notifyUserJoined(data.userId, data.username);<br>      break;<br>  }<br>});</pre><p>This looks better! TypeScript gives you nice autocomplete and seems to understand your message structure. But there’s an illusion at play here: that as AllMessageTypes cast is basically you telling TypeScript, &quot;Trust me, this JSON is properly formatted.&quot; But at runtime, all those lovely types disappear, and you&#39;re back to the Wild West.</p><p>What if someone sends this?</p><pre>{<br>  &quot;type&quot;: &quot;chat_message&quot;,<br>  &quot;rum&quot;: &quot;general&quot;,  // Typo: &quot;rum&quot; instead of &quot;room&quot;<br>  &quot;text&quot;: &quot;Hello world!&quot;<br>}</pre><p>TypeScript won’t save you. Your code will try to process data.room, which is undefined, potentially causing errors downstream.</p><h4>The Runtime Validation Gap</h4><p>The core issue is the gap between compile-time types (what TypeScript checks) and runtime values (what actually arrives over the wire). 
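</p><p>Here is the gap in miniature (the type guard below is our own illustrative stand-in for what a validation library automates): the as cast satisfies the compiler, while only the hand-written guard actually inspects the value at runtime:</p>

```typescript
// The cast type-checks, but nothing verifies the JSON at runtime.
type ChatMessage = { type: "chat_message"; room: string; text: string };

const raw = '{"type":"chat_message","rum":"general","text":"hi"}'; // note "rum"

const msg = JSON.parse(raw) as ChatMessage; // compiles, proves nothing
console.log(msg.room); // undefined at runtime, despite the declared type

// A runtime type guard is the only thing that actually checks the value
function isChatMessage(value: unknown): value is ChatMessage {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    v.type === "chat_message" &&
    typeof v.room === "string" &&
    typeof v.text === "string"
  );
}

console.log(isChatMessage(JSON.parse(raw))); // false: the typo is caught
```

<p>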
This is where validation libraries like Zod come in.</p><p>Zod lets you define schemas that serve as both TypeScript types AND runtime validators:</p><pre>import { z } from &quot;zod&quot;;<br><br>// Define message schemas<br>const ChatMessageSchema = z.object({<br>  type: z.literal(&quot;chat_message&quot;),<br>  room: z.string(),<br>  text: z.string(),<br>});<br><br>const UserJoinedSchema = z.object({<br>  type: z.literal(&quot;user_joined&quot;),<br>  userId: z.string(),<br>  username: z.string(),<br>});<br><br>// Infer TypeScript types from schemas<br>type ChatMessage = z.infer&lt;typeof ChatMessageSchema&gt;;<br>type UserJoinedMessage = z.infer&lt;typeof UserJoinedSchema&gt;;<br><br>// Use in handler<br>ws.addEventListener(&quot;message&quot;, (event) =&gt; {<br>  const data = JSON.parse(event.data);<br><br>  try {<br>    if (data.type === &quot;chat_message&quot;) {<br>      const validatedData = ChatMessageSchema.parse(data);<br>      processChatMessage(validatedData.room, validatedData.text);<br>    } else if (data.type === &quot;user_joined&quot;) {<br>      const validatedData = UserJoinedSchema.parse(data);<br>      notifyUserJoined(validatedData.userId, validatedData.username);<br>    }<br>  } catch (error) {<br>    console.error(&quot;Validation error:&quot;, error);<br>    // Send error back to client<br>  }<br>});</pre><p>This is much more robust! Now if someone sends a malformed message, Zod will catch it and provide detailed error information.</p><h4>The Routing Challenge</h4><p>But we still have another problem: as your application grows, this giant message handler becomes unmaintainable. You need a way to:</p><ol><li>Define message types and their validation schemas in one place</li><li>Route incoming messages to the appropriate handlers</li><li>Handle error cases consistently</li><li>Provide type safety throughout the process</li></ol><p>This is where <strong>WS-Kit</strong> comes in. 
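</p><p>To see why a library earns its keep, consider a hand-rolled registry that pairs each message type with a validator and a handler. (This sketch uses plain validator functions for brevity; in practice you would plug in Zod schemas. None of these names are WS-Kit's API.)</p>

```typescript
// A minimal message registry: each entry pairs a runtime validator with a
// handler, which is exactly the plumbing a router library absorbs.
interface Route {
  validate: (data: Record<string, unknown>) => boolean;
  handle: (data: Record<string, unknown>) => string;
}

const routes: Record<string, Route> = {
  chat_message: {
    validate: (d) => typeof d.room === "string" && typeof d.text === "string",
    handle: (d) => `(${d.room}) ${d.text}`,
  },
  user_joined: {
    validate: (d) => typeof d.userId === "string",
    handle: (d) => `joined: ${d.userId}`,
  },
};

function dispatch(raw: string): string {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return "error: invalid JSON";
  }
  if (typeof data !== "object" || data === null) return "error: not an object";
  const message = data as Record<string, unknown>;
  const route = routes[String(message.type)];
  if (!route) return `error: unknown type ${message.type}`;
  if (!route.validate(message)) return "error: payload failed validation";
  return route.handle(message);
}
```

<p>Every line of this dispatch plumbing, plus typed payloads and consistent error replies, is what a router library takes off your hands.</p><p>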
It combines message type definition, validation, and routing into a clean, type-safe API.</p><h3>Enter WS-Kit</h3><p><strong>WS-Kit</strong> is designed to solve these challenges by providing:</p><ol><li>A way to define message types with Zod schemas</li><li>Automatic validation of incoming messages</li><li>Routing to type-specific handlers</li><li>Clean error handling patterns</li></ol><p>All code examples in this guide use createRouter imported from @ws-kit/zod, which automatically configures the router with Zod validation. This is the recommended way to set up WS-Kit.</p><p>Instead of a giant switch statement or if/else chain, you can write code like this:</p><pre>import { z, message, createRouter } from &quot;@ws-kit/zod&quot;;<br>import { serve } from &quot;@ws-kit/bun&quot;;<br><br>// Define message types with schemas<br>const ChatMessage = message(&quot;CHAT_MESSAGE&quot;, {<br>  room: z.string(),<br>  text: z.string(),<br>});<br><br>const JoinRoom = message(&quot;JOIN_ROOM&quot;, {<br>  room: z.string(),<br>});<br><br>// Create router<br>const router = createRouter();<br><br>// Define handlers for each message type<br>router.on(ChatMessage, (ctx) =&gt; {<br>  // ctx.payload is fully typed and validated!<br>  const { room, text } = ctx.payload;<br><br>  // Do something with the message<br>  console.log(`Message in ${room}: ${text}`);<br><br>  // Send response<br>  ctx.send(ChatMessage, { room, text: &quot;Echo: &quot; + text });<br>});<br><br>router.on(JoinRoom, (ctx) =&gt; {<br>  const { room } = ctx.payload;<br>  ctx.subscribe(room); // Subscribe to room using Bun&#39;s built-in PubSub<br>  console.log(`Client joined room: ${room}`);<br>});<br><br>// Start server with router<br>serve(router, {<br>  port: 3000,<br>});</pre><p>With this approach:</p><ol><li>Message schemas are defined clearly in one place</li><li>Incoming messages are automatically validated</li><li>Handlers only receive messages they’re supposed to handle</li><li>TypeScript provides full 
type safety at every step</li><li>Invalid messages generate helpful error responses</li></ol><h4>The Benefits of Type-Safe WebSockets</h4><p>Using a typed approach with validation provides several key benefits:</p><ol><li><strong>Robust error handling</strong>: Catch malformed messages early with detailed error information</li><li><strong>Self-documenting code</strong>: Your message schemas serve as documentation for your protocol</li><li><strong>IDE support</strong>: Get autocomplete and type checking as you work with messages</li><li><strong>Safer refactoring</strong>: Change message structures with confidence, as TypeScript will find usages</li><li><strong>Clearer mental model</strong>: Discrete message types make the system easier to understand</li></ol><h4>From Chaos to Order</h4><p>With a type-safe approach using <strong>WS-Kit</strong> and Zod, we’ve moved from the Wild West of WebSocket messages to a structured, maintainable system. No more casting and hoping for the best. No more giant switch statements. No more manual validation code.</p><p>In the next section, we’ll dive deeper into <strong>WS-Kit</strong> and explore how it can be used to build a complete real-time chat application with authentication, rooms, and more.</p><h3>Part 3: Introducing WS-Kit</h3><h4>The Missing Piece in WebSocket Development</h4><p>In the previous sections, we explored WebSockets in Bun and the challenges of maintaining type safety in a real-time messaging environment. Now it’s time to introduce the solution to our WebSocket woes: <strong>WS-Kit</strong>.</p><p>Think of <strong>WS-Kit</strong> as that friend who always keeps their kitchen organized — the one who has separate containers for different types of pasta and labels everything. Maybe a bit obsessive, but you’re secretly grateful when you need to find the rigatoni at 2 AM. 
That’s what <strong>WS-Kit</strong> does for your WebSocket messages: it keeps everything organized, labeled, and exactly where it should be.</p><h4>What is WS-Kit?</h4><p><a href="https://github.com/kriasoft/ws-kit"><strong>WS-Kit</strong></a> is a lightweight, type-safe WebSocket router for Bun and other platforms. It provides a structured way to handle WebSocket connections and route messages to different handlers based on message types, all with full TypeScript support and pluggable validator integration (Zod, Valibot, custom).</p><p>Instead of building your own message routing system from scratch (and let’s be honest, the first version would probably be a giant switch statement), <strong>WS-Kit</strong> gives you a battle-tested solution that’s ready to use.</p><h3>Core Philosophy</h3><p>The core philosophy behind <strong>WS-Kit</strong> is simple:</p><ol><li><strong>Pluggable, not prescriptive</strong>: Work with any validator (Zod, Valibot, custom) and any platform (Bun, Cloudflare, custom adapters)</li><li><strong>Type safety everywhere</strong>: From message definition to handler execution</li><li><strong>Runtime validation</strong>: Catch errors before they cause problems</li><li><strong>Clean separation</strong>: Organize handlers by message type</li><li><strong>Minimal overhead</strong>: Keep things fast and lightweight</li></ol><h4>Key Features in Detail</h4><p>Let’s dig into the key features that make <strong>WS-Kit</strong> stand out:</p><h4>Type-Safe Messaging with Zod Schemas</h4><p>At the heart of <strong>WS-Kit</strong> is the message function. 
This function allows you to define message types with their associated payloads using Zod schemas:</p><pre>import { z, message } from &quot;@ws-kit/zod&quot;;<br><br>// Define a message type for joining a chat room<br>export const JoinRoom = message(&quot;JOIN_ROOM&quot;, {<br>  roomId: z.string(),<br>});<br><br>// Define a message for sending a chat message<br>export const SendMessage = message(&quot;SEND_MESSAGE&quot;, {<br>  roomId: z.string(),<br>  message: z.string().min(1).max(500), // Add constraints<br>  attachments: z<br>    .array(<br>      z.object({<br>        type: z.enum([&quot;image&quot;, &quot;file&quot;]),<br>        url: z.string().url(),<br>      }),<br>    )<br>    .optional(),<br>});</pre><p>The magic here is twofold:</p><ol><li><strong>TypeScript Types</strong>: The message function automatically generates TypeScript types that you can use throughout your codebase</li><li><strong>Runtime Validation</strong>: When a message arrives, it’s automatically validated against the schema before your handler is called</li></ol><p>This means you can confidently access ctx.payload.roomId in your handler, knowing it&#39;s a string that passed validation. 
No more defensive coding with if (typeof data.roomId === &#39;string&#39;) checks everywhere!</p><h4>Intuitive Routing System</h4><p>With <strong>WS-Kit</strong>, you define handlers for specific message types:</p><pre>import { z, createRouter } from &quot;@ws-kit/zod&quot;;<br>import { JoinRoom, SendMessage } from &quot;./schemas&quot;;<br><br>const router = createRouter();<br><br>// Handle JOIN_ROOM messages<br>router.on(JoinRoom, (ctx) =&gt; {<br>  const { roomId } = ctx.payload; // Fully typed and validated!<br>  console.log(`Client wants to join room: ${roomId}`);<br><br>  // Join the room using Bun&#39;s built-in PubSub<br>  ctx.subscribe(roomId);<br><br>  // Send confirmation<br>  ctx.send(JoinRoom, { roomId }); // Type-checked!<br>});<br><br>// Handle SEND_MESSAGE messages<br>router.on(SendMessage, (ctx) =&gt; {<br>  const { roomId, message, attachments } = ctx.payload;<br>  console.log(`New message in ${roomId}: ${message}`);<br><br>  // No need to check if attachments exists - type system handles it<br>  const hasAttachments = attachments &amp;&amp; attachments.length &gt; 0;<br><br>  // Broadcast to room (using Bun&#39;s built-in PubSub)<br>  // More on this in the broadcast section<br>});</pre><p>Each handler receives a context object with:</p><ul><li>ws: The WebSocket connection</li><li>payload: The validated message payload (fully typed!)</li><li>meta: Additional metadata about the message</li><li>send(): A helper method for sending responses</li></ul><p>If a message arrives with an unknown type or fails validation, it’s automatically rejected with an appropriate error message — no need to write that boilerplate yourself.</p><h3>Leveraging Bun’s Native WebSocket Performance</h3><p><strong>WS-Kit</strong> is designed to be a thin layer on top of Bun’s already-fast WebSocket implementation. 
It doesn’t reinvent the wheel — it just adds guardrails to keep you on the road.</p><p>The library adds minimal overhead to message processing, focusing on routing and validation while letting Bun handle the heavy lifting of WebSocket connections, frame parsing, and PubSub functionality.</p><h3>Flexible Integration</h3><p>One of the strengths of <strong>WS-Kit</strong> is how easily it integrates with different server setups:</p><pre>import { createRouter, z } from &quot;@ws-kit/zod&quot;;<br>import { serve } from &quot;@ws-kit/bun&quot;;<br><br>// WebSocket router<br>const router = createRouter();<br><br>// Define your message handlers here<br>router.on(YourMessage, (ctx) =&gt; {<br>  // Handle message<br>});<br><br>// High-level serve() with auto-configuration<br>serve(router, {<br>  port: 3000,<br>});<br><br>// Or for advanced setups, use Hono or any HTTP framework:<br>import { Hono } from &quot;hono&quot;;<br>import { createBunHandler } from &quot;@ws-kit/bun&quot;;<br><br>const app = new Hono();<br>app.get(&quot;/&quot;, (c) =&gt; c.text(&quot;Welcome to Hono!&quot;));<br><br>const wsHandler = createBunHandler(router);<br><br>Bun.serve({<br>  port: 3000,<br>  fetch(req, server) {<br>    if (new URL(req.url).pathname === &quot;/ws&quot;) {<br>      return wsHandler(req, server);<br>    }<br>    return app.fetch(req);<br>  },<br>  websocket: router.websocket,<br>});</pre><p>The library is framework-agnostic — it works standalone, with Hono, Elysia, or any other HTTP framework you prefer.</p><h4>Connection Lifecycle Management</h4><p><strong>WS-Kit</strong> provides handlers for the entire WebSocket lifecycle:</p><pre>// Handle new connections<br>router.onOpen((ctx) =&gt; {<br>  console.log(`New client connected: ${ctx.ws.data.clientId}`);<br><br>  // Send welcome message<br>  ctx.send(Welcome, { message: &quot;Welcome to the server!&quot; });<br>});<br><br>// Handle message types (as seen earlier)<br>router.on(JoinRoom, (ctx) =&gt; {<br>  /* ... 
*/<br>});<br><br>// Handle disconnections<br>router.onClose((ctx) =&gt; {<br>  console.log(`Client disconnected: ${ctx.ws.data.clientId}`);<br>  console.log(`Close code: ${ctx.code}`);<br>  console.log(`Close reason: ${ctx.reason}`);<br><br>  // Clean up any resources<br>  if (ctx.ws.data.roomId) {<br>    leaveRoom(ctx.ws.data.roomId, ctx.ws.data.clientId);<br>  }<br>});</pre><p>Each handler has access to the WebSocket connection’s metadata through ctx.ws.data, allowing you to store and retrieve session information.</p><h3>Authentication and Security</h3><p>Security is a critical concern in WebSocket applications. <strong>WS-Kit</strong> provides a clean way to handle authentication during the WebSocket upgrade process:</p><pre>import { z, createRouter } from &quot;@ws-kit/zod&quot;;<br>import { serve } from &quot;@ws-kit/bun&quot;;<br>import { verifyToken } from &quot;./auth&quot;; // Your authentication logic<br><br>type AppData = {<br>  userId?: string;<br>  userRole?: string;<br>};<br><br>// Create router with type for connection metadata<br>const router = createRouter&lt;AppData&gt;();<br><br>// Your message handlers<br>router.on(SomeMessage, (ctx) =&gt; {<br>  // ctx.ws.data.userId is available here<br>});<br><br>// Start server with authentication<br>serve(router, {<br>  port: 3000,<br>  async authenticate(req) {<br>    // Extract and verify authentication token<br>    const authHeader = req.headers.get(&quot;Authorization&quot;);<br>    const token = authHeader?.split(&quot;Bearer &quot;)[1];<br><br>    // Optional: Reject connection if no token<br>    if (!token) {<br>      return undefined; // Rejects the connection<br>    }<br><br>    // Verify token and get user info<br>    const user = await verifyToken(token);<br><br>    // Return user data to be attached to ws.data<br>    return {<br>      userId: user?.id,<br>      userRole: user?.role,<br>    };<br>  },<br>});</pre><p>By authenticating during the upgrade process, you ensure that only authorized 
users can establish WebSocket connections. The user data is then available in all your handlers via ctx.ws.data.</p><h3>Broadcasting and Room Management</h3><p>WebSocket applications often need to broadcast messages to multiple clients. <strong>WS-Kit</strong> complements Bun’s built-in PubSub functionality with schema validation:</p><pre>import { z, createRouter } from &quot;@ws-kit/zod&quot;;<br>import { ChatMessage, JoinRoom, UserJoined } from &quot;./schemas&quot;;<br><br>const router = createRouter();<br><br>router.on(ChatMessage, (ctx) =&gt; {<br>  const { roomId, message } = ctx.payload;<br>  const userId = ctx.ws.data.userId;<br><br>  // Broadcast the message to everyone in the room<br>  ctx.publish(roomId, ChatMessage, {<br>    roomId,<br>    userId,<br>    message,<br>    timestamp: Date.now(),<br>  });<br>});<br><br>router.on(JoinRoom, (ctx) =&gt; {<br>  const { roomId } = ctx.payload;<br>  const userId = ctx.ws.data.userId;<br><br>  // Subscribe to the room<br>  ctx.subscribe(roomId);<br>  ctx.ws.data.roomId = roomId;<br><br>  // Notify others<br>  ctx.publish(roomId, UserJoined, {<br>    roomId,<br>    userId,<br>    timestamp: Date.now(),<br>  });<br>});</pre><p>The ctx.publish() helper ensures that broadcast messages are validated against their schemas before being sent, providing the same type safety for broadcasts that you get with direct messaging.</p><h4>Error Handling</h4><p>Robust error handling is crucial for WebSocket applications. 
<strong>WS-Kit</strong> includes a standardized error system with error codes aligned with gRPC:</p><pre>import { z, createRouter } from &quot;@ws-kit/zod&quot;;<br>import { JoinRoom } from &quot;./schemas&quot;;<br><br>const router = createRouter();<br><br>router.on(JoinRoom, (ctx) =&gt; {<br>  const { roomId } = ctx.payload;<br><br>  // Check if room exists (checkRoomExists is your own lookup logic)<br>  const roomExists = checkRoomExists(roomId);<br><br>  if (!roomExists) {<br>    // Send typed error response<br>    ctx.error(&quot;NOT_FOUND&quot;, `Room ${roomId} does not exist`, {<br>      roomId, // Additional debug info<br>    });<br>    return;<br>  }<br><br>  // Continue with normal flow...<br>});</pre><p>The library includes predefined error codes (UNAUTHENTICATED, PERMISSION_DENIED, INVALID_ARGUMENT, NOT_FOUND, RESOURCE_EXHAUSTED, etc.) for common scenarios, ensuring consistent error reporting.</p><h4>Modular Route Organization</h4><p>As your application grows, you can organize routes into separate modules:</p><pre>// chat.ts<br>import { z, createRouter } from &quot;@ws-kit/zod&quot;;<br>import { ChatMessage, JoinRoom } from &quot;./schemas&quot;;<br><br>// Create a router instance<br>export const chatRouter = createRouter();<br><br>// Add message handlers<br>chatRouter.on(ChatMessage, (ctx) =&gt; {<br>  /* ... */<br>});<br>chatRouter.on(JoinRoom, (ctx) =&gt; {<br>  /* ... 
*/<br>});<br><br>// main.ts<br>import { z, createRouter } from &quot;@ws-kit/zod&quot;;<br>import { serve } from &quot;@ws-kit/bun&quot;;<br>import { chatRouter } from &quot;./chat&quot;;<br>import { userRouter } from &quot;./user&quot;;<br><br>const router = createRouter();<br><br>// Add modular routers<br>router.merge(chatRouter);<br>router.merge(userRouter);<br><br>// Start server<br>serve(router, { port: 3000 });</pre><p>This keeps your codebase organized and makes it easier to collaborate with team members.</p><h4>Why Choose WS-Kit?</h4><p>With so many WebSocket solutions out there, why choose <strong>WS-Kit</strong>?</p><ol><li><strong>Platform-agnostic</strong>: Pluggable adapters for Bun, Cloudflare, and custom platforms</li><li><strong>Validator-agnostic</strong>: Works with Zod, Valibot, or your own validation library</li><li><strong>TypeScript-first</strong>: Designed with type safety as a core principle</li><li><strong>Runtime validation</strong>: Catch errors before they cause problems</li><li><strong>Lightweight</strong>: Minimal overhead, just the features you need</li><li><strong>Progressive</strong>: Start simple and scale as needed</li></ol><p>It’s the Goldilocks of WebSocket libraries: not too heavy, not too bare-bones, but just right. Plus, you’re not locked into a single validator or platform.</p><h4>Getting Started with WS-Kit</h4><p>Ready to bring some order to your WebSocket chaos? Let’s get started:</p><pre>bun add @ws-kit/zod @ws-kit/bun zod<br>bun add @types/bun -D  # For TypeScript support</pre><p>In the next section, we’ll put everything together to build a complete real-time chat application using <strong>WS-Kit</strong>, demonstrating how the library makes complex WebSocket applications more manageable.</p><p>Say goodbye to giant switch statements and untyped message payloads. 
With <strong>WS-Kit</strong>, your WebSocket code can be as clean and organized as that friend’s pasta collection — just hopefully without the late-night carbohydrate cravings.</p><h3>Part 4: Building a Real-Time Chat Application</h3><p>Now that we understand the fundamentals of WebSockets in Bun and have been introduced to <strong>WS-Kit</strong>, let’s put everything together to build something practical: a real-time chat application.</p><p>After all, what better way to test our new WebSocket routing superpowers than by creating yet another chat app? Because clearly, what the world needs is one more place for people to share cat memes and debate whether pineapple belongs on pizza (it does, fight me).</p><h4>Project Setup</h4><p>First things first, let’s set up our project. Create a new folder for our chat application and initialize it:</p><pre>mkdir bun-chat-app<br>cd bun-chat-app<br>bun init -y</pre><p>Next, install the dependencies we’ll need:</p><pre>bun add @ws-kit/zod @ws-kit/bun zod<br>bun add @types/bun -D</pre><h4>Step 1: Define Our Message Schemas</h4><p>The heart of our type-safe approach is defining clear message schemas. 
Let’s create a file called schemas.ts to define all the message types our chat application will support:</p><pre>import { z, message } from &quot;@ws-kit/zod&quot;;<br><br>// User authentication<br>export const Authenticate = message(&quot;AUTHENTICATE&quot;, {<br>  token: z.string(),<br>});<br><br>export const AuthSuccess = message(&quot;AUTH_SUCCESS&quot;, {<br>  userId: z.string(),<br>  username: z.string(),<br>});<br><br>// Room management<br>export const JoinRoom = message(&quot;JOIN_ROOM&quot;, {<br>  roomId: z.string(),<br>});<br><br>export const LeaveRoom = message(&quot;LEAVE_ROOM&quot;, {<br>  roomId: z.string(),<br>});<br><br>export const UserJoined = message(&quot;USER_JOINED&quot;, {<br>  roomId: z.string(),<br>  userId: z.string(),<br>  username: z.string(),<br>});<br><br>export const UserLeft = message(&quot;USER_LEFT&quot;, {<br>  roomId: z.string(),<br>  userId: z.string(),<br>  username: z.string(),<br>});<br><br>export const RoomList = message(&quot;ROOM_LIST&quot;, {<br>  rooms: z.array(<br>    z.object({<br>      id: z.string(),<br>      name: z.string(),<br>      userCount: z.number(),<br>    }),<br>  ),<br>});<br><br>// Messaging<br>export const SendMessage = message(&quot;SEND_MESSAGE&quot;, {<br>  roomId: z.string(),<br>  text: z.string().min(1).max(1000),<br>  // Optional attachment<br>  attachment: z<br>    .object({<br>      type: z.enum([&quot;image&quot;, &quot;file&quot;]),<br>      url: z.string().url(),<br>      name: z.string().optional(),<br>    })<br>    .optional(),<br>});<br><br>export const ChatMessage = message(&quot;CHAT_MESSAGE&quot;, {<br>  messageId: z.string(),<br>  roomId: z.string(),<br>  userId: z.string(),<br>  username: z.string(),<br>  text: z.string(),<br>  timestamp: z.number(),<br>  attachment: z<br>    .object({<br>      type: z.enum([&quot;image&quot;, &quot;file&quot;]),<br>      url: z.string().url(),<br>      name: z.string().optional(),<br>    })<br>    .optional(),<br>});<br><br>// Typing 
indicators<br>export const TypingStart = message(&quot;TYPING_START&quot;, {<br>  roomId: z.string(),<br>});<br><br>export const TypingStop = message(&quot;TYPING_STOP&quot;, {<br>  roomId: z.string(),<br>});<br><br>export const UserTyping = message(&quot;USER_TYPING&quot;, {<br>  roomId: z.string(),<br>  userId: z.string(),<br>  username: z.string(),<br>});<br><br>// Connection metadata type<br>export type Meta = {<br>  userId?: string;<br>  username?: string;<br>  currentRoomId?: string;<br>  isAuthenticated?: boolean;<br>};</pre><p>Notice how we’ve organized our messages into logical groups: authentication, room management, messaging, and typing indicators. We’re also using Zod’s validation capabilities to ensure messages have the correct shape and content (like enforcing minimum and maximum message length).</p><h4>Step 2: Setting Up Our Mock User Database</h4><p>For simplicity, we’ll use an in-memory store for users and rooms instead of a real database:</p><pre>import { randomUUID } from &quot;crypto&quot;;<br><br>// User record<br>export type User = {<br>  id: string;<br>  username: string;<br>  token: string;<br>};<br><br>// Room record<br>export type Room = {<br>  id: string;<br>  name: string;<br>  users: Set&lt;string&gt;; // User IDs<br>};<br><br>// In-memory storage<br>const users = new Map&lt;string, User&gt;();<br>const tokens = new Map&lt;string, string&gt;(); // token -&gt; userId<br>const rooms = new Map&lt;string, Room&gt;();<br><br>// Seed with some default rooms<br>rooms.set(&quot;general&quot;, {<br>  id: &quot;general&quot;,<br>  name: &quot;General Chat&quot;,<br>  users: new Set(),<br>});<br><br>rooms.set(&quot;random&quot;, {<br>  id: &quot;random&quot;,<br>  name: &quot;Random Stuff&quot;,<br>  users: new Set(),<br>});<br><br>// User authentication methods<br>export function authenticateUser(token: string): User | null {<br>  const userId = tokens.get(token);<br>  if (!userId) return null;<br><br>  return users.get(userId) || 
null;<br>}<br><br>export function createUser(username: string): User {<br>  const id = randomUUID();<br>  const token = randomUUID();<br><br>  const user: User = { id, username, token };<br>  users.set(id, user);<br>  tokens.set(token, id);<br><br>  return user;<br>}<br><br>// Room methods<br>export function getRooms(): Room[] {<br>  return Array.from(rooms.values());<br>}<br><br>export function getRoom(roomId: string): Room | undefined {<br>  return rooms.get(roomId);<br>}<br><br>export function joinRoom(roomId: string, userId: string): boolean {<br>  const room = rooms.get(roomId);<br>  if (!room) return false;<br><br>  room.users.add(userId);<br>  return true;<br>}<br><br>export function leaveRoom(roomId: string, userId: string): boolean {<br>  const room = rooms.get(roomId);<br>  if (!room) return false;<br><br>  return room.users.delete(userId);<br>}<br><br>export function getUser(userId: string): User | undefined {<br>  return users.get(userId);<br>}</pre><p>This simple store handles user authentication, room management, and keeping track of who’s in which room.</p><h4>Step 3: Implementing Our WebSocket Handlers</h4><p>Now let’s implement handlers for each of our message types. 
Let’s create a file called chat-router.ts:</p><pre>import { z, createRouter } from &quot;@ws-kit/zod&quot;;<br>import { randomUUID } from &quot;crypto&quot;;<br>import * as schema from &quot;./schemas&quot;;<br>import {<br>  authenticateUser,<br>  createUser,<br>  getRoom,<br>  getRooms,<br>  joinRoom,<br>  leaveRoom,<br>  getUser,<br>} from &quot;./data-store&quot;;<br><br>// Create a router with our meta type<br>const router = createRouter&lt;schema.Meta&gt;();<br><br>// Handle new connections<br>router.onOpen((ctx) =&gt; {<br>  // clientId is automatically assigned by ws-kit framework<br>  console.log(`New client connected: ${ctx.ws.data.clientId}`);<br><br>  // Assign a random guest name until authenticated<br>  ctx.assignData({<br>    username: `Guest-${Math.floor(Math.random() * 10000)}`,<br>  });<br><br>  // Send room list to the new client<br>  const rooms = getRooms().map((room) =&gt; ({<br>    id: room.id,<br>    name: room.name,<br>    userCount: room.users.size,<br>  }));<br><br>  ctx.send(schema.RoomList, { rooms });<br>});<br><br>// Handle authentication<br>router.on(schema.Authenticate, (ctx) =&gt; {<br>  const { token } = ctx.payload;<br><br>  // Check if token exists in our store<br>  const user = authenticateUser(token);<br><br>  if (user) {<br>    // Authentication successful<br>    ctx.ws.data.isAuthenticated = true;<br>    ctx.ws.data.userId = user.id;<br>    ctx.ws.data.username = user.username;<br><br>    ctx.send(schema.AuthSuccess, {<br>      userId: user.id,<br>      username: user.username,<br>    });<br><br>    console.log(`User authenticated: ${user.username} (${user.id})`);<br>  } else {<br>    // Create a new user if token doesn&#39;t exist<br>    // In a real app, you&#39;d probably reject invalid tokens<br>    const newUser = createUser(<br>      ctx.ws.data.username || `User-${randomUUID().slice(0, 6)}`,<br>    );<br><br>    ctx.ws.data.isAuthenticated = true;<br>    ctx.ws.data.userId = newUser.id;<br>    ctx.ws.data.username = 
newUser.username;<br><br>    ctx.send(schema.AuthSuccess, {<br>      userId: newUser.id,<br>      username: newUser.username,<br>    });<br><br>    console.log(`New user created: ${newUser.username} (${newUser.id})`);<br>  }<br>});<br><br>// Handle joining a room<br>router.on(schema.JoinRoom, (ctx) =&gt; {<br>  const { roomId } = ctx.payload;<br>  const userId = ctx.ws.data.userId;<br>  const username = ctx.ws.data.username;<br><br>  // Check if user is authenticated<br>  if (!userId || !username) {<br>    ctx.error(&quot;UNAUTHENTICATED&quot;, &quot;You must be authenticated to join a room&quot;);<br>    return;<br>  }<br><br>  // Check if room exists<br>  const room = getRoom(roomId);<br>  if (!room) {<br>    ctx.error(&quot;NOT_FOUND&quot;, `Room ${roomId} does not exist`);<br>    return;<br>  }<br><br>  // If user is already in a room, leave it first<br>  if (ctx.ws.data.currentRoomId) {<br>    leaveRoom(ctx.ws.data.currentRoomId, userId);<br><br>    // Let others know user left the previous room<br>    ctx.publish(ctx.ws.data.currentRoomId, schema.UserLeft, {<br>      roomId: ctx.ws.data.currentRoomId,<br>      userId,<br>      username,<br>    });<br><br>    // Unsubscribe from previous room<br>    ctx.unsubscribe(ctx.ws.data.currentRoomId);<br>  }<br><br>  // Join the new room<br>  joinRoom(roomId, userId);<br>  ctx.ws.data.currentRoomId = roomId;<br><br>  // Subscribe to the room&#39;s messages<br>  ctx.subscribe(roomId);<br><br>  // Confirm to the user they&#39;ve joined<br>  ctx.send(schema.UserJoined, {<br>    roomId,<br>    userId,<br>    username,<br>  });<br><br>  // Let others know a new user joined<br>  ctx.publish(roomId, schema.UserJoined, {<br>    roomId,<br>    userId,<br>    username,<br>  });<br><br>  console.log(`User ${username} (${userId}) joined room: ${roomId}`);<br>});<br><br>// Handle leaving a room<br>router.on(schema.LeaveRoom, (ctx) =&gt; {<br>  const { roomId } = ctx.payload;<br>  const userId = ctx.ws.data.userId;<br>  const 
username = ctx.ws.data.username;<br><br>  if (!userId || !username) {<br>    ctx.error(&quot;UNAUTHENTICATED&quot;, &quot;You must be authenticated to leave a room&quot;);<br>    return;<br>  }<br><br>  // Check if user is in the room<br>  if (ctx.ws.data.currentRoomId !== roomId) {<br>    ctx.error(&quot;INVALID_ARGUMENT&quot;, &quot;You are not in this room&quot;);<br>    return;<br>  }<br><br>  // Leave the room<br>  leaveRoom(roomId, userId);<br>  ctx.ws.data.currentRoomId = undefined;<br><br>  // Unsubscribe from room<br>  ctx.unsubscribe(roomId);<br><br>  // Let others know user left<br>  ctx.publish(roomId, schema.UserLeft, {<br>    roomId,<br>    userId,<br>    username,<br>  });<br><br>  console.log(`User ${username} (${userId}) left room: ${roomId}`);<br>});<br><br>// Handle sending messages<br>router.on(schema.SendMessage, (ctx) =&gt; {<br>  const { roomId, text, attachment } = ctx.payload;<br>  const userId = ctx.ws.data.userId;<br>  const username = ctx.ws.data.username;<br><br>  if (!userId || !username) {<br>    ctx.error(&quot;UNAUTHENTICATED&quot;, &quot;You must be authenticated to send messages&quot;);<br>    return;<br>  }<br><br>  // Check if room exists<br>  if (!getRoom(roomId)) {<br>    ctx.error(&quot;NOT_FOUND&quot;, `Room ${roomId} does not exist`);<br>    return;<br>  }<br><br>  // Check if user is in the room they&#39;re trying to message<br>  if (ctx.ws.data.currentRoomId !== roomId) {<br>    ctx.error(<br>      &quot;PERMISSION_DENIED&quot;,<br>      &quot;You must join the room before sending messages&quot;,<br>    );<br>    return;<br>  }<br><br>  // Create a message object with ID and timestamp<br>  const messageId = randomUUID();<br>  const timestamp = Date.now();<br><br>  const chatMessage = {<br>    messageId,<br>    roomId,<br>    userId,<br>    username,<br>    text,<br>    timestamp,<br>    attachment,<br>  };<br><br>  // Broadcast the message to everyone in the room, including sender<br>  ctx.publish(roomId, 
schema.ChatMessage, chatMessage);<br><br>  console.log(<br>    `Message sent to room ${roomId} by ${username}: ${text.substring(0, 20)}${text.length &gt; 20 ? &quot;...&quot; : &quot;&quot;}`,<br>  );<br>});<br><br>// Handle typing indicators<br>router.on(schema.TypingStart, (ctx) =&gt; {<br>  const { roomId } = ctx.payload;<br>  const userId = ctx.ws.data.userId;<br>  const username = ctx.ws.data.username;<br><br>  if (!userId || !username || ctx.ws.data.currentRoomId !== roomId) return;<br><br>  // Broadcast typing indicator to everyone else in the room<br>  ctx.publish(roomId, schema.UserTyping, {<br>    roomId,<br>    userId,<br>    username,<br>  });<br>});<br><br>// Handle connection closure<br>router.onClose((ctx) =&gt; {<br>  const userId = ctx.ws.data.userId;<br>  const username = ctx.ws.data.username;<br>  const roomId = ctx.ws.data.currentRoomId;<br><br>  console.log(<br>    `Client disconnected: ${userId || ctx.ws.data.clientId}, code: ${ctx.code}`,<br>  );<br><br>  // If user was in a room, notify others and clean up<br>  if (userId &amp;&amp; username &amp;&amp; roomId) {<br>    leaveRoom(roomId, userId);<br><br>    // Let others know user left<br>    ctx.publish(roomId, schema.UserLeft, {<br>      roomId,<br>      userId,<br>      username,<br>    });<br>  }<br>});<br><br>export default router;</pre><p>That’s quite a bit of code, but it’s well-organized and each message type has its own dedicated handler. 
The beauty of this approach is that each handler receives a fully typed and validated payload, making it easy to work with the data without worrying about runtime errors.</p><h4>Step 4: Creating the Main Server</h4><p>Now let’s create the main server file that brings everything together, handling WebSocket upgrades on /ws and serving our static frontend from the public folder:</p><pre>// server.ts<br>import { z, createRouter } from &quot;@ws-kit/zod&quot;;<br>import { createBunHandler } from &quot;@ws-kit/bun&quot;;<br>import chatRouter from &quot;./chat-router&quot;;<br>import type { Meta } from &quot;./schemas&quot;;<br><br>// Create the main WebSocket router<br>const router = createRouter&lt;Meta&gt;();<br><br>// Add our chat routes<br>router.merge(chatRouter);<br><br>// Route WebSocket upgrades to WS-Kit, everything else to static files<br>const wsHandler = createBunHandler(router);<br><br>Bun.serve({<br>  port: 3000,<br>  async fetch(req, server) {<br>    const { pathname } = new URL(req.url);<br>    if (pathname === &quot;/ws&quot;) {<br>      return wsHandler(req, server);<br>    }<br><br>    // Serve the frontend from the public folder<br>    const file = Bun.file(`public${pathname === &quot;/&quot; ? &quot;/index.html&quot; : pathname}`);<br>    if (await file.exists()) return new Response(file);<br>    return new Response(&quot;Not Found&quot;, { status: 404 });<br>  },<br>  websocket: router.websocket,<br>});<br><br>console.log(&quot;Chat server running on http://localhost:3000&quot;);<br>console.log(&quot;WebSocket endpoint: ws://localhost:3000/ws&quot;);</pre><h4>Step 5: Creating a Simple Frontend with @ws-kit/client</h4><p>Let’s create a basic chat UI and use the @ws-kit/client SDK for WebSocket communication. This dramatically simplifies the client-side code compared to manual WebSocket handling.</p><p>First, install the client SDK:</p><pre>bun add @ws-kit/client</pre><blockquote>Note: The @ws-kit/client package provides the complete client SDK with full TypeScript support. 
Import from @ws-kit/client/zod for Zod-based validation or @ws-kit/client/valibot for Valibot-based validation, matching your server-side validator choice.</blockquote><p>Create a public folder for our static files:</p><pre>mkdir -p public</pre><p>Create the HTML file:</p><pre>&lt;!-- filepath: public/index.html --&gt;<br>&lt;!DOCTYPE html&gt;<br>&lt;html lang=&quot;en&quot;&gt;<br>  &lt;head&gt;<br>    &lt;meta charset=&quot;UTF-8&quot; /&gt;<br>    &lt;meta name=&quot;viewport&quot; content=&quot;width=device-width, initial-scale=1.0&quot; /&gt;<br>    &lt;title&gt;Bun Chat App&lt;/title&gt;<br>    &lt;link rel=&quot;stylesheet&quot; href=&quot;styles.css&quot; /&gt;<br>  &lt;/head&gt;<br>  &lt;body&gt;<br>    &lt;div class=&quot;app-container&quot;&gt;<br>      &lt;div class=&quot;sidebar&quot;&gt;<br>        &lt;div class=&quot;user-info&quot;&gt;<br>          &lt;span id=&quot;username&quot;&gt;Not logged in&lt;/span&gt;<br>          &lt;button id=&quot;login-btn&quot;&gt;Login&lt;/button&gt;<br>        &lt;/div&gt;<br>        &lt;div id=&quot;connection-status&quot; class=&quot;connection-status&quot;&gt;Disconnected&lt;/div&gt;<br>        &lt;h3&gt;Rooms&lt;/h3&gt;<br>        &lt;ul id=&quot;room-list&quot; class=&quot;room-list&quot;&gt;&lt;/ul&gt;<br>      &lt;/div&gt;<br><br>      &lt;div class=&quot;chat-container&quot;&gt;<br>        &lt;div id=&quot;room-header&quot; class=&quot;room-header&quot;&gt;Select a room&lt;/div&gt;<br><br>        &lt;div id=&quot;messages&quot; class=&quot;messages&quot;&gt;&lt;/div&gt;<br><br>        &lt;div id=&quot;typing-indicator&quot; class=&quot;typing-indicator&quot;&gt;&lt;/div&gt;<br><br>        &lt;form id=&quot;message-form&quot; class=&quot;message-form&quot;&gt;<br>          &lt;input<br>            type=&quot;text&quot;<br>            id=&quot;message-input&quot;<br>            placeholder=&quot;Type a message...&quot;<br>            disabled<br>          /&gt;<br>          &lt;button type=&quot;submit&quot; 
id=&quot;send-btn&quot; disabled&gt;Send&lt;/button&gt;<br>        &lt;/form&gt;<br>      &lt;/div&gt;<br>    &lt;/div&gt;<br><br>    &lt;script src=&quot;app.js&quot; type=&quot;module&quot;&gt;&lt;/script&gt;<br>  &lt;/body&gt;<br>&lt;/html&gt;</pre><p>Add styling (same as before, with one addition for connection status):</p><pre>* {<br>  margin: 0;<br>  padding: 0;<br>  box-sizing: border-box;<br>  font-family:<br>    system-ui,<br>    -apple-system,<br>    BlinkMacSystemFont,<br>    &quot;Segoe UI&quot;,<br>    Roboto,<br>    Oxygen,<br>    Ubuntu,<br>    Cantarell,<br>    &quot;Open Sans&quot;,<br>    &quot;Helvetica Neue&quot;,<br>    sans-serif;<br>}<br><br>body {<br>  background-color: #f5f5f5;<br>}<br><br>.app-container {<br>  display: flex;<br>  height: 100vh;<br>  max-width: 1200px;<br>  margin: 0 auto;<br>  background-color: white;<br>  box-shadow: 0 0 10px rgba(0, 0, 0, 0.1);<br>}<br><br>.sidebar {<br>  width: 250px;<br>  background-color: #f0f0f0;<br>  padding: 20px;<br>  border-right: 1px solid #ddd;<br>}<br><br>.user-info {<br>  display: flex;<br>  justify-content: space-between;<br>  align-items: center;<br>  margin-bottom: 20px;<br>  padding-bottom: 10px;<br>  border-bottom: 1px solid #ddd;<br>}<br><br>.connection-status {<br>  font-size: 0.85em;<br>  padding: 8px;<br>  margin-bottom: 15px;<br>  border-radius: 4px;<br>  text-align: center;<br>  background-color: #ffe6e6;<br>  color: #d32f2f;<br>}<br><br>.connection-status.open {<br>  background-color: #e6ffe6;<br>  color: #388e3c;<br>}<br><br>.connection-status.connecting {<br>  background-color: #fff3e0;<br>  color: #f57c00;<br>}<br><br>.room-list {<br>  list-style: none;<br>}<br><br>.room-item {<br>  padding: 8px 10px;<br>  margin-bottom: 5px;<br>  border-radius: 4px;<br>  cursor: pointer;<br>}<br><br>.room-item:hover {<br>  background-color: #e0e0e0;<br>}<br><br>.room-item.active {<br>  background-color: #2c3e50;<br>  color: white;<br>}<br><br>.chat-container {<br>  flex: 1;<br>  display: 
flex;<br>  flex-direction: column;<br>}<br><br>.room-header {<br>  padding: 15px 20px;<br>  background-color: #2c3e50;<br>  color: white;<br>  font-weight: bold;<br>}<br><br>.messages {<br>  flex: 1;<br>  overflow-y: auto;<br>  padding: 20px;<br>}<br><br>.message {<br>  margin-bottom: 15px;<br>}<br><br>.message .header {<br>  display: flex;<br>  margin-bottom: 5px;<br>}<br><br>.message .username {<br>  font-weight: bold;<br>  margin-right: 10px;<br>}<br><br>.message .time {<br>  color: #999;<br>  font-size: 0.8em;<br>}<br><br>.message .text {<br>  background-color: #f1f1f1;<br>  padding: 10px;<br>  border-radius: 10px;<br>  max-width: 80%;<br>  word-break: break-word;<br>}<br><br>.message.own {<br>  text-align: right;<br>}<br><br>.message.own .text {<br>  background-color: #3498db;<br>  color: white;<br>  margin-left: auto;<br>}<br><br>.message.system {<br>  text-align: center;<br>  font-style: italic;<br>  color: #666;<br>  margin: 10px 0;<br>}<br><br>.typing-indicator {<br>  padding: 5px 20px;<br>  color: #666;<br>  font-style: italic;<br>  min-height: 30px;<br>}<br><br>.message-form {<br>  display: flex;<br>  padding: 10px 20px;<br>  background-color: #f9f9f9;<br>  border-top: 1px solid #ddd;<br>}<br><br>.message-form input {<br>  flex: 1;<br>  padding: 10px;<br>  border: 1px solid #ddd;<br>  border-radius: 4px;<br>  margin-right: 10px;<br>}<br><br>.message-form button {<br>  padding: 10px 15px;<br>  background-color: #3498db;<br>  color: white;<br>  border: none;<br>  border-radius: 4px;<br>  cursor: pointer;<br>}<br><br>.message-form button:disabled {<br>  background-color: #ccc;<br>  cursor: not-allowed;<br>}</pre><p>Now, here’s the client code using @ws-kit/client (much simpler than manual WebSocket handling):</p><pre>// app.js<br>import { wsClient } from &quot;@ws-kit/client/zod&quot;;<br>import {<br>  Authenticate,<br>  AuthSuccess,<br>  JoinRoom,<br>  UserJoined,<br>  UserLeft,<br>  ChatMessage,<br>  SendMessage,<br>  RoomList,<br>  TypingStart,<br>  
UserTyping,<br>} from &quot;./shared/schemas.js&quot;; // shared with the server; bundle app.js (e.g. with bun build) so the browser can resolve this import and @ws-kit/client<br><br>// DOM elements<br>const usernameElement = document.getElementById(&quot;username&quot;);<br>const loginButton = document.getElementById(&quot;login-btn&quot;);<br>const roomList = document.getElementById(&quot;room-list&quot;);<br>const roomHeader = document.getElementById(&quot;room-header&quot;);<br>const messagesContainer = document.getElementById(&quot;messages&quot;);<br>const typingIndicator = document.getElementById(&quot;typing-indicator&quot;);<br>const messageForm = document.getElementById(&quot;message-form&quot;);<br>const messageInput = document.getElementById(&quot;message-input&quot;);<br>const sendButton = document.getElementById(&quot;send-btn&quot;);<br>const connectionStatus = document.getElementById(&quot;connection-status&quot;);<br><br>// App state<br>let currentUser = {<br>  userId: null,<br>  username: null,<br>};<br>let currentRoomId = null;<br>let rooms = [];<br><br>// Create the WebSocket client with auto-reconnection<br>const client = wsClient({<br>  url: `ws://${window.location.host}/ws`,<br>  autoConnect: true,<br>  reconnect: {<br>    enabled: true,<br>    maxAttempts: 5,<br>    initialDelayMs: 300,<br>    maxDelayMs: 10_000,<br>    jitter: &quot;full&quot;,<br>  },<br>  auth: {<br>    getToken: () =&gt; localStorage.getItem(&quot;chatToken&quot;),<br>    attach: &quot;query&quot;,<br>  },<br>});<br><br>// Monitor connection state<br>client.onState((state) =&gt; {<br>  connectionStatus.textContent = state.charAt(0).toUpperCase() + state.slice(1);<br>  connectionStatus.className = `connection-status ${state}`;<br><br>  // Enable/disable input based on connection<br>  const canSend = state === &quot;open&quot; &amp;&amp; currentUser.userId;<br>  messageInput.disabled = !canSend;<br>  sendButton.disabled = !canSend;<br>});<br><br>// Handle room list<br>client.on(RoomList, (msg) =&gt; {<br>  rooms = msg.payload.rooms;<br>  renderRoomList();<br>});<br><br>// Handle 
authentication success<br>client.on(AuthSuccess, (msg) =&gt; {<br>  const { userId, username } = msg.payload;<br>  currentUser.userId = userId;<br>  currentUser.username = username;<br><br>  // Store a random demo token (a real app would persist a server-issued token instead)<br>  localStorage.setItem(&quot;chatToken&quot;, Math.random().toString(36).substring(7));<br><br>  usernameElement.textContent = username;<br>  loginButton.textContent = &quot;Logout&quot;;<br><br>  // Enable message input<br>  messageInput.disabled = false;<br>  sendButton.disabled = false;<br>});<br><br>// Handle user joined<br>client.on(UserJoined, (msg) =&gt; {<br>  if (msg.payload.roomId === currentRoomId) {<br>    const isCurrentUser = msg.payload.userId === currentUser.userId;<br>    const text = isCurrentUser<br>      ? &quot;You joined the room&quot;<br>      : `${msg.payload.username} joined the room`;<br>    addSystemMessage(text);<br>  }<br>});<br><br>// Handle user left<br>client.on(UserLeft, (msg) =&gt; {<br>  if (msg.payload.roomId === currentRoomId) {<br>    const isCurrentUser = msg.payload.userId === currentUser.userId;<br>    const text = isCurrentUser<br>      ? &quot;You left the room&quot;<br>      : `${msg.payload.username} left the room`;<br>    addSystemMessage(text);<br>  }<br>});<br><br>// Handle chat message<br>client.on(ChatMessage, (msg) =&gt; {<br>  if (msg.payload.roomId === currentRoomId) {<br>    const { userId, username, text, timestamp } = msg.payload;<br>    const isOwnMessage = userId === currentUser.userId;<br><br>    const messageEl = document.createElement(&quot;div&quot;);<br>    messageEl.className = `message ${isOwnMessage ? &quot;own&quot; : &quot;&quot;}`;<br>    messageEl.innerHTML = `<br>      &lt;div class=&quot;header&quot;&gt;<br>        &lt;span class=&quot;username&quot;&gt;${isOwnMessage ? 
&quot;You&quot; : username}&lt;/span&gt;<br>        &lt;span class=&quot;time&quot;&gt;${new Date(timestamp).toLocaleTimeString()}&lt;/span&gt;<br>      &lt;/div&gt;<br>      &lt;div class=&quot;text&quot;&gt;${escapeHtml(text)}&lt;/div&gt;<br>    `;<br><br>    messagesContainer.appendChild(messageEl);<br>    scrollToBottom();<br>  }<br>});<br><br>// Handle typing indicator<br>client.on(UserTyping, (msg) =&gt; {<br>  if (<br>    msg.payload.roomId === currentRoomId &amp;&amp;<br>    msg.payload.userId !== currentUser.userId<br>  ) {<br>    typingIndicator.textContent = `${msg.payload.username} is typing...`;<br><br>    setTimeout(() =&gt; {<br>      typingIndicator.textContent = &quot;&quot;;<br>    }, 3000);<br>  }<br>});<br><br>// Error handling<br>client.onError((error, context) =&gt; {<br>  console.error(&quot;WebSocket error:&quot;, error.message, context);<br><br>  if (context.type === &quot;validation&quot;) {<br>    addSystemMessage(&quot;Received invalid message from server&quot;, true);<br>  } else if (context.type === &quot;parse&quot;) {<br>    addSystemMessage(&quot;Failed to parse message&quot;, true);<br>  }<br>});<br><br>// Render room list<br>function renderRoomList() {<br>  roomList.innerHTML = &quot;&quot;;<br>  rooms.forEach((room) =&gt; {<br>    const li = document.createElement(&quot;li&quot;);<br>    li.className = `room-item ${room.id === currentRoomId ? 
&quot;active&quot; : &quot;&quot;}`;<br>    li.textContent = `${room.name} (${room.userCount})`;<br>    li.addEventListener(&quot;click&quot;, () =&gt; joinRoom(room.id));<br>    roomList.appendChild(li);<br>  });<br>}<br><br>// Join a room<br>function joinRoom(roomId) {<br>  if (currentRoomId === roomId) return;<br><br>  messagesContainer.innerHTML = &quot;&quot;;<br>  typingIndicator.textContent = &quot;&quot;;<br>  currentRoomId = roomId;<br><br>  const room = rooms.find((r) =&gt; r.id === roomId);<br>  if (room) {<br>    roomHeader.textContent = room.name;<br>  }<br><br>  client.send(JoinRoom, { roomId });<br>  renderRoomList();<br>}<br><br>// Send a chat message<br>function sendChatMessage(text) {<br>  if (!text.trim() || !currentRoomId) return;<br><br>  const sent = client.send(SendMessage, {<br>    roomId: currentRoomId,<br>    text: text.trim(),<br>  });<br><br>  if (sent) {<br>    messageInput.value = &quot;&quot;;<br>  } else {<br>    addSystemMessage(&quot;Failed to send message&quot;, true);<br>  }<br>}<br><br>// Add system message<br>function addSystemMessage(text, isError = false) {<br>  const messageEl = document.createElement(&quot;div&quot;);<br>  messageEl.className = `message system ${isError ? 
&quot;error&quot; : &quot;&quot;}`;<br>  messageEl.textContent = text;<br>  messagesContainer.appendChild(messageEl);<br>  scrollToBottom();<br>}<br><br>// Utilities<br>function scrollToBottom() {<br>  messagesContainer.scrollTop = messagesContainer.scrollHeight;<br>}<br><br>function escapeHtml(text) {<br>  const div = document.createElement(&quot;div&quot;);<br>  div.textContent = text;<br>  return div.innerHTML;<br>}<br><br>// Event listeners<br>loginButton.addEventListener(&quot;click&quot;, () =&gt; {<br>  if (currentUser.userId) {<br>    // Logout<br>    localStorage.removeItem(&quot;chatToken&quot;);<br>    currentUser = { userId: null, username: null };<br>    usernameElement.textContent = &quot;Not logged in&quot;;<br>    loginButton.textContent = &quot;Login&quot;;<br>    client.close();<br>  } else {<br>    // Login<br>    client.send(Authenticate, { token: &quot;demo-token&quot; });<br>  }<br>});<br><br>messageForm.addEventListener(&quot;submit&quot;, (e) =&gt; {<br>  e.preventDefault();<br>  sendChatMessage(messageInput.value);<br>});<br><br>messageInput.addEventListener(&quot;input&quot;, () =&gt; {<br>  if (currentRoomId &amp;&amp; client.isConnected) {<br>    client.send(TypingStart, { roomId: currentRoomId });<br>  }<br>});<br><br>// Optional: Log connection state changes<br>client.onState((state) =&gt; {<br>  console.log(&quot;Connection state:&quot;, state);<br>});</pre><h4>Step 6: Running the Application</h4><p>With everything in place, let’s run our chat application:</p><pre>bun run server.ts</pre><p>Now open your browser to http://localhost:3000, and you should see the chat interface. 
You can:</p><ol><li>Click the login button to get a random username</li><li>Join one of the two default rooms (General Chat or Random Stuff)</li><li>Send messages and see them appear in real-time</li><li>See typing indicators when other users are typing</li></ol><p>You can even open multiple browser tabs to simulate different users!</p><h4>Extending the Application</h4><p>This chat application is just a starting point. With the robust foundation provided by WS-Kit, you can easily extend it with additional features:</p><ol><li><strong>Direct messaging</strong>: Add a new message schema for private messaging between users</li><li><strong>User profiles</strong>: Store and display more information about users</li><li><strong>Message history</strong>: Add persistence to store chat history</li><li><strong>Room creation</strong>: Allow users to create their own chat rooms</li><li><strong>Rich media</strong>: Improve the attachment support for images, videos, etc.</li><li><strong>Moderation tools</strong>: Add features for admins to moderate chats</li></ol><h4>Conclusion</h4><p>We’ve built a complete real-time chat application using Bun, WebSockets, and <strong>WS-Kit</strong>. The application features:</p><ul><li>Type-safe messaging with Zod schemas</li><li>Room-based chat with join/leave notifications</li><li>User authentication</li><li>Real-time message delivery</li><li>Typing indicators</li><li>Error handling</li></ul><p>Despite the simplicity of our example, it showcases the power of a type-safe approach to WebSocket messaging. By defining our message schemas upfront and using <strong>WS-Kit</strong> to handle validation and routing, we’ve created a codebase that’s easy to understand, extend, and maintain.</p><p>No more giant switch statements. No more type coercion surprises. No more undefined property errors. 
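</p><p>That last point is easy to demonstrate in miniature: because every payload shape lives in a schema, malformed input is rejected before any handler sees it. The vanilla-TypeScript sketch below models the idea (the parseChatPayload guard is illustrative, not a WS-Kit API — in the real app, Zod performs this narrowing for you):</p>

```typescript
// A miniature model of schema-validated dispatch: unknown input is either
// narrowed to a known payload shape or rejected before handlers run.
type ChatPayload = { roomId: string; text: string };

function parseChatPayload(value: unknown): ChatPayload | null {
  if (typeof value !== "object" || value === null) return null;
  const v = value as Record<string, unknown>;
  if (typeof v.roomId !== "string" || typeof v.text !== "string") return null;
  return { roomId: v.roomId, text: v.text };
}

// Handlers only ever see validated payloads, so payload.text cannot be undefined.
function handleChat(payload: ChatPayload): string {
  return `[${payload.roomId}] ${payload.text}`;
}

const raw: unknown = JSON.parse('{"roomId":"general","text":"hi"}');
const payload = parseChatPayload(raw);
const rendered = payload ? handleChat(payload) : "rejected";
```

<p>Handlers written against the narrowed type simply cannot hit an undefined property at runtime. </p><p>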
Just clean, type-safe WebSocket messaging that scales with your application’s needs.</p><p>So the next time someone asks you to build “just a simple chat app” (which, let’s be honest, is never simple), you’ll have the tools you need to build it properly from the start. Your future self — the one who has to maintain this code six months from now while sipping coffee at 2 AM — will thank you.</p><h3>Part 5: Advanced Patterns</h3><h4>Beyond the Basics: Leveling Up Your WebSocket Game</h4><p>Now that we’ve built a functional chat application, let’s dive into some advanced patterns that can take your WebSocket applications from “it works” to “wow, that’s impressive!” After all, anyone can build a chat app — it’s like the “Hello World” of WebSockets — but production-grade applications require more sophisticated techniques.</p><p>Think of these patterns as the difference between knowing how to play “Hot Cross Buns” on the recorder and performing a jazz improvisation. Same instrument, vastly different results. Let’s jazz things up!</p><blockquote><strong>Note on Client vs Server Patterns</strong></blockquote><blockquote>The @ws-kit/client SDK handles many advanced patterns automatically on the client side. 
This section covers:</blockquote><blockquote><strong>Client-side (handled by </strong><strong>@ws-kit/client SDK):</strong><br>✅ Connection pooling and state management (client.state)<br>✅ Automatic reconnection with exponential backoff<br>✅ Request/response correlation (RPC via client.request())<br>✅ Heartbeat monitoring<br>✅ Message queueing while disconnected<br>✅ Centralized error handling (client.onError())</blockquote><blockquote><strong>Server-side patterns (covered in this section)</strong>:<br>- Multi-client connection tracking and registration<br>- Rate limiting per user or connection<br>- Broadcasting to subsets of clients<br>- Advanced pub/sub with selective message delivery<br>- Protocol negotiation and feature detection</blockquote><blockquote>For most applications, the SDK’s built-in features are sufficient. The patterns here address production scenarios requiring custom server-side orchestration.</blockquote><h4>Connection Pools and Client Tracking</h4><p>In production applications, you’ll often need to keep track of connected clients beyond what’s directly available in the WebSocket object. 
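</p><p>At its core, such tracking is plain bookkeeping: a map from user id to that user’s open connection ids. Here is the idea in miniature, with plain data structures and no WebSocket types (PresenceTracker is an illustrative name, not a WS-Kit API):</p>

```typescript
// Minimal presence bookkeeping: userId -> set of connection ids.
// A user counts as "online" while at least one of their connections is open.
class PresenceTracker {
  private userConnections = new Map<string, Set<string>>();

  connect(userId: string, connectionId: string): void {
    let set = this.userConnections.get(userId);
    if (!set) {
      set = new Set();
      this.userConnections.set(userId, set);
    }
    set.add(connectionId);
  }

  disconnect(userId: string, connectionId: string): void {
    const set = this.userConnections.get(userId);
    if (!set) return;
    set.delete(connectionId);
    // Drop the entry entirely once the last connection closes.
    if (set.size === 0) this.userConnections.delete(userId);
  }

  isOnline(userId: string): boolean {
    return this.userConnections.has(userId);
  }
}
```

<p>The connection pool below is this same bookkeeping plus socket references, activity timestamps, and per-connection metadata. </p><p>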
This is especially important for features like:</p><ul><li>Displaying online/offline status</li><li>User activity monitoring</li><li>Rate limiting</li><li>Resource cleanup</li></ul><p>Here’s a robust connection pool implementation using <strong>WS-Kit</strong>:</p><pre>import { z, createRouter } from &quot;@ws-kit/zod&quot;;<br>import type { ServerWebSocket } from &quot;bun&quot;;<br><br>type ClientInfo = {<br>  userId: string;<br>  username: string;<br>  connectedAt: number;<br>  lastActivity: number;<br>  rooms: Set&lt;string&gt;;<br>};<br><br>class ConnectionPool&lt;T&gt; {<br>  private clients = new Map&lt;string, ServerWebSocket&lt;T &amp; ClientInfo&gt;&gt;();<br>  private userConnections = new Map&lt;string, Set&lt;string&gt;&gt;();<br><br>  /**<br>   * Register a new connection<br>   */<br>  add(<br>    clientId: string,<br>    ws: ServerWebSocket&lt;T &amp; ClientInfo&gt;,<br>    userId?: string,<br>  ): void {<br>    // Store connection by client ID<br>    this.clients.set(clientId, ws as ServerWebSocket&lt;T &amp; ClientInfo&gt;);<br><br>    // Track by user ID if available<br>    if (userId) {<br>      if (!this.userConnections.has(userId)) {<br>        this.userConnections.set(userId, new Set());<br>      }<br>      this.userConnections.get(userId)!.add(clientId);<br>    }<br>  }<br><br>  /**<br>   * Update user ID for an existing connection<br>   */<br>  associateWithUser(clientId: string, userId: string): void {<br>    const ws = this.clients.get(clientId);<br>    if (!ws) return;<br><br>    ws.data.userId = userId;<br><br>    if (!this.userConnections.has(userId)) {<br>      this.userConnections.set(userId, new Set());<br>    }<br>    this.userConnections.get(userId)!.add(clientId);<br>  }<br><br>  /**<br>   * Remove a connection<br>   */<br>  remove(clientId: string): void {<br>    const ws = this.clients.get(clientId);<br>    if (!ws) return;<br><br>    const userId = ws.data.userId;<br><br>    // Remove from clients map<br>    
this.clients.delete(clientId);<br><br>    // Clean up user association<br>    if (userId &amp;&amp; this.userConnections.has(userId)) {<br>      const connections = this.userConnections.get(userId)!;<br>      connections.delete(clientId);<br><br>      if (connections.size === 0) {<br>        this.userConnections.delete(userId);<br>      }<br>    }<br>  }<br><br>  /**<br>   * Update activity timestamp<br>   */<br>  updateActivity(clientId: string): void {<br>    const ws = this.clients.get(clientId);<br>    if (ws) {<br>      ws.data.lastActivity = Date.now();<br>    }<br>  }<br><br>  /**<br>   * Check if user is online (has any active connections)<br>   */<br>  isUserOnline(userId: string): boolean {<br>    return (<br>      this.userConnections.has(userId) &amp;&amp;<br>      this.userConnections.get(userId)!.size &gt; 0<br>    );<br>  }<br><br>  /**<br>   * Get all connections for a user<br>   */<br>  getUserConnections(userId: string): ServerWebSocket&lt;T &amp; ClientInfo&gt;[] {<br>    if (!this.userConnections.has(userId)) return [];<br><br>    return Array.from(this.userConnections.get(userId)!)<br>      .map((clientId) =&gt; this.clients.get(clientId))<br>      .filter(Boolean) as ServerWebSocket&lt;T &amp; ClientInfo&gt;[];<br>  }<br><br>  /**<br>   * Get all clients<br>   */<br>  getAllClients(): ServerWebSocket&lt;T &amp; ClientInfo&gt;[] {<br>    return Array.from(this.clients.values());<br>  }<br><br>  /**<br>   * Send message to all connections of a specific user<br>   */<br>  sendToUser&lt;S, P&gt;(userId: string, schema: S, payload: P): void {<br>    const connections = this.getUserConnections(userId);<br><br>    for (const ws of connections) {<br>      // Using the WebSocketRouter&#39;s message format<br>      ws.send(<br>        JSON.stringify({<br>          type: (schema as any).type,<br>          payload,<br>        }),<br>      );<br>    }<br>  }<br>}<br><br>export default ConnectionPool;</pre><p>Now let’s integrate it with our 
router:</p><pre>import { z, createRouter } from &quot;@ws-kit/zod&quot;;<br>import { randomUUID } from &quot;crypto&quot;;<br>import ConnectionPool from &quot;./connection-pool&quot;;<br>import * as schema from &quot;./schemas&quot;;<br>import chatRouter from &quot;./chat-router&quot;;<br><br>type AppData = schema.Meta &amp; {<br>  clientId: string;<br>  connectedAt: number;<br>  lastActivity: number;<br>};<br><br>// Create the WebSocket router<br>const router = createRouter&lt;AppData&gt;();<br><br>// Create connection pool<br>const pool = new ConnectionPool&lt;AppData&gt;();<br><br>// Add connection tracking<br>router.onOpen((ctx) =&gt; {<br>  // Generate a unique ID for this connection<br>  const clientId = randomUUID();<br><br>  // Set initial connection metadata<br>  ctx.ws.data.clientId = clientId;<br>  ctx.ws.data.connectedAt = Date.now();<br>  ctx.ws.data.lastActivity = Date.now();<br><br>  // Add to connection pool<br>  pool.add(clientId, ctx.ws);<br><br>  console.log(`Client connected: ${clientId}`);<br>});<br><br>// Add activity tracking middleware<br>router.use((ctx, next) =&gt; {<br>  // Update last activity timestamp<br>  ctx.ws.data.lastActivity = Date.now();<br>  pool.updateActivity(ctx.ws.data.clientId);<br><br>  // Continue processing<br>  return next();<br>});<br><br>// When user authenticates, associate their connection with their user ID<br>router.on(schema.Authenticate, (ctx) =&gt; {<br>  // Authentication logic...<br><br>  // Associate connection with user<br>  if (ctx.ws.data.userId) {<br>    pool.associateWithUser(ctx.ws.data.clientId, ctx.ws.data.userId);<br>  }<br><br>  // Continue with normal flow...<br>});<br><br>// Handle disconnection<br>router.onClose((ctx) =&gt; {<br>  console.log(`Client disconnected: ${ctx.ws.data.clientId}`);<br>  pool.remove(ctx.ws.data.clientId);<br>});<br><br>// Add our chat routes<br>router.merge(chatRouter);<br><br>// Expose pool to other modules<br>export { pool };</pre><p>With this connection pool, you can 
now easily:</p><ul><li>Send messages to all of a user’s devices (multi-device support)</li><li>Check if users are online</li><li>Implement presence detection</li><li>Monitor connection statistics</li></ul><h4>Rate Limiting and Throttling</h4><p>Nothing ruins a WebSocket service faster than a client that sends messages at the speed of light (or a poorly written client that got stuck in a message loop). Let’s implement a rate limiter middleware:</p><pre>import { z, createRouter } from &quot;@ws-kit/zod&quot;;<br><br>type RateLimitOptions = {<br>  // Maximum messages per window<br>  maxMessages: number;<br><br>  // Time window in milliseconds<br>  windowMs: number;<br><br>  // Optional exception for specific message types<br>  excludeTypes?: string[];<br>};<br><br>type RateLimitData = {<br>  counter: number;<br>  resetAt: number;<br>};<br><br>// Store rate limit data by client ID<br>const limiters = new Map&lt;string, RateLimitData&gt;();<br><br>// Clean up stale rate limit data every 5 minutes<br>setInterval(<br>  () =&gt; {<br>    const now = Date.now();<br>    for (const [clientId, data] of limiters.entries()) {<br>      if (data.resetAt &lt;= now) {<br>        limiters.delete(clientId);<br>      }<br>    }<br>  },<br>  5 * 60 * 1000,<br>);<br><br>// Create rate limiter middleware<br>export function createRateLimiter(options: RateLimitOptions) {<br>  const { maxMessages, windowMs, excludeTypes = [] } = options;<br><br>  return async function rateLimiterMiddleware(ctx, next) {<br>    // Skip rate limiting for excluded message types<br>    if (excludeTypes.includes(ctx.type)) {<br>      await next();<br>      return;<br>    }<br><br>    const clientId = ctx.ws.data.clientId;<br><br>    if (!clientId) {<br>      // Can&#39;t rate limit without client ID<br>      return next();<br>    }<br><br>    const now = Date.now();<br>    let limiter = limiters.get(clientId);<br><br>    // Initialize or reset if window has passed<br>    if (!limiter || limiter.resetAt &lt;= now) 
{<br>      limiter = {<br>        counter: 0,<br>        resetAt: now + windowMs,<br>      };<br>      limiters.set(clientId, limiter);<br>    }<br><br>    // Check if rate limit exceeded<br>    if (limiter.counter &gt;= maxMessages) {<br>      const secondsRemaining = Math.ceil((limiter.resetAt - now) / 1000);<br><br>      // Send error message with proper details<br>      ctx.error(<br>        &quot;RESOURCE_EXHAUSTED&quot;,<br>        &quot;Rate limit exceeded&quot;,<br>        { secondsRemaining, maxMessages, windowMs },<br>        { retryable: true, retryAfterMs: limiter.resetAt - now },<br>      );<br><br>      return; // Stop processing<br>    }<br><br>    // Increment counter and continue<br>    limiter.counter++;<br>    await next();<br>  };<br>}</pre><p>Now let’s apply this middleware to our router:</p><pre>import { z, createRouter } from &quot;@ws-kit/zod&quot;;<br>import { createRateLimiter } from &quot;./rate-limiter&quot;;<br><br>const router = createRouter&lt;AppData&gt;();<br><br>// Apply rate limiting<br>router.use(<br>  createRateLimiter({<br>    maxMessages: 20, // 20 messages<br>    windowMs: 10_000, // per 10 seconds<br>    excludeTypes: [<br>      // Don&#39;t rate limit typing indicators<br>      &quot;TYPING_START&quot;,<br>      &quot;TYPING_STOP&quot;,<br>    ],<br>  }),<br>);<br><br>// Rest of your server setup...</pre><h4>Custom PubSub with Selective Message Delivery</h4><p>While <strong>WS-Kit</strong> provides built-in pub/sub through ctx.publish() and ctx.subscribe(), sometimes you need advanced filtering based on user properties. 
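</p><p>In practice, per-subscriber filtering reduces to a small pure predicate evaluated on each delivery. For role-gated messages, a rank comparison is enough (a sketch; roleAtLeast and the Role union are hypothetical helpers, not part of WS-Kit):</p>

```typescript
// Hypothetical helper: does userRole meet the minimum role a message requires?
type Role = "user" | "moderator" | "admin";

const ROLE_RANK: Record<Role, number> = { user: 0, moderator: 1, admin: 2 };

function roleAtLeast(userRole: Role, minRole: Role): boolean {
  return ROLE_RANK[userRole] >= ROLE_RANK[minRole];
}
```

<p>A subscriber’s filter can then delegate to this predicate instead of hand-rolling role checks inline. </p><p>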
This section shows a custom implementation for scenarios requiring fine-grained control:</p><p><strong>For most applications, WS-Kit’s native pub/sub is sufficient</strong>:</p><pre>// Simple room-based broadcasting with WS-Kit<br>router.on(schema.ChatMessage, (ctx) =&gt; {<br>  const { roomId, text } = ctx.payload;<br><br>  // Publish to all subscribers in room (with validation)<br>  ctx.publish(roomId, schema.ChatMessage, {<br>    roomId,<br>    userId: ctx.ws.data.userId,<br>    username: ctx.ws.data.username,<br>    text,<br>    timestamp: Date.now(),<br>  });<br>});<br><br>router.on(schema.JoinRoom, (ctx) =&gt; {<br>  const { roomId } = ctx.payload;<br>  ctx.subscribe(roomId); // Join room<br>});<br><br>router.on(schema.LeaveRoom, (ctx) =&gt; {<br>  const { roomId } = ctx.payload;<br>  ctx.unsubscribe(roomId); // Leave room<br>});</pre><blockquote><strong>When to Use EnhancedPubSub</strong>: WS-Kit’s native ctx.publish() and ctx.subscribe() are sufficient for most applications, providing simple topic-based broadcasting with automatic message validation. Consider implementing a custom PubSub extension only when you need role-based filtering, metadata-based message delivery, or complex subscriber filtering logic that goes beyond basic topic subscriptions. 
For typical chat applications, room management, and notification systems, stick with the native approach shown above.</blockquote><p><strong>For advanced filtering use cases, here’s a custom PubSub extension</strong>:</p><pre>import type { ServerWebSocket } from &quot;bun&quot;;<br><br>// Define a topic subscriber with filtering options<br>type Subscriber&lt;T&gt; = {<br>  ws: ServerWebSocket&lt;T&gt;;<br>  filter?: (meta: T) =&gt; boolean;<br>};<br><br>class EnhancedPubSub&lt;T&gt; {<br>  private topics = new Map&lt;string, Set&lt;Subscriber&lt;T&gt;&gt;&gt;();<br><br>  /**<br>   * Subscribe a client to a topic with optional filter<br>   */<br>  subscribe(<br>    ws: ServerWebSocket&lt;T&gt;,<br>    topic: string,<br>    filter?: (meta: T) =&gt; boolean,<br>  ): void {<br>    if (!this.topics.has(topic)) {<br>      this.topics.set(topic, new Set());<br>    }<br><br>    this.topics.get(topic)!.add({ ws, filter });<br>  }<br><br>  /**<br>   * Unsubscribe a client from a topic<br>   */<br>  unsubscribe(ws: ServerWebSocket&lt;T&gt;, topic: string): void {<br>    if (!this.topics.has(topic)) return;<br><br>    const subscribers = this.topics.get(topic)!;<br>    const toRemove = Array.from(subscribers).filter((sub) =&gt; sub.ws === ws);<br><br>    for (const sub of toRemove) {<br>      subscribers.delete(sub);<br>    }<br><br>    if (subscribers.size === 0) {<br>      this.topics.delete(topic);<br>    }<br>  }<br><br>  /**<br>   * Unsubscribe a client from all topics<br>   */<br>  unsubscribeAll(ws: ServerWebSocket&lt;T&gt;): void {<br>    for (const topic of Array.from(this.topics.keys())) {<br>      this.unsubscribe(ws, topic);<br>    }<br>  }<br><br>  /**<br>   * Publish a message to all subscribers of a topic<br>   */<br>  publish&lt;S, P&gt;(<br>    sourceSender: ServerWebSocket&lt;T&gt; | null,<br>    topic: string,<br>    schema: S,<br>    payload: P,<br>    skipSender: boolean = true,<br>  ): number 
{<br>    if (!this.topics.has(topic)) return 0;<br><br>    const subscribers = this.topics.get(topic)!;<br>    let sentCount = 0;<br><br>    const data = JSON.stringify({<br>      type: (schema as any).type,<br>      payload,<br>    });<br><br>    for (const { ws, filter } of subscribers) {<br>      // Skip sender if requested<br>      if (skipSender &amp;&amp; ws === sourceSender) continue;<br><br>      // Apply filter against the sender&#39;s metadata, if any<br>      // (the sender stamps fields like messageMinRole before publishing)<br>      if (filter &amp;&amp; sourceSender &amp;&amp; !filter(sourceSender.data)) continue;<br><br>      // Send the serialized message<br>      ws.send(data);<br>      sentCount++;<br>    }<br><br>    return sentCount;<br>  }<br><br>  /**<br>   * Get count of subscribers for a topic<br>   */<br>  subscriberCount(topic: string): number {<br>    return this.topics.has(topic) ? this.topics.get(topic)!.size : 0;<br>  }<br><br>  /**<br>   * Get all topics a client is subscribed to<br>   */<br>  getSubscribedTopics(ws: ServerWebSocket&lt;T&gt;): string[] {<br>    const result: string[] = [];<br><br>    for (const [topic, subscribers] of this.topics.entries()) {<br>      if (Array.from(subscribers).some((sub) =&gt; sub.ws === ws)) {<br>        result.push(topic);<br>      }<br>    }<br><br>    return result;<br>  }<br>}<br><br>export default EnhancedPubSub;</pre><p>Now we can use this advanced PubSub system to implement features like role-gated message delivery:</p><pre>import EnhancedPubSub from &quot;./enhanced-pubsub&quot;;<br>import { z, createRouter } from &quot;@ws-kit/zod&quot;;<br>import * as schema from &quot;./schemas&quot;;<br><br>type AppData = schema.Meta;<br><br>const router = createRouter&lt;AppData&gt;();<br>const pubsub = new EnhancedPubSub&lt;AppData&gt;();<br><br>// Handle room joining with role-based filters<br>router.on(schema.JoinRoom, (ctx) =&gt; {<br>  const { roomId } = ctx.payload;<br>  const userId = ctx.ws.data.userId;<br>  const username = ctx.ws.data.username;<br>  const userRole = ctx.ws.data.userRole || &quot;user&quot;;<br><br>  // Subscribe with filter - only receive 
messages for your role level and below<br>  pubsub.subscribe(ctx.ws, roomId, (clientData) =&gt; {<br>    const messageMinRole = clientData.messageMinRole || &quot;user&quot;;<br><br>    if (messageMinRole === &quot;admin&quot; &amp;&amp; userRole !== &quot;admin&quot;) {<br>      return false; // Filter out admin-only messages<br>    }<br>    if (<br>      messageMinRole === &quot;moderator&quot; &amp;&amp;<br>      userRole !== &quot;admin&quot; &amp;&amp;<br>      userRole !== &quot;moderator&quot;<br>    ) {<br>      return false; // Filter out moderator-only messages<br>    }<br><br>    return true;<br>  });<br><br>  // Let others know user joined<br>  pubsub.publish(ctx.ws, roomId, schema.UserJoined, {<br>    roomId,<br>    userId,<br>    username,<br>  });<br><br>  console.log(`User ${username} (${userId}) joined room: ${roomId}`);<br>});<br><br>// Send message only to admins and moderators<br>router.on(schema.ModAction, (ctx) =&gt; {<br>  const { roomId, action } = ctx.payload;<br><br>  // Only allow moderators and admins to send mod actions<br>  const userRole = ctx.ws.data.userRole;<br>  if (userRole !== &quot;moderator&quot; &amp;&amp; userRole !== &quot;admin&quot;) {<br>    ctx.error(<br>      &quot;PERMISSION_DENIED&quot;,<br>      &quot;You don&#39;t have permission to perform moderator actions&quot;,<br>    );<br>    return;<br>  }<br><br>  // Set minimum role to receive this message<br>  ctx.ws.data.messageMinRole = &quot;moderator&quot;;<br><br>  // Publish to room (only mods/admins will receive it due to filter)<br>  pubsub.publish(ctx.ws, roomId, schema.ModAction, {<br>    roomId,<br>    userId: ctx.ws.data.userId,<br>    username: ctx.ws.data.username,<br>    action,<br>  });<br><br>  // Reset the message minimum role<br>  ctx.ws.data.messageMinRole = &quot;user&quot;;<br>});<br><br>// Clean up subscriptions when user leaves<br>router.onClose((ctx) =&gt; {<br>  pubsub.unsubscribeAll(ctx.ws);<br>});<br><br>export default 
router;</pre><h4>Request/Response Pattern (RPC)</h4><p>Real-time applications often need reliable request/response patterns for operations like fetching data, updating settings, or triggering actions. <strong>WS-Kit</strong> provides built-in RPC support with automatic correlation IDs, timeouts, and type safety.</p><h4>Server-Side RPC Handler</h4><p>Define request and response schemas, then handle with router.rpc():</p><pre>import { z, message, createRouter } from &quot;@ws-kit/zod&quot;;<br><br>// Define request and response schemas<br>const FetchProfile = message(&quot;FETCH_PROFILE&quot;, { userId: z.string() });<br>const ProfileResponse = message(&quot;PROFILE_RESPONSE&quot;, {<br>  id: z.string(),<br>  name: z.string(),<br>  email: z.string(),<br>});<br><br>const router = createRouter&lt;AppData&gt;();<br><br>// Handle RPC request<br>router.rpc(FetchProfile, async (ctx) =&gt; {<br>  const { userId } = ctx.payload;<br><br>  try {<br>    // Fetch user profile from database<br>    const profile = await fetchUserProfileFromDb(userId);<br><br>    if (!profile) {<br>      // Send error response<br>      ctx.error(&quot;NOT_FOUND&quot;, `User ${userId} not found`);<br>      return;<br>    }<br><br>    // Send typed response (automatically correlates with request)<br>    ctx.reply(ProfileResponse, profile);<br>  } catch (error) {<br>    ctx.error(&quot;INTERNAL&quot;, &quot;Failed to fetch profile&quot;);<br>  }<br>});</pre><p><strong>Key RPC features</strong>:</p><ul><li>✅ Automatic correlation ID generation</li><li>✅ Built-in timeout handling</li><li>✅ Full type safety on both request and response</li><li>✅ Structured error responses with gRPC-standard error codes</li></ul><blockquote><strong>Error Codes</strong>: WS-Kit uses gRPC-standard error codes for consistency across your application. 
Common codes include: NOT_FOUND (resource doesn&#39;t exist), PERMISSION_DENIED (insufficient permissions), INVALID_ARGUMENT (malformed request), INTERNAL (server error), RESOURCE_EXHAUSTED (rate limit exceeded), UNAUTHENTICATED (missing or invalid credentials), and UNAVAILABLE (service temporarily down). Use these standard codes in ctx.error() for predictable client-side error handling.</blockquote><h4>Client-Side RPC Call</h4><p>On the client, client.request() handles correlation automatically:</p><pre>// Client code using @ws-kit/client/zod<br>import {<br>  wsClient,<br>  TimeoutError,<br>  ServerError,<br>  ConnectionClosedError,<br>} from &quot;@ws-kit/client/zod&quot;;<br>import { FetchProfile, ProfileResponse } from &quot;./shared/schemas.js&quot;;<br><br>const client = wsClient({ url: &quot;ws://localhost:3000/ws&quot; });<br><br>async function getUserProfile(userId) {<br>  try {<br>    // Send request and wait for typed response<br>    const response = await client.request(<br>      FetchProfile,<br>      { userId },<br>      ProfileResponse,<br>      { timeoutMs: 5000 }, // 5 second timeout<br>    );<br><br>    console.log(&quot;Profile:&quot;, response.payload);<br>    // response.payload is fully typed: { id: string, name: string, email: string }<br><br>    return response.payload;<br>  } catch (error) {<br>    if (error instanceof TimeoutError) {<br>      console.error(`Request timed out after ${error.timeoutMs}ms`);<br>    } else if (error instanceof ServerError) {<br>      console.error(`Server error: ${error.code}`, error.context);<br>    } else if (error instanceof ConnectionClosedError) {<br>      console.error(&quot;Connection closed before reply&quot;);<br>    }<br>    throw error;<br>  }<br>}<br><br>// Usage<br>const profile = await getUserProfile(&quot;user-123&quot;);</pre><p><strong>Client request features</strong>:</p><ul><li>✅ Automatic correlationId generation (UUIDv4)</li><li>✅ Configurable timeout (default: 30 seconds)</li><li>✅ 
AbortSignal support for cancellation</li><li>✅ Typed responses with validation</li><li>✅ Automatic reconnection with queued requests</li></ul><h4>Cancellation with AbortSignal</h4><p>The client SDK supports standard AbortSignal for cancelling in-flight RPC requests. This is useful when users navigate away from a page, close a modal, or when you want to implement request debouncing. Cancelled requests are cleaned up immediately without waiting for timeouts.</p><pre>import { StateError } from &quot;@ws-kit/client/zod&quot;;<br><br>const controller = new AbortController();<br><br>const promise = client.request(<br>  FetchProfile,<br>  { userId: &quot;user-123&quot; },<br>  ProfileResponse,<br>  { signal: controller.signal },<br>);<br><br>// Cancel the request<br>setTimeout(() =&gt; controller.abort(), 2000);<br><br>try {<br>  await promise;<br>} catch (error) {<br>  if (error instanceof StateError &amp;&amp; error.message.includes(&quot;aborted&quot;)) {<br>    console.log(&quot;Request was cancelled by user&quot;);<br>  }<br>}</pre><p>The @ws-kit/client SDK automatically handles correlation, timeouts, and retries, so you don&#39;t need to implement custom request tracking. Just use client.request() as shown above.</p><h4>Connection Health Monitoring with Heartbeats</h4><p>WebSocket connections can silently die or become “zombies” where the TCP connection is technically open but no longer passing messages. <strong>WS-Kit</strong> provides built-in heartbeat support through router configuration:</p><blockquote><strong>Note</strong>: WS-Kit’s heartbeat system operates on two layers: (1) the framework’s automatic WebSocket ping/pong frames for detecting broken connections, and (2) optional application-level custom heartbeat messages for measuring client latency and application responsiveness. 
The example below demonstrates both layers working together.</blockquote><pre>import { z, createRouter, message } from &quot;@ws-kit/zod&quot;;<br>import { serve } from &quot;@ws-kit/bun&quot;;<br>import type { Meta } from &quot;./schemas&quot;;<br><br>// Define custom heartbeat messages for application-level monitoring<br>export const HeartbeatPing = message(&quot;HEARTBEAT_PING&quot;, {<br>  timestamp: z.number(),<br>});<br><br>export const HeartbeatPong = message(&quot;HEARTBEAT_PONG&quot;, {<br>  timestamp: z.number(),<br>  latency: z.number().optional(),<br>});<br><br>// Setup router with built-in heartbeat<br>const router = createRouter&lt;Meta&gt;({<br>  heartbeat: {<br>    intervalMs: 30_000, // Send heartbeat every 30 seconds<br>    timeoutMs: 5_000, // Expect response within 5 seconds<br>    onStaleConnection: (clientId, ws) =&gt; {<br>      console.log(`Stale connection detected: ${clientId}`);<br>      // Connection is automatically closed by framework<br>      // Use this callback for cleanup if needed<br>    },<br>  },<br>});<br><br>// Optional: handle custom heartbeat messages for latency measurement<br>router.on(HeartbeatPing, (ctx) =&gt; {<br>  const { timestamp } = ctx.payload;<br>  const latency = Date.now() - timestamp;<br><br>  ctx.send(HeartbeatPong, {<br>    timestamp,<br>    latency,<br>  });<br>});<br><br>// Setup server with heartbeat enabled<br>serve(router, {<br>  port: 3000,<br>});</pre><p>The @ws-kit/client SDK handles heartbeat monitoring automatically when configured:</p><pre>import { wsClient } from &quot;@ws-kit/client/zod&quot;;<br><br>const client = wsClient({<br>  url: &quot;ws://localhost:3000/ws&quot;,<br>  heartbeat: {<br>    // Optional: SDK can detect stale connections<br>    // Heartbeat is handled transparently via WebSocket ping/pong<br>  },<br>});<br><br>// Monitor connection health via state changes<br>client.onState((state) =&gt; {<br>  if (state === &quot;closed&quot;) {<br>    console.warn(&quot;Connection closed, 
client will auto-reconnect&quot;);<br>  } else if (state === &quot;open&quot;) {<br>    console.log(&quot;Connection healthy and open&quot;);<br>  }<br>});<br><br>// Optional: Measure latency with custom heartbeat messages<br>const HeartbeatPing = message(&quot;HEARTBEAT_PING&quot;, { timestamp: z.number() });<br>const HeartbeatPong = message(&quot;HEARTBEAT_PONG&quot;, { timestamp: z.number() });<br><br>client.on(HeartbeatPong, (msg) =&gt; {<br>  const latency = Date.now() - msg.payload.timestamp;<br>  console.log(`Latency: ${latency}ms`);<br>});<br><br>// Measure latency periodically<br>setInterval(() =&gt; {<br>  if (client.isConnected) {<br>    client.send(HeartbeatPing, { timestamp: Date.now() });<br>  }<br>}, 30_000);</pre><p><strong>Key advantages of WS-Kit’s built-in heartbeat:</strong></p><ul><li>Automatic detection of stale connections</li><li>No manual connection tracking needed</li><li>Configurable intervals and timeouts</li><li>Framework handles connection cleanup</li><li>Can be disabled by omitting heartbeat config</li></ul><h4>Connection Upgrades and Protocol Negotiation</h4><p>In sophisticated applications, you might need to negotiate protocol features or upgrade connections to support different functionality:</p><pre>import { z, createRouter, message } from &quot;@ws-kit/zod&quot;;<br>import type { Meta } from &quot;./schemas&quot;;<br><br>// Define feature flags<br>export enum Feature {<br>  COMPRESSION = &quot;compression&quot;,<br>  ENCRYPTION = &quot;encryption&quot;,<br>  BATCHING = &quot;batching&quot;,<br>  BINARY_MESSAGES = &quot;binary_messages&quot;,<br>}<br><br>// Negotiation message schemas<br>export const ClientCapabilities = message(&quot;CLIENT_CAPABILITIES&quot;, {<br>  protocolVersion: z.string(),<br>  features: z.array(z.nativeEnum(Feature)),<br>  compressionFormats: z.array(z.string()).optional(),<br>});<br><br>export const ServerCapabilities = message(&quot;SERVER_CAPABILITIES&quot;, {<br>  protocolVersion: z.string(),<br>  
supportedFeatures: z.array(z.nativeEnum(Feature)),<br>  enabledFeatures: z.array(z.nativeEnum(Feature)),<br>  compressionFormat: z.string().optional(),<br>});<br><br>// Setup protocol negotiation<br>export function setupProtocolNegotiation(router) {<br>  // Server supported features<br>  const supportedFeatures = [Feature.COMPRESSION, Feature.BATCHING];<br><br>  // Handle client capabilities message<br>  router.on(ClientCapabilities, (ctx) =&gt; {<br>    const { protocolVersion, features, compressionFormats } = ctx.payload;<br><br>    // Check protocol version compatibility<br>    if (!isCompatibleVersion(protocolVersion)) {<br>      ctx.error(<br>        &quot;INVALID_ARGUMENT&quot;,<br>        `Unsupported protocol version: ${protocolVersion}. Server requires 1.x`,<br>      );<br><br>      // Terminate connection - incompatible protocol<br>      setTimeout(<br>        () =&gt; ctx.ws.close(1002, &quot;Incompatible protocol version&quot;),<br>        100,<br>      );<br>      return;<br>    }<br><br>    // Determine which features to enable<br>    const enabledFeatures = supportedFeatures.filter((feature) =&gt;<br>      features.includes(feature),<br>    );<br><br>    // Store enabled features in connection metadata<br>    ctx.ws.data.enabledFeatures = enabledFeatures;<br><br>    // Determine compression format if requested<br>    let compressionFormat: string | undefined;<br><br>    if (<br>      enabledFeatures.includes(Feature.COMPRESSION) &amp;&amp;<br>      compressionFormats &amp;&amp;<br>      compressionFormats.length &gt; 0<br>    ) {<br>      // Choose first supported compression format<br>      if (compressionFormats.includes(&quot;gzip&quot;)) {<br>        compressionFormat = &quot;gzip&quot;;<br>      } else if (compressionFormats.includes(&quot;deflate&quot;)) {<br>        compressionFormat = &quot;deflate&quot;;<br>      }<br><br>      ctx.ws.data.compressionFormat = compressionFormat;<br>    }<br><br>    // Send server capabilities<br>    
ctx.send(ServerCapabilities, {<br>      protocolVersion: &quot;1.0&quot;,<br>      supportedFeatures,<br>      enabledFeatures,<br>      compressionFormat,<br>    });<br><br>    console.log(<br>      `Negotiated protocol with ${ctx.ws.data.clientId}: ${enabledFeatures.join(&quot;, &quot;)}`,<br>    );<br>  });<br>}<br><br>// Check if client version is compatible with server<br>function isCompatibleVersion(clientVersion: string): boolean {<br>  // Simple version check - in real app you&#39;d use semver<br>  return clientVersion.startsWith(&quot;1.&quot;);<br>}</pre><h4>Conclusion: The Power of Advanced Patterns</h4><p>By implementing these advanced patterns, you’ve taken your WebSocket application from a simple message-passing system to a robust, production-ready communication platform. We’ve covered:</p><ol><li><strong>Connection management</strong> with tracking, pooling, and user association</li><li><strong>Rate limiting</strong> to protect against accidental or malicious overload</li><li><strong>Enhanced PubSub</strong> with selective message delivery based on user properties</li><li><strong>Request/response patterns</strong> for reliable communication with acknowledgments</li><li><strong>Connection health monitoring</strong> with heartbeats to detect zombie connections</li><li><strong>Protocol negotiation</strong> for feature detection and progressive enhancement</li></ol><p>Each of these patterns addresses real-world challenges you’ll face when deploying WebSocket applications at scale. The beauty of using <strong>WS-Kit</strong> is that its clean, type-safe foundation makes it easy to layer these advanced patterns on top without creating a tangled mess of code.</p><p>Remember, in the world of WebSockets, the difference between a toy project and a production system isn’t just in the basic functionality — it’s in how gracefully your application handles edge cases, failures, and scale. 
With these patterns in your toolkit, you’re well-equipped to build WebSocket applications that don’t just work in the happy path, but thrive in the chaotic reality of the real world.</p><p>And the next time someone casually suggests “Let’s just add real-time messaging to our app, how hard could it be?”, you can smile knowingly — and then build it right the first time.</p><h3>Wrapping It Up</h3><p>We’ve taken quite a journey together, exploring how to build robust, type-safe WebSocket applications with Bun and <strong>WS-Kit</strong>. From the basics of WebSocket communication to advanced patterns like connection management, authentication, and error handling, we’ve covered the essentials of crafting real-time applications that are both maintainable and scalable.</p><h4>Why WS-Kit Stands Out</h4><p><strong>WS-Kit</strong> represents a modern approach to WebSocket development:</p><ul><li><strong>Platform-agnostic</strong>: Works with Bun, Cloudflare Durable Objects, and custom adapters</li><li><strong>Validator-agnostic</strong>: Choose between Zod, Valibot, or your own validation library</li><li><strong>Production-ready</strong>: Built on lessons learned from years of real-time systems experience</li><li><strong>Actively developed</strong>: Continuously improved based on community feedback</li></ul><p>The library is designed to be the foundation for production applications while remaining simple enough for quick prototypes.</p><h3>Getting Support</h3><p>If you encounter any issues, have questions, or want to contribute to the project, check out the <a href="https://github.com/kriasoft/ws-kit"><strong>WS-Kit</strong></a> repository on GitHub. You can also connect with the community and maintainers on <a href="https://discord.gg/aW29wXyb7w">Discord</a> to share your experiences and get help troubleshooting any problems you might face.</p><h4>Final Thoughts</h4><p>Building real-time applications doesn’t have to be complex or error-prone. 
With the right tools and patterns, you can focus on creating amazing user experiences without getting bogged down in the details of WebSocket message routing or type validation.</p><p>Whether you’re building a simple chat application or a sophisticated collaborative platform, <a href="https://kriasoft.com/ws-kit/"><strong>WS-Kit</strong></a> provides the foundation you need to create reliable, type-safe real-time experiences with confidence.</p><p>Now go forth and build something amazing! And remember, in the fast-moving world of WebSockets, type safety isn’t just a luxury — it’s your best friend.</p><p>Happy coding!</p><hr><p><a href="https://levelup.gitconnected.com/building-type-safe-websocket-applications-with-bun-and-zod-f0aef259a53e">Building Type-Safe WebSocket Applications with Bun and Zod</a> was originally published in <a href="https://levelup.gitconnected.com">Level Up Coding</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Enabling efficient front-end development]]></title>
            <link>https://medium.com/swlh/enabling-efficient-front-end-development-c379ef0e52?source=rss-692b968dbc82------2</link>
            <guid isPermaLink="false">https://medium.com/p/c379ef0e52</guid>
            <category><![CDATA[programming]]></category>
            <category><![CDATA[startup]]></category>
            <category><![CDATA[front-end-development]]></category>
            <category><![CDATA[web-development]]></category>
            <category><![CDATA[devops]]></category>
            <dc:creator><![CDATA[Konstantin Tarkus]]></dc:creator>
            <pubDate>Mon, 17 Jul 2023 16:47:53 GMT</pubDate>
            <atom:updated>2023-07-18T18:30:56.812Z</atom:updated>
            <content:encoded><![CDATA[<h4>The role of web infrastructure engineering teams</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*Fnf1CB418-wioTT7" /><figcaption>Photo by <a href="https://unsplash.com/@anniespratt?utm_source=medium&amp;utm_medium=referral">Annie Spratt</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><h4>Introduction</h4><p>In today’s digital landscape, the importance of robust web infrastructure cannot be overstated. Large companies like Facebook have realized this and established dedicated web infrastructure engineering teams to support their front-end developers. However, for small startups, the cost of maintaining such a team might be prohibitive. In this blog post, we will explore the benefits of having a web infrastructure engineering team and discuss an alternative approach for startups that allows them to focus on application code while ensuring a reliable and up-to-date web infrastructure.</p><h4>The value of a dedicated web infrastructure engineering team</h4><p>Large companies invest in web infrastructure engineering teams for several reasons. These teams act as a bridge between development and operations, specializing in designing, building, and maintaining the underlying infrastructure required for the front-end applications. Here are some key benefits they provide:</p><ol><li><strong>Expertise and specialization</strong>: Web infrastructure engineers have in-depth knowledge of various tools, frameworks, and technologies that power modern web applications. They understand the complexities of scaling, performance optimization, and security. 
Their expertise ensures the infrastructure is designed and implemented in a way that maximizes efficiency and minimizes downtime.</li><li><strong>Continuous integration and deployment (CI/CD)</strong>: Web infrastructure teams establish robust CI/CD workflows that enable seamless deployment of code changes. They automate processes, such as building, testing, and deploying applications, ensuring that developers can focus on writing code rather than worrying about deployment pipelines.</li><li><strong>Performance monitoring and optimization</strong>: These teams proactively monitor the performance of web applications, identifying bottlenecks and optimizing the infrastructure to deliver the best user experience. They leverage tools for performance monitoring, load testing, and analytics to ensure optimal application performance.</li><li><strong>Security and compliance</strong>: Web infrastructure engineers are responsible for implementing robust security measures, protecting user data, and ensuring compliance with industry regulations. They stay updated with the latest security practices and help in mitigating potential risks.</li></ol><h4>Contracting with a part-time infrastructure engineer</h4><p>While small startups may not have the resources to maintain a dedicated web infrastructure engineering team, they can still benefit from infrastructure expertise by contracting with a part-time infrastructure engineer. This approach allows the core development team to focus solely on application code while ensuring a reliable and up-to-date web infrastructure. Here’s why this arrangement can be advantageous:</p><ol><li><strong>Cost-effective solution</strong>: Hiring a full-time web infrastructure engineer can be expensive for startups, especially in the early stages. 
Contracting with a part-time engineer allows businesses to access the necessary expertise without bearing the cost of a full-time employee.</li><li><strong>Focus on core competencies</strong>: By offloading infrastructure responsibilities to a part-time engineer, the core development team can focus on building and enhancing the application. This separation of concerns improves productivity and allows each team member to contribute to their area of expertise.</li><li><strong>Smooth CI/CD workflows</strong>: A part-time infrastructure engineer can establish and maintain robust CI/CD workflows that automate build, test, and deployment processes. This ensures that the application deployment pipeline remains efficient and reliable, freeing up developers’ time.</li><li><strong>Infrastructure maintenance and updates</strong>: Web technologies evolve rapidly, and keeping up with the latest frameworks, libraries, and security updates can be challenging. A part-time infrastructure engineer can ensure that the core web infrastructure remains up-to-date, secure, and performs optimally, relieving the development team from such concerns.</li></ol><h4>How does it work?</h4><p>Finding a skilled web infrastructure engineer for your startup can be easier than you think. Many of these professionals actively contribute to open-source projects on GitHub. Once your team has settled on a tech stack, you can filter GitHub projects using relevant keywords like [React] or [Boilerplate]. Explore the most popular projects and take note of the maintainers. 
Reach out to those who have relevant experience and discuss the possibility of contracting their services.</p><p>By the way, if you’re embarking on a startup idea with a React, Node.js, and Google Cloud stack, I recommend starting with this resource: <a href="https://koistya.gumroad.com/l/react-starter-kit">gumroad.com/l/react-starter-kit</a> (a limited-time offer by me).</p><h4>How much does it cost?</h4><p>The cost of contracting a part-time web infrastructure engineer typically ranges from $300 to $1500 per month, depending on the complexity of the project and allocated working hours. As your company grows, you can always transition to hiring a full-time DevOps or web infrastructure engineer, or even establish a dedicated team to handle your expanding needs.</p><h4>Conclusion</h4><p>While large companies like Facebook have the resources to maintain dedicated web infrastructure engineering teams, startups often face budgetary constraints. By contracting with a part-time infrastructure engineer, small businesses can leverage specialized expertise without the overhead of a full-time team. This arrangement empowers the core development team to focus on application code, knowing that the web infrastructure remains up-to-date and secure, and operates smoothly. By adopting this approach, startups can enhance efficiency, reduce distractions, and pave the way for scalable growth.</p><hr><p><a href="https://medium.com/swlh/enabling-efficient-front-end-development-c379ef0e52">Enabling efficient front-end development</a> was originally published in <a href="https://medium.com/swlh">The Startup</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Using EJS with Vite]]></title>
            <link>https://medium.com/@koistya/using-ejs-with-vite-7502a4f79e44?source=rss-692b968dbc82------2</link>
            <guid isPermaLink="false">https://medium.com/p/7502a4f79e44</guid>
            <category><![CDATA[javascript]]></category>
            <category><![CDATA[typescript]]></category>
            <category><![CDATA[web-development]]></category>
            <category><![CDATA[website]]></category>
            <category><![CDATA[cloudflare]]></category>
            <dc:creator><![CDATA[Konstantin Tarkus]]></dc:creator>
            <pubDate>Thu, 15 Jun 2023 11:05:42 GMT</pubDate>
            <atom:updated>2023-06-15T11:20:35.470Z</atom:updated>
            <content:encoded><![CDATA[<h3>Why use EJS with Vite?</h3><p>Let’s consider an example scenario: you are building a web app that will run at <strong>CDN</strong> edge locations using <strong>Cloudflare Workers</strong>. In this scenario, you may have the following requirements:</p><ul><li>You need to configure the <strong>reverse proxy</strong> for certain third-party websites, such as <strong>Framer</strong>, <strong>Intercom Helpdesk</strong>, etc.</li><li>You should be able to inject <strong>custom HTML/JS snippets</strong> into the pages of these websites.</li><li>The code snippets should function correctly in <strong>different environments</strong>, such as <strong>production</strong> and <strong>test/QA</strong>.</li><li>To optimize the application bundle, it is necessary to <strong>pre-compile</strong> these templates instead of including a template library.</li></ul><p>In such cases, using EJS in combination with Vite can be a beneficial choice.</p><h3>What does it look like?</h3><p>The HTML snippet is conveniently placed into a separate file with HTML/JS syntax highlighting and code completion (views/analytics.ejs):</p><pre>&lt;script async src=&quot;https://www.googletagmanager.com/gtag/js?id=&lt;%- env.GA_MEASUREMENT_ID %&gt;&quot;&gt;&lt;/script&gt;<br>&lt;script&gt;<br>  window.dataLayer = window.dataLayer || [];<br>  function gtag() {<br>    dataLayer.push(arguments);<br>  }<br>  gtag(&quot;js&quot;, new Date());<br>  gtag(&quot;config&quot;, &quot;&lt;%- env.GA_MEASUREMENT_ID %&gt;&quot;);<br>&lt;/script&gt;</pre><p>While the Cloudflare Worker script injects it into an (HTML) landing page loaded from Framer:</p><pre>import { Hono } from &quot;hono&quot;;<br>import analytics from &quot;../views/analytics.ejs&quot;;<br><br>export const app = new Hono&lt;Env&gt;();<br><br>// Serve landing pages, inject Google Analytics<br>app.use(&quot;*&quot;, async ({ req, env }, next) =&gt; {<br>  const url = new URL(req.url);<br><br>  // Skip non-landing 
pages<br>  if (![&quot;/&quot;, &quot;/about&quot;, &quot;/home&quot;].includes(url.pathname)) {<br>    return next();<br>  }<br><br>  const res = await fetch(&quot;https://example.framer.app/&quot;, req.raw);<br><br>  return new HTMLRewriter()<br>    .on(&quot;body&quot;, {<br>      element(el) {<br>        el.onEndTag((tag) =&gt; {<br>          try {<br>            tag.before(analytics(env), { html: true });<br>          } catch (err) {<br>            console.error(err);<br>          }<br>        });<br>      },<br>    })<br>    .transform(res.clone());<br>});</pre><h3>How do I pre-compile EJS templates with Vite?</h3><p>Install ejs and @types/ejs NPM modules as development dependencies (yarn add ejs @types/ejs -D).</p><p>Add the following plugin to your vite.config.ts file:</p><pre>import { compile } from &quot;ejs&quot;;<br>import { readFile } from &quot;node:fs/promises&quot;;<br>import { relative, resolve } from &quot;node:path&quot;;<br>import { defineConfig } from &quot;vite&quot;;<br><br>export default defineConfig({<br>  ...<br>  plugins: [<br>    {<br>      name: &quot;ejs&quot;,<br>      async transform(_, id) {<br>        if (id.endsWith(&quot;.ejs&quot;)) {<br>          const src = await readFile(id, &quot;utf-8&quot;);<br>          const code = compile(src, {<br>            client: true,<br>            strict: true,<br>            localsName: &quot;env&quot;,<br>            views: [resolve(__dirname, &quot;views&quot;)],<br>            filename: relative(__dirname, id),<br>          }).toString();<br>          return `export default ${code}`;<br>        }<br>      },<br>    },<br>  ],<br>});</pre><h3>How to make .ejs imports work with TypeScript?</h3><ol><li>Add **/*.ejs to the list of included files in your tsconfig.json file.</li><li>Add the following type declaration to your global.d.ts file:</li></ol><pre>declare module &quot;*.ejs&quot; {<br>  /**<br>   * Generates HTML markup from an EJS template.<br>   *<br>   * @param locals an object of data 
to be passed into the template.<br>   * @param escape callback used to escape variables<br>   * @param include callback used to include files at runtime with `include()`<br>   * @param rethrow callback used to handle and rethrow errors<br>   *<br>   * @return Return type depends on `Options.async`.<br>   */<br>  const fn: (<br>    locals?: Data,<br>    escape?: EscapeCallback,<br>    include?: IncludeCallback,<br>    rethrow?: RethrowCallback,<br>  ) =&gt; string;<br>  export default fn;<br>}</pre><p>The <a href="https://github.com/kriasoft/relay-starter-kit">kriasoft/relay-starter-kit</a> is a comprehensive full-stack web application project template that comes pre-configured with all the mentioned features (located in the /edge folder).</p><p>If you require any assistance with web infrastructure and DevOps, feel free to contact me on <a href="https://www.codementor.io/@koistya">Codementor</a> or <a href="https://discord.gg/bSsv7XM">Discord</a>. I’m here to help! Happy coding!</p><h3>References</h3><ul><li><a href="https://ejs.co/">https://ejs.co/</a></li><li><a href="https://vitejs.dev/">https://vitejs.dev/</a></li><li><a href="https://developers.cloudflare.com/workers/">https://developers.cloudflare.com/workers/</a></li><li><a href="https://github.com/kriasoft/relay-starter-kit">https://github.com/kriasoft/relay-starter-kit</a></li></ul>]]></content:encoded>
        </item>
    </channel>
</rss>