Stories by Samir Patil on Medium

Apple Is Quietly Commoditizing AI Infrastructure

Samir Patil — Wed, 10 Jun 2026 10:24:19 GMT

Most people watching WWDC focused on Apple Intelligence.

Some focused on the new M5 chips.

Others noticed MLX Swift and Apple’s new distributed machine learning capabilities.

I think they’re all looking at the wrong thing.

The most important AI announcement at WWDC wasn’t a model or an assistant.

It was infrastructure.

For the last few years, AI has been controlled by three scarce resources:

Access to frontier models
Access to GPU infrastructure
Access to the capital required to operate both

The first barrier is already falling.

Open-source models like DeepSeek, Qwen, and Llama have shown that powerful AI is no longer exclusive to a handful of companies. Every few months, a new model emerges that closes the gap between open and closed systems.

But while intelligence itself is becoming increasingly accessible, the infrastructure required to run it remains expensive and centralized.

If a startup wanted to deploy a powerful model, the default answer was almost always the same:

Rent GPUs.
Use a cloud provider.
Pay an API vendor.

That is why Apple’s announcements around MLX matter far more than they initially appear.

Taken individually, features like MLX Swift, distributed inference, distributed training, RDMA over Thunderbolt, and MLX Distributed look like incremental engineering improvements.

Taken together, they look like a complete AI stack.

The narrative around local AI has traditionally been about shrinking models so they can fit on a laptop.

Apple is pursuing that path too.

But Apple’s distributed MLX announcements reveal a second path: instead of being constrained by the memory of a single machine, developers can distribute workloads across multiple Apple Silicon systems and run models that would otherwise be impossible on a single device.

Imagine a 30-person startup where every employee uses a Mac. Today, those machines are clients. Tomorrow, they could collectively become part of the company’s AI infrastructure.

Not necessarily replacing hyperscale cloud deployments, but potentially providing enough aggregate compute and memory to host powerful open-source models, internal copilots, knowledge systems, and AI workflows that previously required dedicated GPU resources.

Apple isn’t replacing the cloud.

And it isn’t replacing small local models.

What it is doing is expanding the range of what can be run on hardware that organizations already own.

The most interesting part is what happens when the next DeepSeek-level breakthrough arrives.

Instead of waiting for a cloud provider or API vendor to make it available, companies may be able to deploy it themselves on infrastructure they already have.

Open-source models commoditized intelligence.

Apple’s MLX stack may be the first serious attempt to commoditize the infrastructure required to run it.

The Hidden HTTPS Bug Behind Load Balancers

Samir Patil — Sun, 24 May 2026 19:56:32 GMT

Why Your App Breaks Behind a Load Balancer

You deploy your app.

HTTPS is enabled.
The SSL certificate works.
Everything looks perfect.

Then suddenly:

“Mixed Content: The page at 'https://yourdomain.com' was loaded over HTTPS, but requested an insecure resource 'http://yourdomain.com/api/...'. This request has been blocked.”

Or you get a redirect loop. Or your API calls silently fail.

You check your load balancer — HTTPS is configured correctly. You check Nginx — it’s running. You check your backend — it’s healthy.

So what went wrong?

The issue usually isn’t your load balancer.
It isn’t Nginx either.

Your backend simply doesn’t know the original request was HTTPS.

The Architecture Most Apps Use

Browser → HTTPS → Load Balancer
                     ↓ HTTP
                  Nginx / Proxy
                     ↓ HTTP
                   Backend

This setup is called SSL termination.

The load balancer handles HTTPS, while internal traffic stays on HTTP inside a private network.

This is completely normal in production.

Why Mixed Content Happens

Your browser sends:

https://yourapp.com

The load balancer forwards it internally as HTTP.

Now your backend thinks:

"This is an HTTP request"

So when it generates redirects or absolute URLs, it returns:

http://yourapp.com/login

The browser blocks it because the original page was loaded over HTTPS.

That’s the entire bug.

The Wrong Fix

Many engineers solve this by enabling HTTPS everywhere internally too:

Browser → HTTPS → Load Balancer → HTTPS → Nginx

It works.

But now you manage certificates in two places and add unnecessary TLS overhead internally.

You fixed the symptom, not the root cause.

The Real Fix: X-Forwarded-Proto

Load balancers send a header like this:

X-Forwarded-Proto: https

This tells your backend:

“The original client request was HTTPS.”

Nginx

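# Behind the load balancer, Nginx itself receives plain HTTP, so $scheme would be "http" here.
# Pass along the value the load balancer already set instead:
proxy_set_header X-Forwarded-Proto $http_x_forwarded_proto;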

Express / NestJS

app.set('trust proxy', 1)

Django

SECURE_PROXY_SSL_HEADER = ('HTTP_X_FORWARDED_PROTO', 'https')

FastAPI

Run Uvicorn with --proxy-headers and set --forwarded-allow-ips to your proxy’s address, so redirects and generated URLs use HTTPS correctly.
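
To make the mechanism concrete, here is a minimal, framework-free sketch of what “trusting the proxy” boils down to. The function names, the TRUSTED_PROXIES set, and the IP address are illustrative assumptions, not any framework’s actual API; real frameworks also handle header casing and multiple hops more carefully.

```python
from urllib.parse import urlunsplit

# Hypothetical: addresses of the proxies allowed to set forwarding headers.
TRUSTED_PROXIES = {"10.0.0.5"}


def original_scheme(headers: dict, wire_scheme: str, peer_ip: str) -> str:
    """Return the scheme the client used, honoring X-Forwarded-Proto
    only when the request arrived from a trusted proxy."""
    if peer_ip in TRUSTED_PROXIES:
        forwarded = headers.get("X-Forwarded-Proto", "")
        # If several values are present, this sketch takes the first one.
        first = forwarded.split(",")[0].strip().lower()
        if first in ("http", "https"):
            return first
    # Untrusted peer or missing header: fall back to what arrived on the wire.
    return wire_scheme


def absolute_url(headers: dict, wire_scheme: str, peer_ip: str, path: str) -> str:
    """Build the absolute URL a redirect would use."""
    scheme = original_scheme(headers, wire_scheme, peer_ip)
    return urlunsplit((scheme, headers["Host"], path, "", ""))
```

The trust check is the important design choice: if the header were honored from any client, anyone could claim their request was HTTPS.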

Final Architecture

Browser (HTTPS)
      ↓
Load Balancer
      ↓
X-Forwarded-Proto: https
      ↓
Nginx
      ↓
Backend

Now your backend correctly generates:

https://yourapp.com

No mixed content.
No redirect loops.
No duplicate TLS setup.

TL;DR

SSL termination at the load balancer is the correct production setup.

Mixed content errors happen because your backend thinks requests are HTTP.

The proper fix is:

pass X-Forwarded-Proto
trust proxy headers in your framework
let the backend know the original request was HTTPS

One header.
One configuration.
Problem solved.

The Problems Users Won’t Tell You About

Samir Patil — Wed, 18 Mar 2026 13:22:13 GMT

When WhisperFlow started, the founders were trying to build something extremely ambitious: a system that could convert thoughts directly into text. No speaking. No typing. Just thinking.

It sounded like science fiction.

They spent almost two years trying to make it work. Eventually they couldn’t.

So they stepped back and asked a different question:

What is the smallest real problem we can solve right now?

The answer was surprisingly simple:

Press Fn → speak → text appears anywhere.

At first glance this almost looks like giving up on the original vision. From mind-reading technology to just… pressing a key and talking.

Anyone could criticize this:

“Why build this? We already have ChatGPT. We already have transcription. Just open an app, record, copy, paste.”

Technically correct.

Product-wise, completely wrong.

Because the real problem was never transcription.

It was friction.

Opening another app.
Switching tabs.
Copying text.
Switching back.
Rebuilding mental context.

Each step feels small. Together they destroy flow.

WhisperFlow didn’t invent a new capability.

They removed interruption.

And that is often where the real value lives.

Users Don’t Tell You Their Real Problems

There is another reason this insight is hard to discover:

Users rarely tell you their real problems.

Not because they are hiding them intentionally.

But because they want to sound competent.

When people are interviewed, they describe problems that sound professional:

Lip sync issues
Storyboarding taking too long
Image generation quality
Missing features
Technical limitations

These sound like serious problems.

What they rarely say:

I lose focus switching tools
My workflow is scattered
I waste time rebuilding context
I get distracted between steps
Too many tabs slow me down

Because these sound like personal inefficiencies, not product problems.

So people describe impressive problems instead of real friction.

The Problem I Only Understood After Living It

I understood this when I started building AI films for myself.

Before that, after talking to 20–30 filmmakers, I thought the main problems were:

Storyboarding
Lip sync
Sound design
Asset reuse

But when I actually did the work myself, I discovered my real bottleneck wasn’t any of these.

It was this:

Context switching.

My workflow looked like:

Script in one doc.
Storyboard in another.
Sound references somewhere else.
GPT for rewriting.
Claude for refinement.
Video tools for generation.
Audio tools for voice.

Constant switching.
Constant copying.
Constant mental resets.

Work that should have taken one hour took five or six.

Not because the work was hard.

Because the workflow was fragmented.

And here is the important part:

Nobody told me about this problem.

I only understood it after experiencing the pain myself.

Because if someone had asked me earlier, I probably would not have said:

“My biggest problem is switching tabs.”

That doesn’t sound like a serious problem.

But living it showed me the truth:

This wasn’t a discipline problem.
This was a design problem.

The Founder Pattern Most People Miss

Here’s the pattern:

Users describe what sounds important.
But they suffer from what happens repeatedly.

A 2-minute friction repeated 50 times a day costs 100 minutes every day. That makes it a far bigger problem than a 30-minute problem that happens once a week.

This is where many startup opportunities hide.

Not in big obvious problems.

But in small repeated interruptions.

The best founders don’t just listen to what users say.

They notice:

Where energy drops
Where attention breaks
Where steps repeat
Where context resets
Where flow dies

Because productivity is rarely destroyed by difficulty. It is usually destroyed by interruption.

The Real Lesson

Don’t just interview users. Become one.

Because:

Users explain impressive problems.
But experience reveals expensive ones.

And sometimes the most valuable companies are built not by solving the biggest problems…

…but by removing the smallest frictions that nobody thinks are worth mentioning.

Stop the Silent Bloat: Why LangChain Checkpoints Make Your DB 10x Bigger (and How to Clean It)

Samir Patil — Tue, 06 Jan 2026 19:54:27 GMT

If you are using LangChain in production, one issue will surface sooner or later: the database size grows much faster than expected.

Your actual business data might be only 40–50 MB, but your database can easily grow to 700–800 MB or more. This growth is not caused by your core application data — it comes from LangChain’s checkpointing and state management.

This article explains why this happens, which tables are responsible, and how to safely clean them up using a simple, production‑ready cleanup function.

Why LangChain Databases Grow So Fast

LangChain maintains execution state to support:

Stateful agents
Long‑running workflows
Recovery from failures
Tool execution history

To enable this, LangGraph’s checkpointer (the persistence layer behind stateful LangChain agents) writes data into multiple internal tables on every agent step.

Over time, this state data accumulates rapidly — especially when:

You run multiple agents
You retry executions
You scale horizontally
You do not clean up old threads

Tables Responsible for Database Bloat

Most of the unnecessary database growth comes from these tables:

checkpoints — Stores serialised agent state as JSON (including timestamps)
checkpoint_blobs — Stores large binary or serialised payloads
checkpoint_writes — Stores intermediate writes during execution

These tables grow continuously and are not automatically cleaned up by LangChain.

Why Cleanup Is Mandatory in Production

If you don’t clean old checkpoints:

Database size grows indefinitely
Indexes become bloated
Queries slow down
Backups become larger and slower
Storage costs silently increase

In many real systems, 90%+ of database size is just expired checkpoint data.

Cleanup Strategy

The safest cleanup strategy is:

Delete threads whose latest checkpoint is older than a defined retention period (for example, 30 days).

This ensures:

Active threads remain untouched
Only inactive or abandoned agent runs are removed
No corruption of running workflows
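
The selection rule can be sketched in plain Python before touching the database. This is only an illustration of the rule, assuming each checkpoint is available as a (thread_id, ISO timestamp) pair; the function name and input shape are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def expired_thread_ids(checkpoints, retention_days, now=None):
    """checkpoints: iterable of (thread_id, iso_timestamp) pairs.
    Returns the sorted ids of threads whose newest checkpoint is
    older than the retention cutoff."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)

    # Track the most recent checkpoint seen for each thread.
    latest = {}
    for thread_id, ts in checkpoints:
        when = datetime.fromisoformat(ts)
        if thread_id not in latest or when > latest[thread_id]:
            latest[thread_id] = when

    # A thread expires only if even its newest checkpoint is too old.
    return sorted(t for t, when in latest.items() if when < cutoff)
```

Keying on the newest checkpoint per thread is what protects active threads: a thread with one old checkpoint and one recent checkpoint is kept whole.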

Step 1: Identify and Clean Old Threads

Below is a single, production‑ready function that:

Finds all expired thread_ids
Deletes related data from:
checkpoints
checkpoint_blobs
checkpoint_writes
Runs safely in one transaction

Cleanup Function

from datetime import datetime, timedelta, timezone
import logging


def cleanup_old_threads(checkpointer, retention_days: int) -> None:
    """
    Deletes all threads whose latest checkpoint is older than the given retention period.
    This removes data from:
      - checkpoints
      - checkpoint_blobs
      - checkpoint_writes
    """

    logging.info(
        f"Running checkpoint cleanup for threads older than {retention_days} days..."
    )

    cutoff_date = (
        datetime.now(timezone.utc) - timedelta(days=retention_days)
    ).isoformat()

    # Find thread_ids whose latest checkpoint timestamp is older than cutoff
    query = """
        SELECT thread_id
        FROM checkpoints
        GROUP BY thread_id
        HAVING MAX(checkpoint ->> 'ts') < %s
    """

    try:
        # Assumes checkpointer.conn is a psycopg_pool.ConnectionPool.
        with checkpointer.conn.connection() as conn:
            # conn.transaction() keeps the lookup and all three deletes atomic,
            # even when the pool hands out autocommit connections.
            with conn.transaction(), conn.cursor() as cur:
                cur.execute(query, (cutoff_date,))
                thread_ids = [row[0] for row in cur.fetchall()]

                if not thread_ids:
                    logging.info("No expired threads found.")
                    return

                logging.info(f"Found {len(thread_ids)} expired threads. Deleting...")

                # Bulk delete all related data
                cur.execute(
                    "DELETE FROM checkpoints WHERE thread_id = ANY(%s)",
                    (thread_ids,),
                )
                cur.execute(
                    "DELETE FROM checkpoint_blobs WHERE thread_id = ANY(%s)",
                    (thread_ids,),
                )
                cur.execute(
                    "DELETE FROM checkpoint_writes WHERE thread_id = ANY(%s)",
                    (thread_ids,),
                )

        logging.info(f"Successfully deleted {len(thread_ids)} expired threads.")

    except Exception:
        # logging.exception records the traceback along with the message.
        logging.exception("Checkpoint cleanup failed")

Operational Best Practices

Run this cleanup as a cron job (daily or weekly)
Start with 30‑day retention, then tune
Monitor database size before and after cleanup
Test on staging before running in production

Psql Query

WITH expired_threads AS (
    SELECT thread_id
    FROM checkpoints
    GROUP BY thread_id
    HAVING MAX((checkpoint ->> 'ts')::timestamp) < NOW() - INTERVAL '7 days'
),
del_checkpoints AS (
    DELETE FROM checkpoints 
    WHERE thread_id IN (SELECT thread_id FROM expired_threads)
),
del_blobs AS (
    DELETE FROM checkpoint_blobs 
    WHERE thread_id IN (SELECT thread_id FROM expired_threads)
)
DELETE FROM checkpoint_writes 
WHERE thread_id IN (SELECT thread_id FROM expired_threads);

Reclaim space (WARNING: This locks the tables!)

VACUUM FULL checkpoint_writes; 
VACUUM FULL checkpoint_blobs;
VACUUM FULL checkpoints;

Final Thoughts

LangChain is extremely powerful, but stateful agent systems require maintenance.

If you are using checkpoints in production and not cleaning them up:

You are paying a hidden storage tax
Your database is doing unnecessary work
Scaling becomes harder over time

A simple cleanup job like this can reduce database size by 80–90% and immediately improve performance.

If LangChain is running in production, checkpoint cleanup is not optional; it’s mandatory.

The Subtle JWT Issue That Broke Our On-Prem Deployment (And Took Half a Day to Debug)

Samir Patil — Wed, 10 Dec 2025 12:08:47 GMT

We recently hit an unexpectedly tricky JWT authentication problem during an on-prem deployment for one of our enterprise customers. On paper, the setup was simple:

Service A → Generates JWTs by talking to Keycloak
Service B → Validates those JWTs, again using Keycloak

All three components (Service A, Service B, and Keycloak) were running on the same machine. Nothing exotic. No networking gymnastics. And yet, every token validation attempt was failing.

The root cause turned out to be one of those “obvious only after you know it” bugs.

🔍 The Setup: Same Machine, Different URLs

Even though everything was deployed on a single server, each service reached Keycloak using a different base URL:

http://172.17.0.1:/auth
http://localhost:/auth
http://keycloak.mycompany.local:/auth

All of these pointed to the same Keycloak instance.

So why should it matter?

Because it turns out: the host Keycloak uses when issuing a token must match the host used when validating it.

💥 The Breakage: Mismatched Issuer Hosts

When Service A requested a JWT from Keycloak, the token included metadata — most importantly, the issuer (the iss claim). That issuer is tied to the exact base URL Keycloak sees during token generation.

So if Service A generated a token using:

http://172.17.0.1:/auth

Then Service B must validate the token using exactly the same URL.

But in our case, Services A and B were using different hosts to talk to Keycloak. And that tiny mismatch meant:

The iss claim inside the token didn’t match
Keycloak rejected the token during validation
Every validation attempt failed, even though the token itself was perfectly valid

No obvious errors — just consistent validation failures.
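
When failures are this quiet, it helps to look at the iss claim directly. Below is a minimal debugging sketch using only the standard library; the helper names are hypothetical, and it deliberately skips signature verification, so it is for inspection only, never for authenticating requests.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the payload of a JWT WITHOUT verifying its signature.
    For debugging only; never use this to authenticate a request."""
    payload_b64 = token.split(".")[1]
    # JWTs strip base64 padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def issuer_matches(token: str, expected_issuer: str) -> bool:
    """True when the token's iss claim equals the issuer the validator expects."""
    return jwt_claims(token).get("iss") == expected_issuer
```

Printing jwt_claims(token)["iss"] next to the issuer URL that the validating service is configured with makes a host mismatch visible at a glance.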

🧠 The “Aha!” Moment

Looking back, it feels like a simple configuration oversight.

But during debugging — between unclear documentation, container networking, and the complexity of on-prem environments — this tiny inconsistency took half a day to uncover.

Once we aligned all services to use the same Keycloak base URL, everything started working instantly.

✅ Takeaway: Consistency Matters (More Than You Think)

If you’re using Keycloak (or any identity provider that embeds issuer metadata), make sure:

All services use the same base URL to interact with the auth server
Avoid mixing localhost, IPs, and hostnames
Ensure container-internal URLs match whatever will be used for validation

It’s a small detail, but it can completely break your authentication flow in ways that are not immediately obvious.

Solving the Silent Postgres Disconnect Problem in Agentic Systems

Samir Patil — Sat, 15 Nov 2025 17:58:52 GMT

How I Fixed Idle Connection Failures in Production

Long-running agent workflows are one of LangGraph’s biggest strengths. But they also expose a subtle production issue: PostgreSQL quietly kills idle connections, and your persistence layer isn’t always ready for it.

If your app holds a connection for too long and Postgres restarts, times it out, or drops it, you won’t find out until the next query suddenly fails. For agent systems that depend on continuous checkpointing, that failure can stall runs or even require a server restart.

I ran into this exact issue. Here’s what was happening and the solution that made my persistence layer production-proof.

The Problem: Idle Connections Die in Production

In development, your database rarely disconnects. But in real deployments, several things can kill idle sessions:

Postgres maintenance restarts
idle_session_timeout or PgBouncer timeouts
Short network blips or failovers

When your persistence layer tries to reuse one of these dead connections, the driver throws an error like:

connection not open
server closed the connection unexpectedly

Without explicit recovery, that error bubbles up and breaks checkpointing logic. Some setups even end up needing a full restart to reset connections.

That’s unacceptable for agentic systems running for hours — or days.

What Reliability Should Look Like

A resilient persistence layer should:

Detect retryable connection errors
Reconnect automatically
Retry the failed operation
Do all of this transparently, without changing the rest of your code
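
Stripped of anything database-specific, those four requirements fit in one small helper. This is a generic sketch rather than the saver itself: run_with_retries, its parameters, and the injected sleep function are assumptions made for illustration.

```python
import time


def run_with_retries(operation, reconnect, max_retries=3, retry_delay=2.0,
                     retryable=(ConnectionError,), sleep=time.sleep):
    """Call operation(); on a retryable error, wait, reconnect, and try again.
    Re-raises the last error once max_retries attempts have failed."""
    for attempt in range(1, max_retries + 1):
        try:
            return operation()
        except retryable:
            if attempt == max_retries:
                # Out of attempts: surface the original error to the caller.
                raise
            sleep(retry_delay)
            reconnect()
```

Callers pass the operation as a zero-argument callable, so the rest of the code does not change. Errors that are not in retryable propagate immediately, which keeps genuine bugs from being retried.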

Surprisingly, this behavior isn’t built into most savers by default.

The Fix: ResilientPostgresSaver

To solve this, I built a wrapper: ResilientPostgresSaver.

It sits around the existing Postgres saver and adds reconnection + retry logic in a clean, centralized way.

Key Features

1. Automatic Connection Error Detection

The saver intercepts errors associated with terminated or invalid connections. Instead of failing fast, it attempts controlled recovery.

2. Retry Logic with a Configurable Delay

You can set:

max_retries
retry_delay

This makes the retry flow predictable and production-safe.

3. Works With Connection Pools

Instead of destroying the pool, it simply acquires a fresh connection.

This avoids double-closing and keeps pool state healthy.

4. Drop-In Replacement

You don’t have to touch anything else in your LangGraph code.

5. Survives Forced Termination (pg_terminate_backend)

Even if a backend connection is explicitly terminated (for example, using pg_terminate_backend), the saver detects the failure, acquires a fresh connection, and retries, so workflows keep running uninterrupted.

Usage Example

conn_pool = get_connection_pool(database_url)
checkpointer = ResilientPostgresSaver(
    conn_pool,
    max_retries=3,
    retry_delay=2.0
)
checkpointer.setup()

If the connection was idle and Postgres restarts, the saver:

Catches the failure
Reconnects
Retries the operation

Your agent continues running without interruptions.

Impact

After integrating this, I saw immediate improvements:

Zero crashes during brief database restarts
Stable agent runs even with long idle periods
No more manual process restarts after connection resets

It turned out to be one of those small architectural upgrades that significantly improves reliability.

Final Thoughts

Postgres is reliable, but production environments aren’t perfect. Idle connections can and will be dropped. If you’re running LangGraph with Postgres, your persistence layer needs to handle that reality.

ResilientPostgresSaver ensures that transient database hiccups don’t break your workflows.

I use this in my agentic template repo: https://github.com/samirpatil2000/agentic-template/blob/main/agents/resilient_postgres_saver.py

Code Snippet

import time
from typing import Optional
import logging

from langgraph.checkpoint.postgres import Conn, PostgresSaver
from langgraph.checkpoint.serde.base import SerializerProtocol
from psycopg import Connection, Pipeline
from psycopg.errors import OperationalError
from psycopg.rows import dict_row
from psycopg_pool import ConnectionPool

# Configure logger
logger = logging.getLogger(__name__)


class ResilientPostgresSaver(PostgresSaver):
    def __init__(
        self,
        conn: Conn,
        pipe: Optional[Pipeline] = None,
        serde: Optional[SerializerProtocol] = None,
        max_retries: int = 3,
        retry_delay: float = 2.0,  # seconds
    ) -> None:
        super().__init__(conn, pipe, serde)
        self.max_retries = max_retries
        self.retry_delay = retry_delay

    def _execute_with_retries(self, query_func, *args, **kwargs):
        # Note: list() returns an iterator; if it is evaluated lazily, errors
        # raised while iterating are not retried here.
        retries = 0
        while retries < self.max_retries:
            try:
                return query_func(*args, **kwargs)
            except (OperationalError, ConnectionError) as e:
                retries += 1
                logger.error(
                    f"Database operation failed "
                    f"(attempt {retries}/{self.max_retries}): {e}",
                    exc_info=True,
                )
                if retries >= self.max_retries:
                    raise
                time.sleep(self.retry_delay)
                self._reconnect()

    def _reconnect(self):
        if isinstance(self.conn, ConnectionPool):
            # Keep the pool alive: check() tests each pooled connection and
            # replaces the dead ones, so the next attempt gets a working one.
            logger.info("Checking connection pool for dead connections")
            try:
                self.conn.check()
            except Exception:
                logger.exception("Error checking connection pool")
            return

        # Single connection: close the broken one and open a replacement.
        try:
            self.conn.close()
        except Exception:
            logger.exception("Error closing connection")

        try:
            self.conn = Connection.connect(
                self.conn.conninfo,
                autocommit=True,
                prepare_threshold=0,
                row_factory=dict_row,
            )
        except Exception:
            logger.exception("Error reconnecting")

    def setup(self) -> None:
        self._execute_with_retries(super().setup)

    def list(self, *args, **kwargs):
        return self._execute_with_retries(super().list, *args, **kwargs)

    def get_tuple(self, *args, **kwargs):
        return self._execute_with_retries(super().get_tuple, *args, **kwargs)

    def put(self, *args, **kwargs):
        return self._execute_with_retries(super().put, *args, **kwargs)

    def put_writes(self, *args, **kwargs):
        return self._execute_with_retries(super().put_writes, *args, **kwargs)


def get_connection_pool(db_url: str):
    connection_kwargs = {
        "autocommit": True,
        "prepare_threshold": 0,
        "application_name": "cosmos",
    }

    conn_pool = ConnectionPool(
        conninfo=db_url,
        kwargs=connection_kwargs,
        min_size=1,
        max_size=4,
        max_idle=60 * 2,
    )
    return conn_pool

Build Agentic AI Systems Faster with a FastAPI + LangGraph Workflow Template

Samir Patil — Tue, 11 Nov 2025 13:43:15 GMT

I just shipped an open-source template for orchestrating agentic AI systems — built on FastAPI and LangGraph — designed to help you go from idea to production-ready workflows, quickly and cleanly.

What’s inside

FastAPI REST endpoints to start and continue workflows
LangGraph-powered orchestration with modular nodes
State management, checkpointing, and PostgreSQL persistence
Non-blocking execution via ThreadPoolExecutor
A clean, extensible architecture that’s easy to grow
And more

Why it helps

Kickstart agent workflows with a sensible baseline
Get built-in persistence plus interrupts/continuations
Scale with a clear, maintainable structure

Who it’s for

Teams prototyping agentic systems
Developers who need reliable state + orchestration from day one
Anyone tired of stitching together ad hoc workflow plumbing

I’m actively improving the template and would love your feedback. Raise an issue or open a PR to add features, fix rough edges, or suggest patterns.

Repo link: https://github.com/samirpatil2000/agentic-template

Stop Wasting Storage: A Simple Way to Auto-Cleanup Orphaned Files in Your Notes and Comments Editor

Samir Patil — Wed, 24 Sep 2025 19:46:30 GMT

When you’re building a document editor with features like auto-save and file uploads (images, PDFs, etc.), one sneaky problem often creeps in:

Files stay in storage even when the user deletes them from the document — leading to wasted space and higher costs over time.

Let’s break this down and see how we can solve it once and for all.

The Problem

Imagine this workflow:

A user uploads an image.
Your frontend immediately uploads the file to blob storage (like S3) and inserts its URL into the document content.
Your app auto-saves the updated content.
The user changes their mind, deletes the image from the document, and moves on.

Here’s the catch:

❌ The file is still sitting in storage.

Over time, thousands of such orphaned files can accumulate, bloating your storage bill and making it harder to manage content.

✅ The Solution: Delayed Cleanup

Instead of trying to clean up files in real-time (which is tricky with auto-save), you can use a delayed cleanup mechanism.

This approach gives users a grace period to undo changes — while keeping your storage clean and lean.

Here’s the step-by-step strategy:

1️⃣ File Upload Handling

When a user uploads a file, store it under a document-scoped path:

/documents/{documentId}/{uniqueFileId}.{extension}

This makes it easy to list all files related to a specific document later.

Immediately insert the file URL into the document’s in-progress content and auto-save.

2️⃣ Track Document Updates

Every time a document is updated, push a small payload into a queue (or Redis list):

{
  "documentId": "12345",
  "lastUpdatedAt": "2025-09-24T18:12:00Z"
}

This tells your system that something changed and might need cleanup later.

3️⃣ Delayed Cleanup Trigger

Have a background worker consume this queue.

For each update:

Wait for a debounce grace period (e.g. 5 minutes).
If no further edits occur within that time window, consider the document “stable.”
Trigger the cleanup job.
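
The stability check the worker performs can be sketched as a pure function. This assumes the worker can read the document’s current lastUpdatedAt from the database; the function name and the 5-minute constant are illustrative.

```python
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(minutes=5)  # assumed value, matching the example above


def _parse(ts: str) -> datetime:
    # Accept a trailing "Z" on Python versions where fromisoformat rejects it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def is_stable(queued_last_updated_at: str, current_last_updated_at: str,
              now: datetime) -> bool:
    """A document is stable when it has not changed since the queued update
    and the grace period has fully elapsed."""
    queued = _parse(queued_last_updated_at)
    current = _parse(current_last_updated_at)
    return current == queued and now - current >= GRACE_PERIOD
```

If the timestamps differ, a newer edit has already queued its own message, so the worker can simply drop the older one.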

4️⃣ Cleanup Job Steps

The cleanup job runs through this sequence:

Fetch the latest document content from the database.
Parse and collect all file URLs still referenced in the content.
List all files under /documents/{documentId}/ in storage.
Compare both lists (set difference):
Files in storage but not in the document = orphaned files.
Delete orphaned files from storage.
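
The comparison step is a plain set difference. Here is a minimal sketch, assuming file URLs embed the /documents/{documentId}/... path and that the storage listing returns keys in that same form; the regular expression and function names are hypothetical and would need adjusting to your real URL format.

```python
import re

# Hypothetical key shape, following the /documents/{documentId}/... layout above.
FILE_KEY_PATTERN = re.compile(r"/documents/[\w-]+/[\w-]+\.\w+")


def referenced_keys(content: str) -> set:
    """Collect every storage key that still appears in the document content."""
    return set(FILE_KEY_PATTERN.findall(content))


def orphaned_keys(content: str, stored_keys) -> set:
    """Files in storage but not in the document."""
    return set(stored_keys) - referenced_keys(content)
```

Whatever orphaned_keys returns is the list to hand to your storage client’s delete call.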

🎯 Benefits of This Approach

✅ Automatic Cleanup — No manual admin work required.

✅ Storage Cost Savings — Unused files don’t pile up.

✅ User-Friendly — The grace period allows undoing accidental deletes.

✅ Consistency — Storage reflects the true state of your document.

🧠 Final Thoughts

Building collaborative, auto-saving editors is tricky enough without worrying about rogue files silently eating into your storage budget.

By introducing a delayed cleanup process, you get:

A clean document-to-file relationship
Lower storage bills

This is one of those small but powerful optimizations that make your system more robust at scale.

⚠️ Important Note on Versioning

If your system uses document versioning (e.g. storing a new row in the database for every save), this approach will need some adjustments.

You have two options:

Disable cleanup for older versions if you want to preserve full history (and thus keep all files ever referenced).
Limit cleanup to the latest version by only fetching and parsing the most recent version from the database.

Make sure your cleanup logic respects your versioning strategy; otherwise, you risk deleting files that are still referenced in older document versions.

Setting Up n8n with PostgreSQL and NGINX (SSL Included)

Samir Patil — Sat, 20 Sep 2025 07:00:29 GMT

If you’re self-hosting n8n, running it with a proper database (PostgreSQL) and securing it with NGINX + SSL is the way to go.

This guide walks you through the exact steps I used to get a production-grade setup running — no guesswork, just copy, paste, and ship.

🛠 Prerequisites

Before we dive in, make sure you have:

✅ Ubuntu server with Docker + Docker Compose installed

✅ Domain name — e.g., n8n.your-company.tld pointing to your server

✅ SSL certificate (self-signed or from Let’s Encrypt/your CA)

🔧 Step 1: Configure NGINX as Reverse Proxy

Let’s start by putting n8n behind NGINX with SSL termination.

Edit your NGINX site config (/etc/nginx/sites-available/default):

server {
    listen 443 ssl http2;
    client_max_body_size 2G;

    ssl_certificate /etc/nginx/ssl/ssl-bundle.crt;
    ssl_certificate_key /etc/nginx/ssl/sandbox_private_key.pem;

    server_name n8n.your-company.tld;

    proxy_read_timeout 300;

    # Security headers
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload";
    add_header X-XSS-Protection "1; mode=block";
    add_header X-Content-Type-Options nosniff;
    add_header Referrer-Policy "strict-origin";

    location / {
        proxy_pass http://127.0.0.1:5678/;

        # WebSocket support (critical for n8n!)
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        proxy_read_timeout 600s;
        proxy_send_timeout 600s;
    }
}

Test & reload:

sudo nginx -t && sudo systemctl reload nginx

Step 2: Environment Variables

Create a .env file in your project directory:

# Database
DB_TYPE=postgresdb
DB_POSTGRESDB_HOST=postgres
DB_POSTGRESDB_PORT=5432
DB_POSTGRESDB_DATABASE=n8n
DB_POSTGRESDB_USER=postgres
DB_POSTGRESDB_PASSWORD=postgres

# n8n
N8N_BASIC_AUTH_ACTIVE=true
N8N_BASIC_AUTH_USER=admin
N8N_BASIC_AUTH_PASSWORD=secret

N8N_HOST=n8n.your-company.tld
N8N_PROTOCOL=https
N8N_PORT=5678

VUE_APP_URL_BASE_API=https://n8n.your-company.tld/

💡 Pro tip: Use a strong password in production. Consider managing secrets with Vault or Doppler.

Step 3: Docker Compose Setup

Create docker-compose.yml:

version: '3.8'

services:
  postgres:
    image: postgres:15
    restart: always
    environment:
      POSTGRES_USER: ${DB_POSTGRESDB_USER}
      POSTGRES_PASSWORD: ${DB_POSTGRESDB_PASSWORD}
      POSTGRES_DB: ${DB_POSTGRESDB_DATABASE}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432" # Optional — only if you need external access

  n8n:
    image: n8nio/n8n:1.111.0
    restart: always
    ports:
      - "127.0.0.1:5678:5678" # Bind to localhost so traffic has to go through NGINX
    env_file:
      - .env
    depends_on:
      - postgres
    volumes:
      - n8n_data:/home/node/.n8n

volumes:
  postgres_data:
  n8n_data:

Bring everything up:

docker-compose up -d

Check logs to ensure n8n is running properly:

docker-compose logs -f n8n

Step 4: Access Your Instance

Once everything is running, open:

➡️ https://n8n.your-company.tld

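Username: admin
Password: secret

(These are the basic-auth values from .env. n8n 1.x manages users itself, so on a 1.x image you may instead be asked to create an owner account on first visit.)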

💡 Pro Tips for Production

🔒 Use Let’s Encrypt for free, auto-renewing SSL

🔑 Rotate your N8N_BASIC_AUTH_PASSWORD regularly

🧱 Ensure your firewall allows 80/443

⚡ Keep WebSocket headers in NGINX — n8n won’t work properly without them

By following these steps, you now have a secure, scalable, production-ready n8n setup with PostgreSQL as a persistent database and NGINX handling SSL and reverse proxying.

Here is the GitHub repo: https://github.com/samirpatil2000/n8n

LLM Mocks: Your New Dev Shortcut

Samir Patil — Sun, 22 Jun 2025 15:16:43 GMT

When you’re building on top of an LLM, whether for code generation, image creation, or reasoning over user inputs, waiting for real-time responses can slow you down. We’ve all been there: you’ve already verified the LLM prompt and logic, but now you’re iterating on how your application processes the output. Do you really need to wait 30 seconds to 3 minutes for every run?

Mocking saves time, speeds up feature development, and makes debugging predictable.

Here’s a simple way to mock LLM responses.

🔍 Why mocking LLM responses is important:

You’ve already validated your prompt, so there is no need to re-query the LLM just to test UI or downstream logic.
LLM APIs are slow (especially for image generation tasks); mocking makes your dev loop instant.
You want to work offline or avoid burning through API credits.
Easier unit testing of LLM-dependent flows.

mock_config.json

{
  "analyseUserIntent": {
    "mock": true,
    "response": {
      "content": "User wants to create a new marketing campaign."
    }
  },
  "generateCampaign": {
    "mock": true,
    "response": {
      "content": "Here's a draft campaign based on your input..."
    }
  },
  "searchSimilar": {
    "mock": false,
    "response": {
      "content": "Matching records found: ..."
    }
  },
  "generateImage": {
    "mock": false,
    "response": {
      "content": "",
      "image_url": ""
    }
  }
}

Here’s the Python backend function that wraps LLM usage. You can do the same in any language.

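import json

# Load the per-purpose mock settings once at startup.
with open("mock_config.json") as f:
    mock_config = json.load(f)

def call_llm(purpose, user_input):
    # Return the canned response when this purpose is flagged as mocked.
    if mock_config[purpose]["mock"]:
        return mock_config[purpose]["response"]
    # Otherwise make the real call (actual_llm_call is your own client wrapper).
    return actual_llm_call(purpose, user_input)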

On the frontend, we use a similar approach with conditional logic based on environment or flags.
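
The same environment-based gating can be applied on the backend, so mocks never leak into production by accident. A minimal sketch, assuming a hypothetical USE_LLM_MOCKS environment variable that acts as a global switch on top of the per-purpose flags:

```python
import os


def should_mock(purpose, mock_config, env=os.environ):
    """Mock only when the global switch is on AND the purpose is flagged."""
    # USE_LLM_MOCKS is a hypothetical variable name; leave it unset in production.
    if env.get("USE_LLM_MOCKS", "0") != "1":
        return False
    # Unknown purposes default to the real LLM call.
    return bool(mock_config.get(purpose, {}).get("mock", False))
```

With this in place, the per-purpose flags in mock_config.json can stay checked in, while the environment decides whether they take effect at all.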

I’d love to hear your thoughts.