Stories by Hugo Sequier on Medium

Learning the Result Forgetting the Process

Hugo Sequier — Wed, 13 May 2026 05:53:08 GMT

From school to AI, I was often taught to pass the test before I was taught to understand what I was doing.

For a long time, I knew how to get decent grades without really understanding. Not always. Not in every subject. But often enough for it to become a method.

I was not looking for the why first. I was looking for the pattern. The type of exercise. The formula that matched. The recurring trap. The shortcut that worked on exam day.

And the worst part is that the system proved me right.

Image generated by GPT Image 2.0

I learned how to beat the test

At school, I cheated quite often.

Not in some spectacular way. Not the heist version with an earpiece, an accomplice, and a cheat sheet printed in font size 4. More the ordinary kind of cheating, the kind many people know but rarely admit. Looking at an answer. Preparing a trick. Memorizing just enough. Learning the combinations that showed up often.

Over time, I became good at one specific thing: recognizing the situation.

In math, I did not always understand the theorem deeply. But I knew where to use it. I would see a shape, a type of question, a clue in the prompt, and my brain would say: here, use this formula. There, take the derivative. There, factorize. There, do what we did in exercise 7 of the chapter.

That was not understanding. That was pattern recognition.

I see it much more clearly now. It may also explain why I loved the Souls games so much, Dark Souls, Elden Ring, Bloodborn. You die, you watch, you recognize the attack, you answer at the right time. The boss is no longer a mystery. It becomes a sequence. You do not need to understand the metaphysics of Lordran to dodge a charged attack.

At school, I was doing the same thing.

I was not always learning the subject. I was learning the boss.

School said ”understand”, but graded “succeed”

Students are told they need to understand. That is the official story. Serious teachers repeat it, and they are right.

But in many key moments, the reward lands somewhere else: performance at one specific point in time.

You can work slowly, ask good questions, improve, change the way you think. If you fail the exam, the visible trace is bad. On the other side, you can cram, copy a method, recognize two exercise types, and get a good grade. The visible trace is good.

The report card does not tell the process.

It tells the result.

MCQ tests make this even clearer. They can be useful, of course. But they often reward one very specific skill: picking the right box. Sometimes you know. Sometimes you eliminate. Sometimes you guess intelligently. And sometimes you simply recognize the structure of the question.

The moments where I felt the human was actually being evaluated were rarer. Oral exams, some live exercises, some reasoning problems where you had to explain your path. There, you could not just drop the result like a package at the door. You had to show how you were thinking. Your hesitation mattered. The way you corrected a mistake mattered. Your ability to stand in front of someone pushing you a little mattered.

It was uncomfortable.

So probably closer to learning.

The process is invisible, so we abandon it

The problem is that process does not show well.

A grade shows. A degree shows. A job title shows. Income shows. An app launched in 48 hours shows. A clean dashboard screenshot shows.

The process is often ugly.

It looks like drafts, mistakes, crossed-out sentences, moments where you realize you do not know, steps backward, days with nothing worth posting.

Nobody wants to post: “Today I realized my solution was wrong, so I deleted three days of work.” Yet sometimes that is the best day of the week.

This obsession with results also creates a toxic bias: we mostly listen to survivors.

Someone succeeds, then explains that it was possible, that you just had to believe, work hard, wake up early, stay disciplined, do what they did. Their main argument becomes their success. Look, it worked for me.

But we do not see the people who applied the same method and failed. We do not see the people who did not start with the same conditions. We do not see the failed attempts, the luck, the invisible support, the timing, the network, the mental health, the family money, the language, the city, the randomness.

We turn a survivor’s result into a universal lesson.

And because we mostly judge the end, we start looking down on anything that does not produce quickly.

Thinking becomes suspicious. Doubt becomes weakness. Going back to basics feels like wasted time. Deep understanding seems less profitable than knowing what to apply at the right moment.

I know that reflex because I used it for years.

The quiet disappearance of “why?”

There is another loss inside this, quieter but maybe more damaging.

We stop asking why.

Not because the question became useless. The why is still where real understanding begins. Why does this theorem work? Why is the formula shaped like that? Why does this architecture fail here and not there? Why did this strategy work once and collapse the second time?

The why is curiosity, but it is also compression. When you understand why something works, you do not need to memorize every possible variation. You can rebuild the answer from first principles. You can adapt when the exercise changes slightly. You can notice when a shortcut no longer applies.

But the why is expensive.

It takes time. It makes you slower at first. It does not always improve the next grade, the next demo, the next screenshot, the next visible result. You can spend an hour understanding a concept deeply and receive no external signal for it. No applause. No peer validation. No metric moving up. Nobody sees the small internal click.

So you start asking why only for yourself.

And that is hard to maintain in a world that rewards everything around it. The finished feature gets praise. The high mark gets praise. The launch post gets likes. The person who asks why might just look slow.

At first, you still ask. Then you ask less. Then you ask only when you have time. Then time never comes. Step by step, the space that used to belong to curiosity gets occupied by output.

That is how you lose the why without noticing.

Not through one dramatic decision. Through a thousand small moments where understanding has no immediate reward, while results do.

AI makes the shortcut almost perfect

In my last article, I wrote about the productivity trap of AI tools. The more I think about it, the more I believe AI did not create our obsession with results. It made it much more efficient.

Before, there was still friction.

If you wanted to code a feature, you had to fight with the language, the API, the bugs, the files, the documentation. Even if you wanted to go fast, reality forced you to meet part of the process.

Now you can delegate a lot.

You give an intention to Claude Code, Codex, or any other AI Coding Agents. The agent writes the files, proposes an architecture, runs tests, fixes errors, formats everything. You get back a clean result, often impressive. And if you are not careful, your brain claims the result without having lived the path.

That is where something breaks.

Because our sense of legitimacy is not built only on what exists at the end. It is built on effort, decisions, errors crossed, small cognitive pains that prove to your brain: I was there, I participated, I understood enough to move forward.

When AI takes the whole process and you keep only the deliverable, attribution becomes blurry.

Did I build this feature? Did I only ask someone to build it? Do I really understand this code? Could I rebuild it without help?

Impostor syndrome loves that kind of grey zone.

Image generated by GPT Image 2.0

I do not want to confuse shortcut with intelligence anymore

I do not regret learning how to recognize patterns. It is useful. It is even a big part of practical intelligence.

A good engineer recognizes architectures. A trader recognizes setups. A musician recognizes progressions. A Souls player recognizes the animation that announces the slap.

The problem starts when recognition replaces understanding entirely.

For a long time, I saw the why as a waste of time. Why learn a formula deeply if the exam lasts two hours and the next chapter starts on Monday? Why understand the whole architecture if the agent can generate a solution that passes the tests? Why slow down if the world rewards people who show output quickly?

That logic works. Until it does not.

It works to pass an exam. It works to produce a demo. It works to feel like you are moving forward. Then one day, you need to explain, change, fix, transmit, teach, decide without an obvious model. And there, patterns are not enough.

You realize you optimized for exam day.

Not for the person you wanted to become.

Protecting the process

I do not have a clean moral lesson to sell here. I still use AI every day. I still like shortcuts. I am still tempted by the fast result, the good grade, the clean screenshot, the pleasant feeling of having moved forward.

But I am trying to draw a boundary again.

Sometimes, I should let AI do it. Sometimes, I should do it myself. Not out of artisanal pride. To keep the muscle.

Write before asking for a rewrite. Draw the architecture before generating the code. Explain a concept without looking at the answer. Read a file from top to bottom. Redo a calculation by hand. Accept spending one hour on a problem AI could probably solve in twenty seconds.

Not all the time. That would be ridiculous.

But often enough for my brain to stay involved in its own life.

Because when we only value the result, we eventually become strangers to what we produce. We collect visible proof of competence, but we lose the private feeling of having understood something.

I think that is the real danger.

Not that AI makes us stupid overnight. More that it lets us continue an old habit with much more powerful tools: succeeding without always learning, producing without always understanding, collecting credit for a process we barely went through.

So the question I want to keep in mind is simple.

When I get a good result, which part of the path truly belongs to me?

The Productivity Trap of AI

Hugo Sequier — Sun, 10 May 2026 06:46:35 GMT

We learned to produce faster, but I am not sure we learned what that production is doing to us.

Around AI, productivity became the whole personality. Not thinking. Producing.

I use these tools every day. Claude Code, Codex, Cursor, agent workflows, multi-terminal setups, API credits burned like jet fuel. I have been deep inside that wave.

So this is not an anti-AI rant. I use these tools. I like them. They can free people in real ways.

It is a self-reflection from someone who likes the tools and still feels that something is wrong, not only in software, but in the way we look at work, value, intelligence, and each other.

Image generated by GPT Image 2.0

The New Standard Is Output

For a while, AI coding tools felt like magic because they removed friction.

You had an idea. You described it. Depending on your preference, you opened Claude Code, Codex, Cursor, or another agentic coding tool. The model produced a first version, edited files, ran commands, and pushed the project forward. Suddenly, a solo developer could move like a small team.

That part is real. I do not want to pretend otherwise. I have seen these tools save hours across agent systems, trading bots, internal tools, and content workflows. When the setup is clean, the gain is obvious.

But the social layer around these tools changed fast, and this is the part that feels bigger than coding. The conversation stopped being: “Can we build better things?” It became: “How much did you produce today?”

Screenshots of apps built in one weekend. Videos of agents writing thousands of lines. Threads about running several agents or terminals in parallel. People talking about spending $1,000 in API calls like it is the price of being serious. If you are not using one of the newest agentic coding tools, you feel late.

The new standard is not quality. It is motion.

And once the standard becomes motion, stillness starts to feel like failure.

That is the part that bothers me.

What the Metrics Hide

Lines of code are easy to count (hello Garry). Features are easy to demo. Deployment frequency looks good in public. Architecture does not look good in a tweet. Neither does code review, deleting code, or spending two hours deciding that the feature should not exist.

AI coding tools make this worse because they are extremely good at producing visible output. Ask for a dashboard, you get a dashboard. Ask for auth, you get auth. Ask for a SaaS boilerplate, you get a SaaS boilerplate. The first impression is strong enough to feel like progress.

But “it works” is a very low bar.

Does it scale? Can another developer understand it? Can you change it safely next month? Do you know why the agent made that architecture choice? Did anyone review the failure modes?

Most of the time, we skip those questions because the demo already feels like success.

I have caught myself doing this. Letting the agent push forward because forward feels productive. Accepting files I did not fully read because the tests passed. Asking for one more feature instead of asking whether the system was still coherent.

That is a dangerous habit. Not because the machine is evil. Because the machine rewards impatience.

Image generated by GPT Image 2.0

This Is Bigger Than Coding

The uncomfortable part is that coding is just one version of a wider problem.

Capitalism is very good at turning human activity into measurable output. More calls. More posts. More tickets closed. More meetings. More workouts. More content. More apps. More everything.

And when the number becomes the target, the activity changes shape.

People do not write to clarify ideas. They write to publish. They do not exercise to feel healthy. They exercise to track streaks. They do not learn to understand. They learn to display progress. They do not build software to solve a problem. They build because shipping is proof that they are not falling behind.

That pressure is not new. AI just compresses the cycle, then makes the cycle look normal. Before, if you wanted to build an app without thinking, reality stopped you. You had to design the database. You had to wire the frontend. You had to fight with the API. The friction forced some reflection.

Now the friction is optional.

You can wake up with a vague idea, open your preferred coding agent, run several tasks in parallel, generate the repo, the UI, the backend, the landing page, the post announcing it, and feel like you built something.

Maybe you did. But maybe you only supervised output.

That distinction matters because our brains adapt to what we ask from them. If the first move is always delegation, we train ourselves to skip the slow part: framing the problem, sitting with ambiguity, choosing constraints, saying no.

We start losing the muscle of thinking before doing. And then we call it productivity. But this is not only about ambitious developers shipping too many side projects. That is almost the privileged version of the problem.

For many people, work is not a playground where they can optimize workflows and automate boring tasks. Work is rent. Food. Debt. A salary that arrives at the end of the month and sometimes still does not cover the bills.

When life gets that tight, people do not have much space left to think about “the future of work.” They think about surviving this month.

And then AI arrives with two promises at the same time. The first promise is beautiful: you can produce more with less. You can save time. You can learn faster. You can create things without asking permission. The second promise is darker: if you do not adapt, you may be replaced. Both promises are true enough to be dangerous.

The Inequality Nobody Wants to Say Out Loud

I hear people say that AI is available to everyone. Technically, yes. Many people can open ChatGPT or Claude. A phone is enough to start.

But access is not the same thing as power.

To make money with AI, you usually need more than a prompt box. You need taste. Domain knowledge. Time after work. Confidence. A network. Some understanding of business, code, design, distribution, or operations.

Those things are not distributed equally.

If you grew up around educated people, if you learned how to learn, if you already work with information, AI can feel like a multiplier. You have processes to automate. You have problems to package. You know where the money is.

Obviously, everyone can read a $10 book (or borrow at the library) or watch free content on YouTube to learn about AI, business, networking, etc. But there is a true dissociation between access and orientation. The resources may be public, but knowing what to search for, why it matters, how to evaluate it, and how to turn it into leverage is itself a form of privilege.

The internet made information cheap, but it did not make opportunity equally usable.

If you are a cashier in a grocery shop, the story is different. You can know exactly what is coming. Self-checkout, cameras, robots, inventory systems, automated support, fewer humans on the floor. You can see the direction. But what are you supposed to do with that information after an exhausting shift?

Open Claude and build a SaaS? Start a personal brand? Buy a course from someone selling “AI side hustles” with screenshots of fake revenue?

That advice is insulting when it ignores the conditions people live in. I do not believe that a scammy e-book or a two-hour online formation gives someone the knowledge needed to turn AI into economic freedom. Useful knowledge is slower than that. It comes from exposure to systems, markets, clients, failure, and people who know the game.

So AI may emancipate some people while making others more replaceable.

That is the part I cannot shake.

The Question I Cannot Ignore

I understand why everyone is excited. A single developer can now do things that were absurd two years ago. That is not a small shift. It opens real possibilities for builders, indie hackers, researchers, small teams, and people who were blocked by lack of resources.

But I do not want to confuse access with wisdom.

The ability to produce software does not automatically create the ability to design software. The ability to generate content does not create a point of view. The ability to access AI does not create equal opportunity.

And if we are not careful, we will build a culture where the highest-status person is the one producing the most artifacts, not the one thinking the most clearly.

That culture will produce apps. Many of them. It will also produce burnout, shallow thinking, fragile systems, and people who feel behind even while moving faster than ever.

So the questions I keep coming back to are simple, but I do not think they have simple answers.

Is it really productive to ship ten apps in a week if you do not understand what you built?

Is AI really emancipating people if the people best positioned to benefit are already the people with the most education, time, language skills, and network?

What happens to the people whose jobs are easy to automate but whose lives leave them almost no room to reskill?

What happens to our brains when the first reflex is no longer to think, but to delegate?

I do not have a clean answer. Maybe that is why this topic keeps bothering me.

For now, I only know that I do not want to celebrate productivity without asking who pays for it, who benefits from it, and what parts of ourselves we trade away to keep up.

Why AI Agent Harnesses Matter More Than Model Choice

Hugo Sequier — Wed, 15 Apr 2026 14:44:33 GMT

Gemma and Qwen keep getting better, but most agent failures still come from bad execution architecture.

Everyone is debating models. Almost nobody is debating harness design. That is why teams keep switching models while their coding agents stay unstable.

There is also a quiet fear in the market now: every new model release makes people feel more dependent on the next one. Six months ago, many teams were shipping serious work with the models they had and were happy with the results. Today, after a few new launches, those same models are often treated like they are obsolete or unusable.

Most of the time, that perception shift is exaggerated. With a disciplined harness, many “older” or smaller models still perform very well on real coding workflows.

I learned this after burning through a paid coding plan in record time. I tested many model and tooling setups across Codex, Claude Code, OpenCode, Cursor, Z.AI, and MiniMax CLI. At first I blamed context limits. Then I blamed pricing. The root cause was my own architecture.

The Missing Variable in Most Agent Discussions

When an agent fails, the first reaction is model shopping.

Should we move from model A to model B?

That question matters, but it comes second. First comes harness quality.

For coding agents, the harness is the execution layer around the model: routing rules, tool contracts, state handling, memory strategy, sandbox policy, retry behavior, and verification.

If this layer is weak, a better model only gives you cleaner failure logs.

Diagram generated with Gemini

What Changed in My Own Workflow

My previous setup was classic and messy.

One large root instruction file, one agent handling everything, one context carrying code exploration, shell output, planning notes, and implementation all together.

The root file was around 200 lines. It was read again and again, even for narrow tasks. Shell output stayed in memory long after it was useful. Dead-end branches stayed too.

So I rebuilt the architecture.

I cut the root router to 27 lines. I moved detailed context to scoped docs. I split responsibilities into seven specialized subagents with isolated context. Deterministic steps, lint, build, tests, moved to shell paths instead of reasoning paths.

Result: roughly 40 to 50 percent less token load per session, and better output quality because the model stopped swimming in irrelevant context.

You can read more detail in my previous Article : https://medium.com/@sequierh/how-i-cut-token-usage-in-half-by-redesigning-my-agent-architecture-05a3cff82b5b

Terminal-Bench 2.0 Makes the Point Obvious

Terminal-Bench 2.0 is a strong reminder that agent performance is system performance.

The benchmark includes 89 tasks in real terminal environments. That means runtime setup can influence outcomes directly.

Anthropic published analysis showing that infrastructure choices alone shifted Terminal-Bench 2.0 results by up to 6 percentage points in their experiments. Same model, different resource enforcement and headroom, different scores.

This should reset how we read leaderboards.

A few points of difference can come from harness and infra behavior, not just model ability. If two teams report different outcomes on the same model, the first place to inspect is execution setup.

Benchmark from Terminal-Bench 2.0 filtered on Claude Opus 4.6

Why Local Models Make Harness Design Even More Important

Local AI is moving fast. Gemma 4 and Qwen 3.x are improving quickly for tool-use scenarios. I track them closely, and the progress is real.

But smaller local models are less tolerant of fuzzy tool definitions and loose control flow. They need stricter harness rules to stay reliable.

What helps most:

explicit tool schemas
clear preconditions before tool calls
constrained argument formats
compact state snapshots
short verify loops

Without this structure, drift appears early. With this structure, local models can power useful coding and automation tasks at much lower cost.

A Minimal Harness Blueprint

You do not need a giant framework to start. A simple loop is enough:

loop:
  - observe: read scoped docs and active state
  - plan: choose one small target
  - act: use the right tool path
  - verify: run checks before claiming success
  - write_back: store only durable decisions
routing:
  - explore: read-only agent
  - docs: documentation router
  - shell: deterministic runner
  - code: main reasoning agent
  - review: isolated critic

This works with frontier models, local models, or a mixed stack. Model quality still matters. Harness quality decides whether the model spends effort on forward progress or cleanup.

When an agent underperforms now, I audit harness friction before changing models: context bloat, wrong routing, weak verification, and noisy memory write-back.

That audit usually finds the bottleneck quickly.

Model progress will keep accelerating. The teams that get consistent output will be the ones that treat harness design as core engineering work. If you evaluate agent stacks this quarter, score both model quality and harness quality side by side. Most teams still score only one.

I’m Hugo, a freelance Data Scientist specializing in AI for Construction. I build intelligent systems for a living, and apparently also for my note-taking. Find me on LinkedIn or on my Linktree.

How I Cut Token Usage in Half by Redesigning My Agent Architecture

Hugo Sequier — Tue, 07 Apr 2026 13:01:01 GMT

Bigger models, same bill. The fix is in how you architect.

For the past two months, my Twitter feed has been full of the same complaint: “I hit my Claude Code plan limit in 3 days.” “Codex burned through my credits by Thursday.” “These coding agents are a money pit.” Same frustration, different avatar. And every time, someone replies with the same advice: use a cheaper model, compress your prompts, buy the higher tier.

That advice is wrong.

I recently switched to OpenCode, partly to test Codex integration, partly to experiment with local LLMs. One week in, I blew through my entire Codex subscription. That got my attention. I work across multiple projects every day, a computer vision pipeline for floorplan analysis, a Polymarket trading bot, a SaaS product, and I lean heavily on subagents to keep all of it moving. Turns out the way I was using them was the reason my bill was so high.

The Plan Limit Panic Is a Symptom

Let’s get something out of the way. Hitting your plan limit means your architecture is wasteful. The plan size is fine.

Most people set up their coding agent like this: one giant CLAUDE.md at the root that explains everything. Full project structure, every convention, every decision, every gotcha, all in one file. Then they start a session, ask the agent to do something, and the agent reads all of it. Every time. Even when the task only touches one file.

That’s like handing someone a 200-page manual when they asked where the bathroom is.

The agent doesn’t need to understand your entire project to fix a bug in the auth module. It needs to understand auth. The rest is noise. And noise costs tokens. Tokens cost money. Money makes people angry on Twitter.

The fix isn’t a bigger plan or a cheaper model. The fix is structuring your context so the agent reads what matters and skips what doesn’t.

Rule One: Don’t Explain, Route

My root AGENTS.md is 27 lines. That's it. No project architecture, no tech stack dumps. Just core behavior rules and a dispatch policy that points the agent to project-specific files where the real context lives.

## Dispatch policy

- `repo-explorer`: repo exploration, file localization, no edits.
- `doc-router`: documentation-first routing, identify canonical docs.
- `plan-critic`: stress-test plans, find risks, propose simplifications.
- `bash-runner`: utility shell checks, lint/build, no edits.
- `vault-bridge`: vault-oriented commands and workflows.
- `changelog-keeper`: version bookkeeping.
- `scratch-compactor`: compact session state into resume-ready notes.

The root file is a router, not an encyclopedia. When the agent needs depth, it reads the project-specific documentation on demand. When it doesn’t need depth, it never sees it. That alone cut my per-session token usage by maybe 30% because the agent stopped reading information it would never use.

Documentation as a Routing System

I used to think feature docs were about explaining your code to the agent. Turns out they’re routing tables.

My documentation system has three layers:

Feature-level docs. One markdown file per feature, already condensed, maybe 30 lines each. Key files, patterns, conventions, and gotchas for a specific area.

Routing indexes. Lightweight files that tell the agent where to look: docs/_index.md as the entry point, feature-map.md for feature-to-doc mapping, architecture/_index.md for technical routing, decision-index.md for constraints and past decisions.

Decision records. Short entries that capture why something was built a certain way, so the agent doesn’t re-litigate old choices.

The goal is simple: avoid scanning the entire codebase. Read only what’s necessary for the task at hand. An agent working on the payment module should never see the notification system’s architecture notes.

This sounds obvious when you say it. But most setups don’t do it. They dump everything into one place and hope the agent sorts it out. The agent sorts it out, all right, by reading all of it and billing you for every token.

Seven Agents, Seven Jobs

The next move was splitting the agent’s responsibilities. One agent doing everything (exploring code, reading docs, writing plans, running shell commands, updating changelogs) gets bloated fast. I created specialized subagents instead, each with a narrow job.

The subagents:

repo-explorer finds files, searches code, locates functions. Read-only, cheap model, fast.

doc-router decides what documentation to read based on the task. Avoids pulling in code when docs are enough.

plan-critic stress-tests implementation plans. Finds risks, weak assumptions, and over-engineering. Runs in isolation so it doesn't carry the full session context.

bash-runner runs CLI tasks: lint, build, test, sanity checks. Deterministic work that doesn't need an LLM at all.

vault-bridge manages project memory: kanban updates, changelog entries, knowledge writes.

changelog-keeper handles version bookkeeping with structured output.

scratch-compactor summarizes session state into compact notes at the end of a session. Only writes durable memory when explicitly needed.

Each subagent runs with its own context. The main agent never sees the bash output unless it matters. The plan critic doesn’t carry the exploration history. The doc router doesn’t carry the shell results. Context stays local to where it’s needed.

This is the part that feels like overkill until you measure it. A bash runner executing 5 shell commands generates maybe 200 lines of output. If that output stays in the main agent’s context for the rest of a 50-message session, every subsequent message re-processes it. Multiply that across commands, explorations, and dead-end branches, and your context is 60% noise by the end.

The Dispatch Rules

Having subagents is only useful if you route tasks correctly. Here’s my dispatch logic:

Exploration tasks → cheap subagent (repo-explorer)
Documentation lookups → cheap subagent (doc-router)
CLI and shell work → cheap subagent (bash-runner)
Complex reasoning and code writing → main agent
Plan validation → isolated subagent (plan-critic, no session context)
Memory and state updates → dedicated subagent (vault-bridge)

The principle: if a task is deterministic or narrow, it doesn’t belong in the main agent’s context. The main agent should focus on decisions that require reasoning, architecture choices, code design, trade-off analysis.

Everything else gets delegated.

The Workflow Pipeline

My old workflow was: think, explore, code, test, done. All in one agent. All in one context. All accumulating tokens.

The new workflow is a 9-step pipeline:

Session start (light, read routing indexes only)
repo-explorer (locate relevant files)
doc-router (identify what docs to read)
Plan draft (main agent, scoped context)
plan-critic (isolated review, separate context)
Implementation (main agent, clean context)
bash-runner (tests, lint, build — if needed)
scratch-compactor (summarize session)
Write-back (only if a decision was made or pattern discovered)

The user experience stays simple. I type /plan review or a task description, and the pipeline runs. But under the hood, each step gets a fresh, scoped context. The agent doesn't carry dead exploration branches into implementation. It doesn't re-process shell output during plan review.

Step 9 is the one most people get wrong. They write back after every action. Don’t. Only write to memory when:

A decision was made that affects future work
A new pattern or convention was discovered
A meaningful change happened (architecture shift, new dependency, resolved gotcha)

If nothing changed that future sessions need to know, don’t write anything. Less noise in the memory layer means less noise in the next session’s startup. It compounds.

Use the Right Tool for the Job

A coding agent is a reasoning engine. Using it to run npm run lint is like hiring a consultant to check your email.

I route deterministic tasks to CLI tools and scripts. Linting, building, running tests, checking file existence, querying logs. All of that runs through bash-runner or direct shell commands. The LLM only sees the result if the result needs interpretation (test failure, build error with a non-obvious cause).

This alone saves a surprising amount. A single lint run might output 50 lines. If that goes through the LLM, every subsequent message in the session processes those 50 lines. If it goes through bash-runner, the main agent sees a one-line summary: “3 warnings, 0 errors.”

Same information. Different cost.

The Active Context File

One more thing that helped: a lightweight _active.md file maintained across the session. It contains:

Current goal
Files touched so far
Decisions made this session
Next step

Instead of reloading full project context at every turn, the agent reads this compact file. It’s maybe 10 lines instead of 200. When the agent needs depth on a specific file, it reads the feature doc for that area. When it just needs to remember what it’s doing, _active.md is enough.

What Actually Changed

After implementing this architecture across my projects:

The token count per session dropped roughly 40–50%. Not because I’m using fewer messages, but because each message processes less irrelevant context. The agent reads the routing index, loads only the relevant feature doc, delegates narrow tasks to subagents, and keeps the main context focused on decisions.

Output quality went up. Counterintuitive at first, but it makes sense. When the agent isn’t drowning in noise, it makes better decisions. It doesn’t get confused by conventions from a different module. It doesn’t second-guess itself because it saw conflicting patterns in unrelated code. Scoped context produces scoped thinking, and scoped thinking is more accurate.

The Codex subscription stopped being a problem. I stopped burning through it by mid-week. Same projects, same output quality, just a cleaner architecture doing the heavy lifting.

Bigger Boxes, Same Junk

Every few months, a new model drops with a bigger context window, and everyone celebrates. 200K tokens, 500K, 1M. And every time, I think the same thing: if your context is 60% noise at 200K, it’ll be 60% noise at 1M. A bigger box full of junk is still a box full of junk.

The real optimization doesn’t come from the model. It comes from how you structure the information the model sees. Route your context, don’t dump it. Specialize your agents, don’t generalize them.

The agents are getting smarter. That’s great. But a smart agent reading the wrong context will still produce the wrong answer. It’ll just do it faster and at a higher price point.

Your plan limit isn’t the bottleneck. Look at your architecture.

I’m Hugo, a freelance Data Scientist specializing in AI for Construction. I build intelligent systems for a living, and apparently also for my note-taking. Find me on LinkedIn or on my Linktree.

How I Built a Second Brain for Claude Code

Hugo Sequier — Thu, 02 Apr 2026 13:01:02 GMT

After using Obsidian as my own external brain, I realized the real bottleneck was the agent doing the work

Last time I explained how I use Obsidian and Claude Code as a second brain. A few weeks later I realized I was solving the wrong memory problem. If Claude Code is the one doing a big part of my job right now, why did I build a second brain for myself and not for the agent?

That question changed the architecture of my vault. I stopped treating Obsidian as a personal note system with AI access, and started treating it as an external memory layer for the coding agent itself. That shift fixed the part that every heavy user of Claude Code, Codex, or OpenCode runs into sooner or later: the chat gets fat, the context gets noisy, and the agent keeps relearning things it should already know.

The Memory Problem Shows Up Fast

If you use coding agents lightly, this barely matters. Open a repo, ask for a small fix, close the session, move on.

But that is not how I work anymore.

Claude Code touches my production projects every day. It writes code, updates docs, creates notes, logs project progress, and helps me move between very different contexts: a construction-tech pipeline, a trading bot, content systems, and internal automation. After enough sessions, one thing becomes obvious. The problem is not model intelligence. The problem is memory hygiene.

Inside a long session, the agent carries all the usual baggage: old prompts, explored files, dead ends, partial plans, tool output, half-useful explanations. Some of that context is valuable. A lot of it is just residue. And when you start the next session, that residue is gone anyway unless you manually restate everything.

So you get hit twice.

First, long sessions become slower and more expensive because too much irrelevant context sticks around.

Second, finished sessions throw away useful things that should have survived: architectural decisions, naming conventions, lessons from failed attempts, and links between projects that share the same domain knowledge.

That is a bad setup for an agent that is supposed to work like a real collaborator.

My First Setup Solved Only Half the Problem

I had already fixed part of this.

In a previous article I explained how I keep Claude Code fast with a root CLAUDE.md, a .claude/ folder, and feature docs written in markdown. That setup still works. I use it every day. It is simple, cheap, and much better than piling on plugins or MCP layers just because the agent keeps missing context.

Inside one repo, that pattern is enough most of the time.

The issue appeared when the work stopped being repo-local.

I wanted the agent to remember that a decision made in one project might matter in another. I wanted reusable notes about tools, APIs, and architecture tradeoffs to survive outside a single codebase. I wanted failed experiments to become assets instead of disappearing into old conversations. And I wanted the agent to start a session by reading the right memory, not by asking me to explain the same thing for the sixth time.

That was the gap.

My own second brain was organized. The agent’s memory was still trapped inside chats.

The Pivot Was Simple: Write the Memory to the Vault

So I changed the model.

Architecture Diagram of my Obsidian Vault

Obsidian stopped being just my note-taking system. It became the place where the agent reads before working and writes back after working.

At the top, I keep a root CLAUDE.md that explains the vault structure, conventions, and note formats. That file acts as the bootstrap layer. It tells the agent what the vault is, how sections are organized, how files are named, and what kind of data belongs where.

Then I split the memory into two zones.

global/ holds memory that should survive across projects.

knowledge/ for tools, APIs, patterns, and domain notes
decisions/ for cross-project ADR-lite decisions
learnings/ for post-mortems, mistakes, and practical lessons
templates/ for reusable note scaffolds
preferences/ for code style, workflow, and stack choices
contacts/ for people and relationship context when relevant

projects// holds memory that belongs to one active project.

_project.md for the project overview
kanban.md for execution state
versions.md for the compact version table
changelog/ for detailed release entries
architecture/ for ADRs and diagrams
docs/ for specs, research, and guides
context/ for working context that should persist for a while
scratch/ for temporary thinking space

The operating rule is easy to explain.

At session start, the agent reads global memory plus the relevant project memory.

At session end, it writes back what changed.

That one write-back rule is the whole game.

Without it, your vault becomes a nice archive. With it, the vault becomes working memory that compounds.

Why This Works Better Than Chat Memory Alone

The first benefit is obvious. I repeat myself far less.

If my agent already has a place to read my conventions, project state, and old decisions, I do not need to retype them every session. I can start from the task, not from the preamble.

The second benefit matters even more. The agent reads cleaner context.

A markdown note called decisions/api-versioning.md is a much better memory object than a buried paragraph from three days ago inside a chat that also contains shell output, debugging guesses, and a failed refactor. Same information. Different signal-to-noise ratio.

The third benefit is transfer.

This is where the vault starts feeling different from a normal repo setup. Let’s say two projects touch the same technical area. One project taught the agent that a certain API has weird pagination limits. Another project now needs that same API family. If the learning lives only in chat history, it is gone or inaccessible. If it lives in global/knowledge/ or global/learnings/, the next project can reuse it immediately.

And that is the real point. Good memory should travel when the knowledge travels.

The Best Things to Store Are Not the Obvious Ones

At first I thought the vault should mostly store polished documentation.

Wrong.

The most useful notes are often the messy ones that change future decisions.

Examples:

a failed implementation attempt and why it failed
a naming convention that saved me from breaking Obsidian links
a project-level decision to keep YAML declarative instead of rebuilding a programming language by accident
a note that one workflow belongs in scratch/ until it survives real usage
a short learning that one project’s architecture pattern is reusable elsewhere

These are small pieces of memory, but they save a lot of wasted motion.

I noticed this on 31 March when I wrote down the first version of the extended memory architecture for Claude Code. The structure itself was useful, sure. But the important part was what it implied: the agent now had a stable place for global knowledge, decisions, and learnings, instead of treating each session like an isolated event.

That changed how I work with it.

When a session produces something that would help a future session, it belongs in the vault. Not in my head. Not in the scrollback.

This Also Fixes the Cross-Project Blind Spot

Most coding setups assume each project is its own island.

Real work is not like that.

My projects leak into each other all the time. A documentation habit from one project improves another. A research note from a trading system changes how I think about monitoring or versioning somewhere else. A content workflow teaches me how to design reusable templates for project docs. Even my journaling system matters because it creates a timeline of what was built, what changed, and when the pivot happened.

The vault gives the agent a map of those connections.

That matters because agents are great at local execution and bad at implicit continuity. They will do exactly what is in front of them. If the memory layer is fragmented, the work stays fragmented too.

Once I moved memory into Obsidian, related projects stopped feeling disconnected. The agent could pull context from a shared place, then apply it locally where it mattered.

I’m Hugo, a freelance Data Scientist specializing in AI for Construction. I build intelligent systems for a living, and apparently also for my note-taking. Find me on LinkedIn or on my Linktree.

Autoresearch Beyond ML: How I Adapted Karpathys Loop to Trade Prediction Markets

Hugo Sequier — Tue, 17 Mar 2026 12:53:50 GMT

Fix the evaluator. Search the strategy space. That’s the whole idea.

On March 8, Andrej Karpathy open-sourced a ~630-line Python script called autoresearch. Within hours, Shopify CEO Tobi Lütke ran it on a 0.8B parameter model and reported a +19% score improvement after 8 hours and 37 experiments. By the next morning, people on Twitter were sharing results where the loop had independently rediscovered techniques like RMSNorm and tied embeddings, optimizations that took human researchers years to formalize.

I saw those results and thought: this has nothing to do with ML.

The ML part is just the current application. The architecture underneath is something else entirely. I spent the following week proving it by building my own version for trading on crypto prediction markets.

What Autoresearch Actually Does

You give it a training setup and an instruction prompt. An LLM agent modifies the code, trains for 5 minutes on a single GPU, checks if the result improved, keeps the change or discards it, and loops again. About 12 experiments per hour.

But the interesting part is what it does NOT do.

It doesn’t let the agent rewrite the evaluation metric. It doesn’t let it change the dataset. The agent can only touch the model architecture and hyperparameters within a fixed experimental setup.

This constraint is the whole point.

Fixed Surface, Mutable Surface

Strip away the ML specifics and you get two layers:

Fixed surface: data preparation, evaluation function, scoring formula, temporal splits. These anchor comparability. If you change them mid-search, every previous result becomes meaningless.

Mutable surface: the things you’re searching over. In Karpathy’s case, model architecture tweaks. In my case, trading strategy parameters.

Most people who try to automate research let the agent touch everything. Evaluator included. That’s how you get a system that scores 95% on its own benchmark and fails in production. The agent optimized the test, not the solution.

Keeping the evaluator frozen means every candidate gets judged on exactly the same terms.

This Pattern Works Everywhere

Once you see the fixed/mutable split, you notice it applies to problems that have nothing to do with neural networks.

Trading strategy search: fixed replay data, execution simulator, fee model. Mutable: entry thresholds, position sizing, timing windows.

Regulatory compliance: fixed floor plan data, measurement functions, legal text. Mutable: classification rules, compliance thresholds. Generate rule variants, evaluate against known-good annotations, keep the ones that match expert judgments.

Game AI tuning: fixed game engine, opponent pool, match simulator. Mutable: agent behavior parameters. Same loop.

The pattern works anywhere you have a repeatable evaluation function, a bounded search space, a way to generate candidates, and temporal splits to catch overfitting.

My Version: Autoresearch for Polymarket

I trade on Polymarket. Binary options on whether BTC will be above or below a strike price at a specific time. Markets settle every 5 minutes. Payout is $1.00 or $0.00.

I’d been trying to find a profitable strategy for months. Settlement Arbitrage, order book imbalances, directional bets near expiry. The pattern was always the same: a strategy would look robust on paper trading, I’d get confident, switch to real money, and it’d work for a day. Then the next day it’d lose everything back. No backtesting infrastructure, no systematic way to compare variants. Just me staring at paper trading results, convincing myself the edge was real, and getting burned when it wasn’t.

When Karpathy released autoresearch, I saw the fix. Within the week, I built a bronze dataset recording every BTC 5-minute market from Binance, Coinbase, and Polymarket order books. For the first time I had replay data. And with replay data, I could build a proper bounded research loop.

Fixed surface: a data prep module joins exchange and Polymarket order book data into unified snapshots. A backtest module simulates execution with realistic fees. A splits module defines temporal train, validation, and test periods.

Mutable surface: a training module contains multiple strategy implementations with ~20 registered baseline variants. A families module defines several bounded search spaces, each with a written thesis explaining what it’s trying to do and what parameters it’s allowed to mutate.

Three explicit roles in the loop. A Generator creates deterministic mutations from parent strategies, seeded and deduplicated. A Reviewer rejects candidates that violate family constraints before any expensive evaluation. A Selector runs survivors through train, then holdout, then compares against parents.

One command:

py -3 -m autoresearch research-cycle \
  --root data/lake/bronze --asset btc_5m \
  --family [family_name] --population 24 --survivors 6

24 candidates generated, reviewed, evaluated. 6 survive. Runs in minutes, not hours, because prediction market backtests are cheaper than GPU training.

Diagram generated with Gemini 3.1 Canvas

What the Loop Taught Me

The first research cycle produced something I didn’t expect. The loop didn’t just find better parameters. It invalidated assumptions I’d been carrying for months.

I’d spent weeks optimizing exit logic: stop losses, take-profit levels, time-based exits. The loop showed that position sizing explained most of the P&L variance, while exit timing contributed surprisingly little. I was polishing the wrong knob.

The best candidate from one family got there by loosening a filter I thought was essential. I would never have tried that manually. It felt wrong. The data said otherwise.

Another winner had a win rate below 50%. Worse than a coin flip. But it was profitable because the payoff asymmetry compensated. A human researcher with a spreadsheet would have filtered it out immediately on win rate alone. That’s the kind of finding a bounded loop surfaces.

Neither of these candidates beat my existing benchmark. The loop confirmed the benchmark was strong, which is also useful information. I now have a promotion pipeline (stress testing under adverse scenarios, portfolio correlation analysis) to evaluate whether new candidates add value as complements rather than replacements.

Why No AI Agent in the Loop

The obvious question: Karpathy’s version uses an LLM to generate code mutations. Why didn’t I?

Because I didn’t want an unconstrained AI agent optimizing a noisy backtest loop.

The btc_5m problem is narrow. The gap between offline backtest and live execution is already large. An LLM inside the inner loop would very likely learn to exploit backtest quirks instead of finding robust edge.

“Candidate 004 changed one filter threshold from X to Y” is something I can read, understand, and verify. “The model rewrote half the strategy” is not.

If the evaluator, the data prep, and the split logic change between runs, every previous result becomes meaningless. Bounded families with deterministic seeds are easy to reproduce, diff, and publish. An LLM-driven loop adds token cost, prompt tuning, latency, and more failure modes before I even trust the offline selection itself.

At this stage, the bottleneck is not creativity. I have plenty of strategy ideas. What I don’t have is confidence that any of them survive live conditions.

Once I’ve tested the strategies I have in mind and built enough trust in the offline evaluator, I will integrate an AI agent into the loop. But on my terms: AI suggests new family ideas, humans turn them into explicit bounded mutation spaces, and the fixed evaluator decides whether they survive. AI stays in the hypothesis layer, not in the trust layer.

You earn the right to automate a step by doing it manually first.

Nothing Magic Here

There is nothing magic about this system. Backtest environments and live environments are always very different. Fees are approximated. Execution timing is idealized. Order book depth in replay isn’t the same depth you’ll face when your real order hits the market.

Autoresearch is very good at removing assumptions, systematically trying things you wouldn’t try manually, and killing strategies that only look good because you never tested them on holdout data. It will not find you the strategy that makes you a millionaire in one loop.

And while I’m at it: avoid the Twitter larp articles. “How this strategy makes $589/day.” “How this quant turned Autoresearch into a Polymarket bot that turned $30 into $60,000.” 99% of them are fake. Fake data, storytelling layered over someone else’s Polymarket account screenshot, zero reproducibility.

If you actually want to build a Polymarket bot, start by creating a repo on GitHub and try it yourself. Record your own data. Write your own evaluator. Watch your strategies fail on holdout. Then you’ll understand why all those articles are fiction, and why having an edge is way more complicated than asking Claude to make you rich.

I’m Hugo, a freelance Data Scientist specializing in AI for Construction. I build intelligent systems for a living, and apparently also for my note-taking. Find me on LinkedIn or on my Linktree.

The Impostor Loop: Why Vibecoding Made Me Feel Like a Fraud

Hugo Sequier — Thu, 12 Mar 2026 13:01:03 GMT

I ship faster than ever. I’ve never felt less competent.

Image by Joel Lee from Unsplash

I’ve been writing code for six years. Data science for three. I started the way most people my age did: writing algorithms on paper before touching a keyboard. Understanding loops, recursion, data structures by drawing them out. Then building things the hard way, line by line, bug by bug.

I’ve lived through every wave since. ChatGPT drops, and suddenly everyone is copy-pasting code from a chatbox. Then Cursor makes it seamless, the AI lives inside your editor. Now Claude Code, Codex, agents that write entire features autonomously. Each step removed a layer of friction. Each step removed a layer of you from the process.

Last Tuesday, I caught myself playing chess. Not during a break. Between two Claude prompts, while waiting for my AI coding assistant to generate a rules engine for an architectural compliance system I’ve been building for months.

I’d been doing this all day. Prompt. Wait. Check the output. Adjust. Prompt again. Wait. Open chess. Close chess. Check output.

I shipped more code that day than I used to ship in a week. And I felt like I’d done absolutely nothing.

The prompt-and-wait loop

Here’s what my typical workday looks like now as a freelance data scientist.

I wake up. I open VSCode. I write a prompt describing what I need: a new detection pipeline, a data processing module, an API endpoint. The AI generates it. I review, tweak the prompt, maybe run it again. The feature works. I move to the next one.

Between each prompt, there’s a gap. Sometimes two minutes, sometimes ten. My brain has nothing to do during those gaps. So it reaches for the nearest source of stimulation. Chess. Twitter. YouTube. Random scrolling.

I tried counting once. In a single afternoon, I context-switched to my phone 23 times. Every switch was between two prompts. Every switch was because my brain was bored.

The old workflow was different. Problem shows up. I think about it. I try something. It breaks. I debug. I try again. I read documentation. I stare at the ceiling for five minutes. Something clicks. I write the solution. That loop took hours. But my brain was engaged the entire time.

Now the loop is:

Describe the problem, press enter, wait.

Why your brain calls you a fraud

The classic impostor syndrome goes like this: I got lucky, I don’t deserve this, someone will find out. The vibecoding version is different. The results are real. The code works. The features ship. But you didn’t do the work. Not the way your brain defines “work.”

Your brain has a simple equation: effort = legitimacy. You struggled with a bug for four hours and fixed it? You earned that fix. You learned something. You can explain it. You could do it again.

But when you prompt an AI and it generates the solution in 30 seconds? The feature works identically. The end result is the same. But your brain didn’t earn it. There was no struggle, no breakthrough moment, no satisfaction of figuring it out yourself.

So it starts whispering. You don’t really understand this codebase. You couldn’t rebuild this without the AI. You’re not a real engineer anymore.

And the worst part? It might be partially right. I realized I couldn’t explain some parts of my own codebase because I never had to think through them. The AI wrote them, I reviewed them, they worked, I moved on. Reviewing is not understanding.

The patience trap

The prompt-and-wait loop does something else to your brain. It recalibrates your sense of time.

When you get results in 30 seconds, everything slower feels broken. I started noticing this pattern in my own behavior: I’d test a new trading strategy for two days, see mixed results, and switch to a different approach. I’d start a side project, spend four days on it, and drop it because it wasn’t showing results fast enough. I’d try a new tool and abandon it the same afternoon. Even with Chess, I used to play Rapid games, now I play bullet because I feel like 10 minutes its too long.

The pattern is always the same. Try something. No immediate result. Next.

Before AI coding tools, building a feature took a week. You expected it to take a week. Your patience was calibrated to that timeline. Now, building a feature takes an afternoon. And your patience has shrunk to match. But not everything in life responds to prompts. Finding product-market fit doesn’t. Building an audience doesn’t. Developing real expertise doesn’t.

I looked at my journal and counted: in 11 days, I had started or touched 8 different projects. I’d made meaningful progress on exactly two.

The productivity FOMO nobody warned you about

There’s another layer to this that I didn’t expect.

When you know you can build anything in an afternoon, every idea becomes a temptation. You scroll Twitter and see someone launch a SaaS. You could build that. Someone shares a trading bot framework. You could fork that and improve it. A new API drops. You could integrate it into something cool by tonight.

The possibilities are genuinely infinite now. And that’s the problem.

Before AI coding tools, starting a project had a cost. Weeks of setup, learning the stack, writing boilerplate. That friction was a natural filter. Only ideas you cared about enough survived the boring phase.

Now there’s no boring phase. You go from idea to working prototype in hours. So you start everything. A plugin here, an agent framework there, a newsletter, a dashboard, a bot. Each one is 20% done and feels exciting for exactly one afternoon.

I know I can build a Shopify plugin. I know I can build a newsletter system. I know I can build an AI agent orchestrator. I’ve proven it to myself by starting all of them. But proving you can build something and actually shipping it are two different things. The first gives you a dopamine hit. The second requires months of boring, repetitive work that no prompt can shortcut.

The FOMO isn’t about missing out on opportunities. It’s about missing out on your own potential. You’re hyper-aware of everything you could be doing, so everything you are doing feels insufficient. You’re scrolling through your own capabilities like a feed, consuming the idea of productivity without actually producing anything.

I had five projects at 20% completion. Total revenue from all five combined: zero.

And social media makes it worse. Way worse.

Open Twitter on any given morning and your feed is a highlight reel of people shipping. Someone launched an AI wrapper and made $10K in a week. Someone else built a full SaaS in a weekend with Cursor. A 19-year-old just raised a seed round for something they vibecoded in three days. Everyone is shipping. Everyone is making money. Everyone except you.

Twitter has become the new Instagram. Except instead of comparing abs and vacations, we’re comparing repos and MRR screenshots. The currency changed, but the mechanism is identical: you’re comparing your behind-the-scenes to everyone else’s highlight reel.

Here’s what the feed doesn’t show you. The 500 people who built the same wrapper and made nothing. The SaaS that got 12 users and was abandoned two weeks later. The seed round that led nowhere. You only see the survivors, never the graveyard. That’s textbook survivorship bias, and the AI tools space is drowning in it.

Every day there’s a new framework, a new model, a new way to vibecode faster. Each one comes with a thread showing what someone built with it. Each thread makes you feel like you’re falling behind. But most of the people posting those threads are in the same loop you are: starting things, showing the exciting part, then quietly moving on to the next shiny tool before the boring part starts.

Nobody tweets “Day 47 of maintaining my SaaS, fixed a timezone bug in the billing module.” That doesn’t get likes. What gets likes is “I just built X in 2 hours with Y.” The incentive structure rewards starting, not finishing. And when that’s all you see, you start believing that starting is the game.

It’s not. Shipping is. But shipping is slow, messy, and invisible on social media.

We’ve always delegated. This time it’s different.

Image by Museums Victoria from Unsplash

Humans have been outsourcing effort since the first plow. The industrial revolution moved muscle work to machines. Assembly lines turned craftsmen into operators. Spreadsheets replaced rooms full of accountants. Every generation finds something to delegate, and every generation panics about it for a while, then adapts.

But there’s a pattern worth noticing. We delegated physical labor first. Then repetitive calculation. Then information retrieval. Each wave moved up the stack, closer to what we considered “thinking.”

Now we’re delegating the thinking itself. Or at least, a version of it. The version that involves translating an idea into working code, line by line.

The factory worker in 1920 didn’t feel like a fraud because a machine stamped metal faster than his hands could. The work had moved, but his role was clear: operate the machine, maintain it, decide what it produces. The machine was a tool.

Vibecoding feels different because the boundary is blurry. When the AI writes a function, did you write it or did the AI? When you design the system but the AI implements every piece, are you the engineer or the client? The factory worker knew he wasn’t the machine. We’re not so sure anymore.

That ambiguity is new. And nobody has a framework for it yet because we’re the first generation living through cognitive delegation at this scale.

Nothing is lost

I don’t have this figured out. I literally just started changing things this week. But here’s the hypothesis I’m testing.

Mornings without AI. Not because AI is bad, but because my brain needs to work on hard problems without assistance. I pick one task that requires actual thinking: designing an architecture, analyzing data by hand, writing (like this article). The AI can’t do the thinking for me on these. I have to sit with the discomfort of not knowing the answer immediately.

Logging decisions, not just outputs. My daily journal used to say things like “built the rules engine.” Now I’m trying to capture what I actually contributed: “chose to abandon the RETE algorithm because it was overkill for our use case and designed a simplified approach instead.” That’s the part the AI didn’t do. That’s the part that matters.

Setting minimum evaluation periods. Before starting anything new, I write down: “I will judge this after X days/trades/attempts, not before.” If the threshold isn’t met, I’m not allowed to conclude it doesn’t work. This one is hard. My brain wants the verdict now.

Filling the gaps. The dead time between prompts is where the damage happens. I’m experimenting with batching: instead of prompt-wait-scroll-prompt, I prepare the next task while the current one runs. Brief the next problem, sketch the approach on paper, review old code. Keep the brain engaged.

The identity shift nobody talks about

Here’s what I think is actually going on, underneath all of this.

I used to be a coder. My identity was built on the ability to solve hard technical problems through sustained effort. That’s how I proved my value to myself and to clients.

Now I’m something else. A decision-maker. An architect. Someone who knows which model to use, when to abandon an approach, how to design a system that works. The AI handles execution. I handle direction.

That’s a more valuable skill set. Clients pay more for someone who knows what to build than for someone who can build fast. But my brain hasn’t caught up. It still measures my worth by hours of struggle, not by quality of decisions.

Maybe the impostor feeling isn’t a bug. Maybe it’s the growing pain of becoming a different kind of engineer. One who thinks more and types less.

I don’t know yet. Ask me in six months.

I’m Hugo, a freelance Data Scientist specializing in AI for Construction. I build intelligent systems for a living, and apparently also for my note-taking. Find me on LinkedIn or on my Linktree.

How I Keep Claude Code Fast With Zero Plugins

Hugo Sequier — Tue, 03 Mar 2026 13:01:02 GMT

No MCP servers, no extensions, no magic. Just markdown files and a 4-step workflow.

Image generated with Nano Banana 2

Last week someone in a Meetup asked me to share my Claude Code setup. I sent my github repository. The response was “wait, where are your MCP servers?” Nowhere. I don’t use any.

No MCP servers. No IDE plugins beyond the stock VS Code extension. No custom integrations. My entire Claude Code configuration is markdown files in a .claude/ folder and a CLAUDE.md at the root. That's it. And it handles production projects — a computer vision pipeline for construction companies, a crypto trading bot, a full-stack app, without breaking a sweat.

I’m not against MCP or plugins. I’ve looked at them. I chose not to use them. Here’s why, and what I do instead.

The Problem Nobody Talks About

Claude Code reads your entire conversation to build context. Every message, every file it opened, every command it ran. As your session grows, two things happen: responses get slower, and your bill gets fatter.

Most people try to fix this by adding more tools. MCP servers that fetch docs automatically. Plugins that index your codebase. Custom integrations that pipe data in from everywhere. The intent is good: give Claude more context so it works better.

But more context is the problem, not the solution. Every extra tool dumps more tokens into the conversation. That MCP server fetching your database schema? That’s tokens. The plugin pulling in related files? More tokens. You end up with a context window stuffed full of information Claude didn’t ask for, paying for it in both latency and cost.

And the industry knows it. People started using MCP servers, realized they were eating context windows alive, so now we’re seeing new plugins and architectures designed to make MCP consume less. New CLIs are popping up every week, each promising a smarter way to manage context. But look at what’s actually happening: we’re stacking layers on top of layers on top of layers. A tool to manage the tool that manages the tool that talks to your AI. At some point you have to ask: what if the first layer was the wrong move?

The fix isn’t giving Claude more stuff. It’s giving Claude the right stuff, fast

One Markdown File Per Feature

Here’s the core idea. For each feature in my project, I maintain a single markdown file at .claude/{feature}/CLAUDE.md. It contains everything Claude needs to work on that feature: where the code lives, how it's structured, what patterns to follow, and what traps to avoid.

A real one from my project looks like this:

# Authentication - Overview

## Quick Reference
- Key files: src/auth/, src/middleware/auth.ts
- Dependencies: jsonwebtoken, bcrypt
- Patterns: Repository pattern, JWT with refresh tokens

## Architecture
Auth uses a layered approach: routes -> controller -> service -> repository.
Tokens are stored in HTTP-only cookies.

## Conventions
- All auth errors return 401/403 with a standard error shape
- Password hashing uses bcrypt with 12 rounds

## Gotchas
- The refresh token rotation invalidates all previous tokens
- Rate limiting is per-IP, not per-user

That’s maybe 20 lines. Claude reads it in under a second. Compare that to what happens without feature docs: Claude opens src/auth/, reads 8 files, scans imports, tries to infer the patterns, maybe opens a few test files to understand the expected behavior. That's hundreds of lines of source code burned into your context window, and Claude still might miss the gotcha about token rotation.

With the doc, Claude knows the architecture, the conventions, and the pitfalls before it touches a single source file. It writes code that fits on the first try instead of the third.

I maintain about 10 of these across my projects. Creating one takes 5 minutes. Updating it after a big change takes 1. The return on that investment is absurd.

The 4-Step Workflow That Prevents Rework

Good docs are half the equation. The other half is making sure Claude actually reads them before doing anything.

I use a workflow I call EPCT: Explore, Plan, Code, Test. Every task goes through these four steps, enforced by a slash command.

/epct add user authentication

Explore. Claude reads the relevant .claude/*/CLAUDE.md files first. Then it scans the codebase to understand what already exists. It identifies patterns, lists files that will be affected, flags potential conflicts. No code written yet.

Plan. Claude produces a concrete plan: files to create or modify, step-by-step implementation, edge cases to handle. Then it stops and waits for my approval. This is the step that saves the most time. I’ve caught misunderstandings here that would have meant 20 minutes of wasted code and another 20 minutes of debugging. A 30-second review of the plan costs nothing. A wrong implementation costs everything.

Code. After I approve, Claude writes the code following the plan and the patterns from the feature docs. Because it already knows the conventions, it doesn’t invent new patterns or contradict existing ones.

Test. Claude runs the test suite, adds tests for the new code, and verifies nothing broke.

The whole thing is a single slash command. I also have individual commands when I only need one step:

These are just markdown files in .claude/commands/. Each one is a prompt template that tells Claude exactly how to approach the task. No plugin system, no API, no framework. Markdown files that Claude reads as instructions.

What This Looks Like in Practice

My project structure:

my-project/
├── CLAUDE.md                        # Root instructions (always loaded)
├── .claude/
│   ├── settings.json                # Permissions
│   ├── commands/                    # Slash commands (markdown files)
│   │   ├── epct.md
│   │   ├── plan.md
│   │   ├── fix.md
│   │   ├── review.md
│   │   ├── refactor.md
│   │   ├── test.md
│   │   ├── new-doc.md
│   │   └── update-docs.md
│   ├── auth/CLAUDE.md               # Feature docs
│   ├── payments/CLAUDE.md
│   └── notifications/CLAUDE.md
└── src/
    └── ...

The CLAUDE.md at the root is always loaded by Claude Code when you start a session. Mine contains the project structure, coding conventions, and pointers to the feature docs. Think of it as the table of contents.

The feature docs in .claude/{feature}/CLAUDE.md get loaded when Claude runs the Explore step of EPCT. Claude knows which docs to read because the root CLAUDE.md lists them.

The slash commands in .claude/commands/ are prompt templates. When I type /epct add payment webhooks, Claude Code loads epct.md, injects my task description, and follows the instructions. The instructions say "read the relevant feature docs first, then plan, then wait for approval." Claude does exactly that.

No runtime dependencies. No servers to start. No config files to debug. If you can write a markdown file, you can build this setup.

What I Deliberately Don’t Use

MCP servers. They’re powerful. They can connect Claude to databases, APIs, documentation sites, all kinds of external data. But every MCP call adds tokens to the context, and most of the time I don’t need live data from external sources while coding. When I do, I just tell Claude to run a curl command or check a URL. That’s one tool call, not a persistent connection draining tokens.

IDE plugins beyond the basics. I use the standard VS Code extension for Claude Code. That’s it. No code indexing plugins, no AI-powered autocomplete layers on top of Claude, no custom sidebar panels. Each of these is another thing that can break, another thing that needs updating, another integration to debug when something goes sideways.

Agent orchestration frameworks. I wrote about using multi-agent setups in a previous article. They have their place. But for daily feature work, a single Claude Code session with good docs outperforms a fleet of poorly-documented agents. Every time.

The pattern I keep seeing: people add complexity to compensate for missing documentation. Your AI agent keeps misunderstanding the codebase? Add an MCP server that indexes it! It keeps forgetting conventions? Add a plugin that enforces them! Or… write a 20-line markdown file that explains both. One costs ongoing maintenance and token overhead. The other costs 5 minutes once.

Managing the Docs

The docs are only useful if they stay current. I have two slash commands for that:

/new-doc payments       # Generate docs for a new feature
/update-docs payments   # Regenerate after significant changes

Both are just markdown prompt templates. /new-doc tells Claude to scan the feature's source code and produce a CLAUDE.md following a standard template. /update-docs tells Claude to read the existing doc, compare it to the current code, and update what changed.

I run /update-docs after any session where I made significant architectural changes. Takes about 30 seconds. If I forget, the Explore step of EPCT usually catches the drift anyway, because Claude reads both the doc and the actual code.

The Config Is Public

I’ve open-sourced my entire Claude Code configuration. The CLAUDE.md, the slash commands, the feature doc template, the settings.json with permission rules. You can grab it, drop it into your project, and start using it today.

https://github.com/Hugo-SEQUIER/claude-config

The settings.json includes sensible defaults: read-only operations and test runners are allowed, .env files and destructive commands are blocked. Adjust to your needs.

What I’d Tell You If You’re Starting Out

Skip the plugin research. Skip the MCP server setup. Skip the “ultimate Claude Code configuration” YouTube videos.

Open your project. Create a CLAUDE.md at the root. Write down your project structure, your conventions, and the things that always trip you up. Then create a .claude/ folder and add a CLAUDE.md for your two or three most complex features.

Start using /epct for your next task. Read the plan Claude produces. Approve it or correct it. Watch how much less back-and-forth you need when Claude already knows your codebase.

The best tool for Claude Code isn’t another tool. It’s a well-written doc.

I’m Hugo, a freelance Data Scientist specializing in AI for Construction. I build intelligent systems for a living, and apparently also for my note-taking. If you want to chat about Obsidian, AI workflows, or Second Brain architectures, find me on LinkedIn or on my Linktree.

How I Turned OpenClaw into a MVP Agency

Hugo Sequier — Tue, 24 Feb 2026 12:56:00 GMT

I set up 5 AI coding agents with OpenClaw on a cheap VPS, wrote a Python script to orchestrate them, and now I ship working apps while reviewing plans on my phone. Here’s how the whole thing works.

Last Sunday I gave a presentation at a tech meetup in Da Nang. I walked through the architecture, explained how the agents are organized, showed the orchestration flow. 10 slides, 15 minutes. But a presentation only scratches the surface. You can’t really explain dependency resolution, escalation chains, and review loops in a slide deck without putting people to sleep.

So I wanted to write the full breakdown. The details that didn’t fit in the slides, the things that actually went wrong, and the real numbers.

The Indie Hacker Problem

I’m a freelance AI engineer. My day job is computer vision for construction companies. But I keep having side project ideas, and I never have time to build them properly.

Hiring devs for an MVP that might go nowhere? Too expensive. No-code tools? I hit walls the moment logic gets slightly custom. Vibe-coding with Claude Code? Actually works pretty well. I’ve shipped real features that way. But it still requires me to sit there, prompt after prompt, review after review.. Multiply that by an entire app and you’re back to the same problem: not enough hours.

So I tried a different approach: split the work across multiple specialized agents, each with a clear role, and write a script to make them work together.

What OpenClaw Is

OpenClaw is an open-source, self-hosted framework for running AI coding agents. You install it on your machine (I beg you, don’t run it on your local machine) or a VPS, configure agents with different roles, and they can read code, write code, run shell commands, and push to GitHub. Each agent gets its own workspace and its own personality file (called a SOUL.md) that defines what it does.

The framework handles the plumbing: agent sessions, a gateway API, Telegram and dashboard integrations, tool permissions. What makes my setup useful is the layer I built on top.

From “Meh” to “Oh Shit”

Most people who try AI agents stop at the obvious stuff. A personal chatbot on Telegram. An email assistant. Maybe a single coding agent for Q&A.

I actually skipped OpenClaw the first time I saw it. Just another agent framework, I thought. The homepage showed the basic use cases: a personal chatbot on Telegram, an email assistant, a single coding agent for Q&A. Stuff I could already do with Claude Code or a quick script. I closed the tab.

A few weeks later someone on X showed a multi-agent setup where agents were talking to each other. Agent A finishes a task, agent B reviews it, and if there’s a problem, agent A gets sent back to fix it automatically. No human in the loop unless something truly needs a decision. I looked at the framework again. OpenClaw supported all of that out of the box: multiple agents, shared workspaces, inter-agent communication, Telegram integration.

That’s when it went from “neat toy” to “wait, this actually ships code.”

My 5 Agents

Five agents. Not six. The orchestrator that coordinates them is a Python script, not an agent. That distinction matters because the orchestrator doesn’t think, doesn’t hallucinate, doesn’t get creative. It just follows rules.

All five agents run on cheap models through OpenRouter: Kimi 2.5 for the heavy work (planning, coding, review) and Minimax as a fallback. Token costs for an entire MVP land around $15. The PM agent is connected to Telegram so I can interact with it from my phone. The others are CLI-only, triggered by the orchestrator.

The Setup: 10 Minutes, $6/Month

The entire thing runs on a single Contabo VPS:

6 vCPU, 12 GB RAM, 100 GB NVMe
Ubuntu 24
Cost: ~$6/month

https://contabo.com/en/openclaw-hosting/

OpenClaw is installed globally via npm, the gateway runs as a systemd service, and each agent has its own workspace directory under “~/.openclaw/”. SSH access with ED25519 keys, UFW firewall open on two ports.

Setting up a new VPS from scratch takes about 10 minutes. Install OpenClaw, run the onboarding wizard, configure your agents, done. I access the dashboard through an SSH tunnel when I need it, but 90% of the time I just talk to the PM on Telegram.

The Orchestrator: A Script, Not an Agent

This is the part that makes everything work. Running 5 agents is easy. Making them collaborate without stepping on each other is the actual problem.

I wrote a Python script that acts as the coordinator. It doesn’t use an LLM. It’s a deterministic loop:

Poll the KANBAN.md file in the repo every 60 seconds
Find the next task that’s ready (respecting dependency order: backend before frontend)
Spin up the right agent with the task ID
When the agent finishes, trigger Reviewer for code review
If Reviewer approves, merge. If Reviewer requests changes, send back to the dev agent (max 3 cycles before escalating to me)
When all priority tasks are done, trigger Content
Ping me on Telegram: “MVP done”

The orchestrator also auto-syncs the KANBAN when it detects a git merge that wasn’t reflected yet. And it sends Telegram notifications directly via the API for status updates.

Why not make the orchestrator an agent too? Because I don’t want the thing deciding which tasks to work on based on vibes. Task ordering, dependency resolution, retry logic: these are deterministic problems. A “while True” loop with some “if” statements handles them better than any LLM.

Public repo

GitHub - Hugo-SEQUIER/openclaw-template

The Template: Everything Baked In

Every new project starts from the same repo template. That template contains:

An “IDEA.md” file where I describe the app (this is all I write)
“.claude/commands/” with skill definitions for each agent
“conventions/” with coding standards, typing rules, git branching strategy
Pre-defined API contract structure so frontend and backend agents agree on interfaces

The agents don’t start from zero. They start from a well-defined environment with clear rules. You write the IDEA.md. They do the rest.

The Escalation Chain

When an agent is stuck, it doesn’t spin in circles:

Dev agent needs a product decision? Asks the PM agent.
Dev agent needs something external (API key, third-party service)? Sends me a Telegram message and stops working until I respond.
Reviewer blocked after 3 review cycles? Escalates to me.
PM unsure about scope? Asks me on Telegram.

The hierarchy: Dev -> PM -> Hugo. No agent tries to solve a problem outside its role.

Security

Each GitHub repo gets its own SSH deploy key. The agents can only access repos where I’ve explicitly added the key. No global GitHub token floating around.

RLS (Row Level Security) on every Supabase table is mandatory and verified by the Reviewer agent during code review. This is non-negotiable in the conventions.

The Workflow: From Idea to App

Here’s what a new project looks like end-to-end:

Step 1 (me, ~10 min): Create a repo from the template. Write an IDEA.md describing the app. Push it. Send the PM a Telegram message: “New project, read the IDEA.”

Step 2 (PM, automatic): The PM reads the IDEA.md and generates an architecture document, a KANBAN with prioritized tasks, detailed specs for each task, and API contracts.

Step 3 (me, ~5 min): The PM pings me on Telegram with the plan. I review it. I say “Go.”

Step 4 (orchestrator, automatic): Backend tasks first, frontend after. Reviewer checks every PR. Content agent writes store listings when the MVP is stable. I get pinged only when something needs my attention.*

15 minutes of my time. Then I wait.

First Real Test: HerCompass

My first real project with this setup was HerCompass, a menopause symptom tracking app. I wrote the IDEA.md in about 15 minutes, described the core features (symptom logging, pattern tracking, daily check-in), pushed it, and told the PM to go.

What worked:

The PM’s architecture doc was clean. Good separation between Supabase tables, sensible RLS policies, typed contracts between front and back.
Backend tasks completed without issues. Migrations, edge functions, auth flow.
The Reviewer caught real problems: missing RLS on one table, a type mismatch in an API contract.
The STOP step let me redirect the frontend agent twice when the UI approach didn’t match what I wanted.

What didn’t:

Race conditions between agents. The dev would start a new task while the Reviewer was still asking him to fix the previous one. Two branches diverging, merge conflicts, chaos.
The PM sometimes over-specced simple features. A settings page doesn’t need a 200-line spec.
Context window limits meant agents occasionally forgot earlier decisions in long sessions. Once I had to SSH in and push a Supabase function myself because the agent just… forgot to deploy it.

But here’s the thing: each of these problems only happened once. The dev/Reviewer race condition? I added a lock in the orchestrator so a dev agent can’t pick up a new task until its previous PR is merged or closed. The forgotten deploy? Added a post-task checklist to the SOUL.md. Every bug in the pipeline becomes a two-line fix in the orchestrator or a new rule in the conventions. The system gets better with each project.

The MVP skeleton is working: auth, database, core screens, navigation. The agents wrote about 95% of the code.

What I Actually Learned

Keep the orchestrator dumb. A Python script with a while loop and if statements. No LLM deciding which task to pick next. Deterministic task dispatch means predictable behavior, and predictable behavior means I can debug problems in 5 minutes instead of wondering why the AI decided to skip a task.

The STOP step pays for itself every single time. Five minutes of plan review saves hours of wrong implementation.

$20 per app, for real. VPS: $6/month. LLM tokens for one MVP: ~$15. GitHub, Supabase, Vercel: free tiers. Total: roughly $20. Even if you count the VPS across multiple projects it gets cheaper.

Telegram as the interface was the right call. I review plans, approve tasks, and get status updates from my phone while walking around Da Nang. No need to SSH into the VPS unless something breaks.

What’s next? Shipping more MVPs, for clients and for myself. Every project exposes new edge cases in the orchestrator, new gaps in the conventions. But the foundation works. And it costs about $20 per app.

If you want to try this yourself, OpenClaw is open source. The hardest part isn’t the tooling. It’s writing good SOUL.md files, setting up conventions that agents actually follow, and accepting that you’ll spend the first two projects debugging your own rules rather than shipping features.

That part nobody tells you. But once it’s dialed in, it works.

I’m Hugo, a freelance Data Scientist specializing in AI for Construction. I build intelligent systems for a living, and apparently also for my note-taking. If you want to chat about Obsidian, AI workflows, or Second Brain architectures, find me on LinkedIn or on my Linktree.

How to Use Kimi K2.5 Free on OpenClaw

Hugo Sequier — Fri, 20 Feb 2026 12:12:40 GMT

NVIDIA NIM gives you Kimi K2.5 for free. Here’s how to set it up, and what they don’t tell you about latency

Kimi-k2.5 model card on https://build.nvidia.com/moonshotai/kimi-k2.5/modelcard

Moonshot AI’s Kimi K2.5 is an impressive open-source model, 1 trillion parameters (32B activated via MoE), 200K context window, native multimodal support, and strong coding performance that rivals paid alternatives. NVIDIA offers free access to it through their NIM platform.

Here’s how to set it up on OpenClaw in under 5 minutes. Zero cost.

What You Need

OpenClaw installed and running (if you’re using a Contago VPS, it’s already pre-configured, literally nothing to set up)
A free NVIDIA account

That’s it.

Step 1: Get Your NVIDIA API Key

Go to NVIDIA API Keys
Create an account or sign in
Generate an API key, it starts with “nvapi-…”
Copy it somewhere safe

Generate API Key interface

Step 2: Set the Environment Variable

The API key needs to be available as an environment variable for the OpenClaw service.

For your shell (manual usage)

echo 'export NVIDIA_API_KEY="nvapi-YOUR-KEY-HERE"' >> ~/.bashrc
source ~/.bashrc

For the systemd service (important!)

The systemd service does not read “~/.bashrc”. You need to set the variable separately:

# Immediate (current session)
systemctl --user set-environment NVIDIA_API_KEY="nvapi-YOUR-KEY-HERE"

# Persistent (survives reboots)
mkdir -p ~/.config/environment.d
echo 'NVIDIA_API_KEY=nvapi-YOUR-KEY-HERE' >> ~/.config/environment.d/openclaw.conf

Step 3: Configure openclaw.json

Open your config file:

nano ~/.openclaw/openclaw.json

Add the NVIDIA NIM provider

In the ”models” section, add:

"models": {
  "mode": "merge",
  "providers": {
    "nvidia-nim": {
      "baseUrl": "https://integrate.api.nvidia.com/v1",
      "apiKey": "${NVIDIA_API_KEY}",
      "api": "openai-completions",
      "models": [
        {
          "id": "moonshotai/kimi-k2.5",
          "name": "Kimi K2.5 (NVIDIA NIM)",
          "reasoning": false,
          "input": ["text"],
          "cost": {
            "input": 0,
            "output": 0,
            "cacheRead": 0,
            "cacheWrite": 0
          },
          "contextWindow": 200000,
          "maxTokens": 8192
        }
      ]
    }
  }
}

Key things to note:

”mode”: “merge” is mandatory for custom providers to work
”apiKey”: “${NVIDIA_API_KEY}” references the env variable , don’t hardcode your key
The model ID is moonshotai/kimi-k2.5 (no nvidia/ prefix)

Set it as default with an alias

In the ”agents” section:

"agents": {
  "defaults": {
    "model": {
      "primary": "kimi",
      "fallbacks": [
        "openrouter/minimax/minimax-m2.5",
        "openrouter/moonshotai/kimi-k2.5"
      ]
    },
    "models": {
      "nvidia-nim/moonshotai/kimi-k2.5": {
        "alias": "kimi"
      }
    }
  }
}

The alias ”kimi” lets you write ”primary”: “kimi” instead of the full provider path.

The fallbacks matter: when NVIDIA rate-limits you (40 RPM), OpenClaw automatically switches to OpenRouter. No manual intervention needed.

Step 4: Restart OpenClaw

systemctl --user restart openclaw-gateway

Step 5: Verify

openclaw status

You should see something like:

| agent:main:main | direct | 1m ago | moonshotai/kimi-k2.5 | unknown/200k (?%) |

Quick test:

openclaw agent --agent main --session-id test-kimi -m "Say hello and tell me which model you are"

If an agent shows “minimax/minimax-m2.5” instead, it means the NVIDIA rate limit was hit and the fallback kicked in. That’s expected behavior.

Full Configuration Example

Here’s a minimal working openclaw.json:

{
  "models": {
    "mode": "merge",
    "providers": {
      "nvidia-nim": {
        "baseUrl": "https://integrate.api.nvidia.com/v1",
        "apiKey": "${NVIDIA_API_KEY}",
        "api": "openai-completions",
        "models": [
          {
            "id": "moonshotai/kimi-k2.5",
            "name": "Kimi K2.5 (NVIDIA NIM)",
            "reasoning": false,
            "input": ["text"],
            "cost": {
              "input": 0,
              "output": 0,
              "cacheRead": 0,
              "cacheWrite": 0
            },
            "contextWindow": 200000,
            "maxTokens": 8192
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "kimi",
        "fallbacks": [
          "openrouter/minimax/minimax-m2.5"
        ]
      },
      "models": {
        "nvidia-nim/moonshotai/kimi-k2.5": {
          "alias": "kimi"
        }
      },
      "maxConcurrent": 4,
      "subagents": {
        "maxConcurrent": 8
      }
    }
  },
  "commands": {
    "native": "auto",
    "nativeSkills": "auto"
  },
  "gateway": {
    "mode": "local",
    "auth": {
      "mode": "token",
      "token": "YOUR-GATEWAY-TOKEN"
    }
  }
}

Troubleshooting

Missing env var “NVIDIA_API_KEY”

The systemd service can’t see the variable. Fix:

systemctl --user set-environment NVIDIA_API_KEY="nvapi-YOUR-KEY"
systemctl --user restart openclaw-gateway

Agent is slow (>30s to respond)

That’s the free tier. NVIDIA NIM can be sluggish. If it’s too painful, swap the priority, put OpenRouter as primary and keep NVIDIA as fallback:

"model": {
  "primary": "openrouter/minimax/minimax-m2.5",
  "fallbacks": ["kimi"]
}

Rate limit (40 RPM)

If you’re running multiple agents in parallel, you’ll hit the NVIDIA rate limit fast. The fallbacks handle it automatically. Nothing to do.

An agent uses a different model

If an agent has its own “model.primary” override in the “list” section, it won’t use the default. Remove per-agent model overrides to inherit the global config.

The Honest Take

Setting this up takes about 3 minutes. Kimi K2.5 is genuinely capable, 256K context, strong at code generation, and the price is unbeatable (free).

But here’s the catch: the NVIDIA NIM free tier is slow. We’re talking 30+ seconds per response sometimes. For quick experiments or light usage, it’s great. For anything intensive , multi-agent workflows, iterative coding sessions, the latency kills your productivity.

I personally switched back to OpenRouter after a day. The speed difference is night and day. But I still keep NVIDIA NIM configured as a fallback, free tokens are free tokens.

TL;DR: Set it up, enjoy the free access, but don’t expect production-grade speed on the free tier. Use OpenRouter when you need to move fast.