Notes for January 26 - February 1

I’ve had some feedback that my last few weekly notes (especially ) have been a bit too long and that I should try to keep them shorter and more focused on a single topic.

Well, this week I wrote a lot thanks to the inclement weather and some insomnia, so I broke most of it out into separate posts:

This is the age of TikTok, after all, and attention spans are shorter than ever, so I might as well try to adapt.

But there was a lot more going on this week, so here’s a quick roundup of two other things I worked on:

go-rdp Improvements

My web-based RDP client got UDP transport support—experimental, gated behind a --udp flag, but apparently functional. So far it’s going well, even if audio support is not all there yet (and yes, I know it’s a bit much for a web client).

This is one of my “things that should exist” projects that I’ve been using to experiment with how to spoon-feed agents with highly complex protocol specs, and it’s been working out great so far, largely because RDP is so well documented.

The test suite now includes Microsoft Protocol Test Suite validation tests for RDPEUDP, RDPEMT, RDPEDISP, and RDPRFX. It’s the kind of spec compliance work that’s tedious but essential, and that I am increasingly convinced can be massively sped up with agents (after all, it took me less than a working week of continuous effort to do the whole thing, instead of the months it would have taken me otherwise).

go-ooxml from OO to… Hero?

As another data point, I decided to build my own Office Open XML library in Go this week, called go-ooxml. It too is meant to be a clean-room implementation, and I intend it to be a comprehensive replacement for the various format-specific libraries out there–I’ve already been using Go as an accelerator in things like pysdfCAD, so I know it works.

And it also fits into the “things that I think should exist” category, because existing libraries for this are either abandoned, incomplete, or have APIs that make me want to cry.

And on the front the ECMA-376 spec is perfect:

  • It is a real thing people used for other implementations, and progress has been ridiculous–agents are genuinely useful for this kind of spec-driven implementation work.
  • In about two days (less than four hours of “actual work”), I went from initial project setup to enough to handle basic document creation (mostly Word, which is the most complex)

Is it production-ready? Of course not.

But it’s already more complete than many alternatives, and having the foundation right means I can add features as I need them without fighting the architecture.

Agents make the breadth possible, but they don’t remove the need for taste: deciding what the public API should look like, what’s testable, and what’s going to be maintainable six months from now is, of course, the hard part still to come…

At least I’m aware of it, I guess.

Thoughts on AI-Assisted Software Development in 2026

A few things I jotted down during –i.e., while building out my agentbox and webterm setups and other things.

Agents love specs. My go-ooxml project went from nothing to 60% spec compliance in days because I fed the agents the actual ECMA-376 spec documents and told them to implement against those. No hallucination about what XML elements should be called, no invented APIs—just spec-compliant code.

Mobile is an afterthought until it isn’t. Half the webterm fixes this week were iOS/iPad edge cases. If you’re building tools you’ll use on multiple devices, test on all of them early, because agents can only help you with things that they can test for autonomously.

The unglamorous work matters. I did a lot of CI/CD cleanup jobs, release automation, pipelines, and invested quite a few hours in creating solid SKILL.md scaffolding–none of this is exciting, but it’s what separates a tool you can rely on from a tool that occasionally bites you, and right now, for me, at least, it’s what makes agents genuinely useful.

There’s going to be more software. With such a low barrier to entry into new languages, tools or frameworks, any decent programmer is soon going to realize that their skills are transferable to the point where they can take on work in any technology.

There’s going to be more shitty software because, well, there are a lot of overconfident people out there, for starters, and the law of averages is inevitably going to kick in at some point. I am acutely aware that I am treading a fine line between “productive developer leveraging AI” and “architecture astronaut”, but my focus is always on shipping self-contained, small tools that solve real problems (for me, at least), so I hope I can avoid that pitfall.

The number of truly gifted developers is going to stay roughly the same, because programming, like any form of engineering, is a mindset much more than it is a skill.

Some of these are debatable, of course, but they are my current take on things. Let’s see how they hold up over time.

Vibing with the Agent Client Protocol

Although most of my actual work , I have been wanting an easy way to talk to the newfangled crop of agents from my iPhone.

So I spent a good chunk of this week building out a Slack-like web interface for chatting with agents via the Agent Client Protocol (ACP) called vibes.

Right now, it looks like this:

Just a web view, right? Well, not quite.

The UX

I blame for planting the seed of a chat-like interface for agents in my mind–this is not replacing my terminal-based workflow, but it’s a nice complement, especially for quick check-ins or when I want to give an agent a task and review the results later from my phone.

The web layer is “just” Preact and SSE, with enough CSS for it to work nicely on small screens and with touch input, and the main timeline view shows messages from me and the agent, with support for rich content like code blocks, KaTeX formulae, images, and resource links.

But the key thing is the tool permission flow: when the agent wants to call a tool, it shows a modal with an explanation of what the tool does (fetched from the ACP server), and I can approve or deny it with a tap–that is the key part of ACP that I wanted to leverage, and that so far I’ve only seen in CLI/TUI clients.
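As a rough illustration of that approve/deny step, here is a minimal Python sketch of how a client might turn a tap into a JSON-RPC reply. The method name (session/request_permission) and the field names (options, optionId, kind, outcome) reflect my reading of the ACP spec and should be checked against it; build_permission_reply is a hypothetical helper, not code from vibes.

```python
def build_permission_reply(request: dict, approved: bool) -> dict:
    """Map an approve/deny tap onto one of the options the agent offered.

    Assumes the request carries params.options, each with an optionId and a
    kind such as "allow_once" or "reject_once" (field names are assumptions).
    """
    wanted = "allow" if approved else "reject"
    options = request["params"]["options"]
    chosen = next((o for o in options if o["kind"].startswith(wanted)), None)
    if chosen is None:
        # Nothing matching was offered, so report the prompt as cancelled.
        outcome = {"outcome": "cancelled"}
    else:
        outcome = {"outcome": "selected", "optionId": chosen["optionId"]}
    # A JSON-RPC response must echo the id of the request it answers.
    return {"jsonrpc": "2.0", "id": request["id"], "result": {"outcome": outcome}}
```

The point of the sketch is that the client never invents a decision of its own: it only picks among the options the agent sent, which is what makes a simple modal with two buttons sufficient.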

The Back-End

How things tie together

One thing that isn’t on the diagram above is the database. I love for small projects like this, and all the more so now that I learned the tricks around using JSON columns for flexible data storage. And, of course, you get full text search support out of the box, which is perfect for searching what I intend to be an infinite timeline.

Hacking In ACP

Like , I am . ACP has many of the same flaws, except that now you also have to deal with the ambiguity of how to surface all of the interactivity you’d have in a TUI in a chat timeline.

Content parsing went through several iterations to handle all the edge cases: tool calls, thinking panels, resource links, embedded resources with annotations, live updates from the agent, etc.

And I had to test it with multiple ACP servers, since each implementation has its own quirks. Right now, vibes works reasonably well with my python-steward, Mistral’s vibe and GitHub Copilot CLI, but all of them have small differences in how they implement the spec.

If I had to do it again, I would have probably built a proper acp client library in or first, but since I was building both the client and server sides at the same time, I just kept iterating on the wire format until everything worked.

But Why?

It’s not just the convenience of having a cute web app on my phone–having a low-friction review loop is essential when working with agents (which is why I was keen on leveraging ACP in the first place), but I also wanted persistent history and richer rendering than what a terminal can provide, because I want to give my agents more complex tasks that involve multiple steps and outputs.

Everyone and their dog seems to be thinking that agents only have to have bash (Armin Ronacher makes some excellent points), but I am trying to strike a balance when designing steward: Give it all the tools it should need for most use cases, a little scripting engine (QuickJS) for extensibility, and extensive SKILL.md support so I can teach it to do new things.

The Sandboxing Endgame

I am pretty sure that my endgame will eventually involve WASM (maybe tinygo in a sandbox or a Cloudflare-like V8 isolate) and I’m actually hedging my bets by looking at porting a subset of busybox to , but for the moment I want to keep things simple and give agents access to higher-level tools that can do complex things without needing to script them from scratch.

Because, well, I don’t want to write coding agents. There’s a special kind of myopia around their incredible success, but I think there should be some balance in the Force.

Crustaceans are cool and all, but sometimes you just want to vibe with your agent about something as prosaic as scheduling a meeting or searching your vault.

Seizing The Means Of Production (Again)

Since , I’ve been hardening my agentbox and webterm setup through sheer friction. The pattern is still the same:

Small, UNIX-like, loosely coupled pieces which I then glue together with labels, WebSockets and enough duct tape to survive , and the goal is still the same: to reduce friction and cognitive load when managing multiple agents across what is now almost a dozen development sandboxes.

My take on tooling is, again, that a good personal stack is boring. You should be able to understand it end-to-end, swap parts in and out, and keep it running even when upstreams change direction. The constraint is always the same: my time is finite, so any complexity I add needs to pay rent.

And boy, has there been complexity this week.

Going Full-On WASM and WebGL

I rewrote a huge chunk of webterm this week.

I started out with the excellent Textual scaffolding (i.e., xterm.js + a thin server), but I kept having weird glitches (mis-aligned double-width characters, non-working mobile input, theme handling, etc.).

So, being myself, I decided to reinvent that particular wheel, and serendipitously I stumbled onto a build of Ghostty that is pretty amazing–it can render using WebGL, fixed all of my performance issues with xterm.js, and… well, it was a bit of a challenge to deal with, but only because of a few incomplete features.

In the pre- days, I would have stopped there, but this week it took me under an hour to create a patched fork of ghostty-web that filled in the gaps I wanted and that I could just drop into webterm.

Then came the boring part–ensuring the font stack worked properly across platforms, fixing a few rendering glitches, replacing the entire screenshot capture stack (which is what I loved about Textual) with pyte, and… a lot of mobile testing.

Still, the end result is totally worth it:

The prettiest thing I did all week

There were a lot of little quality-of-life improvements that came out of this rewrite:

  • The dashboard got typeahead search, so I can quickly find the right sandbox among many.
  • And the most satisfying cosmetic fix: dashboard screenshots now use each session’s actual theme palette.
  • PWA support landed, so the iPad can treat it like a proper “app”.
  • The WebSocket plumbing got a proper send queue so slow clients can’t freeze other sessions.
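The send queue idea can be sketched in a few lines of asyncio. This is a minimal illustration under my own assumptions, not the webterm implementation: ClientSendQueue and its drop-oldest policy are hypothetical, and the send callable stands in for whatever the WebSocket library provides.

```python
import asyncio


class ClientSendQueue:
    """Hypothetical per-client outbound queue with a bounded backlog."""

    def __init__(self, send, maxsize: int = 256) -> None:
        self._send = send  # async callable, e.g. a WebSocket's send method
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
        self.dropped = 0

    def enqueue(self, data: bytes) -> None:
        # Never blocks the producer: when the backlog is full, the oldest
        # item is discarded so one slow client cannot stall the others.
        if self._queue.full():
            self._queue.get_nowait()
            self.dropped += 1
        self._queue.put_nowait(data)

    async def drain(self) -> None:
        # Runs as one task per client; only this task ever waits on the
        # slow network write.
        while True:
            data = await self._queue.get()
            await self._send(data)
```

Dropping raw terminal output is a blunt policy (a real terminal front-end would more likely trigger a full redraw or disconnect the client when dropped is non-zero), but it shows the key design choice: the code that reads from the PTY only ever calls enqueue, which returns immediately.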

I would have rewritten this in Go, but as it happens the Go equivalent of pyte didn’t seem to be good enough yet, and running half a dozen sessions at a time for a single person isn’t a load-sensitive setup anyway.

Again, this is all about reducing friction: Color helps me recognize the project, and typeahead find makes it trivial to, well… find. The less mental overhead I have to deal with when switching contexts, the more likely I am to actually use the tools I’ve built.

Mobile Woes

But getting it to work properly on mobile was a pain:

  • Mobile keyboard handling was a mess. You can’t customize the onscreen keyboard in the browser, and modifier keys were especially problematic.
  • To make mobile usable for real work (not just htop screenshots), webterm now pops up a draggable keybar with Esc/Ctrl/Shift/Tab and arrow keys, which are “sticky” so you can tap out proper Ctrl/Shift arrow sequences–and Ctrl+C, which is kind of essential.
  • Focus was a big problem. is incredibly finicky about browser input events–and if you test on an iPad with a keyboard attached, you miss half the problems. The “solution” was to monkeypatch input via a hidden textarea that captures all input events and forwards them to the terminal renderer–and that still breaks in weird, unpredictable ways.
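To make the “sticky” modifier behaviour concrete, here is a small Python sketch of the translation from keybar taps to the bytes a terminal expects. It is not the webterm code: StickyKeybar is a hypothetical class, and it assumes xterm-style modifier encoding (1 + Shift=1 + Ctrl=4) with arrow keys in normal, not application, cursor mode.

```python
ARROWS = {"Up": "A", "Down": "B", "Right": "C", "Left": "D"}


class StickyKeybar:
    """Hypothetical keybar state: modifiers stay armed until the next key."""

    def __init__(self) -> None:
        self.ctrl = False
        self.shift = False

    def tap_modifier(self, name: str) -> None:
        # Tapping a modifier toggles it, so a second tap disarms it.
        if name == "Ctrl":
            self.ctrl = not self.ctrl
        elif name == "Shift":
            self.shift = not self.shift
        else:
            raise ValueError(f"unknown modifier: {name}")

    def key(self, key: str) -> bytes:
        """Return the bytes for one key press and release the modifiers."""
        ctrl, shift = self.ctrl, self.shift
        self.ctrl = self.shift = False
        if key in ARROWS:
            modifier = 1 + (1 if shift else 0) + (4 if ctrl else 0)
            if modifier == 1:
                return f"\x1b[{ARROWS[key]}".encode()
            return f"\x1b[1;{modifier}{ARROWS[key]}".encode()
        if key == "Esc":
            return b"\x1b"
        if key == "Tab":
            return b"\x1b[Z" if shift else b"\t"
        if ctrl and len(key) == 1 and key.isascii() and key.isalpha():
            # Control characters are the letter's code with the top bits
            # cleared, so Ctrl+C becomes 0x03.
            return bytes([ord(key.upper()) & 0x1F])
        return (key.upper() if shift else key).encode()
```

With this model, tapping Ctrl and then typing c on the regular onscreen keyboard produces the interrupt byte, and the modifier disarms itself so the next keystroke is sent unmodified.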

I might have gone a bit overboard with testing–I don’t have an Android tablet, so I decided to test on my Oculus Quest 2 headset browser, which is almost Android with a head strap:

Testing `webterm` on the Oculus Quest 2 browser--it works surprisingly well!

ANSI Turtles All The Way Down

Then came even weirder rendering bugs, since, well, terminals are terminals. And for such a simple concept, the stack is surprisingly complex:

Each and every one of those arrows gave me a headache

For instance, you’ll notice in the diagram above that there is a PTY layer and in the mix. That means there are two layers of terminal emulation happening, and both need to be configured properly to avoid glitches.

For instance, I kept getting 1;10;0c when I connected, which led me into the weirdness of ANSI escape codes and nested terminal emulators (something I hadn’t done since running emacs to wrap VAX sessions…). sends DA2 queries, but my wrapper ended up having to filter more than just DA1 responses without messing up UTF-8 sequences.
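A stray 1;10;0c is the visible tail of a secondary Device Attributes reply (ESC [ > 1 ; 10 ; 0 c) that was printed instead of consumed. The following is a minimal sketch of one way to strip such replies from a byte stream; it is my own illustration rather than the wrapper described above, and DAFilter is a hypothetical name. Working on bytes and matching only ASCII means multi-byte UTF-8 sequences pass through untouched, even when a chunk boundary splits them.

```python
import re

# DA1 replies look like ESC [ ? ... c and DA2 replies like ESC [ > ... c.
DA_RESPONSE = re.compile(rb"\x1b\[[?>][0-9;]*c")
# A possibly incomplete escape sequence at the very end of a chunk.
PARTIAL_TAIL = re.compile(rb"\x1b(\[([?>][0-9;]*)?)?\Z")


class DAFilter:
    """Hypothetical stream filter that drops Device Attributes replies."""

    def __init__(self) -> None:
        self._held = b""

    def feed(self, chunk: bytes) -> bytes:
        data = self._held + chunk
        self._held = b""
        tail = PARTIAL_TAIL.search(data)
        if tail:
            # Hold back an unfinished sequence until the next chunk shows
            # whether it ends in "c" (a reply) or in something else.
            self._held = data[tail.start():]
            data = data[:tail.start()]
        return DA_RESPONSE.sub(b"", data)
```

One caveat of this sketch is that a lone trailing ESC is held until more data arrives, so a real implementation would need a short timeout to flush it.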

Then I realized that the Copilot CLI sends a bunch of semi-broken escape sequences that pyte couldn’t handle properly, which led to all sorts of rendering glitches in the screenshots, and another round of patches, and another…

Scaffolding The Future

I also spent a good chunk of time this week improving the agentbox Docker setup, adding better release automation, cleaning up old artifacts, and generally making it easier to spin up new sandboxes with the right tools and my secret weapon:

A set of starter SKILL.md files that teach the bundled agents how to manage the environment, use how I prefer to develop, and generally be useful and run through proper code/lint/test/fix cycles without me having to babysit them.

Right now I’m at a point where I can just go into any of my git repositories, run make init (or, if it’s an old project, point Copilot at the skel files and tell it to read and adapt them according to the local SPEC.md), and have a fully functional AI agent sandbox ready to go.

That I can do that and the infra for it in under a minute, with proper workspace mappings, RDP/web terminal access, and to get the results back out, is just… chef’s kiss.

Ah well. At least now I have a pretty solid UX that even works on my ageing iPad Mini 5 snappily enough (as long as I don’t try to open too many tabs), and I can finally start focusing on other stuff.

Which I sort of did, all at once…

TIL: Apple Broke Time Machine Again On Tahoe

So… Here we are again.

Today, after a minor disaster with my vault, I decided to restore from Time Machine, and… I realized that it had silently broken across both my Tahoe machines. I use a NAS as a Time Machine target, exporting the share over SMB, and that has worked flawlessly for years. This came as a surprise because I could have sworn it was working fine a couple of months ago–but no, it wasn’t.

For clarity: It just stopped doing backups, silently. No error messages, no notifications, nothing. Just no backups for around two months. On my laptop, I only noticed because I was trying to restore a file and the latest backup was from December. On my desktop, I had a Thunderbolt external drive as a secondary backup.
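Since the failure was silent, a small staleness check would have caught it much earlier. Here is a minimal Python sketch, assuming that the path printed by tmutil latestbackup ends in a timestamp of the form YYYY-MM-DD-HHMMSS (optionally followed by .backup); the function names and the three-day threshold are hypothetical choices for illustration.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Assumed naming scheme for Time Machine snapshots, e.g. 2025-12-14-093012.
STAMP = re.compile(r"(\d{4}-\d{2}-\d{2}-\d{6})")


def backup_age(latest_backup_path: str, now: datetime) -> Optional[timedelta]:
    """Return how old the newest backup is, or None if no timestamp is found."""
    match = STAMP.search(latest_backup_path)
    if match is None:
        return None
    taken = datetime.strptime(match.group(1), "%Y-%m-%d-%H%M%S")
    return now - taken


def is_stale(latest_backup_path: str, now: datetime,
             max_age: timedelta = timedelta(days=3)) -> bool:
    """Treat a missing or unparseable backup path as stale too."""
    age = backup_age(latest_backup_path, now)
    return age is None or age > max_age
```

Feeding it the output of tmutil latestbackup from a daily scheduled job, and sending a notification when is_stale returns True, would turn two months of missing backups into a warning after a few days.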

After some research, I found out that the issue is with Apple’s unilateral decision to change their SMB defaults (without apparently notifying anyone), and came across a few possible fixes.

What Seems To Be Working Now

I found this gist, which I am reproducing here for posterity, that seems to be working for me, but which entails editing the nsmb.conf file on the Mac itself–which is not exactly ideal, since I’m pretty sure Apple will break this again in the future.

sudo nano /etc/nsmb.conf # I used vim, of course

…and adding the following lines (the file should be empty):

[default]
signing_required=yes
streams=yes
soft=yes
dir_cache_max_cnt=0
protocol_vers_map=6
mc_prefer_wired=yes

The explanation here is that Tahoe changed the default from signing_required=no to stricter control, and NAS devices with relaxed SMB settings cannot handle this without explicit configuration.

Another common pitfall is name encoding issues in machine names, so you should remove non-ASCII characters from the .sparsebundle name (that wasn’t an issue for me, but YMMV).

On the Synology side, the recommendation was to go to Control Panel > File Services > SMB > Advanced and set:

  • Maximum SMB protocol: SMB3
  • Enable Opportunistic Locking: Yes
  • Enable SMB2 Lease: Yes
  • Enable SMB Durable Handles: Yes
  • Server signing: No (or “Auto”)
  • Transport encryption: Disabled

That doesn’t quite match my DSM UI, but it’s close enough, and my settings now look like this:

My SMB settings, as of DSM 7.3.2-86009-1

My Backup Backup Plan

Since I’m tired of Apple breaking Time Machine every few years and the lack of transparency around this (it’s not Synology’s fault), I have decided to implement a more robust solution that doesn’t depend on Synology’s SMB implementation.

I already have a Proxmox host that has an LXC container running Samba for general file sharing, so I decided to look into that as a possible Time Machine target.

As it happens, mbentley/timemachine is a Docker image specifically designed for this purpose, and it seems to be well-maintained, so I’m testing it like this:

services:
  timemachine:
    image: mbentley/timemachine:smb
    container_name: timemachine
    restart: always
    network_mode: host
    environment:
      - TM_USERNAME=timemachine
      - TM_GROUPNAME=timemachine
      - PASSWORD=timemachine
      - TM_UID=65534 # 'nobody' user
      - TM_GID=65534 # 'nobody' group
      - SET_PERMISSIONS=false
      - VOLUME_SIZE_LIMIT=0
    volumes:
      # this is a pass-though mountpoint to the ZFS volume in Proxmox
      - /mnt/shares/timemachine:/opt/timemachine
    tmpfs:
      - /run/samba

Right now the first option seems to be working, but I will probably switch to the Docker solution in the near future, since it gives me more control over the implementation and avoids relying on Synology’s software.

But if anyone from Apple is reading this: please, stop breaking Time Machine every few years. It’s a critical piece of infrastructure for many users, and the lack of communication around these changes is frustrating.

A Minor, Yet Annoying, Additional Problem

Plus I’m annoyed enough that earlier this morning I tried to set up a new device and the infamous Restore in Progress: An estimated 100 MB will be downloaded… bug (which has bitten me repeatedly over the last six years) is still there.

The usual fix was hitting Reset Network Settings and a full hardware reboot, plus reconnecting to Wi-Fi… But this time it took three attempts.

Come on, Apple, get your act together. Hire people who care about the OS experience, not just .

Notes for January 19-25

Since , I’ve been heads-down building a coding agent setup that works for me and using it to build a bunch of projects, and I think I’ve finally nailed it. A lot more stuff has happened since then, but I wanted to jot down some notes before I forget everything, and my next weekly post will probably be about the other projects I’ve been working on.

Seizing The Means Of Production

I have now achieved coding agent nirvana–I am running several instances of my agentbox code agent container in a couple of VMs (one trusted, another untrusted), and am using my textual-webterm front-end to check in on them with zero friction:

My trusted set of agents

This is all browser-based, so one click on those screenshots (which update automatically based on terminal activity) opens the respective terminal in a new tab, ready for me to review the work, pop into vim for fixes, etc. Since the agents themselves expend very little CPU or RAM and I’ve capped each container to half a CPU core, a 6-core VM can run literally dozens of agents in parallel, although the real limitation is my ability to review the code.

But it’s turned out to be a spectacularly productive setup – a very real benefit for me is having the segregated workspaces constantly active, which saves me hours of switching between them in , and another is being able to just “drop in” from my laptop, desktop, iPad, etc.

As someone who is constantly juggling dozens of projects and has to deal with hundreds of context switches a day, the less friction I have when coming back to a project the better, and this completely fixes that. Although I had this mostly working last week, getting the pty screen capture to work “right” was quite the pain, and I had to guide the LLM through various ANSI and double-width character scenarios–that would be worth a dedicated post on its own if I had the time, but anyone who’s worked with terminal emulators will know what I’m talking about.

You Wanted Sandboxing? You Got Sandboxing

Another benefit of this approach is that none of the agents run locally, so they can’t possibly harm any of my personal data.

The whole thing (minus , which is how I connect everything securely) looks like this:

I had to explain this to a few people already, so here's the detailed diagram

I have several levels of sandboxing in place:

  • Each container is an agentbox instance with its own /workspace folder
  • Containers are capped in both CPU and RAM (although that only impacts their ability to run builds and tests–but even Playwright testing works fine)
  • The containers are running in a full VM inside (capped at six cores and 16GB) and one of my ARM boards (more cores, but just 8GB of physical RAM)
  • The “untrusted” agents use LiteLLM to access Azure OpenAI, so they never have production keys and can be capped in various ways
  • Each setup runs a Syncthing instance that syncs the workspace contents back to my Mac so I can do final reviews, testing and commits–that’s the only way any of the code reaches my own machine.

As to the actual agent TUI inside the agent containers, I’m using the new GitHub Copilot CLI (which gives me access to both Anthropic’s Claude Opus 4.5 and OpenAI’s GPT-5.2-Codex models), Gemini (for kicks) and Mistral Vibe (which has been surprisingly capable).

After I relegated OpenCode to the “untrusted” tier, and I also have my own toy coding assistant (based on python-steward, and focused on testing custom tooling) there.

KISS

A good part of the initial effort was bootstrapping this, of course, but since I did it the UNIX way (simple tools that work well together), I’ve avoided the pitfall of doing what most agent harnesses/sandboxing tools are trying to do, which is to do full-blown, heavily integrated environments that take forever to set up and are a pain to maintain.

I don’t care about that, and prefer to keep things nice and modular. Here’s an example of my docker compose file:

---
x-env: &env
  DISPLAY: ":10"
  TERM: xterm-256color
  PUID: "${PUID:-1000}"
  PGID: "${PGID:-1000}"
  TZ: Europe/Lisbon

x-agent: &agent
  image: ghcr.io/rcarmo/agentbox:latest
  environment:
    <<: *env
  restart: unless-stopped
  deploy:
    resources:
      limits:
        cpus: "${CPU_LIMITS:-2}"
        memory: "${MEMORY_LIMITS:-4G}"
  privileged: true # Required for Docker-in-Docker
  networks:
    - the_matrix

services:
  syncthing:
    image: syncthing/syncthing:latest
    container_name: agent-syncthing
    hostname: sandbox
    environment:
      <<: *env
      HOME: /var/syncthing/config
      STGUIADDRESS: 0.0.0.0:8384
      GOMAXPROCS: "2"
    volumes:
      - ./workspaces:/workspaces
      - ./config:/var/syncthing/config
    network_mode: host
    restart: unless-stopped
    cpuset: "0"
    cpu_shares: 2
    healthcheck:
      test: curl -fkLsS -m 2 127.0.0.1:8384/rest/noauth/health | grep -o --color=never OK || exit 1
      interval: 1m
      timeout: 10s
      retries: 3

  # ... various agent containers ...

  guerite:
    <<: *agent
    container_name: agent-guerite
    hostname: guerite
    environment:
      <<: *env
      ENABLE_DOCKER: "true" # this one needs nested Docker
    labels:
      webterm-command: docker exec -u agent -it agent-guerite tmux new -As0 \; attach -d
    volumes:
      - config:/config
      - local:/home/agent/.local
      - ./workspaces/guerite:/workspace

  go-rdp:
    <<: *agent
    container_name: agent-go-rdp
    hostname: go-rdp
    ports:
      - "4000:3000" # RDP service proxy
    labels:
      webterm-command: docker exec -u agent -it agent-go-rdp tmux new -As0 \; attach -d
    volumes:
      - config:/config
      - local:/home/agent/.local
      - ./workspaces/go-rdp:/workspace

# ... more agent containers ...

volumes:
  config:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: ./home
  local:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: ./home/.local

networks:
  the_matrix:
    driver: bridge

You’ll notice the labels, which are what textual-webterm uses to figure out what containers to talk to.
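To show how label-based discovery like this can work, here is a minimal Python sketch that pulls the webterm-command label out of docker inspect output. It is an illustration under my own assumptions, not the actual textual-webterm code: webterm_sessions is a hypothetical helper, and it relies only on the Name and Config.Labels fields that docker inspect emits for each container.

```python
import json


def webterm_sessions(inspect_json: str, label: str = "webterm-command") -> dict:
    """Map container names to the command stored in their webterm label.

    Expects the JSON array printed by `docker inspect <container>...`.
    """
    sessions = {}
    for container in json.loads(inspect_json):
        # Labels can be null for containers that define none.
        labels = container.get("Config", {}).get("Labels") or {}
        if label in labels:
            # docker inspect reports names with a leading slash.
            sessions[container["Name"].lstrip("/")] = labels[label]
    return sessions
```

The appeal of this approach is that the compose file stays the single source of truth: adding a new agent container with the right label is enough for it to show up, with no separate registry to maintain.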

The Outputs

It’s been insane, since this setup lets me drop back into each project at the click of a link and I can guide the agents for a couple of minutes at a time, or take notes and write specs in a separate window. Either of those fits well with my workflow and doesn’t require me to fire up a bloated IDE and load a project folder (which can take quite a long time on its own).

So I now have the ability to create a bunch of things that I think should exist:

  • I now have my web-based RDP client working with a back-end that uses tinygo and WASM to do high-performance decoding in the browser (which is something I’ve always wanted), and I decided to push it to the limit against the public test suites because I think a Go-based RDP client is something that should exist.
  • I took the existing pysdfCAD implementation of signed distance functions and replaced the slow marching cubes implementation it was using to render STL meshes with a Go-based backend that renders meshes much faster and with better quality (when it works–I need to sort out some bugs).
  • I built two (for now) extensions for mind-mapping and Kanban that match what I currently need from (and will be looking at enhancing Foam to match the editor soon)
  • I’m taking a couple of years of hacky scripts and building a writing agent that is going to help me do the automated conversion and re-tagging of the 4000+ legacy pages of this site that are still in format (the name editor is taken for building a WYSIWYG editor to replace with )
  • I ported a bunch of my own stuff (and a few fun things, like Salvatore Sanfilippo’s embedding model) to .
  • I started packaging my own servers as Azure App Services, so I can use the basic techniques to accelerate .

Lessons Learned

I’ve read about the Ralph Wiggum Loop, and it’s not for me (I find it to be the kind of thing you’d do if you’re an irresponsible adolescent rich kid with an inexhaustible supply of money/tokens and don’t really care about the quality of the results, and that’s all I’m going to say about it for now).

  • (write a SPEC.md, instruct the agents to run full lint/test cycles and aim for 80% test coverage, then go back, review and write TODO.md files for them to base their internal planning tools out of and work in batches) still works the best as far as final code quality is concerned. I still have to ask for significant refactorings now and then, but since my specs are usually very detailed (what libraries to use, which should be vendored, what code organization I want, and what specific test scenarios I believe to be the quality bar) things mostly work out fine.
  • Switching between models for coding and auditing/testing is still key. Claude (even Opus) has a tendency to be overly creative in tests, so I typically ask for test and security audits with GPT-5.2 that catch dozens of obviously stupid things that the Anthropic models did. Gemini is still a grey area, since I’m just using the free tier for it (although it seems unsurprisingly good at architecting packages).
  • Switching between frontier and small(ish) models for coding and testing also works great. gpt-5-mini, sonnet, haiku, mistral and gemini flash do a very adequate job of running and fixing most test cases, as well as front-end coding.
  • really doesn’t like when agents create virtualenvs or install npm packages, so I routinely have to tell the agents that they are in a containerized environment and that it’s fine to install pip and npm packages globally (i.e., outside the workspace mount point).
  • a little while back, is still the way to go for deterministic results with tools. Support for (and SKILL.md) is very uneven across all the current agentic TUIs, but with a few strategically placed symlinks I can have a workspace setup that works well across and the remote agents.
  • Having a shared set of tooling and skills across as many of your agents as possible really cuts down on the amount of prompting and scaffolding agents need to create per project. In that regard, umcp has probably been the best bang for the buck (or line of code) that I wrote in 2025, because I use it all the time.
  • Claude Code and Gemini have a bunch of teething issues with . Fortunately both Mistral Vibe and the new Copilot CLI work pretty well, and clipboard support is flawless even when using them inside both and textual-webterm.

And, finally, coding agents are like crack. My current setup is so addictive I find myself reviewing work and crafting TODOs for the agents from my iPad before I go to bed instead of easing myself into sleep with a nice book, which is something I really need to put some work into.

But I have a huge, decades-long list of ideas and projects I think should exist, and after three years of hallucinations and false starts, we’re finally at an inflection point where for someone with my particular set of skills and experience LLMs are a tremendous force multiplier for getting (some) stuff done, provided you have the right setup and workflow.

They’re still very far from perfect, still very unreliable without the right guardrails and guidance, and still unable to replace a skilled programmer (let alone an accountant, a program manager or even your average call center agent), but in the right hands, they’re not a bicycle for the mind–they’re a motorcycle.

Or a wingsuit. Just mind the jagged cliffs zipping past you at horrendous speeds, and make sure you carry a parachute.

The NestDisk

This one took me a while (for all the reasons you’ll be able to read elsewhere in recent posts), but the NestDisk has been quietly running beside my desktop for a month now, and it’s about time I do it justice.

The NestDisk mini NAS

This is a tiny Intel machine whose entire reason for existing is to let me cram four M.2 2280 SSDs behind dual 2.5GbE and end up with a small, fast, “boring” NAS. Like most mini PCs these days, it can also double as a router and even an “AI box”, but the key point is that it’s designed around storage density and decent networking in a very small form factor.

Disclaimer: YouYeeToo sent me the NestDisk free of charge (for which I thank them), and as usual this article follows .

Like , the NestDisk is built around Intel’s N150, which is pretty much perfect for this category—low power (around 12W), modest clock speeds (1.6GHz up to 3.6GHz), and usually more than enough for file serving and a few services. The catch, as always with this kind of machine, is… thermals.

The short version is that if you already want an SSD-only NAS and have (or plan to have) 2.5GbE, this can make a lot of sense—provided it stays cool and stable once you actually populate all four slots.

Hardware

Even for an N150 machine, this is a pretty small box: 146×97.5×35mm, which is roughly the size of the external HDD enclosures I used to get a few years ago.

Design and Build Quality

I have to say that I very much like the color. I’m not usually a fan of bright colors on tech gear, but since most of the stuff on my desk is black, white or various shades of brown, the NestDisk stands out in a good way:

The NestDisk's bright orange case is much nicer in person.

In The Box

Besides the NestDisk itself, I also got an unusual accessory: a dual-fan USB cooler with quite nice-looking 120mm fans, which is meant to sit underneath the NestDisk and blow air over the M.2 area. I found this especially amusing for two reasons: first, because it’s much bigger than the NestDisk itself; and second, because I’ve actually been building a similar DIY cooling solution for my own use with a cheap fan controller and two 90mm fans:

The cooler accessory and my homebrew dual-fan setup

So I can see the rationale here, although it did make me wonder how cool the machine ran. The case does have two rubber feet on the bottom that ensure it has an airgap on a desk, but removing the heatsink was actually quite revealing:

This is a surprisingly beefy heatsink, and the four thermal pads show they aren't skimping on expectations.

First of all, the thing is thick. It’s almost 7mm of solid metal, and probably the most substantial part of the entire enclosure. Second, YouYeeToo clearly didn’t cut corners: they added thermal pads for all four M.2 slots (unlike other manufacturers that only add one or two).

The extra SSD fans

And, looking inside the SSD cavity (which, by the way, already came with a 1TB SSD on the second slot), I noticed there are two very small (40mm) fans that seem to blow air into the M.2 cavity.

This is interesting given that they seem to bring in air from the same side as the USB-A ports, although I have to question how effective they might be at cooling all the SSDs.

Incidentally, the CPU seems to have its own cooling path and I believe there is a third fan that takes advantage of the case grilles to exhaust warm air out the other side and top.

But other than the (tragically ill-fated) , this is the most substantial heatsink I’ve seen in a mini PC of this size.

Side Note: refreshingly, you can apparently disassemble the whole thing without removing any of the rubber feet, but I didn’t test that for two reasons: first, because the plastic enclosure is very tightly fitted, with the ports flush with the outside (and thus holding the enclosure in place); and second, because I didn’t want to risk damaging the device before I even got to the “fun” parts.

Specs

The specs themselves are fairly straightforward:

  • Intel N150 CPU (4 cores / 4 threads; 1.6GHz base, up to 3.6GHz turbo)
  • 12GB LPDDR5 (soldered, not upgradeable, a very common N150 configuration)
  • Dual 2.5GbE Intel i226‑V network interfaces (also a common N150 reference design feature)
  • Wi-Fi 6 + Bluetooth 5.2
  • 64GB eMMC boot device (preloaded with OpenMediaVault)
  • Four M.2 2280 slots (PCIe 3.0 x2 per slot for NVMe; one slot also supports M.2 SATA)
  • Dual HDMI outputs + a 3.5mm audio jack
  • Three USB-A ports + one USB-C data port + one USB-C power port (not PD, apparently 19V/3.42A only)

The USB-C data port should support DisplayPort alt-mode, which means you can theoretically run three displays at once (2× HDMI + 1× USB-C DP), but I didn’t test that given that this is supposed to be a NAS.

Storage Layout

The NestDisk boots off the internal 64GB eMMC, which is plenty for OpenMediaVault and some plugins, and after setting it up (more on that later), here’s what the storage layout looks like:

Disk /dev/mmcblk0: 58.25 GiB, 62545461248 bytes, 122159104 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: CB0357AA-AB03-4851-8579-C52186DC58AD

Device             Start       End   Sectors  Size Type
/dev/mmcblk0p1      2048   1050623   1048576  512M EFI System
/dev/mmcblk0p2   1050624 120158207 119107584 56.8G Linux filesystem
/dev/mmcblk0p3 120158208 122157055   1998848  976M Linux swap

root@admin:~# lsblk
NAME         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda            8:0    1  21.2G  0 disk
mmcblk0      179:0    0  58.3G  0 disk
|-mmcblk0p1  179:1    0   512M  0 part /boot/efi
|-mmcblk0p2  179:2    0  56.8G  0 part /
`-mmcblk0p3  179:3    0   976M  0 part [SWAP]
nvme0n1      259:0    0 953.9G  0 disk
mmcblk0boot0 179:256  0     4M  1 disk
mmcblk0boot1 179:512  0     4M  1 disk

M.2 slots and PCIe lane budgeting

The thing about building NAS devices with an N150 is that even though you get four M.2 2280 slots, they’re not necessarily full‑fat x4 per slot—from the specs and a little close inspection, the machine actually uses PCIe 2.0 switches to multiplex the available lanes, so this isn’t exactly a straightforward x4/x4/x4/x4 layout.

Here’s the full PCI layout for reference:

root@admin:~# lspci
00:00.0 Host bridge: Intel Corporation Alder Lake-N Processor Host Bridge/DRAM Registers
00:02.0 VGA compatible controller: Intel Corporation Alder Lake-N [Intel Graphics]
00:0a.0 Signal processing controller: Intel Corporation Platform Monitoring Technology (rev 01)
00:0d.0 USB controller: Intel Corporation Alder Lake-N Thunderbolt 4 USB Controller
00:14.0 USB controller: Intel Corporation Alder Lake-N PCH USB 3.2 xHCI Host Controller
00:14.2 RAM memory: Intel Corporation Alder Lake-N PCH Shared SRAM
00:14.3 Network controller: Intel Corporation CNVi: Wi-Fi
00:15.0 Serial bus controller: Intel Corporation Device 54e8
00:15.1 Serial bus controller: Intel Corporation Device 54e9
00:16.0 Communication controller: Intel Corporation Alder Lake-N PCH HECI Controller
00:1a.0 SD Host controller: Intel Corporation Device 54c4
00:1c.0 PCI bridge: Intel Corporation Alder Lake-N PCI Express Root Port
00:1c.6 PCI bridge: Intel Corporation Alder Lake-N PCI Express Root Port
00:1f.0 ISA bridge: Intel Corporation Alder Lake-N PCH eSPI Controller
00:1f.3 Audio device: Intel Corporation Alder Lake-N PCH High Definition Audio Controller
00:1f.4 SMBus: Intel Corporation Alder Lake-N SMBus
00:1f.5 Serial bus controller: Intel Corporation Alder Lake-N SPI (flash) Controller
01:00.0 Non-Volatile memory controller: Realtek Semiconductor Co., Ltd. RTS5765DL NVMe SSD Controller (DRAM-less) (rev 01)
02:00.0 PCI bridge: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch
03:03.0 PCI bridge: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch
03:07.0 PCI bridge: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch
04:00.0 Ethernet controller: Intel Corporation Ethernet Controller I226-V (rev 04)
05:00.0 Ethernet controller: Intel Corporation Ethernet Controller I226-V (rev 04)

Either way, the important part is that the bandwidth seems to be plenty for most consumer SSDs, even when used four at a time. I wanted to test this, but since most of my test SSDs , I couldn’t really do any significant performance testing.
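As a back-of-the-envelope check on the "plenty of bandwidth" claim, here is a minimal sketch that compares theoretical PCIe link throughput with the line rate of a 2.5GbE port. The function names are made up for illustration, the figures are theoretical maxima (encoding overhead only, no protocol overhead), and nothing here measures what the NestDisk's slots actually negotiate (something like `lspci -vv` and its `LnkSta` lines would be the place to look for that).

```shell
# Approximate usable PCIe bandwidth in MB/s for a given generation and lane count.
# Gen2 uses 8b/10b encoding, Gen3 uses 128b/130b.
pcie_mbps() {
  local gen="$1" lanes="$2" per_lane
  case "$gen" in
    2) per_lane=$((5000 * 8 / 10 / 8)) ;;     # 5 GT/s -> 500 MB/s per lane
    3) per_lane=$((8000 * 128 / 130 / 8)) ;;  # 8 GT/s -> ~984 MB/s per lane
    *) echo "unsupported generation: $gen" >&2; return 1 ;;
  esac
  echo $((per_lane * lanes))
}

# Line rate of an Ethernet link in MB/s; $1 is the link speed in Mbit/s.
ethernet_mbps() {
  echo $(( $1 / 8 ))
}

echo "Gen3 x2 slot: $(pcie_mbps 3 2) MB/s"
echo "Gen2 x1 link: $(pcie_mbps 2 1) MB/s"
echo "2.5GbE:       $(ethernet_mbps 2500) MB/s"
```

Even the slowest case above (a single Gen2 lane at roughly 500 MB/s) is faster than one 2.5GbE link at roughly 312 MB/s, so for network file serving the Ethernet ports would be the bottleneck long before the SSD links are.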

Power

The USB-C power input is an interesting choice. The YouYeeToo Wiki lists it as 19V/3.42A, which is refreshingly specific, and it should support Power Delivery negotiation, but my policy with USB-C power inputs is still to always use the power brick that came with the device, so I didn’t try any alternatives.

What I can say is that the power envelope is pretty much N150 standard: 11W idle and up to around 25W under load, although it can be pushed a little higher if you really tax the CPU. That’s pretty good for a personal NAS.
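To put those wattages in perspective for an always-on box, here is a trivial sketch that converts a steady draw into energy per year. The helper name is hypothetical, and it assumes the machine sits at a constant draw all year, which it obviously won't.

```shell
# Convert a steady power draw in watts to kWh per year (integer maths, rounded down).
kwh_per_year() {
  echo $(( $1 * 24 * 365 / 1000 ))
}

echo "Idle (11W): $(kwh_per_year 11) kWh/year"
echo "Load (25W): $(kwh_per_year 25) kWh/year"
```

Multiply by your own tariff to get a cost figure.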

Fan noise

The NestDisk is not fanless, but I never heard the fan until I load tested it, and I would classify it as “quietly insistent”. I don’t know the size of the CPU fan, and I know for a fact that the kind of 40mm internal fans I spotted near the NVMe cavity can definitely be audible in a quiet room, but in this case I seldom heard them.

But I did do one thing that most people probably wouldn’t: I set the machine up vertically, with the USB ports down, but held by a 3D-printed bracket similar to this one:

A 3D-printed vertical stand for the NestDisk

I broke it by accident the day before I was finalizing this draft (I dropped the bracket on the floor when re-arranging my desk), but while it lasted it held the NestDisk firmly in place, and more importantly it oriented the case grilles in a way that allowed better airflow.

Thermals

Which leads me to some of my testing. I got a fancy new IR thermometer for Christmas, so I was able to keep tabs on the NestDisk throughout, and the M.2 heatsink averaged 45°C, with everything else between 27°C under normal use. But it’s been quite a cold winter here (10°C outside average) and I don’t warm up the office much (it’s 20°C now), so these numbers might be a bit optimistic.

BIOS

The BIOS itself is fairly standard for an N150 machine, with the usual assortment of power management, boot order, and device configuration options. There are a few interesting bits, though:

Note the fan control options and power settings, which are important for a NAS device.

Software

The NestDisk comes preinstalled with OpenMediaVault, although you’ll have to be a little patient with it if you’re new to OMV. First off, there’s no initial setup wizard (or anything in the console, really), so you’ll have to boot the machine, note the IP address, and then log in with the default admin:openmediavault credentials.

The machine boots quickly into OpenMediaVault, and there isn't much to see on the console

OpenMediaVault Itself

Even though I am not a fan of OpenMediaVault’s quirky Amiga-era inside jokes (like fake error messages and Amiga cursors, which many newcomers find confusing), I can’t argue that it works perfectly for the NestDisk—it’s small enough to run off the internal eMMC and have plenty of room to spare for plugins, and it makes it trivial to do exactly what I want for this kind of device: get it on the network quickly, create shares, and move on with my life:

OMV's dashboard is clean and functional, giving me quick access to system status and storage overview.

What makes OMV a nicer consumer choice than a plain Debian install is that it gives me the things most people actually need—SMB/CIFS sharing, NFS if I want it, users/groups, permissions, monitoring, scheduled jobs and notifications—without requiring anyone to remember which config file does what.

OMV’s idea of a NAS is pleasantly conservative, and that’s great. If I want to go beyond the core experience, OMV-Extras is the usual next step: it adds a much wider selection of third‑party plugins that can turn this from a simple NAS into a small server that happens to have a nice storage UI. That is exactly what I did with it over the past month: it sat on my desk running a small docker compose stack with the services I’ve been developing, and it did great.

A few caveats apply, though, besides the quirky interface: the OMV version that comes preinstalled tries to upgrade upon first boot and mine got a little confused, so it took a bit to get going. And you’ll probably want to set up ZFS if you add more NVMe drives, since OMV still seems to default to EXT4 for new filesystems.
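For a rough idea of what the ZFS option could look like with a fully populated box, here is a dry-run sketch that only prints a `zpool create` command rather than running it. The pool name, the device paths and the layout rule (mirror for two drives, raidz1 for three or more) are all assumptions for illustration, and OMV may well have its own preferred way of doing this through its UI or plugins.

```shell
# Print (not run) a zpool create command for the given pool name and devices.
# Layout rule is an illustrative assumption: 1 device = single, 2 = mirror, 3+ = raidz1.
zpool_cmd() {
  local pool="$1"; shift
  local layout
  case $# in
    0) echo "need at least one device" >&2; return 1 ;;
    1) layout="" ;;
    2) layout="mirror" ;;
    *) layout="raidz1" ;;
  esac
  echo "zpool create -o ashift=12 $pool${layout:+ $layout} $*"
}

# Hypothetical device names for four populated M.2 slots:
zpool_cmd tank /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
```

Printing the command first is deliberate, since `zpool create` is destructive and the device names need checking against `lsblk` on the actual machine.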

Performance

I saw no real difference from , which is to say that it performed exactly as expected for this class of device: plenty fast for SMB/CIFS file sharing, more than enough network throughput to saturate one 2.5GbE link, and more than enough CPU power to run a few containers without breaking a sweat.

What I did notice is that the thermals were better than I expected—I had some trouble getting the CPU to get past 60°C even under load, which is impressive for such a small box, and the idle temperatures were very reasonable:

# sensors
iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:            N/A

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +27.8 C

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +47.0 C  (high = +105.0 C, crit = +105.0 C)
Core 0:        +47.0 C  (high = +105.0 C, crit = +105.0 C)
Core 1:        +47.0 C  (high = +105.0 C, crit = +105.0 C)
Core 2:        +47.0 C  (high = +105.0 C, crit = +105.0 C)
Core 3:        +48.0 C  (high = +105.0 C, crit = +105.0 C)

nvme-pci-0100
Adapter: PCI adapter
Composite:    +44.9 C  (low  =  -0.1 C, high = +99.8 C)
                       (crit = +109.8 C)

It took me half an hour of stress-ng to get the CPU to hit 65°C, which is very good for a mini PC of this size. Of course, the elephant in the room is how well it handles thermals when all four M.2 slots are populated, and I honestly don’t know, and can’t know until I get my hands on more SSDs.
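For anyone wanting to repeat this kind of test, a minimal sketch of logging the package temperature while stress-ng runs might look like the following. The helper names are hypothetical, and the parsing assumes `sensors` prints a `Package id 0:` line like the one in the output above.

```shell
# Extract the CPU package temperature (degrees C) from `sensors` output on stdin.
pkg_temp() {
  awk '/^Package id 0:/ { gsub(/[^0-9.]/, "", $4); print $4; exit }'
}

# Print one timestamped reading; meant to be called in a loop during a stress test.
log_temp() {
  printf '%s %s\n' "$(date +%H:%M:%S)" "$(sensors | pkg_temp)"
}

# Example (commented out so that loading this file has no side effects):
#   stress-ng --cpu 4 --timeout 30m &
#   while kill -0 $! 2>/dev/null; do log_temp; sleep 10; done
```

The same approach would work for the NVMe `Composite` reading once more slots are populated, with a different match pattern.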

Alternative OS options

This is, of course, where comes in in most of my reviews of mini PCs these days, and the NestDisk is no exception—I haven’t done it yet, but given its decent CPU, dual 2.5GbE, and four M.2 slots, it makes a lot of sense as a tiny hypervisor host that can run multiple VMs or containers, and I see no reason why it wouldn’t work well for that.

Conclusion

At this point, I think the NestDisk makes sense for three very specific use cases:

  • If you want an SSD-only NAS (and you’ve accepted that the drives cost more than the box, especially in this day and age).
  • If you have (or are moving to) 2.5GbE, and you care about improving the sustained transfer rates you’d get from an existing machine (say an HDD-based gigabit NAS).
  • If you value small and low power more than infinite expandability.

The NestDisk is appealing to me because despite coming from a standard N150 reference design, it is opinionated hardware: it prioritizes storage density and decent networking in a form factor that’s closer to a portable drive enclosure than a mini PC.

If the NVMe thermals perform well (which I haven’t been able to confirm), it can be a genuinely good always-on NAS or home media server for people who have already moved to SSDs and want something smaller and faster than a traditional SATA box.

Notes for January 1-18

Return to work happened mostly as expected–my personal productivity instantly tanked, but I still managed to finish a few things I’d started during the holiday break–and started entirely new ones, which certainly didn’t help my ever-growing backlog.

Herding Agents

As a way to chill out after work, I have been building more tooling, and since a lot of people are suddenly worried about sandboxing AI agents (which I’ve been doing via for a while now), I decided to fish out my old Azure development sandboxes and build an agentic one for myself, .

I’ve since rebranded it to agentbox, and had a lot of fun doing an icon for it:

Yes, Mr. Anderson, I'm part AI-generated, but Pixelmator is great for tweaking faces

In short, the agentbox container gives you a few bundled TUI agents (plus , , Docker in Docker and a bunch of development tools), and the docker compose setup makes it trivial to spin up agents with the right workspace mappings, plus to get the results back out to my laptop.

That led me down a few rabbit holes regarding actually getting access to the containers. The first trick is just attaching to the container consoles themselves using a trivial Makefile rule:

enter-%: ## Enter tmux in the named agent container (usage: make enter-<name>)
  docker exec -u agent -it agent-$* sh -c "tmux new -As0"

The second is having plain browser access to the containers. Rather than taking the trouble of building (or re-purposing) yet another web terminal wrapper, I took a more direct approach:

Since many AI agent TUIs use Textual and it can serve the entire UI via HTTP, I submitted patches to both Toad and Mistral Vibe to do that and make it even easier to access the sandbox.

But since I am also making a full RDP server available to each sandbox (because I want agents to be able to run a browser and playwright for UI testing), I decided to tackle another of my longstanding annoyances.

You see, one of the things that has been at the back of my mind for years is that Apache Guacamole seems to be the only decently maintained answer for connecting to RDP servers via a browser–and I find it to be a resource hog for most of my setups.

So this Friday I hacked at a three-year-old RDP HTML5 client until it worked with modern RDP servers. I don’t need a lot of fancy features or high-efficiency encoding to connect inside the same VM and I trust traefik and authelia to provide TLS and stronger authentication, so I aim to keep it simple:

This is a bit meta, but it is working great for a first pass

But of course I couldn’t stop there… In a classic “belt and suspenders” move, and since I’d like a generic web terminal solution that I can have full control over, I spent a few hours this Sunday afternoon hacking together textual-webterm as well.

Which… was completely unplanned, took away four hours of time I am never getting back (and that I needed today), and means I need to cut back on all these side projects since I’m already behind on so many things.

Telemetry Antics

I finally started pushing my homelab metrics to Graphite, and even if I have begrudgingly accepted I’ll probably have to live with Grafana for a little while, I mostly managed to figure out a simple (and relatively straightforward) data collection strategy using Telegraf as a sort of “universal” collector.

This did, however, sort of balloon out of control for a while because getting the metrics namespace the way I wanted it took a fair bit of doing–something I might write about separately.
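To illustrate why the namespace matters, here is a small sketch of Graphite's plaintext protocol, where every data point is a single `path value timestamp` line and the dotted path is the namespace. The `graphite_line` helper, the `homelab` prefix and the host name are assumptions for illustration, not the actual Telegraf configuration used here.

```shell
# Build one line of Graphite's plaintext protocol: "<path> <value> <timestamp>".
# Dots in the host name are replaced so they don't create extra path levels.
graphite_line() {
  local prefix="$1" host metric="$3" value="$4" ts="${5:-$(date +%s)}"
  host=$(printf '%s' "$2" | tr . _)
  printf '%s.%s.%s %s %s\n' "$prefix" "$host" "$metric" "$value" "$ts"
}

graphite_line homelab nas.local cpu.iowait 3.5 1700000000

# To actually send a point (host is a placeholder; 2003 is Carbon's usual plaintext port):
#   graphite_line homelab "$(hostname)" cpu.iowait 3.5 | nc -w1 graphite.example 2003
```

The example call prints `homelab.nas_local.cpu.iowait 3.5 1700000000`, which shows how a stray dot in a host name would otherwise add an unwanted level to the metric tree.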

Additionally, I realized that most application observability solutions out there are overkill for my local development needs, so I hacked together a (relatively simple) OpenTelemetry to Graphite proxy, and following a trend of going back to and creating whimsical logos, I called it gotel:

I know, I've gone overboard on cute icons too

And, of course, the instant you have observability you start spotting issues–in my case, was completely tanking the CPU on my NAS, so I spent a few evenings trying to tweak the configuration file, switching container images, etc., and think I fixed it:

My Synology stats look much better now, especially I/O wait across 5 drives

You see, a serious problem with is that it insists on re-scanning folders at intervals (and even then with an element of randomness), which completely tanks CPU and I/O on low-end machines–especially NAS devices with hard disks.

It also has no way to schedule that regular maintenance sweep, so I created syncthing-kicker to see if I can get it to only do that during the wee hours.
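The general idea of only kicking off scans during quiet hours can be sketched roughly as below. This is not how syncthing-kicker is implemented (I am only illustrating the concept): the function names, the 02:00 to 05:00 window, the environment variables and the folder ID are all assumptions. The only real interface used is Syncthing's REST endpoint for requesting a folder scan, and the periodic rescan would still need to be lengthened or disabled in the folder settings for this to help.

```shell
# Succeed if the given hour (00-23) falls inside the quiet window [start, end).
in_quiet_hours() {
  local hour="${1#0}" start="${2:-2}" end="${3:-5}"
  [ "$hour" -ge "$start" ] && [ "$hour" -lt "$end" ]
}

# Ask Syncthing to rescan one folder via its REST API.
# SYNCTHING_URL and SYNCTHING_API_KEY are placeholders you would set yourself.
kick_scan() {
  curl -fsS -X POST -H "X-API-Key: ${SYNCTHING_API_KEY}" \
    "${SYNCTHING_URL:-http://localhost:8384}/rest/db/scan?folder=$1"
}

# Example, e.g. from an hourly cron job (folder ID "default" is an assumption):
#   in_quiet_hours "$(date +%H)" && kick_scan default
```

Stripping the leading zero from the hour avoids `08` and `09` being misread by shells that treat zero-prefixed numbers as octal.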

And this is just half of what I have been up to this couple of weeks–I still have a huge backlog of stuff to finish, including a number of posts I’ve been putting out as I finish them…

My Rube Goldberg RSS Pipeline

Like everybody else on the Internet, I routinely feel overwhelmed by the volume of information I “have” to keep track of.


Notes on SKILL.md vs MCP

Like everyone else, I’ve been looking at SKILL.md files and tried converting some of my tooling into that format. While it’s an interesting approach, I’ve found that it doesn’t quite work for me as well as does, which is… intriguing.


When OpenCode decides to use a Chinese proxy

So here’s my cautionary tale for 2026: I’ve been testing toadbox, my very simple, quite basic coding agent sandbox, with various .


Lisbon Film Orchestra

Great start to the show
A little while ago, in a concert hall not that far away…

How I Manage My Personal Infrastructure in 2026

As regular readers would know, I’ve been on the homelab bandwagon for a while now. The motivation for that was manifold, starting with the pandemic and a need to have a bit more stuff literally under my thumb.


Notes for December 25-31

OK, this was an intense few days, for sure. I ended up going down around a dozen different rabbit holes and staying up until 3AM doing all sorts of debatably fun things, but here’s the most notable successes and failures.


TIL: Restarting systemd services on sustained CPU abuse

I kept finding avahi-daemon pegging the CPU in some of my LXC containers, and I wanted a service policy that behaves like a human would: limit it to 10%, restart immediately if pegged, and restart if it won’t calm down above 5%.

Well, turns out systemd already gives us 90% of this, but the documentation for that is squirrely, and after poking around a bit I found that the remaining 10% is just a tiny watchdog script and a timer.

Setup

First, contain the daemon with CPUQuota:

sudo systemctl edit avahi-daemon
[Service]
CPUAccounting=yes
CPUQuota=10%
Restart=on-failure
RestartSec=10s
KillSignal=SIGTERM
TimeoutStopSec=30s

Then create a generic watchdog script at /usr/local/sbin/cpu-watch.sh:

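#!/bin/bash
set -euo pipefail

UNIT="$1"
INTERVAL=30    # seconds; should match OnUnitActiveSec in the timer
QUOTA_PCT=10   # should match CPUQuota in the service override

# Policy thresholds, in nanoseconds of CPU time per interval
PEGGED_NS=$((INTERVAL * 1000000000 * QUOTA_PCT / 100 * 9 / 10)) # ~90% of the quota
SUSTAINED_NS=$((INTERVAL * 1000000000 * 5 / 100))               # 5% of one CPU

STATE="/run/cpu-watch-${UNIT}.state"

current=$(systemctl show "$UNIT" -p CPUUsageNSec --value)
# Skip if the counter is not numeric (e.g. "[not set]" while the unit is stopped)
[[ "$current" =~ ^[0-9]+$ ]] || exit 0

previous=""
[[ -f "$STATE" ]] && previous=$(cat "$STATE")
echo "$current" > "$STATE"

# No baseline yet, or the counter went backwards after a restart: nothing to compare
if [[ -z "$previous" ]] || (( current < previous )); then
  exit 0
fi

delta=$((current - previous))

# Restart if pegged (hitting CPUQuota)
if (( delta >= PEGGED_NS )); then
  logger -t cpu-watch "CPU pegged for $UNIT (${delta}ns), restarting"
  systemctl restart "$UNIT"
  exit 0
fi

# Restart if consistently above 5%
if (( delta >= SUSTAINED_NS )); then
  logger -t cpu-watch "Sustained CPU abuse for $UNIT (${delta}ns), restarting"
  systemctl restart "$UNIT"
fi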

…and mark it executable: sudo chmod +x /usr/local/sbin/cpu-watch.sh

It’s not ideal to have hard-coded thresholds or to hit storage frequently, but in most modern systems /run is a tmpfs or similar, so for a simple watchdog this is acceptable.

The next step is to figure out how to use it via systemd templates:

# cat /etc/systemd/system/cpu-watch@.service
[Unit]
Description=CPU watchdog for %i
After=%i.service

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/cpu-watch.sh %i.service
# cat /etc/systemd/system/cpu-watch@.timer
[Unit]
Description=Periodic CPU watchdog for %i

[Timer]
OnBootSec=2min
OnUnitActiveSec=30s
AccuracySec=5s

[Install]
WantedBy=timers.target

The trick I learned today was how to enable it with the target service name:

sudo systemctl daemon-reload
sudo systemctl enable --now cpu-watch@avahi-daemon.timer

You can check it’s working with:

sudo systemctl list-timers | grep cpu-watch
# this should show the script restart messages, if any:
sudo journalctl -t cpu-watch -f

Why This Works

The magic, according to Internet lore and a bit of LLM spelunking, is in using CPUUsageNSec deltas over a timer interval, which has a few nice properties:

  • Short CPU spikes are ignored, since the timer provides natural hysteresis
  • Sustained abuse (>5%) triggers restart
  • Pegged at quota (90% of 10%) triggers immediate restart
  • Runaway loops are contained by CPUQuota
  • Everything is systemd-native and auditable via journalctl
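To make the numbers in that list concrete, here is a small worked example of the arithmetic, assuming a 30 second timer interval and a 10% CPUQuota (which systemd expresses relative to one CPU). The helper names are hypothetical and exist only for this illustration.

```shell
# CPU time budget (in ns) that a given percentage of one core represents over a window.
budget_ns() {
  local interval_s="$1" pct="$2"
  echo $(( interval_s * 1000000000 * pct / 100 ))
}

# Convert a CPUUsageNSec delta back into a percentage of one core over the window.
delta_pct() {
  local delta_ns="$1" interval_s="$2"
  echo $(( delta_ns * 100 / (interval_s * 1000000000) ))
}

quota=$(budget_ns 30 10)                      # 10% quota over 30s -> 3000000000
echo "quota:     $quota"
echo "pegged:    $(( quota * 9 / 10 ))"       # 90% of the quota   -> 2700000000
echo "sustained: $(budget_ns 30 5)"           # 5% of one core     -> 1500000000
echo "example:   $(delta_pct 2100000000 30)%" # 2.1s of CPU in 30s -> 7%
```

In other words, a service that burns 2.1 seconds of CPU in a 30 second window sits at 7%: above the 5% "sustained" line but below the 9% "pegged" line, so it would be restarted with the milder log message.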

It’s not perfect, but at least I got a reusable pattern/template out of this experiment, and I can adapt this to other services as needed.

Ovo

Yeah, I don’t know what the grasshoppers want with the egg either
Another great evening spent in the company of Cirque du Soleil

Predictions for 2026

I had a go at doing predictions for 2025. This year I’m going to take another crack at it—but a bit earlier, to get the holiday break started and move on to actually relaxing and building fun stuff.

