Profile for nesterow
About nesterow
Fields
- Business
- https://vski.ai
- Code
- https://vski.sh
Bio
This is my anti-social account on the Fediverse. It’s a dedicated space for project updates, life snippets, and self-indulgence.
Stats
- Posts
- 21
- Followed by
- hidden
- Following
- hidden
What can a tiny ML model do?
If your phone has an IR sensor, it's possible to determine what you're paying attention to, and possibly infer your emotional responses.
This is just a useful side effect of Face ID, autofocus, and other features.
It's similar to voice assistants' background scanning: while listening for a trigger keyword, they may pick up something else.
Nope, no one burns servers just to send your face to Google - that isn't how it works.
What can a tiny ML model do? Things like that, and sometimes more, depending on how it's applied.
Don't underestimate or discard simple, powerful ML techniques because of LLM hype.
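For a sense of scale: a "tiny model" here can literally be a handful of numbers. This logistic-regression sketch is entirely invented (features, weights, threshold are made up), just to show the shape of such a classifier:

```python
import math

# Pretend these weights came from training on labeled IR frames (invented).
W, B = [2.0, -1.5], -0.3

def attention_score(ir_brightness: float, head_motion: float) -> float:
    # Logistic regression: two numbers in, a probability out.
    z = W[0] * ir_brightness + W[1] * head_motion + B
    return 1 / (1 + math.exp(-z))

# Strong IR return, little head motion -> likely looking at the screen.
looking = attention_score(0.9, 0.1) > 0.5
```

Three parameters, no GPU, runs in microseconds - which is exactly why such side-effect inference is cheap to ship.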
DeepSeek actually figured out a somewhat reliable way to embed "memory" during training.
It's an n-gram, so me calling it "memory" is an oversimplification. But in a couple of words:
They've added a new component, "Engram", which is just a hash table mapping n-grams to embeddings. These embeddings are adjusted during training, so during inference they affect activations. The result is that the model has to do less "guessing" because it has some bits of factual information, which unsurprisingly improves accuracy and other metrics.
How is it different from RAG? While RAG improves the "summary" and augments the input, the Engram improves certainty in particular details because it works alongside the transformer layers, so the augmentation happens while the model generates tokens.
Arxiv: 2601.07372v1
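As I understand it, the idea can be sketched in a few lines. Everything below (the class name, sizes, and where the addition happens) is my own toy simplification, not DeepSeek's actual implementation:

```python
import random

class Engram:
    """Toy n-gram -> embedding lookup table (a simplification for illustration)."""
    def __init__(self, dim: int, n: int = 2, seed: int = 0):
        self.dim, self.n = dim, n
        self.rand = random.Random(seed)
        self.table: dict[tuple, list[float]] = {}  # entries would be trained in practice

    def lookup(self, tokens: list[str]) -> list[float]:
        # Hash the trailing n-gram; unseen keys get a fresh (trainable) vector.
        key = tuple(tokens[-self.n:])
        if key not in self.table:
            self.table[key] = [self.rand.gauss(0, 0.02) for _ in range(self.dim)]
        return self.table[key]

def augmented_activation(hidden: list[float], engram: Engram, tokens: list[str]) -> list[float]:
    # The looked-up embedding is added to layer activations during inference,
    # giving the model bits of memorized detail instead of forcing it to guess.
    mem = engram.lookup(tokens)
    return [h + m for h, m in zip(hidden, mem)]

engram = Engram(dim=8)
out = augmented_activation([0.0] * 8, engram, ["paris", "is"])
```

The contrast with RAG, per the post: RAG augments the input once before generation; this table nudges activations at every generated token.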
AI Providers Will Increase Prices
As I've been saying, trying to solve problems employing a "quantity over quality" approach is a failing experiment.
Initially the hope was that a high volume of data from regular users would help improve AI models. In an ideal scenario the customers would work as janitors, fixing mistakes and thus improving accuracy and other important metrics. This worked for a while for the software industry and coding models. Many developers were happy "to share" their codebases and internal dev processes, working as code janitors. It worked. Now even small coding models are decent enough to improve average dev productivity.
There is a problem: what works in one case may not be a good case study for the others.
The software industry was pretty much automated before the AI hype. From a couple of discussions I had with my friends at Microsoft and a couple of middleman AI providers, it turned out that 95% of inference goes to processing essentially the same queries. Only about 5% may carry something worth stealing. And I'm talking about cleaned and processed data; I would argue it's even less.
The solution is simple. There won't be free tiers anymore and there'll be conditions attached.
The other problem they don't see yet is that people who have valuable experience and work on something worth stealing understand this business model very well. There is a reason we'd rather buy a €5k PC to run local LLMs than use Claude & Co. There is a reason people and companies switch to Linux. And I have a feeling local AI might become cheaper, all things considered.
Oh, btw. Dear Google, if you think the disclaimer "We don't use your org data to train models" convinced anyone - you're wrong)
Crystal is good for humans and LLMs.
Crystal is a compiled language inspired by Ruby. I'd kept my eye on it for quite a while, and I made myself productive in it in one day.
The amount of code you have to review and refactor has grown severalfold.
So recently I've been looking for something easy to review, idiomatic, and portable.
About 10 years ago I had some experience with Ruby. I liked the language, but the market decided what we had to use, so Ruby became locked into the RoR ecosystem. It got me thinking: wouldn't that be an advantage? Being a one-framework language means it is idiomatic in every way possible.
So I ran an experiment. I gave a coding agent the task of rewriting one of my services in Crystal, providing only the tests and documentation. If Ruby is so idiomatic, wouldn't Crystal be the same?
It was a success. I got 50% less code than the original, the logic is easy to trace, and during the session the agent made fewer mistakes. It took me less time to review and fix. Considering that the language is new to me and not so popular, I think it's a huge success.
I'll see how it goes from now on.
#crystallang #ruby
Android Desktop Mode
I tried it and I like this direction. It still needs some polishing, but in general I believe this is the kind of "Linux" most people will use.
It looks like Apple, with their Mac Neo, will get formidable competition.
Docking phones and using them as a regular PC isn't a new idea; I anticipated it would happen ~5 years ago. Android phones have enough processing power to be a daily driver and replace a PC for most people.
Developing my own coding agent for nvim using my own coding agent for nvim 😬
Vigent - My Nvim native Agent.
I am developing a coding agent with an Nvim interface.
The Nvim interface is not the main goal, and I wasn't a heavy Vim user until recently. I've been using motions and some basic features in Zed and VSCode, but when I started developing my own agents (not only for coding, btw) I got several reasons to use Neovim instead.
When you need a fully extensible editor, Nvim is the best option. It's not only about coding anymore: when you develop an agent you need to iterate faster and debug sessions, executions, workflows, etc. And you create your own dev experiences for it, because Langchain & Co are shitty tools. Especially when you develop local agents with ~1B models...
https://vski.sh/x/vigent
* While in early alpha it is not distributed as a plugin, because for dev and testing it's easier to embed Lua in the binary. Later I am going to separate concerns.
New tech, old challenges.
Developing agents isn't too hard. Most of them start as a simple loop with a tool registry and some sugar.
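That loop, in miniature. The registry, the canned "model", and the message shapes are all invented for illustration:

```python
import json

# Hypothetical tool registry: names and callables are illustrative only.
TOOLS = {
    "add": lambda a, b: a + b,
}

def fake_llm(messages: list[dict]) -> dict:
    # Stand-in for a real model call: first asks for a tool, then answers.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    return {"answer": "2 + 3 = 5"}

def agent_loop(prompt: str) -> str:
    messages = [{"role": "user", "content": prompt}]
    while True:
        reply = fake_llm(messages)
        if "answer" in reply:            # model is done, return the final text
            return reply["answer"]
        tool = TOOLS[reply["tool"]]      # dispatch the requested tool
        result = tool(**reply["args"])
        messages.append({"role": "tool", "content": json.dumps(result)})
```

The "sugar" is everything around this: retries, context management, sandboxing. The loop itself stays this small.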
Honestly, I don't understand why some software devs are proud of reverse-engineering clawbot or similar achievements.
There are some challenges that differ from "classic" software dev. Most of them are related to fighting non-deterministic behavior, in a much broader sense than we're used to. The other issues are domain-specific.
Everything else is pretty much the same, though in some cases it's more engaging because this field is not yet fully explored.
For example, I am currently playing with long-term memory. Do you think RAG techniques are all explored? Nope. The keyword is "techniques." 99% of the info about RAG is basic prompt-augmentation examples.
Agent engineering? The same "classic" problems.
If you have a simple agent loop, you get a simple solution. If you add "sub-agents", parallel execution, and more than one consumer, the complexity grows by factors, the same as always.
Use cases and applications? I know at least 10 commercially viable ones. Though we need this bubble to pop before discussing them seriously.
Most VibeCoders Need Therapy
On one hand, vibecoding is an addiction. Software engineers get dopamine from solving problems, so now that some problems seem easier to solve, people try to achieve more. In other words, to achieve results by volume, by trying every idea they've ever had.
On the other hand, the bad consequence of this is pure FOMO that fuels a horrible rat race: a silent competition among people who don't know wtf they are trying to achieve, but need to achieve something anyway.
And oh boy, I see how it is exploited by every AI company. I mean, we have a new type of addict who babysits agents and gets dopamine hits when they make something work. Uff.
I use agents. Your dog uses agents. Everyone uses agents.
I wonder why for some people it got old in a month, while others got addicted and lost sleep over it.
Working with agents is similar to directing people. You can achieve a lot in both cases, but only in one of them are you proud of achieving something no one needs.
Entitled People.
Usually I approach what is happening with humor, because I've accepted that many things are out of my control. People's behavior is one of them. You can influence it in some cases, but you cannot control it.
For example, open source is now experiencing a flood of entitled people. It is not limited to the tech space, but the people involved in open source have started to feel it.
Let me tell you a story.
5 years ago I was a lead engineer at iCode - a software agency specialized in Odoo ERP.
We hired a new "junior developer". He passed all the filters and seemed like a reasonable guy who understood the industry. I gave him his first task, which involved customizing the invoicing module for one of our clients. Nothing hard: adding a couple of fields in the DB and some simple logic in Python. He asked me why our software didn't have settings for it - he couldn't find them. As his senior I felt that maybe he had misinterpreted the task, and after a 15-minute conversation I understood that he had no concept of what a "developer" should do. He imagined that his job was to click buttons in the settings view, and the rest should be done by the "vendors". He had some experience in the industry, btw, but lied about the details.
How did he pass the filters? Well, it turned out he had someone else write code during the first interviews, and someone to help him with the other stages. After this fraud was exposed I started to participate in interviews, and omg, it was a circus. I've never seen as many entitled and delusional people in my life. It was a valuable experience for me, because it was the first time I interacted with so many people outside of my bubble. Yes, it turned out we all live in different bubbles.
I've heard that a tech blogger ran an experiment asking people if they could land a plane on their own. A good kind of question to filter idiots, if you ask me. I have a couple of similar captchas - one of them checks what people believe and how they interpret certain situations, because there are trained idiots who may pass a "plane landing scenario" but still take other things at face value.
Now back to open source.
Such people associate OSS with the word "free", imagining that it's produced by GitHub, like kids who think food comes from the supermarket. If you think they'd believe your reasoning about the effort required to produce something valuable - you're wrong. Man, it's the meme-coin crowd.
Also, trying to solve automation issues with quantity over quality seems to be a failing experiment. It appears most people ask AI to solve the same problems. I predict it'll just result in a Stack Overflow 2.0 with some sugar to avoid duplicating the same shit.
At this point, LLMs give a performance boost only to professionals. For everyone else they're something that reinforces their delusions and prevents them from becoming professionals.
Of course, it is not limited to our field but we have a good case study before our eyes.
Speculative Decoding is actually an interesting performance optimisation technique.
IDK how well it scales - for example, if we wanted to do partial inference on the client device. Or, more importantly, how it affects accuracy and bias.
Locally on CPU I got good results testing on Qwen Coder - roughly a 70% performance improvement.
So in theory, if I run a 4B quantized model locally and use a remote API for decoding, it could actually add some value.
Especially if we have a model optimized for decoding.
🤔
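For intuition, here's a toy version of the accept/reject loop. Both "models" are deterministic stand-ins, and a real implementation verifies all k draft tokens in a single batched forward pass of the target model rather than one at a time:

```python
# Toy speculative decoding over token IDs; both "models" are hypothetical
# stand-ins (real models return probability distributions, not single tokens).
def draft_model(prefix: list[int], k: int = 4) -> list[int]:
    # Fast, cheap model guesses the next k tokens.
    return [(prefix[-1] + i + 1) % 100 for i in range(k)]

def target_model_next(prefix: list[int]) -> int:
    # Slow, accurate model produces the single "true" next token.
    return (prefix[-1] + 1) % 100

def speculative_step(prefix: list[int], k: int = 4) -> list[int]:
    guesses = draft_model(prefix, k)
    accepted: list[int] = []
    for g in guesses:
        true_next = target_model_next(prefix + accepted)
        if g == true_next:        # guess verified: keep it
            accepted.append(g)
        else:                     # first mismatch: take the target's token, stop
            accepted.append(true_next)
            break
    return prefix + accepted
```

When the draft model agrees often, one step emits several tokens for roughly the cost of one target pass, which is where the speedup comes from.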
MCPs.
The main argument against using MCPs usually sounds like: "MCPs just pollute the context. LLMs can generate and execute code anyway, and they're getting better at it, so let the agents do just that."
Sounds reasonable. But.
When an agent writes code or shell commands, it has to take more turns, re-sending the same prompt while debugging a solution, so you don't really save on tokens or context length.
And it won't get better with time. The debug loop is part of the process. Environments differ, and the same LLM may operate in different sandboxes. So you have two choices: fine-tune for a particular shell configuration, or feed it a long prompt explaining the specifics.
So in my opinion we can't go without MCPs yet. Especially for small local models. Unless, by some miracle, someone invents an instant reinforcement-learning technique.
#ai #mcp #llms
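To make the trade-off concrete, here's the rough shape of an MCP-style tool (modeled on MCP's name/description/inputSchema tool definitions, but this is an illustrative sketch, not actual protocol wiring). The behavior is described once, so the agent doesn't have to re-derive shell commands for each sandbox:

```python
import json
import os

# Illustrative tool declaration: the agent sees this schema instead of
# guessing which shell command works in the current environment.
TOOL_SPEC = {
    "name": "list_files",
    "description": "List files in a directory on the host.",
    "inputSchema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}

def handle_call(name: str, args: dict) -> str:
    # Server-side handler: one well-defined behavior, identical everywhere.
    if name == "list_files":
        return json.dumps(sorted(os.listdir(args["path"])))
    raise ValueError(f"unknown tool: {name}")

result = handle_call("list_files", {"path": "."})
```

One declared tool replaces an entire "write a shell command, run it, read the error, retry" debug loop, which is the token saving the post is about.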
Agent skills? How about playbooks?
I borrowed the playbook concept from Ansible and created a self-improving agent for OpenCode with RAG and long-term memory. Initially I created it for infra and process management, but actually it can do everything.
What is a playbook in this agent's context? It's muscle memory. Instead of repeating debug loops every time, the agent can reuse playbooks and actions. RAG techniques allow it to save context and input tokens.
I'm currently testing it on small local models with Ollama, and the execution part works flawlessly. You need a larger model to write and debug playbooks, though.
The self-improving aspects are actually simple. There are two memory layers: playbooks and long-term memory. Long-term memory is shared between sessions, and the agent decides when it needs to record memories (it's a skill). Everything is reinforced by RAG.
The other aspect is somewhat related to RL. LLM API providers may collect agent execution loops to improve their models, but in this case you can collect personalized data for your org or yourself and fine-tune models on your own experience.
Overall, after developing and using my own agents, I’ve come to believe they are the future of user experience. Especially if you use voice input.
Everything is in alpha, and things move fast: https://vski.sh/x/vish
#ansible #aiskills #opencode #deno
Recently there was a lot of noise about digital id and age verification for using some services on the internet.
IMHO, we have to embrace it. This is more an implementation problem than anything else.
It's coming whether we want it or not. And I have a feeling that if it isn't implemented, most people will beg for it. I am not talking about stupid age-bracket laws, but about "digital ID."
In some cases we already struggle with "authenticity issues" on the internet because of AI. There are so many AI-related risks (big and small), and they can compound into a huge mess.
As a frame of reference: when was the last time you answered a phone call from an unknown caller? For the past five years, my phone has accepted calls only from numbers in my phonebook. You know why.
Ideally, I would want a decentralized worldwide ledger maintained by different actors, so it would be harder to exploit. We already have "agents" that can verify your ID and maintain records in such a ledger: banks, government ID registries, insurance companies, etc. The tech exists - we use GPG and other such tools daily.
The implementation? Register an account, verify your ID through an agent, get a cert, choose what info to share, sign whatever you need.
I see only a handful of companies collecting the data and expressing ambition to take this market, while the actors who have the data and should be developing the standards together stay silent.
It's coming for everyone. An average CEO has more privacy concerns than an average person and should be more concerned about how it gets implemented. Yet most of them are silent. So in a couple of years, don't complain that we have 5 corps dictating prices and standards - you won't be able to accept payments without them.
I am trying to integrate speech-to-text into my work process.
Well, the first experience is far from feeling like Iron Man, but there is a huge productivity boost anyway. In ~85% of cases I never touched the mouse. I've even started using it for other things, like browsing the web.
The main issues?
1. All STT tools paste text from the clipboard at the cursor. The process is relatively good when you have good keybindings to move around; otherwise it can get confusing.
2. Sometimes STT models spit out shit. Usually because of my accent, and sometimes because I use a lightweight model on CPU. These are the ~15% of cases where I actually have to edit what I dictated.
Here's what I use:
1. Handy with Moonshine Base model. It's actually fast on cpu: https://github.com/cjpais/Handy (https://cjpais.com/)
2. Zed Editor with Vim mode enabled
3. Opencode in terminal tabs
4. A couple of global keybindings to call things like search prompt in browser
In terms of experience I liked Super-STT because it's made for my desktop env, but Handy has faster models for CPU. If you use Cosmic/PopOS with a GPU, check it out: https://github.com/jorge-menjivar/super-stt
Hey Vibecoder 👋
Suffering from attention disorder?
Making stupid mistakes you'd never have made before?
Missing important details?
Learn Vim.
Turn on Vim mode in your editor. Work with text using vim motions. It keeps your focus where it should be.
It's amazing how fast our brains adapt to things that conserve energy. I've noticed that even some of the smartest people start to over-delegate their responsibilities to agents.
Over-delegation is a bad thing, and it usually results in big fuckups.
Massive kudos to Jorge Menjivar for his work on super-stt for Cosmic - it works without rituals. @hidden_layerss
https://github.com/jorge-menjivar/super-stt
#cosmic #popos #linux #stt
Some good software engineers always say you can do everything with Postgres. Actually, you can do everything with SQLite. No BS. True multitenancy, sometimes better performance, and it's super easy to extend. One DB per microservice? How about one DB per user or agent?
A year ago I found PocketBase, and after some thought, of course, I decided to build my own thing. VSKI - my own Swiss Army knife.
Why? Because there are real-world use cases for AI agents, and you can achieve the same results without cloud infra.
https://vski.ai
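The one-DB-per-user idea above, as a minimal stdlib sketch (the paths and schema are illustrative):

```python
import sqlite3
from pathlib import Path

DATA_DIR = Path("data")

def db_for(user_id: str) -> sqlite3.Connection:
    # Each tenant gets its own file: trivial backup, deletion, and migration.
    DATA_DIR.mkdir(exist_ok=True)
    conn = sqlite3.connect(DATA_DIR / f"{user_id}.db")
    conn.execute("CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)")
    return conn

alice = db_for("alice")
alice.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))
alice.commit()

bob = db_for("bob")  # a completely separate file: multitenancy by isolation
rows = bob.execute("SELECT COUNT(*) FROM notes").fetchone()
```

No connection pool, no shared schema migrations across tenants, and deleting a user is `rm data/alice.db`.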
There is one huge problem with the current RAM crisis, and another one coming after it. I predict that consumer devices will get dumber in five to seven years. Chromebook-style products will dominate the market, and anything more powerful will become a privilege.
Call me crazy, but the trend is clear. I don't see how else these investments in datacenters will get a return.
We all want to own our hardware and run AI models on-premises. The technology exists, the methods to make it cheaper exist, and the opportunities to create new markets exist. In a perfect world, that’s where we’d be going. But we are on a different path...
Oh, and I won't be surprised if in 2030 you need a licence to buy a GPU.
Hello World! 🎉
Hello Fediverse!
My anti-social corner; a feed without slop.
Here I'll post updates on my projects and sometimes share my thoughts. Maybe a bit of self-indulgence.
I am mostly about tech.
I'm not a social person, but some things happening in tech now are too disturbing to ignore. Disturbing because we finally have the tools to back our intent with action.
I have been in tech my whole life, so I see clearly the intent behind some people's actions. I believe it's time to run from centralized platforms. I am still figuring out how...