Annotating for agents
AI coding tools have gotten increasingly good at understanding visual feedback. Some take an all-in-one approach: v0, Lovable, Bolt, and Cursor's visual editor let you select elements directly and iterate on what you see. Others, like Claude Code, are more flexible but rely on screenshots and descriptions.
There's often a tradeoff: you can point at elements but stay locked into one tool's environment, or use any tool you like but describe everything in text. The tradeoff becomes especially obvious when working on animations and interactions, which is what got me thinking about how to give better visual feedback to agents in general.
Bridging the gap
When you describe a problem in words, precision gets lost. "The button hover feels sluggish": which part? The delay before it starts? The duration? The easing? "The sidebar looks off": which element? What's off about it? I know exactly what I mean when I see it, but translating that into text loses information.1
The agent has to guess which element you mean, search for it in the codebase, and hope it found the right one. The more specific your feedback, the less guesswork for the agent, but specificity is tedious to type out. Screenshots are only somewhat better because the agent still has to infer which part you're referring to.
Precision tends to decrease as feedback moves from observation to description.
The harder something is to describe, the more helpful it is to just point at it.
A different approach
For my own projects, I've been working on a tool to help with this. It overlays directly on localhost, letting me click on any element (or pause an animation to catch a mid-transition state) and leave a note.
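As a rough sketch of what that interaction could look like in the browser, the overlay can freeze animations with the Web Animations API and record whichever element gets clicked. The names here (CapturedElement, captureOnClick) are illustrative, not the actual tool's API.

```ts
// Sketch only: freeze animations mid-flight, then capture the next clicked element.

interface CapturedElement {
  tag: string;
  classes: string[];
  rect: DOMRect;
}

function pauseAnimations(): void {
  // Web Animations API: pauses every running animation/transition at its
  // current time, so a mid-transition state can be annotated.
  document.getAnimations().forEach((animation) => animation.pause());
}

function captureOnClick(onCapture: (el: CapturedElement) => void): void {
  document.addEventListener(
    "click",
    (event) => {
      // Swallow the click so the page doesn't react while annotating.
      event.preventDefault();
      event.stopPropagation();
      const target = event.target as HTMLElement;
      onCapture({
        tag: target.tagName.toLowerCase(),
        classes: Array.from(target.classList),
        rect: target.getBoundingClientRect(),
      });
    },
    { capture: true, once: true }
  );
}
```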
The trick is what it captures alongside that note: class names, selectors, positions, and element context. When I'm done, it generates agent-agnostic markdown I can paste into Claude Code, Cursor, or whatever I'm using.2 The agent can grep for the exact selector instead of guessing which "blue button" I meant.
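Here's a hedged sketch of that handoff step: deriving a selector from the clicked element and rendering the notes as plain markdown. The field names and output layout are assumptions for illustration, not the tool's real format.

```ts
// Sketch only: turn captured element context plus a note into greppable markdown.

interface Annotation {
  note: string;
  selector: string; // e.g. "button.submit-btn"
  classes: string[];
  rect: { x: number; y: number; width: number; height: number };
}

function buildSelector(el: HTMLElement): string {
  // Prefer a stable id; otherwise fall back to tag plus class names.
  if (el.id) return `#${el.id}`;
  const classes = Array.from(el.classList).map((c) => `.${c}`).join("");
  return `${el.tagName.toLowerCase()}${classes}`;
}

function toMarkdown(annotations: Annotation[]): string {
  return annotations
    .map(
      (a, i) =>
        `## Note ${i + 1}\n` +
        `- Note: ${a.note}\n` +
        `- Selector: \`${a.selector}\`\n` +
        `- Classes: ${a.classes.join(", ") || "(none)"}\n` +
        `- Position: ${Math.round(a.rect.x)},${Math.round(a.rect.y)} ` +
        `(${Math.round(a.rect.width)}×${Math.round(a.rect.height)})`
    )
    .join("\n\n");
}
```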
With this approach, I can give feedback the way I naturally think: visually and in real time. Notice how the notes in the demos above are short. "Slow this down." "Make this more rounded." When you can point directly at something, you don't need to laboriously specify which button or which spinner. The context is already captured. These small differences add up to a process that feels more creative and less cumbersome.
What I see gets converted into what the agent needs automatically. It's a tighter feedback loop that feels far more collaborative.
Closing thoughts
The core insight: pointing beats describing.
The specific implementation will evolve, but I think the underlying principle will remain true. Any approach that lets you react naturally to what you see, while capturing context automatically, is going to produce better results.
Footnotes
1. Style guidelines help here. The .md files that capture design preferences give agents a baseline. But guidelines describe principles; feedback addresses instances. ↩
2. There are other tools pointing in this direction. Vercel's toolbar lets you annotate previews. React Grab does element selection. Figma MCP connects designs to code. What I wanted was something focused purely on feedback: capture what you're looking at, hand it off to whatever you're already using. ↩