🧠 About the Project

*Autonomous UX Experimentation*

🌱 Inspiration

The project was inspired by a recurring pattern in modern development: UX problems are usually discovered late, diagnosed manually, and fixed reactively. Teams rely on guesswork, scattered feedback, and slow iteration cycles.

I wanted to explore a world where UX improvement could be proactive and autonomous: agents could experience an interface the way a human would, understand what feels confusing or inefficient, and then repair the issues directly in code.

Instead of “testing variants,” the system acts more like an AI-powered UX engineer that continuously explores a product, identifies friction points, and generates improved versions in isolated environments.


🚀 What It Does

The platform introduces an entirely new workflow:

  • Browser-use agents interact with your application as if they were real users, discovering friction and surfacing usability gaps.
  • Anthropic Claude Code autonomously implements UX improvements inside isolated workspaces.
  • Daytona sandboxes spin up clean, ephemeral environments for every improvement idea.
  • Gemini evaluates behaviors and analyzes agent logs to produce structured insight.
  • Inngest orchestrates the multi-step pipeline reliably and in parallel.

The end result is a system that transforms a GitHub repository and a UX problem into multiple improved, auto-implemented versions of your application, each fully deployed and tested.
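To make that chain concrete, here is a minimal sketch of how such a pipeline can be wired together as an Inngest function, with parallel steps for each candidate improvement. The event name, step names, and helper functions below are hypothetical stand-ins for the real integrations:

```typescript
import { Inngest } from "inngest";

// Hypothetical helpers standing in for the real agent integrations.
interface Finding { summary: string }
declare function exploreWithBrowserAgent(url: string, problem: string): Promise<string>;
declare function analyzeWithGemini(logs: string): Promise<Finding[]>;
declare function implementInSandbox(repoUrl: string, finding: Finding): Promise<string>;

const inngest = new Inngest({ id: "ux-improver" });

export const improveUx = inngest.createFunction(
  { id: "improve-ux" },
  { event: "ux/improvement.requested" },
  async ({ event, step }) => {
    // 1. A browser agent explores the deployed app and records what it did.
    const logs = await step.run("explore", () =>
      exploreWithBrowserAgent(event.data.previewUrl, event.data.problem)
    );

    // 2. Gemini turns the raw session log into structured findings.
    const findings = await step.run("analyze", () => analyzeWithGemini(logs));

    // 3. Each finding is implemented in its own sandbox; invoking the
    //    step.run calls together lets Inngest execute them in parallel.
    const previewUrls = await Promise.all(
      findings.map((finding, i) =>
        step.run(`implement-${i}`, () =>
          implementInSandbox(event.data.repoUrl, finding)
        )
      )
    );

    return { previewUrls };
  }
);
```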


🧩 How It Was Built

Building this required combining several complex technologies into one pipeline:

  • Daytona SDK for on-demand cloud developer environments
  • Claude Code Agent SDK for autonomous codebase modifications
  • Browser-use SDK for realistic, task-based agent sessions
  • Google Gemini for log analysis and insight extraction
  • Inngest for orchestrating asynchronous jobs and multi-agent workflows
  • Next.js + Bun + Elysia + PostgreSQL for the control plane, API, and frontend

The biggest challenge was ensuring that all three agents (the browser explorer, the code engineer, and the orchestrator) communicated cleanly and could operate in parallel without interfering with each other.
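Isolation is what made that parallelism tractable: each improvement idea gets its own sandbox, so agents never share files, ports, or state. A minimal sketch of the sandbox-per-idea pattern using the Daytona TypeScript SDK; the method names follow its documented surface, but treat the details (including the repo URL and idea names) as illustrative:

```typescript
import { Daytona } from "@daytonaio/sdk";

const daytona = new Daytona(); // reads DAYTONA_API_KEY from the environment

// One clean, ephemeral environment per improvement idea.
async function createPreview(repoUrl: string, idea: string): Promise<string> {
  const sandbox = await daytona.create();
  await sandbox.git.clone(repoUrl, "/workspace/app");
  await sandbox.process.executeCommand("bun install", "/workspace/app");
  // PM2 keeps the dev server alive after the setup command returns.
  await sandbox.process.executeCommand(
    `pm2 start 'bun run dev' --name ${idea}`,
    "/workspace/app"
  );
  return sandbox.id;
}

// Sandboxes are independent, so creating them in parallel is safe.
const ideas = ["simplify-nav", "inline-validation", "shorter-flow"];
const sandboxIds = await Promise.all(
  ideas.map((idea) => createPreview("https://github.com/acme/app", idea))
);
```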


๐Ÿ” What I Learned

Building an autonomous UX engineering system taught me:

  • How to design multi-agent workflows where each agent contributes a different layer of reasoning or capability
  • How to use Daytona effectively for large-scale, parallel sandbox creation
  • How to guide Claude Code to make reliable, traceable, and reversible code changes
  • How to extract actionable insight from unstructured browser logs using Gemini (sketched below)
  • How to handle state, orchestration, and error recovery using Inngest
  • How to produce a workflow that feels like a new type of intelligence layer on top of a codebase
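The Gemini step, for example, boils down to one structured-extraction call. A minimal sketch assuming the `@google/generative-ai` client; the model name, prompt, and output schema are assumptions, and production code would validate the JSON before trusting it:

```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

// Turn an unstructured browser-agent session log into structured findings.
export async function extractFindings(agentLog: string) {
  const prompt = [
    "You are a UX analyst. From this browser-agent session log, list each",
    "point of friction as a JSON array:",
    '[{ "issue": string, "evidence": string, "suggestedFix": string }]',
    "Return only JSON.",
    "",
    agentLog,
  ].join("\n");

  const result = await model.generateContent(prompt);
  // Sketch only: real code should strip code fences and validate the schema.
  return JSON.parse(result.response.text());
}
```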

🔧 Technical Innovation

This is a self-improving UX engineering system, capable of:

  • Understanding UX problems through autonomous exploration
  • Generating multiple possible improvements
  • Implementing and deploying each improvement without human intervention
  • Verifying that each improvement actually works through agent-based testing
  • Producing isolated previews for developers to inspect

The innovation lies in closing the loop: exploration → insight → code change → deployment → evaluation.

Most tools solve one piece of that loop; this platform solves the entire chain end-to-end.
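The code-change step drives the Claude Agent SDK's `query` API inside each checkout. A minimal sketch; the prompt, tool list, and option values are illustrative rather than the platform's exact configuration:

```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";

// Ask Claude Code to implement one finding inside an isolated checkout,
// keeping the full message stream as an audit trail.
export async function implementFix(repoDir: string, finding: string) {
  const transcript: string[] = [];

  for await (const message of query({
    prompt: `Fix this UX issue with a small, reversible change:\n${finding}`,
    options: {
      cwd: repoDir,                  // operate only on this checkout
      permissionMode: "acceptEdits", // apply file edits without prompting
      allowedTools: ["Read", "Edit", "Bash"],
    },
  })) {
    transcript.push(JSON.stringify(message)); // every action stays traceable
  }

  return transcript;
}
```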


๐Ÿ† Challenges

  1. Coordinating asynchronous agent activity with consistent state
  2. Keeping sandboxed development servers alive using PM2 inside Daytona
  3. Building a bi-directional webhook system for Claude Code to report back (see the sketch after this list)
  4. Ensuring browser agents behaved naturally and not like brittle scripts
  5. Designing an audit trail so every agent action is transparent and traceable
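As a sketch of the report-back channel from challenge 3 (which also feeds the audit trail from challenge 5), here is an Elysia endpoint that sandboxed agents can POST progress events to; the route, payload shape, and persistence helper are hypothetical:

```typescript
import { Elysia, t } from "elysia";

// Hypothetical persistence helper backed by PostgreSQL.
declare function saveAuditEvent(event: {
  sandboxId: string;
  phase: string;
  detail: string;
}): Promise<void>;

const app = new Elysia()
  .post(
    "/hooks/agent",
    async ({ body }) => {
      // Persist first, then react: the audit trail must never lose events.
      await saveAuditEvent(body);
      return { ok: true };
    },
    {
      // Elysia validates the payload against this schema before the handler runs.
      body: t.Object({
        sandboxId: t.String(),
        phase: t.String(), // e.g. "editing" | "testing" | "done"
        detail: t.String(),
      }),
    }
  )
  .listen(3000);
```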

🔮 Future Directions

  • Continuous “watch mode” where agents re-check UX after every commit
  • Automatic PR creation for improvements that pass evaluation
  • Visual regression and accessibility scanning
  • Cross-browser automatic comparison
  • Multi-step user journey evaluation (e.g., onboarding → checkout → review)

Built by SATHVIK VEMPATI
Powered by Daytona, Claude Code, Browser-use, Gemini, and Inngest
