Inspiration

As a developer, my browser is my workspace, and for years it has been in a familiar state of digital chaos. Like so many others, I found myself drowning in a sea of tabs: dozens of articles, documentation pages, and reference links, all essential at one point but now contributing to a significant cognitive drain. Switching contexts was slow, finding information was a chore, and the sheer number of open tabs was a constant, low-level distraction.

The announcement of the Google Chrome Built-in AI Challenge 2025 was the catalyst. The promise of powerful, on-device models like Gemini Nano presented the perfect opportunity to build the tool I had always wanted: a truly intelligent, proactive, and, most importantly, completely private tab manager. This was the moment the idea for Tab Agent was born.

What it does

Tab Agent is a smart Chrome Extension that uses Google's on-device AI to automatically organize your tabs, reduce clutter, and help you focus on what's important. At its core, it analyzes your open tabs, groups them contextually based on your goals, and helps you decide what to keep and what to close, transforming chaos into a curated workspace.

Key Features:

  • AI-Powered Tab Grouping: It uses the Prompt API with the built-in Gemini Nano model to create meaningful, intuitive groups like "Research Papers" or "Developer Docs" based on tab content and your current goal.
  • Intelligent On-Device Summarization: Tab Agent employs a two-part strategy using the Summarizer API:
    • A low-priority Background Cacher proactively summarizes inactive tabs during idle time.
    • A Just-in-Time (JIT) checker ensures your most important, high-priority tabs are summarized right before a cleaning operation, guaranteeing the AI has the best possible context.
  • Privacy-Preserving Priority Engine: It intelligently ranks tabs based on browser facts like recency and engagement, not by snooping on your content.
  • Dual Autonomy Levels: Choose between Confirm Mode, where you approve the agent's plan, or Auto Mode, for scheduled, silent cleanups that run in the background.
  • Full Cleaning History: Easily view and restore any tab from a previous cleaning session.
  • Optional Notion Integration: It can automatically save a clean, organized report of each cleaning session to a Notion database, creating a persistent archive of your work.

How I built it

The development process for Tab Agent was a sequential journey, starting with the user-facing foundation and progressively adding layers of intelligence.

I began by setting up a modern tech stack with React, TypeScript, and Vite, building the core side panel UI to list and interact with open tabs. The next crucial step was to create the privacy-first priority engine, which scored tabs based on reliable browser facts like recency and engagement. This gave the application its initial layer of "smart" awareness without touching user content.
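A minimal sketch of how such a content-free score could be computed is shown below. The field names, the one-hour half-life, and the 0.6/0.4 weights are illustrative assumptions, not the values Tab Agent actually uses; only the idea of combining recency and engagement comes from the description above.

```typescript
// Hypothetical shape: browser-level facts only, never page content.
interface TabFacts {
  id: number;
  lastAccessedMs: number;  // e.g. read from chrome.tabs.Tab.lastAccessed
  activationCount: number; // assumed engagement counter kept by the extension
}

// Assumed half-life: the recency signal halves every hour.
const HALF_LIFE_MS = 60 * 60 * 1000;

function scoreTab(tab: TabFacts, nowMs: number): number {
  const ageMs = Math.max(0, nowMs - tab.lastAccessedMs);
  // 1 when the tab was just used, decaying toward 0 as it ages.
  const recency = Math.pow(0.5, ageMs / HALF_LIFE_MS);
  // 0 for a tab that was never activated, approaching 1 with heavy use.
  const engagement = 1 - 1 / (1 + tab.activationCount);
  // Assumed weights; a real engine would tune these.
  return 0.6 * recency + 0.4 * engagement;
}

function rankTabs(tabs: TabFacts[], nowMs: number): TabFacts[] {
  // Copy before sorting so the caller's array is left untouched.
  return [...tabs].sort((a, b) => scoreTab(b, nowMs) - scoreTab(a, nowMs));
}
```

Because the score depends only on timestamps and counters, it can be recomputed cheaply on every tab event without reading any page.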

With the foundation in place, I integrated Chrome's on-device AI. I first implemented the Summarizer API to generate on-device summaries, designing an intelligent caching system with a background worker and a Just-in-Time (JIT) process to ensure summaries were available efficiently.
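The Just-in-Time half of that caching system can be sketched as a simple selection step: before a cleaning run, look at the highest-priority tabs and pick out the ones whose summary is missing or too old. The names, the URL-keyed cache, and the 30-minute freshness window below are assumptions made for illustration; the actual call to the Summarizer API is deliberately left out.

```typescript
interface RankedTab {
  id: number;
  url: string;
}

interface CachedSummary {
  text: string;
  createdAtMs: number;
}

// Assumed freshness window for a cached summary.
const MAX_AGE_MS = 30 * 60 * 1000;

function tabsNeedingJitSummary(
  rankedTabs: RankedTab[],           // highest priority first
  cache: Map<string, CachedSummary>, // keyed by URL (an assumption)
  topN: number,
  nowMs: number,
): RankedTab[] {
  // Only the top-N tabs are worth blocking the cleaning run for.
  return rankedTabs.slice(0, topN).filter((tab) => {
    const cached = cache.get(tab.url);
    return cached === undefined || nowMs - cached.createdAtMs > MAX_AGE_MS;
  });
}
```

The background cacher can fill the same map at low priority during idle time, so in the common case this function returns an empty list and the cleaning run starts immediately.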

This summary data then became the fuel for the core feature: the AI grouping powered by the Prompt API with Gemini Nano. I engineered a clear and resilient prompt that instructs the model to return a structured JSON array of tab groups, turning its creative power into a reliable and predictable tool.
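Asking for JSON is only half of making the output predictable; the reply still has to be validated before anything acts on it. The sketch below shows one defensive way to do that. The `{ name, tabIds }` schema is an assumed shape for illustration, not necessarily the one Tab Agent's prompt requests.

```typescript
// Assumed schema for one group in the model's reply.
interface TabGroup {
  name: string;
  tabIds: number[];
}

function parseTabGroups(raw: string, openTabIds: Set<number>): TabGroup[] {
  // Models sometimes wrap JSON in extra prose; keep only the outermost array.
  const start = raw.indexOf("[");
  const end = raw.lastIndexOf("]");
  if (start === -1 || end <= start) return [];

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return []; // malformed JSON: the caller can retry or fall back
  }
  if (!Array.isArray(parsed)) return [];

  const groups: TabGroup[] = [];
  for (const item of parsed) {
    if (typeof item !== "object" || item === null) continue;
    const { name, tabIds } = item as { name?: unknown; tabIds?: unknown };
    if (typeof name !== "string" || !Array.isArray(tabIds)) continue;
    // Drop ids the model invented or that no longer refer to an open tab.
    const validIds = tabIds.filter(
      (id): id is number => typeof id === "number" && openTabIds.has(id),
    );
    if (validIds.length > 0) groups.push({ name, tabIds: validIds });
  }
  return groups;
}
```

Returning an empty array on any failure keeps the caller's logic simple: no groups means no action, which is the safe default for a tool that closes tabs.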

Finally, with the AI engine complete, I built the user control systems around it: the cleaning logic, the dual autonomy modes, the scheduled alarms, and the full cleaning history panel. This layered approach ensured that each feature was built on a solid, functional foundation.

Challenges I ran into

The single greatest challenge was architecting a resilient AI agent that could operate reliably within the highly asynchronous and stateful browser environment, all while respecting the hard constraints of the on-device AI. The core problem was multi-faceted: browser tabs can be opened or closed by the user at any moment, while AI operations like summarization are slow, asynchronous tasks. This created a constant risk of the agent acting on stale data. Furthermore, the Prompt API has a strict token limit, meaning I couldn't simply send all tab data without causing the prompt to fail.

The solution required a multi-layered architecture that I'm incredibly proud of:

  • Taming the Token Limit: My initial prompts failed when too many tabs were open. I solved this with a tiered fallback strategy: the agent gracefully degrades the amount of data it sends per tab, so the prompt still fits within the limit as the number of open tabs grows.

  • Handling Asynchronicity: To prevent race conditions, I implemented a "state re-synchronization" step so that the agent never acts on stale tab data. Combining this state-syncing with dynamic prompt optimisation was the key to making Tab Agent a truly reliable and powerful tool.
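Both defenses can be illustrated with a short sketch. It assumes three tiers of tab description, a rough four-characters-per-token estimate in place of the model's real token accounting, and a plan made of close actions; all names here are hypothetical and the real tiers may differ.

```typescript
interface TabInfo {
  id: number;
  title: string;
  url: string;
  summary?: string;
}

// Assumption: about 4 characters per token, as a stand-in for real accounting.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Tiers ordered from richest to leanest description of a tab.
const full = (t: TabInfo): string => `${t.id} | ${t.title} | ${t.url} | ${t.summary ?? ""}`;
const noSummary = (t: TabInfo): string => `${t.id} | ${t.title} | ${t.url}`;
const titleOnly = (t: TabInfo): string => `${t.id} | ${t.title}`;
const TIERS = [full, noSummary, titleOnly];

function buildTabListing(tabs: TabInfo[], tokenBudget: number): string {
  // Try each tier in turn and return the richest one that fits.
  for (const describe of TIERS) {
    const listing = tabs.map(describe).join("\n");
    if (estimateTokens(listing) <= tokenBudget) return listing;
  }
  // Last resort: keep the leanest tier and drop tabs from the end
  // (assumes `tabs` is sorted highest priority first).
  const kept = [...tabs];
  while (kept.length > 0) {
    kept.pop();
    const listing = kept.map(titleOnly).join("\n");
    if (estimateTokens(listing) <= tokenBudget) return listing;
  }
  return "";
}

// Re-synchronization: drop planned actions whose tab was closed
// while the model was still working.
interface PlannedClose {
  tabId: number;
}

function resyncPlan(plan: PlannedClose[], currentlyOpenIds: Set<number>): PlannedClose[] {
  return plan.filter((step) => currentlyOpenIds.has(step.tabId));
}
```

In an extension, the set of open ids would be refreshed from the browser immediately before executing the plan, so the window in which the plan and the real tab state can diverge stays as small as possible.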

Accomplishments that we're proud of

  • A Truly On-Device AI Agent: I'm proud of building a fully-featured AI assistant that runs entirely on-device. It's fast, network-resilient, and respects user privacy by default, showcasing the new paradigm of client-side AI.

  • Robust and Resilient Engineering: Overcoming the token limit and race condition challenges resulted in a much more stable and reliable application. The tiered fallback system for the prompt is an accomplishment I'm particularly proud of.

  • Seamless User Experience: The dual autonomy modes and the solution for the alarm-triggered confirmation provide a polished user experience that empowers the user, whether they prefer hands-on control or hands-off automation.

What's next for Tab Agent

The foundation of Tab Agent is solid, but it's just the beginning. The future of the project is focused on evolving it from a smart utility into a truly indispensable browsing partner through three key initiatives:

  • Embrace Multimodality: Integrate the new image and audio input capabilities of the Prompt API to allow for even richer contextual grouping (e.g., grouping tabs based on product images for a shopping list).

  • Deeper Behavioral Intelligence through API Orchestration: The current priority engine is powerful but could be enhanced. The vision is to orchestrate multiple APIs to build a richer model of user intent, analyse how the user's focus tends to shift, and present those patterns in a more useful way. This would transform Tab Agent from an assistant into a fully customizable productivity system.

  • Codebase Refactoring for Long-Term Scalability: As the Chrome AI ecosystem grows, the agent's logic will become more complex. The next step is a dedicated refactoring phase to abstract the core decision-making engine, making it more modular so that new AI capabilities and user-defined rules are easier to integrate in the future.
