Inspiration
The idea began with a simple frustration: despite having access to the entire internet, search engines still require us to do all the work: clicking, reading, filtering, cross-checking, and copying results into structured notes.
We asked: What if browsing the web could feel more like working with a research assistant? An assistant that doesn’t just search, but also reads, reasons, compares, and summarizes, all in real time. That vision inspired us to build an AI agentic browser.
What We Learned
This project taught us a great deal, far beyond just code:

- Full-stack development: We gained hands-on experience across frontend, backend, and AI integration.
- Agentic AI: We learned how autonomous agents plan and execute actions in the browser environment.
- Open-source libraries: We saw the power of leveraging community-driven tools to accelerate development.
- Teamwork and communication: Building something complex required constant alignment, clear discussions, and shared problem-solving.
On the technical side, we even found parallels with mathematical optimization. If we represent each browser state as a node and each action (click, scroll, extract) as an edge, then browsing is essentially finding an efficient path through a graph. The challenge was making sure our agent picked the right path while avoiding loops or dead ends.
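This framing can be sketched as a plain graph search. Here is a minimal Python illustration; the pages, actions, and goal test below are toy stand-ins we invented, not the agent's real state representation:

```python
from collections import deque

def find_action_path(start, goal, actions):
    """Breadth-first search over browser states.

    `actions` maps a state to a list of (action, next_state) pairs.
    Tracking visited states is what prevents the loops and
    dead ends mentioned above.
    """
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for action, next_state in actions.get(state, []):
            if next_state not in visited:
                visited.add(next_state)
                queue.append((next_state, path + [action]))
    return None  # goal unreachable from start

# Toy example: pages as nodes, clicks/scrolls as edges.
graph = {
    "home": [("click:search", "results")],
    "results": [("scroll", "results_page2"), ("click:link", "article")],
    "results_page2": [("click:link", "article")],
}
print(find_action_path("home", "article", graph))
# → ['click:search', 'click:link']
```

Breadth-first search is just one choice; weighting edges by expected cost (page load time, risk of a captcha) would turn this into a shortest-path problem.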
How We Built It

- Frontend: We used a JavaScript-based framework, Vue.js, to create an intuitive and responsive UI where users can watch the agent work in real time.
- Backend: We powered the app with Supabase, which provided a reliable backend-as-a-service for data storage, authentication, and real-time updates.
- AI models: We integrated cutting-edge models for reasoning, browsing, and language understanding.
- Open-source libraries: We brought in several open-source tools for DOM parsing, speech processing, and browser automation to speed up development.

This combination gave us a working prototype that is modular, scalable, and transparent.
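To give a flavor of the DOM-parsing side, here is a minimal sketch using Python's standard-library `html.parser`; the real agent leans on heavier open-source tooling, and the HTML snippet below is invented for illustration:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs from anchor tags - the kind of
    structured extraction the agent performs on each page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

# Toy page standing in for a live DOM.
page = '<p>See <a href="/docs">the docs</a> or <a href="/api">the API</a>.</p>'
parser = LinkExtractor()
parser.feed(page)
print(parser.links)
# → [('/docs', 'the docs'), ('/api', 'the API')]
```

The same extract-then-decide loop underlies the agent: parse the page into structured data, hand that to the model, act on its choice.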
Challenges We Faced

Of course, building this wasn't easy. Some of our biggest hurdles were:

- Browser embedding: We struggled to embed a live browser seamlessly into our web app.
- Speech integration: Implementing text-to-speech and speech-to-text in a reliable, low-latency way was tougher than expected.
- Complex DOM handling: The web is messy, with popups, captchas, and changing layouts; our agent had to adapt.
- Time pressure: As with any ambitious project, deadlines forced us to make tough tradeoffs.

But through persistence, creativity, and teamwork, we were able to overcome these obstacles and bring our vision to life.
Closing Reflection

This project is more than just a demo; it's a glimpse into the future of browsing. We proved to ourselves that with the right mix of AI, web technologies, and determination, we can transform the internet from a static tool into an active collaborator.
And now, we’re thrilled to share our working prototype with the world.
Built With
- browser
- docker
- gemini
- git
- huggingface
- javascript
- openai
- playwright
- python
- supabase
