Barracua | Devpost

Inspiration

Most blind people use screen readers to navigate the web, and these are limited to reading raw text and basic page structure. We wanted to build an AI agent that goes beyond reading text and isn't robotic and constrained.

What it does

At its core, it's a Chrome extension that acts as an AI assistant for blind and low-vision users. But it's not just a chatbot - it can actually see your screen, understand what's on it, and even control your browser autonomously. It can answer questions and talk to you like a friend.

How we built it

The frontend is a Chrome Extension built on Manifest V3, giving secure access to browser APIs. Users interact through a side panel to chat, control voice input, and manage the system. A background service worker handles AI requests, routes messages, captures screenshots, and connects to Google’s Gemini API. An offscreen document securely captures microphone input, while a content script lets the AI interact with webpages, clicking buttons or typing text as needed. All components communicate through Chrome’s Runtime Message API.

Voice is powered by Deepgram. Speech-to-text uses the Nova-2 model to stream audio in real time and supports over 14 languages, while text-to-speech uses the Thalia voice from Deepgram Aura for natural, low-latency responses.

AI capabilities come from Google Gemini. In order to view the screen, the program takes screenshots and Gemini analyzes them to describe webpages conversationally. Gemini also manages contextual chat and can autonomously navigate websites using its Computer Use API, deciding on actions, executing them, and repeating until tasks are complete.

The autonomous agent runs in a Python backend that talks to the extension over WebSockets. Using Playwright, it controls Chrome while keeping user sessions active, captures screenshots, sends them to Gemini, executes instructions, and repeats until tasks are finished. This setup allows the AI to fully interact with a user’s logged-in pages.

Challenges we ran into

This was our first time building a Chrome extension, which came with platform-specific constraints, and our first time working with the Deepgram API. One of the biggest challenges was reducing end-to-end latency to make the conversation feel truly real-time.

Built With

chrome
deepgram
gemini
javascript
playwright
python

Submitted to

SB Hacks XII
- Winner Grand Prize (Third Place)

Created by

I worked on integrating Gemini’s CUA (Computer Use Agent) and a voice agent API (Deepgram / ElevenLabs) integration. Users speak commands, and the CUA autonomously executes browser interactions via playwright (headless browser tool)

.Ryan Ni
i build things sometimes
Zixiang Zheng
mayhong1 Hong
Spencer Cowles