About the Project — Sullivan
Inspiration
This project started with something very personal. My grandfather has lived with hand tremors his entire life, and I’ve watched him struggle to do even the simplest things on his phone. Trying to open an app, type a short message, or order a ride could take several frustrating minutes — not because he didn’t know how, but because the interface wasn’t designed for someone with unsteady hands.
We realized that technology is supposed to make life easier, yet here it was becoming a barrier. That inspired us to ask: what if we could make the phone work for him? What if he could simply say what he wanted — “book me an Uber,” “order my medicines,” “play the news” — and the phone would take care of everything? That question led to the birth of Sullivan.
What it does
Sullivan lets users complete complex tasks on their phone using only their voice. A user can say something like “order two chicken sandwiches from Popeyes,” and Sullivan will read the screen, decide what to tap, and perform the steps automatically until the order is placed. It essentially becomes a personal digital assistant that can navigate any Android app, making smartphones truly hands-free.
How we built it
We built Sullivan in just 36 hours as a closed-loop system combining Android accessibility APIs, speech recognition, and large language model reasoning.
- Android Client: Written in Kotlin, it uses AccessibilityService to capture the full UI hierarchy, collecting text, clickable/editable flags, and on-screen positions. It also captures screenshots with the MediaProjection API for extra context. It can tap, scroll, and type using AccessibilityNodeInfo actions, making it capable of replicating real user input (a capture sketch follows this list).
- Context Packaging: The UI tree and screenshot are serialized into a structured JSON payload and sent securely over HTTPS to a cloud endpoint. This ensures the reasoning model always sees exactly what's on the screen (see the payload sketch below).
- Backend Orchestration: Hosted on n8n Cloud, the backend webhook receives the payload, passes it to a large language model (GPT-4 or Claude Sonnet), and interprets both the user’s voice transcript and the UI state to produce an action plan.
- Closed-Loop Execution: The app executes the returned action (tap, type, scroll), refreshes the screen context, and sends it back to n8n. The loop continues until the goal is complete, for example reaching a checkout confirmation screen (sketched below).
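To make the capture step concrete, here is a minimal sketch of how an AccessibilityService can walk the node tree into a flat list of elements. The `SullivanService` and `UiElement` names are our placeholders for illustration, not necessarily the exact classes in the app.

```kotlin
import android.accessibilityservice.AccessibilityService
import android.graphics.Rect
import android.view.accessibility.AccessibilityEvent
import android.view.accessibility.AccessibilityNodeInfo

// Placeholder shape for one on-screen element; the real schema may differ.
data class UiElement(
    val text: String,
    val clickable: Boolean,
    val editable: Boolean,
    val bounds: Rect
)

class SullivanService : AccessibilityService() {

    override fun onAccessibilityEvent(event: AccessibilityEvent) {
        val elements = mutableListOf<UiElement>()
        rootInActiveWindow?.let { collect(it, elements) }
        // `elements` is now ready to be serialized into the JSON payload.
    }

    // Depth-first walk over the accessibility tree, keeping text,
    // clickable/editable flags, and on-screen bounds for each node.
    private fun collect(node: AccessibilityNodeInfo, out: MutableList<UiElement>) {
        val bounds = Rect()
        node.getBoundsInScreen(bounds)
        out += UiElement(
            text = node.text?.toString() ?: node.contentDescription?.toString() ?: "",
            clickable = node.isClickable,
            editable = node.isEditable,
            bounds = bounds
        )
        for (i in 0 until node.childCount) {
            node.getChild(i)?.let { collect(it, out) }
        }
    }

    override fun onInterrupt() {}
}
```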
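Packaging that context could look roughly like the following, reusing the `UiElement` shape from the sketch above. The field names are assumptions about the payload layout, not the exact schema sent to the webhook.

```kotlin
import org.json.JSONArray
import org.json.JSONObject

// Build the JSON payload posted to the backend: the voice transcript,
// a compressed screenshot, and the flattened UI tree.
fun buildPayload(transcript: String, elements: List<UiElement>, screenshotB64: String): JSONObject =
    JSONObject().apply {
        put("transcript", transcript)
        put("screenshot", screenshotB64)
        put("elements", JSONArray().apply {
            elements.forEach { e ->
                put(JSONObject().apply {
                    put("text", e.text)
                    put("clickable", e.clickable)
                    put("editable", e.editable)
                    put("bounds", JSONArray(listOf(e.bounds.left, e.bounds.top, e.bounds.right, e.bounds.bottom)))
                })
            }
        })
    }
```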
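Finally, a sketch of the execution side of the loop: one model-chosen action is dispatched through standard AccessibilityNodeInfo actions. The `Action` shape and the text-based target matching are simplifications of what the app actually does (in practice, unlabeled elements need bounding-box heuristics, as described under Challenges).

```kotlin
import android.accessibilityservice.AccessibilityService
import android.os.Bundle
import android.view.accessibility.AccessibilityNodeInfo

// Hypothetical shape of the action the backend returns; names are ours.
data class Action(val kind: String, val targetText: String, val value: String? = null)

// Execute one action, then (elsewhere) re-capture the screen and post the
// fresh context back to the n8n webhook, closing the loop.
fun AccessibilityService.execute(action: Action) {
    val target = rootInActiveWindow
        ?.findAccessibilityNodeInfosByText(action.targetText)
        ?.firstOrNull() ?: return

    when (action.kind) {
        "tap" -> target.performAction(AccessibilityNodeInfo.ACTION_CLICK)
        "type" -> {
            val args = Bundle().apply {
                putCharSequence(
                    AccessibilityNodeInfo.ACTION_ARGUMENT_SET_TEXT_CHARSEQUENCE,
                    action.value
                )
            }
            target.performAction(AccessibilityNodeInfo.ACTION_SET_TEXT, args)
        }
        "scroll" -> target.performAction(AccessibilityNodeInfo.ACTION_SCROLL_FORWARD)
    }
}
```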
Challenges we ran into
Performance was a major hurdle — we had to compress screenshots and minimize payload size to keep interactions fast. Many UI elements were unlabeled, which meant relying on bounding boxes or relative positions to choose the right tap target. Handling ambiguous or multi-step voice commands required careful prompt design and context passing. And, of course, fitting all of this into 36 hours required ruthless prioritization and rapid iteration.
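The exact compression settings aren't in the write-up, but the downscale-and-encode step looks roughly like this; the 720px target width and JPEG quality of 40 are illustrative assumptions, not the values Sullivan actually ships with.

```kotlin
import android.graphics.Bitmap
import android.util.Base64
import java.io.ByteArrayOutputStream

// Downscale and JPEG-compress a captured screenshot before it goes into
// the JSON payload, keeping the request small enough for fast round trips.
fun compressScreenshot(src: Bitmap, targetWidth: Int = 720, quality: Int = 40): String {
    val scale = targetWidth.toFloat() / src.width
    val scaled = Bitmap.createScaledBitmap(
        src, targetWidth, (src.height * scale).toInt(), /* filter = */ true
    )
    val bytes = ByteArrayOutputStream().use { out ->
        scaled.compress(Bitmap.CompressFormat.JPEG, quality, out)
        out.toByteArray()
    }
    return Base64.encodeToString(bytes, Base64.NO_WRAP)
}
```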
Accomplishments that we're proud of
We are proud of building a fully working voice-to-action loop that can operate on any Android app. Seeing Sullivan tap through an ordering flow, completely hands-free, was a huge moment. We're also proud that our project tackles a real accessibility problem and has the potential to genuinely improve people's lives.
What we learned
We learned how to tie together multiple systems — speech recognition, accessibility APIs, and LLM reasoning — into a seamless automation pipeline. We discovered how crucial it is to standardize the UI data schema so the model can reason effectively. Most importantly, we learned how empowering a tool like this can be for users who are usually left behind by modern app design.
What's next for Sullivan
We want to make Sullivan faster, smarter, and able to handle more app types. Our next step is to integrate better OCR for unlabeled buttons, add memory so Sullivan can follow multi-step tasks across different apps, and deploy it as a lightweight service. Ultimately, we’d like to turn this into a real product — maybe even a startup — so that anyone, anywhere, can control their phone with their voice, regaining independence and dignity in the process.
Built With
- android
- android-studio
- gemini
- kotlin
- n8n
- webhook

