Explainable
Inspiration
Before this hackathon, we were just two online tutors trying to be in far too many places at once. We had classes, job interviews (maybe only Eren), and lives to manage, but we still wanted our students to get patient, level-appropriate help whenever they needed it.
So at Swamphacks XI we built Explainable: an AI voice tutor that adapts its explanations to a student’s grade level and talks back in real time — a third teammate in our tutoring practice that’s always on, never tired, and actually good at explaining things at the right level.
What it does
Explainable gives students a conversational tutor that feels personal instead of generic.
We created two assistants, Alberta and Albert, named after the University of Florida’s mascots:
- Alberta works with students from kindergarten through 8th grade.
- Albert takes over for high school and college learners.
Both assistants speak using ElevenLabs voices, and their explanations are generated by the Gemini API. As students talk, we stream text and audio back so it feels like a natural conversation instead of “ask, wait, wall of text.”
We also wanted the tutor to understand more than just a spoken question. Students can attach a single PDF or image — a worksheet, a textbook page, a diagram — and the assistant uses that file as context, so instead of answering “how do integrals work?” in the abstract, it can help with that specific problem they’re staring at.
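Under the hood, this is a single multimodal Gemini call. Here's a minimal sketch using the google-generativeai SDK; the model name and the helper function are illustrative assumptions, not our exact code:

```python
import google.generativeai as genai

genai.configure(api_key="GEMINI_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

def answer_with_context(question: str, file_path: str | None = None) -> str:
    """Answer a student's question, optionally grounded in an attached file."""
    parts = []
    if file_path:
        # Gemini's File API accepts PDFs and images; the returned handle can
        # be passed alongside the text prompt as extra context.
        parts.append(genai.upload_file(file_path))
    parts.append(question)
    return model.generate_content(parts).text
```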
Because we were leaning on heavy external APIs, we added caching, storing common questions, responses, and reusable lines in Redis via Upstash. In testing, this sped up our responses by about 30%, which is the difference between “wow, this is snappy” and “is it frozen?”
Finally, we really cared about accessibility. Not every student has a fast laptop and rock-solid Wi‑Fi, but a lot of them have a phone. So we used Twilio to give Explainable its own phone number that anyone can call to speak with Albert or Alberta over a regular voice call, no internet connection required:
888-372-7516
Dial in, follow the prompts, choose your grade level, and you’re talking to the same tutoring system — just without needing a browser!
How we built it
We assembled Explainable using a stack that let us move fast and ship something that feels like a real product.
On the frontend, we used React, TypeScript, and Vite, styled with Tailwind CSS. That said, a lot of the frontend was designed using Lovable, a service that let us describe what we wanted things to look like in natural language. We used Lovable to make custom components for age selection, session controls, chat history, and animated “voice orbs” so you can see the assistant “speaking.” A dedicated service handles the streaming connection to our backend, listening for a mix of text chunks and base64 audio and playing them back smoothly in the browser.
On the backend, we built a Flask application with separate routes for web tutoring and phone calls. When a student speaks, the backend does the following (sketched in code after the list):
- Receives the transcription and selected grade level.
- Optionally processes an attached PDF or image to pull out useful context.
- Sends the request to Gemini, using a grade-specific prompt so the tone and depth match the student.
- Streams Gemini’s response back as it’s generated.
- In parallel, converts each completed sentence into speech with ElevenLabs and streams the audio chunks back to the frontend.
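Here's a condensed sketch of that route. The route path, prompt table, and NDJSON framing are illustrative, and for brevity the TTS call runs inline here where the real app ran it in parallel:

```python
import base64, json

import google.generativeai as genai
import requests
from flask import Flask, Response, request, stream_with_context

app = Flask(__name__)
genai.configure(api_key="GEMINI_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

GRADE_PROMPTS = {
    "k8": "You are Alberta, a patient tutor for grades K through 8. Use simple words.",
    "hs_college": "You are Albert, a tutor for high school and college students.",
}

def tts_bytes(text: str, voice_id: str = "VOICE_ID") -> bytes:
    """One-shot call to the ElevenLabs text-to-speech REST endpoint."""
    r = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        headers={"xi-api-key": "ELEVENLABS_API_KEY"},
        json={"text": text},
    )
    r.raise_for_status()
    return r.content

@app.post("/api/tutor")
def tutor():
    data = request.get_json()
    prompt = GRADE_PROMPTS[data["grade"]] + "\n\nStudent: " + data["transcription"]

    def generate():
        sentence = ""
        for chunk in model.generate_content(prompt, stream=True):
            # Ship each text chunk as soon as Gemini produces it.
            yield json.dumps({"type": "text", "data": chunk.text}) + "\n"
            sentence += chunk.text
            # Once a sentence looks complete, synthesize and ship its audio.
            if sentence.rstrip().endswith((".", "!", "?")):
                audio = base64.b64encode(tts_bytes(sentence)).decode()
                yield json.dumps({"type": "audio", "data": audio}) + "\n"
                sentence = ""

    return Response(stream_with_context(generate()), mimetype="application/x-ndjson")
```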
Under the hood, we have a small ecosystem of services:
- A Gemini service that handles prompts, conversation history, optional document parsing, and key rotation.
- An ElevenLabs service that handles both one-shot audio and streaming text‑to‑speech.
- A cache service backed by Redis / Upstash to store common AI responses, greeting and farewell clips, and frequently reused TTS snippets (the caching pattern is sketched below).
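The cache service boils down to a get-or-compute pattern. A minimal sketch, assuming the standard redis client pointed at an Upstash rediss:// URL; the key scheme and TTL are illustrative:

```python
import hashlib

import redis

# Upstash speaks the Redis protocol, so the regular client works over TLS.
r = redis.Redis.from_url("rediss://default:PASSWORD@HOST:PORT")

def cached_answer(grade: str, question: str, compute) -> str:
    """Return a cached answer for (grade, question), computing it on a miss."""
    key = "answer:" + hashlib.sha256(f"{grade}:{question}".encode()).hexdigest()
    hit = r.get(key)
    if hit is not None:
        return hit.decode()
    answer = compute(question)           # e.g. the Gemini call
    r.set(key, answer, ex=60 * 60 * 24)  # keep it for a day
    return answer
```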
For the phone side, we integrated Twilio Voice. Twilio calls our webhook when someone dials the number; we respond with TwiML (see the sketch after this list) that:
- Asks the caller to pick a grade level using their keypad.
- Plays a grade-appropriate greeting, voiced by ElevenLabs and served directly from our backend.
- Listens to their question, sends it through the same Gemini pipeline, and then plays back the answer as audio.
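Here's roughly what those webhooks look like with the Twilio Python helper library. This is a simplified sketch: the routes and URLs are illustrative, `pipeline_answer` is a placeholder for the Gemini pipeline above, and the final `resp.say` stands in for playing an ElevenLabs clip:

```python
from flask import Flask, request
from twilio.twiml.voice_response import Gather, VoiceResponse

app = Flask(__name__)

def pipeline_answer(question: str, call_sid: str) -> str:
    # Placeholder for the grade-aware Gemini pipeline sketched earlier;
    # per-call conversation history would be keyed on Twilio's CallSid.
    return f"Let's work through it together: {question}"

@app.post("/voice")
def voice():
    """Entry webhook: ask the caller to pick a grade band on the keypad."""
    resp = VoiceResponse()
    gather = Gather(num_digits=1, action="/voice/grade")
    gather.say("Press 1 for Alberta, kindergarten through eighth grade. "
               "Press 2 for Albert, high school and college.")
    resp.append(gather)
    resp.redirect("/voice")  # no input: prompt again
    return str(resp)

@app.post("/voice/grade")
def grade():
    """Play a grade-appropriate greeting, then listen for a spoken question."""
    resp = VoiceResponse()
    digit = request.form.get("Digits", "1")
    # In production, this URL serves a cached ElevenLabs clip from our backend.
    resp.play(f"https://our-backend.example/audio/greeting_{digit}.mp3")
    gather = Gather(input="speech", action="/voice/answer")
    resp.append(gather)
    return str(resp)

@app.post("/voice/answer")
def answer():
    """Twilio posts the transcribed question back as SpeechResult."""
    resp = VoiceResponse()
    question = request.form.get("SpeechResult", "")
    text = pipeline_answer(question, request.form.get("CallSid", ""))
    resp.say(text)  # production plays an ElevenLabs clip via resp.play(url)
    return str(resp)
```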
We store a lightweight conversation history per call so the phone experience feels like a true back-and-forth.
And because hackathon projects deserve better than “it works on my machine,” we deployed Explainable on Vercel, so judges (and students) can visit a real URL and start talking to it without setting anything up.
Challenges we ran into
For the most part, things went smoothly — but two issues really tried to derail us.
1. Rate limits and credit anxiety
We were genuinely worried about being rate‑limited by both the Gemini API and the ElevenLabs API. At one point, we were basically camping out in Fred from Major League Hacking’s Discord DMs, politely begging for more ElevenLabs credits.
To survive, we:
- Added caching for common questions and responses.
- Rotated API keys when needed (a sketch follows this list).
- Reused audio for standard phrases like greetings and farewells.
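The rotation itself was just a cycle through a pool of keys with a retry. A minimal sketch, with error handling simplified; the key names and attempt count are illustrative:

```python
from itertools import cycle

import google.generativeai as genai

API_KEYS = cycle(["GEMINI_KEY_1", "GEMINI_KEY_2", "GEMINI_KEY_3"])

def generate_with_rotation(prompt: str, attempts: int = 3):
    """Try the request on successive keys, rotating when one is exhausted."""
    last_err = None
    for _ in range(attempts):
        genai.configure(api_key=next(API_KEYS))
        model = genai.GenerativeModel("gemini-1.5-flash")
        try:
            return model.generate_content(prompt)
        except Exception as err:  # in practice, catch the 429 / quota error
            last_err = err
    raise last_err
```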
It worked, but it definitely wasn’t the most relaxing part of the weekend.
2. Wrestling with Twilio
Building the phone assistant sounded straightforward. In reality, it was one of the hardest parts of the project.
We tried different telephony options, eventually settled on Twilio, set up a trial number, and wired it into our backend. Then, somehow, the Twilio integration broke key voice behavior in our web app, which led to a long, confusing debugging session late into the night.
Neither of us had done this exact combination of Twilio + ElevenLabs + Gemini before, so we leaned heavily on documentation, trial and error, and yes — GitHub Copilot — to navigate the integration.
Seeing the phone ring, hearing Alberta or Albert speak back, and realizing it all actually worked end‑to‑end was one of those pure hackathon joy moments.
Accomplishments we’re proud of
The big one: we didn’t just build a cool demo — we deployed a working tutoring assistant.
Historically, our hackathon projects lived on localhost and died with the closing ceremony. This time, Explainable runs on Vercel with a real frontend, backend, and phone number. Anyone can try it without installing anything.
We’re also proud that it adapts to different grade levels in a meaningful way, that both web and phone experiences feel coherent instead of like two hacks taped together, and that we built something we actually plan to use in our own tutoring work with real students.
What we learned
We treated this hackathon as a crash course in “how much new tech can we reasonably ship in a weekend?”
We walked away with hands-on experience using Redis / Upstash, a much deeper understanding of Twilio Voice, and real-world practice integrating ElevenLabs and Gemini into a production‑style flow with streaming, context, and caching.
We also learned a lot about designing educational AI: how small changes in prompts, tone, and response length can completely change how approachable an explanation feels for a 3rd grader vs. a college freshman.
What’s next for Explainable
Explainable is far from “done” — no product really is after 20–30 hours of building — but we have ideas for the future:
- Experiment with more specialized models and conversational agents (including from Hugging Face) for different subjects.
- Add more pedagogical structure, like step‑by‑step guidance, checks for understanding, and progress tracking.
- Make the multimodal side richer, with better support for walking through complex diagrams or multi‑page PDFs.
- Harden the platform for scale so more students can use it without us constantly worrying about credits and rate limits.
Our long‑term goal is simple: turn Explainable into a tutoring companion that can sit alongside a human tutor and actually make high‑quality help more accessible — whether you have a great laptop and Wi‑Fi, or just a phone and a question.
