Inspiration
In my last semester in Providence, I have desperately wanted to check walking tours of all kinds off my bucket list: historical, spooky, arts and culture, you name it! The only problem is that I cannot pay for all of these walking tours, and standard audio tours often feel static and impersonal.
Enter WanderBlob! We made WanderBlob to allow you to explore any tour of any city anywhere in the world, tailored to your specific time and thematic preferences. We have done away with expensive tickets and rigid schedules. Now, it's just you, your phone, and a world of stories.
How we built it
WanderBlob is built on a sophisticated hierarchical agent architecture:
- Backend: A FastAPI server orchestrates three specialized AI agents (plus an eval agent) using the Gemini 1.5 API:
  - Tour Director: The meta-level orchestrator that plans the initial route and manages state transitions (e.g., from "initial" to "traveling" to "point of interest"). It also changes the route dynamically based on the user's preferences during the tour.
  - POI Narrator: The storyteller that generates rich, engaging narrations for each stop, personalized to the user's selected theme (Historical, Art, or Spooky).
  - Q&A Agent: The interactive "brains" that handles user interruptions and questions through a chat interface using RAG (Retrieval-Augmented Generation) over Wikipedia.
  - Eval Agent: Our "LLM-as-a-judge" system that continuously monitors quality, scoring responses on metrics like Faithfulness, Answer Relevance, and Context Relevance.
- Voice: We integrated ElevenLabs TTS for lifelike narration. Our next step is to enable STT for real-time user interruptions, so users can talk back directly to their guide.
- Map: Our frontend is a Next.js application featuring a 3D map built with Mapbox GL JS. It tracks your location in real time and visualizes your route dynamically.
- Data: We use the Google Maps and Google Places APIs for routing and POI discovery, supplemented by a custom RAG pipeline that pulls historical data (or other details) to ground the tour guide's narrations in fact.
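The Tour Director's state transitions described above could be modeled as a small state machine. This is a minimal sketch, not our actual implementation: the names `TourState`, `ALLOWED`, and `advance` are hypothetical, and the transition from "point of interest" back to "traveling" is an assumption about how the guide moves on to the next stop.

```python
from enum import Enum


class TourState(Enum):
    INITIAL = "initial"
    TRAVELING = "traveling"
    POINT_OF_INTEREST = "point of interest"


# Assumed transition table: only the three states named in the write-up.
# The POINT_OF_INTEREST -> TRAVELING edge is an assumption for illustration.
ALLOWED = {
    TourState.INITIAL: {TourState.TRAVELING},
    TourState.TRAVELING: {TourState.POINT_OF_INTEREST},
    TourState.POINT_OF_INTEREST: {TourState.TRAVELING},
}


def advance(current: TourState, target: TourState) -> TourState:
    """Return the new state, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value!r} to {target.value!r}")
    return target


state = TourState.INITIAL
state = advance(state, TourState.TRAVELING)          # tour starts
state = advance(state, TourState.POINT_OF_INTEREST)  # arrived at a stop
```

Keeping the allowed transitions in one table makes it easy to reject an invalid jump before any agent is called.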
Challenges we faced
- Google Places Fetching: To build the route and add new places, we use the Google Places API. Passing the user's request to Google Places directly was not sufficient, so we use Gemini to parse the request into a well-formed Google Places query. We then filter the results by distance and star rating, compute how much distance each candidate would add to the route, and return this information so the user can choose.
- STT Voice Interruptions: Handling the "barge-in" experience—where the user speaks while the guide is talking—required fine-tuning our logic to ensure the guide stops immediately and the Q&A agent takes over gracefully.
- Prompt Design: Writing prompts so our tour guide sounds natural and interesting without being overenthusiastic.
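The distance and star-rating filter from the Google Places challenge above might look roughly like this. It is a sketch under stated assumptions: the `Place` type, the `filter_candidates` helper, and the default thresholds (1 km, 4.0 stars) are hypothetical, and the places are assumed to be already fetched, so no Google API call is shown.

```python
import math
from dataclasses import dataclass


@dataclass
class Place:
    name: str
    lat: float
    lon: float
    rating: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def filter_candidates(places, origin, max_km=1.0, min_rating=4.0):
    """Keep places that are close enough and rated well, nearest first.

    The thresholds are illustrative defaults, not values from the project.
    """
    lat0, lon0 = origin

    def distance(p: Place) -> float:
        return haversine_km(lat0, lon0, p.lat, p.lon)

    kept = [p for p in places if p.rating >= min_rating and distance(p) <= max_km]
    return sorted(kept, key=distance)


origin = (41.8240, -71.4128)  # roughly downtown Providence
places = [
    Place("A", 41.8250, -71.4130, 4.5),  # near, well rated
    Place("B", 41.9000, -71.4128, 4.8),  # well rated but several km away
    Place("C", 41.8242, -71.4129, 3.0),  # near but poorly rated
    Place("D", 41.8245, -71.4128, 4.2),  # nearest well-rated place
]
shortlist = filter_candidates(places, origin)
```

The shortlist would then go through the added-distance check before being shown to the user.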
What we learned
This project was excellent practice in building out hierarchical agentic workflows that have their own evaluation systems. Instead of one giant prompt, we had to break the guide down into specialized roles to make the system more robust.
We also deepened our expertise in:
- Synthesizing live API data (Maps, Wikipedia) into a cohesive narrative in real time
- The math behind route optimization: we used a greedy approach to solve a variation of the Traveling Salesperson Problem (TSP) so that our tours fit within the user's time budget. Every time a new place is added (for example, when the user wants to stop at a coffee shop), we solve for the best position in the route to place it and return the total added distance, so the user can choose between options.
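One common greedy way to place a new stop into an existing route is cheapest insertion: try every gap between consecutive stops and keep the one that adds the least distance. The sketch below illustrates that idea and is not necessarily our exact algorithm. It assumes planar (x, y) coordinates instead of real walking distances, a fixed start and end, and a route with at least two stops; `best_insertion` is a hypothetical name.

```python
import math


def dist(a, b):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def route_length(route):
    """Total length of the route, visiting stops in order."""
    return sum(dist(route[i], route[i + 1]) for i in range(len(route) - 1))


def best_insertion(route, stop):
    """Return (index, added_distance) for the cheapest place to insert `stop`.

    Only gaps between consecutive stops are considered, so the first and
    last stops stay fixed. Assumes the route has at least two stops.
    """
    best_idx, best_added = None, math.inf
    for i in range(len(route) - 1):
        a, b = route[i], route[i + 1]
        # Detour cost: go a -> stop -> b instead of a -> b directly.
        added = dist(a, stop) + dist(stop, b) - dist(a, b)
        if added < best_added:
            best_idx, best_added = i + 1, added
    return best_idx, best_added


route = [(0, 0), (4, 0), (4, 3)]
coffee_shop = (2, 0)
idx, added = best_insertion(route, coffee_shop)
new_route = route[:idx] + [coffee_shop] + route[idx:]
```

Running `best_insertion` once per candidate place gives the added distance that is reported back to the user for each option.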
What's next
We want to enable STT interruptions for users, pull and display images next to each site, and add explicit Google Maps routing (with directions) between stops.
Built With
- elevenlabs
- fastapi
- gemini
- google-maps
- google-places
- nextjs