Inspiration

Rana started it with her painful real-life experience of booking flights to Egypt. She needs to travel back and forth a lot, and finding a good flight on short notice is hectic. So she had the idea of building a voice-to-text flight search agent that can take her spoken request and intelligently search in the background. We also built our Kaggle Google ADK capstone around this use case, but the difference in this project is that an ARM device like a smartwatch or tablet is a perfect fit for personal flight search. In terms of implementation, we originally integrated an MCP server for voice-to-text through Google ADK and used LangGraph for flight search, but for the tablet we think it is better to adopt lightweight frameworks that call the voice-to-text application and the flight search API locally and directly.

What it does

A voice-activated flight search system that enables users to search for real flights using natural speech. Users speak their travel requirements (like "Find a round trip from Atlanta to New York, December 1st to 15th, two adults, economy"), and the system transcribes the audio using Whisper AI, extracts structured flight parameters using Gemini AI, and searches real-time flight data via the Amadeus API. It's designed to improve accessibility for users with motor impairments or visual disabilities who have difficulty typing or using traditional search interfaces.
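To make the pipeline concrete, here is a minimal sketch of the kind of structured parameters the interpreter agent might extract from the sample utterance, plus a light sanity check before the search step. The exact field names and the JFK airport choice are illustrative assumptions, not the project's actual schema.

```python
from datetime import date

# Hypothetical structured output for: "Find a round trip from Atlanta to
# New York, December 1st to 15th, two adults, economy". Field names and
# the specific New York airport (JFK) are assumptions for illustration.
params = {
    "originLocationCode": "ATL",       # IATA code for Atlanta
    "destinationLocationCode": "JFK",  # IATA code for New York (assumed)
    "departureDate": "2025-12-01",     # ISO 8601 dates
    "returnDate": "2025-12-15",
    "adults": 2,
    "travelClass": "ECONOMY",
}

def validate_params(p: dict) -> bool:
    """Light sanity checks before hitting the flight search API."""
    iata_ok = all(len(p[k]) == 3 and p[k].isupper()
                  for k in ("originLocationCode", "destinationLocationCode"))
    dates_ok = (date.fromisoformat(p["departureDate"])
                <= date.fromisoformat(p["returnDate"]))
    return iata_ok and dates_ok and p["adults"] >= 1

print(validate_params(params))  # True
```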

How we built it

We built a three-agent architecture using heterogeneous technologies connected via the Model Context Protocol (MCP). The voice recognition agent runs as a Rust-based MCP server using Whisper.cpp for fast, hardware-accelerated speech-to-text. A Python MCP client communicates with this server over JSON-RPC 2.0, passing transcribed text to an interpreter agent (Gemini 2.0 Flash) that extracts structured parameters like IATA codes and ISO dates. Finally, an executor agent uses LangGraph to orchestrate the Amadeus flight search API and return formatted results. The system supports cross-platform deployment including macOS, Linux, Windows, and ARM tablets via Termux. This project was built by Claude Code.
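As a rough sketch of the protocol glue described above, the Python MCP client frames each message to the Rust server as a JSON-RPC 2.0 request. The tool name `transcribe_audio` and its argument names below are assumptions for illustration, not the project's actual tool schema.

```python
import json

def make_jsonrpc_request(method: str, params: dict, req_id: int) -> str:
    """Frame a JSON-RPC 2.0 request as the Python MCP client might
    send it to the Rust voice-recognition server."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": req_id,
        "method": method,
        "params": params,
    })

# Hypothetical call asking the voice agent to transcribe a recording;
# the tool name and argument names are illustrative assumptions.
req = make_jsonrpc_request(
    "tools/call",
    {"name": "transcribe_audio", "arguments": {"path": "request.wav"}},
    req_id=1,
)
msg = json.loads(req)
print(msg["jsonrpc"], msg["method"])  # 2.0 tools/call
```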

Challenges we ran into

The biggest challenge was integrating Rust and Python across process boundaries using the MCP protocol—debugging JSON-RPC communication and ensuring proper error propagation required careful protocol design. We encountered API deprecation mid-development when Google's google-generativeai package was deprecated, forcing migration to the modern google-genai API across all files. Voice quality proved difficult with varying microphone inputs and background noise, requiring extensive timeout tuning and model selection (settling on ggml-base.en.bin for the best speed/accuracy tradeoff). Path configuration bugs caused "model not found" errors until we corrected default paths for cross-platform compatibility. Finally, handling Amadeus API error codes (38194 for invalid airports, 477 for date formats) required robust error extraction from nested response bodies to provide actionable user feedback.
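The defensive error extraction mentioned above can be sketched roughly as follows. Amadeus returns errors in a nested `{"errors": [...]}` body; the friendly-message mapping below mirrors the two codes we hit, and the helper name is our own.

```python
def extract_amadeus_error(body: dict) -> str:
    """Defensively pull a user-facing message out of a nested Amadeus
    error body of the form {"errors": [{"code": ..., "detail": ...}]}."""
    friendly = {
        38194: "Unknown airport code - please check the city name.",
        477: "Invalid date format - dates must be YYYY-MM-DD.",
    }
    errors = body.get("errors") or []
    if not errors:
        return "Flight search failed for an unknown reason."
    err = errors[0]
    return friendly.get(err.get("code"),
                        err.get("detail") or "Flight search failed.")

print(extract_amadeus_error({"errors": [{"code": 477}]}))
```

Falling back to the raw `detail` field when the code is unrecognized keeps the feedback actionable even for errors we did not anticipate.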

Accomplishments that we're proud of

We were able to migrate a comprehensive voice-to-text flight search agent to an edge device well suited to its purpose, using a concise three-tier agentic architecture. We also tried out building an application with AI assistance from Claude Code.

What we learned

We learned that building multi-agent systems across different programming languages requires careful protocol design—using MCP (Model Context Protocol) taught us how JSON-RPC 2.0 can bridge Rust's performance advantages for audio processing with Python's rich AI ecosystem. We discovered that API deprecation is a real concern in rapidly evolving AI frameworks, forcing us to stay adaptable and maintain migration documentation. Voice recognition quality depends heavily on the right balance between model size and speed—ggml-base.en.bin proved optimal at 142MB rather than tiny (75MB, poor accuracy) or small (466MB, too slow). We learned that hardware acceleration (Metal on macOS, CUDA on Linux) dramatically improves transcription speed, making real-time voice interaction feasible. Error handling in multi-tier systems needs to be robust at every layer—extracting meaningful error codes from nested API responses required defensive programming with proper error propagation. Finally, we learned that accessibility-focused design benefits everyone: voice input initially built for users with motor impairments proved valuable for hands-free operation and natural interaction patterns across all user types.

What's next for Pack & Go - A Multiagent Voice-to-Text Flight Search Agent

To keep the framework lightweight, we removed many of the agentic parts so the program operates efficiently on a tablet. If we were to reintroduce agents, we would think more carefully about how to optimize the agent orchestration.
