Inspiration

Learning a new language often works well in theory, but breaks down in real conversations. Many of us have experienced knowing the words, yet freezing when it’s time to speak. Existing apps focus on vocabulary and grammar, but rarely help users practice natural, spoken interaction. We wanted to build something that helps people gain confidence where it matters most: real-life conversations. While the idea was inspired by bilingual environments like Montreal, we quickly realized this problem exists everywhere.

What it does

Yap2Learn is a voice-first, scenario-based language learning app that helps users practice real conversations in multiple languages. Users choose a real-life scenario, select their starting language and a target language, and speak naturally with an AI that plays a realistic role in the conversation. The AI responds in real time and provides short, in-context feedback when mistakes happen, helping users improve without breaking the flow of the conversation.
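To make this concrete, here is a minimal sketch of how a practice session could be modeled. The interfaces and field names below are illustrative only, not our exact data model:

```typescript
// Hypothetical shape of a practice session; names are illustrative,
// not the actual Yap2Learn schema.
interface Scenario {
  id: string;
  title: string;          // e.g. "Ordering coffee at a café"
  aiRole: string;         // the character the AI plays, e.g. "barista"
  userGoal: string;       // what the learner is trying to accomplish
}

interface PracticeSession {
  scenario: Scenario;
  nativeLanguage: string; // the learner's starting language, e.g. "en"
  targetLanguage: string; // the language being practiced, e.g. "fr"
  turns: {
    speaker: "user" | "ai";
    transcript: string;
    feedback?: string;    // short, in-context correction when a mistake is detected
  }[];
}
```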

How we built it

For the frontend, we used React Native with Expo to build a cross-platform experience that works on both web and mobile. The backend was built with Node.js and Express and connects to a MongoDB database. For AI voice capabilities, we integrated ElevenLabs for speech-to-text and text-to-speech, and used Google’s Gemini 2.5 Flash as our large language model to generate natural, scenario-aware conversations.
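As a rough illustration of how these pieces fit together, a single conversational turn flows through an Express route that transcribes the learner's speech, asks Gemini for an in-character reply, and synthesizes the response back to audio. The route name and helper functions below are placeholders standing in for the vendor SDK calls, not our exact code:

```typescript
import express from "express";

// Hypothetical helpers: thin wrappers around the ElevenLabs and Gemini SDKs.
// Their names and signatures are illustrative, not the project's actual modules.
import { speechToText, textToSpeech } from "./elevenlabs";
import { generateReply } from "./gemini";

const app = express();
app.use(express.json({ limit: "10mb" }));

// One conversational turn: learner audio in, AI audio plus optional feedback out.
app.post("/api/turn", async (req, res) => {
  try {
    const { audioBase64, scenario, targetLanguage } = req.body;

    // 1. Transcribe the learner's speech (ElevenLabs speech-to-text).
    const transcript = await speechToText(Buffer.from(audioBase64, "base64"));

    // 2. Ask Gemini 2.5 Flash for an in-character reply plus short feedback.
    const { reply, feedback } = await generateReply({
      scenario,
      targetLanguage,
      userUtterance: transcript,
    });

    // 3. Synthesize the reply back to audio (ElevenLabs text-to-speech).
    const replyAudio = await textToSpeech(reply, targetLanguage);

    res.json({
      transcript,
      reply,
      feedback,
      replyAudioBase64: replyAudio.toString("base64"),
    });
  } catch (err) {
    res.status(500).json({ error: "Failed to process turn" });
  }
});

app.listen(3000);
```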

Challenges we ran into

We encountered unexpected AI responses during early prompt iterations, which required careful prompt engineering and refinement. Handling speech recognition errors while ensuring feedback remained helpful without disrupting the conversation also required significant tuning. Additionally, working within limited API quotas during testing introduced constraints on iteration speed and experimentation.
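For example, much of the prompt work came down to constraining the model's role, reply length, and feedback format in the system prompt. The wording below is a simplified sketch rather than the prompt we actually shipped:

```typescript
// Simplified, illustrative system prompt builder; the real prompt went through
// many iterations, but the constraints it encodes are the same in spirit.
function buildSystemPrompt(
  scenario: { aiRole: string; title: string },
  targetLanguage: string
): string {
  return [
    `You are role-playing as a ${scenario.aiRole} in this scenario: ${scenario.title}.`,
    `Always reply in ${targetLanguage}, in one or two short, natural spoken sentences.`,
    `Never break character or mention that you are an AI.`,
    `If the learner makes a grammar or vocabulary mistake, append a single-line`,
    `correction prefixed with "FEEDBACK:" so it can be shown separately from the reply.`,
  ].join("\n");
}
```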

Accomplishments that we're proud of

  • Building a fully voice-driven language learning experience within a short hackathon timeframe
  • Creating realistic, scenario-based conversations instead of scripted dialogue
  • Delivering real-time, contextual feedback without interrupting the conversation
  • Designing the system to support multiple languages from the start

What we learned

We learned that speaking-focused language learning introduces very different challenges than text-based apps. Timing, tone, and brevity of feedback matter a lot more in spoken conversations. We also gained experience integrating speech and language models in a way that feels responsive and human, and learned how to rapidly iterate under tight time and resource constraints.

On the technical side, we gained hands-on experience with prompt engineering for the Gemini API to generate realistic, scenario-aware dialogue, as well as integrating speech-to-text and text-to-speech using ElevenLabs to enable smooth, real-time verbal interaction. We also learned best practices around securely using APIs, including managing keys properly and avoiding accidental leaks during development. Beyond the technical stack, we learned a lot about what makes language learning effective, convenient, and accessible, especially the importance of practicing real conversations in a low-pressure environment that builds confidence over time.
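As a small example of the key-handling pattern: keys live in an untracked .env file and are read from the environment at startup, never hard-coded. The variable names below are illustrative:

```typescript
// Minimal sketch of environment-based key handling (dotenv package).
import "dotenv/config"; // loads .env into process.env

const ELEVENLABS_API_KEY = process.env.ELEVENLABS_API_KEY;
const GEMINI_API_KEY = process.env.GEMINI_API_KEY;

if (!ELEVENLABS_API_KEY || !GEMINI_API_KEY) {
  // Fail fast instead of making unauthenticated calls at runtime.
  throw new Error("Missing API keys; check your .env file");
}
```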

What's next for Yap2Learn

  • Expanding support for additional languages
  • Adding typed input as an alternative way to respond to the AI
  • Using computer vision to translate and learn from text in images
  • Introducing tailored lessons based on user proficiency and progress
  • Incorporating mnemonics to help users improve pronunciation and recall
  • Adding CEFR level designations so Yap2Learn can serve as an official language companion
