π Inspiration
Ever had a conversation with your parents or grandparents and felt like you were speaking a completely different language? Slang evolves fast, and generational gaps make it difficult to communicate naturally.
Thatβs where Driptionary comes inβa real-time speech-to-speech slang translator that bridges generations, communities, and cultures by making any slang instantly understandable.
Inspired by misunderstandings in everyday conversations and the power of AI translation, we built Driptionary to seamlessly process, translate, and play back speechβall in real time.
π§ What It Does
Driptionary is an AI-powered slang translator that takes speech input, transcribes it, translates slang to standard English, and plays it back in seconds.
β How It Works:
1οΈβ£ Record & Speak β Press the button and talk naturally.
2οΈβ£ Live Transcription β The app instantly converts speech to text using Groq.
3οΈβ£ AI-Powered Slang Translation β Driptionary detects slang and converts it using OpenAI Whisper to standard English.
4οΈβ£ Real-Time Playback β Using AI speech synthesis, the translated phrase is read aloud.
5οΈβ£ Face Recognition β The app personalizes slang translations based on the userβs vocabulary.
Whether itβs Gen Z slang, regional dialects, or niche terminology, Driptionary ensures seamless, real-time communication without misunderstandings.
π How We Built It
πΉ Backend Pipeline:
- π€ Audio Capture & Streaming β Speech is captured via WebSockets for real-time transmission.
- π Speech-to-Text Processing β Groq instantly transcribes the recorded speech.
- π Slang Detection & Translation β The text is analyzed and processed through using **OpenAI API
- π Text-to-Speech Conversion β The translated text is converted back into speech using ElevenLabs API.
- β‘ Low-Latency Audio Streaming β Optimized for real-time, near-instantaneous responses.
π¨ Frontend (Swift UI):
- Push-to-talk interface for effortless recording.
- Live transcription displaying both original & translated text.
- Playback button for listening to the translated version.
- Face recognition integration for personalized slang translations.
π§ Challenges We Ran Into
- Real-Time Audio Processing: Optimizing WebSocket audio transmission for low latency.
- Custom Slang Detection: Making translations accurate, adaptive, and user-specific.
- Balancing Speed vs Accuracy: AI models like Whisper are powerful but require efficient processing.
- Text-to-Speech Delays: Fine-tuning ElevenLabs API for natural & fast playback.
- Face Recognition Integration: Implementing lightweight yet secure user identification.
π Accomplishments That We're Proud Of
β
Live, Real-Time Slang Translation β A fully functional speech-to-speech AI that translates slang instantly.
β
Sleek, Intuitive UI β Swift UI design that ensures a frictionless user experience.
β
Personalized Slang Dictionaries β Users can train Driptionary to recognize their unique vocabulary.
β
Face Recognition for Personalization β Seamlessly retrieves slang preferences for tailored translations.
π What We Learned
π The Power of Customization: Slang is personalβallowing users to define their own slang translations made the tool far more effective.
β‘ Optimizing for Speed: We learned how to minimize latency for real-time processing while maintaining high accuracy.
π£ Balancing AI & User Control: Giving users input on their slang definitions improved engagement and accuracy.
π What's Next for Driptionary?
β
Multi-Language Support β Expand to bilingual slang translations (e.g., Spanglish, AAVE, Pidgin, etc.).
β
AR/VR Integration (Oculus Quest 2) β Bringing Driptionary into VR environments for real-time metaverse communication.
β
AI Voice Cloning for Personalized Playback β Generate translations in the user's own voice for a fully immersive experience.
Log in or sign up for Devpost to join the conversation.