Driptionary | Devpost

🚀 Inspiration

Ever had a conversation with your parents or grandparents and felt like you were speaking a completely different language? Slang evolves fast, and generational gaps make it difficult to communicate naturally.

That’s where Driptionary comes in—a real-time speech-to-speech slang translator that bridges generations, communities, and cultures by making any slang instantly understandable.

Inspired by misunderstandings in everyday conversations and the power of AI translation, we built Driptionary to seamlessly process, translate, and play back speech—all in real time.

🧠 What It Does

Driptionary is an AI-powered slang translator that takes speech input, transcribes it, translates slang to standard English, and plays it back in seconds.

✅ How It Works:

1️⃣ Record & Speak – Press the button and talk naturally.
2️⃣ Live Transcription – The app instantly converts speech to text using Groq.
3️⃣ AI-Powered Slang Translation – Driptionary detects slang in the transcript and rewrites it in standard English using the OpenAI API.
4️⃣ Real-Time Playback – Using AI speech synthesis, the translated phrase is read aloud.
5️⃣ Face Recognition – The app identifies the speaker and personalizes slang translations based on that user’s own vocabulary.

Whether it’s Gen Z slang, regional dialects, or niche terminology, Driptionary ensures seamless, real-time communication without misunderstandings.

🛠 How We Built It

🔹 Backend Pipeline:

🎤 Audio Capture & Streaming – Speech is captured via WebSockets for real-time transmission.
📝 Speech-to-Text Processing – Groq instantly transcribes the recorded speech.
🔍 Slang Detection & Translation – The transcribed text is analyzed and translated using the OpenAI API.
🎙 Text-to-Speech Conversion – The translated text is converted back into speech using ElevenLabs API.
⚡ Low-Latency Audio Streaming – Optimized for real-time, near-instantaneous responses.
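
The stages above can be sketched as a simple chain of three functions. This is a minimal illustration, not the project's actual code: `run_pipeline`, `PipelineResult`, and the three `fake_*` functions are hypothetical names, and the fakes stand in for the real Groq, OpenAI, and ElevenLabs calls so that the example runs without network access. The per-stage timings show one way to see where latency accumulates.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class PipelineResult:
    transcript: str    # raw speech-to-text output
    translation: str   # slang rewritten as standard English
    audio: bytes       # synthesized speech for playback
    timings_ms: dict   # per-stage latency in milliseconds


def run_pipeline(
    audio_in: bytes,
    transcribe: Callable[[bytes], str],
    translate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> PipelineResult:
    """Run the three stages in order and record how long each one takes."""
    timings = {}

    def timed(name, fn, arg):
        start = time.perf_counter()
        out = fn(arg)
        timings[name] = (time.perf_counter() - start) * 1000.0
        return out

    transcript = timed("stt", transcribe, audio_in)
    translation = timed("translate", translate, transcript)
    audio_out = timed("tts", synthesize, translation)
    return PipelineResult(transcript, translation, audio_out, timings)


# Hypothetical stand-ins for the real services (assumptions, not API calls).
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str) -> str:
    return text.replace("no cap", "honestly")


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")
```

Passing the stages in as callables keeps the orchestration independent of any one provider, so a stage can be swapped or mocked in tests without touching the rest of the chain.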

🎨 Frontend (SwiftUI):

Push-to-talk interface for effortless recording.
Live transcription displaying both original & translated text.
Playback button for listening to the translated version.
Face recognition integration for personalized slang translations.

🚧 Challenges We Ran Into

Real-Time Audio Processing: Optimizing WebSocket audio transmission for low latency.
Custom Slang Detection: Making translations accurate, adaptive, and user-specific.
Balancing Speed vs Accuracy: Speech models like Whisper are accurate but computationally heavy, so each stage has to run fast enough to keep a conversation flowing.
Text-to-Speech Delays: Fine-tuning ElevenLabs API for natural & fast playback.
Face Recognition Integration: Implementing lightweight yet secure user identification.
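
For the real-time audio challenge, one common approach is to split captured audio into small fixed-size frames before sending them over the socket. The sketch below assumes 16 kHz, 16-bit mono PCM and 20 ms frames; these values and the helper names `frame_size_bytes` and `iter_frames` are illustrative assumptions, not the settings Driptionary actually uses.

```python
def frame_size_bytes(sample_rate_hz, frame_ms, bytes_per_sample=2, channels=1):
    """Number of bytes in one frame of raw PCM audio."""
    samples_per_frame = sample_rate_hz * frame_ms // 1000
    return samples_per_frame * bytes_per_sample * channels


def iter_frames(pcm, frame_bytes):
    """Yield consecutive frames; the last one may be shorter than frame_bytes."""
    for offset in range(0, len(pcm), frame_bytes):
        yield pcm[offset:offset + frame_bytes]


# One second of silence at the assumed format: 16000 samples * 2 bytes each.
one_second = bytes(16000 * 2)
frames = list(iter_frames(one_second, frame_size_bytes(16000, 20)))
```

Smaller frames let the server start working sooner, at the cost of more messages per second, so the frame length is a tuning knob rather than a fixed answer.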

🏆 Accomplishments That We're Proud Of

✅ Live, Real-Time Slang Translation – A fully functional speech-to-speech AI that translates slang instantly.
✅ Sleek, Intuitive UI – SwiftUI design that ensures a frictionless user experience.
✅ Personalized Slang Dictionaries – Users can train Driptionary to recognize their unique vocabulary.
✅ Face Recognition for Personalization – Seamlessly retrieves slang preferences for tailored translations.
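
The idea of a personalized slang dictionary can be illustrated with a small lookup table in which a user's own entries override a shared base list. This is a simplified sketch: Driptionary itself performs translation through the OpenAI API, whereas the example below swaps in a plain dictionary lookup to show only the override behavior. `BASE_SLANG`, `translate_slang`, and the sample entries are hypothetical.

```python
import re

# Hypothetical shared dictionary; a real one would be much larger.
BASE_SLANG = {
    "no cap": "honestly",
    "rizz": "charm",
    "bussin": "really good",
}


def translate_slang(text, user_slang=None):
    """Replace known slang phrases, letting user entries override the base list."""
    merged = {k.lower(): v for k, v in BASE_SLANG.items()}
    merged.update({k.lower(): v for k, v in (user_slang or {}).items()})
    if not merged:
        return text
    # Longest phrases first so multi-word entries win over shorter overlaps.
    phrases = sorted(merged, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: merged[m.group(0).lower()], text)


print(translate_slang("That pizza was bussin, no cap"))
print(translate_slang("She has rizz", {"rizz": "charisma"}))
```

In an app like this, the `user_slang` mapping would be the part loaded for whichever user was recognized, while the base list stays the same for everyone.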

📚 What We Learned

🔍 The Power of Customization: Slang is personal—allowing users to define their own slang translations made the tool far more effective.
⚡ Optimizing for Speed: We learned how to minimize latency for real-time processing while maintaining high accuracy.
🗣 Balancing AI & User Control: Giving users input on their slang definitions improved engagement and accuracy.

🚀 What's Next for Driptionary?

✅ Multi-Language Support – Expand to bilingual slang translations (e.g., Spanglish, AAVE, Pidgin).
✅ AR/VR Integration (Oculus Quest 2) – Bringing Driptionary into VR environments for real-time metaverse communication.
✅ AI Voice Cloning for Personalized Playback – Generate translations in the user's own voice for a fully immersive experience.

Built With

elevenlabs
fastapi
groq
openai
python
socket.io
swift
websocket

Updates

Rohil Agarwal started this project — Feb 16, 2025 11:59 AM EST
