Fluent.io | Devpost

Inspiration

In our interconnected world, language remains the last great barrier to global collaboration. We've witnessed brilliant minds unable to share ideas, families struggling to stay connected across continents, and businesses losing opportunities due to communication gaps. The "Hacker-Verse" theme challenged us to transcend physical limitations. We realized that language barriers create invisible walls between people, fragmenting our digital universe into isolated linguistic islands. Fluent.io was born from a simple vision: what if everyone could communicate naturally, regardless of the language they speak?

What it does

Fluent.io is a web-based video calling platform that enables real-time multilingual conversations. Users join a video call, select their preferred language, and speak naturally. Our platform:

Translates speech in real time: Captures speech, translates it, and synthesizes it in the listener's language
Preserves voice characteristics: Using ElevenLabs AI, we maintain the speaker's emotion, tone, and pace
Provides live subtitles: Displays real-time captions for accessibility and clarity
Supports 29+ languages: From major languages to regional dialects
Works instantly: No downloads, plugins, or setup required - just open your browser

Example: when a Spanish speaker talks, English listeners hear natural English speech in near real time, complete with appropriate emotional inflection. It feels like everyone is speaking the same language.

How we built it

Our architecture combines standard web technologies with AI services:

Frontend Architecture:

React 18 for responsive UI with real-time state management
Three.js for immersive visual effects aligned with the Hacker-Verse theme
Tailwind CSS for rapid, consistent styling
WebRTC for peer-to-peer video/audio streaming

Backend Systems:

Node.js with TypeScript for type-safe server development
Socket.IO for WebRTC signaling and real-time event handling
Supabase for authentication and session management
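
To illustrate the signaling role described above, here is a minimal sketch of a room-based relay for WebRTC offers, answers, and ICE candidates. It is not our production code: the `SignalingSocket` interface, the event names, and `registerSignaling` are assumptions made for illustration, written to mirror the shape of Socket.IO's `socket.join` and `socket.to(room).emit` calls.

```typescript
// Minimal structural type standing in for a server-side socket (assumption).
interface SignalingSocket {
  id: string;
  on(event: string, handler: (...args: any[]) => void): void;
  join(room: string): void;
  to(room: string): { emit(event: string, ...args: any[]): void };
}

// Hypothetical event names for the three WebRTC signaling messages.
const SIGNAL_TYPES = ["offer", "answer", "ice-candidate"] as const;

function registerSignaling(socket: SignalingSocket): void {
  let currentRoom: string | null = null;

  // A participant joins a call room and announces itself to the peers in it.
  socket.on("join", (room: string) => {
    currentRoom = room;
    socket.join(room);
    socket.to(room).emit("peer-joined", socket.id);
  });

  // Relay each signaling message to the other peers in the same room.
  for (const type of SIGNAL_TYPES) {
    socket.on(type, (payload: unknown) => {
      if (currentRoom === null) return; // ignore signals sent before joining
      socket.to(currentRoom).emit(type, { from: socket.id, payload });
    });
  }
}
```

The server only forwards messages; media itself flows peer-to-peer over WebRTC once the offer/answer exchange completes.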

AI Translation Pipeline:

Web Speech API captures and transcribes speech in the original language
Custom translation service processes the text
ElevenLabs Conversational AI generates natural speech in the target language
WebRTC streams the translated audio to participants
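
The stages above can be sketched as a single function that takes a transcript and returns translated text plus synthesized audio. This is a simplified illustration under stated assumptions: `Translate`, `Synthesize`, and `translateUtterance` are hypothetical names, and the real services (our translation service and ElevenLabs) would sit behind the injected functions rather than being called this way directly.

```typescript
// Hypothetical stage signatures; real service clients would implement these.
type Translate = (text: string, from: string, to: string) => Promise<string>;
type Synthesize = (text: string, lang: string) => Promise<Uint8Array>;

interface TranslatedUtterance {
  sourceText: string;
  translatedText: string; // also usable as the live subtitle
  audio: Uint8Array;      // synthesized speech to stream to the listener
}

// Runs one transcribed utterance through translation and speech synthesis.
async function translateUtterance(
  transcript: string,
  from: string,
  to: string,
  translate: Translate,
  synthesize: Synthesize,
): Promise<TranslatedUtterance> {
  const translatedText = await translate(transcript, from, to);
  const audio = await synthesize(translatedText, to);
  return { sourceText: transcript, translatedText, audio };
}
```

Injecting the stages keeps the flow testable with stubs and makes it easy to swap a provider without touching the pipeline itself.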

The complete pipeline achieves end-to-end translation in under 300 ms. Total latency is the sum of the per-stage delays:

T_total = T_capture + T_STT + T_translate + T_TTS + T_stream

We optimized each stage to keep T_total < 300 ms.
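
As a rough illustration of how such a budget can be checked, the sketch below sums per-stage timings and reports the slowest stage. The type and function names are hypothetical, and any numbers passed in are placeholders, not our measurements.

```typescript
type Stage = "capture" | "stt" | "translate" | "tts" | "stream";
type StageTimings = Record<Stage, number>; // milliseconds per stage

const BUDGET_MS = 300; // end-to-end target from the formula above

// T_total: the sum of all stage latencies.
function totalLatency(t: StageTimings): number {
  return t.capture + t.stt + t.translate + t.tts + t.stream;
}

// True when the total stays strictly under the budget.
function withinBudget(t: StageTimings, budgetMs: number = BUDGET_MS): boolean {
  return totalLatency(t) < budgetMs;
}

// The stage contributing the most latency, i.e. the first one to optimize.
function slowestStage(t: StageTimings): Stage {
  const stages = Object.keys(t) as Stage[];
  return stages.reduce((worst, s) => (t[s] > t[worst] ? s : worst));
}
```

Tracking the slowest stage rather than only the total shows where the next optimization is likely to pay off.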

Optimization Techniques:

Implemented audio chunking for streaming translation
Used WebWorkers to prevent UI blocking during processing
Deployed edge functions for reduced latency
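
Audio chunking, the first technique above, can be sketched as splitting a buffer of PCM samples into fixed-duration pieces so that downstream stages can start before the speaker finishes. The function name, the mono `Float32Array` input, and the chunk duration are assumptions for illustration.

```typescript
// Splits mono PCM samples into chunks of roughly `chunkMs` milliseconds.
// The last chunk may be shorter. Chunks are views over the input (no copy).
function chunkAudio(
  samples: Float32Array,
  sampleRate: number,
  chunkMs: number,
): Float32Array[] {
  const chunkSize = Math.floor((sampleRate * chunkMs) / 1000);
  if (chunkSize <= 0) {
    throw new RangeError("chunkMs and sampleRate must yield at least one sample");
  }
  const chunks: Float32Array[] = [];
  for (let start = 0; start < samples.length; start += chunkSize) {
    const end = Math.min(start + chunkSize, samples.length);
    chunks.push(samples.subarray(start, end));
  }
  return chunks;
}
```

Smaller chunks lower the wait before the first result but increase the number of requests, so the duration is a tuning knob rather than a fixed value.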

Challenges we ran into

Latency Optimization

The biggest challenge was minimizing the delay between speech and translation. Our initial attempts had delays of 2-3 seconds, which made natural conversation impossible. We solved this by:

Implementing incremental speech processing
Optimizing API calls with batching
Using predictive text completion for common phrases
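
One way to picture incremental speech processing is shown below: instead of waiting for a final transcript, each interim result is compared with what has already been sent, and only newly stable words are forwarded. This is a simplified sketch, not our exact implementation. The class name and the `holdBack` heuristic (withholding the last few words because a recognizer may still revise them) are assumptions, and word-level slicing ignores word-order differences between languages.

```typescript
// Tracks how much of the current utterance has already been forwarded.
class IncrementalTranscript {
  private committedWords = 0;

  // Returns the words that became stable since the previous call.
  // Non-final results hold back the last `holdBack` words; a final
  // result flushes everything and resets for the next utterance.
  next(interim: string, isFinal: boolean, holdBack: number = 2): string {
    const words = interim.trim().split(/\s+/).filter((w) => w.length > 0);
    const stableEnd = isFinal
      ? words.length
      : Math.max(this.committedWords, words.length - holdBack);
    const fresh = words.slice(this.committedWords, stableEnd).join(" ");
    this.committedWords = isFinal ? 0 : stableEnd;
    return fresh;
  }
}
```

For example, feeding successive interim results for one sentence yields small text segments that can be translated while the speaker is still talking, with the remainder flushed when the recognizer marks the result as final.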

Accomplishments that we're proud of

Sub-300ms translation pipeline: Achieved near-instantaneous translation that enables natural conversation
Emotional intelligence: Successfully preserved tone, emotion, and speaking style across languages
Scalable architecture: Designed the system to handle multiple concurrent conversations
Accessibility focus: Implemented live captions making the platform inclusive
Zero-installation requirement: Built entirely with web technologies for instant access

What we learned

WebRTC mastery: Gained deep understanding of real-time communication protocols
AI service integration: Learned to orchestrate multiple AI services for seamless UX