🎤 Inspiration

Have you ever struggled to capture the true emotions behind words? Transcripts lack emotion, and text messages often fail to convey tone—leading to misunderstandings, especially for the deaf and hard of hearing. Inspired by these challenges, we created FeelSpeak, a tool that transforms speech into expressive fonts, adding a visual tone to words.

🛠 How We Built It

🎙 Speech Processing & Emotion Analysis

We leveraged Whisper AI for speech-to-text and fine-tuned a BERT-based model for emotion recognition, extracting emotional cues from transcriptions.
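
Roughly, that pipeline looks like the sketch below, assuming the open-source `whisper` package and the Hugging Face `transformers` pipeline API; the checkpoint path is a placeholder standing in for our fine-tuned weights:

```python
# Minimal sketch: transcribe speech with Whisper, then classify emotion with a
# fine-tuned BERT checkpoint. The checkpoint path is a placeholder.
import whisper
from transformers import pipeline

def transcribe_and_classify(audio_path: str):
    # Speech-to-text with Whisper (the "base" model keeps the demo lightweight)
    asr_model = whisper.load_model("base")
    transcript = asr_model.transcribe(audio_path)["text"]

    # Emotion recognition on the transcript with a BERT-based classifier
    emotion_clf = pipeline(
        "text-classification",
        model="path/to/fine-tuned-emotion-bert",  # placeholder for our checkpoint
        top_k=None,  # return scores for every emotion label
    )
    scores = emotion_clf(transcript)[0]
    return transcript, scores

if __name__ == "__main__":
    text, emotions = transcribe_and_classify("sample.wav")
    print(text, emotions)
```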

🎵 Audio Feature Extraction

Using Librosa & SciPy, we computed Mel spectrograms to represent audio signals visually.

  • Windowing & FFT: We applied Hann windowing and Fast Fourier Transform (FFT) for frequency domain analysis.
  • Mel Filtering: Extracted 23-band Mel spectrograms, mapping speech frequencies to human pitch perception (see the sketch below).
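
Here is roughly what that feature-extraction step looks like with Librosa; the frame and hop sizes are illustrative, not our tuned values:

```python
# Sketch of the Mel-spectrogram extraction described above.
# Frame/hop sizes are illustrative defaults, not tuned values.
import librosa
import numpy as np

def extract_mel_spectrogram(audio_path: str, n_mels: int = 23) -> np.ndarray:
    # Load audio at 16 kHz mono
    y, sr = librosa.load(audio_path, sr=16000, mono=True)

    # STFT with a Hann window, then map the power spectrum onto n_mels Mel bands
    mel = librosa.feature.melspectrogram(
        y=y,
        sr=sr,
        n_fft=1024,        # FFT size per frame
        hop_length=256,    # frame hop
        window="hann",     # Hann windowing before the FFT
        n_mels=n_mels,     # 23-band Mel filter bank
    )

    # Convert power to decibels for a perceptually friendlier representation
    return librosa.power_to_db(mel, ref=np.max)
```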

📡 Neural Processing (VIR - Voice Intelligence Recognition)

  • Processed waveform data into spectrograms, converting them to PyTorch tensors.
  • Used a CNN-based variational autoencoder (VAE) to capture emotional patterns.
  • Applied Lagrange interpolation to smooth out emotion fluctuations over time (sketched below).
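
The sketch below illustrates those stages in miniature; the layer sizes, latent dimension, and smoothing grid are assumptions rather than the exact VIR architecture:

```python
# Illustrative sketch of the VIR stages: spectrogram -> tensor -> small CNN
# encoder, then Lagrange interpolation over per-window emotion scores.
import numpy as np
import torch
import torch.nn as nn
from scipy.interpolate import lagrange

class SpectrogramEncoder(nn.Module):
    """Tiny CNN encoder standing in for the convolutional half of the VAE."""
    def __init__(self, latent_dim: int = 32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, latent_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.conv(x).flatten(1))

def smooth_emotion_curve(scores: np.ndarray, n_points: int = 50) -> np.ndarray:
    """Fit a Lagrange polynomial through per-window emotion scores and resample it."""
    t = np.arange(len(scores))
    poly = lagrange(t, scores)                      # SciPy's Lagrange interpolator
    return poly(np.linspace(0, len(scores) - 1, n_points))

# Example: one 23-band Mel spectrogram (23 x 200 frames) -> latent emotion vector
mel = np.random.rand(23, 200).astype(np.float32)
tensor = torch.from_numpy(mel).unsqueeze(0).unsqueeze(0)   # (batch, channel, mel, time)
latent = SpectrogramEncoder()(tensor)
```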

🔠 Visual Emotion Mapping

  • Mapped emotional intensity to typography using custom fonts (see the mapping sketch below).
  • Bold, vibrant text for excitement; soft, faded fonts for sadness; and more.
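
One way to picture the mapping is a small lookup table like the one below; the specific fonts, colours, and thresholds are illustrative, not our final design:

```python
# Illustrative emotion-to-typography mapping; the actual FeelSpeak fonts and
# thresholds differ, this only shows the idea.
EMOTION_STYLES = {
    "excitement": {"font-weight": "800", "color": "#ff5722", "letter-spacing": "0.05em"},
    "sadness":    {"font-weight": "300", "color": "#7a8ca3", "opacity": "0.75"},
    "anger":      {"font-weight": "700", "color": "#c62828", "text-transform": "uppercase"},
    "neutral":    {"font-weight": "400", "color": "#222222"},
}

def style_for(emotion: str, intensity: float) -> dict:
    """Return a CSS-style dict, scaling opacity with the emotion's intensity."""
    style = dict(EMOTION_STYLES.get(emotion, EMOTION_STYLES["neutral"]))
    style["opacity"] = f"{0.5 + 0.5 * max(0.0, min(intensity, 1.0)):.2f}"
    return style
```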

🚀 Deployment & Real-Time Interactivity

  • Flask API for backend processing and a React frontend for live visualization.
  • Optimized real-time streaming with WebSockets for instant emotion feedback (sketched below).
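
A minimal sketch of the real-time loop, assuming Flask-SocketIO on the backend; the event names and the `analyze()` helper are placeholders, not our exact API:

```python
# Minimal sketch of the real-time loop, assuming Flask-SocketIO for WebSockets.
# Event names and the analyze() helper are placeholders.
from flask import Flask
from flask_socketio import SocketIO, emit

app = Flask(__name__)
socketio = SocketIO(app, cors_allowed_origins="*")

@socketio.on("audio_chunk")
def handle_audio_chunk(chunk):
    # analyze() would run the Whisper + VIR pipeline on the incoming chunk
    transcript, emotion, intensity = analyze(chunk)  # placeholder helper
    # Push the result straight back to the React client for live rendering
    emit("emotion_update", {
        "text": transcript,
        "emotion": emotion,
        "intensity": intensity,
    })

if __name__ == "__main__":
    socketio.run(app, port=5000)
```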

FeelSpeak transforms speech into emotions, making conversations clearer, more expressive, and accessible. 🎭✨

🚧 Challenges We Ran Into

🎭 Emotion Detection Accuracy

Training the VIR model to distinguish subtle emotional shifts was tricky. We improved accuracy by fine-tuning on a diverse emotional speech dataset.
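
Conceptually, the fine-tuning loop looked something like the sketch below; the public `dair-ai/emotion` dataset and the hyperparameters shown here are stand-ins for our own data and settings:

```python
# Hedged sketch of fine-tuning a BERT classifier on labelled emotion data.
# Dataset, label count, and hyperparameters are illustrative stand-ins.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("dair-ai/emotion")          # public 6-label emotion dataset
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

tokenized = dataset.map(tokenize, batched=True)
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=6)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="emotion-bert",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
)
trainer.train()
```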

⚡ Real-Time Processing Bottlenecks

Generating Mel spectrograms and running emotion classification in real time caused latency issues. We optimized with batch processing and parallelized computation.
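
The fix amounted to batching work and fanning it out across worker processes, roughly like this (batch size and worker count are illustrative):

```python
# Sketch of the latency fix: process audio in fixed-size batches across
# parallel worker processes. Batch size and worker count are illustrative.
from concurrent.futures import ProcessPoolExecutor

def process_in_batches(items, compute_fn, batch_size=8, workers=4):
    """Apply compute_fn to items batch by batch, using a pool of worker processes."""
    results = []
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for start in range(0, len(items), batch_size):
            batch = items[start:start + batch_size]
            results.extend(pool.map(compute_fn, batch))
    return results

# e.g. process_in_batches(audio_paths, extract_mel_spectrogram)
```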

🖋 Designing Expressive Fonts

Mapping emotion to typography while maintaining readability was a challenge. We iterated on dynamic font rendering to ensure emotions were visually intuitive.

🔄 Model Generalization

Some voices were misclassified due to variations in pitch and tone. We improved this by expanding the training dataset and applying data augmentation.
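
Augmentation of this kind can be done with Librosa's pitch-shift and time-stretch effects; the ranges below are illustrative, not the exact settings we used:

```python
# Sketch of the augmentation step: vary pitch and speaking rate so the model
# sees more voice diversity. Shift/stretch ranges are illustrative.
import librosa
import numpy as np

def augment(y: np.ndarray, sr: int) -> list[np.ndarray]:
    variants = [y]
    for steps in (-2, 2):      # shift pitch down/up by two semitones
        variants.append(librosa.effects.pitch_shift(y, sr=sr, n_steps=steps))
    for rate in (0.9, 1.1):    # slow down / speed up slightly
        variants.append(librosa.effects.time_stretch(y, rate=rate))
    return variants
```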

🎉 What We Learned

We realized that speech, text, and visuals must work together for true emotional expression. Designing an advanced Figma prototype helped us refine the user experience and emotional typography before development in HTML/CSS.

Fine-tuning the VIR model highlighted the complexity of emotion detection, as pitch, pacing, and tone influence perception. Expanding our dataset and applying transfer learning significantly improved accuracy.

To ensure real-time interaction, we optimized batch processing, API requests, and parallel computing, reducing lag. These insights helped us build a seamless, expressive, and accessible platform. 🚀
