Speech Mate 🎙️

Speech Mate is an AI-powered public speaking coach designed to help you write, practice, and perfect your speeches. From generating an outline to analyzing your final performance, Speech Mate is your all-in-one partner for confident communication.

Key Features

AI-Powered Speech Generation: Generate a detailed, structured speech outline in seconds. Just provide a topic, tone, audience, and desired duration, and let our AI build your speech.
Multimodal Performance Analysis: Upload a video of your speech, along with your slides, notes, or other supporting materials, and receive a comprehensive AI-generated report.
Detailed Feedback: Get an overall score and detailed breakdowns on:
- Content & Structure: How well you stayed on topic and organized your ideas.
- Delivery & Body Language: Analysis of your posture, gestures, and eye contact.
- Vocal Variety: Feedback on your pace, pitch, tone, and intonation.
- Filler Word Counting: A count of filler words such as "um," "ah," and "like."
- Accent & Clarity: Constructive analysis of your pronunciation and language.
Actionable Improvement Plan: Receive specific tips and a list of recommended YouTube videos tailored to the weaknesses identified in your report.
Text-to-Speech Practice: Listen to your generated speech with a realistic AI voice (powered by ElevenLabs) to practice your timing and cadence.
Secure Authentication: User accounts are secured via Google OAuth 2.0.

How It Works

Speech Mate is a full-stack application composed of a React frontend and a Spring Boot backend.

Authentication: The user logs in using their Google account. The Spring Boot backend uses Spring Security and OAuth2 to authenticate the user and establish a session.
Speech Generation:
- A user fills out the "Generate Speech" form in the React UI.
- This calls the /api/gemini/generate-outline endpoint on the Spring Boot server.
- The GeminiService constructs a detailed prompt for the Gemini 2.0 Flash model, asking it to return a structured JSON object containing a title, thesis, timed sections, talking points, and a conclusion.
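As a rough illustration of the outline flow above, the frontend could validate the JSON it gets back before rendering it. This is a minimal sketch, not the project's actual code: the interface names, the field names (`sections`, `minutes`, `talkingPoints`), and the helpers `parseOutline` and `totalMinutes` are all assumptions made for illustration, since the exact schema is not given here.

```typescript
// Hypothetical shape of the outline JSON; the real field names may differ.
interface OutlineSection {
  heading: string;
  minutes: number; // time allotted to this section
  talkingPoints: string[];
}

interface SpeechOutline {
  title: string;
  thesis: string;
  sections: OutlineSection[];
  conclusion: string;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function isSection(value: unknown): value is OutlineSection {
  if (!isRecord(value)) return false;
  const heading = value["heading"];
  const minutes = value["minutes"];
  const points = value["talkingPoints"];
  return (
    typeof heading === "string" &&
    typeof minutes === "number" &&
    Array.isArray(points) &&
    points.every((p) => typeof p === "string")
  );
}

// Parses the model's JSON text and throws if it does not match the assumed shape.
function parseOutline(json: string): SpeechOutline {
  const data: unknown = JSON.parse(json);
  if (!isRecord(data)) throw new Error("Outline must be a JSON object");
  const { title, thesis, sections, conclusion } = data;
  if (
    typeof title !== "string" ||
    typeof thesis !== "string" ||
    typeof conclusion !== "string"
  ) {
    throw new Error("Outline is missing title, thesis, or conclusion");
  }
  if (!Array.isArray(sections) || !sections.every(isSection)) {
    throw new Error("Outline sections are malformed");
  }
  return { title, thesis, sections, conclusion };
}

// Sums the section timings so the UI can compare them with the requested duration.
function totalMinutes(outline: SpeechOutline): number {
  return outline.sections.reduce((sum, section) => sum + section.minutes, 0);
}
```

Validating at the boundary like this means a malformed model response fails loudly in one place instead of surfacing as undefined values deep inside the UI.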
Speech Analysis:
- A user uploads a video file and/or other documents (PDFs, images).
- This calls the /api/gemini/analyze-speech endpoint with multipart/form-data.
- The GeminiService uses the multimodal Gemini 2.5 Pro model. It combines the video, slides, and user-provided context (topic, audience) into a single prompt.
- This prompt instructs the AI to act as an expert speech coach and return a highly detailed JSON object with scores, content summaries, language/accent analysis, intonation patterns, filler word counts, specific statement-by-statement feedback, and YouTube recommendations.
- The React frontend (SpeechAnalysis.tsx) parses this complex JSON response and displays it in a user-friendly report.
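To make the parsing step above more concrete, here is a minimal sketch of how a report could be validated and summarized on the frontend. The report shape is an assumption: `overallScore`, `categoryScores`, and `fillerWords` are hypothetical keys chosen for illustration, and the real response described above contains more fields than this. The helpers `parseReport`, `totalFillerWords`, and `weakestCategory` are likewise illustrative and not taken from SpeechAnalysis.tsx.

```typescript
// Hypothetical, trimmed-down report shape; the real JSON keys are not documented here.
interface AnalysisReport {
  overallScore: number;
  categoryScores: Record<string, number>; // e.g. { delivery: 65 }
  fillerWords: Record<string, number>; // e.g. { um: 4 }
}

function isNumberMap(value: unknown): value is Record<string, number> {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return false;
  }
  const map = value as Record<string, unknown>;
  return Object.keys(map).every((key) => typeof map[key] === "number");
}

// Parses the analysis JSON and throws if it does not match the assumed shape.
function parseReport(json: string): AnalysisReport {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) {
    throw new Error("Report must be a JSON object");
  }
  const { overallScore, categoryScores, fillerWords } = data as Record<string, unknown>;
  if (
    typeof overallScore !== "number" ||
    !isNumberMap(categoryScores) ||
    !isNumberMap(fillerWords)
  ) {
    throw new Error("Report is missing scores or filler word counts");
  }
  return { overallScore, categoryScores, fillerWords };
}

// Adds up every filler word count in the report.
function totalFillerWords(report: AnalysisReport): number {
  let total = 0;
  for (const word of Object.keys(report.fillerWords)) {
    total += report.fillerWords[word] ?? 0;
  }
  return total;
}

// Returns the lowest-scoring category name, or undefined if there are none.
function weakestCategory(report: AnalysisReport): string | undefined {
  let weakest: string | undefined;
  let lowest = Infinity;
  for (const name of Object.keys(report.categoryScores)) {
    const score = report.categoryScores[name];
    if (score !== undefined && score < lowest) {
      lowest = score;
      weakest = name;
    }
  }
  return weakest;
}
```

A summary like the weakest category is one plausible way to decide which part of the report to highlight first, though the actual report view may order things differently.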
Speech Practice:
- The user can send text to the /api/elevenlabs/text-to-speech endpoint.
- The backend calls the ElevenLabs API to generate high-quality audio, which the user can play back to practice their delivery.
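The practice flow above could be driven from the frontend with a small request builder like the one below. Only the endpoint path comes from the description; the JSON body with a single `text` field, the `HttpRequest` type, and the `buildTtsRequest` helper are assumptions made for this sketch, and the real endpoint may expect a different payload.

```typescript
// Plain description of an HTTP call, independent of Axios or fetch.
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

const TTS_ENDPOINT = "/api/elevenlabs/text-to-speech";

// Builds the request for the text-to-speech endpoint.
// Assumption: the backend accepts a JSON body of the form { "text": "..." }.
function buildTtsRequest(text: string): HttpRequest {
  const trimmed = text.trim();
  if (trimmed.length === 0) {
    throw new Error("Nothing to read aloud: the speech text is empty");
  }
  return {
    url: TTS_ENDPOINT,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text: trimmed }),
  };
}
```

Keeping the builder free of any HTTP client makes it easy to test. The returned object can then be handed to whichever client the UI already uses, with the audio bytes from the response played back in the browser.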

Technology Stack

Backend: Java 21, Spring Boot 3.5.7, Spring Security (OAuth2), Spring WebFlux (WebClient)
Frontend: React 19, TypeScript, Vite, React Router, Axios, Framer Motion
AI APIs: Google Gemini (2.5 Pro, 2.0 Flash, 2.0 Flash-Lite), ElevenLabs
Authentication: Google OAuth 2.0

Built With

elevenlabs
gemini
google-oauth
java
react
springboot
typescript
vite

Updates

Uyi Nehikhuere started this project — Dec 19, 2025 09:58 PM EST
