Speech Mate 🎙️
Speech Mate is an AI-powered public speaking coach designed to help you write, practice, and perfect your speeches. From generating an outline to analyzing your final performance, Speech Mate is your all-in-one partner for confident communication.
Key Features
- AI-Powered Speech Generation: Generate a detailed, structured speech outline in seconds. Just provide a topic, tone, audience, and desired duration, and let our AI build your speech.
- Multimodal Performance Analysis: Upload a video of your speech—along with your slides, notes, or any other materials—and receive a comprehensive, AI-driven report.
- Detailed Feedback: Get an overall score and detailed breakdowns on:
  - Content & Structure: How well you stayed on topic and organized your ideas.
  - Delivery & Body Language: Analysis of your posture, gestures, and eye contact.
  - Vocal Variety: Feedback on your pace, pitch, tone, and intonation.
  - Filler Word Counting: A precise count of "ums," "ahs," "likes," etc.
  - Accent & Clarity: Constructive analysis of your pronunciation and language.
- Actionable Improvement Plan: Receive specific, actionable tips and a list of recommended YouTube videos tailored to your unique weaknesses.
- Text-to-Speech Practice: Listen to your generated speech with a realistic AI voice (powered by ElevenLabs) to practice your timing and cadence.
- Secure Authentication: User accounts are secured via Google OAuth 2.0.
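Speech Mate gets its filler-word counts from Gemini as part of the analysis report. To show what such a count involves, here is a hypothetical local counter over a plain-text transcript. The filler list, the `countFillers` name, and the idea of counting on a transcript at all are assumptions for illustration, not part of the project.

```typescript
// Hypothetical local filler-word counter (illustration only).
// Speech Mate itself asks Gemini for these counts; this assumes a plain transcript.
const FILLERS = ["um", "uh", "ah", "like", "you know"] as const;

type FillerCounts = Record<string, number>;

// Escape regex metacharacters so each filler is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function countFillers(
  transcript: string,
  fillers: readonly string[] = FILLERS,
): FillerCounts {
  const counts: FillerCounts = {};
  const text = transcript.toLowerCase();
  for (const filler of fillers) {
    // \b word boundaries keep "like" from matching inside "likely"
    // and "um" from matching inside "umbrella".
    const pattern = new RegExp(`\\b${escapeRegExp(filler)}\\b`, "g");
    counts[filler] = (text.match(pattern) ?? []).length;
  }
  return counts;
}
```

A purely lexical count like this overcounts words such as "like" when they are used legitimately ("I like this topic") and misses variants such as "umm", which is one reason a context-aware model is a reasonable choice for this feature. For example, "Um, so, like, you know, it works" yields one each of "um", "like", and "you know".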
How It Works
Speech Mate is a full-stack application composed of a React frontend and a Spring Boot backend.
- Authentication: The user logs in using their Google account. The Spring Boot backend uses Spring Security and OAuth2 to authenticate the user and establish a session.
- Speech Generation:
  - A user fills out the "Generate Speech" form in the React UI.
  - This calls the /api/gemini/generate-outline endpoint on the Spring Boot server.
  - The GeminiService constructs a detailed prompt for the Gemini 2.0 Flash model, asking it to return a structured JSON object containing a title, thesis, timed sections, talking points, and a conclusion.
- Speech Analysis:
  - A user uploads a video file and/or other documents (PDFs, images).
  - This calls the /api/gemini/analyze-speech endpoint with multipart/form-data.
  - The GeminiService uses the powerful Gemini 2.5 Pro multimodal model. It combines the video, slides, and user-provided context (topic, audience) into a single, comprehensive prompt.
  - This prompt instructs the AI to act as an expert speech coach and return a highly detailed JSON object with scores, content summaries, language/accent analysis, intonation patterns, filler word counts, specific statement-by-statement feedback, and YouTube recommendations.
  - The React frontend (SpeechAnalysis.tsx) parses this complex JSON response and displays it in a user-friendly report.
- Speech Practice:
  - The user can send text to the /api/elevenlabs/text-to-speech endpoint.
  - The backend calls the ElevenLabs API to generate high-quality audio, which the user can play back to practice their delivery.
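Because both Gemini endpoints return model-generated JSON, the frontend has to check its shape before rendering it. The sketch below shows one way a client might validate the outline returned by /api/gemini/generate-outline. The page lists what the outline contains (title, thesis, timed sections, talking points, conclusion) but not the exact schema, so the field names `sections`, `heading`, `durationSeconds`, and `talkingPoints`, and the helpers `parseOutline` and `totalSeconds`, are all hypothetical.

```typescript
// Hypothetical outline shape; the real field names are not given in the source.
interface OutlineSection {
  heading: string;
  durationSeconds: number;
  talkingPoints: string[];
}

interface SpeechOutline {
  title: string;
  thesis: string;
  sections: OutlineSection[];
  conclusion: string;
}

function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((x) => typeof x === "string");
}

function isSection(v: unknown): v is OutlineSection {
  if (typeof v !== "object" || v === null) return false;
  const s = v as Record<string, unknown>;
  return (
    typeof s.heading === "string" &&
    typeof s.durationSeconds === "number" &&
    Number.isFinite(s.durationSeconds) &&
    s.durationSeconds >= 0 &&
    isStringArray(s.talkingPoints)
  );
}

// Parses raw model text into a SpeechOutline, throwing if the shape is wrong.
function parseOutline(raw: string): SpeechOutline {
  // Models sometimes wrap JSON in a Markdown code fence; strip it if present.
  const cleaned = raw
    .trim()
    .replace(/^`{3}(?:json)?\s*/, "")
    .replace(/\s*`{3}$/, "");
  const data: unknown = JSON.parse(cleaned);
  if (typeof data !== "object" || data === null) {
    throw new Error("Outline is not a JSON object");
  }
  const o = data as Record<string, unknown>;
  if (
    typeof o.title !== "string" ||
    typeof o.thesis !== "string" ||
    typeof o.conclusion !== "string" ||
    !Array.isArray(o.sections) ||
    !o.sections.every(isSection)
  ) {
    throw new Error("Outline JSON does not match the expected shape");
  }
  return {
    title: o.title,
    thesis: o.thesis,
    sections: o.sections,
    conclusion: o.conclusion,
  };
}

// Sum of the timed sections, e.g. to compare against the requested duration.
function totalSeconds(outline: SpeechOutline): number {
  return outline.sections.reduce((sum, s) => sum + s.durationSeconds, 0);
}
```

Validating at the boundary keeps rendering code such as SpeechAnalysis.tsx free of defensive checks; the same pattern would apply to the larger analysis report. Gemini's API can also be asked to emit JSON directly (via its response MIME type and schema options), which makes fence stripping less likely to be needed but does not remove the value of a client-side check.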
Technology Stack
- Backend: Java 21, Spring Boot 3.5.7, Spring Security (OAuth2), Spring WebFlux (WebClient)
- Frontend: React 19, TypeScript, Vite, React Router, Axios, Framer Motion
- AI APIs: Google Gemini (2.5 Pro, 2.0 Flash, 2.0 Flash-Lite), ElevenLabs
- Authentication: Google OAuth 2.0
Built With
- elevenlabs
- gemini
- google-oauth
- java
- react
- springboot
- typescript
- vite