An AI-powered speech coaching platform.
- 🎥 Video Recording: Record practice sessions using webcam or upload pre-recorded videos
- 🎤 Speech Transcription: Automatic transcription using ElevenLabs Speech-to-Text
- 🤖 AI Feedback: Comprehensive analysis using Google Gemini AI
- 📊 Progress Tracking: View your improvement over time with detailed analytics
- 🎬 Practice Scenarios: Pre-defined scenarios for job interviews, presentations, and more
- Framework: Next.js 14 (App Router)
- Language: TypeScript
- Styling: Tailwind CSS
- UI Components: shadcn/ui
- Database: MongoDB
- APIs: ElevenLabs Speech-to-Text, Google Gemini (required)
- Charts: Recharts
- Node.js 18+
- MongoDB (local or cloud instance) - Required
- ElevenLabs API key (required)
- Google Gemini API key (required)
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd DeltaHacks12
  ```

- Install dependencies:

  ```bash
  npm install
  ```

- Set up environment variables:

  ```bash
  cp .env.local.example .env.local
  ```

  Edit `.env.local` and add your API keys:

  - `MONGODB_URI`: Your MongoDB connection string
  - `ELEVENLABS_API_KEY`: Your ElevenLabs API key
  - `GOOGLE_GEMINI_API_KEY`: Your Google Gemini API key
  - `NEXTAUTH_SECRET`: A random secret string (generate with `openssl rand -base64 32`)
  - `FFMPEG_VM_URL`: (Optional) URL of the FFmpeg service on the VM (e.g., http://45.77.218.210:3001)
  - `FFMPEG_API_KEY`: (Optional) API key for the FFmpeg VM service (must match the VM service config)

- Run the development server:

  ```bash
  npm run dev
  ```

- Open http://localhost:3000 in your browser.
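For reference, a filled-in `.env.local` might look like the following. Every value below is a placeholder (including the database name), not a real credential:

```bash
# .env.local — placeholder values, replace with your own
MONGODB_URI=mongodb://localhost:27017/your-database-name
ELEVENLABS_API_KEY=your-elevenlabs-key
GOOGLE_GEMINI_API_KEY=your-gemini-key
NEXTAUTH_SECRET=paste-output-of-openssl-rand-base64-32
# Optional: FFmpeg VM service
FFMPEG_VM_URL=http://45.77.218.210:3001
FFMPEG_API_KEY=your-ffmpeg-vm-key
```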
```
├── app/
│   ├── (auth)/          # Authentication pages
│   │   ├── login/
│   │   └── register/
│   ├── api/             # API routes
│   │   ├── upload/      # Video upload endpoint
│   │   ├── presage/     # Biometric processing
│   │   ├── whisper/     # Transcription (ElevenLabs)
│   │   ├── gemini/      # AI analysis
│   │   └── process/     # Full pipeline
│   ├── dashboard/       # User dashboard
│   ├── practice/        # Practice recording page
│   └── feedback/[id]/   # Feedback report page
├── components/
│   ├── recording/       # VideoRecorder component
│   ├── feedback/        # BiometricChart, SpeechAnalysis
│   └── ui/              # shadcn/ui components
├── lib/
│   ├── db/              # MongoDB connection
│   ├── presage/         # Presage SDK integration (TODO)
│   ├── elevenlabs/      # ElevenLabs transcription
│   └── gemini/          # Gemini analysis
└── types/               # TypeScript type definitions
```
- Upload: Video is uploaded and stored (S3 or local)
- Extract Biometrics: Presage SDK processes video for biometric data
- Transcribe: ElevenLabs Speech-to-Text transcribes the audio
- Analyze: Google Gemini generates comprehensive feedback
- Display: User views detailed feedback report
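The five stages above can be sketched as a single orchestrator. Everything here is illustrative: the stage functions, status values, and return shapes are assumptions standing in for the real `/api/*` handlers, not the repo's actual API.

```typescript
// Illustrative processing pipeline; the stage functions are stand-ins
// for the real Presage, ElevenLabs, and Gemini integrations.
type RecordingStatus = "uploaded" | "processing" | "complete" | "failed";

interface PipelineResult {
  status: RecordingStatus;
  transcript?: string;
  feedback?: string;
}

// Hypothetical stage functions with mock return values.
async function extractBiometrics(_videoUrl: string): Promise<number[]> {
  return [72, 74, 71]; // mock heart-rate samples
}
async function transcribe(_videoUrl: string): Promise<string> {
  return "Hello, thank you for having me today.";
}
async function analyze(transcript: string, biometrics: number[]): Promise<string> {
  return `Analyzed ${transcript.split(" ").length} words and ${biometrics.length} biometric samples.`;
}

async function runPipeline(videoUrl: string): Promise<PipelineResult> {
  try {
    const biometrics = await extractBiometrics(videoUrl);   // Extract Biometrics
    const transcript = await transcribe(videoUrl);          // Transcribe
    const feedback = await analyze(transcript, biometrics); // Analyze
    return { status: "complete", transcript, feedback };    // ready to Display
  } catch {
    return { status: "failed" };
  }
}
```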
The Presage SDK integration currently uses mock data. To integrate the actual Presage SDK:

- Review the TODO comments in `lib/presage/processor.ts`
- Install the Presage SDK package
- Update the `processPresageData()` function with actual SDK calls
- Map the Presage response format to our `BiometricData` interface
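Until the real SDK is wired in, the mock can return data shaped like `BiometricData`. The sketch below is an assumption: the actual signature of `processPresageData()` and the `FacialExpression` fields may differ, and the sample values are invented placeholders.

```typescript
// Hypothetical mock for processPresageData(); replace the body with
// real Presage SDK calls once the SDK details are confirmed.
interface FacialExpression {
  label: string;      // assumed shape, e.g. "neutral", "smile"
  confidence: number;
}

interface BiometricData {
  id: string;
  recordingId: string;
  heartRate: number[];
  breathing: number[];
  facialExpressions: FacialExpression[];
  timestamps: number[];
}

function processPresageData(recordingId: string, durationSec: number): BiometricData {
  // One sample per second of video — placeholder values only.
  const timestamps = Array.from({ length: durationSec }, (_, i) => i);
  return {
    id: `bio-${recordingId}`,
    recordingId,
    heartRate: timestamps.map(() => 70 + Math.round(Math.random() * 10)),
    breathing: timestamps.map(() => 14 + Math.round(Math.random() * 4)),
    facialExpressions: timestamps.map(() => ({ label: "neutral", confidence: 0.9 })),
    timestamps,
  };
}
```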
Key questions to clarify with Presage team:
- SDK API structure and authentication
- Video format requirements
- Response data format
- Processing time estimates
- Rate limits
- `POST /api/upload` - Upload video file
- `POST /api/presage` - Process biometric data
- `POST /api/whisper` - Transcribe audio (using ElevenLabs)
- `POST /api/gemini` - Generate feedback
- `POST /api/process` - Run full processing pipeline
- `GET /api/recordings` - Get user's recordings
- `GET /api/feedback/[id]` - Get feedback report
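As an illustration, a client could kick off the pipeline endpoint like this. The endpoint path comes from the list above, but the request and response field names are assumptions; the `fetchImpl` parameter is only there to make the helper testable:

```typescript
// Minimal client helper for POST /api/process; the body and response
// shapes are assumed, not taken from the actual route handlers.
interface ProcessResponse {
  recordingId: string;
  status: string;
}

async function startProcessing(
  recordingId: string,
  fetchImpl: typeof fetch = fetch, // injectable for testing
): Promise<ProcessResponse> {
  const res = await fetchImpl("/api/process", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ recordingId }),
  });
  if (!res.ok) throw new Error(`process failed: ${res.status}`);
  return (await res.json()) as ProcessResponse;
}
```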
```typescript
interface User {
  id: string;
  email: string;
  name: string;
  createdAt: Date;
  preferences: {
    language?: string;
    notifications?: boolean;
    theme?: 'light' | 'dark' | 'system';
  };
}
```

```typescript
interface Recording {
  id: string;
  userId: string;
  videoUrl: string;
  duration: number;
  status: RecordingStatus;
  createdAt: Date;
}
```

```typescript
interface BiometricData {
  id: string;
  recordingId: string;
  heartRate: number[];
  breathing: number[];
  facialExpressions: FacialExpression[];
  timestamps: number[];
}
```

```typescript
interface Transcription {
  id: string;
  recordingId: string;
  text: string;
  words: Word[];
  wordTimestamps: WordTimestamp[];
  metrics: SpeechMetrics;
}
```

```typescript
interface Feedback {
  id: string;
  recordingId: string;
  overallScore: number;
  biometricInsights: {...};
  speechInsights: {...};
  recommendations: Recommendation[];
}
```

- All API routes include error handling with try-catch blocks
- Presage integration uses placeholder data until SDK is integrated
- Processing status is tracked through the pipeline
- TypeScript strict mode is enabled
- Dark mode support with system preference detection
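The try-catch note above can be illustrated with a framework-agnostic sketch. The real routes are Next.js App Router handlers, but the error-handling shape is the same; the handler name, input handling, and response fields here are invented for the example:

```typescript
// Illustrative API-route error handling mirroring the try-catch
// pattern used in the routes: validate input, return 400 on missing
// fields, and fall through to 500 on unexpected failures.
interface ApiResult {
  status: number;
  body: Record<string, unknown>;
}

async function handleTranscribe(rawBody: string): Promise<ApiResult> {
  try {
    const { recordingId } = JSON.parse(rawBody) as { recordingId?: string };
    if (!recordingId) {
      return { status: 400, body: { error: "recordingId is required" } };
    }
    // ... call ElevenLabs here; stubbed out for the sketch
    return { status: 200, body: { recordingId, status: "transcribing" } };
  } catch {
    // Malformed JSON or an upstream failure lands here
    return { status: 500, body: { error: "internal error" } };
  }
}
```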
- Real-time processing status updates with WebSockets
- Group practice sessions
- Advanced visualizations
- Mobile app optimization
- Social features and community
- Custom practice scenarios
- Export reports as PDF
MIT