
Fluency Lab

An AI-powered speech coaching platform.

Features

  • πŸŽ₯ Video Recording: Record practice sessions using webcam or upload pre-recorded videos
  • 🎀 Speech Transcription: Automatic transcription using ElevenLabs Speech-to-Text
  • πŸ€– AI Feedback: Comprehensive analysis using Google Gemini AI
  • πŸ“ˆ Progress Tracking: View your improvement over time with detailed analytics
  • πŸ’¬ Practice Scenarios: Pre-defined scenarios for job interviews, presentations, and more

Tech Stack

  • Framework: Next.js 14 (App Router)
  • Language: TypeScript
  • Styling: Tailwind CSS
  • UI Components: shadcn/ui
  • Database: MongoDB
  • APIs: ElevenLabs Speech-to-Text and Google Gemini (both required)
  • Charts: Recharts

Getting Started

Prerequisites

  • Node.js 18+
  • MongoDB (local or cloud instance; required)
  • ElevenLabs API key (required)
  • Google Gemini API key (required)

Local Installation

  1. Clone the repository:
git clone <repository-url>
cd DeltaHacks12
  2. Install dependencies:
npm install
  3. Set up environment variables:
cp .env.local.example .env.local

Edit .env.local and add your API keys (a sample filled-in file follows these steps):

  • MONGODB_URI: Your MongoDB connection string
  • ELEVENLABS_API_KEY: Your ElevenLabs API key
  • GOOGLE_GEMINI_API_KEY: Your Google Gemini API key
  • NEXTAUTH_SECRET: A random secret string (generate with openssl rand -base64 32)
  • FFMPEG_VM_URL: (Optional) URL of the FFmpeg service on a VM (e.g., http://45.77.218.210:3001)
  • FFMPEG_API_KEY: (Optional) API key for the FFmpeg VM service (must match the VM service config)
  4. Run the development server:
npm run dev
  5. Open http://localhost:3000 in your browser.
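
For reference, a filled-in .env.local might look like the following. Every value below is an illustrative placeholder, including the database name; substitute your own credentials:

# .env.local (illustrative placeholder values only)
MONGODB_URI=mongodb://localhost:27017/fluency-lab
ELEVENLABS_API_KEY=your-elevenlabs-api-key
GOOGLE_GEMINI_API_KEY=your-gemini-api-key
NEXTAUTH_SECRET=paste-output-of-openssl-rand-base64-32
# Optional: only needed if you use the FFmpeg VM service
FFMPEG_VM_URL=http://45.77.218.210:3001
FFMPEG_API_KEY=your-ffmpeg-service-key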

Project Structure

β”œβ”€β”€ app/
β”‚   β”œβ”€β”€ (auth)/          # Authentication pages
β”‚   β”‚   β”œβ”€β”€ login/
β”‚   β”‚   └── register/
β”‚   β”œβ”€β”€ api/             # API routes
β”‚   β”‚   β”œβ”€β”€ upload/      # Video upload endpoint
β”‚   β”‚   β”œβ”€β”€ presage/     # Biometric processing
β”‚   β”‚   β”œβ”€β”€ whisper/     # Transcription (ElevenLabs)
β”‚   β”‚   β”œβ”€β”€ gemini/      # AI analysis
β”‚   β”‚   └── process/     # Full pipeline
β”‚   β”œβ”€β”€ dashboard/       # User dashboard
β”‚   β”œβ”€β”€ practice/        # Practice recording page
β”‚   └── feedback/[id]/   # Feedback report page
β”œβ”€β”€ components/
β”‚   β”œβ”€β”€ recording/       # VideoRecorder component
β”‚   β”œβ”€β”€ feedback/        # BiometricChart, SpeechAnalysis
β”‚   └── ui/              # shadcn/ui components
β”œβ”€β”€ lib/
β”‚   β”œβ”€β”€ db/              # MongoDB connection
β”‚   β”œβ”€β”€ presage/         # Presage SDK integration (TODO)
β”‚   β”œβ”€β”€ elevenlabs/      # ElevenLabs transcription
β”‚   └── gemini/          # Gemini analysis
└── types/               # TypeScript type definitions

Processing Pipeline

  1. Upload: Video is uploaded and stored (S3 or local)
  2. Extract Biometrics: Presage SDK processes video for biometric data
  3. Transcribe: ElevenLabs Speech-to-Text transcribes the audio
  4. Analyze: Google Gemini generates comprehensive feedback
  5. Display: User views detailed feedback report
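
A minimal sketch of how the /api/process route could chain these steps, assuming helper exports in lib/. Only processPresageData is named elsewhere in this README; transcribeAudio and generateFeedback are hypothetical names for the lib/elevenlabs and lib/gemini helpers:

// app/api/process/route.ts: illustrative sketch, not the actual implementation
import { NextRequest, NextResponse } from "next/server";
import { processPresageData } from "@/lib/presage/processor";
import { transcribeAudio } from "@/lib/elevenlabs"; // hypothetical export
import { generateFeedback } from "@/lib/gemini"; // hypothetical export

export async function POST(req: NextRequest) {
  const { recordingId, videoUrl } = await req.json();

  // Steps 1-2: the video is already uploaded; extract biometrics (mock until the SDK lands)
  const biometrics = await processPresageData(recordingId, videoUrl);

  // Step 3: transcribe the audio with ElevenLabs Speech-to-Text
  const transcription = await transcribeAudio(videoUrl);

  // Step 4: generate the feedback report with Gemini
  const feedback = await generateFeedback({ biometrics, transcription });

  // Step 5: the client then fetches the report via GET /api/feedback/[id]
  return NextResponse.json({ recordingId, feedback });
}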

Presage Integration

The Presage SDK integration currently uses mock data. To integrate the actual Presage SDK:

  1. Review the TODO comments in lib/presage/processor.ts
  2. Install the Presage SDK package
  3. Update processPresageData() function with actual SDK calls
  4. Map Presage response format to our BiometricData interface
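
As a rough sketch of steps 3 and 4, the mock implementation in lib/presage/processor.ts could look like this. The mock values are placeholders, and the BiometricData import path is an assumption based on the types/ directory:

// lib/presage/processor.ts: mock data until the real Presage SDK is integrated
import type { BiometricData } from "@/types"; // assumed export location

export async function processPresageData(
  recordingId: string,
  videoUrl: string // unused by the mock; the real SDK will consume it
): Promise<BiometricData> {
  // TODO: replace everything below with actual Presage SDK calls,
  // then map the SDK response onto the BiometricData interface.
  const timestamps = Array.from({ length: 60 }, (_, i) => i); // one sample per second
  return {
    id: crypto.randomUUID(),
    recordingId,
    heartRate: timestamps.map(() => 70 + Math.random() * 10), // mock BPM
    breathing: timestamps.map(() => 14 + Math.random() * 4), // mock breaths per minute
    facialExpressions: [], // mock: no expressions detected
    timestamps,
  };
}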

Key questions to clarify with Presage team:

  • SDK API structure and authentication
  • Video format requirements
  • Response data format
  • Processing time estimates
  • Rate limits

API Routes

  • POST /api/upload - Upload video file
  • POST /api/presage - Process biometric data
  • POST /api/whisper - Transcribe audio (using ElevenLabs)
  • POST /api/gemini - Generate feedback
  • POST /api/process - Run full processing pipeline
  • GET /api/recordings - Get user's recordings
  • GET /api/feedback/[id] - Get feedback report
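
For example, a client could kick off the full pipeline like this (the request and response shapes are illustrative assumptions; check the route handler for the exact contract):

// Illustrative client call to POST /api/process; body/response shapes are assumed
async function startProcessing(recordingId: string): Promise<unknown> {
  const res = await fetch("/api/process", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ recordingId }),
  });
  if (!res.ok) {
    throw new Error(`Processing failed with status ${res.status}`);
  }
  return res.json();
}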

Database Schemas

Users

{
  id: string;
  email: string;
  name: string;
  createdAt: Date;
  preferences: {
    language?: string;
    notifications?: boolean;
    theme?: 'light' | 'dark' | 'system';
  };
}

Recordings

{
  id: string;
  userId: string;
  videoUrl: string;
  duration: number;
  status: RecordingStatus;
  createdAt: Date;
}

BiometricData

{
  id: string;
  recordingId: string;
  heartRate: number[];
  breathing: number[];
  facialExpressions: FacialExpression[];
  timestamps: number[];
}

Transcriptions

{
  id: string;
  recordingId: string;
  text: string;
  words: Word[];
  wordTimestamps: WordTimestamp[];
  metrics: SpeechMetrics;
}

FeedbackReports

{
  id: string;
  recordingId: string;
  overallScore: number;
  biometricInsights: {...};
  speechInsights: {...};
  recommendations: Recommendation[];
}
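
These schemas map onto MongoDB collections. A minimal sketch of inserting a recording with the official mongodb driver follows; the collection name and the "uploaded" status value are assumptions, and lib/db/ holds the project's actual connection helper:

// Illustrative insert with the official mongodb driver
import { MongoClient } from "mongodb";

const client = new MongoClient(process.env.MONGODB_URI!);

export async function createRecording(
  userId: string,
  videoUrl: string,
  duration: number
) {
  await client.connect(); // safe to call repeatedly
  const db = client.db(); // database named in the connection string
  return db.collection("recordings").insertOne({
    userId,
    videoUrl,
    duration,
    status: "uploaded", // hypothetical RecordingStatus value
    createdAt: new Date(),
  });
}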

Development Notes

  • All API routes include error handling with try-catch blocks (see the sketch after this list)
  • Presage integration uses placeholder data until SDK is integrated
  • Processing status is tracked through the pipeline
  • TypeScript strict mode is enabled
  • Dark mode support with system preference detection
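
A minimal sketch of that try-catch pattern in an App Router route handler (the handler body is illustrative):

// Illustrative error-handling pattern; the real routes do route-specific work
import { NextResponse } from "next/server";

export async function POST(req: Request) {
  try {
    const body = await req.json();
    // ...route-specific processing with `body`...
    return NextResponse.json({ ok: true });
  } catch (error) {
    console.error("Request failed:", error);
    return NextResponse.json(
      { error: "Internal server error" },
      { status: 500 }
    );
  }
}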

Future Enhancements

  • Real-time processing status updates with WebSockets
  • Group practice sessions
  • Advanced visualizations
  • Mobile app optimization
  • Social features and community
  • Custom practice scenarios
  • Export reports as PDF

License

MIT
