Inspiration

Behavioral interviews are stressful because you’re not just answering a question; you’re also being evaluated on how you communicate: confidence, eye contact, posture, clarity, and pacing. Most people don’t get real practice with real feedback unless they pay for coaching, and even then it’s rarely live and measurable.

We built Nervy to make interview practice feel like a true coaching session: structured questions, timed answers, and immediate feedback on both verbal and non-verbal performance, so anyone can improve faster with clear, actionable guidance.

What it does

Nervy is an AI-powered behavioral interview coach that helps you practice with real-time feedback.

Behavioral interview practice: Choose from 6 core topics and 20+ subtopics, then give STARR-style answers with a configurable question count and timer.

Real-time video analysis: During recording, Nervy analyzes eye contact, posture, facial expression, and confidence using MediaPipe + OpenCV. Scores and live feedback update in real time via WebSockets.
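
To give a sense of what one frame of that analysis involves, here is a minimal sketch of an eye-contact estimate from FaceMesh iris landmarks (the landmark indices and the 0.35 threshold are illustrative assumptions, not our exact tuned values):

```python
# Minimal sketch: per-frame eye-contact estimate from MediaPipe FaceMesh iris landmarks.
# The landmark indices and the 0.35 threshold are illustrative assumptions.
import cv2
import mediapipe as mp

face_mesh = mp.solutions.face_mesh.FaceMesh(refine_landmarks=True, max_num_faces=1)

def eye_contact_score(frame_bgr) -> float:
    """Return a rough 0..1 estimate of how centered the iris sits within the eye."""
    results = face_mesh.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not results.multi_face_landmarks:
        return 0.0
    lm = results.multi_face_landmarks[0].landmark
    outer, inner, iris = lm[33], lm[133], lm[468]   # left eye corners + left iris center
    eye_width = abs(inner.x - outer.x) or 1e-6
    # 0.5 means the iris is midway between the corners, i.e. roughly looking at the camera.
    offset = abs((iris.x - outer.x) / eye_width - 0.5)
    return max(0.0, 1.0 - offset / 0.35)
```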

Audio analysis: After each answer, Nervy processes your recording to detect filler words and generate a clarity score.
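
The idea, in sketch form (the filler list, pacing band, and weights are illustrative placeholders, not the exact scoring we ship):

```python
# Minimal sketch: filler-word counting and a simple clarity score from a transcript.
# The filler list, pacing band, and weights are illustrative, not the shipped scoring.
import re

FILLERS = {"um", "uh", "like", "basically", "actually"}

def clarity_metrics(transcript: str, duration_sec: float) -> dict:
    words = re.findall(r"[a-z']+", transcript.lower())
    filler_count = sum(words.count(f) for f in FILLERS)
    filler_count += transcript.lower().count("you know")        # multi-word filler
    wpm = len(words) / (duration_sec / 60) if duration_sec else 0.0
    filler_penalty = min(1.0, filler_count / max(len(words), 1) * 10)
    pace_penalty = 0.0 if 110 <= wpm <= 160 else 0.3            # assumed comfortable pace band
    clarity = round(max(0.0, 1.0 - filler_penalty - pace_penalty) * 100)
    return {"filler_count": filler_count, "wpm": round(wpm), "clarity_score": clarity}
```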

AI recommendations: Your transcript + performance metrics are sent to Google Gemini to generate personalized tips and improvement plans.
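
Roughly, the Gemini call looks like this (the model name, prompt wording, and metric keys are assumptions for illustration):

```python
# Minimal sketch: transcript + scored signals go into a single Gemini prompt.
# The model name, prompt wording, and metric keys are assumptions for illustration.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

def get_recommendations(transcript: str, metrics: dict) -> str:
    prompt = (
        "You are a behavioral interview coach. Given this STARR answer and its metrics, "
        "give 3 specific, actionable improvements.\n\n"
        f"Transcript: {transcript}\n"
        f"Metrics: {metrics}"
    )
    return model.generate_content(prompt).text
```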

Sign Captions (Accessibility): Real-time ASL-to-text captions from your webcam, supporting ASL alphabet, Yes/No, numbers, and common interview signs.

Authentication: Users can sign in/up securely through Clerk.

How we built it

We built Nervy with a clean full-stack setup optimized for real-time feedback:

Frontend: Next.js 16 + React 19 + Tailwind CSS for a fast, modern UI, integrated with Clerk for authentication (and optional Supabase client usage).

Backend: FastAPI + Uvicorn with WebSockets to stream frames and return live analysis without lag.
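
A minimal sketch of that loop (the endpoint path, payload format, and score_frame placeholder are illustrative):

```python
# Minimal sketch of the live-analysis loop; the endpoint path, payload format,
# and score_frame placeholder are illustrative.
import base64
import json

import cv2
import numpy as np
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

def score_frame(frame) -> dict:
    # Placeholder for the MediaPipe/OpenCV scoring shown in the other sketches.
    return {"eye_contact": 0.0, "posture": 0.0}

@app.websocket("/ws/analysis")
async def analysis_ws(ws: WebSocket):
    await ws.accept()
    try:
        while True:
            msg = json.loads(await ws.receive_text())        # {"frame": "<base64 JPEG>"} from the browser
            jpg = base64.b64decode(msg["frame"])
            frame = cv2.imdecode(np.frombuffer(jpg, np.uint8), cv2.IMREAD_COLOR)
            await ws.send_json(score_frame(frame))           # live scores stream straight back
    except WebSocketDisconnect:
        pass
```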

Real-time video analysis:

MediaPipe Face + Pose models for landmarks

OpenCV + custom logic for scoring (eye contact stability, posture alignment, expression changes, confidence signals)
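
As an example of that custom scoring, a posture check from Pose landmarks can be as simple as shoulder levelness plus head centering (a sketch; the penalty weights are illustrative, not our tuned values):

```python
# Minimal sketch: posture score from MediaPipe Pose landmarks (shoulder levelness +
# head centering). The penalty weights are illustrative, not our tuned values.
import cv2
import mediapipe as mp

pose = mp.solutions.pose.Pose(model_complexity=0)
P = mp.solutions.pose.PoseLandmark

def posture_score(frame_bgr) -> float:
    results = pose.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not results.pose_landmarks:
        return 0.0
    lm = results.pose_landmarks.landmark
    left_sh, right_sh, nose = lm[P.LEFT_SHOULDER], lm[P.RIGHT_SHOULDER], lm[P.NOSE]
    shoulder_tilt = abs(left_sh.y - right_sh.y)               # 0 when shoulders are level
    head_offset = abs(nose.x - (left_sh.x + right_sh.x) / 2)  # 0 when the head is centered
    return max(0.0, 1.0 - shoulder_tilt * 5 - head_offset * 3)
```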

Audio analysis: Post-answer processing to compute filler words and clarity scoring (ffmpeg-assisted extraction where needed).
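
A sketch of that step (the file names, 30 dB silence threshold, and returned fields are illustrative):

```python
# Minimal sketch: extract the audio track with pydub (ffmpeg under the hood) and
# compute basic pacing signals with librosa. Paths and thresholds are illustrative.
import librosa
from pydub import AudioSegment

def audio_features(recording_path: str, wav_path: str = "answer.wav") -> dict:
    AudioSegment.from_file(recording_path).set_channels(1).export(wav_path, format="wav")
    y, sr = librosa.load(wav_path, sr=16000)
    speech = librosa.effects.split(y, top_db=30)              # non-silent spans (speech vs. pauses)
    speech_sec = sum(int(end - start) for start, end in speech) / sr
    total_sec = len(y) / sr
    return {"duration_sec": total_sec,
            "speech_ratio": speech_sec / total_sec if total_sec else 0.0}
```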

AI layer: Google Gemini generates:

practice questions (topic/subtopic based; sketched after this list)

personalized recommendations using transcript + scored signals
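
Question generation is essentially one structured prompt; a sketch (the model name and prompt wording are assumptions):

```python
# Minimal sketch: topic/subtopic-based question generation with Gemini.
# The model name and prompt wording are assumptions for illustration.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

def generate_questions(topic: str, subtopic: str, count: int = 3) -> list[str]:
    prompt = (
        f"Generate {count} behavioral interview questions about {topic} ({subtopic}). "
        "Each should be answerable in STARR format. Return one question per line, no numbering."
    )
    text = model.generate_content(prompt).text
    return [line.strip() for line in text.splitlines() if line.strip()]
```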

Sign captions (ASL): A separate pipeline using PyTorch + Hugging Face (ResNet / LSTM on landmarks) with hand/pose models to recognize signs and output near real-time text captions.
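
The landmark-to-label step looks roughly like this (the small MLP here is a simplified stand-in for the ResNet/LSTM variants, and the label set is an illustrative subset):

```python
# Minimal sketch of the landmark-to-label step. The small MLP here is a simplified
# stand-in for the ResNet/LSTM variants, and the label set is an illustrative subset.
import torch
import torch.nn as nn

LABELS = [*"ABCDEFGHIKLMNOPQRSTUVWXY", "YES", "NO"]           # static ASL alphabet (no J/Z) + extras

class SignClassifier(nn.Module):
    def __init__(self, n_landmarks: int = 21, n_classes: int = len(LABELS)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_landmarks * 3, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x):                                     # x: (batch, 63) flattened (x, y, z) hand landmarks
        return self.net(x)

def caption_from_landmarks(model: SignClassifier, landmarks: torch.Tensor) -> str:
    model.eval()
    with torch.no_grad():
        logits = model(landmarks.view(1, -1))
    return LABELS[int(logits.argmax(dim=-1))]
```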

Challenges we ran into

Real-time performance: Streaming video frames while keeping latency low required careful WebSocket payload sizing, throttling, and efficient landmark extraction.
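
Two of those tricks in sketch form (the 480 px width, JPEG quality, and 10 fps cap are illustrative numbers):

```python
# Minimal sketch of two of those tricks: skip frames to cap the analysis rate, and
# downscale + JPEG-compress before sending. 480 px, quality 70, and 10 fps are illustrative.
import time
import cv2

_last_sent = 0.0

def throttled_payload(frame_bgr, max_width: int = 480, max_fps: int = 10):
    global _last_sent
    now = time.time()
    if now - _last_sent < 1.0 / max_fps:
        return None                                           # drop the frame: over the rate cap
    _last_sent = now
    scale = max_width / frame_bgr.shape[1]
    small = cv2.resize(frame_bgr, None, fx=scale, fy=scale)
    ok, jpg = cv2.imencode(".jpg", small, [cv2.IMWRITE_JPEG_QUALITY, 70])
    return jpg.tobytes() if ok else None
```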

Scoring that feels “fair”: Translating face/pose landmarks into meaningful feedback (not noisy or overly sensitive) took iterative tuning and rule refinement.
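
One common way to keep per-frame scores from feeling jumpy is an exponential moving average before they reach the UI (a sketch; the 0.2 smoothing factor is illustrative):

```python
# Minimal sketch: exponential moving average to keep per-frame scores from jumping
# around in the UI. The 0.2 smoothing factor is illustrative.
class SmoothedScore:
    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha
        self.value = None

    def update(self, raw: float) -> float:
        if self.value is None:
            self.value = raw
        else:
            self.value = self.alpha * raw + (1 - self.alpha) * self.value
        return self.value
```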

Audio processing reliability: Different microphones and environments change results a lot, so we had to tune detection thresholds and processing steps for consistency.

ASL recognition scope: Supporting reliable sign captions under hackathon constraints meant focusing on a practical subset (alphabet, numbers, yes/no, common signs) and optimizing the pipeline for speed.

Accomplishments that we're proud of

Built an end-to-end interview coach that delivers live feedback during recording, not only after.

Integrated multi-modal analysis: video + audio + transcript + AI recommendations.

Delivered an accessibility feature many interview platforms ignore: real-time ASL captions.

Created a structured practice experience (topics/subtopics, timers, STARR prompting) that feels like a real interview session.

What we learned

Real-time systems are as much about engineering tradeoffs as they are about models — efficiency, latency, and UX matter just as much as accuracy.

“Good feedback” needs to be specific and actionable, not just a score. The best coaching is the kind users can apply immediately.

Accessibility should be built-in, not added later. Even small support features can make a huge difference for users.

What's next for Nervy

More interview modes: Industry-specific question packs (SWE, PM, Data, QA), difficulty levels, and company-style sets.

Stronger coaching loop: Progress trends over time, weekly practice plans, and targeted drills (eye contact training, filler word reduction, pacing).

Better analytics: Detailed playback with timeline markers (“you looked away here,” “filler spike here”) and downloadable reports.

Expanded ASL support: More common interview signs, better robustness across lighting/backgrounds, and personalized calibration.

Deployment hardening: Production-ready scaling for WebSocket sessions, improved security, and smoother cross-platform performance.

Built With

  • asl-sign-caption-pipeline
  • clerk-(authentication)
  • corsmiddleware
  • eslint
  • fastapi
  • ffmpeg
  • ffprobe
  • google-gemini-api
  • hugging-face-(resnet-or-lstm-on-landmarks)
  • javascript
  • librosa
  • lucide-react
  • mediapipe-face-model
  • mediapipe-hand-model-(for-asl)
  • mediapipe-pose-model
  • next.js-16
  • numpy
  • opencv
  • pathlib
  • pyaudio
  • pydub
  • python
  • pytorch
  • react-19
  • supabase-client-(optional)
  • tailwind-css
  • tensorflow
  • threading
  • typescript
  • uvicorn
  • wave
  • websocket
  • websockets