ChatVid AI is a multimodal, Gemini-powered video analysis system that lets you interact with YouTube videos like never before. Paste a link, ask questions, get timestamped breakdowns, and even search for visual content within the video — all using Google's Gemini API and multimodal RAG techniques.
- 🔗 Paste YouTube Link – Paste any public YouTube URL
- 🧠 Chat with the Video – Ask questions like "What's happening at 1:23?" or "Summarize the ending"
- 🕓 Timestamped Summaries – Get structured section breakdowns with clickable timestamps
- 🔍 Visual Search – Ask queries like "Where's the red car?" and get frames + timestamps
- 🧠 Gemini RAG + Embeddings – Uses Gemini's multimodal capabilities for captioning, retrieval, and generation
- 🔐 User-Passed API Key – API key is provided by the user via the frontend for privacy and scalability
- 🎨 Elegant Pistachio Theme – Beautiful, modern UI with smooth animations, glassmorphism effects, and responsive design
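For a question like "What's happening at 1:23?", one plausible approach is to pull the timestamp out of the question and narrow the transcript context to the segments around it before calling Gemini. This is a minimal sketch under stated assumptions, not the project's code: `question_timestamp`, `segments_near`, and the 15-second window are hypothetical, and transcript segments are assumed to be dicts with `text`, `start`, and `duration` keys (youtube-transcript-api's raw transcript format).

```python
import re
from typing import List, Optional

# Matches m:ss, mm:ss or h:mm:ss anywhere in a question, e.g. "What's happening at 1:23?"
TIMESTAMP_RE = re.compile(r"\b(?:(\d+):)?(\d{1,2}):(\d{2})\b")


def question_timestamp(question: str) -> Optional[int]:
    """Return the first timestamp mentioned in the question, in seconds, or None."""
    match = TIMESTAMP_RE.search(question)
    if match is None:
        return None
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)


def segments_near(segments: List[dict], at: int, window: int = 15) -> List[dict]:
    """Keep transcript segments that overlap the interval [at - window, at + window]."""
    lo, hi = at - window, at + window
    return [s for s in segments if s["start"] <= hi and s["start"] + s["duration"] >= lo]
```

Questions without a timestamp (such as "Summarize the ending") return `None`, so the caller can fall back to ordinary retrieval over the whole transcript.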
| Layer | Tool / Library |
|---|---|
| Backend | Python, FastAPI, youtube-transcript-api, Gemini SDK |
| Frontend | Next.js, TypeScript, Tailwind CSS, Custom CSS |
| UI/UX | Pistachio theme, Glassmorphism, Smooth animations |
| LLM | Gemini API (text, image, video understanding) |
| Embeddings | Gemini multimodal embedding API |
| Retrieval | In-memory vector matching (cosine similarity, no DB) |
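The Retrieval row describes in-memory cosine similarity with no database. A minimal pure-Python sketch of that idea follows: stored items are `(label, vector)` pairs kept in a plain list, and a query vector is scored against all of them. The vectors are stand-ins (in the app they would come from Gemini embedding calls, not shown), all vectors are assumed to share one dimension, and `cosine_similarity` and `top_k` are illustrative names rather than the repository's functions.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], items: List[Tuple[str, Sequence[float]]], k: int = 3
) -> List[Tuple[str, float]]:
    """Score every stored (label, vector) pair against the query and return the k best."""
    scored = [(label, cosine_similarity(query, vec)) for label, vec in items]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A linear scan is O(n) per query, which is likely fine for the few hundred transcript chunks or frames a single video produces and avoids running a vector database.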
```
ChatVid-AI/
├── server/                  # Python backend (FastAPI)
│   ├── main.py              # FastAPI routes and server setup
│   ├── transcript.py        # YouTube transcript extraction
│   ├── gemini_utils.py      # Gemini API integration for chat, embeddings, and RAG functionality
│   ├── sectioning.py        # Video section analysis and timestamp generation
│   ├── visual_search.py     # Frame extraction and visual content search
│   └── frames/              # Temporary storage for video frames
├── client/                  # Next.js frontend (TypeScript + Tailwind)
│   ├── app/                 # Next.js App Router pages
│   │   ├── layout.tsx       # Root layout with metadata and global styles
│   │   ├── page.tsx         # Home page with pistachio theme: YouTube URL + API key input
│   │   └── chat/            # Chat interface route
│   │       └── page.tsx     # Video player + multi-panel chat interface with pistachio theme
│   ├── components/          # Reusable React components
│   │   ├── ChatBox.tsx      # Multi-panel chat interface with pistachio theme, visual search, and animations
│   │   ├── SectionList.tsx  # Timestamped video sections with hover effects and inline styling
│   │   └── VideoPlayer.tsx  # YouTube player component
│   ├── lib/                 # Utility functions and API clients
│   ├── public/              # Static assets (logos, architecture diagrams)
│   ├── styles/              # Global styles with pistachio theme variables and custom effects
│   ├── next-env.d.ts        # TypeScript environment declaration file
│   ├── next.config.js       # Next.js configuration
│   ├── package-lock.json    # Dependency lock file
│   ├── package.json         # Frontend dependencies and scripts
│   ├── postcss.config.js    # PostCSS configuration
│   ├── tailwind.config.ts   # Tailwind CSS configuration
│   └── tsconfig.json        # TypeScript configuration
├── .gitignore               # Git ignore rules
├── requirements.txt         # Python dependencies
└── README.md                # Project documentation
```
- Python 3.8+
- Node.js 16+ and npm
- Gemini API key
1. Clone the repository

   ```bash
   git clone https://github.com/MisbahAN/ChatVid-AI.git
   cd ChatVid-AI
   ```

2. Create and activate a virtual environment

   ```bash
   # macOS/Linux
   python3 -m venv venv
   source venv/bin/activate

   # Windows (Git Bash); in cmd use: venv\Scripts\activate
   python -m venv venv
   source venv/Scripts/activate
   ```

3. Install Python dependencies (from the repository root, where requirements.txt lives)

   ```bash
   pip install -r requirements.txt
   ```

4. Run the backend server

   ```bash
   cd server
   uvicorn main:app --reload --port 8000
   ```

   The backend will be available at http://localhost:8000.

5. Install Node.js dependencies (in a second terminal, starting from the repository root)

   ```bash
   cd client
   npm install
   ```

   `npm install` reads package.json (and the committed package-lock.json) and installs the frontend dependencies. The configuration files next-env.d.ts, next.config.js, postcss.config.js, tailwind.config.ts, and tsconfig.json come with the repository clone and must be present in the client directory.

6. Run the frontend development server

   ```bash
   # from the client directory
   npm run dev
   ```

   The frontend will be available at http://localhost:3000.
- main.py: FastAPI application setup, route definitions, and API endpoints
- transcript.py: Handles YouTube video transcript extraction
- gemini_utils.py: Gemini API integration for chat, embeddings, and RAG functionality
- sectioning.py: Video content analysis and timestamped section generation
- visual_search.py: Frame extraction and visual content search implementation
- frames/: Temporary storage directory for extracted video frames
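Before transcript text reaches Gemini, it helps to keep each caption's start time attached so the model can cite it back as a clickable timestamp. A minimal sketch, assuming segments are dicts with `text`, `start`, and `duration` keys (youtube-transcript-api's raw format); `format_timestamp` and `transcript_to_prompt_text` are hypothetical helpers, not code from transcript.py or gemini_utils.py.

```python
from typing import List


def format_timestamp(seconds: float) -> str:
    """Render seconds as m:ss, or h:mm:ss for positions an hour or more into the video."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def transcript_to_prompt_text(segments: List[dict]) -> str:
    """Turn transcript segments into '[m:ss] text' lines, skipping empty captions."""
    lines = []
    for seg in segments:
        text = " ".join(seg["text"].split())  # collapse newlines and repeated spaces
        if text:
            lines.append(f"[{format_timestamp(seg['start'])}] {text}")
    return "\n".join(lines)
```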
- app/: Next.js 13+ app directory containing page components and layouts
  - layout.tsx: Root layout with metadata and global styles
  - page.tsx: Home page with YouTube URL and API key input
  - chat/page.tsx: Video player and chat interface
- components/: Reusable React components
  - ChatBox.tsx: Multi-panel chat interface with pistachio theme, visual search, Q&A chat, and smooth animations
  - SectionList.tsx: Displays timestamped video sections with hover effects and inline timestamp styling
  - VideoPlayer.tsx: YouTube player component
- lib/: Utility functions and API clients for backend communication
- public/: Static assets including logos and architecture diagrams
- styles/: Global styles with pistachio theme variables, glassmorphism effects, and custom scrollbars
- Configuration files:
  - next-env.d.ts: TypeScript environment declaration file
  - next.config.js: Next.js configuration
  - package-lock.json: Dependency lock file
  - package.json: Frontend dependencies and scripts
  - postcss.config.js: PostCSS configuration
  - tailwind.config.ts: Tailwind CSS configuration
  - tsconfig.json: TypeScript configuration
| Step | Tech / API used |
|---|---|
| Transcript Extraction | youtube-transcript-api |
| Section Breakdown | Gemini (text generation on transcript) |
| Chat with Video (RAG) | Gemini (text generation with transcript context) |
| Visual Search | Gemini's frame embedding + cosine similarity |
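For the Chat with Video (RAG) step, the retrieve stage needs units larger than single captions. A sketch of time-window chunking plus one possible prompt layout, assuming transcript segments are dicts with `text`, `start`, and `duration` keys. `chunk_transcript`, `build_rag_prompt`, the 30-second window, and the prompt wording are all hypothetical choices for illustration, not the repository's implementation.

```python
from typing import List


def chunk_transcript(segments: List[dict], window_seconds: float = 30.0) -> List[dict]:
    """Group consecutive caption segments into chunks covering about window_seconds each.

    Each chunk keeps the start time of its first segment so answers can cite it.
    """
    chunks: List[dict] = []
    current: List[str] = []
    chunk_start = 0.0
    for seg in segments:
        # Close the current chunk once this segment starts a full window after it began.
        if current and seg["start"] - chunk_start >= window_seconds:
            chunks.append({"start": chunk_start, "text": " ".join(current)})
            current = []
        if not current:
            chunk_start = seg["start"]
        current.append(seg["text"].strip())
    if current:
        chunks.append({"start": chunk_start, "text": " ".join(current)})
    return chunks


def build_rag_prompt(question: str, retrieved: List[dict]) -> str:
    """Hypothetical prompt layout: retrieved chunks first, then the user's question."""
    context = "\n".join(f"[{int(c['start'])}s] {c['text']}" for c in retrieved)
    return (
        "Answer using only these transcript excerpts and cite their timestamps.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

Each chunk's text would then be embedded, the chunks closest to the question selected by cosine similarity, and only those passed to Gemini, which keeps the prompt short even for long videos.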
- main.py: FastAPI routes for video processing, chat, and visual search
- transcript.py: Handles YouTube video transcript extraction
- gemini_utils.py: Manages Gemini API interactions for chat, embeddings, and RAG functionality
- sectioning.py: Analyzes video content and generates timestamped sections
- visual_search.py: Extracts frames and performs visual content search
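sectioning.py asks Gemini for a timestamped breakdown, and model output is free text, so something has to turn it into structured sections. Assuming the prompt asks for one section per line in a form like `01:23 - Section title` (a hypothetical format; the repository's prompt is not shown), a tolerant parser could look like this. `SECTION_LINE`, `to_seconds`, and `parse_sections` are illustrative names.

```python
import re
from typing import List, Tuple

# Hypothetical line format: optional brackets around the time, optional "-", "–" or ":" separator.
SECTION_LINE = re.compile(r"^\s*\[?((?:\d+:)?\d{1,2}:\d{2})\]?\s*[-–:]?\s*(.+?)\s*$")


def to_seconds(stamp: str) -> int:
    """Convert 'm:ss', 'mm:ss' or 'h:mm:ss' to seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_sections(model_output: str) -> List[Tuple[int, str]]:
    """Extract (start_seconds, title) pairs, skipping non-matching lines, sorted by time."""
    sections = []
    for line in model_output.splitlines():
        match = SECTION_LINE.match(line)
        if match:
            sections.append((to_seconds(match.group(1)), match.group(2)))
    return sorted(sections)
```

Lines that don't match are skipped rather than raising, since models often add a preamble such as "Here are the sections:".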
- URL Input (`/`):
  - Elegant pistachio-themed YouTube URL input form with glassmorphism effects
  - Gemini API key setup with floating animations (stored in localStorage)
  - Smooth transitions and hover effects
  - Redirects to the chat page with the video ID
- Chat Interface (`/chat`):
  - Embedded YouTube player with responsive design
  - Three-panel layout with pistachio gradient backgrounds
  - Section summaries with animated timestamps and hover effects
  - Chat input for video questions with smooth interactions
  - Visual search input for frame queries with elegant styling
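The home page redirects to `/chat` with a video ID, and clickable timestamps need playback to start at a given second. The frontend presumably does this in TypeScript; the logic is sketched here in Python to match the other examples. `extract_video_id` and `embed_url` are hypothetical names, the URL shapes handled are common YouTube forms, and the 11-character ID check is an assumption about YouTube's current ID format. `start` is the YouTube embedded player's start-time parameter.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters drawn from [A-Za-z0-9_-].
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from watch, youtu.be, embed, shorts or live URLs, else None."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    candidate = None
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif parsed.path.startswith(("/embed/", "/shorts/", "/live/")):
            candidate = parsed.path.split("/")[2]
    if candidate and VIDEO_ID_RE.fullmatch(candidate):
        return candidate
    return None


def embed_url(video_id: str, start_seconds: float = 0) -> str:
    """Embed URL that begins playback at start_seconds via the player's `start` parameter."""
    return f"https://www.youtube.com/embed/{video_id}?start={int(start_seconds)}"
```

In the browser, the YouTube IFrame Player API's `seekTo` method can jump an already-loaded player to a timestamp instead of reloading the iframe with a new `start` value.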
- Extract transcript from YouTube link
- Generate Gemini-based section summaries with timestamps
- Implement chat with video using RAG (retrieve + generate)
- Visual scene search: embed frames and match with user text
- Project Setup: Next.js + TypeScript + Tailwind CSS
- User Inputs: YouTube URL + Gemini API Key
- Video Display + Section Summaries
- Chat Interface (Transcript Q&A)
- Visual Search (Semantic Frame Search)
- Crazy good UI Design (Pistachio theme with glassmorphism completed)
- Deploy on Vercel to misbahan.com
- Loading states and error handling
- Chat history persistence
- Speaker detection
- Frame thumbnails preview
- Upload local MP4 videos
- Export highlights / timestamps
- Cross-video multimodal search
Misbah Ahmed Nauman
- 🌐 Portfolio
- 🛠️ Built during Headstarter SWE Residency Sprint 2