ChatVid AI

ChatVid AI is a multimodal, Gemini-powered video analysis system that lets you interact with YouTube videos like never before. Paste a link, ask questions, get timestamped breakdowns, and even search for visual content within the video — all using Google's Gemini API and multimodal RAG techniques.

📺 Watch Demo

🌟 Features

🔗 Paste YouTube Link – Upload any public YouTube URL
🧠 Chat with the Video – Ask questions like "What's happening at 1:23?" or "Summarize the ending"
🕓 Timestamped Summaries – Get structured section breakdowns with clickable timestamps
🔍 Visual Search – Ask queries like "Where's the red car?" and get frames + timestamps
🧠 Gemini RAG + Embeddings – Uses Gemini's multimodal capabilities for captioning, retrieval, and generation
🔐 User-Passed API Key – API key is provided by the user via the frontend for privacy and scalability
🎨 Elegant Pistachio Theme – Beautiful, modern UI with smooth animations, glassmorphism effects, and responsive design

⚙️ Tech Stack

Layer	Tool / Library
Backend	Python, FastAPI, youtube-transcript-api, Gemini SDK
Frontend	Next.js, TypeScript, Tailwind CSS, Custom CSS
UI/UX	Pistachio theme, Glassmorphism, Smooth animations
LLM	Gemini API (text, image, video understanding)
Embeddings	Gemini multimodal embedding API
Retrieval	In-memory vector matching (cosine similarity, no DB)

🛠 Project Structure

ChatVid-AI/
├── server/                   # Python backend (FastAPI)
│   ├── main.py               # FastAPI routes and server setup
│   ├── transcript.py         # YouTube transcript extraction
│   ├── gemini_utils.py       # Gemini API integration for chat, embeddings, and RAG functionality
│   ├── sectioning.py         # Video section analysis and timestamp generation
│   ├── visual_search.py      # Frame extraction and visual content search
│   └── frames/               # Temporary storage for video frames
├── client/                   # Next.js frontend (TypeScript + Tailwind)
│   ├── app/                  # Next.js App Router pages
│   │   ├── layout.tsx        # Root layout with metadata and global styles
│   │   ├── page.tsx          # Home page with pistachio theme: YouTube URL + API Key input
│   │   └── chat/             # Chat interface route
│   │       └── page.tsx      # Video player + multi-panel chat interface with pistachio theme
│   ├── components/           # Reusable React components
│   │   ├── ChatBox.tsx       # Multi-panel chat interface with pistachio theme, visual search, and animations
│   │   ├── SectionList.tsx   # Timestamped video sections with hover effects and inline styling
│   │   ├── VideoPlayer.tsx   # YouTube player component
│   ├── lib/                  # Utility functions and API clients
│   ├── public/               # Static assets (logos, architecture diagrams)
│   ├── styles/               # Global styles with pistachio theme variables and custom effects
│   ├── next-env.d.ts         # TypeScript environment declaration file
│   ├── next.config.js        # Next.js configuration
│   ├── package-lock.json     # Dependency lock file
│   ├── package.json          # Frontend dependencies and scripts
│   ├── postcss.config.js     # PostCSS configuration
│   ├── tailwind.config.ts    # Tailwind CSS configuration
│   └── tsconfig.json         # TypeScript configuration
├── .gitignore                # Git ignore rules
├── requirements.txt          # Python dependencies
└── README.md                 # Project documentation

🚀 Getting Started

Prerequisites

Python 3.8+
Node.js 16+ and npm
Gemini API key

Backend Setup

Clone the repository

git clone https://github.com/MisbahAN/ChatVid-AI.git
cd ChatVid-AI

Create and activate virtual environment

# macOS/Linux
python3 -m venv venv
source venv/bin/activate

# Windows
python -m venv venv
source venv/Scripts/activate

Install Python dependencies

pip install -r requirements.txt
cd server

Run the backend server

# Make sure you're in the server directory
uvicorn main:app --reload --port 8000

The backend will be available at http://localhost:8000

Frontend Setup

Install Node.js dependencies

# Navigate to the client directory
cd client

# Install dependencies. Ensure that the necessary configuration files (next-env.d.ts, next.config.js, postcss.config.js, tailwind.config.ts, tsconfig.json) are present in the 'client' directory from the repository clone. The 'npm install' command reads the package.json file, installs the required packages, and generates the package-lock.json file.
npm install

Run the frontend development server
```
# Make sure you're in the client directory
npm run dev
```
The frontend will be available at http://localhost:3000

File Structure and Responsibilities

Backend Files (`/server`)

main.py: FastAPI application setup, route definitions, and API endpoints
transcript.py: Handles YouTube video transcript extraction
gemini_utils.py: Gemini API integration for chat, embeddings, and RAG functionality
sectioning.py: Video content analysis and timestamped section generation
visual_search.py: Frame extraction and visual content search implementation
frames/: Temporary storage directory for extracted video frames

Frontend Files (`/client`)

app/: Next.js 13+ app directory containing page components and layouts
- layout.tsx: Root layout with metadata and global styles
- page.tsx: Home page with YouTube URL and API key input
- chat/page.tsx: Video player and chat interface
components/: Reusable React components
- ChatBox.tsx: Multi-panel chat interface with pistachio theme, visual search, Q&A chat, and smooth animations
- SectionList.tsx: Displays timestamped video sections with hover effects and inline timestamp styling
- VideoPlayer.tsx: YouTube player component
lib/: Utility functions and API clients for backend communication
public/: Static assets including logos and architecture diagrams
styles/: Global styles with pistachio theme variables, glassmorphism effects, and custom scrollbars
Configuration files:
- next-env.d.ts: TypeScript environment declaration file
- next.config.js: Next.js configuration
- package-lock.json: Dependency lock file
- package.json: Frontend dependencies and scripts
- postcss.config.js: PostCSS configuration
- tailwind.config.ts: Tailwind CSS configuration
- tsconfig.json: TypeScript configuration

🧠 How It Works

Step	Tech / API used
Transcript Extraction	youtube-transcript-api
Section Breakdown	Gemini (text generation on transcript)
Chat with Video (RAG)	Gemini (text generation with transcript context)
Visual Search	Gemini's frame embedding + cosine similarity

Backend Components

main.py: FastAPI routes for video processing, chat, and visual search
transcript.py: Handles YouTube video transcript extraction
gemini_utils.py: Manages Gemini API interactions for chat, embeddings, and RAG functionality
sectioning.py: Analyzes video content and generates timestamped sections
visual_search.py: Extracts frames and performs visual content search

Frontend Flow

URL Input (/):
- Elegant pistachio-themed YouTube URL input form with glassmorphism effects
- Gemini API key setup with floating animations (stored in localStorage)
- Smooth transitions and hover effects
- Redirects to chat page with video ID
Chat Interface (/chat):
- Embedded YouTube player with responsive design
- Three-panel layout with pistachio gradient backgrounds
- Section summaries with animated timestamps and hover effects
- Chat input for video questions with smooth interactions
- Visual search input for frame queries with elegant styling

🗺️ Roadmap

✅ Phase 1: Backend (Completed)

Extract transcript from YouTube link
Generate Gemini-based section summaries with timestamps
Implement chat with video using RAG (retrieve + generate)
Visual scene search: embed frames and match with user text

✅ Phase 2: Frontend (Completed)

Project Setup: Next.js + TypeScript + Tailwind CSS
User Inputs: YouTube URL + Gemini API Key
Video Display + Section Summaries
Chat Interface (Transcript Q&A)
Visual Search (Semantic Frame Search)

🧪 Future Enhancements

Crazy good UI Design (Pistachio theme with glassmorphism completed)
Deploy on vercel to misbahan.com
Loading states and error handling
Chat history persistence
Speaker detection
Frame thumbnails preview
Upload local MP4 videos
Export highlights / timestamps
Cross-video multimodal search

👨‍💻 Author

Misbah Ahmed Nauman

🌐 Portfolio
🛠️ Built during Headstarter SWE Residency Sprint 2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ChatVid AI

📺 Watch Demo

🌟 Features

⚙️ Tech Stack

🛠 Project Structure

🚀 Getting Started

Prerequisites

Backend Setup

Frontend Setup

File Structure and Responsibilities

Backend Files (`/server`)

Frontend Files (`/client`)

🧠 How It Works

Backend Components

Frontend Flow

🗺️ Roadmap

✅ Phase 1: Backend (Completed)

✅ Phase 2: Frontend (Completed)

🧪 Future Enhancements

👨‍💻 Author

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
client		client
server		server
.gitignore		.gitignore
ChatVid-AI.pdf		ChatVid-AI.pdf
IMPROVEMENTS.md		IMPROVEMENTS.md
README.md		README.md
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

ChatVid AI

📺 Watch Demo

🌟 Features

⚙️ Tech Stack

🛠 Project Structure

🚀 Getting Started

Prerequisites

Backend Setup

Frontend Setup

File Structure and Responsibilities

Backend Files (/server)

Frontend Files (/client)

🧠 How It Works

Backend Components

Frontend Flow

🗺️ Roadmap

✅ Phase 1: Backend (Completed)

✅ Phase 2: Frontend (Completed)

🧪 Future Enhancements

👨‍💻 Author

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Backend Files (`/server`)

Frontend Files (`/client`)

Packages