- `frontend/`: Next.js (minimal UI)
- `backend/`: FastAPI + OpenAI tool-calling agent + FFmpeg + action timeline

Backend:
- `cd backend`
- Create a venv and install deps:
  - `python -m venv venv`
  - `source venv/bin/activate`
  - `pip install -r requirements.txt`
- Copy env: `cp .env.example .env` and set `OPENAI_API_KEY`
- Set `JWT_SECRET` in `.env` (for auth)
- Ensure `ffmpeg` is installed and on your PATH
- Run: `./venv/bin/python -m uvicorn app:app --reload --port 8000 --env-file .env`
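The setup steps above can be sanity-checked before starting the server. The following is a minimal sketch, not part of the repository: the helper names (`parse_env`, `missing_keys`, `preflight`) are hypothetical, and the parser only handles plain `KEY=VALUE` lines, which may be simpler than what the backend's actual env loader accepts.

```python
import shutil
from pathlib import Path

# Keys the setup steps above ask you to set in backend/.env.
REQUIRED_KEYS = ("OPENAI_API_KEY", "JWT_SECRET")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skips blank lines and # comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_keys(env: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


def preflight(env_path: str = ".env") -> list[str]:
    """Collect human-readable problems; an empty list means ready to run."""
    problems = []
    path = Path(env_path)
    if not path.is_file():
        problems.append(f"{env_path} not found (copy .env.example first)")
    else:
        for key in missing_keys(parse_env(path.read_text())):
            problems.append(f"{key} is not set in {env_path}")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg is not on PATH")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("FAIL:", problem)
```

Run it from `backend/` after copying `.env`; no output means the three prerequisites (both keys and `ffmpeg`) look satisfied.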
Endpoints:
- `POST /api/transform` (multipart form, field name `video`)
- `POST /api/market-research` (JSON body with `description`, optional `product`, `region`, `goal`)
- `GET /media/original/*` and `GET /media/processed/*`
- `GET /media/analysis/*` (JSON action timeline)
- `POST /api/auth/register`
- `POST /api/auth/login`
- `GET /api/me`
- `GET /api/videos`
- `GET /api/videos/{id}`
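As an illustration of the JSON endpoint listed above, here is a hedged sketch of a client for `POST /api/market-research` using only the standard library. The function name `build_market_research_request` and the sample description are hypothetical. The response shape is not documented here, so the example just prints whatever comes back, and it assumes the endpoint accepts unauthenticated requests, which may not hold.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # matches the --port 8000 run command


def build_market_research_request(description, product=None, region=None,
                                  goal=None, base=API_BASE):
    """Build (but do not send) a POST /api/market-research request."""
    payload = {"description": description}
    # Only include the optional fields the caller actually supplied.
    for key, value in (("product", product), ("region", region), ("goal", goal)):
        if value is not None:
            payload[key] = value
    return urllib.request.Request(
        url=f"{base.rstrip('/')}/api/market-research",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_market_research_request("Portable espresso maker", region="EU")
    # Requires the backend to be running locally.
    with urllib.request.urlopen(req) as resp:
        print(resp.status, resp.read().decode("utf-8"))
```

Separating request construction from sending keeps the payload easy to inspect. If the endpoint turns out to require auth, an `Authorization` header would need to be added to `headers`.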
Frontend:
- `cd frontend`
- `npm install`
- `npm run dev`

Optional: set `NEXT_PUBLIC_API_BASE` in `frontend/.env` (default `http://localhost:8000`).
- The OpenAI tool-calling agent lives in `backend/ai_agents/agent.py` and always calls `speed_up_video`.
- The video processing logic is isolated in `backend/ai_agents/video.py`.
- Action timeline extraction is in `backend/ai_agents/action_timeline.py` and uses a VLM plus optional audio transcription.
- If you want timestamped audio segments, set `OPENAI_ASR_MODEL=whisper-1` (it supports `verbose_json` segments).
- The backend also generates a couple of random edit variants (combos) and returns them in `variants`.
- For faster video processing on macOS, set `VIDEO_HWACCEL=videotoolbox` and `VIDEO_ENCODER=h264_videotoolbox`.
- You can reduce analysis cost with `ACTION_FPS` and `ACTION_FRAME_SCALE`.
- GPU-heavy generative workflows live in `backend/ai_agents/generative/` (background replace, object erase, text replace) and are triggered via an agent that writes job specs.
- GPU dependencies for those workflows are listed in `backend/ai_agents/generative/requirements-gpu.txt`.
- Text overlays require an ffmpeg build with the `drawtext` filter (libfreetype).
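Two of the notes above concern ffmpeg options, and this sketch shows roughly what they mean at the command line. It is an assumption about how `VIDEO_HWACCEL` and `VIDEO_ENCODER` could map onto ffmpeg flags, not a copy of `backend/ai_agents/video.py`. The helper names and the `libx264` fallback are hypothetical. `has_filter` inspects the text printed by `ffmpeg -hide_banner -filters` to see whether `drawtext` is available.

```python
import os
import subprocess


def build_ffmpeg_cmd(src, dst, env=None):
    """Assemble an ffmpeg re-encode command from the two env settings."""
    env = os.environ if env is None else env
    cmd = ["ffmpeg", "-y"]
    hwaccel = env.get("VIDEO_HWACCEL")
    if hwaccel:
        # -hwaccel is an input option, so it must come before -i.
        cmd += ["-hwaccel", hwaccel]
    cmd += ["-i", src]
    # Assumption: fall back to libx264 when VIDEO_ENCODER is unset.
    cmd += ["-c:v", env.get("VIDEO_ENCODER") or "libx264", dst]
    return cmd


def has_filter(filters_listing, name):
    """Check the output of `ffmpeg -hide_banner -filters` for a filter name."""
    for line in filters_listing.splitlines():
        parts = line.split()
        # Filter rows look like: "<flags> <name> <io> <description>".
        if len(parts) >= 2 and parts[1] == name:
            return True
    return False


if __name__ == "__main__":
    listing = subprocess.run(
        ["ffmpeg", "-hide_banner", "-filters"],
        capture_output=True, text=True, check=True,
    ).stdout
    print("drawtext available:", has_filter(listing, "drawtext"))
    print(" ".join(build_ffmpeg_cmd("in.mp4", "out.mp4")))
```

If `drawtext` is reported as unavailable, the installed ffmpeg was most likely built without libfreetype, and text overlays will fail until a different build is installed.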