Inspiration
Video editing is powerful but intimidating. Professional tools like Premiere Pro have steep learning curves that lock out creators with great ideas but limited time. We asked: what if editing were as simple as describing what you want? What if AI could handle the technical complexity while you focus on creativity?
What it does
CutOS is an AI-first video editor that understands natural language. Type "split this clip at 10 seconds and add a vintage effect" and it happens instantly. Upload a video, describe your vision, and watch our AI agent split clips, apply effects, remove green screens, dub into 29 languages, isolate voices, and create AI-powered morph transitions. It combines professional multi-track editing with conversational AI assistance, making video editing accessible to everyone.
How we built it
- Frontend: Next.js 16 with React 19, Tailwind CSS, and WebGL for GPU-accelerated effects
- AI Stack: OpenAI GPT-4o for the editing agent, TwelveLabs Marengo 3.0 for semantic video search, ElevenLabs for AI dubbing/voice isolation, and Kling API for morph transitions
- Backend: Supabase for auth, storage, and real-time sync
- Architecture: Built a tool-based agent system where GPT-4o analyzes timeline state and executes editing operations through structured tools, with streaming responses for real-time feedback
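To make the architecture concrete, here is a minimal sketch of the tool-calling loop, assuming the OpenAI Node SDK. The tool names, parameter shapes, and the `editorDispatch` helper are illustrative stand-ins, not the actual CutOS schema (the real agent streams responses and exposes many more tools):

```ts
import OpenAI from "openai";

const openai = new OpenAI();

// Illustrative tool schema -- names and parameters are hypothetical.
const tools: OpenAI.Chat.Completions.ChatCompletionTool[] = [
  {
    type: "function",
    function: {
      name: "split_clip",
      description: "Split a timeline clip at a given time in seconds.",
      parameters: {
        type: "object",
        properties: {
          clipId: { type: "string" },
          at: { type: "number", description: "split point, seconds" },
        },
        required: ["clipId", "at"],
      },
    },
  },
  {
    type: "function",
    function: {
      name: "apply_effect",
      description: "Apply a named effect (e.g. 'vintage', 'noir') to a clip.",
      parameters: {
        type: "object",
        properties: {
          clipId: { type: "string" },
          effect: { type: "string" },
        },
        required: ["clipId", "effect"],
      },
    },
  },
];

// Hypothetical dispatcher that routes a tool call into editor reducers.
function editorDispatch(name: string, args: Record<string, unknown>) {
  console.log("executing", name, args);
}

export async function runAgent(command: string, timelineState: unknown) {
  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      // The current timeline snapshot rides along on every request.
      {
        role: "system",
        content: `You are a video editing agent. Timeline: ${JSON.stringify(timelineState)}`,
      },
      { role: "user", content: command },
    ],
    tools,
  });

  // A chained command ("split at 5s, apply noir") comes back as
  // multiple tool calls, executed here in order.
  for (const call of response.choices[0].message.tool_calls ?? []) {
    editorDispatch(call.function.name, JSON.parse(call.function.arguments));
  }
}
```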
Challenges we ran into
- Performance: Real-time video preview with effects was initially laggy. We solved it with WebGL shaders and a requestAnimationFrame-based render loop that separates per-frame visual feedback from React state updates (sketched after this list)
- AI Context Management: The agent needed to understand complex timeline states, so we built a dynamic system prompt that serializes every clip, position, and effect into each request (see the prompt-builder sketch below)
- Async AI Operations: Dubbing and voice isolation take 30-60 seconds, so we implemented async handling with upload status tracking and automatic timeline replacement once results come back (see the polling sketch below)
- Tool Chaining: Teaching the AI to execute multiple operations in sequence ("split at 5s, apply noir, move to track 2") required careful prompt engineering and action parsing
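The render-loop fix, as a simplified sketch: per-frame drawing reads effect parameters from a mutable ref, so scrubbing or dragging a slider repaints the preview without forcing React re-renders. A 2D canvas context stands in for our WebGL pipeline here:

```ts
import { useEffect, useRef } from "react";

// Simplified preview loop: visual feedback is driven by
// requestAnimationFrame and a mutable ref, decoupled from React state.
export function usePreviewLoop(
  draw: (ctx: CanvasRenderingContext2D, params: { brightness: number }) => void
) {
  const canvasRef = useRef<HTMLCanvasElement>(null);
  const paramsRef = useRef({ brightness: 1 });

  useEffect(() => {
    const ctx = canvasRef.current?.getContext("2d");
    if (!ctx) return;
    let frame = 0;
    const tick = () => {
      draw(ctx, paramsRef.current); // repaint every frame
      frame = requestAnimationFrame(tick);
    };
    frame = requestAnimationFrame(tick);
    return () => cancelAnimationFrame(frame);
  }, [draw]);

  // Gestures mutate paramsRef.current directly; the committed value is
  // written back to React state only when the gesture ends.
  return { canvasRef, paramsRef };
}
```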
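And a sketch of the dynamic system prompt, assuming a hypothetical clip shape (the real schema carries more fields). Serializing one compact line per clip keeps the model grounded in the timeline while staying token-cheap:

```ts
// Hypothetical timeline shape -- the real schema is richer.
interface Clip {
  id: string;
  track: number;
  start: number;    // timeline position, seconds
  duration: number; // seconds
  effects: string[];
}

export function buildSystemPrompt(clips: Clip[]): string {
  const lines = clips.map(
    (c) =>
      `${c.id} | track ${c.track} | ${c.start}s-${c.start + c.duration}s` +
      (c.effects.length ? ` | effects: ${c.effects.join(", ")}` : "")
  );
  return [
    "You are CutOS, an AI video editing agent.",
    "Current timeline:",
    ...lines,
    "Resolve references like 'this clip' to the user's selection.",
  ].join("\n");
}
```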
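The async flow for dubbing and voice isolation looks roughly like the following; `startDub`, `fetchJob`, and the two UI helpers are hypothetical stand-ins for our real backend and editor calls:

```ts
type JobStatus = { state: "queued" | "processing" | "done" | "failed"; url?: string };

// Hypothetical stand-ins for the real backend and editor calls.
async function startDub(clipId: string, language: string) {
  return { jobId: `dub_${clipId}_${language}` };
}
async function fetchJob(jobId: string): Promise<JobStatus> {
  return { state: "done", url: `https://example.com/${jobId}.mp4` };
}
function setClipStatus(clipId: string, status: string) { console.log(clipId, status); }
function replaceClipMedia(clipId: string, url: string) { console.log(clipId, url); }

async function pollUntilDone(jobId: string, intervalMs = 3000): Promise<string> {
  for (;;) {
    const job = await fetchJob(jobId);
    if (job.state === "done" && job.url) return job.url;
    if (job.state === "failed") throw new Error(`job ${jobId} failed`);
    await new Promise((r) => setTimeout(r, intervalMs));
  }
}

export async function dubClip(clipId: string, language: string) {
  setClipStatus(clipId, "dubbing"); // progress indicator in the UI
  const { jobId } = await startDub(clipId, language);
  const url = await pollUntilDone(jobId);
  replaceClipMedia(clipId, url);    // automatic timeline replacement
  setClipStatus(clipId, "ready");
}
```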
Accomplishments that we're proud of
- Zero Learning Curve: Complete beginners can edit videos by just describing what they want
- True Multi-Modal AI: Combines vision (video search), audio (dubbing/isolation), and generation (morph transitions) in one seamless workflow
- Professional Features: Despite being AI-powered, it's a real non-linear editor with a multi-track timeline, effects, trim handles, and magnetic snapping (see the snapping sketch after this list)
- Instant Gratification: From drag-and-drop to AI commands, every interaction feels immediate and responsive
- 29-Language Dubbing: Translations that preserve the original emotion and timing, democratizing global content creation
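Magnetic snapping is one of those small NLE details that makes editing feel right; at its core it's just a nearest-edge search within a threshold. A minimal sketch (in practice the threshold would scale with zoom level):

```ts
// Snap a dragged clip edge to the nearest snap point (other clips'
// starts/ends, the playhead, markers) within `threshold` seconds.
export function snapTime(t: number, snapPoints: number[], threshold = 0.1): number {
  let best = t;
  let bestDist = threshold;
  for (const p of snapPoints) {
    const d = Math.abs(p - t);
    if (d < bestDist) {
      bestDist = d;
      best = p;
    }
  }
  return best;
}

// snapTime(9.96, [0, 5, 10, 12.5]) -> 10 (pulled flush to the clip at 10s)
```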
What we learned
- AI UX Design: The best AI tools don't feel like chatbots; they feel like magic. Streaming responses, optimistic updates, and automatic replacements make AI feel instantaneous (see the optimistic-update sketch after this list)
- Prompt Engineering at Scale: System prompts need to be dynamic and context-aware. We learned to balance completeness with token efficiency
- WebGL Optimization: GPU-accelerated effects are essential for real-time video editing in the browser
- User Intent Parsing: Natural language is ambiguous; designing tools that handle edge cases ("split this" when there are 5 clips) required thoughtful defaults and clarification flows (sketched below)
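The optimistic-update pattern mentioned above, sketched with a hypothetical `Edit` type and `persistEdit` standing in for the Supabase write: apply the edit locally first, and roll back only if persistence fails.

```ts
type Edit = { redo: () => void; undo: () => void };

// Hypothetical stand-in for the real Supabase write.
async function persistEdit(edit: Edit): Promise<void> {
  /* upsert timeline rows here */
}

export async function applyOptimistically(edit: Edit) {
  edit.redo(); // the user sees the result instantly
  try {
    await persistEdit(edit);
  } catch {
    edit.undo(); // restore the previous state and surface a retry toast
  }
}
```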
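And the disambiguation defaults, roughly: prefer an explicit selection, then a single clip under the playhead, and only then ask the user. Names and shapes here are illustrative:

```ts
interface Clip { id: string; track: number; start: number; duration: number }

export function resolveTargetClip(
  clips: Clip[],
  selectedId: string | null,
  playhead: number
): Clip | "needs_clarification" {
  // 1. An explicit selection always wins.
  const selected = clips.find((c) => c.id === selectedId);
  if (selected) return selected;

  // 2. Otherwise, "this clip" means the clip under the playhead --
  //    but only if exactly one clip is there across all tracks.
  const underPlayhead = clips.filter(
    (c) => playhead >= c.start && playhead < c.start + c.duration
  );
  if (underPlayhead.length === 1) return underPlayhead[0];

  // 3. Ambiguous (several overlapping clips, or none): ask the user.
  return "needs_clarification";
}
```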
What's next for CutOS
- AI Video Generation: Generate B-roll, transitions, and effects from text descriptions
- Collaborative Editing: Real-time multi-user editing powered by Supabase
- Auto-Enhance: "Make this look cinematic" should intelligently analyze footage and apply color grading, cuts, and pacing
- Export to Social: One-click optimization for TikTok, Instagram, YouTube with platform-specific formatting
- Voice Cloning: Dub videos while preserving the original speaker's voice characteristics
- Mobile App: Bring AI-powered editing to iOS and Android with the same conversational interface