Inspiration
The inspiration for our AI movie trailer workflow came from the sights on the UC San Diego campus: fog had enveloped the grounds so thoroughly that you could no longer see more than 20 feet, which convinced us we had special footage for a spooky theme. With Halloween approaching and the prompt of the Cerebral Hack Hackathon in front of us, the idea came together.
What it does
Our solution leverages AI to streamline the production of a Halloween-themed horror trailer, addressing the inefficiencies and time-consuming nature of traditional trailer creation. By integrating Twelve Labs' video foundation models and Eleven Labs' sound generation capabilities, we automate several key tasks in the trailer-making process: footage selection with the Marengo 2.6 model, text generation with the Pegasus 1 model, and AI-generated sound effects and narration. These features significantly reduce manual effort, letting creators focus on storytelling and artistic expression, while ffmpeg handles audio mixing and video assembly to deliver a polished final product. Overall, our solution empowers creators to produce high-quality, engaging trailers more efficiently.
How we built it
Setup and Initialization

The project begins by importing the necessary libraries and setting up API keys for Twelve Labs and Eleven Labs. The Twelve Labs client is initialized with its API key, giving access to functionality such as searching stock footage and generating text prompts.

Sound Effect Generation

A function converts text descriptions into audio files using the Eleven Labs API. The generated audio is saved to a specified output path so that each sound effect matches the duration and theme of its video clip.

Footage Selection

We search our indexed stock footage and select the best clips based on duration and confidence scores. The selected clips are downloaded and renamed for further processing, ensuring that only the most relevant, high-quality footage ends up in the trailer.

Clip Generation

An orchestration function ties these steps together: it searches for clips, generates sound prompts, renames the downloaded clips, and produces the matching sound effects, so each clip arrives at the editing stage with appropriate audio.

Audio and Video Concatenation

The script uses ffmpeg to concatenate video clips and mix audio streams. The selected clips are combined into a single cohesive trailer, the generated sound effects and narration are merged with the video, and the final video and audio are muxed into one output file, ready for the finishing touches.

Narration Generation

The script generates a narration for the trailer using the Eleven Labs API, based on a script that describes the plot of the video in an engaging manner. The generated narration is then integrated into the final video, adding a professional touch. We also use Kindo.ai to run the Llama 3.1 70B Versatile model, which translates the narration script into other languages; it is optimized for multilingual dialogue and performs strongly against many open-source and closed chat models on common industry benchmarks.

Altogether, this pipeline uses AI models and APIs to automate the repetitive parts of trailer production (footage selection, sound-effect generation, and video editing), so creators can focus on storytelling and artistic expression, while the combination of original footage with AI-generated sound and narration keeps the final product personal and immersive. Hedged code sketches of the main steps follow.
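A minimal sketch of the setup and footage-selection steps, assuming the Twelve Labs Python SDK. The API key, index ID, query text, and thresholds are placeholders, and exact parameter and field names can differ between SDK versions:

```python
# Sketch: search a Twelve Labs index of our fog footage for spooky moments
# and keep the longest, highest-confidence matches (Marengo-powered search).
from twelvelabs import TwelveLabs

TL_API_KEY = "your-twelve-labs-key"  # placeholder
INDEX_ID = "your-index-id"           # placeholder: index holding our footage

client = TwelveLabs(api_key=TL_API_KEY)

result = client.search.query(
    index_id=INDEX_ID,
    query_text="dense fog rolling over an empty campus at night",
    options=["visual"],
)

# Keep clips that are long enough and score above a confidence threshold;
# the 3-second / score-80 cutoffs are illustrative, not our exact values.
clips = [
    {"video_id": c.video_id, "start": c.start, "end": c.end, "score": c.score}
    for c in result.data
    if (c.end - c.start) >= 3 and c.score >= 80
]
clips.sort(key=lambda c: c["score"], reverse=True)
```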
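The sound prompts themselves can come from Pegasus's open-ended text generation, asking the model to describe a fitting sound for each clip. This continues the sketch above (`client` and `clips` carry over); the prompt wording is illustrative, not the project's exact prompt:

```python
# Sketch: ask Pegasus to propose an eerie sound-effect description per clip.
for clip in clips:
    res = client.generate.text(
        video_id=clip["video_id"],
        prompt=(
            "In one sentence, describe an eerie sound effect that would match "
            f"the action between {clip['start']:.0f}s and {clip['end']:.0f}s."
        ),
    )
    clip["sound_prompt"] = res.data  # plain-text description for Eleven Labs
```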
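Those descriptions are then turned into audio. A sketch of the sound-effect function, assuming Eleven Labs' sound-generation REST endpoint; file paths and the example prompt are placeholders:

```python
# Sketch: convert a text description into a sound effect of a given length
# via Eleven Labs' sound-generation endpoint, saving the MP3 to disk.
import requests

EL_API_KEY = "your-eleven-labs-key"  # placeholder

def generate_sound_effect(text: str, duration_s: float, out_path: str) -> None:
    resp = requests.post(
        "https://api.elevenlabs.io/v1/sound-generation",
        headers={"xi-api-key": EL_API_KEY},
        json={"text": text, "duration_seconds": duration_s},
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)

generate_sound_effect(
    "low droning wind with distant creaking branches, horror ambience",
    duration_s=5.0,
    out_path="clips/clip_01_sfx.mp3",
)
```

Matching `duration_s` to each clip's `end - start` keeps the effect aligned with the footage.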
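For assembly, a sketch of the two ffmpeg invocations: the concat demuxer joins the clips, and the `amix` filter blends narration with the sound-effect track before muxing the result onto the video. File names are placeholders:

```python
# Sketch: concatenate clips, then mix narration + sound effects under the video.
import subprocess

clip_paths = ["clips/clip_01.mp4", "clips/clip_02.mp4", "clips/clip_03.mp4"]

# The concat demuxer reads its inputs from a text file; clips must share
# codecs, otherwise a re-encode (instead of "-c copy") is needed.
with open("concat_list.txt", "w") as f:
    for p in clip_paths:
        f.write(f"file '{p}'\n")

subprocess.run(
    ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
     "-i", "concat_list.txt", "-c", "copy", "trailer_video.mp4"],
    check=True,
)

# Mix the two audio tracks and map the result onto the (unchanged) video.
subprocess.run(
    ["ffmpeg", "-y",
     "-i", "trailer_video.mp4", "-i", "narration.mp3", "-i", "sfx_track.mp3",
     "-filter_complex", "[1:a][2:a]amix=inputs=2:duration=longest[aout]",
     "-map", "0:v", "-map", "[aout]",
     "-c:v", "copy", "-shortest", "trailer_final.mp4"],
    check=True,
)
```

Explicit `-map` flags matter here: without them, ffmpeg selects streams on its own and can drop the audio you wanted, which is one common cause of the silent-video problem we describe under Challenges.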
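Finally, a sketch of the translation-plus-narration step. The Kindo URL, header name, and model identifier below are assumptions based on an OpenAI-style chat-completions interface, not confirmed values (check Kindo's documentation); the Eleven Labs text-to-speech call follows their public REST API, with the voice ID and script text as placeholders:

```python
# Sketch: translate the narration script with Llama 3.1 70B Versatile via
# Kindo, then voice it with Eleven Labs' multilingual text-to-speech.
import requests

KINDO_API_KEY = "your-kindo-key"     # placeholder
EL_API_KEY = "your-eleven-labs-key"  # placeholder
VOICE_ID = "your-voice-id"           # placeholder Eleven Labs voice

def translate_script(script: str, language: str) -> str:
    # Assumed OpenAI-compatible endpoint and model name; verify against
    # Kindo's documentation before use.
    resp = requests.post(
        "https://llm.kindo.ai/v1/chat/completions",
        headers={"api-key": KINDO_API_KEY},
        json={
            "model": "llama-3.1-70b-versatile",
            "messages": [
                {"role": "system",
                 "content": f"Translate the user's text into {language}. "
                            "Reply with the translation only."},
                {"role": "user", "content": script},
            ],
        },
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

def narrate(text: str, out_path: str) -> None:
    # Eleven Labs text-to-speech; eleven_multilingual_v2 handles many languages.
    resp = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
        headers={"xi-api-key": EL_API_KEY},
        json={"text": text, "model_id": "eleven_multilingual_v2"},
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)

script = "This Halloween, the fog hides more than the campus..."
narrate(translate_script(script, "Spanish"), "narration_es.mp3")
```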
Challenges we ran into
During the hackathon, we encountered several challenges that required creative problem-solving and perseverance. Initially, we struggled to make successful POST and GET requests to the Twelve Labs API for accessing stock videos; after extensive research and troubleshooting, we finally found a working approach online. Midway through the hackathon, we hit a significant mental block when our ffmpeg-concatenated videos came out without audio. A strategic break helped us return to the project with renewed focus, and after seeking advice from professionals and doing thorough research, we found a way to integrate the audio streams correctly. These challenges tested our resilience and adaptability, but we overcame them through collaboration and resourcefulness.
Accomplishments that we're proud of
- Seamless integration of Kindo AI, Twelve Labs, and Eleven Labs
- Combining our own filmed footage with our AI-generated script to create exceptional video clips
What we learned
Throughout the project, we learned the importance of robust error handling and the need for efficient data management to handle larger datasets. We also realized the potential of AI in enhancing creative processes, allowing us to focus more on storytelling and artistic expression. The experience highlighted the value of integrating various AI models and tools to automate repetitive tasks, ultimately improving the quality and efficiency of the production process. Overall, the project was a valuable learning experience, demonstrating how AI can be a powerful ally in creative endeavors.
Our final submission: https://youtu.be/OunRTtz-RIw


