YourMoments
Inspiration
We originally wanted to make Jarvis from Iron Man: a camera and microphone that record you all day long and can help you with anything that comes up. However, we realized it would be a lot more fun to make a movie out of your day, using AI to narrate it and edit it together from the best clips of your day. This would be the true Journal of Memories. After you wear a camera that records your day from a first-person view, YourMoments creates a one-minute movie from the whole day's footage, with a cute storyline, music, and sound effects to go along with it. Check out these fun movies from today!
https://youtu.be/kTqEuwyYcac https://youtu.be/fB9GGYUba0g https://youtu.be/7Ub9HDZG1pQ
What it does
YourMoments makes the creative decisions involved in editing large amounts of video. Cinematography, sound effects, and visual effects are all chosen by AI based on the audio and visual content of the footage. The AI also tries to build a consistent storyline across the clips it selects, so the final movie may not follow the original chronology if reordering the events makes them more enjoyable to watch.
How we built it
First, frames are periodically sampled from the video and described by an image-captioning model. ChatGPT then uses these captions to decide which clips to include and in what order. For each selected clip, a YOLO object detector is used together with the captions to build a more detailed description of what is happening. OpenAI's Whisper model transcribes any dialogue in the clip, and ChatGPT uses the transcript to decide whether the clip should get a voiceover, which is then narrated with ElevenLabs. ChatGPT also selects any sound effects or filters to overlay on each clip, along with background music for the final video. Finally, all of the selected clips are put together and displayed to the user.
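The flow described above can be sketched as a plain orchestration function. This is a minimal sketch under stated assumptions, not the team's code: each model (the image captioner, ChatGPT's clip selection, YOLO, Whisper, and the voiceover decision) is passed in as a placeholder callable, and the names `Clip`, `sample_timestamps`, `build_movie_plan`, and the 30-second sampling interval are hypothetical. Sound-effect, filter, and music selection and the final render are left out to keep it short.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Clip:
    """One selected segment of the day's footage (hypothetical structure)."""
    start: float  # seconds from the start of the recording
    end: float
    caption: str = ""
    objects: list[str] = field(default_factory=list)
    transcript: str = ""
    voiceover: Optional[str] = None


def sample_timestamps(duration_s: float, interval_s: float) -> list[float]:
    """Times (in seconds) at which a frame is sent to the captioning model."""
    return [i * interval_s for i in range(math.ceil(duration_s / interval_s))]


def build_movie_plan(
    duration_s: float,
    caption_frame: Callable[[float], str],  # stand-in for the image-captioning model
    choose_clips: Callable[[list[tuple[float, str]]], list[tuple[float, float]]],  # stand-in for ChatGPT
    detect_objects: Callable[[Clip], list[str]],  # stand-in for YOLO
    transcribe: Callable[[Clip], str],  # stand-in for Whisper
    plan_voiceover: Callable[[Clip], Optional[str]],  # stand-in for ChatGPT's voiceover decision
    interval_s: float = 30.0,
) -> list[Clip]:
    # 1. Caption frames sampled at a fixed interval across the whole day.
    captions = [(t, caption_frame(t)) for t in sample_timestamps(duration_s, interval_s)]
    plan = []
    # 2. The language model returns (start, end) pairs, already in story order.
    for start, end in choose_clips(captions):
        clip = Clip(start, end)
        # 3. Enrich each selected clip with its captions, detected objects, and dialogue.
        clip.caption = " / ".join(text for t, text in captions if start <= t < end)
        clip.objects = detect_objects(clip)
        clip.transcript = transcribe(clip)
        # 4. Decide whether to narrate; in this sketch None means "no voiceover".
        clip.voiceover = plan_voiceover(clip)
        plan.append(clip)
    return plan


# Usage with dummy stubs in place of the real models:
plan = build_movie_plan(
    duration_s=120,
    caption_frame=lambda t: f"frame at {t:.0f}s",
    choose_clips=lambda caps: [(60, 90), (0, 30)],  # reordered for the story
    detect_objects=lambda clip: ["person"],
    transcribe=lambda clip: "",
    plan_voiceover=lambda clip: None if clip.transcript else "Our hero begins the day.",
    interval_s=30,
)
```

Keeping every model behind a callable makes it easy to swap one model for another, or to exercise the whole flow with stubs as in the usage above, without paying for API calls.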
Challenges we ran into
Because AI models do not always respond predictably, significant prompt engineering was required to get output in a consistent format.
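One common way to enforce a consistent output format, shown here as an assumption rather than the team's actual prompts, is to ask the model for a JSON object, validate the reply, and re-prompt when it does not parse. The field names in `REQUIRED_FIELDS` and the helpers `parse_clip_decision` and `ask_with_retries` are hypothetical; `ask` stands in for whatever chat-model call is used.

```python
import json
import re
from typing import Callable, Optional

# Hypothetical schema: each key the model must return, with its allowed type(s).
REQUIRED_FIELDS = {
    "clip_index": int,
    "voiceover": (str, type(None)),
    "sound_effects": list,
}


def parse_clip_decision(reply: str) -> Optional[dict]:
    """Return the validated JSON object in a model reply, or None if it is malformed."""
    # Take the span from the first "{" to the last "}", ignoring any surrounding prose.
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    # Reject replies that parse as JSON but are missing keys or have the wrong types.
    for key, expected_type in REQUIRED_FIELDS.items():
        if key not in data or not isinstance(data[key], expected_type):
            return None
    return data


def ask_with_retries(ask: Callable[[str], str], prompt: str, max_attempts: int = 3) -> dict:
    """Call the model until it returns a valid decision; `ask` stands in for a chat-model call."""
    reminder = "\nReply with only a JSON object with the keys: " + ", ".join(REQUIRED_FIELDS)
    for attempt in range(max_attempts):
        # After a malformed reply, repeat the prompt with an explicit format reminder.
        reply = ask(prompt if attempt == 0 else prompt + reminder)
        decision = parse_clip_decision(reply)
        if decision is not None:
            return decision
    raise ValueError(f"no valid reply after {max_attempts} attempts")
```

Extracting the outermost braces tolerates replies that wrap the JSON in chatty prose or a code fence, while the type check catches replies that are valid JSON but have the wrong shape.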
Accomplishments that we're proud of
This was the most entertaining piece of software that any of us have worked on. We're proud to have created a system capable of both completing the complex creative task of video editing and giving us all a good laugh!
What we learned
We learned how to pipeline data through multiple models to create something completely new, and how to engineer prompts effectively to get the best output.
What's next for YourMoments
In the future, we would love to implement features such as facial recognition and emotion detection. These would enhance the AI narration and let users enter the names of people who appear often in their day-to-day lives, so that those people are named in the narration.
We could also give users opportunities to provide creative input; for instance, they could provide a prompt for a specific mood or genre they want to evoke (e.g., "formal documentary about my day").