Inspiration

As avid doomscrollers (Instagram-scrolling fanatics), one thing that came to mind when we imagined a product combining video editing and social media was the Reddit Subway Surfers videos (Reference Video Here). Unlike other forms of social media content, these videos have three prominent features: a monotone AI-generated voice, an Arial text overlay, and a gameplay video running in the background. But in these videos, it isn't the AI voice, storyline, or font that hooks viewers: it's the gaming video. So we thought: how can we create a new auto-captioning system that is more interesting, more engaging, and extremely helpful to users?

What it does

Introducing Captionify, an Adobe Express Add-On that will revolutionize your marketing and video-watching game. It automatically interprets audio files and stylizes your captions to reflect the tone, pitch, and words used to convey your message. This saves social media influencers and marketing managers from hours of hand-captioning videos, and it also helps our fellow humans: an intensity-based captioning system makes videos more accessible to those who are unable to listen to the audio. Engaging, accessible captions can also improve video retention, helping creators build their platforms. Everyone needs to start somewhere, and with the ease of Captionify, the time can be now.

How we built it

In order to interface Whisper with Adobe Express, we had to pass the data through a multi-stage pipeline. Our add-on has a file upload button; the uploaded file travels from the HTML frontend to a Perl server, then into Python, where Whisper transcribes it, and the results flow back through Perl and HTML into Adobe Express.
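On the Python side, the hand-off back toward Perl is conceptually simple: Whisper returns timestamped segments, which get serialized into caption entries for the frontend. A minimal sketch of that step (the input format mirrors Whisper's standard `transcribe()` output; the helper names here are our own illustration, not the add-on's actual code):

```python
# Convert Whisper-style segments into caption entries for the frontend.
# The segment format mirrors whisper's transcribe() output; everything
# else here is an illustrative sketch, not the add-on's actual code.

def fmt_time(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm for caption timestamps."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

def segments_to_captions(segments):
    """Turn [{'start', 'end', 'text'}, ...] into display-ready captions."""
    return [
        {"start": fmt_time(seg["start"]),
         "end": fmt_time(seg["end"]),
         "text": seg["text"].strip()}
        for seg in segments
    ]

# Example: the kind of data whisper.transcribe(...)["segments"] returns.
segments = [
    {"start": 0.0, "end": 2.4, "text": " Welcome to Captionify!"},
    {"start": 2.4, "end": 5.1, "text": " Captions that match your tone."},
]
captions = segments_to_captions(segments)
```

In the actual add-on this list would be JSON-encoded and relayed through the Perl layer back to the HTML frontend.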

Challenges we ran into

As an all-beginner team, we faced many difficulties but were committed to learning as much as we could and having as much fun as possible! Early in the hackathon, we realized that we needed GitHub to host our repository and the terminal to update our project. Due to our unfamiliarity with the platform, we struggled with simple tasks such as setting up the workflow and adding files, but with practice and Google we got the hang of it pretty quickly!

We had also planned to use Whisper to analyze the audio files, but we found out that Adobe Express add-ons cannot run Python programs directly, so we had to switch gears a quarter of the way through the hackathon. Since we didn't want to lose our progress, we devised a new architecture: HTML --> Perl --> Python and back, instead of HTML --> Python. This caused some issues early on, mainly with testing and setup. Despite being the most difficult part of the project, it was also the most rewarding, because we learned a great deal from it.

One of the last big challenges was the CORS incompatibility between the browser and the server hosting our AI model, which required us to learn how to create a web server, and that came with its own difficulties. We often hit connection issues between the Perl server and the ngrok tunnel, but with some help from Adobe we were able to troubleshoot the problem and successfully transcribe the audio file.
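The CORS fix itself boils down to the server attaching the right response headers so the browser will accept cross-origin replies, including for the preflight OPTIONS request. Our server was written in Perl, but the idea is language-agnostic; here is a minimal Python sketch (the handler class and the permissive `*` origin are illustrative assumptions, not our production settings):

```python
# Minimal CORS-enabled HTTP handler sketch using only the standard library.
# The permissive "*" origin is for illustration; a real deployment should
# restrict it to the add-on's origin.
from http.server import BaseHTTPRequestHandler

CORS_HEADERS = {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
}

class CORSHandler(BaseHTTPRequestHandler):
    def _send_cors(self):
        for name, value in CORS_HEADERS.items():
            self.send_header(name, value)

    def do_OPTIONS(self):
        # Preflight request: reply with the CORS headers and no body.
        self.send_response(204)
        self._send_cors()
        self.end_headers()

    def do_POST(self):
        # Echo a tiny JSON body; a real server would run transcription here.
        body = b'{"status": "ok"}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self._send_cors()
        self.end_headers()
        self.wfile.write(body)
```

Without the preflight handler, the browser rejects the cross-origin POST before it ever reaches the transcription code, which is exactly the symptom we were fighting.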

Accomplishments that we're proud of

We are proud that we were able to integrate Whisper and process its output into usable data. This step initially seemed the most difficult, but it was surprisingly manageable with AI assistance and guides. We are also proud that we successfully broke down the audio files using the Librosa library; this data was crucial to our dynamic font-size functions.
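The idea behind the dynamic font size is straightforward: louder speech gets a bigger caption. Librosa's `librosa.feature.rms` produces per-frame loudness values; the mapping from those values to font sizes might look like this (a pure-Python sketch with made-up RMS numbers so it runs without an audio file; the 24-64 pt size range is an illustrative assumption, not our add-on's exact setting):

```python
# Map per-frame loudness (e.g. from librosa.feature.rms) to caption font
# sizes. The RMS values below are made up for illustration; in the add-on
# they would come from analyzing the uploaded audio.

def rms_to_font_sizes(rms, min_size=24, max_size=64):
    """Linearly scale RMS energy into a [min_size, max_size] font-size range."""
    lo, hi = min(rms), max(rms)
    if hi == lo:  # flat audio: fall back to the middle of the range
        return [round((min_size + max_size) / 2)] * len(rms)
    span = max_size - min_size
    return [round(min_size + (v - lo) / (hi - lo) * span) for v in rms]

# Example: quiet -> loud -> quiet speech.
rms = [0.02, 0.05, 0.20, 0.12, 0.02]
sizes = rms_to_font_sizes(rms)  # -> [24, 31, 64, 46, 24]
```

Normalizing against the clip's own min and max keeps the effect consistent across quiet and loud recordings, since only relative intensity matters for the visual emphasis.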

What we learned

We learned how to integrate AI, particularly OpenAI's Whisper, into our project's functionality, and how to work with multiple unfamiliar programming languages and make them interact with each other. We learned how to use Librosa to process audio files into usable data. Over the course of the project, we familiarized ourselves with GitHub and learned how to avoid merge conflicts. Although we struggled with permission issues initially, we overcame them and set up a proper collaborative environment.

What's next for Captionify

In the future, we hope to elevate our program by improving its efficiency, organization, and UI design. We would also like to integrate AI in other ways, using algorithms and datasets to perform a more thorough analysis of the audio and generate even higher-quality captions. Finally, we would add settings controls to make our captions more customizable so they can best meet our audience's needs, because every word deserves to be seen and understood.


Updates


Hello judges! We just wanted to let you know that the attached demo video does not include all of the features we were hoping to implement: it performs the backend programs, but we were not able to fully integrate them with the frontend. Attached below is also an image of the time graphs generated from our audio analysis!
