What it does
Our application tracks the text in a video and displays its translation.
How we built it
We send each video frame to the Google Cloud Vision API, read the text it detects, translate that text, and draw the translation over the original text in the frame.
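The per-frame loop described above can be sketched roughly as follows. This is a minimal illustration, not our actual code: `detect_text`, `translate` and `draw_text` are hypothetical stand-ins for the OCR request, the translation request and the drawing routine, and the `Box` and `Detection` types are assumptions made for the example.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# (x, y, width, height) of a detected text region, in pixels (assumed layout).
Box = Tuple[int, int, int, int]
# A detected piece of text together with its location.
Detection = Tuple[str, Box]


def translate_frames(
    frames: Iterable[object],
    detect_text: Callable[[object], List[Detection]],
    translate: Callable[[str], str],
    draw_text: Callable[[object, str, Box], object],
) -> Iterator[object]:
    """Yield each frame with its detected text replaced by a translation.

    The three callables are hypothetical placeholders: plug in whatever
    OCR, translation and drawing functions the project actually uses.
    """
    for frame in frames:
        for text, box in detect_text(frame):
            # Draw the translated text where the original text was found.
            frame = draw_text(frame, translate(text), box)
        yield frame
```

Passing the three steps in as functions keeps the loop independent of any particular API client, so it can be tried out with simple fakes before wiring in real services.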
Challenges we ran into
We had to figure out how to overlay text on each frame and then reassemble the frames into a continuous video. We also had to smooth the output of the Google API across frames to avoid a 'flickering' effect. For the translation, we had to translate one line at a time rather than one word at a time, so that each result read as a coherent sentence.
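Two of these challenges can be illustrated with a short sketch. It shows one possible approach under stated assumptions, not necessarily the one we used: words are grouped into lines by comparing their vertical positions, and flicker is reduced by holding the last detected text for a few frames when detection briefly fails. The `Word` layout, `y_tolerance` and `hold_frames` are hypothetical names and values chosen for the example.

```python
from typing import List, Optional, Tuple

# (text, x, y) of a word's top-left corner, in pixels (assumed layout).
Word = Tuple[str, int, int]


def group_into_lines(words: List[Word], y_tolerance: int = 10) -> List[str]:
    """Join words whose y positions are within y_tolerance into one line."""
    lines: List[List[Word]] = []
    for word in sorted(words, key=lambda w: w[2]):
        # Compare against the first word of the most recent line.
        if lines and abs(word[2] - lines[-1][0][2]) <= y_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    # Within each line, order the words left to right.
    return [
        " ".join(w[0] for w in sorted(line, key=lambda w: w[1]))
        for line in lines
    ]


class DetectionSmoother:
    """Keep showing the last detected lines for a few frames after a miss."""

    def __init__(self, hold_frames: int = 5) -> None:
        self.hold_frames = hold_frames
        self._last: Optional[List[str]] = None
        self._misses = 0

    def update(self, lines: List[str]) -> List[str]:
        """Return the lines to display for the current frame."""
        if lines:
            self._last = lines
            self._misses = 0
            return lines
        self._misses += 1
        if self._last is not None and self._misses <= self.hold_frames:
            # Detection dropped out briefly: reuse the previous result.
            return self._last
        self._last = None
        return []
```

Each grouped line can then be translated as a whole sentence, and the smoother decides what is drawn on frames where the detector returns nothing.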
Accomplishments that we're proud of
We are proud to have stuck with our vision throughout, overcome the challenges, and completed the project the way we had planned it.
What's next for Dynamic Text Translator
In the future, we could apply the Google API to videos in the same way to analyse text differently. Besides translating, it could solve equations displayed on screen or read the text out loud to make videos accessible to visually impaired individuals.