Inspiration
As a music composer who recently started working with vocal synths like Hatsune Miku and Kasane Teto, I’ve struggled to make them sound natural and human-like. Proso-T is the first step toward understanding what makes a singing voice sound human.
What it does
Proso-T is a data pipeline demo that extracts prosody and musical data from a vocal audio recording.
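To give a feel for what "prosody and musical data" can mean in practice, here is a minimal sketch. It is not Proso-T's actual code. It assumes a pitch tracker has already produced fundamental-frequency values in Hz and that the audio is available as a plain list of samples. The function names and the 440 Hz reference tuning are assumptions made for illustration.

```python
import math

A4_HZ = 440.0    # assumed reference tuning
A4_MIDI = 69     # MIDI note number of A4


def hz_to_midi(f0_hz: float) -> float:
    """Convert a fundamental frequency in Hz to a fractional MIDI note number."""
    if f0_hz <= 0:
        # Unvoiced or silent frames have no pitch; the caller should skip them.
        raise ValueError("f0 must be positive")
    return A4_MIDI + 12.0 * math.log2(f0_hz / A4_HZ)


def pitch_deviation_cents(f0_hz: float) -> float:
    """Signed distance in cents from the nearest equal-tempered note."""
    midi = hz_to_midi(f0_hz)
    return 100.0 * (midi - round(midi))


def frame_rms(samples, frame_len, hop):
    """Root-mean-square energy of each full frame, a rough loudness contour."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies
```

The fractional MIDI value gives the musical note, while the deviation in cents and the energy contour are the kind of small, continuous variations that separate an expressive performance from a flat one.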
How we built it
Python, research, and lots of Gemini.
Challenges we ran into
Finding SOTA models and techniques for each task in the pipeline, and building a robust pipeline with modular components while minimizing technical debt.
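One common way to keep a pipeline modular is to treat every component as a function that reads a shared record and returns new fields, so a model can be swapped without touching its neighbours. The sketch below illustrates that idea only. The stage names and record keys are hypothetical and are not taken from Proso-T.

```python
from typing import Any, Callable, Dict, List

# A stage reads the record so far and returns the new fields it computed.
Stage = Callable[[Dict[str, Any]], Dict[str, Any]]


def run_pipeline(stages: List[Stage], record: Dict[str, Any]) -> Dict[str, Any]:
    """Run each stage in order, merging its output into the record."""
    for stage in stages:
        record = {**record, **stage(record)}
    return record


def load_audio_stub(record: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stand-in for a real audio loader.
    return {"samples": [0.0, 0.5, -0.5, 0.0]}


def peak_amplitude(record: Dict[str, Any]) -> Dict[str, Any]:
    # Depends only on the "samples" key, not on which stage produced it.
    return {"peak": max(abs(x) for x in record["samples"])}


result = run_pipeline([load_audio_stub, peak_amplitude], {"path": "example.wav"})
```

Because each stage depends only on the keys it reads, replacing one model with a newer one means replacing a single function in the list.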
Accomplishments that we're proud of
I’m proud of being able to get a functioning pipeline at all in such a short period, considering the number of components. I’m also proud of having the discipline to design the pipeline for long-term scalability.
What we learned
Machine learning truly is 90% data.
What's next for Proso-T
As a data pipeline, this project could be used (hint, hint: by me) to create a singing dataset for tackling more advanced prosody problems, such as modelling prosody from lyrics and MIDI notes alone.