Inspiration

As a music composer who recently started working with vocal synths such as Hatsune Miku and Kasane Teto, I’ve struggled to make them sound natural and human-like. Proso-T is a first step toward understanding what makes a singing voice sound human.

What it does

Proso-T is a data pipeline demo that extracts prosody and musical data from vocal audio recordings.
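To make "prosody and musical data" a little more concrete, here is a minimal sketch of the kind of per-frame features such a pipeline might produce: loudness (RMS energy), fundamental frequency, and the nearest MIDI note. This is not Proso-T's actual code. The function names, the 16 kHz sample rate, and the plain autocorrelation pitch estimator are all assumptions chosen to keep the example dependency-free, and a synthetic sine tone stands in for a real vocal recording.

```python
import math

SAMPLE_RATE = 16000  # Hz; assumed value for this sketch


def synth_tone(freq_hz, n_samples=1024, amplitude=0.5, sample_rate=SAMPLE_RATE):
    """Generate a sine tone as a stand-in for one frame of vocal audio."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(n_samples)
    ]


def rms_energy(frame):
    """Root-mean-square amplitude, a simple loudness feature."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def estimate_f0(frame, sample_rate=SAMPLE_RATE, fmin=80.0, fmax=1000.0):
    """Estimate fundamental frequency from the autocorrelation peak.

    A deliberately simple stand-in for a real pitch tracker: it searches
    lags corresponding to fmin..fmax and returns the best-scoring one.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def hz_to_midi(freq_hz):
    """Convert frequency to the nearest MIDI note number (A4 = 440 Hz = 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def extract_features(frame):
    """Hypothetical per-frame feature record combining the steps above."""
    f0 = estimate_f0(frame)
    return {"rms": rms_energy(frame), "f0_hz": f0, "midi_note": hz_to_midi(f0)}


if __name__ == "__main__":
    print(extract_features(synth_tone(440.0)))
```

Because the lag is an integer number of samples, the frequency estimate is only approximate, but it is close enough to land on the right MIDI note for a clean tone. Real singing needs a more robust pitch tracker and voiced/unvoiced detection.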

How we built it

Python, research, and lots of Gemini.

Challenges we ran into

Finding SOTA models and techniques for each task in the pipeline.

Building a robust pipeline with modular components while minimizing technical debt.

Accomplishments that we're proud of

I’m proud of being able to get a functioning pipeline at all in such a short period, considering the number of components. I’m also proud of having the discipline to design the pipeline for long-term scalability.

What we learned

Machine learning truly is 90% data.

What's next for Proso-T

As a data pipeline, this project could be used (hint, hint: by me) to create a singing dataset for tackling more advanced prosody problems, such as modelling prosody from lyrics and MIDI notes alone.
