Proso-T | Devpost

Inspiration

As a music composer who recently started working with vocal synths like Hatsune Miku and Kasane Teto, I’ve struggled to make them sound natural and human-like. Proso-T is a first step toward understanding what makes a singing voice sound human.

What it does

Proso-T is a data pipeline demo that extracts prosody and musical data from a vocal audio recording.
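To make "prosody and musical data" concrete, here is a minimal sketch of one such feature: estimating the pitch of a single audio frame and mapping it to a MIDI note number. This is not Proso-T's actual method. The project does not say which features or models it uses, so plain autocorrelation stands in for whatever pitch tracker the pipeline really relies on, and every function name below is hypothetical.

```python
import math

def synth_sine(freq_hz, sample_rate=16000, duration_s=0.05):
    """Generate a test tone standing in for one frame of vocal audio."""
    n = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

def estimate_f0(frame, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency (Hz) of one frame.

    Picks the lag with the largest autocorrelation inside the plausible
    pitch range [fmin, fmax]. Returns 0.0 if no lag scores above zero
    (for example, on a silent frame).
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0

def hz_to_midi(freq_hz):
    """Convert a frequency to a fractional MIDI note number (A4 = 440 Hz = 69)."""
    return 69 + 12 * math.log2(freq_hz / 440.0)

# A synthetic A4 tone; a real pipeline would read frames from an audio file.
frame = synth_sine(440.0)
f0 = estimate_f0(frame, sample_rate=16000)
print(f0, round(hz_to_midi(f0)))
```

Because the lag is a whole number of samples, the estimate is only approximate; rounding to the nearest MIDI note absorbs that error here, but a real pitch tracker would interpolate or use a learned model.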

How we built it

Python, research, and lots of Gemini.

Challenges we ran into

Finding SOTA models and techniques for each task in the pipeline; building a robust pipeline with modular components while minimizing technical debt.
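One common way to keep a pipeline modular is to make every component a function with the same signature, so stages can be swapped or reordered without touching the others. The sketch below illustrates that idea only; it is an assumption about the general pattern, not Proso-T's actual architecture, and the stage names (`add_duration`, `add_rms_energy`) and the record layout are hypothetical.

```python
import math
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Stage = Callable[[Record], Record]

def run_pipeline(record: Record, stages: List[Stage]) -> Record:
    """Pass a record through each stage in order and return the result."""
    for stage in stages:
        record = stage(record)
    return record

def add_duration(record: Record) -> Record:
    """Hypothetical stage: attach the clip length in seconds."""
    duration = len(record["samples"]) / record["sample_rate"]
    return {**record, "duration_s": duration}

def add_rms_energy(record: Record) -> Record:
    """Hypothetical stage: attach root-mean-square energy, a loudness proxy."""
    samples = record["samples"]
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0
    return {**record, "rms_energy": rms}

# Toy input: four samples at 4 Hz, i.e. a one-second clip.
clip = {"samples": [0.5, -0.5, 0.5, -0.5], "sample_rate": 4}
features = run_pipeline(clip, [add_duration, add_rms_energy])
print(features["duration_s"], features["rms_energy"])
```

Each stage returns a new dictionary instead of mutating its input, which keeps stages independent and easy to test in isolation.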

Accomplishments that we're proud of

I’m proud of being able to get a functioning pipeline at all in such a short period, considering the number of components. I’m also proud of having the discipline to design the pipeline for long-term scalability.

What we learned

Machine learning truly is 90% data

What's next for Proso-T

As a data pipeline, this project could be used (hint, hint: by me) to create a singing dataset for tackling more advanced prosody problems, such as modelling prosody from lyrics and MIDI notes alone.

Built With

python

Updates

Archimedes Li started this project — Nov 09, 2025 08:37 AM EST
