Inspiration

The project was inspired by how hard it is to find enough voice and grammar data for underrepresented languages such as Sinhala in the datasets offered by cloud AI platforms. Many platforms focus on widely spoken languages, so accurate data for languages like Sinhala remains scarce. That gap sparked the idea of building a tool specifically for capturing such data, enabling more inclusive AI models.

What it does

bigD provides a streamlined interface for contributing voice recordings and grammar data in underrepresented languages. Users collect and submit this data, which can then be used to train machine learning models and improve their predictions and language support. It is a simple tool built to fill the data gap for languages that are poorly represented on existing cloud platforms.

How we built it

We built bigD with a focus on usability and efficiency, using modern web technologies and a clean UI to make data collection intuitive. For voice input, we implemented real-time recording that integrates with AI processing tools. The grammar input section offers a straightforward text interface designed for submitting structured grammar data. Collected data is stored in Firebase.
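
The write-up does not describe bigD's data schema, so the following is only a minimal Python sketch of what a "structured grammar data" record might look like before it is written to Firebase. The `GrammarSubmission` fields, the `build_submission` and `is_sinhala_text` helpers, and the rule that Sinhala sentences must contain only Sinhala letters (U+0D80 to U+0DFF, plus whitespace, punctuation and zero-width joiners) are all hypothetical; the Firebase write itself is not shown.

```python
import unicodedata
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

# Sinhala Unicode block: U+0D80 to U+0DFF.
SINHALA_START, SINHALA_END = 0x0D80, 0x0DFF
# Zero-width non-joiner / joiner appear in some Sinhala conjunct spellings.
ALLOWED_JOINERS = {"\u200c", "\u200d"}


def is_sinhala_text(text: str) -> bool:
    """True if every character other than whitespace, punctuation and joiners is Sinhala."""
    letters = [
        ch for ch in text
        if not ch.isspace()
        and not unicodedata.category(ch).startswith("P")
        and ch not in ALLOWED_JOINERS
    ]
    return bool(letters) and all(SINHALA_START <= ord(ch) <= SINHALA_END for ch in letters)


@dataclass
class GrammarSubmission:
    """Hypothetical shape of one grammar contribution; field names are illustrative."""
    language: str        # ISO 639-1 code, e.g. "si" for Sinhala
    sentence: str        # example sentence in the target script
    pattern: str         # free-text description of the grammar rule it illustrates
    contributor_id: str
    created_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


def build_submission(language: str, sentence: str, pattern: str, contributor_id: str) -> dict:
    """Normalize and validate input, returning a plain dict ready to be stored."""
    # NFC normalization so visually identical strings are stored with identical code points.
    sentence = unicodedata.normalize("NFC", sentence.strip())
    if language == "si" and not is_sinhala_text(sentence):
        raise ValueError("sentence must be written in Sinhala script")
    if not pattern.strip():
        raise ValueError("a grammar pattern description is required")
    return asdict(GrammarSubmission(language, sentence, pattern.strip(), contributor_id))


if __name__ == "__main__":
    # A short Sinhala sentence tagged with the word order it illustrates.
    print(build_submission("si", "මම පොත කියවමි", "subject-object-verb word order", "user-42"))
```

Normalizing to NFC is worth doing for Sinhala in particular: some vowel signs, such as U+0DDC, can also be encoded as a sequence of two code points, and without normalization the same sentence could be stored in two different forms.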

Challenges we ran into

One of the biggest challenges was ensuring the platform could handle diverse linguistic structures, particularly for languages like Sinhala that do not align well with popular models trained on English. Another hurdle was making voice collection smooth while preserving data accuracy and integrity during uploads.
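
The write-up does not say how bigD protects voice uploads. One common approach, sketched below in Python, is for the client to send a SHA-256 digest and byte length with each clip, for the receiving side to recompute them, and then to apply a basic sanity check such as a minimum duration. This sketch assumes clips arrive as PCM WAV (browser recorders often produce WebM or Ogg, so a conversion step would be needed); `UploadManifest`, `make_manifest`, `verify_upload`, `accept_clip` and the 1-second threshold are hypothetical and are not part of any Firebase API.

```python
import hashlib
import io
import wave
from dataclasses import dataclass


@dataclass(frozen=True)
class UploadManifest:
    """Hypothetical metadata the client sends alongside each voice clip."""
    filename: str
    size_bytes: int
    sha256: str


def make_manifest(filename: str, audio: bytes) -> UploadManifest:
    """Client side: fingerprint the recorded clip before uploading it."""
    return UploadManifest(filename, len(audio), hashlib.sha256(audio).hexdigest())


def verify_upload(manifest: UploadManifest, received: bytes) -> bool:
    """Receiving side: detect truncated or corrupted uploads."""
    return (
        len(received) == manifest.size_bytes
        and hashlib.sha256(received).hexdigest() == manifest.sha256
    )


def clip_duration_seconds(wav_bytes: bytes) -> float:
    """Duration of a PCM WAV clip, computed from its frame count and sample rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def accept_clip(manifest: UploadManifest, received: bytes, min_seconds: float = 1.0) -> bool:
    """Accept a clip only if it arrived intact, parses as WAV and is long enough."""
    if not verify_upload(manifest, received):
        return False
    try:
        return clip_duration_seconds(received) >= min_seconds
    except (wave.Error, EOFError):
        return False  # not a readable PCM WAV file
```

The hash catches corruption in transit, while the duration check catches clips that uploaded correctly but are too short to be useful as training data. The two checks are kept separate so either can be tuned or replaced independently.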

Accomplishments that we're proud of

We’re proud of creating a platform that contributes to language diversity in AI training. By facilitating data collection for Sinhala and similar languages, bigD opens up opportunities for better, more inclusive AI models. We’re also proud of the platform’s simple, clean interface, which makes it accessible to users without a technical background.

What we learned

Building bigD showed us how important language inclusivity is in AI development. We learned how challenging it can be to gather and structure voice and grammar data for languages with little support on existing platforms. The experience deepened our understanding of data collection and of what AI training requires.

What's next for bigD

Our next steps include expanding the platform to support more languages and improving AI models by feeding them more diverse data. We also plan to integrate tools for analyzing the collected data in real time and to pursue partnerships with educational and research institutions to build more comprehensive datasets.
