our why
Dialects, lingoes, creoles, and acrolects are more than just words, more than just languages: they are a means of cultural immersion, intangible pieces of tradition and history passed down through generations.
Remarkably, two of the industry giants lag far behind: Google Translate doesn't support translation for the majority of dialects, and ChatGPT's responses can be likened to a dog meowing or a cat barking.
Aiden grew up in Trinidad and Tobago as a native Creole (patois) speaker; Nuween grew up in Afghanistan, making memories with his extended family in Hazaragi; and Halle and Savvy, though Canadian, show their love and appreciation at home in Cantonese and Mandarin with their parents, who are first-generation immigrants.
How can we bring dialect speakers and even non-dialect speakers alike together? How can we traverse cultures, when the infrastructure to do so isn’t up to par?
pitta-patta, our solution
Meet Pitta-Patta: an LLM-powered, voice-to-text web app designed to bridge cultural barriers and bring people together through language, no matter where they are. With our dialect translation system for underrepresented minorities, users can convert between standard English and dialects. Currently, we support Trinidadian Creole as our proof of concept, with plans to expand further, championing a cause dear to all of us.
our building journey
Model: Our project is built on a Sequence-to-Sequence (Seq2Seq) model, tailored to translate Trinidadian Creole slang to English and back. The encoder compresses the input into a context vector, while the decoder generates the output sequence. We chose Long Short-Term Memory (LSTM) networks to handle the complexity of sequential data.
To prepare our data, we clean it by removing unnecessary prefixes and adding start and end tokens to guide the model. We then tokenize the text, converting words to integers and defining an out-of-vocabulary token for unknown words. Finally, we pad the sequences to ensure they’re uniform in length.
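The preparation steps above can be sketched in plain Python. This is a minimal illustration rather than the project's actual code: the prefix strings, the special-token spellings, and the helper names (`clean`, `build_vocab`, `encode`, `pad`) are all assumptions made for the example, and the sample sentences are placeholders.

```python
# Hypothetical special tokens; the project's real spellings are not given.
START, END, OOV, PAD = "<start>", "<end>", "<oov>", "<pad>"


def clean(line, prefixes=("Creole:", "English:")):
    """Strip an assumed label prefix, lowercase, and wrap in start/end tokens."""
    for prefix in prefixes:
        if line.startswith(prefix):
            line = line[len(prefix):]
            break
    return f"{START} {line.strip().lower()} {END}"


def build_vocab(sentences):
    """Map each word to an integer; ids 0 and 1 are reserved for PAD and OOV."""
    vocab = {PAD: 0, OOV: 1}
    for sentence in sentences:
        for word in sentence.split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(sentence, vocab):
    """Convert words to integers, falling back to the OOV id for unknown words."""
    return [vocab.get(word, vocab[OOV]) for word in sentence.split()]


def pad(sequences, pad_id=0):
    """Right-pad every sequence to the length of the longest one."""
    longest = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (longest - len(seq)) for seq in sequences]


if __name__ == "__main__":
    cleaned = [clean("Creole: Good morning neighbour"), clean("English: Hello")]
    vocab = build_vocab(cleaned)
    print(pad([encode(s, vocab) for s in cleaned]))
```

Reserving id 0 for padding is a common convention because it lets downstream layers treat zeros as "no word here".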
The architecture includes an embedding layer that turns words into dense vectors, capturing their meanings. As the encoder processes each word, it produces hidden states that initialize the decoder, which predicts the next word in the sequence.
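For readers who want to see the shape of such a network, here is a minimal Keras sketch of an embedding plus LSTM encoder-decoder. It is an assumption-laden illustration, not the project's model: the function name `build_seq2seq`, the layer sizes, and the vocabulary sizes are made up, and padding masking is left out to keep it short.

```python
from tensorflow.keras.layers import Dense, Embedding, Input, LSTM
from tensorflow.keras.models import Model


def build_seq2seq(src_vocab, tgt_vocab, embed_dim=64, units=128):
    """Encoder-decoder sketch; all sizes here are illustrative assumptions."""
    # Encoder: embed the source tokens and keep only the final LSTM states.
    enc_in = Input(shape=(None,), name="encoder_tokens")
    enc_emb = Embedding(src_vocab, embed_dim)(enc_in)
    _, state_h, state_c = LSTM(units, return_state=True)(enc_emb)

    # Decoder: start from the encoder states and emit one output per time step.
    dec_in = Input(shape=(None,), name="decoder_tokens")
    dec_emb = Embedding(tgt_vocab, embed_dim)(dec_in)
    dec_seq, _, _ = LSTM(units, return_sequences=True, return_state=True)(
        dec_emb, initial_state=[state_h, state_c]
    )

    # A softmax over the target vocabulary predicts the next word at each step.
    probs = Dense(tgt_vocab, activation="softmax")(dec_seq)
    return Model([enc_in, dec_in], probs)


if __name__ == "__main__":
    build_seq2seq(src_vocab=500, tgt_vocab=600).summary()
```

During training, the decoder input would typically be the target sentence shifted right by one token (teacher forcing), so the model learns to predict each next word from the words before it.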
Our decode_sequence() function takes care of translating Trinidadian Creole into English, generating one word at a time until it reaches the end. This allows us to create meaningful connections through language, one sentence at a time.
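The word-by-word loop can be sketched as greedy decoding. The version below is not the project's `decode_sequence()`; it is a stand-alone illustration named `greedy_decode`, and it assumes a hypothetical `step(token, state)` callable that returns a list of scores over the vocabulary plus the updated decoder state. A toy `step` stands in for the trained decoder.

```python
def greedy_decode(step, init_state, start_id, end_id, max_len=20):
    """Pick the highest-scoring token at each step until end_id or max_len."""
    token, state, output = start_id, init_state, []
    for _ in range(max_len):
        scores, state = step(token, state)
        # Greedy choice: index of the largest score.
        token = max(range(len(scores)), key=scores.__getitem__)
        if token == end_id:
            break
        output.append(token)
    return output


def toy_step(token, state, vocab_size=5):
    """Stand-in for a trained decoder: always favours the id after `token`."""
    scores = [0.0] * vocab_size
    scores[(token + 1) % vocab_size] = 1.0
    return scores, state


if __name__ == "__main__":
    print(greedy_decode(toy_step, init_state=None, start_id=0, end_id=3))
```

The `max_len` cap matters in practice: without it, a model that never predicts the end token would loop forever.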
Frontend: The front end was built with Streamlit.
Challenges we ran into
This was our first time using Databricks and its services. While we did get TensorFlow running, using Spark and trying to run LLMs within the Databricks environment was painful, and we eventually abandoned that plan.
We had some difficulty connecting the model to the backend: a small hiccup along the way, where every call to the model triggered retraining. Slight tweaks to the logic fixed this.
We had a few issues training the model because of the data format of the input. Making the encoder and decoder logic explicit fixed this.
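One common way to avoid the retraining problem described above is a "load if saved, otherwise train once" guard. The sketch below shows the pattern in a framework-agnostic form; it is not the fix the team applied, which the write-up does not detail. The names `load_or_train`, `save_json`, and `load_json` are hypothetical, and a JSON file stands in for a real saved model.

```python
import json
import tempfile
from pathlib import Path


def load_or_train(path, train_fn, save_fn, load_fn):
    """Return (model, trained_now); train only when no saved copy exists."""
    path = Path(path)
    if path.exists():
        return load_fn(path), False
    model = train_fn()
    save_fn(model, path)
    return model, True


def save_json(model, path):
    """Stand-in for a real model save call."""
    path.write_text(json.dumps(model))


def load_json(path):
    """Stand-in for a real model load call."""
    return json.loads(path.read_text())


if __name__ == "__main__":
    target = Path(tempfile.mkdtemp()) / "model.json"
    for _ in range(2):
        _, trained_now = load_or_train(
            target, lambda: {"weights": [1, 2, 3]}, save_json, load_json
        )
        print("trained" if trained_now else "loaded from disk")
```

With a Keras model, the save and load callables would be swapped for the framework's own persistence functions.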
Accomplishments that we're proud of
This was our first time using Streamlit to build a front end, and in the end it went quite smoothly.
We trained a language model to recognise and complete dialect!
looking far, far, ahead
We envision an exciting timeline for Pitta-Patta. Our goal is to develop a Software Development Kit (SDK) that small translation companies can utilize, empowering them to integrate our dialect translation capabilities into their platforms. This will not only broaden access to underrepresented dialects but also elevate the importance of cultural nuances in communication.
Additionally, we plan to create a consumer-focused web app that makes our translation tools accessible to everyday users. This app will not only facilitate seamless communication but also serve as a cultural exchange platform, allowing users to explore the richness of various dialects and connect with speakers around the world. With these initiatives, we aim to inspire a new wave of cultural understanding and appreciation.
Made with coffee, red bull, and pizza.
Built With
- chatgpt
- databricks
- javascript
- keras
- python
- streamlit
- tensorflow
- youtubeapi