WLIT - What Language Is This?

WLIT is a beginner hackathon project that uses artificial intelligence to detect which language a given input is written in. It can currently detect English, French, Spanish, Portuguese, Italian, German, Swedish, Danish, Dutch, Russian, and Greek. This list is limited by the dataset we used and by the modules we implemented.

Dataset

The dataset we chose for this project is the Language Detection dataset by Basil Saji on Kaggle (https://www.kaggle.com/datasets/basilb2s/language-detection). According to Basil, he "Collected the data from Wikipedia by scraping using BeautifulSoup python library".

Methodology

To read and manipulate our data, we used the pandas module. To filter out unwanted characters, we used the re module. We used the spaCy module, which provides information about each language, such as its stopwords, lemmatization rules, and punctuation. With spaCy, we removed punctuation and stopwords ("the", "a") and lemmatized each word ("is" -> "be", "going" -> "go"). After pre-processing, we used sklearn to convert the words into numeric vectors, and then used sklearn again to classify which language the input is written in. The whole project was written in Python on Google Colab.
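The steps above can be sketched in a few lines. This is a minimal illustration, not our actual notebook: the tiny hand-written data frame stands in for the Kaggle CSV, the column names "Text" and "Language" are assumed, and the choice of CountVectorizer with a MultinomialNB classifier is an assumption about which sklearn pieces fit the description. The spaCy stopword removal and lemmatization are left out so the sketch runs without downloading any language models.

```python
import re

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hand-written stand-in for the Kaggle dataset (illustrative only).
data = pd.DataFrame({
    "Text": [
        "The cat is sleeping on the sofa.",
        "Where is the nearest train station?",
        "I would like a cup of tea, please.",
        "Le chat dort sur le canapé.",
        "Où est la gare la plus proche ?",
        "Je voudrais une tasse de thé, s'il vous plaît.",
        "El gato duerme en el sofá.",
        "¿Dónde está la estación de tren más cercana?",
        "Quisiera una taza de té, por favor.",
    ],
    "Language": ["English"] * 3 + ["French"] * 3 + ["Spanish"] * 3,
})


def clean_text(text: str) -> str:
    """Lowercase the text and replace digits and punctuation with spaces."""
    text = re.sub(r"[\d_]|[^\w\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


# Pre-process every row, then turn words into count vectors and fit a classifier.
data["Clean"] = data["Text"].apply(clean_text)
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(data["Clean"], data["Language"])


def predict_language(text: str) -> str:
    """Return the predicted language label for a single input string."""
    return str(model.predict([clean_text(text)])[0])


print(predict_language("le chat est sur la table"))
```

With the real dataset, the data frame would instead be loaded with pandas from the downloaded CSV, and some rows would be held out to measure accuracy.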

What we learned

This was our first time working with machine learning and AI, and we learned many AI concepts as well as implementation of technologies such as spaCy, pandas, and sklearn.

Challenges we ran into

At first we had a difficult time understanding some concepts of artificial intelligence, which we overcame by studying online resources together and asking our mentors. On the technical side, we ran into several errors where our dataset was returning NaN values. We fixed these by debugging to trace the source of the error and by correctly applying the step that removes unwanted characters.
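For anyone hitting a similar problem, here is a rough sketch of how rows with NaN or empty text can be found after cleaning. It is not the exact bug we had: the sample rows, the "Clean" column, and the helper name strip_unwanted are all made up for illustration.

```python
import re

import pandas as pd

# Hypothetical sample: one missing value and one row that is only digits.
frame = pd.DataFrame({
    "Text": ["Hello, world!", None, "12345", "Bonjour à tous"],
    "Language": ["English", "English", "English", "French"],
})


def strip_unwanted(text):
    """Replace digits and punctuation with spaces; pass missing values through."""
    if pd.isna(text):
        return text
    return re.sub(r"\s+", " ", re.sub(r"[\d_]|[^\w\s]", " ", text)).strip()


frame["Clean"] = frame["Text"].apply(strip_unwanted)

# Rows that are missing, or empty after cleaning, have nothing left to vectorize.
bad_rows = frame["Clean"].fillna("").eq("")
print(frame[bad_rows])

# Keep only the rows that still contain text.
usable = frame[~bad_rows].reset_index(drop=True)
```

Printing the flagged rows before dropping them makes it easier to see whether the problem comes from the raw data or from the cleaning step.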

Accomplishments that we're proud of

We're proud of learning so much about AI in such a short amount of time, and also finishing our project. We now have a keen interest in AI and will continue learning more about it!

What's next for WLIT

In the future, we hope to support more languages and to improve accuracy with features such as detecting spelling errors.
