Inspiration

Twitter has struggled to identify and act on abuse and harassment at the scale at which it operates. Relying on human moderators alone to review and moderate every tweet is completely impractical. We wanted to see whether machine learning could contribute to society in a positive way by automating the flagging of abuse, easing the burden on human moderators.

What it does

It uses natural language processing to detect and respond to malicious or abusive messages sent on Discord. When the bot detects abuse, it responds in the chat. It also keeps track of the number of abusive messages sent by each user, which you can then ask the bot about.
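The message-handling behavior described above can be sketched as follows. This is a minimal sketch, not the bot's actual code: the `AbuseTracker` class and the `!abusecount` command are hypothetical names, and the classifier is passed in as any function that returns True or False for a message. The Discord wiring is left out; with a wrapper such as discord.py, `handle` would be called from the `on_message` event and any non-None reply sent back to the channel.

```python
from collections import Counter
from typing import Callable, Optional

# Hypothetical command for asking how many abusive messages a user has sent.
COUNT_COMMAND = "!abusecount"


class AbuseTracker:
    """Flags abusive messages and keeps a running count per user (illustrative sketch)."""

    def __init__(self, is_abusive: Callable[[str], bool]) -> None:
        # Any classifier returning True/False can be injected here.
        self.is_abusive = is_abusive
        self.counts: Counter = Counter()

    def handle(self, user: str, text: str) -> Optional[str]:
        """Return the bot's reply to one chat message, or None to stay quiet."""
        parts = text.split(maxsplit=1)
        if parts and parts[0] == COUNT_COMMAND:
            # "!abusecount alice" asks about alice; a bare command asks about the sender.
            target = parts[1].strip() if len(parts) > 1 else user
            return f"{target} has sent {self.counts[target]} abusive message(s)."
        if self.is_abusive(text):
            self.counts[user] += 1
            return f"{user}, please keep the conversation respectful."
        return None


if __name__ == "__main__":
    # A keyword check stands in for the trained model in this demo.
    tracker = AbuseTracker(lambda text: "idiot" in text.lower())
    print(tracker.handle("alice", "you idiot"))        # alice, please keep the conversation respectful.
    print(tracker.handle("bob", "!abusecount alice"))  # alice has sent 1 abusive message(s).
```

Injecting the classifier keeps the counting and reply logic independent of the model, so the same handler could be reused with a different detector.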

How we built it

We created a Discord bot and set up a server that communicates with it through a Python wrapper for the Discord API. The server uses scikit-learn, TextBlob, and nltk for natural language processing, and its model was trained on a data set of hundreds of MySpace posts and messages that we sanitized and labeled as abusive or not. It then uses its learned model to return true if a new chat message is classified as abusive, and false otherwise.
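A true/false classifier of this kind might look like the sketch below. The writeup doesn't say which scikit-learn estimator was used, so TF-IDF features feeding multinomial Naive Bayes are an assumed choice, and the eight training messages are a made-up stand-in for the labeled MySpace data. TextBlob and nltk preprocessing are omitted.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up stand-in for the labeled data: (message, 1 = abusive / 0 = not abusive).
TRAINING_DATA = [
    ("you are a stupid worthless loser", 1),
    ("shut up idiot nobody wants you here", 1),
    ("i hate you stupid idiot", 1),
    ("go away loser", 1),
    ("thanks for the help earlier", 0),
    ("great game last night", 0),
    ("see everyone at practice tomorrow", 0),
    ("that song is awesome", 0),
]

texts, labels = zip(*TRAINING_DATA)
# TF-IDF features feeding a multinomial Naive Bayes classifier (assumed model choice).
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)


def is_abusive(message: str) -> bool:
    """Return True if the model labels the message abusive, False otherwise."""
    return bool(model.predict([message])[0] == 1)


if __name__ == "__main__":
    print(is_abusive("stupid idiot"))               # True
    print(is_abusive("thanks for the great game"))  # False
```

With the real data, the training pairs would instead be loaded from the labeled CSV files, and a held-out split would be needed to judge how well the model generalizes beyond the messages it was trained on.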

Challenges we ran into

Finding good data was extremely difficult. The data we found was labeled per thread rather than per message; it was also full of duplicates and stored as XML. We wrote scripts that parsed the XML files for messages and dumped them into CSV files. We then had to label each of the roughly 800 messages ourselves, which is still a very small data set for natural language processing.
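The XML-to-CSV step, including duplicate removal, could be sketched as below. The writeup doesn't describe the XML layout, so the `<post>`/`<body>` tag names are hypothetical placeholders and `xml_to_csv` is an illustrative name. Each CSV row gets an empty `abusive` column to be filled in by hand afterwards.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_csv(xml_text: str, out) -> int:
    """Write each unique message from an XML dump to CSV; return the number of rows written.

    Assumes a hypothetical layout where every message is a <post> element
    with a <body> child; adjust the tag names to match the real files.
    """
    root = ET.fromstring(xml_text)
    writer = csv.writer(out)
    writer.writerow(["message", "abusive"])  # label column is filled in manually later
    seen = set()
    written = 0
    for post in root.iter("post"):
        body = post.findtext("body", default="")
        # Collapse whitespace so copies that differ only in spacing count as duplicates.
        text = " ".join(body.split())
        if not text or text in seen:
            continue
        seen.add(text)
        writer.writerow([text, ""])
        written += 1
    return written


if __name__ == "__main__":
    sample = (
        "<thread>"
        "<post><body>see you  later</body></post>"
        "<post><body>see you later</body></post>"
        "</thread>"
    )
    # For a real file, open it with newline="" as the csv module recommends.
    buf = io.StringIO()
    print(xml_to_csv(sample, buf))  # 1, since the second post is a duplicate
```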

Accomplishments that we're proud of

We're very proud that the natural language processing successfully identified and responded to abusive language in some cases, despite our limited data set. We're also proud of making the Discord bot easy to repurpose for other applications.

What we learned

We learned how to build and communicate with chat bots and how to work with asynchronous code in Python. We also learned a lot about natural language preprocessing and about parsing and processing data in file formats like XML and CSV.

What's next for SJWBot

We want to collect better and more relevant data, improve upon our machine learning algorithm, and give more functionality to the bot like notifying chat admins of abuse.
