SJWBot | Devpost

Inspiration

Twitter has long struggled to identify and act on abuse and harassment at the scale at which it operates. It is completely impractical to rely on human moderators alone to review and moderate every message. We wanted to see whether machine learning could contribute to society in a positive way by automating the flagging of abuse and easing the burden on human moderators.

What it does

It uses natural language processing to detect and respond to malicious or abusive messages sent on Discord. The bot responds in the chat if it detects abuse. It also keeps track of the number of abusive messages per user, which you can then ask it about.
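The per-user tally described above could be kept with something like the following. This is a minimal sketch, not our actual bot code: the `AbuseTracker` class, its method names, and the reply text are hypothetical, and the classifier is passed in as any function that returns true for an abusive message.

```python
from collections import Counter


class AbuseTracker:
    """Hypothetical helper: counts flagged messages per user."""

    def __init__(self, is_abusive):
        # is_abusive: any callable taking message text and returning True/False
        self.is_abusive = is_abusive
        self.counts = Counter()

    def handle_message(self, user, text):
        """Return a reply for the chat if the message is flagged, else None."""
        if self.is_abusive(text):
            self.counts[user] += 1
            return f"{user}, that message was flagged as abusive."
        return None

    def count_for(self, user):
        """Number of flagged messages seen so far for this user."""
        return self.counts[user]


# Stand-in classifier for illustration only; the real bot would ask the NLP server.
tracker = AbuseTracker(lambda text: "idiot" in text.lower())
tracker.handle_message("alice", "You idiot")
tracker.handle_message("alice", "Good game everyone")
print(tracker.count_for("alice"))
```

Keeping the classifier as a plain callable means the counting logic does not need to know whether the answer comes from a local model or a remote server.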

How we built it

We created a Discord bot and set up a server to communicate with it using a Python wrapper for the Discord API. The server uses scikit-learn, TextBlob, and nltk for natural language processing and was trained on a data set of hundreds of MySpace posts/messages that we sanitized and labeled as abusive or not. It then uses its learned model to return true if a new chat message is determined to be abusive, and false if not.
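A text classifier of this general shape can be sketched with scikit-learn as follows. This is an illustration under assumptions, not our exact model: the bag-of-words plus naive Bayes pipeline, the `train_model` and `is_abusive` names, and the six toy training messages are all made up for the example.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def train_model(messages, labels):
    """Fit a bag-of-words naive Bayes classifier (1 = abusive, 0 = not)."""
    model = make_pipeline(CountVectorizer(lowercase=True), MultinomialNB())
    model.fit(messages, labels)
    return model


def is_abusive(model, text):
    """Return True if the model labels the message as abusive, else False."""
    return bool(model.predict([text])[0])


# Toy data for illustration only; the real training set was hand-labeled.
train_messages = [
    "you are a stupid idiot",
    "shut up you loser",
    "nobody likes you idiot",
    "have a great day",
    "thanks for the help",
    "see you at the game tonight",
]
train_labels = [1, 1, 1, 0, 0, 0]

model = train_model(train_messages, train_labels)
print(is_abusive(model, "you idiot"))
print(is_abusive(model, "thanks for the help"))
```

With so few examples the model only reacts to words it has already seen, which is the same limitation a data set of a few hundred messages runs into.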

Challenges we ran into

Finding good data was extremely difficult. The data we found was catalogued per thread rather than per message. It was also full of duplicates and written in XML. We created scripts that parsed the XML files for messages and dumped them into CSV files. We then had to label every one of the 800 or so messages ourselves, which is still a very small dataset for natural language processing.
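The XML-to-CSV step might look roughly like this. It is a sketch only: the `<thread>`, `<post>`, and `<body>` element names are assumed for illustration and are not the actual schema of the data we used, and the function names are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def extract_messages(xml_text, message_tag="body"):
    """Pull message text out of a thread, dropping blanks and exact duplicates."""
    root = ET.fromstring(xml_text)
    seen = set()
    messages = []
    for node in root.iter(message_tag):
        # Collapse runs of whitespace so near-identical copies compare equal
        text = " ".join((node.text or "").split())
        if text and text not in seen:
            seen.add(text)
            messages.append(text)
    return messages


def write_csv(messages, out):
    """Write one message per row, with an empty column for the hand label."""
    writer = csv.writer(out)
    writer.writerow(["message", "abusive"])
    for message in messages:
        writer.writerow([message, ""])


# Assumed structure for illustration; the real files were organized per thread.
sample = (
    "<thread>"
    "<post><body>hello  there</body></post>"
    "<post><body>hello there</body></post>"
    "<post><body>bye</body></post>"
    "</thread>"
)
buffer = io.StringIO()
write_csv(extract_messages(sample), buffer)
print(buffer.getvalue())
```

Leaving the `abusive` column blank gives each person a file they can fill in by hand during labeling.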

Accomplishments that we're proud of

We're very proud that, given our limited data set, the natural language processing successfully identified and responded to abusive language in some cases. We're also proud of making the Discord bot easy to repurpose for other applications.

What we learned

We learned how to make and communicate with chat bots and work with asynchronous code in Python. Additionally, we learned a lot about natural language preprocessing, and how to parse and process data in file formats like XML and CSV.

What's next for SJWBot

We want to collect better and more relevant data, improve our machine learning algorithm, and add functionality to the bot, such as notifying chat admins of abuse.

Built With

discord-api
nltk
python
scikit-learn
textblob
websockets

Submitted to

BrickHack 4

Created by

I managed and oversaw both sides of the project, providing assistance with both the Discord bot programming and the natural language side of things. I designed and wrote an API class stub to bridge the gap between the two. And like everybody, I labelled tons of test data.

Brian Ferch
I worked on the natural language processing backend of the bot. I sanitized and processed the data sets using machine learning and natural language processing techniques. I also labelled lots of test data.

Sam Koved
I was the primary developer of the Discord chat bot functionality. I used the Discord API to monitor messages as they were sent to a channel and to notify the users of abusive language. I also used the Discord API to scan through previously sent messages in the channel and recognize each abusive message as such. Labeling test data was also a role which each of us took on individually.

Chris Glasheen

Updates

Brian Ferch started this project — Jan 28, 2018 11:25 AM EST
