Datasets
We used two publicly available datasets:
- Formspring Labeled for Cyberbullying
- MySpace Group Data Labeled for Cyberbullying
link: http://chatcoder.com/DataDownload
What it does
A user signs up and then sends an SMS through the Twilio API. When the server receives the text, it is classified and forwarded to the intended recipient.
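The receive, classify, and forward flow can be sketched roughly as follows. This is only an illustration: it makes no Twilio calls, and `handle_incoming_sms`, `classify`, `send_sms`, and `flagged_log` are hypothetical names standing in for whatever the server actually uses.

```python
from typing import Callable


def handle_incoming_sms(
    sender: str,
    recipient: str,
    body: str,
    classify: Callable[[str], bool],
    send_sms: Callable[[str, str], None],
    flagged_log: list,
) -> bool:
    """Classify an incoming text, record the result, then forward it."""
    is_harassment = classify(body)
    # Record every message so a visualisation could colour the sender later.
    flagged_log.append(
        {"from": sender, "to": recipient, "harassment": is_harassment}
    )
    # The message is forwarded to the intended recipient after classification.
    send_sms(recipient, body)
    return is_harassment


# Usage with stand-in functions (assumptions; nothing is sent over a network).
outbox = []
log = []


def toy_classifier(text: str) -> bool:
    return "idiot" in text.lower()


def fake_send(to: str, text: str) -> None:
    outbox.append((to, text))


result = handle_incoming_sms(
    "+15550001", "+15550002", "You are an idiot", toy_classifier, fake_send, log
)
```

Passing the classifier and the sender in as functions keeps the flow testable without an SMS provider; in a real deployment they would be replaced by the trained model and a call to the SMS service.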
The hack includes a D3 graph that visualises user messages and updates each user's colour (red or green) to show whether that person has committed harassment.
How it Works
We use an SVM and an LLDA (Labelled Latent Dirichlet Allocation) model.
For the SVM we are using a Bag-of-Words model.
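A Bag-of-Words representation turns each message into a vector of word counts, which is what the SVM would be trained on. The sketch below shows only the featurisation step using the Python standard library; the tokenisation rule and the function names are assumptions, not the project's actual code.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(messages):
    """Map each distinct token in the training messages to a column index."""
    tokens = sorted({tok for msg in messages for tok in tokenize(msg)})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(message, vocabulary):
    """Return a count vector; tokens outside the vocabulary are ignored."""
    counts = Counter(tokenize(message))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


# Toy messages for illustration only (not taken from the datasets).
train = ["you are so stupid", "see you at lunch"]
vocab = build_vocabulary(train)
features = [bag_of_words(m, vocab) for m in train]
```

Word order is discarded in this representation: only how often each vocabulary word occurs in a message is kept.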
For the LLDA, we use Google's list of banned words as labels. When a new message arrives, we infer its topic distribution and classify the message as harassment based on the sum of the topic probabilities.
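One way to read this decision rule is to add up the probability mass that the inferred distribution places on banned-word topics and compare it with a cut-off. The sketch below assumes the distribution is already available as a dictionary; the `background` topic, the label names, and the 0.5 threshold are all assumptions made for illustration.

```python
def harassment_score(topic_distribution, banned_labels):
    """Sum the probability mass assigned to banned-word topics."""
    return sum(
        prob
        for label, prob in topic_distribution.items()
        if label in banned_labels
    )


def is_harassment(topic_distribution, banned_labels, threshold=0.5):
    """Flag a message when the banned-topic mass reaches the threshold."""
    return harassment_score(topic_distribution, banned_labels) >= threshold


# Hypothetical labels and a hypothetical inferred distribution for one message.
banned = {"insult", "slur"}
inferred = {"insult": 0.45, "slur": 0.25, "background": 0.30}

score = harassment_score(inferred, banned)
flagged = is_harassment(inferred, banned)
```

The threshold would need to be tuned on held-out data, since it trades precision against recall.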
Challenges we ran into
Improving the accuracy of the model. We found that ensemble learning gave the best results after repeated testing with stratified 10-fold cross-validation.
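Stratified k-fold cross-validation splits the data so that every fold keeps roughly the same class balance as the full dataset, which matters when harassment messages are the minority class. A minimal standard-library sketch of the splitting step is shown below; it does not shuffle, the function name is hypothetical, and the class counts are made up.

```python
from collections import defaultdict


def stratified_folds(labels, n_folds=10):
    """Assign sample indices to folds so class ratios stay similar per fold."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        # Deal each class's samples round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


# Toy labels: 20 harassment (1) and 30 non-harassment (0) messages.
labels = [1] * 20 + [0] * 30
folds = stratified_folds(labels, n_folds=10)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
```

Each fold is then held out once for evaluation while the model is trained on the remaining nine.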
Accomplishments that we're proud of
F1 Score: 0.663871351995
Accuracy: 0.729411764706
Precision: 0.655128205128
Recall: 0.677898550725
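For reference, the four metrics above can be computed from binary predictions as in the sketch below. The labels here are toy values, not the project's data, and the project does not state whether its scores were averaged across folds.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = harassment)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy ground truth and predictions for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```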
