MAXX | Devpost

Introduction

MAXX is an automated web scraper that analyzes digital expression (in this case, Twitter posts) for linguistic patterns associated with extreme states of mind that may present a risk to oneself (mental health) or to others (violence, terrorism).

Inspiration

With 126 million daily users, Twitter has become not only one of the world's largest social media platforms, but also a breeding ground for extremism and violence.

Last month, we were horrified to learn that Connor Betts, the gunman responsible for the Dayton shooting, had "liked" several tweets about the antifa movement and the El Paso shooting only hours before the attack. His earlier tweets also advocated beheading oil oligarchs to counter climate change and encouraged people to "buy a gun and learn to use it responsibly" (Washington Post).

By using natural language processing to analyze Twitter data, we seek to respond preemptively to emerging crises, provide user insight, and, ultimately, save lives.

What MAXX does

compiles tweets from a given user's account
provides a risk assessment of the user (weighted sentiment) using Google Cloud’s Natural Language API
presents data analytics with Bokeh on a user dashboard

How we built it

We built MAXX primarily with Python. We used Twint to scrape tweets from each account and analyzed them with Google Cloud’s Natural Language API. For each text, the API's sentiment analysis returns a 'score' from -1.0 (negative) to 1.0 (positive) that reflects its prevailing emotion, with values near 0 indicating neutral text. It also returns a 'magnitude', a non-negative value measuring the overall strength of emotional content, whether positive or negative. We multiplied the score by the magnitude to assign each tweet a weighted positivity/negativity value.
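A minimal sketch of this scrape-and-score step follows. It assumes Twint's `Config` / `run.Search` interface with `Store_object` enabled and the `google-cloud-language` client library (`language_v1`); the helper names `fetch_tweets` and `score_tweets` and the `ScoredTweet` class are hypothetical illustrations, not MAXX's actual code. Both third-party imports are deferred into the functions that use them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ScoredTweet:
    text: str
    date: str         # Twint's datestamp string, e.g. "2019-09-07"
    score: float      # -1.0 (negative) to 1.0 (positive)
    magnitude: float  # non-negative overall strength of emotion

    @property
    def weighted(self) -> float:
        # Weighted sentiment as described above: score multiplied by magnitude
        return self.score * self.magnitude


def fetch_tweets(username: str, limit: int = 100) -> List[Tuple[str, str]]:
    """Scrape (text, date) pairs for one account with Twint (hypothetical helper)."""
    import twint  # deferred: only needed when actually scraping

    twint.output.tweets_list.clear()  # module-level list persists between runs
    config = twint.Config()
    config.Username = username
    config.Limit = limit
    config.Store_object = True  # collect results in twint.output.tweets_list
    config.Hide_output = True   # don't echo each tweet to stdout
    twint.run.Search(config)
    return [(t.tweet, t.datestamp) for t in twint.output.tweets_list]


def score_tweets(tweets: List[Tuple[str, str]]) -> List[ScoredTweet]:
    """Run Google Cloud sentiment analysis on each tweet (hypothetical helper)."""
    from google.cloud import language_v1  # deferred: requires GCP credentials

    client = language_v1.LanguageServiceClient()
    scored = []
    for text, date in tweets:
        document = language_v1.Document(
            content=text, type_=language_v1.Document.Type.PLAIN_TEXT
        )
        response = client.analyze_sentiment(request={"document": document})
        sentiment = response.document_sentiment
        scored.append(ScoredTweet(text, date, sentiment.score, sentiment.magnitude))
    return scored
```

Usage would be along the lines of `score_tweets(fetch_tweets("some_account"))`, where the account name is a placeholder. With this weighting, a tweet scored -0.8 with magnitude 2.5 gets a weighted value of -2.0, so strongly emotional negative tweets stand out more than mildly negative ones.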

To showcase our data, we created charts and graphs with Bokeh, a Python data visualization library. For each Twitter profile, we created graphs for: weighted sentiment over time; total sentiment (sum of all sentiment scores); negative vs. positive outliers; average sentiment and average magnitude.
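The per-profile statistics listed above could be computed roughly as follows. This is a sketch under stated assumptions: `summarize` and `plot_weighted_over_time` are hypothetical helpers, "total sentiment" follows the text's definition as the sum of raw scores, and an outlier is assumed to be a weighted value more than `z` population standard deviations from the mean, a cutoff the project does not specify. Bokeh is imported only inside the plotting function.

```python
from datetime import datetime
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def summarize(scores: Sequence[float], magnitudes: Sequence[float],
              z: float = 2.0) -> Dict[str, object]:
    """Aggregate one profile's sentiment; inputs must be non-empty and aligned."""
    weighted = [s * m for s, m in zip(scores, magnitudes)]
    center, spread = mean(weighted), pstdev(weighted)
    upper, lower = center + z * spread, center - z * spread
    return {
        "total_sentiment": sum(scores),
        "average_sentiment": mean(scores),
        "average_magnitude": mean(magnitudes),
        # indices of tweets whose weighted sentiment is unusually high or low
        "positive_outliers": [i for i, w in enumerate(weighted) if w > upper],
        "negative_outliers": [i for i, w in enumerate(weighted) if w < lower],
    }


def plot_weighted_over_time(dates: List[datetime], weighted: List[float],
                            path: str = "weighted_sentiment.html") -> None:
    """Write a Bokeh chart of weighted sentiment over time to an HTML file."""
    from bokeh.plotting import figure, output_file, save

    output_file(path)
    p = figure(x_axis_type="datetime", title="Weighted sentiment over time",
               x_axis_label="Date", y_axis_label="score * magnitude")
    p.line(dates, weighted, line_width=2)
    p.scatter(dates, weighted, size=6)
    save(p)
```

For example, with nine tweets at a weighted value of 0.1 and one at -3.0, the mean is -0.21 and the population standard deviation is 0.93, so only the -3.0 tweet falls below the lower cutoff of -2.07 and is reported as a negative outlier.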

Challenges we ran into

interpreting sentiment analysis scores: identifying outliers in our data, handling scores of 0, accounting for variations in text length, etc.
UX/UI: we initially tried Flask, but had trouble integrating our functions into a web form
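Google's documentation notes that a score near 0 can mean either genuinely low-emotion text or a mix of positive and negative content that cancels out, with magnitude helping to tell the two apart, and that magnitude is not normalized for length, so longer texts tend to have larger magnitudes. One possible way to handle both issues is sketched below; the thresholds `NEUTRAL_BAND` and `MIXED_MAGNITUDE` and the per-word normalization are illustrative assumptions, not values or methods MAXX is known to have used.

```python
# Hypothetical cutoffs for illustration; they would need tuning on labeled data.
NEUTRAL_BAND = 0.1      # |score| below this is treated as "near zero"
MIXED_MAGNITUDE = 1.0   # a near-zero score with magnitude above this => mixed


def label_tweet(score: float, magnitude: float) -> str:
    """Separate truly neutral tweets from mixed ones when the score is ~0."""
    if abs(score) < NEUTRAL_BAND:
        return "mixed" if magnitude > MIXED_MAGNITUDE else "neutral"
    return "positive" if score > 0 else "negative"


def magnitude_per_word(magnitude: float, text: str) -> float:
    """Divide magnitude by word count so longer tweets are not favored."""
    words = len(text.split())
    return magnitude / words if words else 0.0
```

Under these assumed cutoffs, `label_tweet(0.0, 2.4)` returns "mixed" while `label_tweet(0.0, 0.2)` returns "neutral", even though both tweets have the same score.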

Accomplishments that we're proud of

As a team, we are most proud of experimenting with new tools and applying data science and machine learning to tangible, real-world issues. We are all relatively new to hackathons, so much of the stack was unfamiliar: Zain had never used Bokeh before, but he learned it during the hackathon and ultimately created all of our graphs.

What we learned

integrating various tools: the open-source libraries Twint, Bokeh, and BeautifulSoup (though we didn't end up using BeautifulSoup in our final version), plus Google Cloud's Natural Language API
using version control: GitHub and the Fork Git client
collaborative brainstorming: combining different ideas to reach consensus

What's next for MAXX

In the future, we would love to improve usability (perhaps by developing MAXX into an app), refine our natural language processing to monitor and flag specific keywords, analyze users' followers and interests, and create a sentiment web showing how ideas spread. We would also like to experiment with other language processing APIs, including Microsoft Azure's text analytics services and Amazon Comprehend.

Authors

Michaela Lozada [mlozada626@gmail.com]
Zain Ali [zain08816@gmail.com]
Kirtan Patel [kirt9911@gmail.com]
Linda Tong [kaihua.linda.tong@gmail.com]

Acknowledgements

We would like to thank the organizers of PennApps for an incredible learning experience, and the mentors (especially Paul, Saniyah, and Ryan of Google Cloud Platform) for their guidance. We could not have done it without you!

Submitted to

PennApps XX

Created by

Linda Tong
I worked on the back end, finding and building a scraper tool with BeautifulSoup to compile and clean the Twitter data. I also created a prototype of the front end with Flask.

Zain Ali
For MAXX, I mostly worked on implementing our ideas in code. I used the GCP Natural Language API and Twint to generate the sentiments and weights for the tweets, and I built the data visualizations with Bokeh. I also used the results of GCP's sentiment analysis to derive additional data points that help analyze each social media profile.

Michaela Lozada
I came up with the idea for this project, as well as its name, MAXX (Monitored Analysis for eXtreme eXpressions). I helped design the algorithm and identify the API tools we could use, which involved researching them and holding in-person discussions with their company representatives. I also helped program and debug the project and build the data analysis and visualizations, designed and created the visual presentation, and wrote much of the project's descriptive text.

Kirtan Patel
I worked on integrating Twint with our application and extracting the relevant data to send to the GCP Natural Language API. I initially suggested using Flask and helped build the prototype.