Track: Social Impact
Inspiration
The increasing prevalence of misinformation on the internet has highlighted the need for new tools to combat its spread. Inspired in part by GLTR, we decided to leverage natural language processing to develop a solution to this problem.
What it does
Our project is a (nearly) pure Python webapp that leverages machine learning libraries to parse text inputs for signs of algorithmic generation in real time. With a clean, minimalistic UI and a simple API, isitabot has one goal: get out of your way and let you fact-check, fast.
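As a rough illustration of the programmatic access the API is meant to provide, here is a hypothetical request against a locally running instance. The endpoint URL, port, and JSON field names are assumptions for the sake of the example, not documented parts of isitabot.

```python
# Hypothetical example of calling the text-analysis endpoint.
# The URL and field names are assumed, not isitabot's documented API.
import requests

resp = requests.post(
    "http://localhost:5000/api/analyze",  # assumed local endpoint
    json={"text": "Sample text to check for algorithmic generation."},
)
resp.raise_for_status()
print(resp.json())  # e.g. a likelihood score that the text was bot-generated
```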
How we built it
isitabot utilizes the GPT family of transformers to determine the likelihood that a given snippet of text was generated by a bot. The input is tokenized and checked against outputs from GPT-2 and other models, creating a probability matrix that reflects the likelihood of each token having been selected by the models. This probability matrix is then processed by our internal analytics (currently mostly statistical modeling, with a neural net set to take over in the near future) to determine the likelihood that the text as a whole was algorithmically generated.
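The per-token scoring step works much like GLTR: each token is scored by the probability the language model assigned to it given the preceding context. Below is a minimal sketch of that idea using Hugging Face's GPT-2; the helper name and the mean-probability summary at the end are illustrative assumptions, not our production analytics pipeline.

```python
# Minimal sketch of GLTR-style per-token scoring with GPT-2.
# The function name and the final aggregation are illustrative only.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def token_probabilities(text: str) -> list[float]:
    """Return, for each token after the first, the probability GPT-2
    assigned to it given the preceding context."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits            # (1, seq_len, vocab)
    probs = torch.softmax(logits[0, :-1], dim=-1)  # predictions for the next token
    next_ids = ids[0, 1:]                          # the tokens that actually followed
    return probs[torch.arange(next_ids.size(0)), next_ids].tolist()

if __name__ == "__main__":
    scores = token_probabilities("The quick brown fox jumps over the lazy dog.")
    # A very simple statistical summary: machine-generated text tends to have
    # consistently high per-token probabilities under the model.
    print(f"mean token probability: {sum(scores) / len(scores):.3f}")
```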
The app was originally hosted on Heroku, but the heavy resource demands of our natural language processing models forced a transition to a locally hosted solution for the duration of this hackathon, as we were unable to set up a reverse proxy on school wifi or pay for the necessary cloud resources on such short notice.
Challenges we ran into
Due to our limited time and computing resources, we were unable to train our model as much as we would have liked; as a result, we ended up with a mixture of machine learning models and statistical analysis rather than a pure ML approach. Furthermore, our dependence on the GPT family means that our accuracy tends to suffer when confronted with text generated by models with differing architectures, such as BERT.
Accomplishments that we're proud of
Our team had little to no experience with machine learning prior to this project, so we are very proud of the progress we made on natural language processing in such a short period of time. Additionally, we are very happy to offer programmatic support for text analysis via our API.
What we learned
We learned a lot about how data-hungry machine learning models are, as well as the relative scarcity of well-labeled natural language datasets. Additionally, we all improved our project management and collaboration skills over the course of this project, setting us up for success on similar endeavors in the future.
What's next for isitabot
We would like to incorporate external APIs from social media sites such as Reddit and Twitter, allowing users to input a username and target site and determine the likelihood that the owner of that username is a bot. Depending on feasibility, we would also like to extend our transformer base to include models outside the GPT family.