Track: Social Impact
Inspiration
The increasing prevalence of misinformation on the internet has highlighted the need for new tools to combat its spread. Inspired in part by GLTR, we decided to leverage natural language processing to develop a solution to this problem.
What it does
Our project is a (nearly) pure Python webapp that leverages machine learning libraries to parse text inputs for signs of algorithmic generation in real time. With a clean, minimalistic UI and a simple API, isitabot has one goal: get out of your way and let you fact-check, fast.
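As a rough illustration of the programmatic access the API is meant to provide, here is a hypothetical request against a locally running instance. The endpoint URL, port, and JSON field names are assumptions for the sake of the example, not documented parts of isitabot.

```python
# Hypothetical example of calling the text-analysis endpoint.
# The URL and field names are assumed, not isitabot's documented API.
import requests

resp = requests.post(
    "http://localhost:5000/api/analyze",  # assumed local endpoint
    json={"text": "Sample text to check for algorithmic generation."},
)
resp.raise_for_status()
print(resp.json())  # e.g. a likelihood score that the text was bot-generated
```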
How we built it
isitabot utilizes the GPT family of transformers to determine the likelihood that a given snippet of text was generated by a bot. The input is tokenized and checked against outputs from GPT-2 and other models, creating a probability matrix that reflects the likelihood of each token having been selected by the models. This probability matrix is then processed by our internal analytics (currently mostly statistical modeling, with a neural net set to take over in the near future) to determine the likelihood that the text as a whole was algorithmically generated.
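The per-token scoring step works much like GLTR: each token is scored by the probability the language model assigned to it given the preceding context. Below is a minimal sketch of that idea using Hugging Face's GPT-2; the helper name and the mean-probability summary at the end are illustrative assumptions, not our production analytics pipeline.

```python
# Minimal sketch of GLTR-style per-token scoring with GPT-2.
# The function name and the final aggregation are illustrative only.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def token_probabilities(text: str) -> list[float]:
    """Return, for each token after the first, the probability GPT-2
    assigned to it given the preceding context."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits            # (1, seq_len, vocab)
    probs = torch.softmax(logits[0, :-1], dim=-1)  # predictions for the next token
    next_ids = ids[0, 1:]                          # the tokens that actually followed
    return probs[torch.arange(next_ids.size(0)), next_ids].tolist()

if __name__ == "__main__":
    scores = token_probabilities("The quick brown fox jumps over the lazy dog.")
    # A very simple statistical summary: machine-generated text tends to have
    # consistently high per-token probabilities under the model.
    print(f"mean token probability: {sum(scores) / len(scores):.3f}")
```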
The app was originally hosted on Heroku, but the heavy resource demands of our natural language processing models forced a transition to a locally hosted solution for the duration of this hackathon, as we were unable to set up a reverse proxy on school wifi or pay for the necessary cloud resources on such short notice.
Challenges we ran into
Due to our limited time and computing resources, we were unable to train our model as much as we would have liked; as a result, we ended up with a mixture of machine learning models and statistical analysis rather than a pure ML approach. Furthermore, our dependence on the GPT family means that our accuracy tends to suffer when confronted with text generated by models with differing architectures, such as BERT.
Accomplishments that we're proud of
Our team had little to no experience with machine learning prior to this project, so we are very proud of the progress we made on natural language processing in such a short period of time. Additionally, we are very happy to offer programmatic support for text analysis via our API.
What we learned
We learned a lot about how data-hungry machine learning models are, as well as the relative scarcity of well-labeled natural language datasets. Additionally, we all improved our project management and collaboration skills over the course of this project, setting us up for success on similar endeavors in the future.
What's next for isitabot
We would like to incorporate external APIs from social media sites such as Reddit and Twitter, allowing users to input a username and target site and determine the likelihood that the owner of that username is a bot. Depending on feasibility, we would also like to extend our transformer base to include models outside the GPT family.