ACE Detective | Devpost

Inspiration

Adverse childhood experiences (ACEs) are traumatic events that occur before the age of 18. These experiences include domestic violence, mental illness, and physical neglect, among others. According to the Centers for Disease Control and Prevention, "Adverse childhood experiences can have a tremendous impact on future violence victimization and perpetration, and lifelong health and opportunity". The COVID-19 pandemic has accelerated the negative impacts of ACEs, necessitating novel methods of identifying and intervening against them. While ACEs are broadly characterized, distinguishing these events from normal childhood experiences has not been thoroughly explored. This distinction is necessary if intervention protocols are to be developed and implemented.

What it does

Our solution is a prediction algorithm that takes input text depicting a childhood experience and outputs whether the text describes an ACE, whether in a positive or negative light. The prediction model is trained on Reddit post titles from subreddits focused on childhood experiences, and it achieves a 90% AUROC on both an internal Reddit test set and an external Twitter dataset.

We combine the outputs of this prediction algorithm with a network constructed from Twitter users. Each node in the network is a Twitter user with four attributes: User ID, Username, Tweet Sentiment, and ACE Probability. Based on the attributes of each node and its adjacent neighbors, we can identify hotspots of ACEs. The graph network can be visualized on a local web server.

Using the prediction model in conjunction with social network analysis, we can determine whether a user who experienced an adverse event has a community providing proper support, and relay important information to professionals about an individual's experiences and community. We can identify cliques of supportive individuals who provide a positive influence on those who have experienced ACEs. Similarly, we can locate parts of the network in need of intervention, where individuals who speak negatively about their experiences are not getting proper support. Using measures such as clustering coefficients and closeness centrality, we determine the most influential individuals within a subnetwork. By analyzing the tweet content of these influential individuals, we can determine whether a particular subnetwork is positively overcoming ACEs or needs help.

Individuals whose tweets have a high ACE probability and negative sentiment are flagged for reply. Each flagged individual is then sent a response to their tweet with resources to provide help. Other filters can be added or tuned to decrease the false positive rate.
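
The flagging rule described above (high ACE probability combined with negative sentiment) could look roughly like the sketch below. This is only an illustration, not our production code: the `ScoredTweet` record, the `flag_for_reply` function, the sentiment scale of -1 to +1, and both threshold values are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ScoredTweet:
    """Hypothetical record holding the two scores used for flagging."""
    user_id: str
    text: str
    ace_probability: float  # prediction model output, assumed to lie in [0, 1]
    sentiment: float        # assumed scale: -1 (negative) to +1 (positive)


def flag_for_reply(tweets, ace_threshold=0.8, sentiment_threshold=-0.3):
    """Keep tweets with a high ACE probability AND a negative sentiment.

    Both thresholds are illustrative; raising ace_threshold or lowering
    sentiment_threshold is one way to trade recall for fewer false positives.
    """
    return [
        t for t in tweets
        if t.ace_probability >= ace_threshold
        and t.sentiment <= sentiment_threshold
    ]


# Placeholder data, not real tweets.
tweets = [
    ScoredTweet("u1", "example text A", ace_probability=0.93, sentiment=-0.7),
    ScoredTweet("u2", "example text B", ace_probability=0.91, sentiment=0.6),
    ScoredTweet("u3", "example text C", ace_probability=0.12, sentiment=-0.8),
]

flagged = flag_for_reply(tweets)
print([t.user_id for t in flagged])  # only u1 passes both conditions
```

Additional filters (for example a minimum tweet length or a keyword blocklist) would be extra conditions in the same list comprehension.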

How we built it

The Pushshift.io Reddit API allows Reddit data and metadata to be scraped in real time. Using this API, we scraped two subreddits: r/raisedbynarcissists and r/internetparents. To acquire Twitter information, we used two tools for different purposes. We used the Twitter Intelligence Tool (TWINT) to identify tweets from the general public that fit our search criteria, and also to get all the tweets from a user of interest. We used Tweepy, a Python wrapper for Twitter’s API, to get information about the followers of a user of interest.

The algorithm we built has two parts. The first part is a natural-language prediction model trained on Reddit post titles to predict the presence of an ACE based on the language used by the user. The model is built using TensorFlow and NLTK (for language cleaning). The second part is the construction of a graph network to visualize Twitter users and the presence of ACEs among them. Within this graph network, each node is a Twitter user designated by a unique Twitter ID. Each node has four attributes: the prediction model's results for the user's tweets, a pre-trained sentiment analysis of the tweets, the username of the individual, and the hashed ID. With these attributes, the localization and spread of ACEs across different communities can be visualized and proper intervention steps can be taken.
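
To make the graph structure concrete, here is a minimal, standard-library-only sketch of nodes carrying the attributes above, together with the two measures we mention (local clustering coefficient and closeness centrality). The node IDs, usernames, scores, and edges are made-up placeholders, the helper names are hypothetical, and treating follower links as undirected edges is a simplifying assumption. In practice, NetworkX (listed under Built With) provides `networkx.clustering` and `networkx.closeness_centrality` for these measures.

```python
from collections import deque

# Hypothetical nodes keyed by (hashed) user ID, each with the remaining attributes.
nodes = {
    "1001": {"username": "user_a", "sentiment": -0.6, "ace_probability": 0.92},
    "1002": {"username": "user_b", "sentiment": 0.4, "ace_probability": 0.15},
    "1003": {"username": "user_c", "sentiment": 0.7, "ace_probability": 0.80},
    "1004": {"username": "user_d", "sentiment": -0.2, "ace_probability": 0.55},
}

# Assumed undirected follower links: a triangle plus one pendant node.
edges = [("1001", "1002"), ("1001", "1003"), ("1002", "1003"), ("1003", "1004")]


def build_adjacency(node_ids, edge_list):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {n: set() for n in node_ids}
    for u, v in edge_list:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def clustering_coefficient(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def closeness_centrality(adj, node):
    """Reachable-node count divided by the sum of BFS distances to them."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        for nb in adj[cur]:
            if nb not in dist:
                dist[nb] = dist[cur] + 1
                queue.append(nb)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


adj = build_adjacency(nodes, edges)
most_central = max(nodes, key=lambda n: closeness_centrality(adj, n))
print(most_central, nodes[most_central]["username"])
```

In this toy graph the node connected to everyone else has the highest closeness centrality, which is the kind of user whose tweet content we would inspect to judge how a subnetwork is doing.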

Challenges we ran into

When we initially chose TWINT as the tool to scrape tweets, we didn’t realize how unstable its current build was. We had many problems scraping because parts of the tool no longer worked, so we had to switch to Tweepy and apply for a developer API key. Twitter and Reddit data also contain a significant amount of noise, and filtering it to find relevant posts and tweets was a challenge that we handled to the best of our ability. None of our team members are familiar with front-end development, so implementing Dash to display these ACE cluster networks in real time was difficult. Additionally, although we had heard of tools such as Heroku for deploying apps, this was our first time using them. The learning process was insightful, as our background is more in data mining and machine learning. It was also tricky to build the graph networks and to determine which metrics would best evaluate the performance of our model. We recognized the power of network science, but we didn’t have a background in its complex methodologies, so we had to teach ourselves many of these concepts.

Accomplishments that we're proud of

We are excited by the prospect of creating a tool that can help reduce the impact of adverse childhood experiences and provide victims with appropriate connections and resources. ACE support groups will be better able to reach people in their community by deploying this tool, thus improving outcomes for victims. In terms of implementation, it was fun to learn how to scrape tweets from Twitter and respond with a bot, and learning how to create an interactive visual tool for the user was very rewarding. Finally, getting to use Heroku to publish a website was a satisfying result. These skills will be useful in future projects as well.

What we learned

We learned how to use both the Reddit and Twitter APIs to access and scrape relevant posts and tweets. Moreover, while we had some domain knowledge of NLP tasks, performing transfer learning from the Reddit-trained model to tweets was new to us. Additionally, we learned how to use Dash and Heroku to deploy an app on a rapid timeline.

What's next for ACE Detective

Moving forward, there are two main avenues for improvement. First, the prediction model's accuracy can be improved: more training data and cleaner input data should raise the model's performance. Second, on the network side, adding more parameters to each node can give professionals more markers for measuring ACEs in social media. Further expanding the size of the network will deepen our understanding of how ACEs are communicated and addressed across dispersed groups on Twitter. Moreover, finding better ways to streamline immediate responses (with appropriate resources) to individuals who are flagged by our algorithm is essential to adapting this tool for real-time use. Additionally, we plan to apply the trained model to other forms of social media similar to Reddit.

Built With

dash
heroku
keras
networkx
nltk
pushshift
reddit
tensorflow
tweepy

Submitted to

Hacklytics 2022: Connecting Communities
- Winner 2nd Place Best Hack
- Winner Top 6 Hacks
- Winner Runner Up Healthcare Hack Powered by Anthem

Created by

Suraj Rajendran
Biomedical Engineering major currently attending Georgia Institute of Technology
Prathic Sundararajan
Georgia Institute of Technology BME Student
Ashi Awasthi
Biomedical Engineering @ Georgia Tech
Vishal Shah

Updates

Suraj Rajendran started this project — Feb 19, 2022 10:36 PM EST
