Inspiration
Browsing the internet and finding the content you are looking for has become a real challenge. Every news outlet breaks an article into multiple pages for extra ad revenue.
What it does
We trained an ML model to power a Chrome extension that makes scrolling Facebook a better experience. The goal was to eliminate titles such as "12 reasons why you should...".
How I built it
We used Python with BeautifulSoup, Cheerio.js, and the Reddit API to scrape articles and build a training set for an ML model. A Node backend exposes an API that talks to Amazon ML to determine whether or not an article is clickbait. A Chrome extension pulls article links and headlines while the user scrolls Facebook and sends them to that backend. From there, in real time, we use Node and Cheerio to scrape the source of the article. This accomplishes several things: we can combine multi-page articles into a single reading experience, with the goal of injecting the merged content directly into the user's Facebook feed after passing it through a summarization API called summary.
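The multi-page merge step can be sketched as follows. This is a standard-library Python sketch standing in for the Node/Cheerio code; the `combine_pages` helper and the decision to keep only `<p>` text are illustrative assumptions, not the actual implementation:

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collect the text inside <p> tags -- a stand-in for the
    selector-based extraction Cheerio does in the real pipeline."""
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.paragraphs[-1] += data

def combine_pages(pages_html):
    """Merge the paragraph text of several article pages into one
    continuous reading experience."""
    combined = []
    for html in pages_html:
        parser = ParagraphExtractor()
        parser.feed(html)
        combined.extend(p.strip() for p in parser.paragraphs if p.strip())
    return "\n\n".join(combined)
```

The merged string is what would then be handed to the summarization API before injection into the feed.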
Challenges I ran into
Facebook was very restrictive about the scripts we could inject. No two news websites are structured the same, so scraping the articles themselves was a large challenge. On top of that, we had to compile a dataset to train the ML model.
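Compiling the dataset ultimately comes down to writing labeled headlines into the CSV shape that Amazon ML ingests as an S3 datasource. A minimal sketch, assuming a simple two-column schema (the column names and labels here are hypothetical, not the project's actual schema):

```python
import csv
import io

def to_training_csv(examples):
    """Serialize (headline, label) pairs as CSV for an Amazon ML
    datasource. Labels: 1 = clickbait, 0 = normal headline."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["headline", "is_clickbait"])  # assumed column names
    for headline, label in examples:
        writer.writerow([headline, label])
    return buf.getvalue()
```

In practice the rows came from scraped articles and the Reddit API, with the resulting file uploaded to S3 for training.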
Accomplishments that I'm proud of
- Some crazy regex
- A real-time API
- Accurately identifying clickbait articles
- Scraping data from news sites that were not alike
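For a flavor of the regex involved, a pattern like the one below flags listicle-style headlines such as "12 reasons why you should...". It is illustrative only, not the project's actual patterns:

```python
import re

# Hypothetical pattern: matches numbered listicles ("12 reasons why...")
# and common curiosity-gap phrasing.
CLICKBAIT_RE = re.compile(
    r"^\d+\s+(reasons?|things?|ways?|signs?)\b"
    r"|you won'?t believe"
    r"|what happened next",
    re.IGNORECASE,
)

def looks_like_clickbait(headline):
    """Cheap pre-filter before a headline is sent to the ML model."""
    return bool(CLICKBAIT_RE.search(headline))
```

A filter like this could run in the extension before calling the backend, saving a round trip for the most obvious cases.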
What we learned
- Amazon ML
- NodeJS
- Express
- jQuery
- Chrome API
- Reddit API
What's next for hackcu-clickbait
We are very close to having a product we can be truly proud of, and we hope it catches some traction. We are excited to move forward with the project.
Built With
- amazon-machine-learning
- amazon-web-services
- javascript
- node.js
- python