E-mbrella | Devpost

When a normal message is entered, the ML model analyzes it and displays a green (normal) result
When a potentially predatory message is entered, the ML model analyzes it and displays an orange (might be predator) result
Users may flag results they believe have been misclassified and give a reason, which provides the model with additional training data to improve itself
When a predatory message is entered, the ML model analyzes it and displays a red (predator) result
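
The three result colors described above can be read as bands over a single model score. The sketch below is only an illustration, assuming the model exposes a predator probability between 0 and 1. The cut-offs `0.5` and `0.8` and the names `classify_tier` and `FlaggedResult` are hypothetical and are not taken from the project.

```python
from dataclasses import dataclass

# Hypothetical cut-offs; the project does not state the values it uses.
ORANGE_THRESHOLD = 0.5
RED_THRESHOLD = 0.8


def classify_tier(predator_probability: float) -> str:
    """Map a probability in [0, 1] to one of the three displayed colors."""
    if not 0.0 <= predator_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if predator_probability >= RED_THRESHOLD:
        return "red"
    if predator_probability >= ORANGE_THRESHOLD:
        return "orange"
    return "green"


@dataclass
class FlaggedResult:
    """A user report that a result was misclassified (assumed record shape)."""
    message: str      # the text that was analyzed
    shown_tier: str   # the color the user saw
    reason: str       # the user's explanation, kept for later retraining
```

Storing the user's reason alongside the original message and the displayed color would let flagged results be reviewed before they are added to the training data.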

Inspiration

Cases of online sexual predation have become more common as social media has grown in popularity and reach over the past few years. Without intensive protection and monitoring, children might fall into the carefully designed traps of online predators. E-mbrella serves as a shield, a safe harbour for children. It protects them from the dangers of the internet and relieves parents of the worry that their children will end up in these situations.

What it does

E-mbrella is an online chat plugin built heavily on machine learning. Backed by a database, the plugin analyzes users' conversations and flags certain words, phrases, images, or patterns. When it does, it sends notifications alerting users that they may be talking with a predator and offers ways to end the conversation.
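
As a rough sketch of the flow described above, the function below walks through a conversation and collects the messages that should trigger an alert. It assumes a scoring function that returns a value between 0 and 1. The names `scan_conversation` and `keyword_stub` and the `0.5` threshold are hypothetical, and the keyword check is only a stand-in for the real ML model.

```python
from typing import Callable, List, Tuple


def scan_conversation(
    messages: List[str],
    score_message: Callable[[str], float],
    alert_threshold: float = 0.5,
) -> List[Tuple[int, str, float]]:
    """Return (index, message, score) for each message at or above the threshold."""
    alerts = []
    for index, text in enumerate(messages):
        score = score_message(text)
        if score >= alert_threshold:
            alerts.append((index, text, score))
    return alerts


def keyword_stub(text: str) -> float:
    """Stand-in scorer for demonstration only; the real plugin uses an ML model."""
    return 0.9 if "secret" in text.lower() else 0.1


alerts = scan_conversation(
    ["How was school?", "Keep this our secret"],
    keyword_stub,
)
```

Passing the scorer in as an argument keeps the alerting logic separate from the model, so the stub can be swapped for a trained classifier without changing the loop.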

How we built it

The machine learning component was built in Python using a fine-tuned BERT sequence classification model (a natural language processing model), trained on binary-labeled grooming text data. The front-end web application was built with PHP and HTML. Our teammates scoured the Internet for sources of data, from online screenshots to YouTube videos.
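
A binary sequence classifier typically ends by producing two raw scores (logits), one per class, which a softmax turns into probabilities. The sketch below shows only that final step in plain Python. The label order, the `predict` helper, and the example logits are assumptions for illustration and are not real output from the project's model.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, labels=("normal", "predator")):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Made-up logits, chosen only to illustrate the calculation.
label, confidence = predict([-0.4, 1.0])
```

With these made-up logits the second class wins with a probability of roughly 0.80, which is the kind of value a confidence level refers to.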

Challenges we ran into

Machine learning was a new concept for the entire team, so we had to navigate many unfamiliar frameworks before arriving at an algorithm that worked. It was also extremely difficult to find data on this subject, likely because it is so sensitive. The team ended up having to transcribe and enter data manually.

Accomplishments that we're proud of

Since this project was quite a stretch for the team, getting a predator detection algorithm working was already a huge accomplishment for us. The model also produces fairly accurate results, with confidence levels of around 80%.

What we learned

While selecting an ML model, we learned about and compared the use cases of many models, from linear regression to XLNet, and eventually decided that BERT was the best choice.