Inspiration

We were inspired by the cybersecurity theme and its prompt to build a basic model that analyzes potentially malicious URLs.

What it does

MalShield is a machine-learning-powered tool that helps users identify potentially malicious URLs or email content. Using models trained on various online datasets, MalShield assesses whether the provided input is likely to be malicious. The app also integrates the Google Gemini API to generate a brief commentary on the results, giving users more context.

How we built it

We started by investigating how to build machine learning models with Python and libraries like scikit-learn. We quickly realized we needed large amounts of data to train the models we had in mind, so our next goal was to find credible, valid datasets available on the internet (discussed further in the challenges section). Once we found usable data, we trained two machine learning models using two different ML algorithms. For URL detection, we used a Random Forest classifier to capture structural patterns; for email detection, we used a Support Vector Machine (SVM) model to distinguish subtle textual differences.

At this point we split into two teams so we could work on the frontend and backend simultaneously. For the frontend, we used React to create a simple yet responsive form that collects user input and sends it to the backend. For the backend, we used Flask to handle requests from the frontend and load the trained models to classify submitted URLs and emails in real time.

Next, we integrated Google's Gemini API in the frontend so it could provide brief explanations of, or possibly judgments on, the predictions our models made, helping highlight the reasons behind each prediction. Finally, we created a Google Chrome browser extension in vanilla JavaScript, which offers another point of access to our backend API and a different method of URL detection: live detection during regular browsing.
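To illustrate what "structural patterns" in a URL might look like as model input, here is a minimal sketch of a feature extractor using only the Python standard library. The function name `extract_url_features` and the particular features are assumptions made for illustration; they are not necessarily the features MalShield actually uses.

```python
import re
from typing import List
from urllib.parse import urlparse

# Hypothetical feature names, chosen for illustration only.
FEATURE_NAMES = [
    "length", "num_dots_in_host", "num_digits", "num_hyphens",
    "has_ip_host", "uses_https", "has_at_symbol",
]

# Loose pattern for a dotted-quad IPv4 host (does not validate octet ranges).
_IPV4_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_url_features(url: str) -> List[int]:
    """Turn a URL string into a fixed-length list of numeric features."""
    # urlparse only fills in the host when a scheme is present,
    # so prepend one for bare inputs like "example.com/path".
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    return [
        len(url),                            # overall length
        host.count("."),                     # subdomain depth
        sum(ch.isdigit() for ch in url),     # digit count
        url.count("-"),                      # hyphen count
        int(bool(_IPV4_RE.match(host))),     # raw IP address used as host
        int(parsed.scheme == "https"),       # HTTPS or not
        int("@" in url),                     # "@" can hide the real host
    ]
```

A list of such vectors, paired with 0/1 labels, is the kind of input that scikit-learn's `RandomForestClassifier` accepts through its `fit` and `predict` methods.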

Challenges we ran into

The biggest challenge we ran into was curating sufficiently large datasets of malicious and benign URLs and emails. The larger datasets available were often cluttered with broken or missing entries, which made cleaning the data a hurdle, especially since we were all new to machine learning.
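As a rough sketch of the kind of cleaning this involves, the function below drops rows with missing text, unrecognized labels, or duplicate entries. The column names (`url`, `label`) and the label spellings in `LABEL_MAP` are assumptions; real datasets vary and would need their own mapping.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed label spellings; real datasets may use different values.
LABEL_MAP = {"benign": 0, "good": 0, "0": 0, "malicious": 1, "bad": 1, "1": 1}


def clean_rows(
    rows: Iterable[Dict[str, Optional[str]]],
    text_key: str = "url",
    label_key: str = "label",
) -> List[Tuple[str, int]]:
    """Return (text, label) pairs, skipping broken, unlabeled, or duplicate rows."""
    seen = set()
    cleaned = []
    for row in rows:
        # Missing fields may be absent or None; treat both as empty.
        text = (row.get(text_key) or "").strip()
        raw_label = (row.get(label_key) or "").strip().lower()
        if not text or raw_label not in LABEL_MAP:
            continue  # broken or unlabeled entry
        if text in seen:
            continue  # duplicate entry
        seen.add(text)
        cleaned.append((text, LABEL_MAP[raw_label]))
    return cleaned
```

Because the function takes any iterable of dictionaries, it can be fed directly from `csv.DictReader`.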

Accomplishments that we're proud of

Having never touched machine learning before, we are proud of creating models that were accurate on the test cases we provided (93% and 96% accuracy on the testing sets for URLs and emails, respectively). We are also proud of building a responsive frontend that communicated effectively with the backend, even though none of us were familiar with any backend frameworks.
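For readers unfamiliar with the metric, accuracy on a testing set is the fraction of held-out examples the model labels correctly. The sketch below computes it alongside the false positive rate, which accuracy alone can hide. The function name and the 0 = benign, 1 = malicious encoding are assumptions for illustration.

```python
from typing import Sequence, Tuple


def accuracy_and_fpr(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, false positive rate) for 0/1 labels (1 = malicious)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    # A false positive is a benign input flagged as malicious.
    false_pos = sum(t == 0 and p == 1 for t, p in pairs)
    negatives = sum(t == 0 for t in y_true)
    accuracy = correct / len(pairs)
    fpr = false_pos / negatives if negatives else 0.0
    return accuracy, fpr
```

For example, with made-up labels `y_true = [0, 0, 0, 1, 1]` and `y_pred = [0, 1, 0, 1, 0]`, three of five predictions are correct and one of the three benign inputs is wrongly flagged.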

What we learned

As amateurs in almost all of the technologies we used for this project, we learned a lot about ML, React, Flask, frontend-backend integration, and creating web browser extensions.

What's next for MalShield

Our next goal for MalShield is to enhance its detection capabilities by expanding and diversifying the dataset, ensuring it stays up-to-date with evolving cyber threats. We also plan to refine our machine learning models to further improve accuracy and reduce false positives, making the tool even more reliable for users. Additionally, we aim to integrate real-time threat intelligence sources, allowing MalShield to identify new types of threats as they emerge. Finally, we’re looking to make the platform scalable, so it can serve more users and potentially be deployed as a browser extension or mobile app, making it even more accessible for everyday cybersecurity.
