Inspiration

In today’s digital age, the rise of big data and artificial intelligence has transformed the way organizations handle information. However, the more we rely on data, the greater the risk to privacy and security. Inspired by the growing need for data privacy in industries like healthcare, finance, and AI research, we set out to create a data masking tool that is powered by Generative AI, built on NVIDIA AI Workbench, and able to run entirely locally.

What it does

- Localized data masking tool using Generative AI.
- Users can upload files and choose between redaction or obfuscation.
- Customizable PII masking to protect sensitive information while maintaining data integrity.
- Currently works with two kinds of files: CSV / Excel spreadsheets and PDF / text documents.
- Uses Llama 3.2 3B on top of NVIDIA AI Workbench to create custom obfuscation.
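To make the difference between the two modes concrete, here is a minimal sketch for free text. It is not Fidelius's actual detection logic: the tool relies on an SLM, while this example swaps in a deliberately simple regular expression for email addresses. The names `EMAIL_RE`, `redact`, and `obfuscate` are hypothetical and exist only for illustration.

```python
import re

# Assumption: a simple email pattern stands in for the tool's real PII detection.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text: str, placeholder: str = "[REDACTED]") -> str:
    """Redaction: remove each detected value and leave a fixed placeholder."""
    return EMAIL_RE.sub(placeholder, text)


def obfuscate(text: str) -> str:
    """Obfuscation: swap each distinct value for a stand-in of the same kind.

    The same original always maps to the same stand-in, so repeated
    mentions remain linkable after masking.
    """
    seen: dict[str, str] = {}

    def stand_in(match: re.Match) -> str:
        original = match.group(0)
        if original not in seen:
            seen[original] = f"user{len(seen) + 1}@example.com"
        return seen[original]

    return EMAIL_RE.sub(stand_in, text)


sample = "Contact alice@corp.io or bob@corp.io; alice@corp.io replies fastest."
print(redact(sample))
print(obfuscate(sample))
```

Redaction discards the value altogether, while obfuscation keeps the text readable and lets an analyst still tell that two mentions refer to the same person.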

How we built it

We incorporated a Small Language Model (Llama 3.2 3B) to allow users to define additional PII fields specific to their data. Using Next.js, we built an intuitive frontend where users can upload files, select between redaction or obfuscation, and customize PII masking operations. The backend, written in Python with Pandas, focuses on real-time processing and on preserving the original data format during obfuscation so that the masked output stays usable.
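The idea of preserving the original data format during obfuscation can be sketched as follows. This is an illustrative example under stated assumptions, not the project's code: it uses only the standard library and plain dictionaries (as produced by `csv.DictReader`) instead of Pandas, and `obfuscate_value` and `mask_rows` are hypothetical helper names.

```python
import random
import string


def obfuscate_value(value: str, rng: random.Random) -> str:
    """Replace letters and digits with random ones.

    Length, letter case, and punctuation are kept, so a phone number
    still looks like a phone number after masking.
    """
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)  # separators such as "-", " ", "@" pass through
    return "".join(out)


def mask_rows(rows, pii_columns, mode="obfuscate", seed=None):
    """Return masked copies of the rows; the input rows are not modified."""
    rng = random.Random(seed)
    masked = []
    for row in rows:
        new_row = dict(row)
        for col in pii_columns:
            if col not in new_row:
                continue  # ignore columns this row does not have
            if mode == "redact":
                new_row[col] = "[REDACTED]"
            else:
                new_row[col] = obfuscate_value(str(new_row[col]), rng)
        masked.append(new_row)
    return masked


rows = [
    {"name": "Ada Lovelace", "phone": "555-867-5309", "plan": "gold"},
    {"name": "Alan Turing", "phone": "555-123-4567", "plan": "free"},
]
masked = mask_rows(rows, pii_columns=["name", "phone"], seed=0)
for row in masked:
    print(row)
```

With Pandas, the same per-value function could be applied to a selected column through `Series.map`. Non-PII columns such as `plan` are left untouched, which is what keeps the masked file useful for analysis.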

Challenges we ran into

One of the major challenges we faced was balancing privacy and data usability. We wanted to ensure that after masking PII, the remaining data would still be useful for analysis. Creating a tool that could handle diverse data formats while maintaining accuracy was another hurdle. Ensuring real-time processing without compromising performance, and integrating user-defined PII detection with an SLM, added further layers of complexity. Working with NVIDIA AI Workbench was also a challenge: we had to figure out its different aspects and get it working for our use case.

What's next for Fidelius

Next, we plan to add multimodal support and support for more file types, such as PDF, audio, and images. We also plan to use a hybrid approach that combines NLP and SLMs to improve masking performance and reduce data modification times.
