About the team

  • We are a team of five third-year IT students from UET, led by Nguyen Tuan Duc, whose insights have greatly influenced our project. Nguyen Huy Hieu and Nguyen Thi Oanh built the core application, designing a robust, intuitive interface and integrating the privacy protection features into the video upload flow without compromising performance or user experience.
  • Nguyen Tuan Duc, Do Duc Huy, and Nguyen Tien Hung developed the AI side: researching advanced computer vision algorithms, integrating the YOLO-World model for detecting privacy leaks, and optimizing blurring techniques to maintain video quality while obscuring sensitive information.
  • Together, we created a comprehensive privacy protection solution that enhances user trust and platform safety, allowing content creators to maintain their creative freedom while safeguarding their privacy.

Problem Statement

  • In the context of TikTok, video uploads often contain unintended privacy-sensitive information such as personal identification documents, credit card numbers, and addresses. This poses significant privacy risks for users, leading to potential misuse of personal data and breaches of privacy.
  • Despite existing privacy controls, TikTok lacks an efficient and dynamic system that can identify and protect against these privacy risks in real-time. Users need a solution that not only detects potential privacy leaks but also allows them to control the visibility of sensitive information without disrupting the creative flow of their content. To address these concerns, we propose a comprehensive privacy protection system for TikTok, consisting of two main components:
  • Privacy Leakage Detection: Real-time analysis of video frames using advanced computer vision to identify and flag sensitive information.
  • User-Controlled Blurring: An intuitive interface allowing users to review flagged areas and choose which elements to blur, maintaining privacy without compromising video integrity. Our goal is to enhance user trust and platform safety by integrating this solution into the video upload process, ensuring that users can confidently share their creative content without compromising their privacy.

Inspiration

  • Our inspiration for developing this privacy protection system on TikTok stems from a deep commitment to enhancing user trust and platform safety in the realm of social media. Recognizing the prevalent issues of unintentional privacy breaches in user-generated content, we were motivated to create a solution that empowers users to maintain control over their personal information without compromising their creative expression.
  • Drawing insights from real-world scenarios where sensitive information can inadvertently appear in videos, we aimed to integrate advanced computer vision technologies to detect and flag potential privacy risks in real-time. This approach not only addresses current privacy concerns but also aligns with TikTok's commitment to fostering a safe and respectful community environment.
  • Furthermore, our team was inspired by the opportunity to innovate within the intersection of AI, user experience design, and privacy protection. By leveraging cutting-edge algorithms and intuitive user interfaces, we strive to set a new standard in privacy management for video-sharing platforms, ensuring that users can confidently engage with TikTok while safeguarding their personal data.

What it does

  • To address the privacy concerns in video uploads on the platform, our solution consists of two main components: potential privacy leakage detection and user-controlled blurring post-processing. The privacy leakage detection component utilizes advanced computer vision algorithms to analyze each frame of the uploaded video. It identifies and flags areas that may contain sensitive information such as personal identification documents, credit card numbers, addresses, or other private data. This detection process is performed in real-time as the video is being uploaded.
  • The blurring post-processing component gives users control over their privacy. Once potential privacy risks are detected, the user is presented with an intuitive interface showing the flagged areas. They can then choose which detected elements to blur, if any. This process empowers users to make informed decisions about their content while maintaining the integrity of their videos.
  • Together, these components create a seamless privacy protection system that respects user autonomy while safeguarding against accidental information leakage. By integrating this solution into the video upload process, we aim to enhance user trust and platform safety without compromising the creative freedom that makes our platform unique.
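The two components above can be sketched as a simple detect-then-approve flow. This is an illustrative outline only: the `Flag` schema, the injected `detector` callable, and the function names are hypothetical stand-ins, not our production code.

```python
from dataclasses import dataclass

@dataclass
class Flag:
    """A region flagged as potentially privacy-sensitive (illustrative schema)."""
    frame_idx: int
    label: str           # e.g. "id card", "credit card"
    box: tuple           # (x1, y1, x2, y2) pixel coordinates
    blur: bool = False   # set True only if the user opts in

def detect_sensitive_regions(frames, detector):
    """Run the detector over every frame and collect flagged regions."""
    flags = []
    for idx, frame in enumerate(frames):
        for label, box in detector(frame):
            flags.append(Flag(frame_idx=idx, label=label, box=box))
    return flags

def apply_user_choices(flags, approved_labels):
    """Mark for blurring only the labels the user approved in the review UI."""
    for f in flags:
        f.blur = f.label in approved_labels
    return flags
```

The key design point is that detection only *proposes* regions; nothing is blurred until the user approves a label, which keeps the final decision with the content creator.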

How we built it

  • Our privacy leakage detection system leverages the latest advancements in vision grounding models, particularly the capabilities of YOLO-World for visual grounding and high inference speed. We began by conducting extensive research on common privacy leakage items in user-generated videos, such as ID cards, names and addresses on documents, and other sensitive information.
  • To implement the detection process, we curated a comprehensive list of candidate classes representing potential privacy risks. These classes were then integrated into the YOLO-World model, allowing for precise identification of sensitive elements within video frames. As a user uploads a video, our system processes each frame in real-time, detecting and storing the coordinates of potential privacy leakage areas.
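A minimal sketch of this step is below. The class list is an illustrative subset of our curated candidates, and `predict` is a generic callable standing in for the primed YOLO-World model (in practice this would wrap the model's per-frame inference); the helper name and data layout are assumptions for the example.

```python
# Illustrative subset of the curated candidate classes for privacy risks.
PRIVACY_CLASSES = [
    "id card", "passport", "credit card", "driver license",
    "address label", "license plate", "handwritten name",
]

def collect_leak_coordinates(frames, predict):
    """Store per-frame bounding boxes of potential privacy leaks.

    `predict` maps a frame to a list of (class_name, (x1, y1, x2, y2))
    pairs -- in our system this wraps YOLO-World primed with the
    candidate classes above.
    """
    leaks = {}
    for idx, frame in enumerate(frames):
        boxes = [(name, box) for name, box in predict(frame)
                 if name in PRIVACY_CLASSES]
        if boxes:
            leaks[idx] = boxes
    return leaks
```

Keeping only frames that actually contain flagged boxes keeps the stored metadata small, which matters when processing every frame of an upload in real time.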
  • For the user-controlled blurring component, we developed an intuitive interface that presents users with the detected sensitive regions. To ensure optimal privacy protection without compromising video quality, we experimented with various blurring techniques. After thorough testing and evaluation, we selected Gaussian blur for its superior performance in obscuring sensitive information while maintaining overall video aesthetics.
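The Gaussian blur we selected can be illustrated with a small NumPy sketch that blurs only the flagged rectangle of a grayscale frame. This is a didactic version using a separable convolution; a production pipeline would more likely call an optimized routine such as OpenCV's `cv2.GaussianBlur` on the cropped region.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """1-D Gaussian kernel, normalised to sum to 1."""
    ax = np.arange(size) - size // 2
    k = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur_region(image, box, ksize=15, sigma=5.0):
    """Gaussian-blur one rectangular region of a grayscale image.

    The 2-D Gaussian is separable, so we convolve with the 1-D kernel
    along rows and then columns instead of a full 2-D convolution.
    """
    x1, y1, x2, y2 = box
    region = image[y1:y2, x1:x2].astype(float)
    k = gaussian_kernel(ksize, sigma)
    for axis in (0, 1):  # blur along columns, then rows
        region = np.apply_along_axis(
            lambda m: np.convolve(m, k, mode="same"), axis, region)
    out = image.copy().astype(float)
    out[y1:y2, x1:x2] = region
    return out
```

Because only the flagged box is touched, the rest of the frame retains its original quality, which is the property that made Gaussian blur attractive in our evaluation.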
  • The entire system was designed with efficiency and user experience in mind. We optimized our algorithms to minimize processing time, ensuring that the privacy protection features do not significantly impact upload speeds or platform performance. By integrating these components seamlessly into the existing upload process, we've created a robust privacy protection solution that enhances user trust and platform safety without disrupting the creative flow of content creators.

Challenges we ran into

  • One of the primary challenges we faced was the lack of comprehensive training data for conventional object detectors when it came to identifying privacy-sensitive information in videos. Our initial approach was to leverage Multimodal Large Language Models (MLLMs) for their superior understanding of video content. However, we quickly encountered limitations due to their substantial computational requirements. Even with access to an NVIDIA A5000 GPU, we found ourselves constrained by insufficient VRAM, making it impractical to run these models on our local machines.
  • In response, we pivoted to exploring lightweight detectors from the YOLO series, aiming to train them from scratch on common information leakage objects found in user-generated videos. This approach, however, presented its own set of obstacles. While we could source datasets for some classes like ID cards, many other relevant classes lacked adequate training data. We considered innovative solutions such as synthesizing datasets by cut-and-pasting personal information objects into normal images or applying diffusion inpainting techniques. Unfortunately, time and resource constraints prevented us from fully implementing these methods.
  • Another significant challenge arose in balancing detection accuracy with processing speed. Given the real-time nature of our privacy protection system, we needed a solution that could analyze video frames quickly without compromising on detection quality. This led us to explore visual grounding models like Grounding DINO, which, while smaller than Multimodal LLMs, still proved too computationally intensive for efficient video processing. After extensive research and experimentation, we discovered YOLO-World, which offered a promising balance between inference speed and detection accuracy. Implementing this model required careful optimization to ensure it could handle the diverse range of privacy-sensitive elements we needed to detect while maintaining real-time performance.
  • Throughout the development process, we continually grappled with the ethical implications of privacy protection in user-generated content. Striking the right balance between safeguarding user privacy and maintaining creative freedom required thoughtful consideration and iterative refinement of our approach.

What's next for PrivacyFilter

  • Our next step is choosing the right model for detecting objects that contain personal information. We are weighing YOLO-World against few-shot segmentation models such as VAT (Volumetric Aggregation Transformer) and BAM, with inference speed and accuracy as the main selection criteria.
  • YOLO-World is known for its speed and efficiency in open-vocabulary object detection: it processes images quickly and can detect unseen object categories without large amounts of training data. This makes it an attractive option for our project, where fast and accurate detection of personal information is crucial.
  • Given our limited dataset, few-shot segmentation models are particularly appealing: they can learn and generalize from just a few labeled examples, aligning well with our data constraints. VAT and BAM have both shown significant potential in terms of speed and accuracy, and evaluating them will be essential to determine which best balances performance and efficiency for our task.
  • To address inference speed, we plan to implement several optimization techniques, chiefly model pruning and quantization. Pruning reduces model size by eliminating redundant parameters and layers, which can speed up inference without significantly impacting performance. Quantization converts the model to lower precision, such as 16-bit or 8-bit, which increases speed and reduces memory usage.
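To make the quantization idea concrete, here is a toy NumPy sketch of symmetric per-tensor 8-bit quantization, the basic operation post-training quantization toolkits apply layer by layer. The function names are our own for illustration; real deployments would use a framework's quantization API rather than hand-rolled code.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric 8-bit quantization with a single per-tensor scale."""
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 tensor."""
    return q.astype(np.float32) * scale
```

Storing int8 instead of float32 cuts weight memory by 4x, at the cost of a small round-trip error bounded by half the scale.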
  • We will also explore more efficient model architectures providing a good balance between speed and accuracy. Custom modifications to these architectures may further enhance their performance for our specific task.
  • Improving the model's accuracy is another crucial aspect of our project. We plan to implement advanced data augmentation techniques to enhance the model’s robustness and generalization capabilities.
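As a rough sketch of the kind of augmentation we plan to build on, the snippet below applies a random horizontal flip and brightness jitter in NumPy. The specific transforms and parameters are illustrative assumptions; our final pipeline may use a dedicated augmentation library with a richer set of transforms.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image):
    """Apply simple augmentations to one frame (illustrative sketch):
    a random horizontal flip and a random brightness jitter."""
    out = image.astype(np.float32)
    if rng.random() < 0.5:
        out = out[:, ::-1]                 # horizontal flip
    out = out * rng.uniform(0.8, 1.2)      # brightness jitter
    return np.clip(out, 0.0, 255.0)        # keep valid pixel range
```

Even simple transforms like these expose the detector to varied orientations and lighting, which helps generalization when labeled privacy-sensitive examples are scarce.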
