Six Eyes
A content-based image retrieval tool that makes it easy for students to visually search textbooks, images, PDFs, and YouTube videos and instantly jump to what they need.
The Problem
Students have all sorts of resources at their disposal:
- YouTube videos (Khan Academy, Organic Chemistry Tutor, 3B1B, etc.)
- Textbooks
- Professor Lecture Notes
- Past Exams
- etc.
We literally have everything we need to succeed, and yet it's so darn hard! Why? Because:
- good explanation videos are over 30 minutes long
- textbooks are hundreds of pages
- lecture notes are weirdly drawn and disorganized
The Problem: It takes so long to jump around and find exactly what we need!
The Solution: Introducing Six Eyes
Six Eyes is your friend for navigating the mountain of study materials. With Six Eyes you can:
- Upload YouTube videos, mp4s, PDFs, images, PowerPoints, and more, and add them to an index
- Search your corpus with a picture to find the exact points that contain matching diagrams or concepts
We let you submit a picture (it could be a simple sketch or a homework problem) and use AI embeddings to automagically jump to the relevant parts of your materials! That includes timestamps in videos, pages in textbooks, and matching images.

Look at how my drawing of a tree from my Algorithms class automatically pulled matching content from my notes. Awesome, huh?
The Science: AI, Vectors, and a LOT OF GPUs
The secret sauce behind this is image embeddings. We trained our own custom model on 10 different TAMU textbooks to get really good at clustering and classifying the diagrams found in engineering and biology texts. Unlike other image systems (like Google Images), our model understands what's being drawn and surfaces similar results.
These vectors are then stored in a Redis vector database. Unlike the 2D vectors we discuss in math class, our vectors are over 700 dimensions. We use cosine similarity to compute which images are related to each other and return the best matches. Training this felt impossible!!! It took the combined might of AWS, GCP, TAMU HPRC, RunPod, and all of our laptops. After over 10 hours of training, we are proud of how our classifiers turned out.
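To make the cosine-similarity step concrete, here is a minimal sketch in NumPy. The index entries below are random stand-ins for real embeddings (our actual vectors come from the trained model), but the math is the same: the indexed vector pointing in the direction closest to the query wins.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy index: three 700-dimensional embeddings (random stand-ins for real ones).
rng = np.random.default_rng(0)
index = {name: rng.standard_normal(700)
         for name in ["bst_diagram", "cell_diagram", "graph_diagram"]}

# A query embedding that's a tiny perturbation of one indexed vector
# should rank that vector first.
query = index["bst_diagram"] + 0.01 * rng.standard_normal(700)
best = max(index, key=lambda name: cosine_similarity(query, index[name]))
print(best)  # bst_diagram
```

In high dimensions, unrelated random vectors are nearly orthogonal (cosine near 0), which is part of why this ranking works so cleanly.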
The Architecture
Really for the USAA folks :P

We have a service that breaks videos and documents into frames and runs some preprocessing. Everything then goes into our neural-network service, which uses multiple pods to compute the vectors. The network takes about 1 second per frame, so we run multiple instances in parallel to handle larger videos. Finally, the vectors land in our vector database.
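The fan-out above can be sketched like this. The function names and the 1-frame-per-second sampling rate are illustrative assumptions, not our exact service code, but they show why multiple pods matter: at ~1 second of compute per frame, spreading a 30-minute lecture over 4 pods cuts the wall-clock embedding time roughly fourfold.

```python
def frame_timestamps(duration_s: float, fps: float = 1.0) -> list[float]:
    """Sample one frame every 1/fps seconds of video (hypothetical sampling policy)."""
    step = 1.0 / fps
    return [round(i * step, 3) for i in range(int(duration_s * fps))]

def assign_to_pods(timestamps: list[float], n_pods: int) -> list[list[float]]:
    """Round-robin frames across embedding pods so long videos are processed in parallel."""
    pods: list[list[float]] = [[] for _ in range(n_pods)]
    for i, ts in enumerate(timestamps):
        pods[i % n_pods].append(ts)
    return pods

# A 30-minute lecture sampled at 1 frame/second, spread over 4 pods:
frames = frame_timestamps(30 * 60)
work = assign_to_pods(frames, 4)
print(len(frames), [len(p) for p in work])  # 1800 [450, 450, 450, 450]
```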
When you submit an image to find similar results, we compute a new vector embedding and then query the database for the most similar entries. The crazy part is that the AI knows what your diagrams look like and actually finds the closest results!!!
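Here is a hedged, in-memory stand-in for that query path (the real system queries Redis; the corpus entries, file names, and positions below are made up). The key idea is that each stored vector carries metadata — a video timestamp or a textbook page — so the top hit tells you exactly where to jump.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in corpus: each entry pairs a 700-d embedding with where it came from.
corpus = [
    {"vec": rng.standard_normal(700), "source": "lecture12.mp4", "position": "t=14:32"},
    {"vec": rng.standard_normal(700), "source": "algorithms.pdf", "position": "page 211"},
    {"vec": rng.standard_normal(700), "source": "notes.png", "position": "full image"},
]

def top_k(query_vec: np.ndarray, k: int = 2) -> list[dict]:
    """Rank corpus entries by cosine similarity to the query embedding."""
    def score(entry: dict) -> float:
        v = entry["vec"]
        return float(query_vec @ v / (np.linalg.norm(query_vec) * np.linalg.norm(v)))
    return sorted(corpus, key=score, reverse=True)[:k]

# Pretend the student's sketch embeds near the textbook diagram:
query = corpus[1]["vec"] + 0.05 * rng.standard_normal(700)
hits = top_k(query)
print(hits[0]["source"], hits[0]["position"])  # algorithms.pdf page 211
```

A vector database does the same ranking, just with an approximate-nearest-neighbor index instead of a brute-force scan.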
What's Next?
Bruh I need this so I will definitely be using this.