Six Eyes
A content-based image retrieval tool that makes it easy for students to visually search textbooks, images, PDFs, and YouTube videos and instantly jump to what they need.
The Problem
Students have all sorts of resources at their disposal:
- YouTube videos (Khan Academy, Organic Chemistry Tutor, 3B1B, etc.)
- Textbooks
- Professor Lecture Notes
- Past Exams
- etc.
We literally have everything we need to succeed, and yet it's so darn hard! Why? Because:
- good explanation videos are over 30 minutes long
- textbooks are hundreds of pages
- lecture notes are weirdly drawn and disorganized
The Problem: It takes so long to jump around and find exactly what we need!
The Solution: Introducing Six Eyes
Six Eyes is your friend for navigating the mountain of study materials. With Six Eyes you can:
- Upload YouTube videos, mp4s, PDFs, images, PowerPoints, and more, and add them to an index
- Search your corpus with a picture to find the exact points that contain matching diagrams or concepts
We let you submit a picture (it could be a simple sketch or a homework problem) and use AI embeddings to automagically jump to the relevant parts of your materials! That includes timestamps in videos, pages in textbooks, and matching images.

Look at how my drawing of a tree from my Algorithms class automatically pulled matching content from my notes. Awesome, huh?
The Science: AI, Vectors, and a LOT OF GPUs
The secret sauce behind this is image embeddings. We trained our own custom model on 10 different TAMU textbooks to get really good at clustering and classifying the diagrams found in engineering and biology texts. Unlike other image systems (like Google Images), our model understands what's being drawn and surfaces similar results.
These vectors are then stored in a Redis vector database. Unlike the 2D vectors we discuss in math class, our vectors are over 700 dimensions. We use cosine similarity to compute which images are related to each other and return the best matches. Training this felt impossible!!! It took the combined might of AWS, GCP, TAMU HPRC, RunPod, and all of our laptops. After over 10 hours of training, we are proud of how our classifiers turned out.
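To make the cosine-similarity step concrete, here is a minimal sketch in NumPy. The index entries below are random stand-ins for real embeddings (our actual vectors come from the trained model), but the math is the same: the indexed vector pointing in the direction closest to the query wins.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy index: three 700-dimensional embeddings (random stand-ins for real ones).
rng = np.random.default_rng(0)
index = {name: rng.standard_normal(700)
         for name in ["bst_diagram", "cell_diagram", "graph_diagram"]}

# A query embedding that's a tiny perturbation of one indexed vector
# should rank that vector first.
query = index["bst_diagram"] + 0.01 * rng.standard_normal(700)
best = max(index, key=lambda name: cosine_similarity(query, index[name]))
print(best)  # bst_diagram
```

In high dimensions, unrelated random vectors are nearly orthogonal (cosine near 0), which is part of why this ranking works so cleanly.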
The Architecture
Really for the USAA folks :P

We have a service that breaks videos and documents into frames and runs some preprocessing. Everything then goes into our neural-network service, which uses multiple pods to compute the vectors. The network takes about 1 second per frame, so we run multiple instances in parallel to handle larger videos. Finally, the vectors land in our vector database.
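The fan-out above can be sketched like this. The function names and the 1-frame-per-second sampling rate are illustrative assumptions, not our exact service code, but they show why multiple pods matter: at ~1 second of compute per frame, spreading a 30-minute lecture over 4 pods cuts the wall-clock embedding time roughly fourfold.

```python
def frame_timestamps(duration_s: float, fps: float = 1.0) -> list[float]:
    """Sample one frame every 1/fps seconds of video (hypothetical sampling policy)."""
    step = 1.0 / fps
    return [round(i * step, 3) for i in range(int(duration_s * fps))]

def assign_to_pods(timestamps: list[float], n_pods: int) -> list[list[float]]:
    """Round-robin frames across embedding pods so long videos are processed in parallel."""
    pods: list[list[float]] = [[] for _ in range(n_pods)]
    for i, ts in enumerate(timestamps):
        pods[i % n_pods].append(ts)
    return pods

# A 30-minute lecture sampled at 1 frame/second, spread over 4 pods:
frames = frame_timestamps(30 * 60)
work = assign_to_pods(frames, 4)
print(len(frames), [len(p) for p in work])  # 1800 [450, 450, 450, 450]
```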
When you submit an image to find similar results, we compute a new vector embedding and then query the database for the most similar entries. The crazy part is that the AI knows what your diagrams look like and actually finds the closest results!!!
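Here is a hedged, in-memory stand-in for that query path (the real system queries Redis; the corpus entries, file names, and positions below are made up). The key idea is that each stored vector carries metadata — a video timestamp or a textbook page — so the top hit tells you exactly where to jump.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in corpus: each entry pairs a 700-d embedding with where it came from.
corpus = [
    {"vec": rng.standard_normal(700), "source": "lecture12.mp4", "position": "t=14:32"},
    {"vec": rng.standard_normal(700), "source": "algorithms.pdf", "position": "page 211"},
    {"vec": rng.standard_normal(700), "source": "notes.png", "position": "full image"},
]

def top_k(query_vec: np.ndarray, k: int = 2) -> list[dict]:
    """Rank corpus entries by cosine similarity to the query embedding."""
    def score(entry: dict) -> float:
        v = entry["vec"]
        return float(query_vec @ v / (np.linalg.norm(query_vec) * np.linalg.norm(v)))
    return sorted(corpus, key=score, reverse=True)[:k]

# Pretend the student's sketch embeds near the textbook diagram:
query = corpus[1]["vec"] + 0.05 * rng.standard_normal(700)
hits = top_k(query)
print(hits[0]["source"], hits[0]["position"])  # algorithms.pdf page 211
```

A vector database does the same ranking, just with an approximate-nearest-neighbor index instead of a brute-force scan.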
What's Next?
Bruh I need this so I will definitely be using this.