✨ Inspiration
72% of U.S. high school teachers say cellphone distraction is a major problem in the classroom
— Pew Research Center, 2024
In today’s classrooms, maintaining focus is more challenging than ever. With smartphones, social media, and shorter attention spans, distractions dominate learning environments. Teachers struggle to gauge engagement, often relying on guesswork to identify students who may be disengaged.
Our project, Foclass, was designed to address this challenge with machine learning. We wanted to empower educators with a tool that provides real-time insights into classroom focus levels, creating a data-driven approach to improving learning outcomes.
🎯 What it does
Foclass uses computer vision to analyze live classroom video feeds, assessing student focus levels in real-time. Key features include:
- Focus Tracking Dashboard: Displays metrics such as individual focus scores and average distraction time, plus a camera overlay for at-a-glance information.
- Live Overlay Indicator: Alerts educators when attention dips, enabling them to address distractions as they occur.
- Behavior Analysis: Differentiates between levels of distraction, such as phone usage or general inattentiveness.

With Foclass, educators gain actionable insights to enhance engagement and create personalized learning strategies.
🛠️ How we built it
Figure 1: Overall architecture for Foclass
After setting up the application, the frontend, built with Python's Tcl/Tk bindings (tkinter), streams live classroom video feeds to the teacher's dashboard while communicating with the backend. The backend, also written in Python, processes these streams in real time by interacting with multiple services to assess student focus levels.
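We won't reproduce Foclass's exact widget layout here, but a minimal sketch of streaming OpenCV frames into a Tcl/Tk window (via tkinter and Pillow; the widget names are illustrative) looks roughly like this:

```python
import cv2
import tkinter as tk
from PIL import Image, ImageTk

root = tk.Tk()
root.title("Foclass Dashboard")
video_label = tk.Label(root)
video_label.pack()
cap = cv2.VideoCapture(0)  # 0 = default webcam; a classroom feed in practice

def show_frame():
    # Grab a frame, convert BGR -> RGB, and paint it into the Tk label
    ok, frame = cap.read()
    if ok:
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        photo = ImageTk.PhotoImage(Image.fromarray(rgb))
        video_label.configure(image=photo)
        video_label.image = photo  # keep a reference so Tk doesn't drop it
    root.after(33, show_frame)  # re-schedule at roughly 30 fps

show_frame()
root.mainloop()
```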
The first service uses OpenCV's Haar cascade detector to identify bounding boxes around student faces in the video feed. To track individual students, we generate embeddings for each detected face using a facial recognition pipeline. These embeddings are compared against a database of student face embeddings preloaded by educators, ensuring accurate identification.
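A sketch of these two steps, using OpenCV's bundled Haar cascade and a cosine-similarity match (the 0.6 threshold and function names are our assumptions, not the exact pipeline):

```python
import cv2
import numpy as np

# OpenCV ships a pretrained frontal-face Haar cascade
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def detect_faces(frame):
    """Return (x, y, w, h) bounding boxes for faces in a BGR frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

def match_student(embedding, db, threshold=0.6):
    """Match a face embedding to the closest preloaded student embedding.

    `db` maps student name -> reference embedding; the similarity
    threshold is an illustrative choice.
    """
    best_name, best_sim = None, threshold
    for name, ref in db.items():
        sim = float(np.dot(embedding, ref) /
                    (np.linalg.norm(embedding) * np.linalg.norm(ref)))
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name
```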
Once the students are identified, the system evaluates their focus levels by passing each student’s video feed through an ensemble of two machine learning models. The first is a custom-trained convolutional neural network (CNN) based on ResNet-18. This model was fine-tuned in two stages: pretraining on general emotion recognition tasks and subsequent training for distraction classification using publicly available datasets.
Figure 2: Loss curves for the fine-tuned ResNet-18 models.
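The two-stage fine-tuning amounts to swapping the classification head between stages. A torchvision sketch, where the ImageNet initialization and class counts are our assumptions (the writeup doesn't specify them):

```python
import torch.nn as nn
from torchvision import models

# Start from ImageNet weights (assumption; the writeup doesn't state the init)
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Adapt the stem to single-channel 48x48 grayscale inputs
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)

# Stage 1: head for emotion recognition (7 classes is a FER-style assumption)
model.fc = nn.Linear(model.fc.in_features, 7)
# ... train on the emotion dataset ...

# Stage 2: swap the head and fine-tune for distraction classification
# (3 classes, e.g. focused / inattentive / phone, is our illustrative guess)
model.fc = nn.Linear(model.fc.in_features, 3)
# ... fine-tune on the distraction dataset ...
```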
The second model in the ensemble is a multimodal foundation model (GPT-4o-mini), chosen for its lower latency and ability to process context alongside visual inputs.
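Querying GPT-4o-mini with a frame boils down to sending a base64-encoded image through the chat completions API. A sketch, where the prompt wording and label set are illustrative rather than Foclass's exact ones:

```python
import base64
import cv2
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def classify_focus(frame):
    """Send a cropped student frame to GPT-4o-mini for a one-word verdict."""
    _, jpeg = cv2.imencode(".jpg", frame)
    b64 = base64.b64encode(jpeg.tobytes()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Reply with one word: focused, distracted, or phone."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        max_tokens=5,
    )
    return resp.choices[0].message.content.strip().lower()
```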
The ensemble strategy ensures high reliability: a student is classified as distracted or using their phone only if both models unanimously agree. By combining the strengths of both models, we maintain a conservative approach to distraction detection, reducing false positives.
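The unanimity rule itself reduces to a simple agreement check; a sketch, assuming both models emit the same label vocabulary:

```python
def ensemble_label(cnn_label, llm_label):
    """Conservative ensemble: report a distraction only on unanimous agreement.

    Any disagreement falls back to "focused", trading some recall for
    fewer false positives.
    """
    return cnn_label if cnn_label == llm_label else "focused"
```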
Finally, all processed data is relayed to the teacher’s application in real-time. The dashboard displays metrics such as cumulative focus hours per student, attention trends, and aggregate distraction time. This allows educators to quickly identify disengaged students and take immediate action to re-engage the classroom.
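The dashboard counters can be kept with simple per-student accumulators; a sketch of the bookkeeping (names are ours, not the actual code):

```python
import time
from collections import defaultdict

focus_seconds = defaultdict(float)        # per-student cumulative focus time
distraction_seconds = defaultdict(float)  # per-student cumulative distraction

def update_metrics(student, label, last_seen):
    """Accumulate per-student time in each state between frames."""
    now = time.monotonic()
    dt = now - last_seen
    if label == "focused":
        focus_seconds[student] += dt
    else:
        distraction_seconds[student] += dt
    return now
```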
🚧 Challenges we ran into
- Finding and curating a dataset to train our focus classification model required significant effort, since most suitable datasets are not publicly available.
- Reducing latency to ensure real-time detection was a significant challenge. One way we worked around latency constraints was by training a smaller ResNet model that is practical to run on everyday hardware.
🎉 Accomplishments we’re proud of
- Fine-tuning a ResNet-18 to perform focus classification on a dataset of 35k+ 48×48 grayscale facial images.
- Setting up accurate facial recognition from single images of students.
- Designing a dashboard that is both intuitive and informative.
🤔 What we learned
- How to use OpenCV in tandem with Python's Tcl/Tk (tkinter) framework
- Our first time training a vision model!
⏭️ What’s next for Foclass
- Incorporating detection for a wider array of distracted states (e.g. asleep, chatting with friends).
- Adding more features like posture analysis and group engagement tracking.
- Implementing an overall engagement score for each student based on how much time was spent in each distraction state.
With Foclass, our goal is to redefine attentiveness in the classroom, creating opportunities for students to learn and grow in ways not previously possible.

