Inspiration

We wanted to build a system that could give visually impaired individuals or anyone with face blindness the ability to identify people around them in real time, knowing not just who someone is, but when and where they were last seen.

What it does

FaceScan is a real-time face-recognition and audio-memory system. It detects faces through a camera, identifies known individuals, and speaks their name along with the date and location they were last seen. For new faces, it records their name through a microphone and stores it permanently. Every person is remembered across sessions.

How we built it

OpenCV LBPH for fast local face recognition AWS Rekognition as a high-accuracy backup for uncertain matches, running all comparisons in parallel for speed AWS Comprehend to extract just the name from natural speech (strips filler words like "hey" or "my name is") Google STT for speech-to-text transcription ElevenLabs TTS for natural, human-sounding voice output SQLite for persistent memory of all persons, locations, and timestamps pygame for cross-platform audio playback and beep tones

Challenges we ran into

LBPH confidence thresholds needed careful tuning to avoid misidentifying returning faces as new people AWS Rekognition sequential comparisons caused 10–15 second delays, solved by running all comparisons in parallel with ThreadPoolExecutor macOS quarantine flags on downloaded files made the SQLite database read-only ElevenLabs TTS using afplay (Mac-only) needed to be replaced with pygame. mixer for cross-platform support Cooldown logic needed to persist across restarts so the same person isn't announced repeatedly

Accomplishments that we're proud of

Sub-2-second face identification after initial training Fully cross-platform (Mac and Windows) with the same codebase Natural human-sounding voice responses via ElevenLabs Persistent memory that survives restarts, the system truly "remembers" people

What we learned

Combining a fast local model (LBPH) with a cloud fallback (AWS) gives the best balance of speed and accuracy Parallel API calls dramatically reduce latency in multi-person scenarios An audio pipeline on different OSes requires abstraction to work reliably

What's next for FaceScan

Mobile app version using the phone camera Emotion detection to add context ("Jasnoor looks happy today") Multi-camera support for wider environments A web dashboard to manage known persons and view history

Built With

Share this project:

Updates