Inspiration

Our inspiration came from a bunch of everyday frustrations that kept piling up. We recently got back into playing Valorant, and whenever we talked about cool plays at school, our friends always asked for clips we couldn’t access because everything was saved on our PCs at home. That constant “I wish I could just grab that file right now” moment stuck with us. We also wanted to start journaling, but we kept forgetting the small interactions that actually make up our day. It made us realize how hard it is to remember our lives accurately without some kind of support. With smart glasses becoming more common, the idea of capturing those moments effortlessly felt exciting. And on top of that, we noticed how our smartwatches track everything about our bodies (steps, heart rate, sleep), but nothing about how sustainable our daily choices are. Since our New Year’s resolution was to be more environmentally conscious, we kept wishing there was a simple way to understand our carbon footprint without manually tracking it. All of these little pain points added up and pushed us to build something that could make our digital lives, memories, and habits easier to manage.

What it does

alfr3d is a persistent AI companion that lives across your devices and turns your computer into something you can actually talk to and rely on. You can interact with it from anywhere using voice or text, ask it to find files based on what's inside them instead of just their names, and manage your digital life without having to remember where everything is saved. It can recognize people through your webcam, automatically create and sync contacts, and hold natural voice conversations that feel more like a FaceTime call than a command-based assistant. alfr3d also guides you toward a more sustainable lifestyle, from telling you where to recycle pizza boxes to helping you save water at home. Over time, alfr3d builds contextual memory across devices, learning your preferences, remembering conversations, and making your computer feel less like a tool and more like a personal assistant that actually understands you.

How we built it

alfr3d consists of a Python desktop agent (Python 3.11+, asyncio, python-socketio, watchdog, Pillow) that runs in the background on your computer, a cloud server built with Node.js and Express, and a web dashboard built with React 18, TypeScript, Vite, and Tailwind CSS. Real-time communication is handled with Socket.IO, and authentication is managed through Auth0. For AI and intelligence, we integrated the Gemini API for large-context conversations and file understanding, ElevenLabs for real-time speech transcription and natural text-to-speech, and face-api.js for browser-based face detection and recognition. Data is stored and synced using MongoDB Atlas. The server is deployed on a Vultr VPS, with pm2 for process management and ngrok for secure tunneling during development, and the dashboard is served from a .tech domain (alfredai.tech). All of these pieces work together to support real-time voice calls, face recognition, file search, and seamless syncing across devices.
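
To make the moving pieces concrete, here is a minimal sketch of what the desktop agent's connection to the server could look like with python-socketio and asyncio. The server URL, event names, and payloads are illustrative assumptions rather than the actual protocol:

```python
import asyncio
import socketio

# Hypothetical server URL and event names, for illustration only.
SERVER_URL = "https://example-alfr3d-server.tech"

sio = socketio.AsyncClient(reconnection=True, reconnection_delay=2)

@sio.event
async def connect():
    # Identify this device so the server can route commands to it.
    await sio.emit("agent:register", {"device": "desktop", "hostname": "my-pc"})

@sio.on("agent:command")
async def handle_command(data):
    # A command routed from the web dashboard or voice pipeline,
    # e.g. {"action": "file_search", "query": "valorant clip"}.
    result = await run_command(data)
    await sio.emit("agent:result", result)

async def run_command(data):
    # Placeholder dispatcher; the real agent maps actions to file search,
    # screenshots, clipboard, system info, and so on.
    return {"action": data.get("action"), "status": "ok"}

async def main():
    await sio.connect(SERVER_URL)
    await sio.wait()

if __name__ == "__main__":
    asyncio.run(main())
```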

Challenges we ran into

A lot of the hard parts came from making things work in real time. Getting clean audio from the browser, processing it live, and turning it into natural conversation was much harder than expected. Face recognition took a lot of tuning to avoid false matches. Keeping data synced across multiple devices without breaking when the internet drops forced us to think about offline support and fallback systems. We also had to deal with connection stability between the desktop agent and the server, and learn how to manage large AI contexts without making the system slow or expensive.
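
A common way to handle the offline case is a small local queue that captures events while the server is unreachable and replays them on reconnect. The sketch below illustrates that pattern only; the file path, format, and helper names are hypothetical, not our actual implementation:

```python
import json
import time
from pathlib import Path

# Hypothetical local fallback queue for events that could not be synced.
QUEUE_FILE = Path.home() / ".alfr3d" / "pending_events.jsonl"

def record_event(event: dict, send_fn) -> None:
    """Try to send an event to the server; queue it locally on failure."""
    try:
        send_fn(event)
    except ConnectionError:
        QUEUE_FILE.parent.mkdir(parents=True, exist_ok=True)
        with QUEUE_FILE.open("a") as f:
            f.write(json.dumps({"ts": time.time(), "event": event}) + "\n")

def flush_queue(send_fn) -> None:
    """On reconnect, replay any events captured while offline."""
    if not QUEUE_FILE.exists():
        return
    pending = [json.loads(line) for line in QUEUE_FILE.read_text().splitlines() if line]
    for item in pending:
        send_fn(item["event"])
    QUEUE_FILE.unlink()
```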

Accomplishments that we're proud of

End-to-end voice interaction - Users can speak naturally, the AI understands and executes commands on their computer, and it responds with natural speech
Automatic contact creation - When you meet someone and they introduce themselves, alfr3d automatically recognizes their face and creates a contact with extracted metadata (relationship, organization, tags)
Gemini function calling integration - Seamless routing of user requests to the correct desktop agent capability (file search, screenshots, system info, clipboard); see the sketch after this list
Beautiful real-time UI - Live face detection overlays, waveform audio visualization, responsive design with Tailwind CSS v4
Production deployment - Full Auth0 integration with protected routes and JWT token handling, a MongoDB Atlas database storing user information, model conversations, preferences, statistics, and more, an Express.js server deployed through Vultr's cloud services, and a web dashboard hosted on Vercel at alfredai.tech (temporarily on hold)
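
To illustrate the function-calling routing (not the exact production code), the google-generativeai Python SDK can take plain Python functions as tools and let Gemini decide which one a request maps to. The model name here is an assumption, and the tool bodies are stand-ins for the real agent capabilities:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")  # assumption: key comes from env/config

def search_files(query: str) -> list[str]:
    """Search indexed files by content and return matching paths."""
    # Stand-in for the desktop agent's real file search.
    return ["C:/Users/me/Videos/valorant_ace.mp4"]

def take_screenshot() -> str:
    """Capture the screen and return the saved image path."""
    return "C:/Users/me/Pictures/screenshot.png"

# Gemini reads the function signatures and docstrings and decides
# which tool (if any) a user request should be routed to.
model = genai.GenerativeModel(
    "gemini-1.5-pro",  # assumption: any large-context Gemini model works here
    tools=[search_files, take_screenshot],
)
chat = model.start_chat(enable_automatic_function_calling=True)
reply = chat.send_message("Find the clip where I got an ace in Valorant")
print(reply.text)
```

In a setup like ours, the selected tool call would typically be forwarded to the desktop agent over Socket.IO rather than executed inside the server process.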

What we learned

Gemini's function calling is powerful for building agentic AI systems that can interact with external tools
ElevenLabs voice tuning - Lower stability (0.35) + higher style (0.65) creates more expressive, natural-sounding AI voices (see the sketch after this list)
face-api.js can run entirely client-side with good performance using SSD MobileNet and TinyFaceDetector models
WebSocket architecture enables real-time bi-directional communication essential for voice interfaces
Hybrid storage patterns are crucial for building resilient apps that work both online and offline
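
For reference, those voice settings map directly onto the ElevenLabs text-to-speech REST API. This is a minimal sketch, not our production pipeline: the API key, voice ID, model choice, and similarity_boost value are placeholders, while the stability and style values are the ones mentioned above.

```python
import requests

API_KEY = "YOUR_ELEVENLABS_API_KEY"   # placeholder
VOICE_ID = "YOUR_VOICE_ID"            # placeholder

resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
    json={
        "text": "Good evening. Your recycling reminder is set for 8 PM.",
        "model_id": "eleven_turbo_v2",    # assumption: a low-latency model
        "voice_settings": {
            "stability": 0.35,             # lower stability -> more expressive delivery
            "similarity_boost": 0.75,      # illustrative value
            "style": 0.65,                 # higher style -> more character in the voice
        },
    },
)
resp.raise_for_status()

# The response body is the synthesized audio.
with open("reply.mp3", "wb") as f:
    f.write(resp.content)
```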

What's next for alfr3d

Smart Glasses Integration - Replace phone-based POV with actual AR glasses for hands-free visual memory
Environmental Tracking - Location-based carbon footprint tracking with eco-friendly suggestions
Daily Journal Generation - Automatic journaling from conversations and photos throughout the day
Proactive AI Suggestions - Context-aware recommendations based on location, time, and history
Calendar Integration - Google Calendar sync for proactive reminders and scheduling
Multi-user Support - Allow multiple users with separate memories and contacts
Advanced File Operations - Not just search, but organize, send, and automate file management
