Zeo.ai — Emotion-Aware AI Companion
Zeo.ai — Emotion-aware AI for compassionate, always-on companionship.
Shaping new frontiers in care with human-like 3D avatar conversations powered by facial and vocal emotion recognition and conversational AI.
Inspiration
Over 1 in 8 people worldwide experience mental health challenges, yet timely, stigma-free support remains out of reach for millions. We built Zeo to put presence, empathy, and immediate relief in people's pockets: an always-on companion that senses emotion, listens, and helps reduce stress. Zeo fills the gap left by text-only bots by restoring nonverbal signals and creating a private, trustworthy space to share.
What it does
Zeo.ai is an emotion-aware AI companion that:
- Reads facial expressions and vocal cues in real time to infer emotional state.
- Responds using a lifelike 3D avatar that mirrors tone and expression.
- Guides short, evidence-informed micro-interventions (breathing, grounding, journaling prompts) to help reduce stress in the moment.
- Offers contextual, ongoing conversations via LLM-driven replies with TTS and realistic lip-sync.
- Prioritizes privacy: performs on-device inference where possible and uploads only anonymized embeddings with consent.
- Includes safety flows: risk detection → human escalation / hotline connection, with clinician review options (sketched at the end of this section).
Zeo brings presence back into AI — not just answers, but a steady companion to share everything with.
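The safety flow described in the list above can be illustrated with a minimal sketch. The `RiskSignal` shape, score thresholds, and actions below are hypothetical placeholders that show the intended risk-detection → escalation pattern, not the production logic:

```typescript
// Hypothetical sketch of the risk-detection → escalation flow described above.
// Types, thresholds, and actions are illustrative, not production values.

type RiskSignal = {
  textRisk: number;       // 0–1 score from screening the conversation transcript
  voiceDistress: number;  // 0–1 score from vocal-emotion analysis
  userRequestedHelp: boolean;
};

type SafetyAction = "continue" | "show_resources" | "connect_hotline";

function decideSafetyAction(signal: RiskSignal): SafetyAction {
  // Conservative thresholds: prefer a false positive over missed risk.
  if (signal.userRequestedHelp || signal.textRisk > 0.8) return "connect_hotline";
  if (signal.textRisk > 0.5 || signal.voiceDistress > 0.7) return "show_resources";
  return "continue";
}

// "connect_hotline" would hand the session to the human-escalation / hotline flow
// and flag it for optional clinician review; "show_resources" surfaces grounding
// exercises; "continue" keeps the normal conversation loop running.
```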
How we built it
Frontend
- Framework: React 18 + TypeScript (Vite)
- Styling: Tailwind CSS
- Animation: Framer Motion
- Avatar / Video: WebRTC + WebGL / three.js, Tavus API (prototype)
- Speech: Web Speech API (browser-native STT where supported) and TTS integrations
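As a concrete example of the browser-native STT path, here is a minimal Web Speech API sketch; support varies by browser (Chrome exposes it as `webkitSpeechRecognition`), and the `onTranscript` callback is a placeholder for Zeo's own transcript handler:

```typescript
// Minimal browser STT sketch using the Web Speech API.
// onTranscript is a hypothetical callback; availability and permissions vary by browser.

function startListening(onTranscript: (text: string, isFinal: boolean) => void) {
  const SpeechRecognitionImpl =
    (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;
  if (!SpeechRecognitionImpl) {
    console.warn("Web Speech API not available; fall back to a cloud STT service.");
    return null;
  }

  const recognition = new SpeechRecognitionImpl();
  recognition.continuous = true;      // keep listening across pauses
  recognition.interimResults = true;  // stream partial transcripts for low-latency UX

  recognition.onresult = (event: any) => {
    const result = event.results[event.results.length - 1];
    onTranscript(result[0].transcript, result.isFinal);
  };

  recognition.start();
  return recognition;
}
```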
Backend & Services
- Runtime / Framework: Node.js + Express.js
- Docs: OpenAPI / Swagger
- Datastore: MongoDB / PostgreSQL (deployment dependent)
- Microservices: Python-based ML inference services (optional)
- Security: CORS, env-based configs, express-rate-limit
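A minimal sketch of how these backend pieces typically fit together; the origin, port, and rate-limit values are placeholders rather than the project's actual configuration:

```typescript
// Minimal Express setup sketch: CORS, env-based config, JSON parsing, and basic rate limiting.
// Origin, port, and limits are placeholder values.
import express from "express";
import cors from "cors";
import rateLimit from "express-rate-limit";

const app = express();

app.use(cors({ origin: process.env.CLIENT_ORIGIN ?? "http://localhost:5173" }));
app.use(express.json());
app.use(
  rateLimit({
    windowMs: 60_000, // 1-minute window
    max: 60,          // at most 60 requests per IP per window
  })
);

app.get("/health", (_req, res) => res.json({ status: "ok" }));

const port = Number(process.env.PORT ?? 3000);
app.listen(port, () => console.log(`API listening on :${port}`));
```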
AI & Integrations
- Emotion / Face: Microsoft Azure Face API (prototype)
- Voice Analysis: Microsoft Speech Services; Deepgram / Google STT fallback
- TTS: ElevenLabs (expressive synthesis)
- Avatar: Tavus (prototype video/character)
- LLM: Groq / OpenAI (contextual response generation; see the fusion-and-prompt sketch after this list)
- Memory / Context: Supermemory.ai (opt-in session memory)
- On-device inference: TensorFlow Lite / ONNX for privacy-preserving local models
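How the face and voice signals come together before a reply is generated can be sketched as a simple late-fusion step. The emotion labels, weights, and `buildEmpathyPrompt` helper below are hypothetical illustrations, not the calibrated values used in the prototype:

```typescript
// Hypothetical late-fusion sketch: blend per-modality emotion scores into one estimate
// and fold the result into the prompt sent to the LLM. Labels and weights are illustrative.

type EmotionScores = Record<"calm" | "sad" | "anxious" | "angry" | "happy", number>;

function fuseEmotions(face: EmotionScores, voice: EmotionScores, faceWeight = 0.6): EmotionScores {
  const fused = {} as EmotionScores;
  for (const label of Object.keys(face) as (keyof EmotionScores)[]) {
    fused[label] = faceWeight * face[label] + (1 - faceWeight) * voice[label];
  }
  return fused;
}

function dominantEmotion(scores: EmotionScores): keyof EmotionScores {
  return (Object.entries(scores) as [keyof EmotionScores, number][]).reduce((best, cur) =>
    cur[1] > best[1] ? cur : best
  )[0];
}

// Hypothetical prompt assembly: the fused emotion steers tone, never clinical claims.
function buildEmpathyPrompt(userText: string, fused: EmotionScores): string {
  return [
    "You are Zeo, a warm, supportive, non-clinical companion.",
    `The user currently seems ${dominantEmotion(fused)}. Acknowledge the feeling and respond briefly.`,
    `User: ${userText}`,
  ].join("\n");
}
```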
Design & Safety
- Edge-first design to reduce sensitive data transfer.
- Explicit consent flows, pseudonymized analytics, and human-in-the-loop escalation (see the consent sketch below).
- Accessibility: captions, voice-first flows, large-text UI.
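The consent and data-minimization points above can be illustrated with a small sketch; the payload shape and endpoint are hypothetical stand-ins for the real analytics path:

```typescript
// Hypothetical sketch of the consent-gated, edge-first upload path: raw audio/video stays
// on the device; only an anonymized embedding is sent, and only after explicit opt-in.

type EmbeddingPayload = {
  sessionId: string;   // pseudonymous session identifier, no account linkage
  embedding: number[]; // fixed-size vector produced by the on-device model
  capturedAt: string;  // ISO timestamp
};

async function maybeUploadEmbedding(payload: EmbeddingPayload, hasConsent: boolean): Promise<void> {
  if (!hasConsent) {
    return; // data minimization: without opt-in, nothing leaves the device
  }
  // Placeholder endpoint for the pseudonymized analytics service.
  await fetch("https://api.example.com/v1/embeddings", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
}
```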
Challenges we ran into
- Realism vs. uncanny valley: balancing expressive avatars with natural, humane pacing.
- Privacy vs. accuracy: designing an edge-first pipeline to protect raw audio/video while keeping emotion detection reliable.
- Safety & scope: ensuring Zeo supports users without making clinical claims, which requires strict disclaimers and clear escalation paths.
- Multimodal fusion: reliably combining face, voice, and text signals required careful calibration and diverse data.
- Latency & UX: optimizing models and network strategies to keep real-time interactions smooth.
Accomplishments that we're proud of
- Winners @ Raise Your Hack — international validation of our approach.
- Working prototype integrating 3D avatar rendering, STT, LLM-driven empathy flows, and basic emotion detection.
- Privacy-first pipeline design with on-device inference and opt-in analytics.
- Cross-functional core team ready for rapid prototyping and user testing.
What we learned
- Trust is earned by design: timing, micro-expressions, and subtle animations greatly affect perceived empathy.
- Data minimization builds confidence: users prefer local processing when the tradeoff is explained clearly.
- Safety-first product thinking is essential: conservative detection thresholds and clear escalation are non-negotiable.
- Multimodal signals improve sensitivity: facial + vocal cues outperform text-only systems for stress detection.
- Pilot partnerships accelerate validation: clinician and counseling center collaborations are vital for ethical validation.
What's next for Zeo
Short-term (0–3 months)
- Run a controlled pilot (n≈50) with university counseling partners to measure pre/post stress and engagement.
- Replace prototype cloud emotion APIs with optimized, privacy-first on-device models.
- Expand avatar personalization (voice, appearance, cultural tuning) and refine lip-sync & micro-expression fidelity.
Mid-term (3–9 months)
- Build clinician dashboard for flagged sessions and a safe, consented data-sharing path.
- Conduct fairness audits across skin tones, ages, and languages; gather diverse training data.
- Improve personalization via safe opt-in memory and reinforcement learning for engagement.
Long-term (9–18 months)
- Pilot with NGO/university partners to validate impact and iterate safety protocols.
- Explore quantum-ready, energy-efficient architectures for large-scale training and deployment.
- Pursue clinical guidance for advanced care integrations while maintaining Zeo as a supportive companion (not a replacement for therapy).
Built With
- axios
- cors
- deepgram
- eleven-labs-tts
- express-rate-limit
- express.js
- framer-motion
- google-speech-to-text
- groq
- lucide-react
- microsoft-azure-face-api
- microsoft-speech-services
- mongodb
- node.js/npm
- openai
- openapi/swagger
- postgresql
- python-(ml-services)
- react-18
- react-context-api
- supermemory.ai
- tailwind-css
- tavus-api
- tensorflow-lite
- three.js
- typescript
- vite-4
- web-speech-api
- webgl
- webrtc


