Zeo.ai — Emotion-Aware AI Companion
Zeo.ai — Emotion-aware AI for compassionate, always-on companionship.
Shaping new frontiers in care with human-like 3D avatar conversations powered by facial and vocal emotion recognition and conversational AI.
Inspiration
Over 1 in 8 people worldwide experience mental health challenges, yet timely, stigma-free support remains out of reach for millions. We built Zeo to put presence, empathy, and immediate relief in people's pockets: an always-on companion that senses emotion, listens, and helps reduce stress. Zeo fills the gap left by text-only bots by restoring nonverbal signals and creating a private, trustworthy space to share.
What it does
Zeo.ai is an emotion-aware AI companion that:
- Reads facial expressions and vocal cues in real time to infer emotional state.
- Responds using a lifelike 3D avatar that mirrors tone and expression.
- Guides short, evidence-informed micro-interventions (breathing, grounding, journaling prompts) to help reduce stress in the moment.
- Offers contextual, ongoing conversations via LLM-driven replies with TTS and realistic lip-sync.
- Prioritizes privacy: performs on-device inference where possible and uploads only anonymized embeddings with consent.
- Includes safety flows: risk detection → human escalation / hotline connection, with clinician review options (sketched at the end of this section).
Zeo brings presence back into AI — not just answers, but a steady companion to share everything with.
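The safety flow described in the list above can be illustrated with a minimal sketch. The `RiskSignal` shape, score thresholds, and actions below are hypothetical placeholders that show the intended risk-detection → escalation pattern, not the production logic:

```typescript
// Hypothetical sketch of the risk-detection → escalation flow described above.
// Types, thresholds, and actions are illustrative, not production values.

type RiskSignal = {
  textRisk: number;       // 0–1 score from screening the conversation transcript
  voiceDistress: number;  // 0–1 score from vocal-emotion analysis
  userRequestedHelp: boolean;
};

type SafetyAction = "continue" | "show_resources" | "connect_hotline";

function decideSafetyAction(signal: RiskSignal): SafetyAction {
  // Conservative thresholds: prefer a false positive over missed risk.
  if (signal.userRequestedHelp || signal.textRisk > 0.8) return "connect_hotline";
  if (signal.textRisk > 0.5 || signal.voiceDistress > 0.7) return "show_resources";
  return "continue";
}

// "connect_hotline" would hand the session to the human-escalation / hotline flow
// and flag it for optional clinician review; "show_resources" surfaces grounding
// exercises; "continue" keeps the normal conversation loop running.
```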
How we built it
Frontend
- Framework: React 18 + TypeScript (Vite)
- Styling: Tailwind CSS
- Animation: Framer Motion
- Avatar / Video: WebRTC + WebGL / three.js, Tavus API (prototype)
- Speech: Web Speech API (browser-native STT where supported) and TTS integrations
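As a concrete example of the browser-native STT path, here is a minimal Web Speech API sketch; support varies by browser (Chrome exposes it as `webkitSpeechRecognition`), and the `onTranscript` callback is a placeholder for Zeo's own transcript handler:

```typescript
// Minimal browser STT sketch using the Web Speech API.
// onTranscript is a hypothetical callback; availability and permissions vary by browser.

function startListening(onTranscript: (text: string, isFinal: boolean) => void) {
  const SpeechRecognitionImpl =
    (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;
  if (!SpeechRecognitionImpl) {
    console.warn("Web Speech API not available; fall back to a cloud STT service.");
    return null;
  }

  const recognition = new SpeechRecognitionImpl();
  recognition.continuous = true;      // keep listening across pauses
  recognition.interimResults = true;  // stream partial transcripts for low-latency UX

  recognition.onresult = (event: any) => {
    const result = event.results[event.results.length - 1];
    onTranscript(result[0].transcript, result.isFinal);
  };

  recognition.start();
  return recognition;
}
```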
Backend & Services
- Runtime / Framework: Node.js + Express.js
- Docs: OpenAPI / Swagger
- Datastore: MongoDB / PostgreSQL (deployment dependent)
- Microservices: Python-based ML inference services (optional)
- Security: CORS, env-based configs, express-rate-limit
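A minimal sketch of how these backend pieces typically fit together; the origin, port, and rate-limit values are placeholders rather than the project's actual configuration:

```typescript
// Minimal Express setup sketch: CORS, env-based config, JSON parsing, and basic rate limiting.
// Origin, port, and limits are placeholder values.
import express from "express";
import cors from "cors";
import rateLimit from "express-rate-limit";

const app = express();

app.use(cors({ origin: process.env.CLIENT_ORIGIN ?? "http://localhost:5173" }));
app.use(express.json());
app.use(
  rateLimit({
    windowMs: 60_000, // 1-minute window
    max: 60,          // at most 60 requests per IP per window
  })
);

app.get("/health", (_req, res) => res.json({ status: "ok" }));

const port = Number(process.env.PORT ?? 3000);
app.listen(port, () => console.log(`API listening on :${port}`));
```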
AI & Integrations
- Emotion / Face: Microsoft Azure Face API (prototype)
- Voice Analysis: Microsoft Speech Services; Deepgram / Google STT fallback
- TTS: ElevenLabs (expressive synthesis)
- Avatar: Tavus (prototype video/character)
- LLM: Groq / OpenAI (contextual response generation; see the fusion-and-prompt sketch after this list)
- Memory / Context: Supermemory.ai (opt-in session memory)
- On-device inference: TensorFlow Lite / ONNX for privacy-preserving local models
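How the face and voice signals come together before a reply is generated can be sketched as a simple late-fusion step. The emotion labels, weights, and `buildEmpathyPrompt` helper below are hypothetical illustrations, not the calibrated values used in the prototype:

```typescript
// Hypothetical late-fusion sketch: blend per-modality emotion scores into one estimate
// and fold the result into the prompt sent to the LLM. Labels and weights are illustrative.

type EmotionScores = Record<"calm" | "sad" | "anxious" | "angry" | "happy", number>;

function fuseEmotions(face: EmotionScores, voice: EmotionScores, faceWeight = 0.6): EmotionScores {
  const fused = {} as EmotionScores;
  for (const label of Object.keys(face) as (keyof EmotionScores)[]) {
    fused[label] = faceWeight * face[label] + (1 - faceWeight) * voice[label];
  }
  return fused;
}

function dominantEmotion(scores: EmotionScores): keyof EmotionScores {
  return (Object.entries(scores) as [keyof EmotionScores, number][]).reduce((best, cur) =>
    cur[1] > best[1] ? cur : best
  )[0];
}

// Hypothetical prompt assembly: the fused emotion steers tone, never clinical claims.
function buildEmpathyPrompt(userText: string, fused: EmotionScores): string {
  return [
    "You are Zeo, a warm, supportive, non-clinical companion.",
    `The user currently seems ${dominantEmotion(fused)}. Acknowledge the feeling and respond briefly.`,
    `User: ${userText}`,
  ].join("\n");
}
```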
Design & Safety
- Edge-first design to reduce sensitive data transfer.
- Explicit consent flows, pseudonymized analytics, and human-in-the-loop escalation (see the consent sketch below).
- Accessibility: captions, voice-first flows, large-text UI.
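The consent and data-minimization points above can be illustrated with a small sketch; the payload shape and endpoint are hypothetical stand-ins for the real analytics path:

```typescript
// Hypothetical sketch of the consent-gated, edge-first upload path: raw audio/video stays
// on the device; only an anonymized embedding is sent, and only after explicit opt-in.

type EmbeddingPayload = {
  sessionId: string;   // pseudonymous session identifier, no account linkage
  embedding: number[]; // fixed-size vector produced by the on-device model
  capturedAt: string;  // ISO timestamp
};

async function maybeUploadEmbedding(payload: EmbeddingPayload, hasConsent: boolean): Promise<void> {
  if (!hasConsent) {
    return; // data minimization: without opt-in, nothing leaves the device
  }
  // Placeholder endpoint for the pseudonymized analytics service.
  await fetch("https://api.example.com/v1/embeddings", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
}
```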
Challenges we ran into
- Realism vs. uncanny valley: balancing expressive avatars with natural, humane pacing.
- Privacy vs. accuracy: designing an edge-first pipeline to protect raw audio/video while keeping emotion detection reliable.
- Safety & scope: ensuring Zeo supports users without making clinical claims, which requires strict disclaimers and clear escalation paths.
- Multimodal fusion: reliably combining face, voice, and text signals required careful calibration and diverse data.
- Latency & UX: optimizing models and network strategies to keep real-time interactions smooth.
Accomplishments that we're proud of
- Winners @ Raise Your Hack — international validation of our approach.
- Working prototype integrating 3D avatar rendering, STT, LLM-driven empathy flows, and basic emotion detection.
- Privacy-first pipeline design with on-device inference and opt-in analytics.
- Cross-functional core team ready for rapid prototyping and user testing.
What we learned
- Trust is earned by design: timing, micro-expressions, and subtle animations greatly affect perceived empathy.
- Data minimization builds confidence: users prefer local processing when the tradeoff is explained clearly.
- Safety-first product thinking is essential: conservative detection thresholds and clear escalation are non-negotiable.
- Multimodal signals improve sensitivity: facial + vocal cues outperform text-only systems for stress detection.
- Pilot partnerships accelerate validation: clinician and counseling center collaborations are vital for ethical validation.
What's next for Zeo
Short-term (0–3 months)
- Run a controlled pilot (n≈50) with university counseling partners to measure pre/post stress and engagement.
- Replace prototype cloud emotion APIs with optimized, privacy-first on-device models.
- Expand avatar personalization (voice, appearance, cultural tuning) and refine lip-sync & micro-expression fidelity.
Mid-term (3–9 months)
- Build clinician dashboard for flagged sessions and a safe, consented data-sharing path.
- Conduct fairness audits across skin tones, ages, and languages; gather diverse training data.
- Improve personalization via safe opt-in memory and reinforcement learning for engagement.
Long-term (9–18 months)
- Pilot with NGO/university partners to validate impact and iterate safety protocols.
- Explore quantum-ready, energy-efficient architectures for large-scale training and deployment.
- Pursue clinical guidance for advanced care integrations while maintaining Zeo as a supportive companion (not a replacement for therapy).
Built With
- axios
- cors
- deepgram
- eleven-labs-tts
- express-rate-limit
- express.js
- framer-motion
- google-speech-to-text
- groq
- lucide-react
- microsoft-azure-face-api
- microsoft-speech-services
- mongodb
- node.js/npm
- openai
- openapi/swagger
- postgresql
- python-(ml-services)
- react-18
- react-context-api
- supermemory.ai
- tailwind-css
- tavus-api
- tensorflow-lite
- three.js
- typescript
- vite-4
- web-speech-api
- webgl
- webrtc


