Zeo.ai — Emotion-Aware AI Companion

Zeo.ai — Emotion-aware AI for compassionate, always-on companionship.
Shaping new frontiers in care with human-like 3D avatar conversations powered by facial and vocal emotion recognition and LLM-driven dialogue.


Inspiration

Over 1 in 8 people worldwide experience mental health challenges, and access to timely, stigma-free support is limited for millions. We built Zeo to bring presence, empathy, and immediate relief into people’s pockets — an always-on companion that senses emotion, listens, and helps reduce stress. Zeo fills the gap left by text-only bots by restoring nonverbal signals and creating a private, trustworthy space to share.


What it does

Zeo.ai is an emotion-aware AI companion that:

  • Reads facial expressions and vocal cues in real time to infer emotional state.
  • Responds using a lifelike 3D avatar that mirrors tone and expression.
  • Guides short, evidence-informed micro-interventions (breathing, grounding, journaling prompts) to help fight stress.
  • Offers contextual, ongoing conversations via LLM-driven replies with TTS and realistic lip-sync.
  • Prioritizes privacy: performs on-device inference where possible and uploads only anonymized embeddings with consent.
  • Includes safety flows: risk detection → human escalation or hotline connection, with clinician review options.

Zeo brings presence back into AI — not just answers, but a steady companion to share everything with.
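
At a high level, the per-turn data flow can be summarized with a few small types: per-frame facial and vocal signals are fused into an inferred emotional state, which then drives the avatar's expression, the suggested micro-intervention, and the LLM reply. The types below are an illustrative sketch, not the project's actual schema:

```typescript
// Illustrative types for the multimodal pipeline (names are hypothetical,
// not the project's actual schema).

/** Normalized signal from one modality (face, voice, or text). */
interface ModalitySignal {
  modality: "face" | "voice" | "text";
  emotion: "calm" | "stressed" | "sad" | "anxious" | "happy";
  confidence: number; // 0..1, from the per-modality model
  timestamp: number;  // ms since session start
}

/** Fused emotional state that drives the avatar and the LLM prompt. */
interface EmotionalState {
  dominantEmotion: ModalitySignal["emotion"];
  intensity: number; // 0..1
  riskFlag: boolean; // true triggers the safety/escalation flow
}

/** What the companion sends back to the client each turn. */
interface CompanionResponse {
  replyText: string;        // LLM-generated, spoken via TTS
  avatarExpression: string; // mirrors the user's inferred state
  suggestedIntervention?: "breathing" | "grounding" | "journaling";
}
```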


How we built it

Frontend

  • Framework: React 18 + TypeScript (Vite)
  • Styling: Tailwind CSS
  • Animation: Framer Motion
  • Avatar / Video: WebRTC + WebGL / three.js, Tavus API (prototype)
  • Speech: Web Speech API (browser-native STT where applicable) and TTS integrations
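
As one example of the browser-native speech path, a minimal speech-to-text helper built on the Web Speech API might look like the sketch below. The constructor is vendor-prefixed in some browsers, and the casts to `any` are there because SpeechRecognition typings are not part of every TypeScript DOM lib:

```typescript
// Minimal browser STT sketch using the Web Speech API.
// Falls back to the webkit-prefixed constructor where needed.
export function startListening(onTranscript: (text: string) => void): () => void {
  const SpeechRecognitionCtor =
    (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;
  if (!SpeechRecognitionCtor) {
    console.warn("Web Speech API not available; fall back to a cloud STT service.");
    return () => {};
  }

  const recognition = new SpeechRecognitionCtor();
  recognition.continuous = true;     // keep listening across pauses
  recognition.interimResults = true; // stream partial transcripts for low latency

  recognition.onresult = (event: any) => {
    const latest = event.results[event.results.length - 1];
    onTranscript(latest[0].transcript);
  };

  recognition.start();
  return () => recognition.stop(); // caller invokes this to stop listening
}
```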

Backend & Services

  • Runtime / Framework: Node.js + Express.js
  • Docs: OpenAPI / Swagger
  • Datastore: MongoDB / PostgreSQL (deployment dependent)
  • Microservices: Python-based ML inference services (optional)
  • Security: CORS, env-based configs, express-rate-limit
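
A stripped-down version of the server bootstrap, assuming the packages listed above; the route path, origin fallback, and rate limits are placeholders rather than the production values:

```typescript
// server.ts: minimal Express bootstrap with CORS, env-based config, and rate limiting.
import express from "express";
import cors from "cors";
import rateLimit from "express-rate-limit";

const app = express();
app.use(express.json());

// Allow only the configured frontend origin (env-based config).
app.use(cors({ origin: process.env.FRONTEND_ORIGIN ?? "http://localhost:5173" }));

// Throttle each IP: 100 requests per 15 minutes (placeholder limits).
app.use(rateLimit({ windowMs: 15 * 60 * 1000, max: 100 }));

// Hypothetical endpoint: accept an anonymized embedding plus transcript, return a reply.
app.post("/api/conversation", (req, res) => {
  const { embedding, transcript } = req.body;
  // ...forward to the emotion and LLM services here...
  res.json({ reply: "placeholder", received: { embedding, transcript } });
});

const port = Number(process.env.PORT ?? 3000);
app.listen(port, () => console.log(`Zeo backend listening on :${port}`));
```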

AI & Integrations

  • Emotion / Face: Microsoft Azure Face API (prototype)
  • Voice Analysis: Microsoft Speech Services; Deepgram / Google STT fallback
  • TTS: Eleven Labs (expressive synthesis)
  • Avatar: Tavus (prototype video/character)
  • LLM: Groq / OpenAI (contextual response generation)
  • Memory / Context: Supermemory.ai (opt-in session memory)
  • On-device inference: TensorFlow Lite / ONNX for privacy-preserving local models
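
For the LLM step, Groq and OpenAI both expose OpenAI-compatible chat-completion endpoints, so a single client can target either. The sketch below points the official `openai` SDK at Groq's base URL; the system prompt and model name are illustrative:

```typescript
// Contextual reply generation via an OpenAI-compatible endpoint (Groq or OpenAI).
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.GROQ_API_KEY,
  baseURL: "https://api.groq.com/openai/v1", // omit baseURL to use OpenAI instead
});

export async function generateReply(userText: string, emotion: string): Promise<string> {
  const completion = await client.chat.completions.create({
    model: "llama-3.1-8b-instant", // illustrative model name
    messages: [
      {
        role: "system",
        content:
          "You are Zeo, a supportive, non-clinical companion. " +
          `The user currently seems ${emotion}. Respond with empathy and brevity.`,
      },
      { role: "user", content: userText },
    ],
    temperature: 0.7,
  });
  return completion.choices[0]?.message?.content ?? "";
}
```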

Design & Safety

  • Edge-first design to reduce sensitive data transfer.
  • Explicit consent flows, pseudonymized analytics, and human-in-the-loop escalation.
  • Accessibility: captions, voice-first flows, large-text UI.
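
Conceptually, the human-in-the-loop escalation is a conservative gate that runs before any automated reply goes out. A simplified sketch, where thresholds, labels, and messages are placeholders rather than the shipped values:

```typescript
// Simplified safety gate: conservative thresholds, preferring escalation
// over a missed risk signal. Values and actions below are placeholders.
interface RiskAssessment {
  riskScore: number; // 0..1 from the risk-detection model
  selfHarmKeywords: boolean;
}

type SafetyAction =
  | { kind: "continue" }                      // normal companion flow
  | { kind: "show_hotline"; message: string } // surface hotline resources
  | { kind: "escalate_to_human" };            // queue for clinician review

export function safetyGate(assessment: RiskAssessment): SafetyAction {
  if (assessment.selfHarmKeywords || assessment.riskScore >= 0.8) {
    return { kind: "escalate_to_human" };
  }
  if (assessment.riskScore >= 0.5) {
    return {
      kind: "show_hotline",
      message: "It sounds like things are heavy right now. Would you like to talk to a person?",
    };
  }
  return { kind: "continue" };
}
```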

Challenges we ran into

  • Realism vs. uncanny valley: balancing expressive avatars with natural, humane pacing.
  • Privacy vs. accuracy: designing an edge-first pipeline to protect raw audio/video while keeping emotion detection reliable.
  • Safety & scope: ensuring Zeo supports users without making clinical claims — requires strict disclaimers and escalation.
  • Multimodal fusion: reliably combining face, voice, and text signals required careful calibration and diverse data (a simplified fusion sketch follows this list).
  • Latency & UX: optimizing models and network strategies to keep real-time interactions smooth.

Accomplishments that we're proud of

  • Winners @ Raise Your Hack — international validation of our approach.
  • Working prototype integrating 3D avatar rendering, STT, LLM-driven empathy flows, and basic emotion detection.
  • Privacy-first pipeline design with on-device inference and opt-in analytics.
  • Cross-functional core team ready for rapid prototyping and user testing.

What we learned

  • Trust is earned by design: timing, micro-expressions, and subtle animations greatly affect perceived empathy.
  • Data minimization builds confidence: users prefer local processing when the tradeoff is explained clearly.
  • Safety-first product thinking is essential: conservative detection thresholds and clear escalation are non-negotiable.
  • Multimodal signals improve sensitivity: facial + vocal cues outperform text-only systems for stress detection.
  • Pilot partnerships accelerate validation: clinician and counseling center collaborations are vital for ethical validation.

What's next for ZEO

Short-term (0–3 months)

  • Run a controlled pilot (n≈50) with university counseling partners to measure pre/post stress and engagement.
  • Replace prototype cloud emotion APIs with optimized, privacy-first on-device models (sketched after this list).
  • Expand avatar personalization (voice, appearance, cultural tuning) and refine lip-sync & micro-expression fidelity.
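
For that on-device direction, a browser-side emotion model could run through onnxruntime-web roughly as follows. The model path, input name, and 1×3×224×224 tensor shape are assumptions about a hypothetical exported face-emotion model:

```typescript
// Privacy-preserving local inference sketch with onnxruntime-web.
// "/models/emotion.onnx", the input name "input", and the tensor shape are
// assumptions about a hypothetical exported model.
import * as ort from "onnxruntime-web";

export async function inferEmotionLocally(pixels: Float32Array): Promise<Float32Array> {
  // In production the session would be created once and cached;
  // raw frames never leave the device.
  const session = await ort.InferenceSession.create("/models/emotion.onnx");

  const input = new ort.Tensor("float32", pixels, [1, 3, 224, 224]);
  const results = await session.run({ input });

  // Return the raw score vector; the caller maps indices to emotion labels.
  const output = results[session.outputNames[0]];
  return output.data as Float32Array;
}
```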

Mid-term (3–9 months)

  • Build clinician dashboard for flagged sessions and a safe, consented data-sharing path.
  • Conduct fairness audits across skin tones, ages, and languages; gather diverse training data.
  • Improve personalization via safe opt-in memory and reinforcement learning for engagement.

Long-term (9–18 months)

  • Pilot with NGO/university partners to validate impact and iterate safety protocols.
  • Explore quantum-ready, energy-efficient architectures for large-scale training and deployment.
  • Pursue clinical guidance for advanced care integrations while maintaining Zeo as a supportive companion (not a replacement for therapy).

Built With

  • axios
  • cors
  • deepgram
  • eleven-labs-tts
  • express-rate-limit
  • express.js
  • framer-motion
  • google-speech-to-text
  • groq
  • lucide-react
  • microsoft-azure-face-api
  • microsoft-speech-services
  • mongodb
  • node.js/npm
  • openai
  • openapi/swagger
  • postgresql
  • python-(ml-services)
  • react-18
  • react-context-api
  • supermemory.ai
  • tailwind-css
  • tavus-api
  • tensorflow-lite
  • three.js
  • typescript
  • vite-4
  • web-speech-api
  • webgl
  • webrtc