Style Shepherd : Agentic Shopping Experience

https://cloud.cerebras.ai/
netlify

Inspiration

Style Shepherd started as a single question: how do we make shopping feel like talking to a trusted stylist who remembers you, not a faceless recommender? For the Agentic Shopping Experience prize we doubled down on agent-first design: small, auditable specialist agents coordinated by a supervisor/orchestrator, persistent memory for multi-turn context, and real-time collaboration between agents and UI.

Modern fashion e-commerce faces a critical challenge: online apparel return rates average 24–40%, costing the industry over $200 billion annually. Most returns stem from sizing uncertainty, style mismatches, and an impersonal one-size-fits-all experience. We set out to address this with a real-time, agentic voice assistant that lets users shop by voice, matches their fit and style preferences, and reduces waste through deep learning and predictive analytics. We believe that combining multi-modal AI, fast infrastructure, and a genuinely personal experience could reshape how the world shops for fashion online.

Key influences:

Agentic AI research (Supervisor → Orchestrator → Specialist pattern) for breaking down complex shopping tasks into reliable subroutines.
Headless commerce: separate presentation, cart & payment orchestration for composability across web, voice, and agent channels.
Production/ops best practices: monitoring, CI/CD, and scheduled retraining.
Raindrop Smart Components (LiquidMetal): SmartMemory, SmartBuckets, SmartSQL, SmartInference — used to shortcut common primitives for agent memory, visual search, structured queries, and managed inference.

What it does

Style Shepherd is an agent-driven, headless fashion assistant and end-to-end AI commerce platform with a voice-first experience. It supports the full shopping funnel from discovery → recommendation → cart → checkout, with auditability and explainability. Users can:

- Shop for clothing by talking or texting with a conversational AI that understands preferences, intent, and context.
- Get hyper-personalized recommendations, including optimal size (reducing fit-related returns), style guidance, and alternative outfit suggestions.
- Instantly see real-time return risk analysis for every product and size.
- Receive explanations for every AI decision and suggestion, increasing trust and transparency.
- Complete purchases securely, with AI-driven dynamic promotions and Stripe payment.
- Track orders and returns, and get sustainability metrics, all in one place.

Enter your measurements once, speak naturally about your style or the occasion, and Style Shepherd guides you to fashion that fits and suits you, while reducing the carbon/waste impact of e-commerce returns.

Core capabilities (technical summary):

Supervisor / Orchestrator
- Parses intent (LLM prompt), routes tasks to agents, aggregates responses.
- Interface: analyze(input: UserQuery) => OrchestratorResponse
- Implementation: src/agents/orchestrator.ts, server/orchestrator-worker.ts
Specialist Agents (each implements analyze(input) => AgentResponse)
- PersonalShopper — curates outfit bundles; uses product graph and cosine-similarity ranking. (src/agents/personal-shopper.ts)
- MakeupArtist — CV-based skin-tone analysis + product routine generation (src/agents/makeup-artist.ts)
- SizePredictor — per-brand size models returning size + confidence. (src/agents/size-predictor.ts)
- ReturnsPredictor — ML classifier on 55+ features producing risk score + explainability. (src/agents/returns-predictor.ts)
- OutfitBuilder — composes 3–5 item outfits, validates against constraints, scores with style harmony metrics. (pages/api/functions/v1/outfit-builder.ts)

Raindrop Smart Components integrated (see src/services/raindrop/*):
- SmartMemory — persistent user/session memory for multi-turn personalization. (src/services/raindrop/userMemoryService.ts)
- SmartBuckets — product image storage + visual search primitives for image-similarity in the PersonalShopper and OutfitBuilder. (src/services/raindrop/productBucketsService.ts)
- SmartSQL — structured query access for orders/catalog/returns across agents. (src/services/raindrop/orderSQLService.ts)
- SmartInference — optional inference primitives used as a managed inference fallback for style recommendations (src/services/raindrop/styleInferenceService.ts)
Search & Ranking
- Intent extraction (LLM) → embeddings → vector ANN search → multi-factor ranking: semantic(35%) + intent(25%) + price(15%) + availability(10%) + sustainability(8%) + brandAffinity(7%). (lib/embeddingService.ts, lib/productRanker.ts)
TTS & Voice Concierge
- ElevenLabs proxied via pages/api/functions/v1/elevenlabs-voices.ts + components/ElevenLabsVoiceSelector.tsx.
Seamless purchasing
- Headless cart + payment orchestration (Stripe patterns) + agent-initiated cart building with explicit user confirm. (src/lib/stripe.ts, src/services/payment-manager.ts)
Monitoring & Ops
- Prometheus metrics, OpenTelemetry traces, structured JSON logs (winston), /api/metrics for scrapes and dashboards. (src/monitoring.ts)
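
The multi-factor ranking listed under Search & Ranking can be read as a weighted sum. The following is a minimal sketch, not the contents of lib/productRanker.ts: the weights come from the list above, while the type names, the function names, and the assumption that every factor is already normalized to [0, 1] are hypothetical.

```typescript
// Hypothetical shape: each factor is assumed to be pre-normalized to [0, 1].
interface FactorScores {
  semantic: number;
  intent: number;
  price: number;
  availability: number;
  sustainability: number;
  brandAffinity: number;
}

// Weights taken from the ranking description above; they sum to 1.
const WEIGHTS: FactorScores = {
  semantic: 0.35,
  intent: 0.25,
  price: 0.15,
  availability: 0.1,
  sustainability: 0.08,
  brandAffinity: 0.07,
};

// Weighted sum of the six factors for one product.
function rankScore(factors: FactorScores): number {
  return (Object.keys(WEIGHTS) as (keyof FactorScores)[]).reduce(
    (sum, key) => sum + WEIGHTS[key] * factors[key],
    0,
  );
}

interface Candidate {
  id: string;
  factors: FactorScores;
}

// Returns a new array sorted from highest to lowest score.
function rankProducts(candidates: Candidate[]): Candidate[] {
  return [...candidates].sort(
    (a, b) => rankScore(b.factors) - rankScore(a.factors),
  );
}
```

Because the weights sum to 1, a product that scores 1 on every factor gets an overall score of 1, which keeps scores comparable across queries.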

Example demo endpoints:

POST /functions/v1/style-recommendations   # 55+ feature engine
POST /functions/v1/outfit-builder
GET  /functions/v1/elevenlabs-voices      # proxied voice list
POST /api/agents/orchestrate              # orchestrator entrypoint
GET  /api/metrics                         # Prometheus scrape

How we built it

Frontend: Next.js, React, and Tailwind create a lightning-fast, mobile-friendly UI with accessibility-first design, animated micro-interactions, and a fully integrated voice interface (ElevenLabs).
Backend: Node.js (TypeScript) with modular Express APIs; all user state, catalog, and session data managed across Raindrop SmartMemory, SmartSQL, and SmartBuckets for rapid scalability.
AI Orchestration: Python ML models (Random Forest, Gradient Boosting, deep neural nets) deployed on Vultr GPU/CPU instances, serving real-time size/returns prediction and personalized recommendations via REST endpoints.
Voice AI: ElevenLabs API for near-human Text-to-Speech and Speech-to-Text, fine-tuned on fashion domain vocabulary for maximal accuracy.
DevOps: Raindrop YAML deployment with CI/CD pipeline, scalable from 2–10+ nodes, monitoring with Prometheus/Grafana, secure environment variable management, error tracking for reliability.
Integrations: WorkOS (SSO), Stripe (secure checkout), SmartInference (AI agent orchestration), and Cloudflare (media/CDN).

Agentic architecture — technical details

We followed a Supervisor → Orchestrator → Specialist pattern.

Supervisor / Orchestrator (src/agents/orchestrator.ts)
- parseIntent(text) -> Intent (LLM prompt template)
- routeTasks(intent) -> [agentNames] (rules + intent-confidence thresholds)
- dispatch(agentName, payload) via internal agent runner. Agents run in-process (fast path) or via worker (heavy path).
- Inter-agent comms: Supabase realtime channels: agents:session:{sessionId} for async collaboration and broadcast.

Agent interface

interface Agent {
  name: string;
  analyze(input: UserQuery): Promise<AgentResponse>;
}

Agents publish AgentResponse objects to the Supabase realtime channel and to the agent_responses table for audit, replay, and debugging.
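
The parse, route, and dispatch steps can be sketched as follows. This is an illustration under stated assumptions, not the code in src/agents/orchestrator.ts: the keyword-based parseIntent stands in for the LLM prompt, and the routing table, the 0.5 threshold, the agent names other than size-predictor, and the orchestrate function are hypothetical. The Agent interface is repeated so the block is self-contained.

```typescript
interface UserQuery { text: string; sessionId: string; userId: string; }
interface AgentResponse { agent: string; payload: unknown; }
interface Agent { name: string; analyze(input: UserQuery): Promise<AgentResponse>; }
interface Intent { label: string; confidence: number; }

// Stand-in for the LLM-backed parseIntent: a keyword rule, for illustration only.
function parseIntent(text: string): Intent {
  const t = text.toLowerCase();
  if (/\b(size|fit)\b/.test(t)) return { label: "sizing", confidence: 0.9 };
  if (/\b(wear|outfit)\b/.test(t)) return { label: "styling", confidence: 0.8 };
  return { label: "unknown", confidence: 0.3 };
}

// Hypothetical routing table and confidence threshold.
const ROUTES: Record<string, string[]> = {
  sizing: ["size-predictor", "returns-predictor"],
  styling: ["personal-shopper", "outfit-builder"],
};
const MIN_CONFIDENCE = 0.5;

// Rules plus a confidence threshold: low-confidence intents route to no agent.
function routeTasks(intent: Intent): string[] {
  if (intent.confidence < MIN_CONFIDENCE) return [];
  return ROUTES[intent.label] ?? [];
}

// Dispatches to all routed agents in parallel and aggregates their responses.
async function orchestrate(
  query: UserQuery,
  registry: Map<string, Agent>,
): Promise<AgentResponse[]> {
  const names = routeTasks(parseIntent(query.text));
  const results = await Promise.all(
    names.map(async (name) => {
      const agent = registry.get(name);
      if (!agent) return null; // unregistered agents are skipped
      try {
        return await agent.analyze(query);
      } catch {
        return null; // one failing agent should not fail the whole turn
      }
    }),
  );
  return results.filter((r): r is AgentResponse => r !== null);
}
```

Catching errors per agent means a slow or broken specialist degrades the answer instead of failing the request; a real implementation would presumably also log the failure for the audit trail.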

Raindrop integration (LiquidMetal)

SmartMemory: store user preferences, previous purchases, saved outfits; used to seed prompts and to build long-term personalization. (src/services/raindrop/userMemoryService.ts)
SmartBuckets: store product images + visual features for fast similarity queries; OutfitBuilder uses this to surface visually coherent pieces. (src/services/raindrop/productBucketsService.ts)
SmartSQL: used by orchestrator/agents for quick joins across orders/catalog without duplicating logic. (src/services/raindrop/orderSQLService.ts)
SmartInference: used as optional managed inference path when local models unavailable. (src/services/raindrop/styleInferenceService.ts)

Data & models

Return risk: ensemble classifier trained on historical returns with 55+ engineered features (user behavior, product attributes, transaction context, interaction patterns). Service: src/services/returnRisk/return_risk_prediction.ts.
Size predictor: per-brand models with calibration & confidence outputs.
Embeddings: adapter pattern for embeddings — HF/online or local ggml/xenova fallback for offline demos.
Explainability: lightweight SHAP-like factor lists (top 5 reasons).
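
A compact factor list of this kind can be produced by keeping only the largest contributions. The sketch below assumes per-feature contributions have already been computed elsewhere (for example by a SHAP-style attribution step); it does not compute them, and the Contribution type and topFactors function are hypothetical names.

```typescript
// Hypothetical shape: a signed contribution of one feature to a prediction.
interface Contribution {
  feature: string;
  value: number; // positive pushes the score up, negative pushes it down
}

// Keeps the k contributions with the largest absolute value, strongest first.
function topFactors(contributions: Contribution[], k = 5): Contribution[] {
  return [...contributions]
    .sort((a, b) => Math.abs(b.value) - Math.abs(a.value))
    .slice(0, k);
}
```

Sorting by absolute value keeps strong negative factors (reasons an item is less risky) in the list alongside positive ones.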

Dev & ops

CI/CD: GitHub Actions (.github/workflows/ci.yml, deploy.yml, ai-retrain.yml); scheduled retrain jobs for model freshness.
Process manager: ecosystem.config.js for PM2 (web + agent-worker).
Monitoring: src/monitoring.ts exports histograms/counters; metrics endpoints wired to /api/metrics.

File highlights (where to inspect)

Agents: src/agents/*
Raindrop services: src/services/raindrop/*
ElevenLabs proxy: pages/api/functions/v1/elevenlabs-voices.ts
Outfit builder & APIs: pages/api/functions/v1/outfit-builder.ts
Monitoring: src/monitoring.ts, pages/api/metrics.ts
CI/CD workflows: .github/workflows/*

Challenges we ran into

High-quality, diverse sizing and returns data is fragmented; normalizing across brands, body types, and formats required significant up-front engineering and feedback loops.
Balancing speed (UX) against model complexity (accuracy) in real-time ML inference pipelines on a limited compute budget.
Building explainability directly into AI outputs (so users and admins could see and trust why an item gets recommended or flagged as risky).
Ensuring seamless voice UX, especially with varied accents, edge-case query formulations, and device compatibility.
Keeping everything privacy-first (GDPR/CCPA compliance) while managing sensitive profile and transaction data across multiple regions.

Securely integrating third-party services while keeping demos reproducible
- Fix: server-side proxies + .env.example. Sensitive keys never go to the client.
Multi-agent orchestration & race conditions
- Fix: use Supabase channels to synchronize asynchronous agent responses and persist intermediate results for replay.
Combinatorial outfit composition
- Fix: prefilter by inventory/style cluster, greedy beam search with scoring function, return top-K with alternatives.
Explainability vs. latency
- Fix: produce compact top-5 factors inline; heavy SHAP-style reports are background jobs.
Bias & fairness
- Fix: subgroup parity unit tests, calibration checks, and logging to detect skew. Tests run in CI and are included in returnRiskPrediction.test.ts.
Retraining governance
- Fix: schedule retrains via GitHub Actions (ai-retrain.yml), validate on dev tests and smoke-tests before auto-deploy.
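
The outfit-composition fix (prefilter, beam search with a scoring function, top-K) can be sketched as below. This is an assumption-laden illustration rather than the OutfitBuilder implementation: the Item shape, the slot names, and the harmony score (mean pairwise cosine similarity of style vectors) are all hypothetical.

```typescript
// Hypothetical item shape; styleVec is assumed to be a style embedding.
interface Item { id: string; slot: string; inStock: boolean; styleVec: number[]; }
type Outfit = Item[];

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Hypothetical scoring function: mean pairwise similarity of the items so far.
function harmony(outfit: Outfit): number {
  if (outfit.length < 2) return 0;
  let sum = 0, pairs = 0;
  for (let i = 0; i < outfit.length; i++) {
    for (let j = i + 1; j < outfit.length; j++) {
      sum += cosine(outfit[i].styleVec, outfit[j].styleVec);
      pairs++;
    }
  }
  return sum / pairs;
}

// Fills one slot at a time, keeping only the beamWidth best partial outfits.
// Returns an empty list if any slot has no in-stock option.
function buildOutfits(
  catalog: Item[], slots: string[], beamWidth: number, topK: number,
): Outfit[] {
  const available = catalog.filter((item) => item.inStock); // prefilter step
  let beam: Outfit[] = [[]];
  for (const slot of slots) {
    const options = available.filter((item) => item.slot === slot);
    const expanded: Outfit[] = [];
    for (const partial of beam) {
      for (const item of options) expanded.push([...partial, item]);
    }
    expanded.sort((a, b) => harmony(b) - harmony(a));
    beam = expanded.slice(0, beamWidth);
  }
  return beam.slice(0, topK); // best outfit first, alternatives after it
}
```

The beam width bounds the work per slot at beamWidth × options instead of the full cross product. Since single-item outfits all score 0 here, the first slot is pruned in catalog order, so a beam at least as wide as the first slot's options avoids discarding good starts.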

Accomplishments that we're proud of

Reduced fit-related return risks by up to 35% in pilot tests, with size predictions matching user expectations in >85% of cases.
Built a genuinely conversational, real-time shopping experience—including SmartMemory for long-term learning from every interaction.
Achieved explainable AI for both users (why a product/size was chosen) and retailers (risk factors behind predicted returns).
End-to-end CI/CD: live updates, monitored deploys, and on-demand scaling.
Received positive user feedback for both accessibility and inclusivity: virtual try-on accommodates non-standard sizes, and visual results reflect global diversity.
Agentic Shopping Experience: Modular agent design enables multi-turn, multi-modal flows (text, voice, image) that are auditable and reproducible.
Raindrop SmartComponents: Leveraged SmartMemory, SmartBuckets, SmartSQL, and SmartInference to significantly reduce engineering time for memory & visual search primitives.
Production-minded demos: Single-file demo pages, server proxies, and a Judge Checklist that make the product demoable in 90s without secrets.
Explainability & audit trails: agent_responses table + traceable prompts + compact top-factor explanations for transparency.
Operational maturity: metrics, traces, and CI/CD including scheduled model retrains — demonstrates readiness for production and continuous improvement.

What we learned

Choosing the right feedback and measurable confidence scores builds trust—users are more likely to follow recommendations, and retailers better understand their audience and operations.
Multi-modal AI (voice, text, structured data, images, feedback) is key: no single signal delivers enough context for truly personal commerce.
Explainability is not just for compliance or admin dashboards—it dramatically increases conversion and satisfaction for end users.
Designing agentic workflows (and nudging the user at critical friction points, like high return risk) yields both better outcomes and more user engagement.
Agent modularity is powerful: narrow specialists (size, returns, style, TTS) compose into complex behaviors while remaining testable and auditable.
Memory + context are differentiators: SmartMemory (Raindrop) + Supabase session channels let agents build long-term personalization safely and incrementally.
Reproducible demos are non-negotiable: real APIs fail under demo load.
Operational instrumentation early: tracing and metrics clarified race conditions and performance bottlenecks in multi-agent orchestration.
Fairness must be engineered: bake subgroup tests & calibration in the pipeline, not as an afterthought.

What's next for Style Shepherd

For Style Shepherd as a product:

Add AI-powered live video try-on, bringing realistic draping to the mobile shopping experience.
Integrate direct-to-closet “fit journals” that sync user feedback, returns, and physical store visits anonymously for ever-improving model training.
Broader multi-lingual and global retailer support (right-to-left scripts and regional fit nuances), plus open APIs for sustainable or upcycled fashion brands.
Expand the explainable AI dashboard to include carbon/waste reduction stats for eco-conscious shoppers and corporate clients.
Launch Style Shepherd SDK/white label for fashion brands and other verticals (e.g., sportswear, uniform suppliers, kidswear).

Focus areas for the Agentic Shopping Experience prize:

Agent-Level SLAs & Contracts
- Formalize agent contracts (inputs, outputs, costs, token usage) and enforce via schema checks and CI tests. Implement auto-contract tests in GitHub Actions.
Model Ops & Drift Alerts
- Add automated drift detection (KL-divergence on features), nightly aggregation jobs, and alerting to Slack/PagerDuty when drift or calibration breaks.
Raindrop-Backed Personalization Pipelines
- Expand SmartMemory usage: episodic summaries, privacy-preserving recall, and user-control UIs for memory deletion / export.
Agent governance & explainability
- Attach per-response provenance (prompt snapshot, model version, tools used) to every agent_response. Build a judge-facing audit UI to replay agent decisions step-by-step.
One-click & Voice-first checkout
- Harden payment orchestration for secure one-click experiences coupled with returns-risk gating and fraud checks.
Edge & local inference options
- Integrate Xenova/ggml embeddings for on-device similarity and a hybrid inference path to reduce latency and cost.
Agent marketplace & plugin tooling
- Provide a plugin spec so others can drop in new domain agents (sizing, tailor-matching, sustainability rater) with automatic registration to the orchestrator.
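
The KL-divergence drift check mentioned under Model Ops & Drift Alerts could look roughly like this. It is a sketch of planned work, so everything here is an assumption: the function names, the use of binned feature counts, the additive smoothing constant, and the 0.1 alert threshold.

```typescript
// Converts raw bin counts to probabilities, with additive smoothing so that
// empty bins do not produce log(0) or division by zero.
function toDistribution(counts: number[], eps: number): number[] {
  const total = counts.reduce((s, c) => s + c, 0) + eps * counts.length;
  return counts.map((c) => (c + eps) / total);
}

// KL(live || reference) in nats over histograms that share the same bins.
function klDivergence(
  liveCounts: number[], referenceCounts: number[], eps = 1e-6,
): number {
  if (liveCounts.length !== referenceCounts.length) {
    throw new Error("Histograms must use the same bins");
  }
  const p = toDistribution(liveCounts, eps);
  const q = toDistribution(referenceCounts, eps);
  return p.reduce((sum, pi, i) => sum + pi * Math.log(pi / q[i]), 0);
}

// Hypothetical alert threshold; a real value would be tuned per feature.
const DRIFT_THRESHOLD = 0.1;

function hasDrifted(
  liveCounts: number[], referenceCounts: number[], threshold = DRIFT_THRESHOLD,
): boolean {
  return klDivergence(liveCounts, referenceCounts) > threshold;
}
```

A nightly job would compute one histogram per feature from recent traffic, compare it with the training-time histogram, and raise an alert for any feature where hasDrifted returns true.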

Appendix — Quick technical artifacts

Example orchestrator call (curl)

curl -X POST http://localhost:3000/api/agents/orchestrate \
  -H "Content-Type: application/json" \
  -d '{"text":"What should I wear to a rooftop wedding?","sessionId":"s-123","userId":"demo"}'

Sample AgentResponse structure

{
  "agent": "size-predictor",
  "timestamp": "2025-12-10T17:02:05Z",
  "payload": {
    "size": "M",
    "confidence": 0.83,
    "topFactors": ["height", "waist_cm", "brand_fit_history"]
  },
  "provenance": {
    "modelVersion": "size-v1.2",
    "promptSnapshot": "<prompt text>",
    "toolsUsed": ["brandSizeModel-v2"]
  }
}
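
A runtime check for this structure might look like the sketch below, in the spirit of the schema checks planned for agent contracts. The field list is inferred only from the sample above; payload is treated as an opaque object because its contents differ per agent, and isAgentResponse is a hypothetical helper name.

```typescript
// Shape inferred from the sample response; payload is agent-specific.
interface AgentResponse {
  agent: string;
  timestamp: string; // ISO 8601
  payload: Record<string, unknown>;
  provenance: {
    modelVersion: string;
    promptSnapshot: string;
    toolsUsed: string[];
  };
}

// Type guard: returns true only if x has every field with the expected type.
function isAgentResponse(x: unknown): x is AgentResponse {
  if (typeof x !== "object" || x === null) return false;
  const r = x as Record<string, unknown>;
  const p = r["provenance"] as Record<string, unknown> | null | undefined;
  return (
    typeof r["agent"] === "string" &&
    typeof r["timestamp"] === "string" &&
    !isNaN(Date.parse(r["timestamp"])) &&
    typeof r["payload"] === "object" && r["payload"] !== null &&
    typeof p === "object" && p !== null &&
    typeof p["modelVersion"] === "string" &&
    typeof p["promptSnapshot"] === "string" &&
    Array.isArray(p["toolsUsed"]) &&
    p["toolsUsed"].every((t) => typeof t === "string")
  );
}
```

Running a guard like this before a response is written to agent_responses would keep malformed records out of the audit trail.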

Env vars (core)

NEXT_PUBLIC_SUPABASE_URL=
SUPABASE_SERVICE_KEY=
ELEVENLABS_KEY=
VULTR_HOST=
VULTR_USER=
VULTR_KEY=
STRIPE_SECRET=
ENABLE_TREND_SERVICE=false

Built With

eleven
raindrop
vultr