Inspiration
MoodShaker started with a simple (and very common) feeling: we want a great cocktail at home—but not the bar tab, not the “what do I even order?” panic, and definitely not ten browser tabs of recipes that read like homework.
At the same time, cocktails have become more than “a drink.” They’re a moment—a little ritual that’s aesthetic, shareable, and personal. We found that many people want to make cocktails at home, and younger drinkers especially want recommendations that match their mood and taste, not just generic “top drinks.” But the current DIY experience is messy: jargon, confusing ratios, too many tools, and too much searching.
So we asked: what if learning cocktails felt like talking to a friend behind the bar—someone who listens, surprises you with something that fits, and teaches you step-by-step?
What it does
MoodShaker is a cast of real-time voice bartenders designed for non-professionals.
- Pick a bartender persona (beginner → curious → advanced), based on tools/effort level.
- Talk naturally (2–3 prompts) to capture vibe + constraints (mood, taste, safety/allergies).
- Get three mood-driven recommendations as clear, scannable cards.
- Choose one and follow a guided making animation you can pause anytime for exact ratios + tips.
- Missing ingredients? MoodShaker suggests substitutions so you can still make your version.
- Rate/save/share, and MoodShaker learns so next time it asks fewer questions and recommends better.
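The substitution step above can be sketched as a simple lookup against the user's pantry. The ingredient table and helper name here are illustrative assumptions, not MoodShaker's actual data:

```python
# Hypothetical substitution table: maps a missing ingredient to ranked fallbacks.
SUBSTITUTIONS = {
    "triple sec": ["cointreau", "orange juice + simple syrup"],
    "simple syrup": ["honey syrup", "agave syrup"],
    "angostura bitters": ["orange bitters"],
}

def suggest_substitutions(recipe_ingredients, pantry):
    """For each recipe ingredient the user is missing, propose known fallbacks."""
    suggestions = {}
    for ingredient in recipe_ingredients:
        if ingredient not in pantry:
            suggestions[ingredient] = SUBSTITUTIONS.get(ingredient, [])
    return suggestions

print(suggest_substitutions(
    ["gin", "simple syrup", "lemon juice"],
    pantry={"gin", "lemon juice", "honey syrup"},
))
# → {'simple syrup': ['honey syrup', 'agave syrup']}
```

In the real product the substitution list would come from the model rather than a static table, but the contract is the same: missing ingredients in, ranked alternatives out.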
How we built it
System design
We built MoodShaker end-to-end on Gemini, combining voice, tool calling, and structured outputs.
Flow: Conversation → Signals → Tool Call → Cards → Selection → Steps → Rating → Memory
- Gemini Live runs a bidirectional real-time voice session (mic audio in, bartender voice out).
- Each bartender persona has a clear personality + tier, and its own distinct voice—so the cast genuinely feels different, not just “reskinned.”
- During natural conversation, Gemini extracts lightweight but high-signal context: mood/state; occasion + setting (e.g., celebrating vs. winding down, hosting vs. solo, at-home vibe/weather); taste preferences; and allergies/dietary constraints. These signals—together with the selected persona’s personality/tier—drive both the recommendations and how we teach the drink.
- When context is sufficient, Gemini triggers tool calls:
  - get_recommendations → returns 3 recommendation cards
  - select_cocktail → generates step-by-step mixing instructions mapped to animation/video beats
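As a sketch, the two tools can be declared in the JSON-schema style that Gemini function calling accepts. The parameter fields here are illustrative assumptions, not MoodShaker's exact schema:

```python
# Hypothetical function declarations for the two tools, in the JSON-schema
# shape used by Gemini function calling. Property names are illustrative.
GET_RECOMMENDATIONS = {
    "name": "get_recommendations",
    "description": "Return 3 cocktail cards matching the user's mood, setting, taste, and constraints.",
    "parameters": {
        "type": "object",
        "properties": {
            "mood": {"type": "string"},
            "setting": {"type": "string"},
            "taste": {"type": "array", "items": {"type": "string"}},
            "allergies": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["mood"],
    },
}

SELECT_COCKTAIL = {
    "name": "select_cocktail",
    "description": "Generate step-by-step mixing instructions for the chosen card.",
    "parameters": {
        "type": "object",
        "properties": {"cocktail_id": {"type": "string"}},
        "required": ["cocktail_id"],
    },
}

# Passed to the Live session config so the model can request either tool.
TOOLS = [{"function_declarations": [GET_RECOMMENDATIONS, SELECT_COCKTAIL]}]
```

The model decides when it has gathered enough conversational signal to emit one of these calls; the server executes it and streams the result back into the session.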
Structured output + tiers
To keep recommendations consistent per persona, we use structured output with Pydantic schemas + enum constraints. This ensures each bartender can only recommend cocktails within its tier (tools/complexity), preventing “advanced-only” recipes from leaking into the beginner experience.
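A minimal sketch of that tier constraint, assuming Pydantic is available. The cocktail names and field names are placeholders; the real per-tier catalog is larger:

```python
from enum import Enum
from pydantic import BaseModel, ValidationError

class Tier(str, Enum):
    BEGINNER = "beginner"
    CURIOUS = "curious"
    ADVANCED = "advanced"

# Hypothetical beginner-tier allow-list; the enum is what stops the model
# from emitting an out-of-tier drink in structured output.
BeginnerCocktails = Enum("BeginnerCocktails", {
    "GIN_TONIC": "gin & tonic",
    "SCREWDRIVER": "screwdriver",
})

class BeginnerCard(BaseModel):
    name: BeginnerCocktails   # enum constraint: beginner tier only
    tier: Tier
    why_it_fits: str          # mood/taste rationale shown on the card
    tools: list[str]

card = BeginnerCard(name="gin & tonic", tier="beginner",
                    why_it_fits="bright and low-effort", tools=["glass", "spoon"])

try:
    # An advanced-only recipe fails validation instead of leaking through.
    BeginnerCard(name="ramos gin fizz", tier="beginner",
                 why_it_fits="...", tools=["shaker"])
except ValidationError:
    print("rejected: outside beginner tier")
```

Because the schema is enforced at parse time, a mis-tiered recommendation becomes a validation error on the server rather than a confusing card in the beginner UI.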
Conceptually, we treat recommendations as a constrained scoring problem:
$$ \text{score}(c)=w_m\,\text{match}_{\text{mood}}(c)+w_s\,\text{match}_{\text{setting}}(c)+w_t\,\text{match}_{\text{taste}}(c)+w_i\,\text{match}_{\text{inventory}}(c)-w_a\,\text{penalty}_{\text{allergy}}(c) $$
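A stdlib-only sketch of that scoring rule; the weights and the Jaccard-overlap match functions are illustrative placeholders, not tuned values:

```python
# Illustrative weights; the allergy weight is large so any allergen
# effectively disqualifies the drink.
WEIGHTS = {"mood": 0.35, "setting": 0.2, "taste": 0.25, "inventory": 0.2, "allergy": 10.0}

def overlap(a, b):
    """Jaccard overlap between two tag sets, as a stand-in for match(c)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def score(cocktail, signals):
    """score(c) = w_m*match_mood + w_s*match_setting + w_t*match_taste
                + w_i*match_inventory - w_a*penalty_allergy"""
    s = (WEIGHTS["mood"] * overlap(cocktail["moods"], signals["moods"])
         + WEIGHTS["setting"] * overlap(cocktail["settings"], signals["settings"])
         + WEIGHTS["taste"] * overlap(cocktail["tastes"], signals["tastes"])
         + WEIGHTS["inventory"] * overlap(cocktail["ingredients"], signals["inventory"]))
    if set(cocktail["ingredients"]) & set(signals["allergies"]):
        s -= WEIGHTS["allergy"]
    return s
```

Ranking the catalog by this score and taking the top three yields the recommendation cards, with the allergy term guaranteeing unsafe drinks never surface.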
Backend sketch (tool-calling loop)
```python
# Pseudocode illustrating the tool-calling pipeline
def on_live_session(user_audio_stream):
    # 1) Gemini Live handles real-time conversation
    signals = extract_signals_from_conversation()  # mood/state, occasion+setting, taste, allergies

    if gemini_requests("get_recommendations"):
        cards = gemini_flash_structured(
            schema=RecommendationSchema,  # Pydantic + enums per persona tier
            context=signals,
        )
        return cards

    if gemini_requests("select_cocktail"):
        steps = gemini_flash_structured(
            schema=StepByStepSchema,
            context=selected_card + signals,
        )
        return steps

def on_rating(rating, session_context):
    update_user_profile(session_context, rating)  # fewer questions next time, better picks
```
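One possible shape for the memory step, keeping a running taste profile so later sessions can skip questions. The field names and the 1–5 rating scale are assumptions for illustration:

```python
# Hypothetical in-memory profile store keyed by user id.
PROFILES: dict[str, dict] = {}

def update_user_profile(session_context, rating):
    """Fold one rated session into the user's taste profile."""
    profile = PROFILES.setdefault(session_context["user_id"],
                                  {"taste_scores": {}, "sessions": 0})
    # Positive ratings reinforce the chosen drink's tastes; negative ones demote them.
    direction = 1 if rating >= 4 else -1  # assumes a 1-5 rating scale
    for taste in session_context["chosen_tastes"]:
        scores = profile["taste_scores"]
        scores[taste] = scores.get(taste, 0) + direction
    profile["sessions"] += 1
    return profile
```

On the next session, high-scoring tastes can pre-fill the taste signal, which is what lets the bartender ask fewer questions over time.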
UX + visuals
We designed the product to feel low-friction and playful:
- Minimal prompts, friend-like tone (while matching each persona’s personality)
- Cards that summarize the drink at a glance
- Guided animation with pause-to-see-details
- Figma motion planning + screen recording for the demo
Challenges we ran into
- Making voice feel human (not like a quiz): We learned that fewer prompts and better pacing beat “smarter questions.” Keeping it to 2–3 prompts made the experience feel friendly.
- Balancing creativity with control: Cocktails need personality, but also correctness (ratios/tools/safety). Structured output + tier constraints kept the UX coherent.
- Explaining the system quickly: Hackathon time is brutal, so we focused on a clean narrative: talk → cards → guided making → share/save, then a minimal “under the hood” icon diagram.
Accomplishments that we're proud of
- Real-time, bidirectional voice bartender: built a truly voice-first experience on Gemini Live — mic audio in, bartender voice out — so it feels like a conversation, not a form.
- Low-friction UX with high-quality results: simplified the flow into a short chat (2–3 prompts) that reliably produces three clear recommendations.
- Full demo under tight time: delivered a polished end-to-end prototype (UI + screen recording + motion planning + voiceover) that clearly communicates both impact and technical depth.
What we learned
- Gemini Live (real-time voice conversation)
- Gemini Flash (structured recommendations + step generation)
- Server-side tool calling (get_recommendations, select_cocktail)
- Python + Pydantic (schema + enum constraints)
- Figma (UI + motion planning)
What's next for MoodShaker
- Voice UX, fully conversational: push “UX inside the dialogue” — fewer UI steps, smarter prompts, and a bartender that guides naturally through voice.
- Better mixing animations: keep polishing the modular animation system so steps feel smoother, more dynamic, and less repetitive across drinks.
- Mobile app version for sharing: build a mobile experience optimized for capturing the moment — take a photo after making, then generate a branded, same-style share card that matches MoodShaker’s look and feel.
Built With
- fastapi
- gemini-2.5-flash-native-audio-preview
- gemini-3-flash-preview
- geminiflash
- geminiliveapi
- googlecloudbuild
- googlecloudrun
- python
- react18
- serversidefunctioncalling
- tailwindcss
- typescript
- vite
- webaudioapi