Yelp Scout | Devpost

Frontend Page
Chat Page: take a picture or press voice chat to start!
Recommended Restaurants
Map of the recommended restaurants
Fully responsive on phone!

Inspiration

We've all been there, scrolling Instagram, seeing an amazing dish, and thinking "I want that, but what even is it?" Or walking down the street, hands full, wishing we could just ask for restaurant recommendations instead of typing.

Existing food apps make you type, scroll, and filter. Worse, they forget your allergies and dietary needs every single time. We wanted something more natural, like asking a friend who knows every restaurant in town and remembers you can't eat gluten.

What it does

Yelp Scout is an AI food assistant with three input modes:

📷 Photo Search — Snap any dish, AI identifies it, finds restaurants serving it nearby
🎤 Voice Chat — Speak naturally, get spoken recommendations back
⌨️ Text — Traditional search with smart time-based suggestions

The app auto-learns and remembers:

🍕 Favorite cuisines ("I love sushi" → saved)
🥗 Dietary restrictions ("I'm vegan" → saved)
⚠️ Allergies ("I have a nut allergy" → saved and respected in future recommendations)

No settings page needed. Just chat naturally — we pick up on it.

How I built it

Layer	Tech
Frontend	React 19, TypeScript, Tailwind CSS, Leaflet
Vision AI	OpenAI GPT-4o via InsForge AI SDK
Food Data	Yelp Chat API v2
Voice	Web Speech API + OpenAI TTS
Backend	InsForge (Postgres, Auth, Edge Functions)

The architecture flows like this: User Input (📷/🎤/⌨️) → GPT-4o Vision → Yelp API → Results + Map
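A minimal sketch of that flow in TypeScript, not the project's actual code. The two external calls (the vision model and the Yelp search) are injected as plain functions so the example stays independent of any real SDK. The names `ScoutDeps`, `identifyDish`, `searchRestaurants`, `parseDishGuess`, and `scout`, as well as the `{ dish, cuisine }` reply shape, are assumptions made for illustration.

```typescript
// Hypothetical shape of the vision step's reply; the field names are assumptions.
interface DishGuess {
  dish: string;
  cuisine: string;
}

interface Restaurant {
  name: string;
  lat: number;
  lng: number;
}

// External calls are injected, so no real SDK or endpoint is assumed here.
interface ScoutDeps {
  identifyDish: (imageBase64: string) => Promise<string>; // raw model reply, expected to be JSON
  searchRestaurants: (query: string) => Promise<Restaurant[]>;
}

type ScoutInput =
  | { kind: "photo"; imageBase64: string }
  | { kind: "text"; query: string }; // voice input lands here after speech-to-text

// Parse the model reply defensively: drop a Markdown fence if present, then check each field.
function parseDishGuess(raw: string): DishGuess | null {
  const cleaned = raw.replace(/`{3}(?:json)?/g, "").trim();
  try {
    const value: unknown = JSON.parse(cleaned);
    if (typeof value !== "object" || value === null) return null;
    const { dish, cuisine } = value as Record<string, unknown>;
    if (typeof dish !== "string" || typeof cuisine !== "string") return null;
    return { dish, cuisine };
  } catch {
    return null; // the reply was not valid JSON
  }
}

// User input -> (optional) vision step -> restaurant search -> results for the list and map.
async function scout(input: ScoutInput, deps: ScoutDeps): Promise<Restaurant[]> {
  let query: string;
  if (input.kind === "photo") {
    const guess = parseDishGuess(await deps.identifyDish(input.imageBase64));
    if (!guess) throw new Error("Could not identify the dish in the photo");
    query = `${guess.dish} (${guess.cuisine}) near me`;
  } else {
    query = input.query;
  }
  return deps.searchRestaurants(query);
}
```

In a real app, `identifyDish` would wrap the GPT-4o call and `searchRestaurants` the Yelp request. Validating the model's reply before building the query keeps a malformed response from reaching the search step.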

Preference detection happens server-side: our edge function scans each message for cuisine keywords and dietary patterns, then auto-updates the user's profile in Postgres.
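The scanning step could look roughly like the sketch below. It is an illustration under assumptions, not the edge function itself: the keyword lists, the regular expressions, and the names `detectPreferences` and `DetectedPreference` are all hypothetical, and writing the result to Postgres is left out.

```typescript
type PreferenceKind = "cuisine" | "diet" | "allergy";

interface DetectedPreference {
  kind: PreferenceKind;
  value: string;
}

// Illustrative keyword lists; a real deployment would need far larger ones.
const CUISINES = ["sushi", "pizza", "thai", "mexican", "indian"];
const DIETS = ["vegan", "vegetarian", "gluten-free", "halal", "kosher"];

function detectPreferences(message: string): DetectedPreference[] {
  // Lowercase and normalize curly apostrophes so "I’m" and "I'm" behave the same.
  const text = message.toLowerCase().replace(/’/g, "'");
  const found: DetectedPreference[] = [];

  // "I love sushi" -> favorite cuisine (a positive verb must precede the keyword).
  for (const cuisine of CUISINES) {
    const pattern = new RegExp(`\\bi (?:love|like|enjoy)\\b[^.!?]*\\b${cuisine}\\b`);
    if (pattern.test(text)) found.push({ kind: "cuisine", value: cuisine });
  }

  // "I'm vegan" / "I am vegetarian" -> dietary restriction; skips "I'm not vegan".
  for (const diet of DIETS) {
    const pattern = new RegExp(`\\bi(?:'m| am)(?! not\\b)\\b[^.!?]*\\b${diet}\\b`);
    if (pattern.test(text)) found.push({ kind: "diet", value: diet });
  }

  // "I have a nut allergy" / "allergic to peanuts" -> allergy (first match per pattern).
  const allergyPatterns = [/\ballergic to ([a-z]+)/, /\b([a-z]+) allergy\b/];
  for (const pattern of allergyPatterns) {
    const match = text.match(pattern);
    if (match) found.push({ kind: "allergy", value: match[1] });
  }

  return found;
}
```

For example, `detectPreferences("I have a nut allergy")` yields a single allergy entry with the value `"nut"`. Plain keyword matching like this misses paraphrases and most negations, which is why the matching has to be made more robust before it can be trusted with allergies.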

Challenges I faced

1. Speech re-triggering: Auto-speak kept replaying when switching tabs. Fixed by tracking all spoken message IDs in a Set instead of just the last one.

2. Photo → Text pipeline: Getting GPT-4o to return structured JSON consistently required careful prompt engineering.
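The Set-based fix for speech re-triggering can be sketched as follows. This is a simplified illustration, not the app's code: `createSpeechGate`, `ChatMessage`, and the injected `speak` callback are hypothetical names. In a browser, `speak` could wrap `speechSynthesis.speak(new SpeechSynthesisUtterance(text))`; here it is passed in so the logic runs anywhere.

```typescript
interface ChatMessage {
  id: string;
  role: "user" | "assistant";
  text: string;
}

// Returns an object whose autoSpeak method speaks each assistant message at most once.
function createSpeechGate(speak: (text: string) => void) {
  // Every ID that has already been spoken. Remembering only the last ID is not enough:
  // if the message list is re-processed, older messages no longer match it and replay.
  const spokenIds = new Set<string>();

  return {
    autoSpeak(messages: ChatMessage[]): void {
      for (const message of messages) {
        if (message.role !== "assistant") continue;
        if (spokenIds.has(message.id)) continue; // already spoken once
        spokenIds.add(message.id);
        speak(message.text);
      }
    },
  };
}
```

Calling `autoSpeak` again with the same list, as can happen when a component re-renders after a tab switch, speaks nothing new; only a message with an unseen ID triggers speech.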

What I learned

Building voice-first UX is harder than it looks: timing, interruptions, and feedback all matter
InsForge made backend setup incredibly fast (auth, DB, edge functions in minutes)
Multi-modal AI (vision + text + voice) creates a much more natural user experience
Auto-detecting preferences from natural language is powerful but requires robust keyword matching

What's next

Group dining mode (resolve conflicts: "Find food for a vegan + someone gluten-free")
Swipe-to-decide for indecisive moments
Reservation booking integration