Inspiration
We've all been there: scrolling Instagram, seeing an amazing dish, and thinking "I want that, but what even is it?" Or walking down the street, hands full, wishing we could just ask for restaurant recommendations instead of typing.
Existing food apps make you type, scroll, and filter. Worse, they forget your allergies and dietary needs every single time. We wanted something more natural, like asking a friend who knows every restaurant in town and remembers you can't eat gluten.
What it does
Yelp Scout is an AI food assistant with three input modes:
- 📷 Photo Search — Snap any dish, AI identifies it, finds restaurants serving it nearby
- 🎤 Voice Chat — Speak naturally, get spoken recommendations back
- ⌨️ Text — Traditional search with smart time-based suggestions
The app auto-learns and remembers:
- 🍕 Favorite cuisines ("I love sushi" → saved)
- 🥗 Dietary restrictions ("I'm vegan" → saved)
- ⚠️ Allergies ("I have a nut allergy" → saved and respected in future recommendations)
No settings page needed. Just chat naturally — we pick up on it.
How I built it
| Layer | Tech |
|---|---|
| Frontend | React 19, TypeScript, Tailwind CSS, Leaflet |
| Vision AI | OpenAI GPT-4o via InsForge AI SDK |
| Food Data | Yelp Chat API v2 |
| Voice | Web Speech API + OpenAI TTS |
| Backend | InsForge (Postgres, Auth, Edge Functions) |
The architecture flows like this: User Input (📷/🎤/⌨️) → GPT-4o Vision → Yelp API → Results + Map
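To picture the hand-off from the vision step to the restaurant search in the flow above, here is a minimal sketch. It assumes the model is prompted to reply with a JSON object; the `dish` and `cuisine` fields, the `parseDishGuess` and `buildSearchQuery` helpers, and the query wording are all hypothetical, since the actual prompt and schema are not part of this writeup.

```typescript
// Hypothetical shape of the vision model's reply (the real schema is not shown in this writeup).
interface DishGuess {
  dish: string;
  cuisine?: string;
}

// Extract the span from the first "{" to the last "}" (models sometimes wrap
// JSON in extra prose), parse it, and validate the fields we rely on.
function parseDishGuess(reply: string): DishGuess | null {
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) return null;

  let data: unknown;
  try {
    data = JSON.parse(reply.slice(start, end + 1));
  } catch {
    return null; // not valid JSON, so the caller can retry or fall back to text search
  }
  if (typeof data !== "object" || data === null) return null;

  const record = data as Record<string, unknown>;
  if (typeof record.dish !== "string" || record.dish.trim() === "") return null;

  const guess: DishGuess = { dish: record.dish.trim() };
  if (typeof record.cuisine === "string" && record.cuisine.trim() !== "") {
    guess.cuisine = record.cuisine.trim();
  }
  return guess;
}

// Turn a validated guess into a natural-language query for the search step.
function buildSearchQuery(guess: DishGuess, location: string): string {
  const cuisinePart = guess.cuisine ? ` (${guess.cuisine})` : "";
  return `Restaurants serving ${guess.dish}${cuisinePart} near ${location}`;
}
```

Returning `null` instead of throwing keeps the UI code simple: a failed parse can trigger a retry or a "couldn't recognize this dish" message without a try/catch at every call site.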
Preference detection happens server-side: our edge function scans each message for cuisine keywords and dietary patterns, then auto-updates the user's profile in Postgres.
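A rough sketch of what that message scan could look like is below. The keyword lists, the trigger phrases, and the `detectPreferences` name are illustrative assumptions, not the project's actual code, and the Postgres update is left out entirely.

```typescript
type PreferenceKind = "cuisine" | "diet" | "allergy";

interface DetectedPreference {
  kind: PreferenceKind;
  value: string;
}

// Illustrative keyword lists (assumed; the project's real lists are not shown).
const CUISINES = ["sushi", "pizza", "thai", "mexican", "indian"];
const DIETS = ["vegan", "vegetarian", "gluten-free", "halal", "kosher"];

// Phrases such as "I have a nut allergy" or "I'm allergic to shellfish".
// Each pattern captures at most one allergen per message.
const ALLERGY_PATTERNS = [
  /\b(?:i have|i've got) an? ([a-z]+) allergy\b/,
  /\b(?:i'm|i am) allergic to ([a-z]+)\b/,
];

function detectPreferences(message: string): DetectedPreference[] {
  const text = message.toLowerCase();
  const found: DetectedPreference[] = [];

  for (const cuisine of CUISINES) {
    // Only treat a cuisine as a favorite when the user states a liking for it
    // in the same sentence, so "where can I get pizza?" is not saved.
    const liked = new RegExp(`\\b(?:i love|i like|my favorite is)\\b[^.!?]*\\b${cuisine}\\b`);
    if (liked.test(text)) found.push({ kind: "cuisine", value: cuisine });
  }

  for (const diet of DIETS) {
    const stated = new RegExp(`\\b(?:i'm|i am)\\s+${diet}\\b`);
    if (stated.test(text)) found.push({ kind: "diet", value: diet });
  }

  for (const pattern of ALLERGY_PATTERNS) {
    const value = pattern.exec(text)?.[1];
    if (value) found.push({ kind: "allergy", value });
  }

  return found;
}
```

Requiring a trigger phrase ("I love", "I'm", "I have a ... allergy") is one way to avoid saving a preference every time a food word appears in a search, at the cost of missing less direct phrasings.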
Challenges I faced
- Speech re-triggering: Auto-speak kept replaying when switching tabs. Fixed by tracking all spoken message IDs in a Set instead of just the last one.
- Photo → Text pipeline: Getting GPT-4o to return structured JSON consistently required careful prompt engineering.
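The speech fix from the first bullet can be sketched roughly as follows. This is a simplified illustration rather than the app's actual component code: `ChatMessage`, `createAutoSpeaker`, and the injected `speak` callback are hypothetical names, and the callback stands in for whatever audio playback the app uses.

```typescript
interface ChatMessage {
  id: string;
  text: string;
}

// Returns a function that reads a message aloud at most once.
// `speak` is injected so the sketch does not depend on a specific TTS API.
function createAutoSpeaker(speak: (text: string) => void) {
  // Every ID ever spoken, not just the most recent one. Remembering only the
  // last ID would let an older message replay after a newer one was spoken.
  const spokenIds = new Set<string>();

  return function speakOnce(message: ChatMessage): boolean {
    if (spokenIds.has(message.id)) return false; // already read aloud, skip
    spokenIds.add(message.id);
    speak(message.text);
    return true;
  };
}
```

In a React component, the Set would need to live somewhere that survives re-renders (for example a ref or module-level state), otherwise it is recreated empty and the replay problem comes back.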
What I learned
- Building voice-first UX is harder than it looks: timing, interruptions, and feedback all matter
- InsForge made backend setup incredibly fast (auth, DB, edge functions in minutes)
- Multi-modal AI (vision + text + voice) creates a much more natural user experience
- Auto-detecting preferences from natural language is powerful but requires robust keyword matching
What's next
- Group dining mode (resolve conflicts: "Find food for a vegan + someone gluten-free")
- Swipe-to-decide for indecisive moments
- Reservation booking integration
Built With
- insforge
- openai
- react
- yelpapi

