Project Description

The Problem

We've all been there: sitting in a restaurant, staring at a wall of text.

  • "What does the 'Chef's Special' actually look like?"
  • "Is this a big portion?"
  • The Reality: We eat with our eyes, but physical menus are stuck in the 1900s. You shouldn't have to copy-paste dish names into Google Images one by one just to decide what to eat.

The Solution

Menulator digitizes the physical world.

  1. Snap a photo of any paper menu.
  2. AI parses the text to understand dish names, descriptions, and prices.
  3. The Visual Engine (powered by Pexels API) instantly finds high-quality reference photos for every single item.
  4. The Result: The boring paper menu transforms into a rich, scrollable, "Deliveroo-style" feed on your phone.

How It Works

We built a seamless pipeline using React Native and Expo. Illustrative TypeScript sketches of each step follow the list below.

  • Image Capture: We use expo-image-picker to capture high-resolution menu photos.
  • The Brain (GPT-4o): The image is sent to OpenAI's GPT-4o Vision model. We engineered a system prompt to extract strictly formatted JSON data (IDs, names, prices) from the raw image, handling complex layouts and fonts.
  • The Visuals (Pexels API): Once we have the dish names, we query the Pexels API in parallel to fetch mouth-watering reference images for every item.
  • Local Recommendations: We also added a "Local Food" feature that uses the Google Places API combined with GPT reasoning to find authentic spots nearby based on your current craving.
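
First, the capture step. A minimal sketch assuming current expo-image-picker APIs; the quality setting and base64 flag are illustrative rather than the app's exact configuration:

    import * as ImagePicker from 'expo-image-picker';

    // Launch the camera and return the menu photo as a base64 string,
    // or null if permission is denied or the user cancels.
    export async function captureMenuPhoto(): Promise<string | null> {
      const { granted } = await ImagePicker.requestCameraPermissionsAsync();
      if (!granted) return null;

      const result = await ImagePicker.launchCameraAsync({
        quality: 0.8, // illustrative: trades resolution for upload size
        base64: true, // GPT-4o accepts base64-encoded images
      });

      if (result.canceled || !result.assets?.length) return null;
      return result.assets[0].base64 ?? null;
    }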
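
Next, the extraction step. The system prompt and the MenuItem shape below are hypothetical stand-ins for the ones we actually use; the request format follows OpenAI's documented Chat Completions vision API:

    // Hypothetical shape for a parsed dish; the real schema may differ.
    interface MenuItem {
      id: number;
      name: string;
      description?: string;
      price?: string;
    }

    const SYSTEM_PROMPT =
      'You extract menu items from a photo. Respond ONLY with a JSON array ' +
      'of objects shaped {id, name, description, price}. Never invent items.';

    // Send the base64 photo to GPT-4o and parse the structured reply.
    export async function parseMenu(base64: string, apiKey: string): Promise<MenuItem[]> {
      const res = await fetch('https://api.openai.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          Authorization: `Bearer ${apiKey}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'gpt-4o',
          messages: [
            { role: 'system', content: SYSTEM_PROMPT },
            {
              role: 'user',
              content: [
                { type: 'text', text: 'Extract every dish from this menu.' },
                { type: 'image_url', image_url: { url: `data:image/jpeg;base64,${base64}` } },
              ],
            },
          ],
        }),
      });
      const data = await res.json();
      return JSON.parse(data.choices[0].message.content) as MenuItem[];
    }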
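
Then the visuals: one Pexels search per dish, fired in parallel with Promise.all. The endpoint and response shape (photos[0].src.medium) follow Pexels' public API docs:

    // Fetch one reference photo per dish from Pexels, all in parallel.
    export async function fetchDishImages(
      names: string[],
      pexelsKey: string,
    ): Promise<Record<string, string | null>> {
      const lookups = names.map(async (name) => {
        const res = await fetch(
          `https://api.pexels.com/v1/search?query=${encodeURIComponent(name)}&per_page=1`,
          { headers: { Authorization: pexelsKey } },
        );
        const data = await res.json();
        return [name, data.photos?.[0]?.src?.medium ?? null] as const;
      });
      return Object.fromEntries(await Promise.all(lookups));
    }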
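
Finally, local recommendations. A sketch of the Places half only, using Google's documented Nearby Search endpoint; the GPT reasoning layer that ranks the results is omitted here:

    // Query Google Places Nearby Search for restaurants matching a craving.
    export async function findNearbySpots(
      lat: number,
      lng: number,
      craving: string,
      mapsKey: string,
    ): Promise<{ name: string; rating?: number; vicinity: string }[]> {
      const url =
        'https://maps.googleapis.com/maps/api/place/nearbysearch/json' +
        `?location=${lat},${lng}&radius=1500&type=restaurant` +
        `&keyword=${encodeURIComponent(craving)}&key=${mapsKey}`;
      const res = await fetch(url);
      const data = await res.json();
      return (data.results ?? []).map((r: any) => ({
        name: r.name,
        rating: r.rating,
        vicinity: r.vicinity,
      }));
    }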

Challenges We Ran Into

Hallucination vs. Reality.

  • The Problem: Initially, the AI would try to guess what the food looked like, or fail to read fancy cursive fonts.
  • The Fix: We refined our system prompts to be strict about JSON formatting, and implemented a robust error-handling fallback so the app doesn't crash if the AI misses a price (see the defensive-parsing sketch after this list).
  • Performance: Fetching images for 20 menu items simultaneously caused lag. We optimized this with efficient React rendering and image caching (one common approach is sketched below).
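
A minimal sketch of the defensive-parsing idea, reusing the hypothetical MenuItem shape from the earlier sketch; the app's exact fallback logic may differ:

    // Defensively parse the model's reply: strip markdown fences, tolerate
    // missing prices, and degrade to an empty list instead of throwing.
    export function safeParseMenu(raw: string): MenuItem[] {
      try {
        const cleaned = raw.replace(/```(?:json)?/g, '').trim();
        const parsed = JSON.parse(cleaned);
        if (!Array.isArray(parsed)) return [];
        return parsed
          .filter((item) => typeof item?.name === 'string')
          .map((item, i) => ({
            id: item.id ?? i,
            name: item.name,
            description: item.description ?? '',
            price: item.price ?? 'N/A', // a missed price shouldn't crash the feed
          }));
      } catch {
        return []; // malformed JSON degrades to "no items", not a crash
      }
    }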
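
And one way to keep a 20-image feed smooth in Expo. FlatList virtualization and expo-image's built-in cachePolicy are real APIs, but whether the app uses exactly this combination is an assumption:

    import React from 'react';
    import { FlatList } from 'react-native';
    import { Image } from 'expo-image';

    // Virtualized list + disk-cached images keeps a long dish feed smooth.
    // The imageUrl field is the illustrative shape from the Pexels sketch.
    export function MenuFeed({ items }: { items: (MenuItem & { imageUrl: string })[] }) {
      return (
        <FlatList
          data={items}
          keyExtractor={(item) => String(item.id)}
          initialNumToRender={6} // render above-the-fold cards first
          renderItem={({ item }) => (
            <Image
              source={{ uri: item.imageUrl }}
              cachePolicy="memory-disk" // avoid refetching on re-render
              style={{ width: '100%', height: 180 }}
            />
          )}
        />
      );
    }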

Accomplishments that we're proud of

  • The "Visual Pop": It is genuinely satisfying to snap a picture of a boring list and see it turn into a colorful, visual menu in seconds.
  • GPT-4o Integration: We successfully implemented a multimodal AI pipeline (Image -> Text -> Structured Data) entirely within a mobile app.
  • Cross-Platform: The app runs smoothly on both iOS and Android thanks to our Expo architecture.

What we learned

  • Prompt Engineering is UI: We learned that the quality of our app's UI depends entirely on how well we prompt the model to structure the data.
  • API Orchestration: Managing three different APIs (OpenAI, Pexels, Google Maps) requires careful state management so the user isn't stuck watching a loading spinner forever (a pipeline-state sketch follows this list).
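
A sketch of per-stage loading state, wiring together the hypothetical helpers from the earlier sketches so the UI can report progress instead of one endless spinner:

    import { useState } from 'react';

    type Stage = 'idle' | 'parsing' | 'fetching-images' | 'done' | 'error';

    // Track which pipeline stage is running so the UI can show
    // "Reading menu…" or "Finding photos…" instead of a generic spinner.
    export function useMenuPipeline(openaiKey: string, pexelsKey: string) {
      const [stage, setStage] = useState<Stage>('idle');
      const [items, setItems] = useState<(MenuItem & { imageUrl: string | null })[]>([]);

      async function run(base64Photo: string) {
        try {
          setStage('parsing');
          const parsed = await parseMenu(base64Photo, openaiKey);
          setStage('fetching-images');
          const images = await fetchDishImages(parsed.map((i) => i.name), pexelsKey);
          setItems(parsed.map((i) => ({ ...i, imageUrl: images[i.name] ?? null })));
          setStage('done');
        } catch {
          setStage('error'); // surface a retry state instead of hanging
        }
      }

      return { stage, items, run };
    }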

What's next for Menulator

  • Dietary Filters: We plan to update the AI prompt to auto-tag items as "Vegan", "Spicy", or "Gluten-Free" so users can filter the list instantly.
  • AR Overlay: We want to move from a "Scan & List" view to a live Augmented Reality view where food photos pop up directly on top of the physical paper.