Inspiration
In the world of interior design, home renovation, and even just moving, getting a simple floor plan is a tedious, manual process. We were inspired by the idea of simplifying this. What if you could just take a quick video of a room on your phone and instantly get a basic "map" of where everything is? We wanted to create an accessible web tool that bridges the gap between complex 3D scanning apps and the simple need to visualize a space and its contents.
What it does
ArcPlan is a full-stack web application that analyzes a user-uploaded room tour video and generates a simplified 2D "blueprint" of the room's key items.
Upload: A user drags and drops (or selects) a video file of their room.
Process: The application uploads the video to a Python backend, which analyzes it frame by frame using a computer vision model.
Generate: The AI detects objects like chairs, beds, laptops, and bottles, then computes the average 2D position of each object type.
Display: The app then renders a clean, grid-based blueprint, placing icons for each detected item in its relative 2D position, giving the user a "data snapshot" of their space.
How we built it
We built ArcPlan with a full-stack, AI-driven architecture:
Frontend: A responsive single-page application (SPA) built with React, TypeScript, and Vite. The UI is styled with Tailwind CSS and uses Lucide for icons.
Backend: A lightweight API server built with Python and Flask, with flask-cors to handle cross-origin requests from the React frontend.
AI Model: We used the YOLOv8n (Nano) model from Ultralytics, a small, fast object detection model well suited to frame-by-frame video analysis.
The Pipeline (sketched in code after the steps below):
React sends the video file to a /upload endpoint on the Flask server.
Flask calls a detect.py script.
The script uses OpenCV to read the video frame by frame.
It runs the YOLO model on each frame to find all objects from a predefined list.
It aggregates all detections and calculates the average 2D (x, y) coordinate for each type of object (e.g., all "chairs" are averaged to one point).
It returns a clean JSON list of these objects and their normalized coordinates.
React receives this JSON data and renders the final ResultsScreen blueprint.
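For concreteness, here is a rough sketch of what a detect.py like ours could look like. The function and variable names are illustrative rather than our exact code, but the flow matches the steps above:

```python
# detect.py -- illustrative sketch of the detection pipeline
from collections import defaultdict

import cv2
from ultralytics import YOLO

# the predefined list of objects we care about ('clock' stands in for
# outlets -- see the Challenges section below)
TARGET_CLASSES = {"chair", "bed", "laptop", "bottle", "clock"}

def detect_objects(video_path: str) -> list[dict]:
    model = YOLO("yolov8n.pt")  # the Nano model mentioned above
    cap = cv2.VideoCapture(video_path)
    width = cap.get(cv2.CAP_PROP_FRAME_WIDTH)
    height = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)

    # label -> [sum_x, sum_y, count], accumulated across all frames
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        for result in model(frame, verbose=False):
            for box in result.boxes:
                label = model.names[int(box.cls)]
                if label not in TARGET_CLASSES:
                    continue
                cx, cy, _, _ = box.xywh[0]  # box center in pixels
                sums[label][0] += float(cx)
                sums[label][1] += float(cy)
                sums[label][2] += 1
    cap.release()

    objects = []
    for label, (sx, sy, n) in sums.items():
        # normalize the averaged center to 0..1 so the frontend can
        # scale it to any blueprint size
        norm_x, norm_y = sx / n / width, sy / n / height
        objects.append({
            "name": "outlet" if label == "clock" else label,
            # float() converts NumPy/torch scalars to plain Python
            # floats -- the fix for the float32 crash described below
            "x": float(norm_x),
            "y": float(norm_y),
        })
    return objects
```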
Challenges we ran into
We ran into several real-world debugging challenges; working through them was critical to the project's success:
CORS Errors: Our React app (localhost:5173) was blocked from talking to our Python server (localhost:5000) by the browser's same-origin policy. We solved this by installing and configuring the flask-cors library on the backend (see the server sketch after this list).
The float32 Crash: This was our biggest bug. The AI model returned its coordinates as NumPy np.float32 values, which Flask's JSON serializer cannot encode, so the server crashed on every response. The fix was to explicitly convert each number (float(norm_x)) before sending the final JSON.
The "Outlet" Problem: The YOLO model cannot detect power outlets. We built a clever prototype "hack" by telling the AI to find 'clocks' (which it can see) and then renaming them to 'outlet' in the final data.
Type Mismatches: We had to keep the props that App.tsx passes down (like onUpload) in sync with each component's prop types to ensure the React state flow worked correctly.
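To show how the CORS and float32 fixes fit together, here is a minimal sketch of the Flask side. The /upload route and the CORS(app) call match what we describe above; the form-field name and temp-file handling are assumptions for illustration:

```python
# server.py -- illustrative sketch of the Flask backend
import os
import tempfile

from flask import Flask, jsonify, request
from flask_cors import CORS

from detect import detect_objects  # the function sketched above

app = Flask(__name__)
CORS(app)  # the CORS fix: allow requests from the React dev server

@app.route("/upload", methods=["POST"])
def upload():
    video = request.files["video"]  # hypothetical form-field name
    # save the upload to a temp file so OpenCV can read it by path
    fd, path = tempfile.mkstemp(suffix=".mp4")
    os.close(fd)
    try:
        video.save(path)
        # detect_objects returns plain Python floats (the float32
        # fix), so jsonify can serialize the result without crashing
        return jsonify({"objects": detect_objects(path)})
    finally:
        os.remove(path)

if __name__ == "__main__":
    app.run(port=5000)
```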
Accomplishments that we're proud of
We are most proud of building a complete, end-to-end AI pipeline. This isn't just a simple UI; it's a fully functional web application where a user action on the frontend triggers a complex machine learning workflow on a separate backend. Debugging the network, type, and data format errors to finally see the real data from the video appear on the blueprint was a massive accomplishment.
What we learned
Full-Stack is Hard: Making a React frontend, Python backend, and AI model all speak the same language is complex. The bugs are often not in the logic, but in the "plumbing" between systems (like CORS and JSON data types).
AI is Full of "Hacks": The "outlet" problem taught us that you can't just ask an AI to find something. You have to understand what it can see and build creative workarounds to get the result you want.
Prototypes vs. Products: We learned that our app is a fantastic prototype: it proves the idea works. But it's not a product; the blueprint is a 2D projection, not a 3D map, and the "outlet" detection is a proxy. This is a crucial distinction in AI development.
What's next for ArcPlan
This prototype is a great foundation. The next logical steps to turn this into a real product would be:
True 3D Mapping: Implement a SLAM (Simultaneous Localization and Mapping) algorithm. This would use the video to track the camera's 3D position in the room, allowing us to build a true top-down floor plan instead of a 2D projection.
Custom AI Model: Instead of using the 'clock' hack, we would fine-tune a custom model to actually recognize power outlets, doors, and windows (see the training sketch after this list).
Deployment: Deploy the Flask backend to a cloud server with GPU support (like an AWS GPU instance) to handle the heavy processing, and deploy the React frontend to a static host (like Netlify or Vercel) so anyone can use it.
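With Ultralytics, the fine-tuning step itself is only a few lines. The sketch below assumes a hypothetical outlets.yaml dataset config pointing at images labeled with outlet, door, and window boxes:

```python
from ultralytics import YOLO

# start from the pretrained Nano weights and fine-tune on our own
# labeled data ("outlets.yaml" is a hypothetical dataset config in
# the standard Ultralytics YOLO format)
model = YOLO("yolov8n.pt")
model.train(data="outlets.yaml", epochs=50, imgsz=640)
```

The hard part is not the training call but collecting and labeling enough images of outlets, doors, and windows to fine-tune on.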