Inspiration

Since we were kids, our team has been building PCs. There’s something really satisfying about working with your hands and building something powerful and useful.


But as fun as PC building can be, the process can get complicated fast. With thousands of parts and product options, and component compatibility that is hard to validate, assembling a PC can feel overwhelming, even for people who've done it before. Even in preparing for this hackathon, we spent ages researching part compatibility (see below). Our shared experience and challenges inspired us to rethink the entire PC-building experience. The capabilities of real-world, multimodal AI are only just beginning to emerge. We believe that AI will transform the hardware world, from microcontrollers to aerospace to car engines, and that revolution should extend to how we interact with personal hardware, too.

What it does

Our project reimagines the PC building experience with an AI-integrated workflow that helps users at every stage of the build.

PC Domain Knowledge

Our AI system is equipped with PC-specific knowledge thanks to the data pipelines we built to scrape part information, PC building tutorials, and more. Using Firecrawl, we scraped PCPartPicker for up-to-date information on part costs, compatibility, and specs. With the TwelveLabs API, we scraped YouTube for tutorials and segmented out clips of common tasks (e.g. installing a CPU or applying thermal paste). From this gathered information, we built a RAG database with LanceDB (vector DB) and the OpenAI API (embeddings).
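
To illustrate the retrieval layer, here is a minimal sketch of how scraped part records can be embedded with the OpenAI API and stored and queried in LanceDB; the table name, fields, and sample record are placeholders rather than our exact schema.

```python
# Minimal sketch: embed scraped part records and store them in LanceDB for RAG.
# Table name, field names, and the sample record are illustrative assumptions.
import lancedb
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> list[float]:
    # One embedding per part description / tutorial snippet.
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

db = lancedb.connect("./pc_parts_db")

# Records scraped from PCPartPicker / TwelveLabs would be appended here.
records = [
    {
        "name": "Ryzen 7 7800X3D",
        "category": "cpu",
        "text": "AM5 socket, 8 cores, 120 W TDP, ...",
    }
]
for r in records:
    r["vector"] = embed(r["text"])

table = db.create_table("parts", data=records, mode="overwrite")

# At query time, embed the user's question and pull the closest parts.
hits = table.search(embed("Which CPUs fit an AM5 motherboard?")).limit(5).to_list()
for h in hits:
    print(h["name"], h["category"])
```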

Data Scraping Architecture Diagram

Conversational Capability

Using Deepgram (STT), the OpenAI API (LLM processing), and Cartesia (TTS), we also built a conversational AI that can leverage the gathered information in our database. Orchestrated via LiveKit, the agent has voice activity detection, uses the eager end-of-turn detection of Deepgram's Flux model, and uses function calling to retrieve data from our database.

LiveKit Architecture Diagram

This system can answer questions with accurate information, suggest useful tips, and act as a helpful companion regardless of a builder's experience level.
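
A minimal sketch of how such an agent can be wired up with the LiveKit Agents Python SDK is below; `search_parts` is a hypothetical stand-in for our RAG lookup, and exact plugin options (for example the Flux eager end-of-turn threshold) vary with plugin versions.

```python
from livekit import agents
from livekit.agents import Agent, AgentSession, function_tool
from livekit.plugins import cartesia, deepgram, openai, silero


def search_parts(query: str) -> str:
    # Hypothetical placeholder standing in for the LanceDB RAG lookup.
    return f"(results for: {query})"


class PCBuildAssistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="You are a friendly PC-building assistant. "
            "Answer with accurate part specs and step-by-step guidance."
        )

    @function_tool
    async def lookup_part(self, query: str) -> str:
        """Look up part specs, pricing, or compatibility in the knowledge base."""
        return search_parts(query)


async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        vad=silero.VAD.load(),
        stt=deepgram.STT(),   # Flux / eager end-of-turn options depend on plugin version
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=cartesia.TTS(),
    )
    await session.start(room=ctx.room, agent=PCBuildAssistant())
    await ctx.connect()


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```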

Realtime Vision and Understanding

In addition, we wanted our platform to be great not just at finding parts, but also at putting those parts together. To do so, we built an image training pipeline capable of segmenting relevant parts at very high speed. A strong candidate for this is Meta SAM, but SAM is bulky, somewhat slow, and consumes too many resources to be practical on edge devices. To optimize performance, we distilled Meta SAM's output (along with human verification) into a small YOLO11 model, which requires significantly fewer resources for inference and runs an order of magnitude faster. This lets us get close to the pinpoint segmentation accuracy of Meta's cutting-edge models without needing a high-performance cluster. Through this process, we trained a custom vision model that understands PC building and, combined with the conversational voice AI, uses realtime computer vision to guide the building process without being constrained by computational resources.
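
To make the distillation step concrete, here is a rough sketch using the Ultralytics API, assuming the SAM-generated masks have already been human-verified and exported to YOLO segmentation format; `pc_parts.yaml` and the file paths are hypothetical.

```python
# Sketch of the distillation step: train a small YOLO11 segmentation model on
# SAM-derived, human-verified masks exported to YOLO-seg format (paths assumed).
from ultralytics import YOLO

# Start from a small pretrained segmentation checkpoint so inference stays fast on edge devices.
student = YOLO("yolo11n-seg.pt")

# Train on the SAM-derived pseudo-labels described in pc_parts.yaml (hypothetical dataset file).
student.train(data="pc_parts.yaml", epochs=100, imgsz=640)

# Inference on a single frame; this distilled model is what the app queries in realtime.
results = student("frame.jpg")
for box in results[0].boxes:
    print(results[0].names[int(box.cls)], float(box.conf))
```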

Mobile Interface

Lastly, we integrated all of the systems we had built into a user-friendly, mobile-first application that can be easily accessed and used while building a PC. Our frontend was built with Next.js, shadcn/ui, and Tailwind for a modern, responsive feel.

Our project transforms the PC building experience from an intimidating technical challenge into an accessible conversation. It empowers users to navigate component selection and part assembly with clarity and confidence.

How we built it

  • Frontend in Next.js, React, Tailwind, shadcn/ui
  • Voice AI agent using LiveKit, Deepgram (STT), OpenAI (LLM), Cartesia (TTS)
  • PC part data via scraping with Firecrawl (PCPartPicker) and TwelveLabs (YouTube)
  • RAG over part data using LanceDB
  • Backend API servers for part data and RAG via Python and FastAPI (a sketch follows this list)
  • Video segmentation and processing via Meta SAM 3 and YOLO11
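
As a rough sketch of the backend piece, here is how a FastAPI endpoint could serve RAG queries over the LanceDB table described earlier; the route, table, and field names are illustrative rather than our exact API.

```python
# Sketch of the part-data / RAG API, assuming the LanceDB table from the
# data-pipeline sketch above; endpoint path and response fields are illustrative.
import lancedb
from fastapi import FastAPI
from openai import OpenAI

app = FastAPI()
client = OpenAI()
db = lancedb.connect("./pc_parts_db")
table = db.open_table("parts")

@app.get("/parts/search")
def search_parts(q: str, limit: int = 5):
    # Embed the query and return the closest part records.
    vec = client.embeddings.create(model="text-embedding-3-small", input=q).data[0].embedding
    hits = table.search(vec).limit(limit).to_list()
    return [{"name": h["name"], "category": h["category"]} for h in hits]
```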

Challenges we ran into

Building our AI-powered PC assistant was as challenging as it was rewarding.

  • On the software side, we spent significant time refining our computer vision-based component identification system, fine-tuning it to produce consistent and accurate classifications under a variety of conditions.
  • We had to learn how to use Firecrawl to programmatically gather part information.
  • Integrating the many subsystems we had built into one cohesive product took careful coordination.

Accomplishments that we're proud of

  • Integrated an AI agent with data gathering, contextual understanding, conversational AI, and realtime computer vision
  • Learned how to work with Firecrawl and RAG
  • Did not spill food on our PC parts
  • Showed the potential for AI in hands-on, real-world projects by making PC building more accessible

What we learned

  • How to scrape and segment videos
  • How to clean up scraped data and isolate useful properties
  • How to use PyTorch and X-AnyLabeling to segment out components for training data
  • How to implement a voice AI agent

What's next

  • Extend support to peripherals, offering real-time guidance for mouse and keyboard troubleshooting
  • Add further logic to verify broader component compatibility (socket types, power requirements, etc.)
  • Build complete PC configurations based on budget/use case

Deepgram Track

Our project puts Deepgram's Flux streaming STT at the heart of a hands-free PC building assistant, enabling users to receive real-time guidance while their hands remain on delicate components. The composite voice agent uses Flux's eager end-of-turn detection (threshold 0.4), allowing the system to begin reasoning before the user finishes speaking, which is critical for natural, conversational interactions during physical assembly. The agent orchestrates five autonomous tools, including querying a RAG-powered CPU knowledge base for specs and pricing, searching component databases, verifying hardware compatibility, and tracking assembly progress to guide users through the correct installation sequence.

The architecture prioritizes low-latency, interruption-friendly interactions by pairing Deepgram streaming STT with Silero VAD for precise turn detection and LiveKit noise cancellation for reliable recognition in noisy environments. The voice agent works alongside a WebRTC computer vision system that streams the user's camera through YOLO segmentation, overlaying detected components directly onto the video feed. This multimodal approach—voice input, spoken responses, and visual AR guidance—creates a truly hands-free experience where users can ask "what should I do next?" and receive both spoken instructions and real-time visual highlighting of the relevant component.
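
For illustration, a stripped-down version of that overlay loop might look like the following; in our app the frames arrive over WebRTC rather than from a local webcam, and `pc_parts_seg.pt` is a placeholder name for the distilled weights.

```python
# Sketch of the vision overlay loop, assuming the distilled YOLO11 segmentation
# weights from the training sketch above ("pc_parts_seg.pt" is a hypothetical filename).
import cv2
from ultralytics import YOLO

model = YOLO("pc_parts_seg.pt")
cap = cv2.VideoCapture(0)  # in production, frames arrive over WebRTC instead

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame, verbose=False)
    annotated = results[0].plot()  # draw masks/boxes for detected components
    cv2.imshow("PC build assistant", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```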
