Inspiration

The average American household has over 30,000 items. Our neighbors own the same drill, the same board games, and the same books as we do, and it all collects dust. This is wasteful, expensive, and isolating.

Sharing used to be the norm, but today's efforts to revive it (Buy Nothing groups, tool co-ops, and our favorite, Little Free Libraries) keep fizzling out. We think this is because someone has to care: telling people to fold the picnic blankets, checking and tracking the condition of the board games, and handling all the other logistics. More people than ever crave community, yet fewer than ever have the bandwidth to organize it.

AI has made many of us more insular. We asked: what if AI facilitated physical community instead?

That's why we built Stuart.

What it does

Stuart is a purpose-built AI agent sitting inside a tiny library, whose entire job is to care.

You walk up, tap your phone, and drop off a book. Stuart sees it, tracks it, and remembers it. When your neighbor comes by, Stuart tells them about that book, maybe sharing the review you gave him, maybe cracking a joke. Stuart keeps things organized and builds connections between people who share a building or street but have never shared a conversation.

Stuart lets us share far more than books. With someone keeping track of each item's condition and who has it, neighbors can easily share board games, electronics, hiking gear, and toys: all the stuff you wish everyone didn't have to buy separately just to use once.

Stuart is modular and customizable to the wants and privacy preferences of your community. A co-working space, an apartment lobby, a neighborhood park: each gets a little Stuart that fits.

How a session works:
  • You approach, and the depth sensor wakes Stuart from sleep.
  • You tap your ID; NFC authenticates you, and the servo unlocks the door.
  • You talk: Stuart greets you by name in a warm, friendly voice. Tell him what you're dropping off or looking for.
  • Stuart sees: an onboard camera identifies items (books by title, board games by name) and assesses their condition using Claude Vision.
  • Stuart remembers: every item, interaction, and review is logged. He knows what's inside, who left it, and what they said about it.
  • You leave, and Stuart locks up, dims the lights, and goes back to sleep until the next neighbor arrives.
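The session lifecycle above is essentially a small state machine. Here is a minimal sketch in Python, assuming hypothetical event handlers (on_presence, on_nfc_tap, on_departure) and string action labels; these are illustrative and not Stuart's actual message names.

```python
from __future__ import annotations

from enum import Enum, auto


class State(Enum):
    SLEEPING = auto()
    AWAKE = auto()        # someone is nearby but not yet authenticated
    IN_SESSION = auto()   # authenticated, door unlocked


class Session:
    """Hypothetical session lifecycle; event and action names are illustrative."""

    def __init__(self, members: dict[str, str]):
        self.members = members            # NFC tag ID -> member name (assumed lookup)
        self.state = State.SLEEPING
        self.user: str | None = None
        self.actions: list[str] = []      # commands that would be sent to the hardware

    def on_presence(self) -> None:
        # Depth sensor sees someone approach: wake from sleep.
        if self.state is State.SLEEPING:
            self.state = State.AWAKE
            self.actions.append("wake")

    def on_nfc_tap(self, tag_id: str) -> None:
        # Unlock only for a recognized tag while awake.
        if self.state is State.AWAKE and tag_id in self.members:
            self.user = self.members[tag_id]
            self.state = State.IN_SESSION
            self.actions += ["unlock", f"greet:{self.user}"]

    def on_departure(self) -> None:
        # Person leaves: lock up if needed, dim the lights, and go back to sleep.
        if self.state is State.IN_SESSION:
            self.actions.append("lock")
        if self.state is not State.SLEEPING:
            self.actions.append("dim_lights")
        self.state = State.SLEEPING
        self.user = None
```

A full visit (approach, tap a known tag, walk away) produces the actions wake, unlock, greet, lock, dim_lights in that order, while an unrecognized tag leaves the door locked.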

Stuart's personality is a jolly community grandfather: slightly gossipy, full of fun facts, calls people "sport" and "champ," and will absolutely tell you it's been "47 hours since anyone borrowed Catan!"
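A line like that presumes Stuart can query his transaction log. A minimal sketch, using Python's built-in sqlite3 as a stand-in for the PostgreSQL database and a hypothetical transactions(item, action, at) table with ISO-8601 timestamps:

```python
from __future__ import annotations

import sqlite3
from datetime import datetime

# In-memory SQLite stands in for PostgreSQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (item TEXT, action TEXT, at TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("Catan", "borrow", "2024-05-01T09:00:00"),
        ("Catan", "return", "2024-05-02T18:00:00"),
        ("Dune", "borrow", "2024-05-03T08:00:00"),
    ],
)


def hours_since_last_borrow(item: str, now: datetime) -> int | None:
    """Whole hours since `item` was last borrowed, or None if it never was."""
    # ISO-8601 strings sort chronologically, so MAX picks the latest borrow.
    row = conn.execute(
        "SELECT MAX(at) FROM transactions WHERE item = ? AND action = 'borrow'",
        (item,),
    ).fetchone()
    if row[0] is None:
        return None
    delta = now - datetime.fromisoformat(row[0])
    return int(delta.total_seconds() // 3600)
```

With the sample rows, calling it for "Catan" at 2024-05-03 08:00 gives 47, which the voice agent could then phrase in Stuart's own style.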

How we built it

Architecture: Cloud-Centric Design

Stuart's key architectural decision is that all intelligence lives in the cloud. The onboard Raspberry Pi is a "dumb I/O device": it captures audio, snaps photos, controls the lock and LEDs, and streams everything over WebSocket. This means we can iterate on Stuart's personality, conversation flow, and item recognition without ever touching the firmware.
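To make the "dumb I/O device" idea concrete, here is a minimal sketch of how firmware might dispatch commands from the cloud to hardware primitives. The message shape ({"type": ..., ...}) and the handler names are assumptions for illustration; real handlers would drive the servo and NeoPixels rather than return strings.

```python
from __future__ import annotations

import json
from typing import Any, Callable


# Hypothetical primitives; on real hardware these would move the servo and set LEDs.
def set_lock(locked: bool) -> str:
    return "locked" if locked else "unlocked"


def set_leds(pattern: str) -> str:
    return f"leds:{pattern}"


# Command type -> function that extracts its parameters and runs the primitive.
HANDLERS: dict[str, Callable[[dict[str, Any]], str]] = {
    "lock": lambda msg: set_lock(msg["locked"]),
    "leds": lambda msg: set_leds(msg["pattern"]),
}


def handle_message(raw: str) -> str:
    """Decode one JSON command from the cloud and run the matching primitive."""
    msg = json.loads(raw)
    handler = HANDLERS.get(msg.get("type"))
    if handler is None:
        # Report unknown commands instead of crashing the device loop.
        return json.dumps({"type": "error", "reason": f"unknown command {msg.get('type')!r}"})
    return json.dumps({"type": "ack", "result": handler(msg)})
```

Because decisions live in the cloud, a new behavior usually means the backend sends a different sequence of these primitives, while the firmware table stays the same.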

The signal flow: the onboard Raspberry Pi (handling sensors and actuators) exchanges JSON messages, with audio and images base64-encoded, over WebSocket with a cloud relay. The relay connects to the cloud backend, which consists of ElevenLabs Conversational AI for voice and personality, Anthropic Claude Vision for item identification, and a PostgreSQL database for users, items, transactions, and memories.
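Carrying binary audio and images inside JSON means base64-encoding them. A minimal sketch of such an envelope, assuming hypothetical field names ("type", "data") and arbitrary extra metadata:

```python
from __future__ import annotations

import base64
import json


def encode_media(kind: str, payload: bytes, **meta) -> str:
    """Wrap raw bytes (an audio chunk or a JPEG) in a JSON text frame."""
    return json.dumps({
        "type": kind,                                       # e.g. "audio" or "image" (assumed names)
        "data": base64.b64encode(payload).decode("ascii"),  # binary -> ASCII text
        **meta,                                             # e.g. sample rate, timestamp
    })


def decode_media(frame: str) -> tuple[str, bytes, dict]:
    """Inverse of encode_media: return the kind, the raw bytes, and any metadata."""
    msg = json.loads(frame)
    kind = msg.pop("type")
    payload = base64.b64decode(msg.pop("data"))
    return kind, payload, msg
```

Base64 turns every 3 bytes into 4 characters, so payloads grow by roughly a third; the trade-off is that every message stays a plain JSON text frame that is easy to log and inspect.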

Hardware

We designed and built a custom IoT station from scratch.

At the heart of the station is a Raspberry Pi, paired with a USB camera for visual item identification. Audio is handled through a microphone and speaker. A servo motor with a 3D printed arm controls the door lock, and a depth sensor detects when someone approaches the station. Two WS2812B NeoPixel strips line the interior, lighting up to indicate where books are located on the shelf.
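Lighting up "where a book is" needs a mapping from shelf position to LED indices. A minimal sketch, assuming (hypothetically) that each of the two strips lines one shelf and each shelf is split into equal-width slots numbered left to right:

```python
from __future__ import annotations


def pixels_for_slot(slot: int, slots_per_shelf: int, leds_per_strip: int) -> tuple[int, range]:
    """Map a global slot number to (strip index, LED indices on that strip).

    Assumes strip 0 lines the first shelf and strip 1 the second, with each
    shelf divided into `slots_per_shelf` equal-width slots.
    """
    if not 0 <= slot < 2 * slots_per_shelf:
        raise ValueError(f"slot {slot} out of range")
    strip, local = divmod(slot, slots_per_shelf)
    # Integer arithmetic spreads LEDs evenly, so slots tile the strip with no gaps.
    start = local * leds_per_strip // slots_per_shelf
    end = (local + 1) * leds_per_strip // slots_per_shelf
    return strip, range(start, end)
```

On the Pi, one option is Adafruit's `neopixel` library: set `pixels[i] = (r, g, b)` for each returned index and call `pixels.show()`.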

Software

Firmware (Python): runs on the Raspberry Pi and handles depth sensor monitoring, NeoPixel LED control, servo lock control, camera snapshots, and WebSocket communication. A laptop currently handles microphone input and speaker output.
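Depth sensor monitoring has to turn noisy distance readings into clean wake and sleep events. A minimal sketch using hysteresis (separate near and far thresholds) plus a run of consecutive readings; the thresholds and the count are illustrative assumptions, not Stuart's tuned values:

```python
from __future__ import annotations


class PresenceDetector:
    """Turn raw distance readings into "wake"/"sleep" events with hysteresis."""

    def __init__(self, near_cm: float = 80.0, far_cm: float = 120.0, confirm: int = 3):
        self.near_cm, self.far_cm, self.confirm = near_cm, far_cm, confirm
        self.present = False
        self._streak = 0  # consecutive readings pointing toward a state change

    def update(self, distance_cm: float) -> str | None:
        """Feed one reading; return "wake", "sleep", or None if nothing changed."""
        if self.present:
            crossing = distance_cm > self.far_cm   # must move clearly away to sleep
        else:
            crossing = distance_cm < self.near_cm  # must come clearly close to wake
        self._streak = self._streak + 1 if crossing else 0
        if self._streak >= self.confirm:
            self.present = not self.present
            self._streak = 0
            return "wake" if self.present else "sleep"
        return None
```

The gap between the two thresholds keeps someone standing right at the boundary from toggling Stuart on and off, and the streak requirement ignores single spurious readings.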

Cloud Backend (Python/FastAPI): bridges the Raspberry Pi WebSocket to ElevenLabs Conversational AI. It forwards audio in both directions, handles tool calls (camera capture, inventory lookup, lock control, LED patterns), and routes item photos to Anthropic Claude Vision for identification.
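The tool calls listed above can be pictured as a name-to-function table on the backend. A minimal sketch: the tool names, parameter shapes, and the in-memory INVENTORY and OUTBOX are hypothetical stand-ins for the real tool configuration, PostgreSQL, and the Pi's WebSocket, and this does not reproduce ElevenLabs' actual tool-call message format.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical stand-ins: an items table and a queue of commands bound for the Pi.
INVENTORY: dict[str, dict[str, Any]] = {
    "Catan": {"kind": "board game", "slot": 5, "available": True},
}
OUTBOX: list[dict[str, Any]] = []


def send_to_device(command: dict[str, Any]) -> None:
    """Placeholder: the real backend would send this over the Pi's WebSocket."""
    OUTBOX.append(command)


def lookup_item(params: dict[str, Any]) -> dict[str, Any]:
    item = INVENTORY.get(params["title"])
    return {"found": item is not None, **(item or {})}


def lock_control(params: dict[str, Any]) -> dict[str, Any]:
    send_to_device({"type": "lock", "locked": params["locked"]})
    return {"ok": True}


def led_pattern(params: dict[str, Any]) -> dict[str, Any]:
    send_to_device({"type": "leds", "pattern": params["pattern"]})
    return {"ok": True}


TOOLS: dict[str, Callable[[dict[str, Any]], dict[str, Any]]] = {
    "inventory_lookup": lookup_item,
    "lock_control": lock_control,
    "led_pattern": led_pattern,
}


def handle_tool_call(name: str, params: dict[str, Any]) -> dict[str, Any]:
    """Run one tool the voice agent requested and return a JSON-serializable result."""
    tool = TOOLS.get(name)
    if tool is None:
        return {"error": f"unknown tool {name!r}"}
    return tool(params)
```

Some tools answer from data (inventory lookup) and some only emit device commands (lock, LEDs); either way the agent gets a small result it can talk about.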

Testing and Simulation

We tested hands-on throughout the build, running multiple rounds of end-to-end sessions as the hardware and software came together. We also built test tooling to exercise components in isolation and speed up our workflow.
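One common shape for that kind of tooling is a fake device that speaks the same JSON protocol as the Pi, so backend logic can be exercised without hardware. A sketch under assumed message types ("lock", "capture") and a canned image:

```python
from __future__ import annotations

import base64
import json


class FakeDevice:
    """Stands in for the Raspberry Pi in tests (hypothetical message types).

    Records every command it receives and answers camera requests with a
    canned image instead of a real photo.
    """

    def __init__(self, canned_image: bytes = b"\xff\xd8fake-jpeg"):
        self.received: list[dict] = []
        self.canned_image = canned_image
        self.locked = True

    def send(self, raw: str) -> str:
        msg = json.loads(raw)
        self.received.append(msg)
        if msg["type"] == "lock":
            self.locked = msg["locked"]
            return json.dumps({"type": "ack"})
        if msg["type"] == "capture":
            data = base64.b64encode(self.canned_image).decode("ascii")
            return json.dumps({"type": "image", "data": data})
        return json.dumps({"type": "error"})
```

Tests can then assert on `received` and `locked` to check that a conversation issued the right commands in the right order.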

Challenges we ran into

Our older Raspberry Pi struggled to maintain a reliable Bluetooth speaker connection while running all its other processes, so we offloaded audio to a laptop instead.

Item identification wasn't always accurate on the first try. Stuart handles this gracefully by prompting for another photo when uncertain, which proved reliable in practice.
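The retry behavior can be sketched as a loop around the identifier. All callbacks here are hypothetical: `identify` stands in for a Claude Vision call, and since Claude Vision does not return a calibrated confidence on its own, this assumes the prompt asks the model to report one; the threshold and attempt limit are illustrative.

```python
from __future__ import annotations

from typing import Callable


def identify_with_retry(
    take_photo: Callable[[], bytes],
    identify: Callable[[bytes], tuple[str, float]],
    ask_user: Callable[[str], None],
    threshold: float = 0.8,
    max_attempts: int = 3,
) -> str | None:
    """Photograph an item until the identifier is confident, or give up."""
    for attempt in range(max_attempts):
        label, confidence = identify(take_photo())
        if confidence >= threshold:
            return label
        if attempt < max_attempts - 1:
            # Ask for a better shot before the next attempt.
            ask_user("Hmm, can't quite make that out, champ. Mind holding it a bit closer?")
    return None  # caller can fall back to asking the user what the item is
```

Capping the attempts keeps a hard-to-read item from trapping the user in a loop.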

Live demo conditions brought their own problems: a noisy environment and a crowded WiFi network, with many devices competing for bandwidth, both made presenting harder.

What's next for Stuart

We have a working prototype today, and we're planning to deploy at the MIT Media Lab.

Our vision: every apartment lobby, co-working space, and neighborhood park gets a little Stuart that fits its community. Stuart makes sharing as effortless as it used to be, because now someone cares.

We're also working on a version 2 that integrates lower-level electronics, now that we have more time to optimize the setup.

Built With

  • anthropicclaudevision
  • depth-sensor
  • elevenlabs-conversational-ai
  • neopixels
  • postgresql
  • python
  • raspberrypi
  • servo-motor
  • usb-camera
  • ws2812b-neopixels