Inspiration

The inspiration for AI Smart (Nexhacks Autonomous DriveBot) came from envisioning the future of human-robot interaction, where controlling sophisticated hardware feels as natural as having a conversation. We wanted to demonstrate how cutting-edge real-time communication technology (LiveKit) could transform a Raspberry Pi-powered robot car into an intelligent agent that sees, listens, and responds to commands instantly. The challenge of combining autonomous navigation, computer vision, and natural voice interaction into a single cohesive system drove us to push the boundaries of what's possible with accessible hardware and modern web technologies.

What it does

AI Smart is a LiveKit-powered robotic car that delivers low-latency, two-way voice conversation and autonomous navigation, all managed from a sleek web-based operator console.

The system delivers:

Voice & Visual Interaction:

- Real-time voice conversation with a LiveKit agent: talk to the robot naturally and get instant responses
- Live camera streaming from the Raspberry Pi directly to the web UI with minimal latency
- Optional vision description using an LLM (OpenAI GPT), so the robot can describe what it sees
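
The vision description path is optional and only runs on request. A minimal sketch of how a captured frame can be handed to OpenAI for a short description (the model name, prompt, and helper name are assumptions, not the project's exact code):

```python
import base64

import cv2
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def describe_frame(frame_bgr) -> str:
    """Hypothetical helper: encode a camera frame as JPEG and ask for a description."""
    ok, jpeg = cv2.imencode(".jpg", frame_bgr)
    if not ok:
        return "Could not encode the camera frame."
    b64 = base64.b64encode(jpeg.tobytes()).decode("ascii")
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any vision-capable model works
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Briefly describe what the robot sees."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```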

Remote Control Capabilities:

- Movement commands: drive forward, backward, left, right, or stop
- Camera control: pan, tilt, center, and directional viewing (look left/right/up/down)
- Stream management: open and close the camera feed on demand
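
On the Pi, each of these commands ultimately maps onto a simple PiCar-X call. A rough sketch using SunFounder's picarx library (the dispatch function, speeds, and angles are illustrative assumptions rather than our exact agent code):

```python
from picarx import Picarx  # SunFounder PiCar-X driver library

px = Picarx()

SPEED = 30          # percent motor power; tuned empirically
STEER_ANGLE = 30    # degrees for left/right turns


def handle_command(cmd: str) -> None:
    """Map a console command string onto PiCar-X motion and camera servo calls."""
    if cmd == "forward":
        px.set_dir_servo_angle(0)
        px.forward(SPEED)
    elif cmd == "backward":
        px.set_dir_servo_angle(0)
        px.backward(SPEED)
    elif cmd == "left":
        px.set_dir_servo_angle(-STEER_ANGLE)
        px.forward(SPEED)
    elif cmd == "right":
        px.set_dir_servo_angle(STEER_ANGLE)
        px.forward(SPEED)
    elif cmd == "stop":
        px.stop()
    elif cmd == "look_left":
        px.set_cam_pan_angle(-35)
    elif cmd == "look_right":
        px.set_cam_pan_angle(35)
    elif cmd == "look_up":
        px.set_cam_tilt_angle(20)
    elif cmd == "look_down":
        px.set_cam_tilt_angle(-20)
    elif cmd == "center":
        px.set_cam_pan_angle(0)
        px.set_cam_tilt_angle(0)
```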

Autonomous Navigation (Demo Mode):

- ArUco marker detection and distance estimation using computer vision
- Autonomous navigation to specific objects identified by markers

An object tracking system that recognizes:

- Monster drink (Marker 0)
- Water bottle (Marker 1)
- Cube (Marker 2)
- Laptop (Marker 3)
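
A simplified sketch of the detection and distance estimate, assuming the OpenCV 4.7+ ArUco API; the dictionary, marker size, and focal length are placeholder calibration values, and the distance uses a pinhole-camera approximation rather than full pose estimation:

```python
import cv2
import numpy as np

# Assumed calibration: a 5 cm marker and a focal length measured once
# by placing a marker at a known distance.
MARKER_SIZE_CM = 5.0
FOCAL_LENGTH_PX = 600.0

TARGETS = {0: "Monster drink", 1: "Water bottle", 2: "Cube", 3: "Laptop"}

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())


def detect_targets(frame_bgr):
    """Return {marker_id: (name, distance_cm, center_x_px)} for visible markers."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = detector.detectMarkers(gray)
    results = {}
    if ids is None:
        return results
    for marker_corners, marker_id in zip(corners, ids.flatten()):
        if marker_id not in TARGETS:
            continue
        pts = marker_corners.reshape(4, 2)
        width_px = float(np.linalg.norm(pts[0] - pts[1]))  # top edge length in pixels
        distance_cm = (MARKER_SIZE_CM * FOCAL_LENGTH_PX) / max(width_px, 1.0)
        center_x = float(pts[:, 0].mean())
        results[int(marker_id)] = (TARGETS[int(marker_id)], distance_cm, center_x)
    return results
```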

How we built it

Our system follows a distributed architecture with three main components:

  1. Raspberry Pi Agent

- LiveKit Agents runtime: core orchestration layer handling room connections and real-time communication
- PiCar-X motion control: Python scripts managing the motor controllers for precise movement
- Picamera2 integration: camera stream pipeline feeding LiveKit video tracks directly
- Demo mode enhancements: ArUco marker detection with OpenCV and autonomous navigation algorithms
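
A condensed sketch of the camera pipeline feeding LiveKit, assuming the LiveKit Python rtc API and a Picamera2 RGB stream; resolution, frame rate, and track name are placeholders:

```python
import asyncio

import cv2
from livekit import rtc
from picamera2 import Picamera2

WIDTH, HEIGHT, FPS = 640, 480, 15  # placeholder values


async def stream_camera(room: rtc.Room) -> None:
    """Capture Picamera2 frames and push them into a published LiveKit video track."""
    picam2 = Picamera2()
    picam2.configure(picam2.create_video_configuration(
        main={"size": (WIDTH, HEIGHT), "format": "RGB888"}))
    picam2.start()

    source = rtc.VideoSource(WIDTH, HEIGHT)
    track = rtc.LocalVideoTrack.create_video_track("picar-camera", source)
    await room.local_participant.publish_track(
        track, rtc.TrackPublishOptions(source=rtc.TrackSource.SOURCE_CAMERA))

    while True:
        frame = picam2.capture_array()                  # HxWx3 array from the main stream
        rgba = cv2.cvtColor(frame, cv2.COLOR_RGB2RGBA)  # swap channel order if colors look off
        source.capture_frame(
            rtc.VideoFrame(WIDTH, HEIGHT, rtc.VideoBufferType.RGBA, rgba.tobytes()))
        await asyncio.sleep(1 / FPS)
```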

  2. LiveKit Cloud Infrastructure

- Room and session orchestration for managing connections
- Audio/video streaming with WebRTC for ultra-low latency
- Data channels for command transmission (topic: car-control)
- Scalable cloud infrastructure handling all real-time communications
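
A minimal sketch of how the agent can consume packets on the car-control topic, assuming the current LiveKit Python SDK data_received event and a simple JSON payload shape:

```python
import json

from livekit import rtc


def register_command_handler(room: rtc.Room, handle_command) -> None:
    """Dispatch JSON payloads sent on the 'car-control' data topic."""

    @room.on("data_received")
    def _on_data(packet: rtc.DataPacket) -> None:
        if packet.topic != "car-control":
            return
        try:
            message = json.loads(packet.data.decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError):
            return  # ignore malformed payloads
        # Assumed payload shape: {"command": "forward"} and similar
        command = message.get("command")
        if command:
            handle_command(command)
```

The web console publishes the matching JSON on the same topic through the LiveKit JS SDK.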

  3. Web Frontend

- Next.js control console with React components
- LiveKit JS SDK integration for seamless room connections
- Token-based authentication via the /api/connection-details endpoint
- Responsive UI with real-time video rendering and command controls
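
The /api/connection-details route mints a short-lived, room-scoped token for the browser. The project does this in the Next.js route handler; purely as an illustration of the same token shape, here is a sketch using the LiveKit Python server SDK (environment variable names and response keys are assumptions):

```python
import os

from livekit import api


def mint_connection_details(identity: str, room_name: str) -> dict:
    """Return what the console needs to join: the server URL plus a scoped JWT."""
    token = (
        api.AccessToken(os.environ["LIVEKIT_API_KEY"],
                        os.environ["LIVEKIT_API_SECRET"])
        .with_identity(identity)
        .with_grants(api.VideoGrants(room_join=True, room=room_name))
        .to_jwt()
    )
    return {"serverUrl": os.environ["LIVEKIT_URL"], "participantToken": token}
```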

Technology Stack

Backend:

- Python 3 with the LiveKit Agents SDK
- Picamera2 for the Raspberry Pi camera interface
- OpenCV for computer vision and ArUco detection
- NumPy for numerical computations and distance estimation
- OpenAI API (optional) for vision description capabilities

Frontend:

- Next.js (React framework)
- LiveKit Client SDK for JavaScript
- Modern web APIs for real-time media streaming
- pnpm for package management

Challenges we ran into

There were lots, but mainly:

Real-time Communication Complexity: Synchronizing audio, video, and data channels while maintaining low latency required careful orchestration.

Raspberry Pi Resource Constraints: Balancing CPU-intensive tasks (camera processing, OpenCV operations, motor control) on limited hardware.

Autonomous Navigation: Developing smooth path planning algorithms that avoid oscillations.
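
One common way to damp such oscillations is to add a deadband around the target heading so the steering servo does not chase pixel-level noise. A simplified sketch of that idea, not our exact controller; gains and thresholds are illustrative:

```python
# Simplified proportional steering toward a detected marker.
FRAME_CENTER_X = 320      # half of a 640 px wide frame
DEADBAND_PX = 25          # ignore small offsets so the servo does not twitch
STEER_GAIN = 0.12         # degrees of steering per pixel of horizontal offset
MAX_STEER_DEG = 30
STOP_DISTANCE_CM = 20     # stop once the marker is close enough


def steering_command(marker_center_x: float, distance_cm: float):
    """Return (steer_angle_deg, keep_driving) for one control step."""
    if distance_cm <= STOP_DISTANCE_CM:
        return 0.0, False                      # arrived at the target object
    offset = marker_center_x - FRAME_CENTER_X
    if abs(offset) < DEADBAND_PX:
        return 0.0, True                       # centered enough: drive straight
    angle = max(-MAX_STEER_DEG, min(MAX_STEER_DEG, STEER_GAIN * offset))
    return angle, True
```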

Accomplishments that we're proud of

- Achieved true low-latency, two-way voice conversation; talking to the robot feels natural and immediate.
- Created a working autonomous navigation system that can identify objects and navigate to them independently.
- Built a polished, intuitive web console that makes controlling a robot feel accessible.

What we learned

- A deeper understanding of WebRTC and how to leverage it for robotics applications.
- How to interface with Raspberry Pi GPIO and motor controllers from Python.
- How to manage multiple hardware peripherals (camera, motors, servos) simultaneously.

What's next for AI Smart

- SLAM Integration: implement Simultaneous Localization and Mapping for indoor navigation without markers.
- Advanced Path Planning: add obstacle avoidance and dynamic re-routing capabilities.
- Natural Language Understanding: process complex, multi-step commands.
- Emotion Detection: analyze voice tone to adapt robot responses and behavior.
