CharacterAV | Devpost

Inspiration

I took my first Waymo ride earlier in the week, and it was fun and exciting. But after a while the excitement died down, and the ride started to feel pretty robotic and boring.

We believe in a future where robotaxis are the majority of cars on the road, working in harmony with the rest of our army of AI assistants. It would be extremely boring if all of them had the same robotic personality and, more importantly, if none of them knew how to read the room!

What it does

CharacterAV detects a passenger's facial expression, pose and voice tone. It then uses its social-cue skills to interpret that behaviour and reason about whether it should initiate a conversation with the passenger and, if so, when and what to say.

And of course, it speaks as a character of the user's choice. In our demo, we picked Harry Potter.
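
As a rough illustration of the flow described above, the sketch below turns detected cues and a chosen character into a prompt for a language model, then reads the reply as either "stay quiet" or "say this". It is not the project's code: the `Cues` fields, the prompt wording, and the `WAIT` / `SPEAK:` reply format are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cues:
    """Hypothetical container for what the perception models report."""
    expression: str
    pose: str
    voice_tone: str


def build_prompt(character: str, cues: Cues, seconds_since_last_speech: float) -> str:
    """Assemble a text prompt for the language model (wording is an assumption)."""
    return (
        f"You are {character}, riding along as the voice of a robotaxi.\n"
        f"Passenger facial expression: {cues.expression}\n"
        f"Passenger pose: {cues.pose}\n"
        f"Passenger voice tone: {cues.voice_tone}\n"
        f"Seconds since anyone last spoke: {seconds_since_last_speech:.0f}\n"
        "Decide whether to start a conversation right now.\n"
        "Reply with exactly one line: either 'WAIT' or 'SPEAK: <what you say>'."
    )


def parse_reply(reply: str) -> Optional[str]:
    """Return the line to speak, or None if the agent should stay quiet."""
    text = reply.strip()
    if text.upper().startswith("SPEAK:"):
        utterance = text[len("SPEAK:"):].strip()
        return utterance or None
    # Anything else, including 'WAIT' or a malformed reply, means stay quiet.
    return None
```

Treating every unexpected reply as "stay quiet" is a deliberate choice in this sketch: an agent that occasionally misses a chance to chat is less annoying than one that interrupts at the wrong moment.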

How we built it

We use an open-source Voice Activity Detection (VAD) model running via ONNX in combination with a local Whisper model running via WebGPU. This combination allows our agent to understand what people are saying and to decide when to speak and when to stay quiet.
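
To make the "when to speak, when not to speak" part concrete, here is a minimal sketch of one common way to turn per-frame speech probabilities from a VAD model into utterance boundaries. It assumes the VAD emits one probability per audio frame; the function name, the 0.5 threshold, and the three-frame silence window are illustrative values, not the ones used in the project.

```python
from typing import Iterable, Iterator, Optional, Tuple


def speech_segments(
    probs: Iterable[float],
    threshold: float = 0.5,
    min_silence_frames: int = 3,
) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) frame indices of speech segments; end is exclusive.

    A segment closes only after `min_silence_frames` consecutive frames
    fall below `threshold`, so short pauses do not split an utterance.
    """
    start: Optional[int] = None  # first frame of the open segment, if any
    last_speech = -1             # most recent frame classified as speech
    silence = 0                  # consecutive non-speech frames seen so far

    for i, p in enumerate(probs):
        if p >= threshold:
            if start is None:
                start = i
            last_speech = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_silence_frames:
                yield (start, last_speech + 1)
                start = None
                silence = 0

    # Flush a segment that is still open when the stream ends.
    if start is not None:
        yield (start, last_speech + 1)
```

In a pipeline like the one described, each closed segment could be handed to the transcription model, and the agent would hold back from speaking while a segment is still open.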

We also use LLaVA (a Llama-based language model paired with a ViT vision encoder) to detect the rider's emotions and actions, which inform those decisions.
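
Replies from a vision-language model are free text, so some parsing is needed before they can feed a decision. The sketch below shows one way to do it, assuming the model is asked to answer in JSON. The prompt text, the `emotion` and `action` keys, and the `"unknown"` fallback are assumptions for illustration, and the call to the model itself is left out.

```python
import json
from typing import Dict

# Hypothetical keys we ask the vision model to fill in.
EXPECTED_KEYS = ("emotion", "action")

# Hypothetical instruction sent alongside the camera frame.
VISION_PROMPT = (
    "Describe the passenger in this image. "
    'Answer only with JSON like {"emotion": "...", "action": "..."}.'
)


def parse_vision_reply(reply: str) -> Dict[str, str]:
    """Parse the JSON object between the first '{' and the last '}'.

    Missing keys, non-string values, or unparseable replies fall back
    to 'unknown' instead of raising.
    """
    result = {key: "unknown" for key in EXPECTED_KEYS}
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        return result
    try:
        data = json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return result
    if not isinstance(data, dict):
        return result
    for key in EXPECTED_KEYS:
        value = data.get(key)
        if isinstance(value, str) and value.strip():
            result[key] = value.strip().lower()
    return result
```

Falling back to `"unknown"` keeps the rest of the pipeline running when the model adds chatter around the JSON or skips a field.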

We also use an RVC (Retrieval-based Voice Conversion) model so the agent can speak in multiple voices.

Challenges we ran into

Transcription is hard, especially in the browser. Getting Whisper to run via WebGPU took some novel bug fixes for Next.js.

Accomplishments that we're proud of

How much of the pipeline already runs on-device. In the future, the entire pipeline could run on-device for privacy reasons.

What we learned

LLMs are excellent social agents when given the correct inputs. Our demo is capable of extremely nuanced interactions.

What's next for CharacterAV

I work at Zoox, Amazon's self-driving car company, and I specifically work on human-machine interfaces. I'm going to surface our work with the team and see whether it can be developed further.

Built With

llava
llm
next.js
onnx
rvc
webgpu
whisper

Updates

Fai Sukontanit started this project — May 12, 2024 03:01 PM EDT
