Be for real, you've wanted secret glasses that give you mysterious AI powers too.
What it does
Lens extends your cognition. It can passively monitor conversations for opportunities to speak up or actively respond to your questions. It also maintains a long-lived context for memory retrieval.
How we built it
See this doc and this presentation.
Lens was built through a series of hacky methods to work around Meta's limitations on its glasses. Since Meta doesn't provide an SDK for the glasses, we had to implement our own way of capturing live audio and video from them. To do this, we created a Messenger Bot that the glasses call while operating. The Messenger Bot uses a Chrome extension to stream the call's audio and video to our WebSocket server, which receives the audio chunks and forwards them to OpenAI's transcription service. The transcript is then streamed to OpenAI's completions API, along with any stored context, to generate a response to the user's query. From that response, we use Cartesia to synthesize speech and play it through a Virtual Cable input device on the machine, which transmits the audio back to the glasses. We use a pub-sub architecture to manage asynchronous, live data flow throughout the application. This means the glasses don't have set "request" or "response" periods: the system continuously monitors what it sees and what you say in order to respond.
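Here is a minimal sketch of the pub-sub plumbing, assuming a Node.js backend with the ws and openai packages; the topic names, port, and handler wiring below are illustrative rather than our exact implementation, and the transcription and Cartesia stages that hook into the same bus are omitted.

```typescript
import { EventEmitter } from "node:events";
import { WebSocketServer } from "ws";
import OpenAI from "openai";

// Simple pub-sub bus: each pipeline stage publishes to a topic and
// subscribes to the topic upstream of it.
const bus = new EventEmitter();
const openai = new OpenAI();

// Stage 1: the Chrome extension streams audio chunks from the Messenger
// call over a WebSocket; each chunk is published onto the bus.
const wss = new WebSocketServer({ port: 8080 }); // port is an assumption
wss.on("connection", (socket) => {
  socket.on("message", (chunk) => bus.emit("audio.chunk", chunk));
});

// (An "audio.chunk" subscriber forwards chunks to OpenAI transcription and
// publishes "transcript" events; that stage is omitted here.)

// Stage 2: a transcript triggers a streamed chat completion; tokens are
// published for the Cartesia TTS stage to consume.
bus.on("transcript", async (text: string) => {
  const stream = await openai.chat.completions.create({
    model: "gpt-4o", // illustrative model choice
    messages: [{ role: "user", content: text }],
    stream: true,
  });
  for await (const part of stream) {
    const token = part.choices[0]?.delta?.content;
    if (token) bus.emit("llm.token", token);
  }
});
```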
For passive detection, the backend issues a "thinking" chat request at a set interval. The agent can choose to speak up or say nothing at all. If you ask the agent to remind you of something later, it stores that information in the context.
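A sketch of that passive loop, reusing the bus and OpenAI client from above; the interval, prompt, and the "[SILENT]" sentinel are assumptions for illustration, since all the model needs is a way to opt out of speaking.

```typescript
// Periodic "thinking" request: the agent sees recent context and may
// choose to stay quiet by answering with a sentinel token.
const THINK_INTERVAL_MS = 15_000; // assumed interval
const contextLog: string[] = [];  // transcripts, reminders, etc.

setInterval(async () => {
  const response = await openai.chat.completions.create({
    model: "gpt-4o", // illustrative model choice
    messages: [
      {
        role: "system",
        content:
          "You are a proactive assistant. If nothing is worth saying, reply with exactly [SILENT].",
      },
      { role: "user", content: contextLog.slice(-20).join("\n") },
    ],
  });

  const reply = response.choices[0]?.message?.content ?? "[SILENT]";
  if (reply.trim() !== "[SILENT]") {
    bus.emit("llm.token", reply); // hand off to the TTS stage
  }
}, THINK_INTERVAL_MS);
```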
Challenges we ran into
How do we access the live camera and audio capture from the Meta Raybans without an SDK? We set up a video-call system to get around Meta's restrictions: the glasses call our Messenger Bot account, which streams the audio and video to our API through a series of hacky methods.

How do we create and manage context/memory? We store user queries and responses in memory and attach them to each request to the API.

How do we communicate between the Meta Raybans and our API? Lots and lots of communication layers. Pub-sub :-)

How do we use the captured audio to generate a response? We capture the MediaStream from the Chrome tab running the Messenger call and stream it to the API over a WebSocket connection.

How do we handle terrible documentation from OpenAI? Thug it out and play around until it works.

How do we interrupt responses when a user starts speaking? Responses are streamed back to the user through layers of queues: the first layer streams output tokens from OpenAI to Cartesia, and the next layer streams the synthesized speech from Cartesia to the user. If the user starts speaking, we flush these queues (see the sketch after this list).

How do we reduce latency from user -> transcription -> output -> audio -> user? Experiment with other models and audio-chunking methods, and wait for Cartesia or ElevenLabs to release a realtime text-to-audio model.
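A rough sketch of the interruptible-queue idea, using a tiny in-memory async queue; the FlushableQueue class and the user-speech trigger are illustrative stand-ins for the actual Cartesia and playback plumbing.

```typescript
// A minimal flushable queue: consumers pull items asynchronously, and
// flush() discards anything queued but not yet consumed.
class FlushableQueue<T> {
  private items: T[] = [];
  private waiters: ((item: T) => void)[] = [];

  push(item: T) {
    const waiter = this.waiters.shift();
    if (waiter) waiter(item);
    else this.items.push(item);
  }

  pop(): Promise<T> {
    const item = this.items.shift();
    if (item !== undefined) return Promise.resolve(item);
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  flush() {
    this.items.length = 0; // drop everything queued but not yet played
  }
}

// Layer 1: LLM tokens waiting to be sent to Cartesia for synthesis.
// Layer 2: synthesized audio chunks waiting to be played back to the user.
const tokenQueue = new FlushableQueue<string>();
const audioQueue = new FlushableQueue<Buffer>();

// When voice-activity detection (not shown) reports that the user started
// talking, flush both layers so the assistant stops mid-sentence.
function onUserStartedSpeaking() {
  tokenQueue.flush();
  audioQueue.flush();
}
```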
Accomplishments that we're proud of
Building a hacky solution that actually works.

Successfully using our own models, which have far superior capabilities to Meta's trashy built-in AI.
What we learned
In this project, we learned how effective the publisher-subscriber model is for streaming and processing audio in real time. We also learned that Meta does not like its data getting scraped.
What's next for Lens
Acquisition by Meta!!!

The First Open-Source Smart Glasses™ (or a cease and desist)

An easier way to get context: move to using video

Ultimately, find a way around streaming through Messenger calls