Website Pitchdeck

Inspiration

The only way for people with schizophrenia to know if they are hallucinating is to either be skeptical of what they are seeing, or to have an external force tell them.

A common aid that patients with schizophrenia rely on is a service dog, which can be tasked to "greet" somebody the patient suspects is a hallucination. Another trick patients have discovered is to open their phone's camera and point it at whatever they think could be a hallucination.

What it does

Our app is a bridge between these two external forces, but built inside of mixed reality.

We offer two ways to validate your environment. The first: with the click of a button, scan your surroundings and detect all of the people present in the space around you. A blue outline is projected on top of every detected person, so no outline could mean that what you are seeing is not actually there! The second is an AI agent that can describe the environment around you: simply hold down the trigger, ask any question about what you're looking at, and get a spoken response in seconds.

How we built it

We used a combination of Meta's Passthrough Camera API (accessed through Unity) and Gemini's speech-to-speech model.

To implement the overlays, we modified the Sentis agent so that it detects only humans out of the 80 object classes the model was originally trained on. We then combined the positions of every person the model detected with the y-coordinate of the boundary floor to instantiate a visual on top of each detected person in the environment. This is all activated by pressing the A button on the right controller.
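
A minimal sketch of that filtering and placement step, assuming the Sentis model's decoded output is available as a list of detections with a COCO class index and a world-space position (the `Detection` struct, `OverlaySpawner` name, and fields below are illustrative, not the project's actual code):

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical decoded detection from the Sentis model's output tensor.
// In COCO-trained detectors, class index 0 is "person".
public struct Detection
{
    public int ClassIndex;        // COCO class id predicted by the model
    public Vector3 WorldPosition; // detection center re-projected into world space
}

public class OverlaySpawner : MonoBehaviour
{
    const int PersonClassIndex = 0;            // keep only "person" out of the 80 COCO classes

    [SerializeField] GameObject outlinePrefab; // faint blue outline visual
    [SerializeField] float floorY;             // y of the boundary floor, queried elsewhere

    readonly List<GameObject> spawned = new List<GameObject>();

    // Called when the A button on the right controller is pressed.
    public void SpawnOverlays(IEnumerable<Detection> detections)
    {
        // Clear overlays from the previous scan.
        foreach (var go in spawned) Destroy(go);
        spawned.Clear();

        foreach (var d in detections)
        {
            if (d.ClassIndex != PersonClassIndex) continue;

            // Anchor the outline on the floor plane beneath the detected person.
            var position = new Vector3(d.WorldPosition.x, floorY, d.WorldPosition.z);
            spawned.Add(Instantiate(outlinePrefab, position, Quaternion.identity));
        }
    }
}
```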

To integrate Google Gemini, we had to pull both the passthrough image data and the voice recording from the Quest headset. That data is sent to a server that holds the Gemini API key, and the model is run with the recorded prompt and the photo as its inputs. The response is then converted to audio by the Gemini speech-to-speech model, streamed back to the Quest, and played through the headset. The experience is very similar to Meta Ray-Bans: you can look anywhere in your environment and ask any question about it.
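
A rough sketch of the headset-side hand-off, assuming the passthrough frame is already available as a Texture2D and the relay server exposes a single endpoint that forwards the audio prompt and image to Gemini (the server URL, endpoint, and field names are placeholders, not the project's real ones):

```csharp
using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

public class GeminiRelayClient : MonoBehaviour
{
    // Placeholder address for the relay server that holds the Gemini API key.
    [SerializeField] string serverUrl = "http://192.168.0.10:8000/ask";

    // Called after the trigger is released, with the captured frame and recorded audio.
    public IEnumerator SendQuery(Texture2D passthroughFrame, byte[] recordedAudioWav)
    {
        var form = new WWWForm();
        form.AddBinaryData("image", passthroughFrame.EncodeToJPG(), "frame.jpg", "image/jpeg");
        form.AddBinaryData("audio", recordedAudioWav, "prompt.wav", "audio/wav");

        using (var request = UnityWebRequest.Post(serverUrl, form))
        {
            yield return request.SendWebRequest();

            if (request.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError($"Relay request failed: {request.error}");
                yield break;
            }

            // The server replies with synthesized speech; hand the bytes to an AudioSource elsewhere.
            byte[] spokenResponse = request.downloadHandler.data;
            Debug.Log($"Received {spokenResponse.Length} bytes of audio from the server.");
        }
    }
}
```

Keeping the API key on the server rather than on the headset also means the Gemini calls can be swapped or updated without rebuilding the Quest app.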

Challenges we ran into

The Passthrough APIs were released only a couple of weeks ago, which means there is a laughably small amount of documentation for this new tool. Trying to navigate it was like walking into a minefield blindfolded: we had to learn the tool and its limitations without them being properly documented. When we did figure out how it worked and started implementing, we quickly realized that running the model locally on the headset caused a flurry of performance issues. At times, the overlays did not line up with the positions where people were detected. To work around this, we render the outline only faintly when the model detects a person; it is subtle enough not to be disruptive, and it softens the moments when the outline is slightly misaligned.
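
That workaround amounts to rendering the outline at low opacity. A minimal sketch, assuming the outline prefab uses a transparent material with a standard color property (the alpha value is illustrative):

```csharp
using UnityEngine;

public class FaintOutline : MonoBehaviour
{
    [Range(0f, 1f)]
    [SerializeField] float alpha = 0.25f; // low enough to stay unobtrusive when slightly misaligned

    void Awake()
    {
        // Assumes the outline's material uses a transparent shader exposing a color property.
        var material = GetComponent<Renderer>().material;
        var color = material.color;
        color.a = alpha;
        material.color = color;
    }
}
```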

Implementing the Gemini speech-to-speech model was also a great challenge. Even though Meta released their Passthrough APIs and developers are now granted permission to access camera data, it is still hard to pull that data from the headset and transfer it to a server for processing. When we eventually figured out how to pull the camera data and communicate with the API from a C# script, we still were not receiving the data on the server side. Unfortunately, it took us 14 hours to realize that the headset we were testing on had not been updated to the latest Horizon firmware, which made the implementation impossible on that device. After the update, the model worked as intended. Ironically, we are still running into some problems with the model hallucinating things that are not actually in the environment, an issue we have yet to fix.

Accomplishments that we're proud of

We were able to accomplish our vision without any major pivots. There were times when we almost lost hope and nearly moved away from the idea, thinking the tech was not there yet. However, our passion for this problem space and our belief that a tool like this could truly change the life of somebody with schizophrenia pushed us through. The work paid off, and we are incredibly proud of what we accomplished within the hackathon's timeframe.

What we learned

We learned it's very difficult to build with new APIs before they have proper adoption or documentation.

What's next for VeriSight?

We designed this app with the foresight that the mixed reality capabilities of the Quest will one day fit into a form factor as compact as glasses. At that point, the app could be used in day-to-day life as a non-intrusive, medication-free aid for managing schizophrenia symptoms. Our team is passionate about this problem space, and we are discussing continuing development as new Sentis-compatible models are released.
