Meet Our Team!
- Joshua Delgadillo: Josh is a sophomore at Stanford and this is his first TreeHacks! He loves volunteering to teach CS in East Palo Alto and diving down the systems rabbit hole, messing with bare-metal code.
- Sheryl Hsu: Sheryl is a sophomore at Stanford. On campus, she’s busy building humanoid robots, mentoring younger students in WiCS, and doing LLM research in IRIS.
- Sonya Jin: Sonya is a sophomore at Stanford. She previously worked on graph machine learning at Amazon, and is passionate about ML-inspired products that help the end user.
- Trevor Chow: Trevor is a machine learning researcher at Stanford who enjoys understanding how we can do high-throughput inference with a limited compute budget. In his spare time, he likes jumping out of planes!
Inspiration
As over a thousand young adults descend on the corridors of the Huang Engineering Center and frantically start building, they embody two extremes of what Stanford is known for: a fiery desire to build something that helps others and, more insidiously, the "duck syndrome." Sometimes, life as a college student is just plain tough, and when that happens, we could all use a small dose of positivity and encouragement. That's what inspired Inspira!
What it does
Stress and anxiety are deeply personal experiences. For some people, they arise from a fixed part of the daily routine: there are just some bits of life that make them feel less comfortable. For others, they are far more intermittent, and what they need is support in the moment. We aim to help everyone, regardless of how this manifests itself. That's why Inspira has three different modes of interaction.
Firstly, you can schedule events: for example, if you tell it you're going to the gym every day at 5 pm, it will proactively send an encouraging text or email just before then, hyping you up so you have a great workout. Secondly, there is a real-time chat feature where you can have a conversation with Inspira. Finally, for those who prefer speaking to merely exchanging words on a screen, we created a bespoke speech-to-speech pipeline so the user can talk with Inspira out loud.
Inspira also maintains a personality profile for each user, with information about their values, conversation history, and more, and we draw on this data across the full range of interactions. We also added the ability to learn from feedback using RLVF, a state-of-the-art method for learning from verbal feedback without overgeneralization. This way, the user can give idiosyncratic, domain-specific feedback, such as "use capital letters when talking about weightlifting" or "call my wife sweetie when talking about my family."
How we built it
When it comes to using artificial intelligence to improve our wellbeing, transparency about how these black boxes influence our mental state is paramount to gaining and keeping users' trust. That's why we built our backend on an entirely open-source pipeline: the open-source Whisper model converts the user's speech into text, Mistral 7B generates a response, and the TorToiSe text-to-speech model turns that response back into audio. All of these models are hosted by us on Modal Labs.
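To make the flow concrete, here is a minimal sketch of that loop, assuming the openai-whisper, transformers, tortoise-tts, and torchaudio packages; the checkpoints, prompt, and function names are illustrative rather than our exact production code.

```python
# A minimal sketch of the speech-to-speech loop: Whisper -> Mistral 7B -> TorToiSe.
# Checkpoints, prompt, and names below are illustrative, not our deployed code.
import whisper
import torchaudio
from transformers import AutoModelForCausalLM, AutoTokenizer
from tortoise.api import TextToSpeech

asr = whisper.load_model("base")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
llm = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
tts = TextToSpeech()

def speech_to_speech(input_wav: str, output_wav: str) -> str:
    # 1. Speech -> text with Whisper.
    user_text = asr.transcribe(input_wav)["text"]

    # 2. Text -> encouraging reply with Mistral 7B.
    prompt = f"[INST] You are Inspira, a supportive companion. {user_text} [/INST]"
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = llm.generate(**inputs, max_new_tokens=200)
    reply = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True)

    # 3. Reply text -> speech with TorToiSe (24 kHz output).
    audio = tts.tts_with_preset(reply, preset="fast")
    torchaudio.save(output_wav, audio.squeeze(0).cpu(), 24000)
    return reply
```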
On the frontend, Inspira uses React Native to provide a consistent experience across all platforms. Meanwhile, Convex's real-time database connects our frontend with our backend, both by storing the message history and contextual information about the user that we use to personalize the wellness experience, and by calling the API endpoints of our Modal instance.
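For a sense of that contract, here is a hypothetical sketch of the chat endpoint shape that the Convex actions call, written with FastAPI (which Modal can serve); the route, field names, and placeholder reply are assumptions for illustration, not our deployed code.

```python
# Hypothetical shape of the chat endpoint the Convex actions call.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    user_id: str
    message: str
    profile: dict  # values, preferences, recent history pulled from Convex

class ChatResponse(BaseModel):
    reply: str

@app.post("/chat", response_model=ChatResponse)
def chat(req: ChatRequest) -> ChatResponse:
    # In the real pipeline, this is where the personalized Mistral 7B call
    # happens, with the user's profile prepended to the prompt.
    reply = f"Hey {req.profile.get('name', 'there')}, you've got this!"
    return ChatResponse(reply=reply)
```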
We understand the importance of tailoring any sort of wellness and affirmation experience to the particular user. Thus, in addition to providing our models with the key personal information that improves the quality of the conversation, we implemented a mechanism for the model to actually improve from past conversations and feedback.
This is an implementation of RLVF (Learning from Verbal Feedback without Overgeneralization), a state-of-the-art technique for applying feedback to specific domains without overgeneralizing. To do this, we generated synthetic data using GPT-4 and gathered sample completions from Together. From there, we trained a LoRA for Mistral with a loss function that combines DPO (direct preference optimization) and supervised fine-tuning. All of this training ran on Modal.
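The heart of that training step is the combined objective. Below is a sketch of how a DPO term over preference pairs can be mixed with a supervised term on the chosen completion; the beta and the SFT weight lam are placeholders, not the values we actually tuned.

```python
# Sketch of the combined objective: DPO on preference pairs plus an SFT term
# on the chosen completion. beta and lam are placeholders, not tuned values.
import torch
import torch.nn.functional as F

def dpo_plus_sft_loss(policy_chosen_logps: torch.Tensor,
                      policy_rejected_logps: torch.Tensor,
                      ref_chosen_logps: torch.Tensor,
                      ref_rejected_logps: torch.Tensor,
                      beta: float = 0.1,
                      lam: float = 1.0) -> torch.Tensor:
    """Each input is the summed log-probability of a completion under the
    policy or the frozen reference model, shape (batch,)."""
    # DPO: push the policy's chosen-vs-rejected margin above the reference's.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    dpo_term = -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

    # SFT: negative log-likelihood of the chosen completion, which keeps the
    # model anchored while it absorbs the verbal feedback.
    sft_term = -policy_chosen_logps.mean()

    return dpo_term + lam * sft_term
```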
Challenges we ran into
One challenge of building a speech-to-speech pipeline is running, chaining together, and serving multiple powerful machine learning models at the same time. We struggled to make our pipeline run efficiently enough to give the user a satisfying experience, and that meant really delving into the intricacies of how some of these models could run in parallel.
Another challenge was putting our research ideas into action: RLVF had many steps from generating synthetic data to sampling completions to doing two rounds of training with different losses to finally porting it into our deployed backend. There were many times we got stuck on a step and almost lost hope before finding a solution, as is often the case with reinforcement learning research.
Another challenge we ran into was processing audio files. We hit strange errors where playback was intermittent and somewhat random, and debugging how our REST API endpoints interacted with the audio playback mechanism was an interesting insight into how differing media formats can stymie even the best-designed data pipeline.
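One pattern that helped was normalizing whatever audio the TTS step emits into a single known format before it goes back over the REST API. The snippet below is an illustrative sketch using pydub; the target sample rate, channel count, and codec are assumptions, not the exact settings we shipped.

```python
# Normalize any incoming audio into one predictable format before returning it.
from pydub import AudioSegment

def normalize_audio(in_path: str, out_path: str) -> str:
    audio = AudioSegment.from_file(in_path)              # auto-detects wav/mp3/ogg...
    audio = audio.set_frame_rate(24000).set_channels(1)  # single, predictable layout
    audio.export(out_path, format="mp3", bitrate="64k")
    return out_path
```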
Accomplishments that we're proud of
While chatbots are a dime a dozen, we believe they can often feel quite impersonal, rendering them ineffective at injecting positivity into one's life. That's why we're proud of building a well-oiled speech-to-speech pipeline, since conversational speech feels far more comforting and natural to most people.
Furthermore, we're excited about our approach to personalizing the entire user experience. This goes beyond the conventional methods of retrieval-augmented generation, although that is itself an indispensable part of our process. What stands out in Inspira is our integration of state-of-the-art research into our app, bringing cutting-edge work in machine learning into a commercially viable product in such a short period of time. Research codebases are often messy and unforgiving, and turning a paper from a mere idea on a page into a valuable experience for anyone having a rough day truly puts a smile on all of our faces.
What we learned
Going from a blank whiteboard to a fully functioning product in 36 hours taught all four of us a great deal. Of course, there are the technical lessons. For example, working with REST APIs across multiple languages taught us a lot about how to deliver structured data effectively across different platforms, while managing multiple Modal instances helped us understand the intricacies of dependency management, and how having different Images can both simplify and complicate the development process.
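As a concrete example of that dependency isolation, separate Modal Images can pin each model's packages independently, along the lines of the sketch below; the package lists and GPU choices are illustrative, not our exact configuration.

```python
# One Modal Image per model family, each function pinned to the image it needs.
import modal

app = modal.App("inspira-backend")

asr_image = modal.Image.debian_slim().pip_install("openai-whisper", "torch")
llm_image = modal.Image.debian_slim().pip_install("transformers", "peft", "torch")
tts_image = modal.Image.debian_slim().pip_install("tortoise-tts", "torchaudio", "torch")

@app.function(image=asr_image, gpu="any")
def transcribe(audio_bytes: bytes) -> str:
    ...  # Whisper inference lives here

@app.function(image=llm_image, gpu="any")
def respond(text: str) -> str:
    ...  # Mistral 7B (plus the LoRA) lives here

@app.function(image=tts_image, gpu="any")
def synthesize(text: str) -> bytes:
    ...  # TorToiSe inference lives here
```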
On the machine learning side, we got a lot of valuable experience in understanding how to efficiently run inference for large language and audio models, as well as the experimental workflows which make generating high quality synthetic data easier.
Just as important, however, are the lessons we learned about how to build products. One example is understanding the importance of making the user experience as low-friction as possible, such as by proactively sending a message or email rather than waiting for the user to check into the app.
What's next for Inspira
In our minds, the defining opportunity of this era of large language models is the ability for our apps to proactively assist our lives, rather than passively waiting for the user to come forward. While our calendar integration and the continual refinement of the model towards the user's preferences are a good start, we see the next step as taking this even further. For example, we could apply machine learning techniques to analyze calendar events and conversations for patterns in when the user is more likely to feel down, and whether there are specific triggers. In this way, we could reach out ahead of time, as well as help the user notice potential blind spots.
Built With
- convex
- gpt4
- huggingface
- modal
- pytorch
- react-native
- together