Inspiration
Our inspiration stems from the inherent human desire to maintain bonds that extend beyond the limitations of mortality. We have witnessed countless individuals yearning for one more conversation, one more chance to express their love, seek guidance, or simply share precious moments with those who have left this world. We are inspired by the belief that death should not mark the end of a relationship, but rather a new phase where the power of technology can offer solace, healing, and an ongoing connection. Our mission is to provide a meaningful avenue for preserving and nurturing these bonds, creating a lasting legacy of connection that traverses the realms of the physical and the ethereal.
What it does
Digital Funeral Service:
We create a heartfelt video, preserving the departed individual's final words through cherished photographs, videos, and voice recordings graciously shared by their family. This special tribute is showcased at the funeral, offering an opportunity for the family to personally connect with the Al-generated representation.
Interactions Beyond death with Departed Beloved Ones:
Users can experience real-time voice calls with their departed loved ones, while facial recognition technology reconstructs the appearance of the departed, providing a meaningful and interactive connection
How we built it
The technology stack employed for our web application includes JavaScript, CSS, and HTML on the front end, along with Flask and Python on the back end. Additional technologies we've utilized encompass HumeAI, OpenAI, Langchain, gTTS, SpeechRecognition, and PyDub.
Our application encompasses an interactive chat box, designed to accept both text and audio input. Audio inputs are transcribed into text format through the SpeechRecognition API, and all data is then dispatched to the server through JavaScript.
Upon receiving the data, our Flask server creates a local text file. This file is then transmitted to the HumeAI API via an asynchronous batch request, which is facilitated through a repeated call - a necessity due to Flask's inherent synchronous nature. This operation generates a primary key, which we subsequently use to retrieve emotion scores from the HumeAI API.
These emotion scores serve two key functions in our system. Firstly, they help train the language model's prompt to produce more emotionally expressive responses in the long-term context. Secondly, they are directly utilized as part of the input in the OpenAI API, shaping the short-term responses to our users.
The GPT-4 model's output is subsequently converted into an audio file using Google's Text-to-Speech library, resulting in an AI-produced audio response. This audio file, along with the corresponding text, is displayed in the chatbox simultaneously, enriching user engagement and experience.
Challenges we ran into
- The AI response speed is slow, impacting the interactive experience negatively.
- Poor stability and slow speed of generating dynamic portrait images
- Output from prompts is very unsteady ## Accomplishments that we're proud of By using prompts and leveraging Hume.ai, we achieved stable character design through iterative testing. Hume.ai, a powerful emotion recognition interface for text, pictures, and videos, seamlessly integrates into our digital human interaction, enabling optimized user feedback based on emotional intensity. We improved the realism of dynamic face image generation by exploring and implementing facial expression-driven APIs. By adapting the source code to incorporate voice-driven expression changes, we selected the most suitable interface input for our specific scenario. To address the real-time voice transmission speed, we extensively evaluated and selected the most suitable voice API from a range of options. Despite the availability of multiple speech synthesis APIs, we focused on achieving optimal real-time performance by testing and selecting the most suitable solution. ## What we learned Technology selection and evaluation: Learn to assess and choose the most suitable technology stack based on factors like maturity, scalability, security, and applicability. Iterative development and continuous integration: Embrace iterative methods and continuous integration to enable quick iteration, updates, and efficient response to changes, ensuring high-quality project delivery. Teamwork and communication: Foster effective communication and collaboration among team members, promoting a clear understanding of roles, tasks, and shared information flow. ## What's next for SoulVerse We recognize the importance of Hume.ai's support in obtaining emotional feedback from users and capturing resonant information. To enhance the Hume.ai interface, we aim to optimize it further, leveraging its compatibility with video and voice input. To ensure a stable and efficient dialogue, we aim to prioritize the establishment of a reliable connection between the persona Prompt and the embedding document. Preparations are underway for the server-side infrastructure and data flow management to expand our market, ensuring smooth user handover with utmost comfort and accuracy.
Log in or sign up for Devpost to join the conversation.