Inspiration

As students, we have faced countless interviews, whether for school clubs, research positions, or those coveted internships. A majority of these interviews are held virtually, over Zoom or through automated platforms like HireVue, and the days leading up to them are often filled with nervousness and anxiety. It is also hard to schedule mock interviews with engineers in positions similar to those of the people who will actually conduct the interview, so most students end up winging the process and getting poor results. Having worked at early-stage startups before, we saw a similar trend among early-stage founders applying to incubators like Y Combinator and Techstars, or even raising their seed and Series A rounds. The parallel made us think harder about the problems founders face, the biggest being fear of the investors themselves. So we decided to tackle this problem through repetition and muscle memory.

What it does

SharkProof is an all-in-one tool to perfect your interview skills and prepare for your best interview performance. It has two main target audiences: students and founders. For now, we have focused the tool primarily on founders, especially those with upcoming pitches or interviews with VCs and incubators. Once a founder logs on to the platform, they choose a persona that will interview them. This persona can be another founder, such as Elon Musk, or a potential VC like Peter Thiel; essentially, you choose or create the persona of the person who will actually be interviewing you in the real world.

Once the persona has been selected, you enter the interview room, where the persona interviews you in line with their personality (Mark Cuban might press for financial details, while Oprah might ask for more impact-related information). On top of this, the interviewer's voice is matched to their real voice, so you can truly imagine you are speaking to the actual person. And once your interview is completed, you receive feedback detailing exactly what you need to work on and what you did well. And when we say detailed, we truly mean detailed: on top of an overall interview score out of 100, you are told which emotions you displayed most throughout the interview and during specific questions, and the same applies to your hand gestures and facial expressions. If you touch your hair too often or talk without making eye contact, you will know exactly when it happened and how to fix it.

How we built it

We built this using technologies from Hume AI, Cartesia, Google's Gemini model, OpenAI's Whisper, and Groq. The backend is built with Flask in Python, which was ideal since we were making many LLM and API calls to AI models. For the frontend, we went with React because of its component reusability, its easy integration with a Flask backend, and the efficiency of its Virtual DOM.

The two foundational models we use are Hume AI's EVI (voice-to-voice) model and its Facial Expressions model. The EVI model takes the user's audio input, converts it into audio and text embeddings, and then uses Hume's emotion mapping to score the prevalence of 48 different emotions. We do this sentence by sentence so your emotions are accurately tracked across the interview. Gemini, configured within the Hume model, then uses this to generate a response and new follow-up questions, which are output in the voice of the interviewer. The Facial Expressions model identifies the emotions on the interviewee's face, which we map onto the corresponding portions of the audio so we can check that the voice and facial emotions are in line. Finally, as soon as the interview is completed, we send all of this data to the backend and run a Llama 3.1 model that combines a custom interview-scoring algorithm, based on weights we chose, with a holistic response-quality analysis by the LLM, and outputs feedback and an interview score. The results are then displayed graphically for easier understanding.
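To make the scoring step concrete, here is a minimal Python sketch of the kind of weighted emotion aggregation described above. The weight values, emotion names, and function are illustrative placeholders, not our production algorithm:

```python
# Illustrative sketch of a weighted emotion-scoring step (placeholder weights,
# emotion names, and normalization -- not our exact production algorithm).
from statistics import mean

# Hypothetical weights: positive emotions lift the score, negative ones lower it.
EMOTION_WEIGHTS = {
    "Calmness": 1.0,
    "Determination": 1.2,
    "Anxiety": -0.8,
    "Confusion": -0.6,
}

def score_interview(sentence_emotions: list[dict[str, float]]) -> float:
    """Combine per-sentence emotion prevalences (0-1) into a 0-100 score."""
    sentence_scores = [
        sum(EMOTION_WEIGHTS.get(name, 0.0) * value for name, value in emotions.items())
        for emotions in sentence_emotions
    ]
    raw = mean(sentence_scores) if sentence_scores else 0.0
    # Map the raw weighted average into a 0-100 band and clamp the result.
    return max(0.0, min(100.0, 50.0 + 40.0 * raw))

# Example: two sentences, each with a few of the 48 Hume emotion scores.
print(score_interview([
    {"Calmness": 0.7, "Anxiety": 0.2},
    {"Determination": 0.6, "Confusion": 0.1},
]))
```

In the real pipeline, a numeric score of this kind is combined with the Llama 3.1 model's holistic response-quality analysis before the final feedback and interview score are produced.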

Challenges we ran into

One of the biggest challenges we ran into was coordinating and reconciling the data from the two Hume models: the voice-to-voice model gives emotion outputs at sentence-level intervals, while the facial expressions model only accepts video in 5-second batches. Recursively splitting and processing the video into 5-second chunks (which we also did using Whisper) and mapping the results onto the voice data was quite difficult because there was no simple way to achieve it, and it led to some complicated dictionaries nested within dictionaries and many backend calls to process the data accurately. Another challenge was coming up with a custom algorithm to decide what weight each emotion and facial expression should carry and how much effect each should have on our overall score's equation. And lastly, we had a little trouble making sure the web sockets routed our traffic correctly and opened and closed when we wanted them to.
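For reference, here is a simplified sketch of the timestamp alignment we mean, assuming sentence-level voice emotions with start/end times and one facial-expression result per consecutive 5-second chunk (the field names are illustrative, not our exact data structures):

```python
# Simplified alignment sketch: match each sentence's time range to the 5-second
# facial-expression windows that overlap it. Field names are illustrative.

def align_face_windows(sentences, face_windows, window_len=5.0):
    """sentences: [{"start": 0.0, "end": 7.3, "emotions": {...}}, ...]
    face_windows: one facial-emotion dict per consecutive 5-second video chunk."""
    aligned = []
    for sentence in sentences:
        first = int(sentence["start"] // window_len)   # first overlapping window
        last = int(sentence["end"] // window_len)      # last overlapping window
        aligned.append({
            "sentence": sentence,
            "face_emotions": face_windows[first:last + 1],
        })
    return aligned

# Example: a 7.3-second sentence spans the first two 5-second windows.
print(align_face_windows(
    [{"start": 0.0, "end": 7.3, "emotions": {"Calmness": 0.7}}],
    [{"Joy": 0.4}, {"Concentration": 0.5}, {"Joy": 0.2}],
))
```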

Accomplishments that we're proud of

A major accomplishment we are proud of is integrating AI personas into our application with voices that sound remarkably realistic. We are also really proud of taking on the challenge of essentially merging the two Hume models, something even the Hume team is currently working on. And lastly, we thoroughly enjoyed building a project that we ourselves will use to improve our own interview preparation in the future.

What we learned

We learned the importance of deeply understanding a model's architecture and its input and output formats when a project depends on getting the most out of that model. We also learned about handling binary file data between the frontend and backend in both directions. And we got the chance to delve into web socket programming and understand the importance of keeping the data types exchanged with the backend clearly defined.
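As an example of the kind of binary-over-websocket handling we mean, here is a minimal sketch assuming a Flask-SocketIO setup; the event names and file handling are illustrative, and our actual payload handling differed:

```python
# Minimal Flask-SocketIO sketch for receiving binary media chunks from the
# frontend over a websocket. Event names and file handling are illustrative.
from flask import Flask
from flask_socketio import SocketIO, emit

app = Flask(__name__)
socketio = SocketIO(app)

@socketio.on("video_chunk")
def handle_video_chunk(chunk: bytes):
    # Binary frames sent by the client arrive here as raw bytes.
    with open("interview_recording.webm", "ab") as recording:
        recording.write(chunk)
    # Acknowledge the chunk so the client knows it can send the next one.
    emit("chunk_received", {"bytes": len(chunk)})

if __name__ == "__main__":
    socketio.run(app, port=5000)
```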

What's next for SharkProof

Something we really wanted to implement but didn't get the chance to was creating AI deepfakes of the interviewers. This would complete the interviewer's persona and help build the muscle memory founders can tap into when they walk into the actual interview. Another feature we wanted to add was the ability for students to upload their resume and a job description, so the interviewer comes in already aware of their skills and experience level and asks questions based on those inputs. That would make for the ideal interview simulation environment.
