Inspiration
In the retail world, consumers are often overwhelmed by the number of choices they face when deciding what to buy and which products they might like but have never been introduced to. From the retail side, companies can offer consumers a better online experience through video recommendations of new products based on emotional reactions. This connects with the customer and brings the emotional aspect of finding good products in a live retail environment into the online experience.
What it does
At first, our web application displays five videos from a category a company might want to present to a consumer. Once the user clicks on a video of a product they're interested in, a session begins recording the user's emotions and how positively or negatively they react to the product over a specified interval of time. Images are captured every half millisecond, and when the user chooses to terminate the session, we compile them frame by frame into a video. We then analyze the compiled video to see whether the user is Happy, Sad, Angry, etc. each second, and our algorithm generates an overall positive or negative reaction from these data points. A feedback report is displayed to the user, and using this internal algorithm we choose which products to show them next.
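As a rough illustration of that aggregation step, here is a minimal sketch of turning per-second emotion labels into an overall reaction. The function name, emotion weights, and threshold here are illustrative assumptions, not our exact production algorithm:

```python
# Hypothetical sketch: reduce per-second emotion labels to an overall reaction.
# The weights and threshold are illustrative assumptions, not the exact values we used.
EMOTION_WEIGHTS = {
    "HAPPY": 1.0,
    "SURPRISED": 0.5,
    "CALM": 0.1,
    "CONFUSED": -0.3,
    "SAD": -0.8,
    "ANGRY": -1.0,
    "DISGUSTED": -1.0,
}

def overall_reaction(per_second_emotions: list[str]) -> str:
    """Average the weighted emotions over the session and classify the result."""
    if not per_second_emotions:
        return "NEUTRAL"
    score = sum(EMOTION_WEIGHTS.get(e, 0.0) for e in per_second_emotions) / len(per_second_emotions)
    return "POSITIVE" if score > 0 else "NEGATIVE"
```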
How we built it
Our backend REST APIs were built with Python and FastAPI. We used OpenCV to compile each frame into a video and to draw a bounding box over the face along with the most strongly recognized emotion for that frame. We utilized AWS S3 for storing each session as a series of images in a specified bucket, AWS Rekognition for pre-trained and customizable computer vision models that extract emotional insights quickly from each frame, and AWS Lambda functions to provision resources quickly in a fast-paced environment. On the front end, we used React.js and took advantage of several browser APIs to record a user session.
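To give a flavor of how those pieces fit together, here is a minimal sketch of the per-frame analysis: calling Rekognition on a frame stored in S3, then drawing a bounding box and emotion label with OpenCV before the frames are stitched into the playback video. The bucket, key, and helper names are placeholders, not our exact code:

```python
# Minimal sketch of the per-frame analysis pipeline (bucket/key names are placeholders).
import boto3
import cv2

s3 = boto3.client("s3")
rekognition = boto3.client("rekognition")

def annotate_frame(bucket: str, key: str, local_path: str) -> None:
    """Detect the dominant emotion in a frame stored in S3 and draw it with OpenCV."""
    response = rekognition.detect_faces(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        Attributes=["ALL"],  # required to get the Emotions list back
    )
    s3.download_file(bucket, key, local_path)
    frame = cv2.imread(local_path)
    h, w = frame.shape[:2]
    for face in response["FaceDetails"]:
        box = face["BoundingBox"]  # relative coordinates in [0, 1]
        x, y = int(box["Left"] * w), int(box["Top"] * h)
        bw, bh = int(box["Width"] * w), int(box["Height"] * h)
        top_emotion = max(face["Emotions"], key=lambda e: e["Confidence"])["Type"]
        cv2.rectangle(frame, (x, y), (x + bw, y + bh), (0, 255, 0), 2)
        cv2.putText(frame, top_emotion, (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    cv2.imwrite(local_path, frame)
```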
Challenges we ran into
This project had some hefty challenges to overcome and introduced many new topics for our team to learn. We first struggled for a few hours to figure out the best way to translate a browser capture into a viable format for the backend to receive. At first, we chose to record video directly until the user paused the video they were watching, but browser limitations and latency issues prevented us from pursuing this solution. For our purposes, we found it more useful and easier on the browser to capture as many frames as possible, upload each image to an S3 bucket, and then fetch these images on the backend when the session concluded. For most of us, this was also our first time working with computer vision, and some challenges came with detecting bounding boxes and deciding how best to capture emotions before eventually settling on Amazon Rekognition. Because of the large number of frames that had to be downloaded from the bucket folder for a given session, we realized it would be necessary to use parallel processing to download many files at once and reduce response latency.
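The parallel download step looked roughly like the following thread-pool sketch. The bucket, prefix, and directory names are placeholders, and the worker count is an illustrative assumption rather than our tuned configuration:

```python
# Hypothetical sketch: download all frames for a session in parallel with a thread pool.
from concurrent.futures import ThreadPoolExecutor
import os

import boto3

s3 = boto3.client("s3")

def download_session_frames(bucket: str, session_prefix: str, out_dir: str) -> list[str]:
    """Fetch every frame uploaded for a session concurrently instead of one at a time."""
    os.makedirs(out_dir, exist_ok=True)
    paginator = s3.get_paginator("list_objects_v2")
    keys = [obj["Key"]
            for page in paginator.paginate(Bucket=bucket, Prefix=session_prefix)
            for obj in page.get("Contents", [])]

    def fetch(key: str) -> str:
        local_path = os.path.join(out_dir, os.path.basename(key))
        s3.download_file(bucket, key, local_path)
        return local_path

    with ThreadPoolExecutor(max_workers=16) as pool:
        return list(pool.map(fetch, keys))
```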
Accomplishments that we're proud of
We were proud to build an algorithm that recommends products accurately based on the user's emotions throughout the session and displays a new video list matching those results. Our clean UI was also able to hide the session capture behind the scenes and wait until later to show a video with the session playback and an easy-to-read chart of the user's sentiment over the course of the session. We're also proud of how many AWS services we were able to use; they all connected nicely to help us implement new functionality quickly and take advantage of the machine learning and data storage tools Amazon has to offer. Regarding computer vision, as mentioned in the previous section, we were all new to the field, so displaying bounding boxes and text throughout the session playback video was a great accomplishment.
What we learned
We learned a lot about the internals of machine learning and image recognition, as well as how to use a broad range of AWS products to gather the data points needed to feed our recommendation algorithm.
What's next for eMotical
We hope to explore additional ways to reduce latency and to further optimize our recommendation algorithm.