Inspiration
Meet Amelia and see how we came up with the idea!
Several members of the group had been heavily invested in previous projects that incorporated IoT and hardware development, and they really enjoyed the thrill of attempting innovative projects in areas where they had less experience.
What it does
- The companion Amelia is capable of speech-to-text, text-to-speech, photo-to-speech, and speech-to-photo processes.
- We are giving Gemini the ability to see, hear, talk, and walk.
- Amelia is able to speak to you about landmarks all over the world. It is your own personal tour guide!
- Amelia is able to follow you around, using her camera to see you and her motors to move towards you.
- She can also analyze photos that you show her! If you are confused about where a certain landmark or location is, you can show her the picture and she will tell you.
How we built it
We divided our work into two main sets of tasks: hardware and software.
- __Hardware__: The hardware component of our project was focused on building a motorized companion, which we named Amelia. Amelia was built using a Raspberry Pi as the central processing unit.
1. Motors: We used motors to enable Amelia to move around. The four motors were controlled by two separate L293D dual H-bridge motor drivers connected to an Arduino Nano, and we used UART communication between the Arduino and the Raspberry Pi. We wrote a Python script that controls the motors based on the inputs received from the software component (a sketch of this serial link appears after the hardware overview below).
2. Eyes: Amelia's eyes were implemented using a camera module connected to the Raspberry Pi. This allowed Amelia to capture images and videos, which were then processed by the software component for various tasks such as object recognition and tracking.
3. Mouth: For Amelia's mouth, we used a speaker connected to the Raspberry Pi. The speaker was used to output the text-to-speech conversions done by the software component, allowing Amelia to "speak".
4. Ears: Amelia's ears were implemented using a microphone module connected to the Raspberry Pi. The microphone captured audio input, which was then converted to text by the software component for processing.
The hardware components were carefully integrated to ensure smooth communication and coordination. The Raspberry Pi acted as the bridge between the hardware and software components, receiving inputs from the hardware, processing them with the help of the software, and then sending the appropriate commands back to the hardware.
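As a rough illustration of how the Pi drives the motors, here is a minimal sketch of the UART link using pyserial. The single-character command protocol (F/B/L/R/S), the device path, and the baud rate are assumptions for illustration; the real protocol is defined by our Arduino sketch.

```python
import time
import serial  # pyserial

# Open the UART link to the Arduino Nano.
# The device path and baud rate are assumptions; adjust to your wiring.
arduino = serial.Serial("/dev/ttyUSB0", 9600, timeout=1)
time.sleep(2)  # give the Nano a moment to reset after the port opens

def drive(command: str) -> None:
    """Send a one-character drive command for the Arduino to parse:
    F = forward, B = backward, L = left, R = right, S = stop (assumed protocol)."""
    arduino.write(command.encode("ascii"))

drive("F")       # roll forward toward the user
time.sleep(1.5)
drive("S")       # stop
```

On the Arduino side, the Nano reads the incoming byte and sets the L293D direction pins accordingly.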
- __Software__: Amelia's software is written mainly in Python. Her main features are text-to-speech, speech-to-text, and translation, alongside conversation powered by Google Cloud Vertex AI's Gemini.
1. __Google Cloud API__: We used the Google Cloud API for live translation and text-to-speech. This was important because we struggled with pre-trained models, which took a long time to load and did not work as we intended, so we pivoted to the Google Cloud API, which let us seamlessly implement text-to-speech and translation for our project (see the first sketch after this list).
2. __Imaging__: We use __OpenCV__ to let Amelia take a picture and send it to Gemini for processing. The image is encoded in base64 so that Gemini can read the photo we just took and respond according to what we ask of the AI (see the second sketch after this list).
3. __Speech Recognition__: This gives us a speech-to-text feature that lets us talk to Amelia. We capture what the user says as a string and pass it to Gemini for a proper response. It also filters outside noise and produces a clean transcription that we feed into Gemini, which then responds in real time based on our prompts and whatever we have said (see the third sketch after this list).
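To give a sense of how the Google Cloud side fits together, here is a minimal sketch of translating a reply and synthesizing speech with the Cloud Translation and Text-to-Speech client libraries. The target language, voice settings, and output filename are illustrative choices, not necessarily what Amelia uses.

```python
from google.cloud import texttospeech
from google.cloud import translate_v2 as translate

# Translate the reply text (target language chosen for illustration).
translate_client = translate.Client()
translated = translate_client.translate(
    "Hello, I am Amelia!", target_language="fr"
)["translatedText"]

# Synthesize the translated text to speech.
tts_client = texttospeech.TextToSpeechClient()
response = tts_client.synthesize_speech(
    input=texttospeech.SynthesisInput(text=translated),
    voice=texttospeech.VoiceSelectionParams(language_code="fr-FR"),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    ),
)

# Write the audio to disk so it can be played through Amelia's speaker.
with open("reply.mp3", "wb") as out:
    out.write(response.audio_content)
```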
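Next, a minimal sketch of the imaging step: OpenCV grabs a frame, and the JPEG bytes are base64-encoded for the Gemini request. The camera index and the exact way the encoded image is attached to the prompt are assumptions.

```python
import base64
import cv2

# Grab a single frame from Amelia's camera (device index 0 is an assumption;
# a Pi camera module may instead be read through picamera2).
camera = cv2.VideoCapture(0)
ok, frame = camera.read()
camera.release()
if not ok:
    raise RuntimeError("camera capture failed")

# Encode the frame as JPEG, then base64, so it can travel in a Gemini request.
ok, jpeg = cv2.imencode(".jpg", frame)
image_b64 = base64.b64encode(jpeg.tobytes()).decode("ascii")

# image_b64 is then attached to the Gemini prompt (e.g. as inline image data)
# together with a question such as "What landmark is this?".
```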
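Finally, a minimal sketch of the speech-to-text step using the SpeechRecognition library; the ambient-noise calibration duration and the choice of the Google recognizer backend are assumptions.

```python
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    # Calibrate against background noise, then listen for one utterance.
    recognizer.adjust_for_ambient_noise(source, duration=0.5)
    audio = recognizer.listen(source)

try:
    transcript = recognizer.recognize_google(audio)
except sr.UnknownValueError:
    transcript = ""

print("Heard:", transcript)
# The transcript is then sent to Gemini as the user's message.
```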
Challenges we ran into
Our hardware team faced several challenges. One major one was upgrading the Raspberry Pi's software and wiring the L293D drivers. Our original schematic used only one H-bridge, but after further testing we had to add a second, as one was not enough to power the rover. We also ran into stability and weight issues, since the motors did not have enough torque to run the rover smoothly, and there were many compatibility issues between the three operating systems used across the development team. With enough effort and debugging, we were able to overcome these hurdles.
The software team had difficulty using the Raspberry Pi's CPU and GPU for tasks such as transcription and translation. To overcome this, we decided to leverage APIs and cloud technology, offloading the heavy processing tasks and improving the performance of our product.
Accomplishments that we're proud of
- We are very proud that we all stepped out of our comfort zones and attempted something new. We faced many constraints and hurdles head-on and dealt with them one by one.
What we learned
- We learned the value of stepping out of our comfort zones and attempting something new, and of facing constraints and hurdles head-on, dealing with them one by one.
What's next for Amelia
- Improve her performance beyond basic functionality.
- Improve her hardware to enable faster responses to user requests.
