HarmonicHomes (aka Roomy)

Inspiration

Smart homes are not a seamless experience. Smart home enthusiasts love the idea of automating things around the house and easily controlling their home by voice, but we are often disappointed. Google Assistant, Siri, and Alexa all look for specific phrasing, and if we don't say exactly the right thing, nothing happens. On top of that, we need to download many apps to manage all our smart devices. Rather than juggling dashboards, we should be able to use our voice, and with the power of AI, our smart home should understand us and just work. Lastly, wouldn't it be great if a smart home could handle more complex interactions for us by using the devices already in our homes? What if there's an intruder? The smartest home imaginable would lock the doors, sound sirens, and turn the lights to police colors. No smart home can do that yet, but what if one could?

What it does

Harmonic Homes leverages AI agents in a Toolformer-style configuration (https://arxiv.org/abs/2302.04761) to interpret natural language and control simulated smart home devices. Harmonic Homes identifies whether a user's request can be handled by an existing device command and executes the appropriate ones. If the user requests a more complex interaction, the smart home either finds a relevant function it has already created or creates an entirely new one. Essentially, Harmonic Homes aims to make smart home systems even smarter.

How we built it

To build the user-facing application, we created a Next.js application. We designed the UI with Tailwind and used components from Aceternity to make interfaces that are both intuitive and easy to use. The application workflow is simple: the user enters a query or command, which is sent to the backend for processing.

Ultimately, the heart of HarmonicHomes lies in the backend, which consists of three agents created with the uAgents framework: the orchestrator, the tool-former, and the tool-verifier.
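At a high level, the hand-off between the orchestrator and the tool-former works like this (a simplified, library-agnostic sketch; the real system runs three uAgents exchanging messages, and the function names here are illustrative):

```python
# Simplified sketch of the backend's routing logic. The real backend
# runs uAgents message handlers; here the agents are plain functions
# and the smart-home primitives live in a dict-based registry.

KNOWN_FUNCTIONS = {
    "change_room_light": lambda room, color: f"{room} light set to {color}",
}

def tool_former(request: str):
    """Placeholder for the tool-former: in the real system it asks
    Gemini to compose a new function out of the known primitives."""
    raise NotImplementedError

def orchestrator(request: str):
    # Very simplified intent matching; the real orchestrator uses
    # Gemini's function-calling to pick the function and parameters.
    if request.startswith("change_room_light"):
        _, room, color = request.split()
        return KNOWN_FUNCTIONS["change_room_light"](room, color)
    # Unknown/complex request: hand off to the tool-former.
    return tool_former(request)
```

The key design point is the shared registry: the orchestrator only ever executes functions from it, and the tool-former's job is to grow it.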

The Orchestrator

The Orchestrator agent handles simple function calls for functionality that already exists in one's smart home. We defined various general functions, such as "change_room_light", that the orchestrator recognizes when given a prompt. We used the Gemini API, specifically the Gemini Pro model, for its function-calling support: it lets us respond to user requests promptly and pass the desired parameters to the functions that already exist. This way, simple commands are carried out almost instantaneously.
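In outline, the orchestrator's dispatch step looks like the following. This is a sketch: the model's structured response is hard-coded as a plain dict rather than the exact Gemini SDK types, and the device function only returns a string instead of updating simulated device state.

```python
# Sketch of dispatching a function call returned by a function-calling
# model. A real Gemini API call returns a structured function call
# (name + args); here one is hard-coded to show the dispatch step.

def change_room_light(room: str, color: str) -> str:
    # In the simulated home this would update device state.
    return f"Set {room} light to {color}"

REGISTRY = {"change_room_light": change_room_light}

# Illustrative shape of what function-calling might return for the
# prompt "make the kitchen light blue".
model_call = {"name": "change_room_light",
              "args": {"room": "kitchen", "color": "blue"}}

def dispatch(call: dict) -> str:
    fn = REGISTRY[call["name"]]
    return fn(**call["args"])

result = dispatch(model_call)  # "Set kitchen light to blue"
```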

The Tool-Former

That being said, there are many commands users may want to give their smart home that go beyond these primitives; for example, a user may tell their home devices, "An intruder arrived!" In that case, the home should carry out a complex behavior: playing sirens on the speakers across various rooms, changing the lights to emergency colors, and so on. As another example, a user might want their house to be Christmas-themed: the room lights would be red, green, and white, and the speakers would play Christmas music. These are complex behaviors that the orchestrator does NOT know beforehand, so it sends the user's command to the tool-former. The tool-former knows all of the simple, pre-defined smart home actions and builds new functions that combine them. For example, a new function "handle_intruder" would combine simple actions (turning the sirens on, setting emergency light colors). The newly created function is then added to the list of smart home functions. This way, users can REUSE their previously created complex behaviors, and the tool-former doesn't need to remake them.
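The tool-former pattern can be sketched as below. In HarmonicHomes the body of the composite function is generated by Gemini; here it is written by hand for illustration, and the primitive names (`set_room_light`, `play_on_speaker`) are assumptions rather than the project's actual identifiers.

```python
# Sketch of the tool-former pattern: compose existing primitives into
# a new named behavior and register it so the orchestrator can reuse
# it on future requests instead of regenerating it.

def set_room_light(room, color):
    return f"{room}: light {color}"

def play_on_speaker(room, sound):
    return f"{room}: playing {sound}"

REGISTRY = {"set_room_light": set_room_light,
            "play_on_speaker": play_on_speaker}

def handle_intruder(rooms):
    # Composite behavior built purely from the primitives above.
    actions = []
    for room in rooms:
        actions.append(set_room_light(room, "red/blue"))
        actions.append(play_on_speaker(room, "siren"))
    return actions

# Register the new function so the next "intruder" command reuses it.
REGISTRY["handle_intruder"] = handle_intruder
```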

The Tool-Verifier

Code doesn't always work the way we want it to, especially code generated through the Gemini API. We need a way to confirm that a function works before adding the complex behavior to the list of "known functions"; otherwise, the orchestrator might attempt to run a complex function that doesn't work. To solve this, we created a third agent: the tool-verifier. In short, we automated quality assurance: think of it as one AI agent grading the output of another. When the tool-former creates a new function, a feedback loop begins between the tool-former and the tool-verifier. The tool-verifier is equipped with documentation on how functions in the smart home work, and the loop keeps running, forcing the tool-former to regenerate the new function until the tool-verifier is satisfied with the output.
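The feedback loop can be sketched as follows. Both LLM calls are replaced by stubs here: the "tool-former" emits candidate source code from a fixed list, and the "tool-verifier" simply checks that the code compiles and defines the requested function (the real verifier judges correctness against the smart home documentation).

```python
# Sketch of the tool-former / tool-verifier feedback loop with the
# two Gemini-backed agents replaced by stubs.

CANDIDATES = [
    "def handle_intruder(:",                         # syntax error -> rejected
    "def handle_intruder():\n    return ['siren']",  # valid -> accepted
]

def tool_former_stub(attempt: int) -> str:
    # Stand-in for asking Gemini to (re)generate the function.
    return CANDIDATES[min(attempt, len(CANDIDATES) - 1)]

def tool_verifier_stub(source: str, fn_name: str) -> bool:
    # Stand-in for the verifier: accept only code that compiles and
    # actually defines the requested callable.
    try:
        namespace = {}
        exec(compile(source, "<generated>", "exec"), namespace)
        return callable(namespace.get(fn_name))
    except Exception:
        return False

def verified_function(fn_name: str, max_attempts: int = 5) -> str:
    for attempt in range(max_attempts):
        source = tool_former_stub(attempt)
        if tool_verifier_stub(source, fn_name):
            return source  # safe to add to the known functions
    raise RuntimeError("verifier never accepted a candidate")
```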

Challenges we ran into

The main challenge we ran into was rate limiting. We really loved Gemini's function-calling capabilities and wanted to use them to carry out all of our smart home functions (e.g., turn on all of the lights, change all the lights to red). However, each smart home interaction was a separate call to the Gemini API, so complex behaviors, consisting of multiple smart home interactions, quickly hit the rate limit (429 errors). Worse, the error logged by our agents was "index out of bounds", which obscured the real cause and initially made no sense to us.
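A standard mitigation for this (not part of our hackathon build) is exponential-backoff retry around each API call. A minimal sketch, with the Gemini call replaced by a stub that fails with a 429-style error twice before succeeding:

```python
import time

class RateLimitError(Exception):
    """Stand-in for the HTTP 429 error raised by an API client."""

def with_backoff(call, max_retries=4, base_delay=0.01):
    # Retry the call on rate limiting, doubling the delay each time.
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Stub API call that is rate-limited twice before succeeding.
attempts = {"n": 0}
def flaky_api_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("429 Resource exhausted")
    return "ok"

result = with_backoff(flaky_api_call)
```

Batching several device actions into one model call would reduce the number of requests in the first place; backoff just makes the remaining ones resilient.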

Additionally, many concepts relating to agents were previously foreign to us; we had to overcome an initial mental block of understanding how to integrate agents into our workflow and how exactly we could implement the ideas described in the research paper.

Accomplishments that we're proud of

We're really proud of the tool-former functionality: we built agents that create new functions the smart home both recognizes AND can reuse. We're also super proud of creating a working feedback agent, one that verifies the correctness of the code generated by the tool-former agent and has it regenerate code until it gets it right.

What we learned

We learned a lot about prompt engineering, including few-shot prompting, role prompting, and effective use of context windows.

What's next for HarmonicHomes

Due to the fast pace of the hackathon, our team had to defer some features in order to complete the main functionality. One feature we wish we could have implemented is automatic documentation generation for newly created smart home commands, allowing our Gemini model to instantly expand its context and capabilities according to the user's commands.

Built With

  • aceternity
  • fastapi
  • fetchai
  • gemini
  • nextjs
  • python
  • tailwind
  • uagents
  • uvicorn