Inspiration

We were recently inspired by the advent of what some are calling "Large Action Models," which take a natural-language instruction and carry out the task, such as playing your favorite song on Spotify. Rabbit, with its R1 device, has been a recent proponent of this new paradigm.

What it does

Project Jarvis is an AI agent that looks at the user's web browser and performs tasks the user specifies in natural language, such as "create a new Tweet."

How we built it

We used Meta's Segment Anything Model (SAM) to let the agent understand the web browser by segmenting what is on screen into its visual components. We also leverage OpenCLIP, an open-source implementation of the CLIP multimodal model, which maps images and text into a shared embedding space. Together, these let the model relate the components of a web page to the user's query and decide on the best actionable steps.
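
The matching step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: it assumes each Segment Anything mask is cropped from a browser screenshot and embedded with OpenCLIP's image encoder, that the user's query is embedded with the text encoder, and that the agent acts on the region whose embedding has the highest cosine similarity to the query. `Region`, `rank_regions`, and `click_target` are hypothetical names, and tiny hand-written vectors replace real embeddings so the example runs without either model.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Region:
    """One candidate page element, e.g. a single mask from a segmentation model (assumed)."""
    label: str                       # hypothetical readable name, for debugging only
    bbox: tuple[int, int, int, int]  # (x, y, width, height) in screenshot pixels
    embedding: list[float]           # image embedding of the cropped region (assumed input)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_regions(query_embedding: list[float], regions: list[Region]) -> list[Region]:
    """Sort regions from most to least similar to the text query."""
    return sorted(regions, key=lambda r: cosine(query_embedding, r.embedding), reverse=True)


def click_target(region: Region) -> tuple[int, int]:
    """Center of the region's bounding box, used here as the hypothetical point to click."""
    x, y, w, h = region.bbox
    return (x + w // 2, y + h // 2)


# Toy 3-D vectors stand in for real OpenCLIP embeddings.
regions = [
    Region("search bar", (10, 10, 300, 40), [0.1, 0.9, 0.0]),
    Region("post button", (400, 20, 80, 30), [0.9, 0.1, 0.2]),
]
query = [1.0, 0.0, 0.1]  # stand-in for the text embedding of "create a new Tweet"

best = rank_regions(query, regions)[0]
print(best.label, click_target(best))  # post button (440, 35)
```

Cosine similarity is a natural choice here because CLIP-style models are trained so that matching image and text embeddings have high cosine similarity; in a real pipeline the vectors would come from the OpenCLIP model rather than being written by hand.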

Challenges we ran into

We ran into numerous challenges. We first wanted to use decision transformers, but we lacked the large training dataset and the compute power needed to train one. We then worked out how to piece together current open-source models into a working proof of concept. An added difficulty is that web browsers restrict control of input and output devices for security and privacy reasons.

Accomplishments that we're proud of

We built a model that can understand the different components of a website and, with reasonable accuracy, determine what action to take based on a user's prompt. When we first discussed the idea, we had no idea how we would do this, so we are proud of the progress we made and of how feasible the solution turned out to be.

What we learned

We learned valuable skills, such as running open-source models locally, and gained a deeper understanding of how these models work together to accomplish a task.

What's next for Project Jarvis

We plan to continue the project with a more modern architecture based on decision transformers, so that the model can chain actions together and perform more complex tasks. We could also integrate speech-to-text for a more seamless user interface.
