Aly.so
Inspiration
We were excited about agentic LLMs, and fortunately hackathons are a great place to have a solution and go looking for a problem. We were surprised by the lack of agentic LLM products in the consumer space and decided to find out why they haven't been successful by struggling to make one ourselves!
Infrastructure
We aimed to be entirely self-hosted. We tried a large variety of local LLMs and settled on Mistral 7B Instruct v0.2. To train it, we wrote a script to automatically create workflows, with a human providing the actions that Aly would take. In the future we could also generate synthetic data with GPT-4.
We used Node.js, Bun, Puppeteer, Jupyter Notebooks, LangChain, and Chroma (though the RAG stuff was too flaky for the final demo).
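To give a rough idea of what that workflow-recording script could look like, here is a minimal sketch in Python. It is not our actual script: the function names (`collect_workflow`, `to_jsonl`), the prompt layout, and the prompt/completion JSONL format are all assumptions made for illustration. A human sees each page observation and supplies the action Aly should take, and each pair becomes one training example.

```python
import json
from typing import Callable, Iterable


def collect_workflow(
    task: str,
    observations: Iterable[str],
    ask_action: Callable[[str], str] = input,
) -> list[dict]:
    """Pair each page observation with the action a human demonstrator picks.

    `ask_action` defaults to `input`, so a person types the action at the
    terminal; any callable that takes a prompt and returns a string works.
    """
    examples: list[dict] = []
    history: list[str] = []
    for observation in observations:
        # Hypothetical prompt layout: task, actions so far, current page.
        prompt = (
            f"Task: {task}\n"
            f"Previous actions: {json.dumps(history)}\n"
            f"Page: {observation}\n"
            "Next action:"
        )
        action = ask_action(prompt + " ").strip()
        examples.append({"prompt": prompt, "completion": action})
        history.append(action)
    return examples


def to_jsonl(examples: list[dict]) -> str:
    """Serialize examples one JSON object per line (a common tuning format)."""
    return "\n".join(json.dumps(example) for example in examples)
```

Calling `collect_workflow("buy bread", pages)` with a list of page summaries would prompt at the terminal once per page; passing a different `ask_action` makes it easy to replay saved demonstrations.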
Challenges
As usual, there has been no shortage of challenges!
We found a bug in how Puppeteer selects ARIA labels, so we had to include a major kludge to even be able to click links on the Google results page.
We prompt-engineered our agent extensively, but it seems that no amount of prompting could get Mistral to use a search box. For example, given the task of buying bread from Target, it would go to the website, blatantly ignore the search bar, and start clicking between categories in search of bread. It seems that Mistral has more faith in recommender systems than I do. We planned to fine-tune to nudge the model towards our preferred strategies (surprisingly, it was great at maintaining IO structure, even compared to GPT), but our data was limited and scuffed, plus the fine-tuning API seemed to be down at the time. Perhaps we could have asked for advice from the relevant sponsors if we weren't tuning at 4 in the morning...
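For context on what "maintaining IO structure" means in practice, here is a small Python sketch of the kind of check an agent loop might run on each model reply. The JSON action format, the `ALLOWED_ACTIONS` set, and the `parse_action` helper are hypothetical, made up for this example rather than taken from our code. The idea is simply that a reply which doesn't parse into a known action gets rejected before it reaches the browser.

```python
import json

# Hypothetical action vocabulary; the real agent's actions may differ.
ALLOWED_ACTIONS = {"goto", "click", "type"}


def parse_action(reply: str) -> dict:
    """Parse one model reply into an action dict, or raise ValueError."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as err:
        raise ValueError(f"reply is not valid JSON: {err}") from err
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    action = data.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if not isinstance(data.get("target"), str):
        raise ValueError("'target' must be a string")
    # Typing into a field also needs the text to type.
    if action == "type" and not isinstance(data.get("text"), str):
        raise ValueError("'type' actions need a 'text' string")
    return data
```

A reply like `{"action": "type", "target": "search box", "text": "bread"}` passes, while free-form prose or an unknown action raises `ValueError`, which the loop could catch to re-prompt the model.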
Acks
Thanks to the Codegen team for letting us bounce a lot of ideas off them and Dave from Bun for chilling at our table and showing us a lot of cool things!
What's Next
We want to take this idea further and integrate the agent with a specific niche. One of our teammates wants it to do his homework!


