Inspiration

  • The original WebArena is diverse and powerful, but hard to integrate with
  • It is a good, end-to-end standard
  • We wanted to expand beyond web agents
  • It offers a solid baseline framework to build on

What it does

  • Integrates AgentOps to better visualize agent evals
  • Allows integration with other agents
  • Various minor improvements (for example, testing with gpt-4)
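
As a rough illustration of what integrating other agents could look like, here is a minimal Python sketch. The `Agent` protocol, `EchoAgent`, `EpisodeLog`, and `run_episode` are hypothetical names chosen for this example. They are not taken from the WebArena or AutoArena code, and the sketch makes no AgentOps calls.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Agent(Protocol):
    """Hypothetical minimal interface an agent implements to be evaluated."""

    def act(self, observation: str) -> str:
        """Return the next action for the given observation."""
        ...


@dataclass
class EchoAgent:
    """Toy agent used only to exercise the interface."""

    prefix: str = "click"

    def act(self, observation: str) -> str:
        return f"{self.prefix} {observation}"


@dataclass
class EpisodeLog:
    """Records every (observation, action) pair so a run can be inspected later."""

    steps: list[tuple[str, str]] = field(default_factory=list)


def run_episode(agent: Agent, observations: list[str]) -> EpisodeLog:
    """Feed each observation to the agent and log the action it returns."""
    log = EpisodeLog()
    for obs in observations:
        log.steps.append((obs, agent.act(obs)))
    return log


if __name__ == "__main__":
    log = run_episode(EchoAgent(), ["login button", "search box"])
    for obs, action in log.steps:
        print(f"{obs!r} -> {action!r}")
```

Because `Agent` is a structural protocol, any object with a matching `act` method can be passed to `run_episode` without inheriting from a base class, which is one way to keep the harness independent of a particular agent framework.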

How we built it

  • Did a deep dive into the current WebArena architecture
  • Refactored parts of it and integrated AgentOps

Challenges we ran into

  • Very complex architecture
  • Hard to integrate new agents
  • Hard to create new test environments
  • Hard to visualize all benchmark evals

Accomplishments that we're proud of

  • Added AgentOps for better observability
  • Broke the framework down so that new environments and tests can be added
  • Improved testing to make it more robust

What's next for AutoArena

  • Easy connection with any agent framework
  • Add new web environments
  • Automatically add new test sets based on what fails
  • Automatically run regression tests on every PR to an agent framework
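
The idea of growing the test set from failures could be sketched as follows. This is only an assumption about how it might work: `TaskResult` and `update_regression_set` are hypothetical names, and nothing here reflects an existing AutoArena implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """Outcome of one benchmark task in a single run (hypothetical shape)."""

    task_id: str
    passed: bool


def update_regression_set(
    regression_ids: set[str], results: list[TaskResult]
) -> set[str]:
    """Return the regression set extended with every task that failed in this run."""
    failed = {r.task_id for r in results if not r.passed}
    return regression_ids | failed


if __name__ == "__main__":
    results = [TaskResult("shop-12", False), TaskResult("maps-3", True)]
    print(sorted(update_regression_set({"forum-7"}, results)))
```

Returning a new set instead of mutating the input keeps each run's regression set easy to compare against the previous one, for example when deciding which tasks to rerun on a PR.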
