
Ai2 launches MolmoWeb, an open-source web agent

The open-source web agent ships with the full training stack and was built without distilling from proprietary models.
Mar 24th, 2026 9:07am

AI agents that can browse the web and complete tasks on behalf of their users have gotten significantly better over the last few months, but the models that power them have, for the most part, remained proprietary.

On Tuesday, the Allen Institute for AI (Ai2) launched MolmoWeb, a new open-source visual web agent that is part of Ai2’s Molmo 2 model family.

The new model comes in two sizes, 4 billion and 8 billion parameters, both small enough to run locally. As with virtually all of Ai2’s models, the team is also making the weights, training data, code (coming soon), and evaluation tools available.

As with similar agents, the idea here is to build a system that can perform tasks in the web browser, using the same interface a human would use. 


Many active efforts, such as WebMCP, aim to make it easier for agents to interact with individual sites. Agent systems like MolmoWeb instead take a task and try to execute it by looking at screenshots of the web page, predicting next steps, and operating the browser by clicking on buttons, typing text, and scrolling. This means the agent should be able to navigate websites, fill out forms, search for products on a shopping site, and retrieve information.
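
To make that loop concrete, here is a minimal Python sketch of a screenshot-predict-act cycle. It is an illustration only, not Ai2’s code: `Action`, `FakeBrowser`, `run_agent`, and `scripted_policy` are hypothetical names, the “screenshot” is a placeholder string rather than pixels, and the policy is a fixed script standing in for a model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    kind: str          # "click", "type", "scroll", or "done"
    target: str = ""   # hypothetical description of the element to act on
    text: str = ""     # text to type, if any


@dataclass
class FakeBrowser:
    """Stand-in for a real browser: records actions instead of executing them."""
    log: List[Action] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real agent would receive pixels; this placeholder just counts actions.
        return f"screenshot-after-{len(self.log)}-actions"

    def apply(self, action: Action) -> None:
        self.log.append(action)


Policy = Callable[[str, str, List[Action]], Action]


def run_agent(task: str, browser: FakeBrowser, policy: Policy,
              max_steps: int = 10) -> List[Action]:
    """Observe the page, ask the policy for the next action, apply it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        observation = browser.screenshot()
        action = policy(task, observation, history)
        if action.kind == "done":
            break
        browser.apply(action)
        history.append(action)
    return history


def scripted_policy(task: str, observation: str,
                    history: List[Action]) -> Action:
    """Fixed script standing in for a model's next-step prediction."""
    script = [
        Action("click", "search box"),
        Action("type", "search box", task),
        Action("scroll"),
    ]
    return script[len(history)] if len(history) < len(script) else Action("done")
```

Calling `run_agent("blue running shoes", FakeBrowser(), scripted_policy)` returns the three scripted actions in order. In a real system, the policy would be a vision model and the browser a driver for an actual browser.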

Image credit: Ai2.

What makes MolmoWeb stand out from some other web agents isn’t just its small size, but also the fact that Ai2 did not train the model by distilling it from proprietary vision-based agents. Instead, the team writes that the “data comes from synthetic trajectories generated by text-only accessibility-tree agents and human demonstrations.” 

Benchmarks

This approach has resulted in impressive performance on some standard browser-use benchmarks. MolmoWeb, for example, outperforms OpenAI’s (admittedly older) GPT-4o, which relies on annotated screenshots and structured page data. Among open-weight models, MolmoWeb, in both its 4B and 8B variants, also outperforms competitors such as Fara-7B and GLM-4.1V-9B.

Image credit: Ai2.

For the most part, proprietary models from Anthropic, Google, OpenAI, and others still easily outperform these open models. But it’s worth remembering that part of Ai2’s mission isn’t necessarily to compete with these models but to offer an alternative for researchers who want to understand how these systems work.


As the team notes, “the open-source community lacks not just the models but the training data, infrastructure, and evaluation tools needed to build competitive alternatives. That gap limits reproducibility, slows research progress, and makes it difficult to understand how these systems actually work. In many ways, web agents today are where LLMs were before Olmo — the community needs an open foundation to build on.”

MolmoWeb’s training data

The training set for MolmoWeb includes 30,000 human task trajectories, which Ai2 describes as “the largest publicly released dataset of human web task execution to date.” This includes almost 600,000 individual subtasks across more than 1,100 websites.

That’s a lot of data, but not enough to train a model on its own, so the team also generated synthetic trajectories with agents that navigate websites through accessibility trees. That is significantly easier for those agents because they don’t have to interpret screenshots.
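
To illustrate why an accessibility tree is easier for a text-only agent to work with, here is a small Python sketch that flattens a nested tree of roles and names into plain text lines. The `page` structure and the `flatten` helper are hypothetical and much simpler than what a real browser exposes. They are not taken from Ai2’s pipeline.

```python
def flatten(node, depth=0):
    """Render a nested accessibility-style node as indented 'role "name"' lines."""
    lines = ["  " * depth + f'{node["role"]} "{node.get("name", "")}"']
    for child in node.get("children", []):
        lines.extend(flatten(child, depth + 1))
    return lines


# Hypothetical, hand-written tree for a tiny search form.
page = {
    "role": "form",
    "name": "Search",
    "children": [
        {"role": "textbox", "name": "Query"},
        {"role": "button", "name": "Submit"},
    ],
}

print("\n".join(flatten(page)))
```

An agent reading this text already knows there is a textbox and a button and what each is called. A screenshot-based agent has to work out the same information from pixels.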

The training set also includes annotated screenshots with information about different elements of a website, as well as over 2.2 million question-answer pairs from reasoning tasks in which a model answered questions about screenshots from about 400 sites.

Availability

MolmoWeb is now available on Hugging Face and GitHub, along with all training data and evaluation tools.

TNS owner Insight Partners is an investor in: OpenAI, Anthropic.