Building the Evaluation Layer for AI Agents

Mozilla Ventures

Mar 25, 2026

As AI systems become more autonomous, the real constraint is proving they work reliably in the real world. Galtea is building the evaluation layer that gives developers the confidence to deploy AI agents and scale them.

During our visit to the Barcelona Supercomputing Center (BSC) last year, we learnt about the innovative work the Galtea team was doing to build quality assurance for AI systems. We returned to Barcelona for MozFest 2025 and had the opportunity to spend time with the founders, Jorge Palomar and Dr Baybars Kulebi. We were struck by their conviction that without sufficient, affordable testing data, developers have no reliable way to know how their agents will perform in the real world. They had developed this insight as researchers at the BSC, where they saw firsthand the challenges of validating complex AI systems.

Jorge and Baybars make a formidable team. Jorge is an AI Data Engineer turned Data Strategy Lead at the Barcelona Supercomputing Center. He was previously a Business Intelligence Engineer at Amazon. Baybars is a physicist turned developer. He was previously the Engineering Manager at the Barcelona Supercomputing Center. Together they spun Galtea out of the BSC and are now using high-quality synthetic data to evaluate AI systems across three critical dimensions: quality, security, and real-world user scenarios. Their platform simulates edge cases, stress-tests agentic workflows, and generates use-case-specific evaluation datasets, giving enterprises measurable confidence before deployment. Early customers include Telefonica and ABANCA, a leading Spanish retail and commercial bank. Both use Galtea to dynamically test scenarios and apply tailored metrics to their AI agents, enabling them to make more informed decisions.

Rather than treating evaluation as a secondary feature within observability, Galtea positions it as foundational. Without strong evaluation data, enterprises are guessing whether their AI is behaving correctly. As AI systems become more autonomous and agentic, that guesswork becomes riskier.

At Mozilla Ventures, we invest in the infrastructure required for trustworthy AI. Galtea's data-first approach helps enterprises move from experimentation to production with greater safety and confidence, strengthening the foundations for AI that works. That is why we joined 42CAP in investing in Galtea's seed round.
