Galileo

Galileo · 2026-04-13T21:00:06.215Z

ICYMI: Another week, another set of workflow improvements across our platform! Here's what just went live in this week's release notes 👇 Playgrounds: Playgrounds can now dynamically detect datasets’ variables, making it easier to add variables to playground prompts. Logs filtering: Logs can now automatically display available columns to filter. Custom model integration: Galileo’s custom model integrations now support model properties for users who wish to further customize LLM integration parameters. OpenAI models: GPT 5.4 Mini and Nano now available across Playground, Prompt store, Synthetic Data Generation, and Metrics Hub Annotation Queues (Enterprise Beta): Keyboard shortcuts and auto-advance to speed up annotator workflows Read more here: https://lnkd.in/gv5zwJKV

Software Development

Burlingame, California 23,925 followers

Galileo is the leading AI Reliability + Evaluation platform that helps teams of all sizes build AI apps they can trust.

View all 156 employees

About us

Galileo is the leading platform for enterprise GenAI evaluation and observability. Our comprehensive suite of products support builders across the new AI development workflow—from fine-tuning LLMs to developing, testing, monitoring, and securing their AI applications. Each product is powered by our research-backed evaluation metrics. Today, Galileo is used by 100s of AI teams from startups to Fortune 50 enterprises, including Twilio, Comcast, and HP.

Website: https://www.galileo.ai
External link for Galileo
Industry: Software Development
Company size: 51-200 employees
Headquarters: Burlingame, California
Type: Privately Held
Founded: 2021

Locations

Primary

287 Lorton Ave

Burlingame, California 94010, US

Get directions
Brooklyn, New York, US

Get directions
Bengaluru, IN

Get directions

Employees at Galileo

See all employees

Updates

Galileo

23,925 followers
18h
Report this post
Most teams running eval pipelines on multimodal agents are silently missing two failure modes. The first: bad inputs that look fine to the eval layer but break the agent. A blurry product photo. A customer call with three seconds of dropouts every minute. A PDF that was scanned poorly. The agent produces a confident, completely wrong response. The eval pipeline sees a clean transcript and a clean output. Everything passes. Meanwhile, the user leaves unhappy. The second: using text-based evals on a non-text input. Did the agent identify the safety vest in the photo? Did it correctly infer customer frustration from tone, not words? Did it count the items in the shelf image accurately? Text-only evals can’t answer these questions. We just shipped Multimodal evals to fix both. If you're building agents for PDF extraction, image description, visual compliance, or support-call analysis, give it a run on your own traffic. Read the docs: https://lnkd.in/gM9qjSm3

Like Comment Share
Galileo

23,925 followers
18h
Report this post
Most teams running eval pipelines on multimodal agents are silently missing two failure modes. The first: bad inputs that look fine to the eval layer but break the agent. A blurry product photo. A customer call with three seconds of dropouts every minute. A PDF that was scanned poorly. The agent produces a confident, completely wrong response. The eval pipeline sees a clean transcript and a clean output. Everything passes. Meanwhile, the user leaves unhappy. The second: using text-based evals on a non-text input. Did the agent identify the safety vest in the photo? Did it correctly infer customer frustration from tone, not words? Did it count the items in the shelf image accurately? Text-only evals can’t answer these questions. We just shipped Multimodal evals to fix both. If you're building agents for PDF extraction, image description, visual compliance, or support-call analysis, give it a run on your own traffic. Read the docs: https://lnkd.in/gM9qjSm3

Like Comment Share
Galileo reposted this
Pratik Bhavsar
2d Edited
Report this post
𝗧𝗵𝗲 𝗾𝘂𝗲𝘀𝘁𝗶𝗼𝗻 𝗶𝘀 𝗻𝗼𝘁 𝗷𝘂𝘀𝘁 𝘄𝗵𝗲𝘁𝗵𝗲𝗿 𝘆𝗼𝘂 𝗵𝗮𝘃𝗲 𝘁𝗵𝗲𝗺. 𝗜𝘁 𝗶𝘀 𝘄𝗵𝗲𝗿𝗲 𝘁𝗵𝗲𝘆 𝗹𝗶𝘃𝗲. One of the strongest examples from our new blog: an agent team thought its prompt injection guardrail was working. The dashboard looked clean. The model said risk was low. But the system was only catching 2 of the 10 OWASP scenarios. The rest, indirect injection, zero-shot attacks, multi-turn manipulation, cross-agent propagation, were effectively invisible. That is the trap with agent security: coverage gaps can look exactly like safety. That story is one of several in our new blog, written with Bianca, Michael, and Obine A. 𝗟𝗮𝗿𝗴𝗲 𝗲𝗻𝘁𝗲𝗿𝗽𝗿𝗶𝘀𝗲𝘀, 𝗲𝘀𝗽𝗲𝗰𝗶𝗮𝗹𝗹𝘆 𝗶𝗻 𝗳𝗶𝗻𝗮𝗻𝗰𝗶𝗮𝗹 𝘀𝗲𝗿𝘃𝗶𝗰𝗲𝘀, 𝗮𝗿𝗲 𝗺𝗼𝘃𝗶𝗻𝗴 𝗳𝗿𝗼𝗺 “𝘄𝗲 𝗸𝗻𝗼𝘄 𝗢𝗪𝗔𝗦𝗣 𝗺𝗮𝘁𝘁𝗲𝗿𝘀” 𝘁𝗼 “𝘄𝗲 𝗰𝗮𝗻 𝗮𝗰𝘁𝘂𝗮𝗹𝗹𝘆 𝗲𝗻𝗳𝗼𝗿𝗰𝗲 𝗶𝘁.” And the pattern keeps showing up across teams: security controls cannot live inside every individual agent. They need to be centrally owned, centrally updated, and enforced consistently across every production use case. 𝗪𝗵𝗮𝘁 𝘁𝗲𝗮𝗺𝘀 𝗰𝗮𝗿𝗲 𝘁𝗵𝗲 𝗺𝗼𝘀𝘁 𝗮𝗯𝗼𝘂𝘁: → Prompt injection is much broader than most teams assume. Direct attacks are only one slice of the problem. Indirect retrieval-based injection, multi-turn steering, and cross-agent contamination all need coverage. → PII leakage keeps coming up as a hard gating requirement, especially in banking. One quote from the piece stayed with me: “We don’t need to prove that PII doesn’t leak 99% of the time. We need to prove it doesn’t leak, period.” → Heuristic controls hit a wall fast. Regex, keyword filters, and custom rules help early, but they create maintenance burden, leave coverage gaps, and do not scale as agent use cases multiply. → Policy updates need to propagate immediately. When a new threat vector appears or requirements change, security teams need one policy definition that every agent picks up within seconds, across ADK, LangGraph, CrewAI, or custom stacks. 𝗧𝗵𝗲 𝗲𝗻𝗱𝗴𝗮𝗺𝗲 𝗾𝘂𝗲𝘀𝘁𝗶𝗼𝗻: 𝗖𝗮𝗻 𝘆𝗼𝘂 𝗼𝗽𝗲𝗿𝗮𝘁𝗶𝗼𝗻𝗮𝗹𝗶𝘇𝗲 𝗢𝗪𝗔𝗦𝗣 𝗮𝗰𝗿𝗼𝘀𝘀 𝘁𝗵𝗲 𝗲𝗻𝘁𝗲𝗿𝗽𝗿𝗶𝘀𝗲 𝗳𝗼𝗿 𝗮𝗹𝗹 𝘁𝗵𝗲 𝗮𝗴𝗲𝗻𝘁𝘀? Learn more...
1 Comment

Like Comment Share
Galileo reposted this
Vikram Chatterji
1w
Report this post
As Jeetu Patel called out, this article from Michael Leone is highly congruent with how we collectively think about the massive opportunity ahead of us. At Galileo, two years ago, we pioneered leveraging SLMs to create ultra low latency, low cost, high accuracy evals -- the building block for evals, 100% AI software o11y and runtime control. We called these 'Evaluation Models' Luna. Love the shout out about Luna from Michael: "The piece I find most interesting technically is Luna, Galileo's line of fine-tuned small language models built specifically for evaluation work. Most "LLM as judge" approaches call a frontier model on every production request, which makes running evaluation at meaningful traffic volume prohibitively expensive. Luna inverts that equation by training specialized SLMs with metric-specific heads. When evaluation cost drops far enough that you can run it continuously instead of on a sample, you stop treating it as a periodic audit and start treating it as always-on quality control. That shift changes how engineering teams actually operate agents in production, and it looks a lot more like the kind of continuous monitoring ops teams already do." We are only going to build faster from here on. LOTs of exciting stuff brewing. Stay tuned!

Jeetu Patel Jeetu Patel is an Influencer
1w Edited

This is one of the most thoughtful posts I have seen in the space of Agent Observability. But what’s more impressive is that the overall strategy we have been thinking about is so well articulated. Furthermore the concerns that Michael Leone highlights that we at Cisco should be held accountable to are spot on. The only thing I would say Michael is that you should in the AI era be more demanding on timeline expectations from us. If it takes us the amount of time you are giving us, I would consider that a failure in execution on Cisco’s part. We will exceed your expectations here I hope. Cisco employees, please make sure we accept the challenge from us to beat Michael’s timeline. Our team is motivated to work super hard and stay obsessed about speed for long durations of time, which over time will become on of a Cisco’s key moats. Thanks again for such a thoughtful inaugural newsletter. Every sales person, SE and CX person of Cisco and our partners should read this newsletter from Michael. One thing that we have not yet cracked the code on is to enable our field and partner network at the speed that we are innovating products. We have yet to crack the code on enablement. But I hope this document gets the broadest distribution from our field and partners. For anyone at Cisco who reads this, my request would be to reshare it so that we all are able to use a completely new way to enable our field. So excited to see someone articulate our strategy in exactly the way that we have been thinking about it and also highlight the risks of our strategy that are very fair. Michael, we will up the bar on the speed of execution record and Cisco accepts the challenge. Thanks again for such a thoughtful post.

Galileo Brings Cisco a Purpose-Built Agent Evaluation Layer Michael Leone on LinkedIn

3 Comments

Like Comment Share
Galileo reposted this
Michael Leone
1w
Report this post
Great way to launch my new LinkedIn newsletter! Cisco's planned acquisition of Galileo is one of the more interesting #AgenticAI deals so far this year. Getting agents into production is only half the problem. Knowing whether they're actually doing their job is the part almost nobody has figured out yet. I break it all down below, and I encourage you to subscribe. Much more to come covering the intersection of #data, #analytics, #AI, and #governance. Kamal Hathi Vikram Chatterji Joely Urton DJ Sampath Tom Gillis Grace Chung Amy Chang Christine Wan

Galileo Brings Cisco a Purpose-Built Agent Evaluation Layer Michael Leone on LinkedIn

8 Comments

Like Comment Share
Galileo

23,925 followers
3d
Report this post
Building a crew of agents is the easy part. Knowing what they're doing and stopping them when they're off course is where most teams get stuck. We're co-hosting a live session with CrewAI to cover exactly that. Join Galileo co-founder and CTO Yash Sheth and CrewAI founder and CEO João (Joe) Moura as they walk through how to govern multi-agent systems at scale, covering behavior, cost, and compliance. In this session, you'll learn: → How to enforce safety and security policies in CrewAI agents. → How to steer agents to the best models and fallback tools at runtime to improve accuracy and control token costs → How to govern all your agents, whether CrewAI, internal or third-party, with one centralized set of policies → How to include non-technical stakeholders (such as risk and compliance) in writing or maintaining policies – no coding required Last chance to register here: https://lnkd.in/gRjQNnwM
Like Comment Share
Galileo reposted this
Joseph Lombardi
1w Edited
Report this post
This week in NYC, Galileo took the stage at the RE•WORK AI in Finance Summit! We shared how leading financial institutions are moving beyond AI pilots and what it actually takes to ship reliable, controllable agentic AI systems in production. If you weren't with us yesterday, no worries, here are some key highlights: AI in production is still broken: - Most teams struggle with unreliable evals, blind spots at scale, and lack of runtime control no surprise that ~95% of AI pilots fail to reach production AI teams aren’t lacking ambition, they’re lacking tooling: - Without systems to validate, trace, and monitor agents end-to-end, production readiness becomes guesswork. Agents change everything: - You’re no longer evaluating a single model, you’re validating entire systems: prompts, tools, workflows, and behaviors across dozens of variables. From evals to guardrails: - We showed how Galileo helps teams turn offline evaluation into real-time protection: • Test with robust, customizable evals • Monitor every workflow in production (not just samples) • Enforce guardrails in milliseconds to prevent hallucinations, data leaks, and more Big thanks to everyone who joined the session and to Festus Asare-Yeboah for representing Galileo on stage. Happy to share our presentation, just DM if interested. #AI #GenAI #FinTech #AIAgents #reworkai #NYC #AIinFinance
4 Comments

Like Comment Share
Galileo

23,925 followers
1w Edited
Report this post
EU AI Act audits begin in August. The theoretical conversation about AI governance just became a procurement requirement with deadlines attached. Large banks now require security sign-off before any agentic use case reaches production. Risk teams are blocking deployments until observability and governance are in place. Many enterprises guard against only 2-3 of the 10 OWASP threat categories for agentic AI. Prompt injection guardrails cover approximately 2 of 10 defined injection variants. Entire attack categories, tool misuse, identity abuse, privilege escalation, and inter-agent communication risks, remain invisible to existing controls. Traditional application security rests on one foundational property: the system under protection is a constrained actor with fixed logic. Agentic AI is an adaptive actor with open-ended behavior, and is fundamentally different to secure. We just published Operationalizing the OWASP Top 10 for Agentic AI; a security whitepaper that shows how to turn the OWASP framework into enforceable, auditable controls using a central control plane architecture. Read our whitepaper to: – Understand why agents break traditional application security models – Map every OWASP ASI01–ASI10 threat to concrete detection controls – Architect a central control plane that enforces policy across every agent – Separate platform-level and per-agent controls without duplicating effort – Close the gap between prompt injection guardrails and full OWASP coverage – Build an immutable audit trail regulators and CISOs will accept – Apply the same infrastructure to GDPR, EU AI Act, and internal requirements – Validate OWASP threat coverage with aligned test suites, not generic benchmarks The enterprises that treat OWASP as a checkbox will fall behind. The ones that treat it as the architectural blueprint for agentic AI governance will lead. Download the whitepaper here: https://lnkd.in/ge-nDAvJ Written by: Pratik Bhavsar, AI Engineer, Galileo Michael Branconier, FDE, Galileo Bianca DePriest, Enterprise Sales, Galileo Obine Adoh, Security, Galileo
1 Comment

Like Comment Share
Galileo

23,925 followers
1w
Report this post
ICYMI: Another week, another set of workflow improvements across our platform! Here's what just went live in this week's release notes 👇 Playgrounds: Playgrounds can now dynamically detect datasets’ variables, making it easier to add variables to playground prompts. Logs filtering: Logs can now automatically display available columns to filter. Custom model integration: Galileo’s custom model integrations now support model properties for users who wish to further customize LLM integration parameters. OpenAI models: GPT 5.4 Mini and Nano now available across Playground, Prompt store, Synthetic Data Generation, and Metrics Hub Annotation Queues (Enterprise Beta): Keyboard shortcuts and auto-advance to speed up annotator workflows Read more here: https://lnkd.in/gv5zwJKV
Like Comment Share
Galileo reposted this
CrewAI

96,155 followers
1w
Report this post
Governance is the part nobody talks about. Until something breaks. Join us for a live webinar on April 21st to learn how to keep your agentic systems safe, reliable, and in control — at scale. Galileo co-founder & CTO Yash Sheth and CrewAI founder & CEO João (Joe) Moura will cover: → Enforcing safety and security policies in CrewAI agents → Steering agents to the best models and fallback tools at runtime → Governing all your agents with one centralized set of policies → Bringing non-technical stakeholders into the loop — no coding required 🗓 April 21 | 10am PT 🎟 Register: https://lnkd.in/gRjQNnwM #AgenticAI #CrewAI #AI
3 Comments

Like Comment Share

Galileo

Software Development

Burlingame, California 23,925 followers

Galileo is the leading AI Reliability + Evaluation platform that helps teams of all sizes build AI apps they can trust.

About us

Locations

Employees at Galileo

Denis Abrantes

Xian Ke

Ahmed Quadri

Dharmesh Thakker

Updates

Join now to see what you are missing

Similar pages

Galileo AI

Galileo Financial Technologies

Cisco

Galileo

Arize AI

Splunk

Anthropic

Neuron7.ai

Voxela, Inc.

SoFi

Browse jobs

Scientist jobs

Engineer jobs

Machine Learning Engineer jobs

Analyst jobs

Developer jobs

Intern jobs

Manager jobs

Data Scientist jobs

Account Executive jobs

Site Reliability Engineer jobs

Director jobs

Senior Scientist jobs

Project Manager jobs

Technical Product Manager jobs

Account Manager jobs

Intelligence Specialist jobs

Senior Data Engineer jobs

Software Engineer jobs

Enterprise Account Executive jobs

Recruiter jobs