AssetOpsBench

AI Agents for Industrial Asset Operations & Maintenance

A unified, open framework for building, orchestrating, and evaluating domain-specific AI agents in Industry 4.0.

📄 Paper · 🤗 Dataset · 🎮 Playground · 📢 IBM Blog · 🎥 Video · 📊 Kaggle · 🚀 Colab

Important

🎉 AssetOpsBench has been accepted at KDD 2026 (Datasets & Benchmarks Track) in Jeju, South Korea, alongside our hands-on tutorial Building Reliable Industrial Agents with MCP. See Publications for the full list of 2025–2026 work.

At a Glance

9 asset classes
141+ scenarios
5 domain agents
2 orchestration frameworks
20+ university extensions
500+ competition submissions

Built for: maintenance engineers, reliability specialists, facility planners, and Industry 4.0 researchers.
Powered by: LLMs + Time Series Foundation Models, orchestrated over live sensor data and Industry 4.0 records (FMEA, work orders, alerts).
Now with: a simplified interface and native MCP (Model Context Protocol) support.

Quick Start

# Clone and install
git clone https://github.com/IBM/AssetOpsBench.git
cd AssetOpsBench
pip install -e .

# Try a scenario (to be enabled)
python -m assetopsbench.run --scenario "List all sensors of Chiller 6 in MAIN site"

Or jump in instantly:

🚀 Run on Colab — no install required (illustration of LLM Agent)
🎮 Try the HF Playground — interactive demo
📖 Read INSTRUCTIONS.md — full setup, MCP servers, plan-execute runner

Note

Active development is on main. Code accompanying individual publications is maintained on separate branches (for example, ACL 2026 IndustryAssetEQA), and prior experimental work is maintained on main-0.x.

What is AssetOpsBench?

AssetOpsBench is a unified framework for developing, orchestrating, and evaluating domain-specific AI agents in industrial asset operations and maintenance. It provides reproducible scenarios, agent tooling, and evaluation pipelines for multi-step workflows in simulated industrial environments.

Domain-Specific MCP Servers

MCP Servers	Important tools
IoT	`get_sites`, `get_history`, `get_assets`, `get_sensors`
FMSR	`get_sensors`, `get_failure_modes`, `get_failure_sensor_mapping`
TSFM	`forecasting`, `timeseries_anomaly_detection`
WO	`get_work_order_distribution`, `predict_next_work_order`, ...
Vibration	`compute_fft_spectrum`, `compute_envelope_spectrum`, ...
...	...
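
MCP tools are invoked through JSON-RPC 2.0 `tools/call` requests. As a minimal sketch, the snippet below builds such a request for the `get_sensors` tool using only the Python standard library. The helper `build_tool_call` and the argument names `site_name` and `asset_id` are assumptions made for illustration; they are not taken from the server's schema, so check the tool's published input schema before relying on them.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Hypothetical argument names; the real schema is defined by the IoT server.
request = build_tool_call(
    request_id=1,
    tool="get_sensors",
    arguments={"site_name": "MAIN", "asset_id": "Chiller 6"},
)
print(json.dumps(request, indent=2))
```

In practice an MCP client library would send this message for you; the sketch only shows the shape of the payload.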

Agent Frameworks

Plan Execute — plan-and-execute sequential workflow to work with any LLM
Deep Agent — planning, sub-agents, and virtual filesystem for long-horizon tasks
Claude Agent — ReAct-based orchestrator using Claude with agent-as-tool delegation
OpenAI Agent — ReAct-based orchestrator using OpenAI models with agent-as-tool delegation
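
To make the plan-and-execute pattern concrete, here is a minimal sketch under stated assumptions. It is not the repository's runner: the `plan` function is a hard-coded stand-in for an LLM planner, and the two tool functions are local stubs with made-up signatures and data that only borrow the names `get_assets` and `get_sensors`.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, dict]  # (tool name, keyword arguments)


# Stub tools standing in for MCP tool calls (hypothetical signatures and data).
def get_assets(site: str) -> List[str]:
    return ["Chiller 6", "Chiller 9"] if site == "MAIN" else []


def get_sensors(site: str, asset: str) -> List[str]:
    return [f"{asset} Supply Temperature", f"{asset} Condenser Water Flow"]


TOOLS: Dict[str, Callable[..., List[str]]] = {
    "get_assets": get_assets,
    "get_sensors": get_sensors,
}


def plan(task: str) -> List[Step]:
    """Stand-in planner; a real runner would ask an LLM to produce the steps."""
    return [
        ("get_assets", {"site": "MAIN"}),
        ("get_sensors", {"site": "MAIN", "asset": "Chiller 6"}),
    ]


def execute(steps: List[Step]) -> List[List[str]]:
    """Run each planned step in order and collect the tool outputs."""
    return [TOOLS[name](**kwargs) for name, kwargs in steps]


results = execute(plan("List all sensors of Chiller 6 in MAIN site"))
print(results[-1])
```

The point of the split is that the full plan is fixed before any tool runs, in contrast to a ReAct-style loop that chooses the next action after seeing each observation.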

MCP Environment

The src/ directory contains MCP servers and a plan-execute runner built on the Model Context Protocol. See INSTRUCTIONS.md for setup.

Example Scenarios

Domain	Example Task
IoT	"List all sensors of Chiller 6 in MAIN site"
FMSR	"Identify failure modes detected by Chiller 6 Supply Temperature"
TSFM	"Forecast Chiller 9 Condenser Water Flow for the week of 2020-04-27"
WO	"Generate a work order for Chiller 6 anomaly detection"

Some tasks focus on a single domain; others are multi-step, end-to-end workflows. Explore all scenarios on Hugging Face.

Leaderboards

To be revised (WIP with latest models)
Evaluated with 7 Large Language Models
Trajectories scored using LLM Judge (Llama-4-Maverick-17B)
6-dimensional criteria measuring reasoning, execution, and data handling

Example: MetaAgent leaderboard
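
One plausible way to turn per-trajectory judge verdicts into leaderboard numbers is a per-criterion pass rate. The sketch below assumes the judge returns one boolean per criterion for each trajectory; that aggregation scheme, the placeholder names `criterion_1` to `criterion_6`, and the sample verdicts are all assumptions for illustration, not the benchmark's documented method.

```python
from statistics import mean
from typing import Dict, List

# Placeholder names; the benchmark's actual six criteria are not listed here.
CRITERIA = [f"criterion_{i}" for i in range(1, 7)]


def pass_rates(judgements: List[Dict[str, bool]]) -> Dict[str, float]:
    """Return the fraction of trajectories that pass each criterion."""
    return {
        c: mean(1.0 if j[c] else 0.0 for j in judgements)
        for c in CRITERIA
    }


# Two made-up judge verdicts, one per trajectory.
judgements = [
    {c: True for c in CRITERIA},
    {**{c: True for c in CRITERIA}, "criterion_6": False},
]
rates = pass_rates(judgements)
print(rates)
```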

Publications

12+ contributions across 7 top venues in 2025–2026 from the team behind AssetOpsBench.

⭐ KDD 2026 - Jeju, South Korea

[D&B] AssetOpsBench: A Benchmark for Industrial Asset Operations Agents · D. Patel, S. Lin, et al. · 📄 Paper
[Tutorial] Building Reliable Industrial Agents with MCP: A Hands-on AssetOpsBench Tutorial for AI-Driven Operations · D. Patel, C. Shyalika, et al.

ACL 2026 - San Diego, USA

[Industry] IndustryAssetEQA: A Neurosymbolic Operational Intelligence System for Embodied Question Answering in Industrial Asset Maintenance · C. Shyalika, D. Patel, A. Sheth

ICLR 2026 - Brazil

[Main] Adaptive Conformal Anomaly Detection with Time Series Foundation Models for Signal Monitoring · N. Martinez, F. O'Donncha, W. M. Gifford, N. Zhou, D. C. Patel, R. Vaculin

AAAI 2026 — Singapore

[Demo] AssetOpsBench-Live: Privacy-Aware Online Evaluation of Multi-Agent Performance in Industrial Operations · D. Patel, N. Zhou, S. Lin, J. T. Rayfield, C. Shyalika, S. R. Yarrabothula · 🎥 Demo
[Main] SPIRAL: Symbolic LLM Planning via Grounded and Reflective Search · Y. Zhang, G. Ganapavarapu, S. Jayaraman, B. Agrawal, D. Patel, A. Fokoue · 💻 Code
[Bridge] Knowledge-Guided AI for Industrial Asset Health Monitoring · S. Lin, D. Patel
[Tutorial] From Inception to Productization: Hands-on Lab for the Lifecycle of Multimodal Agentic AI in Industry 4.0 · C. Shyalika, S. Ahuja, S. Lin, R. Wickramarachchi, D. Patel, A. Sheth · 🌐 Website · 📊 Slides
[Workshop(AABA4ET)] Agentic Code Generation for Heuristic Rules in Equipment Monitoring · F. Lorenzi, A. Langbridge, F. O'Donncha, J. Rayfield, B. Eck, S. Rosato

IAAI 2026 - Singapore

[Deployed] Deployed AI Agents for Industrial Asset Management: CodeReAct Framework for Event Analysis and Work Order Automation · N. Zhou, D. Patel, A. Bhattacharyya
[Emerging] Diversity Meets Relevancy: Multi-Agent Knowledge Probing for Industry 4.0 Applications · C. Constantinides, D. Patel, S. Kimbleton, N. Garg, M. Paracha

NeurIPS 2025 — San Diego, USA

[D&B Track] FailureSensorIQ: A Multi-Choice QA Dataset for Understanding Sensor Relationships and Failure Modes · C. Constantinides, D. Patel, S. Lin, C. Guerrero, S. D. Patil, J. Kalagnanam · 📄 arXiv · 💻 Code
[Social] Building Reliable Agentic Benchmarks: Insights from AssetOpsBench (invited talk, 2000+ registered) · D. Patel · 📅 Luma

EMNLP 2025 — Suzhou, China

[Main] ReAct Meets Industrial IoT: Language Agents for Data Access · J. T. Rayfield, S. Lin, N. Zhou, D. C. Patel
[Main] Generalized Embedding Models for Industry 4.0 Applications · C. Constantinides, S. Lin, D. C. Patel · 📄 arXiv
[Findings] Fine-Tuned Thoughts: Leveraging Chain-of-Thought Reasoning for Industrial Asset Health Monitoring · S. Lin, D. Patel, C. Constantinides · 📄 ACL Anthology · 💻 Code

Tutorials & Technical Material

📘 Hands-on guides from our team:

AI Competitions

AssetOpsBench powers public AI agent competitions that bring together researchers, students, and practitioners worldwide.

🔴 Live — IJCAI 2026

Industrial Automation Challenge: Benchmarking Physics-Grounded LLMs for Task Reasoning

A new challenge co-located with IJCAI 2026 that pushes LLM agents on physics-grounded industrial reasoning.

🌐 Challenge site: ai-industrial-challenge-ijcai
📋 IJCAI 2026 competitions: 2026.ijcai.org/competitions

✅ Completed — CODS 2025

AssetOpsBench-Live: AI Agentic Challenge

Launched in September 2025 at CODS 2025, the competition evaluated multi-agent systems on live industrial scenarios.

🏆 Competition page: codabench.org/competitions/10206
👥 365 participants · 500+ agent submissions

Talks & Events

Date	Event
2026-08	KDD 2026 — AssetOpsBench paper + MCP tutorial · Jeju, South Korea
2026-05-10	NUS Seminar: AssetOpsBench Applications
2025-12	NeurIPS 2025 Social: Building Reliable Agentic Benchmarks (2000+ registered)
2025-10-03	2-Hour Workshop: AI Agents and Their Role in Industry 4.0 Applications · NJIT ACM
2025-09-01	CODS 2025 Competition Launch — AssetOpsBench-Live
2025-06-01	AssetOpsBench v1.0 released — 141 industrial scenarios

University Projects & Extensions

AssetOpsBench is being extended by university research groups exploring new asset classes, evaluation paradigms, and agentic architectures. To list your project, open a PR.

Internalizing MCP Tool Knowledge in Small LLMs via QLoRA Fine-Tuning — HPML project using AssetOpsBench to fine-tune ~4B models to internalize MCP tool knowledge and reduce prompt schema overhead. Ayal Yakobe, Columbia University · repo
SPIN — Structural LLM Planning via Iterative Navigation for Industrial Tasks. Yusuke Ozaki, University at Albany · paper · repo
Synthetic Scenario Generation for Evaluation of Industry 4.0 Agents — Automated scenario generation, transformer asset integration, and scenario quality evaluation. Rohith Kanathur, Sagar Chethan Kumar, Columbia University · repo
AgentOpsBench — High-throughput battery analytics MCP server with DNN prognostics (RUL prediction) and 3.3× latency optimization. Siddharth Gowda, Rushin Bhatt, Aryaman Agrawal, Winston Li, Columbia University · repo
Skill-Knowledge-Augmented Agents on AssetOpsBench — Confidence-gated skill execution with scoped knowledge plugins for industrial fault diagnosis. Vera Mazeeva, Sanskruti Shejwal, Shrey Arora, Mana Abbaszadeh, Columbia University · repo
Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines. Krish Veera, Alimurtaza Mustafa Merchant, Sajal Kumar Goyla, Shambhawi Bhure, Columbia University · repo
Towards Multi-Turn Dialog Systems for Industrial Asset Operations and Maintenance - Improved response quality and reduced redundant tool calls and multi-turn latency. Chengrui Li, Rujing Li, Yitong Bai, Rui Li, Columbia University · repo

Call for Scenario Contribution

We are expanding AssetOpsBench to cover a broader range of industrial challenges. We invite researchers and practitioners to contribute new scenarios, particularly in:

Asset Classes: Turbines, HVAC systems, Pumps, Transformers, CNC Machines, Robotics, Engines
Task Domains: Prognostics and Health Management, Remaining Useful Life (RUL) estimation, Root Cause Analysis (RCA), Diagnostic Analysis, Predictive Maintenance

How to contribute:

Define your scenario following our Utterance Guideline and Ground Truth Guideline
Explore the Hugging Face dataset for examples
Submit a Pull Request or open an Issue with the tag new-scenario
Contact us with questions:
- Dhaval Patel — pateldha@us.ibm.com
- Nianjun Zhou — jzhou@us.ibm.com

Contributors

Thanks to these wonderful people ✨

ShuxinLin 💻 · DhavalRepo18 💻 · ChathurangiShyalika 💻 · Dev-Scodes5 💻 · DeveloperMindset123 💻 · LGDiMaggio 💻 · PUSHPAK-JAISWAL 💻
bradleyjeck 💻 · florenzi002 💻 · jack-pfeifer 💻 · jdsheehan 💻 · jtrayfield 💻 · kushwaha001 💻 · nianjunz 💻
sandeepkunkunuru 💻 · srutanik 💻 · thedgarg31 💻

Star History

If AssetOpsBench is useful to your work, please ⭐ star the repo, 🍴 fork it, and tell us what you're building.
