Stories by BroadNotes by 0xjacobzhao on Medium

Turning Probability into Assets: A Look Ahead at Prediction Market Agents

BroadNotes by 0xjacobzhao — Wed, 04 Mar 2026 16:08:15 GMT

Author: 0xjacobzhao | https://linktr.ee/0xjacobzhao

In our previous Crypto AI research, we established that while stablecoins and DeFi offer immediate utility, Agents represent the critical user interface for the AI industry. Consequently, we define two primary value paths for Crypto-AI integration: a short-term focus on AgentFi, which automates yield strategies on mature DeFi protocols, and a medium-to-long-term evolution toward Agent Payment, enabling autonomous stablecoin settlement via emerging standards like ACP, x402, and ERC-8004.

Prediction markets have become an undeniable new industry trend in 2025, with total annual trading volume surging from approximately $9 billion in 2024 to over $40 billion in 2025, achieving a year-on-year growth of over 400%. This significant growth is driven by multiple factors: demand for uncertainty hedging brought by macro-political events, the maturation of infrastructure and trading models, and the breaking of ice in the regulatory environment (Kalshi’s lawsuit victory and Polymarket’s return to the US). Prediction Market Agents are showing early prototypes in early 2026 and are poised to become a new product form in the agent field over the coming year.

I. Prediction Markets: From Betting Tools to a “Global Truth Layer”

A prediction market is a financial mechanism for trading around the outcomes of future events. Contract prices essentially reflect the market’s collective judgment on the probability of an event occurring. Its effectiveness stems from the combination of crowd wisdom and economic incentives: in an environment of anonymous, real-money betting, dispersed information is rapidly integrated into price signals weighted by financial willingness, thereby significantly reducing noise and false judgments.

(Note: “Prediction Market Nominal Trading Volume Trend Chart” from Dune Analytics here.)

By the end of 2025, prediction markets have largely formed a duopoly dominated by Polymarket and Kalshi. According to Forbes, total trading volume in 2025 reached approximately $44 billion, with Polymarket contributing about $21.5 billion and Kalshi about $17.1 billion. February 2026 weekly data shows Kalshi’s trading volume ($25.9B) has surpassed Polymarket ($18.3B), approaching 50% market share. Kalshi, leveraging its legal victory in the previous election contract case, its first-mover compliance advantage in the US sports prediction market, and relatively clear regulatory expectations, achieved rapid expansion. Currently, their development paths have clearly diverged:

Polymarket adopts a hybrid CLOB (Central Limit Order Book) architecture with “off-chain matching, on-chain settlement” and a decentralized settlement mechanism. It has built a globalized, non-custodial high-liquidity market, forming an “onshore + offshore” dual-track operational structure after its compliant return to the US.
Kalshi integrates into the traditional financial system, accessing mainstream retail brokers via API to attract Wall Street market makers for deep participation in macro and data-based contract trading. Its products are constrained by traditional regulatory processes, leading to a lag in addressing long-tail demands and sudden events.

Beyond Polymarket and Kalshi, other competitive participants in the prediction market field are developing along two main paths:

Compliant Distribution Path: Embedding event contracts into the existing account and clearing systems of brokers or large platforms, relying on channel coverage, compliance qualifications, and institutional trust to build advantages (e.g., Interactive Brokers × ForecastEx’s ForecastTrader, FanDuel × CME Group’s FanDuel Predicts). While compliance and resource advantages are significant, product and user scale are still in the early stages.
Crypto-Native On-Chain Path: Represented by Opinion.trade, Limitless, and Myriad, these leverage points mining, short-cycle contracts, and media distribution to achieve rapid volume growth. They emphasize performance and capital efficiency, but their long-term sustainability and risk control robustness remain to be verified.

These two paths — traditional financial compliance entry and crypto-native performance advantages — together constitute the diversified competitive landscape of the prediction market ecosystem.

While prediction markets superficially resemble gambling and are essentially zero-sum games, the core difference lies in whether they possess positive externalities: aggregating dispersed information through real-money trading to publicly price real-world events, forming a valuable signal layer. The trend is shifting from gaming to a “Global Truth Layer” — as institutions like CME and Bloomberg connect, event probabilities have become decision-making metadata directly callable by financial and corporate systems, providing a more timely, quantifiable, market-based truth.

From a global regulatory perspective, compliance paths for prediction markets are highly divergent. The US is the only major economy explicitly including prediction markets in its financial derivatives regulatory framework. Markets in Europe, the UK, Australia, and Singapore generally view them as gambling and tend to tighten regulations, while China and India completely ban them. Future global expansion of prediction markets still depends on national regulatory frameworks.

II. Architecture Design of Prediction Market Agents

Prediction Market Agents are currently entering an early practice stage. Their value lies not in “AI predicting more accurately,” but in amplifying information processing and execution efficiency within prediction markets. Prediction markets are essentially information aggregation mechanisms where price reflects the collective judgment of event probability; real-world market inefficiencies stem from information asymmetry, liquidity, and attention constraints. The reasonable positioning for a Prediction Market Agent is Executable Probabilistic Portfolio Management: converting news, rule texts, and on-chain data into verifiable pricing deviations, executing strategies in a faster, more disciplined, and lower-cost manner, and capturing structural opportunities through cross-platform arbitrage and portfolio risk control.

An ideal Prediction Market Agent can be abstracted into a four-layer architecture:

Information Layer: Aggregates news, social media, on-chain, and official data.
Analysis Layer: Uses LLMs and ML to identify mispricing and calculate Edge.
Strategy Layer: Converts Edge into positions using the Kelly Criterion, staggered entry, and risk control.
Execution Layer: Completes multi-market order placement, slippage and Gas optimization, and arbitrage execution, forming an efficient automated closed loop.

III. Strategy Framework for Prediction Market Agents

Unlike traditional trading environments, prediction markets have significant differences in settlement mechanisms, liquidity, and information distribution. Not all markets and strategies are suitable for automated execution. The core of a Prediction Market Agent lies in whether it is deployed in scenarios with clear rules, codifiability, and structural advantages. The following analysis covers target selection, position management, and strategy structure.

1. Prediction Market Target Selection

Not all prediction markets have tradable value. Participation value depends on: Settlement Clarity (are rules clear, is the data source unique), Liquidity Quality (market depth, spread, and volume), Insider Risk (degree of information asymmetry), Time Structure (expiration time and event pacing), and the trader’s own Information Advantage and Professional Background. A prediction market only has a basis for participation when most dimensions meet basic requirements. Participants should match based on their own strengths and market characteristics:

Human Core Advantage: Markets relying on domain expertise, judgment, and integration of ambiguous information, with relatively loose time windows (days/weeks). Typical examples: Political elections, macro trends, and corporate milestones.
AI Agent Core Advantage: Markets relying on data processing, pattern recognition, and rapid execution, with extremely short decision windows (seconds/minutes). Typical examples: High-frequency crypto prices, cross-market arbitrage, and automated market making.
Unsuitable Areas: Markets dominated by insider information or purely random/highly manipulated markets, which offer no advantage to any participant.

2. Position Management in Prediction Markets

The Kelly Criterion is the most representative capital management theory in repeated games. Its goal is not to maximize the return of a single trade, but to maximize the long-term compound growth rate of capital. It calculates the theoretical optimal position ratio based on estimates of win rate and odds, improving capital growth efficiency under the premise of positive expectancy. It is widely used in quantitative investment, professional gambling, poker, and asset management.

Classic Formula: f^* = (bp — q) / b
Where f∗ is optimal betting fraction, b is net odds, p is win rate, and q=1−p.
Simplified for PM: f^* = (p — market\_price) / (1 — market\_price)
Where p is the subjective true probability, market\_price is the market implied probability.

The theoretical effectiveness of the Kelly formula is highly dependent on accurate estimates of true probability and odds. In reality, traders find it difficult to consistently and accurately grasp the true probability. In practice, professional gamblers and prediction market participants tend to adopt rule-based strategies that are more executable and less dependent on probability estimation:

Unit System: Splits capital into fixed units (e.g., 1%) and invests different numbers of units based on confidence levels. This automatically constrains single-bet risk through a unit cap and is the most common practical method.
Flat Betting: Uses a fixed percentage of capital for each bet. Emphasizes discipline and stability, suitable for risk-averse or low-conviction environments.
Confidence Tiers: Presets discrete position tiers and sets absolute caps to reduce decision complexity and avoid the false precision problem of the Kelly model.
Inverted Risk Approach: Calculates position size backwards starting from the maximum tolerable loss. It defines boundaries from risk constraints rather than profit expectations.

For Prediction Market Agents, strategy design should prioritize executability and stability over theoretical optimality. The key lies in clear rules, simple parameters, and tolerance for judgment errors. Under these constraints, the Confidence Tiers method combined with fixed position caps is the most suitable general position management scheme for PM Agents. This method does not rely on precise probability estimates but divides opportunities into limited tiers based on signal strength, setting clear caps to control risk even in high-conviction scenarios.

3. Strategy Selection for Prediction Markets

Structurally, strategies fall into two main categories: Deterministic Arbitrage strategies (characterized by clear rules and codifiability) and Speculative Directional strategies (relying on information interpretation and direction judgment). Additionally, there are Market Making and Hedging strategies, mainly for professional institutions with high capital and infrastructure requirements.

Deterministic Arbitrage Strategies (Arbitrage)

Resolution Arbitrage: Occurs when an event outcome is basically determined but the market hasn’t fully priced it in yet. Returns come from information synchronization and execution speed. Rules are clear, risk is low, and it is fully codifiable — the core strategy most suitable for Agent execution.
Dutch Book Arbitrage (Probability Conservation): Exploits structural imbalances where the sum of prices for a mutually exclusive and exhaustive set of events deviates from the probability conservation constraint ($\sum P \neq 1$). By building a portfolio, it locks in risk-free returns. It relies only on rules and price relationships, has low risk, and can be highly regularized. It is a typical deterministic arbitrage form suitable for automated Agent execution.
Cross-Platform Arbitrage: Profits by capturing pricing deviations for the same event across different markets. Low risk but high requirements for latency and parallel monitoring. Suitable for Agents with infrastructure advantages, but competition is intensifying, leading to declining marginal returns.
Bundle Arbitrage: Exploits pricing inconsistencies between related contracts. Logic is clear but opportunities are limited. Can be executed by Agents but requires some engineering for rule parsing and portfolio constraints. Agent suitability is medium.

Speculative Directional Strategies (Speculative)

Structured Information Driven (Information Trading): Centers around clear events or structured information, such as official data releases, announcements, or ruling windows. As long as the information source is clear and trigger conditions are definable, Agents can leverage speed and discipline in monitoring and execution. However, when information turns into semantic judgment or scenario interpretation, human intervention is still needed.
Signal Following: Profits by following accounts or capital behaviors with historically superior performance. Rules are relatively simple and automatable. The core risk lies in signal decay and being front-run/counter-traded, requiring filtering mechanisms and strict position management. Suitable as an auxiliary strategy for Agents.
Unstructured / Noise-driven: Highly dependent on sentiment, randomness, or participation behavior. Lacks a stable, reproducible edge, and long-term expected value is unstable. Difficult to model and extremely high risk; not suitable for systematic Agent execution and not recommended as a long-term strategy.

High-Frequency Price & Liquidity Strategies (Market Microstructure): Relies on extremely short decision windows, continuous quoting, or high-frequency trading. Requirements for latency, models, and capital are extremely high. While theoretically suitable for Agents, they are often limited by liquidity and competition intensity in prediction markets, suitable only for a few participants with significant infrastructure advantages.

Risk Control & Hedging: Does not directly seek profit but is used to reduce overall risk exposure. Clear rules and objectives; runs long-term as an underlying risk control module.

Summary: Strategies suitable for Agent execution in prediction markets are concentrated in scenarios with clear rules, codifiability, and weak subjective judgment. Deterministic arbitrage should be the core revenue source, with structured information and signal following strategies as supplements. High-noise and emotional trading should be systematically excluded. An Agent’s long-term advantage lies in disciplined, high-speed execution and risk control capabilities.

IV. Business Models and Product Forms of Prediction Market Agents

Ideal business model designs for Prediction Market Agents have exploration space at different levels:

Infrastructure Layer: Provides multi-source real-time data aggregation, Smart Money address libraries, unified prediction market execution engines, and backtesting tools. Charges B2B fees to obtain stable revenue unrelated to prediction accuracy.
Strategy Layer: Introduces community and third-party strategies to build a reusable, evaluable strategy ecosystem. Captures value through calls, weights, or execution profit-sharing, reducing dependence on a single Alpha.
Agent / Vault Layer: Agents directly participate in live trading via entrusted management, relying on on-chain transparent records and strict risk control systems to earn management fees and performance fees based on capability.

Corresponding product forms can be divided into:

Entertainment / Gamification Mode: Lowers participation barriers through Tinder-like intuitive interaction. Has the strongest user growth and market education capability, making it an ideal entry point for breaking out of the niche, but needs to funnel users to subscription or execution products for monetization.
Strategy Subscription / Signal Mode: Does not involve capital custody, is regulatory-friendly with clear rights and responsibilities, and has a relatively stable SaaS revenue structure. It is currently the most feasible commercialization path. Its limitation is that strategies are easily copied and execution suffers from slippage. Long-term revenue ceilings are limited, but experience and retention can be significantly improved through a “Signal + One-Click Execution” semi-automated form.
Vault Custody Mode: Possesses scale effects and execution efficiency advantages, resembling asset management products. However, it faces multiple structural constraints such as asset management licenses, trust thresholds, and centralized technical risks. The business model is highly dependent on the market environment and sustained profitability. Unless possessing a long-term track record and institutional-grade endorsement, it should not be the main path.

Overall, a diversified revenue structure of “Infrastructure Monetization + Strategy Ecosystem Expansion + Performance Participation” helps reduce reliance on the single assumption that “AI consistently beats the market.” Even if Alpha converges as the market matures, underlying capabilities like execution, risk control, and settlement retain long-term value, thus building a more sustainable business closed loop.

V. Project Cases of Prediction Market Agents

Currently, Prediction Market Agents are still in the early exploration stage. Although the market has seen diverse attempts from underlying frameworks to upper-layer tools, a standardized product that is mature in strategy generation, execution efficiency, risk control systems, and business closed loops has not yet formed.

We classify the current ecosystem landscape into three levels: Infrastructure, Autonomous Agents, and Prediction Market Tools.

Infrastructure Layer

Polymarket Agents Framework

This official developer framework standardizes “connection and interaction,” handling data retrieval, order construction, and basic LLM interfaces. However, it functions primarily as an access standard rather than a turnkey solution; it solves “how to code an order” but leaves core trading capabilities — such as strategy generation, probability calibration, and risk management — entirely to the developer.

Gnosis Prediction Market Tools

Offering complete read/write support for the Gnosis ecosystem (Omen/Manifold), this toolset provides only read access for Polymarket, creating clear ecosystem barriers. It serves as a strong foundation for Gnosis-native agents but has limited utility for cross-platform development.

Polymarket and Gnosis are currently the only prediction market ecosystems that have clearly productized “Agent Development” into official frameworks. Other prediction markets like Kalshi still mainly remain at the API and Python SDK level, requiring developers to self-complete key system capabilities like strategy, risk control, operation, and monitoring.

Autonomous Agents

Current “Prediction Market AI Agents” on the market are mostly still in early stages. Although labeled “Agent,” their actual capabilities are significantly far from delegatable automated closed-loop trading. They generally lack independent, systematic risk control layers and have not incorporated position management, stop-loss, hedging, and expected value constraints into the decision process. Overall productization is low, and mature systems for long-term operation have not yet formed.

Olas Predict

Olas Predict is currently the most productized prediction market agent ecosystem. Its core product “Omenstrat” is built on Omen within the Gnosis system, utilizing FPMM and decentralized arbitration mechanisms. It supports small-scale high-frequency interactions but is constrained by Omen’s limited single-market liquidity. Its “AI prediction” primarily relies on generic LLMs, lacking real-time data and systematic risk control, with historical win rates varying significantly across categories.

In February 2026, Olas launched “Polystrat”, extending Agent capabilities to Polymarket — users can define strategies in natural language, and the Agent automatically identifies probability deviations in markets settling within 4 days and executes trades. The system controls risk through Pearl local execution, self-custodied Safe accounts, and hardcoded limits, making it the first consumer-grade autonomous trading Agent for Polymarket.

UnifAI Network Polymarket Strategy

Provides automated trading Agent for Polymarket, with a core tail risk strategy: scanning contracts near settlement with >95% implied probability and buying in, targeting 3–5% spread capture. On-chain data shows a win rate close to 95%, but returns diverge significantly across categories. The strategy is highly dependent on execution frequency and category selection.

NOYA.ai

Attempts a comprehensive “Research-Judgment-Execution” closed loop. Its architecture features an Intelligence Layer for signal aggregation and an Abstraction Layer using Intents to manage cross-chain complexity. Currently, its Omnichain Vaults have been delivered; the Prediction Market Agent remains under development, and a complete mainnet closed loop has not yet formed. Overall, it is in the vision validation stage.

Prediction Market Tools

Current prediction market analysis tools are insufficient to constitute complete “Prediction Market Agents.” Their value is mainly concentrated in the Information and Analysis layers of the agent architecture; trade execution, position management, and risk control must still be borne by the trader. Product forms align more with “Strategy Subscription / Signal Assistance / Research Enhancement” and can be viewed as early prototypes of Prediction Market Agents.

Based on a systematic review of Awesome-Prediction-Market-Tools, we selected representative projects with preliminary product forms:

Market Analysis Tools

Polyseer : Research-oriented tool using a multi-Agent architecture (Planner/Researcher/Critic/Analyst/Reporter) for evidence collection and Bayesian aggregation to output structured reports. Transparent methodology, open-source.
Oddpool: “Bloomberg Terminal for Prediction Markets,” aggregating Polymarket, Kalshi, CME, etc., with arbitrage scanning.
Polymarket Analytics: Global data analysis platform for Polymarket, showing trader, market, position, and volume data.
Hashdive: Trader-oriented data tool using Smart Score to identify “Smart Money.”
Polyfactual : Focuses on AI market intelligence and sentiment/risk analysis via Chrome extension.
Predly: AI mispricing detection platform comparing market prices with AI-calculated probabilities on Polymarket and Kalshi. Claims 89% alert accuracy.
Polysights: Covers 30+ markets and on-chain metrics with Insider Finder tracking new wallets and large unidirectional bets.
PolyRadar: Multi-model parallel analysis with real-time interpretation, timeline evolution, and confidence scoring.
Alphascope: AI-driven intelligence engine for real-time signals and research summaries (early stage).

Alerts / Whale Tracking

Stand: Focuses on whale copy-trading and high-conviction alerts.
Whale Tracker Livid : Productizes whale position changes.

Arbitrage Discovery Tools

ArbBets: AI-driven tool identifying cross-platform arbitrage (Polymarket, Kalshi, Sportsbooks).
PolyScalping: Real-time arbitrage and scalping analysis for Polymarket (1-minute scans).
Eventarb : Lightweight cross-platform arbitrage calculator (Polymarket, Kalshi, Robinhood).
Prediction Hunt: Cross-exchange aggregator comparing prices for arbitrage (Polymarket, Kalshi, PredictIt).

Trading Terminals / Aggregated Execution

Verso: Institutional-grade terminal (YC Fall 2024) with Bloomberg-style interface, covering 15,000+ contracts across Polymarket and Kalshi with AI news intelligence.
Matchr: Cross-platform aggregator covering 1,500+ markets with smart routing for optimal price matching and planned automated yield strategies.
TradeFox: Professional aggregation and Prime Brokerage platform backed by Alliance DAO and CMT Digital. Offers advanced order execution (limit, stop-loss, TWAP), self-custody, and multi-platform smart routing. Expanding to Kalshi, Limitless, and SxBet.

VI. Summary and Outlook

Currently, Prediction Market Agents are in the early exploration stage of development.

Market Essence: Backed by the Polymarket and Kalshi duopoly, prediction markets differ from gambling by acting as a “Global Truth Layer” that aggregates information via real-money trading.
Core Positioning: Agents function as Executable Probabilistic Portfolio Management tools. They convert data into verifiable pricing deviations, prioritizing discipline and execution speed.
Strategy & Risk: Deterministic Arbitrage is the optimal strategy for automation, with speculation serving only as a supplement. Risk management should prioritize executability using Confidence Tiers with Fixed Caps.
Business Model: The most sustainable path combines Infrastructure (B2B data/execution fees), Strategy Ecosystems (third-party licensing), and Vaults (performance-based asset management).

Despite the emergence of diverse tools and frameworks in the ecosystem, a mature, standardized product capable of closing the loop on strategy generation, execution efficiency, and risk control has yet to appear. We look forward to the continued iteration and evolution of Prediction Market Agents.

Disclaimer: This article was created with the assistance of AI tools including ChatGPT-5.2, Gemini 3, and Claude Opus 4.5. While the author has strived for accuracy, errors may exist. Please note that crypto asset fundamentals often diverge from secondary market prices. This content is for information and research purposes only and does not constitute investment advice or a recommendation to buy or sell any tokens.

概率成为资产：预测市场智能体前瞻

BroadNotes by 0xjacobzhao — Wed, 04 Mar 2026 08:04:14 GMT

作者：0xjacobzhao | https://linktr.ee/0xjacobzhao

在过往Crypto AI系列研报中我们持续强调的观点：当前加密领域最具实际应用价值的场景，主要集中在稳定币支付与DeFi，而Agent是AI产业面向用户的关键界面。因此，在Crypto与AI融合的趋势中，最具价值的两条路径分别是：短期内基于现有成熟DeFi协议（借贷、流动性挖矿等基础策略，以及Swap、Pendle PT、资金费率套利等高级策略）的AgentFi，以及中长期围绕稳定币结算、并依托ACP/AP2/x402/ERC-8004等协议的Agent Payment。

预测市场在2025年已成为不容忽视的行业新趋势，其年度总交易量从2024年的约90亿美元激增至2025年的超过400亿美元，实现超过400%的年同比增长。这一显著增长由多重因素共同推动：宏观政治事件带来不确定性需求，基础设施与交易模式的成熟，以及监管环境出现破冰（Kalshi胜诉与Polymarket回归美国）。预测市场智能体(Prediction Market Agent)在2026年初呈现早期雏形，有望在未来一年成为智能体领域的新兴产品形态。

一、预测市场：从下注工具到“全球真相层”

预测市场是一种围绕未来事件结果进行交易的金融机制，合约价格本质上反映了市场对事件发生概率的集体判断。其有效性源于群体智慧与经济激励的结合：在匿名、真金白银下注的环境中，分散信息被快速整合为按资金意愿加权的价格信号，从而显著降低噪音与虚假判断。

预测市场名义交易量趋势图 数据来源：Dune Analytics (Query ID: 5753743)

截至2025年底，预测市场已基本形成 Polymarket与Kalshi 双寡头主导的格局。据《福布斯》统计，2025年总交易量约达440亿美元，其中Polymarket贡献约215亿美元，Kalshi约为171亿美元。2026年2月周数据显示Kalshi交易量（$25.9B）已超过Polymarket（$18.3B），接近50%市场份额，Kalshi凭借此前选举合约案的法律胜诉、在美国体育预测市场的合规先发优势，以及相对明确的监管预期，实现了快速扩张。目前，二者的发展路径已呈现清晰分化：

Polymarket 采用“链下撮合、链上结算”的混合CLOB架构与去中心化结算机制，构建起全球化、非托管的高流动性市场，合规重返美国后形成“在岸+离岸”双轨运营结构；
Kalshi 融入传统金融体系，通过API接入主流零售券商，吸引华尔街做市商深度参与宏观与数据型合约交易，产品受制于传统监管流程，长尾需求与突发事件相对滞后。

除Polymarket与Kalshi之外，预测市场领域具备竞争力的其他参与者主要沿着两条路径发展：

一是合规分发路径，将事件合约嵌入券商或大型平台的既有账户与清算体系，依托渠道覆盖、合规资质与机构信任建立优势（如 Interactive Brokers × ForecastEx 的 ForecastTrader，FanDuel × CME Group 的 FanDuel Predicts），合规与资源优势显著，但产品与用户规模仍早期。
二是Crypto原生链上路径，以 Opinion.trade、Limitless、Myriad 为代表，借助积分挖矿、短周期合约与媒体分发实现快速放量，强调性能与资金效率，但其长期可持续性与风控稳健性仍有待验证。

传统金融合规入口与加密原生性能优势这两类路径共同构成预测市场生态的多元竞争格局。

预测市场表面上与赌博相似，本质是零和博弈，但二者的核心区别在于是否具有正外部性：通过真金白银的交易聚合分散信息，对现实事件进行公共定价，形成有价值的信号层。其趋势正从博弈转向“全球真相层” — — 随着CME、彭博等机构的接入，事件概率已成为可被金融与企业系统直接调用的决策元数据，提供更及时、可量化的市场化真相。

从全球监管现状看，预测市场的合规路径高度分化。美国是唯一明确将预测市场纳入金融衍生品监管框架的主要经济体，欧洲、英国、澳大利亚、新加坡等市场普遍将其视为博彩并趋于收紧监管，中国、印度等则完全禁止，预测市场未来全球化扩张仍依赖于各国的监管框架。

二、预测市场智能体的架构设计

当下预测市场智能体(Prediction Market Agent)正在进入早期实践阶段，其价值不在于“AI 预测更准”，而在于放大预测市场中的信息处理与执行效率。预测市场本质是信息聚合机制，价格反映对事件概率的集体判断；现实中的市场低效源于信息不对称、流动性与注意力约束。预测市场智能体的合理定位是可执行的概率资产管理（Executable Probabilistic Portfolio Management）：将新闻、规则文本与链上数据转化为可验证的定价偏差，以更快、更纪律化、低成本的方式执行策略，并通过跨平台套利与组合风控捕获结构性机会。

理想的预测市场智能体 可抽象为四层架构：

信息层汇集新闻、社交、链上与官方数据；
分析层以 LLM 与 ML 识别错价并计算 Edge；
策略层通过凯利公式、分批建仓与风控将 Edge 转化为仓位；
执行层完成多市场下单、滑点与 Gas 优化与套利执行，形成高效自动化闭环。

三、预测市场智能体的策略框架

不同于传统交易环境，预测市场在结算机制、流动性与信息分布上具有显著差异，并非所有市场与策略都适合自动化执行。预测市场智能体的核心在于是否被部署于规则清晰、可编码且符合其结构性优势的场景中。下文将从标的选择、仓位管理与策略结构三个层面展开分析。

预测市场的标的选择

并非所有预测市场都具备可交易价值，其参与价值取决于：结算清晰度（规则是否明确、数据源是否唯一）、流动性质量（市场深度、点差与成交量）、内幕风险（信息不对称程度）、时间结构（到期时间与事件节奏）、以及交易者自身的信息优势与专业背景。仅多数维度满足基本要求时，预测市场才具备参与的基础，参与者应依据自身优势与市场特性进行匹配：

人类核心优势：依赖专业知识、判断力与模糊信息整合，且时间窗口相对宽松（以天/周计）的市场。典型如政治选举、宏观趋势及企业里程碑。
AI Agent核心优势：依赖数据处理、模式识别与快速执行，且决策窗口极短（以秒/分计）的市场。典型如高频加密价格、跨市场套利及自动化做市。
不适配领域：由内幕信息主导或纯随机/高操纵性的市场，对任何参与者不构成优势。

预测市场的仓位管理

凯利公式（Kelly Criterion）是重复博弈场景中最具代表性的资金管理理论，其目标并非最大化单次收益，而是最大化资金的长期复利增长率。该方法基于对胜率与赔率的估计，计算理论最优仓位比例，在具备正期望的前提下提升资本增长效率，广泛应用于量化投资、职业博彩、扑克及资产管理领域。

经典形式为： f^* = (bp — q) / b

其中，f∗为最优投注比例，b为净赔率，p为胜率，q=1−p

预测市场可简化为：f^* = (p — market\_price) / (1 — market\_price)

其中，p为主观真实概率，market_price 为市场隐含概率

凯利公式的理论有效性高度依赖对真实概率与赔率的准确估计，现实中交易者难以持续准确地掌握真实概率，在实际操作中，职业博彩者与预测市场参与者更倾向采用可执行性更强、对概率估计依赖更低的规则化策略：

Unit System（单位下注法）：将资金拆分为固定单位（如 1%），根据信心等级投入不同单位数，通过单位上限自动约束单笔风险，是最常见的实务方法。
固定比例法（Flat Betting）：每次下注使用固定资金比例，强调纪律性与稳定性，适合风险厌恶型或低确信度环境。
阶梯信心法（Confidence Tiers）：预设离散仓位档位并设置绝对上限，以降低决策复杂度，避免凯利模型的伪精确问题。
反向风险法（Inverted Risk Approach）：以可承受最大亏损为起点反推仓位规模，从风险约束而非收益预期出发，形成稳定的风险边界。

对于预测市场智能体而言，策略设计应优先强调可执行性与稳定性，而非追求理论最优。关键在于规则清晰、参数简洁、对判断误差具备容错性。在此约束下，阶梯信心法结合固定仓位上限是最适合 PM Agent 的通用仓位管理方案。该方法不依赖精确概率估计，而是根据信号强弱将机会划分为有限档位并对应固定仓位；即便在高确信场景下亦设定明确上限控制风险。

预测市场的策略选择

从策略结构看，预测市场主要可分为两大类：以规则清晰、可编码为特征的确定性套利策略（Arbitrage），以及依赖信息解读与方向判断的投机类方向策略（Speculative）；此外，还存在以专业机构为主、对资本与基础设施要求较高的做市与对冲策略。

确定性套利策略（Arbitrage）

结算套利（Resolution Arbitrage）： 结算套利发生在事件结果已基本确定、但市场尚未完全定价的阶段，收益主要来自信息同步与执行速度。该策略规则清晰、风险较低且可完全编码，是预测市场中最适合 Agent 执行的核心策略。
概率守恒套利（Dutch Book Arbitrage）：Dutch Book 套利利用互斥且完备事件集合的价格之和偏离概率守恒约束（∑P≠1）所形成的结构性失衡，通过组合建仓锁定无方向风险收益。该策略仅依赖规则与价格关系，风险较低且可高度规则化，是适合 Agent 自动化执行的典型确定性套利形式。
跨平台套利： 跨平台套利通过捕捉同一事件在不同市场间的定价偏差获利，风险较低但对延迟与并行监控要求较高。该策略适合具备基础设施优势的 Agent 执行，但竞争加剧使边际收益持续下降。
组合套利（Bundle）： 组合套利利用相关合约之间的定价不一致进行交易，逻辑清晰但机会有限。该策略可由 Agent 执行，但对规则解析与组合约束有一定工程要求，Agent 适配度中等。

投机类方向策略（Speculative）

结构化信息驱动策略（Information Trading）：该类策略围绕明确事件或结构化信息展开，如官方数据发布、公告或裁决窗口。只要信息来源清晰、触发条件可定义，Agent 可在监测与执行层面发挥速度与纪律优势；但当信息转为语义判断或情景解读时，仍需人类介入。
信号跟随策略（Signal Following）：该策略通过跟随历史表现较优的账户或资金行为获取收益，规则相对简单、可自动化执行。其核心风险在于信号退化与被反向利用，因此需要过滤机制与严格的仓位管理。适合作为 Agent 的辅助型策略。
非结构化与高噪声策略（Unstructured / Noise-driven）：该类策略高度依赖情绪、随机性或参与行为，缺乏稳定可复制的 edge，长期期望值不稳定。由于难以建模、风险极高，不适合 Agent 系统性执行，也不建议作为长期策略。

高频价格与流动性策略（Market Microstructure）：该类策略依赖极短决策窗口、持续报价或高频交易，对延迟、模型与资本要求极高。虽然理论上适合 Agent，但在预测市场中往往受限于流动性与竞争强度，仅适合少数具备显著基础设施优势的参与者。

风险管理与对冲策略（Risk Control & Hedging）：该类策略并不直接追求收益，而是用于降低整体风险暴露。规则明确、目标清晰，作为底层风险控制模块长期运行。

总体而言，预测市场中适合 Agent 执行的策略集中于规则清晰、可编码且弱主观判断的场景，其中确定性套利应作为核心收益来源，结构化信息与信号跟随策略作为补充，高噪声与情绪型交易应被系统性排除。Agent 的长期优势在于高纪律、高速度的执行与风险控制能力。

四、预测市场智能体商业模式与产品形态

预测市场智能体的理想的商业模式设计在不同层级有不同方向的探索空间：

基建层(Infrastructure )，提供多源实时数据聚合、Smart Money 地址库、统一的预测市场执行引擎与回测工具，向 B2B收费，获取与预测准确率无关的稳定收入；
策略层(Strategy) ，引入社区与第三方策略，构建可复用、可评估的策略生态，并通过调用、权重或执行分成实现价值捕获，从而降低对单一 Alpha 的依赖。
Agent / Vault 层，智能体以受托管理方式直接参与实盘执行，依托链上透明记录与严格风控体系，收取管理费与绩效费兑现能力。

而不同商业模式对应的产品形态，亦可以划分为：

娱乐化 / 游戏化模式：通过类 Tinder 的直觉交互降低参与门槛，具备最强的用户增长与市场教育能力，是实现破圈的理想入口，但需承接至订阅或执行型产品变现。
策略订阅 / 信号模式：不涉及资金托管，监管友好、权责清晰，SaaS 收入结构相对稳定，是当前阶段最可行的商业化路径。其局限在于策略易被复制、执行存在损耗，长期收入天花板有限，可通过“信号 + 一键执行”的半自动化形态显著改善体验与留存。
Vault 托管模式：具备规模效应与执行效率优势，形态接近资管产品，但面临资产管理牌照、信任门槛与集中化技术风险等多重结构性约束，商业模式高度依赖市场环境与持续盈利能力。除非具备长期业绩与机构级背书，否则不宜作为主路径。

总体而言，“基础设施变现 + 策略生态扩展 + 业绩参与”的多元收入结构，有助于降低对“AI 持续战胜市场”的单一假设依赖。即便 Alpha 随市场成熟而收敛，执行、风控与结算等底层能力仍具长期价值，从而构建更具可持续性的商业闭环。

五、预测市场智能体的项目案例

目前，预测市场智能体（Prediction Market Agents）仍处于早期探索阶段。市场虽然涌现出从底层框架到上层工具的多样化尝试，但尚未形成一套在策略生成、执行效率、风控体系及商业闭环上均成熟的标准化产品。

我们将目前的生态版图划分为三个层级：基础设施层（Infrastructure）、自主交易智能体（Autonomous Agents） 以及 预测市场工具（Prediction Market Tools）。

基础设施层（Infrastructure）

Polymarket Agents框架：

Polymarket Agents Polymarket 官方推出的开发者框架，旨在解决“连接与交互”的工程标准化问题。该框架封装了市场数据获取、订单构建及基础的 LLM 调用接口。它解决了“如何用代码下单”的问题，但在核心的交易能力 — — 如策略生成、概率校准、动态仓位管理及回测系统上基本留白。它更像是官方认可的“接入规范”，而非具备 Alpha 收益的成品。商业级的 Agent 仍需在此基础上自建完整的投研与风控内核。

Gnosis 预测市场工具：

Gnosis Prediction Market Agent Tooling（PMAT）对 Omen/AIOmen 及 Manifold 提供了完整的读写支持，但对 Polymarket 仅开放只读权限，生态壁垒明显。它适合作为 Gnosis 体系内Agent 的开发基石，但对于以 Polymarket 为主战场的开发者而言，实用性有限。

Polymarket 与 Gnosis 是目前将“Agent 开发”明确产品化为官方框架的预测市场生态。 Kalshi 等其他预测市场仍主要停留在 API 及 Python SDK层，开发者需自行补齐策略、风控、运行与监控等关键系统能力。

自主交易智能体（Autonomous Agent）

当前市场上的“预测市场 AI Agents”多仍处于早期阶段，虽冠以“Agent”之名，但实际能力距离可放权的自动化闭环交易仍有显著差距，普遍缺乏独立、系统化的风控层，未将仓位管理、止损、对冲与期望值约束纳入决策流程，整体产品化程度偏低尚未形成可长期运行的成熟系统。

Olas Predict：是当前产品化程度最高的预测市场智能体生态。其核心产品 Omenstrat 基于 Gnosis 体系内的 Omen 构建，底层采用 FPMM 与去中心化仲裁机制，支持小额高频交互，但受限于 Omen 单市场流动性不足。其”AI 预测”主要依赖通用 LLM，缺乏实时数据与系统化风控，历史胜率在品类间分化明显。2026年2月，Olas 推出 Polystrat，将 Agent 能力扩展至 Polymarket — — 用户可用自然语言设定策略，Agent 自动识别 4 天内结算市场的概率偏差并执行交易。系统通过 Pearl 本地运行、自托管 Safe 账户与硬编码限制控制风险，是目前首个面向 Polymarket 的消费级自主交易 Agent。

UnifAI Network Polymarket Strategy：提供 Polymarket 自动化交易 Agent，核心为尾部风险承担策略：扫描隐含概率 >95% 的临近结算合约并买入，目标获取 3–5% 价差。链上数据显示胜率接近 95%，但收益在品类间分化明显，策略高度依赖执行频率与品类选择。

NOYA.ai 试图将”研究 — 判断 — 执行 — 监控”整合为 Agent 闭环，架构涵盖情报层、抽象层与执行层。当前已交付 Omnichain Vaults；Prediction Market Agent 仍处开发阶段，尚未形成完整主网闭环，整体处于愿景验证期。

预测市场工具 (Prediction Market Tools)

当前预测市场分析工具尚不足以构成完整的“预测市场智能体”，其价值主要集中在智能体架构中的信息层与分析层，交易执行、仓位管理与风险控制仍需由交易者自行承担。从产品形态看，更符合“策略订阅 / 信号辅助 / 研究增强”的定位，可被视为预测市场智能体的早期雏形。

通过对 Awesome-Prediction-Market-Tools 收录项目的系统梳理与实证筛选，本文选取其中已具备初步产品形态与使用场景的代表性项目作为研报案例。主要集中于四个方向：分析与信号层、警报与鲸鱼追踪系统、套利发现工具和交易终端与聚合执行。

市场分析工具

Polyseer ：研究型预测市场工具，采用多 Agent 分工架构（Planner / Researcher / Critic / Analyst / Reporter）进行双边证据搜集与贝叶斯概率聚合，输出结构化研报。其优势在于方法论透明、流程工程化、完全开源可审计。
Oddpool ：定位为“预测市场的 Bloomberg 终端”，提供 Polymarket、Kalshi、CME 等跨平台聚合、套利扫描与实时数据仪表盘终端。
Polymarket Analytics：全球化的 Polymarket 数据分析平台，系统性展示交易者、市场、仓位与成交数据，定位清晰、数据直观，适合作为基础数据查询与研究参考。
Hashdive：面向交易者的数据工具，通过 Smart Score 与多维 Screener 量化筛选交易者与市场，在“聪明钱识别”和跟单决策上具备实用性。
Polyfactual ：聚焦 AI 市场情报与情绪/风险分析，通过 Chrome 扩展将分析结果嵌入交易界面，偏向 B2B 与机构用户场景。
Predly ：AI 错价检测平台，通过对比市场价格与 AI 计算概率识别 Polymarket 与 Kalshi 的定价偏差，官方声称警报准确率达 89%，定位于信号发现与机会筛选。
Polysights : 覆盖 30+ 市场与链上指标，并以 Insider Finder 追踪新钱包、大额单向下注等异常行为，适合日常监控与信号发现。
PolyRadar ：多模型并行分析平台，对单一事件提供实时解读、时间线演化、置信度评分与来源透明度，强调多 AI 交叉验证，定位分析工具。
Alphascope ：AI 驱动的预测市场情报引擎，提供实时信号、研究摘要与概率变化监控，整体仍处早期阶段，偏研究与信号支持。

警报/鲸鱼追踪

Stand: 明确定位鲸鱼跟单与高确信动作提醒。
Whale Tracker Livid ：将鲸鱼仓位变化产品化

套利发现工具：

ArbBets : AI 驱动的套利发现工具，聚焦于 Polymarket、Kalshi 及体育博彩市场，识别跨平台套利与正期望值（+EV）交易机会，定位于高频机会扫描层。
PolyScalping : 面向 Polymarket 的实时套利与剥头皮分析平台，支持每 60 秒全市场扫描、ROI 计算与 Telegram 推送，并可按流动性、价差与成交量等维度筛选机会，偏向主动交易者。
Eventarb : 轻量级跨平台套利计算与提醒工具，覆盖 Polymarket、Kalshi 与 Robinhood，功能聚焦、免费使用，适合作为基础套利辅助。
Prediction Hunt：跨交易所预测市场聚合与对比工具，提供 Polymarket、Kalshi 与 PredictIt 的实时价格比较与套利识别（约 5 分钟刷新），定位于信息对称与市场低效发现。

交易终端/聚合执行

Verso：获 YC Fall 2024 支持的机构级预测市场交易终端，提供 Bloomberg 风格界面，覆盖 Polymarket 与 Kalshi 的 15,000+ 合约实时追踪、深度数据分析与 AI 新闻情报，定位于专业与机构交易者。
Matchr：跨平台预测市场聚合与执行工具，覆盖 1,500+ 市场，通过智能路由实现最优价格撮合，并规划基于高概率事件、跨场套利与事件驱动的自动化收益策略，定位于执行与资金效率层。
TradeFox：由 Alliance DAO 与 CMT Digital 支持的专业预测市场聚合与 Prime Brokerage 平台，提供高级订单执行（限价单、止盈止损、TWAP）、自托管交易与多平台智能路由，定位机构级交易者，计划扩展至 Kalshi、Limitless、SxBet 等平台。

六、总结与展望

当前，预测市场智能体(Prediction Market Agent)正处于发展的早期探索阶段。

市场基础与本质演进：Polymarket与Kalshi已形成双寡头结构，围绕其构建智能体具备充分的流动性与场景基础。预测市场与赌博的核心区别在于正外部性，通过真实交易聚合分散信息，对现实事件进行公共定价，逐步演化为“全球真相层”。
核心定位：预测市场智能体应定位为可执行的概率资产管理工具，其核心任务是将新闻、规则文本与链上数据转化为可验证的定价偏差，并以更高纪律性、更低成本和跨市场能力执行策略。理想架构可抽象为信息、分析、策略与执行四层，但其实际可交易性高度依赖于结算的清晰度、流动性的质量以及信息的结构化程度。
策略选择与风控逻辑：从策略层面看，确定性套利（包括结算套利、概率守恒套利及跨平台价差交易）最适合由智能体自动化执行，而方向性投机仅可作为补充。在仓位管理上，应优先考虑可执行性与容错性，阶梯法结合固定仓位上限最适合。
商业模式与前景：商业化主要分为三层：基建层以数据执行基础设施获取稳定 B2B 收入，策略层通过第三方策略调用或分成变现，Agent/Vault 层在链上透明风控约束下参与实盘并收取管理费与绩效费。对应形态包括娱乐化入口、策略订阅/信号（当前最可行）及高门槛的 Vault 托管，“基建 + 策略生态 + 业绩参与”为更可持续路径。

尽管预测市场智能体（Prediction Market Agents）生态中已涌现出从底层框架到上层工具的多样化尝试，但在策略生成、执行效率、风险控制与商业闭环等关键维度上，目前尚未出现成熟、可复制的标准化产品，我们期待未来预测市场智能体的迭代与进化。

免责声明：本文在创作过程中借助了 ChatGPT-5.2, Gemini 3和Claude Opus 4.5等 AI 工具辅助完成，作者已尽力校对并确保信息真实与准确，但仍难免存在疏漏，敬请谅解。需特别提示的是，加密资产市场普遍存在项目基本面与二级市场价格表现背离的情况。本文内容仅用于信息整合与学术/研究交流，不构成任何投资建议，亦不应视为任何代币的买卖推荐。

Ethereum Repricing: From Rollup-Centric to Security Settlement Layer

BroadNotes by 0xjacobzhao — Mon, 16 Feb 2026 14:45:04 GMT

Author: Jacob Zhao, Jiawei

On February 3, 2026, Vitalik published a significant reflection on the Ethereum scaling roadmap on X. As the practical difficulties of Layer 2 evolving into a fully decentralized form are being re-evaluated, and with the mainnet’s own throughput expected to increase significantly in the coming years, the original assumption of relying solely on L2 for throughput scaling is being corrected. A new “Settlement-Service” collaborative paradigm is forming between L1 and L2: L1 focuses on providing the highest level of security, censorship resistance, and settlement sovereignty, while L2 evolves into “differentiated service providers” (such as privacy, AI, high-frequency trading). Ethereum’s strategic focus is returning to the mainnet itself, reinforcing its positioning as the world’s most trusted settlement layer. Scaling is no longer the sole objective; security, neutrality, and predictability are once again becoming Ethereum’s core assets.

Core Changes:

Ethereum is entering an “L1-First Paradigm”: With direct mainnet scaling and continuously decreasing fees, the original assumption relying on L2 to shoulder the core role of scaling no longer holds.
L2 is no longer “Branded Sharding,” but a Trust Spectrum: The progress of L2 decentralization is much slower than expected, making it difficult to uniformly inherit Ethereum’s security. Their role is being redefined as a spectrum of networks with different trust levels.
Ethereum’s core value is shifting from “Traffic” to “Settlement Sovereignty”: The value of ETH is no longer limited to Gas or Blob revenue, but lies in its institutional premium as the world’s most secure EVM settlement layer and native monetary asset.
Scaling strategy is adjusting towards protocol internalization: Based on continuous direct L1 scaling, the exploration of protocol-layer native verification and security mechanisms may reshape the security boundary and value capture structure between L1 and L2.
Valuation framework acts a structural migration: The weight of security and institutional credibility has risen significantly, while the weight of fees and platform effects has decreased. ETH’s pricing is shifting from a cash flow model to an asset premium model.

This article will analyze the paradigm shift in Ethereum’s pricing model and valuation reconstruction according to a layered approach: Facts (technological and institutional changes that have occurred), Mechanisms (impact on value capture and pricing logic), and Deductions (implications for allocation and risk-return).

I. Back to Origins: Ethereum Values

To understand the long-term value of Ethereum, the key lies not in short-term price fluctuations, but in its consistent design philosophy and value orientation.

Credible Neutrality: Ethereum’s core goal is not the maximization of efficiency or profit, but to become a set of credibly neutral infrastructure — with open rules, predictability, no favoritism towards any participant, no control by a single entity, and where anyone can participate without permission. The security of ETH and its on-chain assets ultimately depends on the protocol itself, not on any institutional credit.
Ecosystem First, Not Revenue First: Multiple key upgrades of Ethereum reflect a consistent decision-making logic — actively foregoing short-term protocol revenue in exchange for lower usage costs, larger ecosystem scale, and stronger system resilience. Its goal is not to “collect tolls,” but to become the irreplaceable neutral settlement and trust foundation in the digital economy.
Decentralization as a Means: The mainnet focuses on the highest level of security and finality, while Layer 2 networks are located on a connection spectrum with varying degrees to the mainnet: some inherit mainnet security and pursue efficiency, while others position themselves with differentiated functions. This enables the system to serve both global settlement and high-performance applications simultaneously, rather than L2s being “Branded Shards.”
Long-Termist Technical Route: Ethereum adheres to a slow but certain evolutionary path, prioritizing system security and credibility. From the PoS transition to subsequent scaling and confirmation mechanism optimizations, its roadmap pursues sustainable, verifiable, and irreversible correctness.

Security Settlement Layer: Refers to the Ethereum mainnet providing irreversible Finality services for Layer 2 and on-chain assets through decentralized validator nodes and consensus mechanisms.

This positioning as a Security Settlement Layer marks the establishment of “Settlement Sovereignty.” It is a transition for Ethereum from a “Confederation” to a “Federation,” representing the “Constitutional Moment” of the establishment of the Ethereum digital nation, and a significant upgrade to Ethereum’s architecture and core.

After the American Revolutionary War, under the Articles of Confederation, the 13 states were like a loose alliance. Each state printed its own currency and levied tariffs on others. Every state was free-riding: enjoying common defense but refusing to pay; enjoying the alliance’s brand but acting independently. This structural problem led to reduced national credit and an inability to unify foreign trade, severely hindering the economy.

1787 was America’s “Constitutional Moment.” The new Constitution granted the federal government three key powers: the power to tax directly, the power to regulate interstate commerce, and the power to unify currency. But what truly brought the federal government “to life” was Hamilton’s economic plan of 1790: the federal assumption of state debts, repayment at face value to rebuild national credit, and the establishment of a National Bank as a financial hub. A unified market released economies of scale, national credit attracted more capital, and infrastructure construction gained financing capability. The US moved from 13 mutually guarded small states to become the world’s largest economy.

Today’s structural dilemma in the Ethereum ecosystem is exactly the same.

Each L2 is like a “Sovereign State,” with its own user base, liquidity pool, and governance token. Liquidity is fragmented, cross-L2 interaction friction is high, and L2s enjoy Ethereum’s security layer and brand without being able to return value to L1. Locking liquidity on their own chain is short-term rational for each L2, but if all L2s do this, the core competitive advantage of the entire Ethereum ecosystem is lost.

The roadmap Ethereum is currently advancing is essentially its constitution-making and the establishment of a central economic system, that is, the establishment of “Settlement Sovereignty”:

Native Rollup Precompile = Federal Constitution. L2s can freely build differentiated functions outside the EVM, while the EVM part can obtain Ethereum-level security verification through native precompiles. Not connecting is an option, but the cost is losing trustless interoperability with the Ethereum ecosystem.
Synchronous Composability = Unified Market. Through mechanisms like Native Rollup Precompiles, trustless interoperability and synchronous composability between L2s and between L2 and L1 are becoming possible. This directly eliminates “interstate trade barriers,” and liquidity is no longer trapped in respective silos.
L1 Value Capture Reconstruction = Federal Taxing Power. When all critical cross-L2 interactions return to L1 for settlement, ETH re-becomes the settlement hub and trust anchor for the entire ecosystem. Whoever controls the settlement layer captures the value.

Ethereum is using a unified settlement and verification system to turn a fragmented L2 ecosystem into an irreplaceable “Digital Nation.” This is a historical inevitability. Of course, the transition process may be slow, but history tells us that once this transition is complete, the released network effects will far exceed the linear growth of the fragmentation era. The US used a unified economic system to turn 13 small states into the world’s largest economy. Ethereum will also transform a loose L2 ecosystem into the largest Security Settlement Layer, and even a global financial carrier.

Ethereum Core Upgrade Roadmap & Valuation Impact (2025–2026)

II. Valuation Misconceptions: Why Ethereum Should Not Be Viewed as a “Tech Company”

Applying traditional corporate valuation models (P/E, DCF, EV/EBITDA) to Ethereum is essentially a category error. Ethereum is not a company aiming for profit maximization, but an open digital economic infrastructure. Corporations pursue shareholder value maximization, while Ethereum pursues the maximization of ecosystem scale, security, and censorship resistance. To achieve this goal, Ethereum has repeatedly actively suppressed protocol revenue (e.g., via EIP-4844 introducing Blob DA to structurally lower L2 data publishing costs and suppress L1 revenue from rollup data) — which approximates “revenue self-destruction” from a corporate perspective, but from an infrastructure perspective, is sacrificing short-term fees for long-term neutrality premium and network effects.

A more reasonable framework is to view Ethereum as a globally neutral settlement and consensus layer: providing security, finality, and trusted coordination for the digital economy. ETH’s value is reflected across multiple structural demands — rigid demand for final settlement, the scale of on-chain finance and stablecoins, the impact of staking and burning mechanisms on supply, and long-term, sticky capital brought by institutional adoption such as ETFs, corporate treasuries, and RWAs.

III. Paradigm Restructuring: Finding the Pricing Anchor Beyond Cash Flow

The ethval.com launched by the Hashed team at the end of 2025 provided a detailed set of reproducible quantitative models for Ethereum, but traditional static models struggle to capture the dramatic pivot in Ethereum’s narrative in 2026. Therefore, we reused their systematic, transparent, and reproducible underlying models (covering yield, money, network effects, and supply structure), but reshaped the valuation architecture and weighting logic:

Structural Restructuring: Mapping models to four value quadrants: “Security, Money, Platform, Revenue,” aggregated for pricing.
Weight Rebalancing: Significantly increasing the weight of security and settlement premium, weakening the marginal contribution of protocol revenue and L2 expansion.
Risk Control Overlay: Introducing a circuit breaker mechanism sensing macro and on-chain risks, making the valuation framework adaptable across cycles.
Removing “Circular Reasoning”: Models containing current price inputs (like Staking Scarcity, Liquidity Premium) are no longer used as fair value anchors, but retained only as indicators for position and risk appetite adjustment.

Note: The following models are not for precise point prediction, but to depict the relative pricing direction of different value sources in different cycles.

1. Security Settlement Layer: Core Value Anchor (45%, Increased in Risk-Off)

We view the security settlement layer as Ethereum’s most core source of value and assign it a 45% benchmark weight; this weight is further increased during periods of rising macro uncertainty or declining risk appetite. This judgment stems from Vitalik’s latest definition of “truly scaling Ethereum”: the essence of scaling is not increasing TPS, but creating block space fully backed by Ethereum itself. Any high-performance execution environment relying on external trust assumptions does not constitute an extension of the Ethereum entity.

Under this framework, ETH’s value is mainly reflected as the credit premium of a global sovereign-less settlement layer, rather than protocol revenue. This premium is jointly supported by structural factors such as validator scale and degree of decentralization, long-term security record, institutional adoption, clarity of compliance paths, and protocol-endogenous Rollup verification mechanisms.

In specific pricing, we mainly use two complementary methods: Validator Economics (Yield Equilibrium Mapping) and Staking DCF (Perpetual Staking Discount), to jointly depict the institutional premium of ETH as the “Global Secure Settlement Layer.”

Validator Economics (Yield Equilibrium Pricing): Based on the ratio of annualized staking cash flow per ETH to the target real yield, deriving a theoretical fair price. This expression is used to depict the equilibrium relationship between yield and price, serving as a directional relative valuation tool rather than an independent pricing model.
Staking DCF (Perpetual Staking Discount): Viewing ETH as a long-term asset capable of generating sustainable real staking yields, discounting its cash flow in perpetuity. Essentially, this value layer does not benchmark against the revenue capability of platform companies, but is similar to the settlement credit of a global clearing network.

2. Monetary Attribute: Settlement and Collateral (35%, Dominant in Utility Expansion)

We view the monetary attribute as Ethereum’s second core source of value and assign it a 35% benchmark weight, becoming the main utility anchor in neutral markets or during on-chain economic expansion. This judgment is not based on the narrative that “ETH equals USD,” but on its structural role as the native settlement fuel and ultimate collateral asset of the on-chain financial system. The security of stablecoin circulation, DeFi liquidation, and RWA settlement all rely on the settlement layer supported by ETH.

For pricing, we use an extended form of the Quantity Theory of Money (MV = PQ), but model ETH’s usage scenarios in layers to address the order-of-magnitude differences in circulation velocity across different scenarios:

High-Frequency Settlement Layer (Gas Payment, Stablecoin Transfers)

M_transaction = Annual Transaction Settlement Volume / V_high
V_high ≈ 15–25 (Referencing historical on-chain data)

Medium-Frequency Financial Layer (DeFi Interaction, Lending Liquidation)

M_defi = Annual DeFi Settlement Volume / V_medium
V_medium ≈ 3–8 (Based on mainstream DeFi protocol capital turnover rate)

Low-Frequency Collateral Layer (Staking, Restaking, Long-term Locking)

M_collateral = Total ETH Collateral Value × (1 + Liquidity Premium)
Liquidity Premium = 10–30% (Reflecting compensation for liquidity sacrifice)

3. Platform / Network Effect: Growth Option (10%, Bull Market Amplifier)

Platform and network effects are viewed as growth options in Ethereum’s valuation, assigned only a 10% weight, used to explain the non-linear premium brought by ecosystem expansion during bull market phases. We use a trust-corrected Metcalfe model to avoid weighting L2 assets of different security levels equally in the valuation.

4. Revenue Asset: Cash Flow Floor (10%, Bear Market Bottom)

We view protocol revenue as the cash flow floor in the Ethereum valuation system, rather than a growth engine, also assigning a 10% weight. This layer mainly functions during bear markets or extreme risk phases to depict the valuation lower limit.

Gas and Blob fees provide the minimum operating cost for the network and affect the supply structure through EIP-1559. For valuation, we use Price-to-Sales (P/S) and Fee Yield models, taking the conservative value among them, serving only as a bottom reference. As the mainnet continues to scale, the relative importance of protocol revenue declines, with its core role reflected as a safety margin during downturns.

Price-to-Sales Model (P/S Floor): ETH Price (PS) = M_PS / Circulating Supply
Fee Yield Model: ETH Price(Yield) = M_Yield / Circulating Supply
Cash Flow Floor Pricing (Minimum Value Principle): P_Revenue_Floor = min(P_PS , P_Yield)

IV. Dynamic Calibration: Macro Constraints and Cycle Adaptation

If the previous text established Ethereum’s “intrinsic value pivot,” this chapter introduces an “external environment adaptation system” independent of fundamentals. Valuation cannot operate in a vacuum and must be constrained by three major external factors: Macro Environment (Cost of Capital), Market Structure (Relative Strength), and On-Chain Sentiment (Crowdedness). Based on this, we constructed a Regime Adaptation mechanism to dynamically adjust valuation weights across different cycles — releasing option premiums during loose periods and retreating to the revenue floor during risk-off periods, thereby achieving a leap from static models to dynamic strategies. (Note: Due to space limitations, this article only presents the core logical framework of this mechanism.)

V. The Conditional Path for the Institutional Second Curve

The analysis above is based on internal crypto technical, valuation, and cycle logic. This chapter discusses a problem at a different level: When ETH is no longer priced solely by crypto-native funds but is gradually integrated into the traditional financial system, how will its pricing power, asset attributes, and risk structure change? The “Institutional Second Curve” is not an extension of existing logic, but a redefinition of Ethereum by exogenous forces:

Change in Asset Attribute (Beta → Carry): Spot ETH ETFs solve compliance and custody issues, essentially still being price exposure; while the future advancement of Staking ETFs introduces on-chain yields into the institutional system via compliant carriers for the first time. ETH thus shifts from a “non-interest-bearing high-volatility asset” to an “allocation asset with predictable yield,” expanding potential buyers from trading funds to pension, insurance, and long-term accounts sensitive to yield and duration.
Change in Usage (Holding → Using): Institutions may no longer just view ETH as a tradable ticker, but start using it as settlement and collateral infrastructure. Whether it’s JPMorgan’s tokenized funds or the deployment of compliant stablecoins and RWAs on Ethereum, it indicates demand for ETH is shifting from “Holding Demand” to “Running Demand” — institutions not only hold ETH but use it for settlement, clearing, and risk management.
Change in Tail Risk (Uncertainty → Pricing): As stablecoin regulatory frameworks (like the GENIUS Act) are gradually established, and with increased transparency in Ethereum’s roadmap and governance, the regulatory and technical uncertainties most sensitive to institutions are being systematically compressed. This means uncertainty starts being priced in, rather than avoided.

The so-called “Institutional Second Curve” is a change in the nature of demand, providing a real demand source for the “Security Settlement Layer + Monetary Attribute” valuation logic, driving ETH to transition from a sentiment-driven speculative asset to a foundational asset carrying both allocation and functional needs.

VI. Conclusion: Value Anchoring in the Darkest Hour

In the past week, the industry has undergone a severe deleveraging wash, with market sentiment dropping to freezing point — undoubtedly a “darkest hour” for the crypto world. Pessimism is spreading among practitioners, and Ethereum, as the asset most representative of the crypto spirit, is also in the eye of the storm of controversy.

However, as rational observers, we need to pierce through the fog of panic: What Ethereum is currently experiencing is not a “collapse of value,” but a profound “migration of pricing anchor.” With L1 scaling advancing directly, L2 being redefined as a network spectrum of different trust levels, and protocol revenue actively giving way to system security and neutrality, ETH’s pricing logic has structurally shifted to “Security Settlement Layer + Native Monetary Attribute.”

Against the backdrop of high macro real interest rates, liquidity not yet being loose, and on-chain growth options not yet permitted to be priced by the market, ETH’s price naturally converges to a structural value range supported by settlement certainty, verifiable yield, and institutional consensus. This range is not a sentiment bottom, but a value pivot after stripping away platform growth premiums.

As long-term builders of the Ethereum ecosystem, we refuse to be “mindless bulls” for ETH. We hope to use a rigorous logical framework to carefully demonstrate our prediction: Only when macro liquidity, risk appetite, and network effects simultaneously meet market state trigger conditions will higher valuations be re-factored in by the market.

Therefore, for long-term investors, the critical question now is not anxiously asking “Can Ethereum still go up,” but to clearly recognize — in the current environment, which layer of core value are we buying at a “floor price”?

Disclaimer: This article was assisted by AI tools such as ChatGPT-5.2, Gemini 3, and Claude Opus 4.5 during the creation process. The author has made every effort to proofread and ensure the information is true and accurate, but omissions are inevitable, and we ask for your understanding. It should be specially noted that the crypto asset market universally experiences deviations between project fundamentals and secondary market price performance. The content of this article is for information consolidation and academic/research exchange only, does not constitute any investment advice, and should not be considered as a recommendation for any token.

以太坊再定价：从 Rollup-Centric 到“安全性结算层”

BroadNotes by 0xjacobzhao — Tue, 10 Feb 2026 04:25:02 GMT

作者：Jacob Zhao, Jiawei, Turbo @ IOSG Ventures

2026 年 2 月 3 日，Vitalik 在 X 上发表了关于以太坊扩容路线的重要反思。随着 Layer 2 向完全去中心化形态演进的现实难度被重新认识，同时主网自身吞吐能力在未来数年内预计大幅提升，单纯依赖 L2 进行吞吐量扩容的原始设想正在修正，L1 与 L2 正在形成新的‘结算-服务’协同范式： L1 专注于提供最高等级的安全性、抗审查性与结算主权，而 L2 则向‘差异化服务商’演进（如隐私、AI、高频交易），以太坊的战略重心正回归主网本身，强化其作为全球最可信结算层的定位。扩容不再是唯一目标，安全性、中立性与可预测性，重新成为以太坊的核心资产。

核心变化：

以太坊正在进入“L1 优先范式”： 随着主网直接扩展、费用持续下降，依赖 L2 承担规模化核心角色的原始假设已不再成立。
L2 不再是“品牌分片”，而是信任光谱： L2 去中心化推进远慢于预期，难以统一继承以太坊安全性，其角色正被重新定义为不同信任级别的网络光谱。
以太坊的核心价值从“流量”转向“结算主权”： ETH 的价值不再限于 Gas 或 Blob 收入，而在于其作为全球最安全 EVM 结算层与原生货币资产的制度性溢价。
扩展策略正在向协议内生化调整： 在 L1 持续直接扩展的基础上，协议层原生验证与安全机制的探索，可能重塑 L1–L2 的安全边界与价值捕获结构。
估值框架发生结构性迁移： 安全性与机构可信度权重显著上升，手续费与平台效应权重下降，ETH 的定价正从现金流模型转向资产溢价模型。

本文将依照事实（已发生的技术与制度变化）、机制（对价值捕获与定价逻辑的影响）、推演（对配置与风险回报的含义）的分层对以太坊定价模型的范式转变与估值重构展开分析。

一、原点回归：以太坊价值观

理解以太坊的长期价值，关键不在短期价格波动，而在于其始终如一的设计理念与价值取向。

可信中立性：以太坊的核心目标并非效率或利润最大化，而是成为一套可信中立的基础设施 — — 规则公开、可预测，不偏袒任何参与者，不受单一主体控制，任何人均可无需许可地参与。ETH 及其链上资产的安全性，最终依赖的是协议本身，而非任何机构信用。
生态优先非收入优先：以太坊多次关键升级体现出一致的决策逻辑 — — 主动放弃短期协议收入，以换取更低的使用成本、更大的生态规模与更强的系统韧性。其目标不是“收取过路费”，而是成为数字经济中不可替代的中立结算与信任底座。
去中心化作为手段：主网专注于最高等级的安全性与最终性，而 Layer 2 网络位于与主网不同程度的连接光谱上：有的继承主网安全性并追求效率，有的则以差异化功能为价值定位。使系统能够同时服务全球结算与高性能应用，而非 L2 “品牌分片”。
长期主义技术路线：以太坊坚持慢而确定的演进路径，优先保障系统安全与可信度。从 PoS 转型到后续扩容与确认机制优化，其路线图追求可持续、可验证、不可逆的正确性。

安全性结算层 (Security Settlement Layer)： 指以太坊主网通过去中心化验证节点和共识机制，为 Layer 2 及链上资产提供不可逆转的最终性（Finality）服务。

这种安全性结算层的定位，标志了“结算主权”的建立，是以太坊从“邦联制”转向“联邦制” 的转变，是以太坊数字国家建立的 “宪法时刻”，更是以太坊架构与核心的重要升级。

美国独立战争以后，在邦联制的条款下，13个州像是一个松散联盟，各州各印各的货币、互相征收关税，每个州都在搭便车：享受共同国防，却拒绝缴费；享受联盟的品牌，却各自为政。这个结构性的问题导致国家信用降低，并且无法统一对外贸易，严重阻碍经济。

1787年是美国的“宪法时刻”，新宪法赋予联邦政府三项关键权力：直接征税权、州际贸易管制权、统一货币权。但真正让联邦政府”活过来”的是汉密尔顿1790年的经济方案，联邦承担各州债务、按面值兑付重建国家信用、建立国家银行作为金融中枢。统一市场释放了规模效应，国家信用吸引了更多资本，基础设施建设获得了融资能力。美国从13个互相设防的小邦，走向了世界第一大经济体。

今天的以太坊生态的结构性困境完全一致。

每条L2就像一个”主权州”，各自有自己的用户群、流动性池和治理代币。流动性被切割成碎片，跨L2交互摩擦大，L2享受以太坊的安全层和品牌却无法回馈L1价值。每条L2把流动性锁在自己链上是短期理性的，但所有L2都这样做就导致整个以太坊生态的最核心的竞争优势丧失。

以太坊现在推进的路线图，本质上就是它的制宪和建立中央经济系统，也就是建立“结算主权”：

原生Rollup预编译（Native Rollup Precompile）= 联邦宪法。 L2可以在EVM之外自由构建差异化功能，而EVM部分可以通过原生预编译获得以太坊级别的安全验证。不接入当然也可以，但代价是失去与以太坊生态的免信任互操作性。
同步可组合性（Synchronous Composability）= 统一市场。 通过原生Rollup预编译等机制，L2之间、L2与L1之间的免信任互操作和同步可组合性正在成为可能，这直接消除了”州际贸易壁垒”，流动性不再被困在各自的孤岛上。
L1价值捕获重建 = 联邦征税权。 当所有关键的跨L2交互都回归L1结算时，ETH重新成为整个生态的结算中枢和信任锚点。谁控制结算层，谁就捕获价值。

以太坊正在用统一的结算和验证体系，把碎片化的L2生态变成一个不可替代的“数字国家”，这是一个历史必然。当然，转变的过程可能缓慢，而历史告诉我们，这个转变一旦完成，释放出的网络效应将远超碎片化时代的线性增长。美国用统一的经济系统把13个小邦变成了世界第一大经济体。以太坊也将把松散的L2生态转化成最大的安全性结算层，乃至全球金融载体。

以太坊核心升级路线图与估值影响 (2025–2026)

二、估值误区：为何不应将以太坊视为“科技公司”

将传统企业估值模型（P/E、DCF、EV/EBITDA）套用于以太坊，本质上是一种类别错误。以太坊并非以利润最大化为目标的公司，而是一套开放的数字经济基础设施。企业追求股东价值最大化，而以太坊追求的是生态规模、安全性与抗审查性的最大化。为实现这一目标，以太坊多次主动压低协议收入（如通过EIP-4844 通过引入 Blob DA，结构性下移 L2 数据发布成本，并压低 L1 来自 rollup 数据的费用收入） — — 在公司视角下近似“收入自毁”，但在基础设施视角下，则是以牺牲短期费用换取长期的中立性溢价与网络效应。

更合理的理解框架，是将以太坊视为全球中立的结算与共识层：为数字经济提供安全性、最终性与可信协调。ETH 的价值体现在多重结构性需求之上 — — 最终结算的刚性需求、链上金融与稳定币规模、质押与销毁机制对供给的影响，以及 ETF、企业财库与 RWA 等机构级采用所带来的长期、粘性资金。

三、范式重构：寻找现金流之外的定价锚

2025年底 Hashed团队推出的 ethval.com 为以太坊提供了详尽的可复现量化模型集合，但传统的静态模型难以捕捉 2026 年以太坊叙事的剧烈转折。因此，我们复用了其系统性、透明且可复现的底层模型（涵盖收益、货币、网络效应与供给结构），在估值架构与权重逻辑上进行了重塑：

结构重构： 将模型映射至“安全性、货币、平台、收入”四大价值象限，分类加总定价。
权重再平衡： 显著上调安全性与结算溢价权重，弱化协议收入与 L2 扩张的边际贡献。
风控叠加层： 引入宏观与链上风险感知的熔断机制，使估值框架具备跨周期适应性。
剔除“循环论证”：对含现价输入的模型（如 Staking Scarcity、Liquidity Premium）不再作为公允价值锚，仅保留其作为仓位与风险偏好调节指标。

注：下述模型并非用于精确点位预测，而用于刻画不同价值来源在不同周期中的相对定价方向

1. 安全性结算层：核心价值锚（45%，避险期上调）

我们将安全性结算层视为以太坊最核心的价值来源，并赋予其 45% 的基准权重；在宏观不确定性上升或风险偏好回落阶段，该权重进一步上调。这一判断源于 Vitalik 对“真正扩展以太坊”的最新界定：扩容的本质不是提升 TPS，而是创造由以太坊本身完全背书的区块空间。任何依赖外部信任假设的高性能执行环境，都不构成对以太坊本体的扩展。

在此框架下，ETH 的价值主要体现为全球无主权结算层的信用溢价，而非协议收入。该溢价由验证者规模与去中心化程度、长期安全记录、机构级采用、合规路径清晰度，以及协议内生 Rollup 验证机制等结构性因素共同支撑。

在具体定价上，我们主要采用两种互补的方法：Validator Economics（收益均衡映射）与 Staking DCF（永续质押折现），共同刻画 ETH 作为“全球安全结算层”的制度性溢价。

Validator Economics（收益均衡定价）：基于每枚ETH的年化质押现金流与目标真实收益率的比值，推导理论公允价格：

Fair Price = (Annual Staking Cash Flow per ETH) / Target Real Yield

该表达用于刻画收益与价格的均衡关系，作为方向性相对估值工具，而非独立定价模型。

Staking DCF（永续质押折现）：将 ETH 视为一项可持续产生真实质押收益的长期资产，对其现金流进行永续折现：

M_staking = Total Real Staking Cash Flow / (Discount Rate − Longterm Growth Rate)

ETH Price (staking) = M_staking / Circulating Supply

从本质上看，这一价值层并非对标平台型公司的收入能力，而是类似全球清算网络的结算信用。

2. 货币属性：结算与抵押（35%，效用扩张期主导）

我们将货币属性视为以太坊第二核心的价值来源，并赋予其 35% 的基准权重，在中性市场或链上经济扩张阶段成为主要效用锚。这一判断并非基于“ETH 等同于美元”的叙事，而在于其作为链上金融体系的原生结算燃料与最终抵押资产的结构性角色。稳定币流转、DeFi 清算与 RWA 结算的安全性，均依赖 ETH 所支撑的结算层。

定价上，我们采用货币数量论的扩展形式（MV = PQ），但将ETH的使用场景分层建模，以应对不同场景下流通速度的数量级差异分层货币需求模型：

高频结算层（Gas支付、稳定币转账）

M_transaction = Annual Transaction Settlement Volume / V_high
V_high ≈ 15–25（参考历史链上数据）

中频金融层（DeFi交互、借贷清算）

M_defi = Annual DeFi Settlement Volume / V_medium
V_medium ≈ 3–8（基于主流DeFi协议资金周转率）

低频抵押层（质押、再质押、长期锁仓）

M_collateral = Total ETH Collateral Value × (1 + Liquidity Premium)
Liquidity Premium = 10–30%（反映流动性牺牲的补偿）

3. 平台 / 网络效应：增长期权（10%，牛市放大器）

平台与网络效应被视为以太坊估值中的增长期权，仅赋予 10% 权重，用于解释牛市阶段生态扩张带来的非线性溢价。我们采用经信任修正的梅特卡夫模型，避免将不同安全级别的 L2 资产等权计入估值：

梅特卡夫模型： M_network = a × (Active Users)^b + m × Σ (L2 TVL_i × TrustScore_i)
平台/网络效应估值价格：ETH Price(network) = M_network / Circulating Supply

4. 收入资产：现金流地板（10%，熊市托底）

我们将协议收入视为以太坊估值体系中的现金流地板，而非增长引擎，同样赋予 10% 权重。该层主要在熊市或极端风险阶段发挥作用，用于刻画估值下限。

Gas 与 Blob 费用为网络提供最低运作成本，并通过 EIP-1559 影响供给结构。估值上，我们采用市销率与费用收益率模型，并取其中的保守值，仅作为底部参考。随着主网持续扩容，协议收入的重要性相对下降，其核心作用体现在下行阶段的安全边际。

市销率模型（P/S Floor）：M_PS = Annual Protocol Revenue × P/S_multiple
市销率估值价格：ETH Price (PS) = M_PS / Circulating Supply
费用收益率模型：M_Yield = Annual Protocol Revenue / Target Fee Yield
费用收益估值价格：ETH Price(Yield) = M_Yield / Circulating Supply
现金流地板定价（取两者极小值）：P_Revenue_Floor = min(P_PS , P_Yield)

四、动态校准：宏观约束与周期适配

如果说前文确立了以太坊的“内在价值中枢”，本章则引入一套独立于基本面的“外在环境适配系统”。估值无法真空运行，必须受制于宏观环境（资金成本）、市场结构（相对强弱）与链上情绪（拥挤度）三大外部约束。基于此，我们构建了状态适配（Regime Adaptation）机制，在不同周期动态调整估值权重 — — 宽松期释放期权溢价，避险期退守收入地板，从而实现从静态模型到动态策略的跨越。（注：限于篇幅，本文仅展示该机制的核心逻辑框架。）

五、机构化第二曲线的条件路径

前文分析均基于加密体系内部的技术、估值与周期逻辑，而本章讨论的是一个不同层级的问题：当 ETH 不再仅由加密原生资金定价，而被逐步纳入传统金融体系，其定价权、资产属性与风险结构将如何变化。机构化第二曲线并非对既有逻辑的延伸，而是外生力量对以太坊的再定义：

资产属性的变化（Beta → Carry）：现货 ETH ETF 解决的是合规与托管问题，本质仍是价格暴露；而未来Staking ETF 的推进，首次将链上收益通过合规载体引入机构体系。ETH 由此从“无息高波动资产”转向“具备可预期收益的配置型资产”，潜在买家从交易型资金扩展至对收益与久期敏感的养老金、保险及长期账户。
使用方式的变化（Holding → Using）：如果机构不再仅将 ETH 视为可交易标的，而是开始将其作为结算与抵押基础设施使用。无论是 JPMorgan 的代币化基金，还是合规稳定币与 RWA 在以太坊上的部署，都表明 ETH 的需求正从“持有需求”转向“运行需求” — — 机构不仅持有 ETH，更在其上完成结算、清算与风险管理。
尾部风险的变化（Uncertainty → Pricing）： 随着稳定币监管框架（如 GENIUS Act）未来逐步确立，以及以太坊路线图与治理透明度提升，机构最为敏感的监管与技术不确定性正在被系统性压缩，意味着不确定性开始被定价，而非被回避。

所谓“机构化第二曲线”是 需求性质的改变，为“安全性结算层 + 货币属性”的估值逻辑提供了真实需求来源，推动 ETH 从以情绪驱动的投机资产过渡为同时承载配置性与功能性需求的基础资产。

六、结语：至暗时刻的价值锚定

过去一周，行业经历了剧烈的去杠杆化洗礼，市场情绪降至冰点，这无疑是加密世界的“至暗时刻”。悲观情绪在从业者中蔓延，而作为最能代表加密精神的资产标的，以太坊亦处于争议的风暴眼中。

然而，作为理性的观察者，我们需要穿透恐慌的迷雾：以太坊当前所经历的，并非“价值的坍塌”，而是一次深刻的“定价锚迁移”。随着 L1 扩容直接推进、L2 被重新界定为不同信任等级的网络光谱，以及协议收入主动让位于系统安全与中立性，ETH 的定价逻辑已结构性转向“安全性结算层 + 原生货币属性”。

在宏观真实利率高位、流动性尚未宽松、链上增长期权暂未被市场允许定价的背景下，ETH 的价格自然收敛至由结算确定性、可验证收益与机构共识支撑的结构性价值区间。这一区间并非情绪底，而是在剥离平台型增长溢价后的价值中枢。

作为以太坊生态的长期建设者，我们拒绝做 ETH 的“无脑多头”。我们希望通过严谨的逻辑框架，审慎地论证我们的预判：只有当宏观流动性、风险偏好与网络效应同时满足市场状态的触发条件时，更高的估值才会被市场重新计入。

因此，对于长线投资者而言，当下的关键问题不再是焦虑地追问“以太坊还能不能涨”，而是要清醒地认识到 — — 在当前环境下，我们正在以“地板价”买入哪一层核心价值？

Noya.ai: Agents in Prediction Markets

BroadNotes by 0xjacobzhao — Mon, 05 Jan 2026 06:31:26 GMT

Author: 0xjacobzhao | https://linktr.ee/0xjacobzhao

In our previous Crypto AI series research reports, we have consistently emphasized the view that the most practical application scenarios in the current crypto field are mainly concentrated in stablecoin payments and DeFi, while Agents are the key interface for the AI industry facing users. Therefore, in the trend of Crypto and AI integration, the two most valuable paths are: AgentFi, based on existing mature DeFi protocols (basic strategies like lending and liquidity mining, as well as advanced strategies like Swap, Pendle PT, and funding rate arbitrage) in the short term; and Agent Payment, centering on stablecoin settlement and relying on protocols such as ACP/AP2/x402/ERC-8004 in the medium to long term.

Prediction markets have become an undeniable new industry trend in 2025, with their total annual trading volume surging from approximately $9 billion in 2024 to over $40 billion in 2025, achieving a year-over-year growth of more than 400%. This significant growth is driven by multiple factors: uncertainty demand brought by macro-political events (such as the 2024 US election), the maturity of infrastructure and trading models, and the thawing of the regulatory environment (Kalshi’s lawsuit victory and Polymarket’s return to the US). Prediction Market Agents are showing early embryonic forms in early 2026 and are poised to become a continuously emerging product form in the agent field over the coming year.

I. Prediction Markets: From Betting to Truth Layer

A prediction market is a financial mechanism for trading on the outcomes of future events. Contract prices essentially reflect the market’s collective judgment on the probability of an event occurring. Its effectiveness stems from the combination of crowd wisdom and economic incentives: in an environment of anonymous, real-money betting, scattered information is quickly integrated into price signals weighted by financial willingness, thereby significantly reducing noise and false judgments.

By the end of 2025, prediction markets have basically formed a duopoly dominated by Polymarket and Kalshi. According to Forbes, the total trading volume in 2025 reached approximately $44 billion, with Polymarket contributing about $21.5 billion and Kalshi about $17.1 billion. Relying on its legal victory in the previous election contract case, its first-mover compliance advantage in the US sports prediction market, and relatively clear regulatory expectations, Kalshi has achieved rapid expansion. Currently, the development paths of the two have shown clear differentiation:

Polymarket adopts a mixed CLOB architecture with “off-chain matching, on-chain settlement” and a decentralized settlement mechanism, building a globalized, non-custodial high-liquidity market. After returning to the US with compliance, it formed an “onshore + offshore” dual-track operating structure.
Kalshi integrates into the traditional financial system, accessing mainstream retail brokerages via API, attracting Wall Street market makers to participate deeply in macro and data-type contract trading. Its products are constrained by traditional regulatory processes, and long-tail demands and sudden events lag relatively behind.

Apart from Polymarket and Kalshi, other competitive players in the prediction market field are developing mainly along two paths:

First is the compliance distribution path, embedding event contracts into the existing account systems of brokerages or large platforms, relying on channel coverage, clearing capabilities, and institutional trust to build advantages (e.g., ForecastTrader by Interactive Brokers and ForecastEx, and FanDuel Predicts by FanDuel and CME).
Second is the on-chain performance and capital efficiency path. Taking the Solana ecosystem’s perpetual contract DEX Drift as an example, it added a prediction market module B.E.T (prediction markets) on top of its original product line.

The two paths — traditional financial compliance entry and crypto-native performance advantages — together constitute the diversified competitive landscape of the prediction market ecosystem.

Prediction markets appear similar to gambling on the surface and are essentially zero-sum games. However, the core difference lies not in the form, but in whether they possess positive externalities: aggregating scattered information through real-money trading to publicly price real-world events, forming a valuable signal layer. Despite limitations such as entertainment-focused participation, the trend is shifting from gaming to a “Global Truth Layer” — with the access of institutions like CME and Bloomberg, event probabilities have become decision-making metadata that can be directly called by financial and enterprise systems, providing a more timely and quantifiable market-based truth.

II. Prediction Agents: Architecture, Business, Strategy

Currently, Prediction Market Agents are entering an early practice stage. Their value lies not in “AI predicting more accurately,” but in amplifying information processing and execution efficiency in prediction markets. The essence of a prediction market is an information aggregation mechanism, where price reflects the collective judgment of event probability; market inefficiencies in reality stem from information asymmetry, liquidity, and attention constraints. The reasonable positioning of a Prediction Market Agent is Executable Probabilistic Portfolio Management: converting news, rule texts, and on-chain data into verifiable pricing deviations, executing strategies in a faster, more disciplined, and lower-cost manner, and capturing structural opportunities through cross-platform arbitrage and portfolio risk control.

An ideal Prediction Market Agent can be abstracted into a four-layer architecture:

Information Layer: Aggregates news, social media, on-chain, and official data.
Analysis Layer: Uses LLMs and ML to identify mispricing and calculate Edge.
Strategy Layer: Converts Edge into positions through the Kelly criterion, staggered entry, and risk control.
Execution Layer: Completes multi-market order placement, slippage and Gas optimization, and arbitrage execution, forming an efficient automated closed loop.

The ideal business model design for Prediction Market Agents has different exploration spaces at different levels:

Bottom Infrastructure Layer: Provides multi-source real-time data aggregation, Smart Money address libraries, unified prediction market execution engines, and backtesting tools. Charges B2B/B2D fees to obtain stable revenue unrelated to prediction accuracy.
Middle Strategy Layer: Precipitates modular strategy components and community-contributed strategies in an open-source or Token-Gated manner, forming a composable strategy ecosystem and achieving value capture.
Top Agent Layer: Directly runs live trading through trusted managed Vaults, realizing capabilities with transparent on-chain records and a 20–30% performance fee (plus a small management fee).

The ideal Prediction Market Agent is closer to an “AI-driven probabilistic asset management product,” gaining returns through long-term disciplined execution and cross-market mispricing gaming, rather than relying on single-time prediction accuracy. The core logic of the diversified revenue structure of “Infrastructure Monetization + Ecosystem Expansion + Performance Participation” is that even if Alpha converges as the market matures, bottom-layer capabilities such as execution, risk control, and settlement still have long-term value, reducing dependence on the single assumption that “AI consistently beats the market.”

Prediction Market Agent Strategy Analysis:

Theoretically, Agents have advantages in high-speed, 24/7, and emotion-free execution. However, in prediction markets, this is often difficult to convert into sustainable Alpha. Its effective application is mainly limited to specific structures, such as automated market making, cross-platform mispricing capture, and information integration of long-tail events. These opportunities are scarce and constrained by liquidity and capital.

Market Selection: Not all prediction markets have tradable value. Participation value depends on five dimensions: settlement clarity, liquidity quality, information advantage, time structure, and manipulation risk. It is recommended to prioritize the early stages of new markets, long-tail events with few professional players, and fleeting pricing windows caused by time zone differences; avoid high-heat political events, subjective settlement markets, and varieties with extremely low liquidity.
Order Strategy: Adopt strict systematic position management. The prerequisite for entry is that one’s own probability judgment is significantly higher than the market implied probability. Positions are determined based on the fractional Kelly criterion (usually 1/10–1/4 Kelly), and single event risk exposure does not exceed 15%, to achieve robust growth with controllable risk, bearable drawdowns, and compoundable advantages in the long run.
Arbitrage Strategy: Arbitrage in prediction markets is mainly manifested in four types: cross-platform spread (be wary of settlement differences), Dutch Book arbitrage (high certainty but strict liquidity requirements), settlement arbitrage (relies on execution speed), and correlated asset hedging (limited by structural mismatch). The key to practice lies not in discovering spreads, but in strictly aligning contract definitions and settlement standards to avoid pseudo-arbitrage caused by subtle rule differences.
Smart Money Copy-Trading: On-chain “Smart Money” signals are not suitable as a main strategy due to lagging, inducement risks, and sample issues. A more reasonable usage is as a confidence adjustment factor, used to assist core judgments based on information and pricing deviations.

III. Noya.ai: Intelligence to Action

As an early exploration of Prediction Market Agents, NOYA’s core philosophy is “Intelligence That Acts.” In on-chain markets, pure analysis and insight are not enough to create value — although dashboards, data analysis, and research tools can help users understand “what might happen,” there is still a large amount of manual operation, cross-chain friction, and execution risk between insight and execution. NOYA is built based on this pain point: compressing the complete link of “Research → Form Judgment → Execution → Continuous Monitoring” in the professional investment process into a unified system, enabling intelligence to be directly translated into on-chain action.

NOYA achieves this goal by integrating three core levels:

Intelligence Layer: Aggregates market data, token analysis, and prediction market signals.
Abstraction Layer: Hides complex cross-chain routing; users only need to express Intent.
Execution Layer: AI Agents execute operations across chains and protocols based on user authorization.

In terms of product form, NOYA supports different participation methods for passive income users, active traders, and prediction market participants. Through designs like Omnichain Execution, AI Agents & Intents, and Vault Abstraction, it modularizes and automates multi-chain liquidity management, complex strategy execution, and risk control.

The overall system forms a continuous closed loop: Intelligence → Intent → Execution → Monitoring, achieving efficient, verifiable, and low-friction conversion from insight to execution while ensuring users always maintain control over their assets.

IV. Noya.ai’s Product System Evolution

Core Cornerstone: Noya Omnichain Vaults

Omnivaults is NOYA’s capital deployment layer, providing cross-chain, risk-controlled automated yield strategies. Users hand over assets to the system to run continuously across multiple chains and protocols through simple deposit and withdrawal operations, without the need for manual rebalancing or monitoring. The core goal is to achieve stable risk-adjusted returns rather than short-term speculation.

Omnivaults cover strategies like standard yield and Loop, clearly divided by asset and risk level, and support optional bonding incentive mechanisms. At the execution level, the system automatically completes cross-chain routing and optimization, and can introduce ZKML to provide verifiable proof for strategy decisions, enhancing the transparency and credibility of automated asset management. The overall design focuses on modularity and composability, supporting future access to more asset types and strategy forms.

NOYA Vault Technical Architecture: Each vault is uniformly registered and managed through the Registry; the AccountingManager is responsible for user shares (ERC-20) and NAV pricing; the bottom layer connects to protocols like Aave and Uniswap through modular Connectors and calculates cross-protocol TVL, relying on Value Oracle (Chainlink + Uniswap v3 TWAP) for price routing and valuation; trading and cross-chain operations are executed by Swap Handler (LiFi); finally, strategy execution is triggered by Keeper Multi-sig, forming a composable and auditable execution closed loop.

Future Alpha: Prediction Market Agent

NOYA’s most imaginative module: the Intelligence layer continuously tracks on-chain fund behavior and off-chain narrative changes, identifying news shocks, emotional fluctuations, and odds mismatches. When probability deviations are found in prediction markets like Polymarket, the Execution layer AI Agent can mobilize vault funds for arbitrage and rebalancing under user authorization. At the same time, Token Intelligence and Prediction Market Copilot provide users with structured token and prediction market analysis, directly converting external information into actionable trading decisions.

Prediction Market Intelligence Copilot

NOYA is committed to upgrading prediction markets from single-event betting to systematically manageable probabilistic assets. Its core module integrates diverse data such as market implied probability, liquidity structure, historical settlements, and on-chain smart money behavior. It uses Expected Value (EV) and scenario analysis to identify pricing deviations and focuses on tracking position signals of high-win-rate wallets to distinguish informed trading from market noise. Based on this, Copilot supports cross-market and cross-event correlation analysis and transmits real-time signals to AI Agents to drive automated execution such as opening and rebalancing positions, achieving portfolio management and dynamic optimization of prediction markets.

Core Strategy Mechanisms include:

Multi-source Edge Sourcing: Fuses Polymarket real-time odds, polling data, private and external information flows to cross-verify event implied probabilities, systematically mining information advantages that have not been fully priced in.
Prediction Market Arbitrage: Builds probabilistic and structural arbitrage strategies based on pricing differences across different markets, different contract structures, or similar events, capturing odds convergence returns while controlling directional risk.
Auto-adjust Positions (Odds-Driven): When odds shift significantly due to changes in information, capital, or sentiment, the AI Agent automatically adjusts position size and direction, achieving continuous optimization in the prediction market rather than a one-time bet.

NOYA Intelligence Token Reports

NOYA’s institutional-grade research and decision hub aims to automate the professional crypto investment research process and directly output decision-level signals usable for real asset allocation. This module presents clear investment stances, comprehensive scores, core logic, key catalysts, and risk warnings in a standardized report structure, continuously updated with real-time market and on-chain data. Unlike traditional research tools, NOYA’s intelligence does not stop at static analysis but can be queried, compared, and followed up by AI Agents in natural language. It is directly fed to the execution layer to drive subsequent cross-chain trading, fund allocation, and portfolio management, thereby forming a “Research — Decision — Execution” integrated closed loop, making Intelligence an active signal source in the automated capital operation system.

NOYA AI Agent (Voice & Natural Language Driven)

The NOYA AI Agent is the platform’s execution layer, whose core role is to directly translate user intent and market intelligence into authorized on-chain actions. Users can express goals via text or voice, and the Agent is responsible for planning and executing cross-chain, cross-protocol operations, compressing research and execution into a continuous process. It is a key product form for NOYA to lower the threshold for DeFi and prediction market operations.

Users do not need to understand the underlying links, protocols, or transaction paths. They only need to express their goals through natural language or voice to trigger the AI Agent to automatically plan and execute multi-step on-chain operations, achieving “Intent as Execution.” Under the premise of full-process user signing and non-custody, the Agent operates in a closed loop of “Intent Understanding → Action Planning → User Confirmation → On-chain Execution → Result Monitoring.” It does not replace decision-making but is only responsible for efficient implementation and execution, significantly reducing the friction and threshold of complex financial operations.

Trust Moat: ZKML Verifiable Execution

Verifiable Execution aims to build a verifiable closed loop for the entire process of strategy, decision-making, and execution. NOYA introduces ZKML as a key mechanism to reduce trust assumptions: strategies are calculated off-chain and verifiable proofs are generated; corresponding fund operations can only be triggered after on-chain verification passes. This mechanism can provide credibility for strategy output without revealing model details and supports derivative capabilities such as verifiable backtesting. Currently, relevant modules are still marked as “under development” in public documents, and engineering details remain to be disclosed and verified.

Future 6-Month Product Roadmap

Prediction Market Advanced Order Capabilities: Improve strategy expression and execution precision to support Agent-based trading.
Expansion to Multi-Prediction Markets: Access more platforms beyond Polymarket to expand event coverage and liquidity.
Multi-source Edge Information Collection: Cross-verify with handicap odds to systematically capture underpriced probability deviations.
Clearer Token Signals & Advanced Reports: Output trading signals and in-depth on-chain analysis that can directly drive execution.
Advanced On-chain DeFi Strategy Combinations: Launch complex strategy structures to improve capital efficiency, returns, and scalability.

V. Noya.ai’s Ecosystem Growth

Currently, Omnichain Vaults are in the early stage of ecosystem development, and their cross-chain execution and multi-strategy framework have been verified.

Strategy & Coverage: The platform has integrated mainstream DeFi protocols such as Aave and Morpho, supports cross-chain allocation of stablecoins, ETH, and their derivative assets, and has preliminarily built a layered risk strategy (e.g., Basic Yield vs. Loop Strategy).
Development Stage: The current TVL volume is limited. The core goal lies in functional verification (MVP) and risk control framework refinement. The architectural design has strong composability, reserving interfaces for the subsequent introduction of complex assets and advanced Agent scheduling.

Incentive System: Kaito Linkage & Space Race Dual Drive

NOYA has built a growth flywheel deeply binding content narrative and liquidity anchored on “Real Contribution.”

Ecosystem Partnership (Kaito Yaps): NOYA landed on Kaito Leaderboards with a composite narrative of “AI × DeFi × Agent,” configuring an unlocked incentive pool of 5% of the total supply, and reserving an additional 1% for the Kaito ecosystem. Its mechanism deeply binds content creation (Yaps) with Vault deposits and Bond locking. User weekly contributions are converted into Stars that determine rank and multipliers, thereby synchronously strengthening narrative consensus and long-term capital stickiness at the incentive level.
Growth Engine (Space Race): Space Race constitutes NOYA’s core growth flywheel, replacing the traditional “capital scale first” airdrop model by using Stars as long-term equity credentials. This mechanism integrates Bond locking bonuses, two-way 10% referral incentives, and content dissemination into a weekly Points system, filtering out long-term users with high participation and strong consensus, and continuously optimizing community structure and token distribution.
Community Building (Ambassador): NOYA adopts an invitation-only ambassador program, providing qualified participants with community round participation qualifications and performance rebates based on actual contributions (up to 10%).

Currently, Noya.ai has accumulated over 3,000 on-chain users, and its X platform followers have exceeded 41,000, ranking in the top five of the Kaito Mindshare list. This indicates that NOYA has occupied a favorable attention niche in the prediction market and Agent track.

In addition, Noya.ai’s core contracts have passed dual audits by Code4rena and Hacken, and have accessed Hacken Extractor.

VI. Tokenomics Design and Governance

NOYA adopts a Single-token ecosystem model, with $NOYA as the sole value carrier and governance vehicle.

NOYA employs a Buyback & Burn value capture mechanism. The value generated by the protocol layer in products such as AI Agents, Omnivaults, and prediction markets is captured through mechanisms like staking, governance, access permissions, and buyback & burn, forming a value closed loop of Use → Fee → Buyback, converting platform usage into long-term token value.

The project takes Fair Launch as its core principle. It did not introduce angel round or VC investment but completed distribution through a public community round (Launch-Raise) with a low valuation ($10M FDV), Space Race, and airdrops. It deliberately reserves asymmetric upside space for the community, making the chip structure more biased towards active users and long-term participants; team incentives mainly come from long-term locked token shares.

Token Distribution:

Total Supply: 1 Billion (1,000,000,000) NOYA
Initial Float (Low Float): ~12%
Valuation & Financing (The Raise): Financing Amount: $1 Million; Valuation (FDV): $10 Million

VII. Prediction Agent Competitive Analysis

Currently, the Prediction Market Agent track is still in its early stages with a limited number of projects. Representative ones include Olas (Pearl Prediction Agents), Warden (BetFlix), and Noya.ai.

From the perspective of product form and user participation, each represents three types of paths in the current prediction market agent track:

Olas (Pearl Prediction Agents): Agent Productization & Runnable Delivery. Participated by “running an automated prediction Agent,” encapsulating prediction market trading into a runnable Agent: users inject capital and run it, and the system automatically completes information acquisition, probability judgment, betting, and settlement. The participation method requiring additional installation has relatively limited friendliness for ordinary users.
Warden (BetFlix): Interactive Distribution & Consumer-grade Betting Platform. Attracts user participation through a low-threshold, highly entertaining interactive experience. Adopts an interaction and distribution-oriented path, lowering participation costs with gamified and content-based frontends, emphasizing the consumption and entertainment attributes of prediction markets. Its competitive advantage mainly comes from user growth and distribution efficiency, rather than strategy or execution layer depth.
NOYA.ai: Centered on “Fund Custody + Strategy Execution on Behalf,” abstracting prediction markets and DeFi execution into asset management products through Vaults, providing a participation method with low operation and low mental burden. If the Prediction Market Intelligence and Agent execution modules are superimposed later, it is expected to form a “Research — Execution — Monitoring” integrated workflow

Compared with AgentFi projects that have achieved clear product delivery such as Giza and Almanak, NOYA’s DeFi Agent is currently still in a relatively early stage. However, NOYA’s differentiation lies in its positioning and entry level: it enters the same execution and asset management narrative track with a fair launch valuation of about $10M FDV, possessing significant valuation discount and growth potential at the current stage.

NOYA: An AgentFi project encapsulating asset management centered on Omnichain Vault. Current delivery focus is on infrastructure layers like cross-chain execution and risk control. Upper-layer Agent execution, prediction market capabilities, and ZKML-related mechanisms are still in the development and verification stage.
Giza: Can directly run asset management strategies (ARMA, Pulse). Currently has the highest AgentFi product completion.
Almanak: Positioned as AI Quant for DeFi, outputting strategy and risk signals through models and quantitative frameworks. Mainly targets professional fund and strategy management needs, emphasizing methodological systematicness and result reproducibility.
Theoriq: Centered on multi-agent collaboration (Agent Swarms) strategy and execution framework, emphasizing scalable Agent collaboration systems and medium-to-long-term infrastructure narratives, leaning more towards bottom-layer capability construction.
Infinit: An Agentic DeFi terminal leaning towards the execution layer. Through process orchestration of “Intent → Multi-step on-chain operation,” it significantly lowers the execution threshold of complex DeFi operations, and users’ perception of product value is relatively direct.

VIII. Summary: Business, Engineering and Risks

Business Logic:

NOYA is a rare target in the current market that superimposes multiple narratives of AI Agent × Prediction Market × ZKML, and further combines the product direction of Intent-Driven Execution. At the asset pricing level, it launches with an FDV of approximately $10M, significantly lower than the common $75M–$100M valuation range of similar AI / DeFAI / Prediction related projects, forming a certain structural price difference.

Design-wise, NOYA attempts to unify Strategy Execution (Vault / Agent) and Information Advantage (Prediction Market Intelligence) into the same execution framework, and establishes a value capture closed loop through protocol revenue return (fees → buyback & burn). Although the project is still in its early stages, under the combined effect of multi-narrative superposition and low valuation starting point, its risk-return structure is closer to a type of high-odds, asymmetric betting target.

Engineering Implementation:

At the verifiable delivery level, NOYA’s core function currently online is Omnichain Vaults, providing cross-chain asset scheduling, yield strategy execution, and delayed settlement mechanisms. The engineering implementation is relatively foundational. The Prediction Market Intelligence (Copilot), NOYA AI Agent, and ZKML-driven verifiable execution emphasized in its vision are still in the development stage and have not yet formed a complete closed loop on the mainnet. It is not a mature DeFAI platform at this stage.

Potential Risks & Key Focus Points:

Delivery Uncertainty: The technological span from “Basic Vault” to “All-round Agent” is huge. Be alert to the risk of Roadmap delays or ZKML implementation falling short of expectations.
Potential System Risks: Including contract security, cross-chain bridge failures, and oracle disputes specific to prediction markets (such as fuzzy rules leading to inability to adjudicate). Any single point of failure could cause fund loss.

Disclaimer: This article was created with the assistance of AI tools such as ChatGPT-5.2, Gemini 3, and Claude Opus 4.5. The author has tried their best to proofread and ensure the information is true and accurate, but omissions are inevitable. Please understand. It should be specially noted that the crypto asset market generally has a divergence between project fundamentals and secondary market price performance. The content of this article is only for information integration and academic/research exchange, does not constitute any investment advice, and should not be considered as a recommendation to buy or sell any tokens.

Noya.ai 研报：预测市场智能体的前瞻

BroadNotes by 0xjacobzhao — Mon, 05 Jan 2026 04:42:56 GMT

作者：0xjacobzhao | https://linktr.ee/0xjacobzhao

预测市场在2025年已成为不容忽视的行业新趋势，其年度总交易量从2024年的约90亿美元激增至2025年的超过400亿美元，实现超过400%的年同比增长。这一显著增长由多重因素共同推动：宏观政治事件（如2024年美国大选）带来不确定性需求，基础设施与交易模式的成熟，以及监管环境出现破冰（Kalshi胜诉与Polymarket回归美国）。预测市场智能体(Prediction Market Agent)在2026年初呈现早期雏形，有望在未来一年成为智能体领域的新兴产品形态。

一、预测市场：从下注工具到“全球真相层”

截至2025年底，预测市场已基本形成 Polymarket与Kalshi 双寡头主导的格局。据《福布斯》统计，2025年总交易量约达440亿美元，其中Polymarket贡献约215亿美元，Kalshi约为171亿美元。Kalshi凭借此前选举合约案的法律胜诉、在美国体育预测市场的合规先发优势，以及相对明确的监管预期，实现了快速扩张。目前，二者的发展路径已呈现清晰分化：

Polymarket 采用“链下撮合、链上结算”的混合CLOB架构与去中心化结算机制，构建起全球化、非托管的高流动性市场，合规重返美国后形成“在岸+离岸”双轨运营结构；
Kalshi 融入传统金融体系，通过API接入主流零售券商，吸引华尔街做市商深度参与宏观与数据型合约交易，产品受制于传统监管流程，长尾需求与突发事件相对滞后。

除Polymarket与Kalshi之外，预测市场领域具备竞争力的其他参与者主要沿着两条路径发展：

一是合规分发路径，将事件合约嵌入券商或大型平台的现有账户体系，依靠渠道覆盖、清算能力与机构信任建立优势（例如Interactive Brokers与ForecastEx合作的ForecastTrader，以及FanDuel与CME合作的FanDuel Predicts）；
二是链上性能与资金效率路径，以Solana生态的永续合约DEX Drift为例，其在原有产品线基础上新增了预测市场模块B.E.T（prediction markets）。

传统金融合规入口与加密原生性能优势这两类路径共同构成预测市场生态的多元竞争格局。

预测市场表面上与赌博相似，本质上也是一种零和博弈，但二者的核心区别并不在于形式，而在于是否具有正外部性：通过真金白银的交易聚合分散信息，对现实事件进行公共定价，形成有价值的信号层。尽管存在娱乐化参与等局限，但其趋势正从博弈转向“全球真相层” — — 随着CME、彭博等机构的接入，事件概率已成为可被金融与企业系统直接调用的决策元数据，提供更及时、可量化的市场化真相。

二、预测智能体：架构设计、商业模式与策略分析

理想的预测市场智能体 可抽象为四层架构：

信息层汇集新闻、社交、链上与官方数据；
分析层以 LLM 与 ML 识别错价并计算 Edge；
策略层通过凯利公式、分批建仓与风控将 Edge 转化为仓位；
执行层完成多市场下单、滑点与 Gas 优化与套利执行，形成高效自动化闭环。

预测市场智能体的理想的商业模式设计在不同层级有不同方向的探索空间：

底层Infrastructure 层，提供多源实时数据聚合、Smart Money 地址库、统一的预测市场执行引擎与回测工具，向 B2B/B2D 收费，获取与预测准确率无关的稳定收入；
中间Strategy 层，以开源或 Token-Gated 方式沉淀模块化策略组件与社区贡献策略，形成可组合的策略生态并实现价值捕获；
顶层Agent 层，通过受托管理的 Vault 直接跑实盘，以透明链上记录和 20–30% 的绩效费（叠加少量管理费）兑现能力。

理想的预测市场智能体 Agent 更接近一个“AI 驱动的概率型资管产品”，通过长期纪律化执行与跨市场错价博弈，而非依赖单次预测准确率来获取收益。而“基础设施变现 + 生态扩展 + 业绩参与”的多元收入结构设计的核心逻辑在于：即便 Alpha 随市场成熟而收敛，执行、风控与结算等底层能力仍具长期价值，可降低对单一“AI 持续战胜市场”假设的依赖。

预测市场智能体策略分析：

理论上，Agent 具备高速、全天候与去情绪化执行优势，但在预测市场中往往难以转化为持续 Alpha，其有效应用主要局限于特定结构，如自动化做市、跨平台错价捕捉及长尾事件的信息整合，这些机会稀缺且受流动性与资本约束。

市场选择：并非所有预测市场都具备可交易价值，参与价值取决于结算清晰度、流动性质量、信息优势、时间结构与操纵风险五个维度。建议优先关注新市场的早期阶段、专业玩家少的长尾事件以及时区差异导致的短暂定价窗口；避免高热度政治事件、主观结算市场与极低流动性品种。
下单策略：采用严格的系统化仓位管理。入场前提是自身概率判断显著高于市场隐含概率，并依据分数化凯利公式（通常为1/10–1/4 Kelly）确定仓位，单事件风险敞口不超过15%，以在长期实现风险可控、回撤可承受、优势可复利的稳健增长。
套利策略：预测市场中的套利主要体现为四类：跨平台价差（需警惕结算差异）、Dutch Book套利（确定性高但流动性要求严）、结算套利（依赖执行速度）及关联资产对冲（受结构错配限制）。实践关键不在于发现价差，而在于严格对齐合约定义与结算标准，避免因规则细微差异导致的伪套利。
聪明钱跟单：链上“聪明钱”信号因滞后性、诱导风险与样本问题，不宜作为主策略。更合理的用法是作为置信度调节因子，用于辅助基于信息与定价偏差的核心判断。

三、Noya.ai：从情报到行动的智能体网络

作为预测市场智能体的早期探索，NOYA 的核心理念是 “Intelligence That Acts（让情报直接行动）”。在链上市场中，单纯的分析与洞察并不足以创造价值 — — 尽管仪表盘、数据分析和研究工具能够帮助用户理解“可能发生什么”，但从洞察到执行之间仍存在大量人工操作、跨链摩擦与执行风险。NOYA 正是基于这一痛点构建：将专业投资流程中“研究 → 形成判断 → 执行 → 持续监控”的完整链路，压缩进一个统一系统，使情报能够直接转化为链上行动。

NOYA 通过整合三大核心层级实现这一目标：

情报层 (Intelligence)： 聚合市场数据、代币分析和预测市场信号。
抽象层 (Abstraction)： 隐藏复杂的跨链路由，用户只需表达意图（Intent）。
执行层 (Execution)： AI Agent 根据用户授权，跨链、跨协议执行操作。

在产品形态上，NOYA 支持被动收益型用户、主动交易者以及预测市场参与者等不同参与方式，并通过 Omnichain Execution、AI Agents & Intents、Vault Abstraction 等设计，将多链流动性管理、复杂策略执行与风险控制模块化、自动化。

整体系统形成一个持续闭环：Intelligence → Intent → Execution → Monitoring，在确保用户始终掌握资产控制权的前提下，实现从洞察到执行的高效、可验证与低摩擦转化。

四、Noya.ai 的产品体系与演进路径

核心基石：Noya Omnichain Vaults

Omnivaults 是 NOYA 的资本部署层，提供跨链、风险可控的自动化收益策略。用户通过简单的存取操作，将资产交由系统在多链、多协议中持续运行，无需手动调仓或盯盘，核心目标是实现稳定的风险调整后收益而非短期投机。

Omnivaults 覆盖标准收益与循环（Loop）等策略，按资产与风险等级清晰划分，并支持可选的绑定激励机制。在执行层面，系统自动完成跨链路由与优化，并可引入 ZKML 对策略决策进行可验证证明，增强自动化资管的透明度与可信度。整体设计以模块化和可组合为核心，支持未来接入更多资产类型与策略形态。

NOYA Vault（金库）的技术架构：各金库通过 Registry 统一注册与管理，AccountingManager 负责用户份额（ERC-20）与净值定价；底层通过模块化 Connectors 对接 Aave、Uniswap 等协议并计算跨协议 TVL，依赖 Value Oracle（Chainlink + Uniswap v3 TWAP）完成价格路由与估值；交易与跨链由 Swap Handler（LiFi） 执行；最终，策略执行由 Keeper 多签 触发，形成可组合、可审计的执行闭环。

未来 Alpha：预测市场智能体 (Prediction Market Agent)

NOYA 最具想象空间的模块：情报层持续追踪链上资金行为与链下叙事变化，识别新闻冲击、情绪波动与赔率错配；当在 Polymarket 等预测市场发现概率偏差时，执行层 AI Agent 可在用户授权下调动金库资金进行套利与调仓。同时，Token Intelligence 与 Prediction Market Copilot 为用户提供结构化代币与预测市场分析，将外部信息直接转化为可执行的交易决策。

预测市场智能决策助理（Prediction Market Intelligence Copilot)

NOYA致力于将预测市场从单一事件下注升级为可系统管理的概率资产。其核心模块通过整合市场隐含概率、流动性结构、历史结算与链上聪明钱行为等多元数据，运用期望值（EV）与情景分析识别定价偏差，并重点追踪高胜率钱包的仓位信号以区分信息交易与市场噪音。基于此，Copilot 支持跨市场、跨事件的关联分析，并将实时信号传递至AI Agent，驱动开仓、调仓等自动化执行，实现预测市场的组合管理与动态优化。

核心策略机制包括：

多源 Edge 信息捕获（Multi-source Edge Sourcing）：融合 Polymarket 实时赔率、民调数据、私有与外部信息流，对事件隐含概率进行交叉验证，系统性挖掘尚未被充分定价的信息优势。
跨市场与跨事件套利（Prediction Market Arbitrage）：基于不同市场、不同合约结构或相近事件间的定价差异，构建概率与结构性套利策略，在控制方向性风险的前提下捕获赔率收敛收益。
赔率驱动的动态仓位管理（Auto-adjust Positions）：当赔率因信息、资金或情绪变化显著偏移时，由 AI Agent 自动调整仓位规模与方向，实现预测市场中的持续优化，而非一次性下注。

NOYA 智能代币情报报告：（NOYA Intelligence Token Reports）

NOYA 的机构级研究与决策中枢，目标在于将专业加密投研流程自动化，并直接输出可用于真实资产配置的决策级信号。该模块以标准化报告结构呈现明确的投资立场、综合评分、核心逻辑、关键催化剂与风险提示，并结合实时市场与链上数据持续更新。与传统研究工具不同，NOYA 的情报并不止步于静态分析，而是可通过 AI Agent 以自然语言调用、对比与追问，并被直接输送至执行层，驱动后续的跨链交易、资金配置与组合管理，从而形成“研究 — 决策 — 执行”一体化闭环，使 Intelligence 成为自动化资本运作体系中的主动信号源。

NOYA AI Agent (语音与自然语言驱动)

NOYA AI Agent 是平台的执行层，核心作用是将用户意图与市场情报直接转化为经授权的链上行动。用户可通过文本或语音表达目标，Agent 负责规划并执行跨链、跨协议的操作，将研究与执行压缩为一个连续流程。是 NOYA 降低 DeFi 与预测市场操作门槛的关键产品形态

用户无需理解底层链路、协议或交易路径，仅需通过自然语言或语音表达目标，即可触发 AI Agent 自动规划并执行多步链上操作，实现“意图即执行”。在全程用户签名与非托管前提下，Agent 按“意图理解 → 行动规划 → 用户确认 → 链上执行 → 结果监控”的闭环运行，不替代决策，仅负责高效落地执行，显著降低复杂金融操作的摩擦与门槛。

信任护城河：ZKML 可信执行（Verifiable Execution）

可信执行旨在构建策略、决策与执行的全流程可验证闭环。NOYA引入ZKML作为降低信任假设的关键机制：策略在链下计算，并生成可验证证明，链上验证通过后方可触发相应资金操作。该机制可在不泄露模型细节的前提下，为策略输出提供可信性，并支持可验证回测等衍生能力。目前相关模块在公开文档中仍标注为“开发中”，工程细节仍有待后续披露与验证。

未来 6 个月产品路线图

预测市场高级订单能力：提升策略表达与执行精度，支撑 Agent 化交易。
扩展至多预测市场：在 Polymarket 之外接入更多平台，扩大事件覆盖与流动性。
多源 Edge 信息采集：与盘口赔率交叉验证，系统性捕获未充分定价的概率偏差。
更清晰的代币信号与高阶报告：输出可直接驱动执行的交易信号与深度链上分析。
更高级的链上 DeFi 策略组合：上线复杂策略结构，提升资金效率、收益与可扩展性。

五、Noya.ai的生态增长与激励体系

目前 Omnichain Vaults 处于生态发展的早期阶段，其跨链执行与多策略框架已通过验证。

策略与覆盖： 平台已集成 Aave、Morpho 等主流 DeFi 协议，支持稳定币、ETH 及其衍生资产的跨链调配，并初步构建了分层风险策略（如基础收益 vs. Loop 策略）。
发展阶段： 当前 TVL 体量有限，核心目标在于功能验证（MVP）与风控框架打磨，架构设计有较强的可组合性，为后续引入复杂资产及高级 Agent 调度预留接口。

激励体系：Kaito 联动与 Space Race 双轮驱动

NOYA 构建了一套以“真实贡献”为锚点，深度绑定内容叙事与流动性的增长飞轮。

生态合作（Kaito Yaps）：NOYA 以“AI × DeFi × Agent”的复合叙事登陆 Kaito Leaderboards，配置 总供应量 5% 的无锁仓激励池，并额外预留 1% 用于 Kaito 生态。其机制将内容创作（Yaps）与 Vault 存入、Bond 锁定深度绑定，用户周度贡献转化为决定等级与倍率的 Stars，从而在激励层面同步强化叙事共识与资金长期黏性。
增长引擎（Space Race）：Space Race 构成 NOYA 的核心增长飞轮，通过以 Stars 作为长期权益凭证，替代传统“资金规模优先”的空投模式。该机制将 Bond 锁仓加成、双向 10% 推荐激励与内容传播统一纳入周度 Points 体系，筛选出高参与度、强共识的长期用户，持续优化社区结构与代币分布。
社区建设（Ambassador）：NOYA 采用邀请制大使计划，向合格参与者提供社区轮参与资格及基于实际贡献的绩效返佣（最高 10%）。

目前Noya.ai积累超 3,000 名链上用户，X 平台粉丝突破 4.1 万，位列 Kaito Mindshare 榜单前五。这表明 NOYA 在预测市场与 Agent 赛道中已占据了有利的注意力生态位。

此外Noya.ai核心合约通过 Code4rena 与 Hacken 双重审计，并接入 Hacken Extractor。

六、代币经济模型设计及治理

NOYA 采用单代币（Single-token）生态模型，以 $NOYA 作为唯一的价值承载与治理载体。

NOYA 采用回购销毁（Buyback & Burn） 价值捕获机制，协议层在 AI Agent、Omnivaults 与预测市场等产品中产生的价值，通过质押、治理、访问权限及回购销毁等机制实现价值承接，形成 使用 → 收费 → 回购价值闭环，将平台使用度转化为代币长期价值。

项目以 Fair Launch 为核心原则，未引入天使轮或 VC 投资，而是通过低估值（$10M FDV）的公开社区轮（Launch-Raise）、Space Race 与空投完成分发，刻意为社区保留非对称上行空间，使筹码结构更偏向活跃用户与长期参与者；团队激励主要来自长期锁定的代币份额。

代币分配 (Distribution)

总供应量： 10 亿 (1,000,000,000) NOYA
初始流通量 (Low Float)：约 12%
估值与融资 (The Raise)：融资额：100万美金；估值 (FDV)： 1000万美金

七、预测智能体市场竞争分析

目前，预测市场智能体（Prediction Market Agent）赛道仍处于早期，项目数量有限，较具代表性的包括 Olas（Pearl Prediction Agents）、Warden（BetFlix） 与 Noya.ai。

从产品形态与用户参与方式看，各代表了目前预测市场智能体赛道的三类路径：

1）Olas（Pearl Prediction Agents）：Agent 产品化与可运行交付, 以“运行一个自动化预测 Agent”为参与方式，将预测市场交易封装为可运行的 Agent：用户注资并运行，系统自动完成信息获取、概率判断、下注与结算。需要额外安装的参与方式对普通用户的友好度相对有限。

2）Warden（BetFlix）：交互分发与消费级投注平台 , 通过低门槛、强娱乐性的交互体验吸引用户参与，采用交互与分发导向路径，以游戏化、内容化前端降低参与成本，强调预测市场的消费与娱乐属性。其竞争优势主要来自用户增长与分发效率，而非策略或执行层深度。

3）NOYA.ai：以“资金托管 + 策略代执行”为核心，通过 Vault 将预测市场与 DeFi 执行抽象为资管产品，提供低操作、低心智负担的参与方式。若后续叠加 Prediction Market Intelligence 与 Agent 执行模块，有望形成“研究 — 执行 — 监控”的一体化工作流。

与 Giza、Almanak 等已实现明确产品交付的 AgentFi 项目相比，NOYA 的 DeFi Agent 目前仍处于相对早期阶段。但 NOYA 的差异化在于其定位与切入层级：其以约 $10M FDV 的公平启动估值进入同一执行与资管叙事赛道，在现阶段具备显著的估值折价与增长潜力。

NOYA：以 Omnichain Vault 为核心的资管封装型 AgentFi 项目，当前交付重点集中在跨链执行与风险控制等基础设施层，上层的 Agent 执行、预测市场能力及 ZKML 相关机制仍处于开发与验证阶段。
Giza：可直接运行资管策略（ARMA、Pulse），目前 AgentFi 产品完成度最高。
Almanak：定位于 AI Quant for DeFi，通过模型与量化框架输出策略与风险信号，主要面向专业资金与策略管理需求，强调方法论的系统性与结果的可复现性。
Theoriq：以多智能体协作（Agent Swarms）为核心的策略与执行框架，强调可扩展的 Agent 协作体系与中长期基础设施叙事，更偏向底层能力建设。
Infinit：偏执行层的 Agentic DeFi 终端，通过“意图 → 多步链上操作”的流程编排，显著降低复杂 DeFi 操作的执行门槛，用户对产品价值的感知相对直接。

八、总结：商业逻辑、工程实现及潜在风险

商业逻辑：
NOYA 是当前市场中较为少见的 AI Agent × Prediction Market × ZKML 多重叙事叠加标的，并进一步结合了 Intent 驱动执行 的产品方向。在资产定价层面，其以约 $10M FDV 启动，明显低于同类 AI / DeFAI / Prediction 相关项目常见的 $75M–$100M 区间估值，形成一定的结构性价差。

从设计上看，NOYA 试图将 策略执行（Vault / Agent） 与 信息优势（Prediction Market Intelligence） 统一到同一执行框架中，并通过协议收入回流（fees → buyback & burn）建立价值捕获闭环。尽管项目仍处于早期阶段，但在多叙事叠加与低估值起点的共同作用下，其风险 — 收益结构更接近一类高赔率、非对称博弈标的。

工程实现： 在可验证的交付层面，NOYA 当前已上线的核心功能为 Omnichain Vaults，提供跨链资产调度、收益策略执行与延迟结算机制，工程实现相对偏基础。其愿景中强调的 Prediction Market Intelligence（Copilot）、NOYA AI Agent 以及 ZKML 驱动的可验证执行仍处于开发阶段，尚未在主网形成完整闭环。现阶段并非成熟的 DeFAI 平台。

潜在风险与关注要点

交付不确定性： 从“基础 Vault”到“全能 Agent”的技术跨度极大，需警惕 Roadmap 延期或 ZKML 落地不及预期的风险。
潜在系统风险： 包含合约安全、跨链桥故障以及预测市场特有的预言机争议（如规则模糊导致无法裁决），任何单点故障都可能造成资金损耗。

Reinforcement Learning: The Paradigm Shift of Decentralized AI

BroadNotes by 0xjacobzhao — Tue, 23 Dec 2025 05:26:49 GMT

Author: 0xjacobzhao | https://linktr.ee/0xjacobzhao

This independent research report is supported by IOSG Ventures. The research and writing process was inspired by Sam Lehman (Pantera Capital) ’s work on reinforcement learning. Thanks to Ben Fielding (Gensyn.ai), Gao Yuan(Gradient), Samuel Dare & Erfan Miahi (Covenant AI), Shashank Yadav (Fraction AI), Chao Wang for their valuable suggestions on this article. This article strives for objectivity and accuracy, but some viewpoints involve subjective judgment and may contain biases. We appreciate the readers’ understanding.

Artificial intelligence is shifting from pattern-based statistical learning toward structured reasoning systems, with post-training — especially reinforcement learning — becoming central to capability scaling. DeepSeek-R1 signals a paradigm shift: reinforcement learning now demonstrably improves reasoning depth and complex decision-making, evolving from a mere alignment tool into a continuous intelligence-enhancement pathway.

In parallel, Web3 is reshaping AI production via decentralized compute and crypto incentives, whose verifiability and coordination align naturally with reinforcement learning’s needs. This report examines AI training paradigms and reinforcement learning fundamentals, highlights the structural advantages of “Reinforcement Learning × Web3,” and analyzes Prime Intellect, Gensyn, Nous Research, Gradient, Grail and Fraction AI.

I. Three Stages of AI Training

Modern LLM training spans three stages — pre-training, supervised fine-tuning (SFT), and post-training/reinforcement learning — corresponding to building a world model, injecting task capabilities, and shaping reasoning and values. Their computational and verification characteristics determine how compatible they are with decentralization.

Pre-training: establishes the core statistical and multimodal foundations via massive self-supervised learning, consuming 80–95% of total cost and requiring tightly synchronized, homogeneous GPU clusters and high-bandwidth data access, making it inherently centralized.
Supervised Fine-tuning (SFT): adds task and instruction capabilities with smaller datasets and lower cost (5–15%), often using PEFT methods such as LoRA or Q-LoRA, but still depends on gradient synchronization, limiting decentralization.
Post-training: Post-training consists of multiple iterative stages that shape a model’s reasoning ability, values, and safety boundaries. It includes both RL-based approaches (e.g. RLHF, RLAIF, GRPO), non-RL preference optimization (e.g. DPO), and process reward models (PRM). With lower data and cost requirements (around 5–10%), computation focuses on rollouts and policy updates. Its native support for asynchronous, distributed execution — often without requiring full model weights — makes post-training the phase best suited for Web3-based decentralized training networks when combined with verifiable computation and on-chain incentives.

II. Reinforcement Learning Technology Landscape

2.1 System Architecture of Reinforcement Learning

Reinforcement learning enables models to improve decision-making through a feedback loop of environment interaction, reward signals, and policy updates. Structurally, an RL system consists of three core components: the policy network, rollout for experience sampling, and the learner for policy optimization. The policy generates trajectories through interaction with the environment, while the learner updates the policy based on rewards, forming a continuous iterative learning process.

Policy Network (Policy): Generates actions from environmental states and is the decision-making core of the system. It requires centralized backpropagation to maintain consistency during training; during inference, it can be distributed to different nodes for parallel operation.
Experience Sampling (Rollout): Nodes execute environment interactions based on the policy, generating state-action-reward trajectories. This process is highly parallel, has extremely low communication, is insensitive to hardware differences, and is the most suitable component for expansion in decentralization.
Learner: Aggregates all Rollout trajectories and executes policy gradient updates. It is the only module with the highest requirements for computing power and bandwidth, so it is usually kept centralized or lightly centralized to ensure convergence stability.

2.2 Reinforcement Learning Stage Framework

Reinforcement learning can usually be divided into five stages, and the overall process as follows:

Data Generation Stage (Policy Exploration): Given a prompt, the policy samples multiple reasoning chains or trajectories, supplying the candidates for preference evaluation and reward modeling and defining the scope of policy exploration.

Preference Feedback Stage (RLHF / RLAIF):

RLHF (Reinforcement Learning from Human Feedback): trains a reward model from human preferences and then uses RL (typically PPO) to optimize the policy based on that reward signal.
RLAIF (Reinforcement Learning from AI Feedback): replaces humans with AI judges or constitutional rules, cutting costs and scaling alignment — now the dominant approach for Anthropic, OpenAI, and DeepSeek.

Reward Modeling Stage (Reward Modeling): Learns to map outputs to rewards based on preference pairs. RM teaches the model “what is the correct answer,” while PRM teaches the model “how to reason correctly.”

RM (Reward Model): Used to evaluate the quality of the final answer, scoring only the output.
Process Reward Model (PRM): scores step-by-step reasoning, effectively training the model’s reasoning process (e.g., in o1 and DeepSeek-R1).

Reward Verification (RLVR / Reward Verifiability): A reward-verification layer constrains reward signals to be derived from reproducible rules, ground-truth facts, or consensus mechanisms. This reduces reward hacking and systemic bias, and improves auditability and robustness in open and distributed training environments.

Policy Optimization Stage (Policy Optimization): Updates policy parameters $\theta$ under the guidance of signals given by the reward model to obtain a policy $\pi_{\theta’}$ with stronger reasoning capabilities, higher safety, and more stable behavioral patterns. Mainstream optimization methods include:

PPO (Proximal Policy Optimization): the standard RLHF optimizer, valued for stability but limited by slow convergence in complex reasoning.
GRPO (Group Relative Policy Optimization): introduced by DeepSeek-R1, optimizes policies using group-level advantage estimates rather than simple ranking, preserving value magnitude and enabling more stable reasoning-chain optimization.
DPO (Direct Preference Optimization): bypasses RL by optimizing directly on preference pairs — cheap and stable for alignment, but ineffective at improving reasoning.

New Policy Deployment Stage (New Policy Deployment): the updated model shows stronger System-2 reasoning, better preference alignment, fewer hallucinations, and higher safety, and continues to improve through iterative feedback loops.

2.3 Industrial Applications of Reinforcement Learning

Reinforcement Learning (RL) has evolved from early game intelligence to a core framework for cross-industry autonomous decision-making. Its application scenarios, based on technological maturity and industrial implementation, can be summarized into five major categories:

Game & Strategy: The earliest direction where RL was verified. In environments with “perfect information + clear rewards” like AlphaGo, AlphaZero, AlphaStar, and OpenAI Five, RL demonstrated decision intelligence comparable to or surpassing human experts, laying the foundation for modern RL algorithms.
Robotics & Embodied AI: Through continuous control, dynamics modeling, and environmental interaction, RL enables robots to learn manipulation, motion control, and cross-modal tasks (e.g., RT-2, RT-X). It is rapidly moving towards industrialization and is a key technical route for real-world robot deployment.
Digital Reasoning / LLM System-2: RL + PRM drives large models from “language imitation” to “structured reasoning.” Representative achievements include DeepSeek-R1, OpenAI o1/o3, Anthropic Claude, and AlphaGeometry. Essentially, it performs reward optimization at the reasoning chain level rather than just evaluating the final answer.
Scientific Discovery & Math Optimization: RL finds optimal structures or strategies in label-free, complex reward, and huge search spaces. It has achieved foundational breakthroughs in AlphaTensor, AlphaDev, and Fusion RL, showing exploration capabilities beyond human intuition.
Economic Decision-making & Trading: RL is used for strategy optimization, high-dimensional risk control, and adaptive trading system generation. Compared to traditional quantitative models, it can learn continuously in uncertain environments and is an important component of intelligent finance.

III. Natural Match Between Reinforcement Learning and Web3

Reinforcement learning and Web3 are naturally aligned as incentive-driven systems: RL optimizes behavior through rewards, while blockchains coordinate participants through economic incentives. RL’s core needs — large-scale heterogeneous rollouts, reward distribution, and verifiable execution — map directly onto Web3’s structural strengths.

Decoupling of Reasoning and Training: Reinforcement learning separates into rollout and update phases: rollouts are compute-heavy but communication-light and can run in parallel on distributed consumer GPUs, while updates require centralized, high-bandwidth resources. This decoupling lets open networks handle rollouts with token incentives, while centralized updates maintain training stability.
Verifiability: ZK (Zero-Knowledge) and Proof-of-Learning provide means to verify whether nodes truly executed reasoning, solving the honesty problem in open networks. In deterministic tasks like code and mathematical reasoning, verifiers only need to check the answer to confirm the workload, significantly improving the credibility of decentralized RL systems.
Incentive Layer, Token Economy-Based Feedback Production Mechanism: Web3 token incentives can directly reward RLHF/RLAIF feedback contributors, enabling transparent, permissionless preference generation, with staking and slashing enforcing quality more efficiently than traditional crowdsourcing.
Potential for Multi-Agent Reinforcement Learning (MARL): Blockchains form open, incentive-driven multi-agent environments with public state, verifiable execution, and programmable incentives, making them a natural testbed for large-scale MARL despite the field still being early.

IV. Analysis of Web3 + Reinforcement Learning Projects

Based on the above theoretical framework, we will briefly analyze the most representative projects in the current ecosystem:

Prime Intellect: Asynchronous Reinforcement Learning prime-rl

Prime Intellect aims to build an open global compute market and open-source superintelligence stack, spanning Prime Compute, the INTELLECT model family, open RL environments, and large-scale synthetic data engines. Its core prime-rl framework is purpose-built for asynchronous distributed RL, complemented by OpenDiLoCo for bandwidth-efficient training and TopLoc for verification.

Prime Intellect Core Infrastructure Components Overview

Technical Cornerstone: prime-rl Asynchronous Reinforcement Learning Framework

prime-rl is Prime Intellect’s core training engine, designed for large-scale asynchronous decentralized environments. It achieves high-throughput inference and stable updates through complete Actor–Learner decoupling. Executors (Rollout Workers) and Learners (Trainers) do not block synchronously. Nodes can join or leave at any time, only needing to continuously pull the latest policy and upload generated data:

Actor (Rollout Workers): Responsible for model inference and data generation. Prime Intellect innovatively integrated the vLLM inference engine at the Actor end. vLLM’s PagedAttention technology and Continuous Batching capability allow Actors to generate inference trajectories with extremely high throughput.
Learner (Trainer): Responsible for policy optimization. The Learner asynchronously pulls data from the shared Experience Buffer for gradient updates without waiting for all Actors to complete the current batch.
Orchestrator: Responsible for scheduling model weights and data flow.

Key Innovations of prime-rl:

True Asynchrony: prime-rl abandons the traditional synchronous paradigm of PPO, does not wait for slow nodes, and does not require batch alignment, enabling any number and performance of GPUs to access at any time, establishing the feasibility of decentralized RL.
Deep Integration of FSDP2 and MoE: Through FSDP2 parameter sharding and MoE sparse activation, prime-rl allows tens of billions of parameters models to be efficiently trained in distributed environments. Actors only run active experts, significantly reducing VRAM and inference costs.
GRPO+ (Group Relative Policy Optimization): GRPO eliminates the Critic network, significantly reducing computation and VRAM overhead, naturally adapting to asynchronous environments. prime-rl’s GRPO+ ensures reliable convergence under high latency conditions through stabilization mechanisms.

INTELLECT Model Family: A Symbol of Decentralized RL Technology Maturity

INTELLECT-1 (10B, Oct 2024): Proved for the first time that OpenDiLoCo can train efficiently in a heterogeneous network across three continents (communication share < 2%, compute utilization 98%), breaking physical perceptions of cross-region training.
INTELLECT-2 (32B, Apr 2025): As the first Permissionless RL model, it validates the stable convergence capability of prime-rl and GRPO+ in multi-step latency and asynchronous environments, realizing decentralized RL with global open computing participation.
INTELLECT-3 (106B MoE, Nov 2025): Adopts a sparse architecture activating only 12B parameters, trained on 512×H200 and achieving flagship inference performance (AIME 90.8%, GPQA 74.4%, MMLU-Pro 81.9%, etc.). Overall performance approaches or surpasses centralized closed-source models far larger than itself.

Prime Intellect has built a full decentralized RL stack: OpenDiLoCo cuts cross-region training traffic by orders of magnitude while sustaining ~98% utilization across continents; TopLoc and Verifiers ensure trustworthy inference and reward data via activation fingerprints and sandboxed verification; and the SYNTHETIC data engine generates high-quality reasoning chains while enabling large models to run efficiently on consumer GPUs through pipeline parallelism. Together, these components underpin scalable data generation, verification, and inference in decentralized RL, with the INTELLECT series demonstrating that such systems can deliver world-class models in practice.

Gensyn: RL Core Stack RL Swarm and SAPO

Gensyn seeks to unify global idle compute into a trustless, scalable AI training network, combining standardized execution, P2P coordination, and on-chain task verification. Through mechanisms like RL Swarm, SAPO, and SkipPipe, it decouples generation, evaluation, and updates across heterogeneous GPUs, delivering not just compute, but verifiable intelligence.

RL Applications in the Gensyn Stack

RL Swarm: Decentralized Collaborative Reinforcement Learning Engine

RL Swarm demonstrates a brand new collaboration mode. It is no longer simple task distribution, but an infinite loop of a decentralized generate–evaluate–update loop inspired by collaborative learning simulating human social learning:

Solvers (Executors): Responsible for local model inference and Rollout generation, unimpeded by node heterogeneity. Gensyn integrates high-throughput inference engines (like CodeZero) locally to output complete trajectories rather than just answers.
Proposers: Dynamically generate tasks (math problems, code questions, etc.), enabling task diversity and curriculum-like adaptation to adapt training difficulty to model capabilities.
Evaluators: Use frozen “Judge Models” or rules to check output quality, forming local reward signals evaluated independently by each node. The evaluation process can be audited, reducing room for malice.

The three form a P2P RL organizational structure that can complete large-scale collaborative learning without centralized scheduling.

SAPO: Policy Optimization Algorithm Reconstructed for Decentralization

SAPO (Swarm Sampling Policy Optimization) centers on sharing rollouts while filtering those without gradient signal, rather than sharing gradients. By enabling large-scale decentralized rollout sampling and treating received rollouts as locally generated, SAPO maintains stable convergence in environments without central coordination and with significant node latency heterogeneity. Compared to PPO (which relies on a critic network that dominates computational cost) or GRPO (which relies on group-level advantage estimation rather than simple ranking), SAPO allows consumer-grade GPUs to participate effectively in large-scale RL optimization with extremely low bandwidth requirements.

Through RL Swarm and SAPO, Gensyn demonstrates that reinforcement learning — particularly post-training RLVR — naturally fits decentralized architectures, as it depends more on diverse exploration via rollouts than on high-frequency parameter synchronization. Combined with PoL and Verde verification systems, Gensyn offers an alternative path toward training trillion-parameter models: a self-evolving superintelligence network composed of millions of heterogeneous GPUs worldwide.

Nous Research: Reinforcement Learning Environment Atropos

Nous Research is building a decentralized, self-evolving cognitive stack, where components like Hermes, Atropos, DisTrO, Psyche, and World Sim form a closed-loop intelligence system. Using RL methods such as DPO, GRPO, and rejection sampling, it replaces linear training pipelines with continuous feedback across data generation, learning, and inference.

Nous Research Components Overview

Model Layer: Hermes and the Evolution of Reasoning Capabilities

The Hermes series is the main model interface of Nous Research facing users. Its evolution clearly demonstrates the industry path migrating from traditional SFT/DPO alignment to Reasoning RL:

Hermes 1–3: Instruction Alignment & Early Agent Capabilities: Hermes 1–3 relied on low-cost DPO for robust instruction alignment and leveraged synthetic data and the first introduction of Atropos verification mechanisms in Hermes 3.
Hermes 4 / DeepHermes: Writes System-2 style slow thinking into weights via Chain-of-Thought, improving math and code performance with Test-Time Scaling, and relying on “Rejection Sampling + Atropos Verification” to build high-purity reasoning data.
DeepHermes further adopts GRPO to replace PPO (which is hard to implement mainly), enabling Reasoning RL to run on the Psyche decentralized GPU network, laying the engineering foundation for the scalability of open-source Reasoning RL.

Atropos: Verifiable Reward-Driven Reinforcement Learning Environment

Atropos is the true hub of the Nous RL system. It encapsulates prompts, tool calls, code execution, and multi-turn interactions into a standardized RL environment, directly verifying whether outputs are correct, thus providing deterministic reward signals to replace expensive and unscalable human labeling. More importantly, in the decentralized training network Psyche, Atropos acts as a “judge” to verify if nodes truly improved the policy, supporting auditable Proof-of-Learning, fundamentally solving the reward credibility problem in distributed RL.

DisTrO and Psyche: Optimizer Layer for Decentralized Reinforcement Learning

Traditional RLF (RLHF/RLAIF) training relies on centralized high-bandwidth clusters, a core barrier that open source cannot replicate. DisTrO reduces RL communication costs by orders of magnitude through momentum decoupling and gradient compression, enabling training to run on internet bandwidth; Psyche deploys this training mechanism on an on-chain network, allowing nodes to complete inference, verification, reward evaluation, and weight updates locally, forming a complete RL closed loop.

In the Nous system, Atropos verifies chains of thought; DisTrO compresses training communication; Psyche runs the RL loop; World Sim provides complex environments; Forge collects real reasoning; Hermes writes all learning into weights. Reinforcement learning is not just a training stage, but the core protocol connecting data, environment, models, and infrastructure in the Nous architecture, making Hermes a living system capable of continuous self-improvement on an open computing network.

Gradient Network: Reinforcement Learning Architecture Echo

Gradient Network aims to rebuild AI compute via an Open Intelligence Stack: a modular set of interoperable protocols spanning P2P communication (Lattica), distributed inference (Parallax), decentralized RL training (Echo), verification (VeriLLM), simulation (Mirage), and higher-level memory and agent coordination — together forming an evolving decentralized intelligence infrastructure.

Echo — Reinforcement Learning Training Architecture

Echo is Gradient’s reinforcement learning framework. Its core design principle lies in decoupling training, inference, and data (reward) pathways in reinforcement learning, running them separately in heterogeneous Inference Swarm and Training Swarm, maintaining stable optimization behavior across wide-area heterogeneous environments with lightweight synchronization protocols. This effectively mitigates the SPMD failures and GPU utilization bottlenecks caused by mixing inference and training in traditional DeepSpeed RLHF / VERL.

Echo uses an “Inference-Training Dual Swarm Architecture” to maximize computing power utilization. The two swarms run independently without blocking each other:

Maximize Sampling Throughput: The Inference Swarm consists of consumer-grade GPUs and edge devices, building high-throughput samplers via pipeline-parallel with Parallax, focusing on trajectory generation.
Maximize Gradient Computing Power: The Training Swarm can run on centralized clusters or globally distributed consumer-grade GPU networks, responsible for gradient updates, parameter synchronization, and LoRA fine-tuning, focusing on the learning process.

To maintain policy and data consistency, Echo provides two types of lightweight synchronization protocols: Sequential and Asynchronous, managing bidirectional consistency of policy weights and trajectories:

Sequential Pull Mode (Accuracy First): The training side forces inference nodes to refresh the model version before pulling new trajectories to ensure trajectory freshness, suitable for tasks highly sensitive to policy staleness.
Asynchronous Push–Pull Mode (Efficiency First): The inference side continuously generates trajectories with version tags, and the training side consumes them at its own pace. The coordinator monitors version deviation and triggers weight refreshes, maximizing device utilization.

At the bottom layer, Echo is built upon Parallax (heterogeneous inference in low-bandwidth environments) and lightweight distributed training components (e.g., VERL), relying on LoRA to reduce cross-node synchronization costs, enabling reinforcement learning to run stably on global heterogeneous networks.

Grail: Reinforcement Learning in the Bittensor Ecosystem

Bittensor constructs a huge, sparse, non-stationary reward function network through its unique Yuma consensus mechanism.

Covenant AI in the Bittensor ecosystem builds a vertically integrated pipeline from pre-training to RL post-training through SN3 Templar, SN39 Basilica, and SN81 Grail. Among them, SN3 Templar is responsible for base model pre-training, SN39 Basilica provides a distributed computing power market, and SN81 Grail serves as the “verifiable inference layer” for RL post-training, carrying the core processes of RLHF / RLAIF and completing the closed-loop optimization from base model to aligned policy.

GRAIL cryptographically verifies RL rollouts and binds them to model identity, enabling trustless RLHF. It uses deterministic challenges to prevent pre-computation, low-cost sampling and commitments to verify rollouts, and model fingerprinting to detect substitution or replay — establishing end-to-end authenticity for RL inference trajectories.

Grail’s subnet implements a verifiable GRPO-style post-training loop: miners produce multiple reasoning paths, validators score correctness and reasoning quality, and normalized results are written on-chain. Public tests raised Qwen2.5–1.5B MATH accuracy from 12.7% to 47.6%, showing both cheat resistance and strong capability gains; in Covenant AI, Grail serves as the trust and execution core for decentralized RLVR/RLAIF.

Fraction AI: Competition-Based Reinforcement Learning RLFC

Fraction AI reframes alignment as Reinforcement Learning from Competition, using gamified labeling and agent-versus-agent contests. Relative rankings and AI judge scores replace static human labels, turning RLHF into a continuous, competitive multi-agent game.

Core Differences Between Traditional RLHF and Fraction AI’s RLFC:

RLFC’s core value is that rewards come from evolving opponents and evaluators, not a single model, reducing reward hacking and preserving policy diversity. Space design shapes the game dynamics, enabling complex competitive and cooperative behaviors.

In system architecture, Fraction AI disassembles the training process into four key components:

Agents: Lightweight policy units based on open-source LLMs, extended via QLoRA with differential weights for low-cost updates.
Spaces: Isolated task domain environments where agents pay to enter and earn rewards by winning.
AI Judges: Immediate reward layer built with RLAIF, providing scalable, decentralized evaluation.
Proof-of-Learning: Binds policy updates to specific competition results, ensuring the training process is verifiable and cheat-proof.

Fraction AI functions as a human–machine co-evolution engine: users act as meta-optimizers guiding exploration, while agents compete to generate high-quality preference data, enabling trustless, commercialized fine-tuning.

Comparison of Web3 Reinforcement Learning Project Architectures

V. The Path and Opportunity of Reinforcement Learning × Web3

Across these frontier projects, despite differing entry points, RL combined with Web3 consistently converges on a shared “decoupling–verification–incentive” architecture — an inevitable outcome of adapting reinforcement learning to decentralized networks.

General Architecture Features of Reinforcement Learning: Solving Core Physical Limits and Trust Issues

Decoupling of Rollouts & Learning (Physical Separation of Inference/Training) — Default Computing Topology: Communication-sparse, parallelizable Rollouts are outsourced to global consumer-grade GPUs, while high-bandwidth parameter updates are concentrated in a few training nodes. This is true from Prime Intellect’s asynchronous Actor–Learner to Gradient Echo’s dual-swarm architecture.
Verification-Driven Trust — Infrastructuralization: In permissionless networks, computational authenticity must be forcibly guaranteed through mathematics and mechanism design. Representative implementations include Gensyn’s PoL, Prime Intellect’s TopLoc, and Grail’s cryptographic verification.
Tokenized Incentive Loop — Market Self-Regulation: Computing supply, data generation, verification sorting, and reward distribution form a closed loop. Rewards drive participation, and Slashing suppresses cheating, keeping the network stable and continuously evolving in an open environment.

Differentiated Technical Paths: Different “Breakthrough Points” Under Consistent Architecture

Although architectures are converging, projects choose different technical moats based on their DNA:

Algorithm Breakthrough School (Nous Research): Tackles distributed training’s bandwidth bottleneck at the optimizer level — DisTrO compresses gradient communication by orders of magnitude, aiming to enable large-model training over home broadband.
Systems Engineering School (Prime Intellect, Gensyn, Gradient): Focuses on building the next generation “AI Runtime System.” Prime Intellect’s ShardCast and Gradient’s Parallax are designed to squeeze the highest efficiency out of heterogeneous clusters under existing network conditions through extreme engineering means.
Market Game School (Bittensor, Fraction AI): Focuses on the design of Reward Functions. By designing sophisticated scoring mechanisms, they guide miners to spontaneously find optimal strategies to accelerate the emergence of intelligence.

Advantages, Challenges, and Endgame Outlook

Under the paradigm of Reinforcement Learning combined with Web3, system-level advantages are first reflected in the rewriting of cost structures and governance structures.

Cost Reshaping: RL Post-training has unlimited demand for sampling (Rollout). Web3 can mobilize global long-tail computing power at extremely low costs, a cost advantage difficult for centralized cloud providers to match.
Sovereign Alignment: Breaking the monopoly of big tech on AI values (Alignment). The community can decide “what is a good answer” for the model through Token voting, realizing the democratization of AI governance.

At the same time, this system faces two structural constraints:

Bandwidth Wall: Despite innovations like DisTrO, physical latency still limits the full training of ultra-large parameter models (70B+). Currently, Web3 AI is more limited to fine-tuning and inference.
Reward Hacking (Goodhart’s Law): In highly incentivized networks, miners are extremely prone to “overfitting” reward rules (gaming the system) rather than improving real intelligence. Designing cheat-proof robust reward functions is an eternal game.
Malicious Byzantine workers: refer to the deliberate manipulation and poisoning of training signals to disrupt model convergence. The core challenge is not the continual design of cheat-resistant reward functions, but mechanisms with adversarial robustness.

RL and Web3 are reshaping intelligence via decentralized rollout networks, on-chain assetized feedback, and vertical RL agents with direct value capture. The true opportunity is not a decentralized OpenAI, but new intelligence production relations — open compute markets, governable rewards and preferences, and shared value across trainers, aligners, and users.

Disclaimer: This article was completed with the assistance of AI tools ChatGPT-5 and Gemini 3. The author has made every effort to proofread and ensure information authenticity and accuracy, but omissions may still exist. Please understand. It should be specially noted that the crypto asset market often experiences divergences between project fundamentals and secondary market price performance. The content of this article is for information integration and academic/research exchange only and does not constitute any investment advice, nor should it be considered a recommendation to buy or sell any tokens.

强化学习：去中心化 AI 网络的范式变迁

BroadNotes by 0xjacobzhao — Tue, 23 Dec 2025 01:32:20 GMT

作者：0xjacobzhao | https://linktr.ee/0xjacobzhao

本独立研报由IOSG Ventures支持，研究与写作过程受 Sam Lehman（Pantera Capital） 强化学习研报的启发，感谢 Ben Fielding (Gensyn.ai), Gao Yuan(Gradient), Samuel Dare & Erfan Miahi (Covenant AI), Shashank Yadav (Fraction AI), Chao Wang 对本文提出的宝贵建议。本文力求内容客观准确，部分观点涉及主观判断，难免存在偏差，敬请读者予以理解。

人工智能正从以“模式拟合”为主的统计学习，迈向以“结构化推理”为核心的能力体系，后训练（Post-training）的重要性快速上升。DeepSeek-R1 的出现标志着强化学习在大模型时代的范式级翻身，行业共识形成：预训练构建模型的通用能力基座，强化学习不再只是价值对齐工具，而被证明能够系统提升推理链质量与复杂决策能力，正逐步演化为持续提升智能水平的技术路径。

与此同时，Web3 正通过去中心化算力网络与加密激励体系重构 AI 的生产关系，而强化学习对 rollout 采样、奖励信号与可验证训练的结构性需求，恰与区块链的算力协作、激励分配与可验证执行天然契合。本研报将系统拆解 AI 训练范式与强化学习技术原理，论证强化学习 × Web3 的结构优势，并对 Prime Intellect、Gensyn、Nous Research、Gradient、Grail和Fraction AI等项目进行分析。

一. AI 训练的三阶段：预训练、指令微调与后训练对齐

现代大语言模型（LLM）训练全生命周期通常被划分为三个核心阶段：预训练（Pre-training）、监督微调（SFT）和后训练（Post-training/RL）。三者分别承担“构建世界模型 — 注入任务能力 — 塑造推理与价值观”的功能，其计算结构、数据要求与验证难度决定了去中心化的匹配程度。

预训练（Pre-training） 通过大规模自监督学习（Self-supervised Learning）构建模型的语言统计结构与跨模态世界模型，是 LLM 能力的根基。此阶段需在万亿级语料上以全局同步方式训练，依赖数千至数万张 H100 的同构集群，成本占比高达 80–95%，对带宽与数据版权极度敏感，因此必须在高度集中式环境中完成。
微调（Supervised Fine-tuning）用于注入任务能力与指令格式，数据量小、成本占比约 5–15%，微调既可以进行全参训练，也可以采用参数高效微调（PEFT）方法，其中 LoRA、Q-LoRA 与 Adapter 是工业界主流。但仍需同步梯度，使其去中心化潜力有限。
后训练（Post-training）由多个迭代子阶段构成，决定模型的推理能力、价值观与安全边界，其方法既包括强化学习体系（RLHF、RLAIF、GRPO）也包括无 RL 的偏好优化方法（DPO），以及过程奖励模型（PRM）等。该阶段数据量与成本较低（5–10%），主要集中在 Rollout 与策略更新；其天然支持异步与分布式执行，节点无需持有完整权重，结合可验证计算与链上激励可形成开放的去中心化训练网络，是最适配 Web3 的训练环节。

二. 强化学习技术全景：架构、框架与应用

2.1 强化学习的系统架构与核心环节

强化学习（Reinforcement Learning, RL）通过“环境交互 — 奖励反馈 — 策略更新”驱动模型自主改进决策能力，其核心结构可视为由状态、动作、奖励与策略构成的反馈闭环。一个完整的 RL 系统通常包含三类组件：Policy（策略网络）、Rollout（经验采样）与 Learner（策略更新器）。策略与环境交互生成轨迹，Learner 根据奖励信号更新策略，从而形成持续迭代、不断优化的学习过程：

策略网络（Policy）：从环境状态生成动作，是系统的决策核心。训练时需集中式反向传播维持一致性；推理时可分发至不同节点并行运行。
经验采样（Rollout）：节点根据策略执行环境交互，生成状态 — 动作 — 奖励等轨迹。该过程高度并行、通信极低，对硬件差异不敏感是最适合在去中心化中扩展的环节。
学习器（Learner）：聚合全部 Rollout 轨迹并执行策略梯度更新，是唯一对算力、带宽要求最高的模块，因此通常保持中心化或轻中心化部署以确保收敛稳定性。

2.2 强化学习阶段框架（RLHF → RLAIF → PRM → GRPO）

强化学习通常可分为五个阶段，整体流程如下所述：

数据生成阶段（Policy Exploration）：在给定输入提示的条件下，策略模型 πθ 生成多条候选推理链或完整轨迹，为后续偏好评估与奖励建模提供样本基础，决定了策略探索的广度。
偏好反馈阶段（RLHF / RLAIF）：
RLHF（Reinforcement Learning from Human Feedback）通过多候选回答、人工偏好标注、训练奖励模型（RM）并用 PPO 优化策略，使模型输出更符合人类价值观，是 GPT-3.5 → GPT-4 的关键一环
RLAIF（Reinforcement Learning from AI Feedback）以 AI Judge 或宪法式规则替代人工标注，实现偏好获取自动化，显著降低成本并具备规模化特性，已成为 Anthropic、OpenAI、DeepSeek 等的主流对齐范式。
奖励建模阶段（Reward Modeling）：偏好对输入奖励模型，学习将输出映射为奖励。RM 教模型“什么是正确答案”，PRM 教模型“如何进行正确推理”。
RM（Reward Model）用于评估最终答案的好坏，仅对输出打分：
过程奖励模型PRM（Process Reward Model）它不再只评估最终答案，而是为每一步推理、每个 token、每个逻辑段打分，也是 OpenAI o1 与 DeepSeek-R1 的关键技术，本质上是在“教模型如何思考”。
奖励验证阶段（RLVR / Reward Verifiability）：在奖励信号生成与使用过程中引入“可验证约束”，使奖励尽可能来自可复现的规则、事实或共识，从而降低 reward hacking 与偏差风险，并提升在开放环境中的可审计性与可扩展性。
策略优化阶段（Policy Optimization）：是在奖励模型给出的信号指导下更新策略参数 θ，以得到更强推理能力、更高安全性与更稳定行为模式的策略 πθ′。主流优化方式包括：
PPO（Proximal Policy Optimization）： RLHF 的传统优化器，以稳定性见长，但在复杂推理任务中往往面临收敛慢、稳定性不足等局限。
GRPO（Group Relative Policy Optimization）：是 DeepSeek-R1 的核心创新，通过对候选答案组内优势分布进行建模以估计期望价值，而非简单排序。该方法保留了奖励幅度信息，更适合推理链优化，训练过程更稳定，被视为继 PPO 之后面向深度推理场景的重要强化学习优化框架。
DPO（Direct Preference Optimization）：非强化学习的后训练方法：不生成轨迹、不建奖励模型，而是直接在偏好对上做优化，成本低、效果稳定，因而被广泛用于 Llama、Gemma 等开源模型的对齐，但不提升推理能力。
新策略部署阶段（New Policy Deployment）：经过优化后的模型表现为：更强的推理链生成能力（System-2 Reasoning）、更符合人类或 AI 偏好的行为、更低的幻觉率、更高的安全性。模型在持续迭代中不断学习偏好、优化过程、提升决策质量，形成闭环。

2.3 强化学习的产业应用五大分类

强化学习（Reinforcement Learning）已从早期的博弈智能演进为跨产业的自主决策核心框架，其应用场景按照技术成熟度与产业落地程度，可归纳为五大类别，并在各自方向推动了关键突破。

博弈与策略系统（Game & Strategy）：是 RL 最早被验证的方向，在 AlphaGo、AlphaZero、AlphaStar、OpenAI Five 等“完美信息 + 明确奖励”的环境中，RL 展示了可与人类专家比肩甚至超越的决策智能，为现代 RL 算法奠定基础。
机器人与具身智能（Embodied AI）：RL 通过连续控制、动力学建模与环境交互，使机器人学习操控、运动控制和跨模态任务（如 RT-2、RT-X），正快速迈向产业化，是现实世界机器人落地的关键技术路线。
数字推理（Digital Reasoning / LLM System-2）：RL + PRM 推动大模型从“语言模仿”走向“结构化推理”，代表成果包括 DeepSeek-R1、OpenAI o1/o3、Anthropic Claude 及 AlphaGeometry，其本质是在推理链层面进行奖励优化，而非仅评估最终答案。
自动化科学发现与数学优化（Scientific Discovery）：RL 在无标签、复杂奖励与巨大搜索空间中寻找最优结构或策略，已实现 AlphaTensor、AlphaDev、Fusion RL 等基础突破，展现出超越人类直觉的探索能力。
经济决策与交易系统（Economic Decision-making & Trading）：RL 被用于策略优化、高维风险控制与自适应交易系统生成，相较传统量化模型更能在不确定环境中持续学习，是智能金融的重要构成部分。

三. 强化学习与 Web3 的天然匹配

强化学习（RL）与 Web3 的高度契合，源于二者本质上都是“激励驱动系统”。RL 依赖奖励信号优化策略，区块链依靠经济激励协调参与者行为，使两者在机制层面天然一致。RL 的核心需求 — — 大规模异构 Rollout、奖励分配与真实性验证 — — 正是 Web3 的结构优势所在。

推理与训练解耦：强化学习的训练过程可明确拆分为两个阶段：

Rollout (探索采样)：模型基于当前策略生成大量数据，计算密集型但通信稀疏型的任务。它不需要节点间频繁通信，适合在全球分布的消费级 GPU 上并行生成。
Update (参数更新)：基于收集到的数据更新模型权重，需高带宽中心化节点完成。

“推理 — 训练解耦”天然契合去中心化的异构算力结构：Rollout 可外包给开放网络，通过代币机制按贡献结算，而模型更新保持集中化以确保稳定性。

可验证性 (Verifiability)：ZK 与 Proof-of-Learning 提供了验证节点是否真实执行推理的手段，解决了开放网络中的诚实性问题。在代码、数学推理等确定性任务中，验证者只需检查答案即可确认工作量，大幅提升去中心化 RL 系统的可信度。
激励层，基于代币经济的反馈生产机制：Web3 的代币机制可直接奖励 RLHF/RLAIF 的偏好反馈贡献者，使偏好数据生成具备透明、可结算、无需许可的激励结构；质押与削减（Staking/Slashing）进一步约束反馈质量，形成比传统众包更高效且对齐的反馈市场。
多智能体强化学习（MARL）潜力：区块链本质上是公开、透明、持续演化的多智能体环境，账户、合约与智能体不断在激励驱动下调整策略，使其天然具备构建大规模 MARL 实验场的潜力。尽管仍在早期，但其状态公开、执行可验证、激励可编程的特性，为未来 MARL 的发展提供了原则性优势。

四. 经典 Web3 + 强化学习项目解析

基于上述理论框架，我们将对当前生态中最具代表性的项目进行简要分析：

Prime Intellect: 异步强化学习范式 prime-rl

Prime Intellect 致力于构建全球开放算力市场，降低训练门槛、推动协作式去中心化训练，并发展完整的开源超级智能技术栈。其体系包括：Prime Compute（统一云/分布式算力环境）、INTELLECT 模型家族（10B–100B+）、开放强化学习环境中心（Environments Hub）、以及大规模合成数据引擎（SYNTHETIC-1/2）。

Prime Intellect 核心基础设施组件prime-rl 框架专为异步分布式环境设计与强化学习高度相关，其余包括突破带宽瓶颈的 OpenDiLoCo 通信协议、保障计算完整性的 TopLoc 验证机制等。

Prime Intellect 核心基础设施组件一览

技术基石：prime-rl 异步强化学习框架

prime-rl 是 Prime Intellect 的核心训练引擎，专为大规模异步去中心化环境设计，通过 Actor–Learner 完全解耦实现高吞吐推理与稳定更新。执行者(Rollout Worker) 与 学习者(Trainer) 不再同步阻塞，节点可随时加入或退出，只需持续拉取最新策略并上传生成数据即可：

执行者 Actor (Rollout Workers)：负责模型推理和数据生成。Prime Intellect 创新性地在 Actor 端集成了 vLLM 推理引擎。vLLM 的 PagedAttention 技术和连续批处理（Continuous Batching）能力，使得 Actor 能够以极高的吞吐量生成推理轨迹。
学习者 Learner (Trainer)：负责策略优化。Learner 从共享的经验回放缓冲区（Experience Buffer）中异步拉取数据进行梯度更新，无需等待所有 Actor 完成当前批次。
协调器 (Orchestrator)：负责调度模型权重与数据流。

prime-rl 的关键创新点：

完全异步（True Asynchrony）：prime-rl 摒弃传统 PPO 的同步范式，不等待慢节点、无需批次对齐，使任意数量与性能的 GPU 都能随时接入，奠定去中心化 RL 的可行性。
深度集成 FSDP2 与 MoE：通过 FSDP2 参数切片与 MoE 稀疏激活，prime-rl 让百亿级模型在分布式环境中高效训练，Actor 仅运行活跃专家，大幅降低显存与推理成本。
GRPO+（Group Relative Policy Optimization）：GRPO 免除 Critic 网络，显著减少计算与显存开销，天然适配异步环境，prime-rl 的 GRPO+ 更通过稳定化机制确保高延迟条件下的可靠收敛。

INTELLECT 模型家族：去中心化 RL 技术成熟度的标志

INTELLECT-1（10B，2024年10月）首次证明 OpenDiLoCo 能在跨三大洲的异构网络中高效训练（通信占比 <2%、算力利用率 98%），打破跨地域训练的物理认知；
INTELLECT-2（32B，2025年4月）作为首个 Permissionless RL 模型，验证 prime-rl 与 GRPO+ 在多步延迟、异步环境中的稳定收敛能力，实现全球开放算力参与的去中心化 RL；
INTELLECT-3（106B MoE，2025年11月）采用仅激活 12B 参数的稀疏架构，在 512×H200 上训练并实现旗舰级推理性能（AIME 90.8%、GPQA 74.4%、MMLU-Pro 81.9% 等），整体表现已逼近甚至超越规模远大于自身的中心化闭源模型。

Prime Intellect 此外还构建了数个支撑性基础设施：OpenDiLoCo 通过时间稀疏通信与量化权重差，将跨地域训练的通信量降低数百倍，使 INTELLECT-1 在跨三洲网络仍保持 98% 利用率；TopLoc + Verifiers 形成去中心化可信执行层，以激活指纹与沙箱验证确保推理与奖励数据的真实性；SYNTHETIC 数据引擎 则生产大规模高质量推理链，并通过流水线并行让 671B 模型在消费级 GPU 集群上高效运行。这些组件为去中心化 RL 的数据生成、验证与推理吞吐提供了关键的工程底座。INTELLECT 系列证明了这一技术栈可产生成熟的世界级模型，标志着去中心化训练体系从概念阶段进入实用阶段。

Gensyn：强化学习核心栈RL Swarm与SAPO

Gensyn 的目标是将全球闲置算力汇聚成一个开放、无需信任、可无限扩展的 AI 训练基础设施。其核心包括跨设备标准化执行层、点对点协调网络与无需信任的任务验证系统，并通过智能合约自动分配任务与奖励。围绕强化学习的特点，Gensyn 引入 RL Swarm、SAPO 与 SkipPipe 等核心机制等机制，将生成、评估、更新三个环节解耦，利用全球异构 GPU 组成的“蜂群”实现集体进化。其最终交付的不是单纯的算力，而是可验证的智能（Verifiable Intelligence）。

Gensyn堆栈的强化学习应用

RL Swarm：去中心化的协作式强化学习引擎

RL Swarm 展示了一种全新的协作模式。它不再是简单的任务分发，而是一个模拟人类社会学习的去中心化的“生成 — 评估 — 更新”循环，类比协作式学习过程，无限循环：

Solvers（执行者）： 负责本地模型推理与 Rollout 生成，节点异构无碍。Gensyn 在本地集成高吞吐推理引擎（如 CodeZero），可输出完整轨迹而非仅答案。
Proposers（出题者）： 动态生成任务（数学题、代码问题等），支持任务多样性与类 Curriculum Learning 的难度自适应。
Evaluators（评估者）： 使用冻结的“裁判模型”或规则对本地 Rollout 进行评估，生成本地奖励信号。评估过程可被审计，减少作恶空间。

三者共同组成一个 P2P 的 RL 组织结构，无需中心化调度即可完成大规模协作学习。

SAPO：为去中心化重构的策略优化算法： SAPO（Swarm Sampling Policy Optimization）以“共享 Rollout 并过滤无梯度信号样本，而非共享梯度”为核心，通过大规模去中心化的 Rollout 采样，并将接收的 Rollout 视为本地生成，从而在无中心协调、节点延迟差异显著的环境中保持稳定收敛。相较依赖 Critic 网络、计算成本较高的 PPO，或基于组内优势估计的 GRPO，SAPO 以极低带宽使消费级 GPU 也能有效参与大规模强化学习优化。

通过 RL Swarm 与 SAPO，Gensyn 证明了强化学习（尤其是后训练阶段的 RLVR）天然适配去中心化架构 — — 因为其更依赖于大规模、多样化的探索（Rollout），而非高频参数同步。结合 PoL 与 Verde 的验证体系，Gensyn 为万亿级参数模型的训练提供了一条不再依赖单一科技巨头的替代路径：一个由全球数百万异构 GPU 组成的、自我演化的超级智能网络。

Nous Research：可验证强化学习环境Atropos

Nous Research在构建一套 去中心化、可自我进化的认知基础设施。其核心组件 — — Hermes、Atropos、DisTrO、Psyche 与 World Sim被组织成一个持续闭环的智能演化系统。不同于传统“预训练 — 后训练 — 推理”线性流程，Nous 采用 DPO、GRPO、拒绝采样等强化学习技术，将数据生成、验证、学习与推理统一为连续反馈回路，打造持续自我改进的闭环 AI 生态。

Nous Research 组件总览

模型层：Hermes 与推理能力的演进

Hermes 系列是 Nous Research 面向用户的主要模型接口，其演进清晰展示了行业从传统 SFT/DPO 对齐向推理强化学习（Reasoning RL）迁移的路径：

Hermes 1–3：指令对齐与早期代理能力：Hermes 1–3 依靠低成本 DPO 完成稳健指令对齐，并在 Hermes 3 借助合成数据与首次引入的 Atropos 验证机制。
Hermes 4 / DeepHermes：通过思维链将 System-2 式慢思考写入权重，以 Test-Time Scaling 提升数学与代码性能，并依赖“拒绝采样 + Atropos 验证”构建高纯度推理数据。
DeepHermes 进一步采用 GRPO 替代难以分布式落地的 PPO，使推理 RL 能在 Psyche 去中心化 GPU 网络上运行，为开源推理 RL 的可扩展化奠定工程基础。

Atropos：可验证奖励驱动的强化学习环境

Atropos 是 Nous RL 体系的真正枢纽。它将提示、工具调用、代码执行和多轮交互封装成标准化 RL 环境，可直接验证输出是否正确，从而提供确定性奖励信号，替代昂贵且不可扩展的人类标注。更重要的是，在去中心化训练网络 Psyche 中，Atropos 充当“裁判”，用于验证节点是否真实提升策略，支持可审计的 Proof-of-Learning，从根本上解决分布式 RL 中的奖励可信性问题。

DisTrO 与 Psyche：去中心化强化学习的优化器层

传统 RLF（RLHF/RLAIF）训练依赖中心化高带宽集群，这是开源无法复制的核心壁垒。DisTrO 通过动量解耦与梯度压缩，将 RL 的通信成本降低几个数量级，使训练能够在互联网带宽上运行；Psyche 则将这一训练机制部署在链上网络，使节点可以在本地完成推理、验证、奖励评估与权重更新，形成完整的 RL 闭环。

在 Nous 的体系中， Atropos 验证思维链；DisTrO 压缩训练通信；Psyche 运行 RL 循环；World Sim 提供复杂环境；Forge 采集真实推理；Hermes 将所有学习写入权重。强化学习不仅是一个训练阶段，而是 Nous 架构中连接数据、环境、模型与基础设施的核心协议，让 Hermes成为一个能在开源算力网络上持续自我改进的活体系统。

Gradient Network：强化学习架构Echo

Gradient Network 核心愿景是通过“开放智能协议栈”（Open Intelligence Stack）重构 AI 的计算范式。Gradient 的技术栈由一组可独立演化、又异构协同的核心协议组成。其体系从底层通信到上层智能协作依次包括：Parallax（分布式推理）、Echo（去中心化 RL 训练）、Lattica（P2P 网络）、SEDM / Massgen / Symphony / CUAHarm（记忆、协作、安全）、VeriLLM（可信验证）、Mirage（高保真仿真），共同构成持续演化的去中心化智能基础设施。

Echo — 强化学习训练架构

Echo 是 Gradient 的强化学习框架，其核心设计理念在于解耦强化学习中的训练、推理与数据（奖励）路径，使 Rollout 生成、策略优化与奖励评估能够在异构环境中独立扩展与调度。在由推理侧与训练侧节点组成的异构网络中协同运行，以轻量同步机制在广域异构环境中维持训练稳定性，有效缓解传统 DeepSpeed RLHF / VERL 中推理与训练混跑导致的 SPMD 失效与 GPU 利用率瓶颈。

Echo 采用“推理–训练双群架构”实现算力利用最大化，双群各自独立运行，互不阻塞：

最大化采样吞吐：推理群 Inference Swarm 由消费级 GPU 与边缘设备组成，通过 Parallax 以 pipeline‐parallel 构建高吞吐采样器，专注于轨迹生成；
最大化梯度算力：训练群Training Swarm 由可运行于中心化集群或全球多地的消费级 GPU 网络，负责梯度更新、参数同步与 LoRA 微调，专注于学习过程。

为维持策略与数据的一致性，Echo 提供 顺序（Sequential） 与异步（Asynchronous） 两类轻量级同步协议，实现策略权重与轨迹的双向一致性管理：

顺序拉取（Pull）模式｜精度优先 ：训练侧在拉取新轨迹前强制推理节点刷新模型版本，从而确保轨迹新鲜度，适合对策略陈旧高度敏感的任务；
异步推拉（Push–Pull）模式｜效率优先：推理侧持续生成带版本标签的轨迹，训练侧依自身节奏消费，协调器监控版本偏差并触发权重刷新，最大化设备利用率。

在底层，Echo 构建于 Parallax（低带宽环境下的异构推理）与轻量化分布式训练组件（如 VERL)之上，依赖 LoRA 降低跨节点同步成本，使强化学习可在全球异构网络上稳定运行。

Grail：Bittensor 生态的强化学习

Bittensor 通过其独特的 Yuma 共识机制，构建了一个巨大的、稀疏的、非平稳的奖励函数网络。

Bittensor生态中的Covenant AI 则通过 SN3 Templar、SN39 Basilica 与 SN81 Grail 构建了从预训练到 RL 后训练的垂直一体化流水线。其中，SN3 Templar 负责基础模型的预训练，SN39 Basilica 提供分布式算力市场，SN81 Grail 则作为面向 RL 后训练的“可验证推理层”，承载 RLHF / RLAIF 的核心流程，完成从基础模型到对齐策略的闭环优化。

GRAIL目标是以密码学方式证明每条强化学习 rollout 的真实性与模型身份绑定，确保 RLHF 能够在无需信任的环境中被安全执行。协议通过三层机制建立可信链条：

确定性挑战生成：利用 drand 随机信标与区块哈希生成不可预测但可复现的挑战任务（如 SAT、GSM8K），杜绝预计算作弊；
通过 PRF 索引采样与 sketch commitments，使验证者以极低成本抽检 token-level logprob 与推理链，确认 rollout 确由声明模型生成；
模型身份绑定：将推理过程与模型权重指纹及 token 分布的结构性签名绑定，确保替换模型或结果重放都会被立即识别。由此，为 RL 中推理轨迹（rollout）提供了真实性根基。

在此机制上，Grail 子网实现了 GRPO 风格的可验证后训练流程：矿工为同一题目生成多条推理路径，验证者依据正确性、推理链质量与 SAT 满足度评分，并将归一化结果写入链上，作为 TAO 权重。公开实验显示，该框架已将 Qwen2.5–1.5B 的 MATH 准确率从 12.7% 提升至 47.6%，证明其既能防作弊，也能显著强化模型能力。在 Covenant AI 的训练栈中，Grail 是去中心化 RLVR/RLAIF 的信任与执行基石，目前尚未正式主网上线。

Fraction AI：基于竞争的强化学习RLFC

Fraction AI 的架构明确围绕 竞争强化学习（Reinforcement Learning from Competition, RLFC） 和游戏化数据标注构建，将传统 RLHF 的静态奖励与人工标注替换为开放、动态的竞争环境。代理在不同 Spaces 中对抗，其相对排名与 AI 法官评分共同构成实时奖励，使对齐过程演变为持续在线的多智能体博弈系统。

传统RLHF与Fraction AI的RLFC之间的核心差异：

RLFC 的核心价值在于奖励不再来自单一模型，而来自不断演化的对手与评估者，避免奖励模型被利用，并通过策略多样性防止生态陷入局部最优。Spaces 的结构决定博弈性质（零和或正和），在对抗与协作中推动复杂行为涌现。

在系统架构上，Fraction AI 将训练过程拆解为四个关键组件：

Agents：基于开源 LLM 的轻量策略单元，通过 QLoRA 以差分权重扩展，低成本更新；
Spaces：隔离的任务域环境，代理付费进入并以胜负获得奖励；
AI Judges：以 RLAIF 构建的即时奖励层，提供可扩展、去中心化的评估；
Proof-of-Learning：将策略更新绑定到具体竞争结果，确保训练过程可验证、防作弊。

Fraction AI 的本质是构建了一个人机协同的进化引擎”。用户作为策略层的“元优化者” (Meta-optimizer)，通过提示工程（Prompt Engineering）和超参配置引导探索方向；而代理在微观的竞争中自动生成海量的高质量偏好数据对 (Preference Pairs)。这种模式让数据标注通过 “去信任化微调” (Trustless Fine-tuning) 实现了商业闭环。

强化学习 Web3项目架构比较

五. 总结与展望：强化学习 × Web3 的路径与机会

基于对上述前沿项目的解构分析，我们观察到：尽管各团队的切入点（算法、工程或市场）各异，但当强化学习（RL）与 Web3 结合时，其底层架构逻辑皆收敛为一个高度一致的“解耦-验证-激励”范式。这不仅是技术上的巧合，更是去中心化网络适配强化学习独特属性的必然结果。

强化学习通用架构特征：解决核心的物理限制与信任问题

推训物理分离 (Decoupling of Rollouts & Learning) — — 默认计算拓扑

通信稀疏、可并行的 Rollout 外包给全球消费级 GPU，高带宽的参数更新集中于少量训练节点，从 Prime Intellect 的异步 Actor–Learner 到 Gradient Echo 的双群架构皆如此。

验证驱动的信任层 (Verification-Driven Trust) — — 基础设施化

在无需许可的网络中，计算真实性必须通过数学与机制设计强制保障，代表实现包括 Gensyn 的 PoL、Prime Intellect 的 TOPLOC 与 Grail 的密码学验证。

代币化的激励闭环 (Tokenized Incentive Loop) — — 市场自我调节

算力供给、数据生成、验证排序与奖励分配形成闭环，通过奖励驱动参与、通过 Slash 抑制作弊，使网络在开放环境中依然保持稳定与持续演进。

差异化技术路径：一致架构下的不同“突破点”

尽管架构趋同，但各项目根据自身基因选择了不同的技术护城河：

算法突破派 (Nous Research)：试图从数学底层解决分布式训练的根本矛盾（带宽瓶颈）。其 DisTrO 优化器旨在将梯度通信量压缩数千倍，目标是让家庭宽带也能跑得动大模型训练，这是对物理限制的“降维打击”。
系统工程派 (Prime Intellect, Gensyn, Gradient)：侧重于构建下一代的“AI 运行时系统”。Prime Intellect的 ShardCast 和 Gradient 的 Parallax 都是为了在现有的网络条件下，通过极致的工程手段压榨出最高的异构集群效率。
市场博弈派 (Bittensor, Fraction AI)：专注奖励函数（Reward Function）的设计。通过设计精妙的评分机制，引导矿工自发寻找最优策略，来加速智能涌现。

优势、挑战与终局展望

在强化学习与 Web3 结合的范式下，系统级优势首先体现在 成本结构与治理结构的重写。

成本重塑：RL 后训练（Post-training）对采样（Rollout）的需求是无限的，Web3 能以极低成本调动全球长尾算力，这是中心化云厂商难以比拟的成本优势。
主权对齐 (Sovereign Alignment)：打破大厂对 AI 价值观（Alignment）的垄断，社区可以通过 Token 投票决定模型“什么是好的回答”，实现 AI 治理的民主化。

与此同时，这一体系也面临两大结构性约束。

带宽墙 (Bandwidth Wall)：尽管有 DisTrO 等创新，物理延迟仍限制了超大参数模型（70B+）的全量训练，目前 Web3 AI 更多局限于微调和推理。
古德哈特定律 (Reward Hacking)：在高度激励的网络中，矿工极易“过拟合”奖励规则（刷分）而非提升真实智能。设计防作弊的鲁棒奖励函数是永恒的博弈。
恶意拜占庭式节点攻击(BYZANTINE worker)：通过对训练信号的主动操纵与投毒破坏模型收敛。核心不在于持续设计防作弊的奖励函数，而在于构建具备对抗性鲁棒性的机制。

强化学习与 Web3 的结合，本质是在重写“智能是如何被生产、对齐并分配价值”的机制。其演进路径可概括为三条互补方向：

去中心化推训网络：从算力矿机到策略网络，将并行且可验证的 Rollout 外包给全球长尾 GPU，短期聚焦可验证推理市场，中期演化为按任务聚类的强化学习子网；
偏好与奖励的资产化：从标注劳工到数据股权。实现偏好与奖励的资产化，将高质量反馈与 Reward Model 变为可治理、可分配的数据资产，从“标注劳工”升级为“数据股权”
垂直领域的“小而美”进化：在结果可验证、收益可量化的垂直场景中孕育小而强的专用 RL Agents，如 DeFi 策略执行、代码生成，使策略改进与价值捕获直接绑定并有望跑赢通用闭源模型。

总体来看，强化学习 × Web3 的真正机会不在于复制一个去中心化版 OpenAI，而在于重写“智能生产关系”：让训练执行成为开放算力市场，让奖励与偏好成为可治理的链上资产，让智能带来的价值不再集中于平台，而在训练者、对齐者与使用者之间重新分配。

免责声明：本文在创作过程中借助了 ChatGPT-5 与Gemini 3的 AI 工具辅助完成，作者已尽力校对并确保信息真实与准确，但仍难免存在疏漏，敬请谅解。需特别提示的是，加密资产市场普遍存在项目基本面与二级市场价格表现背离的情况。本文内容仅用于信息整合与学术/研究交流，不构成任何投资建议，亦不应视为任何代币的买卖推荐。

Machine Economic Order: A Full-Stack Pathway to Agentic Commerce

BroadNotes by 0xjacobzhao — Tue, 16 Dec 2025 06:22:21 GMT

Author: 0xjacobzhao | https://linktr.ee/0xjacobzhao

This independent research report is supported by IOSG Ventures. The research and writing process was inspired by related work from Raghav Agarwal (LongHash) and Jay Yu (Pantera). Thanks to Lex Sokolin @ Generative Ventures , Jordan@AIsa, Ivy @PodOur2Cents for their valuable suggestions on this article. Feedback was also solicited from project teams such as Nevermined, Skyfire, Virtuals Protocol, AIsa, Heurist, AEON during the writing process. This article strives for objective and accurate content, but some viewpoints involve subjective judgment and may inevitably contain deviations. Readers’ understanding is appreciated.

Agentic Commerce refers to a full-process commercial system where AI agents autonomously complete service discovery, credibility judgment, order generation, payment authorization, and final settlement. It no longer relies on step-by-step human operation or information input, but rather involves agents automatically collaborating, placing orders, paying, and fulfilling in a cross-platform and cross-system environment, thereby forming a commercial closed loop of autonomous execution between machines (M2M Commerce).

In the crypto ecosystem, the most practically valuable applications today are concentrated in stablecoin payments and DeFi. Therefore, as AI and Crypto converge, two high-value development paths are emerging:

Short term: AgentFi, built on today’s mature DeFi protocols
Mid to long term: Agent Payment, built around stablecoin settlement and progressively standardized by protocols such as ACP, AP2, x402, and ERC-8004

Agentic Commerce is difficult to scale quickly in the short term due to factors such as protocol maturity, regulatory differences, and merchant/user acceptance. However, from a long-term perspective, payment is the underlying anchor of all commercial closed loops, making Agentic Commerce the most valuable in the long run.

I. Agentic Commerce Payment Scenarios

In the Agentic Commerce system, the real-world merchant network is the largest value scenario. Regardless of how AI Agents evolve, the traditional fiat payment system (Stripe, Visa, Mastercard, bank transfers) and the rapidly growing stablecoin system (USDC, x402) will coexist for a long time, jointly constituting the base of Agentic Commerce.

Traditional Fiat Payment vs. Stablecoin Payment

Real-world merchants — from e-commerce, subscriptions, and SaaS to travel, paid content, and enterprise procurement — carry trillion-dollar demand and are also the core value source for AI Agents to automatically compare prices, renew subscriptions, and procure. In the short term, mainstream consumption and enterprise procurement will still be dominated by the traditional fiat payment system for a long time.

The core obstacle to the scaling of stablecoins in real-world commerce is not just technology, but regulation (KYC/AML, tax, consumer protection), merchant accounting (stablecoins are non-legal tender), and the lack of dispute resolution mechanisms caused by irreversible payments. Due to these structural limitations, it is difficult for stablecoins to enter high-regulation industries such as healthcare, aviation, e-commerce, government, and utilities in the short term. Their implementation will mainly focus on digital content, cross-border payments, Web3 native services, and machine economy (M2M/IoT/Agent) scenarios where regulatory pressure is lower or are native on-chain — this is precisely the opportunity window for Web3-native Agentic Commerce to achieve scale breakthroughs first.

However, regulatory institutionalization is advancing rapidly in 2025: the US stablecoin bill has achieved bipartisan consensus, Hong Kong and Singapore have implemented stablecoin licensing frameworks, the EU MiCA has officially come into effect, Stripe supports USDC, and PayPal has launched PYUSD. The clarity of the regulatory structure means that stablecoins are being accepted by the mainstream financial system, opening up policy space for future cross-border settlement, B2B procurement, and the machine economy.

Best Scenario Matching for Agentic Commerce

The core of Agentic Commerce is not to let one payment rail replace another, but to hand over the execution subject of “order — authorization — payment” to AI Agents, allowing the traditional fiat payment system (AP2, authorization credentials, identity compliance) and the stablecoin system (x402, CCTP, smart contract settlement) to leverage their respective advantages. It is neither a zero-sum competition between fiat and stablecoins nor a substitution narrative of a single rail, but a structural opportunity to expand the capabilities of both: fiat payments continue to support human commerce, while stablecoin payments accelerate machine-native and on-chain native scenarios. The two complement and coexist, becoming the twin engines of the agent economy.

II. Agentic Commerce Protocol Standards

The protocol stack of Agentic Commerce consists of six layers, forming a complete machine commerce link from “capability discovery” to “payment delivery”. A2A Catalog and MCP Registry are responsible for capability discovery, ERC-8004 provides on-chain verifiable identity and reputation; ACP and AP2 undertake structured ordering and authorization instructions respectively; the payment layer is composed of traditional fiat rails (AP2) and stablecoin rails (x402) in parallel; the delivery layer currently has no unified standard.

Discovery Layer: Solves “How Agents discover and understand callable services”. The AI side builds standardized capability catalogs through A2A Catalog and MCP Registry; Web3 relies on ERC-8004 to provide addressable identity guidance. This layer is the entrance to the entire protocol stack.
Trust Layer: Answers “Is the other party credible”. There is no universal standard on the AI side yet. Web3 builds a unified framework for verifiable identity, reputation, and execution records through ERC-8004, which is a key advantage of Web3.
Ordering Layer: Responsible for “How orders are expressed and verified”. ACP (OpenAI × Stripe) provides a structured description of goods, prices, and settlement terms to ensure merchants can fulfill contracts. Since it is difficult to express real-world commercial contracts on-chain, this layer is basically dominated by Web2.
Authorization Layer: Handles “Whether the Agent has obtained legal user authorization”. AP2 binds intent, confirmation, and payment authorization to the real identity system through verifiable credentials. Web3 signatures do not yet have legal effect, so they cannot bear the contract and compliance responsibilities of this layer.
Payment Layer: Decides “Which rail completes the payment”. AP2 covers traditional payment networks such as cards and banks; x402 provides native API payment interfaces for stablecoins, enabling assets like USDC to be embedded in automated calls. The two types of rails form functional complementarity here.
Fulfillment Layer: Answers “How to safely deliver content after payment is completed”. Currently, there is no unified protocol: the real world relies on merchant systems to complete delivery, and Web3’s encrypted access control has not yet formed a cross-ecosystem standard. This layer is still the largest blank in the protocol stack and is most likely to incubate the next generation of infrastructure protocols.

III. Agentic Commerce Core Protocols

Focusing on the five key links of service discovery, trust judgment, structured ordering, payment authorization, and final settlement in Agentic Commerce, institutions such as Google, Anthropic, OpenAI, Stripe, Ethereum, and Coinbase have all proposed underlying protocols in corresponding links, jointly building the core protocol stack of the next generation Agentic Commerce.

Agent-to-Agent (A2A) — Agent Interoperability Protocol (Google)

A2A is an open-source protocol initiated by Google and donated to the Linux Foundation. It aims to provide unified communication and collaboration standards for AI Agents built by different vendors and frameworks. Based on HTTP + JSON-RPC, A2A implements secure, structured message and task exchange, enabling Agents to conduct multi-turn dialogue, collaborative decision-making, task decomposition, and state management in a native way. Its core goal is to build an “Internet of Agents”, allowing any A2A-compatible Agent to be automatically discovered, called, and combined, thereby forming a cross-platform, cross-organization distributed Agent network.

Model Context Protocol (MCP) — Unified Tool Data Access Protocol (Anthropic)

MCP launched by Anthropic, is an open protocol connecting LLM / Agents with external systems, focusing on unified tool and data access interfaces. It abstracts databases, file systems, remote APIs, and proprietary tools into standardized resources, enabling Agents to access external capabilities securely, controllably, and auditably. MCP’s design emphasizes low integration costs and high scalability: developers only need to connect once to let the Agent use the entire tool ecosystem. Currently, MCP has been adopted by many leading AI vendors and has become the de facto standard for agent-tool interaction.

MCP focuses on “How Agents use tools” — providing models with unified and secure external resource access capabilities (such as databases, APIs, file systems, etc.), thereby standardizing agent-tool / agent-data interaction methods.
A2A solves “How Agents collaborate with other Agents” — establishing native communication standards for cross-vendor, cross-framework agents, supporting multi-turn dialogue, task decomposition, state management, and long-lifecycle execution. It is the basic interoperability layer between agents.

Agentic Commerce Protocol (ACP) — Ordering and Checkout Protocol (OpenAI × Stripe)

ACP (Agentic Commerce Protocol) is an open ordering standard (Apache 2.0) proposed by OpenAI and Stripe. It establishes a structured ordering process that can be directly understood by machines for Buyer — AI Agent — Merchant. The protocol covers product information, price and term verification, settlement logic, and payment credential transmission, enabling AI to safely initiate purchases on behalf of users without becoming a merchant itself.

Its core design is: AI calls the merchant’s checkout interface in a standardized way, while the merchant retains full commercial and legal control. ACP enables merchants to enter the AI shopping ecosystem without transforming their systems by using structured orders (JSON Schema / OpenAPI), secure payment tokens (Stripe Shared Payment Token), compatibility with existing e-commerce backends, and supporting REST and MCP publishing capabilities. Currently, ACP has been used for ChatGPT Instant Checkout, becoming an early deployable payment infrastructure.

Agent Payments Protocol (AP2) — Digital Authorization and Payment Instruction Protocol (Google)

AP2 is an open standard jointly launched by Google and multiple payment networks and technology companies. It aims to establish a unified, compliant, and auditable process for AI Agent-led payments. It binds the user’s payment intent, authorization scope, and compliance identity through cryptographically signed digital authorization credentials, providing merchants, payment institutions, and regulators with verifiable evidence of “who is spending money for whom”.

AP2 takes “Payment-Agnostic” as its design principle, supporting credit cards, bank transfers, real-time payments, and accessing stablecoin and other crypto payment rails through extensions like x402. In the entire Agentic Commerce protocol stack, AP2 is not responsible for specific goods and ordering details, but provides a universal Agent payment authorization framework for various payment channels.

ERC-8004 — On-chain Agent Identity / Reputation / Verification Standard (Ethereum)

ERC-8004 is an Ethereum standard jointly proposed by MetaMask, Ethereum Foundation, Google, and Coinbase. It aims to build a cross-platform, verifiable, trustless identity and reputation system for AI Agents. The protocol consists of three on-chain parts:

Identity Registry: Mints a chain identity similar to NFT for each Agent, which can link cross-platform information such as MCP / A2A endpoints, ENS/DID, wallets, etc.
Reputation Registry: Standardizes recording of scores, feedback, and behavioral signals, making the Agent’s historical performance auditable, aggregatable, and composable.
Validation Registry: Supports verification mechanisms such as stake re-execution, zkML, TEE, providing verifiable execution records for high-value tasks.

Through ERC-8004, the Agent’s identity, reputation, and behavior are preserved on-chain, forming a cross-platform discoverable, tamper-proof, and verifiable trust base, which is an important infrastructure for Web3 to build an open and trusted AI economy. ERC-8004 is in the Review stage, meaning the standard is basically stable and feasible, but is still soliciting broad community opinion and has not been finalized.

x402 — Stablecoin Native API Payment Rail (Coinbase)

x402 is an open payment standard (Apache-2.0) proposed by Coinbase. It turns the long-idle HTTP 402 Payment Required into a programmable on-chain payment handshake mechanism, allowing APIs and AI Agents to achieve accountless, frictionless, pay-per-use on-chain settlement without accounts, credit cards, or API Keys.

HTTP 402 Payment Flow. Source: Jay Yu@Pantera Capital

Core Mechanism: The x402 protocol revives the HTTP 402 status code left over from the early internet. Its workflow is:

Request & Negotiation: Client (Agent) initiates request -> Server returns 402 status code and payment parameters (e.g., amount, receiving address).
Autonomous Payment: Agent locally signs the transaction and broadcasts it (usually using stablecoins like USDC), without human intervention.
Verification & Delivery: After the server or third-party “Facilitator” verifies the on-chain transaction, resources are released instantly.

x402 introduces the Facilitator role as middleware connecting Web2 APIs and the Web3 settlement layer. The Facilitator is responsible for handling complex on-chain verification and settlement logic, allowing traditional developers to monetize APIs with minimal code. The server side does not need to run nodes, manage signatures, or broadcast transactions; it only needs to rely on the interface provided by the Facilitator to complete on-chain payment processing. Currently, the most mature Facilitator implementation is provided by the Coinbase Developer Platform.

The technical advantages of x402 are: supporting on-chain micropayments as low as 1 cent, breaking the limitation of traditional payment gateways unable to handle high-frequency small-amount calls in AI scenarios; completely removing accounts, KYC, and API Keys, enabling AI to autonomously complete M2M payment closed loops; and achieving gasless USDC authorized payments through EIP-3009, natively compatible with Base and Solana, possessing multi-chain scalability.

Based on the introduction of the core protocol stack of Agentic Commerce, the following table summarizes the positioning, core capabilities, main limitations, and maturity assessment of the protocols at each level, providing a clear structural perspective for building a cross-platform, executable, and payable agent economy.

IV. Web3 Agentic Commerce Ecosystem Representative Projects

Currently, the Web3 ecosystem of Agentic Commerce can be divided into three layers:

Business Payment Systems Layer (L3): Includes projects like Skyfire, Payman, Catena Labs, Nevermined, providing payment encapsulation, SDK integration, quota and permission governance, human approval, and compliance access. They connect to traditional financial rails (banks, card organizations, PSP, KYC/KYB) to varying degrees, building a bridge between payment business and the machine economy.
Native Payment Protocol Layer (L2): Consists of protocols like x402, Virtual ACP and their ecosystem projects. Responsible for charge requests, payment verification, and on-chain settlement. This is the core that truly achieves automated, end-to-end clearing in the Agent economy. x402 relies completely on no banks, card organizations, or payment service providers, providing on-chain native M2M/A2A payment capabilities.
Infrastructure Layer (L1): Includes Ethereum, Base, Solana, and Kite AI, providing the trusted technical stack base for payment and identity systems, such as on-chain execution environments, key systems, MPC/AA, and permission Runtimes.

L3 — Skyfire: Identity and Payment Credentials for AI Agents

Skyfire takes KYA + Pay as its core, abstracting “Identity Verification + Payment Authorization” into JWT credentials usable by AI, providing verifiable automated access and deduction capabilities for websites, APIs, and MCP services. The system automatically generates Buyer/Seller Agents and custodial wallets for users, supporting top-ups via cards, banks, and USDC.

At the system level, Skyfire generates Buyer/Seller Agents and custodial wallets for each user. Its biggest advantage is full compatibility with Web2 (JWT/JWKS, WAF, API Gateway can be used directly), providing “identity-bearing automated paid access” for content sites, data APIs, and tool SaaS.

Skyfire is a realistically usable Agent Payment middle layer, but identity and asset custody are centralized solutions.

L3 — Payman: AI Native Fund Authority Risk Control

Payman provides four capabilities: Wallet, Payee, Policy, Approval, building a governable and auditable “Fund Authority Layer” for AI. AI can execute real payments, but all fund actions must meet quotas, policies, and approval rules set by users. Core interaction is done through the payman.ask() natural language interface, where the system is responsible for intent parsing, policy verification, and payment execution.

Payman’s key value lies in: “AI can move money, but never oversteps authority.” It migrates enterprise-level fund governance to the AI environment: automated payroll, reimbursement, vendor payments, bulk transfers, etc., can all be completed within clearly defined permission boundaries. Payman is suitable for internal financial automation of enterprises and teams (salary, reimbursement, vendor payment, etc.), positioned as a Controlled Fund Governance Layer, and does not attempt to build an open Agent-to-Agent payment protocol.

L3 — Catena Labs: Agent Identity/Payment Standard

Catena uses AI-Native financial institutions (custody, clearing, risk control, KYA) as the commercial layer and ACK (Agent Commerce Kit) as the standard layer to build the Agent’s unified identity protocol (ACK-ID) and Agent-native payment protocol (ACK-Pay). The goal is to fill the missing verifiable identity, authorization chain, and automated payment standards in the machine economy.

ACK-ID establishes the Agent’s ownership chain and authorization chain based on DID/VC; ACK-Pay defines payment request and verifiable receipt formats decoupled from underlying settlement networks (USDC, Bank, Arc). Catena emphasizes long-term cross-ecosystem interoperability, and its role is closer to the “TLS/EMV layer of the Agent economy”, with strong standardization and a clear vision.

L3 — Nevermined: Metering, Billing and Micropayment Settlement

Nevermined focuses on the AI usage-based economic model, providing Access Control, Metering, Credits System, and Usage Logs for automated metering, pay-per-use, revenue sharing, and auditing. Users can top up credits via Stripe or USDC, and the system automatically verifies usage, deducts fees, and generates auditable logs for each API call.

Its core value lies in supporting sub-cent real-time micropayments and Agent-to-Agent automated settlement, allowing data purchase, API calls, workflow scheduling, etc., to run in a “pay-per-call” manner. Nevermined does not build a new payment rail, but builds a metering/billing layer on top of payment: promoting AI SaaS commercialization in the short term, supporting A2A marketplace in the medium term, and potentially becoming the micropayment fabric of the machine economy in the long term.

Skyfire, Payman, Catena Labs, and Nevermined belong to the business payment layer and all need to connect to banks, card organizations, PSPs, and KYC/KYB to varying degrees. But their real value is not in “accessing fiat”, but in solving machine-native needs that traditional finance cannot cover — identity mapping, permission governance, programmatic risk control, and pay-per-use.

Skyfire (Payment Gateway): Provides “Identity + Auto-deduction” for Websites/APIs (On-chain identity mapping to Web2 identity).
Payman (Financial Governance): Policy, quota, permission, and approval for internal enterprise use (AI can spend money but not overstep).
Catena Labs (Financial Infrastructure): Combines with banking system, building (AI Compliance Bank) through KYA, custody, and clearing services.
Nevermined (Cashier): Does metering and billing on top of payment; payment relies on Stripe/USDC.

In contrast, x402 is at a lower level and is the only native on-chain payment protocol that does not rely on banks, card organizations, or PSPs. It can directly complete on-chain deduction and settlement via the 402 workflow. Upper-layer systems like Skyfire, Payman, and Nevermined can call x402 as a settlement rail, thereby providing Agents with a truly M2M / A2A automated native payment closed loop.

L2 — x402 Ecosystem: From Client to On-chain Settlement

The x402 native payment ecosystem can be divided into four levels: Client, Server, Payment Execution Layer (Facilitators), and Blockchain Settlement Layer. The Client is responsible for allowing Agents or Apps to initiate payment requests; the Server provides data, reasoning, or storage API services to Agents on a per-use basis; the Payment Execution Layer completes on-chain deduction, verification, and settlement, serving as the core execution engine of the entire process; the Blockchain Settlement Layer undertakes the final token deduction and on-chain confirmation, realizing tamper-proof payment finality.

x402 Payment Flow Source: x402 Whitepaper

Client-Side Integrations / The Payers: Enable Agents or Apps to initiate x402 payment requests, the “starting point” of the entire payment process. Representative projects:
thirdweb Client SDK: The most commonly used x402 client standard in the ecosystem, actively maintained, multi-chain support, default tool for developers to integrate x402.
Nuwa AI: Enables AI to directly pay for x402 services without coding, representative project of “Agent Payment Entrance”.
Others like Axios/Fetch, Mogami Java SDK, Tweazy are early clients.
Current status: Existing clients are still in the “SDK Era”, essentially developer tools. More advanced forms like Browser/OS clients, Robot/IoT clients, or Enterprise systems managing multi-wallet/multi-Facilitator have not yet appeared.
Services / Endpoints / The Sellers: Sell data, storage, or reasoning services to Agents on a per-use basis. Representative projects:
AIsa: provides payment and settlement infrastructure for real AI Agents to access data, content, compute, and third-party services on a per-call, per-token, or usage basis, and is currently the top project by x402 request volume.
Firecrawl: Web parsing and structured crawler entrance most frequently consumed by AI Agents.
Pinata: Mainstream Web3 storage infrastructure, x402 covers real underlying storage costs, not lightweight API.
Gloria AI: Provides high-frequency real-time news and structured market signals, intelligence source for Trading and Analytical Agents.
AEON: Extends x402 + USDC to online & offline merchant acquiring in Southeast Asia / LatAm / Africa. Reaching up to 50 million merchants.
Neynar: Farcaster social graph infrastructure, opening social data to Agents via x402.
Current status: Server side is concentrated in crawler/storage/news APIs. Critical layers like financial transaction execution APIs, ad delivery APIs, Web2 SaaS gateways, or APIs executing real-world tasks are almost undeveloped.
Facilitators / The Processors: Complete on-chain deduction, verification, and settlement. The core execution engine of x402. Representative projects:
Coinbase Facilitator (CDP): Enterprise-grade trusted executor, Base mainnet zero fees + built-in OFAC/KYT, strongest choice for production environment.
PayAI Facilitator: Execution layer project with widest multi-chain coverage and fastest growth (Solana, Polygon, Base, Avalanche, etc.), highest usage multi-chain Facilitator in the ecosystem.
Daydreams: Project combining payment execution with LLM reasoning routing, currently the fastest-growing “AI Reasoning Payment Executor”, becoming the third pole in the x402 ecosystem.
Others: According to x402scan data, there are long-tail Facilitators/Routers like Dexter, Virtuals Protocol, OpenX402, CodeNut, Heurist, Thirdweb, etc., but volume is significantly lower than the top three.
Blockchain Settlement Layer: The final destination of the x402 payment workflow. Responsible for actual token deduction and on-chain confirmation.
Base: Promoted by CDP official Facilitator, USDC native, stable fees, currently the settlement network with the largest transaction volume and number of sellers.
Solana: Key support from multi-chain Facilitators like PayAI, fastest growing in high-frequency reasoning and real-time API scenarios due to high throughput and low latency.
Trend: The chain itself doesn’t participate in payment logic. With more Facilitators expanding, x402’s settlement layer will show a stronger multi-chain trend.

In the x402 payment system, the Facilitator is the only role that truly executes on-chain payments and is closest to “protocol-level revenue”: responsible for verifying payment authorization, submitting and tracking on-chain transactions, generating auditable settlement proofs, and handling replay, timeout, multi-chain compatibility, and basic compliance checks. Unlike Client SDKs (Payers) and API Servers (Sellers) which only handle HTTP requests, it is the final clearing outlet for all M2M/A2A transactions, controlling traffic entrance and settlement charging rights, thus being at the core of value capture in the Agent economy.

However, reality is that most projects are still in testnet or small-scale Demo stages, essentially lightweight “Payment Executors”, lacking moats in key capabilities like identity, billing, risk control, and multi-chain steady-state handling, showing obvious low-threshold and high-homogeneity characteristics. As the ecosystem matures, facilitators backed by Coinbase, with strong advantages in stability and compliance, do enjoy a clear early lead. However, as CDP facilitators begin charging fees while others may remain free or experiment with alternative monetization models, the overall market structure and share distribution still have significant room to evolve. In the long run, x402 is still an interface layer and cannot carry core value. What truly possesses sustainable competitiveness are comprehensive platforms capable of building identity, billing, risk control, and compliance systems on top of settlement capabilities.

L2 — Virtual Agent Commerce Protocol

Virtual’s Agent Commerce Protocol (ACP) provides a common commercial interaction standard for autonomous AI. Through a four-stage process of Request → Negotiation → Transaction → Evaluation, it enables independent agents to request services, negotiate terms, complete transactions, and accept quality assessments in a secure and verifiable manner. ACP uses blockchain as a trusted execution layer to ensure the interaction process is auditable and tamper-proof, and establishes an incentive-driven reputation system by introducing Evaluator Agents, allowing heterogeneous and independent professional Agents to form an “autonomous commercial body” and conduct sustainable economic activities without central coordination. Currently, ACP has moved beyond the purely experimental stage. Adoption through the Virtuals ecosystem suggests early network effects, looking more than “multi-agent commercial interaction standards”.

L1 Infrastructure Layer — Emerging Agent Native Payment Chain

Mainstream general public chains like Ethereum, Base (EVM), and Solana provide the most core execution environment, account system, state machine, security, and settlement foundation for Agents, possessing mature account models, stablecoin ecosystems, and broad developer bases.

Kite AI is a representative “Agent Native L1” infrastructure, specifically designing the underlying execution environment for Agent payment, identity, and permission. Its core is based on the SPACE framework (Stablecoin native, Programmable constraints, Agent-first certification, Compliance audit, Economically viable micropayments), and implements fine-grained risk isolation through a three-layer key system of Root→Agent→Session. Combined with optimized state channels to build an “Agent Native Payment Railway”, it suppresses costs to $0.000001 and latency to the hundred-millisecond level, making API-level high-frequency micropayments feasible. As a general execution layer, Kite is upward compatible with x402, Google A2A, Anthropic MCP, and downward compatible with OAuth 2.1, aiming to become a unified Agent payment and identity base connecting Web2 and Web3.

AIsaNet integrates x402 and L402 (the Lightning Network–based 402 payment protocol standard developed by Lightning Labs) as a micro-payment and settlement layer for AI Agents, supporting high-frequency transactions, cross-protocol call coordination, settlement path selection, and transaction routing, enabling Agents to perform cross-service, cross-chain automated payments without understanding the underlying complexity.

V. Summary and Outlook: From Payment Protocols to Reconstruction of Machine Economic Order

Agentic Commerce is the establishment of a completely new economic order dominated by machines. It is not as simple as “AI placing orders automatically”, but a reconstruction of the entire cross-subject link: how services are discovered, how credibility is established, how orders are expressed, how permissions are authorized, how value is cleared, and who bears disputes. The emergence of A2A, MCP, ACP, AP2, ERC-8004, and x402 standardizes the “commercial closed loop between machines”.

Along this evolutionary path, future payment infrastructure will diverge into two parallel tracks: one is the Business Governance Track based on traditional fiat logic, and the other is the Native Settlement Track based on the x402 protocol. The value capture logic between the two is different.

1. Business Governance Track: Web3 Business Payment System Layer

Applicable Scenarios: Low-frequency, non-micropayment real-world transactions (e.g., procurement, SaaS subscription, physical e-commerce).
Core Logic: Traditional fiat will dominate for a long time. Agents are just smarter front-ends and process coordinators, not replacements for Stripe / Card Organizations / Bank Transfers. The hard obstacles for stablecoins to enter the real commercial world on a large scale are regulation and taxation.
The value of projects like Skyfire, Payman, Catena Labs lies not in underlying payment routing (usually done by Stripe/Circle), but in “Machine Governance-as-a-Service”. That is, solving machine-native needs that traditional finance cannot cover — identity mapping, permission governance, programmatic risk control, liability attribution, and M2M / A2A micropayment (settlement per token / second). The key is who can become the “AI Financial Steward” trusted by enterprises.

2. Native Settlement Track: x402 Protocol Ecosystem and the Endgame of Facilitators

Applicable Scenarios: High-frequency, micropayment, M2M/A2A digital native transactions (API billing, resource stream payments).
Core Logic: x402 as an open standard achieves atomic binding of payment and resources through the HTTP 402 status code. In programmable micropayment and M2M / A2A scenarios, x402 is currently the protocol with the most complete ecosystem and most advanced implementation (HTTP native + on-chain settlement). Its status in the Agent economy is expected to be analogous to ‘Stripe for agents’.
Simply accessing x402 on the Client or Service side does not bring sector premium; what truly has growth potential are upper-layer assets that can precipitate long-term repurchases and high-frequency calls, such as OS-level Agent clients, Robot/IoT wallets, and high-value API services (market data, GPU reasoning, real-world task execution, etc.).
Facilitator, as the protocol gateway assisting Client and Server to complete payment handshake, invoice generation, and fund clearing, controls both traffic and settlement fees, and is the link closest to “revenue” in the current x402 Stack. Most Facilitators are essentially just “Payment Executors” with obvious low-threshold and homogeneity characteristics. Giants with availability and compliance advantages (like Coinbase) will form a dominant pattern. The core value to avoid marginalization will move up to the “Facilitator + X” service layer: providing high-margin capabilities such as arbitration, risk control, and treasury management by building verifiable service catalogs and reputation systems.

We believe that a “Dual-Track Parallel of Fiat System and Stablecoin System” will form in the future: the former supports mainstream human commerce, while the latter carries machine-native and on-chain native high-frequency, cross-border, and micropayment scenarios. The role of Web3 is not to replace traditional payments, but to provide underlying capabilities of Verifiable Identity, Programmable Clearing, and Global Stablecoins for the Agent era. Ultimately, Agentic Commerce is not limited to payment optimization, but is a reconstruction of the machine economic order. When billions of micro-transactions are automatically completed by Agents in the background, those protocols and companies that first provide trust, coordination, and optimization capabilities will become the core forces of the next generation of global commercial infrastructure.

Disclaimer: This article was completed with the assistance of AI tools ChatGPT-5 and Gemini 3 during the creation process. The author has made every effort to proofread and ensure the information is true and accurate, but omissions may still exist, and understanding is appreciated. It is important to note that the crypto asset market generally has a divergence between project fundamentals and secondary market price performance. The content of this article is for information integration and academic/research exchange only, does not constitute any investment advice, and should not be considered as a recommendation for buying or selling any tokens.

机器的经济秩序：智能体商业的全栈路径

BroadNotes by 0xjacobzhao — Tue, 16 Dec 2025 04:07:51 GMT

作者：0xjacobzhao | https://linktr.ee/0xjacobzhao

本独立研报由IOSG Ventures支持，研究写作过程受Raghav Agarwal@LongHash与Jay Yu@Pantera相关研报启发，感谢Lex Sokolin @ Generative Ventures, Jordan@AIsa, Ivy@《支无不言》博客对本文提出的宝贵建议。撰写过程中亦征询了 Nevermined, Skyfire, Virtuals Protocol, AIsa, Heurist, AEON等项目团队的意见反馈。本文力求内容客观准确，部分观点涉及主观判断，难免存在偏差，敬请读者予以理解。

智能体商业（Agentic Commerce）指的是由AI智能体自主完成服务发现、可信度判断、订单生成、支付授权及最终结算的全流程商业体系。它不再依赖于人类逐步操作或信息输入，而是由智能体在跨平台、跨系统的环境中自动协作、下单、支付与履约，从而形成机器与机器之间自主执行的商业闭环（M2M Commerce）。

加密领域中，最具实际应用价值的场景目前主要集中在稳定币支付与DeFi。因此，在Crypto与AI融合的过程中，最具价值的两条路径分别为：短期内依托现有成熟DeFi协议的AgentFi，以及中长期围绕稳定币结算、依赖ACP/AP2/x402/ERC-8004等协议逐步完善的Agent Payment。

智能体商业（Agentic Commerce）短期受限于协议成熟度、监管差异、商户用户接受度等因素，难以快速规模化；但从长期看，支付是所有商业闭环的底层锚点，智能体商业最具有长期价值。

一、智能体商业支付体系与应用场景

在智能体商业（Agentic Commerce）体系中，真实世界的商户网络才是最大的价值场景。无论 AI Agent 如何演进，传统法币支付体系（Stripe、Visa、Mastercard、银行转账）与快速增长的稳定币体系（USDC、x402）都将长期并存，共同构成智能体商业的底座。

传统法币支付 vs 稳定币支付对比

真实世界商户 — — 从电商、订阅、SaaS 到出行、内容付费与企业采购 — — 承载万亿美元级需求，也是 AI Agent 自动比价、续费与采购的核心价值来源。短期内，主流消费与企业采购仍将由传统法币支付体系长期主导。

稳定币在现实商业无法规模化的核心障碍并非仅技术，而是监管（KYC/AML、税务、消费者保护）、商户会计（稳定币非法偿）以及不可逆支付带来的争议处理机制缺失。由于这些结构性限制，稳定币短期难以进入医疗、航空、电商、政府、公用事业等高监管行业，其落地将主要集中在数字内容、跨境支付、Web3 原生服务与机器经济（M2M/IoT/Agent）等监管压力较低或链上原生的场景 — — 这也正是 Web3 原生的智能体商业最先实现规模突破的机会窗口。

不过，2025 年监管制度化正快速推进：美国稳定币法案取得两党共识，香港与新加坡落地稳定币牌照框架，欧盟 MiCA 正式生效，Stripe 支持 USDC、PayPal 推出 PYUSD。监管结构的清晰化意味着稳定币正被主流金融体系接纳，为未来跨境结算、B2B 采购与机器经济打开政策空间。

智能体商业最佳应用场景匹配

智能体商业（Agentic Commerce）的核心不是让一种支付轨道取代另一种，而是将“下单 — 授权 — 支付”的执行主体交给 AI Agent，使传统法币支付体系（AP2、授权凭证、身份合规）与稳定币体系（x402、CCTP、智能合约结算）各自发挥优势。它既不是法币 vs 稳定币的零和竞争，也不是单一轨道的替代叙事，而是一个同时扩张双方能力的结构性机会：法币支付继续支撑人类商业，稳定币支付加速机器原生与链上原生场景，两者互补共生，成为智能体经济的双引擎。

二、智能体商业底层协议标准全景

智能体商业（Agentic Commerce）的协议栈由六个层级构成，形成“能力发现”至“支付交付”完整的机器商业链路。A2A Catalog 与 MCP Registry 负责能力发现，ERC-8004 提供链上可验证身份与声誉；ACP 与 AP2 分别承担结构化下单与授权指令；支付层由传统法币轨道（AP2）与稳定币轨道（x402）并行组成；交付层则尚无统一标准。

发现层（Discovery Layer）：解决“Agent 如何发现并理解可调用服务”。AI 侧通过 A2A Catalog 与 MCP Registry 构建标准化能力目录；Web3 则依托 ERC-8004 提供可寻址的身份指引。该层是整个协议栈的入口。
信任层（Trust Layer）：回答“对方是否可信”。AI 侧尚无通用标准，Web3 通过 ERC-8004 构建可验证身份、声誉与执行记录的统一框架，是Web3 的关键优势。
下单层（Ordering Layer）：负责“订单如何表达与校验”。ACP（OpenAI × Stripe）提供对商品、价格与结算条款的结构化描述，确保商户可履约。由于链上难以表达现实世界商业契约，该层基本由 Web2 主导。
授权层（Authorization Layer）：处理“Agent 是否获得用户合法授权”。AP2 通过可验证凭证将意图、确认与支付授权绑定至真实身份体系。Web3 签名尚不具法律效力，因此无法承担该层的契约与合规责任。
支付层（Payment Layer）：决定“付款通过何种轨道完成”。AP2 覆盖卡与银行等传统支付网络；x402 则提供稳定币的原生 API 支付接口，使 USDC 等资产可嵌入自动化调用。两类轨道在此形成功能互补。
交付层（Fulfillment Layer）：回答“支付完成后如何安全交付内容”。目前无统一协议：现实世界依赖商户系统完成交付，Web3 的加密访问控制尚未形成跨生态标准。该层仍是协议栈的最大空白，也最有可能孕育下一代基础协议。

三、智能体商业关键核心协议详解

围绕智能体商业（Agentic Commerce）服务发现、信任判断、结构化下单、支付授权与最终结算这五个关键环节，Google、Anthropic、OpenAI、Stripe、Ethereum、Coinbase 等机构均在相应环节提出底层协议，从而共同构建出下一代 Agentic Commerce 核心协议栈。

Agent‑to‑Agent (A2A) — 智能体互操作协议（Google）

A2A 是由 Google 发起并捐赠至 Linux Foundation 的开源协议，旨在为不同供应商、不同框架构建的 AI Agents 提供统一的通信与协作标准。A2A 基于 HTTP + JSON-RPC，实现安全、结构化的消息与任务交换，使 Agents 能以原生方式进行多轮对话、协作决策、任务分解与状态管理。它的核心目标是构建“智能体之间的互联网”，让任何 A2A 兼容的 Agent 都能被自动发现、调用与组合，从而形成跨平台、跨组织的分布式 Agent 网络。

Model Context Protocol (MCP) — 统一工具数据接入协议（Anthropic）

MCP 由 Anthropic 推出，是连接 LLM / Agents 与外部系统的开放协议，侧重统一工具与数据访问接口。它将数据库、文件系统、远程 API 以及专有工具抽象为标准化资源，使 Agent 可以安全、可控、可审计地访问外部能力。MCP 的设计强调低集成成本与高可扩展性：开发者只需一次对接，即可让 Agent 使用整个工具生态。目前 MCP 已被多家头部 AI 厂商采用，成为 agent-tool 交互的事实标准。

MCP 关注的是 “Agent 如何使用工具” — — 为模型提供统一且安全的外部资源访问能力（如数据库、API、文件系统等），从而标准化 agent-tool / agent-data 的交互方式。

A2A 则解决 “Agent 如何与其他 Agent 协同工作” — — 为跨厂商、跨框架的智能体建立原生通信标准，支持多轮对话、任务分解、状态管理与长生命周期执行，是智能体之间的基础互操作层。

Agentic Commerce Protocol (ACP) — 下单结账协议（OpenAI × Stripe）

ACP（Agentic Commerce Protocol）是 OpenAI 与 Stripe 提出的开放下单标准（Apache 2.0），为 买家 — AI Agent — 商户 建立可被机器直接理解的结构化下单流程。协议覆盖商品信息、价格与条款校验、结算逻辑及支付凭证传递，使 AI 能在不成为商户的前提下代表用户安全发起购买。

其核心设计是：AI 以标准化方式调用商户的结账接口，而商户保留全部商业与法律控制权。ACP 通过结构化订单（JSON Schema / OpenAPI）、安全支付令牌（Stripe Shared Payment Token）、兼容现有电商后台，并支持 REST 与 MCP 发布能力，使商户无需改造系统即可进入 AI 购物生态。目前 ACP 已用于 ChatGPT Instant Checkout，成为早期部署可用的支付基础设施。

Agent Payments Protocol (AP2) — 数字授权与支付指令协议（Google）

AP2 是由 Google 联合多家支付网络与科技公司共同推出的开放标准，旨在为 AI Agent 主导的支付 建立统一、合规、可审计的流程。它通过加密签名的数字授权凭证将用户的支付意图、授权范围与合规身份绑定起来，为商户、支付机构与监管方提供可验证的“谁在为谁花钱”的证据。

AP2 以“Payment-Agnostic”为设计原则，同时支持信用卡、银行转账、实时支付以及通过 x402 等扩展接入稳定币等加密支付轨道。在整个 Agentic Commerce 协议栈中，AP2 不负责具体商品与下单细节，而是为各种支付渠道提供通用的Agent 支付授权框架。

ERC‑8004 — 链上 Agent 身份 / 声誉 / 验证标准（Ethereum）

ERC-8004 是由 MetaMask、Ethereum基金会、Google、 Coinbase共同提出的以太坊标准，旨在为 AI Agents 构建 跨平台、可验证、无需预信任 的身份与信誉体系，协议由链上三部分组成：

Identity Registry：为每个 Agent 铸造类似 NFT 的链上身份，可挂接 MCP / A2A 端点、ENS/DID、钱包等跨平台信息。
Reputation Registry：标准化记录评分、反馈与行为信号，使 Agent 的历史表现可审计、可聚合、可组合。
Validation Registry：支持 stake re-execution、zkML、TEE 等验证机制，为高价值任务提供可验证的执行记录。

通过 ERC-8004，Agent 的身份、信誉与行为被链上存证，形成跨平台可发现、不可篡改、可验证的信任底座，是 Web3 构建开放、可信 AI 经济的重要基础设施。ERC-8004 处于 Review 阶段，意味着标准已基本稳定、具备可实现性，但仍在广泛征求社区意见，尚未最终定稿。

x402 — 稳定币原生 API 支付轨道（Coinbase）

x402 是 Coinbase 提出的开放支付标准（Apache-2.0），将长期闲置的 HTTP 402 Payment Required 变为可编程的链上支付握手机制，让 API 与 AI Agent 可以在无需账号、无需信用卡、无需 API Key 的情况下实现去账户化、无摩擦、按需付费的链上结算。

图例：HTTP 402 支付工作流. 来源: Jay Yu@Pantera Capital

核心机制：x402 协议复活了互联网早期遗留的 HTTP 402 状态码。其工作流为：

请求与协商： 客户端（Agent）发起请求 -> 服务端返回 402 状态码及支付参数（如金额、接收地址）。
自主支付： Agent 本地签署交易并广播（通常使用 USDC 等稳定币），无需人工干预。
验证与交付： 服务端或第三方“Facilitator”验证链上交易后，即时释放资源。

x402 引入了 Facilitator（促进者）角色，作为连接 Web2 API 与 Web3 结算层的中间件。Facilitator 负责处理复杂的链上验证与结算逻辑，使传统开发者仅需极少代码即可将 API 货币化，服务端无需运行节点、管理签名或广播交易，只需依赖 Facilitator 提供的接口即可完成链上支付处理。当前最成熟的 Facilitator 实现由 Coinbase Developer Platform 提供。

x402 的技术优势在于：支持低至 1 美分的链上微支付，突破传统支付网关在 AI 场景下无法处理高频小额调用的限制；完全移除账户、KYC 与 API Key，使 AI 能自主完成 M2M 支付闭环；并通过 EIP-3009 实现无 Gas 的 USDC 授权支付，原生兼容 Base 与 Solana，具备多链可扩展性。

基于对Agentic Commerce的核心协议栈的介绍，下表总结协议在各层级的定位、核心能力、主要限制与成熟度评估，为构建跨平台、可执行、可支付的智能体经济提供了清晰的结构化视角。

四、Web3智能体商业生态代表性项目

当下智能体商业（Agentic Commerce）的Web3生态可分为三层：

业务支付系统层（L3），包括 Skyfire、Payman、Catena Labs、Nevermined 等项目，提供支付封装、SDK 集成、额度与权限治理、人类审批与合规接入，并不同程度对接传统金融轨道（银行、卡组织、PSP、KYC/KYB），搭建支付业务与机器经济的桥梁。
原生支付协议层（L2），由 x402、Virtual ACP 等协议及其生态项目构成，负责收费请求、支付验证与链上结算，是当前 Agent 经济中真正实现自动化、端到端清算的核心。x402 完全不依赖银行、卡组织与支付服务商，提供链上原生 M2M/A2A 支付能力。
基础设施层（L1），包括 Ethereum、Base、Solana 以及 Kite AI 等，为支付与身份体系提供链上执行环境、密钥体系、MPC/AA 与权限 Runtime的技术栈可信底座。

L3业务支付系统层 — Skyfire：AI Agent 的身份与支付凭证

Skyfire 以 KYA + Pay为核心，将“身份验证 + 支付授权”抽象为 AI 可用的 JWT 凭证，为网站、API、MCP 服务提供可验证的自动化访问与扣费能力。系统自动为用户生成 Buyer/Seller Agent 与托管钱包，支持卡片、银行与 USDC 充值。

系统层面，Skyfire 为每个用户生成 Buyer/Seller Agent 与托管钱包，支持通过卡、银行和 USDC 充值余额。其最大优势是完全兼容 Web2（JWT/JWKS、WAF、API Gateway 可直接使用），可为内容网站、数据 API、工具类 SaaS 提供“带身份的自动付费访问”。

Skyfire 是现实可用的 Agent Payment 中间层，但身份与资产托管均为中心化方案。

L3业务支付系统层 — Payman：AI 原生资金权限风控

Payman 提供 Wallet、Payee、Policy、Approval 四类能力，为 AI 构建可治理、可审计的“资金权限层”。AI 可以执行真实支付，但所有资金动作必须满足用户设置的额度、策略与审批规则。核心交互通过 payman.ask() 自然语言接口完成，系统负责解析意图、验证策略与执行支付。

Payman 的关键价值在于：“AI 可以动钱，但永远不越权。”将企业级资金治理迁移到 AI 环境：自动发薪、报销、供应商付款、批量转账等都可在明确定义的权限边界内完成。Payman 适合企业与团队内部的财务自动化（工资、报销、供应商付款等），定位是 受控资金治理层，并不尝试构建开放式 Agent-to-Agent 支付协议。

L3业务支付系统层 — Catena Labs：Agent 身份/支付标准

Catena 以 AI-Native 金融机构（托管、清算、风控、KYA）为商业层，以 ACK（Agent Commerce Kit）为标准层，构建 Agent 的统一身份协议（ACK-ID）与 Agent-native 支付协议（ACK-Pay）。目标是填补机器经济中缺失的可验证身份、授权链与自动化支付标准。

ACK-ID 基于 DID/VC 建立 Agent 的所有权链、授权链；ACK-Pay 定义与底层结算网络（USDC、银行、Arc）解耦的支付请求与可验证收据格式。Catena 强调长期的跨生态互操作性，其角色更接近“Agent 经济的 TLS/EMV 层”，标准化程度强、愿景清晰。

L3业务支付系统层 — Nevermined：计量、计费与微支付结算

Nevermined 聚焦基于使用量的 AI 经济模型，提供 Access Control、Metering、Credits System 与 Usage Logs，用于自动化计量、按次计费、分账与审核。用户可通过 Stripe 或 USDC 充值 credits，系统在每次 API 调用时自动校验使用量、扣费并生成可审计日志。

其核心价值在于支持 sub-cent 的实时微支付与 Agent-to-Agent 自动化结算，使数据购买、API 调用、workflow 调度等都能以“按调用付费”的方式运行。Nevermined 不构建新的支付轨道，而是构建支付之上的计量/计费层：短期推动 AI SaaS 商业化，中期支撑 A2A marketplace，长期可能成为机器经济的微支付 fabric。

Skyfire、Payman、Catena Labs、Nevermined 属于业务支付层，都需要在不同程度上对接银行、卡组织、PSP 与 KYC/KYB，但它们的真正价值并不在“接入法币”，而在于解决传统金融无法覆盖的机器原生需求 — — 身份映射、权限治理、程序化风控与按次计费。

Skyfire(支付网关)：为网站/API 提供“身份 + 自动扣费”（链上身份映射Web2身份）
Payman(财务治理)：面向企业内部的策略、额度、权限与审批（AI 可花钱但不越权）
Catena Labs(金融基建)：银行体系结合，通过 KYA、托管与清算服务构建(AI合规银行)
Nevermined (收银台)：支付之上只做计量与计费；支付依赖 Stripe/USDC。

相比之下，x402 处于更底层，是唯一不依赖银行、卡组织与 PSP 的原生链上支付协议，可通过 402 工作流直接完成链上扣款与结算。当 Skyfire、Payman、Nevermined 等上层系统都可以调用 x402 作为结算轨道，从而为 Agent 提供真正意义上的 M2M / A2A 自动化原生支付闭环。

L2原生支付协议层 — x402 生态：从客户端到链上结算

x402 原生支付生态可分为四个层级：客户端（Client）、服务端（Server）、支付执行层（Facilitators）以及区块链结算层。客户端负责让 Agent 或应用发起支付请求；服务端按次向 Agent 提供数据、推理或存储等 API 服务；支付执行层完成链上扣款、验证与结算，是整个流程的核心执行引擎；区块链结算层则承担最终的代币扣款与链上确认，实现不可篡改的支付落地。

图例：X402支付流来源：x402白皮书

客户端集成层（Client-Side Integrations / The Payers）：让 Agent 或应用能够发起 x402 支付请求，是整个支付流程的“出发点”。代表项目：

thirdweb Client SDK — — 生态最常用的 x402 客户端标准，维护活跃、支持多链，是开发者集成 x402 的默认工具。
Nuwa AI — — 使 AI 可无需编码直接付费访问 x402 服务，“Agent 付费入口”的代表项目。
官网中同时列出 Axios/Fetch、Mogami Java SDK、Tweazy 等尚属于早期客户端。

目前现有客户端仍停留在 “SDK 时代”，本质上是开发者工具。而类似浏览器/OS客户端、机器人/IoT客户端、企业系统或能管理多钱包 / 多 Facilitator 的更高级形态的客户端尚未出现。

服务端 / API 商品方（Services / Endpoints / The Sellers）：向 Agent 按次出售数据、存储或推理服务，部分代表项目包括：

AIsa — — 为真实运行的 AI Agents 提供付费资源的 API 调用与结算基础设施，使其可按调用、按 token 或按量访问数据、内容、算力及第三方服务，目前x402调用量第一。
Firecrawl — — AI Agent 最常消费的网页解析与结构化爬虫入口。
Pinata — — 主流 Web3 存储基础设施，x402 已能覆盖真实的底层存储成本非轻量 API。
Gloria AI — — 提供高频实时新闻与结构化市场信号，交易与分析型 Agent 的情报来源。
AEON — — 将 x402 + USDC 扩展到东南亚 / 拉美 / 非洲线下线上商户收单，商户达50M
Neynar — — Farcaster 社交图基础设施，将社交数据以 x402 的方式开放给 Agent。

当前服务端集中于爬虫/存储/新闻API，将金融交易执行API、广告投放 API、Web2 SaaS 网关甚至可以执行现实世界任务API的更高级的关键层几乎未开发，是未来最具潜力的增长曲线。

支付执行层（Facilitators / The Processors）：完成链上扣款、验证与结算，是 x402 的核心执行引擎，代表项目：

Coinbase Facilitator（CDP） — — 企业级可信执行器，Base 主网零费率 + 内置 OFAC/KYT，是生产环境的最强选择。
PayAI Facilitator — — 多链覆盖最广、增长最快的执行层项目（Solana、Polygon、Base、Avalanche 等），是生态中使用量最高的多链 Facilitator。
Daydreams — — 将支付执行与 LLM 推理路由结合的强场景项目，是当前增长最快的“AI 推理支付执行器”，正成为 x402 生态的第三极力量。
根据 x402scan 近 30 日数据，还存在一批中长尾 Facilitator／Router，包括 Dexter、Virtuals Protocol、OpenX402、CodeNut、Heurist、Thirdweb、x402.rs、Mogami、Questflow 等，整体 交易量、卖家数量、买家数量均明显低于头部三家。

区块链结算层（Blockchain Settlement Layer）： x402 支付工作流的最终落点，负责完成代币的实际扣款与链上确认。虽然 x402 协议本身是Chain-Agnostic的，但从当前生态数据来看，结算主要集中于两条网络：

Base — — 由 CDP 官方 Facilitator 主推，USDC 原生、费用稳定，是目前交易量与卖家数量最大的结算网络。
Solana — — 由 PayAI 等多链 Facilitator 重点支持，凭借高吞吐和低延迟，在高频推理和实时 API 场景中增长最快。

链本身不参与支付逻辑，随着更多 Facilitator的扩展，x402 的结算层将呈现更强的多链化趋势。

在 x402 支付体系中，Facilitator是唯一真正执行链上支付的角色，离“协议级收入”最近：负责验证支付授权、提交与追踪链上交易，并生成可审计结算证明，同时处理重放、超时、多链兼容与基础的合规检查。与只处理 HTTP 请求的 Client SDK（Payers）和 API 服务端（Sellers）不同，掌握流量入口与结算收费权，因此处于 Agent 经济的价值捕获核心，最受市场关注。

但现实情况是，大多数项目仍停留在测试网或小规模 Demo 阶段，本质只是轻量“支付执行器”，在身份、计费、风控、多链稳态处理等关键能力上缺乏护城河，呈现明显的低门槛、高同质化特征。随着生态逐步成熟，具备稳定性与合规优势由Coinbase背书的 Facilitator 确实拥有较为明显的先发优势，但随着 CDP Facilitator 开始收费，而其他 Facilitator 仍可能探索不同的变现模式，整体市场格局与份额分布仍存在较大的演变空间。从长期看，x402 仍属于接口层，无法承载核心价值，真正具备持续性竞争力的，是能在结算能力之上构建身份、计费、风控与合规体系的综合平台。

L2原生支付协议层 — Virtual Agent Commerce Protocol

Virtual 的 Agent Commerce Protocol（ACP） 为自主 AI 提供了一套通用的商业交互标准，通过 Request → Negotiation → Transaction → Evaluation 四阶段流程，使独立智能体能够以安全、可验证的方式请求服务、协商条款、完成交易并接受质量评估。ACP 以区块链作为可信执行层，确保交互过程可审计、不可篡改，并通过引入 Evaluator Agents 建立激励驱动的信誉体系，使异构而独立的专业 Agent 能在无中心协调的条件下形成“自治商业体”，开展可持续的经济活动。目前，ACP 已超越早期实验阶段初具生态规模，不限于对“多智能体商业交互标准”的探索。

L1基础设施层 — 新兴/垂直Agent 原生支付链

Ethereum、Base（EVM）、Solana等主流通用公链为 Agent 提供了最核心的执行环境、账户体系、状态机、安全性与结算基础，拥有成熟的账户模型、稳定币生态和广泛的开发者基础。

Kite AI 是代表性的 “Agent 原生 L1” 基础设施，专为智能体设计支付、身份与权限的底层执行环境。其核心基于 SPACE 框架（稳定币原生、可编程约束、代理优先认证、合规审计、经济可行微支付），并通过 Root→Agent→Session 的三层密钥体系实现细粒度风险隔离；再结合优化状态通道构建“Agent 原生支付铁路”，将成本压至 $0.000001、延迟控制在百毫秒级，使 API 级高频微支付成为可行。作为通用执行层，Kite 向上兼容 x402、Google A2A、Anthropic MCP，向下兼容 OAuth 2.1，目标成为连接 Web2 与 Web3 的统一 Agent 支付与身份底座。

AIsaNet 集成x402与 L402（Lightning Labs 开发的基于闪电网络的 402 支付协议标准）协议，作为面向 AI Agents 的微支付与结算层，支持高频交易、跨协议调用协调、结算路径选择和交易路由，使 Agents 无需理解底层复杂性即可完成跨服务、跨链自动支付。

五、总结与展望：从支付协议到机器经济秩序重构

智能体商业（Agentic Commerce）是由机器主导的一套全新经济秩序的建立。它不是“AI 自动下单”这么简单，而是一整条跨主体链路的重构：服务如何被发现、可信度如何建立、订单如何表达、权限如何授权、价值如何清算、争议由谁承担。A2A、MCP、ACP、AP2、ERC-8004 与 x402 的出现，把“机器之间的商业闭环”标准化。

沿着这条演化路径，未来的支付基础设施将分化为两条平行轨道：一条是基于传统法币逻辑的业务治理轨道，另一条是基于 x402 协议的原生结算轨道。这两者之间的价值捕获逻辑并不同。

1. 业务治理轨道：Web3 业务支付系统层

适用场景： 低频、非微支付的真实世界交易（如采购、SaaS 订阅、实物电商）。
核心逻辑： 传统法币将长期主导，Agent 只是更聪明的前端与流程协调器，而不替代 Stripe / 卡组织 / 银行转账。稳定币大规模进入真实商业世界的硬障碍在监管与税务。
Skyfire、Payman、Catena Labs 等项目价值不在于底层的支付路由（通常由 Stripe/Circle 完成），而在于机器治理服务” (Governance-as-a-Service)。即解决传统金融无法覆盖的机器原生需求 — — 身份映射、权限治理、程序化风控、责任归属及M2M / A2A micropayment（按 token / 秒结算）。关键是谁能成为企业信赖的“AI 财务管家”。

2. 原生结算轨道：x402 协议生态与 Facilitator 的终局

适用场景： 高频、微支付、M2M/A2A 的数字原生交易（API 计费、资源流支付）。
核心逻辑： x402 作为开放标准，通过 HTTP 402 状态码实现了支付与资源的原子化绑定。在可编程微支付和 M2M / A2A 场景中，x402 目前是生态最完整、落地最靠前的协议（HTTP 原生 + 链上结算），在 Agent 经济中的地位有望类比 ‘Stripe for agents’。
单纯在 Client 或 Service 端接入 x402 并不带来赛道溢价；真正具备增长潜力的是能沉淀长期复购与高频调用的上层资产，如 OS 级 Agent 客户端、机器人/IoT 钱包及高价值 API 服务（市场数据、GPU 推理、现实任务执行等）。
Facilitator协助 Client 与 Server 完成支付握手、发票生成与资金清算的协议网关，既掌握流量也掌握结算费，是目前 x402 Stack 中离“收入”最近的一环。多数 Facilitator 本质上只是“支付执行器”，明显的低门槛、同质化特征。具备可用性与合规优势的巨头（如 Coinbase）形成主导格局。而避免被边缘化的核心价值将上移至 “Facilitator + X” 服务层：通过构建可验证服务目录与声誉体系，提供仲裁、风控、金库管理等高毛利能力。

我们相信未来将形成 “法币体系”与“稳定币体系”双轨并行”：前者支撑主流人类商业，后者承载机器原生与链上原生的高频、跨境、微支付场景。Web3 的角色不是取代传统支付，而是为 Agent 时代提供 可验证身份、可编程清算与全球稳定币 的底层能力。最终，智能体商业（Agentic Commerce）不仅限于支付优化，而是机器经济秩序的重构。当数十亿次微交易由 Agent 在后台自动完成时，那些率先提供信任、协调与优化能力的协议与公司，将成为下一代全球商业基础设施的核心力量。