Today we're releasing Context-Bench, a benchmark (and live leaderboard!) measuring LLMs on Agentic Context Engineering.
Context-Bench measures an agent's ability to manipulate its own context window, a skill AI agents need in order to self-improve and learn continually.