Trees, not logs?

Most approaches to automated trace evaluation for agents work on traces that have been flattened into a single block of text. Flattening is simple, but it discards the tree structure created by spans, parent-child relationships, and branching tool calls.

In our latest blog post, we dig into why it's useful to keep the tree structure of traces intact when evaluating agents (something we do at Atla). Treating traces as trees allows us to:
1) represent the trace much more compactly, as a collection of conversation prefix trees,
2) traverse the tree, pausing at every LLM span and leaving “notes”, and
3) traverse both forwards and backwards, where a backward pass prunes our judgements using the benefit of hindsight.

Our view: preserving the structure of traces leads to more precise and more faithful evaluations. As agents become more complex, evaluation methods need to account for that complexity.

Trees, not logs 🌲

➡️ Check out the blog for more details: https://lnkd.in/ea4CkvHj
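
For readers who want a feel for the three ideas above, here is a rough Python sketch. It is an illustration under our own assumptions, not Atla's implementation: the `Node` class, the function names, the toy messages, and the note and pruning rules are all hypothetical. It merges conversations that share a prefix into one tree, leaves notes on a forward pass, and drops notes on a backward pass once later messages are known.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One message in a conversation prefix tree (hypothetical structure)."""
    message: str
    children: dict = field(default_factory=dict)  # message text -> Node
    notes: list = field(default_factory=list)


def build_prefix_tree(conversations):
    """Merge message lists so that a shared prefix is stored only once."""
    root = Node("<root>")
    for messages in conversations:
        node = root
        for message in messages:
            node = node.children.setdefault(message, Node(message))
    return root


def count_nodes(node):
    """Number of stored messages below `node` (the node itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


def forward_pass(node, note_fn):
    """Pre-order walk: leave a note wherever note_fn returns one."""
    note = note_fn(node.message)
    if note is not None:
        node.notes.append(note)
    for child in node.children.values():
        forward_pass(child, note_fn)


def backward_pass(node, keep_fn):
    """Post-order walk: keep a note only if keep_fn accepts it, given all
    messages that come later in the subtree. Returns those later messages."""
    later = []
    for child in node.children.values():
        later.append(child.message)
        later.extend(backward_pass(child, keep_fn))
    node.notes = [note for note in node.notes if keep_fn(note, later)]
    return later


def collect_notes(node):
    """List of (message, note) pairs in pre-order."""
    found = [(node.message, note) for note in node.notes]
    for child in node.children.values():
        found.extend(collect_notes(child))
    return found


# Two toy conversations that share their first two messages (made-up data).
conversations = [
    ["user: find flights", "assistant: call search_flights",
     "tool: 3 results", "assistant: here are 3 flights"],
    ["user: find flights", "assistant: call search_flights",
     "tool: error timeout", "assistant: call search_flights",
     "tool: 3 results"],
]

tree = build_prefix_tree(conversations)
print(count_nodes(tree))  # 7 stored messages, versus 9 when flattened

# Forward: flag anything that looks like a failure.
forward_pass(tree, lambda m: "possible failure" if "error" in m else None)
# Backward: with hindsight, drop the flag if results arrive later on.
backward_pass(tree, lambda note, later: not any("results" in m for m in later))
print(collect_notes(tree))
```

In this toy run the timeout is flagged on the way forward, then dropped on the way back because the retry further down the same branch succeeds. A real evaluator would use an LLM judge rather than string matching for both steps.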