LlamaIndex

Technology, Information and Internet

San Francisco, California 283,122 followers

AI agents for document OCR + workflows

About us

LlamaIndex delivers the world's most accurate agentic document processing platform. We bring together industry-leading agentic OCR with a natural language workflow builder to power intelligent agents that read and extract from complex documents, adapt to business logic, and scale reliably to production. Our SDK is downloaded more than 25 million times every month and used by the fastest-growing AI companies and the Fortune 50.

Website
https://www.llamaindex.ai/
Industry
Technology, Information and Internet
Company size
11-50 employees
Headquarters
San Francisco, California
Type
Public Company


Updates

  • ParseBench is live on Kaggle. We partnered with Kaggle to launch the first document OCR leaderboard for AI agents.

    The bar for OCR has shifted. When an agent is approving an insurance claim or reading a 10-K, "good enough for a human" isn't good enough. A transposed table header, a dropped decimal, a silently stripped strikethrough: any one of these can send a downstream decision off the rails.

    ParseBench measures what actually breaks agents:
    ✅ ~2,000 human-verified enterprise pages
    ✅ 167,000+ test rules
    ✅ 5 dimensions: tables, charts, content faithfulness, semantic formatting, and visual grounding
    ✅ 14 methods benchmarked

    Now anyone can run it, submit a model, and see how their parser holds up on real SERFF filings, financial reports, and contracts. Huge thanks to the Kaggle team for making it easy to turn a rigorous benchmark into a transparent, reproducible leaderboard the whole ecosystem can build on.

    Read the full story → https://lnkd.in/ehyEZ7kN

  • LiteParse is our wildly popular open-source, layout-aware PDF parser, built for AI agents that actually need to understand document structure, not just scrape text out of it. The secret behind it? Grid projection.

    Here's the core of the problem. PDFs don't store text in reading order. They store coordinates and glyphs, scattered across the page with no sense of what belongs together. That's why most extractors destroy tables, merge columns, and lose all the structure that made the document useful in the first place.

    Most tools pick one of two approaches:
    🚫 Concatenate text left-to-right, top-to-bottom. Fast, but kills structure.
    🚫 Full ML-based layout analysis. Accurate, but slow and complex.

    LiteParse does neither. It projects text onto a monospace character grid and lets alignment itself carry the structure:
    ✅ Extract "anchors": recurring X coordinates where text consistently lines up (columns, tabs, margins)
    ✅ Classify each text item by which anchor it belongs to
    ✅ Project onto a character grid, with forward anchors propagating column positions down the page
    ✅ For flowing paragraphs, bypass the grid entirely to avoid artifacts

    The result: ~1,650 lines of TypeScript that preserve tables, columns, and alignment, without the cost of full layout analysis, and fast enough for agent workflows.

    One of the most interesting parts is the debug tooling. Every decision in the pipeline is traceable, and the visual debugger renders color-coded PNGs of the grid output. We built it specifically so coding agents like Claude can sit in the driver's seat and drive algorithm improvements themselves.

    Full walkthrough of how grid projection works: https://lnkd.in/eYvt-6xC
    LiteParse is open source: https://lnkd.in/e6b5Q-DZ
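The anchor-extraction and classification steps above can be sketched in a few lines of TypeScript. This is an illustrative reduction of the idea, not LiteParse's actual code: the types, function names, and pixel tolerances here are assumptions.

```typescript
interface TextItem { x: number; y: number; text: string }

// Find X positions where several items line up (within `tol` pixels):
// these recurring coordinates become the page's "anchors".
function findAnchors(items: TextItem[], tol = 2, minHits = 3): number[] {
  const sorted = items.map((i) => i.x).sort((a, b) => a - b);
  const anchors: number[] = [];
  let run: number[] = [];
  const flush = () => {
    if (run.length >= minHits) {
      anchors.push(run.reduce((s, v) => s + v, 0) / run.length);
    }
  };
  for (const x of sorted) {
    if (run.length === 0 || x - run[run.length - 1] <= tol) {
      run.push(x);
    } else {
      flush();
      run = [x];
    }
  }
  flush();
  return anchors;
}

// Classify a text item by its nearest anchor (column), or -1 if
// nothing is close enough -- e.g. flowing paragraph text.
function classify(x: number, anchors: number[], tol = 3): number {
  let best = -1;
  let bestDist = Infinity;
  anchors.forEach((a, i) => {
    const d = Math.abs(x - a);
    if (d <= tol && d < bestDist) { best = i; bestDist = d; }
  });
  return best;
}
```

Once every item is tagged with an anchor index, projecting onto a character grid is a matter of mapping each anchor to a fixed column and writing items row by row; the real parser additionally propagates anchors forward down the page, as the post describes.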

  • Let's talk parsing charts 📊 📈. Last week we released ParseBench, the first document OCR benchmark for AI agents.

    Most document parsers see a chart and skip it. Or describe it in words. But AI agents don't need a description, they need numbers. That's why we built ChartDataPointMatch.

    Here's the gap it closes. Benchmarks like ChartQA evaluate chart understanding through Q&A: "What's the highest bar?" "How many values are below 50?" Useful for humans reading dashboards. Not useful for an agent processing a quarterly earnings PDF that needs the underlying structured table back: series names, x-axis categories, and precise numerical values.

    ChartDataPointMatch is built for that job:
    ✅ Each chart is annotated with up to 10 spot-check data points
    ✅ Each point carries a value + labels (series name, axis category)
    ✅ A point passes only if the value AND every label map to the right row/column in the parser's output
    ✅ Explicit value labels → exact match required
    ✅ No value labels (eyeballing a bar or line) → small tolerance, because the ground truth itself is imprecise

    The result is a metric that separates parsers that actually read charts from ones that just OCR the text around them. On ParseBench, most specialized document parsers score under 6% on charts. LlamaParse Agentic: 78%+.

    Read about ParseBench, access the GitHub code, published paper, Hugging Face dataset, and more: https://lnkd.in/epksZ3Nk
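The pass/fail rule described above can be sketched in TypeScript. This is a minimal illustration, not the benchmark's actual implementation: the type names and the 5% relative tolerance are our assumptions.

```typescript
interface SpotCheck {
  series: string;        // e.g. "Revenue"
  category: string;      // e.g. "Q2" on the x-axis
  value: number;         // ground-truth value
  hasValueLabel: boolean; // was the value printed on the chart?
}

// Parser output modeled as: series name -> axis category -> value.
type ParsedTable = Record<string, Record<string, number>>;

function pointPasses(p: SpotCheck, parsed: ParsedTable, tolerance = 0.05): boolean {
  const row = parsed[p.series];                    // series must map to a row
  if (!row || !(p.category in row)) return false;  // category must map to a column
  const got = row[p.category];
  if (p.hasValueLabel) return got === p.value;     // explicit label: exact match
  // Eyeballed from a bar/line: allow a small relative tolerance.
  return Math.abs(got - p.value) <= tolerance * Math.abs(p.value);
}
```

The key property is that a point fails if either the value or any label lands in the wrong cell, which is exactly what distinguishes reading the chart from OCR-ing the text around it.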

  • NYC FinTech Week starts in 7 days, and we're bringing AI builders to an iconic NYC rooftop. 🗽

    As part of #NYFinTechWeek, we're teaming up with Linkup to host the AI Builders Rooftop Happy Hour, a gathering for the people actually shipping AI into fintech. If you're working on fintech agents, document intelligence pipelines, agentic workflows, or AI-native products, come trade notes with the builders doing the same.

    Expect real conversations about what's working in production, custom cocktails, good food, and a rooftop view of the city. (And rumor has it there may be a piñata battle. We're not confirming anything.)

    Register now 👇 https://luma.com/05oso3cq

  • Let's talk content faithfulness. Four days ago we launched ParseBench, the first document OCR benchmark for AI agents.

    Here's what most OCR and parsing benchmarks miss: they measure how well a human could read the output. But agents aren't humans. They can't infer a missing row, squint at a merged column, or mentally reconstruct a two-column layout. If a parser drops a single digit, the agent's downstream decision is wrong.

    Content Faithfulness is the most fundamental dimension in ParseBench. It asks the simplest question you can ask a parser: did it actually capture all the text, in the right order, without making things up?

    We test for three specific failure modes:
    ❌ Omissions: dropped text at word, sentence, and digit levels
    ❌ Hallucinations: fabricated content that doesn't exist in the source
    ❌ Reading order violations: multi-column layouts linearized incorrectly

    And we don't grade with fuzzy text similarity. We run 167,000+ rule-based tests across ~2,000 human-verified enterprise document pages. That's what makes the results actionable: when a parser drops data, you can see exactly which document types trigger it.

    This matters because the bar has moved. OCR used to mean "good enough for a human to read." For agents, it has to be "reliable enough to act on."

    The video breaks down why the metric is structured the way it is. Full benchmark, dataset, and paper: https://lnkd.in/epksZ3Nk
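To make the rule-based style concrete, here is what checks for the three failure modes could look like in TypeScript. These are illustrative guesses at the shape of such rules, not ParseBench's actual test code.

```typescript
// Omission check: a phrase from the source page must appear in the output.
function checkPresence(output: string, sourcePhrase: string): boolean {
  return output.includes(sourcePhrase);
}

// Hallucination check: a phrase known to be absent from the source
// must not appear in the output.
function checkAbsence(output: string, absentPhrase: string): boolean {
  return !output.includes(absentPhrase);
}

// Reading-order check: two source phrases must appear in the output
// in the same order as on the page (catches bad column linearization).
function checkOrder(output: string, first: string, second: string): boolean {
  const i = output.indexOf(first);
  const j = output.indexOf(second);
  return i >= 0 && j >= 0 && i < j;
}
```

Because each rule is a deterministic yes/no test tied to a specific page, a failure points at a concrete document and phrase, which is what makes the results actionable in the way the post describes.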

  • LlamaIndex reposted this

    View profile for Jerry Liu

    LlamaIndex · 43K followers

    We comprehensively benchmarked Opus 4.7 on document understanding. We evaluated it through ParseBench, our comprehensive OCR benchmark for enterprise documents, where we evaluate tables, text, charts, and visual grounding.

    The results 🧑‍🔬:
    - Opus 4.7 is a general improvement over Opus 4.6. It has gotten much better at charts compared to the previous iteration
    - Opus 4.7 is quite good at tables, though not quite as good as Gemini 3 Flash
    - Opus 4.7 wins on content faithfulness across all techniques (including ours)
    - Using Opus 4.7 as an OCR solution is expensive at ~7¢ per page!! For comparison, our agentic mode is ~1.25¢ and cost-effective mode is ~0.4¢ by default

    Take a look at these results and more on ParseBench! https://parsebench.ai/

  • Anthropic just released Opus 4.7, and we already have it running with Day 0 integration in ParseBench.

    Anthropic's own benchmarks show a striking result: 80.6% on Document Reasoning (OfficeQA Pro), up from 57.1% with Opus 4.6. That's a massive jump. But here's the thing: document reasoning benchmarks measure whether a model can answer questions about a document. ParseBench measures something harder: can the model actually parse the document correctly for a downstream agent to use? Those are very different tasks. And the ParseBench numbers tell a more nuanced story.

    Where it improved:
    ✅ Charts: +42.3 points (13.5% → 55.8%). Opus 4.7 has increased image resolution, which means it can now read data points from bar charts and line charts that previous Claude models couldn't touch.
    ✅ Semantic formatting: +5.2 points (64.2% → 69.4%). Better at preserving strikethrough, superscripts, and bold, the type of formatting that carries meaning in financial and legal docs.
    ✅ Content faithfulness: +0.6 points (89.7% → 90.3%). Fewer omissions and hallucinations.
    ✅ Tables: +0.7 points (86.5% → 87.2%). Marginal improvement on table structure.

    Where it didn't:
    ❌ Visual grounding: -2.5 points (16.5% → 14.0%). Layout understanding actually regressed slightly. The model still can't reliably trace extracted content back to its source location on the page in a single shot.

    The chart improvement is genuinely impressive: going from 13.5% to 55.8% is a 4x gain. But it comes at a steep cost: ~7.1¢ per page vs. ~5.8¢ with Opus 4.6. For a 500-page insurance filing, that's roughly $35 per document. LlamaParse Agentic continues to lead at 84.9% overall, at ~1.2¢ per page.

    ParseBench is fully open. Run it on Opus 4.7 yourself → https://lnkd.in/egd-bVY8

  • LiteParse crossed 4.3k+ GitHub stars, and it now has its own home at https://lnkd.in/eymJbYZF 🎉

    Just weeks ago we open-sourced a fast, local-first document parser built for AI agents. No cloud, no API keys, no GPU. It parses ~500 pages in 2 seconds, handles 50+ file formats, and plugs into 46+ agents with one command.

    The developer response has been humbling. Engineers are already wiring LiteParse into Claude Code, Cursor, and production agent pipelines, turning messy PDFs, financial filings, and scanned docs into structured data their agents can actually reason over.

    Today we're making it official: LiteParse is part of the LlamaIndex open source family, alongside LlamaIndex and the rest of the stack that powers document agents in production.

    And we're just getting started. Next week, Logan Markewich, our Head of Open Source, is running a live workshop on building real fintech products with LiteParse. You'll go from raw financial documents to a working due diligence agent, live. April 28th, 9 AM PST.

    Save your spot → https://lnkd.in/gHgZ7skF

  • Fintech ❤️ Documents

    Our team is at the AI in Finance New York event, speaking to top 5 US banks, the largest hedge funds and PE firms, and the most innovative fintechs. Three out of 5 'unplanned' conversations are with current customers. While we're proud of our SF roots, it's amazing to see our adoption across NYC.

    We spoke to current customers and future prospects about how:
    ✅ Template-free OCR beats legacy IDP
    ✅ Agentic doc extraction beats frontier VLMs
    ✅ Workflows powered by AI agents automate entire business processes

    If you're in finance, we'd love to chat about how we can save you some tokens while boosting your straight-through processing rates today → https://lnkd.in/g5648-ip

  • LlamaIndex reposted this

    View profile for Jerry Liu

    LlamaIndex · 43K followers

    Parsing complex tables in PDFs is extremely challenging. Existing metrics for measuring table accuracy, like TEDS (tree edit distance similarity), overweight exact table structure and underweight semantic correctness.

    🚫 Overweight: If the rows within a table are out of order, even if the semantic meaning is still consistent, then TEDS heavily penalizes the output, even though the downstream AI agent would have no problem interpreting the values.
    🚫 Overweight: If the HTML is semantically equivalent but output with different tags (th vs. td), TEDS will penalize it.
    🚫 Underweight: If the header is dropped or transposed, then TEDS only mildly penalizes the output, even though the entire semantic meaning of the table is destroyed.

    We recently released ParseBench, a comprehensive enterprise document benchmark with a heavy focus on *semantic correctness* for tables. We define a new metric, TableRecordMatch, which treats tables as a bag of records, where each record is a dictionary of key-value pairs, with keys being the headers and values being the cell values. We combine it with the GriTS metric (more robust than TEDS) to come up with the final GTRM score.

    It's worth giving our full paper a read if you haven't already. Also come check out our website hub!
    Website: http://parsebench.ai/
    Blog: https://lnkd.in/gAnP4Mwq
    Paper: https://lnkd.in/gvJeDxEa
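The bag-of-records idea can be sketched in TypeScript: convert each row into a header-keyed record, then score matches ignoring row order. This is an illustrative reduction of the concept; the names and the simple recall score below are our assumptions, not the paper's implementation (which also incorporates GriTS into the final GTRM score).

```typescript
type Rec = Record<string, string>;

// Turn a table (headers + rows) into a bag of records keyed by header.
function tableToRecords(headers: string[], rows: string[][]): Rec[] {
  return rows.map((row) =>
    Object.fromEntries(headers.map((h, i) => [h, row[i] ?? ""]))
  );
}

// Fraction of ground-truth records that appear in the prediction.
// Row order is irrelevant: records are compared as key-value bags.
function recordMatchRecall(truth: Rec[], pred: Rec[]): number {
  const key = (r: Rec) => JSON.stringify(Object.entries(r).sort());
  const bag = new Map<string, number>();
  for (const r of pred) {
    const k = key(r);
    bag.set(k, (bag.get(k) ?? 0) + 1);
  }
  let hits = 0;
  for (const r of truth) {
    const k = key(r);
    const n = bag.get(k) ?? 0;
    if (n > 0) { hits++; bag.set(k, n - 1); }
  }
  return truth.length ? hits / truth.length : 1;
}
```

Note how this scoring behaves exactly as the post argues a table metric should: shuffled rows still score perfectly, while a dropped or transposed header changes every record's keys and tanks the score.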
