
r/UnstructuredIO


Upcoming Webinar: Multimodal Agentic Semantic Search with Teradata
Event

Most enterprise data isn't text. It's PDFs with embedded tables, scanned images, charts with no clear labeling, and more. Getting agents to actually work on that content, not around it, is what we're covering today.

In this DevTalk with our friends at Teradata, we're walking through a live multimodal agentic semantic search demo: from raw unstructured content to intelligent, grounded answers across documents, images, and tabular data.

Here's what we'll cover:

  • Processing PDFs, images, and text with Unstructured to generate clean, GenAI-ready metadata

  • Storing and querying vector embeddings inside Teradata VantageCloud's Enterprise Vector Store

  • Building an agentic retrieval workflow that can reason across images 

If you're building search or RAG on top of messy, real-world enterprise data, this one's for you.
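To make the "clean, GenAI-ready metadata" idea concrete, here is a toy sketch of normalizing parsed elements into flat records ready for embedding. All field names and the element-dict shape are illustrative assumptions, not Unstructured's actual output schema:

```python
# Hypothetical sketch: turning raw parsed elements into clean,
# GenAI-ready metadata records. Field names are illustrative only.

def to_metadata_records(elements):
    """Normalize parser output (type/text/page dicts) into flat records
    suitable for embedding and retrieval."""
    records = []
    for i, el in enumerate(elements):
        text = " ".join(el.get("text", "").split())  # collapse stray whitespace
        if not text:
            continue  # skip empty elements (e.g., decorative images)
        records.append({
            "id": i,
            "type": el.get("type", "Text"),
            "page": el.get("page", 1),
            "text": text,
        })
    return records

elements = [
    {"type": "Title", "text": "Q3  Report ", "page": 1},
    {"type": "Table", "text": "Revenue | 1.2M", "page": 2},
    {"type": "Image", "text": "", "page": 2},
]
print(to_metadata_records(elements))
```

In a real pipeline, records like these are what you would embed and load into a vector store; the demo uses Teradata VantageCloud's Enterprise Vector Store for that step.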

📆 TODAY at 8:30AM PT  
🎙️ Daniel Herrera (Teradata) & Ajay Krishnan (Unstructured)
🔗 https://www.linkedin.com/events/7437130385962070017/


Upcoming Webinar: How to Build & Maintain Connectors
Event

Data lives everywhere — S3, SharePoint, Salesforce, Slack, you name it. Before you can chunk, embed, or do anything useful with it for AI, you need connectors to pull it in. And then more connectors to write results back out.

Sounds simple. It's not.

Auth schemes are all over the place. APIs change without warning. Edge cases multiply fast when you're operating at scale across dozens of integrations.

We've spent a lot of time figuring out what actually works in production, and we're doing a session breaking it all down:

What we'll cover:

  • The four-phase architecture we landed on: Index → Download → Stage → Upload

  • How source connectors discover and retrieve docs from external systems

  • How destination connectors prep and deliver data to its final home

  • End-to-end testing strategies that keep 50+ integrations from falling apart
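The four-phase flow above can be sketched as a toy pipeline. The class and method names here are our own illustration of the Index → Download → Stage → Upload split, not the actual Unstructured connector API:

```python
# Toy sketch of the Index → Download → Stage → Upload architecture.
# All names are illustrative assumptions, not a real connector interface.

class SourceConnector:
    """Discovers and retrieves docs from an external system (simulated)."""
    def __init__(self, remote_docs):
        self.remote_docs = remote_docs  # stand-in for S3/SharePoint/Slack/etc.

    def index(self):
        # Phase 1: enumerate what exists upstream, without fetching bodies.
        return list(self.remote_docs.keys())

    def download(self, doc_id):
        # Phase 2: fetch the raw content for one indexed doc.
        return self.remote_docs[doc_id]


class DestinationConnector:
    """Preps and delivers processed data to its final home (simulated)."""
    def __init__(self):
        self.store = {}

    def stage(self, doc_id, content):
        # Phase 3: transform into the destination's expected shape.
        return {"id": doc_id, "text": content.upper()}

    def upload(self, record):
        # Phase 4: write the staged record out.
        self.store[record["id"]] = record


source = SourceConnector({"a.txt": "hello", "b.txt": "world"})
dest = DestinationConnector()
for doc_id in source.index():
    raw = source.download(doc_id)
    dest.upload(dest.stage(doc_id, raw))
print(sorted(dest.store))
```

Separating the phases this way is what makes end-to-end testing tractable: each phase can be faked or asserted on independently before the whole chain runs against a live integration.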

There's a live walkthrough + Q&A at the end. Can't make it live? Register and we'll send the recording.

Mar 4 @ 10am PT / 1pm ET — Register here


We're open-sourcing our document parsing evaluation framework + a new benchmark dataset
News

We've been evaluating document parsing systems internally for a while now, and today we're releasing SCORE-Bench to the community: 224 real-world documents with expert annotations, plus our complete evaluation methodology and code.

The dataset includes documents that challenge parsers in production: scanned forms with visual degradation, financial reports with deeply nested tables, multi-column layouts, mixed handwriting and printed text. Every document has been manually annotated by domain experts rather than algorithmically labeled.

We built SCORE, our evaluation framework, to handle the tricky parts of comparing generative systems fairly, like recognizing when different structural representations are semantically equivalent. We've already open-sourced the framework and methodology. Now, with SCORE-Bench, you can benchmark your own systems using the same approach, reproduce results, and track progress as document parsing evolves.
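To illustrate what "semantically equivalent structural representations" means, here is a toy normalizer that compares a Markdown table against a list-of-rows encoding of the same cells. This is our own minimal illustration of the idea, not SCORE's actual algorithm:

```python
# Toy illustration of structural equivalence: a Markdown table and a
# list-of-rows that encode the same cells should compare as equal,
# even though their surface representations differ.

def normalize_markdown_table(md):
    """Parse a pipe-delimited Markdown table into a list of cell rows."""
    rows = []
    for line in md.strip().splitlines():
        cells = [c.strip() for c in line.strip("|").split("|")]
        if all(set(c) <= set("-: ") for c in cells):
            continue  # skip the header separator row (e.g., |----|----|)
        rows.append(cells)
    return rows

def tables_equivalent(md, rows):
    """True if both representations normalize to the same cell grid."""
    return normalize_markdown_table(md) == [[str(c) for c in r] for r in rows]

md = """
| Item | Qty |
|------|-----|
| Bolt | 40  |
"""
print(tables_equivalent(md, [["Item", "Qty"], ["Bolt", "40"]]))
```

A naive string comparison would score these two outputs as completely different; normalizing to a shared structure first is what lets an evaluation credit a parser for getting the content right regardless of output format.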

What we're releasing:

  • SCORE-Bench: 224 real-world documents with expert annotations

  • The SCORE evaluation framework, with its complete methodology and code

We built this to solve our own evaluation needs, but figured the community might find it useful too. We'd love to hear which documents usually break your systems, what other evaluation scenarios you rely on, and, most importantly, how we could make this more useful for your use case. Feel free to reach out!