
r/UnstructuredIO


Upcoming Webinar: Multimodal Agentic Semantic Search with Teradata
Event

Most enterprise data isn't text. It's PDFs with embedded tables, scanned images, charts with no clear labeling, and more. Getting agents to actually work on that content, not around it, is what we're covering today.

In this DevTalk with our friends at Teradata, we're walking through a live multimodal agentic semantic search demo: from raw unstructured content to intelligent, grounded answers across documents, images, and tabular data.

Here's what we'll cover:

  • Processing PDFs, images, and text with Unstructured to generate clean, GenAI-ready metadata

  • Storing and querying vector embeddings inside Teradata VantageCloud's Enterprise Vector Store

  • Building an agentic retrieval workflow that can reason across images 

If you're building search or RAG on top of messy, real-world enterprise data, this one's for you.
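To make the "clean, GenAI-ready metadata" idea concrete, here is a toy sketch of normalizing parsed elements into flat records ready for embedding. All field names and the element-dict shape are illustrative assumptions, not Unstructured's actual output schema:

```python
# Hypothetical sketch: turning raw parsed elements into clean,
# GenAI-ready metadata records. Field names are illustrative only.

def to_metadata_records(elements):
    """Normalize parser output (type/text/page dicts) into flat records
    suitable for embedding and retrieval."""
    records = []
    for i, el in enumerate(elements):
        text = " ".join(el.get("text", "").split())  # collapse stray whitespace
        if not text:
            continue  # skip empty elements (e.g., decorative images)
        records.append({
            "id": i,
            "type": el.get("type", "Text"),
            "page": el.get("page", 1),
            "text": text,
        })
    return records

elements = [
    {"type": "Title", "text": "Q3  Report ", "page": 1},
    {"type": "Table", "text": "Revenue | 1.2M", "page": 2},
    {"type": "Image", "text": "", "page": 2},
]
print(to_metadata_records(elements))
```

In a real pipeline, records like these are what you would embed and load into a vector store; the demo uses Teradata VantageCloud's Enterprise Vector Store for that step.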

📆 TODAY at 8:30AM PT  
🎙️ Daniel Herrera (Teradata) & Ajay Krishnan (Unstructured)
🔗 https://www.linkedin.com/events/7437130385962070017/


Upcoming Webinar: How to Build & Maintain Connectors
Event

Data lives everywhere — S3, SharePoint, Salesforce, Slack, you name it. Before you can chunk, embed, or do anything useful with it for AI, you need connectors to pull it in. And then more connectors to write results back out.

Sounds simple. It's not.

Auth schemes are all over the place. APIs change without warning. Edge cases multiply fast when you're operating at scale across dozens of integrations.

We've spent a lot of time figuring out what actually works in production, and we're doing a session breaking it all down:

What we'll cover:

  • The four-phase architecture we landed on: Index → Download → Stage → Upload

  • How source connectors discover and retrieve docs from external systems

  • How destination connectors prep and deliver data to its final home

  • End-to-end testing strategies that keep 50+ integrations from falling apart
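The four-phase flow above can be sketched as a toy pipeline. The class and method names here are our own illustration of the Index → Download → Stage → Upload split, not the actual Unstructured connector API:

```python
# Toy sketch of the Index → Download → Stage → Upload architecture.
# All names are illustrative assumptions, not a real connector interface.

class SourceConnector:
    """Discovers and retrieves docs from an external system (simulated)."""
    def __init__(self, remote_docs):
        self.remote_docs = remote_docs  # stand-in for S3/SharePoint/Slack/etc.

    def index(self):
        # Phase 1: enumerate what exists upstream, without fetching bodies.
        return list(self.remote_docs.keys())

    def download(self, doc_id):
        # Phase 2: fetch the raw content for one indexed doc.
        return self.remote_docs[doc_id]


class DestinationConnector:
    """Preps and delivers processed data to its final home (simulated)."""
    def __init__(self):
        self.store = {}

    def stage(self, doc_id, content):
        # Phase 3: transform into the destination's expected shape.
        return {"id": doc_id, "text": content.upper()}

    def upload(self, record):
        # Phase 4: write the staged record out.
        self.store[record["id"]] = record


source = SourceConnector({"a.txt": "hello", "b.txt": "world"})
dest = DestinationConnector()
for doc_id in source.index():
    raw = source.download(doc_id)
    dest.upload(dest.stage(doc_id, raw))
print(sorted(dest.store))
```

Separating the phases this way is what makes end-to-end testing tractable: each phase can be faked or asserted on independently before the whole chain runs against a live integration.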

There's a live walkthrough + Q&A at the end. Can't make it live? Register and we'll send the recording.

Mar 4 @ 10am PT / 1pm ET — Register here


We're open-sourcing our document parsing evaluation framework + a new benchmark dataset
News

We've been evaluating document parsing systems internally for a while now, and today we're releasing SCORE-Bench to the community: 224 real-world documents with expert annotations, plus our complete evaluation methodology and code.

The dataset includes documents that challenge parsers in production: scanned forms with visual degradation, financial reports with deeply nested tables, multi-column layouts, mixed handwriting and printed text. Every document has been manually annotated by domain experts rather than algorithmically labeled.

We built SCORE, our evaluation framework, to handle the tricky parts of comparing generative systems fairly, like recognizing when different structural representations are semantically equivalent. We've already open-sourced the framework and methodology. Now, with SCORE-Bench, you can benchmark your own systems using the same approach, reproduce results, and track progress as document parsing evolves.
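To illustrate what "semantically equivalent structural representations" means, here is a toy normalizer that compares a Markdown table against a list-of-rows encoding of the same cells. This is our own minimal illustration of the idea, not SCORE's actual algorithm:

```python
# Toy illustration of structural equivalence: a Markdown table and a
# list-of-rows that encode the same cells should compare as equal,
# even though their surface representations differ.

def normalize_markdown_table(md):
    """Parse a pipe-delimited Markdown table into a list of cell rows."""
    rows = []
    for line in md.strip().splitlines():
        cells = [c.strip() for c in line.strip("|").split("|")]
        if all(set(c) <= set("-: ") for c in cells):
            continue  # skip the header separator row (e.g., |----|----|)
        rows.append(cells)
    return rows

def tables_equivalent(md, rows):
    """True if both representations normalize to the same cell grid."""
    return normalize_markdown_table(md) == [[str(c) for c in r] for r in rows]

md = """
| Item | Qty |
|------|-----|
| Bolt | 40  |
"""
print(tables_equivalent(md, [["Item", "Qty"], ["Bolt", "40"]]))
```

A naive string comparison would score these two outputs as completely different; normalizing to a shared structure first is what lets an evaluation credit a parser for getting the content right regardless of output format.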

What we're releasing:

  • SCORE-Bench: 224 real-world documents with expert annotations

  • The SCORE evaluation framework, with its complete methodology and code

We built this to solve our own evaluation needs, but figured the community might find it useful too. We'd love to hear which documents usually break your systems, what other evaluation scenarios you rely on, and, most importantly, how we could make this more useful for your use case. Feel free to reach out!