Improved data processing features in Foundry IQ: Richer content extraction and data enrichment

gia_mondragon

Microsoft

Jun 02, 2026

Great answers start before retrieval. Learn how Foundry IQ (Azure AI Search) strengthens the data pipeline—bringing more enterprise content, preserving document structure, and enabling multimodal retrieval.

Great answers start before retrieval. They depend on how well your data pipeline can reach enterprise content, preserve document structure, and carry text and visual context into the index. As part of the Microsoft Foundry platform, the Foundry IQ (Azure AI Search) 2026-05-01-preview release focuses on that foundation: broader SharePoint indexing, richer Content Understanding extraction, and image serving for multimodal agentic retrieval, among others.

Many limitations in RAG and agentic retrieval come from the data pipeline. Gaps in content coverage, loss of document structure, and missing visual context in multiple scenarios would directly impact grounding and answer quality. This release addresses these gaps so developers and platform teams can build more reliable, enterprise-ready retrieval systems with less custom engineering.

What’s new:

Bringing more SharePoint knowledge into the retrieval with SharePoint indexer updates in preview
Improved document understanding and extraction with Content Understanding in Foundry Tools integration in preview
Surfacing document-embedded images during agentic retrieval with image serving

These new capabilities of Foundry IQ are surfaced via built-in indexing capabilities and via knowledge sources. The focus is practical: make more enterprise content usable, preserve more of the original document meaning, and give agents access to visual evidence alongside text. It also highlights related capabilities that are now generally available in the April 2026 API release.

Bring more SharePoint knowledge into retrieval

Many organizations rely on SharePoint to store intranet pages, announcements, operational lists, team content, and documents. However, AI retrieval pipelines have typically focused on document libraries, leaving pages and lists underutilized.

With the 2026-05-01-preview API, the SharePoint indexer adds support for modern ASPX site pages and SharePoint Lists, in addition to document libraries. It also includes recursive subsite discovery and source URL traceability.

This expands the surface area of enterprise knowledge available to retrieval systems. A support assistant can use operational lists, an employee assistant can reference intranet content, and business workflows can retrieve from pages, files, and structured data through a single retrieval layer. It also reduces manual configuration and preserves links to original content.

Sample: SharePoint ASPX and Lists

Improve content extraction, structure and semantic preservation with Content Understanding in Foundry Tools updates

A common limitation in RAG systems is loss of structure during ingestion. Tables, layout, and reading order are often flattened, which reduces answer quality.

Foundry IQ now supports semantic chunking and AI-generated image descriptions in preview with its Content Understanding in Foundry Tools data processing integration.

Semantic chunking respects document structure to produce more coherent content segments. Image descriptions convert charts, diagrams, and other visual elements into retrievable text.

Setting contentExtractionMode property to standard in file-based indexed knowledge sources (Azure blob, SharePoint, OneLake) enables Content Understanding in Foundry tools functionality within the ingestion pipeline. This is especially useful for complex PDFs where structure matters.

These updates improve how content is preserved before indexing, resulting in different scenarios in stronger grounding.

Surface document-embedded images during agentic retrieval with image serving

Enterprise documents often contain important information in images, such as diagrams, charts, screenshots, and scanned forms. Text-only retrieval can miss or misrepresent this content.

Image serving preserves extracted images during ingestion and makes them available at retrieval time. This allows models to reason over visual content alongside text.

For example, a technician can interpret a wiring diagram, an analyst can validate a scanned form, and a financial user can analyze a chart within a report. This improves grounding in scenarios where critical information is visual.

Figure 1 - Image serving high-level flow

Sample: https://aka.ms/FoundryIQ-data-samples

Other updates to explore

Azure API Management endpoint support for Azure OpenAI-powered skills in Foundry IQ: GenAI prompt (chat completions) / embedding and vectorizers – including custom domain, private network connectivity and managed identities. Enables increased model quota centralized policy control, observability, and traffic routing for enterprise scenarios with large scalability requirements.
Private connectivity for Search to model communication: Enables secure communication with Foundry resources (resource type Microsoft.CognitiveServices/accounts and foundry_account group).
New knowledge source options:
- Indexed: File, Azure SQL.
- Remote: Fabric Data Agent, Fabric Ontology, MCP Server, and Work IQ.

Generally available: REST API version 2026-04-01

The 2026-04-01 REST API includes the following capabilities now available for production:

GenAI Prompt skill: Enables chat model execution within indexing pipelines.
Content Understanding skill and indexed knowledge sources’ ‘standard’ document extraction mode: Improves layout-aware extraction and table handling for complex scenarios.
Knowledge bases and multiple knowledge sources: Supports retrieval across multiple enterprise data sources.
Markdown parsing modes: Improves ingestion of documentation and repositories.
Security and governance updates: Foundry IQ integrates with Microsoft security capabilities such as SharePoint ACLs, sensitivity labels across multiple sources, and private networking, among others, see: https://aka.ms/FoundryIQ-security.