Mixedbread (@mixedbreadai) / X

Mixedbread

127 posts

@mixedbreadai

Building retrieval for agents.

San Francisco, CA

Joined March 2024

Pinned
Mixedbread
@mixedbreadai
Mar 12
Introducing Mixedbread Wholembed v3, our new SOTA retrieval model across all modalities and 100+ languages. Wholembed v3 brings best-in-class search to text, audio, images, PDFs, videos... You can now get the best retrieval performance on your data, no matter its format.
203K
Mixedbread reposted
Alexander Martin
@alexdmartin314
3h
@mixedbreadai's wholembed-v3 set a high bar for video retrieval as a single-stage retriever on our MAGMaR shared task! To beat it, both C2F-RAG and MARQUIS needed multi-stage pipelines with reasoning-based reranking on top. That's a strong model.
MAGMaR
@MAGMaR_workshop
7h
Replying to @MAGMaR_workshop
The top 2 systems are C2F-RAG by Dai et al, using text captions + LLM-based cognitive reranking, and MARQUIS by @derangineer et al, using query decomposition and video native reranking.
1.5K
Mixedbread
@mixedbreadai
10h
New: Bring your own bucket
Mixedbread is built on top of object storage. Now that storage can be yours. Your data stays in a bucket you control. Mixedbread indexes and searches with zero content retention on our side. For enterprise teams that need control, compliance, audit
2.3K
Mixedbread
@mixedbreadai
10h
Powerful search in your control:
717
Mixedbread
@mixedbreadai
Jun 10
New: Metadata explorer
Adding metadata to files enables filtering during search. Now, you can browse metadata fields and values across your store.
1.7K
Mixedbread
@mixedbreadai
Jun 10
Agents can inspect file metadata in a store to understand available filters. docs: mixedbread.com/docs/stores/se…
263
Mixedbread
@mixedbreadai
Jun 2
By now, everyone knows that single-vector embedding models are hugely limiting for modern workflows. But they contain more than you think: you can extract sparse Latent Terms from them. And it turns out that BM25 is all you need to turn this vocabulary into a strong retriever.
40K
Mixedbread
@mixedbreadai
Jun 2
Replying to @mixedbreadai
Having language-adjacent properties means that tools designed for lexical approaches "just work". BM25, always refusing to exit the scene, is strong here: applied over the Latent Terms extracted from nomic-embed-v1.5, it results in a near state-of-the-art sparse retriever.
2.6K
Mixedbread
@mixedbreadai
Jun 2
Read more here:
Dense Retrievers Know More Than They Can Express
From mixedbread.com
1.4K
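For readers unfamiliar with BM25, here is a minimal sketch of the Okapi BM25 scoring the thread above refers to. It assumes each document has already been reduced to a list of term strings; the term IDs (`"t1"`, `"t2"`, ...) and the function name `bm25_scores` are hypothetical placeholders, and how Latent Terms are actually extracted from an embedding model is not shown here (see the linked post for that).

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each document (a list of term strings) against the query with Okapi BM25."""
    n_docs = len(docs)
    # Average document length; fall back to 1.0 if every document is empty.
    avg_len = (sum(len(d) for d in docs) / n_docs) or 1.0

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            # Smoothed IDF (the "+1" keeps it non-negative for frequent terms).
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            # Term-frequency saturation with document-length normalisation.
            norm = tf[t] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical term lists standing in for terms extracted per document.
docs = [["t1", "t2", "t2"], ["t3", "t4"], ["t1", "t5"]]
scores = bm25_scores(["t2"], docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

Because BM25 only needs term counts, any vocabulary that behaves like words can be plugged in unchanged, which is the point the thread makes about lexical tools "just working".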
Mixedbread
@mixedbreadai
May 27
New: grep for exact matching
grep → keyword / regex matching
search → fine-grained semantic retrieval
Works across uploaded content, including text, PDFs (OCR) and audio/video (transcription). Give your agents both retrieval primitives to perform at their best.
5.5K
Mixedbread
@mixedbreadai
May 27
docs:
Grep Store Chunks
From mixedbread.com
617
Mixedbread
@mixedbreadai
May 25
Feature: Native agentic search on Mixedbread
Search with auto-planning, exploration, and multi-hop reasoning across documents.
Built for:
- evidence discovery
- exhaustive search
- cross-document reasoning
→ Topped MADQA @snowflake with 93.4% accuracy across 18,000 PDF
8.9K
Mixedbread
@mixedbreadai
May 25
Steer search with more instructions. Docs: mixedbread.com/docs/stores/se…
839
Mixedbread
@mixedbreadai
May 25
View and export traces directly from your dashboard:
652
Mixedbread
@mixedbreadai
May 24
New: Traces for Mixedbread agentic search
See every search call an agent makes directly in the dashboard, and tune instructions for better retrieval quality.
6.6K
Mixedbread
@mixedbreadai
May 11
Introducing mxbai-rerank-v3-listwise: reranking that goes beyond binary relevance. It reads the whole candidate set, resolves conflicts, and ranks by directives like recency, source priority, and multi-step rules. +11% NDCG@10 on average across multiple domains, modalities, and
25K
Mixedbread
@mixedbreadai
May 11
Read more here:
Ranking Beyond Binary Relevance: mxbai-rerank-v3-listwise
From mixedbread.com
1.2K
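As background on the NDCG@10 metric quoted in the announcement above, here is a minimal sketch of how NDCG@k is commonly computed from graded relevance labels. It assumes linear gain and that the ideal ranking is a reordering of the same candidate list; which variant the reported numbers use is not stated in the post, and the function names are illustrative.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items (linear gain, log2 discount)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """NDCG@k: DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Relevance labels listed in the order a reranker returned the candidates.
perfect = ndcg_at_k([3, 2, 1])   # already in the ideal order
reversed_ = ndcg_at_k([1, 2, 3]) # best item ranked last
```

A score of 1.0 means the ranking matches the ideal ordering of its labels; lower values penalise relevant items that appear further down the list.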
Mixedbread
@mixedbreadai
Mar 24
Replying to @mixedbreadai
You can read more about this in our blog post, where we present more detailed benchmark results and elaborate on the nature of the three benchmarks, and why we're very proud to be topping all three of them.
Closing the Oracle Gap for Your Agents
From mixedbread.com
3K
Mixedbread
@mixedbreadai
Mar 24
Mixedbread search's ultimate aim is to power all workflows, no matter their modality or language. Try it for your own knowledge-intensive tasks today:
Mixedbread
From mixedbread.com
2.2K
Mixedbread
@mixedbreadai
Mar 24
Replying to @mixedbreadai
Agents are increasingly performing knowledge work: Deep Research, generating financial reports, reasoning across historical knowledgebases... Many high-quality benchmarks now focus on evaluating such tasks, among which BrowseComp-Plus, @databricks's OfficeQA, or @Snowflake's
3K
Mixedbread
@mixedbreadai
Mar 24
So what is the Oracle gap? Optimising agentic systems is complicated. There are many individual components you need to get just right. Retrieval is one of those components, and its impact is best measured by the Oracle gap: the difference between the performance of the same
2.6K