Mixedbread
127 posts
Mixedbread
@mixedbreadai
Building retrieval for agents.
San Francisco, CA
mixedbread.com
Joined March 2024
9 Following
3,420 Followers
  • Pinned
    user avatar
    Mixedbread
    @mixedbreadai
    Mar 12
    Introducing Mixedbread Wholembed v3, our new SOTA retrieval model across all modalities and 100+ languages. Wholembed v3 brings best-in-class search to text, audio, images, PDFs, videos... You can now get the best retrieval performance on your data, no matter its format.
    203K
  • Mixedbread reposted
    user avatar
    Alexander Martin
    @alexdmartin314
    3h
    @mixedbreadai's wholembed-v3 set a high bar for video retrieval as a single-stage retriever on our MAGMaR shared task! To beat it, both C2F-RAG and MARQUIS needed multi-stage pipelines with reasoning-based reranking on top. That's a strong model.
    user avatar
    MAGMaR
    @MAGMaR_workshop
    7h
    Replying to @MAGMaR_workshop
    The top 2 systems are C2F-RAG by Dai et al, using text captions + LLM-based cognitive reranking, and MARQUIS by @derangineer et al, using query decomposition and video native reranking.
    1.5K
  • user avatar
    Mixedbread
    @mixedbreadai
    10h
    New: Bring your own bucket
    Mixedbread is built on top of object storage. Now that storage can be yours. Your data stays in a bucket you control. Mixedbread indexes and searches with zero content retention on our side. For enterprise teams that need control, compliance, audit
    2.3K
    user avatar
    Mixedbread
    @mixedbreadai
    10h
    Powerful search in your control:
    717
  • user avatar
    Mixedbread
    @mixedbreadai
    Jun 10
    New: Metadata explorer
    Adding metadata to files enables filtering during search. Now you can browse metadata fields and values across your store.
    1.7K
    user avatar
    Mixedbread
    @mixedbreadai
    Jun 10
    Agents can inspect file metadata in a store to understand available filters. docs: mixedbread.com/docs/stores/se…
    263
  • user avatar
    Mixedbread
    @mixedbreadai
    Jun 2
    By now, everyone knows that single-vector embedding models are hugely limiting for modern workflows. But they contain more than you think: you can extract sparse Latent Terms from them. And it turns out that BM25 is all you need to turn this vocabulary into a strong retriever.
    40K
    user avatar
    Mixedbread
    @mixedbreadai
    Jun 2
    Replying to @mixedbreadai
    Having language-adjacent properties means that tools designed for lexical approaches "just work". BM25, always refusing to exit the scene, is strong here: applied over the Latent Terms extracted from nomic-embed-v1.5, it results in a near state-of-the-art sparse retriever.
    2.6K
    user avatar
    Mixedbread
    @mixedbreadai
    Jun 2
    Read more here:
    Dense Retrievers Know More Than They Can Express
    From mixedbread.com
    1.4K
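For readers who want to see what "BM25 over Latent Terms" could look like mechanically, here is a minimal sketch of Okapi BM25 scoring in Python. It is an illustration under stated assumptions, not Mixedbread's implementation: the thread does not say how Latent Terms are extracted, so the term IDs below (`t12`, `t87`, ...) are hypothetical placeholders, and the function name `bm25_scores` and the defaults `k1=1.2`, `b=0.75` are common choices rather than values from the post.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each document (a list of terms) against the query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalisation.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical latent-term IDs standing in for whatever an extractor would emit.
docs = [
    ["t12", "t87", "t87", "t301"],
    ["t5", "t12", "t640"],
    ["t999", "t640", "t640"],
]
query = ["t87", "t12"]

scores = bm25_scores(query, docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(scores, best)
```

Because BM25 only needs term counts, any discrete vocabulary can be plugged in unchanged, which is the sense in which lexical tools "just work" on such terms.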
  • user avatar
    Mixedbread
    @mixedbreadai
    May 27
    New: grep for exact matching
    grep → keyword / regex matching
    search → fine-grained semantic retrieval
    Works across uploaded content, including text, PDFs (OCR), and audio/video (transcription). Give your agents both retrieval primitives to perform at their best.
    5.5K
    user avatar
    Mixedbread
    @mixedbreadai
    May 27
    docs:
    Grep Store Chunks
    From mixedbread.com
    617
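To make the grep half of that distinction concrete, here is a small local sketch of regex matching over text chunks. It does not call the Mixedbread API, whose endpoint and parameters are not shown in this feed; `grep_chunks` and the sample chunks are hypothetical and only illustrate the exact-match behaviour the post describes.

```python
import re


def grep_chunks(pattern, chunks, ignore_case=False):
    """Return (chunk_index, offset, matched_text) for every regex match."""
    flags = re.IGNORECASE if ignore_case else 0
    rx = re.compile(pattern, flags)
    hits = []
    for idx, text in enumerate(chunks):
        for m in rx.finditer(text):
            hits.append((idx, m.start(), m.group(0)))
    return hits


# Hypothetical chunks, e.g. text extracted from uploaded files.
chunks = [
    "Invoice INV-2041 was issued on 2024-03-12.",
    "The customer disputed the bill last spring.",
    "Follow-up for invoice INV-2077 is pending.",
]

hits = grep_chunks(r"INV-\d{4}", chunks)
print(hits)
```

The regex finds the two invoice IDs and skips the middle chunk, which shares no literal pattern with the query. A semantic search for something like "billing dispute" would be the natural tool for surfacing that chunk, which is why the two primitives complement each other.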
  • user avatar
    Mixedbread
    @mixedbreadai
    May 25
    Feature: Native agentic search on Mixedbread
    Search with auto-planning, exploration, and multi-hop reasoning across documents.
    Built for:
    - evidence discovery
    - exhaustive search
    - cross-document reasoning
    → Topped MADQA @snowflake with 93.4% accuracy across 18,000 PDF
    8.9K
    user avatar
    Mixedbread
    @mixedbreadai
    May 25
    Steer search with more instructions. Docs: mixedbread.com/docs/stores/se…
    839
    user avatar
    Mixedbread
    @mixedbreadai
    May 25
    View and export traces directly from your dashboard:
    652
  • user avatar
    Mixedbread
    @mixedbreadai
    May 24
    New: Traces for Mixedbread agentic search
    See every search call an agent makes directly in the dashboard, and tune instructions for better retrieval quality.
    6.6K
  • user avatar
    Mixedbread
    @mixedbreadai
    May 11
    Introducing mxbai-rerank-v3-listwise: reranking that goes beyond binary relevance. It reads the whole candidate set, resolves conflicts, and ranks by directives like recency, source priority, and multi-step rules. +11% NDCG@10 on average across multiple domains, modalities, and
    25K
    user avatar
    Mixedbread
    @mixedbreadai
    May 11
    Read more here:
    Ranking Beyond Binary Relevance: mxbai-rerank-v3-listwise
    From mixedbread.com
    1.2K
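The reranker post reports its gain in NDCG@10. As a reminder of what that metric measures, here is a minimal Python sketch. It assumes linear gain (the graded relevance value itself) and computes the ideal ordering from the same candidate list, which is a common simplification; the post does not say which NDCG variant was used, and the relevance grades below are made up for illustration.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k graded relevance values."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG of the given order divided by the DCG of the best possible order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical graded relevance of four candidates, in ranked order.
before_rerank = [1, 0, 3, 2]
after_rerank = [3, 2, 1, 0]

print(ndcg_at_k(before_rerank), ndcg_at_k(after_rerank))
```

Moving the most relevant candidates to the top raises the score toward 1.0, so the metric rewards graded ordering rather than a binary relevant-or-not cutoff.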
  • user avatar
    Mixedbread
    @mixedbreadai
    Mar 24
    Replying to @mixedbreadai
    You can read more about this in our blog post, where we present more detailed benchmark results and elaborate on the nature of the three benchmarks, and why we're very proud to be topping all three of them.
    Closing the Oracle Gap for Your Agents
    From mixedbread.com
    3K
    user avatar
    Mixedbread
    @mixedbreadai
    Mar 24
    Mixedbread search's ultimate aim is to power all workflows, no matter their modality or language. Try it for your own knowledge-intensive tasks today:
    Mixedbread
    From mixedbread.com
    2.2K
  • user avatar
    Mixedbread
    @mixedbreadai
    Mar 24
    Replying to @mixedbreadai
    Agents are increasingly performing knowledge work: Deep Research, generating financial reports, reasoning across historical knowledgebases... Many high-quality benchmarks now focus on evaluating such tasks, among which BrowseComp-Plus, @databricks's OfficeQA, or @Snowflake's
    3K
    user avatar
    Mixedbread
    @mixedbreadai
    Mar 24
    So what is the Oracle gap? Optimising agentic systems is complicated. There are many individual components you need to get just right. Retrieval is one of those components, and its impact is best measured by the Oracle gap: the difference between the performance of the same
    2.6K
