<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Qdrant - Vector Search Engine</title><link>https://qdrant.tech/</link><description>Recent content on Qdrant - Vector Search Engine</description><generator>Hugo</generator><language>en-us</language><managingEditor>info@qdrant.tech (Andrey Vasnetsov)</managingEditor><webMaster>info@qdrant.tech (Andrey Vasnetsov)</webMaster><lastBuildDate>Wed, 08 Apr 2026 00:52:18 +0300</lastBuildDate><atom:link href="https://qdrant.tech/index.xml" rel="self" type="application/rss+xml"/><item><title>Distance-based data exploration</title><link>https://qdrant.tech/articles/distance-based-exploration/</link><pubDate>Tue, 11 Mar 2025 12:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/distance-based-exploration/</guid><description>&lt;h2 id="hidden-structure">Hidden Structure&lt;/h2>
&lt;p>When working with large collections of documents, images, or other arrays of unstructured data, it often becomes useful to understand the big picture.
Examining data points individually is not always the best way to grasp the structure of the data.&lt;/p>
&lt;figure>&lt;img src="https://qdrant.tech/articles_data/distance-based-exploration/no-context-data.png"
 alt="Data visualization">&lt;figcaption>
 &lt;p>Data points without context, pretty much useless&lt;/p>
 &lt;/figcaption>
&lt;/figure>

&lt;p>Just as numbers in a table gain meaning when plotted on a graph, visualizing the distances (similarity or dissimilarity) between unstructured data items can reveal hidden structures and patterns.&lt;/p></description></item><item><title>Modern Sparse Neural Retrieval: From Theory to Practice</title><link>https://qdrant.tech/articles/modern-sparse-neural-retrieval/</link><pubDate>Wed, 23 Oct 2024 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/modern-sparse-neural-retrieval/</guid><description>&lt;p>Finding enough time to study all the modern solutions while keeping your production running is rarely feasible.
Dense retrievers, hybrid retrievers, late interaction… How do they work, and where do they fit best?
If only we could compare retrievers as easily as products on Amazon!&lt;/p>
&lt;p>We explored the most popular modern sparse neural retrieval models and broke them down for you.
By the end of this article, you’ll have a clear understanding of the current landscape in sparse neural retrieval and how to navigate through complex, math-heavy research papers with sky-high NDCG scores without getting overwhelmed.&lt;/p></description></item><item><title>Qdrant Summer of Code 2024 - ONNX Cross Encoders in Python</title><link>https://qdrant.tech/articles/cross-encoder-integration-gsoc/</link><pubDate>Mon, 14 Oct 2024 08:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/cross-encoder-integration-gsoc/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hi everyone! I’m Huong (Celine) Hoang, and I’m thrilled to share my experience working at Qdrant this summer as part of their Summer of Code 2024 program. During my internship, I worked on integrating cross-encoders into the FastEmbed library for re-ranking tasks. This enhancement widened the capabilities of the Qdrant ecosystem, enabling developers to build more context-aware search applications, such as question-answering systems, using Qdrant&amp;rsquo;s suite of libraries.&lt;/p>
&lt;p>This project was both technically challenging and rewarding, pushing me to grow my skills in handling large-scale ONNX (Open Neural Network Exchange) model integrations, tokenization, and more. Let me take you through the journey, the lessons learned, and where things are headed next.&lt;/p></description></item><item><title>What is a Vector Database?</title><link>https://qdrant.tech/articles/what-is-a-vector-database/</link><pubDate>Wed, 09 Oct 2024 09:29:33 -0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/what-is-a-vector-database/</guid><description>&lt;h2 id="an-introduction-to-vector-databases">An Introduction to Vector Databases&lt;/h2>
&lt;p>&lt;img src="https://qdrant.tech/articles_data/what-is-a-vector-database/vector-database-1.jpeg" alt="vector-database-architecture">&lt;/p>
&lt;p>Most of the millions of terabytes of data we generate each day is &lt;strong>unstructured&lt;/strong>. Think of the meal photos you snap, the PDFs shared at work, or the podcasts you save but may never listen to. None of it fits neatly into rows and columns.&lt;/p>
&lt;p>Unstructured data lacks a strict format or schema, making it challenging for conventional databases to manage. Yet, this unstructured data holds immense potential for &lt;strong>AI&lt;/strong>, &lt;strong>machine learning&lt;/strong>, and &lt;strong>modern search engines&lt;/strong>.&lt;/p></description></item><item><title>What is Vector Quantization?</title><link>https://qdrant.tech/articles/what-is-vector-quantization/</link><pubDate>Wed, 25 Sep 2024 09:29:33 -0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/what-is-vector-quantization/</guid><description>&lt;p>Vector quantization is a data compression technique used to reduce the size of high-dimensional data. Compressing vectors reduces memory usage while maintaining nearly all of the essential information. This method allows for more efficient storage and faster search operations, particularly in large datasets.&lt;/p>
&lt;p>When working with high-dimensional vectors, such as embeddings from providers like OpenAI, a single 1536-dimensional vector requires &lt;strong>6 KB of memory&lt;/strong>.&lt;/p>
&lt;img src="https://qdrant.tech/articles_data/what-is-vector-quantization/vector-size.png" alt="1536-dimensional vector size is 6 KB" width="700">
&lt;p>With 1 million vectors needing around 6 GB of memory, as your dataset grows to multiple &lt;strong>millions of vectors&lt;/strong>, the memory and processing demands increase significantly.&lt;/p></description></item><item><title>Fine-Tuning Sparse Embeddings for E-Commerce Search | Part 1: Why Sparse Embeddings Beat BM25</title><link>https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-1/</link><pubDate>Mon, 09 Mar 2026 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-1/</guid><description>&lt;p>&lt;em>This is Part 1 of a 5-part series on fine-tuning sparse embeddings for e-commerce search. We&amp;rsquo;ll go from &amp;ldquo;why bother?&amp;rdquo; to a production system that beats BM25 by 29%.&lt;/em>&lt;/p>
&lt;p>&lt;strong>Series:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Part 1: Why Sparse Embeddings Beat BM25 (here)&lt;/li>
&lt;li>&lt;a href="https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-2/">Part 2: Training on Modal&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-3/">Part 3: Evaluation &amp;amp; Hard Negatives&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-4/">Part 4: Specialization vs Generalization&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-5/">Part 5: From Research to Product&lt;/a>&lt;/li>
&lt;/ul>
&lt;hr>
&lt;p>Search &amp;ldquo;iPhone 15 Pro Max 256GB&amp;rdquo; on a dense embedding system and it happily returns the 128GB model. The semantic similarity is high - it&amp;rsquo;s the same phone! But the customer specified 256GB for a reason. In e-commerce, the details aren&amp;rsquo;t noise. They&amp;rsquo;re the whole point.&lt;/p></description></item><item><title>Vector Search Resource Optimization Guide</title><link>https://qdrant.tech/articles/vector-search-resource-optimization/</link><pubDate>Sun, 09 Feb 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/vector-search-resource-optimization/</guid><description>&lt;h2 id="whats-in-this-guide">What’s in This Guide?&lt;/h2>
&lt;p>&lt;a href="#storage-disk-vs-ram">&lt;strong>Resource Management Strategies:&lt;/strong>&lt;/a> If you are trying to scale your app on a budget - this is the guide for you. We will show you how to avoid wasting compute resources and get the maximum return on your investment.&lt;/p>
&lt;p>&lt;a href="#configure-indexing-for-faster-searches">&lt;strong>Performance Improvement Tricks:&lt;/strong>&lt;/a> We’ll dive into advanced techniques like indexing, compression, and partitioning. Our tips will help you get better results at scale, while reducing total resource expenditure.&lt;/p></description></item><item><title>A Complete Guide to Filtering in Vector Search</title><link>https://qdrant.tech/articles/vector-search-filtering/</link><pubDate>Tue, 10 Sep 2024 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/vector-search-filtering/</guid><description>&lt;p>Imagine you sell computer hardware. To help shoppers easily find products on your website, you need to have a &lt;strong>user-friendly &lt;a href="https://qdrant.tech" target="_blank" rel="noopener nofollow">search engine&lt;/a>&lt;/strong>.&lt;/p>
&lt;p>&lt;img src="https://qdrant.tech/articles_data/vector-search-filtering/vector-search-ecommerce.png" alt="vector-search-ecommerce">&lt;/p>
&lt;p>If you’re selling computers and have extensive data on laptops, desktops, and accessories, your search feature should guide customers to the exact device they want - or at least a &lt;strong>very similar&lt;/strong> match.&lt;/p>
&lt;p>When storing data in Qdrant, each product is a point, consisting of an &lt;code>id&lt;/code>, a &lt;code>vector&lt;/code> and &lt;code>payload&lt;/code>:&lt;/p></description></item><item><title>Qdrant Internals: Immutable Data Structures</title><link>https://qdrant.tech/articles/immutable-data-structures/</link><pubDate>Tue, 20 Aug 2024 10:45:00 +0200</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/immutable-data-structures/</guid><description>&lt;h2 id="data-structures-101">Data Structures 101&lt;/h2>
&lt;p>Those who took programming courses might remember that there is no such thing as a universal data structure.
Some structures are good at accessing elements by index (like arrays), while others shine in terms of insertion efficiency (like linked lists).&lt;/p>
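&lt;p>That trade-off is easy to see in a few lines of Python (an illustrative sketch; &lt;code>list&lt;/code> plays the role of an array, and &lt;code>collections.deque&lt;/code> stands in for a linked-list-style structure):&lt;/p>

```python
from collections import deque

arr = list(range(1_000_000))
print(arr[500_000])    # O(1): array-like structures excel at access by index

queue = deque(arr)
queue.appendleft(-1)   # O(1): linked-list-style structures excel at insertion
arr.insert(0, -1)      # O(n): the same insertion shifts every array element
print(queue[0], arr[0])
```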
&lt;figure>&lt;img src="https://qdrant.tech/articles_data/immutable-data-structures/hardware-optimized.png"
 alt="Hardware-optimized data structure" width="80%">&lt;figcaption>
 &lt;p>Hardware-optimized data structure&lt;/p>
 &lt;/figcaption>
&lt;/figure>

&lt;p>However, when we move from theoretical data structures to real-world systems, and particularly in performance-critical areas such as &lt;a href="https://qdrant.tech/use-cases/">vector search&lt;/a>, things become more complex. &lt;a href="https://en.wikipedia.org/wiki/Big_O_notation" target="_blank" rel="noopener nofollow">Big-O notation&lt;/a> provides a good abstraction, but it doesn’t account for the realities of modern hardware: cache misses, memory layout, disk I/O, and other low-level considerations that influence actual performance.&lt;/p></description></item><item><title>Fine-Tuning Sparse Embeddings for E-Commerce Search | Part 2: Training SPLADE on Modal</title><link>https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-2/</link><pubDate>Mon, 09 Mar 2026 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-2/</guid><description>&lt;p>&lt;em>This is Part 2 of a 5-part series on fine-tuning sparse embeddings for e-commerce search. In &lt;a href="https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-1/">Part 1&lt;/a>, we covered why sparse embeddings beat BM25 for e-commerce. Now we build the training pipeline.&lt;/em>&lt;/p>
&lt;p>&lt;strong>Series:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-1/">Part 1: Why Sparse Embeddings Beat BM25&lt;/a>&lt;/li>
&lt;li>Part 2: Training SPLADE on Modal (here)&lt;/li>
&lt;li>&lt;a href="https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-3/">Part 3: Evaluation &amp;amp; Hard Negatives&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-4/">Part 4: Specialization vs Generalization&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-5/">Part 5: From Research to Product&lt;/a>&lt;/li>
&lt;/ul>
&lt;hr>
&lt;p>In the last article we made the case for sparse embeddings in e-commerce search. Now we write the code. All source code is available in the &lt;a href="https://github.com/thierrypdamiba/finetune-ecommerce-search" target="_blank" rel="noopener nofollow">GitHub repo&lt;/a>, and you can try the &lt;a href="https://huggingface.co/thierrydamiba/splade-ecommerce-esci" target="_blank" rel="noopener nofollow">fine-tuned models on HuggingFace&lt;/a>. Want to skip straight to fine-tuning on your own data? See the &lt;a href="https://github.com/qdrant/sparse-finetune" target="_blank" rel="noopener nofollow">&lt;code>sparse-finetune&lt;/code>&lt;/a> CLI. By the end of this piece, you&amp;rsquo;ll have a SPLADE model trained on Amazon&amp;rsquo;s ESCI dataset, running on Modal&amp;rsquo;s serverless GPUs, with checkpoints saved to persistent storage.&lt;/p></description></item><item><title>Fine-Tuning Sparse Embeddings for E-Commerce Search | Part 3: Evaluation and Hard Negatives</title><link>https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-3/</link><pubDate>Mon, 09 Mar 2026 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-3/</guid><description>&lt;p>&lt;em>This is Part 3 of a 5-part series on fine-tuning sparse embeddings for e-commerce search. In &lt;a href="https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-2/">Part 2&lt;/a>, we trained a SPLADE model on Modal. Now we evaluate it and push further with hard negative mining.&lt;/em>&lt;/p>
&lt;p>&lt;strong>Series:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-1/">Part 1: Why Sparse Embeddings Beat BM25&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-2/">Part 2: Training SPLADE on Modal&lt;/a>&lt;/li>
&lt;li>Part 3: Evaluation &amp;amp; Hard Negatives (here)&lt;/li>
&lt;li>&lt;a href="https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-4/">Part 4: Specialization vs Generalization&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-5/">Part 5: From Research to Product&lt;/a>&lt;/li>
&lt;/ul>
&lt;hr>
&lt;p>We have a trained SPLADE model sitting on a Modal volume (or grab it from &lt;a href="https://huggingface.co/thierrydamiba/splade-ecommerce-esci" target="_blank" rel="noopener nofollow">HuggingFace&lt;/a>). Now comes the question that matters: is it actually better? In this article, we&amp;rsquo;ll index products into Qdrant, run retrieval benchmarks, implement hard negative mining, and dig into what the model learned. Full evaluation code is in the &lt;a href="https://github.com/thierrypdamiba/finetune-ecommerce-search" target="_blank" rel="noopener nofollow">GitHub repo&lt;/a>. To run this entire pipeline on your own data, see the &lt;a href="https://github.com/qdrant/sparse-finetune" target="_blank" rel="noopener nofollow">&lt;code>sparse-finetune&lt;/code>&lt;/a> CLI.&lt;/p></description></item><item><title>Fine-Tuning Sparse Embeddings for E-Commerce Search | Part 4: Specialization vs Generalization</title><link>https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-4/</link><pubDate>Mon, 09 Mar 2026 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-4/</guid><description>&lt;p>&lt;em>This is Part 4 of a 5-part series on fine-tuning sparse embeddings for e-commerce search. In &lt;a href="https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-3/">Part 3&lt;/a>, we evaluated our model and implemented hard negative mining. Now we test how well it generalizes.&lt;/em>&lt;/p>
&lt;p>&lt;strong>Series:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-1/">Part 1: Why Sparse Embeddings Beat BM25&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-2/">Part 2: Training SPLADE on Modal&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-3/">Part 3: Evaluation &amp;amp; Hard Negatives&lt;/a>&lt;/li>
&lt;li>Part 4: Specialization vs Generalization (here)&lt;/li>
&lt;li>&lt;a href="https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-5/">Part 5: From Research to Product&lt;/a>&lt;/li>
&lt;/ul>
&lt;hr>
&lt;p>We&amp;rsquo;ve built a SPLADE model that beats BM25 by 28% on Amazon ESCI. But here&amp;rsquo;s the question that determines whether this is a lab result or a production strategy: does it work on data it wasn&amp;rsquo;t trained on? Full code is on &lt;a href="https://github.com/thierrypdamiba/finetune-ecommerce-search" target="_blank" rel="noopener nofollow">GitHub&lt;/a>, you can try the &lt;a href="https://huggingface.co/thierrydamiba/splade-ecommerce-esci" target="_blank" rel="noopener nofollow">fine-tuned models on HuggingFace&lt;/a>, or fine-tune on your own catalog with the &lt;a href="https://github.com/qdrant/sparse-finetune" target="_blank" rel="noopener nofollow">&lt;code>sparse-finetune&lt;/code>&lt;/a> CLI.&lt;/p></description></item><item><title>Fine-Tuning Sparse Embeddings for E-Commerce Search | Part 5: From Research to Product</title><link>https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-5/</link><pubDate>Mon, 09 Mar 2026 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-5/</guid><description>&lt;p>&lt;em>This is Part 5 of a series on fine-tuning sparse embeddings for e-commerce search. Parts &lt;a href="https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-1/">1&lt;/a>–&lt;a href="https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-4/">4&lt;/a> built the pipeline from scratch. This article packages it into a tool anyone can use.&lt;/em>&lt;/p>
&lt;p>&lt;strong>Series:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-1/">Part 1: Why Sparse Embeddings Beat BM25&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-2/">Part 2: Training SPLADE on Modal&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-3/">Part 3: Evaluation &amp;amp; Hard Negatives&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://qdrant.tech/articles/sparse-embeddings-ecommerce-part-4/">Part 4: Specialization vs Generalization&lt;/a>&lt;/li>
&lt;li>Part 5: From Research to Product (here)&lt;/li>
&lt;/ul>
&lt;hr>
&lt;p>In Parts 1 through 4, we built a SPLADE fine-tuning pipeline piece by piece: data loading, Modal GPU training, Qdrant evaluation, ANCE hard negative mining, cross-domain experiments. The code worked. The results were strong: 28% over BM25 on Amazon ESCI.&lt;/p></description></item><item><title>miniCOIL: on the Road to Usable Sparse Neural Retrieval</title><link>https://qdrant.tech/articles/minicoil/</link><pubDate>Tue, 13 May 2025 00:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/minicoil/</guid><description>&lt;p>Have you ever heard of sparse neural retrieval? If so, have you used it in production?&lt;/p>
&lt;p>It&amp;rsquo;s a field with excellent potential &amp;ndash; who wouldn&amp;rsquo;t want to use an approach that combines the strengths of dense and term-based text retrieval? Yet it&amp;rsquo;s not so popular. Is it due to the common curse of &lt;em>“What looks good on paper is not going to work in practice”&lt;/em>?&lt;/p>
&lt;p>This article describes our path towards sparse neural retrieval &lt;em>as it should be&lt;/em> &amp;ndash; lightweight term-based retrievers capable of distinguishing word meanings.&lt;/p></description></item><item><title>Relevance Feedback in Qdrant</title><link>https://qdrant.tech/articles/relevance-feedback/</link><pubDate>Fri, 20 Feb 2026 00:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/relevance-feedback/</guid><description>&lt;p>A year ago, we dropped a statement-bomb in the “&lt;a href="https://qdrant.tech/articles/search-feedback-loop/" target="_blank" rel="noopener nofollow">Relevance Feedback in Information Retrieval&lt;/a>” article and then went silent.&lt;/p>
&lt;p>We claimed that even though the information retrieval research field has proposed many useful mechanisms for increasing the relevance of search results, none of them made it to the neural search industry, simply because these approaches are not scalable.&lt;/p>
&lt;p>Certainly, there are methods widely used to improve the relevance of retrieved results: query rewriting, for example. Yet none of the vector search solutions out there have tried to use the possibilities that come with full access to the vector search index: traversing it in the direction of relevance instead of guessing where in the vector space to aim the query.&lt;/p></description></item><item><title>Relevance Feedback in Information Retrieval</title><link>https://qdrant.tech/articles/search-feedback-loop/</link><pubDate>Thu, 27 Mar 2025 00:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/search-feedback-loop/</guid><description>&lt;blockquote>
&lt;p>A problem well stated is a problem half solved.&lt;/p>
&lt;/blockquote>
&lt;p>This quote applies as much to life as it does to information retrieval.&lt;/p>
&lt;p>With a well-formulated query, retrieving the relevant document becomes trivial.
In reality, however, most users struggle to precisely define what they are searching for.&lt;/p>
&lt;p>While users may struggle to formulate a perfect request — especially in unfamiliar topics — they can easily judge whether a retrieved answer is relevant or not.&lt;/p></description></item><item><title>Built for Vector Search</title><link>https://qdrant.tech/articles/dedicated-vector-search/</link><pubDate>Mon, 17 Feb 2025 10:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/dedicated-vector-search/</guid><description>&lt;p>Any problem with even a bit of complexity requires a specialized solution. You can use a Swiss Army knife to open a bottle or poke a hole in a cardboard box, but you will need an axe to chop wood — the same goes for software.&lt;/p>
&lt;p>In this article, we will describe the unique challenges vector search poses and why a dedicated solution is the best way to tackle them.&lt;/p></description></item><item><title>Any* Embedding Model Can Become a Late Interaction Model... If You Give It a Chance!</title><link>https://qdrant.tech/articles/late-interaction-models/</link><pubDate>Wed, 14 Aug 2024 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/late-interaction-models/</guid><description>&lt;p>* At least any open-source model, since you need access to its internals.&lt;/p>
&lt;h2 id="you-can-adapt-dense-embedding-models-for-late-interaction">You Can Adapt Dense Embedding Models for Late Interaction&lt;/h2>
&lt;p>Qdrant 1.10 introduced support for multi-vector representations, with late interaction being a prominent example of this model. In essence, both documents and queries are represented by multiple vectors, and identifying the most relevant documents involves calculating a score based on the similarity between the corresponding query and document embeddings. If you&amp;rsquo;re not familiar with this paradigm, our updated &lt;a href="https://qdrant.tech/articles/hybrid-search/">Hybrid Search&lt;/a> article explains how multi-vector representations can enhance retrieval quality.&lt;/p></description></item><item><title>Optimizing Memory for Bulk Uploads</title><link>https://qdrant.tech/articles/indexing-optimization/</link><pubDate>Thu, 13 Feb 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/indexing-optimization/</guid><description>&lt;h1 id="optimizing-memory-consumption-during-bulk-uploads">Optimizing Memory Consumption During Bulk Uploads&lt;/h1>
&lt;p>Efficient memory management is a constant challenge when you’re dealing with &lt;strong>large-scale vector data&lt;/strong>. In high-volume ingestion scenarios, even seemingly minor configuration choices can significantly impact stability and performance.&lt;/p>
&lt;p>Let’s take a look at the best practices and recommendations to help you optimize memory usage during bulk uploads in Qdrant. We&amp;rsquo;ll cover scenarios with both &lt;strong>dense&lt;/strong> and &lt;strong>sparse&lt;/strong> vectors, helping your deployments remain performant even under high load and avoiding out-of-memory errors.&lt;/p></description></item><item><title>Introducing Gridstore: Qdrant's Custom Key-Value Store</title><link>https://qdrant.tech/articles/gridstore-key-value-storage/</link><pubDate>Wed, 05 Feb 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/gridstore-key-value-storage/</guid><description>&lt;h2 id="why-we-built-our-own-storage-engine">Why We Built Our Own Storage Engine&lt;/h2>
&lt;p>Databases need a place to store and retrieve data. That’s what Qdrant&amp;rsquo;s &lt;a href="https://en.wikipedia.org/wiki/Key%e2%80%93value_database" target="_blank" rel="noopener nofollow">&lt;strong>key-value storage&lt;/strong>&lt;/a> does—it links keys to values.&lt;/p>
&lt;p>When we started building Qdrant, we needed to pick something ready for the task. So we chose &lt;a href="https://rocksdb.org" target="_blank" rel="noopener nofollow">&lt;strong>RocksDB&lt;/strong>&lt;/a> as our embedded key-value store.&lt;/p>
&lt;div style="text-align: center;">
 &lt;img src="https://qdrant.tech/articles_data/gridstore-key-value-storage/rocksdb.jpg" alt="RocksDB" style="width: 50%;">
 &lt;p>It is mature, reliable, and well-documented.&lt;/p>
&lt;/div>
&lt;p>Over time, we ran into issues. Its &lt;a href="https://en.wikipedia.org/wiki/Log-structured_merge-tree" target="_blank" rel="noopener nofollow">LSMT&lt;/a>-based architecture requires periodic compaction, which caused random latency spikes. It handles generic keys, while we only need sequential IDs. Its many configuration options make it versatile, but tuning them accurately was a headache. Finally, interoperating with C++ slowed us down (although we will still support it for quite some time 😭).&lt;/p></description></item><item><title>What is Agentic RAG? Building Agents with Qdrant</title><link>https://qdrant.tech/articles/agentic-rag/</link><pubDate>Fri, 22 Nov 2024 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/agentic-rag/</guid><description>&lt;p>Standard &lt;a href="https://qdrant.tech/articles/what-is-rag-in-ai/">Retrieval Augmented Generation&lt;/a> follows a predictable, linear path: receive
a query, retrieve relevant documents, and generate a response. In many cases that might be enough to solve a particular
problem. In the worst-case scenario, your LLM will simply decide not to answer the question because the context does not
provide enough information.&lt;/p>
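&lt;p>That linear flow fits in a few lines (an illustrative sketch; &lt;code>retrieve&lt;/code> and &lt;code>generate&lt;/code> are hypothetical stand-ins for a vector database query and an LLM call):&lt;/p>

```python
def retrieve(query: str, top_k: int = 3) -> list[str]:
    # Hypothetical stand-in for a vector database query (e.g. against Qdrant).
    corpus = ["doc about vectors", "doc about search", "doc about indexing"]
    return [d for d in corpus if "doc" in d][:top_k]

def generate(query: str, context: list[str]) -> str:
    # Hypothetical stand-in for an LLM call; it refuses when context is empty.
    if not context:
        return "I don't know."   # the "decide not to answer" case
    return f"Answer to {query!r} based on {len(context)} documents"

# The linear pipeline: query, then retrieve, then generate. No branching,
# no retries, no decisions about which tool to use.
docs = retrieve("what is a vector database?")
print(generate("what is a vector database?", docs))
```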
&lt;p>&lt;img src="https://qdrant.tech/articles_data/agentic-rag/linear-rag.png" alt="Standard, linear RAG pipeline">&lt;/p>
&lt;p>On the other hand, we have agents. These systems are given more freedom to act, and can take multiple non-linear steps
to achieve a certain goal. There isn&amp;rsquo;t a single definition of what an agent is, but in general, it is an application
that uses an LLM and usually some tools to communicate with the outside world. LLMs are used as decision-makers that
decide what action to take next. Actions can be anything, but they are usually well-defined and limited to a certain
set of possibilities. One of these actions might be to query a vector database, like Qdrant, to retrieve relevant
documents, if the context is not enough to make a decision. However, RAG is just a single tool in the agent&amp;rsquo;s arsenal.&lt;/p></description></item><item><title>Hybrid Search Revamped - Building with Qdrant's Query API</title><link>https://qdrant.tech/articles/hybrid-search/</link><pubDate>Thu, 25 Jul 2024 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/hybrid-search/</guid><description>&lt;p>It&amp;rsquo;s been over a year since we published the original article on how to build a hybrid
search system with Qdrant. The idea was straightforward: combine the results from different search methods to improve
retrieval quality. Back in 2023, you still needed to use an additional service to provide lexical search
capabilities and combine all the intermediate results. Things have changed since then. Once we introduced support for
sparse vectors, &lt;a href="https://qdrant.tech/articles/sparse-vectors/">the additional search service became obsolete&lt;/a>, but you were still
required to combine the results from different methods on your end.&lt;/p></description></item><item><title>What is RAG: Understanding Retrieval-Augmented Generation</title><link>https://qdrant.tech/articles/what-is-rag-in-ai/</link><pubDate>Tue, 19 Mar 2024 09:29:33 -0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/what-is-rag-in-ai/</guid><description>&lt;blockquote>
&lt;p>Retrieval-augmented generation (RAG) integrates external information retrieval into the process of generating responses by Large Language Models (LLMs). It searches a database for information beyond its pre-trained knowledge base, significantly improving the accuracy and relevance of the generated responses.&lt;/p>
&lt;/blockquote>
&lt;p>Language models have exploded on the internet ever since ChatGPT came out, and rightfully so. They can write essays, code entire programs, and even make memes (though we’re still deciding on whether that&amp;rsquo;s a good thing).&lt;/p></description></item><item><title>BM42: New Baseline for Hybrid Search</title><link>https://qdrant.tech/articles/bm42/</link><pubDate>Mon, 01 Jul 2024 12:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/bm42/</guid><description>&lt;aside role="status">
Please note that the benchmark section of this article was updated after the publication due to a mistake in the evaluation script.
BM42 does not outperform the BM25 implementations of other vendors.
Please consider BM42 as an experimental approach, which requires further research and development before it can be used in production.
&lt;/aside>
&lt;p>For the last 40 years, BM25 has served as the standard for search engines.
It is a simple yet powerful algorithm that has been used by many search engines, including Google, Bing, and Yahoo.&lt;/p></description></item><item><title>Qdrant 1.8.0: Enhanced Search Capabilities for Better Results</title><link>https://qdrant.tech/articles/qdrant-1.8.x/</link><pubDate>Wed, 06 Mar 2024 00:00:00 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/qdrant-1.8.x/</guid><description>&lt;h1 id="unlocking-next-level-search-exploring-qdrant-180s-advanced-search-capabilities">Unlocking Next-Level Search: Exploring Qdrant 1.8.0&amp;rsquo;s Advanced Search Capabilities&lt;/h1>
&lt;p>&lt;a href="https://github.com/qdrant/qdrant/releases/tag/v1.8.0" target="_blank" rel="noopener nofollow">Qdrant 1.8.0 is out!&lt;/a>
This time around, we have focused on Qdrant&amp;rsquo;s internals. Our goal was to optimize performance so that your existing setup can run faster and save on compute. Here is what we&amp;rsquo;ve been up to:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Faster &lt;a href="https://qdrant.tech/articles/sparse-vectors/" target="_blank" rel="noopener nofollow">sparse vectors&lt;/a>:&lt;/strong> &lt;a href="https://qdrant.tech/articles/hybrid-search/" target="_blank" rel="noopener nofollow">Hybrid search&lt;/a> is up to 16x faster now!&lt;/li>
&lt;li>&lt;strong>CPU resource management:&lt;/strong> You can allocate CPU threads for faster indexing.&lt;/li>
&lt;li>&lt;strong>Better indexing performance:&lt;/strong> We optimized text &lt;a href="https://qdrant.tech/documentation/manage-data/indexing/" target="_blank" rel="noopener nofollow">indexing&lt;/a> on the backend.&lt;/li>
&lt;/ul>
&lt;h2 id="faster-search-with-sparse-vectors">Faster search with sparse vectors&lt;/h2>
&lt;p>Search throughput is now up to 16 times faster for sparse vectors. If you are &lt;a href="https://qdrant.tech/articles/sparse-vectors/">using Qdrant for hybrid search&lt;/a>, this means that you can now handle up to sixteen times as many queries. This improvement comes from extensive backend optimizations aimed at increasing efficiency and capacity.&lt;/p></description></item><item><title>Optimizing RAG Through an Evaluation-Based Methodology</title><link>https://qdrant.tech/articles/rapid-rag-optimization-with-qdrant-and-quotient/</link><pubDate>Wed, 12 Jun 2024 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/rapid-rag-optimization-with-qdrant-and-quotient/</guid><description>&lt;p>In today&amp;rsquo;s fast-paced, information-rich world, AI is revolutionizing knowledge management. The systematic process of capturing, distributing, and effectively using knowledge within an organization is one of the fields in which AI provides exceptional value today.&lt;/p>
&lt;blockquote>
&lt;p>The potential for AI-powered knowledge management increases when leveraging &lt;a href="https://qdrant.tech/rag/rag-evaluation-guide/" target="_blank" rel="noopener nofollow">Retrieval Augmented Generation (RAG)&lt;/a>, a methodology that enables LLMs to access a vast, diverse repository of factual information from knowledge stores, such as vector databases.&lt;/p>
&lt;/blockquote>
&lt;p>This process enhances the accuracy, relevance, and reliability of generated text, thereby mitigating the risk of faulty, incorrect, or nonsensical results sometimes associated with traditional LLMs. This method not only ensures that the answers are contextually relevant but also up-to-date, reflecting the latest insights and data available.&lt;/p></description></item><item><title>Is RAG Dead? The Role of Vector Databases in Vector Search | Qdrant</title><link>https://qdrant.tech/articles/rag-is-dead/</link><pubDate>Tue, 27 Feb 2024 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/rag-is-dead/</guid><description>&lt;h1 id="is-rag-dead-the-role-of-vector-databases-in-ai-efficiency-and-vector-search">Is RAG Dead? The Role of Vector Databases in AI Efficiency and Vector Search&lt;/h1>
&lt;p>When Anthropic came out with a context window of 100K tokens, they said: “&lt;em>&lt;a href="https://qdrant.tech/solutions/" target="_blank" rel="noopener nofollow">Vector search&lt;/a> is dead. LLMs are getting more accurate and won’t need RAG anymore.&lt;/em>”&lt;/p>
&lt;p>Google’s Gemini 1.5 now offers a context window of 10 million tokens. &lt;a href="https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf" target="_blank" rel="noopener nofollow">Their supporting paper&lt;/a> claims victory over accuracy issues, even when applying Greg Kamradt’s &lt;a href="https://twitter.com/GregKamradt/status/1722386725635580292" target="_blank" rel="noopener nofollow">NIAH methodology&lt;/a>.&lt;/p>
&lt;p>&lt;em>It’s over. &lt;a href="https://qdrant.tech/articles/what-is-rag-in-ai/" target="_blank" rel="noopener nofollow">RAG&lt;/a> (Retrieval Augmented Generation) must be completely obsolete now. Right?&lt;/em>&lt;/p></description></item><item><title>Optimizing OpenAI Embeddings: Enhance Efficiency with Qdrant's Binary Quantization</title><link>https://qdrant.tech/articles/binary-quantization-openai/</link><pubDate>Wed, 21 Feb 2024 13:12:08 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/binary-quantization-openai/</guid><description>&lt;p>OpenAI Ada-003 embeddings are a powerful tool for natural language processing (NLP). However, the size of the embeddings are a challenge, especially with real-time search and retrieval. In this article, we explore how you can use Qdrant&amp;rsquo;s Binary Quantization to enhance the performance and efficiency of OpenAI embeddings.&lt;/p>
&lt;p>In this post, we discuss:&lt;/p>
&lt;ul>
&lt;li>The significance of OpenAI embeddings and real-world challenges.&lt;/li>
&lt;li>Qdrant&amp;rsquo;s Binary Quantization and how it can improve the performance of OpenAI embeddings.&lt;/li>
&lt;li>Results of an experiment that highlights improvements in search efficiency and accuracy.&lt;/li>
&lt;li>Implications of these findings for real-world applications.&lt;/li>
&lt;li>Best practices for leveraging Binary Quantization to enhance OpenAI embeddings.&lt;/li>
&lt;/ul>
&lt;p>If you&amp;rsquo;re new to Binary Quantization, consider reading our article, which walks you through the concept and &lt;a href="https://qdrant.tech/articles/binary-quantization/">how to use it with Qdrant&lt;/a>.&lt;/p></description></item><item><title>How to Implement Multitenancy and Custom Sharding in Qdrant</title><link>https://qdrant.tech/articles/multitenancy/</link><pubDate>Tue, 06 Feb 2024 13:21:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/multitenancy/</guid><description>&lt;h1 id="scaling-your-machine-learning-setup-the-power-of-multitenancy-and-custom-sharding-in-qdrant">Scaling Your Machine Learning Setup: The Power of Multitenancy and Custom Sharding in Qdrant&lt;/h1>
&lt;p>We are seeing the topics of &lt;a href="https://qdrant.tech/documentation/manage-data/multitenancy/">multitenancy&lt;/a> and &lt;a href="https://qdrant.tech/documentation/operations/distributed_deployment/#sharding">distributed deployment&lt;/a> pop up daily on our &lt;a href="https://qdrant.to/discord" target="_blank" rel="noopener nofollow">Discord support channel&lt;/a>. This tells us that many of you are looking to scale Qdrant along with the rest of your machine learning setup.&lt;/p>
&lt;p>Whether you are building a bank fraud-detection system, &lt;a href="https://qdrant.tech/articles/what-is-rag-in-ai/" target="_blank" rel="noopener nofollow">RAG&lt;/a> for e-commerce, or services for the federal government - you will need to leverage a multitenant architecture to scale your product.
In the world of SaaS and enterprise apps, this setup is the norm. It will considerably increase your application&amp;rsquo;s performance and lower your hosting costs.&lt;/p></description></item><item><title> Data Privacy with Qdrant: Implementing Role-Based Access Control (RBAC)</title><link>https://qdrant.tech/articles/data-privacy/</link><pubDate>Tue, 18 Jun 2024 08:00:00 -0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/data-privacy/</guid><description>&lt;p>Data stored in vector databases is often proprietary to the enterprise and may include sensitive information like customer records, legal contracts, electronic health records (EHR), financial data, and intellectual property. Moreover, strong security measures become critical to safeguarding this data. If the data stored in a vector database is not secured, it may open a vulnerability known as &amp;ldquo;&lt;a href="https://arxiv.org/abs/2004.00053" target="_blank" rel="noopener nofollow">embedding inversion attack&lt;/a>,&amp;rdquo; where malicious actors could potentially &lt;a href="https://arxiv.org/pdf/2305.03010" target="_blank" rel="noopener nofollow">reconstruct the original data from the embeddings&lt;/a> themselves.&lt;/p></description></item><item><title>Discovery needs context</title><link>https://qdrant.tech/articles/discovery-search/</link><pubDate>Wed, 31 Jan 2024 08:00:00 -0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/discovery-search/</guid><description>&lt;h1 id="discovery-needs-context">Discovery needs context&lt;/h1>
&lt;p>When Christopher Columbus and his crew sailed to cross the Atlantic Ocean, they were not looking for the Americas. They were looking for a new route to India because they were convinced that the Earth was round. They didn&amp;rsquo;t know anything about a new continent, but since they were going west, they stumbled upon it.&lt;/p>
&lt;p>They couldn&amp;rsquo;t reach their &lt;em>target&lt;/em> because the geography didn&amp;rsquo;t let them, but once they realized it wasn&amp;rsquo;t India, they claimed it as a new &amp;ldquo;discovery&amp;rdquo; for their crown. If we consider that sailors need water to sail, then we can establish a &lt;em>context&lt;/em> that is positive on the water and negative on land. Once the sailors&amp;rsquo; search was stopped by the land, they could not go any further, and a new route was found. Let&amp;rsquo;s keep these concepts of &lt;em>target&lt;/em> and &lt;em>context&lt;/em> in mind as we explore the new functionality of Qdrant: &lt;strong>Discovery search&lt;/strong>.&lt;/p></description></item><item><title>What are Vector Embeddings? - Revolutionize Your Search Experience</title><link>https://qdrant.tech/articles/what-are-embeddings/</link><pubDate>Tue, 06 Feb 2024 15:29:33 -0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/what-are-embeddings/</guid><description>&lt;blockquote>
&lt;p>&lt;strong>Embeddings&lt;/strong> are numerical machine learning representations of the semantics of the input data. They capture the meaning of complex, high-dimensional data, like text, images, or audio, in vectors, enabling algorithms to process and analyze the data more efficiently.&lt;/p>
&lt;/blockquote>
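To make the definition above concrete, here is a toy sketch with hand-made vectors (purely illustrative numbers, not the output of any particular embedding model): similarity between embeddings is typically measured with cosine similarity.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings"; real models produce hundreds of dimensions.
cat = [0.9, 0.1, 0.0, 0.2]
kitten = [0.85, 0.15, 0.05, 0.25]
car = [0.0, 0.8, 0.9, 0.1]

print(cosine_similarity(cat, kitten))  # high: related meanings
print(cosine_similarity(cat, car))     # low: unrelated meanings
```

Two items with similar meaning end up with vectors pointing in similar directions, which is exactly what nearest-neighbour search exploits.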
&lt;p>You know when you’re scrolling through your social media feeds and the content just feels incredibly tailored to you? There&amp;rsquo;s the news you care about, followed by a perfect tutorial with your favorite tech stack, and then a meme that makes you laugh so hard you snort.&lt;/p></description></item><item><title>What is a Sparse Vector? How to Achieve Vector-based Hybrid Search</title><link>https://qdrant.tech/articles/sparse-vectors/</link><pubDate>Sat, 09 Dec 2023 13:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/sparse-vectors/</guid><description>&lt;p>Think of a library with a vast index card system. Each index card only has a few keywords marked out (sparse vector) of a large possible set for each book (document). This is what sparse vectors enable for text.&lt;/p>
&lt;h2 id="what-are-sparse-and-dense-vectors">What are sparse and dense vectors?&lt;/h2>
&lt;p>Sparse vectors are like the Marie Kondo of data—keeping only what sparks joy (or relevance, in this case).&lt;/p>
&lt;p>Consider a simplified example of 2 documents, each with 200 words. A dense vector would have several hundred non-zero values, whereas a sparse vector could have much fewer, say only 20 non-zero values.&lt;/p></description></item><item><title>Qdrant 1.7.0 has just landed!</title><link>https://qdrant.tech/articles/qdrant-1.7.x/</link><pubDate>Sun, 10 Dec 2023 10:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/qdrant-1.7.x/</guid><description>&lt;p>Please welcome the long-awaited &lt;a href="https://github.com/qdrant/qdrant/releases/tag/v1.7.0" target="_blank" rel="noopener nofollow">Qdrant 1.7.0 release&lt;/a>. Apart from a handful of minor fixes and improvements, this release brings some cool brand-new features that we are excited to share!
The latest version of your favorite vector search engine finally supports &lt;strong>sparse vectors&lt;/strong>. That&amp;rsquo;s the feature many of you requested, so why should we ignore it?
We also decided to continue our journey with &lt;a href="https://qdrant.tech/articles/vector-similarity-beyond-search/">vector similarity beyond search&lt;/a>. The new Discovery API covers some utterly new use cases. We&amp;rsquo;re more than excited to see what you will build with it!
But there is more to it! Check out what&amp;rsquo;s new in &lt;strong>Qdrant 1.7.0&lt;/strong>!&lt;/p></description></item><item><title>Deliver Better Recommendations with Qdrant’s new API</title><link>https://qdrant.tech/articles/new-recommendation-api/</link><pubDate>Wed, 25 Oct 2023 09:46:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/new-recommendation-api/</guid><description>&lt;p>The most popular use case for vector search engines, such as Qdrant, is Semantic search with a single query vector. Given the
query, we can vectorize (embed) it and find the closest points in the index. But &lt;a href="https://qdrant.tech/articles/vector-similarity-beyond-search/">Vector Similarity beyond Search&lt;/a>
does exist, and recommendation systems are a great example. Recommendations might be seen as a multi-aim search, where we want
to find items close to positive and far from negative examples. This use of vector databases has many applications, including
recommendation systems for e-commerce, content, or even dating apps.&lt;/p></description></item><item><title>Vector Search as a dedicated service</title><link>https://qdrant.tech/articles/dedicated-service/</link><pubDate>Thu, 30 Nov 2023 10:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/dedicated-service/</guid><description>&lt;p>Ever since the data science community discovered that vector search significantly improves LLM answers,
various vendors and enthusiasts have been arguing over the proper solutions to store embeddings.&lt;/p>
&lt;p>Some say storing them in a specialized engine (aka vector database) is better. Others say that it&amp;rsquo;s enough to use plugins for existing databases.&lt;/p>
&lt;p>Here are &lt;a href="https://nextword.substack.com/p/vector-database-is-not-a-separate" target="_blank" rel="noopener nofollow">just&lt;/a> a &lt;a href="https://stackoverflow.blog/2023/09/20/do-you-need-a-specialized-vector-database-to-implement-vector-search-well/" target="_blank" rel="noopener nofollow">few&lt;/a> of &lt;a href="https://www.singlestore.com/blog/why-your-vector-database-should-not-be-a-vector-database/" target="_blank" rel="noopener nofollow">them&lt;/a>.&lt;/p>
&lt;p>This article presents our vision and arguments on the topic.
We will:&lt;/p></description></item><item><title>FastEmbed: Qdrant's Efficient Python Library for Embedding Generation</title><link>https://qdrant.tech/articles/fastembed/</link><pubDate>Wed, 18 Oct 2023 10:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/fastembed/</guid><description>&lt;p>Data Science and Machine Learning practitioners often find themselves navigating through a labyrinth of models, libraries, and frameworks. Which model to choose, what embedding size, and how to approach tokenizing, are just some questions you are faced with when starting your work. We understood how many data scientists wanted an easier and more intuitive means to do their embedding work. This is why we built FastEmbed, a Python library engineered for speed, efficiency, and usability. We have created easy to use default workflows, handling the 80% use cases in NLP embedding.&lt;/p></description></item><item><title>Google Summer of Code 2023 - Polygon Geo Filter for Qdrant Vector Database</title><link>https://qdrant.tech/articles/geo-polygon-filter-gsoc/</link><pubDate>Thu, 12 Oct 2023 08:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/geo-polygon-filter-gsoc/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Greetings, I&amp;rsquo;m Zein Wen, and I was a Google Summer of Code 2023 participant at Qdrant. I got to work with an amazing mentor, Arnaud Gourlay, on enhancing the Qdrant Geo Polygon Filter. This new feature allows users to refine their query results using polygons. As the latest addition to the Geo Filter family of radius and rectangle filters, this enhancement promises greater flexibility in querying geo data, unlocking interesting new use cases.&lt;/p></description></item><item><title>Binary Quantization - Vector Search, 40x Faster</title><link>https://qdrant.tech/articles/binary-quantization/</link><pubDate>Mon, 18 Sep 2023 13:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/binary-quantization/</guid><description>&lt;h1 id="optimizing-high-dimensional-vectors-with-binary-quantization">Optimizing High-Dimensional Vectors with Binary Quantization&lt;/h1>
&lt;p>Qdrant is built to handle typical scaling challenges: high throughput, low latency and efficient indexing. &lt;strong>Binary quantization (BQ)&lt;/strong> is our latest attempt to give our customers the edge they need to scale efficiently. This feature is particularly excellent for collections with large vector lengths and a large number of points.&lt;/p>
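Conceptually (a deliberately simplified sketch, not Qdrant's actual implementation), binary quantization keeps only the sign of each vector component, so every 32-bit float collapses to a single bit and similarity can be approximated by counting matching bits:

```python
def binary_quantize(vector):
    """Keep only the sign of each component: 1 if positive, else 0."""
    return [1 if x > 0 else 0 for x in vector]

def bit_similarity(a_bits, b_bits):
    """Fraction of matching bits, a cheap proxy for vector similarity."""
    matches = sum(1 for x, y in zip(a_bits, b_bits) if x == y)
    return matches / len(a_bits)

v1 = [0.12, -0.53, 0.77, -0.01, 0.44, -0.88, 0.09, 0.31]
v2 = [0.10, -0.60, 0.70, 0.02, 0.50, -0.80, 0.11, 0.29]   # close to v1
v3 = [-x for x in v1]                                     # opposite of v1

print(bit_similarity(binary_quantize(v1), binary_quantize(v2)))  # 0.875
print(bit_similarity(binary_quantize(v1), binary_quantize(v3)))  # 0.0
```

Each float32 component becomes one bit, a 32x reduction in vector memory, and bitwise comparison is far cheaper than float arithmetic, which is where the speed-up comes from.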
&lt;p>Our results are dramatic: Using BQ will reduce your memory consumption and improve retrieval speeds by up to 40x.&lt;/p></description></item><item><title>Food Discovery Demo</title><link>https://qdrant.tech/articles/food-discovery-demo/</link><pubDate>Tue, 05 Sep 2023 11:32:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/food-discovery-demo/</guid><description>&lt;p>Not every search journey begins with a specific destination in mind. Sometimes, you just want to explore and see what’s out there and what you might like.
This is especially true when it comes to food. You might be craving something sweet, but you don’t know what. You might also be looking for a new dish to try,
and you just want to see the options available. In these cases, it&amp;rsquo;s impossible to express your needs in a textual query, as the thing you are looking for is not
yet defined. Qdrant&amp;rsquo;s semantic search for images is useful when you have a hard time expressing your tastes in words.&lt;/p></description></item><item><title>Google Summer of Code 2023 - Web UI for Visualization and Exploration</title><link>https://qdrant.tech/articles/web-ui-gsoc/</link><pubDate>Mon, 28 Aug 2023 08:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/web-ui-gsoc/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello everyone! My name is Kartik Gupta, and I am thrilled to share my coding journey as part of the Google Summer of Code 2023 program. This summer, I had the incredible opportunity to work on an exciting project titled &amp;ldquo;Web UI for Visualization and Exploration&amp;rdquo; for Qdrant, a vector search engine. In this article, I will take you through my experience, challenges, and achievements during this enriching coding journey.&lt;/p></description></item><item><title>Qdrant Summer of Code 2024 - WASM based Dimension Reduction</title><link>https://qdrant.tech/articles/dimension-reduction-qsoc/</link><pubDate>Sat, 31 Aug 2024 10:39:48 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/dimension-reduction-qsoc/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello, everyone! I&amp;rsquo;m Jishan Bhattacharya, and I had the incredible opportunity to intern at Qdrant this summer as part of the Qdrant Summer of Code 2024. Under the mentorship of &lt;a href="https://www.linkedin.com/in/andrey-vasnetsov-75268897/" target="_blank" rel="noopener nofollow">Andrey Vasnetsov&lt;/a>, I dived into the world of performance optimization, focusing on enhancing vector visualization using WebAssembly (WASM). In this article, I&amp;rsquo;ll share the insights, challenges, and accomplishments from my journey — one filled with learning, experimentation, and plenty of coding adventures.&lt;/p></description></item><item><title>Semantic Search As You Type</title><link>https://qdrant.tech/articles/search-as-you-type/</link><pubDate>Mon, 14 Aug 2023 00:00:00 +0100</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/search-as-you-type/</guid><description>&lt;p>Qdrant is one of the fastest vector search engines out there, so while looking for a demo to show off, we came upon the idea to do a search-as-you-type box with a fully semantic search backend. Now we already have a semantic/keyword hybrid search on our website. But that one is written in Python, which incurs some overhead for the interpreter. Naturally, I wanted to see how fast I could go using Rust.&lt;/p></description></item><item><title>Vector Similarity: Going Beyond Full-Text Search | Qdrant</title><link>https://qdrant.tech/articles/vector-similarity-beyond-search/</link><pubDate>Tue, 08 Aug 2023 08:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/vector-similarity-beyond-search/</guid><description>&lt;h1 id="vector-similarity-unleashing-data-insights-beyond-traditional-search">Vector Similarity: Unleashing Data Insights Beyond Traditional Search&lt;/h1>
&lt;p>When making use of unstructured data, there are traditional go-to solutions that are well-known for developers:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Full-text search&lt;/strong> when you need to find documents that contain a particular word or phrase.&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://qdrant.tech/documentation/overview/vector-search/" target="_blank" rel="noopener nofollow">Vector search&lt;/a>&lt;/strong> when you need to find documents that are semantically similar to a given query.&lt;/li>
&lt;/ul>
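The contrast can be sketched with a toy corpus and hand-made two-dimensional vectors (hypothetical numbers purely for illustration; a real system would get them from an embedding model):

```python
documents = [
    "the cat sat on the mat",
    "a kitten playing with yarn",
    "stock prices fell sharply today",
]

# Hand-made 2-d "embeddings"; an embedding model would produce these.
embeddings = {
    "the cat sat on the mat": [0.9, 0.1],
    "a kitten playing with yarn": [0.8, 0.2],
    "stock prices fell sharply today": [0.1, 0.9],
}

def full_text_search(word, docs):
    """Keyword match: only documents containing the literal word."""
    return [d for d in docs if word in d.split()]

def vector_search(query_vec, docs, top_k=2):
    """Semantic match: documents closest to the query vector."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    return sorted(docs, key=lambda d: -dot(query_vec, embeddings[d]))[:top_k]

print(full_text_search("cat", documents))    # misses the kitten document
print(vector_search([0.9, 0.1], documents))  # finds both cat-related documents
```

The keyword search cannot see that "kitten" is related to "cat", while the vector search ranks both cat-related documents above the unrelated one.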
&lt;p>Sometimes people mix those two approaches, so it might look like vector similarity is just an extension of full-text search. However, in this article, we will explore some promising new techniques that can be used to expand the use cases of unstructured data and demonstrate that vector similarity creates its own stack of data exploration tools.&lt;/p></description></item><item><title>Serverless Semantic Search</title><link>https://qdrant.tech/articles/serverless/</link><pubDate>Wed, 12 Jul 2023 10:00:00 +0100</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/serverless/</guid><description>&lt;p>Do you want to insert a semantic search function into your website or online app? Now you can do so - without spending any money! In this example, you will learn how to create a free prototype search engine for your own non-commercial purposes.&lt;/p>
&lt;h2 id="ingredients">Ingredients&lt;/h2>
&lt;ul>
&lt;li>A &lt;a href="https://rust-lang.org" target="_blank" rel="noopener nofollow">Rust&lt;/a> toolchain&lt;/li>
&lt;li>&lt;a href="https://cargo-lambda.info" target="_blank" rel="noopener nofollow">cargo lambda&lt;/a> (install via package manager, &lt;a href="https://github.com/cargo-lambda/cargo-lambda/releases" target="_blank" rel="noopener nofollow">download&lt;/a> binary or &lt;code>cargo install cargo-lambda&lt;/code>)&lt;/li>
&lt;li>The &lt;a href="https://aws.amazon.com/cli" target="_blank" rel="noopener nofollow">AWS CLI&lt;/a>&lt;/li>
&lt;li>Qdrant instance (&lt;a href="https://cloud.qdrant.io" target="_blank" rel="noopener nofollow">free tier&lt;/a> available)&lt;/li>
&lt;li>An embedding provider service of your choice (see our &lt;a href="https://qdrant.tech/documentation/embeddings/">Embeddings docs&lt;/a>. You may be able to get credits from &lt;a href="https://aigrant.org" target="_blank" rel="noopener nofollow">AI Grant&lt;/a>, also Cohere has a &lt;a href="https://cohere.com/pricing" target="_blank" rel="noopener nofollow">rate-limited non-commercial free tier&lt;/a>)&lt;/li>
&lt;li>AWS Lambda account (12-month free tier available)&lt;/li>
&lt;/ul>
&lt;h2 id="what-youre-going-to-build">What you&amp;rsquo;re going to build&lt;/h2>
&lt;p>You&amp;rsquo;ll combine the embedding provider and the Qdrant instance into a neat semantic search, calling both services from a small Lambda function.&lt;/p></description></item><item><title>Bulk Operations</title><link>https://qdrant.tech/documentation/tutorials-develop/bulk-upload/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-develop/bulk-upload/</guid><description>&lt;h1 id="bulk-upload-vectors-to-a-qdrant-collection">Bulk Upload Vectors to a Qdrant Collection&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 20 min&lt;/th>
 &lt;th>Level: Intermediate&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>Uploading a large-scale dataset fast might be a challenge, but Qdrant has a few tricks to help you with that.&lt;/p>
&lt;p>The first important detail about data uploading is that the bottleneck is usually located on the client side, not on the server side.
This means that if you are uploading a large dataset, you should prefer a high-performance client library.&lt;/p></description></item><item><title>Getting Started</title><link/><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid/><description/></item><item><title>How ColPali Models Work</title><link>https://qdrant.tech/course/multi-vector-search/module-2/how-colpali-works/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/multi-vector-search/module-2/how-colpali-works/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Module 2 
&lt;/div>

&lt;h1 id="how-colpali-models-work">How ColPali Models Work&lt;/h1>
&lt;p>ColPali extends the late interaction paradigm from text to visual documents. It can process PDFs, images, and scanned documents, generating multi-vector representations that capture both textual and visual information.&lt;/p>
&lt;p>Understanding ColPali&amp;rsquo;s architecture helps you leverage its full potential for multi-modal document retrieval.&lt;/p>
&lt;hr>
&lt;div class="video">
&lt;iframe
 src="https://www.youtube-nocookie.com/embed/Fai9aY1PMCA?rel=0"
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;hr>
&lt;p>&lt;strong>Follow along in Colab:&lt;/strong> &lt;a href="https://colab.research.google.com/github/qdrant/examples/blob/master/course-multi-vector-search/module-2/how-colpali-works.ipynb">
&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" style="display:inline; margin:0;" alt="Open In Colab"/>
&lt;/a>&lt;/p>
&lt;hr>
&lt;h2 id="from-text-to-visual-documents">From Text to Visual Documents&lt;/h2>
&lt;p>&lt;strong>What about documents that aren&amp;rsquo;t just text?&lt;/strong> PDFs often contain diagrams, tables, charts, equations, and complex layouts where the visual presentation carries as much meaning as the text itself.&lt;/p></description></item><item><title>How vector search should be benchmarked?</title><link>https://qdrant.tech/benchmarks/benchmarks-intro/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/benchmarks/benchmarks-intro/</guid><description>&lt;h1 id="benchmarking-vector-search">Benchmarking Vector Search&lt;/h1>
&lt;p>At Qdrant, performance is the top priority. We always make sure that we use system resources efficiently so you get the &lt;strong>fastest and most accurate results at the cheapest cloud costs&lt;/strong>. So all of our decisions, from &lt;a href="https://qdrant.tech/articles/why-rust/">choosing Rust&lt;/a>, &lt;a href="https://qdrant.tech/articles/io_uring/">io optimisations&lt;/a>, &lt;a href="https://qdrant.tech/articles/serverless/">serverless support&lt;/a>, and &lt;a href="https://qdrant.tech/articles/binary-quantization/">binary quantization&lt;/a>, to our &lt;a href="https://qdrant.tech/articles/fastembed/">fastembed library&lt;/a>, are based on this principle. In this article, we will compare how Qdrant performs against other vector search engines.&lt;/p>
&lt;p>Here are the principles we followed while designing these benchmarks:&lt;/p></description></item><item><title>Late Interaction Basics</title><link>https://qdrant.tech/course/multi-vector-search/module-1/late-interaction-basics/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/multi-vector-search/module-1/late-interaction-basics/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Module 1 
&lt;/div>

&lt;h1 id="late-interaction-basics">Late Interaction Basics&lt;/h1>
&lt;p>When building a search system, one fundamental question emerges: &lt;strong>when should a query and document interact?&lt;/strong> The answer to this question may affect both the quality of search results and the system&amp;rsquo;s scalability.&lt;/p>
&lt;p>This lesson introduces the late interaction paradigm - the foundation of multi-vector search - and explores how it compares to other approaches.&lt;/p>
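The core late-interaction scoring operation, often called MaxSim, can be sketched in a few lines (toy token vectors for illustration, with dot product assumed as the token similarity):

```python
def maxsim(query_vecs, doc_vecs):
    """For each query token, take its best-matching document token,
    then sum those maxima into a single relevance score."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# Two query token vectors against three document token vectors (toy 2-d)
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]

print(maxsim(query, doc))  # best matches per query token: 0.9 and 0.8
```

Because each query token interacts with each document token only at scoring time, documents can be embedded and indexed offline, which is what makes the approach scalable.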
&lt;hr>
&lt;div class="video">
&lt;iframe
 src="https://www.youtube-nocookie.com/embed/8HrvD5o2w8M?rel=0"
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;hr>
&lt;p>&lt;strong>Follow along in Colab:&lt;/strong> &lt;a href="https://colab.research.google.com/github/qdrant/examples/blob/master/course-multi-vector-search/module-1/late-interaction-basics.ipynb">
&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" style="display:inline; margin:0;" alt="Open In Colab"/>
&lt;/a>&lt;/p></description></item><item><title>Multi-Stage Retrieval with Universal Query API</title><link>https://qdrant.tech/course/multi-vector-search/module-3/multi-stage-retrieval/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/multi-vector-search/module-3/multi-stage-retrieval/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Module 3 
&lt;/div>

&lt;h1 id="multi-stage-retrieval-with-universal-query-api">Multi-Stage Retrieval with Universal Query API&lt;/h1>
&lt;p>The most effective production deployments combine multiple optimization techniques in multi-stage pipelines. Fast approximate methods retrieve candidates, which are then reranked with higher-quality methods.&lt;/p>
&lt;p>Qdrant&amp;rsquo;s Universal Query API makes it easy to build sophisticated multi-stage retrieval systems.&lt;/p>
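The principle can be sketched without any Qdrant-specific code (hypothetical toy data; in Qdrant itself the first stage would be expressed as a prefetch inside a Universal Query API request):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim(query_tokens, doc_tokens):
    """Expensive late-interaction score used for reranking."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

# Each document carries a cheap pooled vector and expensive token vectors.
docs = {
    "doc_a": {"pooled": [0.9, 0.1], "tokens": [[1.0, 0.0], [0.8, 0.2]]},
    "doc_b": {"pooled": [0.7, 0.3], "tokens": [[0.6, 0.4], [0.9, 0.1]]},
    "doc_c": {"pooled": [0.1, 0.9], "tokens": [[0.0, 1.0], [0.2, 0.8]]},
}

def two_stage_search(query_pooled, query_tokens, candidates=2, top_k=1):
    # Stage 1: cheap pooled-vector scoring prefetches a candidate set.
    stage1 = sorted(docs, key=lambda d: -dot(query_pooled, docs[d]["pooled"]))
    prefetched = stage1[:candidates]
    # Stage 2: expensive MaxSim reranking runs only on the candidates.
    reranked = sorted(prefetched, key=lambda d: -maxsim(query_tokens, docs[d]["tokens"]))
    return reranked[:top_k]

print(two_stage_search([0.9, 0.1], [[1.0, 0.0]]))
```

The expensive MaxSim computation touches only the prefetched candidates, so its cost stays bounded no matter how large the collection grows.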
&lt;hr>
&lt;div class="video">
&lt;iframe
 src="https://www.youtube-nocookie.com/embed/qIjPepsY35E?rel=0"
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;hr>
&lt;p>&lt;strong>Follow along in Colab:&lt;/strong> &lt;a href="https://colab.research.google.com/github/qdrant/examples/blob/master/course-multi-vector-search/module-3/multi-stage-retrieval.ipynb">
&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" style="display:inline; margin:0;" alt="Open In Colab"/>
&lt;/a>&lt;/p>
&lt;hr>
&lt;h2 id="why-multi-stage-retrieval">Why Multi-Stage Retrieval?&lt;/h2>
&lt;p>You&amp;rsquo;ve learned that multi-vector representations like ColBERT provide superior search quality compared to single-vector embeddings. But there&amp;rsquo;s a challenge: &lt;strong>computing MaxSim for every document in a large collection is expensive&lt;/strong>.&lt;/p></description></item><item><title>Qdrant Fundamentals</title><link>https://qdrant.tech/documentation/faq/qdrant-fundamentals/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/faq/qdrant-fundamentals/</guid><description>&lt;h1 id="frequently-asked-questions-general-topics">Frequently Asked Questions: General Topics&lt;/h1>
&lt;table>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;a href="https://qdrant.tech/documentation/faq/qdrant-fundamentals/#vectors">Vectors&lt;/a>&lt;/td>
 &lt;td>&lt;a href="https://qdrant.tech/documentation/faq/qdrant-fundamentals/#search">Search&lt;/a>&lt;/td>
 &lt;td>&lt;a href="https://qdrant.tech/documentation/faq/qdrant-fundamentals/#collections">Collections&lt;/a>&lt;/td>
 &lt;td>&lt;a href="https://qdrant.tech/documentation/faq/qdrant-fundamentals/#compatibility">Compatibility&lt;/a>&lt;/td>
 &lt;td>&lt;a href="https://qdrant.tech/documentation/faq/qdrant-fundamentals/#cloud">Cloud&lt;/a>&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="vectors">Vectors&lt;/h2>
&lt;h3 id="what-is-the-maximum-vector-dimension-supported-by-qdrant">What is the maximum vector dimension supported by Qdrant?&lt;/h3>
&lt;p>For dense vectors, Qdrant supports up to 65,535 dimensions.&lt;/p>
&lt;h3 id="what-is-the-maximum-size-of-vector-metadata-that-can-be-stored">What is the maximum size of vector metadata that can be stored?&lt;/h3>
&lt;p>There is no inherent limitation on metadata size, but it should be &lt;a href="https://qdrant.tech/documentation/operations/optimize/">optimized for performance and resource usage&lt;/a>. Users can set upper limits in the configuration.&lt;/p></description></item><item><title>Qdrant Setup</title><link>https://qdrant.tech/course/multi-vector-search/module-0/qdrant-setup/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/multi-vector-search/module-0/qdrant-setup/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Module 0 
&lt;/div>

&lt;h1 id="qdrant-setup">Qdrant Setup&lt;/h1>
&lt;p>Before diving into multi-vector search, you need a running Qdrant instance. Whether you choose Qdrant Cloud for a managed solution or a local deployment, this lesson will get you up and running.&lt;/p>
&lt;p>Multi-vector search requires specific collection configurations that differ from traditional single-vector setups. We&amp;rsquo;ll cover the essentials to prepare your environment.&lt;/p>
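&lt;p>For example, a minimal multi-vector collection can be declared like this (the collection name and vector size are placeholders; the &lt;code>max_sim&lt;/code> comparator enables ColBERT-style late interaction scoring):&lt;/p>

```json
PUT /collections/my_collection
{
  "vectors": {
    "size": 128,
    "distance": "Cosine",
    "multivector_config": {
      "comparator": "max_sim"
    }
  }
}
```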
&lt;hr>
&lt;h2 id="qdrant-cloud-setup-recommended">Qdrant Cloud Setup (Recommended)&lt;/h2>
&lt;p>Qdrant Cloud is the fastest way to get started with multi-vector search. It provides a fully managed, production-ready vector database with automatic backups, high availability, and secure TLS connections. Both Qdrant Cloud and the open-source version provide the same feature set - Cloud simply handles the infrastructure for you.&lt;/p></description></item><item><title>Reranking for Better Search</title><link>https://qdrant.tech/documentation/search-precision/reranking-semantic-search/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/search-precision/reranking-semantic-search/</guid><description>&lt;h1 id="improve-semantic-search-with-reranking-using-qdrant">Improve Semantic Search with Reranking using Qdrant&lt;/h1>
&lt;p>In Retrieval-Augmented Generation (RAG) systems, irrelevant or missing information can throw off your model&amp;rsquo;s ability to produce accurate, meaningful outputs. One of the best ways to ensure you&amp;rsquo;re feeding your language model the most relevant, context-rich documents is through reranking. It’s a game-changer.&lt;/p>
&lt;p>In this guide, we’ll dive into using reranking to boost the relevance of search results in Qdrant. We’ll start with an easy use case that leverages the Cohere Rerank model. Then, we’ll take it up a notch by exploring ColBERT for a more advanced approach. By the time you’re done, you’ll know how to implement &lt;a href="https://qdrant.tech/articles/hybrid-search/">hybrid search&lt;/a>, fine-tune reranking models, and significantly improve your accuracy.&lt;/p></description></item><item><title>Role Management</title><link>https://qdrant.tech/documentation/cloud-rbac/role-management/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/cloud-rbac/role-management/</guid><description>&lt;h1 id="role-management">Role Management&lt;/h1>
&lt;blockquote>
&lt;p>💡 You can access this in &lt;strong>Access Management &amp;gt; User &amp;amp; Role Management&lt;/strong>, &lt;em>if available; see &lt;a href="https://qdrant.tech/documentation/cloud-rbac/">this page for details&lt;/a>.&lt;/em>&lt;/p>
&lt;/blockquote>
&lt;p>A &lt;strong>Role&lt;/strong> contains a set of &lt;strong>permissions&lt;/strong> that define the ability to perform or control specific actions in Qdrant Cloud. Permissions are accessible through the Permissions tab in the Role Details page and offer fine-grained access control, logically grouped for easy identification.&lt;/p>
&lt;h2 id="built-in-roles">Built-In Roles&lt;/h2>
&lt;p>Qdrant Cloud includes some built-in roles for common use-cases. The permissions for these built-in roles cannot be changed.&lt;/p></description></item><item><title>Setup Hybrid Cloud</title><link>https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/</guid><description>&lt;h1 id="creating-a-hybrid-cloud-environment">Creating a Hybrid Cloud Environment&lt;/h1>
&lt;p>The following instruction set will show you how to properly set up a &lt;strong>Qdrant cluster&lt;/strong> in your &lt;strong>Hybrid Cloud Environment&lt;/strong>.&lt;/p>
&lt;p>You can also watch a video demo on how to set up a Hybrid Cloud Environment:&lt;/p>
&lt;p align="center">&lt;iframe width="560" height="315" src="https://www.youtube.com/embed/BF02jULGCfo?si=apU1uQOE8AMjq9hD" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen>&lt;/iframe>&lt;/p>
&lt;p>To learn how Hybrid Cloud works, &lt;a href="https://qdrant.tech/documentation/hybrid-cloud/">read the overview document&lt;/a>.&lt;/p>
&lt;h2 id="prerequisites">Prerequisites&lt;/h2>
&lt;p>Qdrant Hybrid Cloud is available as part of our Enterprise plan. To get access to Qdrant Hybrid Cloud, please &lt;a href="https://qdrant.tech/contact-us/">contact us&lt;/a>.&lt;/p></description></item><item><title>Setup Private Cloud</title><link>https://qdrant.tech/documentation/private-cloud/private-cloud-setup/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/private-cloud/private-cloud-setup/</guid><description>&lt;h1 id="qdrant-private-cloud-setup">Qdrant Private Cloud Setup&lt;/h1>
&lt;h2 id="requirements">Requirements&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Kubernetes cluster:&lt;/strong> To install Qdrant Private Cloud, you need a &lt;a href="https://www.cncf.io/training/certification/software-conformance/" target="_blank" rel="noopener nofollow">standards-compliant&lt;/a> Kubernetes cluster. You can run this cluster in any cloud, on-premises or edge environment, with distributions that range from AWS EKS to VMWare vSphere. See &lt;a href="https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/">Deployment Platforms&lt;/a> for more information.&lt;/li>
&lt;li>&lt;strong>Storage:&lt;/strong> For storage, you need to set up the Kubernetes cluster with a Container Storage Interface (CSI) driver that provides block storage. For vertical scaling, the CSI driver needs to support volume expansion. For backups and restores, the driver needs to support CSI snapshots and restores.&lt;/li>
&lt;/ul>
&lt;aside role="status">Network storage systems like NFS or object storage systems such as S3 are not supported.&lt;/aside>
&lt;ul>
&lt;li>&lt;strong>Permissions:&lt;/strong> To install the Qdrant Kubernetes Operator you need to have &lt;code>cluster-admin&lt;/code> access in your Kubernetes cluster.&lt;/li>
&lt;li>&lt;strong>Locations:&lt;/strong> By default, the Qdrant Operator Helm charts and container images are served from &lt;code>registry.cloud.qdrant.io&lt;/code>.&lt;/li>
&lt;/ul>
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> You can also mirror these images and charts into your own registry and pull them from there.&lt;/p></description></item><item><title>Introducing Qdrant 1.3.0</title><link>https://qdrant.tech/articles/qdrant-1.3.x/</link><pubDate>Mon, 26 Jun 2023 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/qdrant-1.3.x/</guid><description>&lt;p>A brand-new &lt;a href="https://github.com/qdrant/qdrant/releases/tag/v1.3.0" target="_blank" rel="noopener nofollow">Qdrant 1.3.0 release&lt;/a> comes packed with a plethora of new features, performance improvements and bug fixes:&lt;/p>
&lt;ol>
&lt;li>Asynchronous I/O interface: Reduce overhead by managing I/O operations asynchronously, thus minimizing context switches.&lt;/li>
&lt;li>Oversampling for Quantization: Improve the accuracy and performance of your queries while using Scalar or Product Quantization.&lt;/li>
&lt;li>Grouping API lookup: Storage optimization method that lets you look for points in another collection using group ids.&lt;/li>
&lt;li>Qdrant Web UI: A convenient dashboard to help you manage data stored in Qdrant.&lt;/li>
&lt;li>Temp directory for Snapshots: Set a separate storage directory for temporary snapshots on a faster disk.&lt;/li>
&lt;li>Other important changes&lt;/li>
&lt;/ol>
&lt;p>Your feedback is valuable to us, and we are always trying to include some of your feature requests in our roadmap. Join &lt;a href="https://qdrant.to/discord" target="_blank" rel="noopener nofollow">our Discord community&lt;/a> and help us build Qdrant!&lt;/p></description></item><item><title>Single node benchmarks</title><link>https://qdrant.tech/benchmarks/single-node-speed-benchmark/</link><pubDate>Tue, 23 Aug 2022 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/benchmarks/single-node-speed-benchmark/</guid><description>&lt;h2 id="observations">Observations&lt;/h2>
&lt;p>Most of the engines have improved since &lt;a href="https://qdrant.tech/benchmarks/single-node-speed-benchmark-2022/">our last run&lt;/a>. Both life and software have trade-offs but some clearly do better:&lt;/p>
&lt;ul>
 &lt;li>&lt;strong>&lt;code>Qdrant&lt;/code> achieves the highest RPS and lowest latencies in almost all scenarios, no matter the precision threshold and the metric we choose.&lt;/strong> It has also shown 4x RPS gains on one of the datasets.&lt;/li>
 &lt;li>&lt;code>Elasticsearch&lt;/code> has become considerably faster in many cases, but it&amp;rsquo;s very slow in terms of indexing time: it can be 10x slower when storing 10M+ vectors of 96 dimensions (32 mins vs 5.5 hrs)!&lt;/li>
 &lt;li>&lt;code>Milvus&lt;/code> is the fastest when it comes to indexing time and maintains good precision. However, it&amp;rsquo;s not on par with the others in RPS or latency when you have higher-dimensional embeddings or a larger number of vectors.&lt;/li>
 &lt;li>&lt;code>Redis&lt;/code> achieves good RPS, but mostly at lower precision. It also achieved low latency with a single thread; however, its latency climbs quickly with more parallel requests. Part of this speed gain comes from its custom protocol.&lt;/li>
&lt;li>&lt;code>Weaviate&lt;/code> has improved the least since our last run.&lt;/li>
&lt;/ul>
&lt;h2 id="how-to-read-the-results">How to read the results&lt;/h2>
&lt;ul>
&lt;li>Choose the dataset and the metric you want to check.&lt;/li>
 &lt;li>Select a precision threshold that would be satisfactory for your use case. This is important because ANN search is all about trading precision for speed. This means in any vector search benchmark, &lt;strong>two results must be compared only when you have similar precision&lt;/strong>. However, most benchmarks miss this critical aspect.&lt;/li>
&lt;li>The table is sorted by the value of the selected metric (RPS / Latency / p95 latency / Index time), and the first entry is always the winner of the category 🏆&lt;/li>
&lt;/ul>
&lt;h3 id="latency-vs-rps">Latency vs RPS&lt;/h3>
&lt;p>In our benchmark we test two main search usage scenarios that arise in practice.&lt;/p></description></item><item><title>Single node benchmarks (2022)</title><link>https://qdrant.tech/benchmarks/single-node-speed-benchmark-2022/</link><pubDate>Tue, 23 Aug 2022 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/benchmarks/single-node-speed-benchmark-2022/</guid><description>&lt;p>This is an archived version of Single node benchmarks. Please refer to the new version &lt;a href="https://qdrant.tech/benchmarks/single-node-speed-benchmark/">here&lt;/a>.&lt;/p></description></item><item><title>ColPali Family Overview</title><link>https://qdrant.tech/course/multi-vector-search/module-2/colpali-family/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/multi-vector-search/module-2/colpali-family/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Module 2 
&lt;/div>

&lt;h1 id="colpali-family-overview">ColPali Family Overview&lt;/h1>
&lt;p>ColPali is not only the name of a single model; it is also often used to refer to an entire family of models, based on Vision Language Models, that convert images and text into multi-vector representations.&lt;/p>
&lt;p>Let&amp;rsquo;s explore what the options are and which model to choose depending on the data you work with.&lt;/p>
&lt;hr>
&lt;div class="video">
&lt;iframe
 src="https://www.youtube-nocookie.com/embed/5ypH0t_X-4k?rel=0"
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;hr>
&lt;p>The ColPali family includes several model variants. When selecting a model for your application, you&amp;rsquo;ll need to consider factors like model size, supported languages, computational requirements, and licensing constraints - each variant offers different trade-offs along these dimensions.&lt;/p></description></item><item><title>Configuration</title><link>https://qdrant.tech/documentation/private-cloud/configuration/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/private-cloud/configuration/</guid><description>&lt;h1 id="private-cloud-configuration">Private Cloud Configuration&lt;/h1>
&lt;p>The Qdrant Private Cloud Helm chart has several configuration options. The following YAML shows all of them with their default values:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="nt">operator&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Amount of replicas for the Qdrant operator (v2)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">replicaCount&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">1&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">image&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Image repository for the qdrant operator&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">repository&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">registry.cloud.qdrant.io/qdrant/operator&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Image pullPolicy&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">pullPolicy&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">IfNotPresent&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Overrides the image tag whose default is the chart appVersion.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">tag&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">imagePullSecrets&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">qdrant-registry-creds&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">nameOverride&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">fullnameOverride&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;operator&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Service account configuration&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">serviceAccount&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">create&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">annotations&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>{}&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">## Additional labels to add to all resources&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">customLabels&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>{}&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> 
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Additional pod annotations&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">podAnnotations&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>{}&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># pod security context&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">podSecurityContext&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">runAsNonRoot&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">runAsUser&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">10001&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">runAsGroup&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">20001&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">fsGroup&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">30001&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># container security context&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">securityContext&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">capabilities&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">drop&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="l">ALL&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">readOnlyRootFilesystem&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">runAsNonRoot&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">runAsUser&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">10001&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">runAsGroup&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">20001&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">allowPrivilegeEscalation&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">false&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">seccompProfile&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">RuntimeDefault&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Configuration for the Qdrant operator service to expose metrics&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">service&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">enabled&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">ClusterIP&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">metricsPort&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">9290&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Configuration for the Qdrant operator service monitor to scrape metrics&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">serviceMonitor&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">enabled&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">false&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Resource requests and limits for the Qdrant operator&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">resources&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>{}&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Node selector for the Qdrant operator&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">nodeSelector&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>{}&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Tolerations for the Qdrant operator&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">tolerations&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">[]&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Affinity configuration for the Qdrant operator&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">affinity&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>{}&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">watch&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># If true, watches only the namespace where the Qdrant operator is deployed, otherwise watches the namespaces in watch.namespaces&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">onlyReleaseNamespace&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># an empty list watches all namespaces.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">namespaces&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">[]&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">limitRBAC&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Configuration for the Qdrant operator (v2)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">settings&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">    &lt;/span>&lt;span class="c"># Does the operator run inside a Kubernetes cluster (kubernetes) or outside (local)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">appEnvironment&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">kubernetes&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The log level for the operator&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Available options: DEBUG | INFO | WARN | ERROR&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">logLevel&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">INFO&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">    &lt;/span>&lt;span class="c"># Metrics contains the operator config related to metrics&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">metrics&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The port used for metrics&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">port&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">9290&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">    &lt;/span>&lt;span class="c"># Health contains the operator config related to the health probe&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">healthz&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The port used for the health probe&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">port&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">8285&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Controller related settings&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">controller&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">      &lt;/span>&lt;span class="c"># The period after which a forced resync is done by the controller (if watches are missed / nothing happened)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">forceResyncPeriod&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">10h&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># QPS indicates the maximum QPS to the master from this client.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is 200&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">qps&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">200&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Maximum burst for throttle.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is 500.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">burst&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">500&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Features contains the settings for enabling / disabling the individual features of the operator&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">features&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># ClusterManagement contains the settings for qdrant (database) cluster management&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">clusterManagement&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Whether or not the Qdrant cluster features are enabled.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># If disabled, all other properties in this struct are disregarded. Otherwise, the individual features will be inspected.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is true.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">enable&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The StorageClass used to make database and snapshot PVCs.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is nil, meaning the default storage class of Kubernetes.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">storageClass&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The StorageClass used to make database PVCs.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is nil, meaning the default storage class of Kubernetes.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">#database:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The StorageClass used to make snapshot PVCs.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is nil, meaning the default storage class of Kubernetes.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">#snapshot:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">#volumeAttributesClass:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># driverName: ebs.csi.aws.com&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># default:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># name: balanced&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># parameters:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># iops: 3000&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># throughput: 125&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># template:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># prefix: qdrant-vac&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># parameters: {}&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Qdrant config contains settings specific for the database&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">qdrant&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">      &lt;/span>&lt;span class="c"># The config describing where to find the image for qdrant&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">image&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The repository where to find the image for qdrant&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is &amp;#34;qdrant/qdrant&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">repository&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">registry.cloud.qdrant.io/qdrant/qdrant&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Docker image pull policy&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default &amp;#34;IfNotPresent&amp;#34;, unless the tag is dev, master or latest. Then &amp;#34;Always&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">#pullPolicy:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Docker image pull secret name&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># This secret should be available in the namespace where the cluster is running&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default not set&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">pullSecretName&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">qdrant-registry-creds&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># storage contains the settings for the storage of the Qdrant cluster&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">storage&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">performance&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># CPU budget, how many CPUs (threads) to allocate for an optimization job.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># If 0 - auto selection, keep 1 or more CPUs unallocated depending on CPU size&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># If negative - subtract this number of CPUs from the available CPUs.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># If positive - use this exact number of CPUs.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">optimizerCpuBudget&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">0&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Enable async scorer which uses io_uring when rescoring.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Only supported on Linux, must be enabled in your kernel.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># See: &amp;lt;https://qdrant.tech/articles/io_uring/#and-what-about-qdrant&amp;gt;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">asyncScorer&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">false&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Qdrant DB log level&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Available options: DEBUG | INFO | WARN | ERROR&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is &amp;#34;INFO&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">logLevel&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">INFO&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default Qdrant security context configuration&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">securityContext&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Enable default security context&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is false&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">enabled&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">false&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default user for qdrant container&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default not set&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">#user: 1000&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default fsGroup for qdrant container&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default not set&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">#fsUser: 2000&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default group for qdrant container&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default not set&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">#group: 3000&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Network policies configuration for the Qdrant databases&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">networkPolicies&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Whether or not NetworkPolicy management is enabled.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># If set to false, no NetworkPolicies will be created.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is true.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">enable&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">ingress&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">ports&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">protocol&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">TCP&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">port&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">6333&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">protocol&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">TCP&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">port&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">6334&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Allow DNS resolution from qdrant pods at Kubernetes internal DNS server&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">egress&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">ports&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">protocol&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">UDP&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">port&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">53&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">    &lt;/span>&lt;span class="c"># The settings for the cloud inference proxy&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">inference&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># inference proxy endpoint&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># when set, the value is passed to Qdrant cluster config as `inference.address` config param&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># if QdrantCluster instance has `.spec.config.inference.enabled` field set&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">address&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">~&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Scheduling config contains the settings specific for scheduling&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">scheduling&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">      &lt;/span>&lt;span class="c"># Default topology spread constraints (list of type corev1.TopologySpreadConstraint)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is an empty list&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">topologySpreadConstraints&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">[]&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">      &lt;/span>&lt;span class="c"># Default pod disruption budget (object of type policyv1.PodDisruptionBudgetSpec)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is not set&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">podDisruptionBudget&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>{}&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># ClusterManager config contains the settings specific for cluster manager&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">clusterManager&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">      &lt;/span>&lt;span class="c"># Whether or not the cluster manager is enabled (on operator level).&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># If disabled, all other properties in this struct are disregarded. Otherwise, the individual features will be inspected.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is false.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">enable&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The endpoint address where the cluster manager can be reached&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">endpointAddress&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;http://qdrant-cluster-manager&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">      &lt;/span>&lt;span class="c"># InvocationInterval is the interval between calls (started after the previous call has returned)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is 10 seconds&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">invocationInterval&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">10s&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">      &lt;/span>&lt;span class="c"># SyncClustersInterval is the interval between sync-clusters calls (started after the previous call has returned)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is 10 seconds&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">syncClustersInterval&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">10s&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Timeout is the duration a single call to the cluster manager is allowed to take.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is 30 seconds&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">timeout&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">30s&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># syncClustersTimeout is the duration a single call to the cluster manager to sync clusters is allowed to take.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is 10 seconds&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">syncClustersTimeout&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">10s&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Specifies overrides for the manage rules&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">manageRulesOverrides&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">#dry_run:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">#max_transfers:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">#max_transfers_per_collection:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">#rebalance:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">#replicate:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Specifies overrides for the manage rules&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">syncClustersRulesOverrides&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">#dry_run:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">#max_downtime_sec:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">#sync_interval_sec:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Ingress config contains the settings specific for ingress&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">ingress&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Whether or not the Ingress feature is enabled.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is true.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">enable&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">false&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Which specific ingress provider should be used&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is KubernetesIngress&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">provider&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">KubernetesIngress&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The specific settings when the Provider is QdrantCloudTraefik&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">qdrantCloudTraefik&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Enable tls&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is false&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">tls&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">false&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Secret with TLS certificate&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is None&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">secretName&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># List of Traefik middlewares to apply&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is an empty list&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">middlewares&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">[]&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># IP Allowlist Strategy for Traefik&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is None&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">ipAllowlistStrategy&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Enable body validator plugin and matching ingressroute rules&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is false&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">enableBodyValidatorPlugin&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">false&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># EntryPoints is the list of traefik entry points to use for the ingress route&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is [&amp;#34;web&amp;#34;] or [&amp;#34;websecure&amp;#34;] depending on the TLS setting&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">entryPoints&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">[]&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The specific settings when the Provider is KubernetesIngress&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">kubernetesIngress&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Name of the ingress class&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is None&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">#ingressClassName:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># TelemetryTimeout is the duration a single call to the cluster telemetry endpoint is allowed to take.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is 3 seconds&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">telemetryTimeout&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">3s&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># MaxConcurrentReconciles is the maximum number of concurrent Reconciles which can be run. Defaults to 20.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">maxConcurrentReconciles&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">20&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># VolumeExpansionMode specifies the expansion mode, which can be online or offline (e.g. in case of Azure).&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Available options: Online, Offline&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is Online&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">volumeExpansionMode&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">Online&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># BackupManagementConfig contains the settings for backup management&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">backupManagement&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Whether or not the backup features are enabled.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># If disabled, all other properties in this struct are disregarded. Otherwise, the individual features will be inspected.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is true.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">enable&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Snapshots contains the settings for snapshots as part of backup management.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">snapshots&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Whether or not the Snapshot feature is enabled.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is true.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">enable&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The VolumeSnapshotClass used to make VolumeSnapshots.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is &amp;#34;csi-snapclass&amp;#34;.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">volumeSnapshotClass&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;csi-snapclass&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The duration a snapshot is retained when the phase becomes Failed or Skipped&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is 72h (3d).&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">retainUnsuccessful&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">72h&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># MaxConcurrentReconciles is the maximum number of concurrent Reconciles which can be run. Defaults to 1.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">maxConcurrentReconciles&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">1&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># ScheduledSnapshots contains the settings for scheduled snapshot as part of backup management.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">scheduledSnapshots&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Whether or not the ScheduledSnapshot feature is enabled.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is true.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">enable&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># MaxConcurrentReconciles is the maximum number of concurrent Reconciles which can be run. Defaults to 1.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">maxConcurrentReconciles&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">1&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Restores contains the settings for restoring (a snapshot) as part of backup management.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">restores&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Whether or not the Restore feature is enabled.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is true.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">enable&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># MaxConcurrentReconciles is the maximum number of concurrent Reconciles which can be run. Defaults to 1.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">maxConcurrentReconciles&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">1&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">qdrant-cluster-manager&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">replicaCount&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">1&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">image&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">repository&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">registry.cloud.qdrant.io/qdrant/cluster-manager&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">pullPolicy&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">IfNotPresent&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Overrides the image tag whose default is the chart appVersion.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">tag&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">imagePullSecrets&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">qdrant-registry-creds&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">nameOverride&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">fullnameOverride&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;qdrant-cluster-manager&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">serviceAccount&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Specifies whether a service account should be created&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">create&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Automatically mount a ServiceAccount&amp;#39;s API credentials?&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">automount&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Annotations to add to the service account&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">annotations&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>{}&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The name of the service account to use.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># If not set and create is true, a name is generated using the fullname template&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">podAnnotations&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>{}&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">podLabels&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>{}&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">podSecurityContext&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">runAsNonRoot&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">runAsUser&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">10001&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">runAsGroup&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">20001&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">fsGroup&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">30001&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">securityContext&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">capabilities&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">drop&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="l">ALL&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">readOnlyRootFilesystem&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">runAsNonRoot&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">runAsUser&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">10001&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">runAsGroup&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">20001&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">allowPrivilegeEscalation&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">false&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">seccompProfile&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">RuntimeDefault&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">service&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">ClusterIP&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">networkPolicy&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">create&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">resources&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>{}&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># We usually recommend not to specify default resources and to leave this as a conscious&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># choice for the user. This also increases chances charts run on environments with little&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># resources, such as Minikube. If you do want to specify resources, uncomment the following&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># lines, adjust them as necessary, and remove the curly braces after &amp;#39;resources:&amp;#39;.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># limits:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># cpu: 100m&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># memory: 128Mi&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># requests:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># cpu: 100m&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># memory: 128Mi&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">nodeSelector&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>{}&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">tolerations&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">[]&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">affinity&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>{}&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">qdrant-cluster-exporter&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">image&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">repository&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">registry.cloud.qdrant.io/qdrant/qdrant-cluster-exporter&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">pullPolicy&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">Always&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Overrides the image tag. Defaults to the chart appVersion.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">tag&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">imagePullSecrets&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">qdrant-registry-creds&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">nameOverride&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">fullnameOverride&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">serviceAccount&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Specifies whether a service account should be created&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">create&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Annotations to add to the service account&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">annotations&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>{}&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The name of the service account to use.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># If not set and create is true, a name is generated using the fullname template&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">rbac&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">create&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">## Additional labels to add to all resources&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">customLabels&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>{}&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> 
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">podAnnotations&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>{}&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">podSecurityContext&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">runAsNonRoot&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">runAsUser&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">65534&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">runAsGroup&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">65534&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">fsGroup&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">65534&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">securityContext&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">readOnlyRootFilesystem&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">runAsNonRoot&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">runAsUser&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">65534&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">runAsGroup&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">65534&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">service&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">enabled&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">ClusterIP&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">port&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">9090&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">portName&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">metrics&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">strategy&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Prevents double-scraping by terminating the old pod before creating a new one&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The pod scrapes a large volume of metrics with high cardinality&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">Recreate&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">resources&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>{}&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># We usually recommend not setting default resources and to leave this as a conscious&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># choice for the user. This allows charts to run on environments with fewer&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># resources, such as Minikube. If you do want to specify resources, uncomment the following&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># lines, adjust them as necessary, and remove the curly braces after &amp;#39;resources:&amp;#39;.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># limits:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># cpu: 100m&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># memory: 128Mi&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># requests:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># cpu: 100m&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># memory: 128Mi&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">nodeSelector&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>{}&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">tolerations&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">[]&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">affinity&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>{}&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">serviceMonitor&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">enabled&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">false&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">honorLabels&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">scrapeInterval&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">60s&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">scrapeTimeout&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">55s&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Limit RBAC to the release namespace&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">limitRBAC&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Watched Namespaces Configuration&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">watch&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># If true, only the namespace where the exporter is deployed is watched, otherwise it watches the namespaces defined in watch.namespaces&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">onlyReleaseNamespace&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># an empty list watches all namespaces&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">namespaces&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">[]&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Configuration for the qdrant cluster exporter&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">config&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The log level for the cluster-exporter&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Available options: DEBUG | INFO | WARN | ERROR&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">logLevel&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">INFO&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Controller related settings&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">controller&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Schedule for the controller to do a forced resync (if watches are missed / nothing happened)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">forceResyncPeriod&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">10h&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Indicates the maximum QPS from this client to the master&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is 200&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">qps&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">200&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Maximum burst for throttle.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is 500.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">burst&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">500&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Maximum number of concurrent reconciliations&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">maxConcurrentReconciles&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">20&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Controller&amp;#39;s object requeueing interval&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">requeueInterval&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">30s&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Exporter Metrics Configuration&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">metrics&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The port on which the metrics are exposed&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">port&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">9090&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The path on which the metrics are exposed&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">path&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">/metrics&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Exporter Health Check Configuration&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">healthz&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The port used for the health probe&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">port&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">8085&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Qdrant Telemetry and Metrics Cache Configuration&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">cache&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The period after which the cache is invalidated&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">ttl&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">60s&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Qdrant Rest Client Configuration&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">qdrant&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">restAPI&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The qdrant rest api port&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">port&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">6333&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Qdrant API Request Timeout after which requests to Qdrant are canceled if not completed&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">timeout&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">20s&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Path where qdrant exposes metrics&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">metricsPath&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;metrics&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Qdrant Telemetry Configuration&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">telemetry&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Path where qdrant exposes telemetry&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">path&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;telemetry&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The level of details for telemetry&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">detailsLevel&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">6&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Whether to anonymize the telemetry data&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">anonymize&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Create a Cluster</title><link>https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-cluster-creation/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-cluster-creation/</guid><description>&lt;h1 id="creating-a-qdrant-cluster-in-hybrid-cloud">Creating a Qdrant Cluster in Hybrid Cloud&lt;/h1>
&lt;p>Once a Hybrid Cloud Environment has been created, you can follow the normal process to &lt;a href="https://qdrant.tech/documentation/cloud/create-cluster/">create a Qdrant cluster&lt;/a> in that environment. That page also contains additional information on how to create a &lt;a href="https://qdrant.tech/documentation/cloud/create-cluster/#creating-a-production-ready-cluster">production-ready cluster&lt;/a>.&lt;/p>
&lt;p>Make sure to select your Hybrid Cloud Environment as the target.&lt;/p>
&lt;p>&lt;img src="https://qdrant.tech/documentation/cloud/hybrid_cloud_create_cluster.png" alt="Create Hybrid Cloud Cluster">&lt;/p>
&lt;p>Note that in the &amp;ldquo;Kubernetes Configuration&amp;rdquo; section you can additionally configure:&lt;/p>
&lt;ul>
&lt;li>NodeSelectors for the Qdrant database pods&lt;/li>
&lt;li>Tolerations for the Qdrant database pods&lt;/li>
&lt;li>TopologySpreadConstraints for the Qdrant database pods&lt;/li>
&lt;li>Additional labels for the Qdrant database pods&lt;/li>
&lt;li>A service type and annotations for the Qdrant database service&lt;/li>
&lt;/ul>
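&lt;p>As an illustrative sketch only, the scheduling options above correspond to standard Kubernetes pod scheduling fields, for example:&lt;/p>
&lt;pre>&lt;code>nodeSelector:
  disktype: ssd
tolerations:
  - key: dedicated
    operator: Equal
    value: qdrant
    effect: NoSchedule
&lt;/code>&lt;/pre>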
&lt;p>These settings can also be changed after the cluster is created on the cluster detail page.&lt;/p></description></item><item><title>Database Optimization</title><link>https://qdrant.tech/documentation/faq/database-optimization/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/faq/database-optimization/</guid><description>&lt;h1 id="frequently-asked-questions-database-optimization">Frequently Asked Questions: Database Optimization&lt;/h1>
&lt;h3 id="how-do-i-reduce-memory-usage">How do I reduce memory usage?&lt;/h3>
&lt;p>The primary source of memory usage is vector data. There are several ways to address that:&lt;/p>
&lt;ul>
&lt;li>Configure &lt;a href="https://qdrant.tech/documentation/manage-data/quantization/">Quantization&lt;/a> to reduce the memory usage of vectors.&lt;/li>
&lt;li>Configure on-disk vector storage.&lt;/li>
&lt;/ul>
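&lt;p>As a sketch, both options can even be combined when creating a collection via the REST API (the vector size and distance here are placeholders):&lt;/p>
&lt;pre>&lt;code>PUT /collections/my_collection
{
  &amp;#34;vectors&amp;#34;: { &amp;#34;size&amp;#34;: 768, &amp;#34;distance&amp;#34;: &amp;#34;Cosine&amp;#34;, &amp;#34;on_disk&amp;#34;: true },
  &amp;#34;quantization_config&amp;#34;: { &amp;#34;scalar&amp;#34;: { &amp;#34;type&amp;#34;: &amp;#34;int8&amp;#34;, &amp;#34;always_ram&amp;#34;: true } }
}
&lt;/code>&lt;/pre>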
&lt;p>The right approach depends on your requirements.
Read more about &lt;a href="https://qdrant.tech/documentation/operations/optimize/">configuring Qdrant for optimal&lt;/a> resource usage.&lt;/p>
&lt;h3 id="how-do-you-choose-the-machine-configuration">How do you choose the machine configuration?&lt;/h3>
&lt;p>There are two main scenarios of Qdrant usage in terms of resource consumption:&lt;/p></description></item><item><title>Final Project: Production-Ready Documentation Search Engine</title><link>https://qdrant.tech/course/essentials/day-6/final-project/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-6/final-project/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 6 
&lt;/div>

&lt;h1 id="final-project-production-ready-documentation-search-engine">Final Project: Production-Ready Documentation Search Engine&lt;/h1>
&lt;div class="video">
&lt;iframe 
 src="https://www.youtube.com/embed/CllIGw1QwLg?si=ruv4y9tk_nQpaDvs"
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;br/>
&lt;h2 id="your-mission">Your Mission&lt;/h2>
&lt;p>It&amp;rsquo;s time to synthesize everything you&amp;rsquo;ve learned into a portfolio-ready application. You&amp;rsquo;ll build a sophisticated documentation search engine that demonstrates hybrid retrieval, multivector reranking, and production-quality evaluation.&lt;/p>
&lt;p>Your search engine will understand both semantic meaning and exact keywords, then use fine-grained reranking to surface the most relevant documentation sections. When someone searches for &amp;ldquo;how to configure HNSW parameters,&amp;rdquo; your system should return the exact section with practical examples, not just a page that mentions &amp;ldquo;HNSW&amp;rdquo; somewhere.&lt;/p></description></item><item><title>HNSW Indexing Fundamentals</title><link>https://qdrant.tech/course/essentials/day-2/what-is-hnsw/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-2/what-is-hnsw/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 2 
&lt;/div>

&lt;h1 id="hnsw-indexing-fundamentals">HNSW Indexing Fundamentals&lt;/h1>
&lt;p>At this point, you&amp;rsquo;ve learned how vector search retrieves the nearest vectors to a query using cosine similarity, dot product, or Euclidean distance. How does this work at scale?&lt;/p>
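As a concrete reference for the distance computation above, here is a minimal numpy sketch (purely illustrative, not Qdrant's internals) that scores a query against every stored vector by cosine similarity and returns the top matches:

```python
import numpy as np

def brute_force_top_k(query, vectors, k):
    """Score the query against every vector - O(N) work per query."""
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = v @ q                       # cosine similarity to all N vectors
    top = np.argsort(scores)[::-1][:k]   # indices of the k best matches
    return top, scores[top]

vectors = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
query = np.array([1.0, 0.1])
top, scores = brute_force_top_k(query, vectors, k=2)
```

This exhaustive scan is the baseline that indexing structures such as HNSW are designed to avoid at scale.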
&lt;div class="video">
&lt;iframe 
 src="https://www.youtube.com/embed/-q-pLgGDYr4?si=Ln0WYymciqPxQKJl"
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;h2 id="why-vector-search-needs-indexing">Why Vector Search Needs Indexing&lt;/h2>
&lt;h3 id="the-vector-search-challenge">The Vector Search Challenge&lt;/h3>
&lt;p>You might wonder if Qdrant calculates the distance to every single vector in your collection for each query. This method, known as brute-force search, technically works, but with millions or billions of vectors it is far too slow to run for every query.&lt;/p></description></item><item><title>Hybrid Search with Reranking</title><link>https://qdrant.tech/documentation/tutorials-search-engineering/reranking-hybrid-search/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-search-engineering/reranking-hybrid-search/</guid><description>&lt;h1 id="qdrant-hybrid-search-with-reranking">Qdrant Hybrid Search with Reranking&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 40 min&lt;/th>
 &lt;th>Level: Intermediate&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>Hybrid search combines dense and sparse retrieval to deliver precise and comprehensive results. By adding reranking with ColBERT, you can further refine search outputs for maximum relevance.&lt;/p>
&lt;p>In this guide, we’ll show you how to implement hybrid search with reranking in Qdrant, leveraging dense, sparse, and late interaction embeddings to create an efficient, high-accuracy search system. Let’s get started!&lt;/p>
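Before diving in, here is the core idea of merging dense and sparse result lists in miniature: a plain-Python sketch of Reciprocal Rank Fusion (RRF), one common fusion method. The document IDs below are made up for illustration, and Qdrant's Query API can perform fusion server-side, so this helper is only a teaching aid:

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per doc."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest combined score first
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["doc_a", "doc_b", "doc_c"]    # from the dense (semantic) retriever
sparse_hits = ["doc_b", "doc_d", "doc_a"]   # from the sparse (keyword) retriever
fused = rrf([dense_hits, sparse_hits])
```

Documents ranked well by both retrievers (here doc_b) rise to the top of the fused list, after which a reranker like ColBERT can refine the order.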
&lt;h2 id="overview">Overview&lt;/h2>
&lt;p>Let’s start by breaking down the architecture:&lt;/p></description></item><item><title>Installing Dependencies</title><link>https://qdrant.tech/course/multi-vector-search/module-0/installing-dependencies/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/multi-vector-search/module-0/installing-dependencies/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Module 0 
&lt;/div>

&lt;h1 id="installing-dependencies">Installing Dependencies&lt;/h1>
&lt;p>To work with multi-vector search in Qdrant, you&amp;rsquo;ll need several Python libraries: Qdrant client for search and FastEmbed for multi-vector embeddings.&lt;/p>
&lt;p>We&amp;rsquo;ll set up a clean Python environment and install everything you need to start experimenting with multi-vector representations.&lt;/p>
&lt;h2 id="python-environment-setup">Python Environment Setup&lt;/h2>
&lt;h3 id="using-uv-recommended">Using uv (Recommended)&lt;/h3>
&lt;p>For this course, we recommend using &lt;a href="https://docs.astral.sh/uv/" target="_blank" rel="noopener nofollow">uv&lt;/a>, a modern Python package manager that&amp;rsquo;s significantly faster and more reliable than traditional pip. It handles virtual environments and dependencies with better performance and dependency resolution.&lt;/p></description></item><item><title>Integrating with Haystack</title><link>https://qdrant.tech/course/essentials/day-7/haystack/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-7/haystack/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 7 
&lt;/div>

&lt;h1 id="integrating-with-haystack">Integrating with Haystack&lt;/h1>
&lt;p>Build end-to-end agentic pipelines with Qdrant.&lt;/p>

 &lt;div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
 &lt;iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/lMinhPZufTc?autoplay=0&amp;amp;controls=1&amp;amp;end=0&amp;amp;loop=0&amp;amp;mute=0&amp;amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video">&lt;/iframe>
 &lt;/div>

&lt;h2 id="what-youll-learn">What You&amp;rsquo;ll Learn&lt;/h2>
&lt;ul>
&lt;li>Haystack pipeline integration&lt;/li>
&lt;li>Document processing workflows&lt;/li>
&lt;li>Question answering systems&lt;/li>
&lt;li>Search and retrieval optimization&lt;/li>
&lt;li>Sparse vector search and metadata filtering&lt;/li>
&lt;li>LLM-based agent development&lt;/li>
&lt;li>Movie recommendation system architecture&lt;/li>
&lt;/ul>
&lt;h2 id="haystack-movie-recommendation-assistant">Haystack Movie Recommendation Assistant&lt;/h2>
&lt;p>Haystack provides a powerful framework for building sophisticated recommendation systems that combine multiple search strategies. The movie recommendation assistant demonstrates how to leverage sparse vector search, metadata filtering, and LLM-based agents to handle complex natural language queries like &amp;ldquo;find me a highly-rated action movie about car racing&amp;rdquo; or &amp;ldquo;recommend five Japanese thrillers.&amp;rdquo;&lt;/p></description></item><item><title>MaxSim Distance Metric</title><link>https://qdrant.tech/course/multi-vector-search/module-1/maxsim-distance/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/multi-vector-search/module-1/maxsim-distance/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Module 1 
&lt;/div>

&lt;h1 id="maxsim-distance-metric">MaxSim Distance Metric&lt;/h1>
&lt;p>MaxSim (Maximum Similarity) is the core distance metric for late interaction models. Unlike traditional vector similarity metrics that operate on pairs of single vectors, MaxSim computes similarity between sequences of vectors.&lt;/p>
&lt;p>Understanding MaxSim is important for working with multi-vector search effectively and understanding its performance characteristics.&lt;/p>
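As a minimal reference, MaxSim for one query-document pair can be sketched in numpy (assuming each row is one token embedding; the vectors below are toy values for illustration):

```python
import numpy as np

def maxsim(query_tokens, doc_tokens):
    """For each query token vector, take its best-matching document token
    vector, then sum those per-token maxima."""
    sims = query_tokens @ doc_tokens.T     # pairwise dot products
    return float(sims.max(axis=1).sum())   # best doc token per query token

query = np.array([[1.0, 0.0], [0.0, 1.0]])   # 2 query token vectors
doc = np.array([[1.0, 0.0], [0.6, 0.8]])     # 2 document token vectors
score = maxsim(query, doc)                   # 1.0 + 0.8
```

Note the cost: the similarity matrix grows with the product of the two sequence lengths, which is why multi-vector search is typically used for reranking rather than first-stage retrieval.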
&lt;hr>
&lt;div class="video">
&lt;iframe
 src="https://www.youtube-nocookie.com/embed/JvSvuK19m8A?rel=0"
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;hr>
&lt;p>&lt;strong>Follow along in Colab:&lt;/strong> &lt;a href="https://colab.research.google.com/github/qdrant/examples/blob/master/course-multi-vector-search/module-1/maxsim-distance.ipynb">
&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" style="display:inline; margin:0;" alt="Open In Colab"/>
&lt;/a>&lt;/p>
&lt;hr>
&lt;h2 id="the-maxsim-formula">The MaxSim Formula&lt;/h2>
&lt;p>In late interaction, we represent documents and queries as sequences of token vectors. But how do we measure similarity between two sets of vectors?&lt;/p></description></item><item><title>Multivectors and Late Interaction</title><link>https://qdrant.tech/documentation/tutorials-search-engineering/using-multivector-representations/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-search-engineering/using-multivector-representations/</guid><description>&lt;h1 id="multivector-representations-for-reranking-in-qdrant">Multivector Representations for Reranking in Qdrant&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 30 min&lt;/th>
 &lt;th>Level: Intermediate&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>Multivector Representations are one of the most powerful features of Qdrant. However, most people don&amp;rsquo;t use them effectively, resulting in massive RAM overhead, slow inserts, and wasted compute.&lt;/p>
&lt;p>In this tutorial, you&amp;rsquo;ll discover how to effectively use multivector representations in Qdrant.&lt;/p>
&lt;h2 id="what-are-multivector-representations">What are Multivector Representations?&lt;/h2>
&lt;p>In most vector engines, each document is represented by a single vector - an approach that works well for short texts but often struggles with longer documents. Single-vector representations pool the token-level embeddings, which inevitably loses some information.&lt;/p></description></item><item><title>Multivectors for Late Interaction Models</title><link>https://qdrant.tech/course/essentials/day-5/colbert-multivectors/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-5/colbert-multivectors/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 5 
&lt;/div>

&lt;h1 id="multivectors-for-late-interaction-models">Multivectors for Late Interaction Models&lt;/h1>
&lt;div class="video">
&lt;iframe 
 src="https://www.youtube.com/embed/8ptlXSsSEPk?si=TzsWlastazBQPWWb"
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;br/>
&lt;p>Many embedding models represent data as a single vector. Transformer-based encoders achieve this by pooling the per-token vector matrix from the final layer into a single vector. That works great for most cases. But when your documents get more complex, cover multiple topics, or require context sensitivity, that one-size-fits-all compression starts to break down. You lose granularity and semantic alignment (though chunking and learned pooling mitigate this to an extent).&lt;/p></description></item><item><title>Points, Vectors and Payloads</title><link>https://qdrant.tech/course/essentials/day-1/embedding-models/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-1/embedding-models/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 1 
&lt;/div>

&lt;h1 id="points-vectors-and-payloads">Points, Vectors and Payloads&lt;/h1>
&lt;div class="video">
&lt;iframe 
 src="https://www.youtube.com/embed/Q6ZalzJ8dv8?si=TtxNB0PduStsOVGl"
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;br/>
&lt;p>Understanding Qdrant&amp;rsquo;s core data model is essential for building effective vector search applications. This lesson establishes the precise technical vocabulary and concepts you&amp;rsquo;ll use throughout the course.&lt;/p>
&lt;h2 id="points-the-core-entity">Points: The Core Entity&lt;/h2>
&lt;p>Points are the central entity that Qdrant operates with. A point is a record consisting of three components:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Unique ID&lt;/strong> (64-bit unsigned integer or UUID)&lt;/li>
&lt;li>&lt;strong>Vector&lt;/strong> (dense, sparse, or multivector)&lt;/li>
&lt;li>&lt;strong>Optional Payload&lt;/strong> (metadata)&lt;/li>
&lt;/ul>
&lt;p>&lt;img src="https://qdrant.tech/courses/day1/point-2.png" alt="Creating an embedding">&lt;/p></description></item><item><title>Qdrant Setup</title><link>https://qdrant.tech/course/essentials/day-0/qdrant-cloud/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-0/qdrant-cloud/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 0 
&lt;/div>

&lt;h1 id="qdrant-setup">Qdrant Setup&lt;/h1>
&lt;div class="video">
&lt;iframe 
 src="https://www.youtube.com/embed/9JBlgNBQoOY?si=7t3LAvMsUUtlUMN7&amp;rel=0"
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;br/>
&lt;p>Spin up production-grade vector search in minutes. Qdrant Cloud gives you a managed endpoint with TLS, automatic backups, high-availability options, and a clean API.&lt;/p>
&lt;h2 id="create-your-cluster">Create your cluster&lt;/h2>
&lt;ol>
&lt;li>Sign up at &lt;a href="https://cloud.qdrant.io/signup" target="_blank" rel="noopener nofollow">cloud.qdrant.io&lt;/a> with email, Google, or GitHub.&lt;/li>
&lt;li>Open &lt;strong>Clusters&lt;/strong> → &lt;strong>Create a Free Cluster&lt;/strong>. The Free Tier is enough for this course.&lt;/li>
&lt;/ol>
&lt;p>&lt;img src="https://qdrant.tech/docs/gettingstarted/gui-quickstart/create-cluster.png" alt="Create cluster">&lt;/p>
&lt;ol start="3">
&lt;li>Pick a region close to your users or app.&lt;/li>
&lt;li>When the cluster is ready, copy the API key and store it securely. You can make new keys later from &lt;strong>API Keys&lt;/strong> on the cluster page.&lt;/li>
&lt;/ol>
&lt;p>&lt;img src="https://qdrant.tech/docs/gettingstarted/gui-quickstart/api-key.png" alt="Get API key">&lt;/p></description></item><item><title>Relevance Feedback Retrieval in Qdrant</title><link>https://qdrant.tech/documentation/tutorials-search-engineering/using-relevance-feedback/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-search-engineering/using-relevance-feedback/</guid><description>&lt;h1 id="relevance-feedback-in-qdrant">Relevance Feedback in Qdrant&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 30 min&lt;/th>
 &lt;th>Level: Intermediate&lt;/th>
 &lt;th>Output: &lt;a href="https://github.com/qdrant/examples/blob/master/using-relevance-feedback/Customizing_Relevance_Feedback.ipynb" target="_blank" rel="noopener nofollow">GitHub&lt;/a>&lt;/th>
 &lt;th>&lt;a href="https://githubtocolab.com/qdrant/examples/blob/master/using-relevance-feedback/Customizing_Relevance_Feedback.ipynb" target="_blank" rel="noopener nofollow">&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab">&lt;/a>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>In Qdrant 1.17, we introduced a new &lt;a href="https://qdrant.tech/documentation/search/search-relevance/#relevance-feedback">Relevance Feedback Query&lt;/a>, our first-ever scalable, vector-index-native approach to &lt;a href="https://qdrant.tech/articles/search-feedback-loop/">incorporating relevance feedback&lt;/a> in retrieval.&lt;/p>
&lt;p>In this tutorial, you&amp;rsquo;ll see how to:&lt;/p>
&lt;ol>
&lt;li>Customize Relevance Feedback Query for your Qdrant collection, retriever and feedback model.&lt;/li>
&lt;li>Add customized Relevance Feedback Query to your search pipeline.&lt;/li>
&lt;li>Evaluate the gains it brings to this pipeline.&lt;/li>
&lt;/ol>
&lt;h2 id="relevance-feedback">Relevance Feedback&lt;/h2>
&lt;blockquote>
&lt;p>Relevance feedback distills signals about the relevance of current search results into the next retrieval iteration, surfacing better results over time.&lt;/p></description></item><item><title>Semantic Search Basics</title><link>https://qdrant.tech/documentation/tutorials-search-engineering/neural-search/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-search-engineering/neural-search/</guid><description>&lt;h1 id="semantic-search-basics-with-qdrant">Semantic Search Basics with Qdrant&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 30 min&lt;/th>
 &lt;th>Level: Beginner&lt;/th>
 &lt;th>Output: &lt;a href="https://github.com/qdrant/qdrant_demo/tree/sentense-transformers" target="_blank" rel="noopener nofollow">GitHub&lt;/a>&lt;/th>
 &lt;th>&lt;a href="https://colab.research.google.com/drive/1kPktoudAP8Tu8n8l-iVMOQhVmHkWV_L9?usp=sharing" target="_blank" rel="noopener nofollow">&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab">&lt;/a>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>This tutorial shows you how to build and deploy your own neural search service to look through descriptions of companies from &lt;a href="https://www.startups-list.com/" target="_blank" rel="noopener nofollow">startups-list.com&lt;/a> and pick the most similar ones to your query. The website contains the company names, descriptions, locations, and a picture for each entry.&lt;/p>
&lt;p>A neural search service uses artificial neural networks to improve the accuracy and relevance of search results. Besides offering simple keyword results, this system can retrieve results by meaning. It can understand and interpret complex search queries and provide more contextually relevant output, effectively enhancing the user&amp;rsquo;s search experience.&lt;/p></description></item><item><title>Semantic Search for Code</title><link>https://qdrant.tech/documentation/tutorials-search-engineering/code-search/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-search-engineering/code-search/</guid><description>&lt;h1 id="semantic-search-for-code-with-qdrant">Semantic Search for Code with Qdrant&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 45 min&lt;/th>
 &lt;th>Level: Intermediate&lt;/th>
 &lt;th>&lt;a href="https://colab.research.google.com/github/qdrant/examples/blob/master/code-search/code-search.ipynb" target="_blank" rel="noopener nofollow">&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab">&lt;/a>&lt;/th>
 &lt;th>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>You too can enrich your applications with Qdrant semantic search. In this tutorial, we describe how to use Qdrant to navigate a codebase and find relevant code snippets. As an example, we will use the &lt;a href="https://github.com/qdrant/qdrant" target="_blank" rel="noopener nofollow">Qdrant&lt;/a> source code itself, which is mostly written in Rust.&lt;/p>
&lt;aside role="status">This approach may not work well on codebases that are poorly structured or inconsistently organized. For good code search results, you may need to refactor the project first.&lt;/aside>
&lt;h2 id="the-approach">The approach&lt;/h2>
&lt;p>We want to search codebases with natural-language semantic queries, and to find code based on similar logic. You can set up both tasks with embeddings:&lt;/p></description></item><item><title>Sparse Vectors and Inverted Indexes</title><link>https://qdrant.tech/course/essentials/day-3/sparse-vectors/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-3/sparse-vectors/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 3 
&lt;/div>

&lt;h1 id="sparse-vectors-and-inverted-indexes">Sparse Vectors and Inverted Indexes&lt;/h1>
&lt;p>Create and index &lt;a href="https://qdrant.tech/documentation/manage-data/vectors/#sparse-vectors">sparse vector&lt;/a> representations for keywords-based search and recommendations.&lt;/p>
&lt;div class="video">
&lt;iframe
 src="https://www.youtube.com/embed/_v7ntnqsqY4" 
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;br/>
&lt;h2 id="what-youll-learn">What You&amp;rsquo;ll Learn&lt;/h2>
&lt;ul>
&lt;li>Understanding sparse vector representations&lt;/li>
&lt;li>Using sparse vectors in Qdrant&lt;/li>
&lt;/ul>
&lt;h2 id="sparse-vector-representations">Sparse Vector Representations&lt;/h2>
&lt;p>Sparse vectors are high-dimensional vectors that are mostly zeros, with only a few non-zero dimensions. Each dimension of a sparse vector refers to a specific object (such as a token), and its value reflects the weight of that object in the representation.&lt;/p></description></item><item><title>User Management</title><link>https://qdrant.tech/documentation/cloud-rbac/user-management/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/cloud-rbac/user-management/</guid><description>&lt;h1 id="user-management">User Management&lt;/h1>
&lt;blockquote>
&lt;p>💡 You can access this in &lt;strong>Access Management &amp;gt; User &amp;amp; Role Management&lt;/strong>, &lt;em>if available; see &lt;a href="https://qdrant.tech/documentation/cloud-rbac/">this page&lt;/a> for details&lt;/em>.&lt;/p>
&lt;/blockquote>
&lt;h2 id="inviting-users-to-an-account">Inviting Users to an Account&lt;/h2>
&lt;p>Account users can be managed via the &lt;strong>User Management&lt;/strong> section. Start by selecting a role from the dropdown, then type the name of the user you wish to manage. For users who are not in your account, you will have the option to invite them. For users already in your account, you can add them to the role or see whether the role has already been assigned to them.&lt;/p></description></item><item><title>Vector Quantization Methods</title><link>https://qdrant.tech/course/essentials/day-4/what-is-quantization/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-4/what-is-quantization/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 4 
&lt;/div>

&lt;h1 id="vector-quantization-methods">Vector Quantization Methods&lt;/h1>
&lt;div class="video">
&lt;iframe 
 src="https://www.youtube.com/embed/oExGyAEOpP4"
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;p>Production vector search engines face an inevitable scaling challenge: memory requirements grow with dataset size, while search latency demands vectors remain in fast storage. &lt;a href="https://qdrant.tech/documentation/manage-data/quantization/">Quantization&lt;/a> provides the solution by compressing vector representations while maintaining retrieval quality - but the method you choose fundamentally determines your system&amp;rsquo;s performance characteristics.&lt;/p>
&lt;h2 id="the-memory-economics">The Memory Economics&lt;/h2>
&lt;p>Consider the mathematics of scale. OpenAI&amp;rsquo;s &lt;code>text-embedding-3-small&lt;/code> produces 1536-dimensional vectors requiring 6 KB each (1536 × 4 bytes per float32). This scales predictably: 1 million vectors consume 6 GB, 10 million require 60 GB, and 100 million demand 600 GB of memory.&lt;/p></description></item><item><title>Vector Quantization Techniques</title><link>https://qdrant.tech/course/multi-vector-search/module-3/quantization-techniques/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/multi-vector-search/module-3/quantization-techniques/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Module 3 
&lt;/div>

&lt;h1 id="vector-quantization-techniques">Vector Quantization Techniques&lt;/h1>
&lt;p>Vector quantization compresses vectors by reducing the precision of each component. Qdrant supports several quantization methods that can reduce memory usage by 4-64x, sometimes with minimal quality loss.&lt;/p>
&lt;p>Choosing the right quantization method depends on your quality requirements and memory constraints.&lt;/p>
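To see where such compression ratios come from, here is a back-of-envelope calculation for one multi-vector document (the 128-token, 128-dimension sizes are illustrative, not tied to a specific model): int8 scalar quantization shrinks float32 storage 4x, and 1-bit binary quantization shrinks it 32x.

```python
# Illustrative sizes for one multi-vector document: 128 token vectors x 128 dims.
tokens, dim = 128, 128
float32_bytes = tokens * dim * 4    # 4 bytes per float32 component
int8_bytes = tokens * dim * 1       # scalar quantization to int8: 4x smaller
binary_bytes = tokens * dim // 8    # binary quantization, 1 bit per component: 32x smaller
print(float32_bytes, int8_bytes, binary_bytes)
```

Multiply these per-document figures by millions of documents and the choice of quantization method dominates the memory budget.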
&lt;hr>
&lt;div class="video">
&lt;iframe
 src="https://www.youtube-nocookie.com/embed/we-AEfiXaow?rel=0"
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;hr>
&lt;p>&lt;strong>Follow along in Colab:&lt;/strong> &lt;a href="https://colab.research.google.com/github/qdrant/examples/blob/master/course-multi-vector-search/module-3/quantization-techniques.ipynb">
&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" style="display:inline; margin:0;" alt="Open In Colab"/>
&lt;/a>&lt;/p>
&lt;hr>
&lt;h2 id="the-memory-challenge-with-multi-vector-models">The Memory Challenge with Multi-Vector Models&lt;/h2>
&lt;p>By default, embedding models produce vectors with &lt;strong>float32 precision&lt;/strong> - each component uses 32 bits (4 bytes) of memory. For single-vector embeddings, this is manageable. But multi-vector models like &lt;strong>ColModernVBERT&lt;/strong> change the equation dramatically.&lt;/p></description></item><item><title>Qdrant under the hood: io_uring</title><link>https://qdrant.tech/articles/io_uring/</link><pubDate>Wed, 21 Jun 2023 09:45:00 +0200</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/io_uring/</guid><description>&lt;p>With Qdrant &lt;a href="https://github.com/qdrant/qdrant/releases/tag/v1.3.0" target="_blank" rel="noopener nofollow">version 1.3.0&lt;/a> we
introduce the alternative io_uring based &lt;em>async uring&lt;/em> storage backend on
Linux-based systems. Since its introduction, io_uring has been known to improve
async throughput wherever the OS syscall overhead gets too high, which tends to
occur in situations where software becomes &lt;em>IO bound&lt;/em> (that is, mostly waiting
on disk).&lt;/p>
&lt;h2 id="inputoutput">Input+Output&lt;/h2>
&lt;p>Around the mid-90s, the internet took off. The first servers used a process-per-request setup, which was good for serving hundreds, if not thousands, of concurrent requests. POSIX Input/Output (IO) was modeled in a strictly synchronous way. The overhead of starting a new process for each request made this model unsustainable, so servers started forgoing process separation, opting for the thread-per-request model. But even that ran into limitations.&lt;/p></description></item><item><title>Filtered search benchmark</title><link>https://qdrant.tech/benchmarks/filtered-search-intro/</link><pubDate>Mon, 13 Feb 2023 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/benchmarks/filtered-search-intro/</guid><description>&lt;h1 id="filtered-search-benchmark">Filtered search benchmark&lt;/h1>
&lt;p>Applying filters to search results brings a whole new level of complexity.
It is no longer enough to apply one algorithm to plain data. With filtering, it becomes a matter of the &lt;em>cross-integration&lt;/em> of the different indices.&lt;/p>
&lt;p>To measure how well different search engines perform in this scenario, we have prepared a set of &lt;strong>Filtered ANN Benchmark Datasets&lt;/strong> -
&lt;a href="https://github.com/qdrant/ann-filtering-benchmark-datasets" target="_blank" rel="noopener nofollow">https://github.com/qdrant/ann-filtering-benchmark-datasets&lt;/a>&lt;/p>
&lt;p>It is similar to the ones used in the &lt;a href="https://github.com/erikbern/ann-benchmarks/" target="_blank" rel="noopener nofollow">ann-benchmarks project&lt;/a> but enriched with payload metadata and pre-generated filtering requests. It includes synthetic and real-world datasets with various filters, from keywords to geo-spatial queries.&lt;/p></description></item><item><title>Accuracy Recovery with Rescoring</title><link>https://qdrant.tech/course/essentials/day-4/rescoring-oversampling-indexing/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-4/rescoring-oversampling-indexing/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 4 
&lt;/div>

&lt;h1 id="accuracy-recovery-with-rescoring">Accuracy Recovery with Rescoring&lt;/h1>
&lt;div class="video">
&lt;iframe 
 src="https://www.youtube.com/embed/ksw3Ok-XXqo"
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;br/>
&lt;p>When we use quantization methods like Scalar, Binary, or Product Quantization, we compress our vectors to save memory and improve performance. However, this compression can slightly reduce the accuracy of similarity searches, because the quantized vectors are approximations of the original data. To mitigate this loss of accuracy, you can use oversampling and rescoring, which improve the accuracy of the final search results.&lt;/p>
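The two-step idea can be sketched in plain numpy. This is a teaching illustration only (sign-based "binary quantization" on synthetic data, not Qdrant's internals): fetch extra candidates with cheap quantized scores, then rescore just those candidates with the original float32 vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
docs = rng.normal(size=(1000, 64)).astype(np.float32)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
query = docs[42] + 0.05 * rng.normal(size=64)   # a query close to document 42
query /= np.linalg.norm(query)

# Pretend-quantized vectors: keep only the sign of every component.
docs_bq, query_bq = np.sign(docs), np.sign(query)

k, oversampling = 5, 3.0
# Step 1: cheap scoring on quantized vectors, fetching k * oversampling candidates.
coarse = docs_bq @ query_bq
candidates = np.argsort(coarse)[::-1][: int(k * oversampling)]
# Step 2: rescore only those candidates with the original float32 vectors.
fine = docs[candidates] @ query
top = candidates[np.argsort(fine)[::-1][:k]]
```

Only the small candidate set ever touches the full-precision vectors, which is what keeps rescoring cheap while recovering most of the lost accuracy.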
&lt;p>So let&amp;rsquo;s say we are performing a search in a collection with Binary Quantization. Qdrant retrieves the top candidates using the quantized vectors based on their similarity to the query vector, as determined by the quantized data. This step is fast because we&amp;rsquo;re using the quantized vectors.&lt;/p></description></item><item><title>Collaborative Filtering</title><link>https://qdrant.tech/documentation/tutorials-search-engineering/collaborative-filtering/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-search-engineering/collaborative-filtering/</guid><description>&lt;h1 id="build-a-recommendation-system-with-collaborative-filtering-using-qdrant">Build a Recommendation System with Collaborative Filtering using Qdrant&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 45 min&lt;/th>
 &lt;th>Level: Intermediate&lt;/th>
 &lt;th>&lt;a href="https://githubtocolab.com/qdrant/examples/blob/master/collaborative-filtering/collaborative-filtering.ipynb" target="_blank" rel="noopener nofollow">&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab">&lt;/a>&lt;/th>
 &lt;th>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>Every time Spotify recommends the next song from a band you&amp;rsquo;ve never heard of, it uses a recommendation algorithm based on other users&amp;rsquo; interactions with that song. This type of algorithm is known as &lt;strong>collaborative filtering&lt;/strong>.&lt;/p>
&lt;p>Unlike content-based recommendations, collaborative filtering excels when the objects&amp;rsquo; semantics are only loosely related, or entirely unrelated, to users&amp;rsquo; preferences. This adaptability is what makes it so fascinating. Movie, music, or book recommendations are good examples of such use cases. After all, we rarely choose which book to read purely based on the plot twists.&lt;/p></description></item><item><title>Combining Vector Search and Filtering</title><link>https://qdrant.tech/course/essentials/day-2/filterable-hnsw/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-2/filterable-hnsw/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 2 
&lt;/div>

&lt;h1 id="combining-vector-search-and-filtering">Combining Vector Search and Filtering&lt;/h1>
&lt;p>We&amp;rsquo;ve talked about how Qdrant uses the &lt;a href="https://qdrant.tech/documentation/manage-data/indexing/#filterable-index">HNSW&lt;/a> graph to efficiently search dense vectors. But in real-world applications, you&amp;rsquo;ll often want to constrain your search using filters. This creates unique challenges for graph traversal that Qdrant solves elegantly.&lt;/p>
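For intuition, here is a toy plain-Python sketch of the naive baseline that a filterable index improves on: apply the payload filter first, then score only the surviving points with a full scan. The helper names (`filtered_search`, the toy payloads) are hypothetical, and this is not Qdrant's implementation.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def filtered_search(query, points, predicate, top_k=2):
    # Naive pre-filtering: keep only points whose payload passes the filter,
    # then rank the survivors by similarity with a full scan.
    survivors = [(pid, vec) for pid, vec, payload in points if predicate(payload)]
    survivors.sort(key=lambda item: -cosine(query, item[1]))
    return [pid for pid, _ in survivors[:top_k]]

# Toy "collection": (id, vector, payload)
points = [
    ("laptop-1", [1.0, 0.1], {"category": "laptop", "price": 900}),
    ("laptop-2", [0.9, 0.2], {"category": "laptop", "price": 1500}),
    ("phone-1", [0.1, 1.0], {"category": "phone", "price": 500}),
]
hits = filtered_search([1.0, 0.0], points,
                       lambda p: p["category"] == "laptop" and p["price"] < 1000)
```

The full scan over survivors is exactly what Qdrant's filterable HNSW avoids, while also keeping the graph traversable under restrictive filters.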
&lt;div class="video">
&lt;iframe 
 src="https://www.youtube.com/embed/VJVHU47IAik?si=nkaHShqrlaO4fj7O"
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;h2 id="the-challenge-filters-break-graph-connectivity">The Challenge: Filters Break Graph Connectivity&lt;/h2>
&lt;p>Consider retrieving items from an online store collection where you only want to show laptops priced under $1,000. That price information, along with the category &amp;lsquo;laptop&amp;rsquo;, isn&amp;rsquo;t part of the vector - it lives in the &lt;a href="https://qdrant.tech/documentation/manage-data/payload/">payload&lt;/a>.&lt;/p></description></item><item><title>Configure, Scale &amp; Update Clusters</title><link>https://qdrant.tech/documentation/hybrid-cloud/configure-scale-upgrade/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/hybrid-cloud/configure-scale-upgrade/</guid><description>&lt;h1 id="configure-scale--update-qdrant-hybrid-cloud-clusters">Configure, Scale &amp;amp; Update Qdrant Hybrid Cloud Clusters&lt;/h1>
&lt;h2 id="configure-clusters">Configure Clusters&lt;/h2>
&lt;p>Alongside Hybrid Cloud specific scheduling options, you can also adjust various other advanced configuration options for your clusters. See &lt;a href="https://qdrant.tech/documentation/cloud/configure-cluster/">Configure Clusters&lt;/a> for more details.&lt;/p>
&lt;h2 id="scale-clusters">Scale Clusters&lt;/h2>
&lt;p>Hybrid cloud clusters can be scaled up and down, horizontally and vertically, at any time. For more details see &lt;a href="https://qdrant.tech/documentation/cloud/cluster-scaling/">Scale Clusters&lt;/a>.&lt;/p>
&lt;h3 id="automatic-shard-rebalancing">Automatic Shard Rebalancing&lt;/h3>
&lt;p>Qdrant Cloud supports automatic shard rebalancing when scaling your cluster horizontally. This ensures that data is evenly distributed across the nodes, optimizing performance and resource utilization. For more details see &lt;a href="https://qdrant.tech/documentation/cloud/configure-cluster/#shard-rebalancing">Shard Rebalancing&lt;/a>.&lt;/p></description></item><item><title>Course Completion and Next Steps</title><link>https://qdrant.tech/course/essentials/day-6/congratulations/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-6/congratulations/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 6 
&lt;/div>

&lt;h1 id="course-completion-and-next-steps">Course Completion and Next Steps&lt;/h1>
&lt;h2 id="congratulations-youve-mastered-vector-search">Congratulations! You&amp;rsquo;ve Mastered Vector Search.&lt;/h2>
&lt;p>You&amp;rsquo;ve built and shipped a complete vector search application and gained the expertise to run Qdrant in production. This achievement represents mastery of modern retrieval systems and positions you at the forefront of AI-powered search technology.&lt;/p>
&lt;h2 id="your-learning-journey">Your Learning Journey&lt;/h2>
&lt;p>You&amp;rsquo;ve progressed from vector search fundamentals to production-ready expertise:&lt;/p>
&lt;p>&lt;strong>Foundation Building&lt;/strong> (Days 0-2): You mastered the core concepts of vector search, learned how similarity metrics work, and understood how HNSW indexing enables fast retrieval at scale.&lt;/p></description></item><item><title>Demo: Keyword Search with Sparse Vectors</title><link>https://qdrant.tech/course/essentials/day-3/sparse-retrieval-demo/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-3/sparse-retrieval-demo/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 3 
&lt;/div>

&lt;h1 id="demo-keyword-search-with-sparse-vectors">Demo: Keyword Search with Sparse Vectors&lt;/h1>
&lt;p>Use sparse vectors for keyword-based text retrieval.&lt;/p>
&lt;div class="video">
&lt;iframe
 src="https://www.youtube.com/embed/lp8rLJdqUg8"
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;br/>
&lt;h2 id="what-youll-learn">What You&amp;rsquo;ll Learn&lt;/h2>
&lt;ul>
&lt;li>Connection between Sparse Vectors &amp;amp; keyword-based retrieval&lt;/li>
&lt;li>Using BM25 in Qdrant&lt;/li>
&lt;li>Sparse Neural Retrieval&lt;/li>
&lt;li>Using SPLADE++ in Qdrant&lt;/li>
&lt;/ul>
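Before diving into the demo, here is a toy sketch of the core idea. The helper names are hypothetical, and plain term-frequency weights stand in for the learned or statistical weights that BM25 and SPLADE++ would produce.

```python
from collections import Counter

def encode_sparse(text, vocab):
    # One dimension per vocabulary word; the value is the term frequency.
    counts = Counter(text.lower().split())
    indices, values = [], []
    for word, tf in counts.items():
        if word in vocab:
            indices.append(vocab[word])
            values.append(float(tf))
    return indices, values

def sparse_dot(a, b):
    # Only dimensions present in both vectors contribute to the score
    b_map = dict(zip(*b))
    return sum(v * b_map.get(i, 0.0) for i, v in zip(*a))

vocab = {"vector": 0, "search": 1, "sparse": 2}
doc = encode_sparse("sparse sparse search", vocab)
query = encode_sparse("sparse vector", vocab)
score = sparse_dot(query, doc)
```

Real sparse models replace the term-frequency values with more informative weights, but the representation - (indices, values) pairs over a word-sized dimension space - is the same one Qdrant stores.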
&lt;h2 id="text-encoding">Text Encoding&lt;/h2>
&lt;p>In sparse vectors, each non‑zero dimension represents an object that plays a specific role for the item being represented. When we work with text, the natural choice for these objects is words.&lt;/p></description></item><item><title>Distance Metrics</title><link>https://qdrant.tech/course/essentials/day-1/distance-metrics/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-1/distance-metrics/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 1 
&lt;/div>

&lt;h1 id="distance-metrics">Distance Metrics&lt;/h1>
&lt;p>After vectors are stored, we can use their spatial properties to perform &lt;a href="https://qdrant.tech/documentation/search/search/">nearest neighbor searches&lt;/a> that retrieve semantically similar items based on how close they are in this space.&lt;/p>
&lt;p>The position of a vector in embedding space only reflects meaning as far as the embedding model has learned to encode it. The model and its training objective tell you what &amp;ldquo;close&amp;rdquo; means.&lt;/p>
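Qdrant computes these metrics internally, but writing them out in plain Python makes the definitions concrete (an illustrative sketch, not Qdrant's implementation):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    # Cosine ignores magnitude: only the angle between the vectors matters
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def normalize(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]

a, b = [3.0, 4.0], [4.0, 3.0]
# On unit-length vectors, dot product and cosine similarity coincide
same = abs(dot(normalize(a), normalize(b)) - cosine(a, b)) < 1e-12
```

That last line is why normalizing embeddings up front lets an engine use the cheaper dot product while returning cosine-equivalent rankings.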
&lt;div class="video">
&lt;iframe 
 src="https://www.youtube.com/embed/P_vsVPsCgTQ?si=dvMvCSnT3M1MfRQT"
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;br/>
&lt;h2 id="quick-rule-of-thumb">Quick rule of thumb&lt;/h2>
&lt;p>Most users do &lt;strong>not&lt;/strong> need to design a distance metric from scratch:&lt;/p></description></item><item><title>Hugging Face Dataset Ingestion</title><link>https://qdrant.tech/documentation/tutorials-basics/huggingface-datasets/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-basics/huggingface-datasets/</guid><description>&lt;h1 id="load-hugging-face-datasets-into-qdrant">Load Hugging Face Datasets into Qdrant&lt;/h1>
&lt;p>&lt;a href="https://huggingface.co/" target="_blank" rel="noopener nofollow">Hugging Face&lt;/a> provides a platform for sharing and using ML models and
datasets. &lt;a href="https://huggingface.co/Qdrant" target="_blank" rel="noopener nofollow">Qdrant&lt;/a> also publishes datasets along with the
embeddings that you can use to practice with Qdrant and build your applications based on semantic
search. &lt;strong>Please &lt;a href="https://qdrant.to/discord" target="_blank" rel="noopener nofollow">let us know&lt;/a> if you&amp;rsquo;d like to see a specific dataset!&lt;/strong>&lt;/p>
&lt;h2 id="arxiv-titles-instructorxl-embeddings">arxiv-titles-instructorxl-embeddings&lt;/h2>
&lt;p>&lt;a href="https://huggingface.co/datasets/Qdrant/arxiv-titles-instructorxl-embeddings" target="_blank" rel="noopener nofollow">This dataset&lt;/a> contains
embeddings generated from the paper titles only. Each vector has a payload with the title used to
create it, along with the DOI (Digital Object Identifier).&lt;/p></description></item><item><title>Hybrid Search with FastEmbed</title><link>https://qdrant.tech/documentation/tutorials-search-engineering/hybrid-search-fastembed/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-search-engineering/hybrid-search-fastembed/</guid><description>&lt;h1 id="hybrid-search-with-qdrants-fastembed">Hybrid Search with Qdrant&amp;rsquo;s FastEmbed&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 20 min&lt;/th>
 &lt;th>Level: Beginner&lt;/th>
 &lt;th>Output: &lt;a href="https://github.com/qdrant/qdrant_demo/" target="_blank" rel="noopener nofollow">GitHub&lt;/a>&lt;/th>
 &lt;th>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>This tutorial shows you how to build and deploy your own hybrid search service to look through descriptions of companies from &lt;a href="https://www.startups-list.com/" target="_blank" rel="noopener nofollow">startups-list.com&lt;/a> and pick the most similar ones to your query.
The website contains the company names, descriptions, locations, and a picture for each entry.&lt;/p>
&lt;p>As we have already written on our &lt;a href="https://qdrant.tech/articles/hybrid-search/">blog&lt;/a>, there is no single definition of hybrid search.
In this tutorial we are covering the case with a combination of dense and &lt;a href="https://qdrant.tech/articles/sparse-vectors/">sparse embeddings&lt;/a>.
The former are embeddings produced by well-known neural networks such as BERT, while the latter are closer to a traditional full-text search approach.&lt;/p></description></item><item><title>Implementing a Basic Vector Search</title><link>https://qdrant.tech/course/essentials/day-0/building-simple-vector-search/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-0/building-simple-vector-search/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 0 
&lt;/div>

&lt;h1 id="implementing-a-basic-vector-search">Implementing a Basic Vector Search&lt;/h1>
&lt;div class="video">
&lt;iframe 
 src="https://www.youtube.com/embed/_83L9ZIoOjM?si=ZTpn6fMXSjc_7JgL"
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;p>Follow along as we build your first collection, insert vectors, and run similarity searches. This guided tutorial walks you through each step.&lt;/p>
&lt;h2 id="step-1-install-the-qdrant-client">Step 1: Install the Qdrant Client&lt;/h2>
&lt;p>To interact with Qdrant, we need the Python client. This enables us to communicate with the Qdrant service, manage collections, and perform vector searches.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="err">!&lt;/span>&lt;span class="n">pip&lt;/span> &lt;span class="n">install&lt;/span> &lt;span class="n">qdrant&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="n">client&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="step-2-import-required-libraries">Step 2: Import Required Libraries&lt;/h2>
&lt;p>Import the necessary modules from the qdrant-client package. The &lt;code>QdrantClient&lt;/code> class establishes a connection to Qdrant, while the &lt;code>models&lt;/code> module provides configurations for &lt;code>Distance&lt;/code>, &lt;code>VectorParams&lt;/code>, and &lt;code>PointStruct&lt;/code>.&lt;/p></description></item><item><title>Integrating with Unstructured.io</title><link>https://qdrant.tech/course/essentials/day-7/unstructured/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-7/unstructured/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 7 
&lt;/div>

&lt;h1 id="integrating-with-unstructuredio">Integrating with Unstructured.io&lt;/h1>
&lt;p>Process and vectorize documents with Unstructured.io and Qdrant.&lt;/p>

 &lt;div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
 &lt;iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/FRIOwOy6VZk?autoplay=0&amp;amp;controls=1&amp;amp;end=0&amp;amp;loop=0&amp;amp;mute=0&amp;amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video">&lt;/iframe>
 &lt;/div>

&lt;h2 id="what-youll-learn">What You&amp;rsquo;ll Learn&lt;/h2>
&lt;ul>
&lt;li>Document processing with Unstructured.io&lt;/li>
&lt;li>Multi-format document ingestion&lt;/li>
&lt;li>Structured data extraction&lt;/li>
&lt;li>Automated vectorization pipelines&lt;/li>
&lt;li>Enterprise data transformation workflows&lt;/li>
&lt;li>VLM-powered document understanding&lt;/li>
&lt;li>Production-ready ETL pipelines&lt;/li>
&lt;/ul>
&lt;h2 id="unstructured-enterprise-data-processing">Unstructured Enterprise Data Processing&lt;/h2>
&lt;p>Unstructured.io addresses the critical challenge of processing unstructured enterprise data, which typically accounts for 80% of enterprise information. The platform provides a composable solution to transform PDFs, Word documents, emails, and other unstructured formats into structured outputs optimized for GenAI initiatives, eliminating the complexity of custom scripts and tools.&lt;/p></description></item><item><title>Managing a Cluster</title><link>https://qdrant.tech/documentation/private-cloud/qdrant-cluster-management/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/private-cloud/qdrant-cluster-management/</guid><description>&lt;h1 id="managing-a-qdrant-cluster">Managing a Qdrant Cluster&lt;/h1>
&lt;p>The most minimal QdrantCluster configuration is:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="nt">apiVersion&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">qdrant.io/v1&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">kind&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">QdrantCluster&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">metadata&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">qdrant-a7d8d973-0cc5-42de-8d7b-c29d14d24840&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">labels&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">cluster-id&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;a7d8d973-0cc5-42de-8d7b-c29d14d24840&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">customer-id&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;acme-industries&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">spec&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">id&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;a7d8d973-0cc5-42de-8d7b-c29d14d24840&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">version&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;v1.11.3&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">size&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">1&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">resources&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">cpu&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">100m&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">memory&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;1Gi&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">storage&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;2Gi&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The &lt;code>id&lt;/code> should be unique across all Qdrant clusters in the same namespace, the &lt;code>name&lt;/code> must follow the above pattern and the &lt;code>cluster-id&lt;/code> and &lt;code>customer-id&lt;/code> labels are mandatory.&lt;/p></description></item><item><title>Permission Reference</title><link>https://qdrant.tech/documentation/cloud-rbac/permission-reference/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/cloud-rbac/permission-reference/</guid><description>&lt;h1 id="permission-reference">&lt;strong>Permission Reference&lt;/strong>&lt;/h1>
&lt;p>This document outlines the permissions available in Qdrant Cloud.&lt;/p>
&lt;hr>
&lt;blockquote>
&lt;p>💡 When enabling &lt;code>write:*&lt;/code> permissions in the UI, the corresponding &lt;code>read:*&lt;/code> permission is enabled automatically and cannot be deselected. This guarantees access to resources after creating and/or updating them.&lt;/p>
&lt;/blockquote>
&lt;h2 id="identity-and-access-management">&lt;strong>Identity and Access Management&lt;/strong>&lt;/h2>
&lt;p>Permissions for users, user roles, management keys, and invitations.&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Permission&lt;/th>
 &lt;th>Description&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;code>read:roles&lt;/code>&lt;/td>
 &lt;td>View roles in the Access Management page.&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>write:roles&lt;/code>&lt;/td>
 &lt;td>Create and modify roles in the Access Management page.&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>delete:roles&lt;/code>&lt;/td>
 &lt;td>Remove roles in the Access Management page.&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>read:management_keys&lt;/code>&lt;/td>
 &lt;td>View Cloud Management Keys in the Access Management page.&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>write:management_keys&lt;/code>&lt;/td>
 &lt;td>Create and manage Cloud Management Keys.&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>delete:management_keys&lt;/code>&lt;/td>
 &lt;td>Remove Cloud Management Keys in the Access Management page.&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>write:invites&lt;/code>&lt;/td>
 &lt;td>Invite new users to an account and revoke invitations.&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>read:invites&lt;/code>&lt;/td>
 &lt;td>View pending invites in an account.&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>delete:invites&lt;/code>&lt;/td>
 &lt;td>Remove an invitation.&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>read:users&lt;/code>&lt;/td>
 &lt;td>View user details in the profile page. &lt;br> - Also applicable in User Management and Role details (User tab).&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>delete:users&lt;/code>&lt;/td>
 &lt;td>Remove users from an account. &lt;br> - Applicable in User Management and Role details (User tab).&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;hr>
&lt;h2 id="cluster">&lt;strong>Cluster&lt;/strong>&lt;/h2>
&lt;p>Permissions for API Keys, backups, clusters, and backup schedules.&lt;/p></description></item><item><title>Pooling Techniques</title><link>https://qdrant.tech/course/multi-vector-search/module-3/pooling-techniques/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/multi-vector-search/module-3/pooling-techniques/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Module 3 
&lt;/div>

&lt;h1 id="pooling-techniques">Pooling Techniques&lt;/h1>
&lt;p>While quantization reduces the size of each vector, pooling reduces the number of vectors per document. By intelligently combining token embeddings, you can achieve significant memory savings while preserving retrieval quality.&lt;/p>
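As a minimal illustration of the idea (not tied to any particular model or to Qdrant's internals), mean and max pooling over a document's token embeddings can be sketched as:

```python
def mean_pool(token_embeddings):
    # Average each dimension across all token vectors -> one vector per document
    n = len(token_embeddings)
    dims = len(token_embeddings[0])
    return [sum(tok[d] for tok in token_embeddings) / n for d in range(dims)]

def max_pool(token_embeddings):
    # Keep the maximum value seen in each dimension
    return [max(tok[d] for tok in token_embeddings)
            for d in range(len(token_embeddings[0]))]

tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
pooled_mean = mean_pool(tokens)  # one 2-d vector instead of three
pooled_max = max_pool(tokens)
```

Pooling groups of neighboring tokens instead of the whole document gives the middle ground this module explores: fewer vectors than full late interaction, more detail than a single vector.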
&lt;hr>
&lt;div class="video">
&lt;iframe
 src="https://www.youtube-nocookie.com/embed/idDXBOrIuik?rel=0"
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;hr>
&lt;p>&lt;strong>Follow along in Colab:&lt;/strong> &lt;a href="https://colab.research.google.com/github/qdrant/examples/blob/master/course-multi-vector-search/module-3/pooling-techniques.ipynb">
&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" style="display:inline; margin:0;" alt="Open In Colab"/>
&lt;/a>&lt;/p>
&lt;hr>
&lt;h2 id="pooling-in-embedding-models">Pooling in Embedding Models&lt;/h2>
&lt;p>Pooling isn&amp;rsquo;t new to vector search - it&amp;rsquo;s fundamental to how most embedding models work. When you encode text with models like Sentence Transformers, the model first generates embeddings for each token in your input. But to create a single vector representing the entire text, the model must &lt;strong>pool&lt;/strong> these token embeddings together.&lt;/p></description></item><item><title>The Universal Query API</title><link>https://qdrant.tech/course/essentials/day-5/universal-query-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-5/universal-query-api/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 5 
&lt;/div>

&lt;h1 id="the-universal-query-api">The Universal Query API&lt;/h1>
&lt;p>Picture this: a customer types &amp;ldquo;leather jackets&amp;rdquo; into your store&amp;rsquo;s search bar. You want to show items that match the style semantically - so a bomber jacket surfaces even if it doesn&amp;rsquo;t mention &amp;ldquo;leather jackets&amp;rdquo; verbatim - but you also need to enforce your business rules. Only products under $200, only items in stock, only jackets released within the past year. Traditionally, you&amp;rsquo;d fire off a search, gather results, then apply filters and glue code. With Qdrant&amp;rsquo;s &lt;a href="https://qdrant.tech/documentation/search/hybrid-queries/">Universal Query API&lt;/a>, all of that happens in one declarative request.&lt;/p></description></item><item><title>Use Cases for Multi-Vector Search</title><link>https://qdrant.tech/course/multi-vector-search/module-1/use-cases-multi-vector/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/multi-vector-search/module-1/use-cases-multi-vector/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Module 1 
&lt;/div>

&lt;h1 id="use-cases-for-multi-vector-search">Use Cases for Multi-Vector Search&lt;/h1>
&lt;p>&lt;strong>When is the added complexity of multi-vector search actually worth it?&lt;/strong> Multi-vector representations require more storage, more computation, and more careful implementation than simple single-vector embeddings. So why bother?&lt;/p>
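As a quick refresher, the MaxSim scoring at the heart of late interaction can be sketched in a few lines of plain Python (an illustrative toy, not an optimized implementation):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim(query_tokens, doc_tokens):
    # Each query token independently picks its best-matching document token;
    # the document's score is the sum of those per-token maxima.
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[1.0, 0.0], [0.5, 0.5]]
score = maxsim(query, doc)
```

The per-query-token maxima are what make matching fine-grained: every query token gets to find its own best evidence in the document.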
&lt;p>The answer comes down to one core capability: &lt;strong>fine-grained matching&lt;/strong>. In the previous lessons, you learned how late interaction preserves token-level representations and how MaxSim computes similarity through independent token matching. Now you&amp;rsquo;ll see when this precision actually matters - and when it doesn&amp;rsquo;t.&lt;/p></description></item><item><title>Visual Interpretability of ColPali</title><link>https://qdrant.tech/course/multi-vector-search/module-2/visual-interpretability/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/multi-vector-search/module-2/visual-interpretability/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Module 2 
&lt;/div>

&lt;h1 id="visual-interpretability-of-colpali">Visual Interpretability of ColPali&lt;/h1>
&lt;p>&lt;strong>Why did this document match my query?&lt;/strong> Unlike traditional black-box embedding models that produce a single opaque vector, ColPali&amp;rsquo;s multi-vector architecture offers something remarkable: you can see exactly where the model &amp;ldquo;looks&amp;rdquo; when matching a query to a document.&lt;/p>
&lt;p>This visual interpretability is invaluable for building trust in multi-modal search systems, debugging unexpected results, and understanding model behavior and limitations.&lt;/p>
&lt;hr>
&lt;div class="video">
&lt;iframe
 src="https://www.youtube-nocookie.com/embed/sQcuYWMS4bo?rel=0"
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;hr>
&lt;p>&lt;strong>Follow along in Colab:&lt;/strong> &lt;a href="https://colab.research.google.com/github/qdrant/examples/blob/master/course-multi-vector-search/module-2/visual-interpretability.ipynb">
&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" style="display:inline; margin:0;" alt="Open In Colab"/>
&lt;/a>&lt;/p></description></item><item><title>Product Quantization in Vector Search | Qdrant</title><link>https://qdrant.tech/articles/product-quantization/</link><pubDate>Tue, 30 May 2023 09:45:00 +0200</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/product-quantization/</guid><description>&lt;h1 id="product-quantization-demystified-streamlining-efficiency-in-data-management">Product Quantization Demystified: Streamlining Efficiency in Data Management&lt;/h1>
&lt;p>Qdrant 1.1.0 brought support for &lt;a href="https://qdrant.tech/articles/scalar-quantization/">Scalar Quantization&lt;/a>,
a technique that reduces the memory footprint by up to four times by using &lt;code>int8&lt;/code> to represent
the values that would normally be stored as &lt;code>float32&lt;/code>.&lt;/p>
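A minimal sketch of that float32-to-int8 mapping, assuming simple min-max calibration (Qdrant's actual implementation differs in details such as quantile-based bounds):

```python
def scalar_quantize(vec):
    # Map the observed float range [lo, hi] onto the 256 int8 buckets
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 or 1.0  # guard against a constant vector
    q = [round((x - lo) / scale) - 128 for x in vec]  # values in [-128, 127]
    return q, lo, scale

def scalar_dequantize(q, lo, scale):
    # Approximate reconstruction of the original floats
    return [(v + 128) * scale + lo for v in q]

q, lo, scale = scalar_quantize([0.0, 1.0, 0.5])
restored = scalar_dequantize(q, lo, scale)
```

Each component shrinks from 4 bytes to 1, at the cost of a small, bounded rounding error per dimension.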
&lt;p>The memory usage in &lt;a href="https://qdrant.tech/solutions/" target="_blank" rel="noopener nofollow">vector search&lt;/a> might be reduced even further! Please welcome &lt;strong>Product
Quantization&lt;/strong>, a brand-new feature of Qdrant 1.2.0!&lt;/p>
&lt;h2 id="what-is-product-quantization">What is Product Quantization?&lt;/h2>
&lt;p>Product Quantization converts floating-point numbers into integers like every other quantization
method. However, the process is slightly more complicated than &lt;a href="https://qdrant.tech/articles/scalar-quantization/" target="_blank" rel="noopener nofollow">Scalar Quantization&lt;/a> and is more customizable, so you can find the sweet spot between memory usage and search precision. This article
covers all the steps required to perform Product Quantization and the way it&amp;rsquo;s implemented in Qdrant.&lt;/p></description></item><item><title/><link>https://qdrant.tech/benchmarks/filtered-search-benchmark/</link><pubDate>Mon, 13 Feb 2023 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/benchmarks/filtered-search-benchmark/</guid><description>&lt;h2 id="filtered-results">Filtered Results&lt;/h2>
&lt;p>As you can see from the charts, there are three main patterns:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Speed boost&lt;/strong> - for some engines/queries, the filtered search is faster than the unfiltered one. This can happen when the filter is restrictive enough that the vector index can be skipped entirely.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Speed downturn&lt;/strong> - some engines struggle to maintain high RPS; this may be related to the need to build a filtering mask for the dataset, as described above.&lt;/p></description></item><item><title>Async API</title><link>https://qdrant.tech/documentation/tutorials-develop/async-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-develop/async-api/</guid><description>&lt;h1 id="build-high-throughput-applications-with-qdrants-async-api">Build High-Throughput Applications with Qdrant&amp;rsquo;s Async API&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 25 min&lt;/th>
 &lt;th>Level: Intermediate&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>Asynchronous programming is being broadly adopted in the Python ecosystem. Tools such as FastAPI &lt;a href="https://fastapi.tiangolo.com/async/" target="_blank" rel="noopener nofollow">have embraced this new
paradigm&lt;/a>, but it is also becoming a standard for ML models served as SaaS. For example, the Cohere SDK
&lt;a href="https://github.com/cohere-ai/cohere-python/blob/856a4c3bd29e7a75fa66154b8ac9fcdf1e0745e0/src/cohere/client.py#L189" target="_blank" rel="noopener nofollow">provides an async client&lt;/a> next to its synchronous counterpart.&lt;/p>
&lt;p>Databases are often launched as separate services and are accessed via a network. All the interactions with them are IO-bound and can
be performed asynchronously so as not to waste time actively waiting for a server response. In Python, this is achieved by
using &lt;a href="https://docs.python.org/3/library/asyncio-task.html" target="_blank" rel="noopener nofollow">&lt;code>async/await&lt;/code>&lt;/a> syntax. That lets the interpreter switch to another task
while waiting for a response from the server.&lt;/p></description></item><item><title>Backups</title><link>https://qdrant.tech/documentation/private-cloud/backups/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/private-cloud/backups/</guid><description>&lt;h1 id="backups">Backups&lt;/h1>
&lt;p>To create a one-time backup, create a &lt;code>QdrantClusterSnapshot&lt;/code> resource:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="nt">apiVersion&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">qdrant.io/v1&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">kind&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">QdrantClusterSnapshot&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">metadata&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;qdrant-a7d8d973-0cc5-42de-8d7b-c29d14d24840-snapshot-timestamp&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">labels&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">cluster-id&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;a7d8d973-0cc5-42de-8d7b-c29d14d24840&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">customer-id&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;acme-industries&amp;#34;&lt;/span>&lt;span class="w"> 
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">spec&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">cluster-id&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;a7d8d973-0cc5-42de-8d7b-c29d14d24840&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">retention&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">1h&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>You can also create a recurring backup with the &lt;code>QdrantClusterScheduledSnapshot&lt;/code> resource:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="nt">apiVersion&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">qdrant.io/v1&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">kind&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">QdrantClusterScheduledSnapshot&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">metadata&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;qdrant-a7d8d973-0cc5-42de-8d7b-c29d14d24840-snapshot-timestamp&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">labels&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">cluster-id&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;a7d8d973-0cc5-42de-8d7b-c29d14d24840&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">customer-id&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;acme-industries&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">spec&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">scheduleShortId&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">a7d8d973&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">cluster-id&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;a7d8d973-0cc5-42de-8d7b-c29d14d24840&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># every hour&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">schedule&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;0 * * * *&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">retention&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">1h&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>To restore from a backup, create a &lt;code>QdrantClusterRestore&lt;/code> resource:&lt;/p></description></item><item><title>Cloud Quickstart</title><link>https://qdrant.tech/documentation/cloud-quickstart/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/cloud-quickstart/</guid><description>&lt;h1 id="quick-start-with-qdrant-cloud">Quick Start with Qdrant Cloud&lt;/h1>
&lt;p align="center">&lt;iframe width="560" height="315" src="https://www.youtube.com/embed/xvWIssi_cjQ?si=CLhFrUDpQlNog9mz&amp;rel=0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen>&lt;/iframe>&lt;/p>
&lt;p>Learn how to set up Qdrant Cloud and perform your first semantic search in just a few minutes. We&amp;rsquo;ll use a sample dataset of menu items embedded with the &lt;code>sentence-transformers/all-MiniLM-L6-v2&lt;/code> model via &lt;a href="https://qdrant.tech/documentation/inference/">Cloud Inference&lt;/a>. This is one of the free embedding models available on Qdrant Cloud. For a list of the available free and paid models, refer to the Inference tab of the Cluster Detail page in the Qdrant Cloud Console.&lt;/p></description></item><item><title>Configure the Qdrant Operator</title><link>https://qdrant.tech/documentation/hybrid-cloud/operator-configuration/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/hybrid-cloud/operator-configuration/</guid><description>&lt;h1 id="configuring-qdrant-operator-advanced-options">Configuring Qdrant Operator: Advanced Options&lt;/h1>
&lt;p>The Qdrant Operator offers several configuration options, which can be set in the advanced section of your Hybrid Cloud Environment.&lt;/p>
&lt;p>The following YAML shows all configuration options with their default values:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="c"># Additional pod annotations&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">podAnnotations&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>{}&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="c"># Configuration for the Qdrant operator service monitor to scrape metrics&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">serviceMonitor&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">enabled&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">false&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="c"># Resource requests and limits for the Qdrant operator&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">resources&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>{}&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="c"># Node selector for the Qdrant operator&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">nodeSelector&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>{}&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="c"># Tolerations for the Qdrant operator&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">tolerations&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">[]&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="c"># Affinity configuration for the Qdrant operator&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">affinity&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>{}&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="c"># Configuration for the Qdrant operator (v2)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">settings&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The log level for the operator&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Available options: DEBUG | INFO | WARN | ERROR&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">logLevel&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">INFO &lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Controller related settings&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">controller&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The period a forced recync is done by the controller (if watches are missed / nothing happened)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">forceResyncPeriod&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">10h&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># QPS indicates the maximum QPS to the master from this client.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is 200&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">qps&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">200&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Maximum burst for throttle.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is 500.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">burst&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">500&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Features contains the settings for enabling / disabling the individual features of the operator&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">features&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># ClusterManagement contains the settings for qdrant (database) cluster management&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">clusterManagement&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Whether or not the Qdrant cluster features are enabled.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># If disabled, all other properties in this struct are disregarded. Otherwise, the individual features will be inspected.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is true.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">enable&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The StorageClass used to make database and snapshot PVCs.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is nil, meaning the default storage class of Kubernetes.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">storageClass&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> 
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The StorageClass used to make database PVCs.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is nil, meaning the default storage class of Kubernetes.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">#database: &lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The StorageClass used to make snapshot PVCs.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is nil, meaning the default storage class of Kubernetes.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">#snapshot:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Qdrant config contains settings specific for the database&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">qdrant&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The config where to find the image for qdrant&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">image&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> 
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The repository where to find the image for qdrant&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is &amp;#34;qdrant/qdrant&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">repository&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">qdrant/qdrant&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Docker image pull policy&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default &amp;#34;IfNotPresent&amp;#34;, unless the tag is dev, master or latest. Then &amp;#34;Always&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">#pullPolicy:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Docker image pull secret name&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># This secret should be available in the namespace where the cluster is running&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default not set&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">#pullSecretName:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># storage contains the settings for the storage of the Qdrant cluster&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">storage&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">performance&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># CPU budget, how many CPUs (threads) to allocate for an optimization job.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># If 0 - auto selection, keep 1 or more CPUs unallocated depending on CPU size&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># If negative - subtract this number of CPUs from the available CPUs.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># If positive - use this exact number of CPUs.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">optimizerCpuBudget&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">0&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Enable async scorer which uses io_uring when rescoring.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Only supported on Linux, must be enabled in your kernel.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># See: &amp;lt;https://qdrant.tech/articles/io_uring/#and-what-about-qdrant&amp;gt;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">asyncScorer&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">false&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Qdrant DB log level&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Available options: DEBUG | INFO | WARN | ERROR &lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is &amp;#34;INFO&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">logLevel&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">INFO&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default Qdrant security context configuration&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">securityContext&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Enable default security context&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is false&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">enabled&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">false&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default user for qdrant container&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default not set&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">#user: 1000&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default fsGroup for qdrant container&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default not set&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">#fsUser: 2000&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default group for qdrant container&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default not set&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">#group: 3000&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Network policies configuration for the Qdrant databases&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">networkPolicies&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">ingress&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">ports&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">protocol&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">TCP&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">port&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">6333&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">protocol&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">TCP&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">port&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">6334&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Allow DNS resolution from qdrant pods at Kubernetes internal DNS server&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">egress&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">ports&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">protocol&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">UDP&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">port&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">53&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Scheduling config contains the settings specific for scheduling&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">scheduling&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default topology spread constraints (list from type corev1.TopologySpreadConstraint)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">topologySpreadConstraints&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">maxSkew&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">1&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">topologyKey&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;kubernetes.io/hostname&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">whenUnsatisfiable&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;ScheduleAnyway&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default pod disruption budget (object from type policyv1.PodDisruptionBudgetSpec)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">podDisruptionBudget&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">maxUnavailable&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">1&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># ClusterManager config contains the settings specific for cluster manager&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">clusterManager&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">      &lt;/span>&lt;span class="c"># Whether or not the cluster manager is enabled (on operator level).&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># If disabled, all other properties in this struct are disregarded. Otherwise, the individual features will be inspected.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is false.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">enable&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">      &lt;/span>&lt;span class="c"># The endpoint address at which the cluster manager can be reached&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># If set, this should be a full URL like: http://cluster-manager.qdrant-cloud-ns.svc.cluster.local:7333&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">endpointAddress&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">http://qdrant-cluster-manager:80&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">      &lt;/span>&lt;span class="c"># InvocationInterval is the interval between calls (started after the previous call has returned)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is 10 seconds&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">invocationInterval&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">10s&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Timeout is the duration a single call to the cluster manager is allowed to take.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is 30 seconds&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">timeout&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">30s&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Specifies overrides for the manage rules&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">manageRulesOverrides&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">#dry_run: &lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">#max_transfers:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">#max_transfers_per_collection:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">#rebalance:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">#replicate:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Ingress config contains the settings specific for ingress&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">ingress&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Whether or not the Ingress feature is enabled.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is true.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">enable&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">false&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Which specific ingress provider should be used&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is KubernetesIngress&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">provider&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">KubernetesIngress&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The specific settings when the Provider is QdrantCloudTraefik&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">qdrantCloudTraefik&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Enable tls&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is false&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">tls&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">false&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Secret with TLS certificate&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is None&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">secretName&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># List of Traefik middlewares to apply&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is an empty list&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">middlewares&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">[]&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># IP Allowlist Strategy for Traefik&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is None&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">ipAllowlistStrategy&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Enable body validator plugin and matching ingressroute rules&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is false&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">enableBodyValidatorPlugin&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">false&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The specific settings when the Provider is KubernetesIngress&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">kubernetesIngress&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Name of the ingress class&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is None&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c">#ingressClassName:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># TelemetryTimeout is the duration a single call to the cluster telemetry endpoint is allowed to take.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is 3 seconds&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">telemetryTimeout&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">3s&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># MaxConcurrentReconciles is the maximum number of concurrent Reconciles which can be run. Defaults to 20.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">maxConcurrentReconciles&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">20&lt;/span>&lt;span class="w"> 
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># VolumeExpansionMode specifies the expansion mode, which can be online or offline (e.g. in case of Azure).&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Available options: Online, Offline&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is Online&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">volumeExpansionMode&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">Online&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># BackupManagementConfig contains the settings for backup management&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">backupManagement&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Whether or not the backup features are enabled.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># If disabled, all other properties in this struct are disregarded. Otherwise, the individual features will be inspected.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is true.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">enable&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Snapshots contains the settings for snapshots as part of backup management.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">snapshots&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Whether or not the Snapshot feature is enabled.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is true.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">enable&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The VolumeSnapshotClass used to make VolumeSnapshots.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is &amp;#34;csi-snapclass&amp;#34;.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">volumeSnapshotClass&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;csi-snapclass&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># The duration a snapshot is retained when the phase becomes Failed or Skipped&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is 72h (3d).&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">retainUnsuccessful&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">72h&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># MaxConcurrentReconciles is the maximum number of concurrent Reconciles which can be run. Defaults to 1.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">maxConcurrentReconciles&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">1&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># ScheduledSnapshots contains the settings for scheduled snapshot as part of backup management.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">scheduledSnapshots&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Whether or not the ScheduledSnapshot feature is enabled.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is true.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">enable&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># MaxConcurrentReconciles is the maximum number of concurrent Reconciles which can be run. Defaults to 1.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">maxConcurrentReconciles&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">1&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Restores contains the settings for restoring (a snapshot) as part of backup management.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">restores&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Whether or not the Restore feature is enabled.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default is true.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">enable&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># MaxConcurrentReconciles is the maximum number of concurrent Reconciles which can be run. Defaults to 1.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">maxConcurrentReconciles&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">1&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Demo: HNSW Performance Tuning</title><link>https://qdrant.tech/course/essentials/day-2/collection-tuning-demo/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-2/collection-tuning-demo/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 2 
&lt;/div>

&lt;h1 id="demo-hnsw-performance-tuning">Demo: HNSW Performance Tuning&lt;/h1>
&lt;p>Learn how to improve vector search speed with &lt;a href="https://qdrant.tech/articles/filterable-hnsw/" target="_blank" rel="noopener nofollow">HNSW&lt;/a> tuning and payload indexing on a real 100K dataset.&lt;/p>
&lt;p>&lt;strong>Follow along in Colab:&lt;/strong> &lt;a href="https://colab.research.google.com/github/qdrant/examples/blob/master/course/day_2/hnsw_performance_tuning.ipynb">
&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" style="display:inline; margin:0;" alt="Open In Colab"/>
&lt;/a>&lt;/p>
&lt;h2 id="what-youll-do">What You’ll Do&lt;/h2>
&lt;p>Yesterday you learned the theory behind HNSW indexing. Today you&amp;rsquo;ll see it in action on a 100,000-vector dataset, measuring performance differences and applying optimization strategies that work in production.&lt;/p>
&lt;p>&lt;strong>You&amp;rsquo;ll learn to:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Optimize bulk upload speed with strategic HNSW configuration&lt;/li>
&lt;li>Measure the performance impact of payload indexes&lt;/li>
&lt;li>Tune HNSW parameters to balance search speed and recall&lt;/li>
&lt;li>Compare full-scan vs. HNSW search performance&lt;/li>
&lt;/ul>
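&lt;p>The bulk-upload strategy from the first item above can be sketched with the Python client. This is an illustrative sketch, not the notebook&amp;rsquo;s code: the collection name and the final parameter values (m=16, ef_construct=128) are assumptions; deferring index construction with m=0 during bulk upload is the documented Qdrant recommendation.&lt;/p>

```python
# Illustrative sketch: the collection name and tuning values are
# assumptions, not taken from the course notebook.
from qdrant_client import QdrantClient, models

client = QdrantClient(":memory:")  # local mode; point at a server URL in practice

# Create the collection with m=0, which defers HNSW graph construction
# and makes the initial bulk upload significantly cheaper.
client.create_collection(
    collection_name="articles",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE),
    hnsw_config=models.HnswConfigDiff(m=0),
)

# ... bulk upsert the points here ...

# Once the data is loaded, restore m to trigger index building. m controls
# graph connectivity per node; ef_construct controls build-time search
# breadth (higher values trade slower builds for better recall).
client.update_collection(
    collection_name="articles",
    hnsw_config=models.HnswConfigDiff(m=16, ef_construct=128),
)
```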
&lt;h2 id="the-performance-challenge">The Performance Challenge&lt;/h2>
&lt;p>Working with 100K high-dimensional vectors (1536 dimensions from OpenAI&amp;rsquo;s text-embedding-3-large) presents real performance challenges:&lt;/p></description></item><item><title>Demo: Universal Query for Hybrid Retrieval</title><link>https://qdrant.tech/course/essentials/day-5/universal-query-demo/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-5/universal-query-demo/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 5 
&lt;/div>

&lt;h1 id="demo-universal-query-for-hybrid-retrieval">Demo: Universal Query for Hybrid Retrieval&lt;/h1>
&lt;p>In this hands-on demo, we&amp;rsquo;ll build a research paper discovery system using the arXiv dataset that showcases the full power of Qdrant&amp;rsquo;s Universal Query API. You&amp;rsquo;ll see how to combine dense semantics, sparse keywords, and ColBERT reranking to help researchers find exactly the papers they need - all in a single query.&lt;/p>
&lt;p>&lt;strong>Follow along in Colab:&lt;/strong> &lt;a href="https://colab.research.google.com/github/qdrant/examples/blob/master/course/day_5/universal-query-demo.ipynb">
&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" style="display:inline; margin:0;" alt="Open In Colab"/>
&lt;/a>&lt;/p>
&lt;h2 id="the-challenge-intelligent-research-discovery">The Challenge: Intelligent Research Discovery&lt;/h2>
&lt;p>Imagine you&amp;rsquo;re a machine learning researcher looking for &amp;ldquo;transformer architectures for multimodal learning with attention mechanisms.&amp;rdquo; You need to:&lt;/p></description></item><item><title>Hybrid Search and the Universal Query API</title><link>https://qdrant.tech/course/essentials/day-3/hybrid-search/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-3/hybrid-search/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 3 
&lt;/div>

&lt;h1 id="hybrid-search-and-the-universal-query-api">Hybrid Search and the Universal Query API&lt;/h1>
&lt;p>Learn how to combine dense and sparse vector search methods to build powerful hybrid search pipelines that serve diverse user needs.&lt;/p>
&lt;div class="video">
&lt;iframe
 src="https://www.youtube.com/embed/p_IKYRGuxmM"
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;br/>
&lt;h2 id="what-youll-learn">What You&amp;rsquo;ll Learn&lt;/h2>
&lt;ul>
&lt;li>Understand when to use dense vs. sparse vectors&lt;/li>
&lt;li>Build hybrid search pipelines with Qdrant&amp;rsquo;s Universal Query API&lt;/li>
&lt;li>Apply Reciprocal Rank Fusion (RRF) to combine results&lt;/li>
&lt;li>Design multi-stage retrieval and reranking strategies&lt;/li>
&lt;/ul>
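&lt;p>Reciprocal Rank Fusion, listed above, is simple enough to sketch in a few lines. The following plain-Python illustration mirrors the formula applied server-side when RRF is chosen as the fusion method; the document IDs and the constant k=60 are illustrative.&lt;/p>

```python
# Plain-Python sketch of Reciprocal Rank Fusion (RRF). Document IDs
# and the smoothing constant k are illustrative values.
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over every list it
    appears in; k=60 is the commonly used smoothing constant that
    dampens the influence of top ranks.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["doc3", "doc1", "doc7"]   # e.g. results of a dense-vector search
sparse_hits = ["doc1", "doc9", "doc3"]  # e.g. results of a sparse keyword search
print(rrf_fuse([dense_hits, sparse_hits]))
```

&lt;p>Documents ranked well by both retrievers (here doc1 and doc3) rise to the top, without any score normalization across the two methods.&lt;/p>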
&lt;h2 id="the-challenge-different-users-different-search-needs">The Challenge: Different Users, Different Search Needs&lt;/h2>
&lt;p>Your users exist on a spectrum, from precise keyword searchers to vague natural-language describers, and forcing a single search approach means disappointing part of your audience. Rather than compromising on search quality for different user types, hybrid search lets you meet everyone where they are.&lt;/p></description></item><item><title>Integrating with Tensorlake</title><link>https://qdrant.tech/course/essentials/day-7/tensorlake/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-7/tensorlake/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 7 
&lt;/div>

&lt;h1 id="integrating-with-tensorlake">Integrating with TensorLake&lt;/h1>
&lt;p>Build scalable data lakes with vector search capabilities using TensorLake&amp;rsquo;s advanced document parsing techniques.&lt;/p>

 &lt;div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
 &lt;iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/IVfoVS0KfPM?autoplay=0&amp;amp;controls=1&amp;amp;end=0&amp;amp;loop=0&amp;amp;mute=0&amp;amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video">&lt;/iframe>
 &lt;/div>

&lt;h2 id="what-youll-learn">What You&amp;rsquo;ll Learn&lt;/h2>
&lt;ul>
&lt;li>Data lake architecture with vectors&lt;/li>
&lt;li>Large-scale data management&lt;/li>
&lt;li>Analytics and vector search integration&lt;/li>
&lt;li>ETL pipeline optimization&lt;/li>
&lt;li>Knowledge graph creation from unstructured documents&lt;/li>
&lt;li>Document parsing and structured data extraction&lt;/li>
&lt;li>LangGraph agent integration for natural language querying&lt;/li>
&lt;/ul>
&lt;h2 id="tensorlake-knowledge-graph-integration">TensorLake Knowledge Graph Integration&lt;/h2>
&lt;p>TensorLake introduces an innovative approach to enhancing Qdrant collection querying through advanced document parsing and knowledge graph creation. The platform transforms unstructured documents into structured knowledge graphs, providing comprehensive data extraction and intelligent summarization of complex tables and figures, leading to more accurate embeddings and fine-tuned searches in RAG applications.&lt;/p></description></item><item><title>Large-Scale Data Ingestion</title><link>https://qdrant.tech/course/essentials/day-4/large-scale-ingestion/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-4/large-scale-ingestion/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 4 
&lt;/div>

&lt;h1 id="large-scale-data-ingestion">Large-Scale Data Ingestion&lt;/h1>
&lt;div class="video">
&lt;iframe 
 src="https://www.youtube.com/embed/Rawvm7TP1XI"
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;br/>
&lt;p>In vector search applications, inserting a few thousand data points is straightforward, but the dynamics change completely when dealing with millions or billions of records. Tiny inefficiencies in the ingestion process compound into significant time losses, increased memory pressure, and degraded search performance.&lt;/p>
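&lt;p>The core mitigation is generic: group records into fixed-size batches so each request amortizes its per-call overhead. A stdlib-only sketch (the batch size of 256 is an illustrative choice, not a recommendation from this lesson):&lt;/p>

```python
# Stdlib-only sketch of batching a large record stream before ingestion.
# The batch size of 256 is an illustrative value.
from itertools import islice

def batched(iterable, size):
    """Lazily yield lists of up to `size` items from any iterable."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# In a real pipeline, each `chunk` would become a single bulk upsert call
# instead of one request per point.
sizes = [len(chunk) for chunk in batched(range(1000), 256)]
print(sizes)  # three full batches and one remainder
```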
&lt;p>Every individual upsert call initiates a transaction that consumes memory and disk I/O to build parts of the index. At scale, this naive approach can overwhelm your system, causing upload times to spike and search quality to decrease. Efficiently preparing and loading your data into Qdrant is paramount for building a robust and scalable AI application.&lt;/p></description></item><item><title>Logging &amp; Monitoring</title><link>https://qdrant.tech/documentation/private-cloud/logging-monitoring/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/private-cloud/logging-monitoring/</guid><description>&lt;h1 id="configuring-logging--monitoring-in-qdrant-private-cloud">Configuring Logging &amp;amp; Monitoring in Qdrant Private Cloud&lt;/h1>
&lt;h2 id="logging">Logging&lt;/h2>
&lt;p>You can access the logs with kubectl or the Kubernetes log management tool of your choice. For example:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">kubectl -n qdrant-private-cloud logs -l &lt;span class="nv">app&lt;/span>&lt;span class="o">=&lt;/span>qdrant,cluster-id&lt;span class="o">=&lt;/span>a7d8d973-0cc5-42de-8d7b-c29d14d24840
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;strong>Configuring log levels:&lt;/strong> You can configure log levels for the databases individually through the QdrantCluster spec. Example:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="nt">apiVersion&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">qdrant.io/v1&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">kind&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">QdrantCluster&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">metadata&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">qdrant-a7d8d973-0cc5-42de-8d7b-c29d14d24840&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">labels&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">cluster-id&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;a7d8d973-0cc5-42de-8d7b-c29d14d24840&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">customer-id&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;acme-industries&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">spec&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">id&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;a7d8d973-0cc5-42de-8d7b-c29d14d24840&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">version&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;v1.11.3&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">size&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">1&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">resources&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">cpu&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">100m&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">memory&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;1Gi&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">storage&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;2Gi&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">config&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">log_level&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;DEBUG&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="integrating-with-a-log-management-system">Integrating with a log management system&lt;/h3>
&lt;p>You can integrate the logs into any log management system that supports Kubernetes. No Qdrant-specific configuration is necessary. Just configure the agents of your system to collect the logs from all Pods in the Qdrant namespace.&lt;/p></description></item><item><title>Multivector Document Retrieval</title><link>https://qdrant.tech/documentation/tutorials-search-engineering/pdf-retrieval-at-scale/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-search-engineering/pdf-retrieval-at-scale/</guid><description>&lt;h1 id="qdrant-multivector-document-retrieval-with-colpalicolqwen">Qdrant Multivector Document Retrieval with ColPali/ColQwen&lt;/h1>
&lt;p>&lt;img src="https://qdrant.tech/documentation/tutorials/pdf-retrieval-at-scale/image1.png" alt="scaling-pdf-retrieval-qdrant">&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 30 min&lt;/th>
 &lt;th>Level: Intermediate&lt;/th>
 &lt;th>Output: &lt;a href="https://github.com/qdrant/examples/blob/master/pdf-retrieval-at-scale/ColPali_ColQwen2_Tutorial.ipynb" target="_blank" rel="noopener nofollow">GitHub&lt;/a>&lt;/th>
 &lt;th>&lt;a href="https://githubtocolab.com/qdrant/examples/blob/master/pdf-retrieval-at-scale/ColPali_ColQwen2_Tutorial.ipynb" target="_blank" rel="noopener nofollow">&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab">&lt;/a>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>Efficient PDF document retrieval is a common requirement in tasks like &lt;strong>(agentic) retrieval-augmented generation (RAG)&lt;/strong> and many other search-based applications. At the same time, setting up PDF document retrieval is rarely possible without additional challenges.&lt;/p>
&lt;p>Many traditional PDF retrieval solutions rely on &lt;strong>optical character recognition (OCR)&lt;/strong> together with use case-specific heuristics to handle visually complex elements like tables, images and charts. With their task-customized parsing and chunking strategies, these algorithms are often non-transferable &amp;ndash; even within the same domain &amp;ndash; as well as labor-intensive, error-prone, and difficult to scale.&lt;/p></description></item><item><title>MUVERA</title><link>https://qdrant.tech/course/multi-vector-search/module-3/muvera/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/multi-vector-search/module-3/muvera/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Module 3 
&lt;/div>

&lt;h1 id="muvera">MUVERA&lt;/h1>
&lt;p>MUVERA (Multi-Vector Retrieval via Fixed Dimensional Encodings) solves a fundamental problem: MaxSim&amp;rsquo;s asymmetry makes traditional indexing methods like HNSW ineffective. MUVERA enables fast approximate search for multi-vector representations.&lt;/p>
&lt;p>Understanding MUVERA is key to scaling multi-vector search to millions of documents.&lt;/p>
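The asymmetry mentioned above is easy to demonstrate concretely. A minimal pure-Python sketch with toy 2-d vectors (not the course notebook's code): MaxSim sums, over each query vector, its best dot product against any document vector, so swapping the roles of query and document generally changes the score.

```python
def maxsim(query_vecs, doc_vecs):
    """MaxSim: for each query vector, take the best (max) dot product
    against any document vector, then sum over query vectors."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

Q = [[1.0, 0.0], [0.0, 1.0]]                  # 2 query token vectors
D = [[1.0, 0.0], [1.0, 0.0], [0.5, 0.5]]      # 3 document token vectors

print(maxsim(Q, D), maxsim(D, Q))  # 1.5 2.5 -- the "distance" is asymmetric
```

A symmetric metric would give the same value both ways; this asymmetry (plus the failure of the triangle inequality) is what prevents plugging MaxSim directly into an HNSW graph.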
&lt;hr>
&lt;div class="video">
&lt;iframe
 src="https://www.youtube-nocookie.com/embed/-r0Apuy0c8k?rel=0"
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;hr>
&lt;p>&lt;strong>Follow along in Colab:&lt;/strong> &lt;a href="https://colab.research.google.com/github/qdrant/examples/blob/master/course-multi-vector-search/module-3/muvera.ipynb">
&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" style="display:inline; margin:0;" alt="Open In Colab"/>
&lt;/a>&lt;/p>
&lt;hr>
&lt;h2 id="the-hnsw-incompatibility-problem">The HNSW Incompatibility Problem&lt;/h2>
&lt;p>Traditional vector indexes like HNSW are designed for single-vector search with symmetric distance metrics. Multi-vector representations break this assumption: &lt;strong>MaxSim is inherently asymmetric and non-metric&lt;/strong>.&lt;/p></description></item><item><title>Networking, Logging &amp; Monitoring</title><link>https://qdrant.tech/documentation/hybrid-cloud/networking-logging-monitoring/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/hybrid-cloud/networking-logging-monitoring/</guid><description>&lt;h1 id="configuring-networking-logging--monitoring-in-qdrant-hybrid-cloud">Configuring Networking, Logging &amp;amp; Monitoring in Qdrant Hybrid Cloud&lt;/h1>
&lt;h2 id="configure-network-policies">Configure network policies&lt;/h2>
&lt;p>For security reasons, each database cluster is secured with network policies. By default, database pods only allow egress traffic between each other and ingress traffic to ports 6333 (REST) and 6334 (gRPC) from within the Kubernetes cluster.&lt;/p>
&lt;p>You can modify the default network policies in the Hybrid Cloud environment configuration:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="nt">qdrant&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">networkPolicies&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">ingress&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">from&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">ipBlock&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">cidr&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">192.168.0.0&lt;/span>&lt;span class="l">/22&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">podSelector&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">matchLabels&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">app&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">client-app&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">namespaceSelector&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">matchLabels&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">kubernetes.io/metadata.name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">client-namespace&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">podSelector&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">matchLabels&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">app&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">traefik&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">namespaceSelector&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">matchLabels&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">kubernetes.io/metadata.name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">kube-system&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">ports&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">port&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">6333&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">protocol&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">TCP&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">port&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">6334&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">protocol&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">TCP &lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="logging">Logging&lt;/h2>
&lt;p>You can access the logs with kubectl or the Kubernetes log management tool of your choice. For example:&lt;/p></description></item><item><title>Problems of Multi-Vector Search</title><link>https://qdrant.tech/course/multi-vector-search/module-1/problems-multi-vector/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/multi-vector-search/module-1/problems-multi-vector/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Module 1 
&lt;/div>

&lt;h1 id="problems-of-multi-vector-search">Problems of Multi-Vector Search&lt;/h1>
&lt;p>Multi-vector search delivers impressive retrieval quality, but it comes with significant challenges. Before deploying multi-vector search in production, you need to understand these limitations and plan accordingly.&lt;/p>
&lt;p>The good news: Module 3 covers optimization techniques that address many of these challenges.&lt;/p>
&lt;hr>
&lt;div class="video">
&lt;iframe
 src="https://www.youtube-nocookie.com/embed/vQKF1hO7hzU?rel=0"
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;hr>
&lt;h2 id="the-indexing-challenge-why-hnsw-doesnt-work">The Indexing Challenge: Why HNSW Doesn&amp;rsquo;t Work&lt;/h2>
&lt;p>One of the fundamental challenges with multi-vector search stems from &lt;strong>HNSW indexing incompatibility&lt;/strong>. As you learned in the MaxSim lesson, traditional vector search relies on HNSW (Hierarchical Navigable Small World) graphs to enable fast approximate nearest neighbor search. HNSW works by building static proximity graphs that connect similar documents, allowing efficient traversal during queries.&lt;/p></description></item><item><title>Project: Building Your First Vector Search System</title><link>https://qdrant.tech/course/essentials/day-0/pitstop-project/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-0/pitstop-project/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 0 
&lt;/div>

&lt;h1 id="project-building-your-first-vector-search-system">Project: Building Your First Vector Search System&lt;/h1>
&lt;p>Time to apply what you&amp;rsquo;ve learned. You&amp;rsquo;ll create a complete, working vector search system from scratch.&lt;/p>
&lt;h2 id="your-mission">Your Mission&lt;/h2>
&lt;p>Build a functional vector search system that demonstrates the core concepts: collections, points, similarity search, and filtering. You&amp;rsquo;ll design simple 4-dimensional vectors that represent different concepts or items.&lt;/p>
&lt;p>&lt;strong>Estimated Time:&lt;/strong> 30 minutes&lt;/p>
&lt;h2 id="what-youll-build">What You&amp;rsquo;ll Build&lt;/h2>
&lt;p>A working search system with:&lt;/p>
&lt;ul>
&lt;li>One collection with 4-dimensional vectors and Cosine distance&lt;/li>
&lt;li>5–10 points with hand-crafted vectors and meaningful payloads&lt;/li>
&lt;li>Basic similarity search to find nearest neighbors&lt;/li>
&lt;li>Filtered search combining similarity with payload conditions&lt;/li>
&lt;/ul>
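The four pieces above can be prototyped with no infrastructure at all. A library-free Python sketch of the concepts (the dimension meanings, point values, and payload filter are all hypothetical; in the actual project you would create the collection and run these searches through `qdrant-client` against your cloud cluster):

```python
import math

# Hand-crafted 4-d vectors; suppose the dimensions mean
# [sweetness, acidity, crunch, price] (purely illustrative).
points = [
    {"id": 1, "vector": [0.9, 0.1, 0.8, 0.2], "payload": {"category": "fruit", "name": "apple"}},
    {"id": 2, "vector": [0.8, 0.3, 0.7, 0.3], "payload": {"category": "fruit", "name": "pear"}},
    {"id": 3, "vector": [0.1, 0.9, 0.1, 0.7], "payload": {"category": "drink", "name": "lemonade"}},
]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query, limit=2, must=None):
    """Similarity search, optionally restricted by exact-match payload conditions."""
    candidates = [
        p for p in points
        if not must or all(p["payload"].get(k) == v for k, v in must.items())
    ]
    return sorted(candidates, key=lambda p: cosine(query, p["vector"]), reverse=True)[:limit]

# Filtered search: nearest neighbors among fruit only.
hits = search([0.85, 0.2, 0.75, 0.25], limit=2, must={"category": "fruit"})
```

This is exactly the mental model Qdrant implements at scale: vectors for similarity, payloads for filtering, and the two combined in a single query.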
&lt;h2 id="setup">Setup&lt;/h2>
&lt;h3 id="prerequisites">Prerequisites&lt;/h3>
&lt;ul>
&lt;li>Qdrant Cloud cluster (URL + API key)&lt;/li>
&lt;li>Python 3.9+ (or Colab)&lt;/li>
&lt;li>Required packages: &lt;code>qdrant-client&lt;/code>.&lt;/li>
&lt;/ul>
&lt;h3 id="models">Models&lt;/h3>
&lt;ul>
&lt;li>None. We will create vectors by hand.&lt;/li>
&lt;/ul>
&lt;h3 id="dataset">Dataset&lt;/h3>
&lt;ul>
&lt;li>None. We will create our own data points.&lt;/li>
&lt;/ul>
&lt;p>Before creating data, decide what each of the four dimensions in your vectors will represent. This is the creative part of vector search!&lt;/p></description></item><item><title>Retrieval Quality Evaluation</title><link>https://qdrant.tech/documentation/tutorials-search-engineering/retrieval-quality/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-search-engineering/retrieval-quality/</guid><description>&lt;h1 id="evaluate-retrieval-quality-with-qdrant">Evaluate Retrieval Quality with Qdrant&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 30 min&lt;/th>
 &lt;th>Level: Intermediate&lt;/th>
 &lt;th>&lt;/th>
 &lt;th>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>Semantic search pipelines are only as good as the embeddings they use. If your model cannot properly represent input data, similar objects might
end up far away from each other in the vector space. No surprise that the search results will be poor in this case. There is, however, another
component of the process that can also degrade the quality of the search results: the ANN algorithm itself.&lt;/p></description></item><item><title>Semantic Search 101</title><link>https://qdrant.tech/documentation/tutorials-basics/search-beginners/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-basics/search-beginners/</guid><description>&lt;h1 id="build-a-semantic-search-engine-in-5-minutes">Build a Semantic Search Engine in 5 Minutes&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 5 - 15 min&lt;/th>
 &lt;th>Level: Beginner&lt;/th>
 &lt;th>&lt;/th>
 &lt;th>&lt;a href="https://githubtocolab.com/qdrant/examples/blob/master/semantic-search-in-5-minutes/semantic_search_in_5_minutes.ipynb" target="_blank" rel="noopener nofollow">&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab">&lt;/a>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;blockquote>
&lt;p>There are two versions of this tutorial:&lt;/p>
&lt;ul>
&lt;li>The version on this page uses Qdrant Cloud. You&amp;rsquo;ll deploy a cluster and generate vector embeddings in the cloud using Qdrant Cloud&amp;rsquo;s &lt;strong>forever free&lt;/strong> tier (no credit card required).&lt;/li>
&lt;li>Alternatively, you can run Qdrant on your own machine. This requires you to manage your own cluster and vector embedding infrastructure. If you prefer this option, check out the &lt;a href="https://qdrant.tech/documentation/tutorials-basics/search-beginners-local/">local deployment version of this tutorial&lt;/a>.&lt;/li>
&lt;/ul>
&lt;/blockquote>
&lt;h2 id="overview">Overview&lt;/h2>
&lt;p>If you are new to vector search engines, this tutorial is for you. In 5 minutes you will build a semantic search engine for science fiction books. After you set it up, you will ask the engine about an impending alien threat. Your creation will recommend books as preparation for a potential space attack.&lt;/p></description></item><item><title>Text Chunking Strategies</title><link>https://qdrant.tech/course/essentials/day-1/chunking-strategies/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-1/chunking-strategies/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 1 
&lt;/div>

&lt;h1 id="text-chunking-strategies">Text Chunking Strategies&lt;/h1>
&lt;div class="video">
&lt;iframe 
 src="https://www.youtube.com/embed/VNyA2nXqczk?si=Gs8Nepgf8q0XV8dp" 
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;br/>
&lt;p>So far we&amp;rsquo;ve talked about points &amp;ndash; what they&amp;rsquo;re made of, and how Qdrant compares them for approximate nearest neighbor search using distance metrics like cosine similarity, dot product, or Euclidean distance.&lt;/p>
&lt;p>But none of this matters until we give Qdrant something meaningful to compare. That brings us to the real beginning of the system.&lt;/p>
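The simplest way to produce those meaningful units is fixed-size chunking with overlap, the usual baseline before trying anything semantic. A minimal sketch (character-based; the chunk size and overlap values are illustrative, not recommendations):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks, each sharing
    `overlap` trailing characters with the next chunk so that
    sentences cut at a boundary still appear whole somewhere."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

text = "".join(chr(65 + i % 26) for i in range(500))  # stand-in for a real document
pieces = chunk_text(text, chunk_size=200, overlap=50)
```

Each chunk then becomes one point: embed it, attach the source document and offsets as payload, and upsert.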
&lt;p>&lt;strong>Disclaimer&lt;/strong>: In this section, we focus on text chunking. Although other types of data (images, videos, audio and code) can also be chunked, we are covering the basics of text chunking as it is the most popular type of data.&lt;/p></description></item><item><title>Scalar Quantization: Background, Practices &amp; More | Qdrant</title><link>https://qdrant.tech/articles/scalar-quantization/</link><pubDate>Mon, 27 Mar 2023 10:45:00 +0100</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/scalar-quantization/</guid><description>&lt;h1 id="efficiency-unleashed-the-power-of-scalar-quantization">Efficiency Unleashed: The Power of Scalar Quantization&lt;/h1>
&lt;p>High-dimensional vector embeddings can be memory-intensive, especially when working with
large datasets consisting of millions of vectors. Memory footprint really starts being
a concern when we scale things up. A simple choice of the data type used to store a single
number impacts even billions of numbers and can drive the memory requirements crazy. The
higher the precision of your type, the more accurately you can represent the numbers.
The more accurate your vectors, the more precise the distance calculation. But the
advantages stop paying off when you need to order more and more memory.&lt;/p></description></item><item><title>Agentic RAG with CrewAI</title><link>https://qdrant.tech/documentation/tutorials-build-essentials/agentic-rag-crewai-zoom/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-build-essentials/agentic-rag-crewai-zoom/</guid><description>&lt;!-- ![agentic-rag-crewai-zoom](/documentation/examples/agentic-rag-crewai-zoom/agentic-rag-1.png) -->
&lt;h1 id="qdrant-agentic-rag-system-with-crewai">Qdrant Agentic RAG System with CrewAI&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 45 min&lt;/th>
 &lt;th>Level: Beginner&lt;/th>
 &lt;th>Output: &lt;a href="https://github.com/qdrant/examples/tree/master/agentic_rag_zoom_crewai" target="_blank" rel="noopener nofollow">GitHub&lt;/a>&lt;/th>
 &lt;th>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>By combining the power of Qdrant for vector search and CrewAI for orchestrating modular agents, you can build systems that don&amp;rsquo;t just answer questions but analyze, interpret, and act.&lt;/p>
&lt;p>Traditional RAG systems focus on fetching data and generating responses, but they lack the ability to reason deeply or handle multi-step processes.&lt;/p>
&lt;p>In this tutorial, we&amp;rsquo;ll walk you through building an Agentic RAG system step by step. By the end, you&amp;rsquo;ll have a working framework for storing data in a Qdrant Vector Database and extracting insights using CrewAI agents in conjunction with Vector Search over your data.&lt;/p></description></item><item><title>API Reference</title><link>https://qdrant.tech/documentation/private-cloud/api-reference/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/private-cloud/api-reference/</guid><description>&lt;h1 id="api-reference">API Reference&lt;/h1>
&lt;h2 id="packages">Packages&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="#authqdrantiov1alpha1">auth.qdrant.io/v1alpha1&lt;/a>&lt;/li>
&lt;li>&lt;a href="#qdrantiov1">qdrant.io/v1&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="authqdrantiov1alpha1">auth.qdrant.io/v1alpha1&lt;/h2>
&lt;p>Package v1alpha1 contains API Schema definitions for the qdrant.io v1alpha1 API group&lt;/p>
&lt;h3 id="resource-types">Resource Types&lt;/h3>
&lt;ul>
&lt;li>&lt;a href="#apiauthentication">APIAuthentication&lt;/a>&lt;/li>
&lt;/ul>
&lt;h4 id="apiauthentication">APIAuthentication&lt;/h4>
&lt;p>APIAuthentication is a configuration for authenticating against Qdrant clusters.&lt;/p>
&lt;p>&lt;em>Appears in:&lt;/em>&lt;/p>
&lt;ul>
&lt;li>&lt;a href="#apiauthenticationlist">APIAuthenticationList&lt;/a>&lt;/li>
&lt;/ul>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Field&lt;/th>
 &lt;th>Description&lt;/th>
 &lt;th>Default&lt;/th>
 &lt;th>Validation&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;code>apiVersion&lt;/code> &lt;em>string&lt;/em>&lt;/td>
 &lt;td>&lt;code>auth.qdrant.io/v1alpha1&lt;/code>&lt;/td>
 &lt;td>&lt;/td>
 &lt;td>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>kind&lt;/code> &lt;em>string&lt;/em>&lt;/td>
 &lt;td>&lt;code>APIAuthentication&lt;/code>&lt;/td>
 &lt;td>&lt;/td>
 &lt;td>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>metadata&lt;/code> &lt;em>&lt;a href="https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#objectmeta-v1-meta" target="_blank" rel="noopener nofollow">ObjectMeta&lt;/a>&lt;/em>&lt;/td>
 &lt;td>Refer to Kubernetes API documentation for fields of &lt;code>metadata&lt;/code>.&lt;/td>
 &lt;td>&lt;/td>
 &lt;td>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>spec&lt;/code> &lt;em>&lt;a href="#apiauthenticationspec">APIAuthenticationSpec&lt;/a>&lt;/em>&lt;/td>
 &lt;td>&lt;/td>
 &lt;td>&lt;/td>
 &lt;td>&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h4 id="apiauthenticationspec">APIAuthenticationSpec&lt;/h4>
&lt;p>APIAuthenticationSpec describes the configuration for authenticating against Qdrant clusters.&lt;/p></description></item><item><title>Capacity Planning</title><link>https://qdrant.tech/documentation/operations/capacity-planning/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/operations/capacity-planning/</guid><description>&lt;h1 id="capacity-planning">Capacity Planning&lt;/h1>
&lt;p>When setting up your cluster, you&amp;rsquo;ll need to figure out the right balance of &lt;strong>RAM&lt;/strong> and &lt;strong>disk storage&lt;/strong>. The best setup depends on a few things:&lt;/p>
&lt;ul>
&lt;li>How many vectors you have and their dimensions.&lt;/li>
&lt;li>The amount of payload data you&amp;rsquo;re using and its indexes.&lt;/li>
&lt;li>What data you want to store in memory versus on disk.&lt;/li>
&lt;li>Your cluster&amp;rsquo;s replication settings.&lt;/li>
&lt;li>Whether you&amp;rsquo;re using quantization and how you&amp;rsquo;ve set it up.&lt;/li>
&lt;/ul>
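For the vector component alone, Qdrant's documentation suggests a rule of thumb of roughly `number_of_vectors × dimension × 4 bytes × 1.5`, where the extra 50% covers indexing overhead. A small sketch of that estimate (treat it as a starting point, not a guarantee; payload, replication, and quantization shift the number):

```python
def estimate_vector_ram_bytes(num_vectors, dim, bytes_per_value=4, overhead=1.5):
    """Rough RAM estimate for float32 vectors kept in memory,
    with ~50% overhead for the index and bookkeeping."""
    return int(num_vectors * dim * bytes_per_value * overhead)

# e.g. 1M vectors of 768 dimensions:
gib = estimate_vector_ram_bytes(1_000_000, 768) / 1024**3  # ~4.3 GiB
```

Replication multiplies this by the replication factor, and scalar quantization (int8) can cut the per-value cost roughly fourfold.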
&lt;h2 id="calculating-ram-size">Calculating RAM size&lt;/h2>
&lt;p>You should store frequently accessed data in RAM for faster retrieval. If you want to keep all vectors in memory for optimal performance, you can use this rough formula for estimation:&lt;/p></description></item><item><title>Changelog</title><link>https://qdrant.tech/documentation/private-cloud/changelog/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/private-cloud/changelog/</guid><description>&lt;h1 id="changelog">Changelog&lt;/h1>
&lt;h2 id="196-2026-02-19">1.9.6 (2026-02-19)&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Component&lt;/th>
 &lt;th>Version&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>qdrant-kubernetes-api&lt;/td>
 &lt;td>v1.23.0&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>operator&lt;/td>
 &lt;td>2.15.0&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>qdrant-cluster-manager&lt;/td>
 &lt;td>v0.3.17&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>qdrant-cluster-exporter&lt;/td>
 &lt;td>1.7.6&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>Latest validated Qdrant version: 1.17.0&lt;/p>
&lt;ul>
&lt;li>Support for Qdrant 1.17&lt;/li>
&lt;/ul>
&lt;h2 id="195-2026-02-18">1.9.5 (2026-02-18)&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Component&lt;/th>
 &lt;th>Version&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>qdrant-kubernetes-api&lt;/td>
 &lt;td>v1.23.0&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>operator&lt;/td>
 &lt;td>2.14.0&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>qdrant-cluster-manager&lt;/td>
 &lt;td>v0.3.17&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>qdrant-cluster-exporter&lt;/td>
 &lt;td>1.7.6&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>Latest validated Qdrant version: 1.16.3&lt;/p>
&lt;ul>
&lt;li>Performance and stability improvements&lt;/li>
&lt;li>Extended status information in QdrantCluster status&lt;/li>
&lt;li>Support for VolumeAttributeClasses for storage volumes&lt;/li>
&lt;li>Qdrant Pod zone added to QdrantCluster status&lt;/li>
&lt;li>Metrics are additionally exposed on a separate port to allow more granular access control through NetworkPolicies&lt;/li>
&lt;/ul>
&lt;h2 id="194-2026-01-15">1.9.4 (2026-01-15)&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Component&lt;/th>
 &lt;th>Version&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>qdrant-kubernetes-api&lt;/td>
 &lt;td>v1.22.1&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>operator&lt;/td>
 &lt;td>2.9.1&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>qdrant-cluster-manager&lt;/td>
 &lt;td>v0.3.15&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>qdrant-cluster-exporter&lt;/td>
 &lt;td>1.7.6&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>Latest validated Qdrant version: 1.16.3&lt;/p></description></item><item><title>Databricks Ingestion</title><link>https://qdrant.tech/documentation/send-data/databricks/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/send-data/databricks/</guid><description>&lt;h1 id="ingest-databricks-data-into-qdrant">Ingest Databricks Data into Qdrant&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 30 min&lt;/th>
 &lt;th>Level: Intermediate&lt;/th>
 &lt;th>&lt;a href="https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/4750876096379825/93425612168199/6949977306828869/latest.html" target="_blank" rel="noopener nofollow">Complete Notebook&lt;/a>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>&lt;a href="https://www.databricks.com/" target="_blank" rel="noopener nofollow">Databricks&lt;/a> is a unified analytics platform for working with big data and AI. It&amp;rsquo;s built around Apache Spark, a powerful open-source distributed computing system well-suited for processing large-scale datasets and performing complex analytics tasks.&lt;/p>
&lt;p>Apache Spark is designed to scale horizontally, meaning it can handle expensive operations like generating vector embeddings by distributing computation across a cluster of machines. This scalability is crucial when dealing with large datasets.&lt;/p></description></item><item><title>Demo: Implementing a Hybrid Search System</title><link>https://qdrant.tech/course/essentials/day-3/hybrid-search-demo/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-3/hybrid-search-demo/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 3 
&lt;/div>

&lt;h1 id="demo-implementing-a-hybrid-search-system">Demo: Implementing a Hybrid Search System&lt;/h1>
&lt;p>Build a complete hybrid search system with hands-on examples.&lt;/p>
&lt;div class="video">
&lt;iframe
 src="https://www.youtube.com/embed/zaQYa7oa1a8"
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;br/>
&lt;h2 id="what-youll-learn">What You&amp;rsquo;ll Learn&lt;/h2>
&lt;ul>
&lt;li>Step-by-step hybrid search implementation&lt;/li>
&lt;li>RRF algorithm in practice&lt;/li>
&lt;li>Performance optimization techniques&lt;/li>
&lt;li>Testing and evaluation methods&lt;/li>
&lt;/ul>
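&lt;p>The RRF fusion step listed above can be sketched in a few lines of plain Python. This is a toy illustration of the algorithm itself, not Qdrant's server-side implementation (the Universal Query API applies fusion for you), and the constant k=60 is only the conventional default:&lt;/p>

```python
def rrf_fuse(rankings, k=60):
    # Reciprocal Rank Fusion: each result list contributes 1 / (k + rank + 1)
    # to a document's score; documents ranked high in several lists win.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    # Sort IDs by fused score, best first.
    return sorted(scores, key=scores.get, reverse=True)

dense = ["a", "b", "c"]   # IDs from a dense-vector search, best first
sparse = ["b", "d", "a"]  # IDs from a sparse-vector search, best first
print(rrf_fuse([dense, sparse]))  # ['b', 'a', 'd', 'c']
```

&lt;p>Note that "b" wins even though neither list ranked it uniquely first: appearing near the top of both lists beats topping only one.&lt;/p>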
&lt;p>&lt;strong>Follow along in Colab:&lt;/strong> &lt;a href="https://colab.research.google.com/github/qdrant/examples/blob/master/course/day_3/hybrid_search/Introduction_to_Qdrant_Hybrid_Search_in_practice.ipynb">
&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" style="display:inline; margin:0;" alt="Open In Colab"/>
&lt;/a>&lt;/p>
&lt;h2 id="what-youll-discover">What You&amp;rsquo;ll Discover&lt;/h2>
&lt;p>In the previous lesson, you learned the theory behind hybrid search and the Universal Query API. Today you&amp;rsquo;ll implement it hands-on with a real dataset, comparing dense and sparse vector search and combining them using fusion algorithms.&lt;/p></description></item><item><title>Demo: Semantic Movie Search</title><link>https://qdrant.tech/course/essentials/day-1/movie-search-system/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-1/movie-search-system/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 1 
&lt;/div>

&lt;h1 id="demo-semantic-movie-search">Demo: Semantic Movie Search&lt;/h1>
&lt;p>Let&amp;rsquo;s synthesize everything we&amp;rsquo;ve learned today into a practical project: a semantic search engine for science fiction movies.&lt;/p>
&lt;div class="video">
&lt;iframe 
 src="https://www.youtube.com/embed/S7FsgzYwzJs?si=SCPzfprpB6aBUPEs"
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;br/>
&lt;p>&lt;strong>Follow along in Colab:&lt;/strong> &lt;a href="https://colab.research.google.com/github/qdrant/examples/blob/master/course/day_1/Semantic_Recommendation_System_for_Science_Fiction_Movies.ipynb">
&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" style="display:inline; margin:0;" alt="Open In Colab"/>
&lt;/a>&lt;/p>
&lt;h2 id="project-overview-when-search-understands-meaning">Project Overview: When Search Understands Meaning&lt;/h2>
&lt;p>Imagine asking a search engine: &lt;em>&amp;ldquo;Show me movies about questioning reality and the nature of existence&amp;rdquo;&lt;/em> and getting back &lt;em>The Matrix&lt;/em>, &lt;em>Inception&lt;/em>, and &lt;em>Ex Machina&lt;/em>, not because these titles contain those exact words, but because the system understands what these films are actually about.&lt;/p></description></item><item><title>Deployment Platforms</title><link>https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/</guid><description>&lt;h1 id="qdrant-hybrid-cloud-hosting-platforms--deployment-options">Qdrant Hybrid Cloud: Hosting Platforms &amp;amp; Deployment Options&lt;/h1>
&lt;p>This page provides an overview of how to deploy Qdrant Hybrid Cloud on various managed Kubernetes platforms.&lt;/p>
&lt;p>For a general list of prerequisites and installation steps, see our &lt;a href="https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/">Hybrid Cloud setup guide&lt;/a>. This platform-specific documentation also applies to Qdrant Private Cloud.&lt;/p>
&lt;p>&lt;img src="https://qdrant.tech/documentation/cloud/cloud-providers/akamai.jpg" alt="Akamai">&lt;/p>
&lt;h2 id="akamai-linode">Akamai (Linode)&lt;/h2>
&lt;p>&lt;a href="https://www.linode.com/products/kubernetes/" target="_blank" rel="noopener nofollow">The Linode Kubernetes Engine (LKE)&lt;/a> is a managed container orchestration engine built on top of Kubernetes. LKE enables you to quickly deploy and manage your containerized applications without needing to build (and maintain) your own Kubernetes cluster. All LKE instances are equipped with a fully managed control plane at no additional cost.&lt;/p></description></item><item><title>Evaluating Search Pipelines</title><link>https://qdrant.tech/course/multi-vector-search/module-3/evaluating-pipelines/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/multi-vector-search/module-3/evaluating-pipelines/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Module 3 
&lt;/div>

&lt;h1 id="evaluating-search-pipelines">Evaluating Search Pipelines&lt;/h1>
&lt;p>Throughout this module, you&amp;rsquo;ve learned many optimization techniques: quantization to reduce memory, pooling to compress representations, MUVERA for efficient indexing, and multi-stage retrieval to balance speed with accuracy. But how do you know which combination is right for &lt;em>your&lt;/em> data?&lt;/p>
&lt;p>The answer lies in systematic evaluation across three dimensions: &lt;strong>cost&lt;/strong> (memory and compute resources), &lt;strong>latency&lt;/strong> (query response time), and &lt;strong>quality&lt;/strong> (retrieval accuracy). Cost and latency are straightforward to measure - you can observe memory usage and time queries directly. Quality, however, requires a more principled approach: you need to measure whether your system returns the &lt;em>right&lt;/em> documents.&lt;/p></description></item><item><title>From Pinecone</title><link>https://qdrant.tech/documentation/migrate-to-qdrant/from-pinecone/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/migrate-to-qdrant/from-pinecone/</guid><description>&lt;h1 id="migrate-from-pinecone-to-qdrant">Migrate from Pinecone to Qdrant&lt;/h1>
&lt;h2 id="what-you-need-from-pinecone">What You Need from Pinecone&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>API key&lt;/strong> — from the &lt;a href="https://app.pinecone.io/" target="_blank" rel="noopener nofollow">Pinecone console&lt;/a>&lt;/li>
&lt;li>&lt;strong>Index name&lt;/strong> — the name of the index to migrate&lt;/li>
&lt;li>&lt;strong>Index host URL&lt;/strong> — the host endpoint shown in your index dashboard&lt;/li>
&lt;/ul>
&lt;aside role="status">Only Pinecone &lt;strong>serverless&lt;/strong> indexes support listing all vectors for migration. Legacy pod-based indexes may require additional steps.&lt;/aside>
&lt;h2 id="concept-mapping">Concept Mapping&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Pinecone&lt;/th>
 &lt;th style="text-align: left">Qdrant&lt;/th>
 &lt;th style="text-align: left">Notes&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">Index&lt;/td>
 &lt;td style="text-align: left">Collection&lt;/td>
 &lt;td style="text-align: left">One-to-one mapping&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Namespace&lt;/td>
 &lt;td style="text-align: left">Payload field or separate collection&lt;/td>
 &lt;td style="text-align: left">No direct equivalent — the tool migrates all namespaces. Use &lt;code>--pinecone.namespace&lt;/code> to migrate a specific one&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Metadata&lt;/td>
 &lt;td style="text-align: left">Payload&lt;/td>
 &lt;td style="text-align: left">Direct mapping&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Sparse values&lt;/td>
 &lt;td style="text-align: left">Sparse vectors&lt;/td>
 &lt;td style="text-align: left">Mapped to &lt;code>sparse_vector&lt;/code> named vector by default&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>cosine&lt;/code>&lt;/td>
 &lt;td style="text-align: left">&lt;code>Cosine&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Direct mapping&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>dotproduct&lt;/code>&lt;/td>
 &lt;td style="text-align: left">&lt;code>Dot&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Pinecone requires unit-normalized vectors for dotproduct&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>euclidean&lt;/code>&lt;/td>
 &lt;td style="text-align: left">&lt;code>Euclid&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Direct mapping&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="run-the-migration">Run the Migration&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">docker run --net&lt;span class="o">=&lt;/span>host --rm -it registry.cloud.qdrant.io/library/qdrant-migration pinecone &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --pinecone.index-host &lt;span class="s1">&amp;#39;https://your-index-host.pinecone.io&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --pinecone.index-name &lt;span class="s1">&amp;#39;your-index&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --pinecone.api-key &lt;span class="s1">&amp;#39;pcsk_...&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.url &lt;span class="s1">&amp;#39;https://your-instance.cloud.qdrant.io:6334&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.api-key &lt;span class="s1">&amp;#39;your-qdrant-api-key&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="migrating-a-specific-namespace">Migrating a Specific Namespace&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">docker run --net&lt;span class="o">=&lt;/span>host --rm -it registry.cloud.qdrant.io/library/qdrant-migration pinecone &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --pinecone.index-host &lt;span class="s1">&amp;#39;https://your-index-host.pinecone.io&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --pinecone.index-name &lt;span class="s1">&amp;#39;your-index&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --pinecone.api-key &lt;span class="s1">&amp;#39;pcsk_...&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --pinecone.namespace &lt;span class="s1">&amp;#39;my-namespace&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.url &lt;span class="s1">&amp;#39;https://your-instance.cloud.qdrant.io:6334&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.api-key &lt;span class="s1">&amp;#39;your-qdrant-api-key&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="all-pinecone-specific-flags">All Pinecone-Specific Flags&lt;/h3>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Flag&lt;/th>
 &lt;th style="text-align: left">Required&lt;/th>
 &lt;th style="text-align: left">Description&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--pinecone.index-name&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Yes&lt;/td>
 &lt;td style="text-align: left">Name of the Pinecone index&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--pinecone.index-host&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Yes&lt;/td>
 &lt;td style="text-align: left">Host URL of the Pinecone index&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--pinecone.api-key&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Yes&lt;/td>
 &lt;td style="text-align: left">Pinecone API key&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--pinecone.namespace&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Specific namespace to migrate&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--pinecone.service-host&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Custom Pinecone service host&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h3 id="qdrant-side-options">Qdrant-Side Options&lt;/h3>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Flag&lt;/th>
 &lt;th style="text-align: left">Default&lt;/th>
 &lt;th style="text-align: left">Description&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--qdrant.id-field&lt;/code>&lt;/td>
 &lt;td style="text-align: left">&lt;code>__id__&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Payload field name for original Pinecone IDs&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--qdrant.sparse-vector&lt;/code>&lt;/td>
 &lt;td style="text-align: left">&lt;code>sparse_vector&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Named vector for Pinecone sparse values&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="gotchas">Gotchas&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Score scaling:&lt;/strong> Pinecone cosine similarity returns values in [0, 1] (rescaled). Qdrant returns [-1, 1]. Rankings are identical, but raw scores won&amp;rsquo;t match.&lt;/li>
&lt;li>&lt;strong>Metadata size limits:&lt;/strong> Pinecone limits metadata to 40KB per vector. Qdrant has no per-payload size limit, so data is preserved as-is.&lt;/li>
&lt;li>&lt;strong>Namespace strategy:&lt;/strong> If you have multiple namespaces, decide upfront whether to merge them into a single Qdrant collection (using a &lt;code>namespace&lt;/code> payload field for filtering) or create separate collections.&lt;/li>
&lt;/ul>
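&lt;p>To make the score-scaling gotcha concrete, here is a sketch that maps a Qdrant cosine score back onto a Pinecone-style range. It assumes the rescaling is the simple linear map of [-1, 1] onto [0, 1]; verify against your own Pinecone results before relying on it:&lt;/p>

```python
def qdrant_to_pinecone_cosine(score):
    # Assumed linear rescaling of Qdrant's [-1, 1] cosine similarity
    # onto Pinecone's reported [0, 1] range. Rankings are unaffected;
    # only the raw numbers differ.
    return (score + 1.0) / 2.0

print(qdrant_to_pinecone_cosine(0.0))  # 0.5
```

&lt;p>Because the map is monotonic, any threshold you tuned against Pinecone scores can be translated the same way rather than re-tuned from scratch.&lt;/p>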
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;p>After migration, verify your data arrived correctly with the &lt;a href="https://qdrant.tech/documentation/migration-guidance/">Migration Verification Guide&lt;/a>.&lt;/p></description></item><item><title>GraphRAG with Qdrant and Neo4j</title><link>https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/</guid><description>&lt;h1 id="build-a-graphrag-agent-with-neo4j-and-qdrant">Build a GraphRAG Agent with Neo4j and Qdrant&lt;/h1>
&lt;p>&lt;img src="https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/image0.png" alt="image0">&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 30 min&lt;/th>
 &lt;th>Level: Intermediate&lt;/th>
 &lt;th>Output: &lt;a href="https://github.com/qdrant/examples/blob/master/graphrag_neo4j/graphrag.py" target="_blank" rel="noopener nofollow">GitHub&lt;/a>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>To make Artificial Intelligence (AI) systems more intelligent and reliable, we face a paradox: Large Language Models (LLMs) possess remarkable reasoning capabilities, yet they struggle to connect information in ways humans find intuitive. While groundbreaking, Retrieval-Augmented Generation (RAG) approaches often fall short when tasked with complex information synthesis. When asked to connect disparate pieces of information or understand holistic concepts across large documents, these systems frequently miss crucial connections that would be obvious to human experts.&lt;/p></description></item><item><title>Integrating with Superlinked</title><link>https://qdrant.tech/course/essentials/day-7/superlinked/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-7/superlinked/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 7 
&lt;/div>

&lt;h1 id="integrating-with-superlinked">Integrating with Superlinked&lt;/h1>
&lt;p>Advanced feature engineering for vector search applications.&lt;/p>

 &lt;div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
 &lt;iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/55iDpaHwKJo?autoplay=0&amp;amp;controls=1&amp;amp;end=0&amp;amp;loop=0&amp;amp;mute=0&amp;amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video">&lt;/iframe>
 &lt;/div>

&lt;h2 id="what-youll-learn">What You&amp;rsquo;ll Learn&lt;/h2>
&lt;ul>
&lt;li>Advanced feature engineering techniques&lt;/li>
&lt;li>Vector space optimization&lt;/li>
&lt;li>Multi-modal data handling&lt;/li>
&lt;li>Performance enhancement strategies&lt;/li>
&lt;/ul>
&lt;h2 id="mixture-of-encoders-architecture-in-superlinked">Mixture of Encoders Architecture in Superlinked&lt;/h2>
&lt;p>The Mixture of Encoders architecture is Superlinked&amp;rsquo;s modular system for combining multiple data-specific embedding models into one unified representation. It creates metadata-aware vector embeddings that integrate signals from text, images, popularity, user interaction, numbers, categories, and time, producing richer and more accurate results for search, retrieval, and recommendation tasks.&lt;/p></description></item><item><title>Multi-Vector Embeddings in Qdrant</title><link>https://qdrant.tech/course/multi-vector-search/module-1/multi-vector-in-qdrant/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/multi-vector-search/module-1/multi-vector-in-qdrant/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Module 1 
&lt;/div>

&lt;h1 id="multi-vector-embeddings-in-qdrant">Multi-Vector Embeddings in Qdrant&lt;/h1>
&lt;p>You&amp;rsquo;ve learned how MaxSim enables fine-grained token-level matching and explored both the benefits and challenges of multi-vector search. Now it&amp;rsquo;s time to put that knowledge into practice.&lt;/p>
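&lt;p>As a quick refresher, the MaxSim score can be sketched in a few lines of plain Python. This is a toy illustration with unrolled dot products; Qdrant computes MaxSim natively for multivector fields:&lt;/p>

```python
def maxsim(query_tokens, doc_tokens):
    # MaxSim: for every query token embedding, take the best (max) dot
    # product against all document token embeddings, then sum the maxima.
    total = 0.0
    for q in query_tokens:
        total += max(sum(qi * di for qi, di in zip(q, d)) for d in doc_tokens)
    return total

query = [[1.0, 0.0], [0.0, 1.0]]  # two 2-d query token embeddings
doc = [[1.0, 0.0], [0.5, 0.5]]    # two 2-d document token embeddings
print(maxsim(query, doc))  # 1.5
```

&lt;p>Each query token independently picks its best-matching document token, which is what gives late interaction its fine-grained, token-level matching behavior.&lt;/p>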
&lt;p>Qdrant provides first-class support for multi-vector embeddings, making it straightforward to build search systems that leverage late interaction. In this lesson, you&amp;rsquo;ll learn how to configure Qdrant collections for multi-vector search, index documents with token-level embeddings, and execute queries using MaxSim distance.&lt;/p></description></item><item><title>OpenLLMetry</title><link>https://qdrant.tech/documentation/observability/openllmetry/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/observability/openllmetry/</guid><description>&lt;h1 id="openllmetry">OpenLLMetry&lt;/h1>
&lt;p>OpenLLMetry from &lt;a href="https://www.traceloop.com/" target="_blank" rel="noopener nofollow">Traceloop&lt;/a> is a set of extensions built on top of &lt;a href="https://opentelemetry.io/" target="_blank" rel="noopener nofollow">OpenTelemetry&lt;/a> that gives you complete observability over your LLM application.&lt;/p>
&lt;p>OpenLLMetry supports instrumenting the &lt;code>qdrant_client&lt;/code> Python library and exporting the traces to various observability platforms, as described in their &lt;a href="https://www.traceloop.com/docs/openllmetry/integrations/introduction#the-integrations-catalog" target="_blank" rel="noopener nofollow">Integrations catalog&lt;/a>.&lt;/p>
&lt;p>This page assumes you&amp;rsquo;re using &lt;code>qdrant-client&lt;/code> version 1.7.3 or above.&lt;/p>
&lt;h2 id="usage">Usage&lt;/h2>
&lt;p>To set up OpenLLMetry, follow these steps:&lt;/p>
&lt;ol>
&lt;li>Install the SDK:&lt;/li>
&lt;/ol>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-console" data-lang="console">&lt;span class="line">&lt;span class="cl">&lt;span class="go">pip install traceloop-sdk
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;ol>
&lt;li>Instantiate the SDK:&lt;/li>
&lt;/ol>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">traceloop.sdk&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">Traceloop&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">Traceloop&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">init&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>You&amp;rsquo;re now tracing your &lt;code>qdrant_client&lt;/code> usage with OpenLLMetry!&lt;/p></description></item><item><title>Points</title><link>https://qdrant.tech/documentation/manage-data/points/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/manage-data/points/</guid><description>&lt;h1 id="points">Points&lt;/h1>
&lt;p>Points are the central entity that Qdrant operates with.
A point is a record consisting of a &lt;a href="https://qdrant.tech/documentation/manage-data/vectors/">vector&lt;/a> and an optional &lt;a href="https://qdrant.tech/documentation/manage-data/payload/">payload&lt;/a>.&lt;/p>
&lt;p>It looks like this:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-json" data-lang="json">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// This is a simple point
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;id&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">129&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;vector&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="mf">0.1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">0.2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">0.3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">0.4&lt;/span>&lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;payload&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">{&lt;/span>&lt;span class="nt">&amp;#34;color&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;red&amp;#34;&lt;/span>&lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>You can search among the points grouped in one &lt;a href="https://qdrant.tech/documentation/manage-data/collections/">collection&lt;/a> based on vector similarity.
This procedure is described in more detail in the &lt;a href="https://qdrant.tech/documentation/search/search/">search&lt;/a> and &lt;a href="https://qdrant.tech/documentation/search/filtering/">filtering&lt;/a> sections.&lt;/p>
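&lt;p>To make the idea concrete, here is a dependency-free toy sketch of similarity search over point records shaped like the example above. Qdrant itself answers such queries with an HNSW index rather than the linear scan shown here:&lt;/p>

```python
import math

# Toy "collection": records with an id, a vector, and a payload.
points = [
    {"id": 129, "vector": [0.1, 0.2, 0.3, 0.4], "payload": {"color": "red"}},
    {"id": 130, "vector": [0.9, 0.1, 0.0, 0.2], "payload": {"color": "blue"}},
]

def cosine(a, b):
    # Cosine similarity: normalized dot product of two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query, limit=1):
    # Linear scan: score every point and keep the most similar ones.
    return sorted(points, key=lambda p: cosine(query, p["vector"]), reverse=True)[:limit]

print(search([0.1, 0.2, 0.3, 0.4])[0]["id"])  # 129
```

&lt;p>The payload rides along with each result, which is what lets you combine similarity search with filtering.&lt;/p>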
&lt;p>This section explains how to create and manage vectors.&lt;/p></description></item><item><title>Pre-Migration Baseline</title><link>https://qdrant.tech/documentation/migration-guidance/pre-migration-baseline/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/migration-guidance/pre-migration-baseline/</guid><description>&lt;h1 id="pre-migration-baseline">Pre-Migration Baseline&lt;/h1>
&lt;p>Establishing a baseline is paramount for migration verification. If you don&amp;rsquo;t capture what &amp;ldquo;correct&amp;rdquo; looks like before you migrate, you have nothing to compare against afterward. This page covers what to record from your source system before starting the migration.&lt;/p>
&lt;h2 id="what-to-capture">What to Capture&lt;/h2>
&lt;p>A baseline should capture four pieces of information: a collection/index inventory, metadata samples, baseline search results, and system configuration snapshots.&lt;/p></description></item><item><title>Project: Building a Recommendation System</title><link>https://qdrant.tech/course/essentials/day-5/pitstop-project/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-5/pitstop-project/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 5 
&lt;/div>

&lt;h1 id="project-building-a-recommendation-system">Project: Building a Recommendation System&lt;/h1>
&lt;p>Bring together dense vectors, sparse vectors, and multivectors in one atomic Universal Query. You&amp;rsquo;ll retrieve candidates, fuse signals, rerank with ColBERT, and apply business filters - in a single request.&lt;/p>
&lt;h2 id="your-mission">Your Mission&lt;/h2>
&lt;p>Build a complete recommendation system using Qdrant&amp;rsquo;s Universal Query API with dense, sparse, and ColBERT multivectors in one request.&lt;/p>
&lt;p>&lt;strong>Estimated Time:&lt;/strong> 90 minutes&lt;/p>
&lt;h2 id="what-youll-build">What You&amp;rsquo;ll Build&lt;/h2>
&lt;p>A hybrid recommendation system using:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Multi-vector architecture&lt;/strong> with dense, sparse, and ColBERT vectors&lt;/li>
&lt;li>&lt;strong>Universal Query API&lt;/strong> for atomic multi-stage search&lt;/li>
&lt;li>&lt;strong>RRF fusion&lt;/strong> for combining candidates&lt;/li>
&lt;li>&lt;strong>ColBERT reranking&lt;/strong> for fine-grained relevance scoring&lt;/li>
&lt;li>&lt;strong>Business rule filtering&lt;/strong> at multiple pipeline stages&lt;/li>
&lt;li>&lt;strong>Production-ready patterns&lt;/strong> for recommendation systems&lt;/li>
&lt;/ul>
&lt;h2 id="setup">Setup&lt;/h2>
&lt;h3 id="prerequisites">Prerequisites&lt;/h3>
&lt;ul>
&lt;li>Qdrant Cloud cluster (URL + API key)&lt;/li>
&lt;li>Python 3.9+ (or Google Colab)&lt;/li>
&lt;li>Packages: &lt;code>qdrant-client&lt;/code>, &lt;code>fastembed&lt;/code>&lt;/li>
&lt;/ul>
&lt;h3 id="models">Models&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Dense&lt;/strong>: &lt;code>sentence-transformers/all-MiniLM-L6-v2&lt;/code> (384-dim)&lt;/li>
&lt;li>&lt;strong>Sparse&lt;/strong>: &lt;code>prithivida/Splade_PP_en_v1&lt;/code> (SPLADE)&lt;/li>
&lt;li>&lt;strong>Multivector&lt;/strong>: &lt;code>colbert-ir/colbertv2.0&lt;/code> (128-dim tokens)&lt;/li>
&lt;/ul>
&lt;h3 id="dataset">Dataset&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Scope&lt;/strong>: A small set of sample items (e.g., 10-20 movies).&lt;/li>
&lt;li>&lt;strong>Payload Fields&lt;/strong>: &lt;code>title&lt;/code>, &lt;code>description&lt;/code>, &lt;code>category&lt;/code>, &lt;code>genre&lt;/code>, &lt;code>year&lt;/code>, &lt;code>rating&lt;/code>, &lt;code>user_segment&lt;/code>, &lt;code>popularity_score&lt;/code>, &lt;code>release_date&lt;/code>.&lt;/li>
&lt;li>&lt;strong>Filters Used&lt;/strong>: &lt;code>category&lt;/code>, &lt;code>user_segment&lt;/code>, &lt;code>release_date&lt;/code>, &lt;code>popularity_score&lt;/code>.&lt;/li>
&lt;/ul>
&lt;h2 id="build-steps">Build Steps&lt;/h2>
&lt;h3 id="step-1-set-up-the-hybrid-collection">Step 1: Set Up the Hybrid Collection&lt;/h3>
&lt;h4 id="initialize-client-and-collection">Initialize Client and Collection&lt;/h4>
&lt;p>First, connect to Qdrant and create a clean collection for our recommendation system:&lt;/p></description></item><item><title>Project: HNSW Performance Benchmarking</title><link>https://qdrant.tech/course/essentials/day-2/pitstop-project/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-2/pitstop-project/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 2 
&lt;/div>

&lt;h1 id="project-hnsw-performance-benchmarking">Project: HNSW Performance Benchmarking&lt;/h1>
&lt;p>Now that you&amp;rsquo;ve seen how &lt;a href="https://qdrant.tech/articles/filterable-hnsw/">HNSW&lt;/a> parameters and payload indexes affect performance with the DBpedia dataset, it&amp;rsquo;s time to optimize for your own domain and use case.&lt;/p>
&lt;h2 id="your-mission">Your Mission&lt;/h2>
&lt;p>Build on your Day 1 search engine by adding performance optimization. You&amp;rsquo;ll discover which HNSW settings work best for your specific data and queries, and measure the real impact of payload indexing.&lt;/p>
&lt;p>&lt;strong>Estimated Time:&lt;/strong> 90 minutes&lt;/p></description></item><item><title>Project: Quantization Performance Optimization</title><link>https://qdrant.tech/course/essentials/day-4/pitstop-project/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-4/pitstop-project/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 4 
&lt;/div>

&lt;h1 id="project-quantization-performance-optimization">Project: Quantization Performance Optimization&lt;/h1>
&lt;p>Apply quantization techniques to your domain search engine and measure the real-world impact on speed, memory, and accuracy. You&amp;rsquo;ll discover how different quantization methods affect your specific use case and learn to optimize the accuracy recovery pipeline.&lt;/p>
&lt;h2 id="your-mission">Your Mission&lt;/h2>
&lt;p>Transform your search engine from previous days into a production-ready system by implementing quantization optimization. You&amp;rsquo;ll test different quantization methods, measure performance impacts, and tune the oversampling + rescoring pipeline for optimal results.&lt;/p></description></item><item><title>Search</title><link>https://qdrant.tech/documentation/search/search/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/search/search/</guid><description>&lt;h1 id="similarity-search">Similarity search&lt;/h1>
&lt;p>Searching for the nearest vectors is at the core of many representational learning applications.
Modern neural networks are trained to transform objects into vectors so that objects close in the real world appear close in vector space.
It could be, for example, texts with similar meanings, visually similar pictures, or songs of the same genre.&lt;/p>
&lt;figure>&lt;img src="https://qdrant.tech/docs/encoders.png"
 alt="This is how vector similarity works" width="70%">&lt;figcaption>
 &lt;p>This is how vector similarity works&lt;/p>
 &lt;/figcaption>
&lt;/figure>

&lt;h2 id="query-api">Query API&lt;/h2>
&lt;p>&lt;em>Available as of v1.10.0&lt;/em>&lt;/p></description></item><item><title>What is Qdrant?</title><link>https://qdrant.tech/documentation/overview/what-is-qdrant/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/overview/what-is-qdrant/</guid><description>&lt;h1 id="introduction">Introduction&lt;/h1>
&lt;p>Vector databases are a relatively new way of interacting with abstract data representations
derived from opaque machine learning models such as deep learning architectures. These
representations are often called vectors or embeddings and they are a compressed version of
the data used to train a machine learning model to accomplish a task like sentiment analysis,
speech recognition, object detection, and many others.&lt;/p>
&lt;p>These new databases shine in many applications like &lt;a href="https://en.wikipedia.org/wiki/Semantic_search" target="_blank" rel="noopener nofollow">semantic search&lt;/a>
and &lt;a href="https://en.wikipedia.org/wiki/Recommender_system" target="_blank" rel="noopener nofollow">recommendation systems&lt;/a>, and here, we&amp;rsquo;ll
learn about one of the most popular and fastest growing vector databases in the market, &lt;a href="https://github.com/qdrant/qdrant" target="_blank" rel="noopener nofollow">Qdrant&lt;/a>.&lt;/p></description></item><item><title>With Postgres</title><link>https://qdrant.tech/documentation/data-synchronization/with-postgres/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/data-synchronization/with-postgres/</guid><description>&lt;h1 id="keeping-postgres-and-qdrant-in-sync">Keeping Postgres and Qdrant in Sync&lt;/h1>
&lt;p>If you&amp;rsquo;ve migrated your vectors to Qdrant but still use Postgres as your source of truth, the next challenge is keeping both systems in sync as data changes.&lt;/p>
&lt;p>This guide covers three progressively robust sync architectures — from simple application-level dual-writes to production-grade Change Data Capture — with working code, failure mode analysis, and clear guidance on when to use each.&lt;/p>
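The first of those patterns, application-level dual-writes, can be sketched in a few lines. The write callables below are hypothetical stand-ins for real psycopg and qdrant-client calls; the point is the ordering and the failure mode, not the API.

```python
def dual_write(pg_write, qdrant_write, record):
    """Dual-write pattern: Postgres first (source of truth), then Qdrant.

    If the second write fails after the first succeeds, the two systems
    silently drift out of sync; that failure mode is why CDC-based
    architectures exist.
    """
    pg_write(record)      # 1. commit to the relational source of truth
    qdrant_write(record)  # 2. mirror into the vector store

# Demo with plain lists standing in for the two databases
pg_log, qdrant_log = [], []
dual_write(pg_log.append, qdrant_log.append, {"id": 1, "text": "hello"})
```

In production, the second step would at minimum be wrapped in retry and reconciliation logic.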
&lt;p>Not sure if you need a dedicated vector store alongside Postgres? Read our &lt;a href="https://qdrant.tech/blog/pgvector-tradeoffs/">pgvector tradeoffs blog post&lt;/a> to understand the six conditions under which pgvector is sufficient — and when you&amp;rsquo;ll outgrow it.&lt;/p></description></item><item><title>On Unstructured Data, Vector Databases, New AI Age, and Our Seed Round.</title><link>https://qdrant.tech/articles/seed-round/</link><pubDate>Wed, 19 Apr 2023 00:42:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/seed-round/</guid><description>&lt;blockquote>
&lt;p>Vector databases are here to stay. The New Age of AI is powered by vector embeddings, and vector databases are a foundational part of the stack. At Qdrant, we are working on cutting-edge open-source vector similarity search solutions to power fantastic AI applications with the best possible performance and excellent developer experience.&lt;/p>
&lt;p>Our 7.5M seed funding – led by &lt;a href="https://www.unusual.vc/" target="_blank" rel="noopener nofollow">Unusual Ventures&lt;/a>, awesome angels, and existing investors – will help us bring these innovations to engineers and empower them to make the most of their unstructured data and the awesome power of LLMs at any scale.&lt;/p></description></item><item><title>Using LangChain for Question Answering with Qdrant</title><link>https://qdrant.tech/articles/langchain-integration/</link><pubDate>Tue, 31 Jan 2023 10:53:20 +0100</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/langchain-integration/</guid><description>&lt;h1 id="streamlining-question-answering-simplifying-integration-with-langchain-and-qdrant">Streamlining Question Answering: Simplifying Integration with LangChain and Qdrant&lt;/h1>
&lt;p>Building applications with Large Language Models doesn&amp;rsquo;t have to be complicated. A lot has been going on recently to simplify the development,
so you can utilize already pre-trained models and support even complex pipelines with a few lines of code. &lt;a href="https://langchain.readthedocs.io" target="_blank" rel="noopener nofollow">LangChain&lt;/a>
provides unified interfaces to different libraries, so you can avoid writing boilerplate code and focus on the value you want to bring.&lt;/p></description></item><item><title>Integrating with LlamaIndex</title><link>https://qdrant.tech/course/essentials/day-7/llamaindex/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-7/llamaindex/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 7 
&lt;/div>

&lt;h1 id="integrating-with-llamaindex">Integrating with LlamaIndex&lt;/h1>
&lt;p>Data framework for building LLM applications with Qdrant.&lt;/p>

 &lt;div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
 &lt;iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/ytWskQWsAA4?autoplay=0&amp;amp;controls=1&amp;amp;end=0&amp;amp;loop=0&amp;amp;mute=0&amp;amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video">&lt;/iframe>
 &lt;/div>

&lt;h2 id="what-youll-learn">What You&amp;rsquo;ll Learn&lt;/h2>
&lt;ul>
&lt;li>Building data pipelines with LlamaIndex&lt;/li>
&lt;li>Connecting LlamaIndex to Qdrant&lt;/li>
&lt;li>Query engines and retrieval strategies&lt;/li>
&lt;li>Advanced RAG patterns with LlamaIndex&lt;/li>
&lt;li>Agent workflows and function calling&lt;/li>
&lt;li>Custom workflow development with events and steps&lt;/li>
&lt;li>Qdrant Cloud integration with Llama Cloud&lt;/li>
&lt;li>Real-world data ingestion and processing&lt;/li>
&lt;/ul>
&lt;h2 id="llamaindex-agent-development-framework">LlamaIndex Agent Development Framework&lt;/h2>
&lt;p>LlamaIndex provides a comprehensive framework for building sophisticated LLM applications with Qdrant integration. The platform supports multiple deployment options including local Qdrant instances, Qdrant Cloud, and Llama Cloud with Qdrant Cloud synchronization, enabling flexible and scalable agent development.&lt;/p></description></item><item><title>Project: Building a Hybrid Search Engine</title><link>https://qdrant.tech/course/essentials/day-3/pitstop-project/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-3/pitstop-project/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 3 
&lt;/div>

&lt;h1 id="project-building-a-hybrid-search-engine">Project: Building a Hybrid Search Engine&lt;/h1>
&lt;p>Build a hybrid system that combines dense and sparse vectors with Reciprocal Rank Fusion, demonstrating how to get the best of both semantic understanding and keyword precision.&lt;/p>
&lt;h2 id="your-mission">Your Mission&lt;/h2>
&lt;p>Create a production-ready hybrid search system that leverages both dense and sparse vectors to deliver superior search results. You&amp;rsquo;ll implement the complete hybrid pipeline and compare its performance against single-vector approaches.&lt;/p>
&lt;p>&lt;strong>Estimated Time:&lt;/strong> 75 minutes&lt;/p></description></item><item><title>Project: Building a Semantic Search Engine</title><link>https://qdrant.tech/course/essentials/day-1/pitstop-project/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-1/pitstop-project/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 1 
&lt;/div>

&lt;h1 id="project-building-a-semantic-search-engine">Project: Building a Semantic Search Engine&lt;/h1>
&lt;p>Now that you&amp;rsquo;ve seen how semantic search works with movies, it&amp;rsquo;s time to build your own. Choose a domain you care about and create a search engine that understands meaning, not just keywords.&lt;/p>
&lt;h2 id="your-mission">Your Mission&lt;/h2>
&lt;p>Build a semantic search engine for a topic of your choice. You&amp;rsquo;ll discover how chunking strategy affects search quality in your specific domain.&lt;/p>
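One of the simplest chunking strategies to start from is fixed-size chunks with overlap. This tiny helper is only a starting point (the sizes are arbitrary), and you will likely replace it with sentence- or paragraph-aware chunking for your domain:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks with overlap."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# 560 characters of toy text produce 4 overlapping chunks
chunks = chunk_text("Semantic search understands meaning, not just keywords. " * 10)
```

Varying chunk_size and overlap, then re-indexing and re-querying, is the experiment that reveals how chunking affects search quality in your domain.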
&lt;p>&lt;strong>Estimated Time:&lt;/strong> 120 minutes&lt;/p></description></item><item><title>Minimal RAM you need to serve a million vectors</title><link>https://qdrant.tech/articles/memory-consumption/</link><pubDate>Wed, 07 Dec 2022 10:18:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/memory-consumption/</guid><description>&lt;!--
1. How people usually measure memory and why it might be misleading
2. How to properly measure memory
3. Try different configurations of Qdrant and see how they affect the memory consumption and search speed
4. Conclusion
-->
&lt;!--
Introduction:

1. We are used to measuring memory consumption by looking at `htop`. But it can be misleading.
2. There are multiple reasons why it is wrong:
 1. Process may allocate memory, but not use it.
 2. Process may not free deallocated memory.
 3. Process might be forked and memory is shared between processes.
 4. Process may use disk cache.
3. As a result, if you see `10GB` memory consumption in `htop`, it doesn't mean that your process actually needs `10GB` of RAM to work.
-->
&lt;p>When it comes to measuring the memory consumption of our processes, we often rely on tools such as &lt;code>htop&lt;/code> to give us an indication of how much RAM is being used. However, this method can be misleading and doesn&amp;rsquo;t always accurately reflect the true memory usage of a process.&lt;/p></description></item><item><title>Question Answering as a Service with Cohere and Qdrant</title><link>https://qdrant.tech/articles/qa-with-cohere-and-qdrant/</link><pubDate>Tue, 29 Nov 2022 15:45:00 +0100</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/qa-with-cohere-and-qdrant/</guid><description>&lt;p>Bi-encoders are probably the most efficient way of setting up a semantic Question Answering system.
This architecture relies on the same neural model that creates vector embeddings for both questions and answers.
The assumption is that both question and answer have representations close to each other in the latent space,
because they both describe the same semantic concept. That doesn&amp;rsquo;t apply
to answers like &amp;ldquo;Yes&amp;rdquo; or &amp;ldquo;No&amp;rdquo;, but standard FAQ-like problems are a bit easier, as there is typically
an overlap between both texts. Not necessarily in terms of wording, but in their semantics.&lt;/p></description></item><item><title>Final Project: Build Your Own Multi-Vector Search System</title><link>https://qdrant.tech/course/multi-vector-search/module-3/final-project/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/multi-vector-search/module-3/final-project/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Module 3 
&lt;/div>

&lt;h1 id="final-project-build-your-own-multi-vector-search-system">Final Project: Build Your Own Multi-Vector Search System&lt;/h1>
&lt;hr>
&lt;h2 id="your-mission">Your Mission&lt;/h2>
&lt;p>It&amp;rsquo;s time to bring together everything you&amp;rsquo;ve learned about multi-vector search, late interaction models, and production optimization. You&amp;rsquo;ll build a sophisticated document retrieval system that leverages late interaction&amp;rsquo;s token-level matching for superior search quality.&lt;/p>
&lt;p>Your search engine will understand the nuanced relationships between query terms and document content. When someone searches for &amp;ldquo;machine learning applications in healthcare,&amp;rdquo; your system will find documents that discuss relevant concepts even when they use different terminology, thanks to late interaction&amp;rsquo;s fine-grained matching.&lt;/p></description></item><item><title>Integrating with Quotient</title><link>https://qdrant.tech/course/essentials/day-7/quotient/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-7/quotient/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 7 
&lt;/div>

&lt;h1 id="integrating-with-quotient">Integrating with Quotient&lt;/h1>
&lt;p>Advanced analytics with vector data using Quotient platform.&lt;/p>

 &lt;div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
 &lt;iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/QeQuCsh1SHs?autoplay=0&amp;amp;controls=1&amp;amp;end=0&amp;amp;loop=0&amp;amp;mute=0&amp;amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video">&lt;/iframe>
 &lt;/div>

&lt;h2 id="what-youll-learn">What You&amp;rsquo;ll Learn&lt;/h2>
&lt;ul>
&lt;li>Analytics platform integration&lt;/li>
&lt;li>Vector data analysis techniques&lt;/li>
&lt;li>Business intelligence applications&lt;/li>
&lt;li>Reporting and visualization&lt;/li>
&lt;li>RAG monitoring and quality assurance&lt;/li>
&lt;li>AI application monitoring and debugging&lt;/li>
&lt;li>Hallucination detection and document relevance scoring&lt;/li>
&lt;/ul>
&lt;h2 id="quotient-ai-monitoring-platform">Quotient AI Monitoring Platform&lt;/h2>
&lt;p>Quotient AI provides critical monitoring capabilities for AI applications and agents, automatically detecting quality issues and providing comprehensive insights into system performance. The platform serves as an essential monitoring layer for RAG (Retrieval Augmented Generation) applications, helping maintain reliability and enabling effective debugging of AI systems.&lt;/p></description></item><item><title>Introducing Qdrant 1.2.x</title><link>https://qdrant.tech/articles/qdrant-1.2.x/</link><pubDate>Wed, 24 May 2023 10:45:00 +0200</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/qdrant-1.2.x/</guid><description>&lt;p>A brand-new Qdrant 1.2 release comes packed with a plethora of new features, some of which
were highly requested by our users. If you want to shape the development of the Qdrant vector
database, please &lt;a href="https://qdrant.to/discord" target="_blank" rel="noopener nofollow">join our Discord community&lt;/a> and let us know
how you use it!&lt;/p>
&lt;h2 id="new-features">New features&lt;/h2>
&lt;p>As usual, a minor version update of Qdrant brings some interesting new features. We love to see your
feedback, and we tried to include the features most requested by our community.&lt;/p></description></item><item><title>Finding errors in datasets with Similarity Search</title><link>https://qdrant.tech/articles/dataset-quality/</link><pubDate>Mon, 18 Jul 2022 10:18:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/dataset-quality/</guid><description>&lt;p>Nowadays, people create a huge number of applications of various types and solve problems in different areas.
Despite such diversity, they have something in common: they need to process data.
Real-world data is a living structure; it grows day by day, changes a lot, and becomes harder to work with.&lt;/p>
&lt;p>In some cases, you need to categorize or label your data, which can be a tough problem given its scale.
The process of splitting or labelling is error-prone and these errors can be very costly.
Imagine that you failed to achieve the desired quality of the model due to inaccurate labels.
Worse, your users are faced with a lot of irrelevant items, unable to find what they need and getting annoyed by it.
Thus, you get poor retention, and it directly impacts company revenue.
It is really important to avoid such errors in your data.&lt;/p></description></item><item><title>Integrating with Camel AI</title><link>https://qdrant.tech/course/essentials/day-7/camel/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-7/camel/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 7 
&lt;/div>

&lt;h1 id="integrating-with-camel-ai">Integrating with Camel AI&lt;/h1>
&lt;p>Agentic RAG with multi-agent systems using Camel AI and Qdrant.&lt;/p>

 &lt;div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
 &lt;iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/Kz59XG_blY8?autoplay=0&amp;amp;controls=1&amp;amp;end=0&amp;amp;loop=0&amp;amp;mute=0&amp;amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video">&lt;/iframe>
 &lt;/div>

&lt;h2 id="what-youll-learn">What You&amp;rsquo;ll Learn&lt;/h2>
&lt;ul>
&lt;li>Multi-agent system architectures&lt;/li>
&lt;li>Agentic RAG patterns and best practices&lt;/li>
&lt;li>Agent collaboration and communication&lt;/li>
&lt;li>Building autonomous AI systems with Qdrant&lt;/li>
&lt;li>Auto-Retrieval with CAMEL for automated RAG processes&lt;/li>
&lt;li>Discord bot integration with vector databases&lt;/li>
&lt;/ul>
&lt;h2 id="camel-auto-retrieval-architecture">CAMEL Auto-Retrieval Architecture&lt;/h2>
&lt;p>CAMEL (Communicative Agents for &amp;ldquo;Mind&amp;rdquo; Exploration of Large Language Model Society) provides an advanced framework for building multi-agent systems with automated RAG capabilities. The Auto-Retrieval module streamlines the process of expanding agent capabilities by automatically handling context retrieval from vector databases like Qdrant.&lt;/p></description></item><item><title>Q&amp;A with Similarity Learning</title><link>https://qdrant.tech/articles/faq-question-answering/</link><pubDate>Tue, 28 Jun 2022 08:57:07 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/faq-question-answering/</guid><description>&lt;h1 id="question-answering-system-with-similarity-learning-and-quaterion">Question-answering system with Similarity Learning and Quaterion&lt;/h1>
&lt;p>Many problems in modern machine learning are approached as classification tasks.
Some are classification tasks by design, but others are artificially transformed into such.
And when you apply an approach that does not naturally fit your problem, you risk ending up with over-complicated or bulky solutions.
In some cases, you would even get worse performance.&lt;/p>
&lt;p>Imagine that you got a new task and decided to solve it with a good old classification approach.
Firstly, you will need labeled data.
If it came on a plate with the task, you&amp;rsquo;re lucky, but if it didn&amp;rsquo;t, you might need to label it manually.
And I guess you are already familiar with how painful it might be.&lt;/p></description></item><item><title>Integrating with Jina AI</title><link>https://qdrant.tech/course/essentials/day-7/jina/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/essentials/day-7/jina/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://qdrant.tech/icons/outline/date-blue.svg" alt="Calendar" /> Day 7 
&lt;/div>

&lt;h1 id="integrating-with-jina-ai">Integrating with Jina AI&lt;/h1>
&lt;p>Advanced multimodal embeddings with Jina AI and Qdrant.&lt;/p>

 &lt;div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
 &lt;iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/lJ7mkvHETfg?autoplay=0&amp;amp;controls=1&amp;amp;end=0&amp;amp;loop=0&amp;amp;mute=0&amp;amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video">&lt;/iframe>
 &lt;/div>

&lt;h2 id="what-youll-learn">What You&amp;rsquo;ll Learn&lt;/h2>
&lt;ul>
&lt;li>Jina Embeddings v4 model capabilities&lt;/li>
&lt;li>Multimodal text and image embeddings&lt;/li>
&lt;li>Multi-vector embeddings for enhanced performance&lt;/li>
&lt;li>API integration and self-hosting options&lt;/li>
&lt;li>Text-to-image retrieval systems&lt;/li>
&lt;li>Late chunking for long documents&lt;/li>
&lt;li>Performance optimization strategies&lt;/li>
&lt;/ul>
&lt;h2 id="jina-ai-multimodal-embeddings">Jina AI Multimodal Embeddings&lt;/h2>
&lt;p>Jina AI provides state-of-the-art deep neural networks for transforming text and images into high-quality vector representations. The Jina Embeddings v4 model represents a breakthrough in multimodal embedding technology, enabling seamless integration of text and image data within a unified vector space for sophisticated search and retrieval applications.&lt;/p></description></item><item><title>Why Rust?</title><link>https://qdrant.tech/articles/why-rust/</link><pubDate>Thu, 11 May 2023 10:00:00 +0100</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/why-rust/</guid><description>&lt;h1 id="building-qdrant-in-rust">Building Qdrant in Rust&lt;/h1>
&lt;p>Looking at the &lt;a href="https://github.com/qdrant/qdrant" target="_blank" rel="noopener nofollow">GitHub repository&lt;/a>, you can see that Qdrant is built in &lt;a href="https://rust-lang.org" target="_blank" rel="noopener nofollow">Rust&lt;/a>. Other offerings may be written in C++, Go, Java or even Python. So why did Qdrant choose Rust? Our founder Andrey had built the first prototype in C++, but didn’t trust his command of the language to scale to a production system (to be frank, he likened it to cutting his leg off). He was well versed in Java and Scala and also knew some Python. However, he considered neither a good fit:&lt;/p></description></item><item><title>Layer Recycling and Fine-tuning Efficiency</title><link>https://qdrant.tech/articles/embedding-recycler/</link><pubDate>Tue, 23 Aug 2022 13:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/embedding-recycler/</guid><description>&lt;p>A recent &lt;a href="https://arxiv.org/abs/2207.04993" target="_blank" rel="noopener nofollow">paper&lt;/a>
by Allen AI has attracted attention in the NLP community as they cache the output of a certain intermediate layer
in the training and inference phases to achieve a speedup of ~83%
with a negligible loss in model performance.
This technique is quite similar to &lt;a href="https://quaterion.qdrant.tech/tutorials/cache_tutorial.html" target="_blank" rel="noopener nofollow">the caching mechanism in Quaterion&lt;/a>,
but the latter is intended for any data modality, while the former focuses only on language models,
although it presents important insights from its experiments.
In this post, I will combine our findings with theirs,
hoping to provide the community with a wider perspective on layer recycling.&lt;/p></description></item><item><title>Fine Tuning Similar Cars Search</title><link>https://qdrant.tech/articles/cars-recognition/</link><pubDate>Tue, 28 Jun 2022 13:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/cars-recognition/</guid><description>&lt;p>Supervised classification is one of the most widely used training objectives in machine learning,
but not every task can be defined as such. For example,&lt;/p>
&lt;ol>
&lt;li>Your classes may change quickly, e.g., new classes may be added over time,&lt;/li>
&lt;li>You may not have samples from every possible category,&lt;/li>
&lt;li>It may be impossible to enumerate all the possible classes during the training time,&lt;/li>
&lt;li>You may have an essentially different task, e.g., search or retrieval.&lt;/li>
&lt;/ol>
&lt;p>All such problems may be efficiently solved with similarity learning.&lt;/p></description></item><item><title>Benchmarks F.A.Q.</title><link>https://qdrant.tech/benchmarks/benchmark-faq/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/benchmarks/benchmark-faq/</guid><description>&lt;h1 id="benchmarks-faq">Benchmarks F.A.Q.&lt;/h1>
&lt;h2 id="are-we-biased">Are we biased?&lt;/h2>
&lt;p>Probably, yes. Even if we try to be objective, we are not experts in using all the existing vector search engines.
We build Qdrant and know the most about it.
Due to that, we could have missed some important tweaks in different vector search engines.&lt;/p>
&lt;p>However, we tried our best, kept scrolling the docs up and down, experimented with combinations of different configurations, and gave all of them an equal chance to stand out. If you believe you can do it better than us, our &lt;strong>benchmarks are fully &lt;a href="https://github.com/qdrant/vector-db-benchmark" target="_blank" rel="noopener nofollow">open-sourced&lt;/a>, and contributions are welcome&lt;/strong>!&lt;/p></description></item><item><title>Data Integrity</title><link>https://qdrant.tech/documentation/migration-guidance/data-integrity/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/migration-guidance/data-integrity/</guid><description>&lt;h1 id="data-integrity-verification">Data Integrity Verification&lt;/h1>
&lt;p>Once you&amp;rsquo;ve established a &lt;a href="https://qdrant.tech/documentation/migration-guidance/pre-migration-baseline/">baseline&lt;/a>, you first need to check data integrity. Data integrity answers the question: &amp;ldquo;Did all my data arrive, and did it arrive correctly?&amp;rdquo; These are the fastest checks to run and catch the most common migration failures.&lt;/p>
&lt;h2 id="1-vector-count-verification">1. Vector Count Verification&lt;/h2>
&lt;p>The simplest check: does the number of vectors in Qdrant match your source system?&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">qdrant_client&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">QdrantClient&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">client&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">QdrantClient&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;localhost&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">port&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">6333&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Get collection info&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">collection_info&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">client&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">get_collection&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;your_collection&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">qdrant_count&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">collection_info&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">points_count&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Compare against baseline&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">source_count&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">baseline&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;total_vector_count&amp;#34;&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="c1"># From pre-migration capture&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">if&lt;/span> &lt;span class="n">qdrant_count&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="n">source_count&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;✓ Vector count matches: &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">qdrant_count&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">else&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">diff&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">source_count&lt;/span> &lt;span class="o">-&lt;/span> &lt;span class="n">qdrant_count&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pct&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">diff&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="n">source_count&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">100&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;✗ Count mismatch: source=&lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">source_count&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">, qdrant=&lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">qdrant_count&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">, &amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;missing=&lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">diff&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2"> (&lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">pct&lt;/span>&lt;span class="si">:&lt;/span>&lt;span class="s2">.2f&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">%)&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;strong>Common causes of count mismatches:&lt;/strong>&lt;/p></description></item><item><title>Filtering</title><link>https://qdrant.tech/documentation/search/filtering/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/search/filtering/</guid><description>&lt;h1 id="filtering">Filtering&lt;/h1>
&lt;p>With Qdrant, you can set conditions when searching or retrieving points.
For example, you can impose conditions on both the &lt;a href="https://qdrant.tech/documentation/manage-data/payload/">payload&lt;/a> and the &lt;code>id&lt;/code> of the point.&lt;/p>
&lt;p>Setting additional conditions is important when it is impossible to express all the features of the object in the embedding.
Examples include a variety of business requirements: stock availability, user location, or desired price range.&lt;/p>
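Such combined conditions are expressed as a filter object in the request body. A minimal sketch of the JSON shape, using hypothetical `city`, `price`, and `in_stock` payload fields (field names are illustrative, not from any real collection):

```python
import json

# Hedged sketch of a Qdrant-style filter body: match points that are either
# (located in London AND under the price limit) OR explicitly marked in stock.
# Filters can nest recursively, so "should" can contain whole sub-filters.
filter_body = {
    "should": [
        {
            "must": [
                {"key": "city", "match": {"value": "London"}},
                {"key": "price", "range": {"lt": 100}},
            ]
        },
        {"key": "in_stock", "match": {"value": True}},
    ]
}
print(json.dumps(filter_body, indent=2))
```

This body would be passed as the `filter` field of a search or scroll request.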
&lt;h2 id="related-content">Related Content&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>&lt;a href="https://qdrant.tech/articles/vector-search-filtering/">A Complete Guide to Filtering in Vector Search&lt;/a>&lt;/th>
 &lt;th>Developer advice on proper usage and advanced practices.&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="filtering-clauses">Filtering clauses&lt;/h2>
&lt;p>Qdrant allows you to combine conditions in clauses.
Clauses are different logical operations, such as &lt;code>OR&lt;/code>, &lt;code>AND&lt;/code>, and &lt;code>NOT&lt;/code>.
Clauses can be recursively nested into each other so that you can reproduce an arbitrary boolean expression.&lt;/p></description></item><item><title>From Weaviate</title><link>https://qdrant.tech/documentation/migrate-to-qdrant/from-weaviate/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/migrate-to-qdrant/from-weaviate/</guid><description>&lt;h1 id="migrate-from-weaviate-to-qdrant">Migrate from Weaviate to Qdrant&lt;/h1>
&lt;h2 id="what-you-need-from-weaviate">What You Need from Weaviate&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Host URL&lt;/strong> — the Weaviate instance address&lt;/li>
&lt;li>&lt;strong>Class name&lt;/strong> — the class to migrate&lt;/li>
&lt;li>&lt;strong>Authentication&lt;/strong> — API key, username/password, or bearer token depending on your setup&lt;/li>
&lt;li>&lt;strong>Vector dimensions&lt;/strong> — Weaviate does not expose vector dimensions through its API, so you must know this value&lt;/li>
&lt;/ul>
&lt;aside role="alert">&lt;strong>Important:&lt;/strong> Because Weaviate does not expose vector dimensions, the migration tool cannot auto-create the Qdrant collection. You must create the collection manually before running the migration.&lt;/aside>
&lt;h2 id="pre-create-your-qdrant-collection">Pre-Create Your Qdrant Collection&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">curl -X PUT &lt;span class="s1">&amp;#39;https://your-instance.cloud.qdrant.io:6333/collections/your-collection&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> -H &lt;span class="s1">&amp;#39;api-key: your-qdrant-api-key&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> -H &lt;span class="s1">&amp;#39;Content-Type: application/json&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> -d &lt;span class="s1">&amp;#39;{
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s1"> &amp;#34;vectors&amp;#34;: {
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s1"> &amp;#34;size&amp;#34;: 384,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s1"> &amp;#34;distance&amp;#34;: &amp;#34;Cosine&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s1"> }
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s1"> }&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Replace &lt;code>384&lt;/code> with your actual vector dimensions. Set the distance metric to match your Weaviate configuration.&lt;/p></description></item><item><title>Installation</title><link>https://qdrant.tech/documentation/operations/installation/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/operations/installation/</guid><description>&lt;h1 id="installation-requirements">Installation requirements&lt;/h1>
&lt;p>The following sections describe the requirements for deploying Qdrant.&lt;/p>
&lt;h2 id="cpu-and-memory">CPU and memory&lt;/h2>
&lt;p>The preferred size of your CPU and RAM depends on:&lt;/p>
&lt;ul>
&lt;li>Number of vectors&lt;/li>
&lt;li>Vector dimensions&lt;/li>
&lt;li>&lt;a href="https://qdrant.tech/documentation/manage-data/payload/">Payloads&lt;/a> and their indexes&lt;/li>
&lt;li>Storage&lt;/li>
&lt;li>Replication&lt;/li>
&lt;li>How you configure quantization&lt;/li>
&lt;/ul>
&lt;p>Our &lt;a href="https://cloud.qdrant.io/calculator" target="_blank" rel="noopener nofollow">Cloud Pricing Calculator&lt;/a> can help you estimate required resources; note that it does not account for payload or index data.&lt;/p>
&lt;h3 id="supported-cpu-architectures">Supported CPU architectures&lt;/h3>
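For a rough back-of-envelope RAM estimate from the first two factors, a common rule of thumb is 4 bytes per float32 dimension plus roughly 50% overhead; actual usage depends on payloads, indexes, replication, and quantization:

```python
# Rule-of-thumb RAM estimate for dense float32 vectors (a sketch, not exact:
# payloads, indexes, and quantization change the real footprint).
def estimate_memory_bytes(num_vectors: int, dims: int, overhead: float = 1.5) -> int:
    return int(num_vectors * dims * 4 * overhead)

# e.g. 1M vectors of 768 dimensions
print(estimate_memory_bytes(1_000_000, 768) / 1e9, "GB")
```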
&lt;p>&lt;strong>64-bit system:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>x86_64/amd64&lt;/li>
&lt;li>AArch64/arm64&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>32-bit system:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Not supported&lt;/li>
&lt;/ul>
&lt;h3 id="storage">Storage&lt;/h3>
&lt;p>For persistent storage, Qdrant requires block-level access to storage devices with a &lt;a href="https://www.quobyte.com/storage-explained/posix-filesystem/" target="_blank" rel="noopener nofollow">POSIX-compatible file system&lt;/a>. Network systems such as &lt;a href="https://en.wikipedia.org/wiki/ISCSI" target="_blank" rel="noopener nofollow">iSCSI&lt;/a> that provide block-level access are also acceptable.
Qdrant won&amp;rsquo;t work with &lt;a href="https://en.wikipedia.org/wiki/File_system#Network_file_systems" target="_blank" rel="noopener nofollow">Network file systems&lt;/a> such as NFS, or &lt;a href="https://en.wikipedia.org/wiki/Object_storage" target="_blank" rel="noopener nofollow">Object storage&lt;/a> systems such as S3.&lt;/p></description></item><item><title>LLM-Powered Filter Automation</title><link>https://qdrant.tech/documentation/search-precision/automate-filtering-with-llms/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/search-precision/automate-filtering-with-llms/</guid><description>&lt;h1 id="llm-powered-filter-automation-with-qdrant">LLM-Powered Filter Automation with Qdrant&lt;/h1>
&lt;p>Our &lt;a href="https://qdrant.tech/articles/vector-search-filtering/">complete guide to filtering in vector search&lt;/a> describes why filtering is
important, and how to implement it with Qdrant. However, applying filters is easier when you build an application
with a traditional interface. Your UI may contain a form with checkboxes, sliders, and other elements that users can
use to set their criteria. But what if you want to build a RAG-powered application with just the conversational
interface, or even voice commands? In this case, you need to automate the filtering process!&lt;/p></description></item><item><title>OpenLIT</title><link>https://qdrant.tech/documentation/observability/openlit/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/observability/openlit/</guid><description>&lt;h1 id="openlit">OpenLIT&lt;/h1>
&lt;p>&lt;a href="https://github.com/openlit/openlit" target="_blank" rel="noopener nofollow">OpenLIT&lt;/a> is an OpenTelemetry-native LLM Application Observability tool and includes OpenTelemetry auto-instrumentation to monitor Qdrant and provide insights to improve database operations and application performance.&lt;/p>
&lt;p>This page assumes you&amp;rsquo;re using &lt;code>qdrant-client&lt;/code> version 1.7.3 or above.&lt;/p>
&lt;h2 id="usage">Usage&lt;/h2>
&lt;h3 id="step-1-install-openlit">Step 1: Install OpenLIT&lt;/h3>
&lt;p>Open your command line or terminal and run:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">pip install openlit
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="step-2-initialize-openlit-in-your-application">Step 2: Initialize OpenLIT in your Application&lt;/h3>
&lt;p>Integrating OpenLIT into LLM applications is straightforward with just &lt;strong>two lines of code&lt;/strong>:&lt;/p></description></item><item><title>Querying with Airflow</title><link>https://qdrant.tech/documentation/send-data/qdrant-airflow-astronomer/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/send-data/qdrant-airflow-astronomer/</guid><description>&lt;h1 id="qdrant-semantic-querying-with-airflow-and-astronomer">Qdrant Semantic Querying with Airflow and Astronomer&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 45 min&lt;/th>
 &lt;th>Level: Intermediate&lt;/th>
 &lt;th>&lt;/th>
 &lt;th>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>In this tutorial, you will use Qdrant as a &lt;a href="https://airflow.apache.org/docs/apache-airflow-providers-qdrant/stable/index.html" target="_blank" rel="noopener nofollow">provider&lt;/a> in &lt;a href="https://airflow.apache.org/" target="_blank" rel="noopener nofollow">Apache Airflow&lt;/a>, an open-source tool that lets you set up data-engineering workflows.&lt;/p>
&lt;p>You will write the pipeline as a DAG (Directed Acyclic Graph) in Python. With this, you can leverage the powerful suite of Python&amp;rsquo;s capabilities and libraries to achieve almost anything your data pipeline needs.&lt;/p></description></item><item><title>Quickstart</title><link>https://qdrant.tech/documentation/edge/edge-quickstart/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/edge/edge-quickstart/</guid><description>&lt;h1 id="qdrant-edge-quickstart">Qdrant Edge Quickstart&lt;/h1>
&lt;h2 id="install-qdrant-edge">Install Qdrant Edge&lt;/h2>
&lt;p>First, install the &lt;a href="https://pypi.org/project/qdrant-edge-py/" target="_blank" rel="noopener nofollow">Python Bindings for Qdrant Edge&lt;/a> or the &lt;a href="https://crates.io/crates/qdrant-edge" target="_blank" rel="noopener nofollow">Rust crate&lt;/a>.&lt;/p>
&lt;h2 id="create-a-storage-directory">Create a Storage Directory&lt;/h2>
&lt;p>A Qdrant Edge Shard stores its data in a local directory on disk. Create the directory if it doesn&amp;rsquo;t exist yet:&lt;/p>
 &lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">pathlib&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">Path&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">SHARD_DIRECTORY&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;./qdrant-edge-directory&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">Path&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">SHARD_DIRECTORY&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">mkdir&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">parents&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">exist_ok&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
 &lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-rust" data-lang="rust">&lt;span class="line">&lt;span class="cl">&lt;span class="k">const&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="no">SHARD_DIRECTORY&lt;/span>: &lt;span class="kp">&amp;amp;&lt;/span>&lt;span class="kt">str&lt;/span> &lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s">&amp;#34;./qdrant-edge-directory&amp;#34;&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="n">fs_err&lt;/span>::&lt;span class="n">create_dir_all&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="no">SHARD_DIRECTORY&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">?&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;h2 id="configure-the-edge-shard">Configure the Edge Shard&lt;/h2>
&lt;p>An Edge Shard is configured with a definition of the dense and sparse vectors that can be stored in the Edge Shard, similar to how you would configure a Qdrant collection.&lt;/p></description></item><item><title>Quickstart</title><link>https://qdrant.tech/documentation/fastembed/fastembed-quickstart/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/fastembed/fastembed-quickstart/</guid><description>&lt;h1 id="how-to-generate-text-embedings-with-fastembed">How to Generate Text Embeddings with FastEmbed&lt;/h1>
&lt;h2 id="install-fastembed">Install FastEmbed&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">pip&lt;/span> &lt;span class="n">install&lt;/span> &lt;span class="n">fastembed&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Just for demo purposes, you will use Lists and NumPy to work with sample data.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">typing&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">List&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">numpy&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">np&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="load-default-model">Load default model&lt;/h2>
&lt;p>In this example, you will use the default text embedding model, &lt;code>BAAI/bge-small-en-v1.5&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">fastembed&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">TextEmbedding&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="add-sample-data">Add sample data&lt;/h2>
&lt;p>Now, add two sample documents. Your documents must be in a list, and each document must be a string.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">documents&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">List&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nb">str&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;FastEmbed is lighter than Transformers &amp;amp; Sentence-Transformers.&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;FastEmbed is supported by and maintained by Qdrant.&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Download and initialize the model. Print a message to verify the process.&lt;/p></description></item><item><title>S3 Ingestion with LangChain</title><link>https://qdrant.tech/documentation/tutorials-build-essentials/data-ingestion-beginners/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-build-essentials/data-ingestion-beginners/</guid><description>&lt;!-- ![data-ingestion-beginners-7](/documentation/examples/data-ingestion-beginners/data-ingestion-7.png) -->
&lt;h1 id="s3-ingestion-with-langchain-and-qdrant">S3 Ingestion with LangChain and Qdrant&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 30 min&lt;/th>
 &lt;th>Level: Beginner&lt;/th>
 &lt;th>&lt;/th>
 &lt;th>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>&lt;strong>Data ingestion into a vector store&lt;/strong> is essential for building effective search and retrieval algorithms, especially since nearly 80% of data is unstructured, lacking any predefined format.&lt;/p>
&lt;p>In this tutorial, we’ll create a streamlined data ingestion pipeline, pulling data directly from &lt;strong>AWS S3&lt;/strong> and feeding it into Qdrant. We’ll dive into vector embeddings, transforming unstructured data into a format that allows you to search documents semantically. Prepare to discover new ways to uncover insights hidden within unstructured data!&lt;/p></description></item><item><title>Snapshots</title><link>https://qdrant.tech/documentation/tutorials-operations/create-snapshot/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-operations/create-snapshot/</guid><description>&lt;h1 id="backup--restore-qdrant-with-snapshots">Backup &amp;amp; Restore Qdrant with Snapshots&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 20 min&lt;/th>
 &lt;th>Level: Beginner&lt;/th>
 &lt;th>&lt;/th>
 &lt;th>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>A collection is a basic unit of data storage in Qdrant. It contains vectors, their IDs, and payloads. However, keeping the search efficient requires additional data structures to be built on top of the data. Building these data structures may take a while, especially for large collections.
That&amp;rsquo;s why using snapshots is the best way to export and import Qdrant collections, as they contain all the bits and pieces required to restore the entire collection efficiently.&lt;/p></description></item><item><title>Understanding Vector Search in Qdrant</title><link>https://qdrant.tech/documentation/overview/vector-search/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/overview/vector-search/</guid><description>&lt;h1 id="how-does-vector-search-work-in-qdrant">How Does Vector Search Work in Qdrant?&lt;/h1>
&lt;p align="center">&lt;iframe width="560" height="315" src="https://www.youtube.com/embed/mXNrhyw4q84?si=wruP9wWSa8JW4t78" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen>&lt;/iframe>&lt;/p>
&lt;p>If you are still trying to figure out how vector search works, please read ahead. This document describes how vector search is used, covers Qdrant&amp;rsquo;s place in the larger ecosystem, and outlines how you can use Qdrant to augment your existing projects.&lt;/p>
&lt;p>For those who want to start writing code right away, visit our &lt;a href="https://qdrant.tech/documentation/tutorials-basics/search-beginners/">Complete Beginners tutorial&lt;/a> to build a search engine in 5-15 minutes.&lt;/p></description></item><item><title>Vectors</title><link>https://qdrant.tech/documentation/manage-data/vectors/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/manage-data/vectors/</guid><description>&lt;h1 id="vectors">Vectors&lt;/h1>
&lt;p>Vectors (or embeddings) are the core concept of the Qdrant Vector Search engine.
Vectors define the similarity between objects in the vector space.&lt;/p>
&lt;p>If a pair of vectors are similar in vector space, it means that the objects they represent are similar in some way.&lt;/p>
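Closeness in vector space is typically measured with a similarity metric such as cosine similarity. A toy sketch with made-up 3-dimensional embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: the first two point in similar directions.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.0]
car = [0.0, 0.1, 0.9]
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, car))
```

Real embeddings have hundreds or thousands of dimensions, but the principle is the same.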
&lt;p>For example, if you have a collection of images, you can represent each image as a vector.
If two images are similar, their vectors will be close to each other in the vector space.&lt;/p></description></item><item><title>Managed Services</title><link/><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid/><description/></item><item><title>Upgrades</title><link>https://qdrant.tech/documentation/operations/upgrades/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/operations/upgrades/</guid><description>&lt;h1 id="upgrading-qdrant">Upgrading Qdrant&lt;/h1>
&lt;p>If you are several versions behind, multiple updates might be required to reach the latest version. When upgrading Qdrant, upgrade to the latest patch version of each intermediate minor version first. For example, if you are running version 1.15 and want to upgrade to 1.17, you must first upgrade all cluster nodes to 1.16.3 before upgrading to 1.17. A Qdrant node with version 1.17 will be compatible with a node with version 1.16, but not with a node with version 1.15. If you run a single-node cluster, you also cannot skip versions, to ensure that all data migrations are properly applied. Qdrant Cloud does this automatically for you.&lt;/p></description></item><item><title>Getting Started</title><link>https://qdrant.tech/documentation/cloud-getting-started/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/cloud-getting-started/</guid><description>&lt;h1 id="getting-started-with-qdrant-managed-cloud">Getting Started with Qdrant Managed Cloud&lt;/h1>
&lt;p>Welcome to Qdrant Managed Cloud! This document contains all the information you need to get started.&lt;/p>
&lt;h2 id="prerequisites">Prerequisites&lt;/h2>
&lt;p>Before creating a cluster, make sure you have a Qdrant Cloud account. Detailed instructions for signing up can be found in the &lt;a href="https://qdrant.tech/documentation/cloud-account-setup/">Qdrant Cloud Setup&lt;/a> guide. Qdrant Cloud supports granular &lt;a href="https://qdrant.tech/documentation/cloud-rbac/">role-based access control&lt;/a>.&lt;/p>
&lt;p>You also need to provide &lt;a href="https://qdrant.tech/documentation/cloud-pricing-payments/">payment details&lt;/a>. If you have a custom payment agreement, first create your account, then &lt;a href="https://support.qdrant.io/" target="_blank" rel="noopener nofollow">contact our Support Team&lt;/a> to finalize the setup.&lt;/p></description></item><item><title>Account Setup</title><link>https://qdrant.tech/documentation/cloud-account-setup/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/cloud-account-setup/</guid><description>&lt;h1 id="setting-up-a-qdrant-cloud-account">Setting up a Qdrant Cloud Account&lt;/h1>
&lt;h2 id="registration">Registration&lt;/h2>
&lt;p>There are different ways to register for a Qdrant Cloud account:&lt;/p>
&lt;ul>
&lt;li>With an email address and passwordless login via email&lt;/li>
&lt;li>With a Google account&lt;/li>
&lt;li>With a GitHub account&lt;/li>
&lt;li>By connecting an enterprise SSO solution&lt;/li>
&lt;/ul>
&lt;p>Every account is tied to an email address. You can invite additional users to your account and manage their permissions.&lt;/p>
&lt;h3 id="email-registration">Email Registration&lt;/h3>
&lt;ol>
&lt;li>Register for a &lt;a href="https://cloud.qdrant.io/signup" target="_blank" rel="noopener nofollow">Cloud account&lt;/a> with your email, Google or GitHub credentials.&lt;/li>
&lt;/ol>
&lt;h2 id="inviting-additional-users-to-an-account">Inviting Additional Users to an Account&lt;/h2>
&lt;p>You can invite additional users to your account, and manage their permissions on the &lt;strong>Account -&amp;gt; Access Management&lt;/strong> page in the Qdrant Cloud Console.&lt;/p></description></item><item><title>Agentic RAG with LangGraph</title><link>https://qdrant.tech/documentation/tutorials-build-essentials/agentic-rag-langgraph/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-build-essentials/agentic-rag-langgraph/</guid><description>&lt;h1 id="agentic-rag-with-langgraph-and-qdrant">Agentic RAG with LangGraph and Qdrant&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 45 min&lt;/th>
 &lt;th>Level: Intermediate&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>Traditional Retrieval-Augmented Generation (RAG) systems follow a straightforward path: query → retrieve → generate. Sure, this works well for many scenarios. But let’s face it—this linear approach often struggles when you&amp;rsquo;re dealing with complex queries that demand multiple steps or pulling together diverse types of information.&lt;/p>
&lt;p>&lt;a href="https://qdrant.tech/articles/agentic-rag/" target="_blank" rel="noopener nofollow">Agentic RAG&lt;/a> takes things up a notch by introducing AI agents that can orchestrate multiple retrieval steps and smartly decide how to gather and use the information you need. Think of it this way: in an Agentic RAG workflow, RAG becomes just one powerful tool in a much bigger and more versatile toolkit.&lt;/p></description></item><item><title>From Milvus</title><link>https://qdrant.tech/documentation/migrate-to-qdrant/from-milvus/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/migrate-to-qdrant/from-milvus/</guid><description>&lt;h1 id="migrate-from-milvus-to-qdrant">Migrate from Milvus to Qdrant&lt;/h1>
&lt;h2 id="what-you-need-from-milvus">What You Need from Milvus&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Milvus URL&lt;/strong> — the gRPC endpoint of your Milvus instance&lt;/li>
&lt;li>&lt;strong>Collection name&lt;/strong> — the collection to migrate&lt;/li>
&lt;li>&lt;strong>API key&lt;/strong> — if using Zilliz Cloud or authenticated Milvus&lt;/li>
&lt;/ul>
&lt;h2 id="concept-mapping">Concept Mapping&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Milvus&lt;/th>
 &lt;th style="text-align: left">Qdrant&lt;/th>
 &lt;th style="text-align: left">Notes&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">Collection&lt;/td>
 &lt;td style="text-align: left">Collection&lt;/td>
 &lt;td style="text-align: left">One-to-one mapping&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Partition&lt;/td>
 &lt;td style="text-align: left">Payload field or separate collection&lt;/td>
 &lt;td style="text-align: left">Use &lt;code>--milvus.partitions&lt;/code> to specify which partitions to migrate&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Schema fields&lt;/td>
 &lt;td style="text-align: left">Payload&lt;/td>
 &lt;td style="text-align: left">Non-vector fields become payload&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>COSINE&lt;/code>&lt;/td>
 &lt;td style="text-align: left">&lt;code>Cosine&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Direct mapping&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>L2&lt;/code>&lt;/td>
 &lt;td style="text-align: left">&lt;code>Euclid&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Direct mapping&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>IP&lt;/code> (inner product)&lt;/td>
 &lt;td style="text-align: left">&lt;code>Dot&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Direct mapping&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Dynamic fields&lt;/td>
 &lt;td style="text-align: left">Payload&lt;/td>
 &lt;td style="text-align: left">JSON-typed dynamic fields are preserved&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
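The metric rows in the table above can be encoded as a tiny lookup, which is handy when scripting a migration or sanity-checking the tool's output. This is an illustrative sketch, not part of the migration tool; plain strings stand in for the `Distance` enum values used by qdrant-client:

```python
# Translate a Milvus metric name into the Qdrant distance name from the
# concept-mapping table. Plain strings keep this runnable without any client
# library; in a real script you would map the result onto qdrant-client's
# Distance enum when creating the target collection.
MILVUS_TO_QDRANT_METRIC = {
    "COSINE": "Cosine",  # direct mapping
    "L2": "Euclid",      # Euclidean distance
    "IP": "Dot",         # inner product -> dot product
}

def qdrant_distance(milvus_metric: str) -> str:
    """Return the Qdrant distance name for a Milvus metric, or raise."""
    try:
        return MILVUS_TO_QDRANT_METRIC[milvus_metric.upper()]
    except KeyError:
        raise ValueError(f"no direct Qdrant equivalent for {milvus_metric!r}")
```

For metrics without a direct equivalent, failing loudly is safer than silently defaulting, because a wrong distance metric is the single most common cause of broken search results after migration.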
&lt;h2 id="run-the-migration">Run the Migration&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">docker run --net&lt;span class="o">=&lt;/span>host --rm -it registry.cloud.qdrant.io/library/qdrant-migration milvus &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --milvus.url &lt;span class="s1">&amp;#39;your-milvus-host:19530&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --milvus.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --milvus.api-key &lt;span class="s1">&amp;#39;your-milvus-api-key&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.url &lt;span class="s1">&amp;#39;https://your-instance.cloud.qdrant.io:6334&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.api-key &lt;span class="s1">&amp;#39;your-qdrant-api-key&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="migrating-specific-partitions">Migrating Specific Partitions&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">docker run --net&lt;span class="o">=&lt;/span>host --rm -it registry.cloud.qdrant.io/library/qdrant-migration milvus &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --milvus.url &lt;span class="s1">&amp;#39;your-milvus-host:19530&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --milvus.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --milvus.partitions &lt;span class="s1">&amp;#39;partition_a,partition_b&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.url &lt;span class="s1">&amp;#39;https://your-instance.cloud.qdrant.io:6334&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.api-key &lt;span class="s1">&amp;#39;your-qdrant-api-key&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="all-milvus-specific-flags">All Milvus-Specific Flags&lt;/h3>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Flag&lt;/th>
 &lt;th style="text-align: left">Required&lt;/th>
 &lt;th style="text-align: left">Description&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--milvus.url&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Yes&lt;/td>
 &lt;td style="text-align: left">Milvus gRPC endpoint&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--milvus.collection&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Yes&lt;/td>
 &lt;td style="text-align: left">Collection name to migrate&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--milvus.api-key&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">API key (for Zilliz Cloud)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--milvus.username&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Username for authentication&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--milvus.password&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Password for authentication&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--milvus.db-name&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Database name&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--milvus.partitions&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Comma-separated partition names&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--milvus.server-version&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Override detected server version&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--milvus.enable-tls-auth&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Enable TLS authentication&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h3 id="qdrant-side-options">Qdrant-Side Options&lt;/h3>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Flag&lt;/th>
 &lt;th style="text-align: left">Default&lt;/th>
 &lt;th style="text-align: left">Description&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--qdrant.distance-metric&lt;/code>&lt;/td>
 &lt;td style="text-align: left">—&lt;/td>
 &lt;td style="text-align: left">Distance metric per vector field (map format, e.g., &lt;code>field1:cosine,field2:dot&lt;/code>)&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="gotchas">Gotchas&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Partition handling:&lt;/strong> Milvus partitions can map to Qdrant collections or payload filters. If you merge partitions into a single collection, add the partition name as a payload field for filtering.&lt;/li>
&lt;li>&lt;strong>Schema strictness:&lt;/strong> Milvus enforces schema on write; Qdrant is schema-flexible. Verify that the schema-less flexibility didn&amp;rsquo;t cause payload fields to drift during migration.&lt;/li>
&lt;li>&lt;strong>Dynamic fields:&lt;/strong> Milvus dynamic fields (introduced in 2.3) may serialize differently. Check that JSON-typed dynamic fields survived the migration with correct structure.&lt;/li>
&lt;/ul>
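The partition gotcha above can be handled with a small post-processing step: when merging several Milvus partitions into one Qdrant collection, stamp each point's payload with its source partition so the data stays filterable. A minimal sketch on plain point dicts (the field name `partition` is our choice, not a tool convention):

```python
def tag_with_partition(points, partition_name, field="partition"):
    """Add the source Milvus partition name to each point's payload so that
    merged partitions remain filterable in a single Qdrant collection.

    `points` is a list of dicts; a missing "payload" key is created."""
    for point in points:
        payload = point.setdefault("payload", {})
        payload[field] = partition_name
    return points
```

Filtering later is then an ordinary payload match on that field (e.g. a `FieldCondition` on `partition` in qdrant-client), which mirrors how Milvus scoped queries to a partition.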
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;p>After migration, verify your data arrived correctly with the &lt;a href="https://qdrant.tech/documentation/migration-guidance/">Migration Verification Guide&lt;/a>.&lt;/p></description></item><item><title>Hybrid Queries</title><link>https://qdrant.tech/documentation/search/hybrid-queries/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/search/hybrid-queries/</guid><description>&lt;h1 id="hybrid-and-multi-stage-queries">Hybrid and Multi-Stage Queries&lt;/h1>
&lt;p>&lt;em>Available as of v1.10.0&lt;/em>&lt;/p>
&lt;p>With the introduction of &lt;a href="https://qdrant.tech/documentation/manage-data/vectors/#named-vectors">multiple named vectors per point&lt;/a>, there are use cases where the best search is obtained by combining multiple queries,
or by performing the search in more than one stage.&lt;/p>
&lt;p>Qdrant has a flexible and universal interface to make this possible, called &lt;code>Query API&lt;/code> (&lt;a href="https://api.qdrant.tech/api-reference/search/query-points" target="_blank" rel="noopener nofollow">API reference&lt;/a>).&lt;/p>
&lt;p>The main component for making the combinations of queries possible is the &lt;code>prefetch&lt;/code> parameter, which enables making sub-requests.&lt;/p></description></item><item><title>Kafka Streaming into Qdrant</title><link>https://qdrant.tech/documentation/send-data/data-streaming-kafka-qdrant/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/send-data/data-streaming-kafka-qdrant/</guid><description>&lt;h1 id="stream-real-time-data-into-qdrant-with-kafka-and-confluent">Stream Real-Time Data into Qdrant with Kafka and Confluent&lt;/h1>
&lt;p>&lt;strong>Author:&lt;/strong> &lt;a href="https://www.linkedin.com/in/kameshwara-pavan-kumar-mantha-91678b21/" target="_blank" rel="noopener nofollow">M K Pavan Kumar&lt;/a>, research scholar at &lt;a href="https://iiitk.ac.in" target="_blank" rel="noopener nofollow">IIITDM, Kurnool&lt;/a>. Specialist in hallucination mitigation techniques and RAG methodologies.
• &lt;a href="https://github.com/pavanjava" target="_blank" rel="noopener nofollow">GitHub&lt;/a> • &lt;a href="https://medium.com/@manthapavankumar11" target="_blank" rel="noopener nofollow">Medium&lt;/a>&lt;/p>
&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>This guide will walk you through the detailed steps of installing and setting up the &lt;a href="https://github.com/qdrant/qdrant-kafka" target="_blank" rel="noopener nofollow">Qdrant Sink Connector&lt;/a>, building the necessary infrastructure, and creating a practical playground application. By the end of this article, you will have a deep understanding of how to leverage this powerful integration to streamline your data workflows, ultimately enhancing the performance and capabilities of your data-driven real-time semantic search and RAG applications.&lt;/p></description></item><item><title>On-Device Embeddings</title><link>https://qdrant.tech/documentation/edge/edge-fastembed-embeddings/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/edge/edge-fastembed-embeddings/</guid><description>&lt;h1 id="on-device-embeddings-with-qdrant-edge-and-fastembed">On-Device Embeddings with Qdrant Edge and FastEmbed&lt;/h1>
&lt;p>When using Python, you can use the &lt;a href="https://qdrant.tech/documentation/fastembed/">FastEmbed&lt;/a> library to generate embeddings for use with Qdrant Edge. FastEmbed provides multimodal models that run efficiently on edge devices to generate vector embeddings from text and images.&lt;/p>
&lt;h1 id="provision-the-device">Provision the Device&lt;/h1>
&lt;p>Assuming the devices on which you will run Qdrant Edge have intermittent or no internet connectivity, you need to provision them with the necessary dependencies and model files ahead of time. First, install FastEmbed and the Qdrant Edge Python bindings:&lt;/p></description></item><item><title>Payload</title><link>https://qdrant.tech/documentation/manage-data/payload/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/manage-data/payload/</guid><description>&lt;h1 id="payload">Payload&lt;/h1>
&lt;p>One of the significant features of Qdrant is the ability to store additional information along with vectors.
This information is called &lt;code>payload&lt;/code> in Qdrant terminology.&lt;/p>
&lt;p>Qdrant allows you to store any information that can be represented using JSON.&lt;/p>
&lt;p>Here is an example of a typical payload:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-json" data-lang="json">&lt;span class="line">&lt;span class="cl">&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;name&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;jacket&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;colors&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;red&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;blue&amp;#34;&lt;/span>&lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;count&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">10&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;price&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mf">11.99&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;locations&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;lon&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mf">13.4050&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;lat&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mf">52.5200&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;reviews&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;user&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;alice&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;score&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">4&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;user&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;bob&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;score&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">5&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="payload-types">Payload types&lt;/h2>
&lt;p>In addition to storing payloads, Qdrant also allows you to search based on certain kinds of values.
This feature is implemented as additional filters during the search and will enable you to incorporate custom logic on top of semantic similarity.&lt;/p></description></item><item><title>Search Quality</title><link>https://qdrant.tech/documentation/migration-guidance/search-quality/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/migration-guidance/search-quality/</guid><description>&lt;h1 id="search-quality-verification">Search Quality Verification&lt;/h1>
&lt;p>Two systems can hold identical vectors and produce different search results because of differences in indexing, quantization, scoring, and filtering implementation.&lt;/p>
&lt;p>This is perhaps the hardest part of migration verification. The guide breaks it into &lt;strong>three tiers&lt;/strong> so you can pick the level of rigor that matches your resources and risk tolerance.&lt;/p>
&lt;h2 id="three-tiered-search-quality-checks">Three-Tiered Search Quality Checks&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Tier&lt;/th>
 &lt;th>Effort&lt;/th>
 &lt;th>What It Catches&lt;/th>
 &lt;th>When to Use&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;strong>Tier 1: Spot-Check&lt;/strong>&lt;/td>
 &lt;td>15 min&lt;/td>
 &lt;td>Gross failures: wrong metric, broken filters, obviously wrong results&lt;/td>
 &lt;td>Every migration&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>Tier 2: Statistical Sampling&lt;/strong>&lt;/td>
 &lt;td>1-2 hours&lt;/td>
 &lt;td>Systematic recall degradation, filter interaction bugs, score distribution shifts&lt;/td>
 &lt;td>Production workloads, &amp;gt;100K vectors&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>Tier 3: Gold-Standard Evaluation&lt;/strong>&lt;/td>
 &lt;td>Half day to days&lt;/td>
 &lt;td>Measurable relevance changes with confidence intervals&lt;/td>
 &lt;td>High-stakes search (revenue, safety), regulated industries&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
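At its core, Tier 2 compares top-k results for the same sampled queries on both systems. A minimal sketch of the metric itself, assuming you have already collected ranked ID lists from the source and target (any client can produce these; the function names are ours):

```python
def recall_at_k(source_ids, target_ids, k=10):
    """Fraction of the source system's top-k IDs that also appear in the
    target system's top-k for the same query (order-insensitive)."""
    source_top = set(source_ids[:k])
    target_top = set(target_ids[:k])
    if not source_top:
        return 1.0  # nothing to recover
    return len(source_top & target_top) / len(source_top)

def mean_recall(paired_results, k=10):
    """Average recall@k over (source_ids, target_ids) pairs,
    one pair per sampled query."""
    recalls = [recall_at_k(s, t, k) for s, t in paired_results]
    return sum(recalls) / len(recalls)
```

A mean recall@10 well below ~0.85 across a few hundred sampled queries usually points at a configuration mismatch (distance metric, index parameters, quantization) rather than random ANN variation.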
&lt;p>&lt;strong>Our recommendation:&lt;/strong> Every migration should run Tier 1 and Tier 2. Tier 3 is for teams that have (or can build) labeled evaluation data. If you don&amp;rsquo;t have labeled data today, Tier 2 gives you a strong quantitative baseline and this guide shows you how to build toward Tier 3 over time.&lt;/p></description></item><item><title>Billing &amp; Payments</title><link>https://qdrant.tech/documentation/cloud-pricing-payments/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/cloud-pricing-payments/</guid><description>&lt;h1 id="qdrant-cloud-billing--payments">Qdrant Cloud Billing &amp;amp; Payments&lt;/h1>
&lt;p>Qdrant database clusters in Qdrant Cloud are priced based on CPU, memory, and disk storage usage. To get a clearer idea of the pricing structure based on the number of vectors you want to store, please use our &lt;a href="https://cloud.qdrant.io/calculator" target="_blank" rel="noopener nofollow">Pricing Calculator&lt;/a>.&lt;/p>
&lt;h2 id="billing">Billing&lt;/h2>
&lt;p>You can pay for your Qdrant Cloud database clusters either with a credit card or through an AWS, GCP, or Azure Marketplace subscription.&lt;/p>
&lt;p>Your payment method is charged at the beginning of each month for the previous month&amp;rsquo;s usage. There is no difference in pricing between the different payment methods.&lt;/p></description></item><item><title>Premium Tier</title><link>https://qdrant.tech/documentation/cloud-premium/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/cloud-premium/</guid><description>&lt;h1 id="qdrant-cloud-premium-tier">Qdrant Cloud Premium Tier&lt;/h1>
&lt;p>Qdrant Cloud offers an optional premium tier for customers who require additional features and better SLA support levels. The premium tier includes:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>24/7 Support&lt;/strong>: Our support team is available around the clock to help you with any issues you may encounter (compared to 10x5 in standard).&lt;/li>
&lt;li>&lt;strong>Shorter Response Times&lt;/strong>: Premium customers receive priority support with shorter SLA response times.&lt;/li>
&lt;li>&lt;strong>99.9% Uptime SLA&lt;/strong>: We guarantee 99.9% uptime for your Qdrant Cloud clusters (compared to 99.5% in standard).&lt;/li>
&lt;li>&lt;strong>Single Sign-On (SSO)&lt;/strong>: Premium customers can use their existing SSO provider to manage access to Qdrant Cloud.&lt;/li>
&lt;li>&lt;strong>VPC Private Links&lt;/strong>: Premium customers can connect their Qdrant Cloud clusters to their VPCs using private links.&lt;/li>
&lt;li>&lt;strong>Storage encryption with customer-managed keys&lt;/strong>: Premium customers can encrypt their data at rest using their own keys.&lt;/li>
&lt;/ul>
&lt;p>Please refer to the &lt;a href="https://qdrant.to/sla/" target="_blank" rel="noopener nofollow">Qdrant Cloud SLA&lt;/a> for a detailed definition on uptime and support SLAs.&lt;/p></description></item><item><title>Metric Learning Tips &amp; Tricks</title><link>https://qdrant.tech/articles/metric-learning-tips/</link><pubDate>Sat, 15 May 2021 10:18:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/metric-learning-tips/</guid><description>&lt;h2 id="how-to-train-object-matching-model-with-no-labeled-data-and-use-it-in-production">How to train object matching model with no labeled data and use it in production&lt;/h2>
&lt;p>Currently, most machine-learning-related business cases are solved as classification problems.
Classification algorithms are so well studied in practice that even if the original problem is not directly a classification task, it is usually decomposed or approximately converted into one.&lt;/p>
&lt;p>However, despite its simplicity, the classification task has requirements that could complicate its production integration and scaling.
E.g. it requires a fixed number of classes, where each class should have a sufficient number of training samples.&lt;/p></description></item><item><title>Building a Chain-of-Thought Medical Chatbot with Qdrant and DSPy</title><link>https://qdrant.tech/documentation/examples/qdrant-dspy-medicalbot/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/examples/qdrant-dspy-medicalbot/</guid><description>&lt;h1 id="building-a-chain-of-thought-medical-chatbot-with-qdrant-and-dspy">Building a Chain-of-Thought Medical Chatbot with Qdrant and DSPy&lt;/h1>
&lt;p>Accessing medical information from LLMs can lead to hallucinations or outdated information. Relying on this type of information can result in serious medical consequences. Building a trustworthy and context-aware medical chatbot can solve this.&lt;/p>
&lt;p>In this article, we will look at how to tackle these challenges using:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Retrieval-Augmented Generation (RAG)&lt;/strong>: Instead of answering the questions from scratch, the bot retrieves the information from medical literature before answering questions.&lt;/li>
&lt;li>&lt;strong>Filtering&lt;/strong>: Users can filter the results by specialty and publication year, ensuring the information is accurate and up-to-date.&lt;/li>
&lt;/ul>
&lt;p>Let’s discover the technologies needed to build the medical bot.&lt;/p></description></item><item><title>Collections</title><link>https://qdrant.tech/documentation/manage-data/collections/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/manage-data/collections/</guid><description>&lt;h1 id="collections">Collections&lt;/h1>
&lt;p>A collection is a named set of points (vectors with a payload) among which you can search. The vector of each point within the same collection must have the same dimensionality and be compared by a single metric. &lt;a href="#collection-with-multiple-vectors">Named vectors&lt;/a> can be used to have multiple vectors in a single point, each of which can have its own dimensionality and metric requirements.&lt;/p>
&lt;p>Distance metrics are used to measure similarities among vectors.
The choice of metric depends on how the vectors were obtained and, in particular, on the method used to train the neural network encoder.&lt;/p>&lt;/description>&lt;/item>&lt;item>&lt;title>Create a Cluster&lt;/title>&lt;link>https://qdrant.tech/documentation/cloud/create-cluster/&lt;/link>&lt;pubDate>Mon, 01 Jan 0001 00:00:00 +0000&lt;/pubDate>&lt;author>info@qdrant.tech (Andrey Vasnetsov)&lt;/author>&lt;guid>https://qdrant.tech/documentation/cloud/create-cluster/&lt;/guid>&lt;description>&lt;h1 id="creating-a-qdrant-cloud-cluster">Creating a Qdrant Cloud Cluster&lt;/h1>
&lt;p>Qdrant Cloud offers two types of clusters: &lt;strong>Free&lt;/strong> and &lt;strong>Standard&lt;/strong>.&lt;/p>
&lt;h2 id="free-clusters">Free Clusters&lt;/h2>
&lt;p>Free tier clusters are perfect for prototyping and testing. You don&amp;rsquo;t need a credit card to join.&lt;/p>
&lt;p>A free tier cluster includes a single node with the following resources:&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Resource&lt;/th>
 &lt;th>Value&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>RAM&lt;/td>
 &lt;td>1 GB&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>vCPU&lt;/td>
 &lt;td>0.5&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Disk space&lt;/td>
 &lt;td>4 GB&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Nodes&lt;/td>
 &lt;td>1&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>This configuration supports serving about 1 M vectors of 768 dimensions. To calculate your needs, refer to our documentation on &lt;a href="https://qdrant.tech/documentation/operations/capacity-planning/">Capacity Planning&lt;/a>.&lt;/p></description></item><item><title>Data Migration</title><link>https://qdrant.tech/documentation/tutorials-operations/migration/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-operations/migration/</guid><description>&lt;h1 id="migrate-your-embeddings-to-qdrant">Migrate Your Embeddings to Qdrant&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: Varies&lt;/th>
 &lt;th>Level: Intermediate&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>Migrating data between vector databases, especially across regions, platforms, or deployment types, can be a hassle. That’s where the &lt;a href="https://github.com/qdrant/migration" target="_blank" rel="noopener nofollow">Qdrant Migration Tool&lt;/a> comes in. It supports a wide range of migration needs, including transferring data between Qdrant instances and migrating from other vector database providers to Qdrant.&lt;/p>
&lt;p>You can run the migration tool on any machine where you have connectivity to both the source and the target Qdrant databases. Direct connectivity between both databases is not required. For optimal performance, you should run the tool on a machine with a fast network connection and minimum latency to both databases.&lt;/p></description></item><item><title>Data Synchronization Patterns</title><link>https://qdrant.tech/documentation/edge/edge-data-synchronization-patterns/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/edge/edge-data-synchronization-patterns/</guid><description>&lt;h1 id="data-synchronization-patterns">Data Synchronization Patterns&lt;/h1>
&lt;p>This page describes patterns for synchronizing data between Qdrant Edge Shards and Qdrant server collections. For a practical end-to-end guide on implementing these patterns, refer to the &lt;a href="https://qdrant.tech/documentation/edge/edge-synchronization-guide/">Qdrant Edge Synchronization Guide&lt;/a>.&lt;/p>
&lt;h2 id="initialize-edge-shard-from-existing-qdrant-collection">Initialize Edge Shard from Existing Qdrant Collection&lt;/h2>
&lt;p>Instead of starting with an empty Edge Shard, you may want to initialize it with pre-existing data from a collection on a Qdrant server. You can achieve this by restoring a snapshot of a shard in the server-side collection.&lt;/p></description></item><item><title>Diagnosing Discrepancies</title><link>https://qdrant.tech/documentation/migration-guidance/diagnosing-discrepancies/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/migration-guidance/diagnosing-discrepancies/</guid><description>&lt;h1 id="diagnosing-discrepancies">Diagnosing Discrepancies&lt;/h1>
&lt;p>When verification catches a problem, you need to determine whether it&amp;rsquo;s a data issue (something went wrong during migration) or a configuration issue (the data is correct but the systems behave differently). This page provides a diagnostic decision tree and vendor-specific gotchas.&lt;/p>
&lt;h2 id="decision-tree">Decision Tree&lt;/h2>
&lt;p>Start here when any verification check fails:&lt;/p>
&lt;pre tabindex="0">&lt;code>Is the vector count wrong?
├─ Yes → Data-level issue
│ ├─ Count lower than expected → Check migration script logs for errors,
│ │ timeouts, or partial failures. Re-run for missing segments.
│ ├─ Count higher than expected → Check for duplicate inserts (retried batches)
│ │ or source count excluding namespaces/partitions.
│ └─ Count matches but IDs differ → ID mapping error during migration.
│
└─ No (count matches) → Continue
 │
 Are metadata fields missing or wrong type?
 ├─ Yes → Payload mapping issue
 │ ├─ Fields missing → Source system may omit null fields on export.
 │ │ Check migration script&amp;#39;s null handling.
 │ ├─ Types changed → See &amp;#34;Type Coercion&amp;#34; section below.
 │ └─ Values differ → Encoding issue (UTF-8, special characters, unicode normalization).
 │
 └─ No (metadata looks correct) → Continue
 │
 Are search results completely different?
 ├─ Yes → Configuration-level issue
 │ ├─ Check distance metric (most common cause)
 │ ├─ Check if index is built (HNSW may not be built yet on fresh data)
 │ └─ Check if vectors are normalized (affects cosine vs. dot product)
 │
 └─ No (results overlap but differ at the margins) → Expected behavior
 │
 Is recall@10 below 0.85?
 ├─ Yes → Indexing parameter mismatch
 │ ├─ Compare HNSW ef_construction and M values
 │ ├─ Compare ef (search-time) parameters
 │ └─ Check quantization settings
 │
 └─ No → Migration is working correctly.
 Results differ on borderline cases due to
 ANN approximation. This is normal.
&lt;/code>&lt;/pre>&lt;h2 id="configuration-level-issues">Configuration-Level Issues&lt;/h2>
&lt;h3 id="distance-metric-mismatch">Distance Metric Mismatch&lt;/h3>
&lt;p>The most impactful configuration error. Here&amp;rsquo;s how metrics map across systems:&lt;/p></description></item><item><title>Discord RAG Bot</title><link>https://qdrant.tech/documentation/tutorials-build-essentials/agentic-rag-camelai-discord/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-build-essentials/agentic-rag-camelai-discord/</guid><description>&lt;!-- ![agentic-rag-camelai-astronaut](/documentation/examples/agentic-rag-camelai-discord/astronaut-main.png) -->
&lt;h1 id="qdrant-agentic-rag-discord-bot-with-camel-ai-and-openai">Qdrant Agentic RAG Discord Bot with CAMEL-AI and OpenAI&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 45 min&lt;/th>
 &lt;th>Level: Intermediate&lt;/th>
 &lt;th>&lt;a href="https://colab.research.google.com/drive/1Ymqzm6ySoyVOekY7fteQBCFCXYiYyHxw#scrollTo=QQZXwzqmNfaS" target="_blank" rel="noopener nofollow">&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab">&lt;/a>&lt;/th>
 &lt;th>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>Unlike traditional RAG techniques, which passively retrieve context and generate responses, &lt;strong>agentic RAG&lt;/strong> involves active decision-making and multi-step reasoning by the chatbot. Instead of just fetching data, the chatbot makes decisions, dynamically interacts with various data sources, and adapts based on context, giving it a much more dynamic and intelligent approach.&lt;/p></description></item><item><title>Explore</title><link>https://qdrant.tech/documentation/search/explore/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/search/explore/</guid><description>&lt;h1 id="explore-the-data">Explore the data&lt;/h1>
&lt;p>After mastering the concepts in &lt;a href="https://qdrant.tech/documentation/search/search/">search&lt;/a>, you can start exploring your data in other ways. Qdrant provides a stack of APIs that allow you to find similar vectors in a different fashion, as well as to find the most dissimilar ones. These are useful tools for recommendation systems, data exploration, and data cleaning.&lt;/p>
&lt;h2 id="recommendation-api">Recommendation API&lt;/h2>
&lt;p>In addition to regular search, Qdrant also allows you to search based on multiple positive and negative examples. The API is called &lt;em>&lt;strong>recommend&lt;/strong>&lt;/em>. The examples can be point IDs, so you can leverage already-encoded objects; as of v1.6, you can also pass raw vectors as input, so you can create vectors on the fly without uploading them as points.&lt;/p></description></item><item><title>FastEmbed &amp; Qdrant</title><link>https://qdrant.tech/documentation/fastembed/fastembed-semantic-search/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/fastembed/fastembed-semantic-search/</guid><description>&lt;h1 id="using-fastembed-with-qdrant-for-vector-search">Using FastEmbed with Qdrant for Vector Search&lt;/h1>
&lt;h2 id="install-qdrant-client-and-fastembed">Install Qdrant Client and FastEmbed&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">pip&lt;/span> &lt;span class="n">install&lt;/span> &lt;span class="s2">&amp;#34;qdrant-client[fastembed]&amp;gt;=1.14.2&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="initialize-the-client">Initialize the client&lt;/h2>
&lt;p>Qdrant Client has a simple in-memory mode that lets you try semantic search locally.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">qdrant_client&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">QdrantClient&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">models&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">client&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">QdrantClient&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;:memory:&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># Qdrant is running from RAM.&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="add-data">Add data&lt;/h2>
&lt;p>Now you can add two sample documents, their associated metadata, and a point &lt;code>id&lt;/code> for each.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">docs&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Qdrant has a LangChain integration for chatbots.&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Qdrant has a LlamaIndex integration for agents.&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">metadata&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>&lt;span class="s2">&amp;#34;source&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;langchain-docs&amp;#34;&lt;/span>&lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>&lt;span class="s2">&amp;#34;source&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;llamaindex-docs&amp;#34;&lt;/span>&lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">ids&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="mi">42&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="create-a-collection">Create a collection&lt;/h2>
&lt;p>Qdrant stores vectors and associated metadata in collections.
A collection requires vector parameters to be set during creation.
In this tutorial, we&amp;rsquo;ll be using &lt;code>BAAI/bge-small-en&lt;/code> to compute embeddings.&lt;/p></description></item><item><title>From Elasticsearch</title><link>https://qdrant.tech/documentation/migrate-to-qdrant/from-elasticsearch/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/migrate-to-qdrant/from-elasticsearch/</guid><description>&lt;h1 id="migrate-from-elasticsearch-to-qdrant">Migrate from Elasticsearch to Qdrant&lt;/h1>
&lt;h2 id="what-you-need-from-elasticsearch">What You Need from Elasticsearch&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Elasticsearch URL&lt;/strong> — the HTTP endpoint&lt;/li>
&lt;li>&lt;strong>Index name&lt;/strong> — the index containing your vectors&lt;/li>
&lt;li>&lt;strong>Credentials&lt;/strong> — username/password or API key&lt;/li>
&lt;/ul>
&lt;h2 id="concept-mapping">Concept Mapping&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Elasticsearch&lt;/th>
 &lt;th style="text-align: left">Qdrant&lt;/th>
 &lt;th style="text-align: left">Notes&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">Index&lt;/td>
 &lt;td style="text-align: left">Collection&lt;/td>
 &lt;td style="text-align: left">One-to-one mapping&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Document&lt;/td>
 &lt;td style="text-align: left">Point&lt;/td>
 &lt;td style="text-align: left">Each document becomes a point&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>dense_vector&lt;/code> field&lt;/td>
 &lt;td style="text-align: left">Vector&lt;/td>
 &lt;td style="text-align: left">Mapped automatically&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Document fields&lt;/td>
 &lt;td style="text-align: left">Payload&lt;/td>
 &lt;td style="text-align: left">Non-vector fields become payload&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>cosine&lt;/code>&lt;/td>
 &lt;td style="text-align: left">&lt;code>Cosine&lt;/code>&lt;/td>
 &lt;td style="text-align: left">ES returns &lt;code>1 - cosine_distance&lt;/code>; Qdrant returns cosine similarity directly&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>l2_norm&lt;/code>&lt;/td>
 &lt;td style="text-align: left">&lt;code>Euclid&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Direct mapping&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>dot_product&lt;/code>&lt;/td>
 &lt;td style="text-align: left">&lt;code>Dot&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Direct mapping&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
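The distance mapping in the table above can be sketched as a small pre-flight check in Python. The dictionary below simply mirrors the table; the helper name is illustrative and is not part of the migration tool:

```python
# Mapping of Elasticsearch similarity names to Qdrant distance metrics,
# mirroring the concept-mapping table above (a sketch for pre-flight checks).
ES_TO_QDRANT = {
    "cosine": "Cosine",
    "l2_norm": "Euclid",
    "dot_product": "Dot",
}

def qdrant_distance(es_similarity: str) -> str:
    """Return the Qdrant distance name for an Elasticsearch similarity name."""
    try:
        return ES_TO_QDRANT[es_similarity]
    except KeyError:
        raise ValueError(f"No Qdrant equivalent for similarity {es_similarity!r}")

print(qdrant_distance("l2_norm"))  # Euclid
```

Running such a check against your index mapping before migration catches metric mismatches early, when they are cheapest to fix.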
&lt;h2 id="run-the-migration">Run the Migration&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">docker run --net&lt;span class="o">=&lt;/span>host --rm -it registry.cloud.qdrant.io/library/qdrant-migration elasticsearch &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --elasticsearch.url &lt;span class="s1">&amp;#39;https://your-es-host:9200&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --elasticsearch.index &lt;span class="s1">&amp;#39;your-index&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --elasticsearch.username &lt;span class="s1">&amp;#39;elastic&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --elasticsearch.password &lt;span class="s1">&amp;#39;your-password&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.url &lt;span class="s1">&amp;#39;https://your-instance.cloud.qdrant.io:6334&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.api-key &lt;span class="s1">&amp;#39;your-qdrant-api-key&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="using-api-key-authentication">Using API Key Authentication&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">docker run --net&lt;span class="o">=&lt;/span>host --rm -it registry.cloud.qdrant.io/library/qdrant-migration elasticsearch &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --elasticsearch.url &lt;span class="s1">&amp;#39;https://your-es-host:9200&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --elasticsearch.index &lt;span class="s1">&amp;#39;your-index&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --elasticsearch.api-key &lt;span class="s1">&amp;#39;your-es-api-key&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.url &lt;span class="s1">&amp;#39;https://your-instance.cloud.qdrant.io:6334&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.api-key &lt;span class="s1">&amp;#39;your-qdrant-api-key&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="all-elasticsearch-specific-flags">All Elasticsearch-Specific Flags&lt;/h3>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Flag&lt;/th>
 &lt;th style="text-align: left">Required&lt;/th>
 &lt;th style="text-align: left">Description&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--elasticsearch.url&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Yes&lt;/td>
 &lt;td style="text-align: left">Elasticsearch HTTP endpoint&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--elasticsearch.index&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Yes&lt;/td>
 &lt;td style="text-align: left">Index to migrate&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--elasticsearch.username&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Username for basic auth&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--elasticsearch.password&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Password for basic auth&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--elasticsearch.api-key&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">API key for authentication&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--elasticsearch.insecure-skip-verify&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Skip TLS certificate verification&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h3 id="qdrant-side-options">Qdrant-Side Options&lt;/h3>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Flag&lt;/th>
 &lt;th style="text-align: left">Default&lt;/th>
 &lt;th style="text-align: left">Description&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--qdrant.id-field&lt;/code>&lt;/td>
 &lt;td style="text-align: left">&lt;code>__id__&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Payload field name for original Elasticsearch document IDs&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="hybrid-search-considerations">Hybrid Search Considerations&lt;/h2>
&lt;p>If your Elasticsearch setup uses hybrid BM25 + kNN scoring, you&amp;rsquo;ll need to reconstruct this in Qdrant using &lt;a href="https://qdrant.tech/documentation/manage-data/vectors/#sparse-vectors">sparse vectors&lt;/a> (for BM25-like behavior) alongside dense vectors. The migration tool transfers the dense vectors; you&amp;rsquo;ll need to generate sparse vectors separately if you want hybrid search in Qdrant.&lt;/p></description></item><item><title>From OpenSearch</title><link>https://qdrant.tech/documentation/migrate-to-qdrant/from-opensearch/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/migrate-to-qdrant/from-opensearch/</guid><description>&lt;h1 id="migrate-from-opensearch-to-qdrant">Migrate from OpenSearch to Qdrant&lt;/h1>
&lt;h2 id="what-you-need-from-opensearch">What You Need from OpenSearch&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>OpenSearch URL&lt;/strong> — the HTTP endpoint&lt;/li>
&lt;li>&lt;strong>Index name&lt;/strong> — the index containing your vectors&lt;/li>
&lt;li>&lt;strong>Credentials&lt;/strong> — username/password or API key&lt;/li>
&lt;/ul>
&lt;h2 id="concept-mapping">Concept Mapping&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">OpenSearch&lt;/th>
 &lt;th style="text-align: left">Qdrant&lt;/th>
 &lt;th style="text-align: left">Notes&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">Index&lt;/td>
 &lt;td style="text-align: left">Collection&lt;/td>
 &lt;td style="text-align: left">One-to-one mapping&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Document&lt;/td>
 &lt;td style="text-align: left">Point&lt;/td>
 &lt;td style="text-align: left">Each document becomes a point&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>knn_vector&lt;/code> field&lt;/td>
 &lt;td style="text-align: left">Vector&lt;/td>
 &lt;td style="text-align: left">Mapped automatically&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Document fields&lt;/td>
 &lt;td style="text-align: left">Payload&lt;/td>
 &lt;td style="text-align: left">Non-vector fields become payload&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>cosinesimil&lt;/code>&lt;/td>
 &lt;td style="text-align: left">&lt;code>Cosine&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Direct mapping&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>l2&lt;/code>&lt;/td>
 &lt;td style="text-align: left">&lt;code>Euclid&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Direct mapping&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>innerproduct&lt;/code>&lt;/td>
 &lt;td style="text-align: left">&lt;code>Dot&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Direct mapping&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="run-the-migration">Run the Migration&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">docker run --net&lt;span class="o">=&lt;/span>host --rm -it registry.cloud.qdrant.io/library/qdrant-migration opensearch &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --opensearch.url &lt;span class="s1">&amp;#39;https://your-opensearch-host:9200&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --opensearch.index &lt;span class="s1">&amp;#39;your-index&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --opensearch.username &lt;span class="s1">&amp;#39;admin&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --opensearch.password &lt;span class="s1">&amp;#39;your-password&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.url &lt;span class="s1">&amp;#39;https://your-instance.cloud.qdrant.io:6334&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.api-key &lt;span class="s1">&amp;#39;your-qdrant-api-key&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="using-api-key-authentication">Using API Key Authentication&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">docker run --net&lt;span class="o">=&lt;/span>host --rm -it registry.cloud.qdrant.io/library/qdrant-migration opensearch &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --opensearch.url &lt;span class="s1">&amp;#39;https://your-opensearch-host:9200&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --opensearch.index &lt;span class="s1">&amp;#39;your-index&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --opensearch.api-key &lt;span class="s1">&amp;#39;your-opensearch-api-key&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.url &lt;span class="s1">&amp;#39;https://your-instance.cloud.qdrant.io:6334&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.api-key &lt;span class="s1">&amp;#39;your-qdrant-api-key&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="all-opensearch-specific-flags">All OpenSearch-Specific Flags&lt;/h3>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Flag&lt;/th>
 &lt;th style="text-align: left">Required&lt;/th>
 &lt;th style="text-align: left">Description&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--opensearch.url&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Yes&lt;/td>
 &lt;td style="text-align: left">OpenSearch HTTP endpoint&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--opensearch.index&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Yes&lt;/td>
 &lt;td style="text-align: left">Index to migrate&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--opensearch.username&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Username for basic auth&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--opensearch.password&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Password for basic auth&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--opensearch.api-key&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">API key for authentication&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--opensearch.insecure-skip-verify&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Skip TLS certificate verification&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h3 id="qdrant-side-options">Qdrant-Side Options&lt;/h3>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Flag&lt;/th>
 &lt;th style="text-align: left">Default&lt;/th>
 &lt;th style="text-align: left">Description&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--qdrant.id-field&lt;/code>&lt;/td>
 &lt;td style="text-align: left">&lt;code>__id__&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Payload field name for original OpenSearch document IDs&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="gotchas">Gotchas&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>OpenSearch vs. Elasticsearch:&lt;/strong> OpenSearch is a fork of Elasticsearch, so many of the same considerations apply. Note, however, that the CLI subcommand is &lt;code>opensearch&lt;/code>, not &lt;code>elasticsearch&lt;/code>.&lt;/li>
&lt;li>&lt;strong>Score normalization:&lt;/strong> OpenSearch &lt;code>_score&lt;/code> values are not directly comparable to Qdrant scores. Use rank-based metrics when &lt;a href="https://qdrant.tech/documentation/migration-guidance/">verifying your migration&lt;/a>.&lt;/li>
&lt;li>&lt;strong>Nested documents:&lt;/strong> OpenSearch nested documents need to be flattened or restructured for Qdrant&amp;rsquo;s payload model.&lt;/li>
&lt;/ul>
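For the nested-documents gotcha, one possible approach is to collapse nested source objects into dotted payload keys before upserting. A minimal sketch in Python; the flatten helper is illustrative and is not part of the migration tool:

```python
def flatten(doc, parent_key="", sep="."):
    """Flatten a nested OpenSearch _source dict into a flat payload dict.

    Nested dicts are collapsed into dotted keys; lists of scalars are kept
    as-is, since Qdrant payload values may be arrays.
    """
    items = {}
    for key, value in doc.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            items.update(flatten(value, new_key, sep=sep))
        else:
            items[new_key] = value
    return items

source = {"title": "intro", "meta": {"author": {"name": "jane"}, "tags": ["a", "b"]}}
print(flatten(source))
```

The flattened keys (e.g. the dotted author name) can then be indexed and filtered like any other payload field.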
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;p>After migration, verify your data arrived correctly with the &lt;a href="https://qdrant.tech/documentation/migration-guidance/">Migration Verification Guide&lt;/a>.&lt;/p></description></item><item><title>Interfaces &amp; Tools</title><link/><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid/><description/></item><item><title>Multimodal and Multilingual RAG</title><link>https://qdrant.tech/documentation/tutorials-build-essentials/multimodal-search/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-build-essentials/multimodal-search/</guid><description>&lt;h1 id="multimodal-and-multilingual-rag-with-llamaindex-and-qdrant">Multimodal and Multilingual RAG with LlamaIndex and Qdrant&lt;/h1>
&lt;!-- ![Snow prints](/documentation/examples/multimodal-search/image-1.png) -->
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 15 min&lt;/th>
 &lt;th>Level: Beginner&lt;/th>
 &lt;th>Output: &lt;a href="https://github.com/qdrant/examples/blob/master/multimodal-search/Multimodal_Search_with_LlamaIndex.ipynb" target="_blank" rel="noopener nofollow">GitHub&lt;/a>&lt;/th>
 &lt;th>&lt;a href="https://githubtocolab.com/qdrant/examples/blob/master/multimodal-search/Multimodal_Search_with_LlamaIndex.ipynb" target="_blank" rel="noopener nofollow">&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab">&lt;/a>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="overview">Overview&lt;/h2>
&lt;p>We often understand and share information more effectively when combining different types of data. For example, the taste of comfort food can trigger childhood memories. We might describe a song with just “pam pam clap” sounds instead of writing paragraphs. Sometimes, we may use emojis and stickers to express how we feel or to share complex ideas.&lt;/p></description></item><item><title>Multitenancy with LlamaIndex</title><link>https://qdrant.tech/documentation/examples/llama-index-multitenancy/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/examples/llama-index-multitenancy/</guid><description>&lt;h1 id="multitenancy-with-llamaindex">Multitenancy with LlamaIndex&lt;/h1>
&lt;p>If you are building a service that serves vectors for many independent users, and you want to isolate their
data, the best practice is to use a single collection with payload-based partitioning. This approach is
called &lt;strong>multitenancy&lt;/strong>. Our guide on &lt;a href="https://qdrant.tech/documentation/manage-data/multitenancy/">Separate Partitions&lt;/a> describes
how to set it up in general, but if you use &lt;a href="https://qdrant.tech/documentation/frameworks/llama-index/">LlamaIndex&lt;/a> as a
backend, you may prefer a more specific guide. So here it is!&lt;/p></description></item><item><title>Snapshots</title><link>https://qdrant.tech/documentation/operations/snapshots/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/operations/snapshots/</guid><description>&lt;h1 id="snapshots">Snapshots&lt;/h1>
&lt;p>&lt;em>Available as of v0.8.4&lt;/em>&lt;/p>
&lt;p>Snapshots are &lt;code>tar&lt;/code> archive files that contain the data and configuration of a specific collection on a specific node at a specific time. In a distributed setup, when you have multiple nodes in your cluster, you must create snapshots for each node separately, even when dealing with a single collection.&lt;/p>
&lt;p>This feature can be used to archive data or easily replicate an existing deployment. For disaster recovery, Qdrant Cloud users may prefer to use &lt;a href="https://qdrant.tech/documentation/cloud/backups/">Backups&lt;/a> instead, which are physical disk-level copies of your data.&lt;/p></description></item><item><title>Storage</title><link>https://qdrant.tech/documentation/manage-data/storage/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/manage-data/storage/</guid><description>&lt;h1 id="storage">Storage&lt;/h1>
&lt;p>All data within one collection is divided into segments.
Each segment has its independent vector and payload storage as well as indexes.&lt;/p>
&lt;p>Data stored in segments usually does not overlap.
However, storing the same point in different segments will not cause problems, since search includes a deduplication mechanism.&lt;/p>
&lt;p>Segments consist of vector and payload storages, vector and payload &lt;a href="https://qdrant.tech/documentation/manage-data/indexing/">indexes&lt;/a>, and an ID mapper, which stores the relationship between internal and external IDs.&lt;/p></description></item><item><title>Text Search</title><link>https://qdrant.tech/documentation/search/text-search/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/search/text-search/</guid><description>&lt;h2 id="text-search">Text Search&lt;/h2>
&lt;p>Qdrant is a vector search engine, making it a great tool for &lt;a href="#semantic-search">semantic search&lt;/a>. However, Qdrant&amp;rsquo;s capabilities go beyond just vector search. It also supports a range of lexical search features, including filtering on text fields and full-text search using popular algorithms like BM25.&lt;/p>
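As a refresher, the BM25 scoring that lexical search relies on can be sketched in a few lines of Python. This is a toy illustration over pre-tokenized documents, not Qdrant's actual implementation:

```python
import math

def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against query terms over a toy corpus."""
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n  # average document length
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)           # document frequency
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))    # rare terms weigh more
        tf = doc_terms.count(term)                         # term frequency
        # Saturating tf, normalized by document length relative to the average
        score += idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * len(doc_terms) / avgdl))
    return score

corpus = [
    ["vector", "search", "engine"],
    ["lexical", "search", "with", "bm25"],
    ["semantic", "embeddings"],
]
print(bm25_score(["search", "bm25"], corpus[1], corpus))
```

The document matching both query terms scores higher than the one matching only "search", which is the ranking behavior full-text search builds on.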
&lt;h3 id="semantic-search">Semantic Search&lt;/h3>
&lt;p>Semantic search is a search technique that focuses on the meaning of the text rather than just matching on keywords. This is achieved by converting text into &lt;a href="https://qdrant.tech/documentation/manage-data/vectors/">vectors&lt;/a> (embeddings) using machine learning models. These vectors capture the semantic meaning of the text, enabling you to find similar text even if it doesn&amp;rsquo;t share exact keywords.&lt;/p></description></item><item><title>Qdrant Cloud API</title><link>https://qdrant.tech/documentation/cloud-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/cloud-api/</guid><description>&lt;h1 id="qdrant-cloud-api-powerful-grpc-and-flexible-restjson-interfaces">Qdrant Cloud API: Powerful gRPC and Flexible REST/JSON Interfaces&lt;/h1>
&lt;p>&lt;strong>Note:&lt;/strong> This is not the Qdrant REST or gRPC API of the database itself. For database APIs &amp;amp; SDKs, see our list of &lt;a href="https://qdrant.tech/documentation/interfaces/">interfaces&lt;/a>.&lt;/p>
&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>The Qdrant Cloud API lets you automate the Qdrant Cloud platform. You can use this API to manage your accounts, clusters, backup schedules, authentication methods, hybrid cloud environments, and more.&lt;/p>
&lt;p>To cater to diverse integration needs, the Qdrant Cloud API offers two primary interaction models:&lt;/p></description></item><item><title>Qdrant Cloud CLI</title><link>https://qdrant.tech/documentation/cloud-cli/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/cloud-cli/</guid><description>&lt;h1 id="qdrant-cloud-cli">Qdrant Cloud CLI&lt;/h1>
&lt;p>&lt;code>qcloud&lt;/code> is the official command-line interface for managing Qdrant Cloud. It lets you manage clusters, authentication, and anything the Qdrant Cloud API has to offer—all from your terminal.&lt;/p>
&lt;h2 id="installation">Installation&lt;/h2>
&lt;h3 id="from-github-releases">From GitHub Releases&lt;/h3>
&lt;p>Download the latest release from &lt;a href="https://github.com/qdrant/qcloud-cli/releases" target="_blank" rel="noopener nofollow">GitHub Releases&lt;/a>.&lt;/p>
&lt;p>Select the archive that matches your OS and CPU architecture, extract it, and place the &lt;code>qcloud&lt;/code> binary somewhere in your &lt;code>PATH&lt;/code> (e.g. &lt;code>~/.local/bin&lt;/code> or &lt;code>/usr/local/bin&lt;/code>).&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>macOS:&lt;/strong> The binary is currently not signed. If macOS blocks it, run &lt;code>xattr -d com.apple.quarantine qcloud&lt;/code> after extracting.&lt;/p></description></item><item><title>Metric Learning for Anomaly Detection</title><link>https://qdrant.tech/articles/detecting-coffee-anomalies/</link><pubDate>Wed, 04 May 2022 13:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/detecting-coffee-anomalies/</guid><description>&lt;p>Anomaly detection is a pressing yet challenging task that has numerous use cases across various industries.
The complexity results mainly from the fact that the task is data-scarce by definition.&lt;/p>
&lt;p>Similarly, anomalies are, again by definition, subject to frequent change, and they may take unexpected forms.
For that reason, supervised classification-based approaches are:&lt;/p>
&lt;ul>
&lt;li>Data-hungry - requiring a large amount of labeled data;&lt;/li>
&lt;li>Expensive - data labeling is an expensive task itself;&lt;/li>
&lt;li>Time-consuming - you would spend time collecting data that is scarce by definition;&lt;/li>
&lt;li>Hard to maintain - you would need to re-train the model repeatedly in response to changes in the data distribution.&lt;/li>
&lt;/ul>
&lt;p>These are not desirable features if you want to put your model into production in a rapidly-changing environment.
Moreover, despite all the mentioned difficulties, such approaches do not necessarily outperform the alternatives.
In this post, we will detail the lessons learned from such a use case.&lt;/p></description></item><item><title>Triplet Loss - Advanced Intro</title><link>https://qdrant.tech/articles/triplet-loss/</link><pubDate>Thu, 24 Mar 2022 15:12:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/triplet-loss/</guid><description>&lt;h2 id="what-is-triplet-loss">What is Triplet Loss?&lt;/h2>
&lt;p>Triplet Loss was first introduced in &lt;a href="https://arxiv.org/abs/1503.03832" target="_blank" rel="noopener nofollow">FaceNet: A Unified Embedding for Face Recognition and Clustering&lt;/a> in 2015,
and it has been one of the most popular loss functions for supervised similarity or metric learning ever since.
In its simplest explanation, Triplet Loss encourages the distance between an anchor and a dissimilar (negative) sample to exceed the distance between the anchor and a similar (positive) sample by at least a certain margin value.
Mathematically, the loss value can be calculated as
$L=max(d(a,p) - d(a,n) + m, 0)$, where:&lt;/p></description></item><item><title>5-Minute RAG with DeepSeek</title><link>https://qdrant.tech/documentation/tutorials-build-essentials/rag-deepseek/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-build-essentials/rag-deepseek/</guid><description>&lt;!-- ![deepseek-rag-qdrant](/documentation/examples/rag-deepseek/deepseek.png) -->
&lt;h1 id="rag-in-5-minutes-with-deepseek-and-qdrant">RAG in 5 Minutes with DeepSeek and Qdrant&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 5 min&lt;/th>
 &lt;th>Level: Beginner&lt;/th>
 &lt;th>Output: &lt;a href="https://github.com/qdrant/examples/blob/master/rag-with-qdrant-deepseek/deepseek-qdrant.ipynb" target="_blank" rel="noopener nofollow">GitHub&lt;/a>&lt;/th>
 &lt;th>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>This tutorial demonstrates how to build a &lt;strong>Retrieval-Augmented Generation (RAG)&lt;/strong> pipeline using Qdrant as a vector storage solution and DeepSeek for semantic query enrichment. RAG pipelines enhance Large Language Model (LLM) responses by providing contextually relevant data.&lt;/p>
&lt;h2 id="overview">Overview&lt;/h2>
&lt;p>In this tutorial, we will:&lt;/p>
&lt;ol>
&lt;li>Take sample text and turn it into vectors with FastEmbed.&lt;/li>
&lt;li>Send the vectors to a Qdrant collection.&lt;/li>
&lt;li>Connect Qdrant and DeepSeek into a minimal RAG pipeline.&lt;/li>
&lt;li>Ask DeepSeek different questions and test answer accuracy.&lt;/li>
&lt;li>Enrich DeepSeek prompts with content retrieved from Qdrant.&lt;/li>
&lt;li>Evaluate answer accuracy before and after.&lt;/li>
&lt;/ol>
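The retrieval and enrichment steps above can be sketched without any external services. In this illustration, hand-written toy vectors stand in for FastEmbed embeddings, a plain cosine-similarity ranking stands in for a Qdrant query, and the prompt template is hypothetical:

```python
import math

def cosine(a, b):
    # cosine similarity between two vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# toy "embeddings" for three documents and a query
docs = {
    "Qdrant is a vector database.": [0.9, 0.1, 0.0],
    "DeepSeek is an LLM.": [0.1, 0.9, 0.0],
    "Bananas are yellow.": [0.0, 0.1, 0.9],
}
query_vec = [0.8, 0.2, 0.0]

# retrieval step: rank documents by similarity to the query vector
best = max(docs, key=lambda text: cosine(docs[text], query_vec))

# enrichment step: prepend the retrieved context to the LLM prompt
prompt = f"Context: {best}\n\nQuestion: What is Qdrant?"
```

In the actual tutorial, the dictionary lookup is replaced by a Qdrant collection query and the prompt is sent to DeepSeek.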
&lt;h4 id="architecture">Architecture:&lt;/h4>
&lt;p>&lt;img src="https://qdrant.tech/documentation/examples/rag-deepseek/architecture.png" alt="deepseek-rag-architecture">&lt;/p></description></item><item><title>Authentication</title><link>https://qdrant.tech/documentation/cloud/authentication/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/cloud/authentication/</guid><description>&lt;h1 id="database-authentication-in-qdrant-managed-cloud">Database Authentication in Qdrant Managed Cloud&lt;/h1>
&lt;p>This page describes what Database API keys are and shows you how to use the Qdrant Cloud Console to create a Database API key for a cluster. You will learn how to connect to your cluster using the new API key.&lt;/p>
&lt;p>Database API keys can be configured with granular access control. Database API keys with granular access control can be recognized by starting with &lt;code>eyJhb&lt;/code>. Please refer to the &lt;a href="https://qdrant.tech/documentation/operations/security/#table-of-access">Table of access&lt;/a> to understand what permissions you can configure.&lt;/p></description></item><item><title>From pgvector</title><link>https://qdrant.tech/documentation/migrate-to-qdrant/from-pgvector/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/migrate-to-qdrant/from-pgvector/</guid><description>&lt;h1 id="migrate-from-pgvector-to-qdrant">Migrate from pgvector to Qdrant&lt;/h1>
&lt;h2 id="what-you-need-from-postgres">What You Need from Postgres&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Connection URL&lt;/strong> — a standard Postgres connection string&lt;/li>
&lt;li>&lt;strong>Table name&lt;/strong> — the table containing your vector data&lt;/li>
&lt;/ul>
&lt;h2 id="concept-mapping">Concept Mapping&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">pgvector&lt;/th>
 &lt;th style="text-align: left">Qdrant&lt;/th>
 &lt;th style="text-align: left">Notes&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">Table&lt;/td>
 &lt;td style="text-align: left">Collection&lt;/td>
 &lt;td style="text-align: left">One-to-one mapping&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Row&lt;/td>
 &lt;td style="text-align: left">Point&lt;/td>
 &lt;td style="text-align: left">Each row becomes a point&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>vector&lt;/code> column&lt;/td>
 &lt;td style="text-align: left">Vector&lt;/td>
 &lt;td style="text-align: left">Mapped automatically&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Other columns&lt;/td>
 &lt;td style="text-align: left">Payload&lt;/td>
 &lt;td style="text-align: left">All non-vector columns become payload fields&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>vector_cosine_ops&lt;/code>&lt;/td>
 &lt;td style="text-align: left">&lt;code>Cosine&lt;/code>&lt;/td>
 &lt;td style="text-align: left">pgvector returns distance (1 - similarity); Qdrant returns similarity&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>vector_l2_ops&lt;/code>&lt;/td>
 &lt;td style="text-align: left">&lt;code>Euclid&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Direct mapping&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>vector_ip_ops&lt;/code>&lt;/td>
 &lt;td style="text-align: left">&lt;code>Dot&lt;/code>&lt;/td>
 &lt;td style="text-align: left">pgvector uses negative inner product for ordering; scores will be inverted&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
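The notes in the last two rows translate into simple score arithmetic. A small, database-free illustration of how scores relate between the two systems:

```python
def pg_cosine_distance_to_qdrant_score(distance: float) -> float:
    # pgvector's vector_cosine_ops orders rows by distance = 1 - cosine similarity;
    # Qdrant's Cosine metric returns the similarity itself
    return 1.0 - distance

def pg_ip_order_to_qdrant_score(neg_inner_product: float) -> float:
    # pgvector's vector_ip_ops orders rows by the negative inner product
    # (smaller is better); Qdrant's Dot metric returns the inner product
    # (larger is better), so the sign flips
    return -neg_inner_product

similarity = pg_cosine_distance_to_qdrant_score(0.2)  # distance 0.2 -> similarity 0.8
dot_score = pg_ip_order_to_qdrant_score(-0.9)         # ordering value -0.9 -> score 0.9
```

Keep this in mind if your application compares raw scores rather than ranks.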
&lt;h2 id="run-the-migration">Run the Migration&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">docker run --net&lt;span class="o">=&lt;/span>host --rm -it registry.cloud.qdrant.io/library/qdrant-migration pg &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --pg.url &lt;span class="s1">&amp;#39;postgres://user:password@host:5432/dbname&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --pg.table &lt;span class="s1">&amp;#39;your_embeddings_table&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --pg.key-column &lt;span class="s1">&amp;#39;id&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.url &lt;span class="s1">&amp;#39;https://your-instance.cloud.qdrant.io:6334&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.api-key &lt;span class="s1">&amp;#39;your-qdrant-api-key&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="selecting-specific-columns">Selecting Specific Columns&lt;/h3>
&lt;p>By default, all columns are migrated. Use &lt;code>--pg.columns&lt;/code> to select specific ones:&lt;/p></description></item><item><title>Indexing</title><link>https://qdrant.tech/documentation/manage-data/indexing/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/manage-data/indexing/</guid><description>&lt;h1 id="indexing">Indexing&lt;/h1>
&lt;p>A key feature of Qdrant is the effective combination of vector and traditional indexes. This is essential because a vector index alone is not enough for vector search to work effectively with filters. In simpler terms, a vector index speeds up vector search, and payload indexes speed up filtering.&lt;/p>
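For example, a payload index for a keyword field can be created per collection through the REST API (the collection and field names below are placeholders):

```http
PUT /collections/my_collection/index
{
    "field_name": "category",
    "field_schema": "keyword"
}
```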
&lt;p>The indexes in the segments exist independently, but the parameters of the indexes themselves are configured for the whole collection.&lt;/p></description></item><item><title>Migrate to a New Embedding Model</title><link>https://qdrant.tech/documentation/tutorials-operations/embedding-model-migration/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-operations/embedding-model-migration/</guid><description>&lt;h1 id="migrate-to-a-new-embedding-model-with-zero-downtime-in-qdrant">Migrate to a New Embedding Model with Zero Downtime in Qdrant&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 40 min&lt;/th>
 &lt;th>Level: Intermediate&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>When building a semantic search application, you need to &lt;a href="https://qdrant.tech/articles/how-to-choose-an-embedding-model/">choose an embedding
model&lt;/a>. Over time, you may want to switch to a different model for better
quality or cost-effectiveness. If your application is in production, this must be done with zero downtime to avoid
disrupting users. Switching models requires re-embedding all vectors in your collection, which can take time. If your
data doesn&amp;rsquo;t change, you can re-embed everything and switch to the new embeddings. However, in systems with frequent
updates, stopping the search service to re-embed is not an option.&lt;/p></description></item><item><title>Optimize Throughput</title><link>https://qdrant.tech/documentation/fastembed/fastembed-optimize/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/fastembed/fastembed-optimize/</guid><description>&lt;h1 id="optimize-fastembed-throughput">Optimize FastEmbed Throughput&lt;/h1>
&lt;p>By default, FastEmbed processes documents sequentially in the main processing thread. To optimize throughput, FastEmbed supports processing documents in parallel.&lt;/p>
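The parallel mode can be pictured as splitting the input into batches and mapping them across workers while keeping the original order. A toy sketch in plain Python, where threads and a stub model stand in for FastEmbed's worker processes:

```python
from concurrent.futures import ThreadPoolExecutor

def embed_batch(batch):
    # stub model: one fake 2-d vector per document
    return [[float(len(doc)), 1.0] for doc in batch]

def embed_parallel(docs, batch_size=2, workers=2):
    # split the dataset into batches and map them across workers;
    # executor.map returns results in the original input order
    batches = [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batch_results = list(pool.map(embed_batch, batches))
    return [vec for batch in batch_results for vec in batch]

docs = ["a", "bb", "ccc", "dddd", "eeeee"]
vectors = embed_parallel(docs)
```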
&lt;p>When parallel processing is enabled, FastEmbed splits a dataset across multiple workers, each running an independent copy of the embedding model. Internally, documents are split into batches and put on a shared input queue. Each batch is then processed by one of the workers, put on a shared output queue, and then collected and reordered to match the original input order.&lt;/p></description></item><item><title>Private Chatbot for Interactive Learning</title><link>https://qdrant.tech/documentation/examples/rag-chatbot-red-hat-openshift-haystack/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/examples/rag-chatbot-red-hat-openshift-haystack/</guid><description>&lt;h1 id="private-chatbot-for-interactive-learning">Private Chatbot for Interactive Learning&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 120 min&lt;/th>
 &lt;th>Level: Advanced&lt;/th>
 &lt;th>&lt;/th>
 &lt;th>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>With chatbots, companies can scale their training programs to accommodate a large workforce, delivering consistent and standardized learning experiences across departments, locations, and time zones. Furthermore, having already completed their online training, corporate employees might want to refer back to old course materials. Most of this information is proprietary to the company, and manually searching through an entire library of materials takes time. However, a chatbot built on this knowledge can respond in the blink of an eye.&lt;/p></description></item><item><title>Search Relevance</title><link>https://qdrant.tech/documentation/search/search-relevance/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/search/search-relevance/</guid><description>&lt;h1 id="search-relevance">Search Relevance&lt;/h1>
&lt;p>By default, Qdrant ranks search results based on vector similarity scores. However, you may wish to consider additional factors when ranking results. Qdrant offers several tools to help you accomplish this.&lt;/p>
&lt;h2 id="score-boosting">Score Boosting&lt;/h2>
&lt;p>&lt;em>Available as of v1.14.0&lt;/em>&lt;/p>
&lt;p>When introducing vector search to specific applications, sometimes business logic needs to be considered for ranking the final list of results.&lt;/p>
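Conceptually, score boosting combines the vector similarity score with payload-based terms before the final ordering. A minimal illustration in plain Python; this is not the actual Qdrant formula syntax, and the tags and boost values are made up:

```python
def boosted_score(hit, tag_boosts):
    # final score = similarity from vector search + a flat boost per payload tag
    return hit["score"] + tag_boosts.get(hit["payload"]["tag"], 0.0)

hits = [
    {"score": 0.82, "payload": {"tag": "blog"}},
    {"score": 0.80, "payload": {"tag": "docs"}},
]
# business rule: prefer documentation pages over blog posts at similar relevance
tag_boosts = {"docs": 0.10}
reranked = sorted(hits, key=lambda h: boosted_score(h, tag_boosts), reverse=True)
```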
&lt;p>A quick example is &lt;a href="https://github.com/qdrant/page-search" target="_blank" rel="noopener nofollow">our own documentation search bar&lt;/a>.
It has vectors for every part of the documentation site. If one were to perform a search by &amp;ldquo;just&amp;rdquo; using the vectors, all kinds of elements would be considered equally good results.
However, when searching for documentation, we can establish a hierarchy of importance:&lt;/p></description></item><item><title>Support</title><link/><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid/><description/></item><item><title>Synchronize with a Server</title><link>https://qdrant.tech/documentation/edge/edge-synchronization-guide/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/edge/edge-synchronization-guide/</guid><description>&lt;h1 id="synchronize-qdrant-edge-with-a-server">Synchronize Qdrant Edge with a Server&lt;/h1>
&lt;p>Qdrant Edge can be synchronized with a collection from an external Qdrant server to support use cases like:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Offload indexing&lt;/strong>: Indexing is a computationally expensive operation. By synchronizing an Edge Shard with a server collection, you can offload the indexing process to a more powerful server instance. The indexed data can then be synchronized back to the Edge Shard.&lt;/li>
&lt;li>&lt;strong>Backup and Restore&lt;/strong>: Regularly back up your Edge Shard data to a central Qdrant instance to prevent data loss. In case of hardware failure or data corruption, you can restore the data from the central instance.&lt;/li>
&lt;li>&lt;strong>Data Aggregation&lt;/strong>: Collect data from multiple Edge Shards deployed in different locations and aggregate it into a central Qdrant instance for comprehensive analysis and reporting.&lt;/li>
&lt;li>&lt;strong>Synchronization between devices&lt;/strong>: Keep data consistent across multiple edge devices by synchronizing their Edge Shards with a central Qdrant instance.&lt;/li>
&lt;/ul>
&lt;h2 id="synchronizing-qdrant-edge-with-a-server">Synchronizing Qdrant Edge with a Server&lt;/h2>
&lt;p>To support having local updates as well as updates from a centralized server, implement a setup with two Edge Shards:&lt;/p></description></item><item><title>Usage Statistics</title><link>https://qdrant.tech/documentation/operations/usage-statistics/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/operations/usage-statistics/</guid><description>&lt;h1 id="usage-statistics">Usage statistics&lt;/h1>
&lt;p>By default, the Qdrant open-source container image collects anonymized usage statistics from users in order to improve the engine. You can &lt;a href="#deactivate-telemetry">deactivate&lt;/a> this at any time, and any data that has already been collected can be &lt;a href="#request-information-deletion">deleted on request&lt;/a>.&lt;/p>
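If you prefer to opt out, usage statistics reporting can be switched off in the Qdrant configuration file, for example:

```yaml
# config/config.yaml
telemetry_disabled: true
```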
&lt;p>Deactivating this will not affect your ability to monitor the Qdrant database yourself by accessing the &lt;code>/metrics&lt;/code> or &lt;code>/telemetry&lt;/code> endpoints of your database. It will just stop sending independent, anonymized usage statistics to the Qdrant team.&lt;/p></description></item><item><title>Cloud Inference Hybrid Search</title><link>https://qdrant.tech/documentation/tutorials-and-examples/cloud-inference-hybrid-search/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-and-examples/cloud-inference-hybrid-search/</guid><description>&lt;h1 id="hybrid-search-using-qdrant-cloud-inference">Hybrid Search Using Qdrant Cloud Inference&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 30 min&lt;/th>
 &lt;th>Level: Intermediate&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>In this tutorial, we&amp;rsquo;ll walk through building a &lt;strong>hybrid semantic search engine&lt;/strong> using Qdrant Cloud&amp;rsquo;s built-in &lt;a href="https://qdrant.tech/documentation/cloud/inference/">inference&lt;/a> capabilities. You&amp;rsquo;ll learn how to:&lt;/p>
&lt;ul>
&lt;li>Automatically embed your data using &lt;a href="https://qdrant.tech/documentation/cloud/inference/">Cloud Inference&lt;/a> without needing to run local models,&lt;/li>
&lt;li>Combine dense semantic embeddings with &lt;a href="https://qdrant.tech/documentation/tutorials-search-engineering/reranking-hybrid-search/" target="_blank" rel="noopener nofollow">sparse BM25 keywords&lt;/a>, and&lt;/li>
&lt;li>Perform hybrid search using &lt;a href="https://qdrant.tech/documentation/search/hybrid-queries/">Reciprocal Rank Fusion (RRF)&lt;/a> to retrieve the most relevant results.&lt;/li>
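Reciprocal Rank Fusion itself is easy to state: each document's fused score is the sum of 1/(k + rank) over the result lists in which it appears. A minimal sketch with hypothetical document ids:

```python
def rrf(rankings, k=60):
    # rankings: ranked lists of ids (e.g. dense semantic and sparse BM25 results)
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # ranks are 0-based here, hence the +1
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d1", "d2", "d3"]   # semantic ranking
sparse = ["d2", "d4", "d1"]  # BM25 ranking
fused = rrf([dense, sparse])
```

Documents that appear near the top of both lists ("d2") are rewarded without having to normalize the two scoring scales.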
&lt;/ul>
&lt;h2 id="initialize-the-client">Initialize the Client&lt;/h2>
&lt;p>Initialize the Qdrant client after creating a &lt;a href="https://qdrant.tech/documentation/cloud/">Qdrant Cloud account&lt;/a> and a &lt;a href="https://qdrant.tech/documentation/cloud/create-cluster/">dedicated paid cluster&lt;/a>. Set &lt;code>cloud_inference&lt;/code> to &lt;code>True&lt;/code> to enable &lt;a href="https://qdrant.tech/documentation/cloud/inference/">cloud inference&lt;/a>.&lt;/p></description></item><item><title>Cluster Access</title><link>https://qdrant.tech/documentation/cloud/cluster-access/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/cloud/cluster-access/</guid><description>&lt;h1 id="accessing-qdrant-cloud-clusters">Accessing Qdrant Cloud Clusters&lt;/h1>
&lt;p>Once you have &lt;a href="https://qdrant.tech/documentation/cloud/create-cluster/">created&lt;/a> a cluster and set up an &lt;a href="https://qdrant.tech/documentation/cloud/authentication/">API key&lt;/a>, you can access your cluster through the integrated Cluster UI, the REST API, and the gRPC API.&lt;/p>
&lt;h2 id="cluster-ui">Cluster UI&lt;/h2>
&lt;p>You can access your &lt;a href="https://qdrant.tech/documentation/web-ui/">Cluster UI&lt;/a> via the Cluster Details page in the Qdrant Cloud Console. Authentication to a cluster is automatic if your cloud user has the &lt;a href="https://qdrant.tech/documentation/cloud-rbac/permission-reference/">&lt;code>read:cluster_data&lt;/code> or &lt;code>write:cluster_data&lt;/code> permission&lt;/a>. Without the correct permissions you will be prompted to enter an &lt;a href="https://qdrant.tech/documentation/cloud/authentication/">API Key&lt;/a> to access the cluster.&lt;/p></description></item><item><title>From S3 Vectors</title><link>https://qdrant.tech/documentation/migrate-to-qdrant/from-s3-vectors/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/migrate-to-qdrant/from-s3-vectors/</guid><description>&lt;h1 id="migrate-from-s3-vectors-to-qdrant">Migrate from S3 Vectors to Qdrant&lt;/h1>
&lt;h2 id="what-you-need-from-aws">What You Need from AWS&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>S3 bucket name&lt;/strong> — the bucket containing your vector data&lt;/li>
&lt;li>&lt;strong>Index name&lt;/strong> — the S3 Vectors index to migrate&lt;/li>
&lt;li>&lt;strong>AWS credentials&lt;/strong> — configured via &lt;code>aws configure&lt;/code> or environment variables (&lt;code>AWS_ACCESS_KEY_ID&lt;/code>, &lt;code>AWS_SECRET_ACCESS_KEY&lt;/code>)&lt;/li>
&lt;/ul>
&lt;aside role="status">Set your AWS credentials using the AWS CLI's &lt;code>configure&lt;/code> command or environment variables before running the migration container.&lt;/aside>
&lt;h2 id="run-the-migration">Run the Migration&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">docker run --net&lt;span class="o">=&lt;/span>host --rm -it &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> -e &lt;span class="nv">AWS_ACCESS_KEY_ID&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;your-access-key&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> -e &lt;span class="nv">AWS_SECRET_ACCESS_KEY&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;your-secret-key&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> -e &lt;span class="nv">AWS_REGION&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;us-east-1&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> registry.cloud.qdrant.io/library/qdrant-migration s3 &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --s3.bucket &lt;span class="s1">&amp;#39;your-bucket-name&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --s3.index &lt;span class="s1">&amp;#39;your-index-name&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.url &lt;span class="s1">&amp;#39;https://your-instance.cloud.qdrant.io:6334&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.api-key &lt;span class="s1">&amp;#39;your-qdrant-api-key&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="all-s3-vectors-specific-flags">All S3 Vectors-Specific Flags&lt;/h3>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Flag&lt;/th>
 &lt;th style="text-align: left">Required&lt;/th>
 &lt;th style="text-align: left">Description&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--s3.bucket&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Yes&lt;/td>
 &lt;td style="text-align: left">S3 bucket name&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--s3.index&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Yes&lt;/td>
 &lt;td style="text-align: left">S3 Vectors index name&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>AWS credentials are passed via environment variables or the default AWS credential chain, not CLI flags.&lt;/p></description></item><item><title>Implement Cohere RAG connector</title><link>https://qdrant.tech/documentation/examples/cohere-rag-connector/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/examples/cohere-rag-connector/</guid><description>&lt;h1 id="implement-custom-connector-for-cohere-rag">Implement custom connector for Cohere RAG&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 45 min&lt;/th>
 &lt;th>Level: Intermediate&lt;/th>
 &lt;th>&lt;/th>
 &lt;th>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>The usual approach to implementing Retrieval Augmented Generation requires users to build their prompts with the
relevant context the LLM may rely on and to send them to the model manually. Cohere is quite unique here, as their
models can now communicate with external tools and extract meaningful data on their own. You can connect virtually any data
source and let the Cohere LLM know how to access it. Naturally, vector search goes well with LLMs, and enabling semantic
search over your data is a typical case.&lt;/p></description></item><item><title>Low-Latency Search</title><link>https://qdrant.tech/documentation/search/low-latency-search/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/search/low-latency-search/</guid><description>&lt;h1 id="tips-for-low-latency-search-with-qdrant">Tips for Low-Latency Search with Qdrant&lt;/h1>
&lt;h2 id="scale-horizontally-with-replicas">Scale Horizontally with Replicas&lt;/h2>
&lt;p>Qdrant can be deployed in a &lt;a href="https://qdrant.tech/documentation/operations/distributed_deployment/">distributed configuration&lt;/a>. In distributed mode, multiple instances of Qdrant, called peers, operate as a single entity, called a cluster. Data is stored in &lt;a href="https://qdrant.tech/documentation/manage-data/collections/">collections&lt;/a>, which are divided into &lt;a href="https://qdrant.tech/documentation/operations/distributed_deployment/#sharding">shards&lt;/a> that are distributed across the peers. Each shard can have multiple &lt;a href="https://qdrant.tech/documentation/operations/distributed_deployment/#replication">replicas&lt;/a> for redundancy and load balancing. Because every replica of the same shard contains the same data, read requests can be distributed across replicas, reducing latency and increasing throughput.&lt;/p></description></item><item><title>Monitoring &amp; Telemetry</title><link>https://qdrant.tech/documentation/operations/monitoring/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/operations/monitoring/</guid><description>&lt;h1 id="monitoring--telemetry">Monitoring &amp;amp; Telemetry&lt;/h1>
&lt;p>Qdrant exposes its metrics in &lt;a href="https://prometheus.io/docs/instrumenting/exposition_formats/#text-based-format" target="_blank" rel="noopener nofollow">Prometheus&lt;/a>/&lt;a href="https://github.com/OpenObservability/OpenMetrics" target="_blank" rel="noopener nofollow">OpenMetrics&lt;/a> format, so you can integrate them easily
with compatible tools and monitor Qdrant with your own monitoring system. You can
use the &lt;code>/metrics&lt;/code> endpoint and configure it as a scrape target.&lt;/p>
&lt;p>Metrics endpoint: &lt;a href="http://localhost:6333/metrics" target="_blank" rel="noopener nofollow">http://localhost:6333/metrics&lt;/a>&lt;/p>
&lt;p>The integration with Qdrant is easy to
&lt;a href="https://prometheus.io/docs/prometheus/latest/getting_started/#configure-prometheus-to-monitor-the-sample-targets" target="_blank" rel="noopener nofollow">configure&lt;/a>
with Prometheus and Grafana.&lt;/p>
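A minimal scrape target for a local Qdrant instance might look like this (the job name is a placeholder):

```yaml
scrape_configs:
  - job_name: "qdrant"
    static_configs:
      - targets: ["localhost:6333"]
```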
&lt;h2 id="metrics">Metrics&lt;/h2>
&lt;p>Qdrant exposes various metrics in Prometheus/OpenMetrics format, commonly used together with Grafana for monitoring.&lt;/p></description></item><item><title>n8n Workflow Automation</title><link>https://qdrant.tech/documentation/tutorials-build-essentials/qdrant-n8n/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-build-essentials/qdrant-n8n/</guid><description>&lt;!-- ![n8n-qdrant](/documentation/examples/qdrant-n8n-2/cover.png) -->
&lt;h1 id="automate-qdrant-workflows-with-n8n">Automate Qdrant Workflows with n8n&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 45 min&lt;/th>
 &lt;th>Level: Intermediate&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>This tutorial shows how to combine Qdrant with the &lt;a href="https://n8n.io/" target="_blank" rel="noopener nofollow">n8n&lt;/a> low-code automation platform to cover &lt;strong>use cases beyond basic Retrieval-Augmented Generation (RAG)&lt;/strong>. You&amp;rsquo;ll learn how to use vector search for &lt;strong>recommendations&lt;/strong> and &lt;strong>unstructured big data analysis&lt;/strong>.&lt;/p>
&lt;aside role="status">
 Since this tutorial was created, &lt;a href="https://qdrant.tech/documentation/platforms/n8n/">an official Qdrant node for n8n&lt;/a> has been released. It simplifies workflows and replaces the HTTP request nodes used in the examples below. Watch &lt;a href="https://youtu.be/sYP_kHWptHY"> a quick video introduction&lt;/a> to it.
&lt;/aside>
&lt;h2 id="setting-up-qdrant-in-n8n">Setting Up Qdrant in n8n&lt;/h2>
&lt;p>To start using Qdrant with n8n, you need to provide your Qdrant instance credentials in the &lt;a href="https://docs.n8n.io/integrations/builtin/credentials/qdrant/#using-api-key" target="_blank" rel="noopener nofollow">credentials&lt;/a> tab. Select &lt;code>QdrantApi&lt;/code> from the list.&lt;/p></description></item><item><title>Quantization</title><link>https://qdrant.tech/documentation/manage-data/quantization/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/manage-data/quantization/</guid><description>&lt;h1 id="quantization">Quantization&lt;/h1>
&lt;p>Quantization is an optional feature in Qdrant that enables efficient storage and search of high-dimensional vectors.
By transforming original vectors into new representations, quantization compresses data while largely preserving the original relative distances between vectors.
Different quantization methods have different mechanics and tradeoffs. We will cover them in this section.&lt;/p>
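To make the compression idea concrete, here is an illustrative scalar float-to-int8 quantization round trip in plain Python; it shows the principle only and is not Qdrant's internal implementation:

```python
def quantize(vector):
    # map each component linearly into the int8 range [-127, 127]
    peak = max(abs(x) for x in vector)
    scale = peak / 127.0 if peak > 0 else 1.0
    return [round(x / scale) for x in vector], scale

def dequantize(codes, scale):
    # recover an approximation of the original float values
    return [c * scale for c in codes]

vec = [0.12, -0.5, 0.33, 1.0]
codes, scale = quantize(vec)       # each code fits in one signed byte, not 4 bytes
approx = dequantize(codes, scale)  # close to, but not exactly, the original values
```

The small reconstruction error is the accuracy/storage tradeoff mentioned above.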
&lt;p>Quantization is primarily used to reduce the memory footprint and accelerate the search process in high-dimensional vector spaces.
In the context of Qdrant, quantization allows you to optimize the search engine for specific use cases, striking a balance between accuracy, storage efficiency, and search speed.&lt;/p></description></item><item><title>Support</title><link>https://qdrant.tech/documentation/support/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/support/</guid><description>&lt;h1 id="qdrant-cloud-support-and-troubleshooting">Qdrant Cloud Support and Troubleshooting&lt;/h1>
&lt;h2 id="community-support">Community Support&lt;/h2>
&lt;p>All Qdrant Cloud users are welcome to join our &lt;a href="https://qdrant.to/discord/" target="_blank" rel="noopener nofollow">Discord community&lt;/a>.&lt;/p>
&lt;p>&lt;img src="https://qdrant.tech/documentation/cloud/discord.png" alt="Discord">&lt;/p>
&lt;h2 id="qdrant-cloud-support">Qdrant Cloud Support&lt;/h2>
&lt;p>Paying customers have access to our Support team. Links to the support portal are available in the Qdrant Cloud Console.&lt;/p>
&lt;p>&lt;img src="https://qdrant.tech/documentation/cloud/support-portal.png" alt="Support Portal">&lt;/p>
&lt;p>Support is handled via &lt;strong>Jira Service Management (JSM)&lt;/strong>. When creating a support ticket, you will be asked to select a request type and provide information to help us understand and prioritize your issue.&lt;/p></description></item><item><title>Managed Cloud Prometheus Monitoring</title><link>https://qdrant.tech/documentation/tutorials-and-examples/managed-cloud-prometheus/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-and-examples/managed-cloud-prometheus/</guid><description>&lt;h1 id="monitoring-managed-cloud-with-prometheus-and-grafana">Monitoring Managed Cloud with Prometheus and Grafana&lt;/h1>
&lt;p>This tutorial will guide you through the process of setting up Prometheus and Grafana to monitor Qdrant databases running in Qdrant Managed Cloud.&lt;/p>
&lt;h2 id="prerequisites">Prerequisites&lt;/h2>
&lt;p>This tutorial assumes that you already have a Kubernetes cluster running where you want to deploy your monitoring stack, and a Qdrant database created in Qdrant Managed Cloud. You should also have &lt;code>kubectl&lt;/code> and &lt;code>helm&lt;/code> configured to interact with your cluster.&lt;/p></description></item><item><title>Security</title><link>https://qdrant.tech/documentation/cloud-security/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/cloud-security/</guid><description>&lt;h1 id="qdrant-cloud-security">Qdrant Cloud Security&lt;/h1>
&lt;h2 id="compliance-and-certifications">Compliance and Certifications&lt;/h2>
&lt;p>Qdrant is committed to maintaining high standards of security and compliance. We are both SOC2 Type 2 and HIPAA certified, ensuring that our systems and processes meet rigorous security criteria. You can find our compliance reports in our &lt;a href="https://qdrant.to/trust-center" target="_blank" rel="noopener nofollow">Trust Center&lt;/a>. The trust center also contains our internal security policies and procedures, so you can learn how we manage data protection, vulnerabilities, disaster recovery, incident responses, and more.&lt;/p></description></item><item><title>Self-Hosted Prometheus Monitoring</title><link>https://qdrant.tech/documentation/tutorials-and-examples/hybrid-cloud-prometheus/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-and-examples/hybrid-cloud-prometheus/</guid><description>&lt;h1 id="monitoring-hybridprivate-cloud-with-prometheus-and-grafana">Monitoring Hybrid/Private Cloud with Prometheus and Grafana&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 30 min&lt;/th>
 &lt;th>Level: Intermediate&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>This tutorial will guide you through the process of setting up Prometheus and Grafana to monitor Qdrant databases running in a Kubernetes cluster used for Hybrid or Private Cloud.&lt;/p>
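&lt;p>Qdrant exposes metrics in Prometheus format on its &lt;code>/metrics&lt;/code> endpoint. As a rough sketch (the job name and target address below are placeholders for your own deployment), a Prometheus scrape configuration for it might look like:&lt;/p>

```yaml
# Illustrative Prometheus scrape config; replace the target with the address
# of your Qdrant service inside the cluster.
scrape_configs:
  - job_name: qdrant
    metrics_path: /metrics
    static_configs:
      - targets: ['qdrant.qdrant-namespace.svc:6333']
```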
&lt;h2 id="prerequisites">Prerequisites&lt;/h2>
&lt;p>This tutorial assumes that you already have a Kubernetes cluster running and a Qdrant database deployed in it, using either a Hybrid Cloud or Private Cloud deployment. You should also have &lt;code>kubectl&lt;/code> and &lt;code>helm&lt;/code> configured to interact with your cluster.&lt;/p></description></item><item><title>From Chroma</title><link>https://qdrant.tech/documentation/migrate-to-qdrant/from-chroma/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/migrate-to-qdrant/from-chroma/</guid><description>&lt;h1 id="migrate-from-chroma-to-qdrant">Migrate from Chroma to Qdrant&lt;/h1>
&lt;h2 id="what-you-need-from-chroma">What You Need from Chroma&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Chroma URL&lt;/strong> — the HTTP endpoint of your Chroma server&lt;/li>
&lt;li>&lt;strong>Collection name&lt;/strong> — the collection to migrate&lt;/li>
&lt;li>&lt;strong>Authentication&lt;/strong> — API token or basic auth credentials, if configured&lt;/li>
&lt;/ul>
&lt;h2 id="concept-mapping">Concept Mapping&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Chroma&lt;/th>
 &lt;th style="text-align: left">Qdrant&lt;/th>
 &lt;th style="text-align: left">Notes&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">Collection&lt;/td>
 &lt;td style="text-align: left">Collection&lt;/td>
 &lt;td style="text-align: left">One-to-one mapping&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Document&lt;/td>
 &lt;td style="text-align: left">Point&lt;/td>
 &lt;td style="text-align: left">Each document becomes a point&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Embeddings&lt;/td>
 &lt;td style="text-align: left">Vector&lt;/td>
 &lt;td style="text-align: left">Mapped automatically&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Metadata&lt;/td>
 &lt;td style="text-align: left">Payload&lt;/td>
 &lt;td style="text-align: left">Direct mapping&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Documents (text)&lt;/td>
 &lt;td style="text-align: left">Payload field&lt;/td>
 &lt;td style="text-align: left">Stored via &lt;code>--qdrant.document-field&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="run-the-migration">Run the Migration&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">docker run --net&lt;span class="o">=&lt;/span>host --rm -it registry.cloud.qdrant.io/library/qdrant-migration chroma &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --chroma.url &lt;span class="s1">&amp;#39;http://localhost:8000&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --chroma.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.url &lt;span class="s1">&amp;#39;https://your-instance.cloud.qdrant.io:6334&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.api-key &lt;span class="s1">&amp;#39;your-qdrant-api-key&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="with-authentication">With Authentication&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">docker run --net&lt;span class="o">=&lt;/span>host --rm -it registry.cloud.qdrant.io/library/qdrant-migration chroma &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --chroma.url &lt;span class="s1">&amp;#39;https://your-chroma-host:8000&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --chroma.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --chroma.auth-type token &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --chroma.token &lt;span class="s1">&amp;#39;your-chroma-token&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.url &lt;span class="s1">&amp;#39;https://your-instance.cloud.qdrant.io:6334&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.api-key &lt;span class="s1">&amp;#39;your-qdrant-api-key&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="all-chroma-specific-flags">All Chroma-Specific Flags&lt;/h3>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Flag&lt;/th>
 &lt;th style="text-align: left">Required&lt;/th>
 &lt;th style="text-align: left">Description&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--chroma.url&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Chroma HTTP endpoint (default: &lt;code>http://localhost:8000&lt;/code>)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--chroma.collection&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Yes&lt;/td>
 &lt;td style="text-align: left">Collection name to migrate&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--chroma.tenant&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Chroma tenant&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--chroma.database&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Chroma database&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--chroma.auth-type&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">&lt;code>none&lt;/code>, &lt;code>basic&lt;/code>, or &lt;code>token&lt;/code> (default: &lt;code>none&lt;/code>)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--chroma.username&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Username (when auth-type is &lt;code>basic&lt;/code>)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--chroma.password&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Password (when auth-type is &lt;code>basic&lt;/code>)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--chroma.token&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Token (when auth-type is &lt;code>token&lt;/code>)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--chroma.token-header&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Custom header name for token auth&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h3 id="qdrant-side-options">Qdrant-Side Options&lt;/h3>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Flag&lt;/th>
 &lt;th style="text-align: left">Default&lt;/th>
 &lt;th style="text-align: left">Description&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--qdrant.document-field&lt;/code>&lt;/td>
 &lt;td style="text-align: left">&lt;code>document&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Payload field name to store Chroma document text&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--qdrant.id-field&lt;/code>&lt;/td>
 &lt;td style="text-align: left">&lt;code>__id__&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Payload field name for original Chroma IDs&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--qdrant.distance-metric&lt;/code>&lt;/td>
 &lt;td style="text-align: left">&lt;code>euclid&lt;/code>&lt;/td>
 &lt;td style="text-align: left">&lt;code>cosine&lt;/code>, &lt;code>dot&lt;/code>, &lt;code>manhattan&lt;/code>, or &lt;code>euclid&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="gotchas">Gotchas&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Document text:&lt;/strong> Chroma stores raw document text alongside embeddings. Use &lt;code>--qdrant.document-field&lt;/code> to preserve this text as a payload field in Qdrant.&lt;/li>
&lt;li>&lt;strong>ID mapping:&lt;/strong> Chroma uses string IDs. The migration tool maps these to Qdrant point IDs and stores the original Chroma ID in a payload field (default: &lt;code>__id__&lt;/code>).&lt;/li>
&lt;li>&lt;strong>Distance metric:&lt;/strong> Chroma defaults to L2 distance. Verify which metric your collection uses and set &lt;code>--qdrant.distance-metric&lt;/code> accordingly.&lt;/li>
&lt;/ul>
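&lt;p>To illustrate the ID-mapping gotcha above: Qdrant point IDs must be unsigned integers or UUIDs, so Chroma&amp;rsquo;s string IDs cannot be used directly. One hypothetical way to derive a deterministic UUID from a string ID (the migration tool&amp;rsquo;s actual scheme may differ) is:&lt;/p>

```python
# Hypothetical sketch: derive a deterministic, Qdrant-compatible UUID from a
# Chroma string ID. The migration tool's real mapping may differ; it also keeps
# the original string ID in a payload field (default: "__id__").
import uuid

def chroma_id_to_point_id(chroma_id: str) -> str:
    return str(uuid.uuid5(uuid.NAMESPACE_URL, chroma_id))

point_id = chroma_id_to_point_id("doc-42")
payload = {"__id__": "doc-42"}  # original ID preserved for lookups
```

&lt;p>Because the mapping is deterministic, re-running a migration produces the same point IDs for the same source documents.&lt;/p>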
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;p>After migration, verify your data arrived correctly with the &lt;a href="https://qdrant.tech/documentation/migration-guidance/">Migration Verification Guide&lt;/a>.&lt;/p></description></item><item><title>Large-Scale Search</title><link>https://qdrant.tech/documentation/tutorials-operations/large-scale-search/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-operations/large-scale-search/</guid><description>&lt;h1 id="large-scale-search-in-qdrant">Large-Scale Search in Qdrant&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 2 days&lt;/th>
 &lt;th>Level: Advanced&lt;/th>
 &lt;th>&lt;/th>
 &lt;th>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>In this tutorial, we will describe an approach to uploading, indexing, and searching a large volume of data cost-efficiently,
using the real-world &lt;a href="https://laion.ai/blog/laion-400-open-dataset/" target="_blank" rel="noopener nofollow">LAION-400M&lt;/a> dataset as an example.&lt;/p>
&lt;p>The goal of this tutorial is to demonstrate the minimal amount of resources required to index and search a large dataset
while still maintaining reasonable search latency and accuracy.&lt;/p>
&lt;p>All relevant code snippets are available in the &lt;a href="https://github.com/qdrant/laion-400m-benchmark" target="_blank" rel="noopener nofollow">GitHub repository&lt;/a>.&lt;/p></description></item><item><title>Multitenancy</title><link>https://qdrant.tech/documentation/manage-data/multitenancy/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/manage-data/multitenancy/</guid><description>&lt;h1 id="configure-multitenancy">Configure Multitenancy&lt;/h1>
&lt;aside role="alert">
It is not recommended to create hundreds or thousands of collections per cluster, as doing so increases resource overhead unsustainably. This leads to increased costs and, eventually, to performance degradation and cluster instability. In Qdrant Cloud, the number of collections per cluster is limited to 1000.
&lt;/aside>
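&lt;p>Instead of creating one collection per tenant, you can keep all tenants in a single collection and scope every query with a payload filter. A minimal sketch of such a tenant-scoped search request body (the &lt;code>group_id&lt;/code> field name and values are illustrative, not a fixed convention):&lt;/p>

```python
# Illustrative body of a tenant-scoped search request: every point carries a
# "group_id" payload value, and each query filters on it, so tenants never
# see each other's data. Field names and values here are examples.
search_request = {
    "vector": [0.2, 0.1, 0.9, 0.7],
    "filter": {
        "must": [
            {"key": "group_id", "match": {"value": "tenant_1"}}
        ]
    },
    "limit": 10,
}
```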
&lt;p>&lt;strong>How many collections should you create?&lt;/strong> In most cases, a single collection per embedding model with payload-based partitioning for different tenants and use cases. This approach is called multitenancy. It is efficient for most users, but requires additional configuration. This document will show you how to set it up.&lt;/p></description></item><item><title>Question-Answering System for AI Customer Support</title><link>https://qdrant.tech/documentation/examples/rag-customer-support-cohere-airbyte-aws/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/examples/rag-customer-support-cohere-airbyte-aws/</guid><description>&lt;h1 id="question-answering-system-for-ai-customer-support">Question-Answering System for AI Customer Support&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 120 min&lt;/th>
 &lt;th>Level: Advanced&lt;/th>
 &lt;th>&lt;/th>
 &lt;th>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>Maintaining top-notch customer service is vital to business success. As your operation expands, so does the influx of customer queries. Many of these queries are repetitive, making automation a time-saving solution.
Your support team&amp;rsquo;s expertise is typically kept private, but you can still use AI to automate responses securely.&lt;/p>
&lt;p>In this tutorial, we will set up a private AI service that answers customer support queries with high accuracy and effectiveness. By leveraging Cohere&amp;rsquo;s powerful models (deployed to &lt;a href="https://cohere.com/deployment-options/aws" target="_blank" rel="noopener nofollow">AWS&lt;/a>) with Qdrant Hybrid Cloud, you can create a fully private customer support system. Data synchronization, facilitated by &lt;a href="https://airbyte.com/" target="_blank" rel="noopener nofollow">Airbyte&lt;/a>, will complete the setup.&lt;/p></description></item><item><title>Security</title><link>https://qdrant.tech/documentation/operations/security/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/operations/security/</guid><description>&lt;h1 id="security">Security&lt;/h1>
&lt;p>Qdrant supports various security features to help you secure your instance. Most
of these must be explicitly configured to make your instance
production-ready. Please read the following section carefully.&lt;/p>
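&lt;p>As a sketch of what such configuration can look like for a self-hosted instance (the API key and certificate paths below are placeholders), API-key authentication and TLS are enabled in the configuration file:&lt;/p>

```yaml
# Illustrative config fragment; replace the api_key and certificate paths
# with your own values before use.
service:
  api_key: your-secret-api-key
  enable_tls: true
tls:
  cert: ./tls/cert.pem
  key: ./tls/key.pem
```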
&lt;h2 id="secure-your-instance">Secure Your Instance&lt;/h2>
&lt;aside role="alert">Custom deployments are &lt;b>not&lt;/b> secure by default and are &lt;b>not&lt;/b> production ready. Qdrant Cloud deployments are always secure and production ready.&lt;/aside>
&lt;p>By default, self-deployed Qdrant instances are not secure: they listen on
all network interfaces and have no authentication configured, so they
may be reachable by anyone on the internet without any restrictions. You must
therefore take security measures to make your instance production-ready.
Please read through this section carefully for instructions on how to secure
your instance.&lt;/p></description></item><item><title>Video Anomaly Detection Part 1: Architecture, Twelve Labs, and NVIDIA VSS</title><link>https://qdrant.tech/documentation/tutorials-build-essentials/video-anomaly-edge-part-1/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-build-essentials/video-anomaly-edge-part-1/</guid><description>&lt;h1 id="video-anomaly-detection-architecture-twelve-labs-and-nvidia-vss">Video Anomaly Detection: Architecture, Twelve Labs, and NVIDIA VSS&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 90 min&lt;/th>
 &lt;th>Level: Advanced&lt;/th>
 &lt;th>Output: &lt;a href="https://github.com/qdrant/video-anomaly-edge" target="_blank" rel="noopener nofollow">GitHub&lt;/a>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>&lt;em>This is Part 1 of a 3-part series on building real-time video anomaly detection from edge to cloud. We&amp;rsquo;ll go from architecture and integrations to a production-grade detection pipeline.&lt;/em>&lt;/p>
&lt;p>&lt;strong>Series:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Part 1 | Architecture, Twelve Labs, and NVIDIA VSS (here)&lt;/li>
&lt;li>&lt;a href="https://qdrant.tech/documentation/tutorials-build-essentials/video-anomaly-edge-part-2/">Part 2 | Edge-to-Cloud Pipeline&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://qdrant.tech/documentation/tutorials-build-essentials/video-anomaly-edge-part-3/">Part 3 | Scoring, Governance, and Deployment&lt;/a>&lt;/li>
&lt;/ul>
&lt;hr>
&lt;p>In this tutorial, you will learn how to build a real-time video anomaly detection system that monitors live surveillance cameras across multiple sites, automatically detecting unusual events without training on specific anomaly types. You&amp;rsquo;ll see how Qdrant Edge integrates with Twelve Labs and NVIDIA Metropolis VSS to create a production-grade edge-to-cloud detection pipeline deployed on Vultr Cloud GPUs.&lt;/p></description></item><item><title>Working with miniCOIL</title><link>https://qdrant.tech/documentation/fastembed/fastembed-minicoil/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/fastembed/fastembed-minicoil/</guid><description>&lt;h1 id="how-to-use-minicoil-qdrants-sparse-neural-retriever">How to use miniCOIL, Qdrant&amp;rsquo;s Sparse Neural Retriever&lt;/h1>
&lt;p>&lt;strong>miniCOIL&lt;/strong> is an open-source sparse neural retrieval model that acts as if a BM25-based retriever understood the contextual meaning of keywords and ranked results accordingly.&lt;/p>
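&lt;p>As a toy numerical sketch of that idea (all numbers below are invented), a BM25-style per-term score is scaled by how semantically close the matched keyword&amp;rsquo;s meaning in the document is to its meaning in the query:&lt;/p>

```python
# Toy sketch with invented numbers: a BM25-style term contribution scaled by a
# semantic-similarity ("meaning") factor, as in miniCOIL's described scoring.
def term_contribution(idf, importance, meaning_similarity):
    return idf * importance * meaning_similarity

# The keyword "bat" matched in two documents, for a baseball-related query:
baseball_doc = term_contribution(idf=2.1, importance=0.8, meaning_similarity=0.95)
animal_doc = term_contribution(idf=2.1, importance=0.8, meaning_similarity=0.20)
# Plain BM25 would score both matches equally; the meaning factor ranks the
# baseball document higher.
```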
&lt;p>&lt;strong>miniCOIL&lt;/strong> scoring is based on the BM25 formula scaled by the semantic similarity between matched keywords in a query and a document.
$$
\text{miniCOIL}(D,Q) = \sum_{i=1}^{N} \text{IDF}(q_i) \cdot \text{Importance}^{q_i}_{D} \cdot {\color{YellowGreen}\text{Meaning}^{q_i \times d_j}} \text{, where keyword } d_j \in D \text{ equals } q_i
$$&lt;/p></description></item><item><title>Chat With Product PDF Manuals Using Hybrid Search</title><link>https://qdrant.tech/documentation/examples/hybrid-search-llamaindex-jinaai/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/examples/hybrid-search-llamaindex-jinaai/</guid><description>&lt;h1 id="chat-with-product-pdf-manuals-using-hybrid-search">Chat With Product PDF Manuals Using Hybrid Search&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 120 min&lt;/th>
 &lt;th>Level: Advanced&lt;/th>
 &lt;th>Output: &lt;a href="https://github.com/infoslack/qdrant-example/blob/main/HC-demo/HC-DO-LlamaIndex-Jina-v2.ipynb" target="_blank" rel="noopener nofollow">GitHub&lt;/a>&lt;/th>
 &lt;th>&lt;a href="https://githubtocolab.com/infoslack/qdrant-example/blob/main/HC-demo/HC-DO-LlamaIndex-Jina-v2.ipynb" target="_blank" rel="noopener nofollow">&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab">&lt;/a>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>With the proliferation of digital manuals and the increasing demand for quick and accurate customer support, having a chatbot capable of efficiently parsing through complex PDF documents and delivering precise information can be a game-changer for any business.&lt;/p>
&lt;p>In this tutorial, we&amp;rsquo;ll walk you through the process of building a RAG-based chatbot, designed specifically to assist users with understanding the operation of various household appliances.
We&amp;rsquo;ll cover the essential steps required to build your system, including data ingestion, natural language understanding, and response generation for customer support use cases.&lt;/p></description></item><item><title>Troubleshooting</title><link>https://qdrant.tech/documentation/operations/common-errors/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/operations/common-errors/</guid><description>&lt;h1 id="solving-common-errors">Solving common errors&lt;/h1>
&lt;h2 id="too-many-files-open-os-error-24">Too many files open (OS error 24)&lt;/h2>
&lt;p>Each collection segment needs some files to be open. At some point you may encounter the following errors in your server log:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="cl">Error: Too many files open (OS error 24)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>In such a case, you may need to increase the open file limit. You can do so, for example, when launching the Docker container:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">docker run --ulimit &lt;span class="nv">nofile&lt;/span>&lt;span class="o">=&lt;/span>10000:10000 qdrant/qdrant:latest
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The command above will set both soft and hard limits to &lt;code>10000&lt;/code>.&lt;/p></description></item><item><title>Video Anomaly Detection Part 2: Edge-to-Cloud Pipeline</title><link>https://qdrant.tech/documentation/tutorials-build-essentials/video-anomaly-edge-part-2/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-build-essentials/video-anomaly-edge-part-2/</guid><description>&lt;h1 id="video-anomaly-detection-edge-to-cloud-pipeline">Video Anomaly Detection: Edge-to-Cloud Pipeline&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 90 min&lt;/th>
 &lt;th>Level: Advanced&lt;/th>
 &lt;th>Output: &lt;a href="https://github.com/qdrant/video-anomaly-edge" target="_blank" rel="noopener nofollow">GitHub&lt;/a>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>&lt;em>This is Part 2 of a 3-part series on building real-time video anomaly detection from edge to cloud.&lt;/em>&lt;/p>
&lt;p>&lt;strong>Series:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://qdrant.tech/documentation/tutorials-build-essentials/video-anomaly-edge-part-1/">Part 1 | Architecture, Twelve Labs, and NVIDIA VSS&lt;/a>&lt;/li>
&lt;li>Part 2 | Edge-to-Cloud Pipeline (here)&lt;/li>
&lt;li>&lt;a href="https://qdrant.tech/documentation/tutorials-build-essentials/video-anomaly-edge-part-3/">Part 3 | Scoring, Governance, and Deployment&lt;/a>&lt;/li>
&lt;/ul>
&lt;hr>
&lt;p>In &lt;a href="https://qdrant.tech/documentation/tutorials-build-essentials/video-anomaly-edge-part-1/">Part 1&lt;/a>, we set up the project, covered why kNN anomaly detection in Qdrant outperforms classifiers, integrated Twelve Labs for video embeddings and Q&amp;amp;A, and connected NVIDIA VSS. Now we build the edge.&lt;/p></description></item><item><title>Neural Search 101: A Complete Guide and Step-by-Step Tutorial</title><link>https://qdrant.tech/articles/neural-search-tutorial/</link><pubDate>Thu, 10 Jun 2021 10:18:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/neural-search-tutorial/</guid><description>&lt;h1 id="neural-search-101-a-comprehensive-guide-and-step-by-step-tutorial">Neural Search 101: A Comprehensive Guide and Step-by-Step Tutorial&lt;/h1>
&lt;p>Information retrieval is one of the core technologies that enabled the modern Internet to exist.
These days, search technology is at the heart of a variety of applications,
from web page search to product recommendations.
For many years, this technology changed very little, until neural networks came into play.&lt;/p>
&lt;p>In this guide we are going to find answers to these questions:&lt;/p></description></item><item><title>Configuration</title><link>https://qdrant.tech/documentation/operations/configuration/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/operations/configuration/</guid><description>&lt;h1 id="configuration">Configuration&lt;/h1>
&lt;p>Qdrant ships with sensible defaults for collection and network settings that are suitable for most use cases. You can view these defaults in the &lt;a href="https://github.com/qdrant/qdrant/blob/master/config/config.yaml" target="_blank" rel="noopener nofollow">Qdrant source&lt;/a>. If you need to customize the settings, you can do so using configuration files and environment variables.&lt;/p>
&lt;aside role="status">
 Qdrant Cloud does not allow modifying the Qdrant configuration.
&lt;/aside>
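&lt;p>For self-hosted instances, a custom configuration file might, for example, override the defaults like this (the values below are illustrative, not recommendations):&lt;/p>

```yaml
# Illustrative custom config file with example values; any key that is not set
# here falls back to Qdrant's built-in defaults.
storage:
  storage_path: ./storage
service:
  http_port: 6333
  grpc_port: 6334
```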
&lt;h2 id="configuration-files">Configuration Files&lt;/h2>
&lt;p>To customize Qdrant, you can mount your configuration file in any of the following locations. This guide uses &lt;code>.yaml&lt;/code> files, but Qdrant also supports other formats such as &lt;code>.toml&lt;/code>, &lt;code>.json&lt;/code>, and &lt;code>.ini&lt;/code>.&lt;/p></description></item><item><title>Qdrant Multi-Vector Certification</title><link>https://qdrant.tech/course/multi-vector-search/certification/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/course/multi-vector-search/certification/</guid><description>&lt;h1 id="qdrant-multi-vector-search-certification">Qdrant Multi-Vector Search Certification&lt;/h1>
&lt;p>Congratulations! You’ve completed the &lt;strong>Multi-Vector Search course&lt;/strong>. You didn’t just learn how to store vectors; you learned how to build high-performance retrieval systems using late interaction models and multi-vector representations.&lt;/p>
&lt;p>You’ve moved past single-vector embeddings and dove deep into ColBERT, ColPali, MaxSim scoring, MUVERA, and production-grade multi-vector pipelines. That effort deserves more than just a “finished” status. It deserves professional recognition!&lt;/p>
&lt;h2 id="get-qdrantcertified">Get #QdrantCertified&lt;/h2>
&lt;p>Your expertise is now production-ready. It’s time to validate those skills with our official certification.&lt;/p></description></item><item><title>Region-Specific Contract Management System</title><link>https://qdrant.tech/documentation/examples/rag-contract-management-stackit-aleph-alpha/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/examples/rag-contract-management-stackit-aleph-alpha/</guid><description>&lt;h1 id="region-specific-contract-management-system">Region-Specific Contract Management System&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 90 min&lt;/th>
 &lt;th>Level: Advanced&lt;/th>
 &lt;th>&lt;/th>
 &lt;th>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>Contract management benefits greatly from Retrieval Augmented Generation (RAG), streamlining the handling of lengthy business contract texts. With AI assistance, complex questions can be asked and well-informed answers generated, facilitating efficient document management. This proves invaluable for businesses with extensive relationships, like shipping companies, construction firms, and consulting practices. Access to such contracts is often restricted to authorized team members due to security and regulatory requirements, such as GDPR in Europe, necessitating secure storage practices.&lt;/p></description></item><item><title>Scale Clusters</title><link>https://qdrant.tech/documentation/cloud/cluster-scaling/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/cloud/cluster-scaling/</guid><description>&lt;h1 id="scaling-qdrant-cloud-clusters">Scaling Qdrant Cloud Clusters&lt;/h1>
&lt;p>The amount of data is always growing, and at some point you might need to change the capacity of your cluster. You can easily scale your Qdrant cluster up or down from the Cluster detail page in the Qdrant Cloud console.&lt;/p>
&lt;p>&lt;img src="https://qdrant.tech/documentation/cloud/cluster-scaling.png" alt="Cluster Scaling">&lt;/p>
&lt;h2 id="vertical-scaling">Vertical Scaling&lt;/h2>
&lt;p>Vertical scaling is the process of increasing the capacity of a cluster by adding or removing CPU, storage and memory resources on each database node.&lt;/p></description></item><item><title>Video Anomaly Detection Part 3: Scoring, Governance, and Deployment</title><link>https://qdrant.tech/documentation/tutorials-build-essentials/video-anomaly-edge-part-3/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-build-essentials/video-anomaly-edge-part-3/</guid><description>&lt;h1 id="video-anomaly-detection-scoring-governance-and-deployment">Video Anomaly Detection: Scoring, Governance, and Deployment&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 90 min&lt;/th>
 &lt;th>Level: Advanced&lt;/th>
 &lt;th>Output: &lt;a href="https://github.com/qdrant/video-anomaly-edge" target="_blank" rel="noopener nofollow">GitHub&lt;/a>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>&lt;em>This is Part 3 of a 3-part series on building real-time video anomaly detection from edge to cloud.&lt;/em>&lt;/p>
&lt;p>&lt;strong>Series:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://qdrant.tech/documentation/tutorials-build-essentials/video-anomaly-edge-part-1/">Part 1 | Architecture, Twelve Labs, and NVIDIA VSS&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://qdrant.tech/documentation/tutorials-build-essentials/video-anomaly-edge-part-2/">Part 2 | Edge-to-Cloud Pipeline&lt;/a>&lt;/li>
&lt;li>Part 3 | Scoring, Governance, and Deployment (here)&lt;/li>
&lt;/ul>
&lt;hr>
&lt;p>In &lt;a href="https://qdrant.tech/documentation/tutorials-build-essentials/video-anomaly-edge-part-1/">Part 1&lt;/a>, we set up the architecture, Twelve Labs integration, and NVIDIA VSS connection. In &lt;a href="https://qdrant.tech/documentation/tutorials-build-essentials/video-anomaly-edge-part-2/">Part 2&lt;/a>, we built Qdrant Edge&amp;rsquo;s two-shard architecture and the escalation pipeline. Now we turn raw scores into incidents, protect the baseline, and deploy.&lt;/p></description></item><item><title>Working with SPLADE</title><link>https://qdrant.tech/documentation/fastembed/fastembed-splade/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/fastembed/fastembed-splade/</guid><description>&lt;h1 id="how-to-generate-sparse-vectors-with-splade">How to Generate Sparse Vectors with SPLADE&lt;/h1>
&lt;p>SPLADE is a novel method for learning sparse text representation vectors, outperforming BM25 in tasks like information retrieval and document classification. Its main advantage is generating efficient and interpretable sparse vectors, making it effective for large-scale text data.&lt;/p>
&lt;h2 id="setup">Setup&lt;/h2>
&lt;p>First, install FastEmbed.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">pip&lt;/span> &lt;span class="n">install&lt;/span> &lt;span class="o">-&lt;/span>&lt;span class="n">q&lt;/span> &lt;span class="n">fastembed&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Next, import the required classes for sparse embeddings.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">fastembed&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">SparseTextEmbedding&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">SparseEmbedding&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>You may always check the list of all supported sparse embedding models.&lt;/p></description></item><item><title>Administration</title><link>https://qdrant.tech/documentation/operations/administration/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/operations/administration/</guid><description>&lt;h1 id="administration">Administration&lt;/h1>
&lt;p>Qdrant exposes administration tools that let you modify the behavior of a Qdrant instance at runtime without manually changing its configuration.&lt;/p>
&lt;h2 id="recovery-mode">Recovery mode&lt;/h2>
&lt;p>&lt;em>Available as of v1.2.0&lt;/em>&lt;/p>
&lt;p>Recovery mode can help in situations where Qdrant fails to start repeatedly.
When starting in recovery mode, Qdrant only loads collection metadata to prevent
running out of memory. This allows you to resolve out-of-memory situations, for
example, by deleting a collection. After resolving the issue, Qdrant can be restarted
normally to continue operation.&lt;/p></description></item><item><title>Configure Clusters</title><link>https://qdrant.tech/documentation/cloud/configure-cluster/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/cloud/configure-cluster/</guid><description>&lt;h1 id="configure-qdrant-cloud-clusters">Configure Qdrant Cloud Clusters&lt;/h1>
&lt;p>Qdrant Cloud offers several advanced configuration options to optimize clusters for your specific needs. You can access these options from the Cluster Details page in the Qdrant Cloud console.&lt;/p>
&lt;p>The cloud platform does not expose all &lt;a href="https://qdrant.tech/documentation/operations/configuration/">configuration options&lt;/a> available in Qdrant. We have selected the relevant options that are explained in detail below.&lt;/p>
&lt;p>In addition, the cloud platform automatically configures the following settings for your cluster to ensure optimal performance and reliability:&lt;/p></description></item><item><title>Monitor Clusters</title><link>https://qdrant.tech/documentation/cloud/cluster-monitoring/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/cloud/cluster-monitoring/</guid><description>&lt;h1 id="monitoring-qdrant-cloud-clusters">Monitoring Qdrant Cloud Clusters&lt;/h1>
&lt;h2 id="telemetry">Telemetry&lt;/h2>
&lt;p>&lt;img src="https://qdrant.tech/documentation/cloud/cluster-metrics.png" alt="Cluster Metrics">&lt;/p>
&lt;p>Qdrant Cloud provides you with a set of metrics to monitor the health of your database cluster. You can access these metrics in the Qdrant Cloud Console in the &lt;strong>Metrics&lt;/strong> and &lt;strong>Request&lt;/strong> sections of the Cluster Details page.&lt;/p>
&lt;h2 id="logs">Logs&lt;/h2>
&lt;p>&lt;img src="https://qdrant.tech/documentation/cloud/cluster-logs.png" alt="Cluster Logs">&lt;/p>
&lt;p>Logs of the database cluster are available in the Qdrant Cloud Console in the &lt;strong>Logs&lt;/strong> section of the Cluster Details page.&lt;/p>
&lt;h2 id="alerts">Alerts&lt;/h2>
&lt;p>The account owner will receive automatic alerts via email if your cluster has any of the following issues:&lt;/p></description></item><item><title>RAG System for Employee Onboarding</title><link>https://qdrant.tech/documentation/examples/natural-language-search-oracle-cloud-infrastructure-cohere-langchain/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/examples/natural-language-search-oracle-cloud-infrastructure-cohere-langchain/</guid><description>&lt;h1 id="rag-system-for-employee-onboarding">RAG System for Employee Onboarding&lt;/h1>
&lt;p>Public websites are a great way to share information with a wide audience. However, finding the right information can be
challenging if you are not familiar with the website&amp;rsquo;s structure or the terminology used. That&amp;rsquo;s what the search bar is
for, but formulating a query that returns the desired results is not always easy if you are not yet familiar
with the content. This is even more important in a corporate environment, and for new employees, who are just
starting to learn the ropes and don&amp;rsquo;t yet know how to ask the right questions. You may have the best intranet
pages, but onboarding is more than just reading the documentation; it is about understanding the processes. Semantic
search can make finding the right resources easier, but wouldn&amp;rsquo;t it be easier still to just chat with the website, like you
would with a colleague?&lt;/p></description></item><item><title>Update Clusters</title><link>https://qdrant.tech/documentation/cloud/cluster-upgrades/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/cloud/cluster-upgrades/</guid><description>&lt;h1 id="updating-qdrant-cloud-clusters">Updating Qdrant Cloud Clusters&lt;/h1>
&lt;p>As soon as a new Qdrant version is available, Qdrant Cloud will show you an update notification in the Cluster list and on the Cluster details page.&lt;/p>
&lt;p>To update to a new version, go to the Cluster Details page, choose the new version from the version dropdown and click &lt;strong>Update&lt;/strong>.&lt;/p>
&lt;p>If you are several versions behind, multiple updates might be required to reach the latest version. In this case, Qdrant Cloud will automatically perform the required intermediate updates to ensure a supported update path. You need to ensure that your client applications and used SDKs are compatible with the target version.&lt;/p></description></item><item><title>Filterable HNSW</title><link>https://qdrant.tech/articles/filterable-hnsw/</link><pubDate>Sun, 24 Nov 2019 22:44:08 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/filterable-hnsw/</guid><description>&lt;p>If you need to find some similar objects in vector space, provided e.g. by embeddings or matching NN, you can choose among a variety of libraries: Annoy, FAISS or NMSLib.
All of them will give you fast approximate nearest-neighbor search within almost any space.&lt;/p>
&lt;p>But what if you need to introduce some constraints in your search?
For example, you want to search only for products in some category, or to select the most similar customer of a particular brand.
I did not find any simple solutions for this.
There are several discussions like &lt;a href="https://github.com/spotify/annoy/issues/263" target="_blank" rel="noopener nofollow">this&lt;/a>, but they only suggest iterating over the top search results and applying the conditions afterwards.&lt;/p></description></item><item><title>Distributed Deployment</title><link>https://qdrant.tech/documentation/operations/distributed_deployment/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/operations/distributed_deployment/</guid><description>&lt;h1 id="distributed-deployment">Distributed deployment&lt;/h1>
&lt;p>Since version v0.8.0 Qdrant supports a distributed deployment mode.
In this mode, multiple Qdrant services communicate with each other to distribute the data across the peers to extend the storage capabilities and increase stability.&lt;/p>
&lt;h2 id="how-many-qdrant-nodes-should-i-run">How many Qdrant nodes should I run?&lt;/h2>
&lt;p>The ideal number of Qdrant nodes depends on how much you value cost-saving, resilience, and performance/scalability in relation to each other.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Prioritizing cost-saving&lt;/strong>: If cost is most important to you, run a single Qdrant node. This is not recommended for production environments. Drawbacks:&lt;/p></description></item><item><title>Private RAG Information Extraction Engine</title><link>https://qdrant.tech/documentation/examples/rag-chatbot-vultr-dspy-ollama/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/examples/rag-chatbot-vultr-dspy-ollama/</guid><description>&lt;h1 id="private-rag-information-extraction-engine">Private RAG Information Extraction Engine&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 90 min&lt;/th>
 &lt;th>Level: Advanced&lt;/th>
 &lt;th>&lt;/th>
 &lt;th>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>Handling private documents is a common task in many industries. Various businesses possess a large amount of
unstructured data stored as huge files that must be processed and analyzed. Industry reports, financial analysis, legal
documents, and many other documents are stored in PDF, Word, and other formats. Conversational chatbots built on top of
RAG pipelines are one of the viable solutions for finding the relevant answers in such documents. However, if we want to
extract structured information from these documents and pass it to downstream systems, we need to use a different
approach.&lt;/p></description></item><item><title>Working with ColBERT</title><link>https://qdrant.tech/documentation/fastembed/fastembed-colbert/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/fastembed/fastembed-colbert/</guid><description>&lt;h1 id="how-to-generate-colbert-multivectors-with-fastembed">How to Generate ColBERT Multivectors with FastEmbed&lt;/h1>
&lt;h2 id="colbert">ColBERT&lt;/h2>
&lt;p>ColBERT is an embedding model that produces a matrix (multivector) representation of input text,
generating one vector per token (a token being a meaningful text unit for a machine learning model).
This approach allows ColBERT to capture more nuanced input semantics than many dense embedding models,
which represent an entire input with a single vector. By producing more granular input representations,
ColBERT becomes a strong retriever. However, this advantage comes at the cost of increased resource consumption compared to
traditional dense embedding models, both in terms of speed and memory.&lt;/p></description></item><item><title>Backup Clusters</title><link>https://qdrant.tech/documentation/cloud/backups/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/cloud/backups/</guid><description>&lt;h1 id="backing-up-qdrant-cloud-clusters">Backing up Qdrant Cloud Clusters&lt;/h1>
&lt;p>Qdrant organizes cloud instances as clusters. On occasion, you may need to
restore your cluster because of application or system failure.&lt;/p>
&lt;p>You may already have a source of truth for your data in a regular database. If you
have a problem, you could reindex the data into your Qdrant vector search cluster.
However, this process can take time. For projects where high availability is critical, we
recommend replication. It guarantees proper cluster functionality as long as
at least one replica is running.&lt;/p></description></item><item><title>Introducing Qdrant 0.11</title><link>https://qdrant.tech/articles/qdrant-0-11-release/</link><pubDate>Wed, 26 Oct 2022 13:55:00 +0200</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/qdrant-0-11-release/</guid><description>&lt;p>We are excited to &lt;a href="https://github.com/qdrant/qdrant/releases/tag/v0.11.0" target="_blank" rel="noopener nofollow">announce the release of Qdrant v0.11&lt;/a>,
which introduces a number of new features and improvements.&lt;/p>
&lt;h2 id="replication">Replication&lt;/h2>
&lt;p>One of the key features in this release is replication support, which allows Qdrant to provide a high availability
setup with distributed deployment out of the box. This, combined with sharding, enables you to horizontally scale
both the size of your collections and the throughput of your cluster. This means that you can use Qdrant to handle
large amounts of data without sacrificing performance or reliability.&lt;/p></description></item><item><title>Movie Recommendation System</title><link>https://qdrant.tech/documentation/examples/recommendation-system-ovhcloud/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/examples/recommendation-system-ovhcloud/</guid><description>&lt;h1 id="movie-recommendation-system">Movie Recommendation System&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 120 min&lt;/th>
 &lt;th>Level: Advanced&lt;/th>
 &lt;th>Output: &lt;a href="https://github.com/infoslack/qdrant-example/blob/main/HC-demo/HC-OVH.ipynb" target="_blank" rel="noopener nofollow">GitHub&lt;/a>&lt;/th>
 &lt;th>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>In this tutorial, you will build a mechanism that recommends movies based on defined preferences. Vector databases like Qdrant are good for storing high-dimensional data, such as user and item embeddings. They can enable personalized recommendations by quickly retrieving similar entries based on advanced indexing techniques. In this specific case, we will use &lt;a href="https://qdrant.tech/articles/sparse-vectors/">sparse vectors&lt;/a> to create an efficient and accurate recommendation system.&lt;/p></description></item><item><title>Running with GPU</title><link>https://qdrant.tech/documentation/operations/running-with-gpu/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/operations/running-with-gpu/</guid><description>&lt;h1 id="running-qdrant-with-gpu-support">Running Qdrant with GPU Support&lt;/h1>
&lt;p>Starting from version v1.13.0, Qdrant offers support for GPU acceleration.&lt;/p>
&lt;p>However, GPU support is not included in the default Qdrant binary due to additional dependencies and libraries. Instead, you will need to use dedicated Docker images with GPU support (&lt;a href="#nvidia-gpus">NVIDIA&lt;/a>, &lt;a href="#amd-gpus">AMD&lt;/a>).&lt;/p>
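&lt;p>A dedicated GPU image can be launched like any other Qdrant container. A minimal sketch, assuming the &lt;code>gpu-nvidia&lt;/code> image tag and an NVIDIA container runtime configured for Docker; the exact tag and ports may differ for your setup:&lt;/p>

```shell
# Pull the GPU-enabled image (tag is an assumption; check the registry for the current one)
docker pull qdrant/qdrant:v1.13.0-gpu-nvidia

# Expose the GPU to the container and enable GPU indexing via environment variable
docker run --rm -it --gpus=all -p 6333:6333 -p 6334:6334 \
    -e QDRANT__GPU__INDEXING=1 \
    qdrant/qdrant:v1.13.0-gpu-nvidia
```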
&lt;h2 id="configuration">Configuration&lt;/h2>
&lt;p>Qdrant includes a number of configuration options to control GPU usage. The following options are available:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="nt">gpu&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Enable GPU indexing.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">indexing&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">false&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Force half precision for `f32` values while indexing.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># `f16` conversion will take place &lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># only inside GPU memory and won&amp;#39;t affect storage type.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">force_half_precision&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">false&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Used vulkan &amp;#34;groups&amp;#34; of GPU. &lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># In other words, how many parallel points can be indexed by GPU.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Optimal value might depend on the GPU model.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Proportional, but doesn&amp;#39;t necessary equal&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># to the physical number of warps.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Do not change this value unless you know what you are doing.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default: 512&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">groups_count&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">512&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Filter for GPU devices by hardware name. Case insensitive.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Comma-separated list of substrings to match &lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># against the gpu device name.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Example: &amp;#34;nvidia&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default: &amp;#34;&amp;#34; - all devices are accepted.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">device_filter&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># List of explicit GPU devices to use.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># If host has multiple GPUs, this option allows to select specific devices&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># by their index in the list of found devices.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># If `device_filter` is set, indexes are applied after filtering.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># By default, all devices are accepted.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">devices&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">null&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># How many parallel indexing processes are allowed to run.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default: 1&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">parallel_indexes&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">1&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Allow to use integrated GPUs.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default: false&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">allow_integrated&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">false&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Allow to use emulated GPUs like LLVMpipe. Useful for CI.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c"># Default: false&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">allow_emulated&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">false&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>It is not recommended to change these options unless you are familiar with the Qdrant internals and the Vulkan API.&lt;/p></description></item><item><title>Qdrant 0.10 released</title><link>https://qdrant.tech/articles/qdrant-0-10-release/</link><pubDate>Mon, 19 Sep 2022 13:30:00 +0200</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/qdrant-0-10-release/</guid><description>&lt;p>&lt;a href="https://github.com/qdrant/qdrant/releases/tag/v0.10.0" target="_blank" rel="noopener nofollow">Qdrant 0.10 is a new version&lt;/a> that brings a lot of performance
improvements, but also some new features which were heavily requested by our users. Here is an overview of what has changed.&lt;/p>
&lt;h2 id="storing-multiple-vectors-per-object">Storing multiple vectors per object&lt;/h2>
&lt;p>Previously, if you wanted to use semantic search with multiple vectors per object, you had to create separate collections
for each vector type. This was the case even if the vectors shared other attributes in the payload. With Qdrant 0.10, you can
now store all of these vectors together in the same collection, which allows you to share a single copy of the payload.
This makes it easier to use semantic search with multiple vector types, and reduces the amount of work you need to do to
set up your collections.&lt;/p></description></item><item><title>Blog-Reading Chatbot with GPT-4o</title><link>https://qdrant.tech/documentation/examples/rag-chatbot-scaleway/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/examples/rag-chatbot-scaleway/</guid><description>&lt;h1 id="blog-reading-chatbot-with-gpt-4o">Blog-Reading Chatbot with GPT-4o&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 90 min&lt;/th>
 &lt;th>Level: Advanced&lt;/th>
 &lt;th>&lt;a href="https://github.com/qdrant/examples/blob/langchain-lcel-rag/langchain-lcel-rag/Langchain-LCEL-RAG-Demo.ipynb" target="_blank" rel="noopener nofollow">GitHub&lt;/a>&lt;/th>
 &lt;th>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>In this tutorial, you will build a RAG system that combines blog content ingestion with the capabilities of semantic search. &lt;strong>OpenAI&amp;rsquo;s GPT-4o LLM&lt;/strong> is powerful, but scaling its use requires us to supply context systematically.&lt;/p>
&lt;p>RAG enhances the LLM&amp;rsquo;s generation of answers by retrieving relevant documents to aid the question-answering process. This setup showcases the integration of advanced search and AI language processing to improve information retrieval and generation tasks.&lt;/p></description></item><item><title>From Redis</title><link>https://qdrant.tech/documentation/migrate-to-qdrant/from-redis/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/migrate-to-qdrant/from-redis/</guid><description>&lt;h1 id="migrate-from-redis-to-qdrant">Migrate from Redis to Qdrant&lt;/h1>
&lt;h2 id="what-you-need-from-redis">What You Need from Redis&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Redis address&lt;/strong> — host and port of your Redis instance&lt;/li>
&lt;li>&lt;strong>FT index name&lt;/strong> — the RediSearch full-text index that contains your vectors&lt;/li>
&lt;li>&lt;strong>Authentication&lt;/strong> — username and password, if configured&lt;/li>
&lt;/ul>
&lt;aside role="alert">&lt;strong>Important:&lt;/strong> Redis does not expose vector configurations (dimensions, distance metric) after an index is created. You must create the Qdrant collection manually before running the migration.&lt;/aside>
&lt;h2 id="concept-mapping">Concept Mapping&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Redis&lt;/th>
 &lt;th style="text-align: left">Qdrant&lt;/th>
 &lt;th style="text-align: left">Notes&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">FT Index&lt;/td>
 &lt;td style="text-align: left">Collection&lt;/td>
 &lt;td style="text-align: left">One-to-one mapping&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Document&lt;/td>
 &lt;td style="text-align: left">Point&lt;/td>
 &lt;td style="text-align: left">Each document becomes a point&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Vector field&lt;/td>
 &lt;td style="text-align: left">Vector&lt;/td>
 &lt;td style="text-align: left">Named vectors are preserved&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Hash/JSON fields&lt;/td>
 &lt;td style="text-align: left">Payload&lt;/td>
 &lt;td style="text-align: left">Direct mapping&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Document key&lt;/td>
 &lt;td style="text-align: left">Payload field&lt;/td>
 &lt;td style="text-align: left">Stored via &lt;code>--qdrant.id-field&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="run-the-migration">Run the Migration&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">docker run --net&lt;span class="o">=&lt;/span>host --rm -it registry.cloud.qdrant.io/library/qdrant-migration redis &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --redis.index &lt;span class="s1">&amp;#39;your-ft-index&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --redis.addr &lt;span class="s1">&amp;#39;localhost:6379&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.url &lt;span class="s1">&amp;#39;https://your-instance.cloud.qdrant.io:6334&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.api-key &lt;span class="s1">&amp;#39;your-qdrant-api-key&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="with-authentication">With Authentication&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">docker run --net&lt;span class="o">=&lt;/span>host --rm -it registry.cloud.qdrant.io/library/qdrant-migration redis &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --redis.index &lt;span class="s1">&amp;#39;your-ft-index&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --redis.addr &lt;span class="s1">&amp;#39;your-redis-host:6379&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --redis.username &lt;span class="s1">&amp;#39;your-username&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --redis.password &lt;span class="s1">&amp;#39;your-password&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.url &lt;span class="s1">&amp;#39;https://your-instance.cloud.qdrant.io:6334&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.api-key &lt;span class="s1">&amp;#39;your-qdrant-api-key&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --migration.create-collection &lt;span class="nb">false&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="all-redis-specific-flags">All Redis-Specific Flags&lt;/h3>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Flag&lt;/th>
 &lt;th style="text-align: left">Required&lt;/th>
 &lt;th style="text-align: left">Description&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--redis.index&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Yes&lt;/td>
 &lt;td style="text-align: left">RediSearch FT index name&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--redis.addr&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Redis address (default: &lt;code>localhost:6379&lt;/code>)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--redis.protocol&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Redis protocol version (default: &lt;code>2&lt;/code>)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--redis.username&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Username for authentication&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--redis.password&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Password for authentication&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--redis.client-name&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Client name&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--redis.db&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Database number&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--redis.network&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Network type: &lt;code>tcp&lt;/code> or &lt;code>unix&lt;/code> (default: &lt;code>tcp&lt;/code>)&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h3 id="qdrant-side-options">Qdrant-Side Options&lt;/h3>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Flag&lt;/th>
 &lt;th style="text-align: left">Default&lt;/th>
 &lt;th style="text-align: left">Description&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--qdrant.id-field&lt;/code>&lt;/td>
 &lt;td style="text-align: left">&lt;code>__id__&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Payload field name for original Redis document keys&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="gotchas">Gotchas&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Named vectors:&lt;/strong> If your Redis index has multiple vector fields, all are migrated as named vectors. Ensure your pre-created Qdrant collection has a matching named vector configuration.&lt;/li>
&lt;li>&lt;strong>ID mapping:&lt;/strong> Redis document keys are converted to Qdrant point IDs. The original key is stored in the payload under &lt;code>--qdrant.id-field&lt;/code>.&lt;/li>
&lt;/ul>
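&lt;p>To look up a migrated point by its original Redis key, filter on the payload field that stores it. A request body for &lt;code>POST /collections/{collection}/points/scroll&lt;/code> could look like the following; the default &lt;code>__id__&lt;/code> field name and the key &lt;code>doc:1234&lt;/code> are illustrative:&lt;/p>

```json
{
  "filter": {
    "must": [
      { "key": "__id__", "match": { "value": "doc:1234" } }
    ]
  },
  "with_payload": true
}
```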
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;p>After migration, verify your data arrived correctly with the &lt;a href="https://qdrant.tech/documentation/migration-guidance/">Migration Verification Guide&lt;/a>.&lt;/p></description></item><item><title>Optimize Performance</title><link>https://qdrant.tech/documentation/operations/optimize/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/operations/optimize/</guid><description>&lt;h1 id="optimizing-qdrant-performance-three-scenarios">Optimizing Qdrant Performance: Three Scenarios&lt;/h1>
&lt;p>Different use cases require different balances between memory usage, search speed, and precision. Qdrant is designed to be flexible and customizable so you can tune it to your specific needs.&lt;/p>
&lt;p>This guide walks you through three main optimization strategies:&lt;/p>
&lt;ul>
&lt;li>High Speed Search &amp;amp; Low Memory Usage&lt;/li>
&lt;li>High Precision &amp;amp; Low Memory Usage&lt;/li>
&lt;li>High Precision &amp;amp; High Speed Search&lt;/li>
&lt;/ul>
&lt;p>&lt;img src="https://qdrant.tech/docs/tradeoff.png" alt="qdrant resource tradeoffs">&lt;/p>
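&lt;p>As a sketch of the first scenario (on-disk vectors plus an in-RAM quantized copy), the collection could be created with a request body like the following for &lt;code>PUT /collections/{collection}&lt;/code>; the vector size and distance are placeholders:&lt;/p>

```json
{
  "vectors": { "size": 768, "distance": "Cosine", "on_disk": true },
  "quantization_config": {
    "scalar": { "type": "int8", "always_ram": true }
  }
}
```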
&lt;h2 id="1-high-speed-search-with-low-memory-usage">1. High-Speed Search with Low Memory Usage&lt;/h2>
&lt;p>To achieve high search speed with minimal memory usage, you can store vectors on disk while minimizing the number of disk reads. Vector quantization is a technique that compresses vectors, allowing more of them to be stored in memory, thus reducing the need to read from disk.&lt;/p></description></item><item><title>Optimizer</title><link>https://qdrant.tech/documentation/operations/optimizer/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/operations/optimizer/</guid><description>&lt;h1 id="optimizer">Optimizer&lt;/h1>
&lt;p>It is much more efficient to apply changes in batches than to perform each change individually, as many other databases do, and Qdrant is no exception. Since Qdrant operates with data structures that are not always easy to change, it is sometimes necessary to rebuild those structures completely.&lt;/p>
&lt;p>Storage optimization in Qdrant occurs at the segment level (see &lt;a href="https://qdrant.tech/documentation/manage-data/storage/">storage&lt;/a>).
In this case, the segment to be optimized remains readable for the time of the rebuild.&lt;/p></description></item><item><title>From MongoDB</title><link>https://qdrant.tech/documentation/migrate-to-qdrant/from-mongodb/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/migrate-to-qdrant/from-mongodb/</guid><description>&lt;h1 id="migrate-from-mongodb-to-qdrant">Migrate from MongoDB to Qdrant&lt;/h1>
&lt;h2 id="what-you-need-from-mongodb">What You Need from MongoDB&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Connection string&lt;/strong> — MongoDB URI (e.g., &lt;code>mongodb://user:pass@host:27017&lt;/code>)&lt;/li>
&lt;li>&lt;strong>Database name&lt;/strong> — the database containing your collection&lt;/li>
&lt;li>&lt;strong>Collection name&lt;/strong> — the collection to migrate&lt;/li>
&lt;li>&lt;strong>Vector field names&lt;/strong> — the names of fields that store vector embeddings&lt;/li>
&lt;/ul>
&lt;aside role="alert">&lt;strong>Important:&lt;/strong> MongoDB does not expose vector dimensions or distance metrics in a way the tool can read automatically. You must create the Qdrant collection manually before running the migration.&lt;/aside>
&lt;h2 id="concept-mapping">Concept Mapping&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">MongoDB&lt;/th>
 &lt;th style="text-align: left">Qdrant&lt;/th>
 &lt;th style="text-align: left">Notes&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">Collection&lt;/td>
 &lt;td style="text-align: left">Collection&lt;/td>
 &lt;td style="text-align: left">One-to-one mapping&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Document&lt;/td>
 &lt;td style="text-align: left">Point&lt;/td>
 &lt;td style="text-align: left">Each document becomes a point&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Vector field&lt;/td>
 &lt;td style="text-align: left">Vector&lt;/td>
 &lt;td style="text-align: left">Named vectors are preserved&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Non-vector fields&lt;/td>
 &lt;td style="text-align: left">Payload&lt;/td>
 &lt;td style="text-align: left">Direct mapping&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>_id&lt;/code> (ObjectID or string)&lt;/td>
 &lt;td style="text-align: left">Point ID + Payload&lt;/td>
 &lt;td style="text-align: left">Converted to UUID; original stored in payload&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="run-the-migration">Run the Migration&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">docker run --net&lt;span class="o">=&lt;/span>host --rm -it registry.cloud.qdrant.io/library/qdrant-migration mongodb &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --mongodb.url &lt;span class="s1">&amp;#39;mongodb://localhost:27017&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --mongodb.database &lt;span class="s1">&amp;#39;your-database&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --mongodb.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --mongodb.vector-fields &lt;span class="s1">&amp;#39;embedding&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.url &lt;span class="s1">&amp;#39;https://your-instance.cloud.qdrant.io:6334&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.api-key &lt;span class="s1">&amp;#39;your-qdrant-api-key&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="with-multiple-vector-fields">With Multiple Vector Fields&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">docker run --net&lt;span class="o">=&lt;/span>host --rm -it registry.cloud.qdrant.io/library/qdrant-migration mongodb &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --mongodb.url &lt;span class="s1">&amp;#39;mongodb+srv://user:pass@cluster.mongodb.net&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --mongodb.database &lt;span class="s1">&amp;#39;your-database&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --mongodb.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --mongodb.vector-fields &lt;span class="s1">&amp;#39;title_embedding,body_embedding&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.url &lt;span class="s1">&amp;#39;https://your-instance.cloud.qdrant.io:6334&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.api-key &lt;span class="s1">&amp;#39;your-qdrant-api-key&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --migration.create-collection &lt;span class="nb">false&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="all-mongodb-specific-flags">All MongoDB-Specific Flags&lt;/h3>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Flag&lt;/th>
 &lt;th style="text-align: left">Required&lt;/th>
 &lt;th style="text-align: left">Description&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--mongodb.url&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Yes&lt;/td>
 &lt;td style="text-align: left">MongoDB connection string&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--mongodb.database&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Yes&lt;/td>
 &lt;td style="text-align: left">Database name&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--mongodb.collection&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Yes&lt;/td>
 &lt;td style="text-align: left">Collection name&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--mongodb.vector-fields&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Yes&lt;/td>
 &lt;td style="text-align: left">Comma-separated list of vector field names&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h3 id="qdrant-side-options">Qdrant-Side Options&lt;/h3>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Flag&lt;/th>
 &lt;th style="text-align: left">Default&lt;/th>
 &lt;th style="text-align: left">Description&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--qdrant.id-field&lt;/code>&lt;/td>
 &lt;td style="text-align: left">&lt;code>__id__&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Payload field name for original MongoDB &lt;code>_id&lt;/code> values&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="gotchas">Gotchas&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Vector field names are required:&lt;/strong> MongoDB has no schema-level marker for vector fields. You must explicitly list them via &lt;code>--mongodb.vector-fields&lt;/code>.&lt;/li>
&lt;li>&lt;strong>ID mapping:&lt;/strong> MongoDB &lt;code>_id&lt;/code> values (ObjectID or string) are converted to Qdrant UUIDs. The original value is stored in payload under &lt;code>--qdrant.id-field&lt;/code>.&lt;/li>
&lt;/ul>
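&lt;p>The ID conversion described above can be sketched in Python. &lt;code>uuid5&lt;/code> is one deterministic scheme, shown here for illustration only; the tool&amp;rsquo;s exact conversion may differ:&lt;/p>

```python
import uuid

def mongo_id_to_uuid(object_id: str) -> str:
    # uuid5 yields the same UUID for the same input, so re-running a
    # migration maps each document to the same point ID every time.
    return str(uuid.uuid5(uuid.NAMESPACE_OID, object_id))

original = "507f1f77bcf86cd799439011"  # a sample ObjectID string
point_id = mongo_id_to_uuid(original)

# The original _id survives in the payload under the --qdrant.id-field name
payload = {"__id__": original}
print(point_id, payload)
```

Because the mapping is deterministic, interrupted migrations can be safely re-run without producing duplicate points.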
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;p>After migration, verify your data arrived correctly with the &lt;a href="https://qdrant.tech/documentation/migration-guidance/">Migration Verification Guide&lt;/a>.&lt;/p></description></item><item><title>Reranking with FastEmbed</title><link>https://qdrant.tech/documentation/fastembed/fastembed-rerankers/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/fastembed/fastembed-rerankers/</guid><description>&lt;h1 id="how-to-use-rerankers-with-fastembed">How to use rerankers with FastEmbed&lt;/h1>
&lt;h2 id="rerankers">Rerankers&lt;/h2>
&lt;p>A reranker is a model that improves the ordering of search results. A subset of documents is initially retrieved using a fast, simple method (e.g., BM25 or dense embeddings). Then, a reranker &amp;ndash; a more powerful, precise, but slower and heavier model &amp;ndash; re-evaluates this subset to refine document relevance to the query.&lt;/p>
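&lt;p>The two-stage flow can be sketched schematically. The scorers below are toy stand-ins for a fast first-stage retriever and a cross-encoder, not FastEmbed APIs:&lt;/p>

```python
def cheap_score(query: str, doc: str) -> float:
    # word overlap stands in for a fast first-stage model (BM25, dense vectors)
    return len(set(query.split()).intersection(doc.split()))

def rerank_score(query: str, doc: str) -> float:
    # stand-in for a slower, token-level cross-encoder
    return cheap_score(query, doc) + (1.0 if query in doc else 0.0)

def search(query, corpus, first_stage_k=10, final_k=3):
    # stage 1: score the whole corpus with the cheap model
    candidates = sorted(corpus, key=lambda d: cheap_score(query, d),
                        reverse=True)[:first_stage_k]
    # stage 2: re-score only the small candidate set with the reranker
    return sorted(candidates, key=lambda d: rerank_score(query, d),
                  reverse=True)[:final_k]
```

Only the candidate set ever reaches the expensive scorer, which is what keeps reranking affordable.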
&lt;p>Rerankers analyze token-level interactions between the query and each document in depth, making them expensive to use but precise in defining relevance. They trade speed for accuracy, so they are best used on a limited candidate set rather than the entire corpus.&lt;/p></description></item><item><title>Inference</title><link>https://qdrant.tech/documentation/cloud/inference/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/cloud/inference/</guid><description>&lt;h1 id="inference-in-qdrant-managed-cloud">Inference in Qdrant Managed Cloud&lt;/h1>
&lt;p>&lt;a href="https://qdrant.tech/documentation/inference/">Inference&lt;/a> is the process of creating vector embeddings from text, images, or other data types using a machine learning model.&lt;/p>
&lt;p>Qdrant Managed Cloud allows you to use inference directly in the cloud, without the need to set up and maintain your own inference infrastructure. You can use &lt;a href="#cloud-inference">embedding models hosted on Qdrant Cloud&lt;/a>, or use &lt;a href="#use-external-models">externally hosted models&lt;/a>.&lt;/p>
&lt;aside role="alert">
 Inference is executed within the EU for Qdrant clusters in EU regions and in the US for Qdrant clusters in all other regions. Free models are hosted in the US region only.
&lt;/aside>
&lt;p>&lt;img src="https://qdrant.tech/documentation/cloud/cloud-inference.png" alt="Cluster UI">&lt;/p></description></item><item><title>From FAISS</title><link>https://qdrant.tech/documentation/migrate-to-qdrant/from-faiss/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/migrate-to-qdrant/from-faiss/</guid><description>&lt;h1 id="migrate-from-faiss-to-qdrant">Migrate from FAISS to Qdrant&lt;/h1>
&lt;h2 id="what-you-need-from-faiss">What You Need from FAISS&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Index file path&lt;/strong> — path to your FAISS index file&lt;/li>
&lt;li>&lt;strong>Distance metric&lt;/strong> — the metric used when the index was built (&lt;code>l2&lt;/code>, &lt;code>inner product&lt;/code>, etc.)&lt;/li>
&lt;/ul>
&lt;aside role="alert">&lt;strong>Important:&lt;/strong> Only non-quantized FAISS index types are supported. Quantized indexes (e.g., &lt;code>IndexIVFPQ&lt;/code>) do not store the original vectors and cannot be migrated.&lt;/aside>
&lt;h2 id="supported-index-types">Supported Index Types&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">FAISS Index Type&lt;/th>
 &lt;th style="text-align: left">Supported&lt;/th>
 &lt;th style="text-align: left">Notes&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>IndexFlatL2&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Yes&lt;/td>
 &lt;td style="text-align: left">Maps to &lt;code>euclid&lt;/code> distance&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>IndexFlatIP&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Yes&lt;/td>
 &lt;td style="text-align: left">Maps to &lt;code>dot&lt;/code> distance&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>IndexHNSWFlat&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Yes&lt;/td>
 &lt;td style="text-align: left">Full vectors are stored&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>IndexIVFFlat&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Yes&lt;/td>
 &lt;td style="text-align: left">Full vectors are stored&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>IndexIVFPQ&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Quantized — original vectors not stored&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>IndexPQ&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Quantized — original vectors not stored&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="concept-mapping">Concept Mapping&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">FAISS&lt;/th>
 &lt;th style="text-align: left">Qdrant&lt;/th>
 &lt;th style="text-align: left">Notes&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">Index&lt;/td>
 &lt;td style="text-align: left">Collection&lt;/td>
 &lt;td style="text-align: left">One-to-one mapping&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Vector (by position)&lt;/td>
 &lt;td style="text-align: left">Point&lt;/td>
 &lt;td style="text-align: left">Position in index becomes point ID&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="run-the-migration">Run the Migration&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">docker run --net&lt;span class="o">=&lt;/span>host --rm -it &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> -v /path/to/your/index:/data &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> registry.cloud.qdrant.io/library/qdrant-migration faiss &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --faiss.index-path &lt;span class="s1">&amp;#39;/data/your-index.index&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.url &lt;span class="s1">&amp;#39;https://your-instance.cloud.qdrant.io:6334&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.api-key &lt;span class="s1">&amp;#39;your-qdrant-api-key&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.distance-metric cosine
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="all-faiss-specific-flags">All FAISS-Specific Flags&lt;/h3>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Flag&lt;/th>
 &lt;th style="text-align: left">Required&lt;/th>
 &lt;th style="text-align: left">Description&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--faiss.index-path&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Yes&lt;/td>
 &lt;td style="text-align: left">Path to the FAISS index file (inside the container)&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h3 id="qdrant-side-options">Qdrant-Side Options&lt;/h3>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Flag&lt;/th>
 &lt;th style="text-align: left">Default&lt;/th>
 &lt;th style="text-align: left">Description&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--qdrant.distance-metric&lt;/code>&lt;/td>
 &lt;td style="text-align: left">&lt;code>cosine&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Distance metric: &lt;code>cosine&lt;/code>, &lt;code>dot&lt;/code>, &lt;code>euclid&lt;/code>, or &lt;code>manhattan&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="gotchas">Gotchas&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>No metadata:&lt;/strong> FAISS indexes store only vectors. All points will have empty payloads. If you have a separate metadata store keyed by vector position, import that separately after migration.&lt;/li>
&lt;li>&lt;strong>Point IDs:&lt;/strong> Points are assigned IDs based on their position in the FAISS index. Use this to join with any external metadata store.&lt;/li>
&lt;/ul>
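&lt;p>If metadata lives in a side store keyed by vector position, it can be joined back after migration using the position-based point IDs. All names below are illustrative:&lt;/p>

```python
# Hypothetical side store keyed by the vector's position in the FAISS index
external_metadata = {
    0: {"title": "first document"},
    1: {"title": "second document"},
}

# The migration tool assigns point IDs from index positions, so the join
# key is simply the point ID itself.
migrated_point_ids = [0, 1]

# Payload updates to apply afterwards, e.g. via the set-payload API
payload_updates = [
    {"point_id": pid, "payload": external_metadata[pid]}
    for pid in migrated_point_ids
]
print(payload_updates)
```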
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;p>After migration, verify your data arrived correctly with the &lt;a href="https://qdrant.tech/documentation/migration-guidance/">Migration Verification Guide&lt;/a>.&lt;/p></description></item><item><title>Multi-Vector Postprocessing</title><link>https://qdrant.tech/documentation/fastembed/fastembed-postprocessing/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/fastembed/fastembed-postprocessing/</guid><description>&lt;h1 id="multi-vector-postprocessing">Multi-Vector Postprocessing&lt;/h1>
&lt;p>FastEmbed&amp;rsquo;s postprocessing module provides techniques for transforming and optimizing embeddings after generation. These
postprocessing methods can improve search performance, reduce storage requirements, or adapt embeddings for specific use
cases.&lt;/p>
&lt;p>Currently, the postprocessing module includes MUVERA (Multi-Vector Retrieval Algorithm) for speeding up retrieval with multi-vector
embeddings. Additional postprocessing techniques are planned for future releases.&lt;/p>
&lt;h2 id="muvera">MUVERA&lt;/h2>
&lt;p>MUVERA transforms variable-length sequences of vectors into fixed-dimensional single-vector representations. These
approximations can be used for fast initial retrieval using traditional vector search methods like HNSW. Once you&amp;rsquo;ve
retrieved a small set of candidates quickly, you can then rerank them using the original multi-vector representations
for maximum accuracy.&lt;/p></description></item><item><title>Articles</title><link/><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid/><description/></item><item><title>From Apache Solr</title><link>https://qdrant.tech/documentation/migrate-to-qdrant/from-solr/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/migrate-to-qdrant/from-solr/</guid><description>&lt;h1 id="migrate-from-apache-solr-to-qdrant">Migrate from Apache Solr to Qdrant&lt;/h1>
&lt;h2 id="what-you-need-from-solr">What You Need from Solr&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Solr URL&lt;/strong> — the base URL of your Solr instance (e.g., &lt;code>http://localhost:8983&lt;/code>)&lt;/li>
&lt;li>&lt;strong>Collection name&lt;/strong> — the Solr collection to migrate&lt;/li>
&lt;li>&lt;strong>Authentication&lt;/strong> — username and password, if configured&lt;/li>
&lt;/ul>
&lt;aside role="alert">&lt;strong>Important:&lt;/strong> Solr does not reliably expose vector dimensions and distance metrics via its schema API. You must create the Qdrant collection manually before running the migration.&lt;/aside>
&lt;h2 id="concept-mapping">Concept Mapping&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Solr&lt;/th>
 &lt;th style="text-align: left">Qdrant&lt;/th>
 &lt;th style="text-align: left">Notes&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">Collection&lt;/td>
 &lt;td style="text-align: left">Collection&lt;/td>
 &lt;td style="text-align: left">One-to-one mapping&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Document&lt;/td>
 &lt;td style="text-align: left">Point&lt;/td>
 &lt;td style="text-align: left">Each document becomes a point&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Dense vector field&lt;/td>
 &lt;td style="text-align: left">Vector&lt;/td>
 &lt;td style="text-align: left">Named vectors are preserved&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Non-vector fields&lt;/td>
 &lt;td style="text-align: left">Payload&lt;/td>
 &lt;td style="text-align: left">Direct mapping&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Document ID (&lt;code>id&lt;/code> field)&lt;/td>
 &lt;td style="text-align: left">Payload field&lt;/td>
 &lt;td style="text-align: left">Stored via &lt;code>--qdrant.id-field&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="run-the-migration">Run the Migration&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">docker run --net&lt;span class="o">=&lt;/span>host --rm -it registry.cloud.qdrant.io/library/qdrant-migration solr &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --solr.url &lt;span class="s1">&amp;#39;http://localhost:8983&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --solr.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.url &lt;span class="s1">&amp;#39;https://your-instance.cloud.qdrant.io:6334&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.api-key &lt;span class="s1">&amp;#39;your-qdrant-api-key&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="with-authentication">With Authentication&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">docker run --net&lt;span class="o">=&lt;/span>host --rm -it registry.cloud.qdrant.io/library/qdrant-migration solr &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --solr.url &lt;span class="s1">&amp;#39;https://your-solr-host:8983&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --solr.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --solr.username &lt;span class="s1">&amp;#39;your-username&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --solr.password &lt;span class="s1">&amp;#39;your-password&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.url &lt;span class="s1">&amp;#39;https://your-instance.cloud.qdrant.io:6334&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.api-key &lt;span class="s1">&amp;#39;your-qdrant-api-key&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --qdrant.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --migration.create-collection &lt;span class="nb">false&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="all-solr-specific-flags">All Solr-Specific Flags&lt;/h3>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Flag&lt;/th>
 &lt;th style="text-align: left">Required&lt;/th>
 &lt;th style="text-align: left">Description&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--solr.url&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Yes&lt;/td>
 &lt;td style="text-align: left">Solr base URL (e.g., &lt;code>http://localhost:8983&lt;/code>)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--solr.collection&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Yes&lt;/td>
 &lt;td style="text-align: left">Solr collection name&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--solr.username&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Username for basic authentication&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--solr.password&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Password for basic authentication&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--solr.insecure-skip-verify&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Skip TLS certificate verification (default: &lt;code>false&lt;/code>)&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h3 id="qdrant-side-options">Qdrant-Side Options&lt;/h3>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Flag&lt;/th>
 &lt;th style="text-align: left">Default&lt;/th>
 &lt;th style="text-align: left">Description&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--qdrant.id-field&lt;/code>&lt;/td>
 &lt;td style="text-align: left">&lt;code>__id__&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Payload field name for original Solr document IDs&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="gotchas">Gotchas&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Named vectors:&lt;/strong> If your Solr schema has multiple dense vector fields, all are migrated as named vectors. Ensure your pre-created collection has matching named vector configurations.&lt;/li>
&lt;li>&lt;strong>ID mapping:&lt;/strong> Solr document IDs (strings) are converted to Qdrant UUIDs. The original Solr ID is stored in the payload under the field set by &lt;code>--qdrant.id-field&lt;/code> (default &lt;code>__id__&lt;/code>).&lt;/li>
&lt;/ul>
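&lt;p>As a rough illustration of the ID mapping above: the migration tool&amp;rsquo;s exact conversion scheme is internal, but the idea is a deterministic mapping from a Solr string ID to a UUID, with the original ID preserved in the payload so you can still look documents up by it. The sketch below uses UUIDv5 purely as an example of such a mapping:&lt;/p>

```python
# Illustration only: the migration tool's actual conversion scheme is
# internal. This sketch shows the general idea of a deterministic
# string-ID-to-UUID mapping, with the original Solr ID kept in payload.
import uuid

def solr_id_to_uuid(solr_id: str) -> str:
    # UUIDv5 is deterministic: the same input always yields the same UUID
    return str(uuid.uuid5(uuid.NAMESPACE_URL, solr_id))

point_id = solr_id_to_uuid("doc-123")
payload = {"__id__": "doc-123"}  # default --qdrant.id-field value
```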
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;p>After migration, verify your data arrived correctly with the &lt;a href="https://qdrant.tech/documentation/migration-guidance/">Migration Verification Guide&lt;/a>.&lt;/p></description></item><item><title>Getting Started</title><link/><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid/><description/></item><item><title>Migrate to Qdrant</title><link/><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid/><description/></item><item><title/><link/><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid/><description/></item><item><title>From Qdrant</title><link>https://qdrant.tech/documentation/migrate-to-qdrant/from-qdrant/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/migrate-to-qdrant/from-qdrant/</guid><description>&lt;h1 id="migrate-between-qdrant-instances">Migrate Between Qdrant Instances&lt;/h1>
&lt;p>Use the &lt;code>qdrant&lt;/code> subcommand to copy a collection from one Qdrant instance to another — or between collections within the same instance. The tool automatically recreates the full collection schema (vector config, HNSW settings, quantization, sharding) on the target.&lt;/p>
&lt;h2 id="what-you-need">What You Need&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Source Qdrant URL&lt;/strong> — gRPC endpoint of the source instance&lt;/li>
&lt;li>&lt;strong>Target Qdrant URL&lt;/strong> — gRPC endpoint of the target instance&lt;/li>
&lt;li>&lt;strong>Source collection name&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Target collection name&lt;/strong> (must be different from source if using the same instance)&lt;/li>
&lt;li>&lt;strong>API keys&lt;/strong> — for each instance, if authentication is enabled&lt;/li>
&lt;/ul>
&lt;h2 id="concept-mapping">Concept Mapping&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Source Qdrant&lt;/th>
 &lt;th style="text-align: left">Target Qdrant&lt;/th>
 &lt;th style="text-align: left">Notes&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">Collection&lt;/td>
 &lt;td style="text-align: left">Collection&lt;/td>
 &lt;td style="text-align: left">Recreated with exact schema&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Named vectors&lt;/td>
 &lt;td style="text-align: left">Named vectors&lt;/td>
 &lt;td style="text-align: left">All vector types preserved&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Sparse vectors&lt;/td>
 &lt;td style="text-align: left">Sparse vectors&lt;/td>
 &lt;td style="text-align: left">Direct mapping&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Payload&lt;/td>
 &lt;td style="text-align: left">Payload&lt;/td>
 &lt;td style="text-align: left">Direct mapping&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Payload indexes&lt;/td>
 &lt;td style="text-align: left">Payload indexes&lt;/td>
 &lt;td style="text-align: left">Recreated if &lt;code>--target.ensure-payload-indexes&lt;/code> is &lt;code>true&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">Shard keys&lt;/td>
 &lt;td style="text-align: left">Shard keys&lt;/td>
 &lt;td style="text-align: left">Recreated automatically&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="run-the-migration">Run the Migration&lt;/h2>
&lt;h3 id="between-two-instances">Between Two Instances&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">docker run --net&lt;span class="o">=&lt;/span>host --rm -it registry.cloud.qdrant.io/library/qdrant-migration qdrant &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --source.url &lt;span class="s1">&amp;#39;http://source-instance:6334&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --source.api-key &lt;span class="s1">&amp;#39;source-api-key&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --source.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --target.url &lt;span class="s1">&amp;#39;https://your-instance.cloud.qdrant.io:6334&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --target.api-key &lt;span class="s1">&amp;#39;your-qdrant-api-key&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --target.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="within-the-same-instance">Within the Same Instance&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">docker run --net&lt;span class="o">=&lt;/span>host --rm -it registry.cloud.qdrant.io/library/qdrant-migration qdrant &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --source.url &lt;span class="s1">&amp;#39;http://localhost:6334&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --source.collection &lt;span class="s1">&amp;#39;original-collection&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --target.url &lt;span class="s1">&amp;#39;http://localhost:6334&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --target.collection &lt;span class="s1">&amp;#39;new-collection&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="with-parallel-workers">With Parallel Workers&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">docker run --net&lt;span class="o">=&lt;/span>host --rm -it registry.cloud.qdrant.io/library/qdrant-migration qdrant &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --source.url &lt;span class="s1">&amp;#39;http://source-instance:6334&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --source.api-key &lt;span class="s1">&amp;#39;source-api-key&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --source.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --target.url &lt;span class="s1">&amp;#39;https://your-instance.cloud.qdrant.io:6334&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --target.api-key &lt;span class="s1">&amp;#39;your-qdrant-api-key&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --target.collection &lt;span class="s1">&amp;#39;your-collection&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --migration.num-workers &lt;span class="m">4&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="all-source-flags">All Source Flags&lt;/h3>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Flag&lt;/th>
 &lt;th style="text-align: left">Required&lt;/th>
 &lt;th style="text-align: left">Description&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--source.collection&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Yes&lt;/td>
 &lt;td style="text-align: left">Source collection name&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--source.url&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Source gRPC URL (default: &lt;code>http://localhost:6334&lt;/code>)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--source.api-key&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">API key for the source instance&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--source.max-message-size&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Maximum gRPC message size in bytes (default: &lt;code>33554432&lt;/code> = 32 MB)&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h3 id="all-target-flags">All Target Flags&lt;/h3>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Flag&lt;/th>
 &lt;th style="text-align: left">Required&lt;/th>
 &lt;th style="text-align: left">Description&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--target.collection&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Yes&lt;/td>
 &lt;td style="text-align: left">Target collection name&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--target.url&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Target gRPC URL (default: &lt;code>http://localhost:6334&lt;/code>)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--target.api-key&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">API key for the target instance&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--target.ensure-payload-indexes&lt;/code>&lt;/td>
 &lt;td style="text-align: left">No&lt;/td>
 &lt;td style="text-align: left">Recreate payload indexes from source (default: &lt;code>true&lt;/code>)&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h3 id="parallel-worker-option">Parallel Worker Option&lt;/h3>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Flag&lt;/th>
 &lt;th style="text-align: left">Default&lt;/th>
 &lt;th style="text-align: left">Description&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">&lt;code>--migration.num-workers&lt;/code>&lt;/td>
 &lt;td style="text-align: left">Number of CPU cores&lt;/td>
 &lt;td style="text-align: left">Number of parallel workers for migration&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="gotchas">Gotchas&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Source and target must differ:&lt;/strong> You cannot migrate a collection to itself.&lt;/li>
&lt;li>&lt;strong>Parallel workers and resume:&lt;/strong> Migration progress is tracked per worker. If you change &lt;code>--migration.num-workers&lt;/code> between runs, the saved offsets are invalidated and the migration restarts from scratch. Use &lt;code>--migration.restart&lt;/code> explicitly if you intentionally want to change the worker count.&lt;/li>
&lt;li>&lt;strong>Large messages:&lt;/strong> If you encounter gRPC message size errors, increase &lt;code>--source.max-message-size&lt;/code>.&lt;/li>
&lt;li>&lt;strong>Existing target collection:&lt;/strong> If the target collection already exists, the tool uses it as-is without modifying the schema.&lt;/li>
&lt;/ul>
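&lt;p>To see why saved offsets are tied to the worker count, consider this conceptual sketch (it is not the tool&amp;rsquo;s actual implementation): each worker checkpoints progress for its own slice of the collection, and the slices themselves are defined by the number of workers.&lt;/p>

```python
# Conceptual sketch, not the migration tool's real partitioning scheme:
# each worker owns a slice of the collection and checkpoints its own
# offset. The slices are a function of the worker count.
def assign_worker(point_index: int, num_workers: int) -> int:
    # Round-robin assignment of points to workers
    return point_index % num_workers

# With 4 workers, point 10 is owned by worker 2; with 3 workers it is
# owned by worker 1. Offsets saved for one worker count therefore
# describe different slices under another, so the run must restart.
owner_with_4 = assign_worker(10, 4)
owner_with_3 = assign_worker(10, 3)
```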
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;p>After migration, verify your data arrived correctly with the &lt;a href="https://qdrant.tech/documentation/migration-guidance/">Migration Verification Guide&lt;/a>.&lt;/p></description></item><item><title>Qdrant Quickstart</title><link>https://qdrant.tech/documentation/quickstart/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/quickstart/</guid><description>&lt;h1 id="how-to-get-started-with-qdrant-locally">How to Get Started with Qdrant Locally&lt;/h1>
&lt;p>In this short example, you will use the Python client to create a collection, load data into it, and run a basic search query.&lt;/p>
&lt;aside role="status">Before you start, please make sure Docker is installed and running on your system.&lt;/aside>
&lt;h2 id="download-and-run">Download and run&lt;/h2>
&lt;p>First, download the latest Qdrant image from Docker Hub:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">docker pull qdrant/qdrant
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Then, run the service:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">docker run -p 6333:6333 -p 6334:6334 &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> -v &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="k">$(&lt;/span>&lt;span class="nb">pwd&lt;/span>&lt;span class="k">)&lt;/span>&lt;span class="s2">/qdrant_storage:/qdrant/storage:z&amp;#34;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> qdrant/qdrant
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;aside role="status">On Windows, you may need to create a named Docker volume instead of mounting a local folder.&lt;/aside>
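&lt;p>For Windows users, the named-volume approach mentioned above can look like this (&lt;code>qdrant_storage&lt;/code> is just an example volume name):&lt;/p>

```shell
# Create a named volume once, then mount it by name instead of a host path
docker volume create qdrant_storage
docker run -p 6333:6333 -p 6334:6334 \
    -v qdrant_storage:/qdrant/storage \
    qdrant/qdrant
```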
&lt;p>Under the default configuration, all data will be stored in the &lt;code>./qdrant_storage&lt;/code> directory. This will also be the only directory visible to both the container and the host machine.&lt;/p></description></item><item><title/><link/><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid/><description/></item><item><title/><link/><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid/><description/></item><item><title/><link/><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid/><description/></item><item><title/><link/><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid/><description/></item><item><title/><link/><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid/><description/></item><item><title/><link/><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid/><description/></item><item><title>Static Embeddings</title><link>https://qdrant.tech/documentation/tutorials-search-engineering/static-embeddings/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-search-engineering/static-embeddings/</guid><description>&lt;h1 id="static-embeddings-in-practice">Static Embeddings in Practice&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 20 min&lt;/th>
 &lt;th>Level: Intermediate&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>In the world of resource-constrained computing, a quiet revolution is taking place. While transformers dominate
leaderboards with their impressive capabilities, static embeddings are making an unexpected comeback, offering
remarkable speed improvements with surprisingly small quality trade-offs. &lt;strong>We evaluated how Qdrant users can benefit
from this renaissance, and the results are promising&lt;/strong>.&lt;/p>
&lt;h2 id="what-makes-static-embeddings-different">What makes static embeddings different?&lt;/h2>
&lt;p>Transformers are often seen as the only way to go when it comes to embeddings. The use of attention mechanisms helps to
capture the relationships between the input tokens, so each token gets a vector representation that is context-aware
and defined not only by the token itself but also by the surrounding tokens. Transformer-based models easily beat the
quality of older methods such as word2vec or GloVe, which could only create a single vector embedding per
word. As a result, the word &amp;ldquo;bank&amp;rdquo; would have an identical representation in the context of &amp;ldquo;river bank&amp;rdquo; and &amp;ldquo;financial
institution&amp;rdquo;.&lt;/p></description></item><item><title>Courses</title><link/><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid/><description/></item><item><title>User Manual</title><link/><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid/><description/></item><item><title/><link/><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid/><description/></item><item><title/><link/><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid/><description/></item><item><title>Inference</title><link>https://qdrant.tech/documentation/inference/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/inference/</guid><description>&lt;h1 id="inference">Inference&lt;/h1>
&lt;p>Inference is the process of using a machine learning model to create vector embeddings from text, images, or other data types. While you can create embeddings on the client side, you can also let Qdrant generate them while storing or querying data.&lt;/p>
&lt;p>&lt;img src="https://qdrant.tech/docs/inference.png" alt="Inference">&lt;/p>
&lt;p>There are several advantages to generating embeddings with Qdrant:&lt;/p>
&lt;ul>
&lt;li>No need for external pipelines or separate model servers.&lt;/li>
&lt;li>Work with a single unified API instead of a different API per model provider.&lt;/li>
&lt;li>No external network calls, minimizing delays or data transfer overhead.&lt;/li>
&lt;/ul>
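&lt;p>As a minimal sketch of what this looks like from the Python client (assuming a recent &lt;code>qdrant-client&lt;/code> and a deployment with inference available; the URL, API key, collection, and model names below are placeholders):&lt;/p>

```python
# Sketch only: requires a running Qdrant deployment with inference
# available. Instead of computing the vector yourself, pass the raw text
# together with a model name and the embedding is generated server-side
# while the point is stored.
from qdrant_client import QdrantClient, models

client = QdrantClient(
    url="https://your-instance.cloud.qdrant.io:6334",
    api_key="your-qdrant-api-key",
)

client.upsert(
    collection_name="your-collection",
    points=[
        models.PointStruct(
            id=1,
            vector=models.Document(
                text="Qdrant is a vector search engine",
                model="sentence-transformers/all-MiniLM-L6-v2",
            ),
            payload={"source": "docs"},
        )
    ],
)
```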
&lt;p>Depending on the model you want to use, inference can be executed:&lt;/p></description></item><item><title>Retrieval Optimization</title><link>https://qdrant.tech/learn/ref-courses-retrieval-optimization/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/learn/ref-courses-retrieval-optimization/</guid><description/></item><item><title>Qdrant Web UI</title><link>https://qdrant.tech/documentation/web-ui/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/web-ui/</guid><description>&lt;h1 id="qdrant-web-ui">Qdrant Web UI&lt;/h1>
&lt;p>You can manage both local and cloud Qdrant deployments through the Web UI.&lt;/p>
&lt;p>If you&amp;rsquo;ve set up a deployment locally with the Qdrant &lt;a href="https://qdrant.tech/documentation/quickstart/">Quickstart&lt;/a>,
navigate to http://localhost:6333/dashboard.&lt;/p>
&lt;p>If you&amp;rsquo;ve set up a deployment in a cloud cluster, find your Cluster URL in your
cloud dashboard, at &lt;a href="https://cloud.qdrant.io" target="_blank" rel="noopener nofollow">https://cloud.qdrant.io&lt;/a>. Add &lt;code>:6333/dashboard&lt;/code> to the end
of the URL.&lt;/p>
&lt;h2 id="access-the-web-ui">Access the Web UI&lt;/h2>
&lt;p>Qdrant&amp;rsquo;s Web UI is an intuitive and efficient graphical interface for your Qdrant Collections, REST API, and data points.&lt;/p></description></item><item><title>API &amp; SDKs</title><link>https://qdrant.tech/documentation/interfaces/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/interfaces/</guid><description>&lt;h1 id="interfaces">Interfaces&lt;/h1>
&lt;p>Qdrant supports these &amp;ldquo;official&amp;rdquo; clients.&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> If you are using a language that is not listed here, you can use the REST API directly or generate a client for your language
using &lt;a href="https://github.com/qdrant/qdrant/blob/master/docs/redoc/master/openapi.json" target="_blank" rel="noopener nofollow">OpenAPI&lt;/a>
or &lt;a href="https://github.com/qdrant/qdrant/tree/master/lib/api/src/grpc/proto" target="_blank" rel="noopener nofollow">protobuf&lt;/a> definitions.&lt;/p>
&lt;/blockquote>
&lt;h2 id="client-libraries">Client Libraries&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>&lt;/th>
 &lt;th>Client Repository&lt;/th>
 &lt;th>Installation&lt;/th>
 &lt;th>Version&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;a href="https://python-client.qdrant.tech/" target="_blank" rel="noopener nofollow">&lt;img src="https://qdrant.tech/docs/misc/python.webp" alt="python">&lt;/a>&lt;/td>
 &lt;td>&lt;strong>&lt;a href="https://github.com/qdrant/qdrant-client" target="_blank" rel="noopener nofollow">Python&lt;/a>&lt;/strong> + &lt;strong>&lt;a href="https://python-client.qdrant.tech/" target="_blank" rel="noopener nofollow">(Client Docs)&lt;/a>&lt;/strong>&lt;/td>
 &lt;td>&lt;code>pip install qdrant-client[fastembed]&lt;/code>&lt;/td>
 &lt;td>&lt;a href="https://github.com/qdrant/qdrant-client/releases" target="_blank" rel="noopener nofollow">Latest Release&lt;/a>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;img src="https://qdrant.tech/docs/misc/ts.webp" alt="typescript">&lt;/td>
 &lt;td>&lt;strong>&lt;a href="https://github.com/qdrant/qdrant-js" target="_blank" rel="noopener nofollow">JavaScript / Typescript&lt;/a>&lt;/strong>&lt;/td>
 &lt;td>&lt;code>npm install @qdrant/js-client-rest&lt;/code>&lt;/td>
 &lt;td>&lt;a href="https://github.com/qdrant/qdrant-js/releases" target="_blank" rel="noopener nofollow">Latest Release&lt;/a>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;img src="https://qdrant.tech/docs/misc/rust.png" alt="rust">&lt;/td>
 &lt;td>&lt;strong>&lt;a href="https://github.com/qdrant/rust-client" target="_blank" rel="noopener nofollow">Rust&lt;/a>&lt;/strong>&lt;/td>
 &lt;td>&lt;code>cargo add qdrant-client&lt;/code>&lt;/td>
 &lt;td>&lt;a href="https://github.com/qdrant/rust-client/releases" target="_blank" rel="noopener nofollow">Latest Release&lt;/a>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;img src="https://qdrant.tech/docs/misc/go.webp" alt="golang">&lt;/td>
 &lt;td>&lt;strong>&lt;a href="https://github.com/qdrant/go-client" target="_blank" rel="noopener nofollow">Go&lt;/a>&lt;/strong>&lt;/td>
 &lt;td>&lt;code>go get github.com/qdrant/go-client&lt;/code>&lt;/td>
 &lt;td>&lt;a href="https://github.com/qdrant/go-client/releases" target="_blank" rel="noopener nofollow">Latest Release&lt;/a>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;img src="https://qdrant.tech/docs/misc/dotnet.webp" alt=".net">&lt;/td>
 &lt;td>&lt;strong>&lt;a href="https://github.com/qdrant/qdrant-dotnet" target="_blank" rel="noopener nofollow">.NET&lt;/a>&lt;/strong>&lt;/td>
 &lt;td>&lt;code>dotnet add package Qdrant.Client&lt;/code>&lt;/td>
 &lt;td>&lt;a href="https://github.com/qdrant/qdrant-dotnet/releases" target="_blank" rel="noopener nofollow">Latest Release&lt;/a>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;img src="https://qdrant.tech/docs/misc/java.webp" alt="java">&lt;/td>
 &lt;td>&lt;strong>&lt;a href="https://github.com/qdrant/java-client" target="_blank" rel="noopener nofollow">Java&lt;/a>&lt;/strong>&lt;/td>
 &lt;td>&lt;a href="https://central.sonatype.com/artifact/io.qdrant/client" target="_blank" rel="noopener nofollow">Available on Maven Central&lt;/a>&lt;/td>
 &lt;td>&lt;a href="https://github.com/qdrant/java-client/releases" target="_blank" rel="noopener nofollow">Latest Release&lt;/a>&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="api-reference">API Reference&lt;/h2>
&lt;p>All interaction with Qdrant takes place via the REST API. We recommend using REST API if you are using Qdrant for the first time or if you are working on a prototype.&lt;/p></description></item><item><title>Qdrant Tools</title><link/><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid/><description/></item><item><title>Tutorials</title><link/><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid/><description/></item><item><title/><link/><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid/><description/></item><item><title>Qdrant MCP Server</title><link>https://qdrant.tech/documentation/qdrant-mcp-server/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/qdrant-mcp-server/</guid><description/></item><item><title/><link/><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid/><description/></item><item><title/><link/><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid/><description/></item><item><title/><link/><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid/><description/></item><item><title/><link/><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid/><description/></item><item><title>Tutorials</title><link/><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid/><description/></item><item><title>Overview</title><link>https://qdrant.tech/documentation/tutorials-lp-overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey 
Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-lp-overview/</guid><description>&lt;h1 id="qdrant-tutorial-repository">Qdrant Tutorial Repository&lt;/h1>
&lt;h3 id="basic-tutorials">Basic Tutorials&lt;/h3>
&lt;p>&lt;em>Get up and running with Qdrant in minutes.&lt;/em>&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Tutorial&lt;/th>
 &lt;th style="text-align: left">Objective&lt;/th>
 &lt;th style="text-align: left">Stack&lt;/th>
 &lt;th style="text-align: left">Time&lt;/th>
 &lt;th style="text-align: left">Level&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">&lt;a href="https://qdrant.tech/documentation/quickstart/">Qdrant Local Quickstart&lt;/a>&lt;/td>
 &lt;td style="text-align: left">Basic CRUD operations and local deployment.&lt;/td>
 &lt;td style="text-align: left">&lt;span class="pill">Python&lt;/span>&lt;/td>
 &lt;td style="text-align: left">10m&lt;/td>
 &lt;td style="text-align: left">&lt;span class="text-green">Beginner&lt;/span>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;a href="https://qdrant.tech/documentation/tutorials-basics/search-beginners/">Semantic Search 101&lt;/a>&lt;/td>
 &lt;td style="text-align: left">Build a search engine for science fiction books.&lt;/td>
 &lt;td style="text-align: left">&lt;span class="pill">Python&lt;/span>&lt;/td>
 &lt;td style="text-align: left">5m&lt;/td>
 &lt;td style="text-align: left">&lt;span class="text-green">Beginner&lt;/span>&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;hr>
&lt;h3 id="search-engineering-tutorials">Search Engineering Tutorials&lt;/h3>
&lt;p>&lt;em>Master vector search modalities, reranking, and retrieval quality.&lt;/em>&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: left">Tutorial&lt;/th>
 &lt;th style="text-align: left">Objective&lt;/th>
 &lt;th style="text-align: left">Stack&lt;/th>
 &lt;th style="text-align: left">Time&lt;/th>
 &lt;th style="text-align: left">Level&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: left">&lt;a href="https://qdrant.tech/documentation/tutorials-search-engineering/neural-search/">Semantic Search Intro&lt;/a>&lt;/td>
 &lt;td style="text-align: left">Deploy a search service for company descriptions.&lt;/td>
 &lt;td style="text-align: left">&lt;span class="pill">FastAPI&lt;/span>&lt;/td>
 &lt;td style="text-align: left">30m&lt;/td>
 &lt;td style="text-align: left">&lt;span class="text-green">Beginner&lt;/span>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;a href="https://qdrant.tech/documentation/tutorials-search-engineering/hybrid-search-fastembed/">Hybrid Search with FastEmbed&lt;/a>&lt;/td>
 &lt;td style="text-align: left">Combine dense and sparse search.&lt;/td>
 &lt;td style="text-align: left">&lt;span class="pill">FastAPI&lt;/span>&lt;/td>
 &lt;td style="text-align: left">20m&lt;/td>
 &lt;td style="text-align: left">&lt;span class="text-green">Beginner&lt;/span>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;a href="https://qdrant.tech/documentation/tutorials-search-engineering/using-relevance-feedback/">Relevance Feedback&lt;/a>&lt;/td>
 &lt;td style="text-align: left">Implement relevance feedback retrieval in Qdrant.&lt;/td>
 &lt;td style="text-align: left">&lt;span class="pill">Python&lt;/span>&lt;/td>
 &lt;td style="text-align: left">30m&lt;/td>
 &lt;td style="text-align: left">&lt;span class="text-yellow">Intermediate&lt;/span>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;a href="https://qdrant.tech/documentation/tutorials-search-engineering/collaborative-filtering/">Collaborative Filtering&lt;/a>&lt;/td>
 &lt;td style="text-align: left">Collaborative filtering using sparse embeddings.&lt;/td>
 &lt;td style="text-align: left">&lt;span class="pill">Python&lt;/span>&lt;/td>
 &lt;td style="text-align: left">45m&lt;/td>
 &lt;td style="text-align: left">&lt;span class="text-yellow">Intermediate&lt;/span>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;a href="https://qdrant.tech/documentation/tutorials-search-engineering/pdf-retrieval-at-scale/">Multivector Document Retrieval&lt;/a>&lt;/td>
 &lt;td style="text-align: left">PDF RAG using ColPali and embedding pooling.&lt;/td>
 &lt;td style="text-align: left">&lt;span class="pill">Python&lt;/span>&lt;/td>
 &lt;td style="text-align: left">30m&lt;/td>
 &lt;td style="text-align: left">&lt;span class="text-yellow">Intermediate&lt;/span>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;a href="https://qdrant.tech/documentation/tutorials-search-engineering/retrieval-quality/">Retrieval Quality Evaluation&lt;/a>&lt;/td>
 &lt;td style="text-align: left">Measure quality and tune HNSW parameters.&lt;/td>
 &lt;td style="text-align: left">&lt;span class="pill">Python&lt;/span>&lt;/td>
 &lt;td style="text-align: left">30m&lt;/td>
 &lt;td style="text-align: left">&lt;span class="text-yellow">Intermediate&lt;/span>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;a href="https://qdrant.tech/documentation/tutorials-search-engineering/reranking-hybrid-search/">Hybrid Search with Reranking&lt;/a>&lt;/td>
 &lt;td style="text-align: left">Implement late interaction and sparse reranking.&lt;/td>
 &lt;td style="text-align: left">&lt;span class="pill">Python&lt;/span>&lt;/td>
 &lt;td style="text-align: left">40m&lt;/td>
 &lt;td style="text-align: left">&lt;span class="text-yellow">Intermediate&lt;/span>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;a href="https://qdrant.tech/documentation/tutorials-search-engineering/code-search/">Semantic Search for Code&lt;/a>&lt;/td>
 &lt;td style="text-align: left">Navigate codebases using vector similarity.&lt;/td>
 &lt;td style="text-align: left">&lt;span class="pill">Python&lt;/span>&lt;/td>
 &lt;td style="text-align: left">45m&lt;/td>
 &lt;td style="text-align: left">&lt;span class="text-yellow">Intermediate&lt;/span>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;a href="https://qdrant.tech/documentation/tutorials-search-engineering/using-multivector-representations/">Multivectors and Late Interaction&lt;/a>&lt;/td>
 &lt;td style="text-align: left">Effective use of multivector representations.&lt;/td>
 &lt;td style="text-align: left">&lt;span class="pill">Python&lt;/span>&lt;/td>
 &lt;td style="text-align: left">30m&lt;/td>
 &lt;td style="text-align: left">&lt;span class="text-yellow">Intermediate&lt;/span>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: left">&lt;a href="https://qdrant.tech/documentation/tutorials-search-engineering/static-embeddings/">Static Embeddings&lt;/a>&lt;/td>
 &lt;td style="text-align: left">Evaluate the utility of static embeddings.&lt;/td>
 &lt;td style="text-align: left">&lt;span class="pill">Python&lt;/span>&lt;/td>
 &lt;td style="text-align: left">20m&lt;/td>
 &lt;td style="text-align: left">&lt;span class="text-yellow">Intermediate&lt;/span>&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;hr>
&lt;h3 id="operations--scale">Operations &amp;amp; Scale&lt;/h3>
&lt;p>&lt;em>Production-grade management, monitoring, and high-volume optimization.&lt;/em>&lt;/p></description></item><item><title>Integrations</title><link/><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid/><description/></item><item><title>Support</title><link/><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid/><description/></item><item><title>Release Notes</title><link>https://qdrant.tech/documentation/release-notes/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/release-notes/</guid><description/></item><item><title>Vultr and Qdrant Hybrid Cloud Support Next-Gen AI Projects</title><link>https://qdrant.tech/blog/hybrid-cloud-vultr/</link><pubDate>Wed, 10 Apr 2024 00:08:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/hybrid-cloud-vultr/</guid><description>&lt;p>We’re excited to share that Qdrant and &lt;a href="https://www.vultr.com/" target="_blank" rel="noopener nofollow">Vultr&lt;/a> are partnering to provide seamless scalability and performance for vector search workloads. With Vultr&amp;rsquo;s global footprint and customizable platform, deploying vector search workloads becomes incredibly flexible. Qdrant&amp;rsquo;s new &lt;a href="https://qdrant.tech/hybrid-cloud/">Qdrant Hybrid Cloud&lt;/a> offering and its Kubernetes-native design, coupled with Vultr&amp;rsquo;s straightforward virtual machine provisioning, allows for simple setup when prototyping and building next-gen AI apps.&lt;/p>
&lt;h4 id="adapting-to-diverse-ai-development-needs-with-customization-and-deployment-flexibility">Adapting to Diverse AI Development Needs with Customization and Deployment Flexibility&lt;/h4>
&lt;p>In the fast-paced world of AI and ML, businesses are eagerly integrating AI and generative AI to enhance their products with new features like AI assistants, develop new innovative solutions, and streamline internal workflows with AI-driven processes. Given the diverse needs of these applications, it&amp;rsquo;s clear that a one-size-fits-all approach doesn&amp;rsquo;t apply to AI development. This variability in requirements underscores the need for adaptable and customizable development environments.&lt;/p></description></item><item><title>Vector Search in constant time</title><link>https://qdrant.tech/articles/quantum-quantization/</link><pubDate>Sat, 01 Apr 2023 00:48:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/quantum-quantization/</guid><description>&lt;p>The advent of quantum computing has revolutionized many areas of science and technology, and one of the most intriguing developments has been its potential application to artificial neural networks (ANNs). One area where quantum computing can significantly improve performance is in vector search, a critical component of many machine learning tasks. 
In this article, we will discuss the concept of quantum quantization for ANN vector search, focusing on the conversion of float32 to qubit vectors and the ability to perform vector search on arbitrary-sized databases in constant time.&lt;/p></description></item><item><title>STACKIT and Qdrant Hybrid Cloud for Best Data Privacy</title><link>https://qdrant.tech/blog/hybrid-cloud-stackit/</link><pubDate>Wed, 10 Apr 2024 00:07:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/hybrid-cloud-stackit/</guid><description>&lt;p>Qdrant and &lt;a href="https://www.stackit.de/en/" target="_blank" rel="noopener nofollow">STACKIT&lt;/a> are thrilled to announce that developers are now able to deploy a fully managed vector database to their STACKIT environment with the introduction of &lt;a href="https://qdrant.tech/hybrid-cloud/">Qdrant Hybrid Cloud&lt;/a>. This is a great step forward for the German AI ecosystem as it enables developers and businesses to build cutting-edge AI applications that run on German data centers with full control over their data.&lt;/p>
&lt;p>Vector databases are an essential component of the modern AI stack. They enable rapid and accurate retrieval of high-dimensional data, crucial for powering search, recommendation systems, and augmenting machine learning models. In the rising field of GenAI, vector databases power retrieval-augmented-generation (RAG) scenarios as they are able to enhance the output of large language models (LLMs) by injecting relevant contextual information. However, this contextual information is often rooted in confidential internal or customer-related information, which is why enterprises are in pursuit of solutions that allow them to make this data available for their AI applications without compromising data privacy, losing data control, or letting data exit the company&amp;rsquo;s secure environment.&lt;/p></description></item><item><title>Qdrant Hybrid Cloud and Scaleway Empower GenAI</title><link>https://qdrant.tech/blog/hybrid-cloud-scaleway/</link><pubDate>Wed, 10 Apr 2024 00:06:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/hybrid-cloud-scaleway/</guid><description>&lt;p>In a move to empower the next wave of AI innovation, Qdrant and &lt;a href="https://www.scaleway.com/en/" target="_blank" rel="noopener nofollow">Scaleway&lt;/a> collaborate to introduce &lt;a href="https://qdrant.tech/hybrid-cloud/">Qdrant Hybrid Cloud&lt;/a>, a fully managed vector database that can be deployed on existing Scaleway environments. This collaboration is set to democratize access to advanced AI capabilities, enabling developers to easily deploy and scale vector search technologies within Scaleway&amp;rsquo;s robust and developer-friendly cloud infrastructure. 
By focusing on the unique needs of startups and the developer community, Qdrant and Scaleway are providing access to intuitive and easy to use tools, making cutting-edge AI more accessible than ever before.&lt;/p></description></item><item><title>Red Hat OpenShift and Qdrant Hybrid Cloud Offer Seamless and Scalable AI</title><link>https://qdrant.tech/blog/hybrid-cloud-red-hat-openshift/</link><pubDate>Thu, 11 Apr 2024 00:04:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/hybrid-cloud-red-hat-openshift/</guid><description>&lt;p>We’re excited about our collaboration with Red Hat to bring the Qdrant vector database to &lt;a href="https://www.redhat.com/en/technologies/cloud-computing/openshift" target="_blank" rel="noopener nofollow">Red Hat OpenShift&lt;/a> customers! With the release of &lt;a href="https://qdrant.tech/hybrid-cloud/">Qdrant Hybrid Cloud&lt;/a>, developers can now deploy and run the Qdrant vector database directly in their Red Hat OpenShift environment. This collaboration enables developers to scale more seamlessly, operate more consistently across hybrid cloud environments, and maintain complete control over their vector data. 
This is a big step forward in simplifying AI infrastructure and empowering data-driven projects, like retrieval-augmented generation (RAG) use cases, advanced search scenarios, or recommendation systems.&lt;/p></description></item><item><title>Qdrant and OVHcloud Bring Vector Search to All Enterprises</title><link>https://qdrant.tech/blog/hybrid-cloud-ovhcloud/</link><pubDate>Wed, 10 Apr 2024 00:05:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/hybrid-cloud-ovhcloud/</guid><description>&lt;p>With the official release of &lt;a href="https://qdrant.tech/hybrid-cloud/">Qdrant Hybrid Cloud&lt;/a>, businesses running their data infrastructure on &lt;a href="https://ovhcloud.com/" target="_blank" rel="noopener nofollow">OVHcloud&lt;/a> are now able to deploy a fully managed vector database in their existing OVHcloud environment. We are excited about this partnership, which has been established through the &lt;a href="https://opentrustedcloud.ovhcloud.com/en/" target="_blank" rel="noopener nofollow">OVHcloud Open Trusted Cloud&lt;/a> program, as it is based on our shared understanding of the importance of trust, control, and data privacy in the context of the emerging landscape of enterprise-grade AI applications. 
As part of this collaboration, we are also providing a detailed use case tutorial on building a recommendation system that demonstrates the benefits of running Qdrant Hybrid Cloud on OVHcloud.&lt;/p></description></item><item><title>New RAG Horizons with Qdrant Hybrid Cloud and LlamaIndex</title><link>https://qdrant.tech/blog/hybrid-cloud-llamaindex/</link><pubDate>Wed, 10 Apr 2024 00:04:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/hybrid-cloud-llamaindex/</guid><description>&lt;p>We&amp;rsquo;re happy to announce the collaboration between &lt;a href="https://www.llamaindex.ai/" target="_blank" rel="noopener nofollow">LlamaIndex&lt;/a> and &lt;a href="https://qdrant.tech/hybrid-cloud/">Qdrant’s new Hybrid Cloud launch&lt;/a>, aimed at empowering engineers and scientists worldwide to swiftly and securely develop and scale their GenAI applications. By leveraging LlamaIndex&amp;rsquo;s robust framework, users can maximize the potential of vector search and create stable and effective AI products. Qdrant Hybrid Cloud offers the same Qdrant functionality on a Kubernetes-based architecture, which further expands the ability of LlamaIndex to support any user on any environment.&lt;/p></description></item><item><title>Developing Advanced RAG Systems with Qdrant Hybrid Cloud and LangChain</title><link>https://qdrant.tech/blog/hybrid-cloud-langchain/</link><pubDate>Sun, 14 Apr 2024 00:04:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/hybrid-cloud-langchain/</guid><description>&lt;p>&lt;a href="https://www.langchain.com/" target="_blank" rel="noopener nofollow">LangChain&lt;/a> and Qdrant are collaborating on the launch of &lt;a href="https://qdrant.tech/hybrid-cloud/">Qdrant Hybrid Cloud&lt;/a>, which is designed to empower engineers and scientists globally to easily and securely develop and scale their GenAI applications. 
Harnessing LangChain’s robust framework, users can unlock the full potential of vector search, enabling the creation of stable and effective AI products. Qdrant Hybrid Cloud extends the same powerful functionality of Qdrant onto a Kubernetes-based architecture, enhancing LangChain’s capability to cater to users across any environment.&lt;/p></description></item><item><title>Cutting-Edge GenAI with Jina AI and Qdrant Hybrid Cloud</title><link>https://qdrant.tech/blog/hybrid-cloud-jinaai/</link><pubDate>Wed, 10 Apr 2024 00:03:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/hybrid-cloud-jinaai/</guid><description>&lt;p>We&amp;rsquo;re thrilled to announce the collaboration between Qdrant and &lt;a href="https://jina.ai/" target="_blank" rel="noopener nofollow">Jina AI&lt;/a> for the launch of &lt;a href="https://qdrant.tech/hybrid-cloud/">Qdrant Hybrid Cloud&lt;/a>, empowering users worldwide to rapidly and securely develop and scale their AI applications. By leveraging Jina AI&amp;rsquo;s top-tier large language models (LLMs), engineers and scientists can optimize their vector search efforts. Qdrant&amp;rsquo;s latest Hybrid Cloud solution, designed natively with Kubernetes, seamlessly integrates with Jina AI&amp;rsquo;s robust embedding models and APIs. 
This synergy streamlines both prototyping and deployment processes for AI solutions.&lt;/p></description></item><item><title>Qdrant Hybrid Cloud and Haystack for Enterprise RAG</title><link>https://qdrant.tech/blog/hybrid-cloud-haystack/</link><pubDate>Wed, 10 Apr 2024 00:02:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/hybrid-cloud-haystack/</guid><description>&lt;p>We’re excited to share that Qdrant and &lt;a href="https://haystack.deepset.ai/" target="_blank" rel="noopener nofollow">Haystack&lt;/a> are continuing to expand their seamless integration to the new &lt;a href="https://qdrant.tech/hybrid-cloud/">Qdrant Hybrid Cloud&lt;/a> offering, allowing developers to deploy a managed &lt;a href="https://qdrant.tech/articles/what-is-a-vector-database/">vector database&lt;/a> in their own environment of choice. Earlier this year, both Qdrant and Haystack, started to address their user’s growing need for production-ready retrieval-augmented-generation (RAG) deployments. The ability to build and deploy AI apps anywhere now allows for complete data sovereignty and control. This gives large enterprise customers the peace of mind they need before they expand AI functionalities throughout their operations.&lt;/p></description></item><item><title>Qdrant Hybrid Cloud and DigitalOcean for Scalable and Secure AI Solutions</title><link>https://qdrant.tech/blog/hybrid-cloud-digitalocean/</link><pubDate>Thu, 11 Apr 2024 00:02:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/hybrid-cloud-digitalocean/</guid><description>&lt;p>Developers are constantly seeking new ways to enhance their AI applications with new customer experiences. 
At the core of this are vector databases, as they enable the efficient handling of complex, unstructured data, making it possible to power applications with semantic search, personalized recommendation systems, and intelligent Q&amp;amp;A platforms. However, when deploying such new AI applications, especially those handling sensitive or personal user data, privacy becomes important.&lt;/p>
&lt;p>&lt;a href="https://www.digitalocean.com/" target="_blank" rel="noopener nofollow">DigitalOcean&lt;/a> and Qdrant are actively addressing this with an integration that lets developers deploy a managed vector database in their existing DigitalOcean environments. With the recent launch of &lt;a href="https://qdrant.tech/hybrid-cloud/">Qdrant Hybrid Cloud&lt;/a>, developers can seamlessly deploy Qdrant on DigitalOcean Kubernetes (DOKS) clusters, making it easier for developers to handle vector databases without getting bogged down in the complexity of managing the underlying infrastructure.&lt;/p></description></item><item><title>Enhance AI Data Sovereignty with Aleph Alpha and Qdrant Hybrid Cloud</title><link>https://qdrant.tech/blog/hybrid-cloud-aleph-alpha/</link><pubDate>Thu, 11 Apr 2024 00:01:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/hybrid-cloud-aleph-alpha/</guid><description>&lt;p>&lt;a href="https://aleph-alpha.com/" target="_blank" rel="noopener nofollow">Aleph Alpha&lt;/a> and Qdrant are on a joint mission to empower the world’s best companies in their AI journey. The launch of &lt;a href="https://qdrant.tech/hybrid-cloud/">Qdrant Hybrid Cloud&lt;/a> furthers this effort by ensuring complete data sovereignty and hosting security. This latest collaboration is all about giving enterprise customers complete transparency and sovereignty to make use of AI in their own environment. 
By using a hybrid cloud vector database, those looking to leverage vector search for their AI applications can now ensure their proprietary and customer data is completely secure.&lt;/p></description></item><item><title>Elevate Your Data With Airbyte and Qdrant Hybrid Cloud</title><link>https://qdrant.tech/blog/hybrid-cloud-airbyte/</link><pubDate>Wed, 10 Apr 2024 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/hybrid-cloud-airbyte/</guid><description>&lt;p>In their mission to support large-scale AI innovation, &lt;a href="https://airbyte.com/" target="_blank" rel="noopener nofollow">Airbyte&lt;/a> and Qdrant are collaborating on the launch of Qdrant’s new offering - &lt;a href="https://qdrant.tech/hybrid-cloud/">Qdrant Hybrid Cloud&lt;/a>. This collaboration allows users to leverage the synergistic capabilities of both Airbyte and Qdrant within a private infrastructure. Qdrant’s new offering represents the first managed &lt;a href="https://qdrant.tech/articles/what-is-a-vector-database/">vector database&lt;/a> that can be deployed in any environment. Businesses optimizing their data infrastructure with Airbyte are now able to host a vector database either on premises or on a public cloud of their choice - while still reaping the benefits of a managed database product.&lt;/p></description></item><item><title>Ecosystem Guides</title><link/><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid/><description/></item><item><title>Practice Datasets</title><link>https://qdrant.tech/documentation/datasets/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/datasets/</guid><description>&lt;h1 id="common-datasets-in-snapshot-format">Common Datasets in Snapshot Format&lt;/h1>
&lt;p>You may find that creating embeddings from datasets is a very resource-intensive task.
If you need a practice dataset, feel free to pick one of the ready-made snapshots on this page.
These snapshots contain pre-computed vectors that you can easily import into your Qdrant instance.&lt;/p>
&lt;h2 id="available-datasets">Available datasets&lt;/h2>
&lt;p>Our snapshots are usually generated from publicly available datasets, which are often used for
non-commercial or academic purposes. The following datasets are currently available. Please click
on a dataset name to see its detailed description.&lt;/p></description></item><item><title>Qdrant Skills for AI Agents</title><link>https://qdrant.tech/blog/qdrant-skills-release/</link><pubDate>Tue, 31 Mar 2026 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-skills-release/</guid><description>&lt;p>The standard RAG tutorial teaches a simple pattern: embed your documents, store them in a vector database, retrieve the top K, and feed them to the LLM. The vector engine is passive infrastructure. Put vectors in, get neighbors out. Configure once, forget about it forever.&lt;/p>
&lt;p>This mental model is why most AI agents treat vector search as a black box. They can call the API. They cannot make the engineering decisions that determine whether it works well.&lt;/p></description></item><item><title>Master Multi-Vector Search With Qdrant</title><link>https://qdrant.tech/blog/multi-vector-search-course/</link><pubDate>Tue, 24 Mar 2026 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/multi-vector-search-course/</guid><description>&lt;p>Most vector search tutorials stop at single-vector embeddings: one document, one vector, one similarity score. That works for demos. It falls apart when your retrieval pipeline needs to capture fine-grained token-level interactions across text, images, and PDFs at production scale.&lt;/p>
&lt;p>Until now, engineers who wanted to go deeper had to piece together scattered papers, blog posts, and half-documented repos. There was no structured, hands-on resource that connected the theory of late interaction models to real implementation in a production search engine.&lt;/p></description></item><item><title>Start with pgvector: Why You'll Outgrow It Faster Than You Think</title><link>https://qdrant.tech/blog/pgvector-tradeoffs/</link><pubDate>Tue, 17 Mar 2026 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/pgvector-tradeoffs/</guid><description>&lt;p>The most common advice in every vector database thread online is some version of &amp;ldquo;start with pgvector, graduate later.&amp;rdquo; We analyzed 110+ community threads from Hacker News and Reddit to see if the data supports this heuristic. The short answer is that it&amp;rsquo;s more nuanced than it sounds, and most applications will hit its limits sooner than expected.&lt;/p>
&lt;hr>
&lt;h2 id="the-appeal-of-just-use-pgvector">The Appeal of &amp;ldquo;Just Use pgvector&amp;rdquo;&lt;/h2>
&lt;p>This advice is attractive for obvious reasons. If you&amp;rsquo;re already running Postgres — and most teams are — pgvector gives you vector search without new infrastructure, new ops burden, or new sync headaches. One system, one deployment.&lt;/p></description></item><item><title>Video Anomaly Detection From Edge to Cloud With Qdrant</title><link>https://qdrant.tech/blog/video-anomaly-detection-edge-to-cloud/</link><pubDate>Sun, 15 Mar 2026 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/video-anomaly-detection-edge-to-cloud/</guid><description>&lt;p>&lt;strong>What if your surveillance cameras could detect fights, accidents, intrusions, and equipment failures without ever being trained on those specific events?&lt;/strong>&lt;/p>
&lt;p>Traditional video classifiers need labeled examples of every anomaly type you want to catch. That breaks in the real world. You can&amp;rsquo;t enumerate everything that could go wrong, and the moment something new happens, your model scores 0.0.&lt;/p>
&lt;p>We built a system that takes a different approach: reframe anomaly detection as a &lt;strong>nearest-neighbor search problem&lt;/strong>. Instead of asking &amp;ldquo;is this a fight?&amp;rdquo;, ask &amp;ldquo;how different is this from what we normally see?&amp;rdquo; That question is a vector distance calculation, and Qdrant answers it in sub-millisecond time.&lt;/p></description></item><item><title>We Raised $50M to Build Composable Vector Search as Core Infrastructure</title><link>https://qdrant.tech/blog/series-b-announcement/</link><pubDate>Thu, 12 Mar 2026 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/series-b-announcement/</guid><description>&lt;p>Today we&amp;rsquo;re announcing $50 million in Series B funding, led by AVP, with participation from Bosch Ventures, Unusual Ventures, Spark Capital, and 42CAP.&lt;/p>
&lt;h3 id="retrieval-is-on-the-critical-path-of-every-ai-system">Retrieval Is on the Critical Path of Every AI System&lt;/h3>
&lt;p>Every serious AI workload — RAG, agents, multimodal search — depends on retrieving the right information, at the right time, under real constraints. Teams prototype with whatever is convenient, then hit walls in production: indexes that stall under writes, filtering applied after search instead of during it, tail latencies that spike under load. These aren&amp;rsquo;t configuration problems. They&amp;rsquo;re architectural ones. And they&amp;rsquo;re why we started Qdrant.&lt;/p></description></item><item><title>Qdrant Meets Google Gemini Embedding 2</title><link>https://qdrant.tech/blog/qdrant-gemini-embedding-2/</link><pubDate>Tue, 10 Mar 2026 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-gemini-embedding-2/</guid><description>&lt;p>Until now, building a search system that understands both what a document says &lt;em>and&lt;/em> what its images, videos, or audio convey could be complex and limited. It meant stitching together multiple embedding models and writing code to reconcile results across modalities. That pipeline complexity may have been the single biggest barrier to production-grade multimodal retrieval.&lt;/p>
&lt;p>Today, Google launched &lt;a href="https://ai.google.dev/gemini-api/docs/embeddings" target="_blank" rel="noopener nofollow">&lt;strong>Gemini Embedding 2&lt;/strong>&lt;/a> in Public Preview, the first fully multimodal embedding model in the Gemini family, and Qdrant supports it from day one. This post explains what the model offers, why Qdrant is a natural fit, and how to get started.&lt;/p></description></item><item><title>How GlassDollar improved high-recall sourcing by migrating from Elasticsearch to Qdrant</title><link>https://qdrant.tech/blog/case-study-glassdollar/</link><pubDate>Wed, 04 Mar 2026 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-glassdollar/</guid><description>&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-glassdollar/glassdollar-bento-box.png" alt="GlassDollar overview">&lt;/p>
&lt;p>&lt;a href="https://www.glassdollar.com/" target="_blank">GlassDollar&lt;/a> helps enterprises such as Siemens, Mahle, and A2A discover, compare, and run proof-of-concepts with innovative startups. The platform combines an chatbot-like experience for discovering innovate companies with tools to manager innovation projects from start to finish.&lt;/p>
&lt;p>For GlassDollar, search is not a feature. It is the core mechanism that turns an enterprise problem statement into a shortlist of relevant companies, ranked and contextualized for decision-making.&lt;/p>
&lt;h2 id="scaling-to-10-million-documents-pushed-search-to-its-limits">Scaling to 10 Million Documents Pushed Search to its Limits&lt;/h2>
&lt;p>GlassDollar’s dataset spans a few million companies. Each company can map to multiple documents, including product descriptions and technical summaries.&lt;/p></description></item><item><title>Convolve 4.0 - IIT Hackathon Winners</title><link>https://qdrant.tech/blog/iit-hack-winners/</link><pubDate>Fri, 27 Feb 2026 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/iit-hack-winners/</guid><description>&lt;p>Builders from across India came together for Convolve 4.0 - A Pan IIT AI/ML Hackathon, hosted by IIT Madras, to develop impactful, real-world AI systems across critical domains. The hackathon focused on multi-agent systems, retrieval-augmented generation (RAG), vector memory, multimodal intelligence, and production-ready AI pipelines.&lt;/p>
&lt;p>Participants built solutions spanning healthcare and medical reasoning, disaster response and crisis intelligence, climate and risk monitoring, legal-tech and governance systems, misinformation detection, public safety, infrastructure auditing, education, and civic-tech platforms. Many projects emphasized persistent AI memory, domain-aware guardrails, spatial intelligence, and collaborative agent architectures designed for long-term, scalable deployment.&lt;/p></description></item><item><title>How My AskAI Built Self-Improving Support Agents</title><link>https://qdrant.tech/blog/case-study-my-askai/</link><pubDate>Wed, 25 Feb 2026 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-my-askai/</guid><description>&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-my-askai/my-askai-bento-box.png" alt="My AskAI overview">&lt;/p>
&lt;p>&lt;a href="https://myaskai.com" target="_blank">My AskAI&lt;/a> built a managed platform for AI customer support agents that plug directly into existing helpdesk tools like &lt;a href="https://myaskai.com/ai-agent-integration/intercom" target="_blank">Intercom&lt;/a>
and &lt;a href="https://myaskai.com/ai-agent-integration/zendesk-tickets" target="_blank">Zendesk&lt;/a>. The goal was to make AI behave like a reliable coworker, not a brittle chatbot. In production, My AskAI&amp;rsquo;s agents are designed to resolve a large portion of inbound support requests automatically, then hand over to a human when the agent cannot answer confidently. My AskAI positions this as &lt;a href="https://myaskai.com/blog/my-askai-edel-optics-case-study-2026" target="_blank">deflecting around 75 percent of support requests&lt;/a> and sustaining a resolution rate in the low to mid 70s, depending on the time window and workload mix.&lt;/p></description></item><item><title>Qdrant 1.17 - Relevance Feedback &amp; Search Latency Improvements</title><link>https://qdrant.tech/blog/qdrant-1.17.x/</link><pubDate>Fri, 20 Feb 2026 00:00:00 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-1.17.x/</guid><description>&lt;p>&lt;a href="https://github.com/qdrant/qdrant/releases/tag/v1.17.0" target="_blank" rel="noopener nofollow">&lt;strong>Qdrant 1.17.0 is out!&lt;/strong>&lt;/a> Let’s look at the main features for this version:&lt;/p>
&lt;p>&lt;strong>Relevance Feedback Query:&lt;/strong> Improve the quality of search results by incorporating information about their relevance.&lt;/p>
&lt;p>&lt;strong>Search Latency Improvements:&lt;/strong> Manage search latency with new tools, such as an update queue and delayed fan-outs, as well as many internal search performance improvements.&lt;/p>
&lt;p>&lt;strong>Greater Operational Observability:&lt;/strong> Better insights into operational metrics and faster troubleshooting with a new cluster-wide telemetry API and segment optimization monitoring.&lt;/p></description></item><item><title>How Bazaarvoice scaled AI-powered product insights with Qdrant</title><link>https://qdrant.tech/blog/case-study-bazaarvoice/</link><pubDate>Tue, 10 Feb 2026 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-bazaarvoice/</guid><description>&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-bazaarvoice/bazaarvoice-bento.png" alt="Bazaarvoice overview">&lt;/p>
&lt;h2 id="turning-billions-of-reviews-into-real-time-actionable-intelligence">Turning billions of reviews into real-time, actionable intelligence&lt;/h2>
&lt;p>Bazaarvoice powers ratings and reviews across the global ecommerce ecosystem, connecting brands, retailers, and consumers through authentic product feedback. From brand-owned storefronts to major retailers, Bazaarvoice sources, verifies, and amplifies reviews at a scale few companies ever reach.&lt;/p>
&lt;p>As large language models (LLMs) became production-ready, Bazaarvoice saw an opportunity to enhance the experiences of their clients&amp;rsquo; shoppers. The company wanted to help shoppers ask questions directly on product detail pages using natural language and help brands extract meaningful insights from vast volumes of unstructured customer feedback.&lt;/p></description></item><item><title>Sketch &amp; Search: Google Deepmind x Qdrant x Freepik Hackathon Winners</title><link>https://qdrant.tech/blog/sketch-n-search-winners/</link><pubDate>Tue, 03 Feb 2026 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/sketch-n-search-winners/</guid><description>&lt;p>Builders from around the world came together for Sketch &amp;amp; Search, a global hackathon powered by Google DeepMind, Freepik, and Qdrant, to explore the future of AI-driven creative pipelines.&lt;/p>
&lt;p>Teams were challenged to go beyond single-prompt generation and build end-to-end systems combining generative models, visual creation, and vector search. Submissions showcased consistent characters and style memory, image-as-prompt and image-to-video workflows, intelligent asset discovery, recommendations, and built-in brand-safe guardrails.&lt;/p>
&lt;p>The hackathon kicked off in San Francisco on November 22, 2025, followed by a two-week virtual build window and a live demo day where winners were announced. Projects were judged on creative quality, effective search and similarity, UX tradeoffs, guardrails, and real-world applicability.&lt;/p></description></item><item><title>Two Approaches to Helping AI Agents Use Your API (And Why You Need Both)</title><link>https://qdrant.tech/blog/skill-md-meets-repl/</link><pubDate>Wed, 28 Jan 2026 00:00:00 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/skill-md-meets-repl/</guid><description>&lt;p>AI coding agents fail in predictable ways when working with APIs. Two recent approaches from Mintlify and Armin Ronacher attack different failure modes. Understanding both reveals something useful about how agents should interact with developer tools.&lt;/p>
&lt;h2 id="two-failure-modes">Two Failure Modes&lt;/h2>
&lt;p>When an agent writes code against your API, it can fail because:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>It doesn&amp;rsquo;t know what it doesn&amp;rsquo;t know.&lt;/strong> The agent uses a deprecated method, misconfigures a parameter, or violates a constraint that isn&amp;rsquo;t obvious from type signatures. This is the &amp;ldquo;known unknowns&amp;rdquo; problem: things the API maintainer knows but the agent doesn&amp;rsquo;t.&lt;/p></description></item><item><title>How Anima Health scaled clinical document intelligence with Qdrant</title><link>https://qdrant.tech/blog/case-study-anima-health/</link><pubDate>Wed, 28 Jan 2026 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-anima-health/</guid><description>&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-anima-health/anima-bento.png" alt="Anima Health scaled privacy-first clinical intelligence with Qdrant">&lt;/p>
&lt;p>Primary care systems across the UK are under intense strain. General practitioners (GPs) must balance patient demand, understaffing, and administrative burden against the time they have to deliver care. &lt;a href="https://animahealth.com/" target="_blank">Anima Health&lt;/a> set out to address this challenge by building a clinical operating system designed to make primary care more efficient, more informed, and more humane for both clinicians and patients.&lt;/p>
&lt;p>At the heart of Anima’s platform is the ability to process large volumes of unstructured clinical data, including documents, test results, referral letters, and notes, while maintaining strict privacy guarantees. To achieve this at scale, Anima relies on Qdrant as a core infrastructure component for vector search, similarity analysis, and agentic AI workflows.&lt;/p></description></item><item><title>Qdrant Academy Expands with Official Certification</title><link>https://qdrant.tech/blog/qdrant-certification-launch/</link><pubDate>Wed, 28 Jan 2026 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-certification-launch/</guid><description>&lt;p>Since we first announced &lt;strong>&lt;a href="https://qdrant.tech/course/" target="_blank" rel="noopener nofollow">Qdrant Academy&lt;/a>&lt;/strong>, our mission has been to provide developers with more than just documentation. We wanted to build a structured path to mastering vector search. As the AI search landscape matures, the distinction between a simple storage layer and a high-performance vector search engine has become the defining factor in production-grade RAG and recommendation systems.&lt;/p>
&lt;p>Today, we are thrilled to take the next step in that mission. It’s time to move from learning to proving your expertise with the launch of our first official certification.&lt;/p></description></item><item><title>How Kakao Built an AI-Powered Internal Service Desk with Qdrant</title><link>https://qdrant.tech/blog/case-study-kakao/</link><pubDate>Tue, 27 Jan 2026 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-kakao/</guid><description>&lt;p>&lt;a href="https://www.kakaocorp.com/" target="_blank">Kakao&lt;/a> is one of South Korea&amp;rsquo;s leading technology companies, best known for KakaoTalk, the country&amp;rsquo;s dominant messaging platform with over 48 million monthly active users. Beyond messaging, Kakao operates a broad ecosystem of services including maps, mobility, fintech, and enterprise solutions.&lt;/p>
&lt;h2 id="helping-employees-find-answers-faster-without-sacrificing-precision-or-control">Helping employees find answers faster without sacrificing precision or control&lt;/h2>
&lt;p>Kakao’s Connectivity Platform team set out to solve a familiar internal problem: employees across the organization needed a faster, more reliable way to get answers about internal systems, APIs, and operational procedures. The result was &lt;strong>Service Desk Agent&lt;/strong>, an AI-powered internal service desk designed to answer questions in natural language using Kakao’s internal documentation and historical inquiry data.&lt;/p></description></item><item><title>Building real-time multimodal similarity search in Flipkart Trust &amp; Safety with Qdrant</title><link>https://qdrant.tech/blog/case-study-flipkart/</link><pubDate>Fri, 09 Jan 2026 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-flipkart/</guid><description>&lt;h3 id="tackling-fraud-and-abuse-with-scalable-similarity-search">Tackling fraud and abuse with scalable similarity search&lt;/h3>
&lt;p>At Flipkart, the Trust &amp;amp; Safety team is focused on detecting and preventing platform abuse and fraud. A critical part of this work involves running large-scale similarity searches across customer and seller-submitted data, particularly images. This allows the team to identify patterns associated with fraudulent activity, such as repeat returns or duplicate seller claims, before they cause downstream harm.&lt;/p>
&lt;p>&lt;em>“Platform integrity is a constant challenge. To stay ahead of fraudulent actors, we needed a system that could compare multimodal data in real time, not just in long-running batch jobs.”&lt;/em>&lt;/p></description></item><item><title>Qdrant 2025 Recap: Powering the Agentic Era</title><link>https://qdrant.tech/blog/2025-recap/</link><pubDate>Wed, 17 Dec 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/2025-recap/</guid><description>&lt;p>&lt;img src="https://qdrant.tech/blog/2025-recap/2025-infographic.png" alt="Infographic">&lt;/p>
&lt;p>2025 was a defining year for Qdrant. Not because of a single feature or launch, but because of a clear shift in what the platform enables. As AI systems moved from static assistants to autonomous, multi-step agents, the demands placed on retrieval changed fundamentally. Speed alone was no longer enough. Production systems now require precise relevance control, predictable performance at scale, and the flexibility to run wherever data and users live.&lt;/p></description></item><item><title>New DeepLearning.AI Course on Multi-Vector Image Retrieval with ColPali and MUVERA</title><link>https://qdrant.tech/blog/qdrant-deeplearning-ai-multi-vector-image-retrieval/</link><pubDate>Thu, 11 Dec 2025 17:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-deeplearning-ai-multi-vector-image-retrieval/</guid><description>&lt;p>We&amp;rsquo;re thrilled to announce our latest collaboration with DeepLearning.AI: &lt;a href="https://www.deeplearning.ai/short-courses/multi-vector-image-retrieval/" target="_blank" rel="noopener nofollow">Multi-Vector Image Retrieval&lt;/a>. Building on the success of our previous course on retrieval optimization, this intermediate-level course takes you deeper into advanced search techniques that are transforming how AI systems understand and retrieve visual content.&lt;/p>
&lt;p>Led once again by Qdrant&amp;rsquo;s Kacper Łukawski, Senior Developer Advocate, this free course is designed for AI builders working with multi-modal data who want to implement cutting-edge image retrieval in their applications.&lt;/p></description></item><item><title>How Cosmos delivered editorial-grade visual search with Qdrant</title><link>https://qdrant.tech/blog/case-study-cosmos/</link><pubDate>Thu, 20 Nov 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-cosmos/</guid><description>&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-cosmos/cosmos-bento-box-dark.jpg" alt="How Cosmos powered text, color, and hybrid search with Qdrant">&lt;/p>
&lt;p>&lt;a href="https://www.cosmos.so/" target="_blank">Cosmos&lt;/a> is redefining how people find inspiration online. It’s a visual search app built for creative professionals and everyday users who want a clean, meditative, ad-free place to collect and curate ideas. In contrast to feeds dominated by doomscrolling, ads, and generative “AI slop,” Cosmos focuses on high-quality, human-made content. AI-powered search and captions connect each image to its creator, making visual discovery richer, more accurate, and easier to navigate.&lt;/p></description></item><item><title>Qdrant 1.16 - Tiered Multitenancy &amp; Disk-Efficient Vector Search</title><link>https://qdrant.tech/blog/qdrant-1.16.x/</link><pubDate>Wed, 19 Nov 2025 00:00:00 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-1.16.x/</guid><description>&lt;p>&lt;a href="https://github.com/qdrant/qdrant/releases/tag/v1.16.0" target="_blank" rel="noopener nofollow">&lt;strong>Qdrant 1.16.0 is out!&lt;/strong>&lt;/a> Let’s look at the main features for this version:&lt;/p>
&lt;p>&lt;strong>Tiered Multitenancy:&lt;/strong> An improved approach to multitenancy that enables you to combine small and large tenants in a single collection, with the ability to promote growing tenants to dedicated shards.&lt;/p>
&lt;p>&lt;strong>ACORN&lt;/strong>: A new search algorithm that improves the quality of filtered vector search in cases of multiple filters with weak selectivity.&lt;/p>
&lt;p>&lt;strong>Inline Storage&lt;/strong>: A new HNSW index storage mode that stores vector data directly inside HNSW nodes, enabling efficient disk-based vector search.&lt;/p></description></item><item><title>How Dragonfruit AI scaled real-time computer vision with Qdrant</title><link>https://qdrant.tech/blog/case-study-dragonfruit/</link><pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-dragonfruit/</guid><description>&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-dragonfruit/dragonfruit-bento-box-dark.png" alt="Dragonfruit Overview">&lt;/p>
&lt;h2 id="dragonfruit-ai-scales-real-time-computer-vision-with-qdrant">Dragonfruit AI scales real-time computer vision with Qdrant&lt;/h2>
&lt;h3 id="building-enterprise-ready-computer-vision">Building enterprise-ready computer vision&lt;/h3>
&lt;p>&lt;a href="https://www.dragonfruit.ai/" target="_blank">Dragonfruit AI&lt;/a> builds enterprise-ready computer vision solutions, turning ordinary IP camera feeds into actionable insights for security, safety, operations, and compliance. Their platform ships a suite of AI “agents,” including retail loss prevention and warehouse safety, that run with a patented “Split AI” approach: real-time inference on-prem for speed and bandwidth efficiency, paired with cloud services for aggregation and search. Dragonfruit needed to keep total cost of ownership low, meet strict latency targets, and operate reliably across hundreds of sites with thousands of cameras; all without asking customers to rip and replace existing infrastructure.&lt;/p></description></item><item><title>How Xaver scaled personalized financial advice with Qdrant</title><link>https://qdrant.tech/blog/case-study-xaver/</link><pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-xaver/</guid><description>&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-xaver/xaver-bento-box-dark.jpg" alt="Xaver Overview">&lt;/p>
&lt;h2 id="how-xaver-built-its-ai-knowledge-engine-with-qdrant">How Xaver Built its AI Knowledge Engine with Qdrant&lt;/h2>
&lt;p>&lt;a href="https://www.xaver.com/" target="_blank">Xaver&lt;/a> is tackling a core challenge in the financial industry: scaling personalized financial and retirement advice. As demographic shifts increase demand for private pensions, traditional, manual consultation models are proving too slow and costly to support everyone who needs help.&lt;/p>
&lt;p>To solve this, Xaver provides banks, insurers and distributors with a vertically specialized and compliant agentic sales platform. This technology acts as both an AI sales assistant for human advisors and as an autonomous agent to deliver compliant, personalized financial guidance to consumers via phone, video avatars, messengers and web journeys.&lt;/p></description></item><item><title>Building Performant, Scaled Agentic Vector Search with Qdrant</title><link>https://qdrant.tech/articles/agentic-builders-guide/</link><pubDate>Sun, 26 Oct 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/agentic-builders-guide/</guid><description>&lt;h2 id="overview">Overview&lt;/h2>
&lt;p>AI agents have grown from simple Q&amp;amp;A chatbots into systems that can independently plan, retrieve, act, and verify tasks. As developers work to recreate real-life workflows with agents, a common starting point is to give an agent access to a search API.&lt;/p>
&lt;p>&lt;img src="https://qdrant.tech/articles_data/agentic-builders-guide/agentic-architecture.png" alt="Agentic vector search architecture">&lt;/p>
&lt;h2 id="the-limitations-of-agents">The Limitations of Agents&lt;/h2>
&lt;p>While agents have proven they can create incredible impact, they still face serious limitations without the right tools. A simple search box isn’t enough here, and agents moving from prototype to production often fail in three key areas:&lt;/p></description></item><item><title>Qdrant Academy Launches with Qdrant Essentials Course</title><link>https://qdrant.tech/blog/qdrant-academy-launch/</link><pubDate>Thu, 23 Oct 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-academy-launch/</guid><description>&lt;p>Today, we’re proud to launch &lt;strong>Qdrant Academy&lt;/strong>, a new learning site designed to help developers, data scientists, and engineers build real-world vector search systems.&lt;/p>
&lt;p>This started with a mission: to make learning more accessible, scalable, and frictionless for practitioners around the world. And today, we crossed the first milestone in our mission by launching our first course, &lt;a href="https://qdrant.tech/course/essentials/" target="_blank" rel="noopener nofollow">Qdrant Essentials&lt;/a>.&lt;/p>
&lt;p>With &lt;strong>Qdrant Essentials&lt;/strong>, you get a free, self-paced, structured learning course that teaches the fundamentals of vector search, embeddings, and productionizing AI systems using Qdrant. You’ll learn not just what vector search &lt;em>is&lt;/em>, but how to build, query, and optimize search with real projects, exercises, and examples.&lt;/p></description></item><item><title>How TrustGraph built enterprise-grade agentic AI with Qdrant</title><link>https://qdrant.tech/blog/case-study-trustgraph/</link><pubDate>Fri, 10 Oct 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-trustgraph/</guid><description>&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-trustgraph/trustgraph-bento-box-dark.jpg" alt="TrustGraph Overview">&lt;/p>
&lt;h1 id="trustgraph--qdrant-a-technical-deep-dive">TrustGraph + Qdrant: A Technical Deep Dive&lt;/h1>
&lt;p>When teams first experiment with agentic AI, the journey often starts with a slick demo: a few APIs stitched together, a large language model answering questions, and just enough smoke and mirrors to impress stakeholders.&lt;/p>
&lt;p>But as soon as those demos face enterprise requirements (constant data ingestion, compliance, thousands of users, 24×7 uptime), the illusion breaks. Services stall at the first failure, query reliability plummets, and regulatory guardrails are nowhere to be found. What worked in a five-minute demo becomes impossible to maintain in production.&lt;/p></description></item><item><title>All Vectors Lead to Community: Vector Space Day 2025 Recap</title><link>https://qdrant.tech/blog/vector-space-day-2025-recap/</link><pubDate>Tue, 30 Sep 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/vector-space-day-2025-recap/</guid><description>&lt;p>&lt;a href="https://drive.google.com/drive/folders/1DbRQmwmMg8U255g3-ooo7Ojt9FPgHNc9?usp=sharing" target="_blank" rel="noopener nofollow">&lt;strong>[See all event slides here]&lt;/strong>&lt;/a>&lt;/p>
&lt;p>On September 26, 2025, nearly &lt;strong>400 developers, researchers, and engineers&lt;/strong> came together at the Colosseum Theater in Berlin for the first-ever &lt;strong>Qdrant Vector Space Day&lt;/strong>.&lt;/p>
&lt;p>From the start, the day belonged to the community. Over coffee and fresh Qdrant swag, the conversations quickly moved to embeddings, hybrid search, and AI agents. Laptops flipped open, QR codes were shared, and the room filled with people eager to trade ideas and learn from one another.&lt;/p></description></item><item><title>Thinking Outside the Bot with 2025 Hackathon Winners</title><link>https://qdrant.tech/blog/vector-space-hackathon-winners-2025/</link><pubDate>Mon, 29 Sep 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/vector-space-hackathon-winners-2025/</guid><description>&lt;p>Over the past several weeks, builders from around the world proved that vector search is about much more than chatbots. We challenged teams to think beyond RAG, and they delivered: robotics safety reflexes, event discovery on routes, 3D shopping, video game characters, and more.&lt;/p>
&lt;p>Winners were announced live in Berlin on Friday, September 26, 2025 at &lt;a href="https://luma.com/p7w9uqtz" target="_blank" rel="noopener nofollow">Vector Space Day&lt;/a>. Full hackathon details are here: &lt;a href="https://try.qdrant.tech/hackathon-2025" target="_blank" rel="noopener nofollow">Hackathon page&lt;/a>.&lt;/p>
&lt;p>With numerous submissions from around the world, hackathon judges evaluated each entry on Creativity, Technical Depth, and Qdrant Usage to determine the top projects. There was $10,000 in prizes from Qdrant as well as many additional bonus prizes for using partner tech:&lt;/p></description></item><item><title>Announcing the Vector Space Day 2025 Speaker Lineup</title><link>https://qdrant.tech/blog/vector-space-day-lineup-2025/</link><pubDate>Mon, 15 Sep 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/vector-space-day-lineup-2025/</guid><description>&lt;h1 id="announcing-the-vector-space-day-2025-speaker-lineup">Announcing the Vector Space Day 2025 Speaker Lineup&lt;/h1>
&lt;p>We are just days away from &lt;a href="https://luma.com/p7w9uqtz" target="_blank" rel="noopener nofollow">Vector Space Day&lt;/a> in Berlin, and the full speaker lineup is here! This year’s program spans keynotes, deep-dive technical sessions, and lightning talks, covering everything from benchmarking search engines to scalable AI memory and multimodal embeddings. Here’s what to expect.&lt;/p>
&lt;h2 id="opening-keynotes">Opening Keynotes&lt;/h2>
&lt;p>The day begins with perspectives from across the ecosystem:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Andre Zayarni, Andrey Vasnetsov,&lt;/strong> and &lt;strong>Neil Kanungo&lt;/strong> sharing Qdrant’s vision for the future of vector search and how devs can engage with the Qdrant Community.&lt;/li>
&lt;li>&lt;strong>Robert Eichenseer (Microsoft), Kevin Cochrane (Vultr),&lt;/strong> and &lt;strong>Inaam Syed (AWS)&lt;/strong> offering insights on how cloud, infrastructure, and developer communities are reshaping AI systems.&lt;/li>
&lt;/ul>
&lt;h2 id="breakout-sessions">Breakout Sessions&lt;/h2>
&lt;h4 id="track-a-milky-way---architectures-infrastructure-and-multimodal-retrieval">Track A: Milky Way - Architectures, Infrastructure and Multimodal Retrieval&lt;/h4>
&lt;ul>
&lt;li>&lt;strong>AskNews&lt;/strong> - &lt;em>Building a News Sleuth for the Deep Research Paradigm:&lt;/em> How high-performance hybrid retrieval can support investigative journalism and geopolitical risk monitoring.&lt;/li>
&lt;li>&lt;strong>Delivery Hero&lt;/strong> - &lt;em>How to Cheat at Benchmarking Search Engines:&lt;/em> Lessons from building reproducible benchmarking harnesses and public leaderboards.&lt;/li>
&lt;li>&lt;strong>Neo4j&lt;/strong> - &lt;em>Hands-On GraphRAG:&lt;/em> Practical guidance on combining knowledge graphs with RAG for more explainable retrieval.&lt;/li>
&lt;li>&lt;strong>Superlinked&lt;/strong> - &lt;em>Beyond Text-Only:&lt;/em> How mixture of encoders unlocks advanced retrieval using Google DeepMind’s latest embeddings.&lt;/li>
&lt;li>&lt;strong>Jina AI&lt;/strong> - &lt;em>Vision-Language Models for Embedding:&lt;/em> Training insights for multimodal embeddings that span text, diagrams, and UI screenshots.&lt;/li>
&lt;li>&lt;strong>TwelveLabs&lt;/strong> - &lt;em>Practical Multimodal Embeddings:&lt;/em> Real workflows for cross-modal video search and recommendations.&lt;/li>
&lt;li>&lt;strong>Baseten&lt;/strong> - &lt;em>High Throughput, Low Latency Embedding Pipelines:&lt;/em> Patterns and open-source tools for production-ready embedding inference.&lt;/li>
&lt;li>&lt;strong>Google&lt;/strong> &lt;strong>DeepMind&lt;/strong> - &lt;em>Vector Search with Gemini and EmbeddingGemma:&lt;/em> Deploying cutting-edge embeddings with the right indexing strategies.&lt;/li>
&lt;/ul>
&lt;h4 id="track-b-andromeda---ai-workflows-agents-and-applications">Track B: Andromeda - AI Workflows, Agents and Applications&lt;/h4>
&lt;ul>
&lt;li>&lt;strong>Linkup&lt;/strong> - &lt;em>Beyond Web Search:&lt;/em> Infrastructure for AI-native agents that need structured, real-time web intelligence.&lt;/li>
&lt;li>&lt;strong>Cognee&lt;/strong> - &lt;em>Building Scalable AI Memory:&lt;/em> Abstractions that sync graphs and vectors for durable, multi-backend AI memory.&lt;/li>
&lt;li>&lt;strong>n8n&lt;/strong> - &lt;em>Evaluate Your Qdrant-RAG Agents:&lt;/em> A live no-code session on agent evaluation using n8n’s native tools.&lt;/li>
&lt;li>&lt;strong>Arize AI&lt;/strong> - &lt;em>Self-Improving Evaluations:&lt;/em> Feedback loops and tracing for reliable agentic RAG in production.&lt;/li>
&lt;li>&lt;strong>LlamaIndex&lt;/strong> - &lt;em>Vector Databases for Workflow Engineering:&lt;/em> Using Qdrant to orchestrate context-aware AI pipelines.&lt;/li>
&lt;li>&lt;strong>deepset&lt;/strong> - &lt;em>Agent-Powered Retrieval with Haystack and Qdrant:&lt;/em> When retrieval agents outperform or overcomplicate pipelines.&lt;/li>
&lt;li>&lt;strong>GoodData&lt;/strong> - &lt;em>Scaling Real-Time RAG for Analytics:&lt;/em> Lessons from streaming BI artifacts into Qdrant for natural-language analytics.&lt;/li>
&lt;li>&lt;strong>Equal&lt;/strong> - &lt;em>Redefining Long-Term Memory:&lt;/em> Streaming-driven ingestion architectures that give agents enterprise-grade responsiveness.&lt;/li>
&lt;/ul>
&lt;h2 id="lightning-talks">Lightning Talks&lt;/h2>
&lt;p>The afternoon features rapid-fire sessions from innovators including:&lt;/p></description></item><item><title>How Tavus used Qdrant Edge to create conversational AI</title><link>https://qdrant.tech/blog/case-study-tavus/</link><pubDate>Fri, 12 Sep 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-tavus/</guid><description>&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-tavus/tavus-bento-box-dark.jpg" alt="Tavus Overview">&lt;/p>
&lt;h2 id="how-tavus-delivered-human-grade-conversational-ai-with-edge-retrieval-on-qdrant">How Tavus delivered human-grade conversational AI with edge retrieval on Qdrant&lt;/h2>
&lt;p>Tavus is a human–computer research lab building CVI, the &lt;a href="https://www.tavus.io/" target="_blank">Conversational Video Interface&lt;/a>. CVI presents a face-to-face AI that reads tone, gesture, and on-screen context in real time, allowing humans to interface with powerful, functional AI like never before. The team’s north star was simple to say and hard to ship: conversations should feel natural. That meant tracking conversational dynamics like utterance-to-utterance timing, back-channeling, and turn-taking while grounding replies in a customer’s private knowledge.&lt;/p></description></item><item><title>MUVERA: Making Multivectors More Performant</title><link>https://qdrant.tech/articles/muvera-embeddings/</link><pubDate>Fri, 05 Sep 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/muvera-embeddings/</guid><description>&lt;h2 id="what-are-muvera-embeddings">What are MUVERA Embeddings?&lt;/h2>
&lt;p>Multi-vector representations are superior to single-vector embeddings in many benchmarks. It might be tempting to use
them right away, but there is a catch: they are slower to search. Traditional vector search structures like
&lt;a href="https://qdrant.tech/documentation/manage-data/indexing/#vector-index">HNSW&lt;/a> are optimized for retrieving the nearest neighbors of a single
query vector using simple metrics such as cosine similarity. These indexes are not suitable for multi-vector retrieval
strategies, such as MaxSim, where a query and document are each represented by multiple vectors and the final score
sums, for each query vector, its maximum similarity across all document vectors. MaxSim is inherently asymmetric and non-metric, so HNSW
could potentially help us find the closest document token to a given query token, but that does not mean the whole
document is the best hit for the query.&lt;/p></description></item><item><title>Balancing Relevance and Diversity with MMR Search</title><link>https://qdrant.tech/blog/mmr-diversity-aware-reranking/</link><pubDate>Thu, 04 Sep 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/mmr-diversity-aware-reranking/</guid><description>&lt;p>Variety is the spice of life! Yet often, with search engines, users find that the results are too similar to get value. You search for a black jacket on your favorite shopping site, and you get 5 black full zip bomber jackets. Search for a black dress and you get 5 strapless dresses. Traditional vector search focuses on returning the most relevant items, which creates an echo chamber of similar results.&lt;/p></description></item><item><title>How Fieldy AI Achieved Reliable AI Memory with Qdrant</title><link>https://qdrant.tech/blog/case-study-fieldy/</link><pubDate>Thu, 04 Sep 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-fieldy/</guid><description>&lt;h2 id="fieldy-ais-migration-to-qdrant-building-a-fault-tolerant-ai-memory-platform">Fieldy AI’s migration to Qdrant: Building a fault-tolerant AI memory platform&lt;/h2>
&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-fieldy/case-study-fieldy-bento-dark.jpg" alt="How Fieldy AI Achieved Reliable AI Memory with Qdrant">&lt;/p>
&lt;h3 id="capturing-and-retrieving-a-lifetime-of-conversations">Capturing and retrieving a lifetime of conversations&lt;/h3>
&lt;p>&lt;a href="https://fieldy.ai/" target="_blank">Fieldy&lt;/a> is a hands-free wearable AI note taker that continuously records, transcribes, and organizes real-world conversations into your personal, searchable memory. The system’s goal is simple in concept but demanding in execution: capture every relevant spoken interaction, transcribe it with high accuracy, and make it instantly retrievable. This requires a robust ingestion pipeline, a scalable &lt;a href="https://qdrant.tech/documentation/overview/" target="_blank" rel="noopener nofollow">vector search&lt;/a> layer, and a retrieval process capable of handling growing volumes of multimodal data without introducing latency or errors.&lt;/p></description></item><item><title>How OpenTable Reinvented Restaurant Discovery with Qdrant</title><link>https://qdrant.tech/blog/case-study-opentable/</link><pubDate>Tue, 02 Sep 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-opentable/</guid><description>&lt;h2 id="reinventing-restaurant-discovery-how-opentable-built-concierge-an-ai-dining-assistant">&lt;strong>Reinventing Restaurant Discovery: How OpenTable built Concierge, an AI Dining Assistant&lt;/strong>&lt;/h2>
&lt;h3 id="recognizing-that-ai-would-redefine-restaurant-discovery">Recognizing that AI would redefine restaurant discovery&lt;/h3>
&lt;p>When generative AI tools entered the mainstream, OpenTable knew diners would change how they find and choose restaurants. People were beginning to expect conversational, intelligent and context-aware assistants, rather than static search boxes.&lt;/p>
&lt;p>Patrick Lombardo, Staff ML Engineer at OpenTable, recalls that the team wanted to move quickly. “We knew early on that generative AI was going to change user expectations. Concierge was an opportunity for us to transform the way that diners discover restaurants while building the tooling and infrastructure that will support future AI-powered experiences.”&lt;/p></description></item><item><title>Untangling Relevance Score Boosting and Decay Functions</title><link>https://qdrant.tech/blog/decay-functions/</link><pubDate>Mon, 01 Sep 2025 14:55:45 +0200</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/decay-functions/</guid><description>&lt;p>A problem we&amp;rsquo;ve noticed while monitoring the &lt;a href="https://discord.gg/d4MPnX3s" target="_blank" rel="noopener nofollow">Qdrant Discord Community&lt;/a> is that due to the extensive list of expressions that the &lt;a href="https://qdrant.tech/documentation/search/search-relevance/#score-boosting" target="_blank" rel="noopener nofollow">score boosting&lt;/a> functionality provides, there&amp;rsquo;s room for confusion on how it&amp;rsquo;s supposed to be applied. And that might block you from moving the business logic behind relevance scoring into the Qdrant search engine. We don&amp;rsquo;t want that!&lt;/p>
&lt;p>In this blog, we&amp;rsquo;d like to de-spooky-fy the &lt;strong>decay functions&lt;/strong> part of the score boosting, or, more precisely: &lt;code>LinDecayExpression&lt;/code>, &lt;code>ExpDecayExpression&lt;/code>, and &lt;code>GaussDecayExpression&lt;/code> &amp;ndash; frequent guests on the Discord &lt;em>#ask-for-help&lt;/em> channel.&lt;/p></description></item><item><title>How PortfolioMind Delivered Real-Time Crypto Intelligence with Qdrant</title><link>https://qdrant.tech/blog/case-study-portfolio-mind/</link><pubDate>Thu, 31 Jul 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-portfolio-mind/</guid><description>&lt;h2 id="how-portfoliomind-delivered-real-time-crypto-intelligence-with-qdrant">&lt;strong>How PortfolioMind delivered real-time crypto intelligence with Qdrant&lt;/strong>&lt;/h2>
&lt;p>The crypto world is an inherently noisy and volatile place. Markets shift quickly, narratives change overnight, and wallet activities conceal subtle yet critical patterns. For PortfolioMind, a Web3-native AI research copilot built using the &lt;a href="https://spoonai.io/" target="_blank" rel="noopener nofollow">SpoonOS framework&lt;/a>, the challenge was not just finding relevant information, but surfacing it in real time.&lt;/p>
&lt;h3 id="challenge-moving-beyond-static-insights">Challenge: Moving beyond static insights&lt;/h3>
&lt;p>Most crypto platforms presume users want simple token tracking. PortfolioMind, however, recognized that real research behaviors are dynamic. Users pivot rapidly between topics like L2 scaling, meme tokens, protocol risks, and DeFi yield fluctuations based on real-time events.&lt;/p></description></item><item><title>Qdrant Edge: Vector Search for Embedded AI</title><link>https://qdrant.tech/blog/qdrant-edge/</link><pubDate>Tue, 29 Jul 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-edge/</guid><description>&lt;h1 id="qdrant-edge-private-beta-vector-search-for-embedded-ai">Qdrant Edge (Private Beta): Vector Search for Embedded AI&lt;/h1>
&lt;p>Over the past two years, vector search has become foundational infrastructure for AI applications, from retrieval-augmented generation (RAG) to agentic reasoning. But as AI systems extend beyond cloud-hosted inference into the physical world - running on devices like robots, kiosks, home assistants, and mobile phones - new constraints emerge. Low-latency retrieval, multimodal inputs, and bandwidth-independent operation will become first-class requirements. &lt;strong>Qdrant Edge&lt;/strong> is our response to this shift.&lt;/p></description></item><item><title>Qdrant for Research: The Story Behind ETH &amp; Stanford’s MIRIAD Dataset</title><link>https://qdrant.tech/blog/miriad-qdrant/</link><pubDate>Wed, 23 Jul 2025 00:00:00 +0200</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/miriad-qdrant/</guid><description>&lt;p>This summer, researchers from ETH Zurich and Stanford &lt;a href="https://www.linkedin.com/posts/qinyue-zheng-526b391a4_we-just-released-a-million-scale-medical-activity-7337889277445365760-Criy" target="_blank" rel="noopener nofollow">released &lt;strong>MIRIAD&lt;/strong>&lt;/a>, an open source dataset of &lt;strong>5.8 million medical Question Answer pairs&lt;/strong>, each grounded in peer-reviewed literature.&lt;/p>
&lt;p>A dataset of this scale has the potential to become an &lt;strong>ultimate solution to the lack of structured, rich-in-context, high-quality data in the medical field&lt;/strong>. It is a powerful tool for significantly reducing hallucinations in medical AI applications, created to serve as a knowledge base for Retrieval Augmented Generation (RAG) and as a source for training downstream embedding models.&lt;/p></description></item><item><title>Qdrant 1.15 - Smarter Quantization &amp; better Text Filtering</title><link>https://qdrant.tech/blog/qdrant-1.15.x/</link><pubDate>Fri, 18 Jul 2025 00:00:00 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-1.15.x/</guid><description>&lt;p>&lt;a href="https://github.com/qdrant/qdrant/releases/tag/v1.15.0" target="_blank" rel="noopener nofollow">&lt;strong>Qdrant 1.15.0 is out!&lt;/strong>&lt;/a> Let’s look at the main features for this version:&lt;/p>
&lt;p>&lt;strong>New quantizations:&lt;/strong> We introduce asymmetric quantization, plus 1.5-bit and 2-bit quantizations. Asymmetric quantization allows vectors and queries to use different quantization algorithms, while 1.5-bit and 2-bit quantizations offer improved accuracy.&lt;/p>
&lt;p>&lt;strong>Changes in text index&lt;/strong>: Introduction of a new multilingual tokenizer, stopwords support, stemming, and phrase matching.&lt;/p>
&lt;p>Various optimizations, including &lt;strong>HNSW healing&lt;/strong>, which lets HNSW indexes reuse the old graph without a complete rebuild, and &lt;strong>migration to Gridstore&lt;/strong>, which unlocks faster ingestion.&lt;/p></description></item><item><title>Qdrant joins AI Agent category on AWS Marketplace to accelerate Agentic AI development</title><link>https://qdrant.tech/blog/ai-agents-aws-marketplace/</link><pubDate>Wed, 16 Jul 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/ai-agents-aws-marketplace/</guid><description>&lt;h3 id="qdrant-is-now-available-in-the-new-aws-marketplace-ai-agents-and-tools-category">Qdrant is now available in the new AWS Marketplace AI Agents and Tools category.&lt;/h3>
&lt;p>Customers can now use AWS Marketplace to easily discover, buy, and deploy AI agent solutions, including Qdrant’s vector search engine, using their AWS accounts, accelerating AI agent and agentic workflow development.&lt;/p>
&lt;p>Qdrant helps organizations build enterprise AI agents with long-term memory and real-time context retrieval by enabling step-aware reasoning and reliable decision-making across complex, unstructured data with a vector-native search engine built for accuracy, scale, and responsiveness.&lt;/p></description></item><item><title>How &amp;AI scaled global legal retrieval with Qdrant</title><link>https://qdrant.tech/blog/case-study-and-ai/</link><pubDate>Tue, 15 Jul 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-and-ai/</guid><description>&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-and-ai/and-ai-bento.jpg" alt="Bento Box">&lt;/p>
&lt;h2 id="how-ai-scaled-global-patent-retrieval-with-qdrant">How &amp;amp;AI scaled global patent retrieval with Qdrant&lt;/h2>
&lt;p>&lt;a href="https://tryandai.com/" target="_blank" rel="noopener nofollow">&amp;amp;AI&lt;/a> is on a mission to redefine patent litigation. Their platform helps legal professionals invalidate patents through intelligent prior art search, claim charting, and automated litigation support. To make this work at scale, CTO and co-founder Herbie Turner needed a vector database that could power fast, accurate retrieval across billions of documents without ballooning DevOps complexity. That’s where Qdrant came in.&lt;/p></description></item><item><title>How to choose an embedding model</title><link>https://qdrant.tech/articles/how-to-choose-an-embedding-model/</link><pubDate>Tue, 15 Jul 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/how-to-choose-an-embedding-model/</guid><description>&lt;p>No matter if you are just beginning your journey in the world of vector search, or you are a seasoned practitioner, you
have probably wondered how to choose the right embedding model to achieve the best search quality. There are some
public benchmarks, such as &lt;a href="https://huggingface.co/spaces/mteb/leaderboard" target="_blank" rel="noopener nofollow">MTEB&lt;/a>, that can help you narrow down the
options, but datasets used in those benchmarks will rarely be representative of your domain-specific data. Moreover,
search quality is not the only requirement you could have. For example, some of the best models might be amazingly
accurate for retrieval, but you can&amp;rsquo;t afford to run them, e.g., due to high resource usage or your budget constraints.&lt;/p></description></item><item><title>Introducing Qdrant Cloud Inference</title><link>https://qdrant.tech/blog/qdrant-cloud-inference-launch/</link><pubDate>Tue, 15 Jul 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-cloud-inference-launch/</guid><description>&lt;h1 id="introducing-qdrant-cloud-inference">Introducing Qdrant Cloud Inference&lt;/h1>
&lt;p>Today, we’re announcing the launch of Qdrant Cloud Inference (&lt;a href="https://cloud.qdrant.io/" target="_blank" rel="noopener nofollow">get started in your cluster&lt;/a>). With Qdrant Cloud Inference, users can generate, store and index embeddings in a single API call, turning unstructured text and images into search-ready vectors in a single environment. Directly integrating model inference into Qdrant Cloud removes the need for separate inference infrastructure, manual pipelines, and redundant data transfers.&lt;/p>
&lt;p>This simplifies workflows, accelerates development cycles, and eliminates unnecessary network hops for developers. With a single API call, you can now embed, store, and index your data more quickly and more simply. This speeds up application development for RAG, Multimodal, Hybrid search, and more.&lt;/p></description></item><item><title>Announcing Vector Space Day 2025 in Berlin</title><link>https://qdrant.tech/blog/vector-space-day-2025/</link><pubDate>Mon, 14 Jul 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/vector-space-day-2025/</guid><description>&lt;h2 id="vector-space-day-2025-powered-by-qdrant">Vector Space Day 2025: Powered by Qdrant&lt;/h2>
&lt;p>📍 Colosseum Berlin, Germany&lt;br>
🗓️ Friday, September 26, 2025&lt;/p>
&lt;h3 id="about">About&lt;/h3>
&lt;p>We’re hosting our first-ever full-day in-person &lt;a href="https://lu.ma/p7w9uqtz" target="_blank" rel="noopener nofollow">&lt;strong>Vector Space Day&lt;/strong>&lt;/a> this September in Berlin, and you’re invited.&lt;/p>
&lt;p>The Vector Space Day will bring together engineers, researchers, and AI builders to explore the cutting edge of retrieval, vector search infrastructure, and agentic AI. From building scalable RAG pipelines to enabling real-time AI memory and next-gen context engineering, we’re covering the full spectrum of modern vector-native search.&lt;/p></description></item><item><title>How Pento modeled aesthetic taste with Qdrant</title><link>https://qdrant.tech/blog/case-study-pento/</link><pubDate>Mon, 14 Jul 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-pento/</guid><description>&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-pento/pento-bento-box-dark.jpg" alt="pento bento box">&lt;/p>
&lt;h1 id="bringing-people-together-through-qdrant">Bringing People Together Through Qdrant&lt;/h1>
&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-pento/pento-cover-image.png" alt="pento-cover-image">&lt;/p>
&lt;h2 id="taste-in-art-isnt-just-a-preference-its-a-fingerprint">&lt;em>Taste in art isn’t just a preference; it’s a fingerprint.&lt;/em>&lt;/h2>
&lt;p>Imagine you&amp;rsquo;re an artist or art enthusiast searching not for a painting, but for people who share your unique taste, someone who resonates with surrealist colors just as deeply as you, or who finds quiet joy in minimalist lines. How would a system know who those people are? Traditional recommenders often suggest what’s trending or popular, or just can&amp;rsquo;t understand the nuances of art.&lt;/p></description></item><item><title>How Alhena AI unified its AI stack and improved ecommerce conversions with Qdrant</title><link>https://qdrant.tech/blog/case-study-alhena/</link><pubDate>Thu, 10 Jul 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-alhena/</guid><description>&lt;h1 id="how-alhena-ai-unified-its-ai-stack-and-accelerated-ecommerce-outcomes-with-qdrant">How Alhena AI unified its AI stack and accelerated ecommerce outcomes with Qdrant&lt;/h1>
&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-alhena/alhena-bento-box-dark.jpg" alt="How Alhena AI unified its AI stack and improved ecommerce conversions with Qdrant">&lt;/p>
&lt;h2 id="building-ai-agents-that-drive-both-revenue-and-support-outcomes">Building AI agents that drive both revenue and support outcomes&lt;/h2>
&lt;p>&lt;a href="https://alhena.ai/" target="_blank">Alhena AI&lt;/a> is redefining the ecommerce experience through intelligent agents that assist customers before and after a purchase. On the front end, these agents help users find the perfect product based on nuanced preferences. On the back end, they resolve complex support queries without escalating to a human.&lt;/p></description></item><item><title>How GoodData turbocharged AI analytics with Qdrant</title><link>https://qdrant.tech/blog/case-study-gooddata/</link><pubDate>Wed, 09 Jul 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-gooddata/</guid><description>&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-gooddata/gooddata-bento-box-dark.jpg" alt="Gooddata Overview">&lt;/p>
&lt;h3 id="gooddatas-evolution-into-ai-powered-analytics">GoodData&amp;rsquo;s Evolution into AI-Powered Analytics&lt;/h3>
&lt;p>AI is redefining how people interact with data, pushing analytics platforms beyond static dashboards toward intelligent, conversational experiences. While traditionally recognized as a powerful BI platform, GoodData is laser-focused on accelerating both &amp;lsquo;time to insight&amp;rsquo; and &amp;lsquo;time to solution&amp;rsquo; by enhancing productivity for analysts and business users alike.&lt;/p>
&lt;p>What sets GoodData apart is its unique position in the market: a composable, API-first platform designed for teams that build data products, not just consume them. With deep support for white-labeled analytics, embedded use cases, and governed self-service at scale, GoodData delivers the flexibility modern organizations need. With AI being integrated across every layer of the platform, GoodData is helping their over 140,000 end customers move from traditional BI to intelligent, real-time decision-making.&lt;/p></description></item><item><title>The Hitchhiker's Guide to Vector Search</title><link>https://qdrant.tech/blog/hitchhikers-guide/</link><pubDate>Wed, 09 Jul 2025 00:00:00 +0200</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/hitchhikers-guide/</guid><description>&lt;blockquote>
&lt;p>From lecture halls to production pipelines, &lt;a href="https://qdrant.tech/stars/" target="_blank" rel="noopener nofollow">Qdrant Stars&lt;/a> &amp;ndash; founders, mentors and open-source contributors &amp;ndash; share how they’re building with vectors in the wild.&lt;br>
In this post, Clelia distils tips from her talk at the &lt;a href="https://lu.ma/based_meetup" target="_blank" rel="noopener nofollow">“Bavaria, Advancements in SEarch Development” meetup&lt;/a>, where she covered hard-won lessons from her extensive open-source building.&lt;/p>
&lt;/blockquote>
&lt;p>&lt;em>Hey there, vector space astronauts!&lt;/em>&lt;/p>
&lt;p>&lt;em>I am Clelia, an Open Source Engineer at &lt;a href="https://www.llamaindex.ai/" target="_blank" rel="noopener nofollow">LlamaIndex&lt;/a>. In the last two years, I&amp;rsquo;ve dedicated myself to the AI space, building (and breaking) many things, and sometimes even deploying them to production!&lt;/em>&lt;/p></description></item><item><title>How FAZ unlocked 75 years of journalism with Qdrant</title><link>https://qdrant.tech/blog/case-study-faz/</link><pubDate>Thu, 03 Jul 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-faz/</guid><description>&lt;h1 id="how-faz-built-a-hybrid-search-engine-with-qdrant-to-unlock-75-years-of-journalism">How FAZ Built a Hybrid Search Engine with Qdrant to Unlock 75 Years of Journalism&lt;/h1>
&lt;p>&lt;a href="https://www.frankfurterallgemeine.de/die-faz" target="_blank" rel="noopener nofollow">Frankfurter Allgemeine Zeitung (FAZ)&lt;/a>, a major national newspaper in Germany, has spent decades building a rich archive of journalistic content, stretching back to 1949. The FAZ archive has long built expertise in making its extensive collection of over 75 years accessible and searchable for both internal and external customers through keyword- and index-based search engines. New AI-powered search technologies were therefore immediately recognized as an opportunity to unlock the potential of the comprehensive archive in entirely new ways and to systematically address the limitations of traditional search methods. The solution they arrived at involved a thoughtful orchestration of technologies - with Qdrant at the heart.&lt;/p></description></item><item><title>GraphRAG: How Lettria Unlocked 20% Accuracy Gains with Qdrant and Neo4j</title><link>https://qdrant.tech/blog/case-study-lettria-v2/</link><pubDate>Tue, 17 Jun 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-lettria-v2/</guid><description>&lt;h1 id="scaled-vector--graph-retrieval-how-lettria-unlocked-20-accuracy-gains-with-qdrant--neo4j">Scaled Vector &amp;amp; Graph Retrieval: How Lettria Unlocked 20% Accuracy Gains with Qdrant &amp;amp; Neo4j&lt;/h1>
&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-lettria/lettria-bento-dark.jpg" alt="Lettria increases accuracy by 20% by blending Qdrant&amp;rsquo;s vector search and Neo4j&amp;rsquo;s knowledge graphs">&lt;/p>
&lt;h2 id="why-complex-document-intelligence-needs-more-than-just-vector-search">Why Complex Document Intelligence Needs More Than Just Vector Search&lt;/h2>
&lt;p>In regulated industries where precision, auditability, and accuracy are paramount, leveraging Large Language Models (LLMs) effectively often requires going beyond traditional Retrieval-Augmented Generation (RAG). &lt;a href="https://www.lettria.com/" target="_blank" rel="noopener nofollow">Lettria&lt;/a>, a leader in document intelligence platforms, recognized that complex, highly regulated data sets like pharmaceutical research, legal compliance, and aerospace documentation demanded superior accuracy and more explainable outputs than vector-only RAG systems could provide. To achieve the expected level of performance, the team has focused its effort on building a very robust document parsing engine designed for complex PDFs (with tables, diagrams, charts, etc.), an automatic ontology builder, and an ingestion pipeline covering vector and graph enrichment.&lt;/p></description></item><item><title>Vector Data Migration Tool</title><link>https://qdrant.tech/blog/beta-database-migration-tool/</link><pubDate>Mon, 16 Jun 2025 00:02:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/beta-database-migration-tool/</guid><description>&lt;h2 id="migrating-your-data-just-got-easier">Migrating your data just got easier&lt;/h2>
&lt;p>We’ve launched the &lt;strong>beta&lt;/strong> of our Qdrant &lt;strong>Vector Data Migration Tool&lt;/strong>, designed to simplify moving data between different instances, whether you&amp;rsquo;re migrating between Qdrant deployments or switching from other vector database providers.&lt;/p>
&lt;p>This powerful tool streams all vectors from a source collection to a target Qdrant instance in live batches. It supports migrations from one Qdrant deployment to another, including from open source to Qdrant Cloud or between cloud regions. But that&amp;rsquo;s not all. You can also migrate your data from other vector databases directly into Qdrant. All with a single command.&lt;/p></description></item><item><title>How Lawme Scaled AI Legal Assistants and Significantly Cut Costs with Qdrant</title><link>https://qdrant.tech/blog/case-study-lawme/</link><pubDate>Wed, 11 Jun 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-lawme/</guid><description>&lt;h2 id="how-lawme-scaled-ai-legal-assistants-and-cut-costs-by-75-with-qdrant">How Lawme Scaled AI Legal Assistants and Cut Costs by 75% with Qdrant&lt;/h2>
&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-lawme/lawme-bento-dark.jpg" alt="How Lawme Scaled AI Legal Assistants and Cut Costs 75% with Qdrant">&lt;/p>
&lt;p>Legal technology (LegalTech) is at the forefront of digital transformation in the traditionally conservative legal industry. Lawme.ai, an ambitious startup, is pioneering this transformation by automating routine legal workflows with AI assistants. By leveraging sophisticated AI-driven processes, Lawme empowers law firms to dramatically accelerate legal document preparation, from initial research and analysis to comprehensive drafting. However, scaling their solution presented formidable challenges, particularly around data management, compliance, and operational costs.&lt;/p></description></item><item><title>How ConvoSearch Boosted Revenue for D2C Brands with Qdrant</title><link>https://qdrant.tech/blog/case-study-convosearch/</link><pubDate>Tue, 10 Jun 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-convosearch/</guid><description>&lt;h2 id="how-convosearch-boosted-e-commerce-revenue-with-qdrant">How ConvoSearch Boosted E-commerce Revenue with Qdrant&lt;/h2>
&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-convosearch/convosearch-bento-dark.jpg" alt="How ConvoSearch Boosted E-commerce Revenue with Qdrant">&lt;/p>
&lt;h3 id="driving-e-commerce-success-through-enhanced-search">Driving E-commerce Success Through Enhanced Search&lt;/h3>
&lt;p>E-commerce retailers face intense competition and constant pressure to increase conversion rates. &lt;a href="https://convosearch.com/" target="_blank" rel="noopener nofollow">ConvoSearch&lt;/a>, an AI-powered recommendation engine tailored for direct-to-consumer (D2C) e-commerce brands, addresses these challenges by delivering hyper-personalized search and recommendations. With customers like The Closet Lover and Uncle Reco achieving dramatic revenue increases, ConvoSearch relies heavily on high-speed vector search to ensure relevance and accuracy at scale.&lt;/p></description></item><item><title>LegalTech Builder's Guide: Navigating Strategic Decisions with Vector Search</title><link>https://qdrant.tech/blog/legal-tech-builders-guide/</link><pubDate>Tue, 10 Jun 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/legal-tech-builders-guide/</guid><description>&lt;h2 id="legaltech-builders-guide-navigating-strategic-decisions-with-vector-search">LegalTech Builder&amp;rsquo;s Guide: Navigating Strategic Decisions with Vector Search&lt;/h2>
&lt;h3 id="legaltech-innovation-needs-a-new-search-stack">LegalTech innovation needs a new search stack&lt;/h3>
&lt;p>LegalTech applications, more than most other application types, demand accuracy due to complex document structures, high regulatory stakes, and compliance requirements. Traditional keyword searches often fall short, failing to grasp semantic nuances essential for precise legal queries. &lt;a href="https://qdrant.tech/" target="_blank" rel="noopener nofollow">Qdrant&lt;/a> addresses these challenges by providing robust vector search solutions tailored for the complexities inherent in LegalTech applications.&lt;/p></description></item><item><title>Qdrant Achieves SOC 2 Type II and HIPAA Certifications</title><link>https://qdrant.tech/blog/soc-2-type-ii-hipaa/</link><pubDate>Tue, 10 Jun 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/soc-2-type-ii-hipaa/</guid><description>&lt;h2 id="qdrant-attains-soc-2-type-ii-and-hipaa-certifications-strengthening-our-commitment-to-enterprise-security">Qdrant Attains SOC 2 Type II and HIPAA Certifications: Strengthening Our Commitment to Enterprise Security&lt;/h2>
&lt;p>At Qdrant, we&amp;rsquo;re proud to announce that we&amp;rsquo;ve successfully renewed our SOC 2 Type II certification and attained our HIPAA compliance certification (&lt;a href="http://qdrant.to/trust-center" target="_blank" rel="noopener nofollow">link&lt;/a>). This continued achievement highlights our unwavering dedication to maintaining robust security, confidentiality, and compliance standards, especially critical in supporting &lt;a href="https://qdrant.tech/enterprise-solutions/" target="_blank" rel="noopener nofollow">enterprise-scale operations&lt;/a> and sensitive data management.&lt;/p>
&lt;h3 id="soc-2-type-ii-continuous-commitment-to-security">SOC 2 Type II: Continuous Commitment to Security&lt;/h3>
&lt;p>Building on our initial SOC 2 Type II certification from 2024, Qdrant sustained our rigorous security and operational practices over a full 12-month observation period. SOC 2 Type II audits meticulously assess the practical implementation of security measures aligned with the American Institute of Certified Public Accountants (AICPA) Trust Services criteria:&lt;/p></description></item><item><title>​​Introducing the Official Qdrant Node for n8n</title><link>https://qdrant.tech/blog/n8n-node/</link><pubDate>Mon, 09 Jun 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/n8n-node/</guid><description>&lt;h2 id="introducing-the-official-qdrant-node-for-n8n">​​Introducing the Official Qdrant Node for n8n&lt;/h2>
&lt;p>Amazing news for n8n builders working with semantic search: Qdrant now has an &lt;a href="https://www.npmjs.com/package/n8n-nodes-qdrant" target="_blank" rel="noopener nofollow">official, team-supported node for n8n&lt;/a>, an early adopter of n8n&amp;rsquo;s new &lt;a href="https://docs.n8n.io/integrations/creating-nodes/deploy/submit-community-nodes/#submit-your-node-for-verification-by-n8n" target="_blank" rel="noopener nofollow">verified community nodes&lt;/a> feature!&lt;/p>
&lt;p>This new integration brings the full power of Qdrant directly into your n8n workflows: no more wrestling with HTTP nodes ever again!
Whether you’re building RAG systems, agentic pipelines, or advanced data analysis tools, this node is designed to make your life easier and your solutions more robust.&lt;/p></description></item><item><title>Qdrant + DataTalks.Club: Free 10-Week Course on LLM Applications</title><link>https://qdrant.tech/blog/datatalks-course/</link><pubDate>Thu, 05 Jun 2025 23:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/datatalks-course/</guid><description>&lt;p>Want to learn how to build an AI system that answers questions about your knowledge base?&lt;/p>
&lt;p>We’re excited to announce our partnership with Alexey Grigorev and DataTalks.Club to bring you a free, hands-on, 10-week course focused on building real-life applications of LLMs.&lt;/p>
&lt;p>Gain hands-on experience with LLMs, RAG, vector search, evaluation, monitoring, and more.&lt;/p>
&lt;h2 id="learn-rag-and-vector-search">Learn RAG and Vector Search&lt;/h2>
&lt;p>In this course, you&amp;rsquo;ll learn how to create an AI system that can answer questions about your own knowledge base using LLMs and RAG.&lt;/p></description></item><item><title>How Qovery Accelerated Developer Autonomy with Qdrant</title><link>https://qdrant.tech/blog/case-study-qovery/</link><pubDate>Tue, 27 May 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-qovery/</guid><description>&lt;h2 id="qovery-scales-real-time-devops-automation-with-qdrant">Qovery Scales Real-Time DevOps Automation with Qdrant&lt;/h2>
&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-qovery/case-study-qovery-summary-dark.png" alt="How Qovery Accelerated Developer Autonomy with Qdrant">&lt;/p>
&lt;h3 id="empowering-developers-with-autonomous-infrastructure-management">Empowering Developers with Autonomous Infrastructure Management&lt;/h3>
&lt;p>Qovery, trusted by over 200 companies including Alan, Talkspace, GetSafe, and RxVantage, empowers software engineering teams to autonomously manage their infrastructure through its robust DevOps automation platform. As their platform evolved, Qovery recognized an opportunity to enhance developer autonomy further by integrating an AI-powered DevOps Copilot. To achieve real-time accuracy and rapid responses, Qovery selected Qdrant as the backbone of their vector database infrastructure.&lt;/p></description></item><item><title>How Tripadvisor Drives 2 to 3x More Revenue with Qdrant-Powered AI</title><link>https://qdrant.tech/blog/case-study-tripadvisor/</link><pubDate>Tue, 13 May 2025 23:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-tripadvisor/</guid><description>&lt;h1 id="how-tripadvisor-is-reimagining-travel-with-qdrant">How Tripadvisor Is Reimagining Travel with Qdrant&lt;/h1>
&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-tripadvisor/case-study-tripadvisor-summary-dark.jpg" alt="How Tripadvisor Drives 2–3x More Revenue with Qdrant-Powered AI">&lt;/p>
&lt;p>Tripadvisor, the world’s largest travel guidance platform, is undergoing a deep transformation. With hundreds of millions of monthly users and over a billion reviews and contributions, it holds one of the richest datasets in the travel industry. And until recently, that data, particularly its unstructured content, had incredible untapped potential. Now, with the rise of generative AI and the adoption of tools like Qdrant’s vector database, Tripadvisor is unlocking its full potential to deliver intelligent, personalized, and high-impact travel experiences.&lt;/p></description></item><item><title>Precision at Scale: How Aracor Accelerated Legal Due Diligence with Hybrid Vector Search</title><link>https://qdrant.tech/blog/case-study-aracor/</link><pubDate>Tue, 13 May 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-aracor/</guid><description>&lt;h2 id="precision-at-scale-how-aracor-uses-qdrant-to-accelerate-legal-due-diligence-resulting-in-90-faster-workflows">Precision at Scale: How Aracor Uses Qdrant to Accelerate Legal Due Diligence Resulting in 90% Faster Workflows&lt;/h2>
&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-aracor/case-study-aracor-bento-dark.jpg" alt="How Aracor Sped Up Due Diligence Workflows by 90%">&lt;/p>
&lt;h3 id="how-aracor-accelerated-legal-due-diligence-with-qdrant-vector-search">How Aracor Accelerated Legal Due Diligence with Qdrant Vector Search&lt;/h3>
&lt;p>The world of mergers and acquisitions (M&amp;amp;A) is notoriously painstaking, slow, expensive, and error-prone.&lt;/p>
&lt;p>Lawyers and dealmakers sift through mountains of documents—often numbering into the thousands—to validate every detail, from verifying signatures and comparing deal documents to flagging risks to patent validity. This meticulous process typically drains weeks or even months of productivity from highly trained professionals. &lt;a href="https://aracor.ai/" target="_blank" rel="noopener nofollow">Aracor AI&lt;/a> set out to change that and to close the M&amp;amp;A transparency gap. The Miami-based AI platform is laser-focused on transforming this painstaking due diligence into an automated, accurate, and dramatically faster operation.&lt;/p></description></item><item><title>How Garden Scaled Patent Intelligence with Qdrant</title><link>https://qdrant.tech/blog/case-study-garden-intel/</link><pubDate>Fri, 09 May 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-garden-intel/</guid><description>&lt;h2 id="garden-accelerates-patent-intelligence-with-qdrants-filterable-vector-search">Garden Accelerates Patent Intelligence with Qdrant’s Filterable Vector Search&lt;/h2>
&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-garden/case-study-garden-bento-dark.jpg" alt="How Garden Unlocked AI Patent Analysis">&lt;/p>
&lt;p>For more than a century, patent litigation has been a slow, people-powered business. Analysts read page after page—sometimes tens of thousands of pages—hunting for the smoking-gun paragraph that proves infringement or invalidity. Garden, a New York-based startup, set out to change that by applying large-scale AI to the entire global patent corpus—more than 200 million patents—in conjunction with terabytes of real-world data.&lt;/p></description></item><item><title>Exploring Qdrant Cloud Just Got Easier</title><link>https://qdrant.tech/blog/product-ui-changes/</link><pubDate>Tue, 06 May 2025 00:02:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/product-ui-changes/</guid><description>&lt;h1 id="exploring-qdrant-cloud-just-got-easier">Exploring Qdrant Cloud just got easier&lt;/h1>
&lt;p>We always aim to simplify our product for developers, platform teams, and enterprises.&lt;/p>
&lt;p>Here’s a quick overview of recent improvements designed to streamline your journey, from logging in and creating your first cluster to prototyping and going to production.&lt;/p>
&lt;iframe width="560" height="315" src="https://www.youtube.com/embed/J75pNicPEo8?si=1HznwER1Kqx5ZrLG" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen>&lt;/iframe>
&lt;h2 id="simplified-login">Simplified Login&lt;/h2>
&lt;p>We&amp;rsquo;ve reduced the steps to create and access your account, and also simplified navigation between login and registration.&lt;/p></description></item><item><title>How Pariti Doubled Its Fill Rate with Qdrant</title><link>https://qdrant.tech/blog/case-study-pariti/</link><pubDate>Thu, 01 May 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-pariti/</guid><description>&lt;h2 id="from-manual-bottlenecks-to-millisecond-matching-connecting-africas-best-talent">From Manual Bottlenecks to Millisecond Matching: Connecting Africa’s Best Talent&lt;/h2>
&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-pariti/case-study-pariti-summary-dark.jpg" alt="Pariti slashes vetting time and boosted candidate placement success.">&lt;/p>
&lt;p>Pariti’s mission is bold: connect Africa’s best talent with the continent’s most-promising startups—fast. Its referral-driven marketplace lets anyone nominate a great candidate, but viral growth triggered an avalanche of data. A single job post now attracts more than 300 applicants within 72 hours, yet Pariti still promises clients an interview-ready shortlist inside those same five days.&lt;/p></description></item><item><title>Vector Search in Production</title><link>https://qdrant.tech/articles/vector-search-production/</link><pubDate>Wed, 30 Apr 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/vector-search-production/</guid><description>&lt;h2 id="what-does-it-take-to-run-search-in-production">What Does it Take to Run Search in Production?&lt;/h2>
&lt;p>A mid-sized e-commerce company launched a vector search pilot to improve product discovery. During testing, everything ran smoothly. But in production, their queries began failing intermittently: memory errors, disk I/O spikes, and search delays sprang up unexpectedly.&lt;/p>
&lt;p>It turned out the team hadn&amp;rsquo;t adjusted the default configuration settings or reserved dedicated paths for write-ahead logs. Their vector index was too large to fit comfortably in RAM, and it frequently spilled to disk, causing slowdowns.&lt;/p></description></item><item><title>How Dust Scaled to 5,000+ Data Sources with Qdrant</title><link>https://qdrant.tech/blog/case-study-dust-v2/</link><pubDate>Tue, 29 Apr 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-dust-v2/</guid><description>&lt;h2 id="inside-dusts-vector-stack-overhaul-scaling-to-5000-data-sources-with-qdrant">Inside Dust’s Vector Stack Overhaul: Scaling to 5,000+ Data Sources with Qdrant&lt;/h2>
&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-dust-v2/case-study-dust-v2-v2-bento-dark.jpg" alt="How Dust Scaled to 5,000+ Data Sources with Qdrant">&lt;/p>
&lt;h3 id="the-challenge-scaling-ai-infrastructure-for-thousands-of-data-sources">The Challenge: Scaling AI Infrastructure for Thousands of Data Sources&lt;/h3>
&lt;p>Dust, an OS for AI-native companies that lets users build AI agents powered by actions and company knowledge, faced a set of growing technical hurdles as it scaled its operations. The company&amp;rsquo;s core product gives AI agents secure access to internal and external data sources, streamlining workflows and speeding access to information. However, this mission hit bottlenecks when their infrastructure began to strain under the weight of thousands of data sources and increasingly demanding user queries.&lt;/p></description></item><item><title>How SayOne Enhanced Government AI Services with Qdrant</title><link>https://qdrant.tech/blog/case-study-sayone/</link><pubDate>Mon, 28 Apr 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-sayone/</guid><description>&lt;h2 id="how-sayone-enhanced-government-ai-services-with-qdrant">How SayOne Enhanced Government AI Services with Qdrant&lt;/h2>
&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-sayone/case-study-sayone-summary-dark.jpg" alt="SayOne Enhanced Government AI Services">&lt;/p>
&lt;h3 id="the-challenge">The Challenge&lt;/h3>
&lt;p>SayOne is an information technology and digital services company headquartered in India. They create end-to-end customized digital solutions, and have completed over 200 projects for clients worldwide. When SayOne embarked on building advanced AI solutions for government institutions, their initial choice was Pinecone, primarily due to its prevalence within AI documentation. However, SayOne soon discovered significant limitations impacting their projects. Key challenges included escalating costs, restrictive customization options, and considerable scalability issues. Furthermore, reliance on external cloud infrastructure posed critical data privacy concerns, especially since governmental entities demanded stringent data sovereignty and privacy controls.&lt;/p></description></item><item><title>Beyond Multimodal Vectors: Hotel Search With Superlinked and Qdrant</title><link>https://qdrant.tech/blog/superlinked-multimodal-search/</link><pubDate>Thu, 24 Apr 2025 00:00:00 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/superlinked-multimodal-search/</guid><description>&lt;h2 id="more-than-just-multimodal-search">More Than Just Multimodal Search?&lt;/h2>
&lt;p>AI has transformed how we find products, services, and content. Now users express needs in &lt;strong>natural language&lt;/strong> and expect precise, tailored results.&lt;/p>
&lt;p>For example, you might search for hotels in Paris with specific criteria:&lt;/p>
&lt;p>&lt;img src="https://qdrant.tech/blog/superlinked-multimodal-search/superlinked-search.png" alt="superlinked-search">&lt;/p>
&lt;p>&lt;em>&amp;ldquo;Affordable luxury hotels near Eiffel Tower with lots of good reviews and free parking.&amp;rdquo;&lt;/em> This isn&amp;rsquo;t just a search query—it&amp;rsquo;s a complex set of interrelated preferences spanning multiple data types.&lt;/p></description></item><item><title>Qdrant 1.14 - Reranking Support &amp; Extensive Resource Optimizations</title><link>https://qdrant.tech/blog/qdrant-1.14.x/</link><pubDate>Tue, 22 Apr 2025 00:00:00 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-1.14.x/</guid><description>&lt;p>&lt;a href="https://github.com/qdrant/qdrant/releases/tag/v1.14.0" target="_blank" rel="noopener nofollow">&lt;strong>Qdrant 1.14.0 is out!&lt;/strong>&lt;/a> Let&amp;rsquo;s look at the main features for this version:&lt;/p>
&lt;p>&lt;strong>Score-Boosting Reranker:&lt;/strong> Blend vector similarity with custom rules and context.&lt;br/>
&lt;strong>Improved Resource Utilization:&lt;/strong> CPU and disk IO optimization for faster processing.&lt;/p>
&lt;p>&lt;strong>Incremental HNSW Indexing:&lt;/strong> Build indexes gradually as data arrives.&lt;br/>
&lt;strong>Batch Search:&lt;/strong> Optimized parallel processing for batch queries.&lt;/p>
&lt;p>&lt;strong>Memory Optimization:&lt;/strong> Reduced usage for large datasets with improved ID tracking.&lt;/p>
&lt;h2 id="score-boosting-reranker">Score-Boosting Reranker&lt;/h2>
&lt;p>&lt;img src="https://qdrant.tech/blog/qdrant-1.14.x/reranking.jpg" alt="reranking">&lt;/p>
&lt;p>When integrating vector search into specific applications, you can now tweak the final result list using domain or business logic. For example, if you are building a &lt;strong>chatbot or search on website content&lt;/strong>, you can rank results with &lt;code>title&lt;/code> metadata higher than &lt;code>body_text&lt;/code> in your results.&lt;/p></description></item><item><title>Pathwork Optimizes Life Insurance Underwriting with Precision Vector Search</title><link>https://qdrant.tech/blog/case-study-pathwork/</link><pubDate>Tue, 22 Apr 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-pathwork/</guid><description>&lt;h2 id="pathwork-optimizes-life-insurance-underwriting-with-precision-vector-search">&lt;strong>Pathwork Optimizes Life Insurance Underwriting with Precision Vector Search&lt;/strong>&lt;/h2>
&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-pathwork/case-study-pathwork-summary-dark-b.jpg" alt="Pathwork Optimizes Life Insurance Underwriting with Precision Vector Search">&lt;/p>
&lt;h3 id="about-pathwork">&lt;strong>About Pathwork&lt;/strong>&lt;/h3>
&lt;p>Pathwork is redesigning life and health insurance workflows for the age of AI. Brokerages and insurance carriers utilize Pathwork&amp;rsquo;s advanced agentic system to automate their underwriting processes and enhance back-office sales operations. Pathwork&amp;rsquo;s solution drastically reduces errors, completes tasks up to 70 times faster, and significantly conserves human capital.&lt;/p>
&lt;h3 id="the-challenge-accuracy-above-all">&lt;strong>The Challenge: Accuracy Above All&lt;/strong>&lt;/h3>
&lt;p>Life insurance underwriting demands exceptional accuracy. Traditionally, underwriting involves extensive manual input, subjective judgment, and frequent errors. These errors, such as misclassifying risk based on incomplete or misunderstood health data, often result in lost sales and customer dissatisfaction due to sudden premium changes.&lt;/p></description></item><item><title>How Lyzr Supercharged AI Agent Performance with Qdrant</title><link>https://qdrant.tech/blog/case-study-lyzr/</link><pubDate>Tue, 15 Apr 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-lyzr/</guid><description>&lt;h1 id="how-lyzr-supercharged-ai-agent-performance-with-qdrant">How Lyzr Supercharged AI Agent Performance with Qdrant&lt;/h1>
&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-lyzr/case-study-lyzr-summary-dark.png" alt="How Lyzr Supercharged AI Agent Performance with Qdrant">&lt;/p>
&lt;h2 id="scaling-intelligent-agents-how-lyzr-supercharged-performance-with-qdrant">Scaling Intelligent Agents: How Lyzr Supercharged Performance with Qdrant&lt;/h2>
&lt;p>As AI agents become more capable and pervasive, the infrastructure behind them must evolve to handle rising concurrency, low-latency demands, and ever-growing knowledge bases. At Lyzr Agent Studio—where over 100 agents are deployed across industries—these challenges arrived quickly and at scale.&lt;/p>
&lt;p>When their existing vector database infrastructure began to buckle under pressure, the engineering team needed a solution that could do more than just keep up. It had to accelerate them forward.&lt;/p></description></item><item><title>How Mixpeek Uses Qdrant for Efficient Multimodal Feature Stores</title><link>https://qdrant.tech/blog/case-study-mixpeek/</link><pubDate>Tue, 08 Apr 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-mixpeek/</guid><description>&lt;h1 id="how-mixpeek-uses-qdrant-for-efficient-multimodal-feature-stores">How Mixpeek Uses Qdrant for Efficient Multimodal Feature Stores&lt;/h1>
&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-mixpeek/Case-Study-Mixpeek-Summary-Dark.jpg" alt="How Mixpeek Uses Qdrant for Efficient Multimodal Feature Stores">&lt;/p>
&lt;h2 id="about-mixpeek">About Mixpeek&lt;/h2>
&lt;p>&lt;a href="http://mixpeek.com" target="_blank" rel="noopener nofollow">Mixpeek&lt;/a> is a multimodal data processing and retrieval platform designed for developers and data teams. Founded by Ethan Steininger, a former MongoDB search specialist, Mixpeek enables efficient ingestion, feature extraction, and retrieval across diverse media types including video, images, audio, and text.&lt;/p>
&lt;h2 id="the-challenge-optimizing-feature-stores-for-complex-retrievers">The Challenge: Optimizing Feature Stores for Complex Retrievers&lt;/h2>
&lt;p>As Mixpeek&amp;rsquo;s multimodal data warehouse evolved, their feature stores needed to support increasingly complex retrieval patterns. Initially using MongoDB Atlas&amp;rsquo;s vector search, they encountered limitations when implementing &lt;a href="https://docs.mixpeek.com/retrieval/retrievers" target="_blank" rel="noopener nofollow">&lt;strong>hybrid retrievers&lt;/strong>&lt;/a> &lt;strong>combining dense and sparse vectors with metadata pre-filtering&lt;/strong>.&lt;/p></description></item><item><title>Satellite Vector Broadcasting: Near-Zero Latency Retrieval from Space</title><link>https://qdrant.tech/blog/satellite-vector-broadcasting/</link><pubDate>Tue, 01 Apr 2025 00:00:00 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/satellite-vector-broadcasting/</guid><description>&lt;h2 id="-qdrant-launches-satellite-vector-broadcasting-for-near-zero-latency-retrieval">📡 Qdrant Launches Satellite Vector Broadcasting for Near-Zero Latency Retrieval&lt;/h2>
&lt;p>&lt;strong>CAPE CANAVERAL, FL&lt;/strong> — Qdrant today announced the successful deployment of &lt;strong>Satellite Vector Broadcasting&lt;/strong>, an ambitious new system for high-speed vector search that uses &lt;strong>actual satellites&lt;/strong> to transmit, shard, and retrieve embeddings — bypassing Earth entirely.&lt;/p>
&lt;blockquote>
&lt;p>“Cloud is old news. Space is the new infrastructure,” said orbital software lead Luna Hertz. “We&amp;rsquo;re proud to say we&amp;rsquo;ve finally untethered cosine similarity from the bonds of gravity and Wi-Fi.”&lt;/p></description></item><item><title>HubSpot &amp; Qdrant: Scaling an Intelligent AI Assistant</title><link>https://qdrant.tech/blog/case-study-hubspot/</link><pubDate>Mon, 24 Mar 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-hubspot/</guid><description>&lt;p>HubSpot, a global leader in CRM solutions, continuously enhances its product suite with powerful AI-driven features. To optimize Breeze AI, its flagship intelligent assistant, HubSpot chose Qdrant as its vector database.&lt;/p>
&lt;h2 id="challenges-scaling-an-intelligent-ai">&lt;strong>Challenges Scaling an Intelligent AI&lt;/strong>&lt;/h2>
&lt;p>As HubSpot expanded its AI capabilities, it faced several critical challenges in scaling Breeze AI to meet growing user demands:&lt;/p>
&lt;ul>
&lt;li>Delivering highly personalized, context-aware responses required a robust vector search solution that could retrieve data quickly while maintaining accuracy.&lt;/li>
&lt;li>With increasing user interactions, HubSpot needed a scalable system capable of handling rapid data growth without performance degradation.&lt;/li>
&lt;li>Integration with HubSpot’s existing AI infrastructure had to be swift and easy to support fast-paced development cycles.&lt;/li>
&lt;li>HubSpot sought a future-proof vector search solution that could adapt to emerging AI advancements while maintaining high availability.&lt;/li>
&lt;/ul>
&lt;p>These challenges made it essential to find a high-performance, developer-friendly vector database that could power Breeze AI efficiently.&lt;/p></description></item><item><title>Vibe Coding RAG with our MCP server</title><link>https://qdrant.tech/blog/webinar-vibe-coding-rag/</link><pubDate>Fri, 21 Mar 2025 12:02:00 +0100</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/webinar-vibe-coding-rag/</guid><description>&lt;p>Another month means another webinar! This time &lt;a href="https://www.linkedin.com/in/kacperlukawski/" target="_blank" rel="noopener nofollow">Kacper Łukawski&lt;/a> put some of the popular AI coding agents to the
test. There is a lot of excitement around tools such as Cursor, GitHub Copilot, Aider and Claude Code, so we wanted to
see how they perform in implementing something more complex than a simple frontend application. Wouldn&amp;rsquo;t it be awesome
if LLMs could code Retrieval Augmented Generation on their own?&lt;/p>
&lt;h2 id="vibe-coding">Vibe coding&lt;/h2>
&lt;p>&lt;strong>Vibe coding&lt;/strong> is a development approach introduced by Andrej Karpathy where developers surrender to intuition rather
than control. It leverages AI coding assistants for implementation while developers focus on outcomes. Through voice
interfaces and complete trust in AI suggestions, the process prioritizes results over code comprehension.&lt;/p></description></item><item><title>How Deutsche Telekom Built a Multi-Agent Enterprise Platform Leveraging Qdrant</title><link>https://qdrant.tech/blog/case-study-deutsche-telekom/</link><pubDate>Fri, 07 Mar 2025 08:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-deutsche-telekom/</guid><description>&lt;p>&lt;strong>How Deutsche Telekom Built a Scalable, Multi-Agent Enterprise Platform Leveraging Qdrant—Powering Over 2 Million Conversations Across Europe&lt;/strong>&lt;/p>
&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-deutsche-telekom/dtag-team.jpg" alt="Deutsche Telekom&amp;rsquo;s AI Competence Center team leading the LMOS platform development">&lt;/p>
&lt;p>&lt;a href="https://www.linkedin.com/in/arun-joseph-ab47102a/" target="_blank" rel="noopener nofollow">Arun Joseph&lt;/a>, who leads engineering and architecture for &lt;a href="https://www.telekom.com/en/company/digital-responsibility/details/artificial-intelligence-at-deutsche-telekom-1055154" target="_blank" rel="noopener nofollow">Deutsche Telekom&amp;rsquo;s AI Competence Center (AICC)&lt;/a>, faced a critical challenge: how do you efficiently and scalably deploy AI-powered assistants across a vast enterprise ecosystem? The goal was to deploy GenAI for customer sales and service operations to resolve customer queries faster across the 10 countries where Deutsche Telekom operates in Europe.&lt;/p></description></item><item><title>Introducing Qdrant Cloud’s New Enterprise-Ready Vector Search</title><link>https://qdrant.tech/blog/enterprise-vector-search/</link><pubDate>Tue, 04 Mar 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/enterprise-vector-search/</guid><description>&lt;p>At Qdrant, we enable developers to power AI workloads - not only securely, but at any scale. That’s why we are excited to introduce Qdrant Cloud’s new suite of enterprise-grade features. With &lt;strong>our Cloud API, Cloud RBAC&lt;/strong>, &lt;strong>Single Sign-On (SSO)&lt;/strong>, granular &lt;strong>Database API Keys&lt;/strong>, and &lt;strong>Advanced Monitoring &amp;amp; Observability&lt;/strong>, you now have the control and visibility needed to operate at scale.&lt;/p>
&lt;h2 id="securely-scale-your-ai-workloads">Securely Scale Your AI Workloads&lt;/h2>
&lt;p>Your enterprise-grade AI applications demand more than just a powerful vector database—they need to meet compliance, performance, and scalability requirements. To do that, you need simplified management, secure access &amp;amp; authentication, and real-time monitoring &amp;amp; observability. Now, Qdrant’s new enterprise-grade features address these needs, giving your team the tools to reduce operational overhead, simplify authentication, enforce access policies, and have deep visibility into performance.&lt;/p></description></item><item><title>Metadata automation and optimization - Reece Griffiths | Vector Space Talks</title><link>https://qdrant.tech/blog/metadata-deasy-labs/</link><pubDate>Mon, 24 Feb 2025 18:29:51 -0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/metadata-deasy-labs/</guid><description>&lt;blockquote>
&lt;p>&lt;em>&amp;ldquo;Metadata is one of the key unlocks to both segmentation and file organization, setting up the right knowledge base, and enriching it to hit that last mile of accuracy and speed.”&lt;/em>&lt;br>
&lt;strong>— Reece Griffiths&lt;/strong>&lt;/p>
&lt;/blockquote>
&lt;p>&lt;a href="https://www.linkedin.com/in/reece-william-griffiths/" target="_blank" rel="noopener nofollow">Reece Griffiths&lt;/a> is the CEO and co-founder of &lt;a href="https://www.deasylabs.com/" target="_blank" rel="noopener nofollow">Deasy Labs&lt;/a>, a metadata automation platform that helps companies optimize their vector databases for retrieval accuracy. Previously part of Y Combinator, Deasy Labs focuses on improving metadata extraction, classification, and enrichment at scale.&lt;/p></description></item><item><title>How to Build Intelligent Agentic RAG with CrewAI and Qdrant</title><link>https://qdrant.tech/blog/webinar-crewai-qdrant-obsidian/</link><pubDate>Fri, 24 Jan 2025 09:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/webinar-crewai-qdrant-obsidian/</guid><description>&lt;p>In a recent live session, we teamed up with &lt;a href="https://crewai.com/" target="_blank" rel="noopener nofollow">CrewAI&lt;/a>, a framework for building intelligent,
multi-agent applications. If you missed it, &lt;a href="https://www.linkedin.com/in/kacperlukawski/" target="_blank" rel="noopener nofollow">Kacper Łukawski&lt;/a> from Qdrant
and &lt;a href="https://www.linkedin.com/in/tonykipkemboi" target="_blank" rel="noopener nofollow">Tony Kipkemboi&lt;/a> from &lt;a href="https://crewai.com/" target="_blank" rel="noopener nofollow">CrewAI&lt;/a> gave an insightful
overview of CrewAI’s capabilities and demonstrated how to leverage Qdrant for creating an agentic RAG
(Retrieval-Augmented Generation) system. The focus was on semi-automating email communication, using
&lt;a href="https://obsidian.md/" target="_blank" rel="noopener nofollow">Obsidian&lt;/a> as the knowledge base.&lt;/p>
&lt;p>In this article, we’ll guide you through the process of setting up an AI-powered system that connects directly to your
email inbox and knowledge base, enabling it to analyze incoming messages and existing content to generate contextually
relevant response suggestions.&lt;/p></description></item><item><title>Qdrant 1.13 - GPU Indexing, Strict Mode &amp; New Storage Engine</title><link>https://qdrant.tech/blog/qdrant-1.13.x/</link><pubDate>Thu, 23 Jan 2025 00:00:00 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-1.13.x/</guid><description>&lt;p>&lt;a href="https://github.com/qdrant/qdrant/releases/tag/v1.13.0" target="_blank" rel="noopener nofollow">&lt;strong>Qdrant 1.13.0 is out!&lt;/strong>&lt;/a> Let&amp;rsquo;s look at the main features for this version:&lt;/p>
&lt;p>&lt;strong>GPU Accelerated Indexing:&lt;/strong> Fast HNSW indexing with architecture-free GPU support.&lt;br/>
&lt;strong>Strict Mode:&lt;/strong> Enforce operation restrictions on collections for enhanced control.&lt;/p>
&lt;p>&lt;strong>HNSW Graph Compression:&lt;/strong> Reduce storage use via HNSW Delta Encoding.&lt;br/>
&lt;strong>Named Vector Filtering:&lt;/strong> New &lt;code>has_vector&lt;/code> filtering condition for named vectors.&lt;br/>
&lt;strong>Custom Storage:&lt;/strong> For constant-time reads/writes of payloads and sparse vectors.&lt;/p>
&lt;h2 id="gpu-accelerated-indexing">GPU Accelerated Indexing&lt;/h2>
&lt;p>&lt;img src="https://qdrant.tech/blog/qdrant-1.13.x/image_6.png" alt="gpu-accelerated-indexing">&lt;/p>
&lt;p>We are making it easier for you to handle even &lt;strong>the most demanding workloads&lt;/strong>.&lt;/p></description></item><item><title>Voiceflow &amp; Qdrant: Powering No-Code AI Agent Creation with Scalable Vector Search</title><link>https://qdrant.tech/blog/case-study-voiceflow/</link><pubDate>Tue, 10 Dec 2024 00:02:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-voiceflow/</guid><description>&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-voiceflow/image1.png" alt="voiceflow/image2.png">&lt;/p>
&lt;p>&lt;a href="https://www.voiceflow.com/" target="_blank" rel="noopener nofollow">Voiceflow&lt;/a> enables enterprises to create AI agents in a no-code environment by designing workflows through a drag-and-drop interface. The platform allows developers to host and customize chatbot interfaces without needing to build their own RAG pipeline, working out of the box and being easily adaptable to specific use cases. “Powered by technologies like Natural Language Understanding (NLU), Large Language Models (LLM), and Qdrant as a vector search engine, Voiceflow serves a diverse range of customers, including enterprises that develop chatbots for internal and external AI use cases,” says &lt;a href="https://www.linkedin.com/in/xavierportillaedo/" target="_blank" rel="noopener nofollow">Xavier Portillo Edo&lt;/a>, Head of Cloud Infrastructure at Voiceflow.&lt;/p></description></item><item><title>Building a Facial Recognition System with Qdrant</title><link>https://qdrant.tech/blog/facial-recognition/</link><pubDate>Tue, 03 Dec 2024 00:00:00 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/facial-recognition/</guid><description>&lt;h1 id="the-twin-celebrity-app">The Twin Celebrity App&lt;/h1>
&lt;p>In the era of personalization, combining cutting-edge technology with fun can create engaging applications that resonate with users. One such project is the &lt;a href="https://github.com/neural-maze/vector-twin" target="_blank" rel="noopener nofollow">&lt;strong>Twin Celebrity app&lt;/strong>&lt;/a>, a tool that matches users with their celebrity look-alikes using facial recognition embeddings and &lt;a href="https://qdrant.tech/advanced-search/">&lt;strong>vector search&lt;/strong>&lt;/a> powered by Qdrant. This blog post dives into the architecture, tools, and practical advice for developers who want to build this app—or something similar.&lt;/p>
&lt;p>The &lt;a href="https://github.com/neural-maze/vector-twin" target="_blank" rel="noopener nofollow">&lt;strong>Twin Celebrity app&lt;/strong>&lt;/a> identifies which celebrity a user resembles by analyzing a selfie. The app utilizes:&lt;/p></description></item><item><title>Optimizing ColPali for Retrieval at Scale, 13x Faster Results</title><link>https://qdrant.tech/blog/colpali-qdrant-optimization/</link><pubDate>Wed, 27 Nov 2024 00:40:24 -0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/colpali-qdrant-optimization/</guid><description>&lt;p>ColPali is a fascinating leap in document retrieval. Its precision in handling visually rich PDFs is phenomenal, but scaling it to handle real-world datasets comes with its share of computational challenges.&lt;/p>
&lt;p>Here&amp;rsquo;s how we solved these challenges to make ColPali 13x faster without sacrificing the precision it’s known for.&lt;/p>
&lt;h2 id="the-scaling-dilemma">The Scaling Dilemma&lt;/h2>
&lt;p>ColPali generates &lt;strong>1,030 vectors for just one page of a PDF.&lt;/strong> While this is manageable for small-scale tasks, in a real-world production setting where you may need to store hundreds of thousands of PDFs, the challenge of scaling becomes significant.&lt;/p></description></item><item><title>Best Practices in RAG Evaluation: A Comprehensive Guide</title><link>https://qdrant.tech/blog/rag-evaluation-guide/</link><pubDate>Sun, 24 Nov 2024 00:00:00 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/rag-evaluation-guide/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>This guide will teach you how to evaluate a RAG system for both &lt;strong>accuracy&lt;/strong> and &lt;strong>quality&lt;/strong>. You will learn to maintain RAG performance by testing for search precision, recall, contextual relevance, and response accuracy.&lt;/p>
&lt;p>&lt;strong>Building a RAG application is just the beginning;&lt;/strong> it is crucial to test its usefulness for the end-user and calibrate its components for long-term stability.&lt;/p>
&lt;p>RAG systems can encounter errors at any of the three crucial stages: retrieving relevant information, augmenting that information, and generating the final response. By systematically assessing and fine-tuning each component, you will be able to maintain a reliable and contextually relevant GenAI application that meets user needs.&lt;/p></description></item><item><title>Empowering QA.tech’s Testing Agents with Real-Time Precision and Scale</title><link>https://qdrant.tech/blog/case-study-qatech/</link><pubDate>Thu, 21 Nov 2024 00:02:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-qatech/</guid><description>&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-qatech/qdrant-qatech-1.png" alt="qdrant-qatech-1">&lt;/p>
&lt;p>&lt;a href="https://qa.tech/" target="_blank" rel="noopener nofollow">QA.tech&lt;/a>, a company specializing in AI-driven automated testing solutions, found that building and &lt;strong>fully testing web applications, especially end-to-end, can be complex and time-consuming&lt;/strong>. Unlike unit tests, end-to-end tests reveal what’s actually happening in the browser, often uncovering issues that other methods miss.&lt;/p>
&lt;p>Traditional solutions like hard-coded tests are not only labor-intensive to set up but also challenging to maintain over time. Alternatively, hiring QA testers can be a solution, but for startups, it quickly becomes a bottleneck. With every release, more testers are needed, and if testing is outsourced, managing timelines and ensuring quality becomes even harder.&lt;/p></description></item><item><title>Advanced Retrieval with ColPali &amp; Qdrant Vector Database</title><link>https://qdrant.tech/blog/qdrant-colpali/</link><pubDate>Tue, 05 Nov 2024 00:02:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-colpali/</guid><description>&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 30 min&lt;/th>
 &lt;th>Level: Advanced&lt;/th>
 &lt;th>Notebook: &lt;a href="https://github.com/qdrant/examples/blob/master/colpali-and-binary-quantization/colpali_demo_binary.ipynb" target="_blank" rel="noopener nofollow">GitHub&lt;/a>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>It’s no secret that even the most modern document retrieval systems have a hard time handling visually rich documents like &lt;strong>PDFs, containing tables, images, and complex layouts.&lt;/strong>&lt;/p>
&lt;p>ColPali introduces a multimodal retrieval approach that uses &lt;strong>Vision Language Models (VLMs)&lt;/strong> instead of the traditional OCR and text-based extraction.&lt;/p>
&lt;p>By processing document images directly, it creates &lt;strong>multi-vector embeddings&lt;/strong> from both the visual and textual content, capturing the document&amp;rsquo;s structure and context more effectively. This method outperforms traditional techniques, as demonstrated by the &lt;a href="https://huggingface.co/vidore" target="_blank" rel="noopener nofollow">&lt;strong>Visual Document Retrieval Benchmark (ViDoRe)&lt;/strong>&lt;/a>.&lt;/p></description></item><item><title>How Sprinklr Leverages Qdrant to Enhance AI-Driven Customer Experience Solutions</title><link>https://qdrant.tech/blog/case-study-sprinklr/</link><pubDate>Thu, 17 Oct 2024 00:02:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-sprinklr/</guid><description>&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-sprinklr/image1.png" alt="case-study-sprinklr-1">&lt;/p>
&lt;p>&lt;a href="https://www.sprinklr.com/" target="_blank" rel="noopener nofollow">Sprinklr&lt;/a>, a leader in unified customer experience management (Unified-CXM), helps global brands engage customers meaningfully across more than 30 digital channels. To achieve this, Sprinklr needed a scalable solution for AI-powered search to support their AI applications, particularly in handling the vast data requirements of customer interactions.&lt;/p>
&lt;p>Raghav Sonavane, Associate Director of Machine Learning Engineering at Sprinklr, leads the Applied AI team, focusing on Generative AI (GenAI) and Retrieval-Augmented Generation (RAG). His team is responsible for training and fine-tuning in-house models and deploying advanced retrieval and generation systems for customer-facing applications like FAQ bots and other &lt;a href="https://www.sprinklr.com/blog/how-sprinklr-uses-RAG/" target="_blank" rel="noopener nofollow">GenAI-driven services&lt;/a>. The team provides all of these capabilities in a centralized platform to the Sprinklr product engineering teams.&lt;/p></description></item><item><title>Qdrant 1.12 - Distance Matrix, Facet Counting &amp; On-Disk Indexing</title><link>https://qdrant.tech/blog/qdrant-1.12.x/</link><pubDate>Tue, 08 Oct 2024 00:00:00 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-1.12.x/</guid><description>&lt;p>&lt;a href="https://github.com/qdrant/qdrant/releases/tag/v1.12.0" target="_blank" rel="noopener nofollow">&lt;strong>Qdrant 1.12.0 is out!&lt;/strong>&lt;/a> Let&amp;rsquo;s look at major new features and a few minor additions:&lt;/p>
&lt;p>&lt;strong>Distance Matrix API:&lt;/strong> Efficiently calculate pairwise distances between vectors.&lt;br/>
&lt;strong>GUI Data Exploration:&lt;/strong> Visually navigate your dataset and analyze vector relationships.&lt;br/>
&lt;strong>Faceting API:&lt;/strong> Dynamically aggregate and count unique values in specific fields.&lt;/p>
&lt;p>&lt;strong>Text Index on disk:&lt;/strong> Reduce memory usage by storing text indexing data on disk.&lt;br/>
&lt;strong>Geo Index on disk:&lt;/strong> Offload indexed geographic data on disk for memory efficiency.&lt;/p></description></item><item><title>New DeepLearning.AI Course on Retrieval Optimization: From Tokenization to Vector Quantization</title><link>https://qdrant.tech/blog/qdrant-deeplearning-ai-course/</link><pubDate>Sun, 06 Oct 2024 00:02:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-deeplearning-ai-course/</guid><description>&lt;p>We’re excited to announce a new course on DeepLearning.AI&amp;rsquo;s platform: &lt;a href="https://www.deeplearning.ai/short-courses/retrieval-optimization-from-tokenization-to-vector-quantization/?utm_campaign=qdrant-launch&amp;amp;utm_medium=qdrant&amp;amp;utm_source=partner-promo" target="_blank" rel="noopener nofollow">Retrieval Optimization: From Tokenization to Vector Quantization&lt;/a>. This collaboration between Qdrant and DeepLearning.AI aims to empower developers and data enthusiasts with the skills needed to enhance &lt;a href="https://qdrant.tech/advanced-search/">vector search&lt;/a> capabilities in their applications.&lt;/p>
&lt;p>Led by Qdrant’s Kacper Łukawski, this free, one-hour course is designed for beginners eager to delve into the world of retrieval optimization.&lt;/p>
&lt;h2 id="why-this-collaboration-matters">Why This Collaboration Matters&lt;/h2>
&lt;p>At Qdrant, we believe in the power of effective search to transform user experiences. Partnering with DeepLearning.AI allows us to combine our cutting-edge vector search technology with their educational expertise, providing learners with a comprehensive understanding of how to build and optimize &lt;a href="https://qdrant.tech/rag/rag-evaluation-guide/">Retrieval-Augmented Generation (RAG)&lt;/a> applications. This course is part of our commitment to equip the community with practical skills that leverage advanced machine learning techniques.&lt;/p></description></item><item><title>Introducing Qdrant for Startups</title><link>https://qdrant.tech/blog/qdrant-for-startups-launch/</link><pubDate>Wed, 02 Oct 2024 00:02:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-for-startups-launch/</guid><description>&lt;h1 id="supporting-early-stage-startups">Supporting Early-Stage Startups&lt;/h1>
&lt;p>Over the past few years, we’ve witnessed some of the most innovative AI applications being built on Qdrant. A significant number of these have come from startups pushing the boundaries of what’s possible in AI. To ensure these pioneering teams have access to the right resources at the right time, we&amp;rsquo;re introducing &lt;strong>Qdrant for Startups&lt;/strong>. This initiative is designed to provide startups with the technical support, guidance, and infrastructure they need to scale their AI innovations quickly and effectively.&lt;/p></description></item><item><title>Qdrant and Shakudo: Secure &amp; Performant Vector Search in VPC Environments</title><link>https://qdrant.tech/blog/case-study-shakudo/</link><pubDate>Mon, 23 Sep 2024 00:02:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-shakudo/</guid><description>&lt;p>We are excited to announce that Qdrant has partnered with &lt;a href="https://www.shakudo.io/" target="_blank" rel="noopener nofollow">Shakudo&lt;/a>, bringing &lt;a href="https://qdrant.tech/hybrid-cloud/" target="_blank" rel="noopener nofollow">Qdrant Hybrid Cloud&lt;/a> to Shakudo’s virtual private cloud (VPC) deployments. This collaboration allows Shakudo clients to seamlessly integrate Qdrant’s high-performance vector database as a managed service into their private infrastructure, ensuring data sovereignty, scalability, and low-latency vector search for enterprise AI applications.&lt;/p>
&lt;h2 id="data-sovereignty-and-compliance-with-secure-vector-search">Data Sovereignty and Compliance with Secure Vector Search&lt;/h2>
&lt;p>Shakudo’s VPC deployments ensure that client data remains within their infrastructure, providing strict control over sensitive information while leveraging a fully managed AI toolset. Qdrant Hybrid Cloud is tailored for environments where data privacy and regulatory compliance are paramount. It keeps the data plane inside the customer&amp;rsquo;s infrastructure, with only essential telemetry shared externally, guaranteeing database isolation and security, while providing a fully managed service.&lt;/p></description></item><item><title>Data-Driven RAG Evaluation: Testing Qdrant Apps with Relari AI</title><link>https://qdrant.tech/blog/qdrant-relari/</link><pubDate>Mon, 16 Sep 2024 00:02:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-relari/</guid><description>&lt;h1 id="using-performance-metrics-to-evaluate-rag-systems">Using Performance Metrics to Evaluate RAG Systems&lt;/h1>
&lt;p>Evaluating the performance of a &lt;a href="https://qdrant.tech/rag/">Retrieval-Augmented Generation (RAG)&lt;/a> application can be a complex task for developers.&lt;/p>
&lt;p>To help simplify this, Qdrant has partnered with &lt;a href="https://www.relari.ai" target="_blank" rel="noopener nofollow">Relari&lt;/a> to provide an in-depth &lt;a href="https://qdrant.tech/articles/rapid-rag-optimization-with-qdrant-and-quotient/">RAG evaluation&lt;/a> process.&lt;/p>
&lt;p>As a &lt;a href="https://qdrant.tech" target="_blank" rel="noopener nofollow">vector database&lt;/a>, Qdrant handles the data storage and retrieval, while Relari enables you to run experiments to assess how well your RAG app performs in real-world scenarios. Together, they allow for fast, iterative testing and evaluation, making it easier to keep up with your app&amp;rsquo;s development pace.&lt;/p></description></item><item><title>Nyris &amp; Qdrant: How Vectors are the Future of Visual Search</title><link>https://qdrant.tech/blog/case-study-nyris/</link><pubDate>Tue, 10 Sep 2024 00:02:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-nyris/</guid><description>&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-nyris/nyris-case-study.png" alt="nyris-case-study">&lt;/p>
&lt;h2 id="about-nyris">About Nyris&lt;/h2>
&lt;p>Founded in 2015 by CTO Markus Lukasson and his sister Anna Lukasson-Herzig, &lt;a href="https://www.nyris.io/" target="_blank" rel="noopener nofollow">Nyris&lt;/a> offers advanced visual search solutions for companies, positioning itself as the &amp;ldquo;Google Lens&amp;rdquo; for corporate data. Their technology powers use cases such as visual search on websites of large retailers and machine manufacturing companies that require visual identification of spare parts. The primary goal is to identify items in a product catalog or spare parts as quickly as possible. With a strong foundation in e-commerce and nearly a decade of experience in vector search, Nyris is at the forefront of visual search innovation.&lt;/p></description></item><item><title>Kern AI &amp; Qdrant: Precision AI Solutions for Finance and Insurance</title><link>https://qdrant.tech/blog/case-study-kern/</link><pubDate>Wed, 28 Aug 2024 00:02:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-kern/</guid><description>&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-kern/kern-case-study.png" alt="kern-case-study">&lt;/p>
&lt;h2 id="about-kern-ai">About Kern AI&lt;/h2>
&lt;p>&lt;a href="https://kern.ai/" target="_blank" rel="noopener nofollow">Kern AI&lt;/a> specializes in data-centric AI. Originally an AI consulting firm, the team led by Co-Founder and CEO Johannes Hötter quickly realized that developers spend 80% of their time reviewing data instead of focusing on model development. This inefficiency significantly reduces the speed of development and adoption of AI. To tackle this challenge, Kern AI developed a low-code platform that enables developers to quickly analyze their datasets and identify outliers using vector search. This innovation led to enhanced data accuracy and streamlined workflows for the rapid deployment of AI applications.&lt;/p></description></item><item><title>Qdrant 1.11 - The Vector Stronghold: Optimizing Data Structures for Scale and Efficiency</title><link>https://qdrant.tech/blog/qdrant-1.11.x/</link><pubDate>Mon, 12 Aug 2024 00:00:00 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-1.11.x/</guid><description>&lt;p>&lt;a href="https://github.com/qdrant/qdrant/releases/tag/v1.11.0" target="_blank" rel="noopener nofollow">Qdrant 1.11.0 is out!&lt;/a> This release largely focuses on features that improve memory usage and optimize segments. However, there are a few cool minor features, so let&amp;rsquo;s look at the whole list:&lt;/p>
&lt;p>Optimized Data Structures:&lt;br/>
&lt;strong>Defragmentation:&lt;/strong> Storage for multitenant workloads is more optimized and scales better.&lt;br/>
&lt;strong>On-Disk Payload Index:&lt;/strong> Store less frequently used data on disk, rather than in RAM.&lt;br/>
&lt;strong>UUID for Payload Index:&lt;/strong> Additional data types for payload can result in big memory savings.&lt;/p></description></item><item><title>Kairoswealth &amp; Qdrant: Transforming Wealth Management with AI-Driven Insights and Scalable Vector Search</title><link>https://qdrant.tech/blog/case-study-kairoswealth/</link><pubDate>Wed, 10 Jul 2024 00:02:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-kairoswealth/</guid><description>&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-kairoswealth/image2.png" alt="Kairoswealth overview">&lt;/p>
&lt;h2 id="about-kairoswealth">&lt;strong>About Kairoswealth&lt;/strong>&lt;/h2>
&lt;p>&lt;a href="https://kairoswealth.com/" target="_blank" rel="noopener nofollow">Kairoswealth&lt;/a> is a comprehensive wealth management platform designed to provide users with a holistic view of their financial portfolio. The platform offers access to unique financial products and automates back-office operations through its AI assistant, Gaia.&lt;/p>
&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-kairoswealth/image3.png" alt="Dashboard Kairoswealth">&lt;/p>
&lt;h2 id="motivations-for-adopting-a-vector-database">&lt;strong>Motivations for Adopting a Vector Database&lt;/strong>&lt;/h2>
&lt;p>“At Kairoswealth we encountered several use cases necessitating the ability to run similarity queries on large datasets. Key applications included product recommendations and retrieval-augmented generation (RAG),” says &lt;a href="https://www.linkedin.com/in/vincent-teyssier/" target="_blank" rel="noopener nofollow">Vincent Teyssier&lt;/a>, Chief Technology &amp;amp; AI Officer at Kairoswealth. These needs drove the search for a more robust and scalable vector database solution.&lt;/p></description></item><item><title>Qdrant 1.10 - Universal Query, Built-in IDF &amp; ColBERT Support</title><link>https://qdrant.tech/blog/qdrant-1.10.x/</link><pubDate>Mon, 01 Jul 2024 00:00:00 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-1.10.x/</guid><description>&lt;p>&lt;a href="https://github.com/qdrant/qdrant/releases/tag/v1.10.0" target="_blank" rel="noopener nofollow">Qdrant 1.10.0 is out!&lt;/a> This version introduces some major changes, so let&amp;rsquo;s dive right in:&lt;/p>
&lt;p>&lt;strong>Universal Query API:&lt;/strong> All search APIs, including Hybrid Search, are now in one Query endpoint.&lt;br/>
&lt;strong>Built-in IDF:&lt;/strong> We added the IDF mechanism to Qdrant&amp;rsquo;s core search and indexing processes.&lt;br/>
&lt;strong>Multivector Support:&lt;/strong> Native support for late interaction ColBERT is accessible via Query API.&lt;/p>
&lt;h2 id="one-endpoint-for-all-queries">One Endpoint for All Queries&lt;/h2>
&lt;p>&lt;strong>Query API&lt;/strong> will consolidate all search APIs into a single request. Previously, you had to work outside of the API to combine different search requests. Now these approaches are reduced to parameters of a single request, so you can avoid merging individual results.&lt;/p></description></item><item><title>Community Highlights #1</title><link>https://qdrant.tech/blog/community-highlights-1/</link><pubDate>Thu, 20 Jun 2024 11:57:37 -0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/community-highlights-1/</guid><description>&lt;p>Welcome to the very first edition of Community Highlights, where we celebrate the most impactful contributions and achievements of our vector search community! 🎉&lt;/p>
&lt;h2 id="content-highlights-">Content Highlights 🚀&lt;/h2>
&lt;p>Here are some standout projects and articles from our community this past month. If you&amp;rsquo;re looking to learn more about vector search or build some great projects, we recommend checking out these guides:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>&lt;a href="https://towardsdev.com/implementing-advanced-agentic-vector-search-a-comprehensive-guide-to-crewai-and-qdrant-ca214ca4d039" target="_blank" rel="noopener nofollow">Implementing Advanced Agentic Vector Search&lt;/a>: A Comprehensive Guide to CrewAI and Qdrant by &lt;a href="https://www.linkedin.com/in/kameshwara-pavan-kumar-mantha-91678b21/" target="_blank" rel="noopener nofollow">Pavan Kumar&lt;/a>&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Build Your Own RAG Using &lt;a href="https://www.youtube.com/watch?v=m_3q3XnLlTI" target="_blank" rel="noopener nofollow">Unstructured, Llama3 via Groq, Qdrant &amp;amp; LangChain&lt;/a> by &lt;a href="https://www.linkedin.com/in/sudarshan-koirala/" target="_blank" rel="noopener nofollow">Sudarshan Koirala&lt;/a>&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Qdrant filtering and &lt;a href="https://www.youtube.com/watch?v=iaXFggqqGD0" target="_blank" rel="noopener nofollow">self-querying retriever&lt;/a> retrieval with LangChain by &lt;a href="https://www.linkedin.com/in/infoslack/" target="_blank" rel="noopener nofollow">Daniel Romero&lt;/a>&lt;/strong>&lt;/li>
&lt;li>&lt;strong>RAG Evaluation with &lt;a href="https://superlinked.com/vectorhub/articles/retrieval-augmented-generation-eval-qdrant-arize" target="_blank" rel="noopener nofollow">Arize Phoenix&lt;/a> by &lt;a href="https://www.linkedin.com/in/atitaarora/" target="_blank" rel="noopener nofollow">Atita Arora&lt;/a>&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Building a Serverless Application with &lt;a href="https://medium.com/@benitomartin/building-a-serverless-application-with-aws-lambda-and-qdrant-for-semantic-search-ddb7646d4c2f" target="_blank" rel="noopener nofollow">AWS Lambda and Qdrant&lt;/a> for Semantic Search by &lt;a href="https://www.linkedin.com/in/benitomzh/" target="_blank" rel="noopener nofollow">Benito Martin&lt;/a>&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Production ready Secure and &lt;a href="https://towardsdev.com/production-ready-secure-and-powerful-ai-implementations-with-azure-services-671b68631212" target="_blank" rel="noopener nofollow">Powerful AI Implementations with Azure Services&lt;/a> by &lt;a href="https://www.linkedin.com/in/kameshwara-pavan-kumar-mantha-91678b21/" target="_blank" rel="noopener nofollow">Pavan Kumar&lt;/a>&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Building &lt;a href="https://medium.com/@joshmo_dev/building-agentic-rag-with-rust-openai-qdrant-d3a0bb85a267" target="_blank" rel="noopener nofollow">Agentic RAG with Rust, OpenAI &amp;amp; Qdrant&lt;/a> by &lt;a href="https://www.linkedin.com/in/joshua-mo-4146aa220/" target="_blank" rel="noopener nofollow">Joshua Mo&lt;/a>&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Qdrant &lt;a href="https://medium.com/@nickprock/qdrant-hybrid-search-under-the-hood-using-haystack-355841225ac6" target="_blank" rel="noopener nofollow">Hybrid Search&lt;/a> under the hood using Haystack by &lt;a href="https://www.linkedin.com/in/nicolaprocopio/" target="_blank" rel="noopener nofollow">Nicola Procopio&lt;/a>&lt;/strong>&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://medium.com/@datadrifters/llama-3-powered-voice-assistant-integrating-local-rag-with-qdrant-whisper-and-langchain-b4d075b00ac5" target="_blank" rel="noopener nofollow">Llama 3 Powered Voice Assistant&lt;/a>: Integrating Local RAG with Qdrant, Whisper, and LangChain by &lt;a href="https://medium.com/@datadrifters" target="_blank" rel="noopener nofollow">Datadrifters&lt;/a>&lt;/strong>&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://medium.com/@vardhanam.daga/distributed-deployment-of-qdrant-cluster-with-sharding-replicas-e7923d483ebc" target="_blank" rel="noopener nofollow">Distributed deployment&lt;/a> of Qdrant cluster with sharding &amp;amp; replicas by &lt;a href="https://www.linkedin.com/in/vardhanam-daga/overlay/about-this-profile/" target="_blank" rel="noopener nofollow">Vardhanam Daga&lt;/a>&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Private &lt;a href="https://medium.com/aimpact-all-things-ai/building-private-healthcare-ai-assistant-for-clinics-using-qdrant-hybrid-cloud-jwt-rbac-dspy-and-089a772e08ae" target="_blank" rel="noopener nofollow">Healthcare AI Assistant&lt;/a> using Qdrant Hybrid Cloud, DSPy, and Groq by &lt;a href="https://www.linkedin.com/in/sachink1729/" target="_blank" rel="noopener nofollow">Sachin Khandewal&lt;/a>&lt;/strong>&lt;/li>
&lt;/ul>
&lt;h2 id="creator-of-the-month-">Creator of the Month 🌟&lt;/h2>
&lt;img src="https://qdrant.tech/blog/community-highlights-1/creator-of-the-month-pavan.png" alt="Picture of Pavan Kumar with over 6 content contributions for the Creator of the Month" style="width: 70%;" />
&lt;p>Congratulations to Pavan Kumar for being awarded &lt;strong>Creator of the Month!&lt;/strong> Check out what were Pavan&amp;rsquo;s most valuable contributions to the Qdrant vector search community this past month:&lt;/p></description></item><item><title>Response to CVE-2024-3829: Arbitrary file upload vulnerability</title><link>https://qdrant.tech/blog/cve-2024-3829-response/</link><pubDate>Mon, 10 Jun 2024 17:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/cve-2024-3829-response/</guid><description>&lt;h3 id="summary">Summary&lt;/h3>
&lt;p>A security vulnerability has been discovered in Qdrant affecting all versions
prior to v1.9, described in &lt;a href="https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2024-3829" target="_blank" rel="noopener nofollow">CVE-2024-3829&lt;/a>.
The vulnerability allows an attacker to upload arbitrary files to the
filesystem, which can be used to gain remote code execution. This vulnerability is similar to, but distinct from, CVE-2024-2221, announced in April 2024.&lt;/p>
&lt;p>The vulnerability does not materially affect Qdrant cloud deployments, as that
filesystem is read-only and authentication is enabled by default. At worst,
the vulnerability could be used by an authenticated user to crash a cluster,
which is already possible, such as by uploading more vectors than can fit in RAM.&lt;/p></description></item><item><title>Qdrant Attains SOC 2 Type II Audit Report</title><link>https://qdrant.tech/blog/qdrant-soc2-type2-audit/</link><pubDate>Thu, 23 May 2024 20:26:20 -0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-soc2-type2-audit/</guid><description>&lt;p>At Qdrant, we are happy to announce the successful completion of our SOC 2 Type II Audit. This achievement underscores our unwavering commitment to upholding the highest standards of security, availability, and confidentiality for our services and our customers’ data.&lt;/p>
&lt;h2 id="soc-2-type-ii-what-is-it">SOC 2 Type II: What Is It?&lt;/h2>
&lt;p>SOC 2 Type II certification is an examination of an organization&amp;rsquo;s controls in reference to the American Institute of Certified Public Accountants &lt;a href="https://www.aicpa-cima.com/resources/download/2017-trust-services-criteria-with-revised-points-of-focus-2022" target="_blank" rel="noopener nofollow">(AICPA) Trust Services criteria&lt;/a>. It evaluates not only our written policies but also their practical implementation, ensuring alignment between our stated objectives and operational practices. Unlike Type I, which is a snapshot in time, Type II verifies over several months that the company has lived up to those controls. The report represents thorough auditing of our security procedures throughout this examination period: January 1, 2024 to April 7, 2024.&lt;/p></description></item><item><title>Introducing Qdrant Stars: Join Our Ambassador Program!</title><link>https://qdrant.tech/blog/qdrant-stars-announcement/</link><pubDate>Sun, 19 May 2024 11:57:37 -0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-stars-announcement/</guid><description>&lt;p>We&amp;rsquo;re excited to introduce &lt;strong>Qdrant Stars&lt;/strong>, our new ambassador program created to recognize and support Qdrant users making a strong impact in the AI and vector search space.&lt;/p>
&lt;p>Whether through innovative content, real-world application tutorials, educational events, or engaging discussions, they are constantly making vector search more accessible and interesting to explore.&lt;/p>
&lt;h3 id="-say-hello-to-the-first-qdrant-stars">👋 Say hello to the first Qdrant Stars!&lt;/h3>
&lt;p>Our inaugural Qdrant Stars are a diverse and talented lineup who have shown exceptional dedication to our community. You might recognize some of their names:&lt;/p></description></item><item><title>Intel’s New CPU Powers Faster Vector Search</title><link>https://qdrant.tech/blog/qdrant-cpu-intel-benchmark/</link><pubDate>Fri, 10 May 2024 00:00:00 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-cpu-intel-benchmark/</guid><description>&lt;h4 id="new-generation-silicon-is-a-game-changer-for-aiml-applications">New generation silicon is a game-changer for AI/ML applications&lt;/h4>
&lt;p>&lt;img src="https://qdrant.tech/blog/qdrant-cpu-intel-benchmark/qdrant-cpu-intel-benchmark.png" alt="qdrant cpu intel benchmark report">&lt;/p>
&lt;blockquote>
&lt;p>&lt;em>Intel’s 5th gen Xeon processor is made for enterprise-scale operations in vector space.&lt;/em>&lt;/p>
&lt;/blockquote>
&lt;p>Vector search is surging in popularity with institutional customers, and Intel is ready to support the emerging industry. Their latest generation CPU performed exceptionally with Qdrant, a leading vector database used for enterprise AI applications.&lt;/p>
&lt;p>Intel just released the latest Xeon processor (&lt;strong>codename: Emerald Rapids&lt;/strong>) for data centers, a market which is expected to grow to $45 billion. Emerald Rapids offers higher-performance computing and significant energy efficiency over previous generations. Compared to the 4th generation Sapphire Rapids, Emerald boosts AI inference performance by up to 42% and makes vector search 38% faster.&lt;/p></description></item><item><title>QSoC 2024: Announcing Our Interns!</title><link>https://qdrant.tech/blog/qsoc24-interns-announcement/</link><pubDate>Wed, 08 May 2024 16:44:22 -0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qsoc24-interns-announcement/</guid><description>&lt;p>We are excited to announce the interns selected for the inaugural Qdrant Summer of Code (QSoC) program! After receiving many impressive applications, we have chosen two talented individuals to work on the following projects:&lt;/p>
&lt;p>&lt;strong>&lt;a href="https://www.linkedin.com/in/j16n/" target="_blank" rel="noopener nofollow">Jishan Bhattacharya&lt;/a>: WASM-based Dimension Reduction Visualization&lt;/strong>&lt;/p>
&lt;p>Jishan will be implementing a dimension reduction algorithm in Rust, compiling it to WebAssembly (WASM), and integrating it with the Qdrant Web UI. This project aims to provide a more efficient and smoother visualization experience, enabling the handling of more data points and higher dimensions efficiently.&lt;/p></description></item><item><title>Semantic Cache: Accelerating AI with Lightning-Fast Data Retrieval</title><link>https://qdrant.tech/articles/semantic-cache-ai-data-retrieval/</link><pubDate>Tue, 07 May 2024 00:00:00 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/semantic-cache-ai-data-retrieval/</guid><description>&lt;h2 id="what-is-semantic-cache">What is Semantic Cache?&lt;/h2>
&lt;p>&lt;strong>Semantic cache&lt;/strong> is a method of retrieval optimization, where similar queries instantly retrieve the same appropriate response from a knowledge base.&lt;/p>
&lt;p>Semantic cache differs from traditional caching methods. In computing, &lt;strong>cache&lt;/strong> refers to high-speed memory that efficiently stores frequently accessed data. In the context of &lt;a href="https://qdrant.tech/articles/what-is-a-vector-database/">vector databases&lt;/a>, a &lt;strong>semantic cache&lt;/strong> improves AI application performance by storing previously retrieved results along with the conditions under which they were computed. This allows the application to reuse those results when the same or similar conditions occur again, rather than finding them from scratch.&lt;/p></description></item><item><title>Are You Vendor Locked?</title><link>https://qdrant.tech/blog/are-you-vendor-locked/</link><pubDate>Sun, 05 May 2024 00:00:00 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/are-you-vendor-locked/</guid><description>&lt;p>We all are.&lt;/p>
&lt;blockquote>
&lt;p>&lt;em>“There is no use fighting it. Pick a vendor and go all in. Everything else is a mirage.”&lt;/em>
The last words of a seasoned IT professional&lt;/p>
&lt;/blockquote>
&lt;p>As long as we are using any product, our solution’s infrastructure will depend on its vendors. Many say that building custom infrastructure will hurt velocity. &lt;strong>Is this true in the age of AI?&lt;/strong>&lt;/p>
&lt;p>It depends on where your company is at. Most startups don’t survive more than five years, so putting too much effort into infrastructure is not the best use of their resources. You first need to survive and demonstrate product viability.&lt;/p></description></item><item><title>Visua and Qdrant: Vector Search in Computer Vision</title><link>https://qdrant.tech/blog/case-study-visua/</link><pubDate>Wed, 01 May 2024 00:02:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-visua/</guid><description>&lt;p>&lt;img src="https://qdrant.tech/blog/case-study-visua/image1.png" alt="visua/image1.png">&lt;/p>
&lt;p>For over a decade, &lt;a href="https://visua.com/" target="_blank" rel="noopener nofollow">VISUA&lt;/a> has been a leader in precise, high-volume computer vision data analysis, developing a robust platform that caters to a wide range of use cases, from startups to large enterprises. Starting with social media monitoring, where it excels in analyzing vast data volumes to detect company logos, VISUA has built a diverse ecosystem of customers, including names in social media monitoring, like &lt;strong>Brandwatch&lt;/strong>, cybersecurity like &lt;strong>Mimecast&lt;/strong>, trademark protection like &lt;strong>Ebay&lt;/strong> and several sports agencies like &lt;strong>Vision Insights&lt;/strong> for sponsorship evaluation.&lt;/p></description></item><item><title>Qdrant 1.9.0 - Heighten Your Security With Role-Based Access Control Support</title><link>https://qdrant.tech/blog/qdrant-1.9.x/</link><pubDate>Wed, 24 Apr 2024 00:00:00 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-1.9.x/</guid><description>&lt;p>&lt;a href="https://github.com/qdrant/qdrant/releases/tag/v1.9.0" target="_blank" rel="noopener nofollow">Qdrant 1.9.0 is out!&lt;/a> This version complements the release of our new managed product &lt;a href="https://qdrant.tech/hybrid-cloud/">Qdrant Hybrid Cloud&lt;/a> with key security features valuable to our enterprise customers, and all those looking to productionize large-scale Generative AI. &lt;strong>Data privacy, system stability and resource optimizations&lt;/strong> are always on our mind - so let&amp;rsquo;s see what&amp;rsquo;s new:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Granular access control:&lt;/strong> You can further specify access control levels by using JSON Web Tokens.&lt;/li>
&lt;li>&lt;strong>Optimized shard transfers:&lt;/strong> The synchronization of shards between nodes is now significantly faster!&lt;/li>
&lt;li>&lt;strong>Support for byte embeddings:&lt;/strong> Reduce the memory footprint of Qdrant with official &lt;code>uint8&lt;/code> support.&lt;/li>
&lt;/ul>
&lt;h2 id="new-access-control-options-via-json-web-tokens">New access control options via JSON Web Tokens&lt;/h2>
&lt;p>Historically, our API key supported basic read and write operations. However, recognizing the evolving needs of our user base, especially large organizations, we&amp;rsquo;ve implemented additional options for finer control over data access within internal environments.&lt;/p></description></item><item><title>Qdrant's Trusted Partners for Hybrid Cloud Deployment</title><link>https://qdrant.tech/blog/hybrid-cloud-launch-partners/</link><pubDate>Mon, 15 Apr 2024 00:02:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/hybrid-cloud-launch-partners/</guid><description>&lt;p>With the launch of &lt;a href="https://qdrant.tech/hybrid-cloud/">Qdrant Hybrid Cloud&lt;/a> we provide developers the ability to deploy Qdrant as a managed vector database in any desired environment, be it &lt;em>in the cloud, on premise, or on the edge&lt;/em>.&lt;/p>
&lt;p>We are excited to have trusted industry players support the launch of Qdrant Hybrid Cloud, allowing developers to unlock best-in-class advantages for building production-ready AI applications:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Deploy In Your Own Environment:&lt;/strong> Deploy the Qdrant vector database as a managed service on the infrastructure of choice, such as our launch partner solutions &lt;a href="https://blogs.oracle.com/cloud-infrastructure/post/qdrant-hybrid-cloud-now-available-oci-customers" target="_blank" rel="noopener nofollow">Oracle Cloud Infrastructure (OCI)&lt;/a>, &lt;a href="https://qdrant.tech/blog/hybrid-cloud-red-hat-openshift/">Red Hat OpenShift&lt;/a>, &lt;a href="https://qdrant.tech/blog/hybrid-cloud-vultr/">Vultr&lt;/a>, &lt;a href="https://qdrant.tech/blog/hybrid-cloud-digitalocean/">DigitalOcean&lt;/a>, &lt;a href="https://qdrant.tech/blog/hybrid-cloud-ovhcloud/">OVHcloud&lt;/a>, &lt;a href="https://qdrant.tech/blog/hybrid-cloud-scaleway/">Scaleway&lt;/a>, &lt;a href="https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/#civo">Civo&lt;/a>, and &lt;a href="https://qdrant.tech/blog/hybrid-cloud-stackit/">STACKIT&lt;/a>.&lt;/p></description></item><item><title>Qdrant Hybrid Cloud: the First Managed Vector Database You Can Run Anywhere</title><link>https://qdrant.tech/blog/hybrid-cloud/</link><pubDate>Mon, 15 Apr 2024 00:01:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/hybrid-cloud/</guid><description>&lt;p>We are excited to announce the official launch of &lt;a href="https://qdrant.tech/hybrid-cloud/">Qdrant Hybrid Cloud&lt;/a> today, a significant leap forward in the field of vector search and enterprise AI. Rooted in our open-source origin, we are committed to offering our users and customers unparalleled control and sovereignty over their data and vector search workloads. 
Qdrant Hybrid Cloud stands as &lt;strong>the industry&amp;rsquo;s first managed vector database that can be deployed in any environment&lt;/strong> - be it cloud, on-premise, or the edge.&lt;/p></description></item><item><title>Advancements and Challenges in RAG Systems - Syed Asad | Vector Space Talks</title><link>https://qdrant.tech/blog/rag-advancements-challenges/</link><pubDate>Thu, 11 Apr 2024 22:25:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/rag-advancements-challenges/</guid><description>&lt;blockquote>
&lt;p>&lt;em>&amp;ldquo;The problem with many of the vector databases is that they work fine, they are scalable. This is common. The problem is that they are not easy to use. So that is why I always use Qdrant.”&lt;/em>&lt;br>
— Syed Asad&lt;/p>
&lt;/blockquote>
&lt;p>Syed Asad is an accomplished AI/ML Professional, specializing in LLM Operations and RAGs. With a focus on Image Processing and Massive Scale Vector Search Operations, he brings a wealth of expertise to the field. His dedication to advancing artificial intelligence and machine learning technologies has been instrumental in driving innovation and solving complex challenges. Syed continues to push the boundaries of AI/ML applications, contributing significantly to the ever-evolving landscape of the industry.&lt;/p></description></item><item><title>Building Search/RAG for an OpenAPI spec - Nick Khami | Vector Space Talks</title><link>https://qdrant.tech/blog/building-search-rag-open-api/</link><pubDate>Thu, 11 Apr 2024 22:23:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/building-search-rag-open-api/</guid><description>&lt;blockquote>
&lt;p>&lt;em>&amp;ldquo;It&amp;rsquo;s very, very simple to build search over an Open API specification with a tool like Trieve and Qdrant. I think really there&amp;rsquo;s something to highlight here and how awesome it is to work with a group based system if you&amp;rsquo;re using Qdrant.”&lt;/em>&lt;br>
— Nick Khami&lt;/p>
&lt;/blockquote>
&lt;p>Nick Khami, a seasoned full-stack engineer, has been deeply involved in the development of vector search and RAG applications since the inception of Qdrant v0.11.0 back in October 2022. His expertise and passion for innovation led him to establish Trieve, a company dedicated to facilitating businesses in embracing cutting-edge vector search and RAG technologies.&lt;/p></description></item><item><title>Iveta Lohovska on Gen AI and Vector Search | Qdrant</title><link>https://qdrant.tech/blog/gen-ai-and-vector-search/</link><pubDate>Thu, 11 Apr 2024 22:12:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/gen-ai-and-vector-search/</guid><description>&lt;h1 id="exploring-gen-ai-and-vector-search-insights-from-iveta-lohovska">Exploring Gen AI and Vector Search: Insights from Iveta Lohovska&lt;/h1>
&lt;blockquote>
&lt;p>&lt;em>&amp;ldquo;In the generative AI context of AI, all foundational models have been trained on some foundational data sets that are distributed in different ways. Some are very conversational, some are very technical, some are on, let&amp;rsquo;s say very strict taxonomy like healthcare or chemical structures. We call them modalities, and they have different representations.”&lt;/em>&lt;br>
— Iveta Lohovska&lt;/p>
&lt;/blockquote>
&lt;p>Iveta Lohovska serves as the Chief Technologist and Principal Data Scientist for AI and Supercomputing at &lt;a href="https://www.hpe.com/us/en/home.html" target="_blank" rel="noopener nofollow">Hewlett Packard Enterprise (HPE)&lt;/a>, where she champions the democratization of decision intelligence and the development of ethical AI solutions. An industry leader, her multifaceted expertise encompasses natural language processing, computer vision, and data mining. Committed to leveraging technology for societal benefit, Iveta is a distinguished technical advisor to the United Nations&amp;rsquo; AI for Good program and a Data Science lecturer at the Vienna University of Applied Sciences. Her career also includes impactful roles with the World Bank Group, focusing on open data initiatives and Sustainable Development Goals (SDGs), as well as collaborations with USAID and the Gates Foundation.&lt;/p></description></item><item><title>Teaching Vector Databases at Scale - Alfredo Deza | Vector Space Talks</title><link>https://qdrant.tech/blog/teaching-vector-db-at-scale/</link><pubDate>Tue, 09 Apr 2024 03:06:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/teaching-vector-db-at-scale/</guid><description>&lt;blockquote>
&lt;p>&lt;em>&amp;ldquo;So usually I get asked, why are you using Qdrant? What&amp;rsquo;s the big deal? Why are you picking these over all of the other ones? And to me it boils down to, aside from being renowned or recognized, that it works fairly well. There&amp;rsquo;s one core component that is critical here, and that is it has to be very straightforward, very easy to set up so that I can teach it, because if it&amp;rsquo;s easy, well, sort of like easy to or straightforward to teach, then you can take the next step and you can make it a little more complex, put other things around it, and that creates a great development experience and a learning experience as well.”&lt;/em>&lt;br>
— Alfredo Deza&lt;/p></description></item><item><title>How to meow on the long tail with Cheshire Cat AI? - Piero and Nicola | Vector Space Talks</title><link>https://qdrant.tech/blog/meow-with-cheshire-cat/</link><pubDate>Tue, 09 Apr 2024 03:05:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/meow-with-cheshire-cat/</guid><description>&lt;blockquote>
&lt;p>&lt;em>&amp;ldquo;We love Qdrant! It is our default DB. We support it in three different forms, file based, container based, and cloud based as well.”&lt;/em>&lt;br>
— Piero Savastano&lt;/p>
&lt;/blockquote>
&lt;p>Piero Savastano is the Founder and Maintainer of the open-source project, Cheshire Cat AI. He started in Deep Learning pure research. He wrote his first neural network from scratch at the age of 19. After a period as a researcher at La Sapienza and CNR, he provides international consulting, training, and mentoring services in the field of machine and deep learning. He spreads Artificial Intelligence awareness on YouTube and TikTok.&lt;/p></description></item><item><title>Response to CVE-2024-2221: Arbitrary file upload vulnerability</title><link>https://qdrant.tech/blog/cve-2024-2221-response/</link><pubDate>Fri, 05 Apr 2024 13:00:00 -0700</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/cve-2024-2221-response/</guid><description>&lt;h3 id="summary">Summary&lt;/h3>
&lt;p>A security vulnerability has been discovered in Qdrant affecting all versions
prior to v1.9, described in &lt;a href="https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2024-2221" target="_blank" rel="noopener nofollow">CVE-2024-2221&lt;/a>.
The vulnerability allows an attacker to upload arbitrary files to the
filesystem, which can be used to gain remote code execution.&lt;/p>
&lt;p>The vulnerability does not materially affect Qdrant cloud deployments, as that
filesystem is read-only and authentication is enabled by default. At worst,
the vulnerability could be used by an authenticated user to crash a cluster,
which is already possible, such as by uploading more vectors than can fit in RAM.&lt;/p></description></item><item><title>Introducing FastLLM: Qdrant’s Revolutionary LLM</title><link>https://qdrant.tech/blog/fastllm-announcement/</link><pubDate>Mon, 01 Apr 2024 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/fastllm-announcement/</guid><description>&lt;p>Today, we&amp;rsquo;re happy to announce that &lt;strong>FastLLM (FLLM)&lt;/strong>, our lightweight Language Model tailored specifically for Retrieval Augmented Generation (RAG) use cases, has officially entered Early Access!&lt;/p>
&lt;p>Developed to seamlessly integrate with Qdrant, &lt;strong>FastLLM&lt;/strong> represents a significant leap forward in AI-driven content generation. Up to this point, LLMs could only handle a few million tokens.&lt;/p>
&lt;p>&lt;strong>As of today, FLLM offers a context window of 1 billion tokens.&lt;/strong>&lt;/p>
&lt;p>However, what sets FastLLM apart is its optimized architecture, making it the ideal choice for RAG applications. With minimal effort, you can combine FastLLM and Qdrant to launch applications that process vast amounts of data. Leveraging the power of Qdrant&amp;rsquo;s scalability features, FastLLM promises to revolutionize how enterprise AI applications generate and retrieve content at massive scale.&lt;/p></description></item><item><title>VirtualBrain: Best RAG to unleash the real power of AI - Guillaume Marquis | Vector Space Talks</title><link>https://qdrant.tech/blog/virtualbrain-best-rag/</link><pubDate>Wed, 27 Mar 2024 12:41:51 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/virtualbrain-best-rag/</guid><description>&lt;blockquote>
&lt;p>&lt;em>&amp;ldquo;It&amp;rsquo;s like mandatory to have a vector database that is scalable, that is fast, that has low latencies, that can under parallel request a large amount of requests. So you have really this need and Qdrant was like an obvious choice.”&lt;/em>&lt;br>
— Guillaume Marquis&lt;/p>
&lt;/blockquote>
&lt;p>Guillaume Marquis, a dedicated Engineer and AI enthusiast, serves as the Chief Technology Officer and Co-Founder of VirtualBrain, an innovative AI company. He is committed to exploring novel approaches to integrating artificial intelligence into everyday life, driven by a passion for advancing the field and its applications.&lt;/p></description></item><item><title>Talk with YouTube without paying a cent - Francesco Saverio Zuppichini | Vector Space Talks</title><link>https://qdrant.tech/blog/youtube-without-paying-cent/</link><pubDate>Wed, 27 Mar 2024 12:37:55 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/youtube-without-paying-cent/</guid><description>&lt;blockquote>
&lt;p>&lt;em>&amp;ldquo;Now I do believe that Qdrant, I&amp;rsquo;m not sponsored by Qdrant, but I do believe it&amp;rsquo;s the best one for a couple of reasons. And we&amp;rsquo;re going to see them mostly because I can just run it on my computer so it&amp;rsquo;s full private and I&amp;rsquo;m in charge of my data.”&lt;/em>&lt;br>
&amp;ndash; Francesco Saverio Zuppichini&lt;/p>
&lt;/blockquote>
&lt;p>Francesco Saverio Zuppichini is a Senior Full Stack Machine Learning Engineer at Zurich Insurance with experience in both large corporations and startups of various sizes. He is passionate about sharing knowledge, and building communities, and is known as a skilled practitioner in computer vision. He is proud of the community he built because of all the amazing people he got to know.&lt;/p></description></item><item><title>Qdrant is Now Available on Azure Marketplace!</title><link>https://qdrant.tech/blog/azure-marketplace/</link><pubDate>Tue, 26 Mar 2024 10:30:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/azure-marketplace/</guid><description>&lt;p>We&amp;rsquo;re thrilled to announce that Qdrant is now &lt;a href="https://azuremarketplace.microsoft.com/en-en/marketplace/apps/qdrantsolutionsgmbh1698769709989.qdrant-db" target="_blank" rel="noopener nofollow">officially available on Azure Marketplace&lt;/a>, bringing enterprise-level vector search directly to Azure&amp;rsquo;s vast community of users. This integration marks a significant milestone in our journey to make Qdrant more accessible and convenient for businesses worldwide.&lt;/p>
&lt;blockquote>
&lt;p>&lt;em>With the landscape of AI being complex for most customers, Qdrant&amp;rsquo;s ease of use provides an easy approach for customers&amp;rsquo; implementation of RAG patterns for Generative AI solutions and additional choices in selecting AI components on Azure,&lt;/em> - Tara Walker, Principal Software Engineer at Microsoft.&lt;/p></description></item><item><title>Production-scale RAG for Real-Time News Distillation - Robert Caulk | Vector Space Talks</title><link>https://qdrant.tech/blog/real-time-news-distillation-rag/</link><pubDate>Mon, 25 Mar 2024 08:49:22 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/real-time-news-distillation-rag/</guid><description>&lt;blockquote>
&lt;p>&lt;em>&amp;ldquo;We&amp;rsquo;ve got a lot of fun challenges ahead of us in the industry, I think, and the industry is establishing best practices. Like you said, everybody&amp;rsquo;s just trying to figure out what&amp;rsquo;s going on. And some of these base layer tools like Qdrant really enable products and enable companies and they enable us.”&lt;/em>&lt;br>
&amp;ndash; Robert Caulk&lt;/p>
&lt;/blockquote>
&lt;p>Robert, Founder of Emergent Methods is a scientist by trade, dedicating his career to a variety of open-source projects that range from large-scale artificial intelligence to discrete element modeling. He is currently working with a team at Emergent Methods to adaptively model over 1 million news articles per day, with a goal of reducing media bias and improving news awareness.&lt;/p></description></item><item><title>Insight Generation Platform for LifeScience Corporation - Hooman Sedghamiz | Vector Space Talks</title><link>https://qdrant.tech/blog/insight-generation-platform/</link><pubDate>Mon, 25 Mar 2024 08:46:28 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/insight-generation-platform/</guid><description>&lt;blockquote>
&lt;p>&lt;em>&amp;ldquo;There is this really great vector db comparison that came out recently. I saw there are like maybe more than 40 vector stores in 2024. When we started back in 2023, there were only a few. What I see, which is really lacking in this pipeline of retrieval augmented generation is major innovation around data pipeline.”&lt;/em>&lt;br>
&amp;ndash; Hooman Sedghamiz&lt;/p>
&lt;/blockquote>
&lt;p>Hooman Sedghamiz, Sr. Director AI/ML - Insights at Bayer AG is a distinguished figure in AI and ML in the life sciences field. With years of experience, he has led teams and projects that have greatly advanced medical products, including implantable and wearable devices. Notably, he served as the Generative AI product owner and Senior Director at Bayer Pharmaceuticals, where he played a pivotal role in developing a GPT-based central platform for precision medicine.&lt;/p></description></item><item><title>The challenges in using LLM-as-a-Judge - Sourabh Agrawal | Vector Space Talks</title><link>https://qdrant.tech/blog/llm-as-a-judge/</link><pubDate>Tue, 19 Mar 2024 15:05:02 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/llm-as-a-judge/</guid><description>&lt;blockquote>
&lt;p>&amp;ldquo;&lt;em>You don&amp;rsquo;t want to use an expensive model like GPT 4 for evaluation, because then the cost adds up and it does not work out. If you are spending more on evaluating the responses, you might as well just do something else, like have a human to generate the responses.&lt;/em>”&lt;br>
&amp;ndash; Sourabh Agrawal&lt;/p>
&lt;/blockquote>
&lt;p>Sourabh Agrawal, CEO &amp;amp; Co-Founder at UpTrain AI is a seasoned entrepreneur and AI/ML expert with a diverse background. He began his career at Goldman Sachs, where he developed machine learning models for financial markets. Later, he contributed to the autonomous driving team at Bosch/Mercedes, focusing on computer vision modules for scene understanding. In 2020, Sourabh ventured into entrepreneurship, founding an AI-powered fitness startup that gained over 150,000 users. Throughout his career, he encountered challenges in evaluating AI models, particularly Generative AI models. To address this issue, Sourabh is developing UpTrain, an open-source LLMOps tool designed to evaluate, test, and monitor LLM applications. UpTrain provides scores and offers insights to enhance LLM applications by performing root-cause analysis, identifying common patterns among failures, and providing automated suggestions for resolution.&lt;/p></description></item><item><title>Vector Search for Content-Based Video Recommendation - Gladys and Samuel from Dailymotion</title><link>https://qdrant.tech/blog/vector-search-vector-recommendation/</link><pubDate>Tue, 19 Mar 2024 14:08:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/vector-search-vector-recommendation/</guid><description>&lt;blockquote>
&lt;p>&amp;ldquo;&lt;em>The vector search engine that we chose is Qdrant, but why did we choose it? Actually, it answers all the load constraints and the technical needs that we had. It allows us to do a fast neighbor search. It has a python API which matches the recommender tag that we have.&lt;/em>”&lt;br>
&amp;ndash; Gladys Roch&lt;/p>
&lt;/blockquote>
&lt;p>Gladys Roch is a French Machine Learning Engineer at Dailymotion working on recommender systems for video content.&lt;/p></description></item><item><title>Integrating Qdrant and LangChain for Advanced Vector Similarity Search</title><link>https://qdrant.tech/blog/using-qdrant-and-langchain/</link><pubDate>Tue, 12 Mar 2024 09:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/using-qdrant-and-langchain/</guid><description>&lt;blockquote>
&lt;p>&lt;em>&amp;ldquo;Building AI applications doesn&amp;rsquo;t have to be complicated. You can leverage pre-trained models and support complex pipelines with a few lines of code. LangChain provides a unified interface, so that you can avoid writing boilerplate code and focus on the value you want to bring.&amp;rdquo;&lt;/em> Kacper Lukawski, Developer Advocate, Qdrant&lt;/p>
&lt;/blockquote>
&lt;h2 id="long-term-memory-for-your-genai-app">Long-Term Memory for Your GenAI App&lt;/h2>
&lt;p>Qdrant&amp;rsquo;s vector database quickly grew due to its ability to make Generative AI more effective. On its own, an LLM can be used to build a process-altering invention. With Qdrant, you can turn this invention into a production-level app that brings real business value.&lt;/p></description></item><item><title>IrisAgent and Qdrant: Redefining Customer Support with AI</title><link>https://qdrant.tech/blog/iris-agent-qdrant/</link><pubDate>Wed, 06 Mar 2024 07:45:34 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/iris-agent-qdrant/</guid><description>&lt;p>Artificial intelligence is evolving customer support, offering unprecedented capabilities for automating interactions, understanding user needs, and enhancing the overall customer experience. &lt;a href="https://irisagent.com/" target="_blank" rel="noopener nofollow">IrisAgent&lt;/a>, founded by former Google product manager &lt;a href="https://www.linkedin.com/in/palakdalal/" target="_blank" rel="noopener nofollow">Palak Dalal Bhatia&lt;/a>, demonstrates the concrete impact of AI on customer support with its AI-powered customer support automation platform.&lt;/p>
&lt;p>Bhatia describes IrisAgent as “the system of intelligence which sits on top of existing systems of records like support tickets, engineering bugs, sales data, or product data,” with the main objective of leveraging AI and generative AI to automatically detect the intent and tags behind customer support tickets, reply to a large number of support tickets and chats, improve time to resolution, and increase the deflection rate of support teams. Ultimately, IrisAgent enables support teams to do more with less and be more effective in helping customers.&lt;/p></description></item><item><title>Dailymotion's Journey to Crafting the Ultimate Content-Driven Video Recommendation Engine with Qdrant Vector Search</title><link>https://qdrant.tech/blog/case-study-dailymotion/</link><pubDate>Tue, 27 Feb 2024 13:22:31 +0100</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-dailymotion/</guid><description>&lt;h2 id="dailymotions-journey-to-crafting-the-ultimate-content-driven-video-recommendation-engine-with-qdrant-vector-search">Dailymotion&amp;rsquo;s Journey to Crafting the Ultimate Content-Driven Video Recommendation Engine with Qdrant Vector Search&lt;/h2>
&lt;p>In today&amp;rsquo;s digital age, the consumption of video content has become ubiquitous, with an overwhelming abundance of options available at our fingertips. However, amidst this vast sea of videos, the challenge lies not in finding content, but in discovering the content that truly resonates with individual preferences and interests and yet is diverse enough to not throw users into their own filter bubble. As viewers, we seek meaningful and relevant videos that enrich our experiences, provoke thought, and spark inspiration.&lt;/p></description></item><item><title>Qdrant vs Pinecone: Vector Databases for AI Apps</title><link>https://qdrant.tech/blog/comparing-qdrant-vs-pinecone-vector-databases/</link><pubDate>Sun, 25 Feb 2024 00:00:00 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/comparing-qdrant-vs-pinecone-vector-databases/</guid><description>&lt;h1 id="qdrant-vs-pinecone-an-analysis-of-vector-databases-for-ai-applications">Qdrant vs Pinecone: An Analysis of Vector Databases for AI Applications&lt;/h1>
&lt;p>Data forms the foundation upon which AI applications are built. Data can exist in both structured and unstructured formats. Structured data typically has well-defined schemas or inherent relationships. However, unstructured data, such as text, image, audio, or video, must first be converted into numerical representations known as &lt;a href="https://qdrant.tech/articles/what-are-embeddings/" target="_blank" rel="noopener nofollow">vector embeddings&lt;/a>. These embeddings encapsulate the semantic meaning or features of unstructured data and are in the form of high-dimensional vectors.&lt;/p></description></item><item><title>What is Vector Similarity? Understanding its Role in AI Applications.</title><link>https://qdrant.tech/blog/what-is-vector-similarity/</link><pubDate>Sat, 24 Feb 2024 00:00:00 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/what-is-vector-similarity/</guid><description>&lt;h1 id="understanding-vector-similarity-powering-next-gen-ai-applications">Understanding Vector Similarity: Powering Next-Gen AI Applications&lt;/h1>
&lt;p>A core function of a wide range of AI applications is to first understand the &lt;em>meaning&lt;/em> behind a user query, and then provide &lt;em>relevant&lt;/em> answers to the questions that the user is asking. With increasingly advanced interfaces and applications, this query can be in the form of language, an image, audio, video, or other forms of &lt;em>unstructured&lt;/em> data.&lt;/p>
&lt;p>On an ecommerce platform, a user can, for instance, try to find ‘clothing for a trek’, when they actually want results around ‘waterproof jackets’, or ‘winter socks’. Keyword, or full-text, or even synonym search would fail to provide any response to such a query. Similarly, on a music app, a user might be looking for songs that sound similar to an audio clip they have heard. Or, they might want to look up furniture that has a similar look as the one they saw on a trip.&lt;/p></description></item><item><title>DSPy vs LangChain: A Comprehensive Framework Comparison</title><link>https://qdrant.tech/blog/dspy-vs-langchain/</link><pubDate>Fri, 23 Feb 2024 08:00:00 -0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/dspy-vs-langchain/</guid><description>&lt;h1 id="the-evolving-landscape-of-ai-frameworks">The Evolving Landscape of AI Frameworks&lt;/h1>
&lt;p>As Large Language Models (LLMs) and vector stores have become steadily more powerful, a new generation of frameworks has appeared which can streamline the development of AI applications by leveraging LLMs and vector search technology. These frameworks simplify the process of building everything from Retrieval Augmented Generation (RAG) applications to complex chatbots with advanced conversational abilities, and even sophisticated reasoning-driven AI applications.&lt;/p>
&lt;p>The most well-known of these frameworks is possibly &lt;a href="https://github.com/langchain-ai/langchain" target="_blank" rel="noopener nofollow">LangChain&lt;/a>. &lt;a href="https://en.wikipedia.org/wiki/LangChain" target="_blank" rel="noopener nofollow">Launched in October 2022&lt;/a> as an open-source project by Harrison Chase, the project quickly gained popularity, attracting contributions from hundreds of developers on GitHub. LangChain excels in its broad support for documents, data sources, and APIs. This, along with seamless integration with vector stores like Qdrant and the ability to chain multiple LLMs, has allowed developers to build complex AI applications without reinventing the wheel.&lt;/p></description></item><item><title>Qdrant Summer of Code 24</title><link>https://qdrant.tech/blog/qdrant-summer-of-code-24/</link><pubDate>Wed, 21 Feb 2024 00:39:53 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-summer-of-code-24/</guid><description>&lt;p>Google Summer of Code (#GSoC) is celebrating its 20th anniversary this year with the 2024 program. Over the past 20 years, 19K new contributors were introduced to #opensource through the program under the guidance of thousands of mentors from over 800 open-source organizations in various fields. Qdrant participated successfully in the program last year. Both projects, the UI Dashboard with unstructured data visualization and the advanced Geo Filtering, were completed in time and are now a part of the engine. 
One of the two young contributors joined the team and continues working on the project.&lt;/p></description></item><item><title>Dust and Qdrant: Using AI to Unlock Company Knowledge and Drive Employee Productivity</title><link>https://qdrant.tech/blog/dust-and-qdrant/</link><pubDate>Tue, 06 Feb 2024 07:03:26 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/dust-and-qdrant/</guid><description>&lt;p>One of the major promises of artificial intelligence is its potential to
accelerate efficiency and productivity within businesses, empowering employees
and teams in their daily tasks. The French company &lt;a href="https://dust.tt/" target="_blank" rel="noopener nofollow">Dust&lt;/a>, co-founded by former
OpenAI Research Engineer &lt;a href="https://www.linkedin.com/in/spolu/" target="_blank" rel="noopener nofollow">Stanislas Polu&lt;/a>, set out to deliver on this promise by
providing businesses and teams with an expansive platform for building
customizable and secure AI assistants.&lt;/p>
&lt;h2 id="challenge">Challenge&lt;/h2>
&lt;p>&amp;ldquo;The past year has shown that large language models (LLMs) are very useful but
complicated to deploy,&amp;rdquo; Polu says, especially in the context of their
application across business functions. This is why he believes that
augmenting human productivity at scale is primarily a product unlock rather
than a research unlock: the goal is to identify the best way for companies to
leverage these models. Dust is therefore building a product that sits between
humans and large language models, focused on supporting the work of
a team within the company to ultimately enhance employee productivity.&lt;/p></description></item><item><title>The Bitter Lesson of Retrieval in Generative Language Model Workflows - Mikko Lehtimäki | Vector Space Talks</title><link>https://qdrant.tech/blog/bitter-lesson-generative-language-model/</link><pubDate>Mon, 29 Jan 2024 16:31:02 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/bitter-lesson-generative-language-model/</guid><description>&lt;blockquote>
&lt;p>&lt;em>&amp;ldquo;If you haven&amp;rsquo;t heard of the bitter lesson, it&amp;rsquo;s actually a theorem. It&amp;rsquo;s based on a blog post by Richard Sutton, and it states basically that based on what we have learned from the development of machine learning and artificial intelligence systems in the previous decades, the methods that can leverage data and compute tends to or will eventually outperform the methods that are designed or handcrafted by humans.”&lt;/em>&lt;br>
&amp;ndash; Mikko Lehtimäki&lt;/p></description></item><item><title>Indexify Unveiled - Diptanu Gon Choudhury | Vector Space Talks</title><link>https://qdrant.tech/blog/indexify-content-extraction-engine/</link><pubDate>Fri, 26 Jan 2024 16:40:55 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/indexify-content-extraction-engine/</guid><description>&lt;blockquote>
&lt;p>&lt;em>&amp;ldquo;We have something like Qdrant, which is very geared towards doing Vector search. And so we understand the shape of the storage system now.”&lt;/em>&lt;br>
— Diptanu Gon Choudhury&lt;/p>
&lt;/blockquote>
&lt;p>Diptanu Gon Choudhury is the founder of Tensorlake. They are building Indexify, an open-source, scalable structured-extraction engine that turns unstructured data into near-real-time knowledge bases for AI/agent-driven workflows and query engines. Before building Indexify, Diptanu created the Nomad cluster scheduler at HashiCorp, invented the Titan/Titus cluster scheduler at Netflix, led the FBLearner machine learning platform, and built the real-time speech inference engine at Facebook.&lt;/p></description></item><item><title>Unlocking AI Potential: Insights from Stanislas Polu</title><link>https://qdrant.tech/blog/qdrant-x-dust-vector-search/</link><pubDate>Fri, 26 Jan 2024 16:22:37 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-x-dust-vector-search/</guid><description>&lt;h1 id="qdrant-x-dust-how-vector-search-helps-make-work-better-with-stanislas-polu">Qdrant x Dust: How Vector Search Helps Make Work Better with Stanislas Polu&lt;/h1>
&lt;blockquote>
&lt;p>&lt;em>&amp;ldquo;We ultimately chose Qdrant due to its open-source nature, strong performance, being written in Rust, comprehensive documentation, and the feeling of control.”&lt;/em>&lt;br>
&amp;ndash; Stanislas Polu&lt;/p>
&lt;/blockquote>
&lt;p>Stanislas Polu is the Co-Founder and an Engineer at Dust. He previously sold a company to Stripe and spent 5 years there, watching the company grow from 80 to 3,000 people. He then pivoted to research at OpenAI on large language models and mathematical reasoning capabilities. He started Dust 6 months ago to make work better with LLMs.&lt;/p></description></item><item><title>Announcing Qdrant's $28M Series A Funding Round</title><link>https://qdrant.tech/blog/series-a-funding-round/</link><pubDate>Tue, 23 Jan 2024 09:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/series-a-funding-round/</guid><description>&lt;p>Today, we are excited to announce our $28M Series A funding round, which is led by Spark Capital with participation from our existing investors Unusual Ventures and 42CAP.&lt;/p>
&lt;p>We have seen incredible user growth and support from our open-source community in the past two years - recently exceeding 5M downloads. This is a testament to our mission to build the most efficient, scalable, high-performance vector database on the market. We are excited to further accelerate this trajectory with our new partner and investor, Spark Capital, and the continued support of Unusual Ventures and 42CAP. This partnership uniquely positions us to empower enterprises with cutting edge vector search technology to build truly differentiating, next-gen AI applications at scale.&lt;/p></description></item><item><title>Introducing Qdrant Cloud on Microsoft Azure</title><link>https://qdrant.tech/blog/qdrant-cloud-on-microsoft-azure/</link><pubDate>Wed, 17 Jan 2024 08:40:42 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-cloud-on-microsoft-azure/</guid><description>&lt;p>Great news! We&amp;rsquo;ve expanded Qdrant&amp;rsquo;s managed vector database offering — &lt;a href="https://cloud.qdrant.io/" target="_blank" rel="noopener nofollow">Qdrant Cloud&lt;/a> — to be available on Microsoft Azure.
You can now effortlessly set up your environment on Azure, which reduces deployment time, so you can hit the ground running.&lt;/p>
&lt;p>&lt;a href="https://cloud.qdrant.io/" target="_blank" rel="noopener nofollow">Get started&lt;/a>&lt;/p>
&lt;p>What this means for you:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Rapid application development&lt;/strong>: Deploy your own cluster through the Qdrant Cloud Console within seconds and scale your resources as needed.&lt;/li>
&lt;li>&lt;strong>Billion vector scale&lt;/strong>: Seamlessly grow and handle large-scale datasets with billions of vectors. Leverage Qdrant features like horizontal scaling and binary quantization with Microsoft Azure&amp;rsquo;s scalable infrastructure.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>&amp;ldquo;With Qdrant, we found the missing piece to develop our own provider independent multimodal generative AI platform at enterprise scale.&amp;rdquo;&lt;/strong> &amp;ndash; Jeremy Teichmann (AI Squad Technical Lead &amp;amp; Generative AI Expert), Daly Singh (AI Squad Lead &amp;amp; Product Owner) - Bosch Digital.&lt;/p></description></item><item><title>Qdrant Updated Benchmarks 2024</title><link>https://qdrant.tech/blog/qdrant-benchmarks-2024/</link><pubDate>Mon, 15 Jan 2024 09:29:33 -0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-benchmarks-2024/</guid><description>&lt;p>It&amp;rsquo;s time for an update to Qdrant&amp;rsquo;s benchmarks!&lt;/p>
&lt;p>We&amp;rsquo;ve compared how Qdrant performs against the other vector search engines to give you a thorough performance analysis. Let&amp;rsquo;s get into what&amp;rsquo;s new and what remains the same in our approach.&lt;/p>
&lt;h3 id="whats-changed">What&amp;rsquo;s Changed?&lt;/h3>
&lt;h4 id="all-engines-have-improved">All engines have improved&lt;/h4>
&lt;p>Since the last time we ran our benchmarks, we received a bunch of suggestions on how to run other engines more efficiently, and we applied them.&lt;/p></description></item><item><title>Navigating challenges and innovations in search technologies</title><link>https://qdrant.tech/blog/navigating-challenges-innovations/</link><pubDate>Fri, 12 Jan 2024 15:39:53 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/navigating-challenges-innovations/</guid><description>&lt;h2 id="navigating-challenges-and-innovations-in-search-technologies">Navigating challenges and innovations in search technologies&lt;/h2>
&lt;p>We participated in a &lt;a href="#podcast-discussion-recap">podcast&lt;/a> on search technologies, specifically with retrieval-augmented generation (RAG) in language models.&lt;/p>
&lt;p>RAG is a cutting-edge approach in natural language processing (NLP) that combines information retrieval with language generation models. We describe how it enhances AI&amp;rsquo;s ability to understand, retrieve, and generate human-like text.&lt;/p>
&lt;h3 id="more-about-rag">More about RAG&lt;/h3>
&lt;p>Think of RAG as a system that finds relevant knowledge from a vast database. It takes your query, finds the best available information, and then provides an answer.&lt;/p></description></item><item><title>Optimizing an Open Source Vector Database with Andrey Vasnetsov</title><link>https://qdrant.tech/blog/open-source-vector-search-engine-vector-database/</link><pubDate>Wed, 10 Jan 2024 16:04:57 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/open-source-vector-search-engine-vector-database/</guid><description>&lt;h1 id="optimizing-open-source-vector-search-strategies-from-andrey-vasnetsov-at-qdrant">Optimizing Open Source Vector Search: Strategies from Andrey Vasnetsov at Qdrant&lt;/h1>
&lt;blockquote>
&lt;p>&lt;em>&amp;ldquo;For systems like Qdrant, scalability and performance in my opinion, is much more important than transactional consistency, so it should be treated as a search engine rather than database.&amp;rdquo;&lt;/em>&lt;br>
&amp;ndash; Andrey Vasnetsov&lt;/p>
&lt;/blockquote>
&lt;p>Discussing core differences between search engines and databases, Andrey underlined the importance of application needs and scalability in database selection for vector search tasks.&lt;/p>
&lt;p>Andrey Vasnetsov, CTO at Qdrant, is an enthusiast of &lt;a href="https://qdrant.tech/" target="_blank" rel="noopener nofollow">Open Source&lt;/a>, machine learning, and vector search. He works on Open Source projects related to &lt;a href="https://qdrant.tech/articles/vector-similarity-beyond-search/" target="_blank" rel="noopener nofollow">Vector Similarity Search&lt;/a> and Similarity Learning. He prefers practical over theoretical, working demo over arXiv paper.&lt;/p></description></item><item><title>Vector Search Complexities: Insights from Projects in Image Search and RAG - Noé Achache | Vector Space Talks</title><link>https://qdrant.tech/blog/vector-image-search-rag/</link><pubDate>Tue, 09 Jan 2024 13:51:26 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/vector-image-search-rag/</guid><description>&lt;blockquote>
&lt;p>&lt;em>&amp;ldquo;I really think it&amp;rsquo;s something the technology is ready for and would really help this kind of embedding model jumping onto the text search projects.”&lt;/em>&lt;br>
&amp;ndash; Noé Achache on the future of image embedding&lt;/p>
&lt;/blockquote>
&lt;p>Exploring the depths of vector search? Want an analysis of its application in image search and document retrieval? Noé has you covered.&lt;/p>
&lt;p>Noé Achache is a Lead Data Scientist at Sicara, where he worked on a wide range of projects mostly related to computer vision, prediction with structured data, and more recently LLMs.&lt;/p></description></item><item><title>How to Superpower Your Semantic Search Using a Vector Database Vector Space Talks</title><link>https://qdrant.tech/blog/semantic-search-vector-database/</link><pubDate>Tue, 09 Jan 2024 12:27:18 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/semantic-search-vector-database/</guid><description>&lt;h1 id="how-to-superpower-your-semantic-search-using-a-vector-database-with-nicolas-mauti">How to Superpower Your Semantic Search Using a Vector Database with Nicolas Mauti&lt;/h1>
&lt;blockquote>
&lt;p>&lt;em>&amp;ldquo;We found a trade-off between performance and precision in Qdrant that was better for us than what we could find in Elasticsearch.”&lt;/em>&lt;br>
&amp;ndash; Nicolas Mauti&lt;/p>
&lt;/blockquote>
&lt;p>Want precision &amp;amp; performance in freelancer search? Malt&amp;rsquo;s move to the Qdrant database is a masterstroke, offering geospatial filtering &amp;amp; seamless scaling. How did Nicolas Mauti and the team at Malt identify the need to transition to a retriever-ranker architecture for their freelancer matching app?&lt;/p></description></item><item><title>Building LLM Powered Applications in Production - Hamza Farooq | Vector Space Talks</title><link>https://qdrant.tech/blog/llm-complex-search-copilot/</link><pubDate>Tue, 09 Jan 2024 12:16:22 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/llm-complex-search-copilot/</guid><description>&lt;blockquote>
&lt;p>&lt;em>&amp;ldquo;There are 10 billion search queries a day, estimated half of them go unanswered. Because people don&amp;rsquo;t actually use search as what we used.”&lt;/em>&lt;br>
&amp;ndash; Hamza Farooq&lt;/p>
&lt;/blockquote>
&lt;p>How do you think Hamza&amp;rsquo;s background in machine learning and previous experiences at Google and Walmart Labs have influenced his approach to building LLM-powered applications?&lt;/p>
&lt;p>Hamza Farooq, an accomplished educator and AI enthusiast, is the founder of Traversaal.ai. His journey is marked by a relentless passion for AI exploration, particularly in building Large Language Models. As an adjunct professor at UCLA Anderson, Hamza shapes the future of AI by teaching cutting-edge technology courses. At Traversaal.ai, he empowers businesses with domain-specific AI solutions, focusing on conversational search and recommendation systems to deliver personalized experiences. With a diverse career spanning academia, industry, and entrepreneurship, Hamza brings a wealth of experience from time at Google. His overarching goal is to bridge the gap between AI innovation and real-world applications, introducing transformative solutions to the market. Hamza eagerly anticipates the dynamic challenges and opportunities in the ever-evolving field of AI and machine learning.&lt;/p></description></item><item><title>Building a High-Performance Entity Matching Solution with Qdrant - Rishabh Bhardwaj | Vector Space Talks</title><link>https://qdrant.tech/blog/entity-matching-qdrant/</link><pubDate>Tue, 09 Jan 2024 11:53:56 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/entity-matching-qdrant/</guid><description>&lt;blockquote>
&lt;p>&lt;em>&amp;ldquo;When we were building proof of concept for this solution, we initially started with Postgres. But after some experimentation, we realized that it basically does not perform very well in terms of recall and speed&amp;hellip; then we came to know that Qdrant performs a lot better as compared to other solutions that existed at the moment.”&lt;/em>&lt;br>
&amp;ndash; Rishabh Bhardwaj&lt;/p>
&lt;/blockquote>
&lt;p>How does the HNSW (Hierarchical Navigable Small World) algorithm benefit the solution built by Rishabh?&lt;/p></description></item><item><title>FastEmbed: Fast &amp; Lightweight Embedding Generation - Nirant Kasliwal | Vector Space Talks</title><link>https://qdrant.tech/blog/fast-embed-models/</link><pubDate>Tue, 09 Jan 2024 11:38:59 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/fast-embed-models/</guid><description>&lt;blockquote>
&lt;p>&lt;em>&amp;ldquo;When things are actually similar or how we define similarity. They are close to each other and if they are not, they&amp;rsquo;re far from each other. This is what a model or embedding model tries to do.”&lt;/em>&lt;br>
&amp;ndash; Nirant Kasliwal&lt;/p>
&lt;/blockquote>
&lt;p>Heard about FastEmbed? It&amp;rsquo;s a game-changer. Nirant shares tricks on how to improve your embedding models. You might want to give it a shot!&lt;/p>
&lt;p>Nirant Kasliwal, the creator and maintainer of FastEmbed, has made notable contributions to the Finetuning Cookbook at OpenAI Cookbook. His contributions extend to the field of Natural Language Processing (NLP), with over 5,000 copies of his NLP book sold.&lt;/p></description></item><item><title>When music just doesn't match our vibe, can AI help? - Filip Makraduli | Vector Space Talks</title><link>https://qdrant.tech/blog/human-language-ai-models/</link><pubDate>Tue, 09 Jan 2024 10:44:20 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/human-language-ai-models/</guid><description>&lt;blockquote>
&lt;p>&lt;em>&amp;ldquo;Was it possible to somehow maybe find a way to transfer this feeling that we have this vibe and get the help of AI to understand what exactly we need at that moment in terms of songs?”&lt;/em>&lt;br>
&amp;ndash; Filip Makraduli&lt;/p>
&lt;/blockquote>
&lt;p>Imagine if the recommendation system could understand spoken instructions or hummed melodies. This would greatly impact the user experience and accuracy of the recommendations.&lt;/p>
&lt;p>Filip Makraduli, an electrical engineering graduate from Skopje, Macedonia, expanded his academic horizons with a Master&amp;rsquo;s in Biomedical Data Science from Imperial College London.&lt;/p></description></item><item><title>Binary Quantization - Andrey Vasnetsov | Vector Space Talks</title><link>https://qdrant.tech/blog/binary-quantization/</link><pubDate>Tue, 09 Jan 2024 10:30:10 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/binary-quantization/</guid><description>&lt;blockquote>
&lt;p>&lt;em>&amp;ldquo;Everything changed when we actually tried binary quantization with OpenAI model.”&lt;/em>&lt;br>
&amp;ndash; Andrey Vasnetsov&lt;/p>
&lt;/blockquote>
&lt;p>Ever wonder why we need quantization for vector indexes? Andrey Vasnetsov explains the complexities and challenges of searching through proximity graphs. Binary quantization reduces storage size and boosts speed by 30x, but not all models are compatible.&lt;/p>
&lt;p>Andrey worked as a Machine Learning Engineer for most of his career. He prefers practical over theoretical, working demo over arXiv paper. He is currently the CTO at Qdrant, a vector similarity search engine that can be used for semantic search, similarity matching of text, images, or even videos, and recommendations.&lt;/p></description></item><item><title>Loading Unstructured.io Data into Qdrant from the Terminal</title><link>https://qdrant.tech/blog/qdrant-unstructured/</link><pubDate>Tue, 09 Jan 2024 00:41:38 +0530</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-unstructured/</guid><description>&lt;p>Building powerful applications with Qdrant starts with loading vector representations into the system. Traditionally, this involves scraping or extracting data from sources, performing operations such as cleaning, chunking, and generating embeddings, and finally loading it into Qdrant. While this process can be complex, Unstructured.io simplifies it and includes Qdrant as an ingestion destination.&lt;/p>
&lt;p>In this blog post, we&amp;rsquo;ll demonstrate how to load data into Qdrant from the channels of a Discord server. You can use a similar process for the &lt;a href="https://unstructured-io.github.io/unstructured/ingest/source_connectors.html" target="_blank" rel="noopener nofollow">20+ vetted data sources&lt;/a> supported by Unstructured.&lt;/p></description></item><item><title>Chat with a codebase using Qdrant and N8N</title><link>https://qdrant.tech/blog/qdrant-n8n/</link><pubDate>Sat, 06 Jan 2024 04:09:05 +0530</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-n8n/</guid><description>&lt;p>n8n (pronounced n-eight-n) helps you connect any app with an API. You can then manipulate its data with little or no code. With the Qdrant node on n8n, you can build AI-powered workflows visually.&lt;/p>
&lt;p>Let&amp;rsquo;s go through the process of building a workflow. We&amp;rsquo;ll build a chat with a codebase service.&lt;/p>
&lt;h2 id="prerequisites">Prerequisites&lt;/h2>
&lt;ul>
&lt;li>A running Qdrant instance. If you need one, use our &lt;a href="https://qdrant.tech/documentation/quickstart/">Quick start guide&lt;/a> to set it up.&lt;/li>
&lt;li>An OpenAI API Key. Retrieve your key from the &lt;a href="https://platform.openai.com/account/api-keys" target="_blank" rel="noopener nofollow">OpenAI API page&lt;/a> for your account.&lt;/li>
&lt;li>A GitHub access token. If you need to generate one, start at the &lt;a href="https://github.com/settings/tokens/" target="_blank" rel="noopener nofollow">GitHub Personal access tokens page&lt;/a>.&lt;/li>
&lt;/ul>
&lt;h2 id="building-the-app">Building the App&lt;/h2>
&lt;p>Our workflow has two components. Refer to the &lt;a href="https://docs.n8n.io/workflows/create/" target="_blank" rel="noopener nofollow">n8n quick start guide&lt;/a> to get acquainted with workflow semantics.&lt;/p></description></item><item><title>"Vector search and applications" by Andrey Vasnetsov, CTO at Qdrant</title><link>https://qdrant.tech/blog/vector-search-and-applications-record/</link><pubDate>Mon, 11 Dec 2023 12:16:42 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/vector-search-and-applications-record/</guid><description>
&lt;p>Andrey Vasnetsov, Co-founder and CTO at Qdrant, shared insights on vector search and its applications with Learn NLP Academy.&lt;/p>
&lt;iframe width="560" height="315" src="https://www.youtube.com/embed/MVUkbMYPYTE" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen>&lt;/iframe>
&lt;p>He covered the following topics:&lt;/p>
&lt;ul>
&lt;li>The Qdrant search engine and the Quaterion similarity learning framework;&lt;/li>
&lt;li>Extending similarity learning to multimodal settings;&lt;/li>
&lt;li>Elasticsearch embeddings vs. vector search engines;&lt;/li>
&lt;li>Support for multiple embeddings;&lt;/li>
&lt;li>Fundraising and VC discussions;&lt;/li>
&lt;li>The vision for vector search evolution;&lt;/li>
&lt;li>Fine-tuning for out-of-domain data.&lt;/li>
&lt;/ul>
</description></item><item><title>From Content Quality to Compression: The Evolution of Embedding Models at Cohere with Nils Reimers</title><link>https://qdrant.tech/blog/cohere-embedding-v3/</link><pubDate>Sun, 19 Nov 2023 12:48:36 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/cohere-embedding-v3/</guid><description>&lt;p>For the second edition of our Vector Space Talks we were joined by none other than Cohere’s Head of Machine Learning Nils Reimers.&lt;/p>
&lt;h2 id="key-takeaways">Key Takeaways&lt;/h2>
&lt;p>Let&amp;rsquo;s dive right into the five key takeaways from Nils&amp;rsquo; talk:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Content Quality Estimation: Nils explained how embeddings have traditionally focused on measuring topic match, but content quality is just as important. He demonstrated how their model can differentiate between informative and non-informative documents.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Compression-Aware Training: He shared how they&amp;rsquo;ve tackled the challenge of reducing the memory footprint of embeddings, making it more cost-effective to run vector databases on platforms like &lt;a href="https://cloud.qdrant.io/login" target="_blank" rel="noopener nofollow">Qdrant&lt;/a>.&lt;/p></description></item><item><title>Pienso &amp; Qdrant: Future Proofing Generative AI for Enterprise-Level Customers</title><link>https://qdrant.tech/blog/case-study-pienso/</link><pubDate>Tue, 28 Feb 2023 09:48:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-pienso/</guid><description>&lt;p>The partnership between Pienso and Qdrant is set to revolutionize interactive deep learning, making it practical, efficient, and scalable for global customers. Pienso&amp;rsquo;s low-code platform provides a streamlined and user-friendly process for deep learning tasks. This exceptional level of convenience is augmented by Qdrant’s scalable and cost-efficient high vector computation capabilities, which enable reliable retrieval of similar vectors from high-dimensional spaces.&lt;/p>
&lt;p>Together, Pienso and Qdrant will empower enterprises to harness the full potential of generative AI on a large scale. By combining the technologies of both companies, organizations will be able to train their own large language models and leverage them for downstream tasks that demand data sovereignty and model autonomy. This collaboration will help customers unlock new possibilities and achieve advanced AI-driven solutions.
&lt;/p></description></item><item><title>Powering Bloop semantic code search</title><link>https://qdrant.tech/blog/case-study-bloop/</link><pubDate>Tue, 28 Feb 2023 09:48:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/case-study-bloop/</guid><description>&lt;p>Founded in early 2021, &lt;a href="https://bloop.ai/" target="_blank" rel="noopener nofollow">bloop&lt;/a> was one of the first companies to tackle semantic
search for codebases. A fast, reliable Vector Search Database is a core component of a semantic
search engine, and bloop surveyed the field of available solutions and even considered building
their own. They found Qdrant to be the top contender and now use it in production.&lt;/p>
&lt;p>This document is intended as a guide for people who are looking to introduce semantic search to a novel
field and want to find out if Qdrant is a good solution for their use case.&lt;/p></description></item><item><title>Full-text filter and index are already available!</title><link>https://qdrant.tech/articles/qdrant-introduces-full-text-filters-and-indexes/</link><pubDate>Wed, 16 Nov 2022 00:00:00 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/qdrant-introduces-full-text-filters-and-indexes/</guid><description>&lt;p>Qdrant is designed as an efficient vector database, allowing for a quick search of the nearest neighbours. But, you may find yourself in need of applying some extra filtering on top of the semantic search. Up to version 0.10, Qdrant was offering support for keywords only. Since 0.10, there is a possibility to apply full-text constraints as well. There is a new type of filter that you can use to do that, also combined with every other filter type.&lt;/p></description></item><item><title>Optimizing Semantic Search by Managing Multiple Vectors</title><link>https://qdrant.tech/articles/storing-multiple-vectors-per-object-in-qdrant/</link><pubDate>Wed, 05 Oct 2022 00:00:00 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/storing-multiple-vectors-per-object-in-qdrant/</guid><description>&lt;h1 id="how-to-optimize-vector-storage-by-storing-multiple-vectors-per-object">How to Optimize Vector Storage by Storing Multiple Vectors Per Object&lt;/h1>
&lt;p>In a real-world scenario, a single object might be described in several different ways. If you run an e-commerce business, your items will typically have a name, a longer textual description, and a bunch of photos. While cooking, you may care about the list of ingredients and the description of the taste, but also the recipe and the way your meal is going to look. Until now, if you wanted to enable &lt;a href="https://qdrant.tech/documentation/tutorials/search-beginners/" target="_blank" rel="noopener nofollow">semantic search&lt;/a> with multiple vectors per object, Qdrant would require you to create separate collections for each vector type, even though they could share some other attributes in a payload. However, since Qdrant 0.10 you can store all those vectors together in the same collection and share a single copy of the payload!&lt;/p></description></item><item><title>Mastering Batch Search for Vector Optimization</title><link>https://qdrant.tech/articles/batch-vector-search-with-qdrant/</link><pubDate>Mon, 26 Sep 2022 00:00:00 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/articles/batch-vector-search-with-qdrant/</guid><description>&lt;h1 id="how-to-optimize-vector-search-using-batch-search-in-qdrant-0100">How to Optimize Vector Search Using Batch Search in Qdrant 0.10.0&lt;/h1>
&lt;p>The latest release of Qdrant 0.10.0 has introduced a lot of functionalities that simplify some common tasks. Those new possibilities come with some slightly modified interfaces of the client library. One of the recently introduced features is the possibility to query the collection with &lt;a href="https://qdrant.tech/blog/storing-multiple-vectors-per-object-in-qdrant/" target="_blank" rel="noopener nofollow">multiple vectors&lt;/a> at once — a batch search mechanism.&lt;/p>
&lt;p>There are a lot of scenarios in which you may need to perform multiple unrelated tasks at the same time. Previously, you could only send several separate requests to the Qdrant API on your own. But multiple parallel requests may cause significant network overhead and slow down the process, especially over a poor connection.&lt;/p></description></item><item><title>Qdrant supports ARM architecture!</title><link>https://qdrant.tech/blog/qdrant-supports-arm-architecture/</link><pubDate>Wed, 21 Sep 2022 09:49:53 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-supports-arm-architecture/</guid><description>&lt;p>The processor architecture is a thing that the end user typically does not care much about, as long as all the applications they use run smoothly. If you use a PC, chances are you have an x86-based device, while your smartphone most likely runs on an ARM processor. In 2020 Apple introduced their ARM-based M1 chip, which is used in modern Mac devices, including notebooks. The main differences between those two architectures are the set of supported instructions and energy consumption. ARM processors have much better energy efficiency and are cheaper than their x86 counterparts. That’s why they became available as an affordable alternative from hosting providers, including in the cloud.&lt;/p></description></item><item><title>Qdrant has joined NVIDIA Inception Program</title><link>https://qdrant.tech/blog/qdrant-joined-nvidia-inception-program/</link><pubDate>Mon, 04 Apr 2022 12:06:36 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/blog/qdrant-joined-nvidia-inception-program/</guid><description>&lt;p>Recently we&amp;rsquo;ve become a member of the NVIDIA Inception Program. 
It is a program that helps boost the evolution of technology startups through access to NVIDIA&amp;rsquo;s cutting-edge technology and experts, connects startups with venture capitalists, and provides marketing support.&lt;/p>
&lt;p>Among the various opportunities it offers, we are most excited about GPU support, since it is an essential feature on Qdrant&amp;rsquo;s roadmap.
Stay tuned for our new updates.&lt;/p></description></item><item><title/><link>https://qdrant.tech/about-us/about-us-values/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/about-us/about-us-values/</guid><description/></item><item><title/><link>https://qdrant.tech/about-us/carousel/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/about-us/carousel/</guid><description/></item><item><title/><link>https://qdrant.tech/about-us/mission/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/about-us/mission/</guid><description/></item><item><title>Agno</title><link>https://qdrant.tech/documentation/frameworks/agno/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/agno/</guid><description>&lt;h1 id="agno">Agno&lt;/h1>
&lt;p>&lt;a href="https://github.com/agno-agi/agno" target="_blank" rel="noopener nofollow">Agno&lt;/a> is an incredibly fast multi-agent framework, runtime and UI. It enables you to build multi-agent systems with memory, knowledge, human-in-the-loop capabilities, and Model Context Protocol (MCP) support.&lt;/p>
&lt;p>You can orchestrate agents as multi-agent teams (providing more autonomy) or step-based agentic workflows (offering more control). Agno works seamlessly with Qdrant as a vector database for knowledge bases, enabling efficient storage and retrieval of information for your AI agents.&lt;/p></description></item><item><title>Airbyte</title><link>https://qdrant.tech/documentation/data-management/airbyte/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/data-management/airbyte/</guid><description>&lt;h1 id="airbyte">Airbyte&lt;/h1>
&lt;p>&lt;a href="https://airbyte.com/" target="_blank" rel="noopener nofollow">Airbyte&lt;/a> is an open-source data integration platform that helps you replicate your data
between different systems. It has a &lt;a href="https://docs.airbyte.io/integrations" target="_blank" rel="noopener nofollow">growing list of connectors&lt;/a> that can
be used to ingest data from multiple sources. Building data pipelines is also crucial for managing the data in
Qdrant, and Airbyte is a great tool for this purpose.&lt;/p>
&lt;p>Airbyte can take care of data ingestion from a selected source, while Qdrant helps you build a search
engine on top of it. There are three supported modes for ingesting data into Qdrant:&lt;/p></description></item><item><title>Aleph Alpha</title><link>https://qdrant.tech/documentation/embeddings/aleph-alpha/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/embeddings/aleph-alpha/</guid><description>&lt;h1 id="using-aleph-alpha-embeddings-with-qdrant">Using Aleph Alpha Embeddings with Qdrant&lt;/h1>
&lt;p>Aleph Alpha is a multimodal and multilingual embeddings provider. Their API creates embeddings for text and images
in the same latent space. They maintain an &lt;a href="https://github.com/Aleph-Alpha/aleph-alpha-client" target="_blank" rel="noopener nofollow">official Python client&lt;/a> that can be
installed with pip:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">pip install aleph-alpha-client
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Both synchronous and asynchronous clients are available. Obtaining the embeddings for an image and storing them in Qdrant can
be done in the following way:&lt;/p></description></item><item><title>Apache Airflow</title><link>https://qdrant.tech/documentation/data-management/airflow/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/data-management/airflow/</guid><description>&lt;h1 id="apache-airflow">Apache Airflow&lt;/h1>
&lt;p>&lt;a href="https://airflow.apache.org/" target="_blank" rel="noopener nofollow">Apache Airflow&lt;/a> is an open-source platform for authoring, scheduling and monitoring data and computing workflows. Airflow uses Python to create workflows that can be easily scheduled and monitored.&lt;/p>
&lt;p>Qdrant is available as a &lt;a href="https://airflow.apache.org/docs/apache-airflow-providers-qdrant/stable/index.html" target="_blank" rel="noopener nofollow">provider&lt;/a> in Airflow to interface with the database.&lt;/p>
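&lt;p>As a sketch of what the provider enables (this assumes the &lt;code>apache-airflow-providers-qdrant&lt;/code> package is installed and a &lt;code>qdrant_default&lt;/code> connection is configured; the &lt;code>prepare_points&lt;/code> helper and all names here are our own illustrative choices, not part of the provider):&lt;/p>

```python
# Hedged sketch: ingest vectors into Qdrant from an Airflow DAG.

def prepare_points(texts, vectors):
    # Pair each text with its embedding; ids are just sequential integers here.
    ids = list(range(len(texts)))
    payload = [{"text": t} for t in texts]
    return ids, vectors, payload

def build_dag():
    # Imported lazily so prepare_points has no Airflow dependency.
    from airflow import DAG
    from airflow.providers.qdrant.operators.qdrant import QdrantIngestOperator
    import pendulum

    ids, vectors, payload = prepare_points(["hello"], [[0.1] * 384])
    with DAG(
        dag_id="qdrant_ingest",
        start_date=pendulum.datetime(2024, 1, 1),
        schedule=None,
    ) as dag:
        QdrantIngestOperator(
            task_id="ingest",
            conn_id="qdrant_default",
            collection_name="airflow_demo",
            ids=ids,
            vectors=vectors,
            payload=payload,
        )
    return dag
```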
&lt;h2 id="prerequisites">Prerequisites&lt;/h2>
&lt;p>Before configuring Airflow, you need:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>A Qdrant instance to connect to. You can set one up in our &lt;a href="https://qdrant.tech/documentation/operations/installation/">installation guide&lt;/a>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>A running Airflow instance. You can use their &lt;a href="https://airflow.apache.org/docs/apache-airflow/stable/start.html" target="_blank" rel="noopener nofollow">Quick Start Guide&lt;/a>.&lt;/p></description></item><item><title>Apache Spark</title><link>https://qdrant.tech/documentation/data-management/spark/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/data-management/spark/</guid><description>&lt;h1 id="apache-spark">Apache Spark&lt;/h1>
&lt;p>&lt;a href="https://spark.apache.org/" target="_blank" rel="noopener nofollow">Spark&lt;/a> is a distributed computing framework designed for big data processing and analytics. The &lt;a href="https://github.com/qdrant/qdrant-spark" target="_blank" rel="noopener nofollow">Qdrant-Spark connector&lt;/a> enables Qdrant to be a storage destination in Spark.&lt;/p>
&lt;h2 id="installation">Installation&lt;/h2>
&lt;p>To integrate the connector into your Spark environment, get the JAR file from one of the sources listed below.&lt;/p>
&lt;ul>
&lt;li>GitHub Releases&lt;/li>
&lt;/ul>
&lt;p>The packaged &lt;code>jar&lt;/code> file with all the required dependencies can be found &lt;a href="https://github.com/qdrant/qdrant-spark/releases" target="_blank" rel="noopener nofollow">here&lt;/a>.&lt;/p>
&lt;ul>
&lt;li>Building from Source&lt;/li>
&lt;/ul>
&lt;p>To build the &lt;code>jar&lt;/code> from source, you need &lt;a href="https://www.azul.com/downloads/#zulu" target="_blank" rel="noopener nofollow">JDK@8&lt;/a> and &lt;a href="https://maven.apache.org/" target="_blank" rel="noopener nofollow">Maven&lt;/a> installed. Once the requirements have been satisfied, run the following command in the &lt;a href="https://github.com/qdrant/qdrant-spark" target="_blank" rel="noopener nofollow">project root&lt;/a>.&lt;/p></description></item><item><title>Apify</title><link>https://qdrant.tech/documentation/platforms/apify/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/platforms/apify/</guid><description>&lt;h1 id="apify">Apify&lt;/h1>
&lt;p>&lt;a href="https://apify.com/" target="_blank" rel="noopener nofollow">Apify&lt;/a> is a web scraping and browser automation platform featuring an &lt;a href="https://apify.com/store" target="_blank" rel="noopener nofollow">app store&lt;/a> with over 1,500 pre-built micro-apps known as Actors. These serverless cloud programs, which are essentially dockers under the hood, are designed for various web automation applications, including data collection.&lt;/p>
&lt;p>One such Actor, built especially for AI and RAG applications, is &lt;a href="https://apify.com/apify/website-content-crawler" target="_blank" rel="noopener nofollow">Website Content Crawler&lt;/a>.&lt;/p>
&lt;p>It&amp;rsquo;s ideal for this purpose because it has built-in HTML processing and data-cleaning functions. That means you can easily remove fluff, duplicates, and other things on a web page that aren&amp;rsquo;t relevant, and provide only the necessary data to the language model.&lt;/p></description></item><item><title>AutoGen</title><link>https://qdrant.tech/documentation/frameworks/autogen/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/autogen/</guid><description>&lt;h1 id="microsoft-autogen">Microsoft AutoGen&lt;/h1>
&lt;p>&lt;a href="https://github.com/microsoft/autogen/tree/0.2" target="_blank" rel="noopener nofollow">AutoGen&lt;/a> is an open-source programming framework for building AI agents and facilitating cooperation among multiple agents to solve tasks.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Multi-agent conversations: AutoGen agents can communicate with each other to solve tasks. This allows for more complex and sophisticated applications than would be possible with a single LLM.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Customization: AutoGen agents can be customized to meet the specific needs of an application. This includes the ability to choose the LLMs to use, the types of human input to allow, and the tools to employ.&lt;/p></description></item><item><title>AWS Bedrock</title><link>https://qdrant.tech/documentation/embeddings/bedrock/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/embeddings/bedrock/</guid><description>&lt;h1 id="bedrock-embeddings">Bedrock Embeddings&lt;/h1>
&lt;p>You can use &lt;a href="https://aws.amazon.com/bedrock/" target="_blank" rel="noopener nofollow">AWS Bedrock&lt;/a> with Qdrant. AWS Bedrock supports multiple &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/models-supported.html" target="_blank" rel="noopener nofollow">embedding model providers&lt;/a>.&lt;/p>
&lt;p>You&amp;rsquo;ll need the following information from your AWS account:&lt;/p>
&lt;ul>
&lt;li>Region&lt;/li>
&lt;li>Access key ID&lt;/li>
&lt;li>Secret key&lt;/li>
&lt;/ul>
&lt;p>To configure your credentials, review the following AWS article: &lt;a href="https://repost.aws/knowledge-center/create-access-key" target="_blank" rel="noopener nofollow">How do I create an AWS access key&lt;/a>.&lt;/p>
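&lt;p>For illustration, a minimal sketch of calling a Titan embedding model through &lt;code>boto3&lt;/code> (assumes boto3 is installed and AWS credentials are configured; the helper names are ours):&lt;/p>

```python
import json

def titan_request_body(text: str) -> str:
    # Titan text-embedding models take a JSON body with a single "inputText" field.
    return json.dumps({"inputText": text})

def embed_text(text: str, region: str = "us-east-1") -> list:
    # Requires AWS credentials (region, access key ID, secret key) in the environment.
    import boto3  # deferred import: the request builder above needs no AWS SDK
    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=titan_request_body(text),
    )
    result = json.loads(response["body"].read())
    return result["embedding"]  # sentence embedding, size 1536 for Titan Embeddings G1 - Text
```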
&lt;p>With the following code sample, you can generate embeddings using the &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/titan-embedding-models.html" target="_blank" rel="noopener nofollow">Titan Embeddings G1 - Text model&lt;/a> which produces sentence embeddings of size 1536.&lt;/p></description></item><item><title>AWS Lakechain</title><link>https://qdrant.tech/documentation/frameworks/lakechain/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/lakechain/</guid><description>&lt;h1 id="aws-lakechain">AWS Lakechain&lt;/h1>
&lt;p>&lt;a href="https://awslabs.github.io/project-lakechain/" target="_blank" rel="noopener nofollow">Project Lakechain&lt;/a> is a framework based on the AWS Cloud Development Kit (CDK), allowing to express and deploy scalable document processing pipelines on AWS using infrastructure-as-code. It emphasizes on modularity and extensibility of pipelines, and provides 60+ ready to use components for prototyping complex processing pipelines that scale out of the box to millions of documents.&lt;/p>
&lt;p>The Qdrant storage connector available with Lakechain enables uploading vector embeddings produced by other middlewares to a Qdrant collection.&lt;/p></description></item><item><title>Brand Resources</title><link>https://qdrant.tech/about-us/about-us-resources/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/about-us/about-us-resources/</guid><description/></item><item><title>Bug Bounty Program</title><link>https://qdrant.tech/security/bug-bounty-program/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/security/bug-bounty-program/</guid><description>&lt;h1 id="bug-bounty-program-overview">Bug Bounty Program Overview&lt;/h1>
&lt;p>We prioritize user trust and adhere to the highest privacy and security standards. This is why we actively invite security experts to identify vulnerabilities and commit to collaborating with them to resolve issues swiftly and effectively.
Qdrant values the security research community and supports the responsible disclosure of vulnerabilities in our products and services. Through our bug bounty program, we reward researchers who help enhance the security of our platform.&lt;/p></description></item><item><title>BuildShip</title><link>https://qdrant.tech/documentation/platforms/buildship/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/platforms/buildship/</guid><description>&lt;h1 id="buildship">BuildShip&lt;/h1>
&lt;p>&lt;a href="https://buildship.com/" target="_blank" rel="noopener nofollow">BuildShip&lt;/a> is a low-code visual builder to create APIs, scheduled jobs, and backend workflows with AI assistance.&lt;/p>
&lt;p>You can use the &lt;a href="https://buildship.com/integrations/qdrant" target="_blank" rel="noopener nofollow">Qdrant integration&lt;/a> to develop workflows with semantic-search capabilities.&lt;/p>
&lt;h2 id="prerequisites">Prerequisites&lt;/h2>
&lt;ol>
&lt;li>A Qdrant instance to connect to. You can get a free cloud instance at &lt;a href="https://cloud.qdrant.io/" target="_blank" rel="noopener nofollow">cloud.qdrant.io&lt;/a>.&lt;/li>
&lt;li>A &lt;a href="https://buildship.app/" target="_blank" rel="noopener nofollow">BuildsShip&lt;/a> for developing workflows.&lt;/li>
&lt;/ol>
&lt;h2 id="nodes">Nodes&lt;/h2>
&lt;p>Nodes are the fundamental building blocks of BuildShip. Each is responsible for an operation in your workflow.&lt;/p>
&lt;p>The Qdrant integration includes the following nodes, which can be extended if required.&lt;/p></description></item><item><title>CamelAI</title><link>https://qdrant.tech/documentation/frameworks/camel/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/camel/</guid><description>&lt;h1 id="camel">Camel&lt;/h1>
&lt;p>&lt;a href="https://www.camel-ai.org" target="_blank" rel="noopener nofollow">Camel&lt;/a> is a Python framework to build and use LLM-based agents for real-world task solving.&lt;/p>
&lt;p>Qdrant is available as a storage mechanism in Camel for ingesting and retrieving semantically similar data.&lt;/p>
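&lt;p>&amp;ldquo;Semantically similar&amp;rdquo; here means nearest by a vector distance such as cosine (the &lt;code>VectorDistance.COSINE&lt;/code> setting in the usage section). A minimal, dependency-free illustration of that ranking, with names of our own choosing:&lt;/p>

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|); 1.0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query, records, k=2):
    # Rank stored (key, vector) pairs by similarity to the query, highest first.
    scored = [(cosine_similarity(query, vec), key) for key, vec in records]
    return [key for _, key in sorted(scored, reverse=True)[:k]]
```

A vector database applies the same idea, but with indexing structures that avoid scanning every stored vector.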
&lt;h2 id="usage-with-qdrant">Usage With Qdrant&lt;/h2>
&lt;ul>
&lt;li>Install Camel with the &lt;code>vector-databases&lt;/code> extra.&lt;/li>
&lt;/ul>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">pip install &lt;span class="s2">&amp;#34;camel[vector-databases]&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;ul>
&lt;li>Configure the &lt;code>QdrantStorage&lt;/code> class.&lt;/li>
&lt;/ul>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">camel.storages&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">QdrantStorage&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">VectorDBQuery&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">VectorRecord&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">camel.types&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">VectorDistance&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">qdrant_storage&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">QdrantStorage&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">url_and_api_key&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;https://xyz-example.eu-central.aws.cloud.qdrant.io:6333&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;&amp;lt;provide-your-own-key&amp;gt;&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">collection_name&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="si">{collection_name}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">distance&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">VectorDistance&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">COSINE&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">vector_dim&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">384&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The &lt;code>QdrantStorage&lt;/code> class implements methods to read and write to a Qdrant instance. An instance of this class can now be passed to retrievers for interfacing with your Qdrant collections.&lt;/p></description></item><item><title>Cheshire Cat</title><link>https://qdrant.tech/documentation/frameworks/cheshire-cat/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/cheshire-cat/</guid><description>&lt;h1 id="cheshire-cat">Cheshire Cat&lt;/h1>
&lt;p>&lt;a href="https://cheshirecat.ai/" target="_blank" rel="noopener nofollow">Cheshire Cat&lt;/a> is an open-source framework that allows you to develop intelligent agents on top of many Large Language Models (LLM). You can develop your custom AI architecture to assist you in a wide range of tasks.&lt;/p>
&lt;p>&lt;img src="https://qdrant.tech/documentation/frameworks/cheshire-cat/cat.jpg" alt="Cheshire cat">&lt;/p>
&lt;h2 id="cheshire-cat-and-qdrant">Cheshire Cat and Qdrant&lt;/h2>
&lt;p>Cheshire Cat uses Qdrant as the default &lt;a href="https://cheshire-cat-ai.github.io/docs/faq/llm-concepts/vector-memory/" target="_blank" rel="noopener nofollow">Vector Memory&lt;/a> for ingesting and retrieving documents.&lt;/p>
&lt;pre tabindex="0">&lt;code># Decide host and port for your Cat. Default will be localhost:1865
CORE_HOST=localhost
CORE_PORT=1865

# Qdrant server
# QDRANT_HOST=localhost
# QDRANT_PORT=6333
&lt;/code>&lt;/pre>&lt;p>Cheshire Cat takes great advantage of the following features of Qdrant:&lt;/p></description></item><item><title>Chonkie</title><link>https://qdrant.tech/documentation/data-management/chonkie/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/data-management/chonkie/</guid><description>&lt;h1 id="chonkie">Chonkie&lt;/h1>
&lt;p>&lt;a href="https://github.com/chonkie-inc/chonkie" target="_blank" rel="noopener nofollow">Chonkie&lt;/a> is a no-nonsense, ultra-light, and lightning-fast chunking library designed for RAG (Retrieval-Augmented Generation) applications.&lt;/p>
&lt;p>Chonkie integrates seamlessly with Qdrant through the &lt;strong>QdrantHandshake&lt;/strong> class, allowing you to chunk, embed, and store text data without ever leaving the Chonkie SDK.&lt;/p>
&lt;h2 id="setup">Setup&lt;/h2>
&lt;p>Install Chonkie with Qdrant support:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">pip install &lt;span class="s2">&amp;#34;chonkie[qdrant]&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="basic-usage">Basic Usage&lt;/h2>
&lt;p>The &lt;code>QdrantHandshake&lt;/code> provides a simple interface for storing and searching chunks:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">chonkie&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">QdrantHandshake&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">SemanticChunker&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Initialize handshake with custom embedding model&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">handshake&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">QdrantHandshake&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">url&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;http://localhost:6333&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">collection_name&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;my_documents&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">embedding_model&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;sentence-transformers/all-MiniLM-L6-v2&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Create and write chunks&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">chunker&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">SemanticChunker&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">chunks&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">chunker&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">chunk&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;Your text content here...&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">handshake&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">write&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">chunks&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Search using natural language&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">results&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">handshake&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">search&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">query&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;your search query&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">limit&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">5&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">for&lt;/span> &lt;span class="n">result&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">results&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">result&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;score&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">: &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">result&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;text&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="qdrant-cloud">Qdrant Cloud&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">handshake&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">QdrantHandshake&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">url&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;https://your-cluster.qdrant.io&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">api_key&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;your-api-key&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">collection_name&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;my_collection&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">embedding_model&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;BAAI/bge-small-en-v1.5&amp;#34;&lt;/span> &lt;span class="c1"># Change to your preferred model&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="complete-rag-pipeline">Complete RAG Pipeline&lt;/h2>
&lt;p>Build end-to-end RAG pipelines using Chonkie&amp;rsquo;s fluent Pipeline API:&lt;/p></description></item><item><title>CocoIndex</title><link>https://qdrant.tech/documentation/data-management/cocoindex/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/data-management/cocoindex/</guid><description>&lt;h1 id="cocoindex">CocoIndex&lt;/h1>
&lt;p>&lt;a href="https://cocoindex.io" target="_blank" rel="noopener nofollow">CocoIndex&lt;/a> is a high performance ETL framework to transform data for AI, with real-time incremental processing.&lt;/p>
&lt;p>Qdrant is available as a native built-in vector database to store and retrieve embeddings.&lt;/p>
&lt;p>Install CocoIndex:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">pip install -U cocoindex
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Install Postgres with &lt;a href="https://docs.docker.com/compose/install/" target="_blank" rel="noopener nofollow">Docker Compose&lt;/a>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">docker compose -f &amp;lt;&lt;span class="o">(&lt;/span>curl -L https://raw.githubusercontent.com/cocoindex-io/cocoindex/refs/heads/main/dev/postgres.yaml&lt;span class="o">)&lt;/span> up -d
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>CocoIndex is a stateful ETL framework and only processes data that has changed. It uses Postgres as a metadata store to track the state of the data.&lt;/p></description></item><item><title>Cognee</title><link>https://qdrant.tech/documentation/frameworks/cognee/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/cognee/</guid><description>&lt;h1 id="cognee">Cognee&lt;/h1>
&lt;p>Embeddings make it easy to retrieve similar chunks of information, but most agent tasks require more: structure, temporal context, and cross-document reasoning. That&amp;rsquo;s where Cognee comes in: it turns raw data sources into AI memory, a semantic data layer based on a modular, queryable knowledge graph backed by embeddings, so agents can retrieve, reason, and remember with structure.&lt;/p>
&lt;h2 id="why-qdrant-for-the-memory-layer">Why Qdrant For The Memory Layer&lt;/h2>
&lt;p>At runtime, &lt;a href="https://www.cognee.ai/" target="_blank" rel="noopener nofollow">Cognee&lt;/a>&amp;rsquo;s semantic memory layer requires fast and predictable lookups to surface candidates for graph reasoning, as well as tight control over metadata to ground multi-hop traversals. Qdrant&amp;rsquo;s design aligns with those needs with its:&lt;/p></description></item><item><title>Cohere</title><link>https://qdrant.tech/documentation/embeddings/cohere/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/embeddings/cohere/</guid><description>&lt;h1 id="cohere">Cohere&lt;/h1>
&lt;p>Qdrant is compatible with the Cohere &lt;a href="https://docs.cohere.ai/reference/embed" target="_blank" rel="noopener nofollow">co.embed API&lt;/a> and its official Python SDK, which
can be installed like any other package:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">pip install cohere
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The embeddings returned by the co.embed API can be used directly in the Qdrant client&amp;rsquo;s calls:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">cohere&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">qdrant_client&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">qdrant_client.models&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">Batch&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">cohere_client&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">cohere&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Client&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;&amp;lt;&amp;lt; your_api_key &amp;gt;&amp;gt;&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">qdrant_client&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">qdrant_client&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">QdrantClient&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">qdrant_client&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">upsert&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">collection_name&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;MyCollection&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">points&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">Batch&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ids&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">vectors&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">cohere_client&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">embed&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">model&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;large&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">texts&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;The best vector database&amp;#34;&lt;/span>&lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">embeddings&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>If you are interested in seeing an end-to-end project created with co.embed API and Qdrant, please check out the
&amp;ldquo;&lt;a href="https://qdrant.tech/articles/qa-with-cohere-and-qdrant/">Question Answering as a Service with Cohere and Qdrant&lt;/a>&amp;rdquo; article.&lt;/p></description></item><item><title>Compare all Qdrant Cloud capabilities</title><link>https://qdrant.tech/pricing/comparison/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/pricing/comparison/</guid><description/></item><item><title>Confluent Kafka</title><link>https://qdrant.tech/documentation/data-management/confluent/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/data-management/confluent/</guid><description>&lt;p>&lt;img src="https://qdrant.tech/documentation/frameworks/confluent/confluent-logo.png" alt="Confluent Logo">&lt;/p>
&lt;p>Built by the original creators of Apache Kafka®, &lt;a href="https://www.confluent.io/confluent-cloud/?utm_campaign=tm.pmm_cd.cwc_partner_Qdrant_generic&amp;amp;utm_source=Qdrant&amp;amp;utm_medium=partnerref" target="_blank" rel="noopener nofollow">Confluent Cloud&lt;/a> is a cloud-native and complete data streaming platform available on AWS, Azure, and Google Cloud. The platform includes a fully managed, elastically scaling Kafka engine, 120+ connectors, serverless Apache Flink®, enterprise-grade security controls, and a robust governance suite.&lt;/p>
&lt;p>With our &lt;a href="https://github.com/qdrant/qdrant-kafka" target="_blank" rel="noopener nofollow">Qdrant-Kafka Sink Connector&lt;/a>, Qdrant is part of the &lt;a href="https://www.confluent.io/partners/connect/" target="_blank" rel="noopener nofollow">Connect with Confluent&lt;/a> technology partner program. It brings fully managed data streams from Confluent Cloud directly to organizations, making it easier to stream any data to Qdrant with a fully managed Apache Kafka service.&lt;/p></description></item><item><title>Credits</title><link>https://qdrant.tech/legal/credits/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/legal/credits/</guid><description>&lt;p>Icons made by &lt;a href="https://www.flaticon.com/authors/srip" target="_blank" rel="noopener nofollow">srip&lt;/a> from &lt;a href="https://www.flaticon.com/" target="_blank" rel="noopener nofollow">flaticon.com&lt;/a>&lt;/p>
&lt;p>Email Marketing Vector created by &lt;a href="https://de.freepik.com/vektoren/geschaeft" target="_blank" rel="noopener nofollow">storyset&lt;/a> from &lt;a href="https://www.freepik.com/" target="_blank" rel="noopener nofollow">freepik.com&lt;/a>&lt;/p></description></item><item><title>CrewAI</title><link>https://qdrant.tech/documentation/frameworks/crewai/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/crewai/</guid><description>&lt;h1 id="crewai">CrewAI&lt;/h1>
&lt;p>&lt;a href="https://www.crewai.com" target="_blank" rel="noopener nofollow">CrewAI&lt;/a> is a framework for orchestrating role-playing, autonomous AI agents. By leveraging collaborative intelligence, CrewAI allows agents to work together seamlessly, tackling complex tasks.&lt;/p>
&lt;p>The framework has a sophisticated memory system designed to significantly enhance the capabilities of AI agents. This system helps agents remember, reason, and learn from past interactions. You can use Qdrant to store the short-term memory and entity memories of CrewAI agents.&lt;/p>
&lt;ul>
&lt;li>Short-Term Memory&lt;/li>
&lt;/ul>
&lt;p>Temporarily stores recent interactions and outcomes using RAG, enabling agents to recall and use information relevant to their current context during the current execution.&lt;/p></description></item><item><title>Dagster</title><link>https://qdrant.tech/documentation/frameworks/dagster/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/dagster/</guid><description>&lt;h1 id="dagster">Dagster&lt;/h1>
&lt;p>&lt;a href="https://dagster.io" target="_blank" rel="noopener nofollow">Dagster&lt;/a> is a Python framework for data orchestration built for data engineers, with integrated lineage, observability, a declarative programming model, and best-in-class testability.&lt;/p>
&lt;p>The &lt;code>dagster-qdrant&lt;/code> library lets you integrate Qdrant&amp;rsquo;s vector database with Dagster, making it easy to build AI-driven data pipelines. You can run vector searches and manage data directly within Dagster.&lt;/p>
&lt;h3 id="installation">Installation&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">pip install dagster dagster-qdrant
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="example">Example&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">dagster_qdrant&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">QdrantConfig&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">QdrantResource&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">dagster&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">dg&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@dg.asset&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">my_table&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">qdrant_resource&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">QdrantResource&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">with&lt;/span> &lt;span class="n">qdrant_resource&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">get_client&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">qdrant&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">qdrant&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">add&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">collection_name&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;test_collection&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">documents&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;This is a document about oranges&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;This is a document about pineapples&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;This is a document about strawberries&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;This is a document about cucumbers&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">results&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">qdrant&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">query&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">collection_name&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;test_collection&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">query_text&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;hawaii&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">limit&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">3&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">defs&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">dg&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Definitions&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">assets&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">my_table&lt;/span>&lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">resources&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;qdrant_resource&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">QdrantResource&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">config&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">QdrantConfig&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">host&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;xyz-example.eu-central.aws.cloud.qdrant.io&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">api_key&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;&amp;lt;your-api-key&amp;gt;&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="next-steps">Next steps&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Dagster &lt;a href="https://docs.dagster.io" target="_blank" rel="noopener nofollow">documentation&lt;/a>&lt;/p></description></item><item><title>Datadog</title><link>https://qdrant.tech/documentation/observability/datadog/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/observability/datadog/</guid><description>&lt;p>&lt;img src="https://qdrant.tech/documentation/observability/datadog/datadog-cover.jpg" alt="Datadog Cover">&lt;/p>
&lt;p>&lt;a href="https://www.datadoghq.com/" target="_blank" rel="noopener nofollow">Datadog&lt;/a> is a cloud-based monitoring and analytics platform that offers real-time monitoring of servers, databases, and numerous other tools and services. It provides visibility into the performance of applications and enables businesses to detect issues before they affect users.&lt;/p>
&lt;p>You can install the &lt;a href="https://docs.datadoghq.com/integrations/qdrant/" target="_blank" rel="noopener nofollow">Qdrant integration&lt;/a> to get real-time metrics for monitoring your Qdrant deployment within Datadog, including:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>The performance of REST and gRPC interfaces with metrics such as total requests, total failures, and time to serve to identify potential bottlenecks and mitigate them.&lt;/p></description></item><item><title>DeepEval</title><link>https://qdrant.tech/documentation/frameworks/deepeval/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/deepeval/</guid><description>&lt;h1 id="deepeval">DeepEval&lt;/h1>
&lt;p>&lt;a href="https://deepeval.com" target="_blank" rel="noopener nofollow">DeepEval&lt;/a> by Confident AI is an open-source framework for testing large language model systems. Similar to Pytest but designed for LLM outputs, it evaluates metrics like G-Eval, hallucination, answer relevancy.&lt;/p>
&lt;p>DeepEval can be integrated with Qdrant to evaluate RAG pipelines — ensuring your LLM applications return relevant, grounded, and faithful responses based on retrieved vector search context.&lt;/p>
&lt;h2 id="how-it-works">How it works&lt;/h2>
&lt;p>A test case is a blueprint provided by DeepEval to unit test LLM outputs. There are two types of test cases in DeepEval:&lt;/p></description></item><item><title>DLT</title><link>https://qdrant.tech/documentation/data-management/dlt/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/data-management/dlt/</guid><description>&lt;h1 id="dltdata-load-tool">DLT (Data Load Tool)&lt;/h1>
&lt;p>&lt;a href="https://dlthub.com/" target="_blank" rel="noopener nofollow">DLT&lt;/a> is an open-source library that you can add to your Python scripts to load data from various and often messy data sources into well-structured, live datasets.&lt;/p>
&lt;p>With the DLT-Qdrant integration, you can now select Qdrant as a DLT destination to load data into.&lt;/p>
&lt;p>&lt;strong>DLT Enables&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Automated maintenance - with schema inference, alerts and short declarative code, maintenance becomes simple.&lt;/li>
&lt;li>Run it where Python runs - on Airflow, serverless functions, notebooks. Scales on micro and large infrastructure alike.&lt;/li>
&lt;li>User-friendly, declarative interface that removes knowledge obstacles for beginners while empowering senior professionals.&lt;/li>
&lt;/ul>
&lt;h2 id="usage">Usage&lt;/h2>
&lt;p>To get started, install &lt;code>dlt&lt;/code> with the &lt;code>qdrant&lt;/code> extra.&lt;/p></description></item><item><title>Dynamiq</title><link>https://qdrant.tech/documentation/frameworks/dynamiq/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/dynamiq/</guid><description>&lt;h1 id="dynamiq">Dynamiq&lt;/h1>
&lt;p>Dynamiq is your all-in-one Gen AI framework, designed to streamline the development of AI-powered applications. Dynamiq specializes in orchestrating retrieval-augmented generation (RAG) and large language model (LLM) agents.&lt;/p>
&lt;p>Qdrant is a vector database available in Dynamiq, capable of serving multiple roles. It can be used for writing and retrieving documents, acting as memory for agent interactions, and functioning as a retrieval tool that agents can call when needed.&lt;/p>
&lt;h2 id="installing">Installing&lt;/h2>
&lt;p>First, ensure you have the &lt;code>dynamiq&lt;/code> library installed:&lt;/p></description></item><item><title>Explore the Qdrant Ecosystem</title><link>https://qdrant.tech/documentation/ecosystem/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/ecosystem/</guid><description/></item><item><title>Feast</title><link>https://qdrant.tech/documentation/frameworks/feast/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/feast/</guid><description>&lt;h2 id="feast">Feast&lt;/h2>
&lt;p>&lt;a href="https://docs.feast.dev" target="_blank" rel="noopener nofollow">Feast (&lt;strong>Fe&lt;/strong>ature &lt;strong>St&lt;/strong>ore)&lt;/a> is an open-source feature store that helps teams operate production ML systems at scale by allowing them to define, manage, validate, and serve features for production AI/ML.&lt;/p>
&lt;p>Qdrant is available as a supported vector store in Feast, ready to integrate into your workflows.&lt;/p>
&lt;h2 id="insatallation">Insatallation&lt;/h2>
&lt;p>To use the Qdrant online store, you need to install Feast with the &lt;code>qdrant&lt;/code> extra.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">pip install &lt;span class="s1">&amp;#39;feast[qdrant]&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="usage">Usage&lt;/h2>
&lt;p>An example config with Qdrant could look like:&lt;/p></description></item><item><title>FiftyOne</title><link>https://qdrant.tech/documentation/frameworks/fifty-one/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/fifty-one/</guid><description>&lt;h1 id="fiftyone">FiftyOne&lt;/h1>
&lt;p>&lt;a href="https://voxel51.com/" target="_blank" rel="noopener nofollow">FiftyOne&lt;/a> is an open-source toolkit designed to enhance computer vision workflows by optimizing dataset quality
and providing valuable insights about your models. FiftyOne 0.20 includes a native integration with Qdrant, supporting workflows
like &lt;a href="https://docs.voxel51.com/user_guide/brain.html#image-similarity" target="_blank" rel="noopener nofollow">image similarity search&lt;/a> and
&lt;a href="https://docs.voxel51.com/user_guide/brain.html#text-similarity" target="_blank" rel="noopener nofollow">text search&lt;/a>.&lt;/p>
&lt;p>Qdrant helps FiftyOne find the most similar images in the dataset using vector embeddings.&lt;/p>
&lt;p>FiftyOne is available as a Python package that can be installed like any other:&lt;/p></description></item><item><title>Firebase Genkit</title><link>https://qdrant.tech/documentation/frameworks/genkit/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/genkit/</guid><description>&lt;h1 id="firebase-genkit">Firebase Genkit&lt;/h1>
&lt;p>&lt;a href="https://firebase.google.com/products/genkit" target="_blank" rel="noopener nofollow">Genkit&lt;/a> is a framework to build, deploy, and monitor production-ready AI-powered apps.&lt;/p>
&lt;p>You can build apps that generate custom content, use semantic search, handle unstructured inputs, answer questions with your business data, autonomously make decisions, orchestrate tool calls, and more.&lt;/p>
&lt;p>You can use Qdrant for indexing/semantic retrieval of data in your Genkit applications via the &lt;a href="https://github.com/qdrant/qdrant-genkit" target="_blank" rel="noopener nofollow">Qdrant-Genkit plugin&lt;/a>.&lt;/p>
&lt;p>Genkit currently supports server-side development in JavaScript/TypeScript (Node.js) with Go support in active development.&lt;/p></description></item><item><title>Gemini</title><link>https://qdrant.tech/documentation/embeddings/gemini/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/embeddings/gemini/</guid><description>&lt;h1 id="gemini">Gemini&lt;/h1>
&lt;p>&lt;a href="https://ai.google.dev/gemini-api/docs/embeddings" target="_blank" rel="noopener nofollow">Google Gemini&lt;/a> provides embedding models that are capable of mapping text, image, video, audio, and PDFs and their interleaved combinations thereof into a single, unified vector space. Built on the Gemini architecture, it supports 100+ languages.&lt;/p>
&lt;p>The following example shows how to integrate Gemini embeddings with Qdrant:&lt;/p>
&lt;h2 id="setup">Setup&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Install the packages from PyPI&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># pip install google-genai qdrant-client&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Install the packages from npm
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// npm install @google/genai @qdrant/js-client-rest
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Let&amp;rsquo;s see how to use the Embedding Model API to embed documents for retrieval.&lt;/p></description></item><item><title>Google ADK</title><link>https://qdrant.tech/documentation/frameworks/google-adk/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/google-adk/</guid><description>&lt;h1 id="google-adk">Google ADK&lt;/h1>
&lt;p>&lt;a href="https://github.com/google/adk-python" target="_blank" rel="noopener nofollow">Agent Development Kit (ADK)&lt;/a> is an open-source, code-first Python framework from Google for building, evaluating, and deploying sophisticated AI agents. While optimized for Gemini, ADK is model-agnostic and compatible with other frameworks.&lt;/p>
&lt;p>You can connect ADK agents to Qdrant using the &lt;a href="https://github.com/qdrant/mcp-server-qdrant/" target="_blank" rel="noopener nofollow">Qdrant MCP Server&lt;/a>, giving your agent the ability to store and retrieve information using semantic search.&lt;/p>
&lt;h2 id="installation">Installation&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">pip install google-adk
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="usage">Usage&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">google.adk.agents&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">Agent&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">google.adk.tools.mcp_tool&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">McpToolset&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">google.adk.tools.mcp_tool.mcp_session_manager&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">StdioConnectionParams&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">mcp&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">StdioServerParameters&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">QDRANT_URL&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;http://localhost:6333&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">COLLECTION_NAME&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;my_collection&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">root_agent&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Agent&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">model&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;gemini-2.5-pro&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">name&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;qdrant_agent&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">instruction&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;Help users store and retrieve information using semantic search&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">tools&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">McpToolset&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">connection_params&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">StdioConnectionParams&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">server_params&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">StdioServerParameters&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">command&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;uvx&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">args&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;mcp-server-qdrant&amp;#34;&lt;/span>&lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">env&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;QDRANT_URL&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">QDRANT_URL&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;COLLECTION_NAME&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">COLLECTION_NAME&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">timeout&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">30&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>For available tools and configuration options, see the &lt;a href="https://github.com/qdrant/mcp-server-qdrant/" target="_blank" rel="noopener nofollow">Qdrant MCP Server documentation&lt;/a>.&lt;/p></description></item><item><title>Haystack</title><link>https://qdrant.tech/documentation/frameworks/haystack/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/haystack/</guid><description>&lt;h1 id="haystack">Haystack&lt;/h1>
&lt;p>&lt;a href="https://haystack.deepset.ai/" target="_blank" rel="noopener nofollow">Haystack&lt;/a> serves as a comprehensive NLP framework, offering a modular methodology for constructing
cutting-edge generative AI, QA, and semantic knowledge base search systems. A critical element in contemporary NLP systems is an
efficient database for storing and retrieving extensive text data. Vector databases excel in this role, as they house vector
representations of text and implement effective methods for swift retrieval. Thus, we are happy to announce the integration
with Haystack - &lt;code>QdrantDocumentStore&lt;/code>. This document store is unique, as it is maintained externally by the Qdrant team.&lt;/p></description></item><item><title>HoneyHive</title><link>https://qdrant.tech/documentation/frameworks/honeyhive/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/honeyhive/</guid><description>&lt;h1 id="honeyhive">HoneyHive&lt;/h1>
&lt;p>&lt;a href="https://www.honeyhive.ai/" target="_blank" rel="noopener nofollow">HoneyHive&lt;/a> is an AI evaluation and observability platform for Generative AI applications. HoneyHive’s platform gives developers enterprise-grade tools to debug complex retrieval pipelines, evaluate performance over large test suites, monitor usage in real-time, and manage prompts within a shared workspace. Teams use HoneyHive to iterate faster, detect failures at scale, and deliver exceptional AI products.&lt;/p>
&lt;p>By integrating Qdrant with HoneyHive, you can:&lt;/p>
&lt;ul>
&lt;li>Trace vector database operations&lt;/li>
&lt;li>Monitor latency, embedding quality, and context relevance&lt;/li>
&lt;li>Evaluate retrieval performance in your RAG pipelines&lt;/li>
&lt;li>Optimize parameters such as &lt;code>chunk_size&lt;/code> or &lt;code>chunk_overlap&lt;/code>&lt;/li>
&lt;/ul>
&lt;h2 id="prerequisites">Prerequisites&lt;/h2>
&lt;ul>
&lt;li>A HoneyHive account and API key&lt;/li>
&lt;li>Python 3.8+&lt;/li>
&lt;/ul>
&lt;h2 id="installation">Installation&lt;/h2>
&lt;p>Install the required packages:&lt;/p></description></item><item><title>Impressum</title><link>https://qdrant.tech/legal/impressum/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/legal/impressum/</guid><description>&lt;h1 id="impressum">Impressum&lt;/h1>
&lt;p>Angaben gemäß § 5 TMG&lt;/p>
&lt;p>Qdrant Solutions GmbH&lt;/p>
&lt;p>Chausseestraße 86&lt;br>
10115 Berlin&lt;/p>
&lt;h4 id="vertreten-durch">Vertreten durch:&lt;/h4>
&lt;p>André Zayarni&lt;/p>
&lt;h4 id="kontakt">Kontakt:&lt;/h4>
&lt;p>Telefon: +49 30 120 201 01&lt;/p>
&lt;p>E-Mail: &lt;a href="mailto:info@qdrant.com">info@qdrant.com&lt;/a>&lt;/p>
&lt;h4 id="registereintrag">Registereintrag:&lt;/h4>
&lt;p>Eintragung im Registergericht: Berlin Charlottenburg&lt;br>
Registernummer: HRB 235335 B&lt;/p>
&lt;h4 id="umsatzsteuer-id">Umsatzsteuer-ID:&lt;/h4>
&lt;p>Umsatzsteuer-Identifikationsnummer gemäß §27a Umsatzsteuergesetz: DE347779324&lt;/p>
&lt;h3 id="verantwortlich-für-den-inhalt-nach--55-abs-2-rstv">Verantwortlich für den Inhalt nach § 55 Abs. 2 RStV:&lt;/h3>
&lt;p>André Zayarni&lt;br>
Chausseestraße 86&lt;br>
10115 Berlin&lt;/p>
&lt;h2 id="haftungsausschluss">Haftungsausschluss:&lt;/h2>
&lt;h3 id="haftung-für-inhalte">Haftung für Inhalte&lt;/h3>
&lt;p>Die Inhalte unserer Seiten wurden mit größter Sorgfalt erstellt. Für die Richtigkeit, Vollständigkeit und Aktualität der Inhalte können wir jedoch keine Gewähr übernehmen. Als Diensteanbieter sind wir gemäß § 7 Abs.1 TMG für eigene Inhalte auf diesen Seiten nach den allgemeinen Gesetzen verantwortlich. Nach §§ 8 bis 10 TMG sind wir als Diensteanbieter jedoch nicht verpflichtet, übermittelte oder gespeicherte fremde Informationen zu überwachen oder nach Umständen zu forschen, die auf eine rechtswidrige Tätigkeit hinweisen. Verpflichtungen zur Entfernung oder Sperrung der Nutzung von Informationen nach den allgemeinen Gesetzen bleiben hiervon unberührt. Eine diesbezügliche Haftung ist jedoch erst ab dem Zeitpunkt der Kenntnis einer konkreten Rechtsverletzung möglich. Bei Bekanntwerden von entsprechenden Rechtsverletzungen werden wir diese Inhalte umgehend entfernen.&lt;/p></description></item><item><title>InfinyOn Fluvio</title><link>https://qdrant.tech/documentation/data-management/fluvio/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/data-management/fluvio/</guid><description>&lt;p>&lt;img src="https://qdrant.tech/documentation/data-management/fluvio/fluvio-logo.png" alt="Fluvio Logo">&lt;/p>
&lt;p>&lt;a href="https://www.fluvio.io/" target="_blank" rel="noopener nofollow">InfinyOn Fluvio&lt;/a> is an open-source platform written in Rust for high speed, real-time data processing. It is cloud native, designed to work with any infrastructure type, from bare metal hardware to containerized platforms.&lt;/p>
&lt;h2 id="usage-with-qdrant">Usage with Qdrant&lt;/h2>
&lt;p>With the &lt;a href="https://github.com/qdrant/qdrant-fluvio" target="_blank" rel="noopener nofollow">Qdrant Fluvio Connector&lt;/a>, you can stream records from Fluvio topics to Qdrant collections, leveraging Fluvio&amp;rsquo;s delivery guarantees and high-throughput.&lt;/p>
&lt;h3 id="pre-requisites">Pre-requisites&lt;/h3>
&lt;ul>
&lt;li>A Fluvio installation. You can refer to the &lt;a href="https://www.fluvio.io/docs/fluvio/quickstart/" target="_blank" rel="noopener nofollow">Fluvio Quickstart&lt;/a> for instructions.&lt;/li>
&lt;li>A Qdrant server to connect to. You can set up a &lt;a href="https://qdrant.tech/documentation/quickstart/">local instance&lt;/a> or a free cloud instance at &lt;a href="https://cloud.qdrant.io/" target="_blank" rel="noopener nofollow">cloud.qdrant.io&lt;/a>.&lt;/li>
&lt;/ul>
&lt;h3 id="downloading-the-connector">Downloading the connector&lt;/h3>
&lt;p>Run the following commands after &lt;a href="https://www.fluvio.io/docs/fluvio/quickstart" target="_blank" rel="noopener nofollow">setting up Fluvio&lt;/a>.&lt;/p></description></item><item><title>Jina Embeddings</title><link>https://qdrant.tech/documentation/embeddings/jina-embeddings/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/embeddings/jina-embeddings/</guid><description>&lt;h1 id="jina-embeddings">Jina Embeddings&lt;/h1>
&lt;p>Qdrant is compatible with &lt;a href="https://jina.ai/" target="_blank" rel="noopener nofollow">Jina AI&lt;/a> embeddings. You can get a free trial key from &lt;a href="https://jina.ai/embeddings/" target="_blank" rel="noopener nofollow">Jina Embeddings&lt;/a> to generate embeddings.&lt;/p>
&lt;p>Qdrant users can receive a 10% discount on Jina AI APIs by using the code &lt;strong>QDRANT&lt;/strong>.&lt;/p>
&lt;h2 id="technical-summary">Technical Summary&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th style="text-align: center">Model&lt;/th>
 &lt;th style="text-align: center">Dimension&lt;/th>
 &lt;th style="text-align: center">Language&lt;/th>
 &lt;th style="text-align: center">MRL (matryoshka)&lt;/th>
 &lt;th style="text-align: center">Context&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td style="text-align: center">&lt;strong>jina-embeddings-v4&lt;/strong>&lt;/td>
 &lt;td style="text-align: center">&lt;strong>2048 (single-vector), 128 (multi-vector)&lt;/strong>&lt;/td>
 &lt;td style="text-align: center">&lt;strong>Multilingual (30+)&lt;/strong>&lt;/td>
 &lt;td style="text-align: center">&lt;strong>Yes&lt;/strong>&lt;/td>
 &lt;td style="text-align: center">&lt;strong>32768 + Text/Image&lt;/strong>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: center">jina-clip-v2&lt;/td>
 &lt;td style="text-align: center">1024&lt;/td>
 &lt;td style="text-align: center">Multilingual (100+, focus on 30)&lt;/td>
 &lt;td style="text-align: center">Yes&lt;/td>
 &lt;td style="text-align: center">Text/Image&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: center">jina-embeddings-v3&lt;/td>
 &lt;td style="text-align: center">1024&lt;/td>
 &lt;td style="text-align: center">Multilingual (89 languages)&lt;/td>
 &lt;td style="text-align: center">Yes&lt;/td>
 &lt;td style="text-align: center">8192&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: center">jina-embeddings-v2-base-en&lt;/td>
 &lt;td style="text-align: center">768&lt;/td>
 &lt;td style="text-align: center">English&lt;/td>
 &lt;td style="text-align: center">No&lt;/td>
 &lt;td style="text-align: center">8192&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: center">jina-embeddings-v2-base-de&lt;/td>
 &lt;td style="text-align: center">768&lt;/td>
 &lt;td style="text-align: center">German &amp;amp; English&lt;/td>
 &lt;td style="text-align: center">No&lt;/td>
 &lt;td style="text-align: center">8192&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: center">jina-embeddings-v2-base-es&lt;/td>
 &lt;td style="text-align: center">768&lt;/td>
 &lt;td style="text-align: center">Spanish &amp;amp; English&lt;/td>
 &lt;td style="text-align: center">No&lt;/td>
 &lt;td style="text-align: center">8192&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td style="text-align: center">jina-embeddings-v2-base-zh&lt;/td>
 &lt;td style="text-align: center">768&lt;/td>
 &lt;td style="text-align: center">Chinese &amp;amp; English&lt;/td>
 &lt;td style="text-align: center">No&lt;/td>
 &lt;td style="text-align: center">8192&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
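As a sketch of how the models in the table above are used in practice, the following Python assembles a request for Jina's embeddings endpoint. The `https://api.jina.ai/v1/embeddings` URL and the `model`/`input` payload shape follow Jina's public embeddings API, but verify them against the current API reference; the key and texts are placeholders. The vectors returned in the response's "data" list can then be upserted into a Qdrant collection.

```python
import json

JINA_API_URL = "https://api.jina.ai/v1/embeddings"  # public embeddings endpoint

def build_embedding_request(api_key, texts, model="jina-embeddings-v3"):
    """Assemble the URL, headers, and JSON body for a Jina embeddings call.

    The payload shape (model name plus a list of input texts) follows
    Jina's embeddings API; check the current reference before relying on it.
    """
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    payload = {"model": model, "input": texts}
    return JINA_API_URL, headers, json.dumps(payload)

# POST `body` to `url` with `headers`; each entry of the response's
# "data" list carries an "embedding" you can upsert into Qdrant.
url, headers, body = build_embedding_request("YOUR_JINA_KEY", ["hello", "world"])
```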
&lt;blockquote>
&lt;p>Jina recommends using &lt;code>jina-embeddings-v4&lt;/code> for all tasks.&lt;/p></description></item><item><title>Join our team</title><link>https://qdrant.tech/about-us/about-us-get-started/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/about-us/about-us-get-started/</guid><description/></item><item><title>Keboola</title><link>https://qdrant.tech/documentation/platforms/keboola/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/platforms/keboola/</guid><description>&lt;h1 id="keboola">Keboola&lt;/h1>
&lt;p>&lt;a href="https://www.keboola.com/" target="_blank" rel="noopener nofollow">Keboola&lt;/a> is a data operations platform that integrates data engineering, analytics, and machine learning tools into a single environment. It helps businesses unify their data sources, transform data, and deploy ML models to production.&lt;/p>
&lt;h2 id="prerequisites">Prerequisites&lt;/h2>
&lt;ol>
&lt;li>A Qdrant instance to connect to. You can get a free cloud instance at &lt;a href="https://cloud.qdrant.io/" target="_blank" rel="noopener nofollow">cloud.qdrant.io&lt;/a>.&lt;/li>
&lt;li>A &lt;a href="https://www.keboola.com/" target="_blank" rel="noopener nofollow">Keboola&lt;/a> account to develop your data workflows.&lt;/li>
&lt;/ol>
&lt;h2 id="setting-up">Setting Up&lt;/h2>
&lt;ul>
&lt;li>In your Keboola platform, navigate to the Components section.&lt;/li>
&lt;li>Find and add the Qdrant component from the component marketplace.&lt;/li>
&lt;li>Configure the connection to your Qdrant instance using your URL and API key.&lt;/li>
&lt;/ul>
&lt;h2 id="using-qdrant-in-keboola">Using Qdrant in Keboola&lt;/h2>
&lt;p>With Keboola&amp;rsquo;s Qdrant integration, you can:&lt;/p></description></item><item><title>Kotaemon</title><link>https://qdrant.tech/documentation/platforms/kotaemon/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/platforms/kotaemon/</guid><description>&lt;h1 id="kotaemon">Kotaemon&lt;/h1>
&lt;p>&lt;a href="https://github.com/Cinnamon/kotaemon" target="_blank" rel="noopener nofollow">Kotaemon&lt;/a> is open-source clean &amp;amp; customizable RAG UI for chatting with your documents. Built with both end users and developers in mind.&lt;/p>
&lt;p>Qdrant is supported as a vectorstore in Kotaemon for ingesting and retrieving documents.&lt;/p>
&lt;h2 id="configuration">Configuration&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Refer to &lt;a href="https://cinnamon.github.io/kotaemon/" target="_blank" rel="noopener nofollow">Getting started&lt;/a> guide to set up Kotaemon.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>To configure Kotaemon to use Qdrant as the vector store, update the &lt;code>flowsettings.py&lt;/code> as follows.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">KH_VECTORSTORE&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;__type__&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;kotaemon.storages.QdrantVectorStore&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;url&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;https://xyz-example.eu-central.aws.cloud.qdrant.io:6333&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;api_key&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;&amp;lt;provide-your-own-key&amp;gt;&amp;#39;&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;client_kwargs&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">{}&lt;/span> &lt;span class="c1"># Additional options to pass to qdrant_client.QdrantClient&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;ul>
&lt;li>Restart Kotaemon for the changes to take effect.&lt;/li>
&lt;/ul>
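The `client_kwargs` mapping in `flowsettings.py` is forwarded verbatim to the `qdrant_client.QdrantClient` constructor, so any option the client accepts can go there. A minimal sketch with illustrative values (`timeout` and `prefer_grpc` are standard `QdrantClient` parameters; the URL and key are placeholders):

```python
# Hypothetical extra options for the Qdrant client; each key is passed
# through unchanged to qdrant_client.QdrantClient(...).
client_kwargs = {
    "timeout": 30,        # seconds before REST requests give up
    "prefer_grpc": True,  # use gRPC for upload- and search-heavy traffic
}

KH_VECTORSTORE = {
    "__type__": "kotaemon.storages.QdrantVectorStore",
    "url": "https://xyz-example.eu-central.aws.cloud.qdrant.io:6333",
    "api_key": "provide-your-own-key",
    "client_kwargs": client_kwargs,
}
```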
&lt;p>The reference for all the Qdrant client options can be found &lt;a href="https://python-client.qdrant.tech/qdrant_client.qdrant_client" target="_blank" rel="noopener nofollow">here&lt;/a>.&lt;/p></description></item><item><title>LangChain</title><link>https://qdrant.tech/documentation/frameworks/langchain/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/langchain/</guid><description>&lt;h1 id="langchain">LangChain&lt;/h1>
&lt;p>LangChain is a library that makes developing Large Language Model-based applications much easier. It unifies the interfaces
to different libraries, including major embedding providers and Qdrant. Using LangChain, you can focus on the business value instead of writing the boilerplate.&lt;/p>
&lt;p>LangChain distributes the Qdrant integration as a partner package.&lt;/p>
&lt;p>It might be installed with pip:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">pip install langchain-qdrant
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The integration supports searching for relevant documents using dense/sparse and hybrid retrieval.&lt;/p></description></item><item><title>LangChain4j</title><link>https://qdrant.tech/documentation/frameworks/langchain4j/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/langchain4j/</guid><description>&lt;h1 id="langchain-for-java">LangChain for Java&lt;/h1>
&lt;p>LangChain for Java, also known as &lt;a href="https://github.com/langchain4j/langchain4j" target="_blank" rel="noopener nofollow">Langchain4J&lt;/a>, is a community port of &lt;a href="https://www.langchain.com/" target="_blank" rel="noopener nofollow">Langchain&lt;/a> for building context-aware AI applications in Java.&lt;/p>
&lt;p>You can use Qdrant as a vector store in LangChain4j through the &lt;a href="https://central.sonatype.com/artifact/dev.langchain4j/langchain4j-qdrant" target="_blank" rel="noopener nofollow">&lt;code>langchain4j-qdrant&lt;/code>&lt;/a> module.&lt;/p>
&lt;h2 id="setup">Setup&lt;/h2>
&lt;p>Add the &lt;code>langchain4j-qdrant&lt;/code> module to your project dependencies.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-xml" data-lang="xml">&lt;span class="line">&lt;span class="cl">&lt;span class="nt">&amp;lt;dependency&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;lt;groupId&amp;gt;&lt;/span>dev.langchain4j&lt;span class="nt">&amp;lt;/groupId&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;lt;artifactId&amp;gt;&lt;/span>langchain4j-qdrant&lt;span class="nt">&amp;lt;/artifactId&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;lt;version&amp;gt;&lt;/span>VERSION&lt;span class="nt">&amp;lt;/version&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nt">&amp;lt;/dependency&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="usage">Usage&lt;/h2>
&lt;p>Before you use the following code sample, customize the following values for your configuration:&lt;/p>
&lt;ul>
&lt;li>&lt;code>YOUR_COLLECTION_NAME&lt;/code>: Use our &lt;a href="https://qdrant.tech/documentation/manage-data/collections/">Collections&lt;/a> guide to create or
list collections.&lt;/li>
&lt;li>&lt;code>YOUR_HOST_URL&lt;/code>: Use the gRPC URL for your system. If you used the &lt;a href="https://qdrant.tech/documentation/quickstart/">Quick Start&lt;/a> guide,
it may be &lt;code>http://localhost:6334&lt;/code>. If you&amp;rsquo;ve deployed in the &lt;a href="https://qdrant.tech/documentation/cloud/">Qdrant Cloud&lt;/a>, you may have a
longer URL such as &lt;code>https://example.location.cloud.qdrant.io:6334&lt;/code>.&lt;/li>
&lt;li>&lt;code>YOUR_API_KEY&lt;/code>: Substitute the API key associated with your configuration.&lt;/li>
&lt;/ul>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="nn">dev.langchain4j.store.embedding.EmbeddingStore&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="kn">import&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="nn">dev.langchain4j.store.embedding.qdrant.QdrantEmbeddingStore&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="n">EmbeddingStore&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">TextSegment&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">embeddingStore&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="n">QdrantEmbeddingStore&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="na">builder&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c1">// Ensure the collection is configured with the appropriate dimensions&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c1">// of the embedding model.&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c1">// Reference: https://qdrant.tech/documentation/manage-data/collections/&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="na">collectionName&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;YOUR_COLLECTION_NAME&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="na">host&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;YOUR_HOST_URL&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="c1">// GRPC port of the Qdrant server&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="na">port&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">6334&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="na">apiKey&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;YOUR_API_KEY&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="na">build&lt;/span>&lt;span class="p">();&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>QdrantEmbeddingStore&lt;/code> supports all the semantic features of LangChain4j.&lt;/p></description></item><item><title>LangGraph</title><link>https://qdrant.tech/documentation/frameworks/langgraph/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/langgraph/</guid><description>&lt;h1 id="langgraph">LangGraph&lt;/h1>
&lt;p>&lt;a href="https://github.com/langchain-ai/langgraph" target="_blank" rel="noopener nofollow">LangGraph&lt;/a> is a library for building stateful, multi-actor applications, ideal for creating agentic workflows. It provides fine-grained control over both the flow and state of your application, crucial for creating reliable agents.&lt;/p>
&lt;p>You can define flows that involve cycles, essential for most agentic architectures, differentiating it from DAG-based solutions. Additionally, LangGraph includes built-in persistence, enabling advanced human-in-the-loop and memory features.&lt;/p>
&lt;p>LangGraph works seamlessly with all the components of LangChain. This means we can utilize Qdrant&amp;rsquo;s &lt;a href="https://qdrant.tech/documentation/frameworks/langchain/">Langchain integration&lt;/a> to create retrieval nodes in LangGraph, available in both Python and Javascript!&lt;/p></description></item><item><title>Leadership</title><link>https://qdrant.tech/about-us/leadership/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/about-us/leadership/</guid><description/></item><item><title>LlamaIndex</title><link>https://qdrant.tech/documentation/frameworks/llama-index/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/llama-index/</guid><description>&lt;h1 id="llamaindex">LlamaIndex&lt;/h1>
&lt;p>Llama Index acts as an interface between your external data and Large Language Models. So you can bring your
private data and augment LLMs with it. LlamaIndex simplifies data ingestion and indexing, integrating Qdrant as a vector index.&lt;/p>
&lt;p>Installing Llama Index is straightforward if we use pip as a package manager. Qdrant is not installed by default, so we need to
install it separately. The integration of both tools also comes as another package.&lt;/p></description></item><item><title>Make.com</title><link>https://qdrant.tech/documentation/platforms/make/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/platforms/make/</guid><description>&lt;h1 id="makecom">Make.com&lt;/h1>
&lt;p>&lt;a href="https://www.make.com/" target="_blank" rel="noopener nofollow">Make&lt;/a> is a platform for anyone to design, build, and automate anything—from tasks and workflows to apps and systems without code.&lt;/p>
&lt;p>Find the comprehensive list of available Make apps &lt;a href="https://www.make.com/en/integrations" target="_blank" rel="noopener nofollow">here&lt;/a>.&lt;/p>
&lt;p>Qdrant is available as an &lt;a href="https://www.make.com/en/integrations/qdrant" target="_blank" rel="noopener nofollow">app&lt;/a> within Make to add to your scenarios.&lt;/p>
&lt;p>&lt;img src="https://qdrant.tech/documentation/frameworks/make/hero-page.png" alt="Qdrant Make hero">&lt;/p>
&lt;h2 id="prerequisites">Prerequisites&lt;/h2>
&lt;p>Before you start, make sure you have the following:&lt;/p>
&lt;ol>
&lt;li>A Qdrant instance to connect to. You can get a free cloud instance at &lt;a href="https://cloud.qdrant.io/" target="_blank" rel="noopener nofollow">cloud.qdrant.io&lt;/a>.&lt;/li>
&lt;li>An account at Make.com. You can register yourself &lt;a href="https://www.make.com/en/register" target="_blank" rel="noopener nofollow">here&lt;/a>.&lt;/li>
&lt;/ol>
&lt;h2 id="setting-up-a-connection">Setting up a connection&lt;/h2>
&lt;p>Navigate to your scenario on the Make dashboard and select a Qdrant app module to start a connection.
&lt;img src="https://qdrant.tech/documentation/frameworks/make/connection.png" alt="Qdrant Make connection">&lt;/p></description></item><item><title>Mastra</title><link>https://qdrant.tech/documentation/frameworks/mastra/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/mastra/</guid><description>&lt;h1 id="mastra">Mastra&lt;/h1>
&lt;p>&lt;a href="https://mastra.ai/" target="_blank" rel="noopener nofollow">Mastra&lt;/a> is a Typescript framework to build AI applications and features quickly. It gives you the set of primitives you need: workflows, agents, RAG, integrations, syncs and evals. You can run Mastra on your local machine, or deploy to a serverless cloud.&lt;/p>
&lt;p>Qdrant is available as a vector store in Mastra to augment your application with retrieval capabilities.&lt;/p>
&lt;h2 id="setup">Setup&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">npm install @mastra/core
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="usage">Usage&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="kr">import&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="nx">QdrantVector&lt;/span> &lt;span class="p">}&lt;/span> &lt;span class="kr">from&lt;/span> &lt;span class="s2">&amp;#34;@mastra/rag&amp;#34;&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">qdrant&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="nx">QdrantVector&lt;/span>&lt;span class="p">({&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">url&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;https://xyz-example.eu-central.aws.cloud.qdrant.io:6333&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">apiKey&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;&amp;lt;YOUR_API_KEY&amp;gt;&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">https&lt;/span>: &lt;span class="kt">true&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">});&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="constructor-options">Constructor Options&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Name&lt;/th>
 &lt;th>Type&lt;/th>
 &lt;th>Description&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;code>url&lt;/code>&lt;/td>
 &lt;td>&lt;code>string&lt;/code>&lt;/td>
 &lt;td>REST URL of the Qdrant instance. Eg. &lt;a href="https://xyz-example.eu-central.aws.cloud.qdrant.io:6333" target="_blank" rel="noopener nofollow">https://xyz-example.eu-central.aws.cloud.qdrant.io:6333&lt;/a>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>apiKey&lt;/code>&lt;/td>
 &lt;td>&lt;code>string&lt;/code>&lt;/td>
 &lt;td>Optional Qdrant API key&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>https&lt;/code>&lt;/td>
 &lt;td>&lt;code>boolean&lt;/code>&lt;/td>
 &lt;td>Whether to use TLS when setting up the connection. Recommended.&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="methods">Methods&lt;/h2>
&lt;h3 id="createindex">&lt;code>createIndex()&lt;/code>&lt;/h3>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Name&lt;/th>
 &lt;th>Type&lt;/th>
 &lt;th>Description&lt;/th>
 &lt;th>Default Value&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;code>indexName&lt;/code>&lt;/td>
 &lt;td>&lt;code>string&lt;/code>&lt;/td>
 &lt;td>Name of the index to create&lt;/td>
 &lt;td>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>dimension&lt;/code>&lt;/td>
 &lt;td>&lt;code>number&lt;/code>&lt;/td>
 &lt;td>Vector dimension size&lt;/td>
 &lt;td>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>metric&lt;/code>&lt;/td>
 &lt;td>&lt;code>string&lt;/code>&lt;/td>
 &lt;td>Distance metric for similarity search&lt;/td>
 &lt;td>&lt;code>cosine&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h3 id="upsert">&lt;code>upsert()&lt;/code>&lt;/h3>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Name&lt;/th>
 &lt;th>Type&lt;/th>
 &lt;th>Description&lt;/th>
 &lt;th>Default Value&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;code>vectors&lt;/code>&lt;/td>
 &lt;td>&lt;code>number[][]&lt;/code>&lt;/td>
 &lt;td>Array of embedding vectors&lt;/td>
 &lt;td>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>metadata&lt;/code>&lt;/td>
 &lt;td>&lt;code>Record&amp;lt;string, any&amp;gt;[]&lt;/code>&lt;/td>
 &lt;td>Metadata for each vector (optional)&lt;/td>
 &lt;td>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>namespace&lt;/code>&lt;/td>
 &lt;td>&lt;code>string&lt;/code>&lt;/td>
 &lt;td>Optional namespace for organization&lt;/td>
 &lt;td>&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h3 id="query">&lt;code>query()&lt;/code>&lt;/h3>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Name&lt;/th>
 &lt;th>Type&lt;/th>
 &lt;th>Description&lt;/th>
 &lt;th>Default Value&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;code>vector&lt;/code>&lt;/td>
 &lt;td>&lt;code>number[]&lt;/code>&lt;/td>
 &lt;td>Query vector to find similar vectors&lt;/td>
 &lt;td>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>topK&lt;/code>&lt;/td>
 &lt;td>&lt;code>number&lt;/code>&lt;/td>
 &lt;td>Number of results to return (optional)&lt;/td>
 &lt;td>&lt;code>10&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>filter&lt;/code>&lt;/td>
 &lt;td>&lt;code>Record&amp;lt;string, any&amp;gt;&lt;/code>&lt;/td>
 &lt;td>Metadata filters for the query (optional)&lt;/td>
 &lt;td>&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h3 id="listindexes">&lt;code>listIndexes()&lt;/code>&lt;/h3>
&lt;p>Returns an array of index names as strings.&lt;/p></description></item><item><title>Mem0</title><link>https://qdrant.tech/documentation/frameworks/mem0/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/mem0/</guid><description>&lt;p>&lt;img src="https://qdrant.tech/documentation/frameworks/mem0/mem0-banner.png" alt="Mem0 Logo">&lt;/p>
&lt;p>&lt;a href="https://mem0.ai" target="_blank" rel="noopener nofollow">Mem0&lt;/a> is a self-improving memory layer for LLM applications, enabling personalized AI experiences that save costs and delight users. Mem0 remembers user preferences, adapts to individual needs, and continuously improves over time, ideal for chatbots and AI systems.&lt;/p>
&lt;p>Mem0 supports various vector store providers, including Qdrant, for efficient data handling and search capabilities.&lt;/p>
&lt;h2 id="installation">Installation&lt;/h2>
&lt;p>To install Mem0 with Qdrant support, use the following command:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sh" data-lang="sh">&lt;span class="line">&lt;span class="cl">pip install mem0ai
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="usage">Usage&lt;/h2>
&lt;p>Here&amp;rsquo;s a basic example of how to use Mem0 with Qdrant:&lt;/p></description></item><item><title>Microsoft GraphRAG</title><link>https://qdrant.tech/documentation/frameworks/microsoft-graphrag/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/microsoft-graphrag/</guid><description>&lt;h1 id="microsoft-graphrag">Microsoft GraphRAG&lt;/h1>
&lt;p>&lt;a href="https://github.com/microsoft/graphrag" target="_blank" rel="noopener nofollow">Microsoft GraphRAG&lt;/a> is a Python library for building knowledge graphs from unstructured text and using them for retrieval-augmented generation. It combines graph-based indexing with vector search to improve the quality and relevance of LLM responses.&lt;/p>
&lt;p>Qdrant can be used as a custom vector store backend for GraphRAG, enabling you to leverage Qdrant&amp;rsquo;s performance and scalability for storing and searching document embeddings.&lt;/p>
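As a rough illustration of what such a backend looks like, here is a self-contained skeleton of a Qdrant-backed store. The method names are illustrative assumptions sketching the general shape of an adapter, not GraphRAG's actual base-class interface; consult the `VectorStore` base class for the real signatures.

```python
# Illustrative skeleton of a Qdrant-backed vector store for GraphRAG.
# Method names are assumptions; check the graphrag VectorStore base class
# for the real interface before implementing.
from dataclasses import dataclass, field


@dataclass
class QdrantVectorStoreSketch:
    collection_name: str
    url: str = "http://localhost:6333"
    _docs: dict = field(default_factory=dict)  # stand-in for the Qdrant collection

    def load_documents(self, documents):
        # Real code would upsert PointStruct objects via qdrant_client here.
        for doc in documents:
            self._docs[doc["id"]] = doc

    def similarity_search_by_vector(self, vector, k=10):
        # Real code would call qdrant_client's query API and rank by distance.
        return list(self._docs.values())[:k]


store = QdrantVectorStoreSketch(collection_name="graphrag_entities")
store.load_documents([{"id": "a", "text": "example", "vector": [0.1, 0.2]}])
```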
&lt;h2 id="installation">Installation&lt;/h2>
&lt;p>Install the required packages:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">pip install graphrag qdrant-client
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="custom-vector-store-implementation">Custom Vector Store Implementation&lt;/h2>
&lt;p>GraphRAG allows you to register custom vector stores by extending the &lt;code>VectorStore&lt;/code> base class:&lt;/p></description></item><item><title>Microsoft NLWeb</title><link>https://qdrant.tech/documentation/frameworks/nlweb/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/nlweb/</guid><description>&lt;h1 id="nlweb">NLWeb&lt;/h1>
&lt;p>Microsoft&amp;rsquo;s &lt;a href="https://github.com/nlweb-ai/NLWeb" target="_blank" rel="noopener nofollow">NLWeb&lt;/a> is a proposed framework that enables natural language interfaces for websites, using Schema.org, formats like RSS, and the emerging &lt;a href="https://github.com/nlweb-ai/NLWeb/blob/main/docs/nlweb-rest-api.md" target="_blank" rel="noopener nofollow">MCP protocol&lt;/a>.&lt;/p>
&lt;p>Qdrant is supported as a vector store backend within NLWeb for embedding storage and context retrieval.&lt;/p>
&lt;h2 id="usage">Usage&lt;/h2>
&lt;p>NLWeb includes Qdrant integration by default. You can install and configure it to use Qdrant as the retrieval engine.&lt;/p>
&lt;h3 id="installation">Installation&lt;/h3>
&lt;p>Clone the repo and set up your environment:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">git clone https://github.com/microsoft/NLWeb
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">cd&lt;/span> NLWeb
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">python -m venv .venv
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">source&lt;/span> venv/bin/activate &lt;span class="c1"># or `venv\Scripts\activate` on Windows&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">cd&lt;/span> code
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">pip install -r requirements.txt
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="configuring-qdrant">Configuring Qdrant&lt;/h3>
&lt;p>To use &lt;strong>Qdrant&lt;/strong>, update your configuration.&lt;/p></description></item><item><title>Mistral</title><link>https://qdrant.tech/documentation/embeddings/mistral/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/embeddings/mistral/</guid><description>&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 10 min&lt;/th>
 &lt;th>Level: Beginner&lt;/th>
 &lt;th>&lt;a href="https://githubtocolab.com/qdrant/examples/blob/mistral-getting-started/mistral-embed-getting-started/mistral_qdrant_getting_started.ipynb" target="_blank" rel="noopener nofollow">&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab">&lt;/a>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;h1 id="mistral">Mistral&lt;/h1>
&lt;p>Qdrant is compatible with the newly released Mistral Embed model and its official Python SDK, which can be installed like any other package:&lt;/p>
&lt;h2 id="setup">Setup&lt;/h2>
&lt;h3 id="install-the-client">Install the client&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">pip install mistralai
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Then set up the Qdrant and Mistral clients:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">mistralai.client&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">MistralClient&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">qdrant_client&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">QdrantClient&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">qdrant_client.models&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">PointStruct&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">VectorParams&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">Distance&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">collection_name&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;example_collection&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">MISTRAL_API_KEY&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;your_mistral_api_key&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">client&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">QdrantClient&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;:memory:&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">mistral_client&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">MistralClient&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">api_key&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">MISTRAL_API_KEY&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">texts&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Qdrant is the best vector search engine!&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Loved by Enterprises and everyone building for low latency, high performance, and scale.&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Let&amp;rsquo;s see how to use the Embedding Model API to embed a document for retrieval.&lt;/p></description></item><item><title>MixedBread</title><link>https://qdrant.tech/documentation/embeddings/mixedbread/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/embeddings/mixedbread/</guid><description>&lt;h1 id="using-mixedbread-with-qdrant">Using MixedBread with Qdrant&lt;/h1>
&lt;p>MixedBread is a unique provider offering embeddings across multiple domains. Their models are versatile for various search tasks when integrated with Qdrant. MixedBread is creating state-of-the-art models and tools that make search smarter, faster, and more relevant. Whether you&amp;rsquo;re building a next-gen search engine or RAG (Retrieval Augmented Generation) systems, or whether you&amp;rsquo;re enhancing your existing search solution, they&amp;rsquo;ve got the ingredients to make it happen.&lt;/p></description></item><item><title>Mixpeek</title><link>https://qdrant.tech/documentation/embeddings/mixpeek/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/embeddings/mixpeek/</guid><description>&lt;h1 id="mixpeek-video-embeddings">Mixpeek Video Embeddings&lt;/h1>
&lt;p>Mixpeek&amp;rsquo;s video processing capabilities allow you to chunk and embed videos, while Qdrant provides efficient storage and retrieval of these embeddings.&lt;/p>
&lt;h2 id="prerequisites">Prerequisites&lt;/h2>
&lt;ul>
&lt;li>Python 3.7+&lt;/li>
&lt;li>Mixpeek API key&lt;/li>
&lt;li>Mixpeek client installed (&lt;code>pip install mixpeek&lt;/code>)&lt;/li>
&lt;li>Qdrant client installed (&lt;code>pip install qdrant-client&lt;/code>)&lt;/li>
&lt;/ul>
&lt;h2 id="installation">Installation&lt;/h2>
&lt;ol>
&lt;li>Install the required packages:&lt;/li>
&lt;/ol>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">pip install mixpeek qdrant-client
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;ol start="2">
&lt;li>Set up your Mixpeek API key:&lt;/li>
&lt;/ol>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">mixpeek&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">Mixpeek&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">mixpeek&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Mixpeek&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;your_api_key_here&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;ol start="3">
&lt;li>Initialize the Qdrant client:&lt;/li>
&lt;/ol>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">qdrant_client&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">QdrantClient&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">client&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">QdrantClient&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;localhost&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">port&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">6333&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="usage">Usage&lt;/h2>
&lt;h3 id="1-create-qdrant-collection">1. Create Qdrant Collection&lt;/h3>
&lt;p>Make sure to create a Qdrant collection before inserting vectors. You can create a collection with the appropriate vector size (768 for the &amp;ldquo;vuse-generic-v1&amp;rdquo; model) using:&lt;/p></description></item><item><title>N8N</title><link>https://qdrant.tech/documentation/platforms/n8n/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/platforms/n8n/</guid><description>&lt;h1 id="n8n">N8N&lt;/h1>
&lt;p>&lt;a href="https://n8n.io/" target="_blank" rel="noopener nofollow">N8N&lt;/a> is an automation platform that allows you to build flexible workflows focused on deep data integration.&lt;/p>
&lt;p>&lt;a href="https://github.com/qdrant/n8n-nodes-qdrant" target="_blank" rel="noopener nofollow">Qdrant&amp;rsquo;s official node&lt;/a> for n8n enables semantic search capabilities in your workflows.&lt;/p>
&lt;h2 id="prerequisites">Prerequisites&lt;/h2>
&lt;ol>
&lt;li>A Qdrant instance to connect to. You can get a free cloud instance at &lt;a href="https://cloud.qdrant.io/" target="_blank" rel="noopener nofollow">cloud.qdrant.io&lt;/a>.&lt;/li>
&lt;li>A running N8N instance. You can learn more about using the N8N cloud or self-hosting &lt;a href="https://docs.n8n.io/choose-n8n/" target="_blank" rel="noopener nofollow">here&lt;/a>.&lt;/li>
&lt;/ol>
&lt;h2 id="setting-up-the-node">Setting up the node&lt;/h2>
&lt;ul>
&lt;li>Select and install the official Qdrant node from the list of nodes in your workflow editor.&lt;/li>
&lt;/ul>
&lt;p>&lt;img src="https://qdrant.tech/documentation/frameworks/n8n/node.png" alt="Qdrant n8n node">&lt;/p></description></item><item><title>Neo4j GraphRAG</title><link>https://qdrant.tech/documentation/frameworks/neo4j-graphrag/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/neo4j-graphrag/</guid><description>&lt;h1 id="neo4j-graphrag">Neo4j GraphRAG&lt;/h1>
&lt;p>&lt;a href="https://neo4j.com/docs/neo4j-graphrag-python/current/" target="_blank" rel="noopener nofollow">Neo4j GraphRAG&lt;/a> is a Python package to build graph retrieval augmented generation (GraphRAG) applications using Neo4j and Python. As a first-party library, it offers a robust, feature-rich, and high-performance solution, with the added assurance of long-term support and maintenance directly from Neo4j. It offers a Qdrant retriever natively to search for vectors stored in a Qdrant collection.&lt;/p>
&lt;h2 id="installation">Installation&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">pip install neo4j-graphrag&lt;span class="o">[&lt;/span>qdrant&lt;span class="o">]&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="usage">Usage&lt;/h2>
&lt;p>A vector query with Neo4j and Qdrant could look like:&lt;/p></description></item><item><title>Nomic</title><link>https://qdrant.tech/documentation/embeddings/nomic/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/embeddings/nomic/</guid><description>&lt;h1 id="nomic">Nomic&lt;/h1>
&lt;p>The &lt;code>nomic-embed-text-v1&lt;/code> model is an open source &lt;a href="https://github.com/nomic-ai/contrastors" target="_blank" rel="noopener nofollow">8192 context length&lt;/a> text encoder.
While you can find the model on the &lt;a href="https://huggingface.co/nomic-ai/nomic-embed-text-v1" target="_blank" rel="noopener nofollow">Hugging Face Hub&lt;/a>,
you may find it easier to obtain embeddings through the &lt;a href="https://docs.nomic.ai/reference/endpoints/nomic-embed-text" target="_blank" rel="noopener nofollow">Nomic Text Embeddings&lt;/a> API.
You can then use it with the official Python client, FastEmbed, or direct HTTP requests.&lt;/p>
&lt;aside role="status">Using Nomic Embeddings via the Nomic API/SDK requires configuring the &lt;a href="https://atlas.nomic.ai/cli-login">Nomic API token&lt;/a>.&lt;/aside>
&lt;p>You can use Nomic embeddings directly in Qdrant client calls. There is a difference in the way the embeddings
are obtained for documents and queries.&lt;/p></description></item><item><title>Nvidia</title><link>https://qdrant.tech/documentation/embeddings/nvidia/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/embeddings/nvidia/</guid><description>&lt;h1 id="nvidia">Nvidia&lt;/h1>
&lt;p>Qdrant supports working with &lt;a href="https://build.nvidia.com/explore/retrieval" target="_blank" rel="noopener nofollow">Nvidia embeddings&lt;/a>.&lt;/p>
&lt;p>You can generate an API key to authenticate the requests from the &lt;a href="https://build.nvidia.com/nvidia/embed-qa-4" target="_blank" rel="noopener nofollow">Nvidia Playground&lt;/a>.&lt;/p>
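The request body for the embeddings endpoint can be sketched as follows. The field names ("input", "model", "input_type") and the "passage"/"query" distinction reflect how NVIDIA's retrieval embedding models are typically called; treat them as assumptions to check against the official API reference.

```python
import json

# Sketch of the JSON body sent to the NVIDIA embeddings endpoint.
# Field names and the model identifier are assumptions based on common
# usage of the retrieval API; verify against the official reference.
def build_embedding_payload(texts, input_type="passage"):
    return json.dumps({
        "input": texts,
        "model": "NV-Embed-QA",    # model identifier (assumed)
        "input_type": input_type,  # "passage" for documents, "query" for queries
    })

payload = build_embedding_payload(["Qdrant is the best vector search engine!"])
```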
&lt;h3 id="setting-up-the-qdrant-client-and-nvidia-session">Setting up the Qdrant client and Nvidia session&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">requests&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">qdrant_client&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">QdrantClient&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">NVIDIA_BASE_URL&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;https://ai.api.nvidia.com/v1/retrieval/nvidia/embeddings&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">NVIDIA_API_KEY&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;&amp;lt;YOUR_API_KEY&amp;gt;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">nvidia_session&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">requests&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Session&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">client&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">QdrantClient&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;:memory:&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">headers&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Authorization&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;Bearer &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">NVIDIA_API_KEY&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Accept&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;application/json&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">texts&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Qdrant is the best vector search engine!&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Loved by Enterprises and everyone building for low latency, high performance, and scale.&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="kr">import&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="nx">QdrantClient&lt;/span> &lt;span class="p">}&lt;/span> &lt;span class="kr">from&lt;/span> &lt;span class="s1">&amp;#39;@qdrant/js-client-rest&amp;#39;&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">NVIDIA_BASE_URL&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;https://ai.api.nvidia.com/v1/retrieval/nvidia/embeddings&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">NVIDIA_API_KEY&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;&amp;lt;YOUR_API_KEY&amp;gt;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">client&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="nx">QdrantClient&lt;/span>&lt;span class="p">({&lt;/span> &lt;span class="nx">url&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s1">&amp;#39;http://localhost:6333&amp;#39;&lt;/span> &lt;span class="p">});&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">headers&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Authorization&amp;#34;&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;Bearer &amp;#34;&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="nx">NVIDIA_API_KEY&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Accept&amp;#34;&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;application/json&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Content-Type&amp;#34;&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;application/json&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">texts&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Qdrant is the best vector search engine!&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Loved by Enterprises and everyone building for low latency, high performance, and scale.&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The following example shows how to embed documents with the &lt;code>embed-qa-4&lt;/code> model that generates sentence embeddings of size 1024.&lt;/p></description></item><item><title>Ollama</title><link>https://qdrant.tech/documentation/embeddings/ollama/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/embeddings/ollama/</guid><description>&lt;h1 id="using-ollama-with-qdrant">Using Ollama with Qdrant&lt;/h1>
&lt;p>&lt;a href="https://ollama.com" target="_blank" rel="noopener nofollow">Ollama&lt;/a> provides specialized embeddings for niche applications. Ollama supports a &lt;a href="https://ollama.com/search?c=embedding" target="_blank" rel="noopener nofollow">variety of embedding models&lt;/a>, making it possible to build retrieval augmented generation (RAG) applications that combine text prompts with existing documents or other data in specialized areas.&lt;/p>
&lt;h2 id="installation">Installation&lt;/h2>
&lt;p>You can install the required packages using the following pip command:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">pip install ollama qdrant-client
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="integration-example">Integration Example&lt;/h2>
&lt;p>The following code assumes Ollama is accessible at port &lt;code>11434&lt;/code> and Qdrant at port &lt;code>6333&lt;/code>.&lt;/p></description></item><item><title>OpenAI</title><link>https://qdrant.tech/documentation/embeddings/openai/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/embeddings/openai/</guid><description>&lt;h1 id="openai">OpenAI&lt;/h1>
&lt;p>Qdrant supports working with &lt;a href="https://platform.openai.com/docs/guides/embeddings/embeddings" target="_blank" rel="noopener nofollow">OpenAI embeddings&lt;/a>.&lt;/p>
&lt;p>There is an official OpenAI Python package that simplifies obtaining them, and it can be installed with pip:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">pip install openai
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="setting-up-the-openai-and-qdrant-clients">Setting up the OpenAI and Qdrant clients&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">openai&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">qdrant_client&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">openai_client&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">openai&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Client&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">api_key&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;&amp;lt;YOUR_API_KEY&amp;gt;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">client&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">qdrant_client&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">QdrantClient&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;:memory:&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">texts&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Qdrant is the best vector search engine!&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Loved by Enterprises and everyone building for low latency, high performance, and scale.&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The following example shows how to embed a document with the &lt;code>text-embedding-3-small&lt;/code> model that generates sentence embeddings of size 1536. You can find the list of all supported models &lt;a href="https://platform.openai.com/docs/models/embeddings" target="_blank" rel="noopener nofollow">here&lt;/a>.&lt;/p></description></item><item><title>Our Engineering Culture</title><link>https://qdrant.tech/about-us/about-us-engineering-culture/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/about-us/about-us-engineering-culture/</guid><description/></item><item><title>Our Investors</title><link>https://qdrant.tech/about-us/investors/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/about-us/investors/</guid><description/></item><item><title>Our story</title><link>https://qdrant.tech/about-us/our-story/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/about-us/our-story/</guid><description/></item><item><title>Pipedream</title><link>https://qdrant.tech/documentation/platforms/pipedream/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/platforms/pipedream/</guid><description>&lt;h1 id="pipedream">Pipedream&lt;/h1>
&lt;p>&lt;a href="https://pipedream.com/" target="_blank" rel="noopener nofollow">Pipedream&lt;/a> is a development platform that allows developers to connect many different applications, data sources, and APIs in order to build automated cross-platform workflows. It also offers code-level control with Node.js, Python, Go, or Bash if required.&lt;/p>
&lt;p>You can use the &lt;a href="https://pipedream.com/apps/qdrant" target="_blank" rel="noopener nofollow">Qdrant app&lt;/a> in Pipedream to add vector search capabilities to your workflows.&lt;/p>
&lt;h2 id="prerequisites">Prerequisites&lt;/h2>
&lt;ol>
&lt;li>A Qdrant instance to connect to. You can get a free cloud instance at &lt;a href="https://cloud.qdrant.io/" target="_blank" rel="noopener nofollow">cloud.qdrant.io&lt;/a>.&lt;/li>
&lt;li>A &lt;a href="https://pipedream.com/" target="_blank" rel="noopener nofollow">Pipedream project&lt;/a> to develop your workflows.&lt;/li>
&lt;/ol>
&lt;h2 id="setting-up">Setting Up&lt;/h2>
&lt;p>Search for the Qdrant app in your workflow apps.&lt;/p></description></item><item><title>POMA</title><link>https://qdrant.tech/documentation/data-management/poma/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/data-management/poma/</guid><description>&lt;h1 id="poma--qdrant-structure-preserving-retrieval">POMA + Qdrant: Structure-Preserving Retrieval&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 15 min&lt;/th>
 &lt;th>Level: Beginner/Intermediate&lt;/th>
 &lt;th>&lt;a href="https://colab.research.google.com/github/poma-ai/.github/blob/main/notebooks/qdrant/poma_meets_qdrant.ipynb" target="_blank" rel="noopener nofollow">Complete Notebook&lt;/a>&lt;/th>
 &lt;th>&lt;a href="https://github.com/poma-ai/.github/blob/main/notebooks/qdrant/poma_meets_qdrant.ipynb" target="_blank" rel="noopener nofollow">Notebook Source&lt;/a>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="overview">Overview&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>POMA&lt;/strong>, as a &lt;em>document chunking engine&lt;/em>, is built around operator simplicity: process files into structure-aware chunksets and send them to Qdrant with minimal boilerplate, using a patented chunking approach.&lt;/li>
&lt;li>&lt;strong>Qdrant&lt;/strong> as your preferred &lt;em>vector search engine&lt;/em>.&lt;/li>
&lt;/ul>
&lt;p>Together, they form one streamlined workflow.&lt;/p>
&lt;p>This guide walks through the current &lt;a href="https://www.poma-ai.com/" target="_blank" rel="noopener nofollow">POMA AI&lt;/a> for Qdrant SDK flow: process documents, upsert chunksets, retrieve structure-preserving cheatsheets, and understand where convenience defaults end and advanced knobs begin.&lt;/p></description></item><item><title>Power Apps</title><link>https://qdrant.tech/documentation/platforms/powerapps/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/platforms/powerapps/</guid><description>&lt;h1 id="power-apps">Power Apps&lt;/h1>
&lt;p>Microsoft &lt;a href="https://www.microsoft.com/en-us/power-platform/products/power-apps" target="_blank" rel="noopener nofollow">Power Apps&lt;/a> is a suite of apps, services, and connectors that provides a rapid development environment to build custom apps for your business needs. You can quickly build custom business apps that connect to your data stored in many online and on-premises data sources.&lt;/p>
&lt;p>You can use the &lt;a href="https://learn.microsoft.com/en-us/connectors/qdrant/" target="_blank" rel="noopener nofollow">Qdrant Connector&lt;/a> in Power Apps to add vector search capabilities to your flows.&lt;/p>
&lt;h2 id="prerequisites">Prerequisites&lt;/h2>
&lt;ol>
&lt;li>A Qdrant instance to connect to. You can get a free cloud instance at &lt;a href="https://cloud.qdrant.io/" target="_blank" rel="noopener nofollow">cloud.qdrant.io&lt;/a>.&lt;/li>
&lt;li>A &lt;a href="https://www.microsoft.com/en-in/power-platform/products/power-apps/" target="_blank" rel="noopener nofollow">Power Apps account&lt;/a> to develop your flows.&lt;/li>
&lt;/ol>
&lt;h2 id="setting-up">Setting Up&lt;/h2>
&lt;p>Search for the Qdrant connector when adding a new action in a Power Apps flow. The connector offers an exhaustive list of pre-built Qdrant actions.&lt;/p></description></item><item><title>Prem AI</title><link>https://qdrant.tech/documentation/embeddings/premai/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/embeddings/premai/</guid><description>&lt;h1 id="prem-ai">Prem AI&lt;/h1>
&lt;p>&lt;a href="https://premai.io/" target="_blank" rel="noopener nofollow">PremAI&lt;/a> is a unified generative AI development platform for fine-tuning deploying, and monitoring AI models.&lt;/p>
&lt;p>Qdrant is compatible with PremAI APIs.&lt;/p>
&lt;h3 id="installing-the-sdks">Installing the SDKs&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">pip install premai qdrant-client
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>To install the npm package:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">npm install @premai/prem-sdk @qdrant/js-client-rest
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="import-all-required-packages">Import all required packages&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">premai&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">Prem&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">qdrant_client&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">QdrantClient&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">qdrant_client.models&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">Distance&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">VectorParams&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="kr">import&lt;/span> &lt;span class="nx">Prem&lt;/span> &lt;span class="kr">from&lt;/span> &lt;span class="s1">&amp;#39;@premai/prem-sdk&amp;#39;&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">import&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="nx">QdrantClient&lt;/span> &lt;span class="p">}&lt;/span> &lt;span class="kr">from&lt;/span> &lt;span class="s1">&amp;#39;@qdrant/js-client-rest&amp;#39;&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="define-all-the-constants">Define all the constants&lt;/h3>
&lt;p>We need to define the project ID and the embedding model to use. You can learn more about obtaining these in the PremAI &lt;a href="https://docs.premai.io/quick-start" target="_blank" rel="noopener nofollow">docs&lt;/a>.&lt;/p></description></item><item><title>Privacy Policy</title><link>https://qdrant.tech/legal/privacy-policy/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/legal/privacy-policy/</guid><description>&lt;h1 id="privacy-policy">Privacy Policy&lt;/h1>
&lt;h2 id="1-introduction">&lt;strong>1. Introduction&lt;/strong>&lt;/h2>
&lt;p>In the following, we provide information about the collection of personal data when using:&lt;/p>
&lt;ul>
&lt;li>our website (&lt;a href="https://qdrant.tech" target="_blank" rel="noopener nofollow">https://qdrant.tech&lt;/a>)&lt;/li>
&lt;li>our Cloud Panel (&lt;a href="https://cloud.qdrant.io/" target="_blank" rel="noopener nofollow">https://cloud.qdrant.io/&lt;/a>)&lt;/li>
&lt;li>Qdrant’s social media profiles.&lt;/li>
&lt;/ul>
&lt;p>Personal data is any data that can be related to a specific natural person, such as their name or IP address.&lt;/p>
&lt;h3 id="11-contact-details">&lt;strong>1.1. Contact details&lt;/strong>&lt;/h3>
&lt;p>The controller within the meaning of Art. 4 para. 7 EU General Data Protection Regulation (GDPR) is Qdrant Solutions GmbH, Chausseestraße 86, 10115 Berlin, Germany, email: &lt;a href="mailto:info@qdrant.com">info@qdrant.com&lt;/a>. We are legally represented by André Zayarni.&lt;/p></description></item><item><title>PrivateGPT</title><link>https://qdrant.tech/documentation/platforms/privategpt/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/platforms/privategpt/</guid><description>&lt;h1 id="privategpt">PrivateGPT&lt;/h1>
&lt;p>&lt;a href="https://docs.privategpt.dev/" target="_blank" rel="noopener nofollow">PrivateGPT&lt;/a> is a production-ready AI project that allows you to inquire about your documents using Large Language Models (LLMs) with offline support.&lt;/p>
&lt;p>PrivateGPT uses Qdrant as the default vectorstore for ingesting and retrieving documents.&lt;/p>
&lt;h2 id="configuration">Configuration&lt;/h2>
&lt;p>Qdrant settings can be configured by setting values for the &lt;code>qdrant&lt;/code> property in the &lt;code>settings.yaml&lt;/code> file. By default, PrivateGPT tries to connect to a Qdrant instance at http://localhost:3000.&lt;/p>
&lt;p>Example:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="nt">qdrant&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">url&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;https://xyz-example.eu-central.aws.cloud.qdrant.io:6333&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">api_key&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;&amp;lt;your-api-key&amp;gt;&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The available &lt;a href="https://docs.privategpt.dev/manual/storage/vector-stores#qdrant-configuration" target="_blank" rel="noopener nofollow">configuration options&lt;/a> are:&lt;/p></description></item><item><title>Pulumi</title><link>https://qdrant.tech/documentation/cloud-tools/pulumi/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/cloud-tools/pulumi/</guid><description>&lt;p>&lt;img src="https://qdrant.tech/documentation/platforms/pulumi/pulumi-logo.png" alt="Pulumi Logo">&lt;/p>
&lt;p>Pulumi is an open source infrastructure as code tool for creating, deploying, and managing cloud infrastructure.&lt;/p>
&lt;p>A Qdrant SDK in any of Pulumi&amp;rsquo;s supported languages can be generated based on the &lt;a href="https://registry.terraform.io/providers/qdrant/qdrant-cloud/latest" target="_blank" rel="noopener nofollow">Qdrant Terraform Provider&lt;/a>.&lt;/p>
&lt;h2 id="pre-requisites">Pre-requisites&lt;/h2>
&lt;ol>
&lt;li>A &lt;a href="https://www.pulumi.com/docs/install/" target="_blank" rel="noopener nofollow">Pulumi Installation&lt;/a>.&lt;/li>
&lt;li>An &lt;a href="https://qdrant.tech/documentation/cloud-api/#authentication-connecting-to-cloud-api">API key&lt;/a> to access the Qdrant cloud API.&lt;/li>
&lt;/ol>
&lt;h2 id="setup">Setup&lt;/h2>
&lt;ul>
&lt;li>Create a Pulumi project in any of the &lt;a href="https://www.pulumi.com/docs/languages-sdks/" target="_blank" rel="noopener nofollow">supported languages&lt;/a> by running&lt;/li>
&lt;/ul>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">mkdir qdrant-pulumi &lt;span class="o">&amp;amp;&amp;amp;&lt;/span> &lt;span class="nb">cd&lt;/span> qdrant-pulumi
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">pulumi new &lt;span class="s2">&amp;#34;&amp;lt;LANGUAGE&amp;gt;&amp;#34;&lt;/span> -y
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;ul>
&lt;li>Generate a Pulumi SDK for Qdrant by running the following in your Pulumi project directory.&lt;/li>
&lt;/ul>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">pulumi package add terraform-provider registry.terraform.io/qdrant/qdrant-cloud
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;ul>
&lt;li>Set the Qdrant cloud API key as a config value.&lt;/li>
&lt;/ul>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">pulumi config &lt;span class="nb">set&lt;/span> qdrant-cloud:apiKey &lt;span class="s2">&amp;#34;&amp;lt;QDRANT_CLOUD_API_KEY&amp;gt;&amp;#34;&lt;/span> --secret
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="example-usage">Example Usage&lt;/h2>
&lt;p>The following example creates a new Qdrant cluster in Google Cloud Platform (GCP) and returns the URL of the cluster.&lt;/p></description></item><item><title>RAG Evaluation guide</title><link>https://qdrant.tech/rag/rag-evaluation-guide/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/rag/rag-evaluation-guide/</guid><description/></item><item><title>Redpanda Connect</title><link>https://qdrant.tech/documentation/data-management/redpanda/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/data-management/redpanda/</guid><description>&lt;p>&lt;img src="https://qdrant.tech/documentation/data-management/redpanda/redpanda-cover.png" alt="Redpanda Cover">&lt;/p>
&lt;p>&lt;a href="https://www.redpanda.com/connect" target="_blank" rel="noopener nofollow">Redpanda Connect&lt;/a> is a declarative data-agnostic streaming service designed for efficient, stateless processing steps. It offers transaction-based resiliency with back pressure, ensuring at-least-once delivery when connecting to at-least-once sources with sinks, without the need to persist messages during transit.&lt;/p>
&lt;p>Connect pipelines are configured using a YAML file, which organizes components hierarchically. Each section represents a different component type, such as inputs, processors and outputs, and these can have nested child components and &lt;a href="https://docs.redpanda.com/redpanda-connect/configuration/interpolation/" target="_blank" rel="noopener nofollow">dynamic values&lt;/a>.&lt;/p></description></item><item><title>Rig-rs</title><link>https://qdrant.tech/documentation/frameworks/rig-rs/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/rig-rs/</guid><description>&lt;h1 id="rig-rs">Rig-rs&lt;/h1>
&lt;p>&lt;a href="http://rig.rs" target="_blank" rel="noopener nofollow">Rig&lt;/a> is a Rust library for building scalable, modular, and ergonomic LLM-powered applications. It has full support for LLM completion and embedding workflows with minimal boiler plate.&lt;/p>
&lt;p>Rig supports Qdrant as a vectorstore to ingest and search for documents semantically.&lt;/p>
&lt;h2 id="installation">Installation&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-console" data-lang="console">&lt;span class="line">&lt;span class="cl">&lt;span class="go">cargo add rig-core rig-qdrant qdrant-client
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="usage">Usage&lt;/h2>
&lt;p>Here&amp;rsquo;s an example ingest and retrieve flow using Rig and Qdrant.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-rust" data-lang="rust">&lt;span class="line">&lt;span class="cl">&lt;span class="k">use&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">qdrant_client&lt;/span>::&lt;span class="p">{&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="n">qdrant&lt;/span>::&lt;span class="p">{&lt;/span>&lt;span class="n">PointStruct&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">QueryPointsBuilder&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">UpsertPointsBuilder&lt;/span>&lt;span class="p">},&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="n">Payload&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">Qdrant&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="p">};&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">use&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">rig&lt;/span>::&lt;span class="p">{&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="n">embeddings&lt;/span>::&lt;span class="n">EmbeddingsBuilder&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="n">providers&lt;/span>::&lt;span class="n">openai&lt;/span>::&lt;span class="p">{&lt;/span>&lt;span class="n">Client&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="no">TEXT_EMBEDDING_3_SMALL&lt;/span>&lt;span class="p">},&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="n">vector_store&lt;/span>::&lt;span class="n">VectorStoreIndex&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="p">};&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">use&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">rig_qdrant&lt;/span>::&lt;span class="n">QdrantVectorStore&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">use&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">serde_json&lt;/span>::&lt;span class="n">json&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">const&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="no">COLLECTION_NAME&lt;/span>: &lt;span class="kp">&amp;amp;&lt;/span>&lt;span class="kt">str&lt;/span> &lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s">&amp;#34;rig-collection&amp;#34;&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="c1">// Initialize Qdrant client.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">let&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">client&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">Qdrant&lt;/span>::&lt;span class="n">from_url&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;http://localhost:6334&amp;#34;&lt;/span>&lt;span class="p">).&lt;/span>&lt;span class="n">build&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="o">?&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="c1">// Initialize OpenAI client.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">let&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">openai_client&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">Client&lt;/span>::&lt;span class="n">new&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;&amp;lt;OPENAI_API_KEY&amp;gt;&amp;#34;&lt;/span>&lt;span class="p">);&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="kd">let&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">model&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">openai_client&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">embedding_model&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="no">TEXT_EMBEDDING_3_SMALL&lt;/span>&lt;span class="p">);&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="kd">let&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">documents&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">EmbeddingsBuilder&lt;/span>::&lt;span class="n">new&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">model&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">clone&lt;/span>&lt;span class="p">())&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">simple_document&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;0981d983-a5f8-49eb-89ea-f7d3b2196d2e&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s">&amp;#34;Definition of a *flurbo*: A flurbo is a green alien that lives on cold planets&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">simple_document&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;62a36d43-80b6-4fd6-990c-f75bb02287d1&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s">&amp;#34;Definition of a *glarb-glarb*: A glarb-glarb is a ancient tool used by the ancestors of the inhabitants of planet Jiro to farm the land.&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">simple_document&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;f9e17d59-32e5-440c-be02-b2759a654824&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s">&amp;#34;Definition of a *linglingdong*: A term used by inhabitants of the far side of the moon to describe humans.&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">build&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">await&lt;/span>&lt;span class="o">?&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="kd">let&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">points&lt;/span>: &lt;span class="nb">Vec&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">PointStruct&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">documents&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">into_iter&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="o">|&lt;/span>&lt;span class="n">d&lt;/span>&lt;span class="o">|&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="kd">let&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">vec&lt;/span>: &lt;span class="nb">Vec&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="kt">f32&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">d&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">embeddings&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">].&lt;/span>&lt;span class="n">vec&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">iter&lt;/span>&lt;span class="p">().&lt;/span>&lt;span class="n">map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="o">|&amp;amp;&lt;/span>&lt;span class="n">x&lt;/span>&lt;span class="o">|&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">x&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">as&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kt">f32&lt;/span>&lt;span class="p">).&lt;/span>&lt;span class="n">collect&lt;/span>&lt;span class="p">();&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="n">PointStruct&lt;/span>::&lt;span class="n">new&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="n">d&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">id&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="n">vec&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="n">Payload&lt;/span>::&lt;span class="n">try_from&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="fm">json!&lt;/span>&lt;span class="p">({&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="s">&amp;#34;document&amp;#34;&lt;/span>: &lt;span class="nc">d&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">document&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="p">}))&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">unwrap&lt;/span>&lt;span class="p">(),&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="p">})&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">collect&lt;/span>&lt;span class="p">();&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="n">client&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">upsert_points&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">UpsertPointsBuilder&lt;/span>::&lt;span class="n">new&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="no">COLLECTION_NAME&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">points&lt;/span>&lt;span class="p">))&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">await&lt;/span>&lt;span class="o">?&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="kd">let&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">query_params&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">QueryPointsBuilder&lt;/span>::&lt;span class="n">new&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="no">COLLECTION_NAME&lt;/span>&lt;span class="p">).&lt;/span>&lt;span class="n">with_payload&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="p">);&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="kd">let&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">vector_store&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">QdrantVectorStore&lt;/span>::&lt;span class="n">new&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">client&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">model&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">query_params&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">build&lt;/span>&lt;span class="p">());&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="kd">let&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">results&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">vector_store&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">top_n&lt;/span>::&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">serde_json&lt;/span>::&lt;span class="n">Value&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;Define a glarb-glarb?&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">await&lt;/span>&lt;span class="o">?&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="fm">println!&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;Results: &lt;/span>&lt;span class="si">{:?}&lt;/span>&lt;span class="s">&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">results&lt;/span>&lt;span class="p">);&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="further-reading">Further reading&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://rig.rs" target="_blank" rel="noopener nofollow">Rig-rs Documentation&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/0xPlaygrounds/rig" target="_blank" rel="noopener nofollow">Source Code&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Salesforce Mulesoft</title><link>https://qdrant.tech/documentation/platforms/mulesoft/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/platforms/mulesoft/</guid><description>&lt;h1 id="salesforce-mulesoft">Salesforce Mulesoft&lt;/h1>
&lt;p>&lt;a href="https://www.salesforce.com/in/mulesoft/anypoint-platform/" target="_blank" rel="noopener nofollow">MuleSoft Anypoint&lt;/a> is an integration platform to connect applications, data, and devices across on-premises and cloud environments. It provides a unified platform to build, manage, and secure APIs and integrations, making digital transformation smoother and more scalable.&lt;/p>
&lt;p>&lt;a href="https://mac-project.ai" target="_blank" rel="noopener nofollow">MAC Project&lt;/a> is an open-source initiative to bring AI capabilities into the MuleSoft ecosystem. It provides connectors to add AI capabilities to an Anypoint project by integrating LLMs, vector databases including Qdrant.&lt;/p></description></item><item><title>Semantic Search 101</title><link>https://qdrant.tech/documentation/tutorials-basics/search-beginners-local/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/tutorials-basics/search-beginners-local/</guid><description>&lt;h1 id="build-a-semantic-search-engine-in-5-minutes">Build a Semantic Search Engine in 5 Minutes&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 5 - 15 min&lt;/th>
 &lt;th>Level: Beginner&lt;/th>
 &lt;th>&lt;/th>
 &lt;th>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;blockquote>
&lt;p>There are two versions of this tutorial:&lt;/p>
&lt;ul>
&lt;li>With the version on this page, you&amp;rsquo;ll run Qdrant on your own machine. This requires you to manage your own cluster and vector embedding infrastructure.&lt;/li>
&lt;li>Alternatively, you can use Qdrant Cloud to deploy a cluster and generate vector embeddings using Qdrant Cloud&amp;rsquo;s &lt;strong>forever free&lt;/strong> tier (no credit card required). If you prefer this option, check out the &lt;a href="https://qdrant.tech/documentation/tutorials-basics/search-beginners/">Qdrant Cloud version of this tutorial&lt;/a>.&lt;/li>
&lt;/ul>
&lt;/blockquote>
&lt;p align="center">&lt;iframe width="560" height="315" src="https://www.youtube.com/embed/AASiqmtKo54" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen>&lt;/iframe>&lt;/p></description></item><item><title>Semantic-Router</title><link>https://qdrant.tech/documentation/frameworks/semantic-router/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/semantic-router/</guid><description>&lt;h1 id="semantic-router">Semantic-Router&lt;/h1>
&lt;p>&lt;a href="https://www.aurelio.ai/semantic-router/" target="_blank" rel="noopener nofollow">Semantic-Router&lt;/a> is a library to build decision-making layers for your LLMs and agents. It uses vector embeddings to make tool-use decisions rather than LLM generations, routing our requests using semantic meaning.&lt;/p>
&lt;p>Qdrant is available as a supported index in Semantic-Router for you to ingest route data and perform retrievals.&lt;/p>
&lt;h2 id="installation">Installation&lt;/h2>
&lt;p>To use Semantic-Router with Qdrant, install the &lt;code>qdrant&lt;/code> extra:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-console" data-lang="console">&lt;span class="line">&lt;span class="cl">&lt;span class="go">pip install semantic-router[qdrant]
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="usage">Usage&lt;/h2>
&lt;p>Set up &lt;code>QdrantIndex&lt;/code> with the appropriate configurations:&lt;/p></description></item><item><title>SmolAgents</title><link>https://qdrant.tech/documentation/frameworks/smolagents/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/smolagents/</guid><description>&lt;h1 id="smolagents">SmolAgents&lt;/h1>
&lt;p>HuggingFace &lt;a href="https://github.com/huggingface/smolagents" target="_blank" rel="noopener nofollow">SmolAgents&lt;/a> is a Python library for building AI agents. These agents write Python code to call tools and orchestrate other agents.&lt;/p>
&lt;p>It uses &lt;code>CodeAgent&lt;/code>, an LLM engine that writes its actions in code. SmolAgents reports that this approach works better than the common practice of letting the LLM output a dictionary of the tools it wants to call: it &lt;a href="https://huggingface.co/papers/2402.01030" target="_blank" rel="noopener nofollow">uses 30% fewer steps&lt;/a> (thus 30% fewer LLM calls)
and &lt;a href="https://huggingface.co/papers/2411.01747" target="_blank" rel="noopener nofollow">reaches higher performance on difficult benchmarks&lt;/a>.&lt;/p></description></item><item><title>Snowflake Models</title><link>https://qdrant.tech/documentation/embeddings/snowflake/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/embeddings/snowflake/</guid><description>&lt;h1 id="snowflake">Snowflake&lt;/h1>
&lt;p>Qdrant supports working with &lt;a href="https://www.snowflake.com/blog/introducing-snowflake-arctic-embed-snowflakes-state-of-the-art-text-embedding-family-of-models/" target="_blank" rel="noopener nofollow">Snowflake&lt;/a> text embedding models. You can find all the available models on &lt;a href="https://huggingface.co/Snowflake" target="_blank" rel="noopener nofollow">HuggingFace&lt;/a>.&lt;/p>
&lt;h3 id="setting-up-the-qdrant-and-snowflake-models">Setting up the Qdrant and Snowflake models&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">qdrant_client&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">QdrantClient&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">fastembed&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">TextEmbedding&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">qclient&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">QdrantClient&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;:memory:&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">embedding_model&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">TextEmbedding&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;snowflake/snowflake-arctic-embed-s&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">texts&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Qdrant is the best vector search engine!&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Loved by Enterprises and everyone building for low latency, high performance, and scale.&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="kr">import&lt;/span> &lt;span class="p">{&lt;/span>&lt;span class="nx">QdrantClient&lt;/span>&lt;span class="p">}&lt;/span> &lt;span class="kr">from&lt;/span> &lt;span class="s1">&amp;#39;@qdrant/js-client-rest&amp;#39;&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">import&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="nx">pipeline&lt;/span> &lt;span class="p">}&lt;/span> &lt;span class="kr">from&lt;/span> &lt;span class="s1">&amp;#39;@xenova/transformers&amp;#39;&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">client&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="nx">QdrantClient&lt;/span>&lt;span class="p">({&lt;/span> &lt;span class="nx">url&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s1">&amp;#39;http://localhost:6333&amp;#39;&lt;/span> &lt;span class="p">});&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">extractor&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">await&lt;/span> &lt;span class="nx">pipeline&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;feature-extraction&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;Snowflake/snowflake-arctic-embed-s&amp;#39;&lt;/span>&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">texts&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Qdrant is the best vector search engine!&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Loved by Enterprises and everyone building for low latency, high performance, and scale.&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The following example shows how to embed documents with the &lt;a href="https://huggingface.co/Snowflake/snowflake-arctic-embed-s" target="_blank" rel="noopener nofollow">&lt;code>snowflake-arctic-embed-s&lt;/code>&lt;/a> model that generates sentence embeddings of size 384.&lt;/p></description></item><item><title>Spring AI</title><link>https://qdrant.tech/documentation/frameworks/spring-ai/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/spring-ai/</guid><description>&lt;h1 id="spring-ai">Spring AI&lt;/h1>
&lt;p>&lt;a href="https://docs.spring.io/spring-ai/reference/" target="_blank" rel="noopener nofollow">Spring AI&lt;/a> is a Java framework that provides a &lt;a href="https://spring.io/" target="_blank" rel="noopener nofollow">Spring-friendly&lt;/a> API and abstractions for developing AI applications.&lt;/p>
&lt;p>Qdrant is available as a supported vector database for use within your Spring AI projects.&lt;/p>
&lt;h2 id="installation">Installation&lt;/h2>
&lt;p>You can find the Spring AI installation instructions &lt;a href="https://docs.spring.io/spring-ai/reference/getting-started.html" target="_blank" rel="noopener nofollow">here&lt;/a>.&lt;/p>
&lt;p>Add the Qdrant boot starter package.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-xml" data-lang="xml">&lt;span class="line">&lt;span class="cl">&lt;span class="nt">&amp;lt;dependency&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;lt;groupId&amp;gt;&lt;/span>org.springframework.ai&lt;span class="nt">&amp;lt;/groupId&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;lt;artifactId&amp;gt;&lt;/span>spring-ai-qdrant-store-spring-boot-starter&lt;span class="nt">&amp;lt;/artifactId&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nt">&amp;lt;/dependency&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="usage">Usage&lt;/h2>
&lt;p>Configure Qdrant with Spring Boot’s &lt;code>application.properties&lt;/code>.&lt;/p>
&lt;pre tabindex="0">&lt;code>spring.ai.vectorstore.qdrant.host=&amp;lt;host of your qdrant instance&amp;gt;
spring.ai.vectorstore.qdrant.port=&amp;lt;the GRPC port of your qdrant instance&amp;gt;
spring.ai.vectorstore.qdrant.api-key=&amp;lt;your api key&amp;gt;
spring.ai.vectorstore.qdrant.collection-name=&amp;lt;The name of the collection to use in Qdrant&amp;gt;
&lt;/code>&lt;/pre>&lt;p>Learn more about these options in the &lt;a href="https://docs.spring.io/spring-ai/reference/api/vectordbs/qdrant.html#qdrant-vectorstore-properties" target="_blank" rel="noopener nofollow">configuration reference&lt;/a>.&lt;/p></description></item><item><title>Stanford DSPy</title><link>https://qdrant.tech/documentation/frameworks/dspy/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/dspy/</guid><description>&lt;h1 id="stanford-dspy">Stanford DSPy&lt;/h1>
&lt;p>&lt;a href="https://github.com/stanfordnlp/dspy" target="_blank" rel="noopener nofollow">DSPy&lt;/a> is the framework for solving advanced tasks with language models (LMs) and retrieval models (RMs). It unifies techniques for prompting and fine-tuning LMs — and approaches for reasoning, self-improvement, and augmentation with retrieval and tools.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Provides composable and declarative modules for instructing LMs in a familiar Pythonic syntax.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Introduces an automatic compiler that teaches LMs how to conduct the declarative steps in your program.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>Qdrant can be used as a retrieval mechanism in the DSPy flow.&lt;/p></description></item><item><title>Swiftide</title><link>https://qdrant.tech/documentation/frameworks/swiftide/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/swiftide/</guid><description>&lt;h1 id="swiftide">Swiftide&lt;/h1>
&lt;p>Swiftide is a Rust library for building LLM applications. It supports everything from simple prompt completions to fast, streaming indexing and querying pipelines, and building composable agents that use tools or call other agents.&lt;/p>
&lt;h2 id="high-level-features">High level features&lt;/h2>
&lt;ul>
&lt;li>Simple primitives for common LLM tasks&lt;/li>
&lt;li>Streaming indexing and querying pipelines&lt;/li>
&lt;li>Composable agents and pipelines&lt;/li>
&lt;li>Modular, extendable API with minimal abstractions&lt;/li>
&lt;li>Integrations with popular LLMs and storage providers&lt;/li>
&lt;li>Built-in pipeline transformations (or bring your own)&lt;/li>
&lt;li>Graph-like workflows with Tasks&lt;/li>
&lt;li>&lt;a href="https://langfuse.com" target="_blank" rel="noopener nofollow">Langfuse&lt;/a> support&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="installation">Installation&lt;/h2>
&lt;p>Install Swiftide with Qdrant, OpenAI, and Redis support:&lt;/p></description></item><item><title>Sycamore</title><link>https://qdrant.tech/documentation/frameworks/sycamore/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/sycamore/</guid><description>&lt;h2 id="sycamore">Sycamore&lt;/h2>
&lt;p>&lt;a href="https://sycamore.readthedocs.io/en/stable/" target="_blank" rel="noopener nofollow">Sycamore&lt;/a> is an LLM-powered data preparation, processing, and analytics system for complex, unstructured documents like PDFs, HTML, presentations, and more. With Aryn, you can prepare data for GenAI and RAG applications, power high-quality document processing workflows, and run analytics on large document collections with natural language.&lt;/p>
&lt;p>You can use the Qdrant connector to write into and read documents from Qdrant collections.&lt;/p>
&lt;aside role="status">You can find an end-to-end example usage of the Qdrant connector &lt;a a target="_blank" href="https://github.com/aryn-ai/sycamore/blob/main/examples/simple_qdrant.py">here.&lt;/a>&lt;/aside>
&lt;h2 id="writing-to-qdrant">Writing to Qdrant&lt;/h2>
&lt;p>To write a Docset to a Qdrant collection in Sycamore, use the &lt;code>docset.write.qdrant(....)&lt;/code> function. The Qdrant writer accepts the following arguments:&lt;/p></description></item><item><title>Talk to Sales</title><link>https://qdrant.tech/lp/lucene/calendar/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/lp/lucene/calendar/</guid><description>&lt;div class="meetings-iframe-container" data-src="https://meetings-eu1.hubspot.com/seamus-deely/new-elastic-discovery-call-booked?embed=true">&lt;/div>
&lt;script type="text/javascript" src="https://static.hsappstatic.net/MeetingsEmbed/ex/MeetingsEmbedCode.js">&lt;/script></description></item><item><title>Terms and Conditions</title><link>https://qdrant.tech/legal/terms_and_conditions/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/legal/terms_and_conditions/</guid><description>&lt;h2 id="terms-and-conditions">Terms and Conditions&lt;/h2>
&lt;p>Last updated: December 10, 2021&lt;/p>
&lt;p>Please read these terms and conditions carefully before using Our Service.&lt;/p>
&lt;h3 id="interpretation-and-definitions">Interpretation and Definitions&lt;/h3>
&lt;h4 id="interpretation">Interpretation&lt;/h4>
&lt;p>The words of which the initial letter is capitalized have meanings defined under the following conditions. The following definitions shall have the same meaning regardless of whether they appear in singular or in plural.&lt;/p>
&lt;h4 id="definitions">Definitions&lt;/h4>
&lt;p>For the purposes of these Terms and Conditions:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Affiliate&lt;/strong> means an entity that controls, is controlled by or is under common control with a party, where &amp;ldquo;control&amp;rdquo; means ownership of 50% or more of the shares, equity interest or other securities entitled to vote for election of directors or other managing authority.&lt;/p></description></item><item><title>Terraform</title><link>https://qdrant.tech/documentation/cloud-tools/terraform/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/cloud-tools/terraform/</guid><description>&lt;p>&lt;img src="https://qdrant.tech/documentation/platforms/terraform/terraform.png" alt="Terraform Logo">&lt;/p>
&lt;p>HashiCorp Terraform is an infrastructure as code tool that lets you define both cloud and on-prem resources in human-readable configuration files that you can version, reuse, and share. You can then use a consistent workflow to provision and manage all of your infrastructure throughout its lifecycle.&lt;/p>
&lt;p>With the &lt;a href="https://registry.terraform.io/providers/qdrant/qdrant-cloud/latest" target="_blank" rel="noopener nofollow">Qdrant Terraform Provider&lt;/a>, you can manage the Qdrant cloud lifecycle leveraging all the goodness of Terraform.&lt;/p>
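&lt;p>A minimal provider block might look like the following; the source address comes from the registry listing above, while the variable name is only illustrative:&lt;/p>

```hcl
terraform {
  required_providers {
    qdrant-cloud = {
      # Source address as published on the Terraform Registry
      source = "qdrant/qdrant-cloud"
    }
  }
}

provider "qdrant-cloud" {
  # Hypothetical variable holding your Qdrant Cloud API key
  api_key = var.qdrant_cloud_api_key
}
```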
&lt;h2 id="pre-requisites">Pre-requisites&lt;/h2>
&lt;p>To use the Qdrant Terraform Provider, you&amp;rsquo;ll need:&lt;/p></description></item><item><title>Testcontainers</title><link>https://qdrant.tech/documentation/frameworks/testcontainers/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/testcontainers/</guid><description>&lt;h1 id="testcontainers">Testcontainers&lt;/h1>
&lt;p>&lt;a href="https://testcontainers.com/" target="_blank" rel="noopener nofollow">Testcontainers&lt;/a> is a testing library that provides easy and lightweight APIs for bootstrapping integration tests with real services wrapped in Docker containers.&lt;/p>
&lt;p>Qdrant is available as a &lt;a href="https://testcontainers.com/modules/qdrant/" target="_blank" rel="noopener nofollow">Testcontainers module&lt;/a> in multiple languages. It facilitates the spawning of a Qdrant instance for end-to-end testing.&lt;/p>
&lt;h2 id="usage">Usage&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="nn">org.testcontainers.qdrant.QdrantContainer&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="n">QdrantContainer&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">qdrantContainer&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">new&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">QdrantContainer&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;qdrant/qdrant&amp;#34;&lt;/span>&lt;span class="p">);&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;github.com/testcontainers/testcontainers-go&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;github.com/testcontainers/testcontainers-go/modules/qdrant&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nx">qdrantContainer&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">qdrant&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">RunContainer&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">testcontainers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">WithImage&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;qdrant/qdrant&amp;#34;&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="kr">import&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="nx">QdrantContainer&lt;/span> &lt;span class="p">}&lt;/span> &lt;span class="kr">from&lt;/span> &lt;span class="s2">&amp;#34;@testcontainers/qdrant&amp;#34;&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">qdrantContainer&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">await&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="nx">QdrantContainer&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;qdrant/qdrant&amp;#34;&lt;/span>&lt;span class="p">).&lt;/span>&lt;span class="nx">start&lt;/span>&lt;span class="p">();&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">testcontainers.qdrant&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">QdrantContainer&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">qdrant_container&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">QdrantContainer&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;qdrant/qdrant&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">start&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-csharp" data-lang="csharp">&lt;span class="line">&lt;span class="cl">&lt;span class="kt">var&lt;/span> &lt;span class="n">qdrantContainer&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">QdrantBuilder&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">.&lt;/span>&lt;span class="n">WithImage&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;qdrant/qdrant&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">.&lt;/span>&lt;span class="n">Build&lt;/span>&lt;span class="p">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">await&lt;/span> &lt;span class="n">qdrantContainer&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">StartAsync&lt;/span>&lt;span class="p">();&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Testcontainers modules provide options/methods to configure ENVs, volumes, and virtually everything you can configure in a Docker container.&lt;/p></description></item><item><title>ToolJet</title><link>https://qdrant.tech/documentation/platforms/tooljet/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/platforms/tooljet/</guid><description>&lt;h1 id="tooljet">ToolJet&lt;/h1>
&lt;p>&lt;a href="https://www.tooljet.com" target="_blank" rel="noopener nofollow">ToolJet&lt;/a> is a low-code platform for building business applications. Connect to databases, cloud storages, GraphQL, API endpoints, Airtable, Google sheets, OpenAI, etc and build apps using drag and drop application builder.&lt;/p>
&lt;h2 id="prerequisites">Prerequisites&lt;/h2>
&lt;ol>
&lt;li>A Qdrant instance to connect to. You can get a free cloud instance at &lt;a href="https://cloud.qdrant.io/" target="_blank" rel="noopener nofollow">cloud.qdrant.io&lt;/a>.&lt;/li>
&lt;li>A &lt;a href="https://www.tooljet.com" target="_blank" rel="noopener nofollow">ToolJet instance&lt;/a> to develop your workflows.&lt;/li>
&lt;/ol>
&lt;h2 id="setting-up">Setting Up&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Search for the Qdrant plugin in the Tooljet &lt;a href="https://docs.tooljet.ai/docs/marketplace/plugins/marketplace-plugin-qdrant/" target="_blank" rel="noopener nofollow">plugins marketplace&lt;/a>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Set up the connection to Qdrant using your instance credentials.&lt;/p></description></item><item><title>Twelve Labs</title><link>https://qdrant.tech/documentation/embeddings/twelvelabs/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/embeddings/twelvelabs/</guid><description>&lt;h1 id="twelve-labs">Twelve Labs&lt;/h1>
&lt;p>&lt;a href="https://twelvelabs.io" target="_blank" rel="noopener nofollow">Twelve Labs&lt;/a> Embed API provides powerful embeddings that represent videos, texts, images, and audio in a unified vector space. This space enables any-to-any searches across different types of content.&lt;/p>
&lt;p>By natively processing all modalities, it captures interactions like visual expressions, speech, and context, enabling advanced applications such as sentiment analysis, anomaly detection, and recommendation systems with precision and efficiency.&lt;/p>
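&lt;p>Because all modalities share one vector space, a text query embedding can be compared directly against a video or audio embedding with ordinary cosine similarity. A toy sketch (the vectors below are made up for illustration; real Twelve Labs embeddings are much higher-dimensional):&lt;/p>

```python
import math


def cosine_similarity(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings living in the shared any-to-any space.
video_clip = [0.9, 0.1, 0.3]
audio_clip = [0.1, 0.9, 0.4]
text_query = [0.8, 0.2, 0.25]

# Rank both items against the text query, exactly as a vector
# search engine would when serving a cross-modal query.
sim_video = cosine_similarity(text_query, video_clip)
sim_audio = cosine_similarity(text_query, audio_clip)
```

&lt;p>Here the text query ranks the video clip above the audio clip, which is the behavior a cross-modal search relies on.&lt;/p>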
&lt;p>We&amp;rsquo;ll look at how to work with Twelve Labs embeddings in Qdrant via the Python and Node SDKs.&lt;/p></description></item><item><title>txtai</title><link>https://qdrant.tech/documentation/frameworks/txtai/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/txtai/</guid><description>&lt;h1 id="txtai">txtai&lt;/h1>
&lt;p>Qdrant can also be used as an embeddings backend in &lt;a href="https://neuml.github.io/txtai/" target="_blank" rel="noopener nofollow">txtai&lt;/a> semantic applications.&lt;/p>
&lt;p>txtai simplifies building AI-powered semantic search applications using Transformers. It leverages neural embeddings to encode high-dimensional data in a lower-dimensional space, making it possible to find similar objects based on the proximity of their embeddings.&lt;/p>
&lt;p>Qdrant is not a built-in txtai backend and requires installing an additional dependency:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">pip install qdrant-txtai
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Examples and more information can be found in the &lt;a href="https://github.com/qdrant/qdrant-txtai" target="_blank" rel="noopener nofollow">qdrant-txtai repository&lt;/a>.&lt;/p></description></item><item><title>Unstructured</title><link>https://qdrant.tech/documentation/data-management/unstructured/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/data-management/unstructured/</guid><description>&lt;h1 id="unstructured">Unstructured&lt;/h1>
&lt;p>&lt;a href="https://unstructured.io/" target="_blank" rel="noopener nofollow">Unstructured&lt;/a> is a library designed to help preprocess, structure unstructured text documents for downstream machine learning tasks.&lt;/p>
&lt;p>Qdrant can be used as an ingestion destination in Unstructured.&lt;/p>
&lt;h2 id="setup">Setup&lt;/h2>
&lt;p>Install Unstructured with the &lt;code>qdrant&lt;/code> extra.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">pip install &lt;span class="s2">&amp;#34;unstructured-ingest[qdrant]&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="usage">Usage&lt;/h2>
&lt;p>Depending on your use case, you can use the command line or call it from within your application.&lt;/p>
&lt;h3 id="cli">CLI&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">unstructured-ingest &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> &lt;span class="nb">local&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --input-path &lt;span class="nv">$LOCAL_FILE_INPUT_DIR&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --chunking-strategy by_title &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --embedding-provider huggingface &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --partition-by-api &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --api-key &lt;span class="nv">$UNSTRUCTURED_API_KEY&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --partition-endpoint &lt;span class="nv">$UNSTRUCTURED_API_URL&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --additional-partition-args&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;{\&amp;#34;split_pdf_page\&amp;#34;:\&amp;#34;true\&amp;#34;, \&amp;#34;split_pdf_allow_failed\&amp;#34;:\&amp;#34;true\&amp;#34;, \&amp;#34;split_pdf_concurrency_level\&amp;#34;: 15}&amp;#34;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> qdrant-cloud &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --url &lt;span class="nv">$QDRANT_URL&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --api-key &lt;span class="nv">$QDRANT_API_KEY&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --collection-name &lt;span class="nv">$QDRANT_COLLECTION&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --batch-size &lt;span class="m">50&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --num-processes &lt;span class="m">1&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>For a full list of the options the CLI accepts, run &lt;code>unstructured-ingest &amp;lt;upstream connector&amp;gt; qdrant --help&lt;/code>&lt;/p></description></item><item><title>Upstage</title><link>https://qdrant.tech/documentation/embeddings/upstage/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/embeddings/upstage/</guid><description>&lt;h1 id="upstage">Upstage&lt;/h1>
&lt;p>Qdrant supports working with the Solar Embeddings API from &lt;a href="https://upstage.ai/" target="_blank" rel="noopener nofollow">Upstage&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://developers.upstage.ai/docs/apis/embeddings" target="_blank" rel="noopener nofollow">Solar Embeddings&lt;/a> API features dual models for user queries and document embedding, within a unified vector space, designed for performant text processing.&lt;/p>
&lt;p>You can generate an API key to authenticate the requests from the &lt;a href="https://console.upstage.ai/api-keys" target="_blank" rel="noopener nofollow">Upstage Console&lt;/a>.&lt;/p>
&lt;h3 id="setting-up-the-qdrant-client-and-upstage-session">Setting up the Qdrant client and Upstage session&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">requests&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">qdrant_client&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">QdrantClient&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">UPSTAGE_BASE_URL&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;https://api.upstage.ai/v1/solar/embeddings&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">UPSTAGE_API_KEY&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;&amp;lt;YOUR_API_KEY&amp;gt;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">upstage_session&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">requests&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Session&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">client&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">QdrantClient&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">url&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;http://localhost:6333&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">headers&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Authorization&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;Bearer &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">UPSTAGE_API_KEY&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Accept&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;application/json&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">texts&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Qdrant is the best vector search engine!&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Loved by Enterprises and everyone building for low latency, high performance, and scale.&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="kr">import&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="nx">QdrantClient&lt;/span> &lt;span class="p">}&lt;/span> &lt;span class="kr">from&lt;/span> &lt;span class="s1">&amp;#39;@qdrant/js-client-rest&amp;#39;&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">UPSTAGE_BASE_URL&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;https://api.upstage.ai/v1/solar/embeddings&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">UPSTAGE_API_KEY&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;&amp;lt;YOUR_API_KEY&amp;gt;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">client&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="nx">QdrantClient&lt;/span>&lt;span class="p">({&lt;/span> &lt;span class="nx">url&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s1">&amp;#39;http://localhost:6333&amp;#39;&lt;/span> &lt;span class="p">});&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">headers&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Authorization&amp;#34;&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;Bearer &amp;#34;&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="nx">UPSTAGE_API_KEY&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Accept&amp;#34;&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;application/json&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Content-Type&amp;#34;&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;application/json&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">texts&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Qdrant is the best vector search engine!&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Loved by Enterprises and everyone building for low latency, high performance, and scale.&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The following example shows how to embed documents with the recommended &lt;code>solar-embedding-1-large-passage&lt;/code> and &lt;code>solar-embedding-1-large-query&lt;/code> models, which generate sentence embeddings of size 4096.&lt;/p></description></item><item><title>Vanna.AI</title><link>https://qdrant.tech/documentation/frameworks/vanna-ai/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/vanna-ai/</guid><description>&lt;h1 id="vannaai">Vanna.AI&lt;/h1>
&lt;p>&lt;a href="https://vanna.ai/" target="_blank" rel="noopener nofollow">Vanna&lt;/a> is a Python package that uses retrieval augmentation to help you generate accurate SQL queries for your database using LLMs.&lt;/p>
&lt;p>Vanna works in two easy steps: train a RAG &amp;ldquo;model&amp;rdquo; on your data, then ask questions, which return SQL queries that can be set up to run automatically on your database.&lt;/p>
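The two steps above can be sketched as follows. This is a minimal sketch, assuming Vanna's Qdrant integration (`Qdrant_VectorStore`) combined with an OpenAI-backed chat model; the config keys, placeholder API key, and example table are illustrative, and your Qdrant URL and credentials will differ:

```python
from qdrant_client import QdrantClient
from vanna.openai import OpenAI_Chat
from vanna.qdrant import Qdrant_VectorStore


# Combine a vector store (Qdrant) with an LLM (OpenAI) into one Vanna "model"
class MyVanna(Qdrant_VectorStore, OpenAI_Chat):
    def __init__(self, config=None):
        Qdrant_VectorStore.__init__(self, config=config)
        OpenAI_Chat.__init__(self, config=config)


vn = MyVanna(config={
    "client": QdrantClient(url="http://localhost:6333"),  # your Qdrant instance
    "api_key": "YOUR_OPENAI_API_KEY",                     # hypothetical placeholder
    "model": "gpt-4",
})

# Step 1: train the RAG "model" on your schema
vn.train(ddl="CREATE TABLE customers (id INT, name TEXT, city TEXT)")

# Step 2: ask a question; Vanna retrieves relevant context from Qdrant
# and generates a SQL query you can run against your database
sql = vn.ask("How many customers are in each city?")
```

Running this end to end requires a reachable Qdrant instance and a valid OpenAI API key.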
&lt;p>Qdrant is available as a supported vector store for ingesting and retrieving your RAG data.&lt;/p></description></item><item><title>VectaX - Mirror Security</title><link>https://qdrant.tech/documentation/frameworks/mirror-security/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/mirror-security/</guid><description>&lt;p>&lt;img src="https://qdrant.tech/documentation/frameworks/mirror-security/vectax-logo.png" alt="VectaX Logo">&lt;/p>
&lt;p>&lt;a href="https://mirrorsecurity.io/vectax" target="_blank" rel="noopener nofollow">VectaX&lt;/a> by Mirror Security is an AI-centric access control and encryption system designed for managing and protecting vector embeddings. It combines similarity-preserving encryption with fine-grained RBAC to enable secure storage, retrieval, and operations on vector data.&lt;/p>
&lt;p>It can be integrated with Qdrant to secure vector searches.&lt;/p>
&lt;p>We&amp;rsquo;ll see how to do so using basic VectaX vector encryption and its fine-grained RBAC mechanism. You can obtain an API key and the Mirror SDK from the &lt;a href="https://platform.mirrorsecurity.io/en/login" target="_blank" rel="noopener nofollow">Mirror Security Platform&lt;/a>.&lt;/p></description></item><item><title>Vectorize.io</title><link>https://qdrant.tech/documentation/platforms/vectorize/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/platforms/vectorize/</guid><description>&lt;h1 id="vectorizeio">Vectorize.io&lt;/h1>
&lt;p>&lt;a href="https://vectorize.io/" target="_blank" rel="noopener nofollow">Vectorize&lt;/a> is a SaaS platform that automates data extraction from &lt;a href="https://docs.vectorize.io/integrations/source-connectors" target="_blank" rel="noopener nofollow">several sources&lt;/a> and lets you quickly deploy real-time RAG pipelines for your unstructured data. It also includes evaluation to help figure out the best strategies for the RAG system.&lt;/p>
&lt;p>Vectorize pipelines natively integrate with Qdrant by converting unstructured data into vector embeddings and storing them in a collection. When a pipeline is running, any new change in the source data is immediately processed, keeping the vector index up-to-date.&lt;/p></description></item><item><title>VoltAgent</title><link>https://qdrant.tech/documentation/frameworks/voltagent/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/frameworks/voltagent/</guid><description>&lt;h1 id="voltagent">VoltAgent&lt;/h1>
&lt;p>&lt;a href="https://github.com/VoltAgent/voltagent" target="_blank" rel="noopener nofollow">VoltAgent&lt;/a> is a TypeScript-based open-source framework designed for developing AI agents that support modular tool integration, LLM coordination, and adaptable multi-agent architectures. The framework includes an integrated observability dashboard similar to n8n, enabling visual monitoring of agent operations, action tracking, and streamlined debugging capabilities.&lt;/p>
&lt;h2 id="installation">Installation&lt;/h2>
&lt;p>Create a new VoltAgent project with Qdrant integration:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">npm create voltagent-app@latest -- --example with-qdrant
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">cd&lt;/span> with-qdrant
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This command generates a fully configured project combining VoltAgent and Qdrant, including example data and two distinct agent implementation patterns.&lt;/p></description></item><item><title>Voyage AI</title><link>https://qdrant.tech/documentation/embeddings/voyage/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/embeddings/voyage/</guid><description>&lt;h1 id="voyage-ai">Voyage AI&lt;/h1>
&lt;p>Qdrant supports working with &lt;a href="https://voyageai.com/" target="_blank" rel="noopener nofollow">Voyage AI&lt;/a> embeddings. The supported models&amp;rsquo; list can be found &lt;a href="https://docs.voyageai.com/docs/embeddings" target="_blank" rel="noopener nofollow">here&lt;/a>.&lt;/p>
&lt;p>You can generate an API key from the &lt;a href="https://dash.voyageai.com/" target="_blank" rel="noopener nofollow">Voyage AI dashboard&lt;/a> to authenticate the requests.&lt;/p>
&lt;h3 id="setting-up-the-qdrant-and-voyage-clients">Setting up the Qdrant and Voyage clients&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">qdrant_client&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">QdrantClient&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">voyageai&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">VOYAGE_API_KEY&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;&amp;lt;YOUR_VOYAGEAI_API_KEY&amp;gt;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">qclient&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">QdrantClient&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;:memory:&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">vclient&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">voyageai&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Client&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">api_key&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">VOYAGE_API_KEY&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">texts&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Qdrant is the best vector search engine!&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Loved by Enterprises and everyone building for low latency, high performance, and scale.&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="kr">import&lt;/span> &lt;span class="p">{&lt;/span>&lt;span class="nx">QdrantClient&lt;/span>&lt;span class="p">}&lt;/span> &lt;span class="kr">from&lt;/span> &lt;span class="s1">&amp;#39;@qdrant/js-client-rest&amp;#39;&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">VOYAGEAI_BASE_URL&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;https://api.voyageai.com/v1/embeddings&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">VOYAGEAI_API_KEY&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;&amp;lt;YOUR_VOYAGEAI_API_KEY&amp;gt;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">client&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="nx">QdrantClient&lt;/span>&lt;span class="p">({&lt;/span> &lt;span class="nx">url&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s1">&amp;#39;http://localhost:6333&amp;#39;&lt;/span> &lt;span class="p">});&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">headers&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Authorization&amp;#34;&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;Bearer &amp;#34;&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="nx">VOYAGEAI_API_KEY&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Content-Type&amp;#34;&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;application/json&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">texts&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Qdrant is the best vector search engine!&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Loved by Enterprises and everyone building for low latency, high performance, and scale.&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The following example shows how to embed documents with the &lt;a href="https://docs.voyageai.com/docs/embeddings#model-choices" target="_blank" rel="noopener nofollow">&lt;code>voyage-large-2&lt;/code>&lt;/a> model that generates sentence embeddings of size 1536.&lt;/p></description></item><item><title>Welcome to Qdrant Cloud</title><link>https://qdrant.tech/documentation/cloud-intro/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://qdrant.tech/documentation/cloud-intro/</guid><description/></item></channel></rss>