From the course: Advanced LLMs with Retrieval Augmented Generation (RAG): Practical Projects for AI Applications
Hands-on lab: Hybrid search
- Here is our notebook about hybrid search. Let's go over it and see how we build it and how we use it. We have the usual visual improvements and the warning suppression. Let's start with the actual sparse index. Here we're going to use a library called bm25s, a Python implementation of the BM25 algorithm. We're going to build both the tokenizer and a stemmer; we'll talk about the stemmer in a second. We'll load those Python libraries, and we're going to load the chunks of the documents from the AI archive that we used before, after they were already chunked using semantic chunking, contextual retrieval, and all the magic we did so far. If we want to create a sparse index, we need a tokenizer. This tokenizer will be slightly different from the tokenizer we had in our embedding, because here we are searching for words that we know are common in the documents. Therefore, we're going to use a stemmer. A stemmer will remove "s" and "ing" and a few other…