From the course: Advanced LLMs with Retrieval Augmented Generation (RAG): Practical Projects for AI Applications
Hands-on lab: Hybrid search
- Here is our notebook about hybrid search. Let's go over it and see how we build it and how we use it. We have the usual visual improvements and the warning suppression. Let's start with the actual sparse index. Here we're going to use a library called bm25s, a Python implementation of the BM25 algorithm. We're going to build both the tokenizer and a stemmer; we'll talk about the stemmer in a second. We'll load those Python libraries, and we're going to load the chunks of the documents from the AI archive that we used before, after they were already chunked using semantic chunking, contextual retrieval, and all the magic we did so far. If we want to create a sparse index, we need a tokenizer. This tokenizer will be slightly different from the tokenizer we had in our embedding, because here we are searching for words that we know are common in the documents. Therefore, we're going to use a stemmer. A stemmer will remove "s" and "ing" and a few other…