SQL Server 2025 is on the horizon, and one of its new capabilities is support for Vector Search.
To start off, I’ll confess that I was an initial skeptic about AI and its practical uses. I’m not a fan of overused blanket terms/technology and clickbait fads… NoSQL will eliminate all SQL databases… Cloud means the end of on-prem data centers… blah, blah, blah. But I’ve taken time to dig deeper into Vector Search, its role under the umbrella of AI, and I now believe it has practical, legitimate value.
Help Me Find Something
SQL (Structured Query Language) is all about querying data to help us find data that we want. WHERE predicates enable us to do keyword searches, sometimes with wildcards, other times with ranges, and sometimes with crazy programmatic logic.
But this is not how we humans converse on a daily basis. If I’m talking to Deborah, I’m not going to ask her questions using formal SQL syntax.
SELECT *
FROM dbo.dinner_recipes
WHERE ingredient_list LIKE '%chicken%'
AND cooking_time_min <= 30
AND rating > 4.0;
Instead, I’ll communicate with her with “natural” language.
“Hey Deborah. I see we have some chicken in the fridge. What can we make for dinner with that, in under 30 minutes, that’s something that we’ve liked when we’ve cooked it before?”
And through the use of natural language, I’m not constrained in how I ask my questions. This gives my questioning abilities far greater power and flexibility.
“I see we have an onion, broccoli, and cheese in the fridge, plus some sourdough bread. Any ideas on what we can do with any of those things to accompany the chicken?”
So what if I could utilize the flexibility of natural language to query the data in a database? That is where vector search plays a key role in changing our data querying and retrieval methodology.
A High Level Breakdown First
Let’s pretend I have an application with search functionality. I’d argue that most applications’ searching capabilities are constrained and limited. We can only search via specific keywords, maybe with wildcards, ranges of values, and other fairly strict criteria.
But what if that application is upgraded with vector search capabilities? How would that work?
First, I could type a natural language question into an application. That application will encode that question into a vector; I’ll call it a “question vector” for conversation. Beforehand, I will have also encoded a subset of my human readable data into vectors as well; I’ll refer to that set of vectors as my “dataset vectors.”
My question vector is then compared against my dataset vectors. Vector search will find a set of dataset vectors that are very close to my question vector. This is known as a Similarity Search. That result set of dataset vectors represents the information that answers my original question. The data behind those dataset vectors is then passed to a Large Language Model (LLM). The LLM will then summarize those results into something that’s human consumable.
In a nutshell, I’ve just described an AI chatbot like ChatGPT. Yes, there are other details, nuances, and extra pieces of the puzzle, but this is a high-level introduction first.
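To make the comparison step a little more concrete, here’s a minimal sketch of a similarity search in plain Python. Everything in it is an assumption for illustration: the recipe names, the tiny 3-number vectors (real embedding models produce hundreds or thousands of dimensions), and the choice of cosine similarity as the “closeness” measure. It is not how SQL Server implements Vector Search; it only shows the idea of ranking dataset vectors by how close they are to a question vector.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up "dataset vectors". In a real system an embedding model would
# generate these from the recipe text; these numbers are hand-picked.
dataset_vectors = {
    "Lemon chicken skillet": [0.9, 0.0, 0.4],
    "Chicken and broccoli stir-fry": [0.8, 0.3, 0.1],
    "Chocolate lava cake": [0.0, 0.1, 0.9],
}

# Made-up "question vector" standing in for an encoded natural language question.
question_vector = [0.85, 0.2, 0.1]


def similarity_search(question, dataset, top_k=2):
    """Rank every dataset vector against the question and keep the closest top_k names."""
    scored = [(cosine_similarity(question, vec), name) for name, vec in dataset.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


top_matches = similarity_search(question_vector, dataset_vectors)
print(top_matches)
```

The two chicken recipes rank highest because their vectors point in nearly the same direction as the question vector, while the dessert’s vector points somewhere else entirely. The text behind those top matches is what would be handed to the LLM to summarize.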
How Can This Help Me Today?
The thing to really understand about this methodology is that it is very useful for analysis of existing data, particularly when looking for data that is “similar to X” as opposed to data that is “equal to X”.
What I’d like to encourage you to do is to think creatively about the data in your databases today. Think about what questions you could ask of your data that you cannot easily ask now with traditional query predicates.
Would it be helpful if you could ask of your data…
Are there any common trends throughout?
Are there any anomalies or interesting outliers?
Do we have any data that is similar to this other data?
How About a Real Business Example?
Let’s pretend you have a support ticket system. You regularly get customer tickets for various issues. Each ticket has free text, like the original customer’s complaint write-up, maybe copies of customer/support e-mail exchanges, and internal notes added by support personnel and other internal employees.
If I’m a new support person and I get a ticket about Error 5920, it’s fairly easy to do a traditional keyword search on “Error 5920.” This will help me to find prior tickets and learn how it was resolved by others in the past.
Now, what if I have a customer-written description of an error scenario? Maybe it has some instructions to reproduce, but no explicit error message. How can I search for that? I might try to cherry pick some key words to then search with, but that can be hit or miss.
Now, if my ticket data were also vectorized and I had vector search capabilities, I could start asking natural language questions. “I have a customer that clicked this button on this page, and it resulted in this unexpected behavior… has anyone else reported this for build version 3.12 or later?” While any search has the potential to yield false positives, this approach has a better chance of zeroing in on exactly what you need.
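As a rough sketch of that kind of question, the Python below combines a traditional filter (which builds to look at) with a similarity ranking on the ticket text. The tickets, their hand-picked 3-number vectors, the `builds_in_scope` set, and the 0.9 similarity cutoff are all hypothetical values chosen for illustration; a real system would get its vectors from an embedding model and would tune the cutoff against real data.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical tickets. Each "vector" is a made-up stand-in for an
# embedding of the ticket's free text.
tickets = [
    {"id": 101, "build": "2.9", "summary": "Save button freezes the page", "vector": [0.9, 0.1, 0.1]},
    {"id": 102, "build": "3.12", "summary": "Clicking Save hangs the form", "vector": [0.85, 0.2, 0.1]},
    {"id": 103, "build": "4.0", "summary": "Invoice totals are rounded wrong", "vector": [0.1, 0.9, 0.2]},
    {"id": 104, "build": "4.0", "summary": "Page locks up after pressing Save", "vector": [0.8, 0.1, 0.3]},
]

# Made-up vector for the new customer's description of the problem.
question_vector = [0.9, 0.15, 0.1]

# Assumption: the builds that count as "in scope" are listed explicitly.
builds_in_scope = {"3.12", "3.5", "4.0"}


def find_similar_tickets(question, candidates, builds, min_similarity=0.9):
    """Keep tickets from the given builds whose text vector is close to the question."""
    matches = []
    for ticket in candidates:
        if ticket["build"] not in builds:
            continue  # traditional predicate: exact match on build
        score = cosine_similarity(question, ticket["vector"])
        if score >= min_similarity:
            matches.append((score, ticket["id"]))
    matches.sort(reverse=True)  # most similar ticket first
    return [ticket_id for _, ticket_id in matches]


similar_ticket_ids = find_similar_tickets(question_vector, tickets, builds_in_scope)
print(similar_ticket_ids)
```

Notice that tickets 102 and 104 describe the same problem in different words, which a keyword search could easily miss. Ticket 101 is just as similar, but the ordinary build filter excludes it, so the two styles of searching complement each other.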
If I have vector search, I can also do bigger picture analysis of data.
Find common issues:
For all tickets logged in the last 30 days, are there any common issues that were reported repeatedly?
Find regressions:
Compare tickets from Build 3.12, 3.5, and 4.0. Are there any new common issues that appeared in 3.12, did not appear in 3.5, but are now back in 4.0?
Find interesting issues:
For Build 4.0, were there any tickets with much lengthier working notes that might indicate a very complex issue was encountered?
Find anomalies:
Instead of common issues, what issues were reported five times or fewer in the last 60 days?
I would strongly argue that these are all very powerful questions to ask of your data, that can be answered rapidly, utilizing the power of Vector Search.
Have an Open Mind & Think Creatively
There’s been a lot of hype and noise around AI. A lot of it is fluff. But I hope that you now have a better idea why I believe that Vector Search can be an extremely powerful tool to more quickly gain key insights into your data in ways that were not reasonably possible before. I would encourage you to think deeper about how these ideas can benefit you in your applications and day-to-day.
Stay tuned for Part 2, when I dive deeper into what a Vector is, and how Vector Search works.
Thanks for reading!
P.S.
Want to give SQL Server 2025 and Vector Search a try? Go download the Public Preview now and you can use my guide to get Ollama set up on your local machine too.