T-SQL Tuesday #193: What I Wish I Knew Back When…

T-SQL Tuesday Logo

Welcome back to another edition of T-SQL Tuesday. This month’s edition is hosted by Mike Walsh, who asks authors to write two short notes – one to your past self, and one to your future self.

A Note to My Past Self…

Or, otherwise titled: learn from what I wish I had known then and know now.

Understand Business Value

And how to communicate it. This is the lesson that I wish I knew then, that I know now. As a technologist, especially my younger self, I never cared about the business side of things. I went to school for a computer science degree – not a business degree. Business folks are good at what they do, and I wanted to do what I wanted to do, which was play in tech.

But as we all know, oftentimes in our careers, to play with tech, we need buy-in from the business. There are a lot of things we must justify, and we need to understand the whys behind them.

  • Want to fix some code? Translate that to how it impacts the business.
  • Want to attend a conference or training? Understand how increased knowledge and skill will bring value to your business.

Along those lines, I also wish I knew and understood the value of writing a business plan. Or at least a business justification document. And there are several principles of communication that I wish I understood better… like focusing on solutions that drive to desired outcomes. Former me would have brushed off those terms as business-speak jargon, but nowadays I firmly believe in this mindset.

That’s it for now. Mike may have asked for a message to my future self, but instead Sebastian the dog is demanding his evening walk. 🙂

Until next time – thanks for reading!

SQL Server 2025 – Love for Standard Edition

Super excited that today, SQL Server 2025 is now Generally Available!

Standard Edition Enhancements

While all of the new features and such were available during Public Preview, one thing that was not public until now is that Standard Edition limits have been increased! Yay! The CPU core count limit is now 32 cores (or 4 sockets, whichever is lesser) and the max buffer pool memory per instance is now 256GB! Additionally, Resource Governor is now available for Standard Edition. And in SQL Server 2025, Resource Governor can also help you manage TempDB!
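For anyone who hasn’t touched Resource Governor before (it was an Enterprise Edition feature until now), here’s a minimal sketch of a basic setup. The pool, group, and function names are my own for illustration, and I’m not showing the new 2025 TempDB-specific options here:

USE master;
GO
-- Create a resource pool that caps CPU and memory for a class of workloads
CREATE RESOURCE POOL poolReporting
    WITH (MAX_CPU_PERCENT = 50, MAX_MEMORY_PERCENT = 50);
GO
-- Create a workload group that uses that pool
CREATE WORKLOAD GROUP wgReporting
    USING poolReporting;
GO
-- The classifier function decides which workload group each new session lands in
CREATE FUNCTION dbo.fnClassifyReporting()
RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    IF APP_NAME() LIKE N'%Reporting%'
        RETURN N'wgReporting';
    RETURN N'default';
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fnClassifyReporting);
ALTER RESOURCE GOVERNOR RECONFIGURE;
GO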

Why This Matters

I still encounter many organizations that rely on SQL Server Standard Edition. And many often delay upgrading SQL Server for one reason or another. In my opinion, the hardware limit changes now give organizations a much more compelling reason to consider 2025 as their next upgrade rather than going to 2022 next. Remember, if you don’t want any of the other new “fancy” 2025 features, you don’t have to use them and/or can turn them off via Database Scoped Configurations.

Check it Out Today

Developer download ISOs are available now. And oh yeah, don’t forget that you can now install Standard Developer Edition to match your Standard Edition prod environment too!

T-SQL Tuesday #192: SQL Server 2025!!!

T-SQL Tuesday Logo

Welcome back to another edition of T-SQL Tuesday! This month’s edition is hosted by Steve Jones, who is asking authors to share what they’re excited about in this new release.

Going Beyond Vector Search

Through the tight partnership between Pure Storage and Microsoft, I’ve had the privilege of “knowing what’s coming next” for quite a while now. It’s why I was able to get a jump start on Vector Search work for example. But that’s not what I’m going to talk about today.

At this point, Vector Search and AI may steal the spotlight, but there’s SO MUCH MORE in the new release! What I’m going to highlight may be the least “shiny” and “flashy” but I believe it’ll have broad impact – Availability Group enhancements.

AGs Under the Covers

In my role at Pure Storage, I’ve learned a tremendous amount about how AGs work under the covers. Pure offers storage array options for HA/DR replication, and I regularly have conversations with customers about the pros and cons of mixing and matching application-level options like AGs with storage array options.

One thing I never really fully understood, until I stumbled upon this blog about a year back, is the construct of Flow Control in AGs. In fact, if you’ve never checked out SQLTableTalk.com, take a moment to check it out – they offer a tremendous amount of deep content about SQL Server (with the two original founders being Microsoft employees – RIP Yvonne).

But I digress. All of the new options regarding AGs, like max ucs send boxcars, are there to improve the performance, resiliency, and failover efficiency of Availability Groups. And I feel many DBAs do not fully appreciate how AGs really work under the covers, which is why I feel it is worthwhile to highlight these resources.

Your Homework

If you use AGs today, I would encourage you to refresh your knowledge on how AGs communicate behind the scenes. Once you do, then look into the 2025 enhancements and I believe you’ll come away with a better appreciation of how 2025 further improves on Availability Groups.

TIL: How to Add Commas to Numeric Output

One thing that’s always driven me crazy is having large numbers in my resultsets without commas for readability. For a lot of different things I do, the more commas a number has, the more attention I want to give to a given value.

Image
Image

When blown up like this, these are easier to read… but when I have hundreds of values that I want to skim, not so much. I’d much prefer to have my used_space also only display 2 decimal places… it’s less noise and I don’t need any more precision for what I’m doing. I did not feel like doing something like CAST(column AS DECIMAL(10, 2)) to force the decimal places, and it still would not get me the commas I wanted.

Out of annoyance, I decided to search to see if someone had a simple UDF out there that could solve my problems. And lo and behold, the heavens opened up as I got an AI generated response in addition to my usual search results…

Image

Mind… blown!!!

I had sort of remembered when FORMAT() came out, but will confess that I never really bothered looking into it. I knew it could do date and time formatting, but had no idea it could add comma separators AND limit decimal places!!!

Image
Image

Note ‘N2’ for two decimal places and ‘N0’ (zero) for no decimal places.
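For reference, here’s the pattern with both format strings (the numbers are just made up for illustration, and I’m pinning the culture so the separators are predictable):

SELECT  FORMAT(1234567.891, 'N2', 'en-us') AS used_space_pretty,  -- 1,234,567.89
        FORMAT(9876543, 'N0', 'en-us')     AS row_count_pretty;   -- 9,876,543

Just remember that FORMAT() returns nvarchar, so it’s for display only – don’t use the formatted value for further math.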

Moral of this story – it’s never too late to learn something new.

T-SQL Tuesday #189: Musings on AI’s Impact at Work

T-SQL Tuesday Logo

Welcome back to another edition of T-SQL Tuesday. This month’s blog party host is Taiob Ali, who asks “How is AI changing our careers?”

Rewind a Year Ago…

Had you asked me this question a year ago, I would have said not at all. In the summer of 2024, I still looked at AI as a hyped-up fad technology. At that point, all I really knew of AI were stories of people using ChatGPT for research with flawed results, writing papers and blogs with hidden inaccuracies, and generating ridiculous images. To me, it was overblown hype and fluff.

What Changed Since Then?

Two things started to change my perspective. First was SQL Server 2025 – I got involved in Private Preview and one of the flagship enhancements was Vector Search support. The second was Anthony Nocentino (b), who started sharing with me how he was making use of AI while coding. I had a number of “what, really, you can do that?” moments. He showed me how these tools were more than just “dumb chatbots” but could take wider inputs like existing code, and how one could iteratively work with them to modify and make changes.

The industry is not only changing, but changing TOO FAST.

I had the fortune of attending an AI and Data conference here in Boston where I heard that quote, and it really resonated with me. I think back to other shifts and evolutions in technology in the last 30 years… the rise of the Internet… smartphones… computing… and I would argue that none of these have moved as fast as we’ve seen AI tech evolve in the last handful of years. While I hadn’t kept tabs on it before, I am very much doing so now.

Where Am I Today?

If you’ve been tracking my blog, you’ll see that I’ve really dug into Vector Search on SQL Server 2025. My “Practical AI in SQL Server 2025: Ollama Quick Start” blog has been one of my most popular blogs ever! And I am super happy with my new presentation: A Practical Introduction to Vector Search in SQL Server 2025.

How Am I Using AI Day to Day?

I don’t spend my day-to-day living in code, like I once did. But I do find myself having to dig deep and troubleshoot interesting things. Just the other day, I had a scenario where a customer was setting up a new WSFC cluster, but without shared storage, and they were confused as to why some nodes saw volumes presented and others did not. We have a Gemini enterprise license, so I used the Pro model to do some deeper research on my behalf. It generated a very in-depth report for me that helped teach me a ton about how Windows recognizes volumes that are presented to it when it is a standalone install vs. when it is a node in a cluster.

What was more important to me is that this Deep Research functionality provided over 40 different citations, enabling me to dig deeper into the sources it used to further validate its findings. Hallucinations (or false output) are definitely a real possibility (and the “why” is a completely separate but fascinating topic), so having a more robust output, with citations that I can validate, is a turning point for me.

Like It Or Not, Change is Here…

There’s still a lot of hype and a lot of noise to cut through. And there are also very sharp edges and definite threats and dangers too. But I also strongly believe that we must look past the ragebait headlines and dig deeper into the component technology pieces themselves. I believe that we, as technologists, need to better understand each component so that we can utilize these tools in a responsible, ethical manner.

The other aspect that I am very mindful of, as a data professional: what drives all of these tools? Data! I feel that we data professionals have an amazing opportunity to future-proof our career journey, if we embrace and build our expertise in this industry.

And here’s an AI generated image of a dog, using the AI image generation built into WordPress. I still prefer real photos but this ain’t too shabby.

Image
Prompt: Generate a realistic image of a brown and white dog sitting a desk, with a laptop in front of him, facing the camera. The dog should be a chihuahua or labrador mix, with pointed ears. Behind the dog should be an array of books and bookshelves.

Practical AI in SQL Server 2025: A Vector Demo Database For You

Today, I have the honor and pleasure of debuting a new presentation for MSSQLTips: A Practical Introduction to Vector Search in SQL Server 2025 (you can watch the recording here too). To accompany that new presentation, I opted to create a new demo database instead of retrofitting one of my existing demo databases. And I’m sharing it with you so you don’t have to go through the headache of taking an existing database and creating vector embeddings.

RecipesDemoDB

Background about the Database

This new database was built with SQL Server 2025 CTP 2.1. Backed up using ZSTD-High compression, it weighs in at around 16GB, striped across 8 backup files.
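If you want a head start on restoring it, a striped restore looks something like this. The paths and file names below are placeholders – adjust them for wherever you put the downloaded files, and run RESTORE FILELISTONLY first if you need the logical file names for MOVE:

RESTORE DATABASE RecipesDemoDB
FROM    DISK = N'D:\Backups\RecipesDemoDB_1.bak',
        DISK = N'D:\Backups\RecipesDemoDB_2.bak',
        DISK = N'D:\Backups\RecipesDemoDB_3.bak',
        DISK = N'D:\Backups\RecipesDemoDB_4.bak',
        DISK = N'D:\Backups\RecipesDemoDB_5.bak',
        DISK = N'D:\Backups\RecipesDemoDB_6.bak',
        DISK = N'D:\Backups\RecipesDemoDB_7.bak',
        DISK = N'D:\Backups\RecipesDemoDB_8.bak'
WITH    RECOVERY;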

The dbo.recipes table contains just under 500k recipes, and weighs in at about 2GB. This data was sourced from Kaggle and is a dump of recipes from food.com.

Next, there are other tables under the vectors schema that contain vector embeddings. The naming scheme is such that those tables correspond to the same-named column in dbo.recipes, e.g. dbo.recipes.description -> vectors.recipes_description. There is one table called recipes_other_cols, which is a JSON concatenation of some of the shorter columns from dbo.recipes – name, servings, and serving_size. Each of the vectors.* tables also has a vector index. All of the vector data is about 22 or 23GB, bringing the total database to about 24-25GB in full.
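To give you a feel for the shape of those tables, here’s a rough sketch of one of them. The exact column names are assumptions on my part – check the actual definitions in the database – but note that nomic-embed-text produces 768-dimension embeddings:

CREATE TABLE vectors.recipes_description
(
    recipe_id   INT           NOT NULL,  -- links back to dbo.recipes
    chunk       NVARCHAR(MAX) NULL,      -- the text that was embedded
    embedding   VECTOR(768)   NOT NULL   -- new VECTOR data type in SQL Server 2025
);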

There are also a few example stored procedures with KNN and ANN code examples. I would also suggest checking out my Practical Intro to Vector Search repo, which has some other demo code.
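If you want a preview of the KNN pattern before opening those stored procedures, it boils down to comparing a search vector against every stored embedding and keeping the closest matches. This is a sketch using the hypothetical column names from above, not the actual procedure code:

DECLARE @search_vector VECTOR(768);  -- populate with the embedding of your search phrase

SELECT TOP (10)
       r.name,
       VECTOR_DISTANCE('cosine', v.embedding, @search_vector) AS distance
FROM   vectors.recipes_description AS v
       INNER JOIN dbo.recipes AS r
            ON r.recipe_id = v.recipe_id
ORDER BY distance;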

You’ll still need to have Ollama set up and make a few changes to match your own environment. Make sure you use the same embedding model that I did (nomic-embed-text) so any vector embeddings you subsequently create match.

And finally, there is also a sub-folder on the demo-dbs repo that has all of the different “steps” that I took to create the various tables and generate the vector embeddings.

Why Should I Use this Database? Because Creating Vector Embeddings Takes Time

I am running a Lenovo P14S with an Nvidia GeForce 3080 GPU connected via TBT3 to an external GPU enclosure. For the ~500k recipes and 5 or 6 embedding tables, the whole process took an entire weekend. I don’t have an exact time, because I’d kick off one table to process, then come back later/the next day, validate the data, then run the next one. So yeah, it took a while, which is why I thought I’d share this database to save time for others.

Wrapping Up

If you decide to start using this demo database for your own learning and testing of vector search, I’d love to hear about it. And if you write any interesting demo code that you’d be willing to share, that’d be amazing as well! As always, please let me know if you run into any quirks or have any feedback.

Happy learning – thanks for reading!

Announcing Data Saturday Chicago 2026!!

On behalf of the team that brought you SQL Saturday Chicago, I am ecstatic to announce that we are returning as Data Saturday Chicago on March 14, 2026 at Harper College.


Image

Sign up to receive E-mail Updates


How We Got Here

Chicago has a rich history of SQL Saturday events – NINE events since 2010!!!

  1. SQLSaturday #31 – April 17, 2010
  2. SQLSaturday #67 – March 26, 2011
  3. SQLSaturday #119 – May 19, 2012
  4. SQLSaturday #211 – April 13, 2013
  5. SQLSaturday #291 – April 26, 2014
  6. SQLSaturday #484 – March 5, 2016
  7. SQLSaturday #600 – March 11, 2017
  8. SQLSaturday #719 – March 17, 2018
  9. SQLSaturday #825 – March 23, 2019

And we were ready to celebrate our 10th SQL Saturday event, SQL Saturday #945, on March 21st of 2020. But well, we all know how that went straight to hell.

My Personal Journey

Writing this blog post took me down memory lane. I first attended SQL Saturday Chicago in 2012, #119. For #211 in 2013, I opted to volunteer and get involved. As chance would have it, I went from helping stuff attendee bags on Friday night, to being in charge of room monitors all day Saturday! And that started my journey as an official co-organizer, from 2014 onward.

After we got sunk in 2020, I moved to Boston in 2021. As such, I stepped down as co-leader of the Chicago Suburban SQL Server User Group and stepped away from SQL Saturday Chicago. The remaining team found it challenging to reorganize and start again. Frankly speaking, we all got our asses kicked by 2020. The landscape for in-person events is a shadow of its former self, and rebooting an event is a hell of a lot more complicated than you might think.

Why Am I Doing This Again (Remotely)?

The answer is simple: I love our data community and the people.

By the community… for the community.

This community has given so much to me and I am more than happy to keep giving back. And I believe that I speak for the Data Saturday Chicago team when I say that this is one of our core values. Speaking of whom…

Introducing the 2026 Organizing Team

I am honored to be a part of an amazing team of volunteers, working together to put this event together for you (in alphabetical order):

  • Andy Yun
  • Bill Lescher
  • Bob Pusateri
  • Brandon Leach
  • Dave Bland
  • Frank Gill
  • Jared Karney
  • Lowry Kozlowski
  • Rod Carlson
  • Ross Reed
  • Wendy Pastrick – who has literally been organizing SQL Sat Chicago since the very first!

Why Rebrand as Data Saturday?

Since its inception, SQL Saturday was all about SQL Server. But back then, SQL Server was a much smaller product. It was possible to learn the entirety of what SQL Server had to offer (who remembers Microsoft Certified Masters – MCMs?).

But today, like the sizes of our databases (and my waistline), SQL Server has grown and evolved. We decided, collectively as a team, to also evolve and rebrand under Data Saturdays. The Microsoft Data Platform will continue to be our primary focus, but nowadays that encompasses so much more than simply SQL Server.

What to Expect Next

We’ve accomplished the hardest task, which is to lock down a suitable venue and date. Next, we will be focusing on our website, getting a registration platform set up, and opening a Call for Speakers. Please be patient with us as we are still a team of volunteers working on this in our spare time.

I hope you are as excited as we are. See you in March 2026!

P.S. Sign up to receive e-mail updates!

T-SQL Tuesday #188 – Growing the Next Generation

T-SQL Tuesday Logo

Welcome back to another edition of T-SQL Tuesday. This month’s topic is an inspirational one, hosted by John Sterrett. John would like us to blog about what we can do to help grow the data community.

Growing new speakers is a topic that I have done a lot with already, so I won’t beat that drum. Instead, what I will suggest is something that anyone can do:

TL;DR – Recognize, Ask, & Offer

Step 1: Recognize Talent and Potential in Others

If you’re reading this, you’re part of the data community. You may not speak or blog regularly, but nonetheless, you are a participant. As such, you can watch for and recognize others. Maybe you’re reading an interesting thread on the SQL Server sub-Reddit… maybe you’re watching a conversation on the SQL Slack channel… Look for statements by individuals that pique your interest.

Step 2: Ask or Invite Them to Contribute

You’ve noticed someone talking about something really interesting? Ask them to share more of what they know about that topic. Invite them to write something more in-depth about it. Being asked and invited to share more is validation for that other individual. And I believe that each one of us should always be looking for opportunities to validate others and raise each other up.

Step 3: Offer to Help

Anyone can help anyone else.

Stop – re-read that first statement.

You can help.

There’s a mental trap that many of us fall into (I used to be one of those), where we believe that we are not in a position to help, because we do not believe we are an expert. You do not have to be an expert to help. You can help simply by encouraging someone to get their ideas out there and become more involved in the data community. Words of encouragement… or saying thank you for something you found useful… these are all ways to help.

One More Twist

This blog has been written from the angle of asking you, the reader, to recognize potential in someone else, encouraging you to ask them to contribute, and offering your own help to further grow them.

Are YOU, whoever is reading this right now, the one who has that potential?

Look in the mirror… the answer is yes. You can be a part of the next generation… and there are many out there who will help you. Don’t know where to find someone? Ask… ask me… you’re reading my blog right now… contact me.

Practical AI in SQL Server 2025: Introducing Vector Search – Pt 1

SQL Server 2025 is on the horizon, and one of its new capabilities is support for Vector Search.

To start off, I’ll confess that I was an initial skeptic about AI and its practical uses. I’m not a fan of overused blanket terms/technology and clickbait fads… NoSQL will eliminate all SQL databases… Cloud means the end of on-prem data centers… blah, blah, blah. But I’ve taken time to dig deeper into Vector Search, its role under the umbrella of AI, and I now believe it has practical, legitimate value.

Help Me Find Something

SQL (Structured Query Language) is all about querying data to help us find data that we want. WHERE predicates enable us to do keyword searches, sometimes with wildcards, other times with ranges, and sometimes with crazy programmatic logic.

But this is not how we humans converse on a daily basis. If I’m talking to Deborah, I’m not going to ask her questions using formal SQL Syntax.

SELECT *
FROM dbo.dinner_recipes
WHERE ingredient_list LIKE '%chicken%'
  AND cooking_time_min <= 30
  AND rating > 4.0;

Instead, I’ll communicate with her with “natural” language.

“Hey Deborah. I see we have some chicken in the fridge. What can we make for dinner with that, in under 30 minutes, that’s something we’ve liked when we’ve cooked it before?”

And through the use of natural language, I’m not constrained with how I ask my questions. This gives my questioning abilities far greater power and flexibility.

“I see we have an onion, broccoli, and cheese in the fridge, plus some sourdough bread. Any ideas on what we can do with any of those things to accompany the chicken?”

So what if I could utilize the flexibility of natural language to query the data in a database? That is where vector search plays a key role in changing our data querying and retrieval methodology.

A High Level Breakdown First

Let’s pretend I have an application with search functionality. I’d argue that most applications’ searching capabilities are constrained and limited. We can only search via specific keywords, maybe with wildcards, ranges of values, and other fairly strict criteria.

But what if that application is upgraded with vector search capabilities? How would that work?

First, I could type a natural language question into an application. That application will encode that question into a vector; I’ll call it a “question vector” for the sake of conversation. Beforehand, I will have also encoded a subset of my human-readable data into vectors as well; I’ll refer to that set of vectors as my “dataset vectors.”

My question vector is then compared against my dataset vectors. Vector search will find a set of dataset vectors that are very close to my question vector. This is known as a Similarity Search. That result set of dataset vectors represents the information that answers my original question. Those dataset vectors are then passed to a Large Language Model (LLM). The LLM will then translate those dataset vectors and summarize the results into something that’s human consumable.

In a nutshell, I’ve just described an AI chatbot like ChatGPT. Yes, there are other details, nuances, and extra pieces of the puzzle, but this is a high-level introduction first.
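To make the similarity search step a little more concrete in SQL Server 2025 terms, here’s roughly what it looks like in T-SQL. The table and column names are made up for illustration, and the question vector is whatever the application produced when it encoded the user’s question:

DECLARE @question_vector VECTOR(768);  -- produced by an embedding model from the user's question

SELECT TOP (10)
       d.source_text,
       VECTOR_DISTANCE('cosine', d.embedding, @question_vector) AS distance
FROM   dbo.dataset_vectors AS d
ORDER BY distance;  -- smaller distance = more similar

Those top matches are what would then get handed off to the LLM to summarize.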

How Can This Help Me Today?

The thing to really understand about this methodology is that it is very useful for analysis of existing data, particularly when looking for data that is “similar to X” as opposed to data that is “equal to X”.

What I’d like to encourage you to do is to think creatively about the data in your databases today. Think about what questions you could ask of your data that you cannot easily ask now with traditional query predicates.

Would it be helpful if you could ask of your data…

Are there any common trends throughout?

Are there any anomalies or interesting outliers?

Do we have any data that is similar to this other data?

How About a Real Business Example?

Let’s pretend you have a support ticket system. You regularly get customer tickets for various issues. Each ticket has free text, like the original customer’s complaint write-up, maybe copies of customer/support e-mail exchanges, and internal notes added by support personnel and other internal employees.

If I’m a new support person and I get a ticket about Error 5920, it’s fairly easy to do a traditional keyword search on “Error 5920.” This will help me to find prior tickets and learn how it was resolved by others in the past.

Now, what if I have a customer-written description of an error scenario? Maybe it has some instructions to reproduce, but no explicit error message – how can I search for that? I might try to cherry-pick some key words to then search with, but that can be hit or miss.

Now, if my ticket data was also vectorized and I had vector search capabilities, I could now start asking natural language questions. “I have a customer that clicked this button on this page, and it resulted in this unexpected behavior… has anyone else reported this for build version 3.12 or later?” While any search has the potential to yield false positives, this now has greater possibility to zero in on exactly what you need.

If I have vector search, I can also do bigger picture analysis of data.

Find common issues:
For all tickets logged in the last 30 days, are there any common issues that were reported repeatedly?

Find regressions:
Compare tickets from Build 3.12, 3.5, and 4.0. Are there any new common issues that appeared in 3.12, did not in 3.5, but are now back in 4.0?

Find interesting issues:
For Build 4.0, were there any tickets with much lengthier working notes that might indicate a very complex issue was encountered?

Find anomalies:
Instead of common issues, what issues were reported five times or fewer in the last 60 days?

I would strongly argue that these are all very powerful questions to ask of your data, and they can be answered rapidly utilizing the power of Vector Search.

Have an Open Mind & Think Creatively

There’s been a lot of hype and noise around AI. A lot of it is fluff. But I hope that you now have a better idea why I believe that Vector Search can be an extremely powerful tool to more quickly gain key insights into your data in ways that were not reasonably possible before. I would encourage you to think deeper about how these ideas can benefit you in your applications and day-to-day.

Stay tuned for Part 2, when I dive deeper into what a Vector is, and how Vector Search works.

Thanks for reading!

P.S.

Want to give SQL Server 2025 and Vector Search a try? Go download the Public Preview now and you can use my guide to get Ollama set up on your local machine too.

SQL Server 2025 – New Backup Magic

SQL Server 2025 Public Preview is not even a week old, but I’m impressed with another new capability that was released – a new backup compression algorithm: ZSTD. This one came as a surprise, despite my being part of Private Preview, as it was only released with Public Preview.

TL;DR – ZSTD Can Be Fast!

Today, I’m only going to do a quick blog post to share some initial test results.

This is against a new database that I’m running on a VMware Workstation VM, on my laptop, on a Samsung 990 Pro. I’m backing up my new RecipesDemoDB, which is 25.13 GB in size.
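For context, the commands look roughly like this. The legacy algorithm is plain WITH COMPRESSION; for ZSTD you specify the algorithm, and the LEVEL option shown below is my best recollection of the preview syntax – treat it as an assumption and check the official documentation:

-- Legacy compression algorithm, striped across 4 files
BACKUP DATABASE RecipesDemoDB
TO  DISK = N'D:\Backups\Recipes_1.bak',
    DISK = N'D:\Backups\Recipes_2.bak',
    DISK = N'D:\Backups\Recipes_3.bak',
    DISK = N'D:\Backups\Recipes_4.bak'
WITH COMPRESSION;

-- ZSTD compression (LEVEL syntax assumed; Low is the default)
BACKUP DATABASE RecipesDemoDB
TO  DISK = N'D:\Backups\Recipes_1.bak',
    DISK = N'D:\Backups\Recipes_2.bak',
    DISK = N'D:\Backups\Recipes_3.bak',
    DISK = N'D:\Backups\Recipes_4.bak'
WITH COMPRESSION (ALGORITHM = ZSTD, LEVEL = MEDIUM);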

DISK = NUL (x4 files) – Speed:
  • COMPRESSION: 3057442 pages in 49.966 seconds (478.050 MB/sec)
  • ZSTD – Level = Low: 3057442 pages in 44.711 seconds (534.236 MB/sec)
  • ZSTD – Level = Medium: 3057442 pages in 33.428 seconds (714.558 MB/sec)
  • ZSTD – Level = High: 3057442 pages in 73.147 seconds (326.551 MB/sec)

DISK = 4 Backup Files – Speed | Total Backup Size | % size of original:
  • COMPRESSION: 3057434 pages in 80.761 seconds (295.764 MB/sec) | 16.52 GB | 65.74%
  • ZSTD – Level = Low: 3057434 pages in 39.920 seconds (598.351 MB/sec) | 17.37 GB | 69.13%
  • ZSTD – Level = Medium: 3057434 pages in 56.676 seconds (421.451 MB/sec) | 15.93 GB | 63.38%
  • ZSTD – Level = High: 3057434 pages in 94.440 seconds (252.924 MB/sec) | 15.86 GB | 63.10%

Observations and Thoughts

First, note that there are now three sub-options for ZSTD, which set the compression level. Low is the default. It’s interesting to see that in this initial (SINGLE) test, ZSTD-Low was fastest, but the final output was slightly larger than MSXpress (the legacy compression algorithm).

And a note about data composition… ~3-4GB consists of mostly text data (source) and the remaining ~20-21GB consists of vector embeddings + corresponding text chunks. Because of the nature of vector embeddings, that’ll impact the compressibility. I’ll be releasing this new database very soon with additional supporting documentation.

I’ll be writing much more on this topic later, but wanted to share these initial findings. I find them extremely compelling and am looking forward to testing this with larger databases on higher end hardware.

Thanks for reading!