Re-Thinking What We’ve Known About Storage

TL;DR

I had the honor of recording not one, not two, but THREE lightboard videos for Pure Storage! And they’re up on YouTube for you to enjoy!

For today’s blog post, I want to focus on the subject matter of the first lightboard video: How Volumes work on FlashArray.

Storage is Simple, Right?

During my many years as a SQL Server database developer and administrator, I always thought of storage rather simplistically. I had a working knowledge of how spinning media worked, plus a high-level understanding of basic SAN and RAID architecture. And then flash media came along, and I recall learning about its differences and nuances.

But fundamentally, storage still remained a simple matter in my mind: it was the physical location where you write your data. Frankly, I never considered how storage and a SAN could offer much, much more than just that.

A Legacy of Spinning Platters

Many of us, myself included, grew up with spinning platters as our primary storage media. Over the years, engineers came up with a variety of creative ways to squeeze out better performance. One progression was to move from a single disk to many disks working together collectively in a SAN. That enabled us to stripe, or “parallelize,” a given workload across many disks rather than being stuck with the physical performance constraints of a single disk.

Carve It Up

[Diagram: a simplified SAN with 16 disks carved into two volumes]

In the above simplified example, we have a SAN with 16 disks, and let’s say each disk gives us 1,000 IOPS @ 4KB. I have a SQL Server whose workload needs 4,000 IOPS for its data files and 6,000 IOPS for its transaction log. So I would have to create two volumes, each containing the appropriate number of disks from the SAN (four and six, respectively), to give me the performance characteristics that my workload requires. (Remember, this is a SIMPLIFIED diagram to illustrate the general point.)
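
If you like to see that arithmetic spelled out, here’s a quick back-of-the-envelope sketch in Python (the 1,000 IOPS-per-disk figure is just the simplified assumption from the diagram):

import math

# Simplified assumption from the example: each disk delivers 1,000 IOPS @ 4KB
IOPS_PER_DISK = 1_000

def disks_needed(required_iops: int) -> int:
    # Round up, since a volume can't borrow a fraction of a disk
    return math.ceil(required_iops / IOPS_PER_DISK)

print(disks_needed(4_000))  # data file volume       -> 4 disks
print(disks_needed(6_000))  # transaction log volume -> 6 disks
# That's 10 of our 16 disks permanently pinned to this one SQL Server.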

Now imagine being a SAN admin trying to juggle hundreds of volumes across dozens of attached servers, each with its own performance demands. Not only is that a huge challenge to keep organized, but it’s highly unlikely that every server will have its performance demands met, given the finite number of disks available. What a headache, right?

But what if we were no longer limited by the constraints presented by spinning platters? Can we approach this differently?

Letting Go Of What We Once Knew

One thing that can be a challenge for many technologists, myself especially, is letting go of old practices. Oftentimes those practices were learned the hard way, so we want to make sure we never have to go through those rough times again. Even when we’re presented with new technology, we often stick to the “tried and true” way of doing certain things, because we know it works.

One of those “tried and true” things we can revisit with Pure Storage and FlashArray is the headache of carving up a SAN to get specific performance characteristics for our volumes. When Pure Storage first came to be, they focused solely on all-flash storage. As such, they were not tied to legacy spinning-disk paradigms and could dream up new ways of doing things that suited flash storage media.

Abstractions For The Win

On FlashArray, a volume is not a subset or physical allocation of storage media. Instead, a volume on FlashArray is just a collection of pointers to wherever the data actually landed.

Silly analogy: pretend you’re boarding a plane. On a traditional airline, first class typically boards first and sits in first class, then premium economy boards and goes to its section, then regular economy, and finally basic economy. But on Southwest Airlines, you choose your own seat. You board and simply sit wherever you wish (and pretend you report back to an employee which seat you’ve taken). Legacy storage is like that traditional airline: you (the data) are limited to sitting in your respective seat class, because that’s how the airplane was pre-allocated. On FlashArray, you’re not limited that way and can simply sit where you like, because you (the data) have access to sit anywhere.

Another way of describing it that might resonate: legacy storage assigned disk storage to a volume, and whatever data was written to that volume landed on the corresponding assigned disks. On FlashArray, the data can land anywhere on the entire array, and the volume it was written to simply stores a pointer to wherever the data wound up.
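
Purely as an illustration, here’s a toy model in Python of that difference (this is my own sketch, not Purity’s actual data structures):

# Toy model: a FlashArray-style volume is just a map of logical block
# addresses to wherever on the shared array the data actually landed.
# (Illustrative sketch only -- not how Purity is really implemented.)

array_media = {}     # physical location -> data, shared by ALL volumes
next_location = 0    # wherever the array chooses to land the next write

class Volume:
    def __init__(self, name):
        self.name = name
        self.pointers = {}  # logical block address -> physical location

    def write(self, lba, data):
        global next_location
        array_media[next_location] = data   # data lands anywhere on the array...
        self.pointers[lba] = next_location  # ...the volume just keeps a pointer
        next_location += 1

    def read(self, lba):
        return array_media[self.pointers[lba]]

sql_data = Volume("sql-data")
sql_log = Volume("sql-log")
sql_data.write(0, "a data page")
sql_log.write(0, "a log record")
print(sql_data.read(0))   # 'a data page'

Notice there’s no notion of disks assigned to a volume anywhere in the model; the volume is nothing but its pointer table.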

Fundamental Changes Make a Difference

This key fundamental change in how FlashArray stores your data opens up a huge realm of other interesting capabilities that were either not possible or much more difficult to accomplish on spinning platters. I like to describe it as software-enhanced storage, because there are many things happening in the software layer besides just “writing your data to disk.” In fact, we’re not quite writing your raw data to disk… there’s an element of “pre-processing” that takes place. But that’s another blog for another day.

Take 3 Minutes for a Good Laugh

If you want to watch me draw some diagrams on a lightboard that illustrate all of this, then please go watch How Volumes work on FlashArray. It’s only a few minutes long, and listening to me at 2x speed is quite entertaining in and of itself. Just be sure to re-watch it to actually absorb the content, because I guarantee you’ll be laughing your ass off at me chattering at 2x speed. 🙂

Thanks for reading!

T-SQL Tuesday #159: SQL Server 2022 & Purvi’s List

Welcome to another edition of T-SQL Tuesday. This month’s blog party is hosted by Deepthi Goguri (b|t) who asks participants to blog about their new favorite feature in SQL Server 2022.

In my opinion, SQL Server 2022 is absolutely a major release, with significant enhancements that should make it compelling to upgrade rather than wait for another release down the line. I’m thrilled about the improvements in Intelligent Query Processing, TempDB, and (after the training helped me ‘get it’) Arc-enabled SQL Servers. But that’s not what I want to blog about today.

It’s Often the Little Things

By sheer coincidence, I had the privilege of being invited to a private SQL Server 2022 workshop taught by Bob Ward last week. Through my job, I also had the privilege of doing some testing work around QAT backups and S3 Data Virtualization during the private preview phase last summer. So while I had exposure and access to SQL Server 2022 for much longer than most, there were many things Microsoft loaded into the 2022 release that I had barely skimmed or didn’t even know were there.

Towards the end of the workshop, Bob presented a slide called Purvi’s List. Purvi Shah is an engineer on the SQL performance team and as Bob said, “spends her time finding ways to make SQL Server and Azure SQL faster.” When Bob put up Purvi’s List, I let out an audible “holy shit,” much to Grant’s amusement.

[Image: Purvi’s List slide]

So what caught me by surprise?

Instant File Initialization (IFI) for Transaction Logs

Okay, that’s cool!

For simplicity’s sake, I’ll just quote the documentation (as written today):

Transaction log files cannot be initialized instantaneously, however, starting with SQL Server 2022 (16.x), instant file initialization can benefit transaction log autogrowth events up to 64 MB. The default auto growth size increment for new databases is 64 MB. Transaction log file autogrowth events larger than 64 MB cannot benefit from instant file initialization.

https://learn.microsoft.com/en-us/sql/relational-databases/databases/database-instant-file-initialization?view=sql-server-ver16

So yeah, it is limited to a 64MB growth size. But another entry on Purvi’s List is that the VLF algorithm has also been improved.

If growth is less than 64 MB, create 4 VLFs that cover the growth size (for example, for 1 MB growth, create 4 VLFs of size 256 KB). … and starting with SQL Server 2022 (16.x) (all editions), this is slightly different. If the growth is less than or equal to 64 MB, the Database Engine creates only 1 VLF to cover the growth size.

https://learn.microsoft.com/en-us/sql/relational-databases/sql-server-transaction-log-architecture-and-management-guide?view=sql-server-ver16
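
To make that before-and-after concrete, here’s a little sketch of just the small-growth piece of the algorithm quoted above (growths larger than 64MB follow different tiers, which I’m not modeling here):

# Sketch of the small-growth VLF behavior quoted above.
# Only covers growths up to 64MB; larger growths use different rules.

def vlfs_for_small_growth(growth_mb: float, sql2022: bool) -> list:
    """Sizes (in MB) of the VLFs created for one autogrowth event."""
    if sql2022:
        return [growth_mb]          # SQL Server 2022: a single VLF
    return [growth_mb / 4] * 4      # Earlier versions: 4 equal VLFs

print(vlfs_for_small_growth(1, sql2022=False))  # [0.25, 0.25, 0.25, 0.25] -> 4 VLFs of 256KB
print(vlfs_for_small_growth(1, sql2022=True))   # [1.0] -> one 1MB VLF

Fewer, larger VLFs per growth event means less VLF sprawl on databases that autogrow frequently.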

So do I think everyone should now change their transaction log autogrow sizes to 64MB? Of course not. But do I think this kind of small but interesting improvement is still notable, and will hopefully be expanded to a larger scale in a future release? Absolutely!

And there are a few other things on Purvi’s List too: reduced buffer pool I/O promotions and enhancements to the spinlock algorithms. All amazing work!

Thanks for reading.

SSMS v19 & Plan Explorer

If you’re like me, Plan Explorer has always been a must-have tool in your performance tuning arsenal. And one of the things that made it so useful was a simple little SSMS integration that would let you right-click on an execution plan and see “View with […] Plan Explorer.”

Unfortunately, I started hearing reports that it was no longer available in SSMS v19. But I know a thing or two, so I was willing to bet 30 minutes of my time that I could get it back.

How Do I Get The Integration Back?!?

You’ll need an existing SSMS v18 installation that already has the Plan Explorer plug-in files.

Navigate to this folder (assuming default install paths):

C:\Program Files (x86)\Microsoft SQL Server Management Studio 18\Common7\IDE\Extensions\SentryOne Plan Explorer SSMS Plugin

Copy the entire SentryOne Plan Explorer SSMS Plugin folder to the path below. (The folder might be named slightly differently because… well, you know…)

C:\Program Files (x86)\Microsoft SQL Server Management Studio 19\Common7\IDE\Extensions\
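
If you’d rather script the copy than click around in Explorer, here’s a quick sketch (Python, though PowerShell or plain Explorer works just as well; run it elevated, since Program Files requires admin rights, and adjust the plug-in folder name if yours differs):

# Copy the Plan Explorer plug-in from SSMS 18 to SSMS 19.
# Run elevated; paths assume default install locations.
import shutil
from pathlib import Path

plugin = "SentryOne Plan Explorer SSMS Plugin"   # your folder name may differ
base = r"C:\Program Files (x86)\Microsoft SQL Server Management Studio {}"
src = Path(base.format(18)) / r"Common7\IDE\Extensions" / plugin
dst = Path(base.format(19)) / r"Common7\IDE\Extensions" / plugin

shutil.copytree(src, dst)   # then restart SSMS 19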

You’ll notice that the only thing that changed between the two paths is the SSMS folder name. At the time of this writing, the PE installer hasn’t been updated in an extremely long time, and while the fix is trivial, who knows if/when it’ll actually be addressed. Until then, getting the files there and restarting SSMS should do the trick.

Thanks for reading – happy exec plan tuning.