Backup Internals – Part 2: Backup Buffers

Welcome to Part 2 of SQL Server Backup Internals. This blog series is a companion piece to my How to Accelerate Your Database Backups presentation for MSSQLTips.com. In Part 1 of this blog series, I introduced the parts of a BACKUP operation.

Blog Series Table of Contents

Now let’s start focusing on performance characteristics. We can impact the performance of a BACKUP operation by making changes to one or more of the following:

  • Increase the Number of Backup Buffers
  • Increase the Size of Backup Buffers
  • Add more CPU Reader Threads
  • Add more CPU Writer Threads

Analogy Time!

I like to explain things with silly analogies and today will be no exception. Today, let’s pretend that we are in a shipping warehouse.

[Image: track the flow of widgets from the left to the right, then back to the left, in a clockwise fashion.]

The person on the left “reads” the contents of the main container and makes copies of each widget, tossing as many widgets as they can into an available yellow basket. When the basket is full, it is sent to the other employee, who takes the contents of each basket and places it into the truck. They also return the empty basket to the first employee to be refilled.

In this silly analogy, the left employee represents a single CPU Reader Thread. They are copying data pages out of a database file and placing them into a yellow basket, which represents a backup buffer. The stack of empty yellow baskets represents the other backup buffers waiting to be filled in the Free Queue. Once a basket is filled, it is sent over to the Data Queue, where the other employee picks the baskets up one by one. That employee takes the contents of each basket and loads it into the truck. This is analogous to a CPU Writer Thread consuming a backup buffer and writing it out to the backup target.

Note: The duration of transferring backup buffers from one queue to another is effectively nil. “Transfer” is probably not the best word choice here as I don’t believe any actual data movement occurs.

Backup Buffers

To help accelerate our backup operations, we can make two adjustments to our backup buffers – adding more backup buffers and increasing the size of the backup buffers.

Add More Baskets

First, let’s explore adding additional backup buffers. That’d be like having more than seven baskets to pass widgets back and forth in the diagram above. In a perfect system, both employees would be processing baskets at perfectly equal speeds. But the real world isn’t perfect – one employee may be much faster than the other.

Scenario 1

Let’s pretend the first employee is the slower one. They can fill a basket at a rate of 1 basket every 10 seconds. The second employee can process a basket at a rate of 1 every 5 seconds. So when they begin work, the second employee sits idle waiting for a basket to arrive. They process the received basket quickly, pass it back, then sit and wait for the first employee to finish filling another basket.

Scenario 2

Conversely, if the first employee can fill a basket at a rate of 1 per second, they’ll have sent all 7 available baskets to the other employee after 7 seconds. The second employee still takes 5 seconds to process a filled basket, so once things get going, the first employee has to wait 4 seconds between receiving free baskets. The second employee now has a backlog of baskets to process because they are much slower than the first employee.

[Image: time-lapse chart illustrating the state of the Reader and Writer threads at different time intervals.]

Check out the time-lapse chart. One thing that is very clear is that due to the different processing rates, either the Reader or the Writer will incur some idle time. Imagine what this might look like if the number of baskets were larger or smaller, or if the processing rates were orders of magnitude apart.

Use Bigger Baskets

The other adjustment that can be made is to make each basket bigger. Depending on the rate of filling or processing a basket, making all baskets larger can increase the overall throughput of the process. If we doubled the size of each basket, the time to fill and the time to process a basket would also roughly double. So in the first scenario above, the fill rate would become 1 basket every 20 seconds and the processing rate 1 basket every 10 seconds. And in the second scenario, we’d see a fill rate of 1 basket every 2 seconds and a processing rate of 1 basket every 10 seconds.

Are you seeing a bit of a “so what’s the benefit, why bother?” conundrum here? Hold that thought for a bit.

What Are the T-SQL Parameters?

Skipped over the analogy to the good stuff? Here you go:

  • BUFFERCOUNT (BC) – Defines the number of Backup Buffers to be used
  • MAXTRANSFERSIZE (MTS) – Defines the size of each Backup Buffer

MAXTRANSFERSIZE can range anywhere from 64KB to 4MB. Frankly, I’m not sure what the allowed range of values is for BUFFERCOUNT. But because of the workflow above, there is a point where having too many Backup Buffers is a waste, because neither side can fill or process them fast enough.

Additionally, there’s a resource price to pay in the form of RAM consumed. Very simply, your BACKUP operation will consume BUFFERCOUNT * MAXTRANSFERSIZE in RAM. So if you run a BACKUP with BUFFERCOUNT = 10 and MAXTRANSFERSIZE = 1MB, your operation will consume 10MB of RAM. But if you crank things up, BUFFERCOUNT = 100 and MAXTRANSFERSIZE = 4MB, that’s now 400MB of RAM for your operation.
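
To make that concrete, here is a minimal sketch of what those parameters look like on a BACKUP statement. The database name, file path, and values below are placeholders I chose for illustration, not recommendations:

```sql
-- Hypothetical example: 50 backup buffers of 2MB each.
-- Expected backup buffer memory: 50 * 2MB = 100MB.
BACKUP DATABASE [YourDatabase]
TO DISK = N'D:\Backups\YourDatabase_FULL.bak'
WITH
    BUFFERCOUNT = 50,           -- number of Backup Buffers
    MAXTRANSFERSIZE = 2097152,  -- size of each Backup Buffer, in bytes (2MB)
    STATS = 10;                 -- progress messages every 10%
```

Note that MAXTRANSFERSIZE is specified in bytes, in multiples of 64KB.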

What about the existing SQL Server Buffer Pool? Backup Buffers are allocated outside of buffer pool memory. If you don’t have enough RAM available, SQL Server will shrink the buffer pool as needed to accommodate your backup buffer requirement. (I plan to talk about overall resource balancing in a future part of this series.)
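
If you want to watch that memory show up, one rough approach (my own habit, not an official backup-monitoring technique) is to snapshot sys.dm_os_memory_clerks before and during the BACKUP and compare the results:

```sql
-- Rough sketch: total memory per clerk type, in MB.
-- Run once before and once during a BACKUP and compare; since backup
-- buffers live outside the buffer pool, the allocation should show up
-- as growth somewhere other than MEMORYCLERK_SQLBUFFERPOOL.
SELECT  [type],
        SUM(pages_kb) / 1024 AS total_mb
FROM    sys.dm_os_memory_clerks
GROUP BY [type]
ORDER BY total_mb DESC;
```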

What’s the Benefit?

So what’s the overall benefit of messing with these two parameters? If you leverage only these two parameters and make no other adjustments, not much, really. The main bottleneck your BACKUP operation will experience is due to only having 1 Reader and 1 Writer thread. So adding a few more Backup Buffers can help your overall performance a bit, but as you saw with the time-lapse examples, if one side processes at a different rate than the other, the extra backup buffers don’t really help – they’ll just sit in one of the queues waiting on the CPU threads.

It is far more beneficial to make adjustments to BUFFERCOUNT and MAXTRANSFERSIZE if you also make adjustments to the number of Reader and/or Writer threads for your BACKUP operation. And that’s what I’ll be covering in Part 3 of this series, so stay tuned! Thanks for reading!

SQL Server Backup Internals – Part 1

Recently, I had the pleasure of delivering a new presentation called How to Accelerate Your Database Backups for MSSQLTips.com. This blog series is intended to be a companion piece, particularly for those who prefer to read content instead of watching a video. And this series is focused exclusively on FULL BACKUP operations. I might talk DIFFs and LOG BACKUPs another time.

Blog Series Table of Contents

They Just Work

For many of us, native BACKUPs on SQL Server “just work.” Many are content to use BACKUP with defaults and as long as errors aren’t thrown, don’t think much beyond that. But not all of us have the luxury of adequate BACKUP maintenance windows, especially when faced with the ever-growing sizes of our databases.

So to speed up our BACKUP operations, we turn to using backup compression, backup file striping, and two parameters: BUFFERCOUNT and MAXTRANSFERSIZE. Many have been blogging about these four things for many years now. But what I find interesting is that very few of those blogs have really explored WHY each brings a performance benefit for BACKUP operations.

That ends today.

But How Do They Work?

In the simplest example, a BACKUP operation has the following players:

  • a database source volume
  • a backup output file
  • a reader thread
  • a writer thread
  • a set of backup memory buffers
Note: Notice Database VOLUMES, not Database Data Files. You’ll learn why shortly.

Here is the basic flow of a BACKUP operation:

  1. The Reader Thread will read data from a database file into an empty Backup Buffer from the Free Queue.
  2. Once the Backup Buffer is full, it will be moved into the Data Queue.
  3. The Reader Thread will start filling up another Backup Buffer from the Free Queue (if available); otherwise, it must wait until the Free Queue has an empty Backup Buffer available.
  4. The Writer Thread monitors the Data Queue and once it sees that a filled Backup Buffer has entered the Data Queue, it will transfer the contents to a Backup Device.
  5. Once the Backup Buffer is emptied, it is moved back to the Free Queue to be populated again by the Reader Thread.

When you run a basic BACKUP command without any optional parameters, you will always get 1 reader thread and 1 writer thread. The number of Backup Buffers varies depending on the Backup Device type (DISK, URL, TAPE). The most commonly used is DISK, which will get 7 Backup Buffers, each Buffer being 1MB in size.
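
If you’d like to verify those defaults yourself, a commonly shared trick is to combine trace flag 3605 (send trace output to the error log) with the undocumented trace flag 3213, which reports the backup/restore configuration chosen for the operation, including the buffer count and max transfer size. Treat this as a hedged sketch, since 3213 is undocumented and unsupported:

```sql
-- Sketch: surface BACKUP's internal configuration in the error log.
DBCC TRACEON (3605, 3213, -1);

BACKUP DATABASE [YourDatabase]              -- placeholder database name
TO DISK = N'D:\Backups\YourDatabase.bak';   -- placeholder path

DBCC TRACEOFF (3605, 3213, -1);

-- Then search the current error log for the reported configuration:
EXEC master.dbo.xp_readerrorlog 0, 1, N'buffer';
```

On a plain BACKUP TO DISK like the one above, the buffer count and transfer size it reports should line up with the defaults described here.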

Low Overhead

It’s interesting to note that BACKUP’s resource utilization is relatively low. A BACKUP to DISK operation with no optional parameters requires 2 CPU threads (1 Reader, 1 Writer) and 7MB of RAM (7 Backup Buffers, 1MB each). My understanding is that this was architected intentionally, long ago when production servers were less powerful than my smartphone, to ensure that a BACKUP operation has minimal impact while running.

Give Me More Power!

With only 1 Reader, 7 Backup Buffers, and 1 Writer moving data, BACKUP can be rather slow. And these days, our databases are terabytes in size, with limited maintenance windows in which to back them up. Thankfully, we have a number of different parameters that allow us to increase the horsepower behind a BACKUP operation, which I’ll cover in Part 2 of this BACKUP Internals series!

So stay tuned for Part 2 and thanks for reading!

Data De-Duplication is Awesome

TL;DR


I had the honor of recording not one, not two, but THREE lightboard videos for Pure Storage! And they’re up on YouTube for you to enjoy!

For today’s blog post, I want to focus our attention on the first Lightboard video: how volumes work on FlashArray, and a distinct benefit of FlashArray’s architecture. If you’re not yet familiar with that, I strongly suggest you read “Re-Thinking What We’ve Known About Storage” first.

Silly Analogy Time!

As many of you know from my presentations, I pride myself on silly analogies to explain concepts. So here’s a silly one that I hope will help to explain a key benefit of our “software defined” volumes.

My wife Deborah (b|t) and I have a chihuahua mix named Sebastian. And like many dog owners, we take way too many pictures of our dog. So let’s pretend that Deborah and I are both standing next to one another and we each snap photos of the dog on our respective smartphones.

Sebastian wasn’t quite ready for me, but posed perfectly for Deborah (of course)!

We also have a home NAS. Both Deborah and I have our own volume shares on the NAS, and each of our smartphones automatically backs up to our respective shares. So once on the NAS, each photo would consume X megabytes, right? That’s what we’re all used to.

Software Saves Space!

Now let’s pretend that instead of a regular consumer NAS, I was lucky enough to have a Pure Storage FlashArray as my home NAS. One of its special super powers is that it deduplicates data, not just within a given volume, but across all volumes across the entire array! So for example, if I have 3 different copies of the same SQL Server 2022 ISO file on the NAS, just in different volumes, FlashArray would dedupe that down to one underneath the covers.

But FlashArray’s dedupe is not just done at the full file level – it goes deeper than that!

So going back to our example, when we each upload our respective photos to our individual backup volumes, commonalities will be detected and deduplicated. Sebastian’s head is turned differently and he’s holding his right paw up in one photo but not the other. Otherwise, the two photos are practically identical.

So instead of having to store two different photos, it can store the identical elements of the photos once, as a “shared” canvas, plus the distinct differences. (If your mind goes to the binary bits and bytes that comprise a digital photo, I’m going to ask you to set that aside and just think about the visual scene that was captured. This is just a silly analogy after all.)

How About Some Numbers?

Let’s say each photo is 10MB, and 85% of the two photos’ canvas is shared while the remaining 15% is unique to each photo. If we had traditional storage, we’d need 20MB to store both photos. But with deduplication technology, we’d only need to store 11.5MB! That breaks down as 8.5MB (shared: photos 1 & 2) + 1.5MB (unique: photo 1) + 1.5MB (unique: photo 2). That’s a huge amount of space savings, just from being able to consolidate duplicate canvases!

Think At Scale

As I mentioned earlier, Deborah and I take TONS of photos of our dog. Most happen to be around our house. And because Sebastian is a jittery little guy, we’ll take a dozen shots just to try to get one good one out of the batch. If we had data deduplication capabilities on our home NAS, that’d translate into a huge amount of capacity savings – capacity that is otherwise wasted storing redundant canvases.

What About the Real World?

Silly analogies aside, what does this look like in the real world? On Pure Storage FlashArray, a single SQL Server database gets an average of 3.5:1 data reduction (comprised of data deduplication + compression), but that ratio skyrockets as you persist additional copies of a given database (e.g., federated client databases, non-prod copies for staging, QA, etc.). If your databases are just a few hundred gigabytes, you might not care. But once you start getting into the terabyte range, the data reduction savings start to add up FAST.

Wouldn’t you be happy if your 10TB SQL Server database only consumed 2.85TB of actual capacity? I sure would be. Thanks for reading!