Automatically convert contained database to non-contained for replication

This article was recently published on dev.getroadmap.com:

 
 
In one of my previous posts, I described how to set up replication to an Azure SQL database. This works like a charm, and I still highly recommend it when you want to migrate data from an on-premise server (or an Azure VM) to an Azure SQL database (PaaS).

But in our environment, we use SQL Server 2016 and contained databases for some of our datasets. Unfortunately (but totally understandably), you can’t set up replication from a contained database. So how do you deal with this? For our use case, I’ve written a script to automatically change the database from contained to non-contained. And because I’m probably not the only one who needs to do this, I’d like to share it with you.

 
Steps to take
It might sound a bit difficult, but when you think about it, it’s actually quite easy. To go from a contained database with a user to a non-contained database with a login, you need to take the following steps:

 
1) Duplicate user as login on server level
2) Drop user on database level
3) Alter database to non-contained
4) Add login to database

 
The script
Before you start using this, I want to warn you. Using this script on your server(s) or in production is at your own risk. This worked for me, but it might not work on your server(s)!

In order for this script to work, you must be able to stop applications and queries from executing against your database. In our case, we can just stop the service(s) that use a specific database (that’s the advantage of microservices). The reason for this is that you must be able to obtain an exclusive lock on the database in order to switch it from contained to non-contained.
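
If you need to force that exclusive access from T-SQL, a minimal sketch (this is not part of the original script, so treat it as an assumption about your environment) is to briefly switch the database to SINGLE_USER and back to MULTI_USER afterwards:

USE master
GO

-- Kick out remaining connections and take exclusive access
ALTER DATABASE [Contained database name] SET SINGLE_USER WITH ROLLBACK IMMEDIATE
GO

-- ... perform the containment change here (see the script below) ...

-- Hand the database back to the applications afterwards
ALTER DATABASE [Contained database name] SET MULTI_USER
GO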

If you can manage this, the script below could work for you too:

--====================================================================================================
/* 1 - Duplicate user on instance level as login */

USE master
GO

CREATE LOGIN [Login non-contained database] WITH PASSWORD=N'[Password]', DEFAULT_DATABASE=[master], CHECK_EXPIRATION=OFF, CHECK_POLICY=OFF
GO

--====================================================================================================
/* 2 - Drop user on database level */

USE [Contained database name]
GO

DROP USER [User contained database]
GO

--====================================================================================================
/* 3 - Alter database to non-contained */ 

USE master
GO

ALTER DATABASE [Contained database name] SET CONTAINMENT = NONE WITH NO_WAIT
GO

--====================================================================================================
/* 4 - Add login to database */

USE [Non-contained database name]
GO

CREATE USER [User non-contained database] FOR LOGIN [Login non-contained database]
GO

ALTER ROLE [db_datareader] ADD MEMBER [User non-contained database]
GO

ALTER ROLE [db_datawriter] ADD MEMBER [User non-contained database]
GO

--====================================================================================================

 
Or, to make it easier to read, an example with actual names:

--====================================================================================================
/* 1 - Duplicate user on instance level as login */

USE master
GO

CREATE LOGIN [Login_RW] WITH PASSWORD=N'Password123!', DEFAULT_DATABASE=[master], CHECK_EXPIRATION=OFF, CHECK_POLICY=OFF
GO

--====================================================================================================
/* 2 - Drop user on database level */

USE ContainedDatabase
GO

DROP USER Login_RW
GO

--====================================================================================================
/* 3 - Alter database to non-contained */ 

USE master
GO

ALTER DATABASE ContainedDatabase SET CONTAINMENT = NONE WITH NO_WAIT
GO

--====================================================================================================
/* 4 - Add login to database */

USE ContainedDatabase
GO

CREATE USER Login_RW FOR LOGIN Login_RW
GO

ALTER ROLE [db_datareader] ADD MEMBER Login_RW
GO

ALTER ROLE [db_datawriter] ADD MEMBER Login_RW
GO

--====================================================================================================

 
Conclusion
Even though I thought that using a contained database could be a big blocking factor for us in the process of migrating data to Azure, it really wasn’t that big of a deal. Especially now that we automated the process, it doesn’t add more than 5 minutes to the process of replicating data to Azure.

Easy pivot: From key-value pairs to columns

This article was recently published on dev.getroadmap.com:

 

If there is one thing I pretty much hate doing in T-SQL it’s PIVOT and UNPIVOT. Even though I used it a few times in the last couple of years, it’s an adventure to find out how it works every time. And I know a lot of people struggle with this part of T-SQL, so let’s take a look at a (hopefully) simple example.

 
Key-Value pairs
The idea of using key-value pairs to store data isn’t new, but with the rapid development of cloud solutions like Azure and the hype around NoSQL databases, using key-value pairs to store data got a big boost. Especially developers (in my experience) love using key-value pairs, because the data is easy to consume in an application. But it gives the database professional an extra challenge, because we’re used to retrieving columns with values instead of a record per value. So how can we turn those key-value pairs into columns?

 
Flights
To show you how to change key-value pairs into columns, let’s create an example based on flights. If one of our customers needs to travel by plane, there is some basic information we need in order to show them the flight status in the app. So let’s create a table to store that dataset:

CREATE TABLE dbo.Flights
(
	FlightId UNIQUEIDENTIFIER,
	[Key] VARCHAR(255),
	[Value] VARCHAR(255)
)

 
and insert a flight:

INSERT INTO dbo.Flights
	(FlightId, [Key], [Value])
VALUES
	('2A0C8B05-682A-41CE-8516-C6070CD92851', 'FlightId', '3074e015-62b2-4f76-a8b1-463c53cd79c5'),
	('2A0C8B05-682A-41CE-8516-C6070CD92851', 'AirlineIATACode', 'VY'),
	('2A0C8B05-682A-41CE-8516-C6070CD92851', 'FlightNumber', '8336'),
	('2A0C8B05-682A-41CE-8516-C6070CD92851', 'DepartureAirportCode', 'RTM'),
	('2A0C8B05-682A-41CE-8516-C6070CD92851', 'DepartureAirportName', 'Rotterdam The Hague Airport'),
	('2A0C8B05-682A-41CE-8516-C6070CD92851', 'DepartureDateTime', '2016-08-12 12:15'),
	('2A0C8B05-682A-41CE-8516-C6070CD92851', 'DepartureTerminal', '1'),
	('2A0C8B05-682A-41CE-8516-C6070CD92851', 'Gate', NULL),
	('2A0C8B05-682A-41CE-8516-C6070CD92851', 'ArrivalAirportCode', 'BCN'),
	('2A0C8B05-682A-41CE-8516-C6070CD92851', 'ArrivalAirportName', 'Barcelona–El Prat'),
	('2A0C8B05-682A-41CE-8516-C6070CD92851', 'ScheduledArrivalDateTime', '2016-08-12 14:14'),
	('2A0C8B05-682A-41CE-8516-C6070CD92851', 'FlightIsCancelled', '0')

 
So now we have a dataset that consists of 12 key-value pairs:

[Screenshot: the 12 key-value rows for this flight]

 
Converting to columns
But when you’re running a query you would like to have columns instead of rows. So by using a PIVOT statement, you can convert these rows into columns:

SELECT
	FlightId,
	AirlineIATACode,
	FlightNumber,
	DepartureAirportCode,
	DepartureAirportName,
	DepartureDateTime,
	DepartureTerminal,
	Gate,
	ArrivalAirportCode,
	ArrivalAirportName,
	ScheduledArrivalDateTime,
	FlightIsCancelled
FROM
	(
		SELECT
			FlightId,
			[Key],
			[Value] 
		FROM dbo.Flights
	) AS SourceTable
PIVOT
	(
	MIN([Value])	--Needs to be an aggregate function
	FOR [Key] IN
		(
			AirlineIATACode,
			FlightNumber,
			DepartureAirportCode,
			DepartureAirportName,
			DepartureDateTime,
			DepartureTerminal,
			Gate,
			ArrivalAirportCode,
			ArrivalAirportName,
			ScheduledArrivalDateTime,
			FlightIsCancelled
		)
	) AS PivotOutput

 
Because the keys are always the same for flights, I can hard-code those key names in the PIVOT statement. And because the PIVOT statement needs an aggregate function to retrieve the value, I added the MIN() function, but this doesn’t change the output in any way (a PIVOT-free alternative is sketched at the end of this post).

Now we have the same dataset, but converted into columns:

[Screenshot: the flight data pivoted into columns]

 
So from now on, converting key-value pairs to columns shouldn’t be a problem anymore!
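
As a side note: if you’d rather avoid the PIVOT operator altogether, the same result can be produced with conditional aggregation. A minimal sketch against the dbo.Flights table from above (only a few of the keys are shown):

SELECT
	FlightId,
	MIN(CASE WHEN [Key] = 'AirlineIATACode' THEN [Value] END) AS AirlineIATACode,
	MIN(CASE WHEN [Key] = 'FlightNumber' THEN [Value] END) AS FlightNumber,
	MIN(CASE WHEN [Key] = 'DepartureAirportCode' THEN [Value] END) AS DepartureAirportCode,
	MIN(CASE WHEN [Key] = 'FlightIsCancelled' THEN [Value] END) AS FlightIsCancelled
FROM dbo.Flights
GROUP BY FlightId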

Running maintenance on Azure SQL databases

This article was recently published on dev.getroadmap.com:

 

To keep your data healthy and in top shape, we all know you need to run database maintenance on your databases. This prevents things like performance problems or unrecoverable data, and that is sort of the core business of DBAs. And when we look at how this can be performed on a regular basis, an on-premise instance is a bit easier to maintain than an Azure SQL database. That’s why I want to share our experience on this with you, hoping that it can save you some time when you start implementing it.

 
Platform setup
In order to get a better understanding of why certain choices were made, I want to share a very basic version of the setup of our environment.

Our platform consists of 2 major parts: a private cloud part (that we see as “on-premise”), and an Azure part. Those 2 parts combined make up our platform, and this platform hosts all of our services, applications, databases, etc. Our development approach is that everything should be designed cloud-first, but only if it’s the right tool for the job, and with the exclusion of some data.

The databases we use in Azure are all what Microsoft calls “Azure SQL databases”. This means that there are no virtual machines of any kind running on the Azure part of our platform, also known as DBaaS (Database as a Service).

When I draw this platform, and only focus on the data part, it will look something like this:

[Diagram: the data platform, split into the private cloud part and the Azure part]

 
One of the advantages of this setup is that we can leverage the power of both parts of the platform. As you’ll see later on in this blog, there are some things that we (need to) run from the on-premise instances and some things fully on Azure.

 
Big shoutout to Ola
Before I’m going into detail, I want to give full kudos to Ola Hallengren (Website | @olahallengren). He has spent a lot of his time building a SQL Server Maintenance Solution that is completely free for everyone to use. And he did such an excellent job that a lot of companies (including huge companies) use his solution to run maintenance tasks on their databases.

None of the scripts below were written by me; only small changes were made in order to make things clearer when the solution is deployed to an environment. The original scripts can be downloaded via the download page on Ola’s website.

 
Backups & Integrity check
Taking backups of your database and making sure there is no corruption in the data files is an essential part of the maintenance solution written by Ola. But because Azure SQL databases have a built-in maintenance solution (for example backups: full backups weekly, differentials hourly, and transaction log backups every 5 minutes, source), we don’t need to worry about that ourselves.

 
Index maintenance & Update Statistics
Indexes and statistics are the core of your performance-based maintenance processes. These make sure your queries run fast, and should provide you with stable and predictable performance. This is especially needed on an Azure database, since you can’t monitor it like you would an on-premise database.

Because Azure SQL databases are run on shared hardware that you can’t monitor yourself, Microsoft provides us with a number of different performance counters that we can use to check the status/health of our databases. The most important counters are CPU usage, Data IO, Log IO and DTU usage (a combination of the previously mentioned counters). The DTU counter is the most abstract (to me at least), because it’s explained by Microsoft as:

 

The Database Transaction Unit (DTU) is the unit of measure in SQL Database that represents the relative power of databases based on a real-world measure: the database transaction. We took a set of operations that are typical for an online transaction processing (OLTP) request, and then measured how many transactions could be completed per second under fully loaded conditions.

For example, a Premium P11 database with 1750 DTUs provides 350x more DTU compute power than a Basic database with 5 DTUs.

 
And for me, being used to monitoring physical hardware, that is a bit of a different approach when digging into performance-related issues. It’s not that index and statistics maintenance isn’t important when you work on an on-premise database, but it’s a slightly bigger challenge to monitor the direct effects of an index rebuild or statistics update.
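
To get a feel for those counters from within the database itself, you can query sys.dm_db_resource_stats, which is available in Azure SQL Database. A minimal sketch (this query is not part of Ola’s solution):

-- Resource usage of the current Azure SQL database,
-- captured in 15-second intervals for roughly the last hour
SELECT TOP (20)
	end_time,
	avg_cpu_percent,
	avg_data_io_percent,
	avg_log_write_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC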

But because every Azure SQL database is a contained database, you need to deploy the stored procedures from Ola’s solution to every single database. So to keep it clear for everyone which table and stored procedures belong to the maintenance solution, I’ve changed Ola’s scripts slightly to create all objects in a specific schema named “dba”. So first of all, let’s create the schema:

CREATE SCHEMA dba
GO

 
And then create the table that is used to log all of the maintenance commands and their outcome:

CREATE TABLE dba.CommandLog
	(ID INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_CommandLog PRIMARY KEY CLUSTERED,
	 DatabaseName sysname NULL,
	 SchemaName sysname NULL,
	 ObjectName sysname NULL,
	 ObjectType CHAR(2) NULL,
	 IndexName sysname NULL,
	 IndexType TINYINT NULL,
	 StatisticsName sysname NULL,
	 PartitionNumber INT NULL,
	 ExtendedInfo XML NULL,
	 Command NVARCHAR(MAX) NOT NULL,
	 CommandType NVARCHAR(60) NOT NULL,
	 StartTime DATETIME NOT NULL,
	 EndTime DATETIME NULL,
	 ErrorNumber INT NULL,
	 ErrorMessage NVARCHAR(MAX) NULL)
GO

 
Now that these are created, you can create the stored procedures that execute the actual index and statistics maintenance. They are too long to post here as a code snippet, but you can download the CommandExecute script here, and the IndexOptimize script here.

But because we want to schedule these procedures later on, I decided to create an additional stored procedure in every database, that is deployed alongside the maintenance objects:

CREATE PROCEDURE dba.ExecuteMaintenance
AS


EXECUTE dba.IndexOptimize
 @Databases = '<Insert database name>',
 @FragmentationLow = 'INDEX_REORGANIZE',
 @FragmentationMedium = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
 @FragmentationHigh = 'INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
 @FragmentationLevel1 = 5,
 @FragmentationLevel2 = 30,
 @UpdateStatistics = 'ALL',
 @LogToTable = 'Y',
 @Execute = 'Y'

GO

 
This allows us to run maintenance with specific options on the different databases.

Scheduling
After this was all set up, I needed to come up with a way to run these procedures on a schedule. And as you might know, Azure SQL databases don’t have a SQL Server Agent, so that’s where our on-premise platform comes in handy. Just for this, I created a new virtual machine in our private cloud and installed SQL Server on that machine to utilize the SQL Server Agent. This server (the operations server) runs all of the scheduled operational jobs (including maintenance and some monitoring jobs) against our Azure environment.

But to run a process from this operations machine against one of our Azure databases, I needed to create a linked server first:

EXEC sp_addlinkedserver 
 @server=N'ServerName__DatabaseName', 
 @srvproduct=N'Azure SQL Db',
 @provider=N'SQLNCLI', 
 @datasrc=N'<SERVERNAME>,1433',
 @catalog='DatabaseName';
 GO

EXEC sp_addlinkedsrvlogin 
 @rmtsrvname = 'ServerName__DatabaseName', 
 @useself = 'FALSE', 
 @locallogin=NULL,
 @rmtuser = '<USERNAME>',
 @rmtpassword = '<PASSWORD>'
 GO

 
And all that’s left now is to create a SQL Server Agent job that executes the “ExecuteMaintenance” stored procedure on the Azure database:

EXEC [<LinkedServerName>].<DatabaseName>.dba.ExecuteMaintenance
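
If you prefer to script that Agent job instead of clicking it together in SSMS, a rough sketch could look like the one below. The job name, schedule and linked server name are made up for this example, and note that the linked server needs the 'rpc out' option enabled for the remote EXEC to work:

-- Allow remote procedure calls through the linked server (required for the EXEC above)
EXEC sp_serveroption @server = N'ServerName__DatabaseName', @optname = N'rpc out', @optvalue = N'true'
GO

USE msdb
GO

-- Sketch of a nightly SQL Server Agent job that runs the maintenance procedure
EXEC dbo.sp_add_job @job_name = N'Maintenance - DatabaseName'

EXEC dbo.sp_add_jobstep
 @job_name = N'Maintenance - DatabaseName',
 @step_name = N'Run ExecuteMaintenance',
 @subsystem = N'TSQL',
 @command = N'EXEC [ServerName__DatabaseName].[DatabaseName].dba.ExecuteMaintenance;'

EXEC dbo.sp_add_schedule
 @schedule_name = N'Nightly 0100',
 @freq_type = 4,		-- daily
 @freq_interval = 1,
 @active_start_time = 10000	-- 01:00:00

EXEC dbo.sp_attach_schedule
 @job_name = N'Maintenance - DatabaseName',
 @schedule_name = N'Nightly 0100'

EXEC dbo.sp_add_jobserver @job_name = N'Maintenance - DatabaseName'
GO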

 
The reason I chose this approach is that I didn’t want to lose any time figuring out how Azure Scheduler or Azure Automation works at the moment I was implementing the maintenance solution.

But if you want a good resource on that, please check out Pieter Vanhove’s blog (Blog | @Pieter_Vanhove), because he wrote a great post about it.

 
Conclusion
When I look at how this solution is set up, I’m the first one to admit that this isn’t a perfect or ideal solution. But in the end, it gives the rest of the team a clear way to start or restart the maintenance process: just log in to the operations server, start SQL Server Management Studio (SSMS), open the SQL Server Agent jobs, find the job associated with the database they want to run the maintenance on, and that’s it. But for future scalability and maintainability, we might need to implement another solution to do this.

On which port is SQL Server running?

Earlier this week, a good friend asked me the simple question: “How can I check on which port my SQL Server is currently running?” There are multiple ways to find out, so let’s take a look at them!

 
SQL Server Error Log
It’s a misconception that only error messages are logged in the SQL Server error log. There are also informational messages logged, which you can use to find the port on which SQL Server is currently running:

xp_readerrorlog 0, 1, N'Server is listening on'

 
The result:

[Screenshot: the “Server is listening on” messages in the error log]

 
SQL Server Configuration Manager
SQL Server Configuration Manager is a tool that is installed alongside SQL Server. In this tool you can, for example, change the TCP/IP settings of your SQL Server instance. It also shows you the current port on which SQL Server is running:

[Screenshot: the TCP/IP properties in SQL Server Configuration Manager]

 
DMV
You can also query the system DMVs:

SELECT DISTINCT 
    local_tcp_port
FROM sys.dm_exec_connections
WHERE local_tcp_port IS NOT NULL

or

SELECT local_tcp_port
FROM   sys.dm_exec_connections
WHERE  session_id = @@SPID

 
Command prompt
Finding the port via a command prompt requires some more information. First, you need to find the process ID (PID) that SQL Server is using. The quickest way is to use the Task Manager in Windows. Once you have the PID, you can run the command below:

netstat -ano | findstr [PID]

The result:

[Screenshot: netstat output showing the port for the SQL Server PID]
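
As a side note: if you’d rather not open Task Manager, you can also retrieve the PID with T-SQL, because SERVERPROPERTY('ProcessID') returns the Windows process ID of the instance (available from SQL Server 2012 onwards):

SELECT SERVERPROPERTY('ProcessID') AS SqlServerPID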

 
Registry
The port number can also be found in the registry. Just remember that the key you see in the screenshot below contains my instance name (SQL2014DEV). Change this to your own instance name!

[Screenshot: the TcpPort value in the registry]

 
You can also use the “master.dbo.xp_regread” stored procedure to read this registry key from T-SQL:

DECLARE @TCPPort NVARCHAR(5),
        @RegKeyName VARCHAR(8000);

SET @RegKeyName = CONCAT('Software\Microsoft\Microsoft SQL Server\',
                         @@SERVICENAME,
                         '\MSSQLServer\SuperSocketNetLib\TCP')

EXEC xp_regread
  @rootkey = 'HKEY_LOCAL_MACHINE',
  @key = @RegKeyName,
  @value_name = 'TcpPort',
  @value = @TCPPort OUTPUT;

SELECT @TCPPort;

 
Event Viewer
SQL Server also logs the port in the Windows Event Viewer. If you open the Application log and filter on event ID 26022, you’ll see the port that is in use:

[Screenshot: event 26022 in the Windows Event Viewer]

 
PowerShell
After I posted this blog, Johan Bijnens (@alzdba) sent me a message: I forgot the PowerShell option!

#tcpport.ps1
#Evaluates the SQL Server instances on a Windows server and returns the TCP port number used by each instance
[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.SMO")  | Out-Null
$m = New-Object ('Microsoft.SqlServer.Management.Smo.WMI.ManagedComputer') 'HOME'
$m.ServerInstances | ForEach-Object { $m.Name + '\' + $_.Name + ', ' +
       $m.ServerInstances[$_.Name].ServerProtocols['Tcp'].IPAddresses['IP1'].IPAddress.IPAddressToString + ':' +
       $m.ServerInstances[$_.Name].ServerProtocols['Tcp'].IPAddresses['IPAll'].IPAddressProperties['TcpDynamicPorts'].Value
       }

 
Thanks Johan for reminding me of this! The full code can be found on MSDN.

Find SQL Server Agent job history retention with T-SQL

In SQL Server, the SQL Server Agent has its own retention setting. This retention determines how much execution history of SQL Server Agent jobs is kept. But if you have a lot of instances across your domain, or if you install a new server, how do you determine if this value is set correctly?

 
Doing it old-school: via the GUI
You can check the SQL Server Agent retention via the GUI by right-clicking on the SQL Server Agent and clicking Properties. In the window that pops up, click History, and you’ll see something like this:

[Screenshot: the History page of the SQL Server Agent properties]

 
T-SQL / Registry
The way I wanted to check it is with T-SQL. Because I want to run the statement against 40+ instances, I use a multi-server query for that. The reason I do that is that it returns all the results in a single table, which makes reviewing and comparing the results a lot easier.

But there doesn’t seem to be a way to query the configured values directly. With a little bit of searching, I found out that you can use the stored procedure below to change one of the values:

EXEC msdb.dbo.sp_set_sqlagent_properties @jobhistory_max_rows = 999

 
So by dissecting this procedure, I found out that SQL Server just reads the value from a registry key. I took out the piece that I needed to read the retention values:

DECLARE @jobhistory_max_rows INT = NULL,
        @jobhistory_max_rows_per_job INT = NULL


EXECUTE master.dbo.xp_instance_regread N'HKEY_LOCAL_MACHINE',
                                        N'SOFTWARE\Microsoft\MSSQLServer\SQLServerAgent',
                                        N'JobHistoryMaxRows',
                                        @jobhistory_max_rows OUTPUT,
                                        N'no_output'
SELECT @jobhistory_max_rows = ISNULL(@jobhistory_max_rows, -1)


EXECUTE master.dbo.xp_instance_regread N'HKEY_LOCAL_MACHINE',
                                        N'SOFTWARE\Microsoft\MSSQLServer\SQLServerAgent',
                                        N'JobHistoryMaxRowsPerJob',
                                        @jobhistory_max_rows_per_job OUTPUT,
                                        N'no_output'


SELECT @jobhistory_max_rows, @jobhistory_max_rows_per_job

 
But when I wanted to look up the values in the registry, to see if it returned the correct results, I couldn’t find the actual keys. I thought I was losing my mind, but there’s a catch…

 
xp_instance_regread vs xp_regread
There are 2 stored procedures you can use to read registry keys from your SQL Server:

DECLARE @jobhistory_max_rows INT = NULL

EXECUTE master.dbo.xp_instance_regread N'HKEY_LOCAL_MACHINE',
                                        N'SOFTWARE\Microsoft\MSSQLServer\SQLServerAgent',
                                        N'JobHistoryMaxRows',
                                        @jobhistory_max_rows OUTPUT,
                                        N'no_output'

SELECT @jobhistory_max_rows

 
or

DECLARE @jobhistory_max_rows INT = NULL

EXECUTE master.dbo.xp_regread N'HKEY_LOCAL_MACHINE',
                                        N'SOFTWARE\Microsoft\MSSQLServer\SQLServerAgent',
                                        N'JobHistoryMaxRows',
                                        @jobhistory_max_rows OUTPUT,
                                        N'no_output'

SELECT @jobhistory_max_rows

 
The difference between the two is perfectly explained in this Stack Overflow thread:

xp_regread reads the literal registry path that you specify. xp_instance_regread “converts” the path you specify so that it matches the instance of SQL Server that you’re currently using.

 
So if you run multiple instances on your machine, you want to use the instance version of the stored procedure. If you’re just running 1 (default) instance, you can use both versions.

So eventually, I searched for the instance folder, and I did find the registry keys:

[Screenshot: the JobHistoryMaxRows keys in the instance-specific registry folder]

 
Conclusion
Especially for very large environments, it’s a good idea to write system and/or server checks in T-SQL. That way, you can script them in SCOM (or another monitoring tool), use them in SQL Server Policy Based Management (and I would really like to recommend this Pluralsight course from a very good friend of mine: Boris Hristov (Blog | @BorisHristov)!), or just run them as a multi-server query.

Restoring or Moving a database with CDC enabled

When you have CDC (Change Data Capture) enabled on your database, and you want to move it to another instance or restore it somewhere, you need to take some extra steps to make sure CDC is transferred correctly. When I asked myself how this works, I decided to do some testing.

 
Create test resources
In order to test this, we need to create some test resources. I’ve created a new database, added a table and enabled CDC:

CREATE DATABASE [CDC_DB]
 ON PRIMARY
(NAME = N'CDC_DB', FILENAME = N'C:\Databases\#Backup\CDC_DB.mdf')
 LOG ON
(NAME = N'CDC_DB_log', FILENAME = N'C:\Databases\#Backup\CDC_DB_log.ldf')
GO

USE CDC_DB
GO

CREATE TABLE dbo.Customer
  (CustomerID INT CONSTRAINT PK_Customer PRIMARY KEY IDENTITY(1,1),
   FirstName VARCHAR(50),
   LastName VARCHAR(50))
GO

EXEC sys.sp_cdc_enable_db
GO

EXEC sys.sp_cdc_enable_table
  @source_schema = 'dbo',
  @source_name = 'Customer',
  @supports_net_changes = 1,
  @role_name = NULL,
  @index_name = 'PK_Customer'
GO


INSERT INTO dbo.Customer
  (FirstName, LastName)
VALUES
  ('John', 'Doe')
GO

Now that we have the resources, we can take a backup of the database:

BACKUP DATABASE CDC_DB TO DISK = 'C:\Databases\#Backup\CDC_DB.bak'

 
Restoring the backup
To show you that the default restore doesn’t work, let’s restore the backup next to the original database:

RESTORE DATABASE CDC_DB_RESTORE FROM DISK='C:\Databases\#Backup\CDC_DB.bak'
WITH
   MOVE 'CDC_DB' TO 'C:\Databases\#Backup\CDC_DB.mdf',
   MOVE 'CDC_DB_log' TO 'C:\Databases\#Backup\CDC_DB_log.ldf'

If we now try to select data from the change table of CDC, we get an error message:

[Screenshot: the error message when querying the CDC change table]

 
But CDC was in the old database, so why wasn’t it restored? If you look for a solution, Books Online / MSDN doesn’t say anything about this. But luckily other community members blogged about this before (see this and this article). There is an option you can add to your restore:

USE master
GO

DROP DATABASE CDC_DB_RESTORE
GO

RESTORE DATABASE CDC_DB_RESTORE FROM DISK='C:\Temp\#BackupTest\CDC_DB.bak'
WITH
   MOVE 'CDC_DB' TO 'C:\Temp\#BackupTest\CDC_DB.mdf',
   MOVE 'CDC_DB_log' TO 'C:\Temp\#BackupTest\CDC_DB_log.ldf'
, KEEP_CDC
GO

 
If you add “KEEP_CDC” to your restore statement, it will also restore CDC. But that’s not your only option.

 
Detach and Attach

Another option you have is to detach your database, copy the files to another location, and attach the database again. Let’s detach the database first:

USE master
GO
ALTER DATABASE [CDC_DB_RESTORE] SET SINGLE_USER WITH ROLLBACK IMMEDIATE
GO
EXEC master.dbo.sp_detach_db @dbname = N'CDC_DB_RESTORE'
GO

 
Now that’s done, we can copy the files to another directory if needed. You could also just attach it on the same instance:

USE master
GO
CREATE DATABASE [CDC_DB_RESTORE] ON 
( FILENAME = N'C:\Temp\#BackupTest\CDC_DB.mdf' ),
( FILENAME = N'C:\Temp\#BackupTest\CDC_DB_log.ldf' )
 FOR ATTACH
GO

 
This also allows you to use CDC like nothing happened. The database is still CDC enabled, the change tables are there, and the captured data is still in the change table. But there’s one thing missing when you move your CDC database to another instance: the capture and cleanup jobs.

 
CDC Jobs
So how do we recover the jobs? Script them from one instance, and create them on the other instance? There’s a better way to do that. You can just run the statements below to add the jobs to your instance:

USE CDC_DB_RESTORE
GO
EXEC sys.sp_cdc_add_job 'capture'
GO
EXEC sys.sp_cdc_add_job 'cleanup'
GO

 
And if you move your database, you can also clean up the old instance, and run the statements below to remove the leftover jobs:

USE CDC_DB_RESTORE
GO
EXEC sys.sp_cdc_drop_job @job_type = N'capture'
GO
EXEC sys.sp_cdc_drop_job @job_type = N'cleanup'
GO

 
It is possible, but…
Moving or restoring a database with CDC gives you a few challenges, but even though there are some issues, it’s not impossible. Before you start, you need to think about the approach to take in order to move or restore the database without losing CDC. Depending on your environment and possibilities, you might want to use a specific option, or perhaps you can’t use the option you would like. There is always a way around it, but I would recommend you test it first before you run this on production databases!

T-SQL Tuesday #56 – Assumptions

T-SQL Tuesday is a recurring blog party that was started by Adam Machanic (Blog | @AdamMachanic). Each month a blog hosts the party, and everyone who wants to can write a post about a specific subject.

This month the subject is “Assumptions”. If you want to read the opening post, please click the image below to go to the party-starter: Dev Nambi (Blog | @DevNambi).


[Image: T-SQL Tuesday logo, linking to the party-starter’s post]

 
This month’s topic is about assumptions. A few years back, I worked in a team that consisted mainly of .NET developers. Every time we mentioned “I think so…”, “I assume it works like this…” or “I think we should…”, one of them used the quote: “Assumption is the mother of all f*ck ups!”, which comes from the movie Under Siege 2: Dark Territory. But he was right. The moment you assume something, it’s going to blow up in your face in the end.

 
I tested it, and it works
Working for a larger organization should mean it is better prepared for certain things than a smaller organization. But I’ve seen large organizations being badly prepared, or just plain unprepared. They assume their processes work, or that they will never encounter failure at all.

One of the companies I worked for took backups every night. Full backups. Of databases that were between 100 GB and 500+ GB… And they never tested a restore… Why? They used the default maintenance functionality with “Verify Backup Integrity” enabled, and they had never needed to restore a database before. So they only took backups because management wanted that. They didn’t understand why, because their processes never failed, and would never fail in the future.

But one day, it wasn’t their processes that failed. A LUN went offline during the ETL process, and SQL Server naturally detected that. SQL Server put a database into suspect mode, because of database corruption. But because there were no backups, they needed to move to plan B: process about 2+ years of data (stored in XML files) again.

Eventually I solved it and recovered the database without the need for a restore, but it scared them. They now saw why they needed backups, and why they needed to test the restore on a regular basis. But they forgot about it after a few days, and we never got the time to change the maintenance processes or test any restore. After that, I made the best choice possible in my opinion: I found myself a new challenge.

 
If you don’t know what you’re talking about…
Another example of assumptions I have seen a lot over the years is people explaining things to other people without any proper knowledge of the subject. I can recall a conversation between me and an intern. He was a .NET developer, and had some questions about how a T-SQL feature worked. Another junior BI developer started laughing when the intern asked his question. “What a stupid question, everybody knows the answer to that!” he said. Somewhat irritated by that, I asked him to provide the answer to the question. He didn’t want to. I asked him again: “You answer the question, because you laughed about it, and I want to hear the answer from you.”

He started to stutter, and he explained the functionality all wrong. When I explained it the right way, the .NET intern thanked me, and walked away with his new knowledge, ready to put it into practice. The BI developer wanted to continue the discussion. “You’re all wrong! That feature doesn’t work that way!” I told him nicely that I used this feature on a daily basis, and that he was wrong. The discussion went on a little longer, but I stopped it by telling him: “I’ve been doing this for a number of years now, and working with SQL Server is what I do. You just started, and wrote your first query a few months ago. If you find any resources that show me being wrong, I’ll be happy to quit my job. Until then, please don’t explain T-SQL to other people if you don’t know what you’re talking about.” To this day, he has never gotten back to me on this discussion.

 
I don’t need to check that!
Another great assumption you’ll see in several companies is IT people who trust their own automation a little bit too much. The rule in IT is that if you need to do something more than once or a couple of times, you should automate it. Automation is a good thing, and it can save you time. But who checks that your process doesn’t fail? You don’t want to build a system that checks another system for you. One of the things I’ve seen is a developer who created an automated process that checked a log table on a database server and mailed new errors to the developer. At first glance, it’s a perfect solution. You don’t have to monitor the log table yourself; an automated process does that for you.

Then at one point, the application seemed to fail. It threw exceptions, and the end users weren’t able to do anything with the application. The developer was called, and he told the users to contact the system administrator, because it must have been a server or hardware problem. The system administrator called the developer after a few minutes, and told him the server and hardware were in perfect condition. The developer insisted his software wasn’t failing, because he didn’t receive any errors by email. But after a quick check, the developer came to the conclusion that his automated process had failed. The developer lost a lot of credibility because of this attitude. As you can see, this is another example of an assumption that went wrong.

 
Never stop asking questions
One of the most important things I wanted to show you with this blog post, is that if you don’t know the answer to a question, don’t be afraid to ask someone. The same goes for processes, tools, functionality, or any other question you want to ask. If people mock you for asking questions, they are the ones that are wrong. You’re just trying to learn and grow, so don’t feel bad about yourself!


Using FOR XML to create arrays, JSON and JavaScript

Every new project brings its own challenges and pitfalls. New projects also give you a chance to be creative in finding solutions and to try new things. This keeps it interesting, and allows you to learn new things. And sometimes, the things you have avoided all these years turn out to be your best option.

 
A missing feature in SQL Server
For a project, we needed to generate JSON or JavaScript from T-SQL. Even though the whole world seems to use JSON at this point, there is no built-in feature to generate JSON from T-SQL in SQL Server. There are more than enough people who would like such a feature (just look at the Connect item made for this). Maybe they’ll put it in the vNext of SQL Server, who knows. But at this moment, we need to generate the JSON ourselves.

 
Generating the data array
First of all, we need to generate an array or list. The problem with this is the number of rows. For example, if you only have 2 rows per product, you can join the table onto itself and concatenate the columns that way. But what if you have more rows per product, or you don’t know how many rows per product will be returned?

One of the ways to resolve this is by using the FOR XML clause. You can use it in a correlated subquery or a CROSS APPLY to create an array/list of data, grouped by another column.

So let’s create an example, by using airlines. When you fly to another state, country or continent, you’ll book a flight with an airline. Those airlines have a name, a departure airport code, and of course a ticket price. Let’s put this data in a table:

CREATE TABLE #TMP_AirlineTickets
  (ID INT IDENTITY(1,1),
   Airline VARCHAR(10),
   Departure_Airport_Code VARCHAR(5),
   Price DECIMAL(10,4))


INSERT INTO #TMP_AirlineTickets
  (Airline, Departure_Airport_Code, Price)
VALUES
  ('BA', 'RTM', 10.00),
  ('KLM', 'AMS', 125.00),
  ('BA', 'LHR', 15.00),
  ('KLM', 'BCN', 50.00),
  ('KLM', 'BHX', 75.00)

 
When we query this data, we could easily use a sub-select. But because I want my query to be readable and to perform well, I used the FOR XML clause:

SELECT
  AT.Airline,
  STUFF((SELECT
            CONCAT(',', SUB.Departure_Airport_Code)
         FROM #TMP_AirlineTickets SUB
         WHERE SUB.Airline = AT.Airline
         ORDER BY SUB.Departure_Airport_Code ASC
         FOR XML PATH('')
        ), 1, 1, '') AS Array_Airport_Codes
FROM #TMP_AirlineTickets AT
WHERE 1 = 1
GROUP BY AT.Airline

 
What happens in this query? The FOR XML clause creates an XML fragment for the airline that is selected in the outer query. So a query like this is executed:

SELECT
  SUB.Departure_Airport_Code
FROM #TMP_AirlineTickets SUB
WHERE SUB.Airline = 'KLM'
ORDER BY SUB.Departure_Airport_Code ASC
FOR XML PATH('')

 
The result from that query is an XML column:

[Screenshot: the XML result of the query]

 
By concatenating a character to that column, the value becomes an unnamed expression, so the XML element names are removed. This gives us the possibility to use the result as an array:

SELECT
  CONCAT(',', SUB.Departure_Airport_Code)
FROM #TMP_AirlineTickets SUB
WHERE SUB.Airline = 'KLM'
ORDER BY SUB.Departure_Airport_Code ASC
FOR XML PATH('')

 
The result of this statement looks like this:

[Screenshot: the comma-separated list of airport codes]

 
The next step is to use this in a query that creates an XML column for all airlines in the table:

SELECT
  AT.Airline,
  (SELECT
            CONCAT(',', SUB.Departure_Airport_Code)
         FROM #TMP_AirlineTickets SUB
         WHERE SUB.Airline = AT.Airline
         ORDER BY SUB.Departure_Airport_Code ASC
         FOR XML PATH('')
        ) AS Array_Airport_Codes
FROM #TMP_AirlineTickets AT
WHERE 1 = 1
GROUP BY AT.Airline

 
The result is an array of airport departure codes:

[Screenshot: the airport-code array per airline]

 
As you can see, this creates the array/list for all airlines in the table. With STUFF I remove the first character, because I don’t want my result to start with a comma. This could also be done with a RIGHT function that uses LEN - 1 to strip off the first character on the left, but STUFF looks better in my opinion:

SELECT
  AT.Airline,
  STUFF((SELECT
            CONCAT(',', SUB.Departure_Airport_Code)
         FROM #TMP_AirlineTickets SUB
         WHERE SUB.Airline = AT.Airline
         ORDER BY SUB.Departure_Airport_Code ASC
         FOR XML PATH('')
        ), 1, 1, '') AS Array_Airport_Codes
FROM #TMP_AirlineTickets AT
WHERE 1 = 1
GROUP BY AT.Airline

 
The result looks the same, except the first comma is gone:

[Screenshot: the same result, without the leading comma]

 
One of the best things about creating demos like this for a blog post is that you can make them as unrealistic as you want. What I mean by that is that I’m making it easy to generate XML, in order to keep the examples clear and understandable. But what if you want to use this in the real world? You’re probably looking at datasets that aren’t as easy to use as the demo above; the strings you want to concatenate are filled with special characters, etc. Luckily there’s a solution for that. Rob Farley (Blog | @rob_farley) wrote a really nice blog post about it.
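
The usual fix, roughly along the lines of Rob’s post, is to add the TYPE directive and read the XML back as a string with .value(), so special characters such as & and < aren’t XML-encoded. A sketch based on the query above:

SELECT
  AT.Airline,
  STUFF((SELECT
            CONCAT(',', SUB.Departure_Airport_Code)
         FROM #TMP_AirlineTickets SUB
         WHERE SUB.Airline = AT.Airline
         ORDER BY SUB.Departure_Airport_Code ASC
         FOR XML PATH(''), TYPE	--Return real XML instead of a string
        ).value('.', 'VARCHAR(MAX)')	--Read it back without XML-encoding
        , 1, 1, '') AS Array_Airport_Codes
FROM #TMP_AirlineTickets AT
WHERE 1 = 1
GROUP BY AT.Airline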

 
So now we have our array/list ready, but how do we convert this to JSON or JavaScript? The main difference (sorry if I insult my developer friends with this statement) is that in JavaScript everything needs to be enclosed in double quotes (“), which isn’t needed for everything in JSON. In the following example, I’ll create a JSON object. But because just the airport codes aren’t what we need, I’ll adjust the query so it adds the prices to the object as well. This is done by an additional CONCAT in the XML select, which concatenates the airport code and the price into 1 string:

SELECT
  AT.Airline,
  CONCAT('{"ap":[',
         STUFF(
               (SELECT
                    CONCAT(',{"', SUB.Departure_Airport_Code, '":', SUB.Price, '}')
                FROM #TMP_AirlineTickets SUB
                WHERE SUB.Airline = AT.Airline
                ORDER BY SUB.Departure_Airport_Code ASC
                FOR XML PATH('')
                ), 1, 1, '')
         , ']}') AS JSON
FROM #TMP_AirlineTickets AT
WHERE 1 = 1
GROUP BY AT.Airline

 
The GROUP BY can be replaced by a DISTINCT. But either one of those is needed to make sure you’re only returning unique records:

 
[Screenshot: the JSON string per airline]

 
By pasting the second column of the query result into an online JSON validator, you can see that the result is valid JSON:

 
[Screenshot: the JSON validated in an online tool]

 
By adding some double quotes in the script above, you can easily generate JavaScript from T-SQL. And if you just want to use FOR XML to generate arrays, this works perfectly as well.
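
As a side note for readers on a newer version: since SQL Server 2016 (and in Azure SQL Database) you can also let the engine build the JSON for you with FOR JSON, so the string concatenation above is no longer the only option. A minimal sketch, producing a comparable (though not identical) structure:

SELECT
  AT.Airline,
  (SELECT
       SUB.Departure_Airport_Code AS ap,
       SUB.Price AS price
   FROM #TMP_AirlineTickets SUB
   WHERE SUB.Airline = AT.Airline
   FOR JSON PATH) AS JSON
FROM #TMP_AirlineTickets AT
GROUP BY AT.Airline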

 
Why would you want to use this?
Generating JSON or JavaScript might come in handy for web developers, who can get data from a database and use it directly on their website. By using the FOR XML clause to generate JSON, you make sure your queries are flexible enough to generate valid JSON, even when your dataset grows. Generating arrays like this can also come in handy for developers who need data from their databases. Instead of creating an array in code, they can use the array that is returned from the database.

So even though I tried to work my way around using FOR XML, it turned out to be the best option imaginable!

Change data through CTE’s

Even though I’ve written a number of blog posts about CTEs (Common Table Expressions), there’s always more functionality to discover. Last week I read an article about updating values in a table through a CTE, and I decided to try it out and see how far I could take this.

 
How to use a CTE
A question I get a lot is “When should you use a CTE?”. Personally, I use a CTE to create a temporary resultset that I can use as a source for another query. For example, when I need to transform a datetime column into separate year, month and day columns for a following query, I use a CTE to create a resultset I can use in my second query. Inserting the transformed data into a temporary table is a little bit overkill, so a CTE is the easiest option. But there are a lot of different ways to use a CTE.
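
To make that concrete, here is a minimal sketch of that pattern (the table and column names are made up for this example):

WITH OrderDatesCTE AS
(
	SELECT
		OrderID,
		YEAR(OrderDate) AS OrderYear,
		MONTH(OrderDate) AS OrderMonth,
		DAY(OrderDate) AS OrderDay
	FROM dbo.Orders
)

SELECT
	OrderYear,
	OrderMonth,
	COUNT(*) AS NumberOfOrders
FROM OrderDatesCTE
GROUP BY OrderYear, OrderMonth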

 
Update
Updating your source data through a CTE sounds like a long shot. Normally I prefer to update my source tables directly, but updating your source data through a CTE might make your queries more readable in some cases. So how does this work? Let’s look at an example, using the AdventureWorks2012 database. Using a CTE, we’re selecting only one specific employee: Angela. Because she worked some extra time, we’re updating her vacation hours, increasing them by 2 extra hours:

WITH EmployeeCTE AS
(
	SELECT
		NationalIDNumber,
		LoginID,
		JobTitle,
		VacationHours
	FROM HumanResources.Employee
	WHERE NationalIDNumber = 563680513
)


UPDATE EmployeeCTE
SET VacationHours += 2

 
If we select the record in the source table after the update, we see that we updated the record. The number of vacation hours is updated from 92 to 94:

[Screenshot: the updated record, now showing 94 vacation hours]
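
For completeness, the check itself is just a direct select on the source table:

SELECT
	NationalIDNumber,
	LoginID,
	JobTitle,
	VacationHours
FROM HumanResources.Employee
WHERE NationalIDNumber = 563680513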

 
Insert data
Besides updating, inserting data is also possible. Inserting a record is as easy as updating a record:

WITH CurrencyCTE AS
(
	SELECT
		CurrencyCode,
		Name,
		ModifiedDate
	FROM Sales.Currency
	WHERE CurrencyCode = 'TST'
)


INSERT INTO CurrencyCTE
	(CurrencyCode, Name, ModifiedDate)
VALUES
	('TST', 'Test Currency', '2014-06-01')

 
If we now search the source table for the currency code we just created, you’ll see that it was inserted through the CTE:

[Screenshot: the inserted TST currency record in the source table]

 
This can shorten your insert statements a lot. If you have a table with 10 columns, for example, and you only want to insert values into 4 of them, you can select those columns in a CTE and just insert into those columns. On the other hand, this can make your queries unreadable, because you hide your source table inside the query. So if you use this in combination with source control, it can be hard to track down where you update or insert records in any given table.

 
Delete data
Deleting data through a CTE is just as easy. As an example, let’s delete the currency we created:

WITH CurrencyCTE AS
(
	SELECT
		CurrencyCode,
		Name,
		ModifiedDate
	FROM Sales.Currency
	WHERE CurrencyCode = 'TST'
)


DELETE
FROM CurrencyCTE
WHERE CurrencyCode = 'TST'

 
The limits
Both inserting and updating data through a CTE can be a powerful help in making your queries shorter and easier to read. But it can also backfire on you. Updating your source tables through a CTE can also make your queries hard to read, or even impossible to decipher, when you look at them a few weeks later. But what are the technical limits?

Updating a CTE that contains a join is not a problem, except when you want to update columns in multiple tables at once:

[Screenshot: the error when updating columns from multiple tables at once]

 
The same goes for an insert. If you want to insert data into 2 or more tables at once, that’s impossible through a CTE.
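
For clarity, here is a sketch of the joined case that does work (still assuming the AdventureWorks2012 schema used earlier): a CTE that joins two tables can be updated, as long as the SET only touches columns from one of the underlying tables:

WITH EmployeePersonCTE AS
(
	SELECT
		P.FirstName,
		P.LastName,
		E.VacationHours
	FROM HumanResources.Employee E
	INNER JOIN Person.Person P ON P.BusinessEntityID = E.BusinessEntityID
	WHERE E.NationalIDNumber = 563680513
)

UPDATE EmployeePersonCTE
SET VacationHours += 1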

Another limit you can encounter is updating columns that aren’t actual source data, for example a column that is transformed with a LEFT() function. If you select LEFT(CurrencyCode, 2) in the CTE and try to update it, you’ll see this error:

[Screenshot: the error when updating a derived column]

 
Because the field is a derived column (a computed/transformed version of the source column), you can’t update it. So if you try to update a transformed column, you should use an update statement on your source table.

 
Should you use it?
Even though this can be a cool feature to use, you must be careful with this. If you use CTE’s to alter data, you should be aware of the fact that your queries can become unreadable or hard to decipher. And even though you might find it readable, your colleagues might have other ideas about that. Also, as mentioned earlier, it might make it harder for you to find all inserts/updates on a specific table in your source control system.

 
If you want to read more about this #SQLCOOP topic, don’t forget to check out these blog posts:

Julie Koesmarno: Context Aware And Customised Drillthrough
Mickey Stuewe: Data Models, SQL Server, SQLite, and PowerShell
Chris Yates: Alerts – Who Knew?
