Detecting and Filtering Special Characters Using PATINDEX() and LIKE in SQL Server

Working with real-world data often means dealing with messy strings. It’s common to find values that contain unexpected special characters. Sometimes this is due to user input, sometimes it’s from imports or third-party sources.

Either way, when we need to find and filter these special characters, SQL Server gives us some handy tools to work with. For starters, there’s the LIKE operator, which anyone who’s used SQL would be familiar with. But there’s also the PATINDEX() function, which performs a slightly different task.
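As a rough sketch of how the two differ, the query below uses LIKE to filter rows and PATINDEX() to report where the first match occurs. The inline values and the choice of “special” (anything other than letters, digits, and spaces) are assumptions made purely for illustration, and the exact behavior of character ranges can vary with the collation in use.

```sql
-- Hypothetical sample values; "special" here means anything that is
-- not a letter, digit, or space (an assumption for this sketch).
SELECT v.val,
       PATINDEX('%[^a-zA-Z0-9 ]%', v.val) AS first_special_pos  -- 1-based position, 0 if none
FROM (VALUES ('Hello World'),
             ('Price: $100'),
             ('user@example.com')) AS v(val)
WHERE v.val LIKE '%[^a-zA-Z0-9 ]%';  -- keep only rows that contain a special character
```

LIKE only answers whether the pattern matches, while PATINDEX() returns the starting position of the first match.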

Using AVG() with DISTINCT in SQL Server

When working with averages in SQL Server, it’s easy to assume that AVG() just takes all rows into account and calculates a simple mean. And that’s true. By default, AVG() includes every value in the column you point it to. But sometimes, you may want to average only unique values in that column, which is where DISTINCT comes into play.

Let’s explore this with a simple example.
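Here’s a minimal sketch using a hypothetical inline set of scores rather than a real table:

```sql
-- Hypothetical scores; the value 10.0 appears twice.
SELECT AVG(v.score)          AS avg_all,      -- (10 + 10 + 20 + 30) / 4 = 17.5
       AVG(DISTINCT v.score) AS avg_distinct  -- (10 + 20 + 30) / 3 = 20
FROM (VALUES (10.0), (10.0), (20.0), (30.0)) AS v(score);
```

The duplicate 10.0 counts twice in the first average and only once in the second.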

Comparing COL_LENGTH() and DATALENGTH() in SQL Server

SQL Server has a COL_LENGTH() function and a DATALENGTH() function that could easily be confused for doing the same thing. They both have “length” in their name, and they do indeed return a “length”. But the length returned is different for each function.

If you’ve ever wondered why DATALENGTH() gives you different numbers than COL_LENGTH(), read on to find out.
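A short sketch of the difference, assuming a hypothetical table called dbo.LengthDemo with a single varchar(50) column:

```sql
-- dbo.LengthDemo is a hypothetical table created only for this sketch.
CREATE TABLE dbo.LengthDemo (Name varchar(50));
INSERT INTO dbo.LengthDemo (Name) VALUES ('Ann');

SELECT COL_LENGTH('dbo.LengthDemo', 'Name') AS defined_length,  -- 50: the column's defined length in bytes
       DATALENGTH(Name)                     AS bytes_used       -- 3: bytes actually used by 'Ann'
FROM dbo.LengthDemo;

DROP TABLE dbo.LengthDemo;
```

COL_LENGTH() describes the column definition, whereas DATALENGTH() describes the value stored in a particular row.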

Why Your SQL Server Averages Keep Losing Decimals

At first glance, calculating an average in SQL Server seems straightforward. Just wrap a column in the AVG() function and you’re done. But there’s a subtle catch when working with integer columns. If you pass an integer to AVG(), the result will also be an integer, with any fractional part discarded. If you’re not aware of this when calculating averages, you could draw the wrong conclusion from your query results.

Let’s unpack the behavior and then see how we can fix it.
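As a quick sketch with hypothetical inline integers, one common workaround is to convert the values to a decimal type before averaging:

```sql
-- Hypothetical integer values: 1, 2, 2 (sum is 5, true mean is about 1.67).
SELECT AVG(v.n)                         AS int_avg,  -- 1: the result stays an integer
       AVG(CAST(v.n AS decimal(10, 2))) AS dec_avg   -- about 1.67: fractional part kept
FROM (VALUES (1), (2), (2)) AS v(n);
```

The precision and scale in the CAST() are arbitrary choices for this illustration.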

A Quick Look at SQL Server’s MIN() Function

The MIN() function in SQL Server returns the smallest value from a set of rows. It’s commonly used to find earliest dates, lowest prices, or in general the minimum of any column. While the function itself is simple, you may encounter it written with options like DISTINCT, ALL, or as a window function with OVER(). Some of these options don’t actually change the result in SQL Server but exist for standards compatibility, so it’s worth understanding what they mean if you ever see them in code.

Let’s take a look at a few simple examples to see how it works.
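For instance, here is a small sketch over hypothetical order dates showing that DISTINCT and ALL return the same minimum as the plain call:

```sql
-- Hypothetical order dates, with one duplicate.
SELECT MIN(v.order_date)          AS earliest,           -- 2024-01-15
       MIN(DISTINCT v.order_date) AS earliest_distinct,  -- 2024-01-15
       MIN(ALL v.order_date)      AS earliest_all        -- 2024-01-15
FROM (VALUES (CAST('2024-03-01' AS date)),
             (CAST('2024-01-15' AS date)),
             (CAST('2024-01-15' AS date))) AS v(order_date);
```

Removing duplicates cannot change which value is the smallest, so all three columns agree.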

Using SUM() and AVG() with GROUP BY in SQL Server

When working with data, we often need to roll up numbers by categories. For example, calculating sales totals by region, or averaging test scores by class. SQL Server’s SUM() and AVG() functions can work perfectly for this scenario when combined with the GROUP BY clause. This combo can provide quick insights without having to do the math yourself. Let’s walk through how this works with an example.
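A minimal sketch of the idea, assuming a hypothetical set of sales rows with a region and an amount:

```sql
-- Hypothetical sales rows supplied inline instead of a real table.
SELECT v.region,
       SUM(v.amount) AS total_sales,  -- East: 300.00, West: 50.00
       AVG(v.amount) AS avg_sale      -- East: 150,    West: 50
FROM (VALUES ('East', 100.00),
             ('East', 200.00),
             ('West',  50.00)) AS v(region, amount)
GROUP BY v.region;
```

Each region collapses to one row, with the aggregates computed only over that region’s rows.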

Using MAX() in SQL Server

The MAX() function is one of SQL Server’s simplest aggregate functions. It returns the largest value from a column. While it’s usually straightforward, there are a few useful ways to apply it depending on whether you’re using it as a plain aggregate or as a window function with OVER().

You might also see MAX() used with the DISTINCT keyword. Truth be told, this doesn’t actually change the results. The keyword is accepted only for standards compatibility.

In any case, let’s walk through some examples to see how it all works.
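The sketch below contrasts the two forms using hypothetical department and salary values. As a plain aggregate, MAX() collapses the rows into one; as a window function, it keeps every row and repeats the maximum for each partition.

```sql
-- Plain aggregate: one row for the whole (hypothetical) set.
SELECT MAX(v.salary) AS highest_salary  -- 150
FROM (VALUES ('A', 100), ('A', 150), ('B', 90)) AS v(dept, salary);

-- Window function: every row is kept, with its department's maximum alongside.
SELECT v.dept,
       v.salary,
       MAX(v.salary) OVER (PARTITION BY v.dept) AS dept_max  -- 150 for dept A, 90 for dept B
FROM (VALUES ('A', 100), ('A', 150), ('B', 90)) AS v(dept, salary);
```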

A Quick Look at SQL Server’s DATETRUNC() Function

SQL Server 2022 introduced the DATETRUNC() function, which makes working with date and time values much easier. It trims (or “truncates”) a date/time value down to a specified part (like year, month, or week) while setting all smaller units to their starting value. This helps avoid the common hack of mixing DATEADD() and DATEDIFF() just to snap a timestamp to the beginning of a period.

In this article we’ll look at some examples that demonstrate how it works.
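As a short sketch (requires SQL Server 2022 or later, and the timestamp is an arbitrary example value), here is DATETRUNC() next to the older DATEADD()/DATEDIFF() approach:

```sql
-- Arbitrary example timestamp chosen for this sketch.
DECLARE @ts datetime2(0) = '2024-05-17 14:35:50';

SELECT DATETRUNC(year,  @ts) AS year_start,   -- 2024-01-01 00:00:00
       DATETRUNC(month, @ts) AS month_start,  -- 2024-05-01 00:00:00
       DATETRUNC(hour,  @ts) AS hour_start,   -- 2024-05-17 14:00:00
       DATEADD(month, DATEDIFF(month, 0, @ts), 0) AS month_start_legacy;  -- same instant, returned as datetime
```

DATETRUNC() returns the same data type as its input, while the legacy expression here returns datetime because its base value of 0 is converted to that type.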

Understanding FORMATMESSAGE() in SQL Server

When you’re working with SQL Server, sometimes you don’t just want to throw an error. Sometimes you want to build a message you can actually use elsewhere. That’s where FORMATMESSAGE() comes in. Instead of immediately printing a message like RAISERROR does, FORMATMESSAGE() gives you the formatted string back so you can decide what to do with it. This could include logging it, storing it, displaying it, or simply passing it along.

In simple terms, you can think of it as a way to take a predefined message from sys.messages (or even a custom string you provide) and turn it into a neatly formatted output. This can be quite handy when you need more control over how messages are handled in your SQL workflows.
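Here’s a minimal sketch using a custom format string, which is supported in SQL Server 2016 and later. The message text and argument values are made up for illustration.

```sql
-- Build the message and keep it in a variable instead of raising it.
DECLARE @msg nvarchar(2048) =
    FORMATMESSAGE(N'Order %d for customer %s was not found.', 42, N'Contoso');

SELECT @msg AS formatted_message;  -- Order 42 for customer Contoso was not found.
```

From here the string could be written to a log table, returned to the caller, or passed to THROW or RAISERROR.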

7 Ways to Extract Data from JSON in DuckDB

Most DuckDB distributions come with the JSON extension, and this extension is loaded upon first use. That means we can go ahead and run queries against JSON data right out of the box. One common task we’ll face when working with JSON is extracting data from within the JSON documents. This can include extracting scalar values, or extracting nested JSON from within the outer document.

DuckDB provides us with multiple ways to extract such data. The option we use will largely depend on our use case. Either way, here are seven options for extracting data from JSON documents in DuckDB.
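To give a flavor of what this looks like, the sketch below shows three of the more common approaches against a hypothetical inline document. It is not necessarily the same set of options the full list covers.

```sql
-- Hypothetical JSON document supplied inline.
WITH docs AS (
    SELECT '{"user": {"name": "Ana", "tags": ["a", "b"]}}'::JSON AS j
)
SELECT j->'user'                                  AS user_json,  -- nested JSON: {"name":"Ana","tags":["a","b"]}
       j->'user'->>'name'                         AS name_text,  -- scalar as VARCHAR: Ana
       json_extract_string(j, '$.user.tags[0]')   AS first_tag   -- JSONPath lookup as VARCHAR: a
FROM docs;
```

The -> operator returns JSON, while ->> and json_extract_string() return plain text.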
