DuckDB has a string_agg() function, which allows us to concatenate strings from a group of rows into a single string. This function can be useful when we need to aggregate text data in a meaningful way.
In this article, we’ll explore how the string_agg() function works, along with some simple examples to demonstrate its usage.
What is the string_agg() Function?
The string_agg() function is an aggregate function that concatenates values from multiple rows into a single string. It is similar to the group_concat() function in other RDBMSs, such as MySQL.
The syntax for the string_agg() function is as follows:
string_agg(expression, delimiter)
So the function takes two arguments:
- The
expressionto concatenate (usually a column name). - A
delimiterthat separates the concatenated values.
Sample Data
To demonstrate how string_agg() works, let’s create a sample table and populate it with some data. We’ll use a table called employees that stores information about employees and their departments.
CREATE TABLE employees (
id INTEGER,
name VARCHAR,
department VARCHAR
);
INSERT INTO employees (id, name, department) VALUES
(1, 'Alvin', 'HR'),
(2, 'Barbara', 'HR'),
(3, 'Hesham', 'Engineering'),
(4, 'Cuong', 'Engineering'),
(5, 'Axel', 'Marketing'),
(6, 'Frady', 'Marketing'),
(7, 'Blanche', 'Marketing');
Example 1: Basic Usage of string_agg()
Let’s start with a simple example. Suppose we want to concatenate the names of all employees in the employees table, separated by a comma.
SELECT
string_agg(name, ', ') AS all_employee_names
FROM employees;
This query will produce the following result:
+-----------------------------------------------------+
| all_employee_names |
+-----------------------------------------------------+
| Alvin, Barbara, Hesham, Cuong, Axel, Frady, Blanche |
+-----------------------------------------------------+
Here, the string_agg() function concatenates all the name values from the employees table, separating them with a comma and a space.
Example 2: Grouping Data with string_agg()
We can combine string_agg() with the GROUP BY clause to get separate concatenated strings, based on their grouping. For example, we can concatenate the names of employees within each department:
SELECT
department,
string_agg(name, ', ') AS employee_names
FROM employees
GROUP BY department;
This query produces the following result:
+-------------+----------------------+
| department | employee_names |
+-------------+----------------------+
| Engineering | Hesham, Cuong |
| HR | Alvin, Barbara |
| Marketing | Axel, Frady, Blanche |
+-------------+----------------------+
In this example, the string_agg() function concatenates the names of employees within each department, and the GROUP BY clause ensures that the results are grouped by department.
Example 3: Custom Delimiters
The delimiter used in string_agg() can be any string. Let’s use a custom delimiter, such as a semicolon followed by a space, to separate the names.
SELECT
department,
string_agg(name, '; ') AS employee_names
FROM employees
GROUP BY department;
Result:
+-------------+----------------------+
| department | employee_names |
+-------------+----------------------+
| HR | Alvin; Barbara |
| Engineering | Hesham; Cuong |
| Marketing | Axel; Frady; Blanche |
+-------------+----------------------+
And here’s one that uses a hyphen as the separator:
SELECT
department,
string_agg(name, '-') AS employee_names
FROM employees
GROUP BY department;
Result:
+-------------+--------------------+
| department | employee_names |
+-------------+--------------------+
| Engineering | Hesham-Cuong |
| Marketing | Axel-Frady-Blanche |
| HR | Alvin-Barbara |
+-------------+--------------------+
Example 4: Ordering the Concatenated Values
You can also control the order in which the values are concatenated by using the ORDER BY clause within the string_agg() function.
For instance, let’s concatenate the names in alphabetical order within each department.
SELECT
department,
string_agg(name, ', ' ORDER BY name) AS employee_names
FROM employees
GROUP BY department;
Result:
+-------------+----------------------+
| department | employee_names |
+-------------+----------------------+
| Marketing | Axel, Blanche, Frady |
| Engineering | Cuong, Hesham |
| HR | Alvin, Barbara |
+-------------+----------------------+
In this case, the names are concatenated in alphabetical order within each department.
Example 5: Handling NULL Values
By default, string_agg() ignores NULL values. However, if you want to include a placeholder for NULL values, you can use the coalesce() function to replace NULL with a default value before concatenation.
Let’s modify our employees table to include a NULL value:
INSERT INTO employees (id, name, department) VALUES
(8, NULL, 'HR');
First, we’ll see how string_agg() handles it by default:
-- Set a string for null values so we can see if it appears
.nullvalue 'null'
-- Run the query
SELECT
department,
string_agg(name, ', ') AS employee_names
FROM employees
GROUP BY department;
Result:
+-------------+----------------------+
| department | employee_names |
+-------------+----------------------+
| Engineering | Hesham, Cuong |
| HR | Alvin, Barbara |
| Marketing | Axel, Frady, Blanche |
+-------------+----------------------+
We can see that the NULL value doesn’t turn up in the result. This is because string_agg() skipped it.
In this case, I specifically specified .nullvalue 'null' so that the string null would appear if the null value was in fact going to appear. I did this to eliminate the possibility that an empty string appears for null values (which is the default).
Now let’s use the coalesce() function to output a value in place of the NULL value:
SELECT
department,
string_agg(coalesce(name, 'Unknown'), ', ') AS employee_names
FROM employees
GROUP BY department;
Result:
+-------------+-------------------------+
| department | employee_names |
+-------------+-------------------------+
| Engineering | Hesham, Cuong |
| HR | Alvin, Barbara, Unknown |
| Marketing | Axel, Frady, Blanche |
+-------------+-------------------------+
This time the NULL value in the name column is replaced with 'Unknown' before concatenation.
Conclusion
The string_agg() function in DuckDB is a handy tool for concatenating strings from multiple rows into a single string.
Whether you need to aggregate names, generate comma-separated lists, or handle NULL values, string_agg() provides a simple and efficient way to achieve your goals. By combining it with GROUP BY and ORDER BY, we can create useful queries that transform our data into meaningful, aggregated strings.