Summary: in this tutorial, you will learn how to use the SQLite CUME_DIST() function to calculate the cumulative distribution of a value within a group of values.
Introduction to SQLite CUME_DIST() Function
The CUME_DIST() is a window function that returns the cumulative distribution of a value relative to the values in the group.
Here’s the syntax of the CUME_DIST() function:
CUME_DIST() OVER (
[PARTITION BY partition_expression]
[ORDER BY order_list]
)Code language: SQL (Structured Query Language) (sql)In this syntax:
- The
PARTITION BYclause specifies how the rows are grouped into partitions to which theCUME_DIST()function applies. If you skip thePARTITION BYclause, the function treats the whole result set as a single partition. - The
ORDER BYclause specifies the order of rows in each partition to which theCUME_DIST()function applies. If you omit theORDER BYclause, the function returns 1 for all rows.
Suppose N is the value of the current row of the column specified in the ORDER BY clause and the order of rows is from low to high, the cumulative distribution of a value is calculated using the following formula:
The number of rows with values <= N / The number of rows in the window or partition
Code language: SQL (Structured Query Language) (sql)The return value of the CUME_DIST() function is greater than 0 and less than or equal to 1:
0 < CUME_DIST() <= 1Code language: SQL (Structured Query Language) (sql)The rows with the same values receive the same result.
SQLite CUME_DIST() function example
First, create a new table named products for the demonstration:
CREATE TABLE products(
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
color TEXT NOT NULL,
price REAL NOT NULL
);Code language: SQL (Structured Query Language) (sql)Second, insert some rows into the products table:
INSERT INTO products
(name, color, price)
VALUES
('A', 'red', 100),
('B', 'red', 200),
('C', 'red', 200),
('D', 'black', 300),
('E', 'black', 400),
('F', 'white', 500);Code language: SQL (Structured Query Language) (sql)Third, query data from the products table:
SELECT * FROM products;Code language: SQL (Structured Query Language) (sql)Output:
id | name | color | price
---+------+-------+------
1 | A | red | 100.0
2 | B | red | 200.0
3 | C | red | 200.0
4 | D | black | 300.0
5 | E | black | 400.0
6 | F | white | 500.0
(6 rows)Fourth, calculate the cumulative distribution of the price in the products table:
SELECT
name,
CUME_DIST() OVER (
ORDER BY
price
) PriceCumulativeDistribution
FROM
products;Code language: SQL (Structured Query Language) (sql)Here is the output:
name | PriceCumulativeDistribution
-----+----------------------------
A | 0.16666666666666666
B | 0.5
C | 0.5
D | 0.6666666666666666
E | 0.8333333333333334
F | 1.0
(6 rows)Because we skipped the PARTITION BY clause, the function treated the whole result set as a single partition. Therefore, the number of rows to be evaluated is 6.
The following example uses the CUME_DIST() function to calculate the cumulative distribution of prices partitioned by colors:
SELECT
name,
color,
price,
CUME_DIST() OVER (
PARTITION BY color
ORDER BY price
) PriceCumulativeDistribution
FROM
products
ORDER BY color;Code language: SQL (Structured Query Language) (sql)Output:
name | color | price | PriceCumulativeDistribution
-----+-------+-------+----------------------------
D | black | 300.0 | 0.5
E | black | 400.0 | 1.0
A | red | 100.0 | 0.3333333333333333
B | red | 200.0 | 1.0
C | red | 200.0 | 1.0
F | white | 500.0 | 1.0
(6 rows)Summary
- Use the
CUME_DIST()function to calculate the cumulative distribution of value in a set of values.