PARTITION BY Clause in PostgreSQL
Last Updated: 23 Jul, 2025
In PostgreSQL, the PARTITION BY clause of a window definition divides the rows of a query result into partitions so that window functions can operate on each partition independently.
In this guide, we will cover the syntax, examples, and advantages of the PARTITION BY clause as used with window functions, and briefly contrast it with PostgreSQL table partitioning, which reuses the same keyword in CREATE TABLE.
PARTITION BY Clause in PostgreSQL
The PARTITION BY clause is useful when we need to calculate row numbers, rank employees based on salary, or calculate cumulative totals within groups. It allows us to perform these operations on subsets of rows without collapsing them: every input row still appears in the output, alongside the value computed for its partition. The clause affects only how the window function sees the rows of a query; it does not change how a table is stored, which is the job of declarative table partitioning.
Syntax:
window_function() OVER (PARTITION BY column_name ORDER BY column_name)
Key Terms:
- window_function(): This can be any window function like ROW_NUMBER(), RANK(), SUM(), etc.
- PARTITION BY column_name: Defines the column(s) by which the data will be partitioned.
- ORDER BY column_name: Specifies the order in which rows within each partition are processed by the window function.
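As a minimal sketch of the syntax above, the query below computes a per-department average. The inline VALUES rows and the alias names (t, dept_avg_salary) are assumptions made for illustration, so the query runs without any table. For an aggregate such as AVG(), the ORDER BY inside OVER can be omitted, in which case the function sees the whole partition.

```sql
-- Inline sample rows (hypothetical values) so the query runs without a table.
SELECT department,
       salary,
       -- Every row keeps its own salary and also gets its department's average.
       AVG(salary) OVER (PARTITION BY department) AS dept_avg_salary
FROM (VALUES ('HR', 50000), ('HR', 60000), ('IT', 70000)) AS t(department, salary)
ORDER BY department, salary;
```

Both HR rows receive the same average, while the single IT row receives its own salary as the average.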
Why Use PARTITION BY in PostgreSQL?
Using PARTITION BY in a window definition is beneficial in several scenarios:
- Data Analysis: When we need to group data for window functions like ROW_NUMBER(), RANK(), or SUM(), PARTITION BY lets us analyze each group or partition separately.
- Performance Optimization: A single window function can often replace a self-join or correlated subquery that computes the same per-group value, which is frequently faster. PARTITION BY itself does not change how the table is stored or scanned.
- Efficient Querying: Detail rows and per-partition results are returned by one query, so we do not need a separate aggregate query joined back to the original rows.
Examples of PARTITION BY Clause in PostgreSQL
Let's look at a few examples to demonstrate how the PARTITION BY clause works in PostgreSQL. Suppose we have a table called employees with the following data. This table shows the employee_id (automatically generated by the SERIAL type), their department, and their salary.
Query:
CREATE TABLE employees (
employee_id SERIAL PRIMARY KEY,
department VARCHAR(50),
salary NUMERIC
);
INSERT INTO employees (department, salary) VALUES
('HR', 50000),
('HR', 60000),
('IT', 70000),
('IT', 80000),
('Sales', 55000),
('Sales', 65000);
Output:
| employee_id | department | salary |
|---|---|---|
| 1 | HR | 50000 |
| 2 | HR | 60000 |
| 3 | IT | 70000 |
| 4 | IT | 80000 |
| 5 | Sales | 55000 |
| 6 | Sales | 65000 |
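With this sample data in place, the difference between GROUP BY and PARTITION BY can be sketched with two queries over the employees table. The alias total_salary is an assumption chosen for illustration.

```sql
-- GROUP BY collapses each department into a single row (3 rows in total).
SELECT department, SUM(salary) AS total_salary
FROM employees
GROUP BY department
ORDER BY department;

-- PARTITION BY keeps all 6 rows and repeats the department total on each one.
SELECT employee_id, department, salary,
       SUM(salary) OVER (PARTITION BY department) AS total_salary
FROM employees
ORDER BY employee_id;
```

The first query returns one total per department; the second returns every employee with the total of their department attached.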
Example 1: Using PARTITION BY with ROW_NUMBER()
In this example, we will partition employees by department and assign row numbers based on salary in descending order. We want to assign a row number to each employee within their respective department.
Query:
SELECT employee_id, department, salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_number
FROM employees;
Output:
| employee_id | department | salary | row_number |
|---|---|---|---|
| 2 | HR | 60000 | 1 |
| 1 | HR | 50000 | 2 |
| 4 | IT | 80000 | 1 |
| 3 | IT | 70000 | 2 |
| 6 | Sales | 65000 | 1 |
| 5 | Sales | 55000 | 2 |
Explanation:
In this query, the PARTITION BY department groups the data by department, and within each partition, the rows are assigned a unique row number based on salary in descending order. This allows for separate row numbering in each department.
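A common follow-up is to keep only the top row of each partition. Window functions cannot appear in a WHERE clause, so one way to do this is to compute the row number in a subquery and filter in the outer query. The aliases ranked and rn below are assumptions chosen for illustration.

```sql
-- Highest-paid employee per department.
SELECT employee_id, department, salary
FROM (
    SELECT employee_id, department, salary,
           ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
    FROM employees
) AS ranked
WHERE rn = 1          -- filter on the window result in the outer query
ORDER BY department;
```

With the sample data this keeps employees 2, 4, and 6. Replacing ROW_NUMBER() with RANK() would return every employee tied for the top salary instead of an arbitrary one of them.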
Example 2: Using PARTITION BY with SUM() for Cumulative Totals
In this example, we will calculate the cumulative salary for employees within each department. This operation helps in determining cumulative data within a partitioned subset.
Query:
SELECT employee_id, department, salary,
SUM(salary) OVER (PARTITION BY department ORDER BY salary) AS cumulative_salary
FROM employees;
Output:
| employee_id | department | salary | cumulative_salary |
|---|---|---|---|
| 1 | HR | 50000 | 50000 |
| 2 | HR | 60000 | 110000 |
| 3 | IT | 70000 | 70000 |
| 4 | IT | 80000 | 150000 |
| 5 | Sales | 55000 | 55000 |
| 6 | Sales | 65000 | 120000 |
Explanation:
In this query, the SUM() function calculates the cumulative salary for employees within each department. The PARTITION BY department ensures that the cumulative sum is calculated within each department, while the ORDER BY salary sorts the employees by their salaries before applying the cumulative sum.
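When ORDER BY is present and no frame is given, PostgreSQL uses the default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which treats rows with equal salary as peers and adds them to the running total together. The sketch below spells out a ROWS frame and adds employee_id as a tie-breaker (an assumption for illustration) so that the total advances one row at a time even when salaries repeat.

```sql
SELECT employee_id, department, salary,
       SUM(salary) OVER (
           PARTITION BY department
           ORDER BY salary, employee_id      -- tie-breaker makes the order deterministic
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS cumulative_salary
FROM employees
ORDER BY department, salary, employee_id;
```

The sample data has no duplicate salaries within a department, so this returns the same cumulative values as the query above.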
Example 3: Using PARTITION BY HASH in PostgreSQL
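PostgreSQL also uses the PARTITION BY keyword in CREATE TABLE for declarative table partitioning, which is a separate feature from the window-function clause used in the previous examples. With PARTITION BY HASH, each row is stored in a partition chosen by the hash of its partition key. This method is useful when we want to spread rows across a fixed number of partitions. For example, we could partition employee records based on the hash value of their department: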
Query:
CREATE TABLE employees_partitioned_by_hash (
employee_id SERIAL,
department VARCHAR(50),
salary NUMERIC,
-- A primary key on a partitioned table must include the partition key column.
PRIMARY KEY (employee_id, department)
)
PARTITION BY HASH (department);
-- Partitions are named by remainder: the hash value, not the name, decides which rows each one stores.
CREATE TABLE employees_hash_p0 PARTITION OF employees_partitioned_by_hash
FOR VALUES WITH (MODULUS 3, REMAINDER 0);
CREATE TABLE employees_hash_p1 PARTITION OF employees_partitioned_by_hash
FOR VALUES WITH (MODULUS 3, REMAINDER 1);
CREATE TABLE employees_hash_p2 PARTITION OF employees_partitioned_by_hash
FOR VALUES WITH (MODULUS 3, REMAINDER 2);
Explanation:
In this case, the table is partitioned by hash: each row is routed to the partition whose remainder matches the hash of its department modulo 3. All rows with the same department land in the same partition, so the distribution is only as even as the spread of department values.
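To see where rows actually end up, one option is to insert through the parent table and inspect the tableoid system column, which identifies the partition that stores each row. This is a sketch that assumes the parent table and its three partitions were created successfully; the inserted values and the aliases partition_name and row_count are illustrative.

```sql
-- Rows are inserted through the parent; PostgreSQL routes each one to a partition.
INSERT INTO employees_partitioned_by_hash (department, salary) VALUES
    ('HR', 50000),
    ('IT', 70000),
    ('Sales', 55000);

-- tableoid::regclass shows the name of the partition holding each group of rows.
SELECT tableoid::regclass AS partition_name,
       department,
       COUNT(*) AS row_count
FROM employees_partitioned_by_hash
GROUP BY tableoid, department
ORDER BY department;
```

Which partition a given department maps to depends on the hash function, so two departments may share a partition while another partition stays empty.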
Conclusion
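The PARTITION BY clause is an important feature in PostgreSQL when using window functions, as it divides a query's rows into partitions so that calculations and analysis can be carried out on individual subsets of data. With this clause, we can compute per-group results such as row numbers, ranks, and running totals while still returning every row. The same keyword in CREATE TABLE defines table partitioning, which controls how rows are stored rather than how they are analyzed.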