Planet PostgreSQL

Zero-Downtime PostgreSQL Maintenance with pgEdge

Posted by Antony Pegg in pgEdge on 2025-12-12 at 06:02

PostgreSQL maintenance doesn't have to mean downtime anymore. With pgEdge's zero-downtime node addition, you can perform critical maintenance tasks like version upgrades, hardware replacements, and cluster expansions without interrupting production workloads. Your applications stay online. Your users stay connected. Your business keeps running.This capability is available across both single-primary deployments (with integration of the open source extension Spock) and globally distributed deployments (by default), giving you the same operational advantages whether you're running a single-region deployment or a globally distributed system. Now, for an incredibly quick and easy approach to zero-downtime node addition, you can use the pgEdge Postgres Control Plane (hosted on GitHub). This approach provides drastically simplified management and orchestration of Postgres databases using a declarative API, whether you're running a single-primary or a globally distributed deployment of PostgreSQL clusters. Spock and other high-availability components come built-in accompanying community PostgreSQL for quick database administration with simple commands.And because pgEdge and all associated components are 100% open source under the PostgreSQL license, using 100% core community PostgreSQL, you get to leverage the high-availability components that enable zero-downtime maintenance without vendor lock-in or compatibility concerns.

What Is the Spock Extension?

Spock is pgEdge's advanced logical replication extension for PostgreSQL that enables active-active (multi-master) replication in clusters with row filtering, column projection, conflict handling, and more. Even though Spock originated from earlier projects like pgLogical and BDR 1, it has seen enormous growth. Our dedicated team of PostgreSQL experts continues to push Spock forward, making it a high-performance, enterprise-grade replication system built for distributed environments.

Zero-Downtime Node Addition: Maintenance Without Interruption

Adding a new node [...]

VACUUM Is a Lie (About Your Indexes)

Posted by Radim Marek on 2025-12-11 at 23:30

There is common misconception that troubles most developers using PostgreSQL: tune VACUUM or run VACUUM, and your database will stay healthy. Dead tuples will get cleaned up. Transaction IDs recycled. Space reclaimed. Your database will live happily ever after.

But there are couple of dirty "secrets" people are not aware of. First of them being VACUUM is lying to you about your indexes.

The anatomy of storage🔗

When you delete a row in PostgreSQL, it is just marked as a 'dead tuple'. Invisible for new transactions but still physically present. Only when all transactions referencing the row are finished, VACUUM can come along and actually remove them - reclamining the space in the heap (table) space.

To understand why this matters differently for tables versus indexes, you need to picture how PostgreSQL actually stores your data.

Your table data lives in the heap - a collection of 8 KB pages where rows are stored wherever they fit. There's no inherent order. When you INSERT a row, PostgreSQL finds a page with enough free space and slots the row in. Delete a row, and there's a gap. Insert another, and it might fill that gap - or not - they might fit somewhere else entirely.

This is why SELECT * FROM users without an ORDER BY can return rows in order initially, and after some updates in seemingly random order, and that order can change over time. The heap is like Tetris. Rows drop into whatever space is available, leaving gaps when deleted.

Heap Page

When VACUUM runs, it removes those dead tuples and compacts the remaining rows within each page. If an entire page becomes empty, PostgreSQL can reclaim it entirely.

And while indexes are on surface the same collection of 8KB pages, they are different. A B-tree index must maintain sorted order - that's the whole point of their existence and the reason why WHERE id = 12345 is so fast. PostgreSQL can binary-search down the tree instead of scanning every possible row. You can learn more about the fundamentals of B-Tree Indexes and what makes th

[...]

pgBackRest PITR in Docker: a simple demo

Posted by Stefan Fercot in Data Egret on 2025-12-11 at 13:25

While moving production database workloads towards cloud-native (Kubernetes) environments has become very popular lately, plenty of users still rely on good old Docker containers. Compared to running PostgreSQL on bare metal, on virtual machines, or via a Kubernetes operator, Docker adds a bit of complexity, especially once you want to go beyond simple pg_dump / pg_restore for backups, upgrades, and disaster recovery.

A few years back, it was common to find open-sourced Dockerfiles from trusted companies bundling PostgreSQL with the extensions and tooling you needed. Most of those projects are now deprecated, as the ecosystem’s focus has shifted hard towards cloud-native patterns.

In many of my conversations with pgBackRest users, one theme comes up regularly: deploying pgBackRest for backups in Docker is straightforward, but restoring from those backups feels much harder. The usual advice was to “use a trusted Docker image and follow their guidelines”, but those images are mostly out-of-date now. These days you typically need to maintain your own image, and more importantly you need a reliable recovery playbook.

So I asked myself: how hard is point-in-time recovery (PITR) with pgBackRest for a PostgreSQL 18 Docker container? Turns out: not that hard, once you’ve seen it end-to-end.

This post is a small lab you can run locally. You’ll:

build a PostgreSQL 18 + pgBackRest image
take a full backup
create restore points
delete data on purpose
restore to the moment just before the delete

The tiny lab setup

The lab image is deliberately small. It just layers pgBackRest on top of the official PostgreSQL 18 image, adds a minimal config, and enables WAL archiving on first init.

Dockerfile

FROM postgres:18

# Install PGDG repository
RUN apt-get update && apt-get install -y \
wget \
gnupg \
lsb-release \
&& wget --quiet -O /usr/share/keyrings/postgresql-archive-keyring.asc https://www.postgresql.org/media/keys/ACCC4CF8.asc \
&& echo "deb [signed-by=/usr/share/keyri

[...]

Postgres 18 New Default for Data Checksums and How to Deal with Upgrades

Posted by Greg Sabino Mullane in Crunchy Data on 2025-12-11 at 13:00

In a recent Postgres patch authored by Greg Sabino Mullane, Postgres has a new step forward for data integrity: data checksums are now enabled by default.

This appears in the release notes as a fairly minor change but it significantly boosts the defense against one of the sneakiest problems in data management - silent data corruption.

Let’s dive into what this feature is, what the new default means for you, and how it impacts upgrades.

What is a data checksum?

A data checksum is a simple but powerful technique to verify the integrity of data pages stored on disk. It's like a digital fingerprint for every 8KB block of data (a "page") in your database.

Creation: When Postgres writes a data page (table and indexes) to disk, it runs an algorithm on the page's contents to calculate a derived, small value—the checksum.
Storage: This checksum is stored in the page header alongside the data.
Verification: Whenever Postgres reads that page back from disk, it immediately recalculates the checksum from the data and compares it to the stored value.

If the two values do not match, it means the data page has been altered or corrupted since it was last written. This is important because data corruption can happen silently. By detecting a mismatch, Postgres can immediately raise an error and alert you to a potential problem. Checksums are also an integral part of pgBackRest which uses these checksums to verify backups.

What is initdb and why does it matter?

The initdb command in Postgres is the utility used to create a new Postgres database cluster and initializes the data directory where Postgres stores all the permanent data. When you run initdb, it does things like:

create the directory structure
create the template databases like template1 and postgres
populate the initial system catalog tables
create the initial version of the server configuration files
enable and start keeping track of checkums

The syntax often looks something lik

[...]

PGIBZ 2025: An Event for the Postgres Community in Ibiza

Posted by Evan Stanton in Data Bene on 2025-12-11 at 00:00

Postgres Ibiza (PGIBZ): An open source conference designed to bring together people with a love for PostgreSQL in Ibiza, a relaxed place for fresh and innovative discussions. An international event run by the nonprofit PostgreSQL España.

This was the first time that the Data Bene team attended the event, and we’re happy to share that it was a very positive experience.

The Conference

Location and Venue

As its name suggests, this conference takes place on the Mediterranean island of Ibiza. For those who are less familiar, the Official Tourism Site provides a nice overview that you may refer to. While it should go without saying, this UNESCO World Heritage Site has incredible offerings outside of the conference itself. Combined with a potentially long-haul flight, it’s strongly recommended to plan an extra day or two (or more) to explore the island and enjoy the local experience.

The event itself is hosted at the Palacio de Congresos de Ibiza; a two-story conference center with multiple rooms and catering. Though located outside of the capital city, the venue can be reached by car in good timing (~15 minutes). To add, the venue is situated within walking distance of numerous hotels, restaurants, and a beach! We happened to have stayed at Hotel Tres Torres, which we can recommend, though admittedly it’s difficult to make a wrong decision, especially following a sprinkle of background research.

Briefly, on the topic of extracurriculars, our go-to recommendation would be to visit the capital city, Ibiza. There you will be able to find a surprisingly large amount of shopping, restaurants, and sightseeing. Adding to that a walk through the historic neighborhood and you’ve got quite the memorable experience.

Attendance

This year’s PGIBZ welcomed about 30-40 attendees from across the globe, including China and the United States. And while there are certainly two sides to the coin, I consider this group size as more of a positive. Indeed, during both the presentation Q/A sessions, as well

[...]

Introducing pg_clickhouse

Posted by David Wheeler on 2025-12-10 at 16:34

The ClickHouse blog has a posted a piece by yours truly introducing pg_clickhouse, a PostgreSQL extension to run ClickHouse queries from PostgreSQL:

While clickhouse_fdw and its predecessor, postgres_fdw, provided the foundation for our FDW, we set out to modernize the code & build process, to fix bugs & address shortcomings, and to engineer into a complete product featuring near universal pushdown for analytics queries and aggregations.

Such advances include:

Adopting standard PGXS build pipeline for PostgreSQL extensions

Adding prepared INSERT support to and adopting the latest supported

release of the ClickHouse C++ library

Creating test cases and CI workflows to ensure it works on PostgreSQL versions 13-18 and ClickHouse versions 22-25

Support for TLS-based connections for both the binary protocol and the HTTP API, required for ClickHouse Cloud

Bool, Decimal, and JSON support

Transparent aggregate function pushdown, including for ordered-set aggregates like percentile_cont()

SEMI JOIN pushdown

I’ve spent most of the last couple months working on this project, learning a ton about ClickHouse, foreign data wrappers, C and C++, and query pushdown. Interested? Try ou the Docker image:

docker run --name pg_clickhouse -e POSTGRES_PASSWORD=my_pass \
       -d ghcr.io/clickhouse/pg_clickhouse:18
docker exec -it pg_clickhouse psql -U postgres -c 'CREATE EXTENSION pg_clickhouse'

Or install it from PGXN (requires C and C++ build tools, cmake, and the openssl libs, libcurl, and libuuid):

pgxn install pg_clickhouse

Or download it and build it yourself from:

PGXN
GitHub

Let me know what you think!

[...]

What you should know about constraints in PostgreSQL

Posted by Gülçin Yıldırım Jelínek in Xata on 2025-12-10 at 12:00

In this blog, we explore Postgres constraints through the pg_constraint catalog, covering table vs. column constraints, constraint triggers, domains and more.

Building a RAG Server with PostgreSQL - Part 3: Deploying Your RAG API

Posted by Dave Page in pgEdge on 2025-12-10 at 09:24

In Part 1 we loaded our documentation into PostgreSQL. In Part 2 we chunked those documents and generated vector embeddings. Now it's time to put it all together with an API that your applications can use.
In this final post, we'll deploy the pgEdge RAG Server to provide a simple HTTP API for asking questions about your content. By the end, you'll have a working RAG system that can answer questions using your own documentation.

What the RAG Server Does

The RAG server sits between your application and the LLM, handling the retrieval part of Retrieval-Augmented Generation. When a query comes in, it:

Converts the query to a vector embedding

Searches for relevant chunks using both semantic (vector) and keyword (BM25) matching

Combines and ranks the results

Formats the top results as context for the LLM

Sends the context and query to the LLM

Returns the generated answer

This hybrid search approach - combining vector similarity with traditional keyword matching - tends to give better results than either method alone. Vector search catches semantically related content even when the exact words differ, while BM25 ensures you don't miss obvious keyword matches.

Prerequisites

Before we start, you'll need:

The database we set up in Parts 1 and 2, with documents and embeddings

An API key for your chosen LLM provider (Anthropic, OpenAI, or local Ollama)

Go 1.23 or later for building from source

Installing the RAG Server

Clone and build the server:This creates the binary at .

Configuration

The RAG server uses a YAML configuration file. Here's a basic setup:Save this as . Let's break down the key sections: - Where the API listens. Default is port 8080 on all interfaces. - Paths to files containing your API keys. Each file should contain just the key, nothing else. Make sure they have restrictive permissions (chmod 600). - This is where it gets interesting. A pipeline defines a complete RAG configuration: which database to query, which tables to sear[...]

PostGIS Performance: Simplification

Posted by Paul Ramsey in Crunchy Data on 2025-12-09 at 13:00

There’s nothing simple about simplification! It is very common to want to slim down the size of geometries, and there are lots of different approaches to the problem.

We will explore different methods starting with ST_Letters for this rendering of the letter “a”.

alt

SELECT ST_Letters('a');

This is a good starting point, but to show the different effects of different algorithms on things like redundant linear points, we need a shape with more vertices along the straights, and fewer along the curves.

alt

SELECT ST_RemoveRepeatedPoints(ST_Segmentize(ST_Letters('a'), 1), 1);

Here we add in vertices every one meter with ST_Segmentize and ST_RemoveRepeatedPoints to thin out the points along the curves. Already we are simplifying!

Lets apply the same “remove repeated” algorithm, with a 10 meter tolerance.

alt

WITH a AS (
  SELECT ST_RemoveRepeatedPoints(ST_Segmentize(ST_Letters('a'), 1), 1) AS a
)
SELECT ST_RemoveRepeatedPoints(a, 10) FROM a;

We do have a lot fewer points, and the constant angle curves are well preserved, but some straight lines are no longer legible as such, and there are redundant vertices in the vertical straight lines.

The ST_Simplify function applies the Douglas-Peuker line simplification algorithm to the rings of the polygon. Because it is a line simplifier it does a cruder job preserving some aspects of the polygon area like squareness of the top ligature.

alt

WITH a AS (
  SELECT ST_RemoveRepeatedPoints(ST_Segmentize(ST_Letters('a'), 1), 1) AS a
)
SELECT ST_Simplify(a, 1) FROM a;

The ST_SimplifyVW function applies the Visvalingam–Whyatt algorithm to the rings of the polygon. Visvalingam–Whyatt is better for preserving the shapes of polygons than Douglas-Peuker, but the differences are subtle.

alt

WITH a AS (
  SELECT ST_RemoveRepeatedPoints(ST_Segmentize(ST_Letters('a'), 1), 1) AS a
)
SELECT ST_SimplifyVW(a, 5) FROM a;

Coercing a shape onto a fixed precision grid is another form of simplification, sometimes used to force the edges of adjacent objects

[...]

PostgreSQL High-Availability Architectures

Posted by Hans-Juergen Schoenig in Cybertec on 2025-12-09 at 08:25

PostgreSQL is highly suitable for powering critical applications in all industries. However, to run critical applications, there are key requirements which are absolutely needed: High-Availability and automatic failover.

This document explains which options are available and which problems one can solve with PostgreSQL. We have listed the most common setups and some of the most common recommendations.

Scenario 1: Automatic failover with Patroni

The first scenario has been designed to provide a solution to the following requirements and scenario:

Run servers on premise or on VMs
Automatically handle server errors
Make the cluster transparent to the application
Handle recovery automatically

This is a typical scenario which is commonly seen in real world deployments. An application has to rely on a highly available database and in case of error we simply want to switch to a backup system.

The solution in this case is a Patroni cluster. It uses distributed consensus based on “etcd” and fully automated failover as well as database recovery. The following image shows how this works:

The image shows a 3-node cluster. The primary server is able to handle read as well as write transactions. The remaining two servers will accept reads.

Advantages of this approach

This approach has a couple of advantages that are important to understand:

The primary server is a single source of truth
Replicas allow read scalability
A “Shared nothing infrastructure” is highly resilient
Easy to scale as needed
Replicas can exist in various locations

In general, achieving high availability is possible if you follow a simple design pattern: Keep it simple. The more complexity you add to the system, the more likely it is that something goes wrong or something impacts operations. Simplicity is a key advantage of this approach.

Transparent connection handling

When dealing with clusters, there is an important question which has to be answered: H

[...]

Building a RAG Server with PostgreSQL - Part 2: Chunking and Embeddings

Posted by Dave Page in pgEdge on 2025-12-09 at 06:30

In Part 1 of this series, we loaded our documentation into PostgreSQL using the pgEdge Document Loader. Our documents are sitting in the database as clean Markdown content, ready for the next step: turning them into something an LLM can search through semantically.In this post, we'll use pgEdge Vectorizer to chunk those documents and generate vector embeddings. By the end, you'll have a searchable vector database that can find relevant content based on meaning rather than just keywords.

What Are Embeddings and Why Chunk?

Before we dive in, let's quickly cover the concepts. are numerical representations of text that capture semantic meaning. Similar concepts end up close together in vector space, so "PostgreSQL replication" and "database synchronisation" would have similar embeddings even though they share few words. This is what makes semantic search possible. is necessary because embedding models have token limits (typically 8,000 tokens or so), and more importantly, smaller chunks provide more focused results. If you embed an entire 50-page document as one vector, searching for "how to create an index" might return that whole document when you really want just the relevant paragraph. Breaking documents into smaller pieces gives you more precise retrieval.

Why Vectorize in the Database?

You could run a separate service to generate embeddings, but pgEdge Vectorizer takes a different approach: it runs inside PostgreSQL as an extension. When you insert or update documents, triggers automatically chunk the text and queue it for embedding. Background workers process the queue asynchronously, calling your chosen embedding API and storing the results.This has several advantages:

No external service to deploy and manage

Embeddings stay in sync with your data automatically

Everything lives in one transactional system

You can use standard SQL for similarity search

Prerequisites

Before we start, you'll need:

PostgreSQL 14 or later (we set this up in Part 1)

The pgvector ex

[...]

PostGIS Day 2025 Recap: AI, Lakehouses and Geospatial Community

Posted by Paul Ramsey in Crunchy Data on 2025-12-08 at 17:23

On Nov. 20, the day after GIS Day, Elizabeth Christensen and I hosted the 7th annual PostGIS Day, a celebration of the Little Spatial Database That Could. Brought to you this year by Snowflake, the event featured an amazing collection of speakers from around the globe — from India to Africa, Europe to North America.

The themes this year were, if you can forgive the pun, all over the map! Moving beyond the generic hype of AI into the practical reality of agents, we saw tools from Felt, Carto and Bunting Labs that can actually write, execute and debug spatial SQL on the fly. We also saw the lakehouse architecture take center stage, with PostGIS acting less as a data silo and more as a high-performance connector to the vast world of object storage, Iceberg and GeoParquet via tools such as pg_lake and Apache Sedona.

But it wasn't all bleeding edge; we grounded the day in massive, industrial-scale success stories. From enabling mapping at IGN France for more than 20 years to powering State Farm’s critical disaster response platforms, PostGIS remains the bedrock of modern geospatial infrastructure. A full playlist of PostGIS Day 2025 is available on the Snowflake YouTube channel. Here's a deeper look at what we learned.

Spatial AI agents: The real deal

We've all heard the buzzwords, but this year, we saw the code. The concept of AI agents was everywhere, but crucially, it wasn't just chatbots hallucinating SQL. We saw systems designed to iterate.

Brendan Ashworth from Bunting Labs drilled down to what "agent" actually means in the geospatial context. It isn't just asking a question and getting an answer. It's a loop. The agent writes a query, runs it against PostGIS, reads the error message if it fails, corrects itself, and runs it again. He showed us Mundi, an AI-native WebGIS that treats PostGIS as its execution engine, solving complex spatial problems without a human holding its hand.

Jaime Sanchez and Mamata Akella from Felt showed how they are teaching Claude to think spatially. They are

[...]

PostgreSQL, MongoDB, and what “cannot scale” really means

Posted by Umair Shahid in Stormatics on 2025-12-08 at 15:58

Last week, I read The Register’s coverage of MongoDB CEO Chirantan “CJ” Desai telling analysts that a “super-high growth AI company … switched from PostgreSQL to MongoDB because PostgreSQL could not just scale.” (The Register)

I believe you can show the value of your own technology without tearing down another. That is really what this post is about.

I run a company that lives inside PostgreSQL production issues every day. We deal with performance, HA and DR design, cost optimisation, and the occasional “we are down, please help” call from a fintech or SaaS platform. My perspective comes from that hands-on work, not from a quarterly earnings script.

Scaling stories are always more complex than a soundbite

When a CEO says, “This AI customer could not scale with PostgreSQL,” a lot of details disappear:

What did the schema and access patterns look like?
How was the application managing connections and transactions?
What was running underneath? Cloud managed PostgreSQL, a home-rolled cluster, or something else?
Were vertical and horizontal scaling options fully explored?
Did the team have PostgreSQL specialists involved, or was everything on default settings?

None of that reduces MongoDB’s strengths. Document databases are excellent for certain workloads: highly variable document shapes, rapid iteration, and development teams that think in JSON first. MongoDB has earned its place.

My concern is with the narrative that PostgreSQL, as a technology, “cannot scale,” especially given that it is the most popular database among professional developers and has overtaken MongoDB in the DB-Engines ranking over the last decade. (DB-Engines)

The method for calculating this ranking is detailed here: https://db-engines.com/en/ranking_definition.

Every dat

[...]

Contributions for week 50, 2025

Posted by Cornelia Biacsics in postgres-contrib.org on 2025-12-07 at 21:26

PGUG.EE met on December 3 2025 in Estonia, organized by Kaarel Moppel & Ervin Weber

Talks

Mayuresh Bagayatkar
Alexander Matrunich
Ants Aasma
Kaarel Moppel

Bruce Momjian spoke at the PG Armenia Community Meetup, organised by Emma Saroyan on December 4 2025.

A Meetup Quiz?

Posted by Bruce Momjian in EDB on 2025-12-05 at 13:00

I have attended over one hundred Postgres meetups over the years. The usual format is: food with individual discussion, lecture with group questions, and finally more individual discussion. I just spoke at an Armenia PostgreSQL User Group meetup and the event organizer Emma Saroyan did something different — she did a group mobile phone quiz after my lecture. To create the quiz questions before the event, she studied my slides and viewed a previous recording of my talk. The winner got a copy of her book.

I think the quiz was a great post-lecture group activity, and gave dynamism to the content. I know people will remember the quiz fondly when they think about the meetup. I encourage all meetups to have similar quizzes, though they do take work to create.

A deeper look at old UUIDv4 vs new UUIDv7 in PostgreSQL 18

Posted by Josef Machytka in credativ on 2025-12-05 at 11:43

In the past there have been many discussions about using UUID as a primary key in PostgreSQL. For some applications, even a BIGINT column does not have sufficient range: it is a signed 8‑byte integer with range −9,223,372,036,854,775,808 to +9,223,372,036,854,775,807. Although these values look big enough, if we think about web services that collect billions or more records daily, this number becomes less impressive. Simple integer values can also cause conflicts of values in distributed system, in Data Lakehouses when combining data from multiple source databases etc.

However, the main practical problem with UUIDv4 as a primary key in PostgreSQL was not lack of range, but the complete randomness of the values. This randomness causes frequent B‑tree page splits, a highly fragmented primary key index, and therefore a lot of random disk I/O. There have already been many articles and conference talks describing this problem. What many of these resources did not do, however, was dive deep into the on‑disk structures. That’s what I wanted to explore here.

What are UUIDs

UUID (Universally Unique Identifier) is a 16‑byte integer value (128 bits), which has 2^128 possible combinations (approximately 3.4 × 10^38). This range is so large that, for most applications, the probability of a duplicate UUID is practically zero. Wikipedia shows a calculation demonstrating that the probability to find a duplicate within 103 trillion version‑4 UUIDs is about one in a billion. Another often‑quoted rule of thumb is that to get a 50% chance of one collision, you’d have to generate roughly 1 billion UUIDs every second for about 86 years.

Values are usually represented as a 36‑character string with hexadecimal digits and hyphens, for example: f47ac10b-58cc-4372-a567-0e02b2c3d479. The canonical layout is 8‑4‑4‑4‑12 characters. The first character in the third block and the first character in the fourth block have special meaning: xxxxxxxx-xxxx-Vxxx-Wxxx-xxxxxxxxxxxx – V marks UUI

[...]

The Future of the PostgreSQL Hacking Workshop

Posted by Robert Haas in EDB on 2025-12-04 at 18:58

The PostgreSQL Hacking Workshop will be taking a well-earned Christmas break in December of 2025. The future of the workshop is a little bit unclear, because I'm continuing to have a bit of trouble finding enough good talks online to justify doing one per month: the best source of talks for the event is pgconf.dev, but not all of those talks are about hacking on PostgreSQL, and not all of those that are about hacking are equally interesting to potential attendees. Also, we've already eaten through much of the backlog of older hacking-related talks that are posted online. Many conferences have few hacking-related talks, or just don't post any videos (which are expensive and time-consuming to produce).

PostgreSQL Contributor Story: Bryan Green

Posted by Floor Drees in EDB on 2025-12-04 at 15:52

Earlier this year we started a program (“Developer U”) to help colleagues who show promise for PostgreSQL Development to become contributors. Meet Bryan Green, working on the Platform Operations team at EDB, who just enjoys understanding how things work at the lowest levels.

JIT, episode III: warp speed ahead

Posted by Pierre Ducroquet on 2025-12-04 at 14:43

Previously…

In our first JIT episode, we discussed how we could, using copy-patch, easily create a JIT compiler for PostgreSQL, with a slight improvement in performance compared to the PostgreSQL interpreter.

In our second episode, I talked about the performance wall and how hard it was to have a real leap in performance compared to the interpreter. But it ended with a positive outlook, a nice performance jump that I was preparing at that moment…

Beating the interpreter, let’s get back to the basics

The interpreter will run each opcode for every record it has to process. Everything it has to do for each record that could be done only once is better done once, obviously. And this is where a JIT can beat it. The JIT compiler can choose optimizations that would require checks at each opcode for the interpreter, and thus self-defeating for the interpreter. For instance, I mentioned creating inlined opcodes for common function calls like int4eq : replacing the indirect call to int4eq with a comparison of the function pointer and then an inlined version would indeed be silly, since the comparison is going to waste a lot of time already.

So, what can’t the interpreter do? It sure can’t easily remove indirect calls, but this is a 1% performance gain, 2% at most. You won’t get to the headlines with that, right? Well, when in doubt, look at the past…

A short story about belief oriented programming…

A decade ago, I worked at a small company where I heard the weirdest thing ever regarding system performance: “our application is slower when built in 64 bits mode because the bigger pointer size makes it slower”. I didn’t buy this, spent two days digging into the code, and found that it was the opposite: 64 bits brought such a performance improvement that the entire system collapsed on a mutex that held a core structure in the application… Removing the mutex made the application fly in both 32 and 64 bits, with 64 bits beating 32 bits obviously.

But why is 64 bits faster? We are talking databa

[...]

Postgres Scan Types in EXPLAIN Plans

Posted by Elizabeth Garrett Christensen in Crunchy Data on 2025-12-04 at 13:00

The secret to unlocking performance gains often lies not just in what you ask in a query, but in how Postgres finds the answer. The Postgres EXPLAIN system is great for understanding how data is being queried. One of secretes to reading EXPLAIN plans is understanding the type of scan done to retrieve the data. The scan type can be the difference between a lightning-fast response or a slow query.

postgres explain plan

Today I’ll break down the most common scan types, how they work, and when you’ll see them in your queries.

Sequential scan

This type of data scan reads the entire table, row by row checking to see what matches the query conditions. If you have a WHERE or FILTER, Postgres just scans each row looking for matches.

Sequence scans are kind of the foundation of how scans are done and for many searches, this is what Postgres will use. For very large data sets, or those queried often, sequential scans are not ideal and an index scan may be faster. For that reason - knowing how to spot a seq scan vs index scan when reading an EXPLAIN plan is one the most important parts of reading a scan type in a query plan.

EXPLAIN select * from accounts;

QUERY PLAN
-------------------------------------------------------------
Seq Scan on accounts  (cost=0.00..22.70 rows=1270 width=36)
(1 row)

Index Scan

When you create an index in Postgres, you’re creating a column or multi-column reference that is stored on disk. Postgres is able to use this index as a map to the data stored in the table. A basic index scan uses a B-tree to quickly find the exact location of the data using a a two-step process: first Postgres finds the entry in the index, uses the reference, and then it fetches the rest of the row data from the table.

EXPLAIN select * from accounts where id = '5';

                                  QUERY PLAN
-------------------------------------------------------------------------------
 Index Scan using accounts_pkey on accounts  (cost=0.15..2.37 rows=1 width=36)
   Index Cond: (id = 5)
(2

[...]

Integrating Custom Storages with pgwatch

Posted by ahmed gouda in Cybertec on 2025-12-04 at 08:32

As a PostgreSQL-specific monitoring solution, pgwatch is mostly known for storing collected metrics in a PostgreSQL database. While great, as you probably should "just Use Postgres for everything" xD... in some scenarios and specific setups, this might be a limitation.

You may want to store your metrics in a particular OLAP database, or a specific format, etc. This is where pgwatch's gRPC client shines, as it enables users to integrate custom storage solutions by developing a compatible gRPC server that writes the metrics to their desired destination.

In this post, we will go through the process of developing a PoC for a custom gRPC server in Go that pgwatch can use to write metrics to ClickHouse.

pgwatch protobuf

First, let's understand the pgwatch gRPC client protobuf file.

In the file, you can see the main Receiver service defined with a couple of methods; let's explain them one by one.

UpdateMeasurements(MeasurementEnvelope) returns (Reply).
- It's the main write function that gets passed the measurement and writes it to the desired destination.
- The passed MeasurementEnvelope data type consists of DBName, which is the monitored source name, Tags, representing the tags associated with this source, MetricName, the metric being measured, and Data, a list of rows that resulted from running the metric query.
- Returns a log message in Reply.
SyncMetric(SyncReq) returns (Reply).
- It's a special function that synchronizes to the server the addition/deletion of a metric under a specific monitored source, so it can pre-allocate/free resources.
- The passed SyncReq data type consists of DBName, representing the source name under which the metric got added/deleted, MetricName, the metric added/deleted, and Operation, either addition or deletion.
- Returns a log message in Reply.
DefineMetrics(google.protobuf.Struct) returns (Reply).
- It's an optional, re

[...]

Building a RAG Server with PostgreSQL - Part 1: Loading Your Content

Posted by Dave Page in pgEdge on 2025-12-04 at 06:30

Retrieval-Augmented Generation (RAG) has become one of the most practical ways to give Large Language Models (LLMs) access to your own data. Rather than fine-tuning a model or hoping it somehow knows about your documentation, RAG lets you retrieve relevant content from your own sources and provide it as context to the LLM at query time. The result is accurate, grounded responses based on your actual content.In this three-part series, I'll walk through building a complete RAG server using PostgreSQL as the foundation. We'll cover:

Part 1
(this post): Creating a schema and loading your documents

Part 2
: Chunking documents and generating embeddings with pgEdge Vectorizer

Part 3
: Deploying a RAG API server for your applications

By the end of the series, you'll have a working RAG system that can answer questions using your own documentation or knowledge base.

Why PostgreSQL for RAG?

If you're already running PostgreSQL (and let's face it, you probably are), adding RAG capabilities to your existing infrastructure makes a lot of sense. With the pgvector extension, Postgres becomes a capable vector database without requiring you to deploy and manage yet another specialised system. Your documents, embeddings, and application data can all live in one place, with the transactional guarantees and operational tooling you already know.

The Architecture

Our RAG system consists of three components:

In this first post, we'll focus on getting your documents into the d[...]

Introducing Snowflake Sequences in a Postgres Extension-2

Posted by Ahsan Hadi in pgEdge on 2025-12-03 at 06:36

In a PostgreSQL database, sequences provide a convenient way to generate a unique identifier, and are often used for key generation. From the community, PostgreSQL provides functions and SQL language to help manage sequence generation, but the sequences themselves are not without limitations in a multi-master environment. As a result, we introduced 100% open-source Snowflake sequences that work seamlessly in a multi-master PostgreSQL cluster to remove those limitations and ensure your data can thrive at the network edge. (Check out the extension on GitHub!) The easiest way to get started using Snowflake sequences is to try out pgEdge Postgres, either the Enterprise or the Distributed edition, available for self-hosting or fully managed hosting on virtual machines, containers, or the cloud. The pgEdge Control Plane (also 100% open source and available on GitHub) makes it even easier to deploy and orchestrate Postgres clusters, and comes with the snowflake extension by default (amongst others).

Why are Sequences an Issue?

In a distributed multi-master database system, sequences can get complicated. Ensuring consistency and uniqueness across the nodes in your cluster is a problem if you use PostgreSQL sequences. The Snowflake extension is designed to step up to automatically mitigate this issue. PostgreSQL sequence values are prepared for assignment in a table in your PostgreSQL database; as each sequence value is used, the next sequence value is incremented. Changes to the next available sequence value are not replicated to the other nodes in your replication cluster. In a simple example, you might have a table on node , with 10 rows, each with a primary key that is assigned a sequence value from 1 to 10; the next prepared sequence value on will be 11. Rows are replicated from to without issue until you add a row on . The PostgreSQL sequence value table on has not been incrementing sequence values in step with the sequence value table on n1. When you add a row on , it will try to use the next available seque[...]

Contributions for week 49, 2025

Posted by Boriss Mejias in postgres-contrib.org on 2025-12-02 at 12:10

PostgreSQL Belgium User Group had a new meetup on November 25, in Haasrode, Leuven, organized by Kim Jansen, An Vercammen, Boriss Mejías and Stefan Fercot.

Speakers:

An Vercammen
Marco Huygen
Priyanka Mittal
Boriss Mejías

What is better: a lookup table or an enum type?

Posted by Laurenz Albe in Cybertec on 2025-12-02 at 11:01

If people used lookup tables: a man is standing in front of a wall calendar and musing, "'Child 1 birthday' - Now who was child 1?"
© Laurenz Albe 2025

Sometimes a string column should only contain one of a limited number of distinct values. Examples would be the current state of a service request or the US state in an address. There are several ways to implement such a column in PostgreSQL, the most interesting being an enum type or a lookup table. In this article, I will explore the benefits and disadvantages of these methods.

Usually not the best solution: a string with a check constraint

To show this implementation, let's define a simplified table of Austrian citizens with their residential state:

CREATE UNLOGGED TABLE person_s (
   person_id bigint NOT NULL,
   state text NOT NULL,
   CHECK (state IN ('Wien', 'Niederösterreich', 'Burgenland',
                    'Steiermark', 'Oberösterreich', 'Kärnten',
                    'Tirol', 'Salzburg', 'Vorarlberg'))
);

There are around 8 million Austrians, so let's insert 8 million rows. Not all states are equally populous, so I use a CASE expression to assign states in an approximately realistic distribution:

INSERT INTO person_s (person_id, state)
SELECT i,
       CASE WHEN hashint4(i) < -1198295875 THEN 'Wien'
            WHEN hashint4(i) < -390842024 THEN 'Niederösterreich'
            WHEN hashint4(i) < -249108103 THEN 'Burgenland'
            WHEN hashint4(i) < 343597384 THEN 'Steiermark'
            WHEN hashint4(i) < 1060856922 THEN 'Oberösterreich'
            WHEN hashint4(i) < 1327144894 THEN 'Kärnten'
            WHEN hashint4(i) < 1692217115 THEN 'Tirol'
            WHEN hashint4(i) < 1958505087 THEN 'Salzburg'
            ELSE 'Vorarlberg'
       END
FROM generate_series(1, 8000000) AS i;

I'll add the primary key constraint and an index on the states to the table after loading the data because that is faster. Also, I'll run VACUUM and ANALYZE to improve the query performance:

ALTER TABLE person_s
   ADD CONSTRAINT person_s_pkey PRIMARY KEY (person_id);

CREATE INDEX person_s_state_idx ON person_s (state);

VACUUM (ANALYZE) person_s;

Disadvantages of t

[...]

CNPG Recipe 23 - Managing extensions with ImageVolume in CloudNativePG

Posted by Gabriele Bartolini in EDB on 2025-12-01 at 20:35

Say goodbye to the old way of distributing Postgres extensions as part of the main pre-built operand image. Leveraging the Kubernetes ImageVolume feature, CloudNativePG now allows you to mount extensions like pgvector and PostGIS from separate, dedicated images. This new declarative method completely decouples the PostgreSQL core from the extension binaries, enabling dynamic addition, easier evaluation, and simplified updates without ever having to build or manage monolithic custom container images.

Part 3: Postgres Journey to the top with developers

Posted by Tom Kincaid in EDB on 2025-12-01 at 18:33

This blog provides my opinion on how Postgres, according to the annual Stack Overflow developer survey, became the most admired, desired and used database by developers. This is part three in my series about the Postgres journey to the top with developers. Here are the first parts of this story:

Postgres incredible rise to the top with developers. This presented the fact as a result of the changing survey results over the years.
Part 2: PostgreSQL’s incredible trip to the top with developers. This presented my opinion on why Postgres didn’t simply go away when it was a distant second, third or

November Meetup Recording

Posted by Henrietta Dombrovskaya on 2025-12-01 at 13:28

And our last 2025 recording is here! Check out Jay Miller’s talk!

PgPedia Week, 2025-11-30

Posted by Ian Barwick on 2025-12-01 at 09:45

PostgreSQL 19 changes this week debug_exec_backend GUC to show EXEC_BACKEND state pg_buffercache view pg_buffercache_os_pages added following functions added: pg_buffercache_mark_dirty() pg_buffercache_mark_dirty_relation() pg_buffercache_mark_dirty_all() pg_replication_slots column slotsync_skip_reason added pg_stat_replication_slots columns slotsync_skip_count and slotsync_skip_at added Planner will now replace COUNT(ANY) with COUNT(*) , when possible PostgreSQL 19 articles Teaching Query Planner to See Inside C Functions (2025-11-27) - Robins Tharakan Settling COUNT(*) vs COUNT(1) debate in Postgres 19 (2025-11-27) - Robins Tharakan Super fast aggregations in PostgreSQL 19 (2025-11-25) - Hans-Jürgen Schönig / CYBERTEC PostgreSQL 18 articles

more...

Contributions for week 48, 2025

Posted by Floor Drees in postgres-contrib.org on 2025-12-01 at 08:10

Seattle Postgres Users Group (SEAPUG) maintained the PostgreSQL booth at PASS Data Community Summit 2025 from November 17-21, 2025:

Lloyd Albin
Jeremy Schneider
Harry Pierson
Ben Chobot
Deon Gill
Rick Lowe
Pavlo Golub

Speakers:

John Martin
Grant Fritchey
Lukas Fittl
Ryan Booz
Taiob Ali
John Martin
Pavlo Golub
Ramesh Pathuri
Abhishek Shukla
Ezat Karimi
Adam Machanic
Shane Borde
William Mentaze
Alvaro Costa-Neto
James Phillips
Janis Griffin
Jeremy Schneider
Shivam Gulati
Tim Steward
Sudhir Amin
Minesh Chande

The Mumbai PostgreSQL User Group meetup took place on 20th Nov, 2025, organized by Sovenath Shaw, Ajit Gadge and Deepak Mahto.

Speakers:

Jignesh Shah (AWS)
Suraj Kharage(EDB)
Jessica Shelar (DataCloudGaze Consulting)
Aniket Jadhav, Rushikesh Ghantewar (CDAC)
Trushar Borse , Sagar Sidhpura(Google Cloud)

PostgreSQL Conference Japan 2025 took place on November 21st. It was organized by Haruka Takatsuka, Kazushiro Takeda, Rintaro Ikeda, Tetsuo Sakata, Kousuke Kida, and Junichi Tado.

During the conference, the following volunteers helped running the conference: Tamotsu Odaka, Taiki Koshino, Nozomi Anzai, Jun Kuwamura, Hiroki Kataoka, Makoto Kaga, Tetsuji Koyama, Seiji Goto, Shinobu Honma, Kenji Sato, Katsuaki Fukushima, Thiduong Dang, Yung-Chung Ku, Wataru Takahashi, and Kouhei Ito.

Latest Blog Posts

Posted by Antony Pegg in pgEdge on 2025-12-12 at 06:02

What Is the Spock Extension?

Zero-Downtime Node Addition: Maintenance Without Interruption

Posted by Radim Marek on 2025-12-11 at 23:30

The anatomy of storage🔗

Posted by Stefan Fercot in Data Egret on 2025-12-11 at 13:25

The tiny lab setup

Dockerfile

Posted by Greg Sabino Mullane in Crunchy Data on 2025-12-11 at 13:00

Posted by Evan Stanton in Data Bene on 2025-12-11 at 00:00

Posted by David Wheeler on 2025-12-10 at 16:34

Posted by Gülçin Yıldırım Jelínek in Xata on 2025-12-10 at 12:00

Posted by Dave Page in pgEdge on 2025-12-10 at 09:24

What the RAG Server Does

Prerequisites

Installing the RAG Server

Configuration

Posted by Paul Ramsey in Crunchy Data on 2025-12-09 at 13:00

Posted by Hans-Juergen Schoenig in Cybertec on 2025-12-09 at 08:25

Scenario 1: Automatic failover with Patroni

Advantages of this approach

Transparent connection handling

Posted by Dave Page in pgEdge on 2025-12-09 at 06:30

What Are Embeddings and Why Chunk?

Why Vectorize in the Database?

Prerequisites

Posted by Paul Ramsey in Crunchy Data on 2025-12-08 at 17:23

Spatial AI agents: The real deal

Posted by Umair Shahid in Stormatics on 2025-12-08 at 15:58

Scaling stories are always more complex than a soundbite

Posted by Cornelia Biacsics in postgres-contrib.org on 2025-12-07 at 21:26

Posted by Bruce Momjian in EDB on 2025-12-05 at 13:00

Posted by Josef Machytka in credativ on 2025-12-05 at 11:43

What are UUIDs

Posted by Robert Haas in EDB on 2025-12-04 at 18:58

Posted by Floor Drees in EDB on 2025-12-04 at 15:52

Posted by Pierre Ducroquet on 2025-12-04 at 14:43

Previously…

Beating the interpreter, let’s get back to the basics

A short story about belief oriented programming…

Posted by Elizabeth Garrett Christensen in Crunchy Data on 2025-12-04 at 13:00

Posted by ahmed gouda in Cybertec on 2025-12-04 at 08:32

pgwatch protobuf

Posted by Dave Page in pgEdge on 2025-12-04 at 06:30

Why PostgreSQL for RAG?

The Architecture

Posted by Ahsan Hadi in pgEdge on 2025-12-03 at 06:36

Why are Sequences an Issue?

Posted by Boriss Mejias in postgres-contrib.org on 2025-12-02 at 12:10

Posted by Laurenz Albe in Cybertec on 2025-12-02 at 11:01

Usually not the best solution: a string with a check constraint

Disadvantages of t

Posted by Gabriele Bartolini in EDB on 2025-12-01 at 20:35

Posted by Tom Kincaid in EDB on 2025-12-01 at 18:33

Posted by Henrietta Dombrovskaya on 2025-12-01 at 13:28

Posted by Ian Barwick on 2025-12-01 at 09:45

Posted by Floor Drees in postgres-contrib.org on 2025-12-01 at 08:10

Top posters

Top teams

Feeds

Planet

Contact