There’s nothing simple about simplification! It is very common to want to slim down the size of geometries, and there are lots of different approaches to the problem.
We will explore different methods, starting from this rendering of the letter “a” produced by ST_Letters.
SELECT ST_Letters('a');
This is a good starting point, but to show the different effects of different algorithms on things like redundant linear points, we need a shape with more vertices along the straights, and fewer along the curves.
SELECT ST_RemoveRepeatedPoints(ST_Segmentize(ST_Letters('a'), 1), 1);
Here we add vertices every one meter with ST_Segmentize, and use ST_RemoveRepeatedPoints to thin out the points along the curves. Already we are simplifying!
Let's apply the same “remove repeated” algorithm, this time with a 10 meter tolerance.
WITH a AS (
SELECT ST_RemoveRepeatedPoints(ST_Segmentize(ST_Letters('a'), 1), 1) AS a
)
SELECT ST_RemoveRepeatedPoints(a, 10) FROM a;
We do have a lot fewer points, and the constant angle curves are well preserved, but some straight lines are no longer legible as such, and there are redundant vertices in the vertical straight lines.
The ST_Simplify function applies the Douglas-Peucker line simplification algorithm to the rings of the polygon. Because it is a line simplifier, it does a cruder job of preserving some aspects of the polygon, such as the squareness of the top ligature.
WITH a AS (
SELECT ST_RemoveRepeatedPoints(ST_Segmentize(ST_Letters('a'), 1), 1) AS a
)
SELECT ST_Simplify(a, 1) FROM a;
The ST_SimplifyVW function applies the Visvalingam–Whyatt algorithm to the rings of the polygon. Visvalingam–Whyatt is better for preserving the shapes of polygons than Douglas-Peucker, but the differences are subtle.
WITH a AS (
SELECT ST_RemoveRepeatedPoints(ST_Segmentize(ST_Letters('a'), 1), 1) AS a
)
SELECT ST_SimplifyVW(a, 5) FROM a;
Coercing a shape onto a fixed precision grid is another form of simplification, sometimes used to force the edges of adjacent objects
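A minimal sketch of that grid-based approach, using ST_SnapToGrid with an assumed 10-unit grid size (my example, not one taken from the post):
WITH a AS (
SELECT ST_RemoveRepeatedPoints(ST_Segmentize(ST_Letters('a'), 1), 1) AS a
)
SELECT ST_SnapToGrid(a, 10) FROM a;
Every vertex is moved to the nearest grid intersection and consecutive duplicates collapse, at the cost of a visibly blocky outline when the grid is coarse.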
[...]PostgreSQL is highly suitable for powering critical applications in all industries. However, running critical applications comes with requirements that are absolutely essential: high availability and automatic failover.
This document explains which options are available and which problems one can solve with PostgreSQL. We have listed the most common setups and some of the most common recommendations.
The first scenario has been designed to provide a solution to the following requirements and scenario:
This is a typical scenario which is commonly seen in real world deployments. An application has to rely on a highly available database and in case of error we simply want to switch to a backup system.
The solution in this case is a Patroni cluster. It uses distributed consensus based on “etcd” and fully automated failover as well as database recovery. The following image shows how this works:
The image shows a 3-node cluster. The primary server is able to handle read as well as write transactions. The remaining two servers will accept reads.
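As a quick sanity check (not part of the original article), you can verify which role a node is currently playing directly in SQL; the function and view used below are standard PostgreSQL:
SELECT pg_is_in_recovery();  -- false on the primary, true on the read-only standbys
SELECT client_addr, state, sync_state
FROM pg_stat_replication;    -- on the primary: one row per connected standby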
This approach has a couple of advantages that are important to understand:
In general, achieving high availability is possible if you follow a simple design pattern: Keep it simple. The more complexity you add to the system, the more likely it is that something goes wrong or something impacts operations. Simplicity is a key advantage of this approach.
When dealing with clusters, there is an important question which has to be answered: H
[...]In Part 1 of this series, we loaded our documentation into PostgreSQL using the pgEdge Document Loader. Our documents are sitting in the database as clean Markdown content, ready for the next step: turning them into something an LLM can search through semantically.
In this post, we'll use pgEdge Vectorizer to chunk those documents and generate vector embeddings. By the end, you'll have a searchable vector database that can find relevant content based on meaning rather than just keywords.
On Nov. 20, the day after GIS Day, Elizabeth Christensen and I hosted the 7th annual PostGIS Day, a celebration of the Little Spatial Database That Could. Brought to you this year by Snowflake, the event featured an amazing collection of speakers from around the globe — from India to Africa, Europe to North America.
The themes this year were, if you can forgive the pun, all over the map! Moving beyond the generic hype of AI into the practical reality of agents, we saw tools from Felt, Carto and Bunting Labs that can actually write, execute and debug spatial SQL on the fly. We also saw the lakehouse architecture take center stage, with PostGIS acting less as a data silo and more as a high-performance connector to the vast world of object storage, Iceberg and GeoParquet via tools such as pg_lake and Apache Sedona.
But it wasn't all bleeding edge; we grounded the day in massive, industrial-scale success stories. From enabling mapping at IGN France for more than 20 years to powering State Farm’s critical disaster response platforms, PostGIS remains the bedrock of modern geospatial infrastructure. A full playlist of PostGIS Day 2025 is available on the Snowflake YouTube channel. Here's a deeper look at what we learned.
We've all heard the buzzwords, but this year, we saw the code. The concept of AI agents was everywhere, but crucially, it wasn't just chatbots hallucinating SQL. We saw systems designed to iterate.
Brendan Ashworth from Bunting Labs drilled down to what "agent" actually means in the geospatial context. It isn't just asking a question and getting an answer. It's a loop. The agent writes a query, runs it against PostGIS, reads the error message if it fails, corrects itself, and runs it again. He showed us Mundi, an AI-native WebGIS that treats PostGIS as its execution engine, solving complex spatial problems without a human holding its hand.
Jaime Sanchez and Mamata Akella from Felt showed how they are teaching Claude to think spatially. They are
PGUG.EE met on December 3, 2025 in Estonia, organized by Kaarel Moppel & Ervin Weber
Talks
Bruce Momjian spoke at the PG Armenia Community Meetup, organized by Emma Saroyan on December 4, 2025.
I have attended over one hundred Postgres meetups over the years. The usual format is: food with individual discussion, lecture with group questions, and finally more individual discussion. I just spoke at an Armenia PostgreSQL User Group meetup and the event organizer Emma Saroyan did something different — she did a group mobile phone quiz after my lecture. To create the quiz questions before the event, she studied my slides and viewed a previous recording of my talk. The winner got a copy of her book.
I think the quiz was a great post-lecture group activity, and gave dynamism to the content. I know people will remember the quiz fondly when they think about the meetup. I encourage all meetups to have similar quizzes, though they do take work to create.
In the past there have been many discussions about using UUID as a primary key in PostgreSQL. For some applications, even a BIGINT column does not have sufficient range: it is a signed 8-byte integer with a range of −9,223,372,036,854,775,808 to +9,223,372,036,854,775,807. Although these values look big enough, if we think about web services that collect billions or more records daily, the number becomes less impressive. Simple integer values can also cause value conflicts in distributed systems, or in data lakehouses when combining data from multiple source databases.
However, the main practical problem with UUIDv4 as a primary key in PostgreSQL was not lack of range, but the complete randomness of the values. This randomness causes frequent B‑tree page splits, a highly fragmented primary key index, and therefore a lot of random disk I/O. There have already been many articles and conference talks describing this problem. What many of these resources did not do, however, was dive deep into the on‑disk structures. That’s what I wanted to explore here.
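As a rough sketch of the kind of experiment involved (my own setup, not the article's), you could compare the primary key index sizes of a bigint table and a random-UUID table after identical bulk inserts:
CREATE TABLE t_bigint (id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY, payload text);
CREATE TABLE t_uuid   (id uuid DEFAULT gen_random_uuid() PRIMARY KEY, payload text);
INSERT INTO t_bigint (payload) SELECT 'x' FROM generate_series(1, 1000000);
INSERT INTO t_uuid   (payload) SELECT 'x' FROM generate_series(1, 1000000);
SELECT pg_size_pretty(pg_relation_size('t_bigint_pkey')) AS bigint_pk_index,
       pg_size_pretty(pg_relation_size('t_uuid_pkey'))   AS uuid_pk_index;
The UUID index typically comes out larger and more fragmented, because the random key values land on random leaf pages and trigger page splits.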
UUID (Universally Unique Identifier) is a 16-byte integer value (128 bits), which has 2^128 possible combinations (approximately 3.4 × 10^38). This range is so large that, for most applications, the probability of a duplicate UUID is practically zero. Wikipedia shows a calculation demonstrating that the probability of finding a duplicate within 103 trillion version-4 UUIDs is about one in a billion. Another often-quoted rule of thumb is that to get a 50% chance of one collision, you'd have to generate roughly 1 billion UUIDs every second for about 86 years.
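A quick back-of-the-envelope check of that figure: a version-4 UUID has 122 random bits, so there are N = 2^122 ≈ 5.3 × 10^36 possible values, and the birthday approximation gives a collision probability of p ≈ n^2 / (2N). With n = 1.03 × 10^14 (103 trillion), p ≈ (1.03 × 10^14)^2 / (1.06 × 10^37) ≈ 10^-9, which matches the one-in-a-billion figure.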
Values are usually represented as a 36‑character string with hexadecimal digits and hyphens, for example: f47ac10b-58cc-4372-a567-0e02b2c3d479. The canonical layout is 8‑4‑4‑4‑12 characters. The first character in the third block and the first character in the fourth block have special meaning: xxxxxxxx-xxxx-Vxxx-Wxxx-xxxxxxxxxxxx – V marks UUI
[...]The PostgreSQL Hacking Workshop will be taking a well-earned Christmas break in December of 2025. The future of the workshop is a little bit unclear, because I'm continuing to have a bit of trouble finding enough good talks online to justify doing one per month: the best source of talks for the event is pgconf.dev, but not all of those talks are about hacking on PostgreSQL, and not all of those that are about hacking are equally interesting to potential attendees. Also, we've already eaten through much of the backlog of older hacking-related talks that are posted online. Many conferences have few hacking-related talks, or just don't post any videos (which are expensive and time-consuming to produce).
In our first JIT episode, we discussed how we could, using copy-patch, easily create a JIT compiler for PostgreSQL, with a slight improvement in performance compared to the PostgreSQL interpreter.
In our second episode, I talked about the performance wall and how hard it was to have a real leap in performance compared to the interpreter. But it ended with a positive outlook, a nice performance jump that I was preparing at that moment…
The interpreter runs each opcode for every record it has to process. Anything that must be done for each record but could be done only once is obviously better done once, and this is where a JIT can beat it. The JIT compiler can apply optimizations that would require per-opcode checks in the interpreter, and would thus be self-defeating there. For instance, I mentioned creating inlined opcodes for common function calls like int4eq: in the interpreter, replacing the indirect call to int4eq with a comparison of the function pointer followed by an inlined version would indeed be silly, since the comparison itself already wastes a lot of time.
So, what can’t the interpreter do? It sure can’t easily remove indirect calls, but this is a 1% performance gain, 2% at most. You won’t get to the headlines with that, right? Well, when in doubt, look at the past…
A decade ago, I worked at a small company where I heard the weirdest thing ever regarding system performance: “our application is slower when built in 64-bit mode because the bigger pointer size makes it slower”. I didn't buy it, spent two days digging into the code, and found that the opposite was true: 64-bit mode brought such a performance improvement that the entire system collapsed on a mutex protecting a core structure in the application… Removing the mutex made the application fly in both 32-bit and 64-bit mode, with 64-bit winning, obviously.
But why is 64 bits faster? We are talking databa
[...]
The secret to unlocking performance gains often lies not just in what you ask in a query, but in how Postgres finds the answer. The Postgres EXPLAIN system is great for understanding how data is being queried. One of the secrets to reading EXPLAIN plans is understanding the type of scan used to retrieve the data. The scan type can be the difference between a lightning-fast response and a slow query.
Today I’ll break down the most common scan types, how they work, and when you’ll see them in your queries.
This type of data scan reads the entire table, row by row, checking to see what matches the query conditions. If you have a WHERE or FILTER clause, Postgres simply scans each row looking for matches.
Sequential scans are the foundation of how scans are done, and for many searches this is what Postgres will use. For very large data sets, or tables that are queried often, sequential scans are not ideal and an index scan may be faster. For that reason, knowing how to spot a sequential scan versus an index scan is one of the most important parts of reading an EXPLAIN plan.
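If you want to follow along, a minimal accounts table like this one (an assumed schema, not one given in the post) is enough to reproduce plans of the same shape:
CREATE TABLE accounts (id integer PRIMARY KEY, name text);
INSERT INTO accounts SELECT i, 'account ' || i FROM generate_series(1, 1000) AS i;
ANALYZE accounts;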
EXPLAIN select * from accounts;
QUERY PLAN
-------------------------------------------------------------
Seq Scan on accounts (cost=0.00..22.70 rows=1270 width=36)
(1 row)
When you create an index in Postgres, you're creating a column or multi-column reference that is stored on disk. Postgres is able to use this index as a map to the data stored in the table. A basic index scan uses a B-tree to quickly find the exact location of the data in a two-step process: first Postgres finds the entry in the index, then it follows that reference to fetch the rest of the row data from the table.
EXPLAIN select * from accounts where id = '5';
QUERY PLAN
-------------------------------------------------------------------------------
Index Scan using accounts_pkey on accounts (cost=0.15..2.37 rows=1 width=36)
Index Cond: (id = 5)
(2 [...]
As a PostgreSQL-specific monitoring solution, pgwatch is mostly known for storing collected metrics in a PostgreSQL database. While that is great, as you probably should "just use Postgres for everything" xD, in some scenarios and specific setups it might be a limitation.
You may want to store your metrics in a particular OLAP database, or a specific format, etc. This is where pgwatch's gRPC client shines, as it enables users to integrate custom storage solutions by developing a compatible gRPC server that writes the metrics to their desired destination.
In this post, we will go through the process of developing a PoC for a custom gRPC server in Go that pgwatch can use to write metrics to ClickHouse.
First, let's understand the pgwatch gRPC client protobuf file.
In the file, you can see the main Receiver service defined with a couple of methods; let's explain them one by one.
UpdateMeasurements(MeasurementEnvelope) returns (Reply).
MeasurementEnvelope data type consists of DBName, which is the monitored source name, Tags, representing the tags associated with this source, MetricName, the metric being measured, and Data, a list of rows that resulted from running the metric query.
SyncMetric(SyncReq) returns (Reply).
SyncReq data type consists of DBName, representing the source name under which the metric got added/deleted, MetricName, the metric added/deleted, and Operation, either addition or deletion.
DefineMetrics(google.protobuf.Struct) returns (Reply).
Retrieval-Augmented Generation (RAG) has become one of the most practical ways to give Large Language Models (LLMs) access to your own data. Rather than fine-tuning a model or hoping it somehow knows about your documentation, RAG lets you retrieve relevant content from your own sources and provide it as context to the LLM at query time. The result is accurate, grounded responses based on your actual content.
In this three-part series, I'll walk through building a complete RAG server using PostgreSQL as the foundation. We'll cover:
In a PostgreSQL database, sequences provide a convenient way to generate a unique identifier, and are often used for key generation. Community PostgreSQL provides functions and SQL syntax to help manage sequence generation, but sequences themselves are not without limitations in a multi-master environment. As a result, we introduced 100% open-source Snowflake sequences that work seamlessly in a multi-master PostgreSQL cluster to remove those limitations and ensure your data can thrive at the network edge. (Check out the extension on GitHub!)
The easiest way to get started using Snowflake sequences is to try out pgEdge Postgres, either the Enterprise or the Distributed edition, available for self-hosting or fully managed hosting on virtual machines, containers, or the cloud. The pgEdge Control Plane (also 100% open source and available on GitHub) makes it even easier to deploy and orchestrate Postgres clusters, and comes with the snowflake extension by default (amongst others).
PostgreSQL Belgium User Group had a new meetup on November 25, in Haasrode, Leuven, organized by Kim Jansen, An Vercammen, Boriss Mejías and Stefan Fercot.
Speakers:
Related blog post by Kim Jansen
Prague PostgreSQL Meetup hosted a new meetup on November 24, organized by Gulcin Yildirim Jelinek
Speakers:

Sometimes a string column should only contain one of a limited number of distinct values. Examples would be the current state of a service request or the US state in an address. There are several ways to implement such a column in PostgreSQL, the most interesting being an enum type or a lookup table. In this article, I will explore the benefits and disadvantages of these methods.
As a starting point, let's define a simplified table of Austrian citizens that stores each person's residential state as a plain text column with a check constraint:
CREATE UNLOGGED TABLE person_s (
person_id bigint NOT NULL,
state text NOT NULL,
CHECK (state IN ('Wien', 'Niederösterreich', 'Burgenland',
'Steiermark', 'Oberösterreich', 'Kärnten',
'Tirol', 'Salzburg', 'Vorarlberg'))
);
There are around 8 million Austrians, so let's insert 8 million rows. Not all states are equally populous, so I use a CASE expression to assign states in an approximately realistic distribution:
INSERT INTO person_s (person_id, state)
SELECT i,
CASE WHEN hashint4(i) < -1198295875 THEN 'Wien'
WHEN hashint4(i) < -390842024 THEN 'Niederösterreich'
WHEN hashint4(i) < -249108103 THEN 'Burgenland'
WHEN hashint4(i) < 343597384 THEN 'Steiermark'
WHEN hashint4(i) < 1060856922 THEN 'Oberösterreich'
WHEN hashint4(i) < 1327144894 THEN 'Kärnten'
WHEN hashint4(i) < 1692217115 THEN 'Tirol'
WHEN hashint4(i) < 1958505087 THEN 'Salzburg'
ELSE 'Vorarlberg'
END
FROM generate_series(1, 8000000) AS i;
I'll add the primary key constraint and an index on the states to the table after loading the data because that is faster. Also, I'll run VACUUM and ANALYZE to improve the query performance:
ALTER TABLE person_s ADD CONSTRAINT person_s_pkey PRIMARY KEY (person_id);
CREATE INDEX person_s_state_idx ON person_s (state);
VACUUM (ANALYZE) person_s;
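For comparison, the enum-based variant the article goes on to measure could be set up along these lines (the type and table names here are my guesses, not necessarily the ones used in the article):
CREATE TYPE austrian_state AS ENUM ('Wien', 'Niederösterreich', 'Burgenland',
'Steiermark', 'Oberösterreich', 'Kärnten',
'Tirol', 'Salzburg', 'Vorarlberg');
CREATE UNLOGGED TABLE person_e (
person_id bigint NOT NULL,
state austrian_state NOT NULL
);
An enum value is stored internally as a 4-byte OID, so the table and its state index end up smaller than the text version, while a lookup table instead keeps the labels in a separate table referenced by a foreign key.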
Say goodbye to the old way of distributing Postgres extensions as part of the main pre-built operand image. Leveraging the Kubernetes ImageVolume feature, CloudNativePG now allows you to mount extensions like pgvector and PostGIS from separate, dedicated images. This new declarative method completely decouples the PostgreSQL core from the extension binaries, enabling dynamic addition, easier evaluation, and simplified updates without ever having to build or manage monolithic custom container images.
This blog provides my opinion on how Postgres, according to the annual Stack Overflow developer survey, became the most admired, desired and used database by developers. This is part three in my series about the Postgres journey to the top with developers. Here are the first parts of this story:
And our last 2025 recording is here! Check out Jay Miller’s talk!
Seattle Postgres Users Group (SEAPUG) maintained the PostgreSQL booth at PASS Data Community Summit 2025 from November 17-21, 2025:
Speakers:
The Mumbai PostgreSQL User Group meetup took place on November 20, 2025, organized by Sovenath Shaw, Ajit Gadge and Deepak Mahto.
Speakers:
PostgreSQL Conference Japan 2025 took place on November 21st. It was organized by Haruka Takatsuka, Kazushiro Takeda, Rintaro Ikeda, Tetsuo Sakata, Kousuke Kida, and Junichi Tado.
During the conference, the following volunteers helped run the event: Tamotsu Odaka, Taiki Koshino, Nozomi Anzai, Jun Kuwamura, Hiroki Kataoka, Makoto Kaga, Tetsuji Koyama, Seiji Goto, Shinobu Honma, Kenji Sato, Katsuaki Fukushima, Thiduong Dang, Yung-Chung Ku, Wataru Takahashi, and Kouhei Ito.
PASS Data Community Summit 2025 wrapped up last week. This conference originated 25 years ago with the independent, user-led, not-for-profit “Professional Association for SQL Server (PASS)” and the annual summit in Seattle continues to attract thousands of database professionals each year. After the pandemic it was reorganized and broadened as a “Data Community” event, including a Postgres track.
Starting in 2023, volunteers from the Seattle Postgres User Group have staffed a postgres community booth on the exhibition floor. We provide information about Postgres User Groups around the world and do our best to answer all kinds of questions people have about Postgres. The booth consistently gets lots of traffic and questions.
The United States PostgreSQL Association has generously supplied one of their booth kits each year, which includes a banner/background and booth materials such as stickers, a map showing many user groups, a “welcome to postgres” handout, and Postgres major version handouts. We supplement these with extra pins, stickers, and printouts like the happiness hints I’ve put together, a list of common extensions that Rox made, and a list of Postgres events that Lloyd made. Every year, we also bring leftover Halloween candy that we want to get rid of and put it in a big bowl on the table.
One of the top questions people ask is how and where they can learn more about Postgres. Next year I might just print out the Links section from my blog, which has a bunch of useful free resources. Another idea I have is for RedGate and EnterpriseDB – I think both of these companies have paid training but also give free access to a few introductory classes – it would be nice if they made a small card with a link to their free training. I think we could have a stack of these cards at our user groups and at the PASS booth. The company can promote paid training, but the free content can benefit anyone even if they aren’t interested in the paid training. I might also reach out to other companies who have paid
[...]
Recently, I read an article titled "Alias Settings of Engineers Around the World" (in Japanese). As a PostgreSQL engineer, this got me thinking: "If they are customizing their bash aliases, how are they configuring their psql environments?"
Driven by curiosity, I decided to investigate GitHub repositories to see how developers commonly configure their psqlrc files.
psql is the terminal-based front-end for PostgreSQL, allowing you to execute SQL interactively. Its configuration file is named psqlrc, and it has the following characteristics:
The system-wide psqlrc file is stored in ../etc/ relative to the directory containing the PostgreSQL executables. This directory can be explicitly set using the environment variable PGSYSCONFDIR.
The per-user .psqlrc file is stored in the executing user's home directory. This file can be explicitly set using the environment variable PSQLRC.
Both files can be made version-specific by appending the PostgreSQL version to the name, for example .psqlrc-14.
A version-specific file is read in preference to the plain .psqlrc or psqlrc.
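For illustration, here is the kind of .psqlrc such a survey tends to turn up; this particular file is just an example of mine, not one of the article's findings:
\set QUIET 1
\timing on
\x auto
\pset null '(null)'
\set COMP_KEYWORD_CASE upper
\set HISTSIZE 10000
\set PROMPT1 '%n@%/%R%# '
\unset QUIET
This silences startup chatter, turns on query timing, switches to expanded output automatically for wide results, makes NULLs visible, upper-cases keywords on tab completion, enlarges the command history, customizes the prompt, and finally re-enables messages.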
I used the Search API. Here are the key points to consider:
Search results are paginated, so the script uses a for loop to fetch all pages.
To stay within the API rate limits, the script pauses between requests with sleep commands.
#!/bin/bash
GITHUB_TOKEN=hoge # Set your own GitHub token here
# API call for files 0 to 1000 bytes
for size in `seq 0 50 950`;
do
for page in `seq 10`;
do
curl -ksS \
-H "AccepBack in October, before PGConf.EU, I explained the issues impacting the prolonged wait for TDE in PostgreSQL 18. Explanations were needed as users were buzzing with anticipation, and they deserved to understand what caused the delays and what the roadmap looked like. In that blog post I have shared that due to one of the features newly added in 18.0, the Asynchronous IO (AIO), we have decided to give ourselves time until 18.1 has been released to provide a build with TDE. We wanted to ensure best quality of the solution and that takes time.
Number of posts in the past two months
Get in touch with the Planet PostgreSQL administrators at planet at postgresql.org.