How to check if two tables are identical?

Question

How can I check if the contents of two tables are identical in PostgreSQL? I've checked this question but I couldn't find a solution: Checking whether two tables have identical content in PostgreSQL

What if

my tables don't have a primary key?
my tables contain duplicate rows?
all columns are nullable (so in theory there could be rows containing only nulls)?

Maybe this scenario is not very likely, but is there a way to safely check if two tables with arbitrary content are completely identical?

This is supposed to return 0 rows if both tables contain identical rows. But a join doesn't work if the columns contain null values: you need at least one ID column that doesn't contain any nulls.

SELECT *
FROM table_a AS a FULL OUTER JOIN table_b AS b
    USING (<list of columns to compare>)
WHERE a.id IS NULL
   OR b.id IS NULL;

This is also supposed to return 0 rows if both tables are identical. But it doesn't work if the tables contain duplicate rows.

(TABLE a EXCEPT TABLE b)
UNION ALL
(TABLE b EXCEPT TABLE a)
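
To illustrate the duplicate-row problem, here is a minimal sketch using two hypothetical temporary tables, t1 and t2 (names made up for this example only). Plain EXCEPT compares distinct rows, so the query comes back empty even though the tables differ in how often a row occurs.

```sql
-- Hypothetical demo tables, not part of the original question.
CREATE TEMP TABLE t1 AS VALUES (1), (1);
CREATE TEMP TABLE t2 AS VALUES (1);

-- EXCEPT eliminates duplicates, so the extra copy of 1 in t1 goes unnoticed:
-- this returns zero rows although the tables are not identical.
(TABLE t1 EXCEPT TABLE t2)
UNION ALL
(TABLE t2 EXCEPT TABLE t1);
```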

Why would a table have duplicate rows in the first place? Does it not have a primary/unique key?
– Charlieface, Commented Dec 3 at 17:29
How do you know if the rows from different tables, each containing all null columns, are identical or not? Two null values are neither identical nor not identical; this also applies to multiple columns.
– mustaccio, Commented Dec 3 at 19:13
In short, if you don't want to use the usual rules of identity, you have to define your own. What are yours?
– mustaccio, Commented Dec 3 at 19:16
I don't feel like answering questions on tables without a primary key. If you have no primary key, consistency, integrity and data quality cannot be very important for you.
– Laurenz Albe, Commented Dec 4 at 9:02

Luuk · Accepted Answer · 2025-12-03 15:22:39Z

5

You can add a row number when checking:

(SELECT *, ROW_NUMBER() OVER (ORDER BY a,b,c) as R FROM table_a 
EXCEPT 
SELECT *, ROW_NUMBER() OVER (ORDER BY a,b,c) as R FROM table_b)
UNION ALL
(SELECT *, ROW_NUMBER() OVER (ORDER BY a,b,c) as R FROM table_b 
EXCEPT 
SELECT *, ROW_NUMBER() OVER (ORDER BY a,b,c) as R FROM table_a)

see: DBFIDDLE

P.S.: I am ignoring the fact that this might be slow when you have a lot of records.
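
A possible variation, not part of the answer above: number the rows per group of identical rows (PARTITION BY) instead of across the whole table. With one global numbering, a single missing row shifts the numbers of every row sorted after it, so the output can list many rows that are actually present in both tables. The sketch below assumes, like the query above, that both tables consist of exactly the columns a, b and c.

```sql
-- Assumption: table_a and table_b both have exactly the columns a, b, c.
-- Numbering restarts for every distinct (a, b, c) combination. NULLs land in
-- the same partition, so all-NULL rows are numbered together as well.
(SELECT *, ROW_NUMBER() OVER (PARTITION BY a, b, c) AS r FROM table_a
 EXCEPT
 SELECT *, ROW_NUMBER() OVER (PARTITION BY a, b, c) AS r FROM table_b)
UNION ALL
(SELECT *, ROW_NUMBER() OVER (PARTITION BY a, b, c) AS r FROM table_b
 EXCEPT
 SELECT *, ROW_NUMBER() OVER (PARTITION BY a, b, c) AS r FROM table_a);
```

With this numbering, only the surplus copies of a row show up in the result, for example the third copy of a row that one table holds three times and the other only twice.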


You can use the whole, unpacked row and order by table_a to avoid having to list all columns explicitly. (fiddle)
– Zegarek, Commented Dec 4 at 10:30
Indeed, but does that only work in PostgreSQL? And although I was not talking about performance 😉: when ordering by the unpacked row, the sort is wider and so slower (32 vs 4, see: dbfiddle.uk/L--mUVA9?hide=40)
– Luuk, Commented Dec 4 at 13:52
Performance is worse, but the width doesn't seem to have much to do with it. It's rather due to composite vs value-by-value comparison: if you add more columns and paren-wrap them all, the width in the non-wrapped version jumps, while the paren-wrapped one stays at 32. The wrapped/composite version will be slower despite being narrower: dbfiddle.uk/30Xymov5 You can also order by t.* to the same effect as using the composite, but by that I also mean the worse performance; see the plan and looped tests at the end.
– Zegarek, Commented Dec 5 at 11:05
This got interesting: dbfiddle.uk/XlTugpzq What bothers me is that a in that context is being rewritten to a.*, but both seem to act and perform like a composite a, instead of getting expanded to a.c1, a.c2, ... or at least performing like the explicit list does. Feels like it should be optimised the other way around. Using row(a.*) instead helps a bit and does do the expansion.
– Zegarek, Commented Dec 5 at 11:47

Zegarek · Accepted Answer · 2025-12-04 10:30:42Z

EXCEPT also has an ALL clause, which makes your last example work fine (demo at db<>fiddle):

CREATE TABLE a AS VALUES(1),(1),(2),(null),(null),(null);
CREATE TABLE b AS VALUES(1),(2),(2),(null);

(TABLE a EXCEPT ALL TABLE b)
UNION ALL
(TABLE b EXCEPT ALL TABLE a);

column1
null
null
1
2

SELECT 4

EXCEPT ALL also compares values with "is not distinct from" semantics rather than three-valued-logic (3VL) equality, so a null matches another null and differences involving nulls are significant. Duplicates are treated as separate rows, as if they were numbered, without you actually having to row_number() them.
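
As a small illustration of the difference between the two comparison rules (the column aliases are made up for this sketch):

```sql
-- Ordinary equality yields NULL (unknown) when either side is NULL,
-- while IS NOT DISTINCT FROM treats two NULLs as the same value.
SELECT NULL::int = NULL::int                    AS equal_3vl,     -- NULL
       NULL::int IS NOT DISTINCT FROM NULL::int AS not_distinct;  -- true
```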

It might be a good idea to mark the origin of the difference and count the repetitions:

(SELECT 'a-b', diff, count(*) FROM (TABLE a EXCEPT ALL TABLE b) AS diff GROUP BY 1, 2)
UNION ALL
(SELECT 'b-a', diff, count(*) FROM (TABLE b EXCEPT ALL TABLE a) AS diff GROUP BY 1, 2);

	diff	count
a-b	(1)	1
a-b	()	2
b-a	(2)	1

I'm using diff as a whole-row composite value column.
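
If only a yes/no answer is needed, the same comparison can be wrapped in NOT EXISTS. This is a minimal sketch reusing the demo tables a and b from above; the alias tables_identical is made up for the example, and it assumes both tables have the same number of columns with compatible types.

```sql
-- Returns true only when neither direction of EXCEPT ALL leaves any row over,
-- i.e. both tables hold the same rows the same number of times.
SELECT NOT EXISTS (
    (TABLE a EXCEPT ALL TABLE b)
    UNION ALL
    (TABLE b EXCEPT ALL TABLE a)
) AS tables_identical;
```

For the demo tables above this yields false, since the difference query returned four rows.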

RonJohn · Accepted Answer · 2025-12-05 21:01:58Z

1

Another method is to use the COPY command:

$ psql dba -Xc "COPY (SELECT * FROM foo ORDER BY f1) TO STDOUT" | md5sum
c0710d6b4f15dfa88f600b0e6b624077  -

Run that on both tables. If their MD5 sums match, they're identical.
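
A rough in-database variant of the same idea, offered as a sketch under assumptions rather than as the method above: hash the text form of every row inside PostgreSQL and compare the two digests. The second table name bar is hypothetical. The comparison is on the text representation of each row, so it is stricter than value equality (numeric 1.0 and 1.00 produce different text), and string_agg builds the whole table as one string in memory, which limits it to modest table sizes.

```sql
-- Assumption: foo and bar are the two tables to compare (bar is hypothetical).
-- Each row is cast to its composite text form, sorted by that text so the
-- order is deterministic, concatenated, and hashed.
-- IS NOT DISTINCT FROM makes two empty tables (both digests NULL) compare as equal.
SELECT
    (SELECT md5(string_agg(t::text, E'\n' ORDER BY t::text)) FROM foo AS t)
    IS NOT DISTINCT FROM
    (SELECT md5(string_agg(t::text, E'\n' ORDER BY t::text)) FROM bar AS t)
    AS same_digest;
```

Ordering by the full row text rather than by a single column keeps the result stable even when no individual column is unique.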


Good to note that md5sum is a Linux/WSL command-line utility. On Windows, the output would have to go through certutil instead. And, same remark as under @Luuk's post: order by foo, or by foo.* or row(foo.*), if you don't want to check and list all columns each time, at the price of a slight performance hit. It's also limited to the "if" in the title, without addressing the option to see how they differ, from the question body. Other than these nitpicks, +1.
– Zegarek, Commented Dec 5 at 22:41
@Zegarek I've read the question three times, but don't see any mention of wanting to see how they differ. All I see is it asking about identicality.
– RonJohn, Commented Dec 7 at 4:44
You're right. My assumption that they likely want the "how" was based on the fact that in both their attempts, they request all columns. If they really only wanted an "if", that'd be a boolean-returning exists, a count(*) or even an empty select that only returns the row count in the command tag. That's a guess, so yours is as good as mine.
– Zegarek, Commented Dec 7 at 9:16
