3

How can I check if the contents of two tables are identical in PostgreSQL? I've checked this question but I couldn't find a solution: Checking whether two tables have identical content in PostgreSQL

What if

  • my tables don't have a primary key?
  • my tables contain duplicate rows?
  • all columns are nullable (so in theory there could be rows containing only nulls)?

Maybe this scenario is not very likely, but is there a way to safely check if two tables with arbitrary content are completely identical?


This is supposed to return 0 rows if both tables contain identical rows. But a join doesn't work if the compared columns contain null values, because null never equals null in a join condition. You need at least one ID column that doesn't contain any null values.

SELECT *
FROM table_a AS a FULL OUTER JOIN table_b AS b
    USING (<list of columns to compare>)
WHERE a.id IS NULL
   OR b.id IS NULL;

This is also supposed to return 0 rows if both tables are identical. But it doesn't work if the tables contain duplicate rows.

(TABLE a EXCEPT TABLE b)
UNION ALL
(TABLE b EXCEPT TABLE a)
4
  • 1
    Why would a table have duplicate rows in the first place? Does it not have a primary/unique key? Commented Dec 3 at 17:29
  • How do you know if the rows from different tables, each containing all null columns, are identical or not? Two null values are neither identical nor not identical, this also applies to multiple columns. Commented Dec 3 at 19:13
  • 1
    In short, if you don't want to use the usual rules of identity, you have to define your own. What are yours? Commented Dec 3 at 19:16
  • 1
    I don't feel like answering questions on tables without a primary key. If you have no primary key, consistency, integrity and data quality cannot be very important for you. Commented Dec 4 at 9:02

3 Answers

5

You can add a row number when checking:

(SELECT *, ROW_NUMBER() OVER (ORDER BY a,b,c) as R FROM table_a 
EXCEPT 
SELECT *, ROW_NUMBER() OVER (ORDER BY a,b,c) as R FROM table_b)
UNION ALL
(SELECT *, ROW_NUMBER() OVER (ORDER BY a,b,c) as R FROM table_b 
EXCEPT 
SELECT *, ROW_NUMBER() OVER (ORDER BY a,b,c) as R FROM table_a)

see: DBFIDDLE

P.S.: I am ignoring the fact that this might be slow when you have a lot of records....
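
A variant worth considering is to number the rows within each group of identical rows instead of across the whole table. The sketch below assumes the tables have exactly the columns a, b, c used above; the sample data is made up for illustration. Because PARTITION BY treats nulls as belonging to the same group, an extra row in one table does not shift the numbers of unrelated rows, so the output lists only the surplus copies.

```sql
-- Hypothetical sample data; table and column names follow the query above.
CREATE TEMP TABLE table_a (a int, b int, c int);
CREATE TEMP TABLE table_b (a int, b int, c int);
INSERT INTO table_a VALUES (1, 1, 1), (1, 1, 1), (2, NULL, 3);
INSERT INTO table_b VALUES (1, 1, 1), (2, NULL, 3);

-- Number each copy of a row within its group of identical rows.
-- No ORDER BY is needed inside the window: rows in one partition are identical.
(SELECT *, row_number() OVER (PARTITION BY a, b, c) AS r FROM table_a
 EXCEPT
 SELECT *, row_number() OVER (PARTITION BY a, b, c) AS r FROM table_b)
UNION ALL
(SELECT *, row_number() OVER (PARTITION BY a, b, c) AS r FROM table_b
 EXCEPT
 SELECT *, row_number() OVER (PARTITION BY a, b, c) AS r FROM table_a);
```

With this sample data the query returns a single row, (1, 1, 1, 2): the second copy of (1, 1, 1) that exists only in table_a.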

4
  • 2
    You can use the whole, unpacked row and order by table_a to avoid having to list all columns explicitly. fiddle Commented Dec 4 at 10:30
  • Indeed, but does that only work in PostgreSQL? And, although I was not talking about performance 😉, when ordering by the unpacked row the sort is wider, so slower (32 vs 4, see: dbfiddle.uk/L--mUVA9?hide=40 ) Commented Dec 4 at 13:52
  • 1
    Performance is worse but that width doesn't seem to have much to do with it. It's rather due to composite vs value-by-value comparison: if you spawn more columns and paren-wrap them all, the width in non-wrapped version jumps, while the paren-wrapped stays at 32. The wrapped/composite version will be slower despite being narrower: dbfiddle.uk/30Xymov5 You can also order by t.* to the same effect as using the composite, but by that I also mean the worse performance - see the plan and looped tests at the end. Commented 2 days ago
  • This got interesting dbfiddle.uk/XlTugpzq What bothers me is that a in that context is being rewritten to a.* but both seem to act and perform like a composite a, instead of getting expanded to a.c1, a.c2, ... or at least performing like the explicit list does. Feels like it should be optimised the other way around. Doing row(a.*) instead, helps a bit and does do the expansion. Commented 2 days ago
3

EXCEPT also has an ALL clause, which makes your last example work fine: demo at db<>fiddle

CREATE TABLE a AS VALUES(1),(1),(2),(null),(null),(null);
CREATE TABLE b AS VALUES(1),(2),(2),(null);

(TABLE a EXCEPT ALL TABLE b)
UNION ALL
(TABLE b EXCEPT ALL TABLE a);

 column1
 null
 null
 1
 2
SELECT 4

EXCEPT ALL compares rows with IS NOT DISTINCT FROM semantics rather than three-valued-logic (3VL) equality, so two nulls match each other and a null present in only one table shows up as a difference. Duplicates are matched one-for-one, as if they were numbered, without you really having to row_number() them.

It might be a good idea to mark the origin of the difference and count the repetitions:

(SELECT 'a-b', diff, count(*) FROM (TABLE a EXCEPT ALL TABLE b) AS diff GROUP BY 1, 2)
UNION ALL
(SELECT 'b-a', diff, count(*) FROM (TABLE b EXCEPT ALL TABLE a) AS diff GROUP BY 1, 2);

 ?column? | diff | count
 a-b      | (1)  | 1
 a-b      | ()   | 2
 b-a      | (2)  | 1

I'm using diff as a whole-row composite value column.
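
If all you need is a yes/no answer rather than the differing rows, the same EXCEPT ALL pair can be wrapped in EXISTS. This is a minimal sketch that reuses the sample tables a and b from above (recreated here as temporary tables so it runs on its own); the column alias identical is just an illustrative name.

```sql
-- Same sample data as above, as temp tables so the snippet is self-contained.
CREATE TEMP TABLE a AS VALUES (1),(1),(2),(null),(null),(null);
CREATE TEMP TABLE b AS VALUES (1),(2),(2),(null);

-- True only when neither direction of the difference yields any row.
SELECT NOT EXISTS (TABLE a EXCEPT ALL TABLE b)
   AND NOT EXISTS (TABLE b EXCEPT ALL TABLE a) AS identical;
```

With this data identical is false; it becomes true only when both tables hold the same rows with the same multiplicities, nulls included.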

1

Another method is to use the COPY command:

$ psql dba -Xc "COPY (SELECT * FROM foo ORDER BY f1) TO STDOUT" | md5sum
c0710d6b4f15dfa88f600b0e6b624077  -

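Run that on both tables, ordering by enough columns to make the row order deterministic (ties on f1 alone could come out in a different order each time). If their MD5 sums match, the tables are identical, barring a hash collision.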
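
The same idea can be sketched entirely inside the database, which avoids depending on a shell utility. The example below is an assumption-laden illustration, not part of the method above: foo and bar are hypothetical tables, each row is serialized with a cast to text, and the rows are sorted by that text so the order is deterministic even with duplicates and nulls. On very large tables string_agg may run into memory or size limits, so treat it as a quick check only.

```sql
-- Hypothetical tables with the same rows inserted in a different order.
CREATE TEMP TABLE foo (f1 int, f2 text);
CREATE TEMP TABLE bar (f1 int, f2 text);
INSERT INTO foo VALUES (1, 'x'), (1, 'x'), (NULL, NULL);
INSERT INTO bar VALUES (NULL, NULL), (1, 'x'), (1, 'x');

-- Hash the sorted text form of every row; IS NOT DISTINCT FROM also
-- treats two empty tables (both aggregates NULL) as matching.
SELECT (SELECT md5(string_agg(foo::text, E'\n' ORDER BY foo::text)) FROM foo)
       IS NOT DISTINCT FROM
       (SELECT md5(string_agg(bar::text, E'\n' ORDER BY bar::text)) FROM bar)
       AS same_hash;
```

Here same_hash is true, since both tables contain the same three rows.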

3
  • Good to note md5sum is a bash/Linux/WSL utility. On Windows, that'll have to go into certutil instead. And, same remark as under @Luuk's post: order by foo, or by foo.* or row(foo.*) if you don't want to check and list all columns each time, for the price of a slight performance hit. It's also limited to the if in the title, without addressing the option to see how they differ, in the question body. Other than these nitpicks, +1. Commented yesterday
  • @Zegarek I've read the question three times, but don't see mention of wanting to see how they differ. All I see is it asking about identicality. Commented 14 hours ago
  • You're right. My assumption that they likely want the how was based on the fact that in both their attempts, they request all columns. If they really only wanted an if, that'd be a boolean-returning exists, a count(*) or even an empty select that only returns the row count in the command tag. That's a guess, so yours is as good as mine. Commented 10 hours ago
