Discussion:
Atomicity of INSERT INTO ... SELECT ... WHERE NOT EXISTS ...
Jim Garrison
2014-08-27 22:22:43 UTC
Permalink
Given (pseudocode)

CREATE TABLE kvstore (
k varchar primary key,
v varchar);

CREATE OR REPLACE FUNCTION store_key_value_pair(k varchar, v varchar) returns boolean as $$
BEGIN
INSERT INTO kvstore (k, v)
SELECT :k, :v
WHERE NOT EXISTS (select 1 from kvstore where k = :k);
RETURN FOUND;
END;
$$ LANGUAGE plpgsql;

I have a few questions:

1) Does INSERT statement set FOUND based on whether or not the row was inserted?
2) If this is invoked without a transaction in progress, is there any guarantee of atomicity between checking the EXISTS and attempting to insert the row? If this is being executed in two (or more) sessions, can the SELECT succeed but then have the INSERT fail with a duplicate-key exception?
3) Will the behavior be different if the invoking processes have a transaction in progress?
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
David G Johnston
2014-08-28 00:07:16 UTC
Permalink
Post by Jim Garrison
Given (pseudocode)
CREATE TABLE kvstore (
k varchar primary key,
v varchar);
CREATE OR REPLACE FUNCTION store_key_value_pair(k varchar, v varchar)
returns boolean as $$
BEGIN
INSERT INTO kvstore (k, v)
SELECT :k, :v
WHERE NOT EXISTS (select 1 from kvstore where k = :k);
RETURN FOUND;
END;
$$ LANGUAGE plpgsql;
1) Does INSERT statement set FOUND based on whether or not the row was inserted?
2) If this is invoked without a transaction in progress, is there any
guarantee of atomicity between checking the EXISTS and attempting to
insert the row? If this is being executed in two (or more) sessions, can
the SELECT succeed but then have the INSERT fail with a duplicate-key
exception?
3) Will the behavior be different if the invoking processes have a transaction in progress?
1) The top-level query controls FOUND; so yes
2) Impossible - functions always execute in a transaction. Actually,
everything executes in a transaction the choice is whether you want to
auto-commit.
3) see #2


Still not super fluent wrt concurrency reasoning but here it goes:

Since we are dealing with MVCC here, and the default READ COMMITTED
isolation level, the data that each session/statement would see would be
stable and not include any INSERTs concurrently performed by the other.
Thus if two sessions try to simultaneously insert the same (k,v) the one
that commits second will error.

David J.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/Atomicity-of-INSERT-INTO-SELECT-WHERE-NOT-EXISTS-tp5816655p5816668.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Jerry Sievers
2014-08-28 00:34:08 UTC
Permalink
Post by Jim Garrison
Given (pseudocode)
CREATE TABLE kvstore (
k varchar primary key,
v varchar);
CREATE OR REPLACE FUNCTION store_key_value_pair(k varchar, v varchar) returns boolean as $$
BEGIN
INSERT INTO kvstore (k, v)
SELECT :k, :v
WHERE NOT EXISTS (select 1 from kvstore where k = :k);
RETURN FOUND;
END;
$$ LANGUAGE plpgsql;
1) Does INSERT statement set FOUND based on whether or not the row was inserted?
Yes unless triggers/rules are in volved..
Post by Jim Garrison
2) If this is invoked without a transaction in progress, is there any
guarantee of atomicity between checking the EXISTS and attempting to
insert the row? If this is being executed in two (or more) sessions,
can the SELECT succeed but then have the INSERT fail with a
duplicate-key exception?
You will either be at risk of a race condition or more likely your
insert will try the dupe insert and block waiting for another session
that's already inserted same row to commit/abort. And if other guy does
commit then you will raise a dupe key exception.

Your code is wise trying to avoid a dupe insert if the row is already
existing and visible but you will still need to trap the dupe key
exception when you get one which will happen eventually if there will
be other sessions trying this same insert.
Post by Jim Garrison
3) Will the behavior be different if the invoking processes have a transaction in progress?
Every statement you run is a transaction of its own but you are far more
at risk of testing negative for an existing row, proceeding to try the
insert but then hanging because there is the same insert already
pending... if there are longer running complex transactions involved.

Suppose...

session A
begin;
insert into your table key=1
... do more work here...

meanwhile...
session B
test for row and I don't see it
try insert and hang here till commit/abort of session A

session A
commit

session B
Doh!! dupe key error

HTH
--
Jerry Sievers
Postgres DBA/Development Consulting
e: ***@comcast.net
p: 312.241.7800
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Craig Ringer
2014-08-28 06:33:55 UTC
Permalink
Post by Jim Garrison
Given (pseudocode)
CREATE TABLE kvstore (
k varchar primary key,
v varchar);
CREATE OR REPLACE FUNCTION store_key_value_pair(k varchar, v varchar) returns boolean as $$
BEGIN
INSERT INTO kvstore (k, v)
SELECT :k, :v
WHERE NOT EXISTS (select 1 from kvstore where k = :k);
RETURN FOUND;
END;
$$ LANGUAGE plpgsql;
1) Does INSERT statement set FOUND based on whether or not the row was inserted?
2) If this is invoked without a transaction in progress, is there any guarantee of atomicity between checking the EXISTS and attempting to insert the row? If this is being executed in two (or more) sessions, can the SELECT succeed but then have the INSERT fail with a duplicate-key exception?
This code can still fail with a unique violation, yes, as the select can
occur in both transactions then the insert in both.
Post by Jim Garrison
3) Will the behavior be different if the invoking processes have a transaction in progress?
No, because all functions run in transactions. There is no such thing as
"not in a transaction" in PostgreSQL (except for a few special system
management commands).

If it's in a SERIALIZABLE transaction instead of the default READ
COMMITTED then it might fail with a serialization failure instead of a
unique violation, but it'll still fail.

Please read the detailed guidance on this problem that already exists:

http://www.postgresql.org/docs/current/static/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE

http://www.depesz.com/2012/06/10/why-is-upsert-so-complicated/

http://stackoverflow.com/questions/1109061/insert-on-duplicate-update-in-postgresql
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Loading...