Check The Replication Delay -4 In PostgreSQL

November 26, 2016

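You can get the delay in bytes from the master side quite easily: use pg_xlog_location_diff() to compare the master’s pg_current_xlog_insert_location() with the replay_location column of that standby’s row in pg_stat_replication. On the slave itself, the following query shows the received and replayed WAL locations together with the number of seconds since the last replayed transaction: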

postgres=# SELECT
    pg_last_xlog_receive_location() AS receive,
    pg_last_xlog_replay_location() AS replay,
    (
        extract(epoch FROM now()) -
        extract(epoch FROM pg_last_xact_replay_timestamp())
    )::int AS lag;

  receive   |   replay   | lag
------------+------------+------
 1/AB861728 | 1/AB861728 | 2027

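Execute the query on the slave. The lag is only important when receive is different from replay: when the two are equal, the slave has replayed everything it has received, and the number only tells you how long ago the last replayed transaction was committed.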
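To make the byte delay concrete, here is a minimal Python sketch of the subtraction that pg_xlog_location_diff() performs on two WAL locations. The helper names lsn_to_int and lsn_diff are my own, and the sketch assumes the two hex halves of a location form one linear 64-bit position, which I believe holds for 9.3 and later.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a WAL location such as '1/AB861728' to an absolute byte position."""
    high, low = lsn.split("/")
    # Assumption: the part before the slash is the upper 32 bits,
    # the part after it is the lower 32 bits, both written in hex.
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff(a: str, b: str) -> int:
    """Number of bytes by which location a is ahead of location b."""
    return lsn_to_int(a) - lsn_to_int(b)


replay = "1/AB861728"
print(lsn_diff("1/AB861728", replay))  # 0, everything received has been replayed
print(lsn_diff("1/AB861800", replay))  # 216 bytes still waiting to be replayed
```

The same subtraction works for any pair of locations, for example the master’s current location against a standby’s replay_location.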

Time-Based Replication Monitoring In The hot_standby_delay Action

November 26, 2016

By postgresdba

This was something that had been a long-standing item on my personal TODO list, and happened to scratch the itch of a couple of clients at the time.

Previously the hot_standby_delay action would only take an integer representing how many bytes of WAL data the master could be ahead of a replica before the threshold is crossed:

check_hot_standby_delay --dbhost=master,replica1 --critical=16777594

This is certainly useful for, say, keeping an eye on whether you’re getting close to running over your wal_keep_segments value. Of course it can also be used to indicate whether the replica is still processing WAL, or has become stuck for some reason. But for the (arguably more common) problem of determining whether a replica is falling too far behind, it isn’t easy to figure out what byte thresholds to use, beyond simply guessing.

Postgres 9.1 introduced a handy function to help solve this problem: pg_last_xact_replay_timestamp(). It measures a slightly different thing than the pg_last_xlog_* functions the action previously used. And it’s for that reason that the action now has a more complex format for its thresholds:

check_hot_standby_delay --dbhost=master,replica1 --critical="16777594 and 5 min"

For backward compatibility, of course, it’ll still take an integer and work the same as it did before. Or alternatively if you only want to watch the chronological lag, you could even give it just a time interval, ‘5 min’, and the threshold only takes the transaction replay timestamp into account. But if you specify both, as above, then both conditions must be met before the threshold activates.
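The threshold rules described above can be sketched in a few lines of Python. This is not the action’s actual implementation, only an illustration of the behaviour as described. The function names, the unit table, and the choice of a strict “greater than” comparison are all assumptions of mine.

```python
import re

# Hypothetical unit table; the real action may accept other spellings.
_UNIT_SECONDS = {
    "s": 1, "sec": 1, "secs": 1, "second": 1, "seconds": 1,
    "min": 60, "mins": 60, "minute": 60, "minutes": 60,
    "h": 3600, "hour": 3600, "hours": 3600,
}


def parse_threshold(text):
    """Return (max_bytes, max_seconds); either is None when not specified."""
    max_bytes = max_seconds = None
    for part in re.split(r"\s+and\s+", text.strip().lower()):
        m = re.fullmatch(r"(\d+)\s*([a-z]*)", part)
        if m is None or (m.group(2) and m.group(2) not in _UNIT_SECONDS):
            raise ValueError(f"cannot parse threshold part: {part!r}")
        number, unit = int(m.group(1)), m.group(2)
        if unit:
            max_seconds = number * _UNIT_SECONDS[unit]  # time interval
        else:
            max_bytes = number                          # bare integer means bytes
    return max_bytes, max_seconds


def threshold_crossed(threshold, byte_lag, seconds_lag):
    """True when every limit given in the threshold is exceeded."""
    max_bytes, max_seconds = parse_threshold(threshold)
    checks = []
    if max_bytes is not None:
        checks.append(byte_lag > max_bytes)
    if max_seconds is not None:
        checks.append(seconds_lag > max_seconds)
    # With both limits present, both must be exceeded.
    return all(checks)


print(parse_threshold("16777594 and 5 min"))                          # (16777594, 300)
print(threshold_crossed("16777594 and 5 min", 20_000_000, 60))    # False
print(threshold_crossed("16777594 and 5 min", 20_000_000, 600))   # True
```

Requiring both conditions keeps a replica that is many bytes behind but only a few seconds stale (a bulk load, for instance) from raising an alert.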

Monitor Replication Delay: Four Solutions In PostgreSQL

November 26, 2016

By postgresdba

Looking at the documentation and all the blog posts about how to monitor replication delay, I don’t think there is one good and, most importantly, safe solution which works all the time.

Solution 1:

I used to check replication delay/lag by running the following query on the slave:

SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))::INT;

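This query works great and it is a very good way to get the lag in seconds. The problem is that if the master is not active, the number doesn’t mean a thing: no new transactions arrive, so the time since the last replayed one keeps growing even though the slave is fully caught up. So you first need to check whether the two servers are in sync and, if they are, return 0.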

Solution 2:

This can be achieved by comparing pg_last_xlog_receive_location() and pg_last_xlog_replay_location() on the slave. If they are the same, the query returns 0; otherwise it falls back to the query from Solution 1:

SELECT
    CASE
        WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location() THEN 0
        ELSE EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())::INTEGER
    END AS replication_lag;

This query is all good, but the problem is that it is not safe. If for some reason the master stops sending transaction logs, this query will continue to return 0 and you will think the replication is working, when it is not.
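The blind spot is easy to see if you model the CASE expression outside the database. Below is a small Python sketch; the function name and the sample values are made up for illustration.

```python
def replication_lag(receive, replay, seconds_since_last_replay):
    """Python model of the CASE expression from Solution 2."""
    if receive == replay:
        # Everything that was received has been replayed.
        return 0
    return round(seconds_since_last_replay)


# Slave is replaying: received WAL is ahead of replayed WAL.
print(replication_lag("1/AB861800", "1/AB861728", 12.4))    # 12

# Hypothetical failure: the stream broke an hour ago, so the receive
# location stopped moving and replay caught up with it.
print(replication_lag("1/AB861728", "1/AB861728", 3600.0))  # 0
```

The second call reports 0 even though nothing has arrived for an hour, which is exactly the case where you would want an alert.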

Solution 3:

Master:

SELECT pg_current_xlog_location();

Slave:

SELECT pg_last_xlog_receive_location();

By comparing these two values you can see whether the servers are in sync. The problem yet again is that if streaming replication fails, both of these functions can continue to return the same values and you could still end up thinking the replication is working. You also need to query both the master and the slave to be able to monitor this, which is not that easy on monitoring systems, and you still don’t have the actual lag in seconds, so you would still need to run the query from Solution 1.

Solution 4:

You could query pg_stat_replication on the master and compare sent_location and replay_location; if they are the same, the replication is in sync. One more good thing about pg_stat_replication is that if streaming replication fails it will return an empty result, so you will know it failed. But the biggest problem with this system view is that its useful columns are visible only to a superuser such as postgres, so it’s not that monitoring friendly, since you don’t want to give your monitoring system superuser privileges. And you still don’t have the delay in seconds.
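As a rough sketch of how a monitoring script might interpret the rows it gets back from pg_stat_replication, here is a small Python function. It assumes the rows have already been fetched as (client_addr, sent_location, replay_location) tuples. The function name and the status strings are hypothetical.

```python
def check_replicas(rows):
    """Interpret rows of (client_addr, sent_location, replay_location)."""
    if not rows:
        # An empty result means no standby is connected to the master.
        return "CRITICAL: no replicas connected"
    behind = [addr for addr, sent, replay in rows if sent != replay]
    if behind:
        return "WARNING: still replaying: " + ", ".join(behind)
    return "OK: all replicas in sync"


print(check_replicas([]))
print(check_replicas([("10.0.0.2", "36/900A738", "36/900A738")]))
print(check_replicas([("10.0.0.2", "36/900A738", "36/9009000")]))
```

Treating the empty result as the most severe state is the main advantage over the slave-side queries, which cannot tell a broken stream from an idle master.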

I think the best option would be Solution 2 combined with a check that the wal receiver process is running before running that query, with something like:

$ ps aux | egrep 'wal\sreceiver'

postgres 3858 0.0 0.0 2100112 3312 ? Ss 19:35 0:01 postgres: wal receiver process streaming 36/900A738

This solution only needs to run on the slave and it is pretty easy to set up.
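Put together, the combined check might look like the following Python sketch. It takes the text output of ps as a string and only trusts the Solution 2 result when a wal receiver line is present. The function names are my own, and the pattern also tolerates a process title without the space, which is an assumption, since the title format may differ between versions.

```python
import re


def wal_receiver_running(ps_output):
    """True if the ps output contains a wal receiver process line."""
    # Assumption: the process title starts with "postgres: wal receiver".
    return re.search(r"postgres:\s+wal\s?receiver", ps_output) is not None


def safe_replication_lag(ps_output, receive, replay, seconds_since_last_replay):
    """Lag in seconds, or None when the wal receiver is not running."""
    if not wal_receiver_running(ps_output):
        return None  # replication is down, the lag value would be meaningless
    if receive == replay:
        return 0
    return round(seconds_since_last_replay)


sample = ("postgres 3858 0.0 0.0 2100112 3312 ? Ss 19:35 0:01 "
          "postgres: wal receiver process streaming 36/900A738")

print(safe_replication_lag(sample, "36/900A738", "36/900A738", 500.0))  # 0
print(safe_replication_lag("", "36/900A738", "36/900A738", 500.0))      # None
```

Returning None instead of 0 lets the monitoring system tell “in sync” apart from “not replicating at all”.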
