Postgresql Hot Standby Installation And Test The Monitor Of Standby

November 26, 2016

By postgresdba

1- Keygen Generate And Copy

[postgres@pg01 ~] ssh-keygen -t rsa

[postgres@pg02 ~] ssh-keygen -t rsa

[postgres@pg01 ~] ssh-copy-id -i .ssh/id_rsa.pub postgres@pg02

[postgres@pg02 ~] ssh-copy-id -i .ssh/id_rsa.pub postgres@pg01

2- Create Streaming Replication User on Primary DB

psql# createuser -U postgres -p 5432 repuser -P -c 10 –replication

3- Primary DB pg_hba.conf configuration

[root@pg01 ~] vi /pgdata/data/postgresql.conf

Add below line

host replication repuser 192.168.10.12/32 md5

[root@pg01 ~] systemctl reload postgresql.service

4- Create Archive Directory on Hot Standby Server

[root@pg02 ~] mkdir -p /pgdata/ARCHIVELOG

5- Postgresql.conf configuration on Primary DB

[root@pg01 ~] vi /pgdata/data/postgresql.conf

Parameters change like below

listen_addresses = ‘192.168.10.10’

wal_level = hot_standby # minimal, archive, hot_standby, or logical

archive_mode = on # enables archiving; off, on, or always

archive_command = ‘rsync -a %p [email protected]:/pgdata/ARCHIVELOG/%f’ # command to use to archive a logfile segment

max_wal_senders = 3 # max number of walsender processes

wal_keep_segments = 1000 # in logfile segments, 16MB each; 0 disables

max_replication_slots = 4 # max number of replication slots(change requires restart)

[root@pg01 ~] systemctl restart postgresql.service

6- Base Backup

You can use pg_basebackup when primary db is start or stop. Stop status is more confident.

[root@pg01 ~] systemctl stop postgresql.service

/pgdata/data directory has to be empty on hot standby server. pg_basebackup command will fill up that directory.

[root@pg02 ~] /usr/pgsql-9.5/bin/pg_basebackup -h 192.168.10.10 -D /pgdata/data -U repuser -p 5432 -v -P –xlog-method=stream

7- Postgresql.conf configuration on Hot Standby Server

Primary DB configuration has to be disabled with # mark.

#wal_level = hot_standby # minimal, archive, hot_standby, or logical

#archive_mode = on # enables archiving; off, on, or always

#archive_command = ‘rsync -a %p [email protected]:/pgdata/ARCHIVELOG/%f’ # command to use to archive a logfile segment

#max_wal_senders = 3 # max number of walsender processes

#wal_keep_segments = 1000 # in logfile segments, 16MB each; 0 disables

#max_replication_slots = 4

listen_addresses = ‘*’

hot_standby = on

8- Create Recovery.conf file on Hot Standby Server

[root@pg02 ~] vi /pgdata/data/recovery.conf

restore_command = ‘cp /pgdata/ARCHIVELOG/%f %p’ # e.g. ‘cp /mnt/server/archivedir/%f %p’

archive_cleanup_command = ‘/usr/pgsql-9.5/bin/pg_archivecleanup /pgdata/ARCHIVELOG/%r’

standby_mode = on

primary_conninfo = ‘host=192.168.10.10 port=5432 user=repuser password=repuserpass’

trigger_file = ‘/pgdata/data/failover.uygula’

If you want to delay to apply committed values. You can use recovery_min_apply_delay parameter in recovery.conf like below.

recovery_min_apply_delay = 5min

This parameter provide 5 minutes delay. When you commit a transaction at primary side, hot standby will apply this transaction 5 minutes later.

9- Start Hot Standby PostgreSQL Service

[root@pg02 ~] systemctl start postgresql.service

10- Test

Primary DB

Replications list

psql# select * from pg_stat_replication ;

Sender process check

ps -ef | grep sender

postgres 18388 18298 0 17:04 ? 00:00:00 postgres: wal sender process repuser 192.168.10.12(33700) streaming 0/9000348

psql# create table test (name text);

psql# insert into test values(‘nijam’);

Hot Standby DB

Recovery mode check

psql# select pg_is_in_recovery();

Receiver process check

ps -ef | grep receiver

postgres 20936 20919 0 17:04 ? 00:00:00 postgres: wal receiver process streaming 0/9000268

psql# select * from test;

name

———–

nijam

Determining replication monitoring in postgreSql

November 26, 2016

By postgresdba

Checking Log Position

replica1=# select pg_last_xlog_receive_location();

pg_last_xlog_receive_location

——————————-

41A/10808DE8

replica2=# select pg_last_xlog_receive_location();

pg_last_xlog_receive_location

——————————-

41A/FFD1560

Timestamp for the replica replay:

replica1=# select pg_last_xact_replay_timestamp();

pg_last_xact_replay_timestamp

——————————-

2012-10-05 10:35:47.527-07

compare two xlog locations to see which one is higher:

replica2=# select pg_xlog_location_diff(‘1/10808DE8′,’1/FFD1560’);

pg_xlog_location_diff

———————–

8616072

The way you would use them is as follows:

1.The master goes down.

2.Check xlog_location_numeric() on each backend.

3.Pick the backend with the highest (or tied for highest) numeric position, and check how far behind it is in replay using replay_lag_mb(), but see below.

4.If the highest replica isn’t too far behind on replay, promote it.

5.If the highest replica is too far behind, drop to the next-highest and check replay lag.

easily done using pg_xlog_location_diff:

replica2=# SELECT pg_xlog_location_diff(pg_xlog_last_receive_location(), pg_xlog_last_replay_location());

pg_xlog_location_diff

—————————-

16192

replay lag in megabytes, regardless of PostgreSQL version.

bench=# select replay_lag_mb();

replay_lag_mb

—————

93.7

all_replayed() and replay_lag_mb() are designed to be run only on replicas. They will return NULL on standalone or master servers.

these functions will return the same results regardless of which database they’re installed in. However, they can only be called from the database in which they are installed. So you might want to install them in the “postgres” scratch database.

xlog_location_numeric() returns a numeric value which can exceed a 64-bit integer in size. So make sure your calling and sorting code is prepared to handle a larger-than-eight-byte number.

Time-Based Replication Monitoring In The Hot_Standby_Delay

November 26, 2016

By postgresdba

This was something that had been a long-standing item on my personal TODO list, and happened to scratch the itch of a couple of clients at the time.

Previously it would only take an integer representing how many bytes of WAL data the master could be ahead of a replica before the threshold is crossed:

check_hot_standby_delay –dbhost=master,replica1 –critical=16777594

This is certainly useful for, say, keeping an eye on whether you’re getting close to running over your wal_keep_segments value. Of course it can also be used to indicate whether the replica is still processing WAL, or has become stuck for some reason. But for the (arguably more common) problem of determining whether a replica is falling too far behind determining what byte thresholds to use, beyond simply guessing, isn’t easy to figure out.

Postgres 9.1 introduced a handy function to help solve this problem: pg_last_xact_replay_timestamp(). It measures a slightly different thing than the pg_last_xlog_* functions the action previously used. And it’s for that reason that the action now has a more complex format for its thresholds:

check_hot_standby_delay –dbhost=master,replica1 –critical=”16777594 and 5 min”

For backward compatibility, of course, it’ll still take an integer and work the same as it did before. Or alternatively if you only want to watch the chronological lag, you could even give it just a time interval, ‘5 min’, and the threshold only takes the transaction replay timestamp into account. But if you specify both, as above, then both conditions must be met before the threshold activates.

Monitoring Streaming Replication with script also in postgreSql

November 26, 2016

By postgresdba

One of the easiest ways to monitor slave lag when using streaming replication is to turn hot standby on your slave and use pg_last_xact_replay_timestamp() and/or the other recovery information functions. Here’s an example query to run on the slave systems to get the number of seconds behind it is:

SELECT extract(epoch from now() – pg_last_xact_replay_timestamp()) AS slave_lag

The issue with this query is that while your slave(s) may be 100% caught up, the time interval being returned is always increasing until new write activity occurs on the master that the slave can replay. This can cause your monitoring to give false positives that your slave is falling behind if you have things set up to ensure your slaves are no more than a few minutes behind. A side affect of this monitoring query can also give you an indication that writes to your master have stopped for some reason.

One of our clients has a smaller sized database that doesn’t get quite as much write traffic as our typical clients do. But it still has failover slaves and still needs to be monitored just like our other larger clients to ensure it doesn’t fall too far behind. So, my coworker introduced me to the pg_stat_replication view that was added in PostgreSQL 9.1. Querying this from the master returns information about streaming replication slaves connected to it.

postgres=# select * from pg_stat_replication;

-[ RECORD 1 ]—-+——————————

pid | 16649

usesysid | 16388

usename | replication

application_name | walreceiver

client_addr | xxx.xxx.xxx.xxx

client_hostname | db1-prod-ca

client_port | 58085

backend_start | 2013-10-29 19:57:51.48142+00

state | streaming

sent_location | 147/11000000

write_location | 147/11000000

flush_location | 147/11000000

replay_location | 147/11000000

sync_priority | 0

sync_state | async

-[ RECORD 2 ]—-+——————————

pid | 7999

usesysid | 16388

usename | replication

application_name | walreceiver

client_addr | yyy.yyy.yyy.yyy

client_hostname | db2-prod

client_port | 54932

backend_start | 2013-10-29 15:32:47.256794+00

state | streaming

sent_location | 147/11000000

write_location | 147/11000000

flush_location | 147/11000000

replay_location | 147/11000000

sync_priority | 0

sync_state | async

He also provided a handy query to get back a simple, easy to understand numeric value to indicate slave lag. The issue I ran into using the query is that this view uses pg_stat_activity as one of its sources. If you’re not a superuser, you’re not going to get any statistics on sessions that aren’t your own (and hopefully you’re not using a superuser role as the role for your monitoring solution). So, instead I made a function with SECURITY DEFINER set, made a superuser role the owner, and gave my monitoring role EXECUTE privileges on the function.

CREATE OR REPLACE FUNCTION streaming_slave_check() RETURNS TABLE (client_hostname text, client_addr inet, byte_lag float)

LANGUAGE SQL SECURITY DEFINER

AS $$

SELECT

client_hostname,

client_addr,

sent_offset – (replay_offset – (sent_xlog – replay_xlog) * 255 * 16 ^ 6 ) AS byte_lag

FROM (

SELECT

client_hostname,

client_addr,

(‘x’ || lpad(split_part(sent_location::text, ‘/’, 1), 8, ‘0’))::bit(32)::bigint AS sent_xlog,

(‘x’ || lpad(split_part(replay_location::text, ‘/’, 1), 8, ‘0’))::bit(32)::bigint AS replay_xlog,

(‘x’ || lpad(split_part(sent_location::text, ‘/’, 2), 8, ‘0’))::bit(32)::bigint AS sent_offset,

(‘x’ || lpad(split_part(replay_location::text, ‘/’, 2), 8, ‘0’))::bit(32)::bigint AS replay_offset

FROM pg_stat_replication

) AS s;

$$;

Running this query gives back a few handy columns that should be good enough for most monitoring tools. You can easily add more columns from pg_stat_replication or any other tables you need to join against for more info.

postgres=# select * from streaming_slave_check();

client_hostname | client_addr | byte_lag

—————–+—————–+———-

db1-prod-ca | xxx.xxx.xxx.xxx | 160

db2-prod | yyy.yyy.yyy.yyy | 160

UPDATE: If you’re running PostgreSQL 9.2+, there is a new, built-in function that avoids needing the above function all together and can just query pg_stat_replication directly.

postgres=# SELECT client_hostname

, client_addr

, pg_xlog_location_diff(pg_stat_replication.sent_location, pg_stat_replication.replay_location) AS byte_lag

FROM pg_stat_replication;

client_hostname | client_addr | byte_lag

—————–+—————+———-

db1-prod-ca | xxx.xxx.xx.xx | 0

db2-prod | yy.yyy.yyy.yy | 0

Unfortunately, this function still requires superuser privileges to obtain all relevant data and most monitoring tools do not use a superuser role (I hope). So, in that case you do still need a SECURITY DEFINER function, but it can be a much much simpler one

CREATE OR REPLACE FUNCTION streaming_slave_check() RETURNS TABLE (client_hostname text, client_addr inet, byte_lag numeric)

LANGUAGE SQL SECURITY DEFINER

AS $$

SELECT client_hostname

, client_addr

, pg_xlog_location_diff(pg_stat_replication.sent_location, pg_stat_replication.replay_location) AS byte_lag

FROM pg_stat_replication;

$$;

This can also be useful to monitor slave lag when you don’t have hot standby turned on for your slaves to allow read-only queries.

Combining both of the replication monitoring solutions mentioned in this post should give you a much better overall picture of the status of your master/slave systems.

monitor replication delay -four solution in postgreSql

November 26, 2016

By postgresdba

Looking at the documentation and all the blog posts about how to monitor

replication delay I don’t think there is one good and most importantly safe

solution which works all the time.

Solution 1:

I used to check replication delay/lag by running the following query on the

slave:

SELECT EXTRACT(EPOCH FROM (now() – pg_last_xact_replay_timestamp()))::INT;

This query works great and it is a very good query to give you the lag in

seconds. The problem is if the master is not active, it doesn’t mean a

thing. So you need to first check if two servers are in sync and if they

are, return 0.

Solution 2:

This can be achieved by comparing pg_last_xlog_receive_location() and

pg_last_xlog_replay_location() on the slave, and if they are the same it

returns 0, otherwise it runs the above query again:

SELECT

CASE

WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location() THEN 0

ELSE EXTRACT (EPOCH FROM now() – pg_last_xact_replay_timestamp())::INTEGER

END

AS replication_lag;

This query is all good, but the problem is that it is not safe. If for some

reason the master stops sending transaction logs, this query will continue

to return 0 and you will think the replication is working, when it is not.

Solution 3:

Master:

SELECT pg_current_xlog_location();

Slave:

SELECT pg_last_xlog_receive_location();

and by comparing these two values you could see if the servers are in sync.

The problem yet again is that if streaming replication fails, both of these

functions will continue to return same values and you could still end up

thinking the replication is working. But also you need to query both the

master and slave to be able to monitor this, which is not that easy on

monitoring systems, and you still don’t have the information about the

actual lag in seconds, so you would still need to run the first query.

Solution 4:

You could query pg_stat_replication on the master, compare sent_location

and replay_location, and if they are the same, the replication is in sync.

One more good thing about pg_stat_replication is that if streaming

replication fails it will return an empty result, so you will know it

failed. But the biggest problem with this system view is that only the postgres

user can read it, so it’s not that monitoring friendly since you don’t want

to give your monitoring system super user privileges, and you still don’t

have the delay in seconds.

I think the best one would be 2 combined

with a check if the wal receiver process is running before running that

query with something like:

$ ps aux | egrep ‘wal\sreceiver’

postgres 3858 0.0 0.0 2100112 3312 ? Ss 19:35 0:01 postgres:

wal receiver process streaming 36/900A738

This solution would only be run on the slave and it is pretty easy to setup.

Check for PostgreSQL replication delay/lag -part3

November 26, 2016

By postgresdba

I think the best one would be 2 combined with a check if the wal receiver process is running before running that

query with something like:

$ ps aux | egrep ‘wal\sreceiver’

postgres 3858 0.0 0.0 2100112 3312 ? Ss 19:35 0:01 postgres:

wal receiver process streaming 36/900A738

This solution would only be run on the slave and it is pretty easy to setup.

Check For Postgresql Replication Delay/Lag

November 26, 2016

By postgresdba

This can be achieved by comparing pg_last_xlog_receive_location() and

pg_last_xlog_replay_location() on the slave, and if they are the same it

returns 0, otherwise it runs the above query again:

SELECT

CASE

WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location() THEN 0

ELSE EXTRACT (EPOCH FROM now() – pg_last_xact_replay_timestamp())::INTEGER

END

AS replication_lag;

This query is all good, but the problem is that it is not safe. If for some

reason the master stops sending transaction logs, this query will continue

to return 0 and you will think the replication is working, when it is not.

Check Replication Delay/Lag In Postgressql -Part1

November 26, 2016

By postgresdba

I used to check replication delay/lag by running the following query on the

slave:

SELECT EXTRACT(EPOCH FROM (now() – pg_last_xact_replay_timestamp()))::INT;

This query works great and it is a very good query to give you the lag in

seconds. The problem is if the master is not active, it doesn’t mean a

thing. So you need to first check if two servers are in sync and if they

are, return 0.

Monitoring Approach For Streaming Replication With Hot Standby In Postgresql 9.3

November 26, 2016

By postgresdba

The people using PostgreSQL and the Streaming Replication feature seem to ask many of the same questions:

1. How best to monitor Streaming Replication?

2. What is the best way to do that?

3. Are there alternatives, when monitoring on Standby, to using the pg_stat_replication view on Master?

4. How should I calculate replication lag-time, in seconds, minutes, etc.?

In light of these commonly asked questions, I thought a blog would help. The following are some methods I’ve found to be useful.

Monitoring is critical for large infrastructure deployments where you have Streaming Replication for:

1. Disaster recovery

2. Streaming Replication is for High Availability

3. Load balancing, when using Streaming Replication with Hot Standby

PostgreSQL has some building blocks for replication monitoring, and the following are some important

functions and views which can be use for monitoring the replication:

1. pg_stat_replication view on master/primary server.

This view helps in monitoring the standby on Master. It gives you the following details:

pid: Process id of walsender process

usesysid: OID of user which is used for Streaming replication.

usename: Name of user which is used for Streaming replication

application_name: Application name connected to master

client_addr: Address of standby/streaming replication

client_hostname: Hostname of standby.

client_port: TCP port number on which standby communicating with WAL sender

backend_start: Start time when SR connected to Master.

state: Current WAL sender state i.e streaming

sent_location: Last transaction location sent to standby.

write_location: Last transaction written on disk at standby

flush_location: Last transaction flush on disk at standby.

replay_location: Last transaction flush on disk at standby.

sync_priority: Priority of standby server being chosen as synchronous standby

sync_state: Sync State of standby (is it async or synchronous).

e.g.:

postgres=# select * from pg_stat_replication ;

-[ RECORD 1 ]—-+———————————

pid | 1114

usesysid | 16384

usename | repuser

application_name | walreceiver

client_addr | 172.17.0.3

client_hostname |

client_port | 52444

backend_start | 15-MAY-14 19:54:05.535695 -04:00

state | streaming

sent_location | 0/290044C0

write_location | 0/290044C0

flush_location | 0/290044C0

replay_location | 0/290044C0

sync_priority | 0

sync_state | async

2. pg_is_in_recovery() : Function which tells whether standby is still in recovery mode or not.

e.g.postgres=# select pg_is_in_recovery();

pg_is_in_recovery

——————-

(1 row)

3. pg_last_xlog_receive_location: Function which tells location of last transaction log which was streamed by Standby and also written on standby disk.

e.g.postgres=# select pg_last_xlog_receive_location();

pg_last_xlog_receive_location

——————————-

0/29004560

(1 row)

4. pg_last_xlog_replay_location: Function which tells last transaction replayed during recovery process. e.g is given below:

postgres=# select pg_last_xlog_replay_location();

pg_last_xlog_replay_location

——————————

0/29004560

(1 row)

5. pg_last_xact_replay_timestamp: This function tells about the time stamp of last transaction which was replayed during recovery. Below is an example:

postgres=# select pg_last_xact_replay_timestamp();

pg_last_xact_replay_timestamp

———————————-

15-MAY-14 20:54:27.635591 -04:00

(1 row)

Above are some important functions/views, which are already available in PostgreSQL for monitoring the streaming replication.

So, the logical next question is, “What’s the right way to monitor the Hot Standby with Streaming Replication on Standby Server?”

If you have Hot Standby with Streaming Replication, the following are the points you should monitor:

1. Check if your Hot Standby is in recovery mode or not:

For this you can use pg_is_in_recovery() function.

2.Check whether Streaming Replication is working or not.

And easy way of doing this is checking the pg_stat_replication view on Master/Primary. This view gives information only on master if Streaming Replication is working.

3. Check If Streaming Replication is not working and Hot standby is recovering from archived WAL file.

For this, either the DBA can use the PostgreSQL Log file to monitor it or utilize the following functions provided in PostgreSQL 9.3:

pg_last_xlog_replay_location();

pg_last_xact_replay_timestamp();

4. Check how far off is the Standby from Master.

There are two ways to monitor lag for Standby.

i. Lags in Bytes: For calculating lags in bytes, users can use the pg_stat_replication view on the master with the function pg_xlog_location_diff function. Below is an example:

pg_xlog_location_diff(pg_stat_replication.sent_location, pg_stat_replication.replay_location)

which gives the lag in bytes.

ii. Calculating lags in Seconds. The following is SQL, which most people uses to find the lag in seconds:

SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()

THEN 0

ELSE EXTRACT (EPOCH FROM now() – pg_last_xact_replay_timestamp())

END AS log_delay;

Including the above into your repertoire can give you good monitoring for PostgreSQL.

I will in a future post include the script that can be used for monitoring the Hot Standby with PostgreSQL streaming replication

find out table size,database size,tablespace size,

November 25, 2016

By postgresdba

Database Object Size Functions:-
The functions shows to calculate the disk space usage of database objects.
pg_column_size(any):- Number of bytes used to store a particular value (possibly compressed)
pg_database_size(oid):- Disk space used by the database with the specified OID
pg_database_size(name):- Disk space used by the database with the specified name
pg_indexes_size(regclass):- Total disk space used by indexes attached to the specified table
pg_relation_size(relation regclass, fork text):- Disk space used by the specified fork (‘main’, ‘fsm’, ‘vm’, or ‘init’) of the specified table or index
pg_relation_size(relation regclass):- Shorthand for pg_relation_size(…, ‘main’)
pg_size_pretty(bigint):- Converts a size in bytes expressed as a 64-bit integer into a human-readable format with size units
pg_size_pretty(numeric):- Converts a size in bytes expressed as a numeric value into a human-readable format with size units
pg_table_size(regclass):- Disk space used by the specified table, excluding indexes (but including TOAST, free space map, and visibility map)
pg_tablespace_size(oid):- Disk space used by the tablespace with the specified OID
pg_tablespace_size(name):- Disk space used by the tablespace with the specified name
pg_total_relation_size(regclass):- Total disk space used by the specified table, including all indexes and TOAST data

The storage space (in bytes) for one specific table:
ads=# select pg_relation_size(‘size_test’);
pg_relation_size
——————
5668864
(1 row)

Same query, but with the result in a human-readable format:
ads=# select pg_size_pretty(pg_relation_size(‘size_test’));
pg_size_pretty
—————-
5536 kB(1 row)

Remember that this is only the size of the table, not included an index or additional stuff.
The size including any index can be found out with:
ads=# select pg_size_pretty(pg_total_relation_size(‘size_test’));
pg_size_pretty
—————-
7656 kB
(1 row)

The size of a complete database:
ads=# select pg_size_pretty(pg_database_size(‘ads’));
pg_size_pretty
—————-
11 MB
(1 row)

You can even find out, how much space a specific value needs:
ads=# select pg_column_size(5::smallint);
pg_column_size
—————-
2
(1 row)

ads=# select pg_column_size(5::int);
pg_column_size
—————-
4
(1 row)

ads=# select pg_column_size(5::bigint);
pg_column_size
—————-
8
(1 row)

ads=# select pg_column_size(‘This is a string’::varchar);
pg_column_size
—————-
20
(1 row)

ads=# select length(‘This is a string’::varchar);
length
——–
16
(1 row)

Viewing the path of a relation
SELECT pg_relation_filepath(‘tablename’);

size of a postgresql tablespace
postgres=# SELECT pg_size_pretty(pg_tablespace_size(‘my_tablespace’));
pg_size_pretty
—————-
118 MB
(1 row)