Discussion: Database block lifecycle
pinker
2014-08-12 21:41:19 UTC
Yesterday I had an interesting discussion with my colleague about shared
buffers size for our new server. This machine (dedicated to the db) has
512GB of RAM and the database size is about 80GB, so he assumes the db will
never have to read from disk and that there is no point adjusting the
read-ahead setting, because every block will be read from RAM. As I've read
in Greg Smith's book, once a block is changed it will be written to disk
and the buffer page is marked as clean. Does that mean subsequent changes
occur in the same page as before? What if the dirty page doesn't have
enough space for another row and the row has to be written to another page?
Does that still happen in RAM? If so, do all updates of the FSM happen in
RAM as well?

What about buffers_clean and pg_clog then? Are those maintained completely
in RAM as well, without any direct reads from disk?

To be precise, does the path to update and then read an updated row look
like (a) or (b)?
a) clean page (shared buffers) -> dirty page (shared buffers) -> to disk ->
read from disk -> shared buffers -> query
b) clean page (shared buffers) -> dirty page (shared buffers) -> to disk
& dirty page (shared buffers) -> clean page (shared buffers) -> query

btw. is 512MB a reasonable value if we assume up to 600 connections?




John R Pierce
2014-08-12 22:09:35 UTC
Post by pinker
btw. is 512MB a reasonable value if we assume up to 600 connections?
That's an insanely high connection count if you actually expect those
connections to be executing concurrent queries, unless you have something
north of 100 CPU cores.

You'd be much better off with a MUCH smaller connection count, using a
connection pooler such as pgbouncer in transaction mode... let the 600
client threads connect to the pooler, but have the pooler share maybe 4x
your CPU core/thread count of actual connections for transactions in
progress.
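
For example, a rough sketch of the relevant pgbouncer.ini bits; the
database name, paths, and pool size here are made-up examples, size the
pool to your actual core count:

  [databases]
  mydb = host=127.0.0.1 port=5432 dbname=mydb

  [pgbouncer]
  listen_addr = *
  listen_port = 6432
  auth_type = md5
  auth_file = /etc/pgbouncer/userlist.txt
  pool_mode = transaction    ; server connection released after each transaction
  max_client_conn = 600      ; all 600 client threads connect here
  default_pool_size = 128    ; ~4x the cores/threads of a 32-core box; tune to yours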
--
john r pierce 37N 122W
somewhere on the middle of the left coast
pinker
2014-08-12 22:29:36 UTC
Yes, I know the count is quite high. It is the max value we've estimated;
on an average day it will probably be 100-200, and yes, we use pgpool.
Post by John R Pierce
Post by pinker
btw. is 512MB a reasonable value if we assume up to 600 connections?
That's an insanely high connection count if you actually expect those
connections to be executing concurrent queries, unless you have something
north of 100 CPU cores.
You'd be much better off with a MUCH smaller connection count, using a
connection pooler such as pgbouncer in transaction mode... let the 600
client threads connect to the pooler, but have the pooler share maybe 4x
your CPU core/thread count of actual connections for transactions in
progress.
John R Pierce
2014-08-12 22:39:35 UTC
Post by pinker
Yes, I know the count is quite high. It is the max value we've estimated;
on an average day it will probably be 100-200, and yes, we use pgpool.
If you're using a pooler, then why would you be running 200 concurrent
connections, unless you have 50 or 100 CPU cores/threads?

If you have 1000 transactions to execute on a 32-core server and you try
to do 200 at once, it will take longer than if you do 64 at a time and let
the rest queue up.
--
john r pierce 37N 122W
somewhere on the middle of the left coast
pinker
2014-08-12 22:52:05 UTC
OK, I wasn't precise enough; you are right. It's a brand new server,
nothing is configured yet and we don't even have the OS installed. The
number was the overall count we expect for the whole cluster.

But the main question is: is it possible to completely avoid disk reads
if there is a huge amount of RAM available?
Post by John R Pierce
Post by pinker
Yes, I know the count is quite high. It is the max value we've estimated;
on an average day it will probably be 100-200, and yes, we use pgpool.
If you're using a pooler, then why would you be running 200 concurrent
connections, unless you have 50 or 100 CPU cores/threads?
If you have 1000 transactions to execute on a 32-core server and you try
to do 200 at once, it will take longer than if you do 64 at a time and let
the rest queue up.
John R Pierce
2014-08-12 22:56:18 UTC
Post by pinker
OK, I wasn't precise enough; you are right. It's a brand new server,
nothing is configured yet and we don't even have the OS installed. The
number was the overall count we expect for the whole cluster.
But the main question is: is it possible to completely avoid disk reads
if there is a huge amount of RAM available?
the OS file cache will ensure that.
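
The kernel's cache isn't visible from SQL, but the pg_buffercache contrib
module will at least show you what's resident in shared_buffers; this is
roughly the standard query from its docs:

  CREATE EXTENSION pg_buffercache;

  -- top 10 relations by number of shared buffers held
  SELECT c.relname, count(*) AS buffers
  FROM pg_buffercache b
  JOIN pg_class c
    ON b.relfilenode = pg_relation_filenode(c.oid)
   AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                             WHERE datname = current_database()))
  GROUP BY c.relname
  ORDER BY 2 DESC
  LIMIT 10;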
--
john r pierce 37N 122W
somewhere on the middle of the left coast
Tatsuo Ishii
2014-08-13 00:33:00 UTC
Post by pinker
Yesterday I had an interesting discussion with my colleague about shared
buffers size for our new server. This machine (dedicated to the db) has
512GB of RAM and the database size is about 80GB, so he assumes the db will
never have to read from disk and that there is no point adjusting the
read-ahead setting, because every block will be read from RAM. As I've read
in Greg Smith's book, once a block is changed it will be written to disk
and the buffer page is marked as clean. Does that mean subsequent changes
occur in the same page as before? What if the dirty page doesn't have
enough space for another row and the row has to be written to another page?
Does that still happen in RAM? If so, do all updates of the FSM happen in
RAM as well?
What about buffers_clean and pg_clog then? Are those maintained completely
in RAM as well, without any direct reads from disk?
As long as they are in shared buffers, they are read from the buffers,
not from disk.
Post by pinker
To be precise, does the path to update and then read an updated row look
like (a) or (b)?
a) clean page (shared buffers) -> dirty page (shared buffers) -> to disk ->
read from disk -> shared buffers -> query
b) clean page (shared buffers) -> dirty page (shared buffers) -> to disk
& dirty page (shared buffers) -> clean page (shared buffers) -> query
I'm not sure exactly what you mean by (a) or (b), but both look incorrect
to me. A reader can read a page from shared buffers even if it's dirty.
So:

clean page (shared buffers) -> dirty page (shared buffers) -> query

will be closer to the reality. Note that the dirty page will be written
out by the bgwriter process at a different time.

Also note that I have completely ignored locking and the buffer
replacement algorithm. Please read src/backend/storage/buffer/README for
more precise information.
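
If you want to observe this yourself, the pg_buffercache contrib module
exposes a per-buffer dirty flag; something like the following (the table
name is only an example):

  -- isdirty = true means the page has been modified in shared buffers
  -- and not yet written out
  SELECT relblocknumber, isdirty, usagecount
  FROM pg_buffercache
  WHERE relfilenode = pg_relation_filenode('my_table'::regclass)
  ORDER BY relblocknumber;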

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Jeff Janes
2014-08-13 00:37:36 UTC
Post by pinker
Yesterday I had an interesting discussion with my colleague about shared
buffers size for our new server. This machine (dedicated to the db) has
512GB of RAM and the database size is about 80GB, so he assumes the db will
never have to read from disk,
Do you ever plan on restarting this server? Doing maintenance? Applying
security patches?
Post by pinker
and that there is no point adjusting the read-ahead setting, because every
block will be read from RAM. As I've read in Greg Smith's book, once a
block is changed it will be written to disk and the buffer page is marked
as clean. Does that mean subsequent changes occur in the same page as
before? What if the dirty page doesn't have enough space for another row
and the row has to be written to another page? Does that still happen in
RAM? If so, do all updates of the FSM happen in RAM as well?
None of that should need to read from disk regularly once the database is
warmed up.
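
You can sanity-check that once the server has been running for a while;
note that blks_read also counts reads satisfied by the OS file cache, so
it overstates physical disk I/O:

  -- share of block reads served from shared_buffers
  SELECT datname, blks_hit, blks_read,
         round(blks_hit * 100.0 / nullif(blks_hit + blks_read, 0), 2)
           AS hit_pct
  FROM pg_stat_database
  WHERE datname = current_database();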
Post by pinker
What about buffers_clean and pg_clog then? Are those maintained completely
in RAM as well, without any direct reads from disk?
To be precise, does the path to update and then read an updated row look
like (a) or (b)?
a) clean page (shared buffers) -> dirty page (shared buffers) -> to disk ->
read from disk -> shared buffers -> query
b) clean page (shared buffers) -> dirty page (shared buffers) -> to disk
& dirty page (shared buffers) -> clean page (shared buffers) -> query
More like (b), but you are missing all the states that involve "clean in
shared_buffers, dirty in FS cache" and such.
Post by pinker
btw. is 512MB a reasonable value if we assume up to 600 connections?
Reasonable value for what?

Cheers,

Jeff
pinker
2014-08-13 08:07:09 UTC
Post by Jeff Janes
Do you ever plan on restarting this server? Doing maintenance? Applying
security patches?
Sure, I meant when the db is up and running, of course after the initial
read from disk, when the whole data set should be in RAM.
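
I was also thinking of shortening that warm-up with the pg_prewarm contrib
module from the upcoming 9.4; the table name here is just an example:

  CREATE EXTENSION pg_prewarm;
  -- default mode reads the whole table into shared_buffers
  SELECT pg_prewarm('big_table');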
Post by Jeff Janes
More like (b), but you are missing all the states that involve "clean in
shared_buffers, dirty in FS cache" and such.
OK, so a modified block is taken from shared_buffers or from RAM when
needed, and is always read from shared buffers?
Post by Jeff Janes
btw. is 512MB a reasonable value if we assume up to 600 connections?
Reasonable value for what?
For normal server load.
Jeff Janes
2014-08-13 16:23:17 UTC
Post by pinker
btw. is 512MB a reasonable value if we assume up to 600 connections?
Reasonable value for what?
For normal server load.
512MB is being questioned as a reasonable value for what? shared_buffers?
work_mem? maintenance_work_mem?

Cheers,

Jeff
Robin
2014-08-13 18:19:41 UTC
Post by Jeff Janes
Post by pinker
btw. is 512MB a reasonable value if we assume up to 600 connections?
Reasonable value for what?
For normal server load.
512MB is being questioned as a reasonable value for what?
shared_buffers? work_mem? maintenance_work_mem?
Cheers,
Jeff
Generally speaking, folk imagine that DBMS performance is all about disk
access; in reality, throwing as much memory as possible at the server(s)
is an optimal investment. Analyse your queries and keep the time-critical
stuff in memory.
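
EXPLAIN (ANALYZE, BUFFERS) is the simplest way to see whether a given
query is served from shared buffers or has to go back to the kernel; the
table and column here are only illustrative:

  EXPLAIN (ANALYZE, BUFFERS)
  SELECT * FROM orders WHERE customer_id = 42;
  -- in the output, "Buffers: shared hit=N" pages came from shared_buffers,
  -- while "read=N" pages came from the kernel (OS cache or disk)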

R+C
