Discussion:
PG index architecture
Andy Colson
2014-07-15 20:26:52 UTC
Permalink
I was thinking about indexes, and am kinda curious about sequential access.

I know nothing of PG guts, so this might even be a dumb question.

As I understand indexes, they are a key value pair, that contain a value
and a position. You lookup the value then use the position to seek into
the database to load the record.

Do we, or could we, load all the the matching index records, then sort
them by position? (maybe not all, maybe large batches)

When loading from the database, if access was slightly more sequential
(vs very random), would it increase performance?

Said another way:

I think of table scanning as sequential, and fast. That would be
loading db record 1,2,3, etc.

Would it be faster to load db records "mostly sequential": 1,3,4,7,10
compared to randomly: 7,3,10,1,4

-Andy
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Igor Neyman
2014-07-15 20:43:08 UTC
Permalink
-----Original Message-----
Sent: Tuesday, July 15, 2014 4:27 PM
To: pgsql
Subject: [GENERAL] PG index architecture
I was thinking about indexes, and am kinda curious about sequential access.
I know nothing of PG guts, so this might even be a dumb question.
As I understand indexes, they are a key value pair, that contain a value and a
position. You lookup the value then use the position to seek into the
database to load the record.
Do we, or could we, load all the the matching index records, then sort them
by position? (maybe not all, maybe large batches)
When loading from the database, if access was slightly more sequential (vs
very random), would it increase performance?
I think of table scanning as sequential, and fast. That would be loading db
record 1,2,3, etc.
Would it be faster to load db records "mostly sequential": 1,3,4,7,10
compared to randomly: 7,3,10,1,4
-Andy
It is called CLUSTER:

http://www.postgresql.org/docs/9.2/static/sql-cluster.html

Regards,
Igor Neyman
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Tom Lane
2014-07-15 20:51:54 UTC
Permalink
Post by Andy Colson
As I understand indexes, they are a key value pair, that contain a value
and a position. You lookup the value then use the position to seek into
the database to load the record.
Do we, or could we, load all the the matching index records, then sort
them by position? (maybe not all, maybe large batches)
This is more or less what a "bitmap index scan" does.

regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
John R Pierce
2014-07-15 20:54:17 UTC
Permalink
Post by Andy Colson
As I understand indexes, they are a key value pair, that contain a
value and a position. You lookup the value then use the position to
seek into the database to load the record.
indexes are stored as a B-tree. each terminal node has a block number
for the target record.
Post by Andy Colson
Do we, or could we, load all the the matching index records, then sort
them by position? (maybe not all, maybe large batches)
b-trees are inherently ordered. data records, however are not.
Post by Andy Colson
When loading from the database, if access was slightly more sequential
(vs very random), would it increase performance?
I think of table scanning as sequential, and fast. That would be
loading db record 1,2,3, etc.
database tables are unordered sets, there is no record 1,2,3.
Post by Andy Colson
Would it be faster to load db records "mostly sequential": 1,3,4,7,10
compared to randomly: 7,3,10,1,4
its unclear to me what you mean here.
--
john r pierce 37N 122W
somewhere on the middle of the left coast
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Andy Colson
2014-07-15 21:54:00 UTC
Permalink
Post by John R Pierce
Post by Andy Colson
As I understand indexes, they are a key value pair, that contain a
value and a position. You lookup the value then use the position to
seek into the database to load the record.
indexes are stored as a B-tree. each terminal node has a block number
for the target record.
Post by Andy Colson
Do we, or could we, load all the the matching index records, then sort
them by position? (maybe not all, maybe large batches)
b-trees are inherently ordered. data records, however are not.
Post by Andy Colson
When loading from the database, if access was slightly more sequential
(vs very random), would it increase performance?
I think of table scanning as sequential, and fast. That would be
loading db record 1,2,3, etc.
database tables are unordered sets, there is no record 1,2,3.
Post by Andy Colson
Would it be faster to load db records "mostly sequential": 1,3,4,7,10
compared to randomly: 7,3,10,1,4
its unclear to me what you mean here.
Ah, yea, sorry, I don't really mean record #1, I mean hard drive seek to
address 1. (where address came from the index lookup)

Know what I mean?

-Andy
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
John R Pierce
2014-07-15 22:02:54 UTC
Permalink
Post by Andy Colson
Ah, yea, sorry, I don't really mean record #1, I mean hard drive seek
to address 1. (where address came from the index lookup)
of course, its a file+offset, so there's at least 1-2 more levels of
indirection before you get to physical addresses on the hard disk.
--
john r pierce 37N 122W
somewhere on the middle of the left coast
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Loading...