PG index architecture

Discussion:

Andy Colson

2014-07-15 20:26:52 UTC

I was thinking about indexes, and am kinda curious about sequential access.

I know nothing of PG guts, so this might even be a dumb question.

As I understand indexes, they are a key value pair, that contain a value
and a position. You lookup the value then use the position to seek into
the database to load the record.

Do we, or could we, load all the the matching index records, then sort
them by position? (maybe not all, maybe large batches)

When loading from the database, if access was slightly more sequential
(vs very random), would it increase performance?

Said another way:

I think of table scanning as sequential, and fast. That would be
loading db record 1,2,3, etc.

Would it be faster to load db records "mostly sequential": 1,3,4,7,10
compared to randomly: 7,3,10,1,4

-Andy

--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Igor Neyman

2014-07-15 20:43:08 UTC

Permalink

-----Original Message-----
Sent: Tuesday, July 15, 2014 4:27 PM
To: pgsql
Subject: [GENERAL] PG index architecture
I was thinking about indexes, and am kinda curious about sequential access.
I know nothing of PG guts, so this might even be a dumb question.
As I understand indexes, they are a key value pair, that contain a value and a
position. You lookup the value then use the position to seek into the
database to load the record.
Do we, or could we, load all the the matching index records, then sort them
by position? (maybe not all, maybe large batches)
When loading from the database, if access was slightly more sequential (vs
very random), would it increase performance?
I think of table scanning as sequential, and fast. That would be loading db
record 1,2,3, etc.
Would it be faster to load db records "mostly sequential": 1,3,4,7,10
compared to randomly: 7,3,10,1,4
-Andy

It is called CLUSTER:

http://www.postgresql.org/docs/9.2/static/sql-cluster.html

Regards,
Igor Neyman

--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Tom Lane

2014-07-15 20:51:54 UTC

Permalink

Post by Andy Colson
As I understand indexes, they are a key value pair, that contain a value
and a position. You lookup the value then use the position to seek into
the database to load the record.
Do we, or could we, load all the the matching index records, then sort
them by position? (maybe not all, maybe large batches)

This is more or less what a "bitmap index scan" does.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

John R Pierce

2014-07-15 20:54:17 UTC

Permalink

Post by Andy Colson
As I understand indexes, they are a key value pair, that contain a
value and a position. You lookup the value then use the position to
seek into the database to load the record.

indexes are stored as a B-tree. each terminal node has a block number
for the target record.

Post by Andy Colson
Do we, or could we, load all the the matching index records, then sort
them by position? (maybe not all, maybe large batches)

b-trees are inherently ordered. data records, however are not.

Post by Andy Colson
When loading from the database, if access was slightly more sequential
(vs very random), would it increase performance?
I think of table scanning as sequential, and fast. That would be
loading db record 1,2,3, etc.

database tables are unordered sets, there is no record 1,2,3.

Post by Andy Colson
Would it be faster to load db records "mostly sequential": 1,3,4,7,10
compared to randomly: 7,3,10,1,4

its unclear to me what you mean here.
--
john r pierce 37N 122W
somewhere on the middle of the left coast

--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Andy Colson

2014-07-15 21:54:00 UTC

Permalink

Post by John R Pierce

Post by Andy Colson
As I understand indexes, they are a key value pair, that contain a
value and a position. You lookup the value then use the position to
seek into the database to load the record.

indexes are stored as a B-tree. each terminal node has a block number
for the target record.

Post by Andy Colson
Do we, or could we, load all the the matching index records, then sort
them by position? (maybe not all, maybe large batches)

b-trees are inherently ordered. data records, however are not.

database tables are unordered sets, there is no record 1,2,3.

Post by Andy Colson
Would it be faster to load db records "mostly sequential": 1,3,4,7,10
compared to randomly: 7,3,10,1,4

its unclear to me what you mean here.

Ah, yea, sorry, I don't really mean record #1, I mean hard drive seek to
address 1. (where address came from the index lookup)

Know what I mean?

-Andy

--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

John R Pierce

2014-07-15 22:02:54 UTC

Permalink

Post by Andy Colson
Ah, yea, sorry, I don't really mean record #1, I mean hard drive seek
to address 1. (where address came from the index lookup)

of course, its a file+offset, so there's at least 1-2 more levels of
indirection before you get to physical addresses on the hard disk.
--
john r pierce 37N 122W
somewhere on the middle of the left coast

--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general