感谢大神的分享,如下, http://blog.itpub.net/6906/ https://www.cnblogs.com/xueqiuqiu/articles/10994404.html 通过上面的分析,我们可以思考一下,一个root page能够存多少个item我们知道B+树的基本数据结构,root page 就是B+树的顶层。在我们的实验中,一个索引元组的大小为16,每个索引元组都有一个行指针,4字节,页大小为8192,页头24字节,special空间为16,那么理论上可以存(8192-24-16)/(16+4)=407个元组来验证一下,
postgres=# drop table if exists t_index; DROP TABLE postgres=# vacuum full; VACUUM postgres=# create table t_index (id int,c1 char(8),c2 varchar(16)); CREATE TABLE postgres=# alter table t_index add constraint pk_t_index primary key(id); ALTER TABLE postgres=# do $$ postgres$# begin postgres$# for i in 1..407 loop postgres$# insert into t_index (id, c1, c2) values (i, '#'||i||'#', '#'||i||'#'); postgres$# end loop; postgres$# end $$; DO postgres=#
我们插入407个元组查看一下索引表,如下
postgres=# select * from bt_metap('pk_t_index');
magic | version | root | level | fastroot | fastlevel | oldest_xact | last_cleanup_num_tuples
--------+---------+------+-------+----------+-----------+-------------+-------------------------
340322 | 4 | 1 | 0 | 1 | 0 | 0 | -1
(1 row)
postgres=# select * from bt_page_stats('pk_t_index',1);
blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------
1 | l | 407 | 0 | 16 | 8192 | 8 | 0 | 0 | 0 | 3
(1 row)
postgres=# select * from page_header(get_raw_page('pk_t_index',1));
lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
-----------+----------+-------+-------+-------+---------+----------+---------+-----------
0/5E73F38 | 0 | 0 | 1652 | 1664 | 8176 | 8192 | 4 | 0
(1 row)
这个时候还没有分层,这里的索引有序性是通过行指针来实现的。现在插入一条新数据
postgres=# insert into t_index values(408, '$$408', '##408##'); INSERT 0 1 postgres=#
查看索引表
postgres=# select * from bt_metap('pk_t_index');
magic | version | root | level | fastroot | fastlevel | oldest_xact | last_cleanup_num_tuples
--------+---------+------+-------+----------+-----------+-------------+-------------------------
340322 | 4 | 3 | 1 | 3 | 1 | 0 | -1
(1 row)
postgres=# select * from bt_page_stats('pk_t_index',1);
blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------
1 | l | 367 | 0 | 16 | 8192 | 808 | 0 | 2 | 0 | 1
(1 row)
postgres=# select * from bt_page_stats('pk_t_index',2);
blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------
2 | l | 42 | 0 | 16 | 8192 | 7308 | 1 | 0 | 0 | 1
(1 row)
postgres=# select * from bt_page_stats('pk_t_index',3);
blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------
3 | r | 2 | 0 | 12 | 8192 | 8116 | 0 | 0 | 1 | 2
(1 row)
postgres=#
可以看到分层了,除root外,还有一层(level = 1),page 1 为左子树(btpo_next = 2),page 2 为右子树(btpo_prev=1),其中page 1 有367条记录,page 2有42条记录,总共409条元组,符合B+树所有的索引都保存在叶子节点的定义,这里比408条多一条,下面会解释多的一条是什么 同样的查看一下metapage如下
postgres=# select * from page_header(get_raw_page('pk_t_index',0));
lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
-----------+----------+-------+-------+-------+---------+----------+---------+-----------
0/5E8CEF8 | 0 | 0 | 64 | 8176 | 8176 | 8192 | 4 | 0
(1 row)
postgres=#
可以发现,空间使用情况和分层之前没有区别,当然其中数据是肯定发生了变化,这里就不赘述了。接下去,看一下root page
postgres=# select * from bt_page_items('pk_t_index',3);
itemoffset | ctid | itemlen | nulls | vars | data
------------+-------+---------+-------+------+-------------------------
1 | (1,0) | 8 | f | f |
2 | (2,1) | 16 | f | f | 6f 01 00 00 00 00 00 00
(2 rows)
postgres=# select * from bt_page_stats('pk_t_index',3);
blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------
3 | r | 2 | 0 | 12 | 8192 | 8116 | 0 | 0 | 1 | 2
(1 row)
postgres=#
这里root节点有两个item,其中一个指向最左边的叶节点,不需要保存最小值,对于这一点,另一个指向右叶节点,需要保存右节点的最小值,又称为保存右链路,这里的值为367对于这一点的理解,是这样的,我们思考一下在本例中查找一个索引id,因为本例只有两个叶节点,那么只需要判断索引id是大于367,还是小于367,如果是小于367,那么找左叶节点,如果大于367,找右叶节点,如果=367,根据citd知道,2号page的1号元组中保存的ctid,也就是(2,53),如下
postgres=# select * from bt_page_items('pk_t_index',2) limit 10;
itemoffset | ctid | itemlen | nulls | vars | data
------------+--------+---------+-------+------+-------------------------
1 | (2,53) | 16 | f | f | 6f 01 00 00 00 00 00 00
2 | (2,54) | 16 | f | f | 70 01 00 00 00 00 00 00
3 | (2,55) | 16 | f | f | 71 01 00 00 00 00 00 00
4 | (2,56) | 16 | f | f | 72 01 00 00 00 00 00 00
5 | (2,57) | 16 | f | f | 73 01 00 00 00 00 00 00
6 | (2,58) | 16 | f | f | 74 01 00 00 00 00 00 00
7 | (2,59) | 16 | f | f | 75 01 00 00 00 00 00 00
8 | (2,60) | 16 | f | f | 76 01 00 00 00 00 00 00
9 | (2,61) | 16 | f | f | 77 01 00 00 00 00 00 00
10 | (2,62) | 16 | f | f | 78 01 00 00 00 00 00 00
(10 rows)
然后在找堆表的2号page的53号元组,如下
postgres=# select * from heap_page_items(get_raw_page('t_index',2)) where t_ctid = '(2,53)';
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid |
t_data
----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------+------------------
------------------------
53 | 5648 | 1 | 43 | 902 | 0 | 366 | (2,53) | 3 | 2306 | 24 | | | \x6f0100001323333
637232020200d2333363723
(1 row)
postgres=#
所以不用保存最小值。这里不去对root page 的special空间进行查看,两个关键字段btpo = 1 说明还没有到最底层(最底层btpo=0, 这种页里面存储的ctid才代表指向heap page的地址)btpo_flags=2 说明这个页是root page 再来看一下,索引page 1
postgres=# select * from bt_page_stats('pk_t_index',1);
blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------
1 | l | 367 | 0 | 16 | 8192 | 808 | 0 | 2 | 0 | 1
(1 row)
postgres=# select * from bt_page_items('pk_t_index',1) limit 10;
itemoffset | ctid | itemlen | nulls | vars | data
------------+-------+---------+-------+------+-------------------------
1 | (2,1) | 16 | f | f | 6f 01 00 00 00 00 00 00
2 | (0,1) | 16 | f | f | 01 00 00 00 00 00 00 00
3 | (0,2) | 16 | f | f | 02 00 00 00 00 00 00 00
4 | (0,3) | 16 | f | f | 03 00 00 00 00 00 00 00
5 | (0,4) | 16 | f | f | 04 00 00 00 00 00 00 00
6 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00
7 | (0,6) | 16 | f | f | 06 00 00 00 00 00 00 00
8 | (0,7) | 16 | f | f | 07 00 00 00 00 00 00 00
9 | (0,8) | 16 | f | f | 08 00 00 00 00 00 00 00
10 | (0,9) | 16 | f | f | 09 00 00 00 00 00 00 00
(10 rows)
postgres=#
postgres=# select count(*) from bt_page_items('pk_t_index',1);
count
-------
367
(1 row)
postgres=#
这里可以看到,page 1 有右兄弟节点,所以它的第一条item指向右兄弟节点的第一个元组,这符合B+树的设计,叶节点和banch节点是一个双向链表,其实我们进一步确定,如下
postgres=# select * from bt_page_items('pk_t_index',1) where ctid='(2,1)';
itemoffset | ctid | itemlen | nulls | vars | data
------------+-------+---------+-------+------+-------------------------
1 | (2,1) | 16 | f | f | 6f 01 00 00 00 00 00 00
316 | (2,1) | 16 | f | f | 3b 01 00 00 00 00 00 00
(2 rows)
postgres=#
可以看到,page 1 的ctid为(2,1)的记录有两条,进一步检查
postgres=# select * from heap_page_items(get_raw_page('t_index',2)) where t_ctid = '(2,1)';
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid |
t_data
----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------+------------------
------------------------
1 | 8144 | 1 | 43 | 902 | 0 | 314 | (2,1) | 3 | 2306 | 24 | | | \x3b0100001323333
135232020200d2333313523
(1 row)
postgres=# select * from bt_page_items('pk_t_index',2) limit 5;
itemoffset | ctid | itemlen | nulls | vars | data
------------+--------+---------+-------+------+-------------------------
1 | (2,53) | 16 | f | f | 6f 01 00 00 00 00 00 00
2 | (2,54) | 16 | f | f | 70 01 00 00 00 00 00 00
3 | (2,55) | 16 | f | f | 71 01 00 00 00 00 00 00
4 | (2,56) | 16 | f | f | 72 01 00 00 00 00 00 00
5 | (2,57) | 16 | f | f | 73 01 00 00 00 00 00 00
(5 rows)
postgres=#
可以看到,的确page 1的第一个items指向的是page 2的第一个元组,这么做的意义是,假设要查找 id > 100的元组,那么从root开始,走左子树,找到data等于100的bt元组的,行指针,该行指针之后所有的元组都是大于100的(因为行指针是排好序的),那么对于右兄弟节点上的所有数据,需要找到一个入口,这个入口就是1号item对应的地址。再看一下,page 2
postgres=# select * from bt_page_stats('pk_t_index',2);
blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------
2 | l | 42 | 0 | 16 | 8192 | 7308 | 1 | 0 | 0 | 1
(1 row)
postgres=# select * from bt_page_items('pk_t_index',2) limit 10;
itemoffset | ctid | itemlen | nulls | vars | data
------------+--------+---------+-------+------+-------------------------
1 | (2,53) | 16 | f | f | 6f 01 00 00 00 00 00 00
2 | (2,54) | 16 | f | f | 70 01 00 00 00 00 00 00
3 | (2,55) | 16 | f | f | 71 01 00 00 00 00 00 00
4 | (2,56) | 16 | f | f | 72 01 00 00 00 00 00 00
5 | (2,57) | 16 | f | f | 73 01 00 00 00 00 00 00
6 | (2,58) | 16 | f | f | 74 01 00 00 00 00 00 00
7 | (2,59) | 16 | f | f | 75 01 00 00 00 00 00 00
8 | (2,60) | 16 | f | f | 76 01 00 00 00 00 00 00
9 | (2,61) | 16 | f | f | 77 01 00 00 00 00 00 00
10 | (2,62) | 16 | f | f | 78 01 00 00 00 00 00 00
(10 rows)
可以看到,page 2的所有item均为索引元组,其与左兄弟节点的关系,通过btpo_prev体现。可以想象,如果,有3个叶节点,中间的节点,页一定会有一个指向右兄弟节点的item实际上上述的博客中,都提到有,这里不赘述了,可以参考上述博客
