PostgreSQL执行计划变化,当PostgreSQL使用绑定变量时,由于优化器无法预估表达式返回的行数,就默认选择统计信息收集行数的5%,评估的行数会远远低于实际执行时返回的行数。在表关联时,这种评估会对关联方式产生影响,驱动表返回行数少,优化器更倾向于使用nested loop关联方式,执行效率会下降。下面是测试案例:
创建测试关联表。
create table tym_p1 (
id varchar(32) not null default sys_guid()
, tym_no varchar(32) not null
, tym_date timestamp(0) not null
, primary key (id)
)
;
create index tym_p1_no on tym_p1 (tym_no);
创建测试关联子表。
create table tym_c1 (
id varchar(32) not null default sys_guid()
, tym_no varchar(32) not null
, tym_date timestamp(0) not null
, tym_user varchar(32) not null
, primary key (id)
)
;
create index tym_c1_no on tym_c1 (tym_no);
生成测试数据
for i in `seq 100000`; do psql -d phroc -c "insert into tym_p1 (tym_no,tym_date) values(to_char($i,'999999'),now())";done
insert into tym_c1 select a.*, tym_no from tym_p1 a;
准备语句
prepare tym_prep04 (
varchar(50),varchar(50)
) as
select * from tym_p1 p where
p.tym_date
BETWEEN to_timestamp($1, 'yyyy-MM-dd hh24:mi:ss')
AND to_timestamp($2, 'yyyy-MM-dd hh24:mi:ss')
;
explain analyze execute tym_prep04(
'2022-03-02 00:00:00','2022-03-09 23:59:59');
执行结果,第一次至第五次的执行计划,从执行计划预估行数是返回 99980 行,实际执行时 100000 行。
explain analyze execute tym_prep04(
'2022-03-02 00:00:00','2022-03-09 23:59:59');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on tym_p1 p (cost=0.00..3031.00 rows= 99980 width=49) (actual time=0.011..148.967 rows=100000 loops=1)
Filter: ((tym_date >= to_timestamp('2022-03-02 00:00:00'::text, 'yyyy-MM-dd hh24:mi:ss'::text)) AND (tym_date <= to_timestamp('2022-03-09 23:59:59'::text,
Planning Time: 0.119 ms
Execution Time: 152.085 ms
(4 rows)
执行结果,第六次的执行计划,,从执行计划预估行数是返回 500 行,实际执行时 100000 行。
explain analyze execute tym_prep04(
'2022-03-02 00:00:00','2022-03-09 23:59:59');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on tym_p1 p (cost=0.00..3031.00 rows= 500 width=49) (actual time=0.041..151.032 rows=100000 loops=1)
Filter: ((tym_date >= to_timestamp(($1)::text, 'yyyy-MM-dd hh24:mi:ss'::text)) AND (tym_date <= to_timestamp(($2)::text, 'yyyy-MM-dd hh24:mi:ss'::text)))
Planning Time: 0.122 ms
Execution Time: 154.185 ms
(4 rows)
如果是几张表关联时,这种评估会对连接方式产生影响。
prepare tym_prep05 (
varchar(50),varchar(50)
) as
select * from tym_p1 p, tym_c1 c where
p.tym_no = c.tym_no and
p.tym_date
BETWEEN to_timestamp($1, 'yyyy-MM-dd hh24:mi:ss')
AND to_timestamp($2, 'yyyy-MM-dd hh24:mi:ss')
;
explain analyze execute tym_prep05(
'2022-03-02 00:00:00','2022-03-09 23:59:59');
第一次到第五次执行,由于预估返回表tym_p1的行数是99980,优先采用了hash join的方式。
explain analyze execute tym_prep05(
'2022-03-02 00:00:00','2022-03-09 23:59:59');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=3387.00..7792.72 rows=99980 width=106) (actual time=35.425..229.255 rows=100000 loops=1)
Hash Cond: ((p.tym_no)::text = (c.tym_no)::text)
-> Seq Scan on tym_p1 p (cost=0.00..3031.00 rows= 99980 width=49) (actual time=0.016..152.250 rows=100000 loops=1)
Filter: ((tym_date >= to_timestamp('2022-03-02 00:00:00'::text, 'yyyy-MM-dd hh24:mi:ss'::text)) AND (tym_date <= to_timestamp('2022-03-09 23:59:59':
-> Hash (cost=2137.00..2137.00 rows=100000 width=57) (actual time=34.740..34.741 rows=100000 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 10399kB
-> Seq Scan on tym_c1 c (cost=0.00..2137.00 rows=100000 width=57) (actual time=0.006..12.159 rows=100000 loops=1)
Planning Time: 0.434 ms
Execution Time: 234.067 ms
(9 rows)
Time: 234.852 ms
第六次执行,由于预估返回表tym_p1的行数是294,优先采用了nested loop的方式。
explain analyze execute tym_prep05(
'2022-03-02 00:00:00','2022-03-09 23:59:59');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.42..4474.63 rows=500 width=106) (actual time=0.509..538.023 rows=100000 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Nested Loop (cost=0.42..3424.63 rows=294 width=106) (actual time=0.086..496.033 rows=50000 loops=2)
-> Parallel Seq Scan on tym_p1 p (cost=0.00..2207.47 rows= 294 width=49) (actual time=0.035..85.307 rows=50000 loops=2)
Filter: ((tym_date >= to_timestamp(($1)::text, 'yyyy-MM-dd hh24:mi:ss'::text)) AND (tym_date <= to_timestamp(($2)::text, 'yyyy-MM-dd hh24:mi:s
-> Index Scan using tym_c1_no on tym_c1 c (cost=0.42..4.13 rows=1 width=57) (actual time=0.008..0.008 rows=1 loops=100000)
Index Cond: ((tym_no)::text = (p.tym_no)::text)
Planning Time: 0.380 ms
Execution Time: 543.929 ms
(10 rows)
解决方案:
不使用绑定变量,优化器可以根据统计信息的直方图评估较为接近的行数,生成相对较优的执行计划。
使用pg hint插件,通过插件固化执行计划。
