读懂 PostgreSQL 的 Explain

查询优化的第一步是要读懂 PG 是如何执行查询的，而要用到的工具就是 EXPLAIN。

Explain

使用 EXPLAIN 命令可以查询 Planner 为查询创建的执行计划。

查询计划是用树结构表示的，每一个树的节点都是一个子计划。树结构的最底层节点一般是扫描节点，负责从表中读取数据行。不同的扫描节点对应不同的表数据获取方法：

顺序扫描（sequential scans）
索引扫描（index scans）
位图扫描（bitmap index scans）

pagila=# explain select * from film;
                        QUERY PLAN
----------------------------------------------------------
 Seq Scan on film  (cost=0.00..65.00 rows=1000 width=390)
(1 row)

没有 where 条件，所以需要扫描整张表，对应的扫描计划是 Seq Scan。括号里面从左到右4个数据的含义依次是：

估计的启动开销，指输出阶段开始之前所花费的时间，同样的，在 sort 节点中指排序的时间。
估计的总的开销。
此计划节点输出的估计行数。
此计划节点输出的行的估计平均宽度（单位：bytes）。

代价的计算公式： (disk pages read * seq_page_cost) + (rows scanned * cpu_tuple_cost)

pagila=# SELECT relpages, reltuples FROM pg_class WHERE relname = 'film';
 relpages | reltuples
----------+-----------
       55 |      1000
(1 row)

默认的, seq_page_cost 是 1.0 and cpu_tuple_cost 是 0.01，(55 * 1.0) + (0.01 * 1000) = 65。

记住：

上层节点的开销包括其所有子节点的开销。
开销只反映了计划器关心的东西。
开销不考虑将结果行传输到客户端花费的时间，这个在实际的时间花费中可能占很大一部分，计划器不考虑是因为，计划器改变计划也改不不了这个时间。

pagila=# explain select * from film where length > 100;
                       QUERY PLAN
---------------------------------------------------------
 Seq Scan on film  (cost=0.00..67.50 rows=611 width=390)
   Filter: (length > 100)
(2 rows)

rows 因为过滤条件减少了，但是总的开销并没有减少，还增加了，因为还是要扫描 1000 行，每行还要判断是否满足过滤条件。增加的开销：1000 * cpu_operator_cost，该参数默认值是 0.0025，所以 65 + 1000 * 0.0025 得到 67.5。

Explain Analyze

加上 Analyze 参数后，查询会被执行，explain 会在原来的基础上显示每个节点真实的时间花费。

pagila=# explain analyze select f.title, l.name from film f inner join language l on f.language_id = l.language_id;
                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=24.62..92.27 rows=1000 width=99) (actual time=0.408..22.136 rows=1000 loops=1)
   Hash Cond: (f.language_id = l.language_id)
   ->  Seq Scan on film f  (cost=0.00..65.00 rows=1000 width=19) (actual time=0.029..7.197 rows=1000 loops=1)
   ->  Hash  (cost=16.50..16.50 rows=650 width=88) (actual time=0.253..0.270 rows=6 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on language l  (cost=0.00..16.50 rows=650 width=88) (actual time=0.055..0.131 rows=6 loops=1)
 Planning time: 0.594 ms
 Execution time: 29.335 ms
(8 rows)

actual_time 的单位是毫秒，而开销的估算单位是随意的，两者不可相提并论。

pagila=# explain analyze select * from film where length > 100;
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Seq Scan on film  (cost=0.00..67.50 rows=611 width=390) (actual time=0.046..6.862 rows=610 loops=1)
   Filter: (length > 100)
   Rows Removed by Filter: 390
 Planning time: 0.195 ms
 Execution time: 13.014 ms
(5 rows)

Rows Removed 显示 where 条件过滤掉了多少行。

Explain Buffers

explain 有个 buffers 参数可以和 analyze 一起使用获取更多的统计信息。buffers 提供的数据可以帮我们识别出哪部分查询时最 IO 密集的。

pagila=# explain (analyze, buffers) select * from film where length > 100;
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Seq Scan on film  (cost=0.00..67.50 rows=611 width=390) (actual time=0.020..7.365 rows=610 loops=1)
   Filter: (length > 100)
   Rows Removed by Filter: 390
   Buffers: shared hit=55
 Planning time: 0.110 ms
 Execution time: 14.068 ms
(6 rows)

hit 的意思是数据块已经在缓存中了，避免了读操作。

shared block：包含从表和索引来的数据。

local block：包含从临时表和索引来的数据。

temp block：sort、hash、materialize 节点用到的临时数据。

EXPLAIN Verbose

显示有关计划的其他信息。具体来说，包括计划树中每个节点的输出列列表、模式限定表和函数名，始终使用范围表别名标记表达式中的变量，并始终打印显示统计信息的每个触发器的名称。

参考

[1] EXPLAIN

[1] Using EXPLAIN

Protream's Notes