空间都去哪里了?(中)

来源:这里教程网 时间:2026-03-01 17:31:36 作者:

来源:PostgreSQL学徒

前言

话接上回,空间都去哪里了?(上),当异常退出的时候,可能会留下许多孤儿文件,就像蛀虫一样,悄无声息蚕食你的空间。上回提到可以使用SQL+人肉巡检来排查:

SELECT
    *
FROM
    pg_ls_dir('/home/postgres/16data/base/5'AS file
WHERE
    file ~ '^[0-9]*'
    AND file::text NOT IN (
        SELECT
            oid::text
        FROM
            pg_class);

但是效率太低,并且还会误报,当一个进行中的事务正在建表、重写表的时候,都会被SQL查询出来,所以你还要人为仔细检查才行。那么有没有更加优雅的办法?

pg_catacheck

首先我想到的是pg_catacheck,顾名思义,检查系统表

!! PostgreSQL stores the metadata for SQL objects such as tables and functions using special tables called system catalog tables. Users do not normally modify these tables directly, but instead modify them using SQL commands such as CREATE, ALTER, and DROP.

DDL本质上就是系统表的DML,因此一旦系统表损坏了,就很尴尬。那么既然是孤儿文件,pg_catacheck能否检测出来呢?Try try see...

postgres=# begin;
BEGIN
postgres=*# create table tt1(id int,info text);
CREATE TABLE
postgres=*# insert into tt1 select n,'test' from generate_series(1,100000) as n;
INSERT 0 100000
postgres=*# select pg_relation_filepath('tt1');
 pg_relation_filepath 
----------------------
 base/5/49765
(1 row)
postgres=*# select pg_backend_pid(),txid_current();
 pg_backend_pid | txid_current 
----------------+--------------
          20680 |         1676
(1 row)

然后kill -9杀掉该会话

[postgres@xiongcc 5]$ kill -9 20680
[postgres@xiongcc 5]$ ls -lrth 49765
-rw------- 1 postgres postgres 4.3M Nov 14 21:59 49765

这样操作过后,系统表中便会留下死元祖,同时留下孤儿文件

postgres=# select lp, t_xmin, t_xmax, t_ctid,
       infomask(t_infomask, 1) as infomask,
       infomask(t_infomask2, 2) as infomask2
from heap_page_items(get_raw_page('pg_class', 12)) where t_xmin = 1676;
 lp | t_xmin | t_xmax | t_ctid  |                 infomask                  |    infomask2    
----+--------+--------+---------+-------------------------------------------+-----------------
 25 |   1676 |   1676 | (12,28) | XMIN_INVALID|HASNULL                      | HOT_UPDATED
 26 |   1676 |      0 | (12,26) | XMAX_INVALID|XMIN_INVALID|HASNULL         | 
 27 |   1676 |      0 | (12,27) | XMAX_INVALID|XMIN_INVALID|HASNULL         | 
 28 |   1676 |      0 | (12,28) | UPDATED|XMAX_INVALID|XMIN_INVALID|HASNULL | HEAP_ONLY_TUPLE
(4 rows)

但是很遗憾,pg_catcheck并不能检测出来,其实并不意外,因为本质上系统表并未损坏,充其量只是留下了一条死元祖,大不了vacuum full重建一下。

[postgres@xiongcc 5]$ pg_catcheck
progress: done (0 inconsistencies, 0 warnings, 0 errors)

当然假如你采用非常规操作恶意损坏,是可以检测出来的

postgres=# create table t1(id int);
CREATE TABLE
postgres=# insert into t1 values(generate_series(1,100));
INSERT 0 100
postgres=# delete from pg_class where relname = 't1';
DELETE 2
[postgres@xiongcc 5]$ pg_catcheck
notice: pg_type row has invalid typrelid "33371"no matching entry in pg_class
row identityoid="33373"
notice: pg_type row has invalid typrelid "57946"no matching entry in pg_class
row identityoid="57948"
notice: pg_attribute row has invalid attrelid "33371"no matching entry in pg_class
row identity: attrelid="33371" attname="id" attnum="1"
notice: pg_attribute row has invalid attrelid "33371"no matching entry in pg_class
row identity: attrelid="33371" attname="ctid" attnum="-1"
notice: pg_attribute row has invalid attrelid "33371"no matching entry in pg_class
row identity: attrelid="33371" attname="xmin" attnum="-2"
notice: pg_attribute row has invalid attrelid "33371"no matching entry in pg_class
...
...

根据需要,可以搭配checksum定期巡检,防患于未然。

pg_orphaned

这两天我再捣鼓的时候,偶然发现了这个插件,顿时喜出望外!根据其名字便可知晓作用,这个插件提供了五个函数

pg_list_orphaned(interval): to list orphaned files. Orphaned files older than the interval parameter (default 1 Day) are listed with the "older" field set to true. 列出孤儿文件,默认情况下超过1天的孤儿文件,older字段会显示会true
pg_move_orphaned(interval): to move orphaned files to a "orphaned_backup" directory. Only orphaned files older than the interval parameter (default 1 Day) are moved. 将孤儿文件移除到"orphaned_backup"目录中
pg_list_orphaned_moved(): to list the orphaned files that have been moved to the "orphaned_backup" directory. 列出"orphaned_backup"中的孤儿文件
pg_move_back_orphaned(): to move back the orphaned files from the "orphaned_backup" directory to their orginal location (if still orphaned). 将"orphaned_backup"中的孤儿文件挪回来
pg_remove_moved_orphaned(): to remove the orphaned files located in the "orphaned_backup" directory.彻底删除

感觉有点像回收站的味道。让我们看下效果:

postgres=# select * from pg_list_orphaned();
  dbname  |  path  |   name    |  size   |        mod_time        | relfilenode | reloid | older 
----------+--------+-----------+---------+------------------------+-------------+--------+-------
 postgres | base/5 | 16998     |       0 | 2023-11-04 13:06:25+08 |       16998 |      0 | t
 postgres | base/5 | 49769     |    8192 | 2023-11-14 21:59:02+08 |       49769 |      0 | f
 postgres | base/5 | 41566     |    8192 | 2023-11-14 21:46:34+08 |       41566 |      0 | f
 postgres | base/5 | 16999     |    8192 | 2023-11-04 13:10:30+08 |       16999 |      0 | t
 postgres | base/5 | 33398     |       0 | 2023-11-14 21:45:06+08 |       33398 |      0 | f
 postgres | base/5 | 25292     |    8192 | 2023-11-10 11:28:11+08 |       25292 |      0 | t
 postgres | base/5 | 49768     |       0 | 2023-11-14 21:58:21+08 |       49768 |      0 | f
 postgres | base/5 | 33395     | 4431872 | 2023-11-14 21:45:48+08 |       33395 |      0 | f
 postgres | base/5 | 33395_fsm |   24576 | 2023-11-14 21:45:48+08 |       33395 |      0 | f
 postgres | base/5 | 41565     |       0 | 2023-11-14 21:46:02+08 |       41565 |      0 | f
 postgres | base/5 | 33371     |       0 | 2023-11-10 11:48:59+08 |       33371 |      0 | t
 postgres | base/5 | 33399     |    8192 | 2023-11-14 21:45:48+08 |       33399 |      0 | f
 postgres | base/5 | 57946     |    8192 | 2023-11-14 22:04:08+08 |       57946 |      0 | f
 postgres | base/5 | 49765     | 4431872 | 2023-11-14 21:59:02+08 |       49765 |      0 | f
 postgres | base/5 | 49765_fsm |   24576 | 2023-11-14 21:59:02+08 |       49765 |      0 | f
 postgres | base/5 | 25288     | 6832128 | 2023-11-10 11:28:11+08 |       25288 |      0 | t
 postgres | base/5 | 25288_fsm |   24576 | 2023-11-10 11:28:11+08 |       25288 |      0 | t
 postgres | base/5 | 25291     |       0 | 2023-11-10 11:27:20+08 |       25291 |      0 | t
(18 rows)

好家伙,原来系统中已经有这么多孤儿文件了。那让我们删除一下

postgres=# select pg_move_orphaned();
 pg_move_orphaned 
------------------
                7
(1 row)
postgres=# select * from pg_list_orphaned();
  dbname  |  path  |   name    |  size   |        mod_time        | relfilenode | reloid | older 
----------+--------+-----------+---------+------------------------+-------------+--------+-------
 postgres | base/5 | 49769     |    8192 | 2023-11-14 21:59:02+08 |       49769 |      0 | f
 postgres | base/5 | 41566     |    8192 | 2023-11-14 21:46:34+08 |       41566 |      0 | f
 postgres | base/5 | 33398     |       0 | 2023-11-14 21:45:06+08 |       33398 |      0 | f
 postgres | base/5 | 49768     |       0 | 2023-11-14 21:58:21+08 |       49768 |      0 | f
 postgres | base/5 | 33395     | 4431872 | 2023-11-14 21:45:48+08 |       33395 |      0 | f
 postgres | base/5 | 33395_fsm |   24576 | 2023-11-14 21:45:48+08 |       33395 |      0 | f
 postgres | base/5 | 41565     |       0 | 2023-11-14 21:46:02+08 |       41565 |      0 | f
 postgres | base/5 | 33399     |    8192 | 2023-11-14 21:45:48+08 |       33399 |      0 | f
 postgres | base/5 | 57946     |    8192 | 2023-11-14 22:09:06+08 |       57946 |      0 | f
 postgres | base/5 | 49765     | 4431872 | 2023-11-14 21:59:02+08 |       49765 |      0 | f
 postgres | base/5 | 49765_fsm |   24576 | 2023-11-14 21:59:02+08 |       49765 |      0 | f
(11 rows)

删除过后,这些文件会放到orphaned_backup目录中,那么这个目录在哪里呢?让我们看下代码

 /* default tablespace */
 if (!restore)
  snprintf(dir, sizeof(dir), "base/%u", dbOid);
 else
  snprintf(dir, sizeof(dir), "%s/%u/base/%u", orphaned_backup_dir, dbOid, dbOid);

很简单,位于实例目录下,以数据库的oid来命名。

[postgres@xiongcc 16data]$ cd orphaned_backup/
[postgres@xiongcc orphaned_backup]$ tree 
.
└── 5
    └── base
        └── 5
            ├── 16998
            ├── 16999
            ├── 25288
            ├── 25288_fsm
            ├── 25291
            ├── 25292
            └── 33371
3 directories, 7 files

小结

很不错的一个工具,麻雀虽小,短短1000行代码,却能帮助我们解决很大的烦恼。

下一期让我们聊聊——空间都去哪里了(下),那些奇奇怪怪的场景。

相关推荐