MySQL数据对比是常见的日常任务,使用pt-checksum 只能对比主从复制的数据,并且效率和使用上也不太方便。 因此写了一个对比的工具mydiff,经过测试对比50张500万的表,能在30s内完成。 要求,mysql 表必须有自增主键,对比期间不会加锁。对于不是非常非常高并发的主从复制,也可以进行数据的对比。 可以非常方便的加入到日常的定时任务中。建议使用单独的机器运行对比,并把结果保存在单独的数据库实例中。 使用:在db.conf.json中指定需要对比的数据库信息。以及保存结果的数据库信息即可。 CheckTables 可以指定全部对比,也可以指定对比特定的表
{
"SourceMySQLHost": "127.0.0.1",
"SourceMySQLPort": 20996,
"SourceMySQLDB": "test",
"SourceMySQLUser": "msandbox",
"SourceMySQLUserPassword": "msandbox",
"TargetMySQLHost": "127.0.0.1",
"TargetMySQLPort": 20996,
"TargetMySQLDB": "test2",
"TargetMySQLUser": "msandbox",
"TargetMySQLUserPassword": "msandbox",
"ChecksumHost": "127.0.0.1",
"ChecksumPort": 20996,
"ChecksumDB": "test",
"ChecksumUser": "msandbox",
"ChecksumPassword": "msandbox",
"CheckTables":"*",
"ChunkSize":2000,
"CommitLimit":1000 --批量提交crc结果数量
}
测试输出:
go run *.go 2022-01-13 09:49:33 INFO Read config: db.conf.json table not in target:[accounts accounts2 accounts3 accounts_del active_node crc_1 crc_2 crc_result cycle_count_task_assignment_rule_v2_tab dba_test experiment_config_tab inbound_mbn_tab node_health node_health_history rs sbtest5_bak t_checksum user_bak user_infos] table not in source:[sbtest5 user2] tables to be compare:[user] begin compare table:%s int db tables user test test2 Getting nothing in the write queue. Sleeping... diff values chunk len: 5 begin to find diff rows in table: test user compare diff.......... Table rows in test.user chunk_num:5854 crc is not same,check end_pk row Getting nothing in the write queue. Sleeping... begin to find diff rows in table: test user compare diff.......... Table rows in test.user chunk_num:7841 crc is not same,check end_pk row Getting nothing in the write queue. Sleeping... begin to find diff rows in table: test user compare diff.......... Table rows in test.user chunk_num:9275 crc is not same,check end_pk row Getting nothing in the write queue. Sleeping... begin to find diff rows in table: test user compare diff.......... Table rows in test.user chunk_num:11507 crc is not same,check end_pk row Getting nothing in the write queue. Sleeping... begin to find diff rows in table: test user compare diff.......... Table rows in test.user chunk_num:13173 crc is not same,check end_pk row Getting nothing in the write queue. Sleeping... table compare elapsed: 25.583540092s done
根据提示查找对应的chunk,或者直接用sql查找crc_result表,获取不一样的记录
select * from crc_result where chunk_num=11507; +-------+--------+------------+-----------+----------+--------+----------+-----+---------------------+--------+ | id | dbname | table_name | chunk_num | start_pk | end_pk | crc | cnt | ctime | which | +-------+--------+------------+-----------+----------+--------+----------+-----+---------------------+--------+ | 59719 | test | user | 11507 | 81022 | 81024 | ac9e9b3 | 1 | 2022-01-13 09:49:58 | source | | 59720 | test2 | user | 11507 | 81022 | 81024 | 7573c5a8 | 2 | 2022-01-13 09:49:58 | target | +-------+--------+------------+-----------+----------+--------+----------+-----+---------------------+--------+
select * from crc_result where chunk_num=9275; +-------+--------+------------+-----------+----------+--------+----------+-----+---------------------+--------+ | id | dbname | table_name | chunk_num | start_pk | end_pk | crc | cnt | ctime | which | +-------+--------+------------+-----------+----------+--------+----------+-----+---------------------+--------+ | 55255 | test | user | 9275 | 12457 | 12458 | a2e9ba83 | 1 | 2022-01-13 09:49:56 | source | | 55256 | test2 | user | 9275 | 12457 | 12458 | 67fab947 | 1 | 2022-01-13 09:49:56 | target | +-------+--------+------------+-----------+----------+--------+----------+-----+---------------------+--------+
第一个输出中看到crc不一样,cnt也不一样,说明chunk中的记录行数不一样 第二个输出中看到crc内容不一样,对应的行内容不一样
