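Real-Time Data Migration Between Heterogeneous Redis Clusters
Background
For historical reasons, our company's caching solution is built on Codis, and an entire large department shares a single cluster. We plan to retire Codis in favor of the native Redis Cluster architecture, for two main reasons. 1. Codis has not been officially updated or maintained for a long time. The official Redis release has reached 5.x.x while codis-server is still on 3.x.x, so newer Redis features are unavailable. 2. Our current usage violates the principle of spreading risk (not putting all our eggs in one basket): if the shared cluster fails, every service in the department is affected. While surveying the business teams, we also found that Codis is not used only as a cache. Some scenarios, such as counters, rely on it as storage. We therefore need a real-time data migration solution, so that services can move from Codis to Redis without noticing the switch.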
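Solution Selection
Requirements
1. Migrate data from Codis to Redis Cluster
2. Migrate data from Codis to a Sentinel cluster
3. Migrate only a subset of keys
4. Show migration progress
Research
1. redis-migrate-tool
redis-migrate-tool is a Vipshop open-source tool that migrates data in real time between heterogeneous Redis clusters. It has not been updated for two years. Even so, I consider it a fairly complete tool, especially its data verification. See GitHub for a detailed feature list: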
https://github.com/vipshop/redis-migrate-tool
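2. RedisShake
RedisShake is Alibaba Cloud's real-time synchronization tool for heterogeneous Redis clusters. It is built on redis-port, which Wandoujia open-sourced. Compared with redis-migrate-tool, I see two advantages: RedisShake can sync keys by prefix, and it can sync multiple DBs. redis-migrate-tool can only sync the entire dataset. If the source is split into several databases, redis-migrate-tool writes all of them into db0 on the target, which is unacceptable for services that depend on separate databases. See GitHub for RedisShake's detailed features: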
https://github.com/alibaba/RedisShake
3. redis-port
Wandoujia open-sourced redis-port to make migrating from Redis to Codis easy. It, too, has not been updated for a long time. See GitHub for its features and usage:
https://github.com/CodisLabs/redis-port
Practice
Environment
Codis → Sentinel
Codis shard master | Password | Codis version | Sentinel address | Master address | Master password | Sentinel Redis version |
---|---|---|---|---|---|---|
192.168.46.150:10379 | xxx | 3.2.4 | 192.168.9.87:6385 | 192.168.9.87:6384 | 123456 | 5.0.2 |
192.168.47.150:10379 | xxx | 3.2.4 | 192.168.9.88:6385 | 192.168.9.87:6384 | 123456 | 5.0.2 |
n/a | xxx | 3.2.4 | 192.168.9.89:6385 | 192.168.9.87:6384 | 123456 | 5.0.2 |
Codis → Redis Cluster
Codis shard master | Password | Codis version | Master node | Master password | Redis Cluster version |
---|---|---|---|---|---|
192.168.46.150:10379 | xxx | 3.2.4 | 192.168.9.87:6383 | 123456 | 5.0.2 |
192.168.47.150:10379 | xxx | 3.2.4 | 192.168.9.89:6382 | 123456 | 5.0.2 |
n/a | xxx | 3.2.4 | 192.168.9.88:6381 | 123456 | 5.0.2 |
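When the target is Redis Cluster, every migrated key has to be written to the master that owns its hash slot. That is why each of the three master nodes above receives only part of the data. The sketch below is a minimal Python implementation of the slot function from the Redis Cluster specification: CRC16/XMODEM of the key, or of its non-empty `{hash tag}`, modulo 16384. It is for illustration only and is not code taken from either migration tool.

```python
# Sketch of Redis Cluster key-to-slot mapping, following the Redis Cluster spec.


def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM variant): poly 0x1021, init 0x0000, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: bytes) -> int:
    """Return the hash slot (0..16383) for a key, honouring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' is hashed.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384


if __name__ == "__main__":
    for k in (b"foo", b"user:{1000}:name", b"user:{1000}:email"):
        print(k.decode(), key_slot(k))
```

Keys that share a hash tag, such as `user:{1000}:name` and `user:{1000}:email`, land in the same slot. This matters for the `replace_hash_tag` option that appears in the RedisShake configuration later on.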
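Migrating Data with redis-migrate-tool
Installing the migration tool
Compile and install it by following the official documentation.
Writing the configuration files
Configuration file for migrating to Sentinel: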
vim /chj/app/redis-migrate-tool/rmt_sentinel.conf

[source]
type: single
servers:
- 192.168.46.150:10379
- 192.168.47.150:10379
redis_auth: xxx

[target]
type: single
servers:
- 192.168.9.87:6384
redis_auth: 123456

[common]
listen: 0.0.0.0:8888
Configuration file for migrating to Redis Cluster:
vim /chj/app/redis-migrate-tool/rmt_cluster.conf

[source]
type: single
servers:
- 192.168.46.150:10379
- 192.168.47.150:10379
redis_auth: xxx

[target]
type: redis cluster
servers:
- 192.168.9.87:6383
- 192.168.9.89:6382
- 192.168.9.88:6381
redis_auth: 123456

[common]
listen: 0.0.0.0:8889
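The two files above differ only in the target type, the target servers and the listen port. When several departments' clusters have to be migrated, it can help to generate these files instead of editing them by hand. Below is a minimal sketch of such a generator. `render_rmt_conf` is a hypothetical helper, not part of redis-migrate-tool, and it hard-codes `type: single` for the source because that is how the Codis shard masters are configured here.

```python
def render_rmt_conf(source_servers, source_auth, target_type,
                    target_servers, target_auth, listen):
    """Hypothetical helper: build redis-migrate-tool config text in the layout used above.

    The source is always 'type: single', matching the Codis shard masters in
    this article; other source types are out of scope for this sketch.
    """
    lines = ["[source]", "type: single", "servers:"]
    lines += [f"- {addr}" for addr in source_servers]
    lines += [f"redis_auth: {source_auth}", "",
              "[target]", f"type: {target_type}", "servers:"]
    lines += [f"- {addr}" for addr in target_servers]
    lines += [f"redis_auth: {target_auth}", "",
              "[common]", f"listen: {listen}", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    codis_masters = ["192.168.46.150:10379", "192.168.47.150:10379"]
    cluster_masters = ["192.168.9.87:6383", "192.168.9.89:6382", "192.168.9.88:6381"]
    # Reproduces rmt_cluster.conf; write it to disk with open(...).write(conf).
    conf = render_rmt_conf(codis_masters, "xxx", "redis cluster",
                           cluster_masters, "123456", "0.0.0.0:8889")
    print(conf)
```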
Starting the sync
cd /chj/app/redis-migrate-tool
# Migrate data from Codis to the Sentinel cluster
./src/redis-migrate-tool -c rmt_sentinel.conf -o rmt.log -d
# Migrate data from Codis to Redis Cluster
./src/redis-migrate-tool -c rmt_cluster.conf -o rmt_cluster.log -d
Data verification
cd /chj/app/redis-migrate-tool
[root@devops-template-test redis-migrate-tool]# ./src/redis-migrate-tool -c rmt_sentinel.conf -C "redis_check 60000"
Check job is running...
[2019-06-25 11:12:09.414] rmt_check.c:848 ERROR: key checked failed: check key's value error, value is inconsistent. key(len:17, type:hash): BigData-IpParse:4
Checked keys: 60000
Inconsistent value keys: 1
Inconsistent expire keys : 0
Other check error keys: 0
Checked OK keys: 59999

Check job finished, used 16.622s

Notes:
1. -C "redis_check 60000" runs data verification. 60000 is the number of keys to sample (the default is 1000).
2. If any keys fail the check, investigate each reported key before switching over.
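To automate the "investigate any failing keys" step, the check output can be parsed. The sketch below assumes the output format shown in the sample run above: `Name: count` summary lines, and `ERROR` lines that end in `key(len:N, type:T): <key>`. Other redis-migrate-tool versions may print something different, so treat the regular expressions as assumptions.

```python
import re

# Summary labels as they appear in the sample redis_check output above.
SUMMARY_FIELDS = {
    "Checked keys": "checked",
    "Inconsistent value keys": "bad_value",
    "Inconsistent expire keys": "bad_expire",
    "Other check error keys": "other_error",
    "Checked OK keys": "ok",
}
SUMMARY_RE = re.compile(r"^\s*([A-Za-z ]+?)\s*:\s*(\d+)\s*$")
ERROR_KEY_RE = re.compile(r"key\(len:\d+, type:(\w+)\): (.*)$")


def parse_check_summary(output: str) -> dict:
    """Extract the counters printed at the end of a redis_check run."""
    result = {}
    for line in output.splitlines():
        m = SUMMARY_RE.match(line)
        if m and m.group(1) in SUMMARY_FIELDS:
            result[SUMMARY_FIELDS[m.group(1)]] = int(m.group(2))
    return result


def inconsistent_keys(output: str) -> list:
    """Return (type, key) pairs from ERROR lines, for manual follow-up."""
    keys = []
    for line in output.splitlines():
        m = ERROR_KEY_RE.search(line)
        if m and "ERROR" in line:
            keys.append((m.group(1), m.group(2)))
    return keys


def check_passed(summary: dict) -> bool:
    """True only if keys were sampled and every sampled key checked OK."""
    return summary.get("checked", 0) > 0 and summary.get("ok") == summary["checked"]
```

Applied to the sample run, `check_passed` returns False and `inconsistent_keys` reports `("hash", "BigData-IpParse:4")`, the key that needs a closer look.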
Confirming sync status
total_msgs_outqueue shows whether any oplog entries are still queued for processing. If total_msgs_outqueue > 0, keep waiting. Switch over only after total_msgs_outqueue reaches 0.
[root@devops-template-test redis-migrate-tool]# redis-cli -h 127.0.0.1 -p 8889 info
# Server
version:0.1.0
os:Linux 3.10.0-693.5.2.el7.x86_64 x86_64
multiplexing_api:epoll
gcc_version:4.8.5
process_id:10137
tcp_port:8889
uptime_in_seconds:1201
uptime_in_days:0
config_file:/chj/app/redis-migrate-tool/rmt_cluster.conf

# Clients
connected_clients:1
max_clients_limit:100
total_connections_received:1

# Memory
mem_allocator:jemalloc-0.0.0

# Group
source_nodes_count:2
target_nodes_count:4

# Stats
all_rdb_received:1
all_rdb_parsed:1
all_aof_loaded:0
rdb_received_count:2
rdb_parsed_count:2
aof_loaded_count:0
total_msgs_recv:357666
total_msgs_sent:357666
total_net_input_bytes:78804395
total_net_output_bytes:1688068278
total_net_input_bytes_human:75.15M
total_net_output_bytes_human:1.57G
total_mbufs_inqueue:0
total_msgs_outqueue:0
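The cut-over rule above (wait until `total_msgs_outqueue` is 0) is easy to script. The sketch below parses INFO-style `key:value` text and polls until the queue drains. `fetch_info` is a caller-supplied callable, an assumption of this sketch, so the logic can be tested without a running redis-migrate-tool instance. A missing field counts as "not drained".

```python
import time


def parse_info(text: str) -> dict:
    """Parse INFO-style 'key:value' lines; '# Section' headers and blanks are skipped."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def ready_to_switch(info: dict) -> bool:
    """Cut-over condition: nothing left in the outbound message queue."""
    return int(info.get("total_msgs_outqueue", "1")) == 0


def wait_until_drained(fetch_info, interval=5.0, timeout=600.0) -> bool:
    """Poll fetch_info() until total_msgs_outqueue is 0 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if ready_to_switch(parse_info(fetch_info())):
            return True
        time.sleep(interval)
    return False
```

For example, `wait_until_drained(lambda: subprocess.run(["redis-cli", "-h", "127.0.0.1", "-p", "8889", "info"], capture_output=True, text=True).stdout)` polls the instance configured with `listen: 0.0.0.0:8889`.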
Migrating Data with RedisShake
Installing the tool
mkdir /chj/app/redis-shake
cd /chj/app/redis-shake
wget https://github.com/alibaba/RedisShake/releases/download/release-v1.6.9-20190624/redis-shake.tar.gz
tar -zxvf redis-shake.tar.gz
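Writing the configuration file
Start from the bundled configuration file and change only the commented items below. Leave everything else unchanged.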
id = redis-shake
log.file = ./redis-shake.log
log.level = info
pid_path =
system_profile = 9310
http_profile = 9320
ncpu = 0
parallel = 32
# Source type: cluster
source.type = cluster
# Addresses of the Codis shard masters
source.address = 192.168.46.150:10379;192.168.47.150:10379
# Codis password
source.password_raw = xxx
source.auth_type = auth
source.tls_enable = false
# Target type: Sentinel
target.type = sentinel
# For a Redis Cluster target, use this instead:
#target.type = cluster
# Address of the target Sentinel cluster
target.address = sentinel-zhj2-redis-sentinel-dev-6384@192.168.9.87:6385;192.168.9.88:6385;192.168.9.89:6385
# Address of the target Redis Cluster (use instead of the line above)
#target.address = 192.168.9.87:6383;192.168.9.89:6382;192.168.9.88:6381
# Password of the target Redis
target.password_raw = 123456
target.auth_type = auth
target.db = -1
target.tls_enable = false
rdb.input = local
rdb.output = local_dump
rdb.parallel = 0
rdb.special_cloud =
fake_time =
rewrite = true
# Sync only db0
filter.db = 0
# Sync only keys that start with mms or vcc
filter.key = mms;vcc
filter.slot =
filter.lua = false
big_key_threshold = 524288000
psync = false
metric = true
metric.print_log = false
heartbeat.url =
heartbeat.interval = 3
heartbeat.external = test external
heartbeat.network_interface =
sender.size = 104857600
sender.count = 5000
sender.delay_channel_size = 65535
keep_alive = 0
scan.key_number = 50
scan.special_cloud =
scan.key_file =
qps = 200000
replace_hash_tag = false
extra = false
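RedisShake applies key filtering internally. The Python sketch below only models what `filter.db = 0` and `filter.key = mms;vcc` are meant to select, so the scope of a run can be reasoned about beforehand. It is not RedisShake's code. The semantics are assumptions: `filter.db` holds a single DB number (empty means all DBs), and `filter.key` is a `;`-separated list of plain string prefixes (empty means all keys).

```python
def make_filter(filter_db: str, filter_key: str):
    """Return a predicate should_sync(db, key) modelling filter.db / filter.key.

    Assumed semantics (sketch only): filter.db is one DB number or empty for
    "all DBs"; filter.key is ';'-separated key prefixes or empty for "all keys".
    """
    db_allowed = int(filter_db) if filter_db.strip() else None
    prefixes = tuple(p.strip() for p in filter_key.split(";") if p.strip())

    def should_sync(db: int, key: str) -> bool:
        # Skip commands from any DB other than the whitelisted one.
        if db_allowed is not None and db != db_allowed:
            return False
        # str.startswith accepts a tuple: True if any prefix matches.
        if prefixes and not key.startswith(prefixes):
            return False
        return True

    return should_sync


if __name__ == "__main__":
    should_sync = make_filter("0", "mms;vcc")
    for db, key in [(0, "mms:order:1"), (0, "vcc:car:7"), (0, "other:1"), (1, "mms:order:1")]:
        print(db, key, should_sync(db, key))
```

In this sketch the match is a plain string prefix, so `mms` would also select a key such as `mmsx:1`. Including the business delimiter in the prefix (for example `mms:`) would narrow the match.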
Starting the sync
/chj/app/redis-shake/start.sh /chj/app/redis-shake/redis-shake.conf sync
Checking sync status
The sync is complete when PullCmdCountTotal - BypassCmdCountTotal == PushCmdCountTotal:
curl http://192.168.47.253:9320/metric | python -m json.tool
[
    {
        "AvgDelay": "0.43 ms",
        "BypassCmdCount": 0,
        "BypassCmdCountTotal": 0,
        "Delay": "null ms",
        "Details": null,
        "FailCmdCount": 0,
        "FailCmdCountTotal": 0,
        "FullSyncProgress": 100,
        "NetworkFlowTotal": 42006,
        "NetworkSpeed": 0,
        "ProcessingCmdCount": 0,
        "PullCmdCount": 0,
        "PullCmdCountTotal": 897,
        "PushCmdCount": 0,
        "PushCmdCountTotal": 839,
        "SenderBufCount": 0,
        "SourceAddress": "192.168.46.150:10379",
        "SourceDBOffset": 0,
        "StartTime": "2019-06-25T17:45:23Z",
        "Status": "incr",
        "SuccessCmdCount": 0,
        "SuccessCmdCountTotal": 839,
        "TargetAddress": [
            "192.168.9.87:6384"
        ],
        "TargetDBOffset": 0
    },
    {
        "AvgDelay": "0.60 ms",
        "BypassCmdCount": 1,
        "BypassCmdCountTotal": 4067,
        "Delay": "null ms",
        "Details": null,
        "FailCmdCount": 0,
        "FailCmdCountTotal": 0,
        "FullSyncProgress": 100,
        "NetworkFlowTotal": 37629,
        "NetworkSpeed": 0,
        "ProcessingCmdCount": 0,
        "PullCmdCount": 1,
        "PullCmdCountTotal": 5106,
        "PushCmdCount": 0,
        "PushCmdCountTotal": 333,
        "SenderBufCount": 0,
        "SourceAddress": "192.168.47.150:10379",
        "SourceDBOffset": 0,
        "StartTime": "2019-06-25T17:45:23Z",
        "Status": "incr",
        "SuccessCmdCount": 0,
        "SuccessCmdCountTotal": 333,
        "TargetAddress": [
            "192.168.9.87:6384"
        ],
        "TargetDBOffset": 0
    }
]
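The completion check above can be applied to every source shard returned by the metric endpoint (port 9320, from `http_profile`). The sketch below does that with the standard library. It also requires `FullSyncProgress` to be 100, an extra safeguard that is not part of the check described above. In the sample output, the equality does not yet hold for either shard (897 - 0 vs 839, and 5106 - 4067 vs 333), so this check would keep reporting "not done" for that snapshot.

```python
import json
import urllib.request


def shard_synced(m: dict) -> bool:
    """Completion check for one source shard (plus an added FullSyncProgress guard)."""
    return (m.get("FullSyncProgress") == 100
            and m["PullCmdCountTotal"] - m["BypassCmdCountTotal"] == m["PushCmdCountTotal"])


def all_synced(metrics: list) -> bool:
    """True only if the endpoint returned at least one shard and all are synced."""
    return bool(metrics) and all(shard_synced(m) for m in metrics)


def fetch_metrics(url: str) -> list:
    """Fetch and decode the JSON list served by RedisShake's metric endpoint."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)
```

Usage: `all_synced(fetch_metrics("http://192.168.47.253:9320/metric"))`, for example inside a polling loop before cutting traffic over to the new cluster.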