Redis Cluster: automated installation, scale-out, and scale-in

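I previously wrote an article on automating Redis cluster installation with Python. Building a cluster from raw commands alone is quite tedious, which is why Redis officially provides the redis-trib.rb tool.
redis-trib.rb offers command-line tools for creating, checking, repairing, and rebalancing a cluster, but I cannot accept it, because it gives no way to define the master-slave relationships among the cluster nodes yourself.
Take six nodes, A B C D E F: when creating a cluster you need to state explicitly which nodes are masters, which are slaves, and how they pair up. Unfortunately redis-trib.rb does not let you control this, see the screenshot below.
More often than not you need to say exactly which machines act as masters and which as slaves, and redis-trib.rb cannot be told which machines (instances) take which role.
Using redis-trib.rb also means dealing with the Ruby environment dependency, so I prefer not to build clusters with it.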

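To quote the book 《Redis开发与运维》:
If the nodes are deployed on different IP addresses, redis-trib.rb tries as far as possible to keep a master and its slave off the same machine, and therefore reorders the node list.
The order of the node list determines the master and slave roles: masters first, then slaves.
In other words, with redis-trib.rb you cannot fully control by hand how master and slave roles are assigned.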

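Starting with Redis 5.0, redis-cli --cluster implements cluster creation and the related cluster functions itself, so neither redis-trib.rb nor a Ruby environment is needed any more.
Even so, working with the raw commands is still fairly complex. In a complicated production environment especially, creating, expanding, or shrinking a cluster by hand involves a series of manual steps and carries some risk.
Automating cluster creation, scale-out, and scale-in is therefore worthwhile.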

Test environment

Everything here is based on Python 3 and built on the redis-cli --cluster command. It automates cluster creation, scale-out, and scale-in.

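The test environment runs multiple instances on a single machine, eight nodes in total:
1. Automated cluster creation: six nodes (10001~10006) form a cluster of 3 masters (10001~10003) and 3 slaves (10004~10006).
2. Automated scale-out: add the new node 10007 as a master, and add 10008 as the slave of 10007.
3. Automated scale-in: the reverse of step 2, removing 10007 and its slave 10008 from the cluster.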
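
For reference, the layout above can be written down as the kind of data the script consumes. This is only a sketch: the dictionary keys host, port, and password match what the full code at the end of this article reads from each node, but the helper name make_nodes and the placeholder password are assumptions made for illustration.

```python
def make_nodes(ports, host="127.0.0.1", password="******"):
    """Build one node dict per port (hypothetical helper, not part of the original script)."""
    return [{"host": host, "port": port, "password": password} for port in ports]


# Initial cluster: masters 10001-10003, slaves 10004-10006, paired by position.
list_master_node = make_nodes(range(10001, 10004))
list_slave_node = make_nodes(range(10004, 10007))

# Nodes used later for scale-out and scale-in.
new_master_node = make_nodes([10007])[0]
new_slave_node = make_nodes([10008])[0]

for master, slave in zip(list_master_node, list_slave_node):
    print("master {}:{} <- slave {}:{}".format(
        master["host"], master["port"], slave["host"], slave["port"]))
```

Pairing masters and slaves by list position is what makes the master-slave mapping explicit, which is the control that redis-trib.rb does not offer.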

Creating the Redis cluster

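Creating the cluster boils down to two groups of commands: one joins the master nodes into a cluster, the other adds a slave to each master in turn.
Along the way you have to look up the node ID of each master, so doing this by hand is tedious.
The main commands are: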

################# create cluster #################
redis-cli --cluster create 127.0.0.1:10001 127.0.0.1:10002 127.0.0.1:10003 -a ****** --cluster-yes
################# add slave nodes #################
redis-cli --cluster add-node 127.0.0.1:10004 127.0.0.1:10001 --cluster-slave --cluster-master-id 6164025849a8ff9297664fc835bc851af5004f61 -a ******
redis-cli --cluster add-node 127.0.0.1:10005 127.0.0.1:10002 --cluster-slave --cluster-master-id 64e634307bdc339b503574f5a77f1b156c021358 -a ******
redis-cli --cluster add-node 127.0.0.1:10006 127.0.0.1:10003 --cluster-slave --cluster-master-id 8b75325c59a7242344d0ebe5ee1e0068c66ffa2a -a ******
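
The two command groups above can be generated from the node lists. The sketch below is a minimal illustration, not the script from this article: the function names are hypothetical, and it only uses the redis-cli flags that already appear in the commands above. It builds argument lists rather than shell strings.

```python
def node_address(node):
    """Format a node dict as host:port."""
    return "{}:{}".format(node["host"], node["port"])


def build_create_command(masters, password):
    """argv for `redis-cli --cluster create` over the master nodes only."""
    argv = ["redis-cli", "--cluster", "create"]
    argv += [node_address(node) for node in masters]
    argv += ["-a", password, "--cluster-yes"]
    return argv


def build_add_slave_command(slave, master, master_id, password):
    """argv for attaching one slave to one master; the slave address comes first."""
    return [
        "redis-cli", "--cluster", "add-node",
        node_address(slave), node_address(master),
        "--cluster-slave", "--cluster-master-id", master_id,
        "-a", password,
    ]


masters = [{"host": "127.0.0.1", "port": p} for p in (10001, 10002, 10003)]
slave = {"host": "127.0.0.1", "port": 10004}

print(" ".join(build_create_command(masters, "******")))
print(" ".join(build_add_slave_command(
    slave, masters[0], "6164025849a8ff9297664fc835bc851af5004f61", "******")))
```

An argument list can be handed to subprocess.run without going through a shell, which sidesteps quoting problems with passwords. The script in this article builds a string for os.system instead.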

Below is the redis-cli --cluster log output printed while the cluster is created from Python:

[root@JD redis_install]# python3 create_redis_cluster.py
################# flush master/slave slots #################
################# create cluster #################
redis-cli --cluster create 127.0.0.1:10001 127.0.0.1:10002 127.0.0.1:10003   -a ****** --cluster-yes
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Performing hash slots allocation on 3 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
M: 6164025849a8ff9297664fc835bc851af5004f61 127.0.0.1:10001
   slots:[0-5460] (5461 slots) master
M: 64e634307bdc339b503574f5a77f1b156c021358 127.0.0.1:10002
   slots:[5461-10922] (5462 slots) master
M: 8b75325c59a7242344d0ebe5ee1e0068c66ffa2a 127.0.0.1:10003
   slots:[10923-16383] (5461 slots) master
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join.
>>> Performing Cluster Check (using node 127.0.0.1:10001)
M: 6164025849a8ff9297664fc835bc851af5004f61 127.0.0.1:10001
   slots:[0-5460] (5461 slots) master
M: 8b75325c59a7242344d0ebe5ee1e0068c66ffa2a 127.0.0.1:10003
   slots:[10923-16383] (5461 slots) master
M: 64e634307bdc339b503574f5a77f1b156c021358 127.0.0.1:10002
   slots:[5461-10922] (5462 slots) master
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
0
################# add slave nodes #################
redis-cli --cluster add-node 127.0.0.1:10004 127.0.0.1:10001 --cluster-slave --cluster-master-id 6164025849a8ff9297664fc835bc851af5004f61 -a ******
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Adding node 127.0.0.1:10004 to cluster 127.0.0.1:10001
>>> Performing Cluster Check (using node 127.0.0.1:10001)
M: 6164025849a8ff9297664fc835bc851af5004f61 127.0.0.1:10001
   slots:[0-5460] (5461 slots) master
M: 8b75325c59a7242344d0ebe5ee1e0068c66ffa2a 127.0.0.1:10003
   slots:[10923-16383] (5461 slots) master
M: 64e634307bdc339b503574f5a77f1b156c021358 127.0.0.1:10002
   slots:[5461-10922] (5462 slots) master
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 127.0.0.1:10004 to make it join the cluster.
Waiting for the cluster to join
>>> Configure node as replica of 127.0.0.1:10001.
[OK] New node added correctly.
0
redis-cli --cluster add-node 127.0.0.1:10005 127.0.0.1:10002 --cluster-slave --cluster-master-id 64e634307bdc339b503574f5a77f1b156c021358 -a ******
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Adding node 127.0.0.1:10005 to cluster 127.0.0.1:10002
>>> Performing Cluster Check (using node 127.0.0.1:10002)
M: 64e634307bdc339b503574f5a77f1b156c021358 127.0.0.1:10002
   slots:[5461-10922] (5462 slots) master
S: 026f0179631f50ca858d46c2b2829b3af71af2c8 127.0.0.1:10004
   slots: (0 slots) slave
   replicates 6164025849a8ff9297664fc835bc851af5004f61
M: 8b75325c59a7242344d0ebe5ee1e0068c66ffa2a 127.0.0.1:10003
   slots:[10923-16383] (5461 slots) master
M: 6164025849a8ff9297664fc835bc851af5004f61 127.0.0.1:10001
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 127.0.0.1:10005 to make it join the cluster.
Waiting for the cluster to join
>>> Configure node as replica of 127.0.0.1:10002.
[OK] New node added correctly.
0
redis-cli --cluster add-node 127.0.0.1:10006 127.0.0.1:10003 --cluster-slave --cluster-master-id 8b75325c59a7242344d0ebe5ee1e0068c66ffa2a -a ******
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Adding node 127.0.0.1:10006 to cluster 127.0.0.1:10003
>>> Performing Cluster Check (using node 127.0.0.1:10003)
M: 8b75325c59a7242344d0ebe5ee1e0068c66ffa2a 127.0.0.1:10003
   slots:[10923-16383] (5461 slots) master
M: 64e634307bdc339b503574f5a77f1b156c021358 127.0.0.1:10002
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: 23e1871c4e1dc1047ce567326e74a6194589146c 127.0.0.1:10005
   slots: (0 slots) slave
   replicates 64e634307bdc339b503574f5a77f1b156c021358
M: 6164025849a8ff9297664fc835bc851af5004f61 127.0.0.1:10001
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
S: 026f0179631f50ca858d46c2b2829b3af71af2c8 127.0.0.1:10004
   slots: (0 slots) slave
   replicates 6164025849a8ff9297664fc835bc851af5004f61
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 127.0.0.1:10006 to make it join the cluster.
Waiting for the cluster to join
>>> Configure node as replica of 127.0.0.1:10003.
[OK] New node added correctly.
0
################# cluster nodes info: #################
8b75325c59a7242344d0ebe5ee1e0068c66ffa2a 127.0.0.1:10003@20003 myself,master - 0 1575947748000 53 connected 10923-16383
64e634307bdc339b503574f5a77f1b156c021358 127.0.0.1:10002@20002 master - 0 1575947748000 52 connected 5461-10922
23e1871c4e1dc1047ce567326e74a6194589146c 127.0.0.1:10005@20005 slave 64e634307bdc339b503574f5a77f1b156c021358 0 1575947746000 52 connected
6164025849a8ff9297664fc835bc851af5004f61 127.0.0.1:10001@20001 master - 0 1575947748103 51 connected 0-5460
026f0179631f50ca858d46c2b2829b3af71af2c8 127.0.0.1:10004@20004 slave 6164025849a8ff9297664fc835bc851af5004f61 0 1575947749000 51 connected
9f265545ebb799d2773cfc20c71705cff9d733ae 127.0.0.1:10006@20006 slave 8b75325c59a7242344d0ebe5ee1e0068c66ffa2a 0 1575947749105 53 connected
[root@JD redis_install]#

Scaling out the Redis cluster

Scaling out takes two main steps:
1. Add the new master node, and add a slave for it.
2. Reshard slots onto the newly added master.

The main commands are:

Add the master node to the cluster
redis-cli --cluster add-node 127.0.0.1:10007 127.0.0.1:10001 -a ******
Add a slave for the new master
redis-cli --cluster add-node 127.0.0.1:10008 127.0.0.1:10007 --cluster-slave --cluster-master-id 3645e00a8ec3a902bd6effb4fc20c56a00f2c982 -a ******

Reshard the slots
############################ execute reshard #########################################
redis-cli -a redis@password --cluster reshard 127.0.0.1:10001 --cluster-from 6164025849a8ff9297664fc835bc851af5004f61 --cluster-to 3645e00a8ec3a902bd6effb4fc20c56a00f2c982 --cluster-slots 1365 --cluster-yes --cluster-timeout 50000 --cluster-pipeline 10000 --cluster-replace >/dev/null 2>&1
############################ execute reshard #########################################
redis-cli -a redis@password --cluster reshard 127.0.0.1:10002 --cluster-from 64e634307bdc339b503574f5a77f1b156c021358 --cluster-to 3645e00a8ec3a902bd6effb4fc20c56a00f2c982 --cluster-slots 1365 --cluster-yes --cluster-timeout 50000 --cluster-pipeline 10000 --cluster-replace >/dev/null 2>&1
############################ execute reshard #########################################
redis-cli -a redis@password --cluster reshard 127.0.0.1:10003 --cluster-from 8b75325c59a7242344d0ebe5ee1e0068c66ffa2a --cluster-to 3645e00a8ec3a902bd6effb4fc20c56a00f2c982 --cluster-slots 1365 --cluster-yes --cluster-timeout 50000 --cluster-pipeline 10000 --cluster-replace >/dev/null 2>&1

################# cluster nodes info: #################
026f0179631f50ca858d46c2b2829b3af71af2c8 127.0.0.1:10004@20004 slave 6164025849a8ff9297664fc835bc851af5004f61 0 1575960493000 64 connected
9f265545ebb799d2773cfc20c71705cff9d733ae 127.0.0.1:10006@20006 slave 8b75325c59a7242344d0ebe5ee1e0068c66ffa2a 0 1575960493849 66 connected
64e634307bdc339b503574f5a77f1b156c021358 127.0.0.1:10002@20002 master - 0 1575960494852 65 connected 6826-10922
23e1871c4e1dc1047ce567326e74a6194589146c 127.0.0.1:10005@20005 slave 64e634307bdc339b503574f5a77f1b156c021358 0 1575960492000 65 connected
4854375c501c3dbfb4e2d94d50e62a47520c4f12 127.0.0.1:10008@20008 slave 3645e00a8ec3a902bd6effb4fc20c56a00f2c982 0 1575960493000 67 connected
8b75325c59a7242344d0ebe5ee1e0068c66ffa2a 127.0.0.1:10003@20003 master - 0 1575960493000 66 connected 12288-16383
3645e00a8ec3a902bd6effb4fc20c56a00f2c982 127.0.0.1:10007@20007 myself,master - 0 1575960493000 67 connected 0-1364 5461-6825 10923-12287
6164025849a8ff9297664fc835bc851af5004f61 127.0.0.1:10001@20001 master - 0 1575960492848 64 connected 1365-5460
As the output shows, slots were successfully reassigned to the new node, so the scale-out succeeded.
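
The value 1365 passed to --cluster-slots follows from simple slot arithmetic. The sketch below mirrors the get_migrated_slot function in the full script at the end of this article. The function name used here is illustrative, and the calculation assumes the existing masters hold roughly equal shares of the 16384 slots.

```python
TOTAL_SLOTS = 16384


def slots_to_migrate(current_masters, added_masters=1):
    """Slots each existing master gives up so that all masters end up roughly equal."""
    share_before = TOTAL_SLOTS // current_masters
    share_after = TOTAL_SLOTS // (current_masters + added_masters)
    return share_before - share_after


per_master = slots_to_migrate(3)  # 16384 // 3 - 16384 // 4 = 5461 - 4096
print(per_master)                 # 1365
print(per_master * 3)             # 4095 slots end up on the new master
```

This agrees with the cluster nodes output above: node 10007 holds 0-1364, 5461-6825, and 10923-12287, which is three ranges of 1365 slots each.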

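Two things need attention when this is automated:
1. After add-node (whether for a master or a slave), sleep long enough (20 seconds here) for every node in the cluster to meet the new node. Otherwise the scale-out fails.
2. After each reshard onto the new node, sleep long enough (20 seconds here) as well. Otherwise going on to reshard slots from the next node makes the previous reshard fail.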
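
A fixed 20-second sleep is what this article uses. One possible alternative, sketched below, is to poll each node's CLUSTER INFO output until every node reports cluster_state:ok and the expected cluster_known_nodes count. This is an assumption, not something tested in this article: the helper names are hypothetical, the sample text is made up for illustration, and whether such a check can fully replace the sleep (especially between reshards) is not verified here.

```python
import time


def parse_cluster_info(text):
    """Turn raw `CLUSTER INFO` text (key:value per line) into a dict of strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:
            info[key] = value
    return info


def cluster_settled(info_texts, expected_nodes):
    """True when every node reports state ok and knows exactly expected_nodes nodes."""
    for text in info_texts:
        info = parse_cluster_info(text)
        if info.get("cluster_state") != "ok":
            return False
        if int(info.get("cluster_known_nodes", 0)) != expected_nodes:
            return False
    return True


def wait_until(check, timeout=60.0, interval=1.0):
    """Call check() repeatedly until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Illustrative sample only; in practice the text would come from each node,
# for example via `redis-cli -p <port> cluster info`.
sample = "cluster_state:ok\r\ncluster_slots_assigned:16384\r\ncluster_known_nodes:7\r\n"
print(cluster_settled([sample, sample], expected_nodes=7))
```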

The whole process looks like this:

[root@JD redis_install]# python3 create_redis_cluster.py
#########################cleanup instance#################################
#########################add node into cluster#################################
 redis-cli --cluster add-node 127.0.0.1:10007 127.0.0.1:10001  -a redis@password
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Adding node 127.0.0.1:10007 to cluster 127.0.0.1:10001
>>> Performing Cluster Check (using node 127.0.0.1:10001)
M: 6164025849a8ff9297664fc835bc851af5004f61 127.0.0.1:10001
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
S: 9f265545ebb799d2773cfc20c71705cff9d733ae 127.0.0.1:10006
   slots: (0 slots) slave
   replicates 8b75325c59a7242344d0ebe5ee1e0068c66ffa2a
M: 8b75325c59a7242344d0ebe5ee1e0068c66ffa2a 127.0.0.1:10003
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: 026f0179631f50ca858d46c2b2829b3af71af2c8 127.0.0.1:10004
   slots: (0 slots) slave
   replicates 6164025849a8ff9297664fc835bc851af5004f61
S: 23e1871c4e1dc1047ce567326e74a6194589146c 127.0.0.1:10005
   slots: (0 slots) slave
   replicates 64e634307bdc339b503574f5a77f1b156c021358
M: 64e634307bdc339b503574f5a77f1b156c021358 127.0.0.1:10002
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 127.0.0.1:10007 to make it join the cluster.
[OK] New node added correctly.
0
 redis-cli --cluster add-node 127.0.0.1:10008 127.0.0.1:10007 --cluster-slave --cluster-master-id 3645e00a8ec3a902bd6effb4fc20c56a00f2c982 -a ******
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Adding node 127.0.0.1:10008 to cluster 127.0.0.1:10007
>>> Performing Cluster Check (using node 127.0.0.1:10007)
M: 3645e00a8ec3a902bd6effb4fc20c56a00f2c982 127.0.0.1:10007
   slots: (0 slots) master
S: 026f0179631f50ca858d46c2b2829b3af71af2c8 127.0.0.1:10004
   slots: (0 slots) slave
   replicates 6164025849a8ff9297664fc835bc851af5004f61
S: 9f265545ebb799d2773cfc20c71705cff9d733ae 127.0.0.1:10006
   slots: (0 slots) slave
   replicates 8b75325c59a7242344d0ebe5ee1e0068c66ffa2a
M: 64e634307bdc339b503574f5a77f1b156c021358 127.0.0.1:10002
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: 23e1871c4e1dc1047ce567326e74a6194589146c 127.0.0.1:10005
   slots: (0 slots) slave
   replicates 64e634307bdc339b503574f5a77f1b156c021358
M: 8b75325c59a7242344d0ebe5ee1e0068c66ffa2a 127.0.0.1:10003
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
M: 6164025849a8ff9297664fc835bc851af5004f61 127.0.0.1:10001
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 127.0.0.1:10008 to make it join the cluster.
Waiting for the cluster to join
>>> Configure node as replica of 127.0.0.1:10007.
[OK] New node added correctly.
0
#########################reshard slots#################################
############################ execute reshard #########################################
redis-cli -a redis@password --cluster reshard 127.0.0.1:10001 --cluster-from 6164025849a8ff9297664fc835bc851af5004f61 --cluster-to 3645e00a8ec3a902bd6effb4fc20c56a00f2c982 --cluster-slots 1365 --cluster-yes --cluster-timeout 50000 --cluster-pipeline 10000   --cluster-replace  >/dev/null 2>&1
############################ execute reshard #########################################
redis-cli -a redis@password --cluster reshard 127.0.0.1:10002 --cluster-from 64e634307bdc339b503574f5a77f1b156c021358 --cluster-to 3645e00a8ec3a902bd6effb4fc20c56a00f2c982 --cluster-slots 1365 --cluster-yes --cluster-timeout 50000 --cluster-pipeline 10000   --cluster-replace  >/dev/null 2>&1
############################ execute reshard #########################################
redis-cli -a redis@password --cluster reshard 127.0.0.1:10003 --cluster-from 8b75325c59a7242344d0ebe5ee1e0068c66ffa2a --cluster-to 3645e00a8ec3a902bd6effb4fc20c56a00f2c982 --cluster-slots 1365 --cluster-yes --cluster-timeout 50000 --cluster-pipeline 10000   --cluster-replace  >/dev/null 2>&1
################# cluster nodes info: #################
026f0179631f50ca858d46c2b2829b3af71af2c8 127.0.0.1:10004@20004 slave 6164025849a8ff9297664fc835bc851af5004f61 0 1575960493000 64 connected
9f265545ebb799d2773cfc20c71705cff9d733ae 127.0.0.1:10006@20006 slave 8b75325c59a7242344d0ebe5ee1e0068c66ffa2a 0 1575960493849 66 connected
64e634307bdc339b503574f5a77f1b156c021358 127.0.0.1:10002@20002 master - 0 1575960494852 65 connected 6826-10922
23e1871c4e1dc1047ce567326e74a6194589146c 127.0.0.1:10005@20005 slave 64e634307bdc339b503574f5a77f1b156c021358 0 1575960492000 65 connected
4854375c501c3dbfb4e2d94d50e62a47520c4f12 127.0.0.1:10008@20008 slave 3645e00a8ec3a902bd6effb4fc20c56a00f2c982 0 1575960493000 67 connected
8b75325c59a7242344d0ebe5ee1e0068c66ffa2a 127.0.0.1:10003@20003 master - 0 1575960493000 66 connected 12288-16383
3645e00a8ec3a902bd6effb4fc20c56a00f2c982 127.0.0.1:10007@20007 myself,master - 0 1575960493000 67 connected 0-1364 5461-6825 10923-12287
6164025849a8ff9297664fc835bc851af5004f61 127.0.0.1:10001@20001 master - 0 1575960492848 64 connected 1365-5460
[root@JD redis_install]#

Scaling in the Redis cluster

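In principle, scaling in is the reverse of scaling out.
That much is visible from the command itself: del-node host:port node_id  # removes the given node and shuts that node's service down once the removal succeeds.
But scaling in should just be scaling in: removing a master from the cluster (cluster forget nodeid) is enough, so why shut it down as well? For that reason this article does not scale in with redis-cli --cluster del-node, but with ordinary commands instead.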

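The custom scale-in here really consists of two steps:
1. Move the slots of the master being removed back to the other nodes in the cluster. The test shrinks four masters down to three, and the commands actually executed are shown below.
2. Run cluster forget master_node_id (and slave_node_id) on each node in the cluster in turn.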

############################ execute reshard #########################################
redis-cli -a ****** --cluster reshard 127.0.0.1:10001 --cluster-from 3645e00a8ec3a902bd6effb4fc20c56a00f2c982 --cluster-to 6164025849a8ff9297664fc835bc851af5004f61 --cluster-slots 1365 --cluster-yes --cluster-timeout 50000 --cluster-pipeline 10000 --cluster-replace >/dev/null 2>&1
############################ execute reshard #########################################
redis-cli -a ****** --cluster reshard 127.0.0.1:10002 --cluster-from 3645e00a8ec3a902bd6effb4fc20c56a00f2c982 --cluster-to 64e634307bdc339b503574f5a77f1b156c021358 --cluster-slots 1365 --cluster-yes --cluster-timeout 50000 --cluster-pipeline 10000 --cluster-replace >/dev/null 2>&1
############################ execute reshard #########################################
redis-cli -a ****** --cluster reshard 127.0.0.1:10003 --cluster-from 3645e00a8ec3a902bd6effb4fc20c56a00f2c982 --cluster-to 8b75325c59a7242344d0ebe5ee1e0068c66ffa2a --cluster-slots 1365 --cluster-yes --cluster-timeout 50000 --cluster-pipeline 10000 --cluster-replace >/dev/null 2>&1
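
The script in this article hard-codes 1365 slots per receiving master, which works because node 10007 holds exactly 4095 slots (3 × 1365). As a hedged sketch for the general case, the hypothetical helper below splits any slot count across the receiving masters so that nothing is left behind on the node being removed.

```python
def split_slots(total_slots, receivers):
    """Split total_slots into `receivers` shares that differ by at most one slot."""
    base, extra = divmod(total_slots, receivers)
    return [base + 1 if index < extra else base for index in range(receivers)]


print(split_slots(4095, 3))  # [1365, 1365, 1365], the case in this article
print(split_slots(4096, 3))  # [1366, 1365, 1365]
```

Each value would then be used as the --cluster-slots argument of the corresponding reshard command.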

{'host': '127.0.0.1', 'port': 10001, 'password': '******'}--->cluster forget 3645e00a8ec3a902bd6effb4fc20c56a00f2c982
{'host': '127.0.0.1', 'port': 10001, 'password': '******'}--->cluster forget 4854375c501c3dbfb4e2d94d50e62a47520c4f12
{'host': '127.0.0.1', 'port': 10002, 'password': '******'}--->cluster forget 3645e00a8ec3a902bd6effb4fc20c56a00f2c982
{'host': '127.0.0.1', 'port': 10002, 'password': '******'}--->cluster forget 4854375c501c3dbfb4e2d94d50e62a47520c4f12
{'host': '127.0.0.1', 'port': 10003, 'password': '******'}--->cluster forget 3645e00a8ec3a902bd6effb4fc20c56a00f2c982
{'host': '127.0.0.1', 'port': 10003, 'password': '******'}--->cluster forget 4854375c501c3dbfb4e2d94d50e62a47520c4f12
{'host': '127.0.0.1', 'port': 10004, 'password': '******'}--->cluster forget 3645e00a8ec3a902bd6effb4fc20c56a00f2c982
{'host': '127.0.0.1', 'port': 10004, 'password': '******'}--->cluster forget 4854375c501c3dbfb4e2d94d50e62a47520c4f12
{'host': '127.0.0.1', 'port': 10005, 'password': '******'}--->cluster forget 3645e00a8ec3a902bd6effb4fc20c56a00f2c982
{'host': '127.0.0.1', 'port': 10005, 'password': '******'}--->cluster forget 4854375c501c3dbfb4e2d94d50e62a47520c4f12
{'host': '127.0.0.1', 'port': 10006, 'password': '******'}--->cluster forget 3645e00a8ec3a902bd6effb4fc20c56a00f2c982
{'host': '127.0.0.1', 'port': 10006, 'password': '******'}--->cluster forget 4854375c501c3dbfb4e2d94d50e62a47520c4f12
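
The forget step above touches every remaining node, masters and slaves alike, once for each removed node ID. A minimal sketch of building that plan is shown below. The function name plan_forget is hypothetical, and the sketch only produces the command strings without sending them.

```python
def plan_forget(remaining_nodes, removed_ids):
    """Return (node, command) pairs: every remaining node forgets every removed ID."""
    plan = []
    for node in remaining_nodes:
        for node_id in removed_ids:
            plan.append((node, "cluster forget {}".format(node_id)))
    return plan


remaining = [{"host": "127.0.0.1", "port": port} for port in range(10001, 10007)]
removed = [
    "3645e00a8ec3a902bd6effb4fc20c56a00f2c982",  # master 10007
    "4854375c501c3dbfb4e2d94d50e62a47520c4f12",  # slave 10008
]

for node, command in plan_forget(remaining, removed):
    print("{}:{}--->{}".format(node["host"], node["port"], command))
```

With six remaining nodes and two removed IDs this yields twelve commands, matching the twelve lines of output above.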

The complete run is as follows:

[root@JD redis_install]# python3 create_redis_cluster.py
############################ execute reshard #########################################
redis-cli -a ****** --cluster reshard 127.0.0.1:10001 --cluster-from 3645e00a8ec3a902bd6effb4fc20c56a00f2c982 --cluster-to 6164025849a8ff9297664fc835bc851af5004f61 --cluster-slots 1365 --cluster-yes --cluster-timeout 50000 --cluster-pipeline 10000 --cluster-replace >/dev/null 2>&1
############################ execute reshard #########################################
redis-cli -a ****** --cluster reshard 127.0.0.1:10002 --cluster-from 3645e00a8ec3a902bd6effb4fc20c56a00f2c982 --cluster-to 64e634307bdc339b503574f5a77f1b156c021358 --cluster-slots 1365 --cluster-yes --cluster-timeout 50000 --cluster-pipeline 10000 --cluster-replace >/dev/null 2>&1
############################ execute reshard #########################################
redis-cli -a ****** --cluster reshard 127.0.0.1:10003 --cluster-from 3645e00a8ec3a902bd6effb4fc20c56a00f2c982 --cluster-to 8b75325c59a7242344d0ebe5ee1e0068c66ffa2a --cluster-slots 1365 --cluster-yes --cluster-timeout 50000 --cluster-pipeline 10000 --cluster-replace >/dev/null 2>&1
{'host': '127.0.0.1', 'port': 10001, 'password': '******'}--->cluster forget 3645e00a8ec3a902bd6effb4fc20c56a00f2c982
{'host': '127.0.0.1', 'port': 10001, 'password': '******'}--->cluster forget 4854375c501c3dbfb4e2d94d50e62a47520c4f12
{'host': '127.0.0.1', 'port': 10002, 'password': '******'}--->cluster forget 3645e00a8ec3a902bd6effb4fc20c56a00f2c982
{'host': '127.0.0.1', 'port': 10002, 'password': '******'}--->cluster forget 4854375c501c3dbfb4e2d94d50e62a47520c4f12
{'host': '127.0.0.1', 'port': 10003, 'password': '******'}--->cluster forget 3645e00a8ec3a902bd6effb4fc20c56a00f2c982
{'host': '127.0.0.1', 'port': 10003, 'password': '******'}--->cluster forget 4854375c501c3dbfb4e2d94d50e62a47520c4f12
{'host': '127.0.0.1', 'port': 10004, 'password': '******'}--->cluster forget 3645e00a8ec3a902bd6effb4fc20c56a00f2c982
{'host': '127.0.0.1', 'port': 10004, 'password': '******'}--->cluster forget 4854375c501c3dbfb4e2d94d50e62a47520c4f12
{'host': '127.0.0.1', 'port': 10005, 'password': '******'}--->cluster forget 3645e00a8ec3a902bd6effb4fc20c56a00f2c982
{'host': '127.0.0.1', 'port': 10005, 'password': '******'}--->cluster forget 4854375c501c3dbfb4e2d94d50e62a47520c4f12
{'host': '127.0.0.1', 'port': 10006, 'password': '******'}--->cluster forget 3645e00a8ec3a902bd6effb4fc20c56a00f2c982
{'host': '127.0.0.1', 'port': 10006, 'password': '******'}--->cluster forget 4854375c501c3dbfb4e2d94d50e62a47520c4f12
################# cluster nodes info: #################
23e1871c4e1dc1047ce567326e74a6194589146c 127.0.0.1:10005@20005 slave 64e634307bdc339b503574f5a77f1b156c021358 0 1575968426000 76 connected
026f0179631f50ca858d46c2b2829b3af71af2c8 127.0.0.1:10004@20004 slave 6164025849a8ff9297664fc835bc851af5004f61 0 1575968422619 75 connected
6164025849a8ff9297664fc835bc851af5004f61 127.0.0.1:10001@20001 myself,master - 0 1575968426000 75 connected 0-5460
9f265545ebb799d2773cfc20c71705cff9d733ae 127.0.0.1:10006@20006 slave 8b75325c59a7242344d0ebe5ee1e0068c66ffa2a 0 1575968425000 77 connected
8b75325c59a7242344d0ebe5ee1e0068c66ffa2a 127.0.0.1:10003@20003 master - 0 1575968427626 77 connected 10923-16383
64e634307bdc339b503574f5a77f1b156c021358 127.0.0.1:10002@20002 master - 0 1575968426000 76 connected 5461-10922

[root@JD redis_install]#

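That is not quite the end of the story: after scaling in, every node remaining in the cluster must successfully execute cluster forget master_node_id (and slave_node_id).
Otherwise the other nodes still hold heartbeat information for node 10007, and after about a minute the already-evicted node 10007 (and its slave 10008) gets added back.
I ran into exactly this odd problem at first: because cluster forget had not been executed on the slave nodes of the shrunken cluster, the removed nodes kept being added back.
See: http://www.redis.cn/commands/cluster-forget.html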
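
The CLUSTER FORGET documentation linked above describes a ban list that keeps a forgotten node from being re-added for 60 seconds, which is why the forget has to reach every node within that window. One way to confirm the result, sketched here under the assumption that you collect the raw cluster nodes text from each remaining node, is to check that no removed node ID still appears. The helper names are hypothetical.

```python
def known_node_ids(cluster_nodes_text):
    """Collect the node IDs (first field of each line) from `cluster nodes` output."""
    ids = set()
    for line in cluster_nodes_text.splitlines():
        fields = line.split()
        if fields:
            ids.add(fields[0])
    return ids


def still_known(cluster_nodes_text, removed_ids):
    """Return the removed IDs that this node has not forgotten yet."""
    return sorted(set(removed_ids) & known_node_ids(cluster_nodes_text))


removed = [
    "3645e00a8ec3a902bd6effb4fc20c56a00f2c982",
    "4854375c501c3dbfb4e2d94d50e62a47520c4f12",
]

# Two lines taken from the final output above: the removed IDs are gone.
clean_view = (
    "6164025849a8ff9297664fc835bc851af5004f61 127.0.0.1:10001@20001 myself,master - 0 1575968426000 75 connected 0-5460\n"
    "64e634307bdc339b503574f5a77f1b156c021358 127.0.0.1:10002@20002 master - 0 1575968426000 76 connected 5461-10922\n"
)
# A view that still lists slave 10008 (line taken from the scale-out output).
stale_view = clean_view + (
    "4854375c501c3dbfb4e2d94d50e62a47520c4f12 127.0.0.1:10008@20008 slave 3645e00a8ec3a902bd6effb4fc20c56a00f2c982 0 1575960493000 67 connected\n"
)

print(still_known(clean_view, removed))
print(still_known(stale_view, removed))
```

Only the first field of each line is compared, so a removed master ID that shows up merely as the master-id field of a slave line is not counted.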

The complete implementation is as follows:

import os
import time
import redis
from time import ctime,sleep


def create_redis_cluster(list_master_node,list_slave_node):
    print('################# flush master/slave slots #################')
    for node in list_master_node:
        current_conn = redis.StrictRedis(host=node["host"], port=node["port"], password=node["password"], decode_responses=True)
        current_conn.execute_command('flushall')
        current_conn.execute_command('cluster reset')
    for node in list_slave_node:
        current_conn = redis.StrictRedis(host=node["host"], port=node["port"], password=node["password"], decode_responses=True)
        #current_conn.execute_command('flushall')
        current_conn.execute_command('cluster reset')
    print('################# create cluster #################')
    master_nodes = ''
    for node in list_master_node:
        master_nodes = master_nodes + node["host"] + ':' + str(node["port"]) + ' '
    command = "redis-cli --cluster create {0}  -a ****** --cluster-yes".format(master_nodes)
    print(command)
    msg = os.system(command)
    print(msg)
    time.sleep(5)
    print('################# add slave nodes #################')
    counter = 0
    for node in list_master_node:
        current_conn = redis.StrictRedis(host=node["host"], port=node["port"], password=node["password"], decode_responses=True)
        current_master_node = node["host"] + ':' + str(node["port"])
        current_slave_node = list_slave_node[counter]["host"] + ':' + str(list_slave_node[counter]["port"])
        myid = current_conn.cluster('myid')
        # the slave node comes first, the master node second
        command = "redis-cli --cluster add-node {0} {1} --cluster-slave --cluster-master-id {2} -a ****** ".format(
            current_slave_node,current_master_node,myid)
        print(command)
        msg = os.system(command)
        counter = counter + 1
        print(msg)
    # show cluster nodes info
    time.sleep(10)
    print("################# cluster nodes info: #################")
    cluster_nodes = current_conn.execute_command('cluster nodes')
    print(cluster_nodes)


# Returns how many slots each original master must migrate out after scale-out
def get_migrated_slot(list_master_node,n):
    migrated_slot_count = int(16384/len(list_master_node)) - int(16384/(len(list_master_node)+n))
    return migrated_slot_count


def redis_cluster_expansion(list_master_node,dict_master_node,dict_slave_node):
    new_master_node =  dict_master_node["host"] + ':' + str(dict_master_node["port"])
    new_slave_node = dict_slave_node["host"] + ':' + str(dict_slave_node["port"])
    print("#########################cleanup instance#################################")
    new_master_conn = redis.StrictRedis(host=dict_master_node["host"], port=dict_master_node["port"], password=dict_master_node["password"], decode_responses=True)
    new_master_conn.execute_command('flushall')
    new_master_conn.execute_command('cluster reset')
    new_master_id = new_master_conn.cluster('myid')
    new_slave_conn = redis.StrictRedis(host=dict_slave_node["host"], port=dict_slave_node["port"], password=dict_slave_node["password"], decode_responses=True)
    new_slave_conn.execute_command('cluster reset')
    new_slave_id = new_slave_conn.cluster('myid')
    #new_slave_conn.execute_command('slaveof no one')
    # Check whether the new nodes already belong to the current cluster.
    # If a node already belongs to the cluster and holds no slots, evict it first with
    # cluster forget nodeid (or abort with a warning, whichever you prefer).
    # Connect to any node in the cluster
    cluster_node_conn = redis.StrictRedis(host=list_master_node[0]["host"], port=list_master_node[0]["port"], password=list_master_node[0]["password"],decode_responses=True)
    dict_node_info = cluster_node_conn.cluster('nodes')
    '''dict_node_info format example :
    {
    '127.0.0.1:10008@20008': {'node_id': '1d10c3ce3b9b7f956a26122980827fe6ce623d22', 'flags': 'master', 'master_id': '-','last_ping_sent': '0', 'last_pong_rcvd': '1575599442000', 'epoch': '8', 'slots': [], 'connected': True},
    '127.0.0.1:10002@20002': {'node_id': '64e634307bdc339b503574f5a77f1b156c021358', 'flags': 'master', 'master_id': '-', 'last_ping_sent': '0', 'last_pong_rcvd': '1575599442000', 'epoch': '7', 'slots': [['5461', '10922']], 'connected': True},
    '127.0.0.1:10001@20001': {'node_id': '6164025849a8ff9297664fc835bc851af5004f61', 'flags': 'myself,master', 'master_id': '-', 'last_ping_sent': '0', 'last_pong_rcvd': '1575599438000', 'epoch': '6', 'slots': [['0', '5460']], 'connected': True},
    '127.0.0.1:10007@20007': {'node_id': '307f589ec7b1eb7bd65c680527afef1e30ce2303', 'flags': 'master', 'master_id': '-', 'last_ping_sent': '0', 'last_pong_rcvd': '1575599443599', 'epoch': '5', 'slots': [], 'connected': True},
    '127.0.0.1:10005@20005': {'node_id': '23e1871c4e1dc1047ce567326e74a6194589146c', 'flags': 'slave', 'master_id': '64e634307bdc339b503574f5a77f1b156c021358', 'last_ping_sent': '0', 'last_pong_rcvd': '1575599441000', 'epoch': '7', 'slots': [], 'connected': True},
    '127.0.0.1:10004@20004': {'node_id': '026f0179631f50ca858d46c2b2829b3af71af2c8', 'flags': 'slave', 'master_id': '6164025849a8ff9297664fc835bc851af5004f61', 'last_ping_sent': '0', 'last_pong_rcvd': '1575599440000', 'epoch': '6', 'slots': [], 'connected': True},
    '127.0.0.1:10006@20006': {'node_id': '9f265545ebb799d2773cfc20c71705cff9d733ae', 'flags': 'slave', 'master_id': '8b75325c59a7242344d0ebe5ee1e0068c66ffa2a', 'last_ping_sent': '0', 'last_pong_rcvd': '1575599442000', 'epoch': '8', 'slots': [], 'connected': True},
    '127.0.0.1:10003@20003': {'node_id': '8b75325c59a7242344d0ebe5ee1e0068c66ffa2a', 'flags': 'master', 'master_id': '-', 'last_ping_sent': '0', 'last_pong_rcvd': '1575599442599', 'epoch': '8', 'slots': [['10923', '16383']], 'connected': True}
    }
    '''
    dict_master_node_in_cluster = 0
    dict_slave_node_in_cluster = 0
    for key_node in dict_node_info:
        if new_master_node in key_node:
            dict_master_node_in_cluster = 1
            if len(dict_node_info[key_node]['slots']) > 0:
                print('error: ' +new_master_node + ' already existing in cluster and alloted slots,execute break......')
                return
        if new_slave_node in key_node:
            dict_slave_node_in_cluster = 1
            if len(dict_node_info[key_node]['slots']) > 0:
                print('error: ' +new_slave_node + ' already existing in cluster and alloted slots,execute break......')
                return
    if dict_master_node_in_cluster == 1:
        for master_node in list_master_node:
            key_node_conn = redis.StrictRedis(host=master_node["host"], port=master_node["port"],password=master_node["password"], decode_responses=True)
            print('warning: ' + new_master_node + ' already existing in cluster,cluster forget it......')
            forget_command = 'cluster forget {0}'.format(new_master_id)
            key_node_conn.execute_command(forget_command)
    if dict_slave_node_in_cluster == 1:
        for master_node in list_master_node:
            key_node_conn = redis.StrictRedis(host=master_node["host"], port=master_node["port"],password=master_node["password"], decode_responses=True)
            print('warning: ' + new_slave_node + ' already existing in cluster,forget it......')
            forget_command = 'cluster forget {0}'.format(new_slave_id)
            key_node_conn.execute_command(forget_command)
    print("#########################add node into cluster#################################")
    try:
        cluster_node = list_master_node[0]["host"] + ':' + str(list_master_node[0]["port"])
        # 1. the node being added comes first; the second argument is any node already in the cluster
        add_node_command = " redis-cli --cluster add-node {0} {1}  -a ****** ".format(new_master_node,cluster_node)
        print(add_node_command)
        print(os.system(add_node_command))
        time.sleep(20)
        # the slave node comes first, the master node second
        add_node_command = " redis-cli --cluster add-node {0} {1} --cluster-slave --cluster-master-id {2} -a ****** ".format(
            new_slave_node,new_master_node,new_master_id)
        print(add_node_command)
        print(os.system(add_node_command))
        time.sleep(20)
    except Exception as e:
        print('add new node error,the reason is:')
        print(e)
    print("#########################reshard slots#################################")
    migrated_slot_count = get_migrated_slot(list_master_node,1)
    for node in list_master_node:
        current_master_conn = redis.StrictRedis(host=node["host"], port=node["port"], password=node["password"], decode_responses=True)
        current_master_node = node["host"] + ':' + str(node["port"])
        current_master_node_id = current_master_conn.cluster('myid')
        # example: scaling 3 nodes --> 4 nodes, each master migrates 1365 slots
        try:
            command = r'''redis-cli -a ****** --cluster reshard {0} --cluster-from {1} --cluster-to {2} --cluster-slots {3} --cluster-yes --cluster-timeout 50000 --cluster-pipeline 10000   --cluster-replace  >/dev/null 2>&1 '''.format(
                current_master_node,current_master_node_id,new_master_id,migrated_slot_count)
            print('############################ execute reshard #########################################')
            print(command)
            msg = os.system(command)
            time.sleep(20)
        except Exception as e:
            print('reshard slots error,the reason is:')
            print(e)
    print("################# cluster nodes info: #################")
    cluster_nodes = new_master_conn.execute_command('cluster nodes')
    print(cluster_nodes)


def redis_cluster_shrinkage(list_master_node,list_slave_node,dict_master_node,dict_slave_node):
    # Check whether the nodes to be removed belong to the current cluster;
    # if they do not, exit
    cluster_node_conn = redis.StrictRedis(host=list_master_node[0]["host"], port=list_master_node[0]["port"], password=list_master_node[0]["password"],decode_responses=True)
    dict_node_info = cluster_node_conn.cluster('nodes')
    removed_master_node = dict_master_node["host"] + ':' + str(dict_master_node["port"])+'@'+str(dict_master_node["port"]+10000)
    removed_slave_node = dict_slave_node["host"] + ':' + str(dict_slave_node["port"])+'@'+str(dict_slave_node["port"]+10000)
    if removed_master_node not in dict_node_info:
        print('Error:'+ str(removed_master_node) +' not in cluster,exiting')
        return
    if removed_slave_node not in dict_node_info:
        print('Error:' + str(removed_slave_node) + ' not in cluster,exiting')
        return
    removed_master_conn = redis.StrictRedis(host=dict_master_node["host"], port=dict_master_node["port"], password=dict_master_node["password"], decode_responses=True)
    removed_master_id = removed_master_conn.cluster('myid')
    removed_slave_conn = redis.StrictRedis(host=dict_slave_node["host"], port=dict_slave_node["port"], password=dict_slave_node["password"], decode_responses=True)
    removed_slave_id = removed_slave_conn.cluster('myid')
    for node in list_master_node:
        current_master_conn = redis.StrictRedis(host=node["host"], port=node["port"], password=node["password"], decode_responses=True)
        current_master_node = node["host"] + ':' + str(node["port"])
        current_master_node_id = current_master_conn.cluster('myid')
        # 4 nodes --> shrink to 3 nodes: return the slots evenly to the three masters
        try:
            command = r'''redis-cli -a ****** --cluster reshard {0} --cluster-from {1} --cluster-to {2} --cluster-slots 1365 --cluster-yes --cluster-timeout 50000 --cluster-pipeline 10000   --cluster-replace  >/dev/null 2>&1 '''.format(
                current_master_node, removed_master_id, current_master_node_id)
            print('############################ execute reshard #########################################')
            print(command)
            msg = os.system(command)
            time.sleep(10)
        except Exception as e:
            print('reshard slots error,the reason is:')
            print(e)
    removed_master_conn.execute_command('cluster reset')
    removed_slave_conn.execute_command('cluster reset')
    for master_node in list_master_node:
        master_node_conn =  redis.StrictRedis(host=master_node["host"], port=master_node["port"],password=master_node["password"], decode_responses=True)
        foget_master_command = 'cluster forget {0}'.format(removed_master_id)
        foget_slave_command = 'cluster forget {0}'.format(removed_slave_id)
        print(str(master_node)+ '--->' + foget_master_command)
        print(str(master_node)+ '--->' + foget_slave_command)
        master_node_conn.execute_command(foget_master_command)
        master_node_conn.execute_command(foget_slave_command)
    for slave_node in list_slave_node:
        slave_node_conn = redis.StrictRedis(host=slave_node["host"], port=slave_node["port"], password=slave_node["password"], decode_responses=True)
        foget_master_command = 'cluster forget {0}'.format(removed_master_id)
        foget_slave_command = 'cluster forget {0}'.format(removed_slave_id)
        print(str(slave_node)+ '--->' +foget_master_command)
  print(str(slave_node)+ '--->' +foget_slave_command)        slave_node_conn.execute_command(foget_master_command)        slave_node_conn.execute_command(foget_slave_command)    print("################# cluster nodes info: #################")    cluster_nodes = cluster_node_conn.execute_command('cluster nodes')    print(cluster_nodes)if __name__ == '__main__':    # master    node_1 = {'host': '127.0.0.1', 'port': 10001, 'password': '******'}    node_2 = {'host': '127.0.0.1', 'port': 10002, 'password': '******'}    node_3 = {'host': '127.0.0.1', 'port': 10003, 'password': '******'}    # slave    node_4 = {'host': '127.0.0.1', 'port': 10004, 'password': '******'}    node_5 = {'host': '127.0.0.1', 'port': 10005, 'password': '******'}    node_6 = {'host': '127.0.0.1', 'port': 10006, 'password': '******'}    # 主从节点个数必须相同    list_master_node = [node_1, node_2, node_3]    list_slave_node = [node_4, node_5, node_6]        # 自动化集群创建    #create_redis_cluster(list_master_node,list_slave_node)    # 自动化扩容    node_1 = {'host': '127.0.0.1', 'port': 10007, 'password': '******'}    node_2 = {'host': '127.0.0.1', 'port': 10008, 'password': '******'}    redis_cluster_expansion(list_master_node,node_1,node_2)        # 自动化缩容,    #redis_cluster_shrinkage(list_master_node,list_slave_node,node_1,node_2)
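The scale-in step hard-codes `--cluster-slots 1365`, which only fits the 4-master to 3-master case, and `os.system` never raises when `redis-cli` fails, so the surrounding `try/except` cannot catch a failed reshard. The following is a minimal sketch of one way to address both points, not the original author's implementation. `split_slots`, `build_reshard_argv`, and `run_reshard` are hypothetical helper names introduced here for illustration. The sketch also assumes the number of slots held by the removed master is already known (for example, read from the `CLUSTER NODES` output), which is not shown.

```python
import subprocess


def split_slots(total_slots, target_count):
    """Split total_slots into target_count near-equal shares that sum to total_slots."""
    if target_count <= 0:
        raise ValueError("target_count must be positive")
    base, remainder = divmod(total_slots, target_count)
    # The first `remainder` targets get one extra slot, so no slot is left behind.
    return [base + 1 if i < remainder else base for i in range(target_count)]


def build_reshard_argv(entry_node, from_id, to_id, slots, password):
    """Build the redis-cli argument list (hypothetical helper).

    Only flags that already appear in the script above are used.
    """
    return [
        "redis-cli", "-a", password,
        "--cluster", "reshard", entry_node,
        "--cluster-from", from_id,
        "--cluster-to", to_id,
        "--cluster-slots", str(slots),
        "--cluster-yes",
    ]


def run_reshard(argv):
    """Run redis-cli and return True only when it exits with status 0."""
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        print("reshard failed:", result.stderr.strip())
    return result.returncode == 0
```

For example, `split_slots(4096, 3)` returns `[1366, 1365, 1365]`, so a master holding 4096 slots is fully drained, whereas three fixed moves of 1365 would leave one slot behind. Passing an argument list to `subprocess.run` also avoids going through a shell with the password embedded in a command string.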

 

Reference: https://www.cnblogs.com/zhoujinyi/p/11606935.html
