Redis Clients and Tooling
Pan Hailong (潘海龙), Ping An Health Internet (平安健康互联网), 2018-03-29
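Creating a cluster
Node availability checks:
1. Node connectivity
2. Whether cluster mode is enabled (cluster_enabled == 1?)
3. Whether the node already belongs to another cluster (cluster_known_nodes == 1?, i.e. the node knows only itself)
4. At least three usable nodes are available
Configuring slots and node roles:
1. Assign slots to the master nodes (CLUSTER ADDSLOTS); cluster_slots_assigned becomes non-zero, and cluster_state becomes ok once all 16384 slots are covered
2. Give each node a distinct config epoch (CLUSTER SET-CONFIG-EPOCH)
3. Join the nodes into the cluster (CLUSTER MEET)
4. Configure master/slave replication (CLUSTER REPLICATE)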
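The availability checks and the slot assignment above can be sketched as follows. This is a minimal illustration, not redis-trib.rb's code: `parse_info`, `node_is_usable` and `split_slots` are hypothetical helpers, the inputs are assumed to be the raw text of `INFO cluster` and `CLUSTER INFO`, and the even split is our own simple scheme, so its range boundaries may differ slightly from what redis-trib.rb produces.

```python
TOTAL_SLOTS = 16384  # Redis Cluster always has 16384 hash slots


def parse_info(text: str) -> dict:
    """Parse 'key:value' lines (CLUSTER INFO or the INFO cluster section) into a dict."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue  # skip blank lines and section headers such as '# Cluster'
        key, value = line.split(":", 1)
        result[key] = value
    return result


def node_is_usable(info_cluster: dict, cluster_info: dict) -> bool:
    """Checks 2 and 3: cluster mode is on, and the node knows only itself."""
    return (info_cluster.get("cluster_enabled") == "1"
            and cluster_info.get("cluster_known_nodes") == "1")


def split_slots(num_masters: int) -> list:
    """Check 4, then assign contiguous, near-equal slot ranges to the masters."""
    if num_masters < 3:
        raise ValueError("a Redis Cluster needs at least 3 master nodes")
    base, extra = divmod(TOTAL_SLOTS, num_masters)
    ranges, start = [], 0
    for i in range(num_masters):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first masters
        ranges.append((start, start + size - 1))
        start += size
    return ranges
```

Each `(first, last)` range would then be sent to its master with `CLUSTER ADDSLOTS`. Keeping the ranges contiguous makes the resulting slot map easy to read in `CLUSTER NODES` output.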
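Resharding
Example: move 1365 slots from six source nodes to node aa9f6da7fab5ee3de97320dd5f31a32797eb67ee without the confirmation prompt (--yes), using 10.129.160.28:7011 as the entry node:
redis-trib.rb reshard --from 36f35d6a4b7e124fa36769f3ba8abb31ac1c56dd,7aafe0d0318cba23837b9833c1398eb315329539,9eb273fc679d2687e40db8fc78e8d4654cb10c3f,ea71af515c10cbe6574176760728a4f2dfa03eaa,00403a9e516ba69ad1b22d629c44c36a2a0d6fc9,d77f827188851248ee6860ff984be9c965768445 --to aa9f6da7fab5ee3de97320dd5f31a32797eb67ee --slots 1365 --yes 10.129.160.28:7011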
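For every slot it moves, a reshard has to mark the slot as importing on the target and migrating on the source, copy the keys, and then announce the new owner. The sketch below only builds that command sequence so it can be inspected without a running cluster. `plan_slot_migration` is a hypothetical helper; the key list is assumed to come from `CLUSTER GETKEYSINSLOT` on the source, and the timeout and batch size are illustrative values, not redis-trib.rb's documented defaults.

```python
def plan_slot_migration(slot, source_id, target_id, target_host, target_port,
                        keys, master_ids, timeout_ms=60000, batch=10):
    """Return the ordered (node_id, command) pairs for moving one slot.

    keys would normally be fetched with CLUSTER GETKEYSINSLOT on the source;
    they are passed in here so the plan can be checked offline.
    """
    plan = [
        # 1. the target accepts ASK-redirected requests for the slot
        (target_id, ["CLUSTER", "SETSLOT", str(slot), "IMPORTING", source_id]),
        # 2. the source starts redirecting requests for missing keys
        (source_id, ["CLUSTER", "SETSLOT", str(slot), "MIGRATING", target_id]),
    ]
    # 3. copy keys in batches; db must be 0 in cluster mode, "" means "use KEYS"
    for i in range(0, len(keys), batch):
        chunk = keys[i:i + batch]
        plan.append((source_id, ["MIGRATE", target_host, str(target_port), "", "0",
                                 str(timeout_ms), "KEYS", *chunk]))
    # 4. tell every master who owns the slot now
    for node_id in master_ids:
        plan.append((node_id, ["CLUSTER", "SETSLOT", str(slot), "NODE", target_id]))
    return plan
```

With redis-py, each step could be sent as `redis.Redis(host=..., port=...).execute_command(*command)` against the node it names; mapping node IDs to addresses is left out here.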
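Computing which slots to move (compute_reshard_table in redis-trib.rb)
Line 775: sort the source nodes by slot count, largest first.
Lines 776-778: compute the total number of slots held by the source nodes.
Lines 779-785: each source node gives up slots in proportion to its share of that total, so nodes holding more slots give up more. The node with the most slots (the first after sorting) has its share rounded up; the others are rounded down.
Lines 786-791: record each slot and the source node it comes from in the moved variable.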
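The proportional split described above can be sketched in Python. This is our own port for illustration, assuming the sources are given as a dict from node ID to the slot numbers it owns; it is not code shipped with Redis.

```python
import math


def compute_reshard_table(sources: dict, numslots: int) -> list:
    """Python sketch of the proportional slot selection in redis-trib.rb.

    sources: node id -> list of slot numbers currently owned by that node.
    Returns (source_node_id, slot) pairs, never more than numslots of them.
    """
    # Biggest source first; it also receives the rounded-up share.
    ordered = sorted(sources.items(), key=lambda kv: len(kv[1]), reverse=True)
    total = sum(len(slots) for _, slots in ordered)
    moved = []
    for i, (node_id, slots) in enumerate(ordered):
        share = numslots / total * len(slots)  # proportional to the node's slot count
        n = math.ceil(share) if i == 0 else math.floor(share)
        for slot in sorted(slots)[:n]:  # take the lowest-numbered slots first
            if len(moved) < numslots:
                moved.append((node_id, slot))
    return moved
```

For three equal sources asked for 10 slots, the first gives 4 and the other two give 3 each. Because only the first share is rounded up, uneven inputs can also yield slightly fewer than `numslots` pairs.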
Client initialization
Wait for the refresh interval: near-real-time search.
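A cluster client starts from a list of startup_nodes and builds a slot-to-node table from the first node that answers CLUSTER SLOTS. The sketch below is a simplified illustration, not redis-py-cluster's implementation: `fetch_cluster_slots` is a hypothetical callable assumed to return the raw CLUSTER SLOTS reply (entries of `[start, end, [host, port, id], replica, ...]`), and unlike a real client it does not cross-check replies from several nodes.

```python
def build_slot_table(cluster_slots_reply):
    """Turn a CLUSTER SLOTS reply into slot -> {"master": addr, "replicas": [addr, ...]}."""
    table = {}
    for entry in cluster_slots_reply:
        start, end, master, *replicas = entry
        info = {
            "master": (master[0], int(master[1])),
            "replicas": [(r[0], int(r[1])) for r in replicas],
        }
        for slot in range(start, end + 1):
            table[slot] = info
    return table


def initialize(startup_nodes, fetch_cluster_slots):
    """Ask the startup nodes in order until one of them answers CLUSTER SLOTS."""
    for host, port in startup_nodes:
        try:
            reply = fetch_cluster_slots(host, port)
        except ConnectionError:
            continue  # unreachable startup node: try the next one
        return build_slot_table(reply)
    raise RuntimeError("none of the startup nodes is reachable")
```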
Node selection
A flush is triggered when the translog reaches its maximum size, or every 30 minutes.
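Redis Cluster selects a node by hashing the key to one of 16384 slots with CRC16 (the XMODEM variant) and looking up the slot's owner. If the key contains a non-empty {...} hash tag, only the tag is hashed, so related keys land in the same slot. The sketch below follows that rule; `node_for_key` and its `slot_owner` dict are hypothetical stand-ins for a client's slot table.

```python
def crc16(data: bytes) -> int:
    """CRC-16/XMODEM (polynomial 0x1021, initial value 0), as used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: bytes) -> int:
    """Slot for a key; if a non-empty {hash tag} is present, only the tag is hashed."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:  # a closing brace exists and the tag is not empty
            key = key[start + 1:end]
    return crc16(key) % 16384


def node_for_key(key: bytes, slot_owner: dict):
    """Pick the node that owns the key's slot (slot_owner: slot -> (host, port))."""
    return slot_owner[key_hash_slot(key)]
```

For example, `key_hash_slot(b"a")` is 15495, the slot reported in the MOVED error of the failover scenario.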
Scenario: Master 10.129.80.49:7013, Slave 10.129.80.49:8013
Procedure: start the client with startup_nodes; time.sleep(60); kill 10.129.80.49:7013; SET a 1
Client reaction:
1. ConnectionError (10.129.80.49:7013)
2. Remove the connection to 10.129.80.49:7013
3. Pick a random node (10.129.80.49:7012)
4. MovedError (MOVED 15495 10.129.80.49:8013)
5. Update the affected node and slot information (10.129.80.49:8013 replaces 10.129.80.49:7013)
6. Send the command to 10.129.80.49:8013
A query has to visit every segment, so segments consume file handles, memory and CPU. Segment merging runs automatically and uses a lot of CPU and I/O.
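The client reaction above can be simulated without a cluster. In this sketch `send` is a hypothetical callable standing in for the network, `MovedError` is a local class modelled on the MOVED reply, and Python's built-in `ConnectionError` stands in for the client library's own exception; it mirrors the sequence of steps, not redis-py-cluster's actual code.

```python
import random


class MovedError(Exception):
    """Simulated 'MOVED <slot> <host>:<port>' error from a node that no longer owns the slot."""

    def __init__(self, slot, addr):
        super().__init__(f"MOVED {slot} {addr[0]}:{addr[1]}")
        self.slot = slot
        self.addr = addr


def execute_with_redirects(slot, slot_table, live_nodes, send, max_attempts=5):
    """Send a command for `slot`, following the client reaction described above.

    slot_table: slot -> (host, port) the client currently believes owns the slot
    live_nodes: set of (host, port) the client still holds connections to
    send: callable(addr) -> reply; may raise ConnectionError or MovedError
    """
    addr = slot_table[slot]
    for _ in range(max_attempts):
        try:
            return send(addr)
        except ConnectionError:
            live_nodes.discard(addr)  # drop the connection to the dead node
            if not live_nodes:
                raise
            addr = random.choice(sorted(live_nodes))  # retry on a random node
        except MovedError as err:
            slot_table[err.slot] = err.addr  # partial update: only this slot
            live_nodes.add(err.addr)
            addr = err.addr  # resend to the new owner (the promoted slave)
    raise RuntimeError("too many redirections")
```

Updating only the reported slot keeps the reaction cheap; a real client may additionally schedule a full slot-table refresh.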
A command call involves three main stages:
1. Creating the socket connection (ClusterConnectionPool.make_connection)
2. Maintaining the connection pool
3. Executing the command (connection.Connection.send_packed_command)
SET a 1 is converted into the Redis protocol as [set, a, 1].
AOF analysis
The master node is chosen by election among the master-eligible nodes. Master-eligible nodes handle shard distribution and index creation and deletion. Data nodes store the data and perform data-related operations such as search, broadcast and aggregation.
Simulated client
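Before send_packed_command writes to the socket, the command is encoded in RESP as an array of bulk strings; the AOF file stores write commands in the same encoding. A minimal simulated client can be sketched as below. `pack_command` and `send_command` are our own illustrative helpers, not redis-py's API, and the single `recv` call is a simplification: a real client keeps reading until the reply is complete.

```python
import socket


def pack_command(*args) -> bytes:
    """Encode a command as a RESP array of bulk strings, e.g. SET a 1."""
    parts = [b"*%d\r\n" % len(args)]  # array header: number of arguments
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))  # byte length, then payload
    return b"".join(parts)


def send_command(host, port, *args) -> bytes:
    """Simulated client: one connection, one command, one raw reply read."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(pack_command(*args))
        return sock.recv(65536)
```

`pack_command("set", "a", 1)` produces `*3\r\n$3\r\nset\r\n$1\r\na\r\n$1\r\n1\r\n`. Sent with `send_command` to the master owning slot 15495, it should typically return `+OK\r\n`; any other master answers with a `-MOVED` error instead.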
shard = hash(routing) % number_of_primary_shards
2. Broadcast the request (the more replicas, the more time it costs)
3. Return the id and score
4. Aggregate
TCP communication
Nagle's algorithm and delayed acknowledgment
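Nagle's algorithm holds back a small segment while earlier data is still unacknowledged, and a receiver using delayed acknowledgment may wait tens of milliseconds before sending that ACK. Together they can stall small request/response traffic such as Redis commands. Clients therefore usually disable Nagle with the TCP_NODELAY option; as far as we know, redis-py does this on the sockets it opens. A minimal sketch, where `make_client_socket` is a hypothetical helper:

```python
import socket


def make_client_socket() -> socket.socket:
    """TCP socket with Nagle's algorithm disabled (TCP_NODELAY)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Send small writes immediately instead of waiting for the ACK of earlier data.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

Writing each command in a single `sendall` call, as in a packed RESP request, also avoids the write-write-read pattern that triggers the stall.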
THANKS