# Redis Cluster Monitoring Deployment Plan


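## Overview

To avoid a single point of failure, Redis in production has been upgraded to cluster mode. The cluster needs monitoring so that an alert fires as soon as a node fails. Redis ships with the redis-cli client, whose `cluster info` command reports the cluster's running state, so we can wrap it in a shell script and have Zabbix call that script to do the monitoring.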

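## 1. Using the `cluster info` command

Command format:

redis-cli -h \[host] -p \[port] -a \[password] cluster info

1. Query the cluster state. The command prints `cluster_state:ok` when the cluster is healthy and `cluster_state:fail` otherwise; the monitoring script in section 2 reports this as 1 (normal) or 0 (faulty).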

```bash
redis-cli -h xxx.xxx.xxx.xxx -a 'password' -p 7001 cluster info | grep "cluster_state"
```
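
The raw command prints a text line rather than a number, so a numeric monitoring item needs a conversion step. A minimal sketch of that conversion, assuming the `cluster info` text is piped in on stdin; the function name `state_flag` and the sample text are illustrative and not part of redis-cli:

```bash
# Hypothetical helper: turn the cluster_state line into 1 (ok) or 0 (anything else).
# Reads `cluster info` text on stdin; tr strips the carriage returns the output may contain.
state_flag() {
    tr -d '\r' | awk -F':' '
        $1 == "cluster_state" { found = 1; print ($2 == "ok" ? 1 : 0) }
        END { if (!found) print 0 }   # no state line at all is treated as a fault
    '
}

# Sample text standing in for real output, so the sketch runs without a live node.
printf 'cluster_state:ok\r\ncluster_slots_assigned:16384\r\n' | state_flag
printf 'cluster_state:fail\r\n' | state_flag
```

With a live node, the input would instead come from the `redis-cli ... cluster info` command shown above.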

2. Number of hash slots assigned in the cluster

```bash
redis-cli -h xxx.xxx.xxx.xxx -a 'password' -p 7001 cluster info | grep -w "cluster_slots_assigned" | awk -F':' '{print $2}'
```

3. Number of slots in the ok state

```bash
redis-cli -h xxx.xxx.xxx.xxx -a 'password' -p 7001 cluster info | grep -w "cluster_slots_ok" | awk -F':' '{print $2}'
```

4. Number of slots in the PFAIL (possibly failed) state

```bash
redis-cli -h xxx.xxx.xxx.xxx -a 'password' -p 7001 cluster info | grep -w "cluster_slots_pfail" | awk -F':' '{print $2}'
```

5. Number of slots in the FAIL (confirmed failed) state

```bash
redis-cli -h xxx.xxx.xxx.xxx -a 'password' -p 7001 cluster info | grep -w "cluster_slots_fail" | awk -F':' '{print $2}'
```
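
The four slot counters are most useful when read together: in a healthy cluster all 16384 hash slots are assigned and ok, and none are in the pfail or fail state. A small sketch that combines them, with `slots_healthy` as a hypothetical helper and the sample text invented for illustration:

```bash
# Hypothetical helper: print 1 when every hash slot is assigned and ok, else 0.
# Reads `cluster info` text on stdin.
slots_healthy() {
    tr -d '\r' | awk -F':' '
        $1 == "cluster_slots_assigned" { assigned = $2 + 0 }
        $1 == "cluster_slots_ok"       { ok = $2 + 0 }
        $1 == "cluster_slots_pfail"    { pfail = $2 + 0 }
        $1 == "cluster_slots_fail"     { fail = $2 + 0 }
        END {
            # Redis Cluster always has 16384 hash slots.
            healthy = (assigned == 16384 && ok == 16384 && pfail == 0 && fail == 0)
            print (healthy ? 1 : 0)
        }
    '
}

# Sample text standing in for real output.
sample='cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0'
printf '%s\n' "$sample" | slots_healthy
```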

6. Total number of nodes known to the cluster

```bash
redis-cli -h xxx.xxx.xxx.xxx -a 'password' -p 7001 cluster info | grep -w "cluster_known_nodes" | awk -F':' '{print $2}'
```

7. Number of master nodes in the cluster (masters serving at least one hash slot)

```bash
redis-cli -h xxx.xxx.xxx.xxx -a 'password' -p 7001 cluster info | grep -w "cluster_size" | awk -F':' '{print $2}'
```

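8. The cluster's currentEpoch value

(The currentEpoch is the same on every node of the cluster. A larger currentEpoch means a newer cluster configuration or operation, and its value is the largest node epoch in the cluster.

When the cluster state changes and a node needs the agreement of other nodes to carry out an action, the currentEpoch is incremented.)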

```bash
redis-cli -h xxx.xxx.xxx.xxx -a 'password' -p 7001 cluster info | grep -w "cluster_current_epoch" | awk -F':' '{print $2}'
```
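
Because every node should converge on the same currentEpoch, comparing the value reported by each node is a quick consistency check. A sketch of that comparison, assuming the per-node values have already been collected (for example by running the command above against each port); `epochs_agree` is a hypothetical name:

```bash
# Hypothetical helper: print 1 if all epoch values passed as arguments are identical, else 0.
epochs_agree() {
    first=$1
    for epoch in "$@"; do
        if [ "$epoch" != "$first" ]; then
            echo 0
            return
        fi
    done
    echo 1
}

# Illustrative values only; real ones come from cluster_current_epoch on each node.
epochs_agree 6 6 6
epochs_agree 6 7 6
```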

9. Number of messages sent over the cluster bus

```bash
redis-cli -h xxx.xxx.xxx.xxx -a 'password' -p 7001 cluster info | grep -w "cluster_stats_messages_sent" | awk -F':' '{print $2}'
```

10. Number of messages received over the cluster bus

```bash
redis-cli -h xxx.xxx.xxx.xxx -a 'password' -p 7001 cluster info | grep -w "cluster_stats_messages_received" | awk -F':' '{print $2}'
```
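
Both message counters are cumulative totals, so the change between two samples is usually more informative than the absolute value. A sketch of that arithmetic, with a hypothetical helper name and made-up sample numbers:

```bash
# Hypothetical helper: messages per second between two samples of a cumulative counter.
# Arguments: previous value, current value, seconds between the two samples.
msg_rate() {
    prev=$1; curr=$2; interval=$3
    # A smaller current value suggests the counter was reset; report 0 instead of a negative rate.
    if [ "$curr" -lt "$prev" ]; then
        echo 0
    else
        echo $(( (curr - prev) / interval ))
    fi
}

# Illustrative numbers: 900 messages in 30 seconds.
msg_rate 120000 120900 30
```

Zabbix can also derive such a rate itself through its delta or change-per-second item settings, in which case the script only needs to return the raw counter.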

11. Redis port liveness check

```bash
redis-cli -h xxx.xxx.xxx.xxx -a 'password' -p 7001 ping | grep -c PONG
```

## 2. Creating the monitoring scripts

### Script for monitoring Redis liveness

```bash
# cat redis-port.sh

#!/bin/bash
# Adjust these values to match your environment
REDISCLI="/usr/local/bin/redis-cli"
HOST="192.168.2.14"

if [[ $# == 1 ]]; then
    case $1 in
        7001|7002|7003|7004|7005|7006)
            # Print 1 if the node on this port answers PONG, otherwise 0
            result=$("$REDISCLI" -h "$HOST" -p "$1" ping 2>/dev/null | grep -c PONG)
            echo "$result"
            ;;
        *)
            echo -e "\033[33mUsage: $0 {7001|7002|7003|7004|7005|7006}\033[0m"
            ;;
    esac
fi
```
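
Checking every node in one pass is handy when testing by hand. The sketch below counts how many of a list of ports answer; `ping_node` and `alive_count` are hypothetical helpers, and `ping_node` is kept separate so it can be swapped for a stub where no Redis node is reachable:

```bash
# Hypothetical wrapper around the real call; prints PONG when the node answers.
ping_node() {
    /usr/local/bin/redis-cli -h "$1" -p "$2" ping 2>/dev/null
}

# Count how many of the given ports on a host answer PONG.
# Arguments: host, then one or more ports.
alive_count() {
    host=$1; shift
    alive=0
    for port in "$@"; do
        if [ "$(ping_node "$host" "$port")" = "PONG" ]; then
            alive=$((alive + 1))
        fi
    done
    echo "$alive"
}
```

Against the cluster used in this document, `alive_count 192.168.2.14 7001 7002 7003 7004 7005 7006` would be expected to print 6 when every node is up.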

### Script for monitoring the Redis cluster

```bash
# redis-cluster.sh

#!/bin/bash
# Adjust these values to match your environment
REDISCLI="/usr/local/bin/redis-cli"
HOST="192.168.2.14"
# Port of the first running redis-server; assumes column 9 of `ps -ef` is its host:port argument
PORT=$(ps -ef | grep -w redis-server | grep -v "grep" | awk '{print $9}' | awk -F ':' '{print $2}' | head -1)

if [[ $# == 1 ]]; then
    case $1 in
        cluster_state)
            # Print 1 when the cluster state is ok, otherwise 0
            result=$("$REDISCLI" -h "$HOST" -p "$PORT" cluster info 2>/dev/null | grep -w "cluster_state" | awk -F':' '{print $2}' | grep -c "ok")
            echo "$result"
            ;;
        cluster_slots_assigned|cluster_slots_ok|cluster_slots_pfail|cluster_slots_fail|cluster_known_nodes|cluster_size|cluster_current_epoch|cluster_stats_messages_sent|cluster_stats_messages_received)
            # Print the value of the requested field; tr drops the trailing carriage return
            result=$("$REDISCLI" -h "$HOST" -p "$PORT" cluster info 2>/dev/null | grep -w "$1" | awk -F':' '{print $2}' | tr -d '\r')
            echo "$result"
            ;;
        *)
            echo -e "\033[33mUsage: $0 {cluster_state|cluster_slots_assigned|cluster_slots_ok|cluster_slots_pfail|cluster_slots_fail|cluster_known_nodes|cluster_size|cluster_current_epoch|cluster_stats_messages_sent|cluster_stats_messages_received}\033[0m"
            ;;
    esac
fi
```
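
The `PORT` lookup in the script relies on the ninth column of `ps -ef` holding the `host:port` argument of redis-server, which depends on how the process was started. A slightly more tolerant sketch looks for the argument that follows the `redis-server` binary instead; `port_from_ps` is a hypothetical helper and the sample line is invented:

```bash
# Hypothetical helper: read `ps -ef` lines on stdin and print the port of the first
# redis-server process, taken from the host:port argument after the binary name.
port_from_ps() {
    awk '
        {
            for (i = 1; i < NF; i++) {
                if ($i ~ /redis-server$/ && $(i + 1) ~ /:[0-9]+$/) {
                    n = split($(i + 1), parts, ":")
                    print parts[n]
                    exit
                }
            }
        }
    '
}

# Sample line standing in for real `ps -ef` output.
echo 'root 2301 1 0 10:12 ? 00:01:07 /usr/local/bin/redis-server 192.168.2.14:7001 [cluster]' | port_from_ps
```

In the script this would be used as `PORT=$(ps -ef | port_from_ps)`; if the port is known in advance, hard-coding it is simpler still.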

### Granting the scripts execute permission

```bash
[root@admin conf]# chmod +x redis-cluster.sh
[root@admin conf]# chmod +x redis-port.sh
```

### Testing the scripts

```bash
# Test the cluster script
[root@admin conf]# ./redis-cluster.sh cluster_state
1

# Check the status of Redis port 7001
[root@admin conf]# ./redis-port.sh 7001
1
```

### Creating the Redis cluster monitoring configuration file

> Used together with Zabbix

```bash
# Redis cluster monitoring
UserParameter = Redis.Cluster[*],/home/zabbix/conf/redis-cluster.sh $1

# Redis port monitoring
UserParameter = Redis.Port[*],/home/zabbix/conf/redis-port.sh $1
```

```bash
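# Restart the agent so it loads the new UserParameter entries.
# The unit is often named zabbix-agent; use the name from your installation.
systemctl restart zabbix-agentd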
```

```bash
[root@zabbix ~]# zabbix_get -s 192.168.2.14 -p 10050 -k "Redis.Cluster[cluster_state]"
1
[root@zabbix ~]# zabbix_get -s 192.168.2.14 -p 10050 -k "Redis.Port[7001]"
1
```

### Creating the Redis monitoring template in Zabbix

* Create a template named redis-cluster

![](/files/vGsEqu8iRlK4nSDE99KB)

* Create items

Example (adjust to your own setup):

Name: Redis cluster state check

Type: Zabbix agent

Key: Redis.Cluster\[cluster\_state]

Type of information: Numeric (unsigned)

Update interval: 30s

![](/files/7VYhg83nW31Ed0N8akfn)

![](/files/lJu0EYSrZpLQPGmSBd0m)

![](/files/mMaUqwha7LzzeHpWwvLn)

* Create a trigger

Example (adjust to your own setup):

Name: Redis cluster state failure

Severity: Average

> Expression: {Redis-cluster Service:Redis.Cluster\[cluster\_state].count(#3,1,"ne")}>2
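
In the expression, `count(#3,1,"ne")` counts how many of the last three collected values are not equal to 1, and `>2` means the trigger fires only when all three are. The same logic written out as a shell sketch, purely to illustrate the semantics (Zabbix evaluates the expression itself; `trigger_fires` is a hypothetical name):

```bash
# Hypothetical helper mirroring count(#3,1,"ne") > 2.
# Arguments: the last three item values. Prints 1 if the trigger would fire, else 0.
trigger_fires() {
    not_one=0
    for value in "$1" "$2" "$3"; do
        if [ "$value" != "1" ]; then
            not_one=$((not_one + 1))
        fi
    done
    if [ "$not_one" -gt 2 ]; then echo 1; else echo 0; fi
}

trigger_fires 0 0 0   # three consecutive failures
trigger_fires 0 1 0   # one good sample in between
```

With a 30s update interval, requiring three failed samples means a fault has to persist for roughly a minute and a half before the alert is raised, which filters out a single missed check.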

![](/files/bPFShfr7hUiQMkU6oFzD)

![](/files/tDPhgeQzL21pqxjWYhE3)

![](/files/DE9UuAV3VXT9YAlVnB6G)

