Monitoring: Basic Metrics

Whether the Node Is Running

[root@k8s-node02 rules]# cat node.yml 
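groups:
- name: Node_exporter Down
  rules:
  - alert: NodeInstanceDown
    # up is 0 when the most recent scrape of a target failed
    expr: up == 0
    # the condition must hold for 10s before the alert fires
    for: 10s
    labels:
      user: root
      severity: Warning
    annotations:
      summary: "{{ $labels.job }}"
      address: "{{ $labels.instance }}"
      description: "Node_exporter target has failed to respond to scrapes for more than 10 seconds."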
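
Note that `up == 0` matches every scrape job, not only Node_exporter, so the rule can also fire for unrelated targets. A minimal sketch of a narrower variant follows. It assumes the scrape job is named `node_exporter`, which is a hypothetical name; substitute the `job_name` from your own prometheus.yml. The alert name is also illustrative.

```yaml
groups:
- name: Node_exporter Down
  rules:
  - alert: NodeExporterDown            # illustrative alert name
    # "node_exporter" is an assumed job name, not taken from the original config
    expr: up{job="node_exporter"} == 0
    for: 10s
    labels:
      severity: Warning
    annotations:
      summary: "{{ $labels.job }}"
      address: "{{ $labels.instance }}"
      description: "Node_exporter target {{ $labels.instance }} has failed to respond to scrapes for more than 10 seconds."
```

A rule file like this can be checked for syntax errors with `promtool check rules node.yml` before Prometheus is reloaded.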

CPU Usage

[root@k8s-node02 rules]# cat cpu.yml 
groups:
- name: CPU
  rules:
  - alert: HighCpuUsage
    # non-idle CPU percentage per instance, averaged over all cores
    expr: 100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100) > 90
    for: 1m
    labels:
      severity: Warning
    annotations:
      summary: "{{ $labels.instance }}: CPU usage is too high"
      description: "{{ $labels.instance }}: CPU usage is above 90% (current value: {{ $value }})."
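
The Prometheus documentation suggests `rate` rather than `irate` for alerting expressions. `irate` only looks at the last two samples in the range, so a brief dip can reset the `for` timer. A possible variant of the CPU rule using `rate` is sketched below; the alert name is illustrative, and the threshold and window are simply carried over from the rule above.

```yaml
groups:
- name: CPU
  rules:
  - alert: HighCpuUsageRate            # illustrative alert name
    # rate() averages over the whole 5m window instead of the last two samples
    expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
    for: 1m
    labels:
      severity: Warning
    annotations:
      summary: "{{ $labels.instance }}: CPU usage is too high"
      description: "{{ $labels.instance }}: CPU usage is above 90% (current value: {{ $value }})."
```

The trade-off is responsiveness: `rate` smooths short spikes, which usually suits an alert that already waits `for: 1m`.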

Memory Usage
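
A minimal sketch of a memory rule in the same style as the CPU rule. It assumes Linux targets where Node_exporter exposes `node_memory_MemAvailable_bytes` and `node_memory_MemTotal_bytes`. The alert name, the 90% threshold, and the `for` duration are illustrative assumptions, not the author's configuration.

```yaml
groups:
- name: Memory
  rules:
  - alert: HighMemoryUsage             # illustrative alert name
    # used memory as a percentage of total; 90 is an assumed threshold
    expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
    for: 1m
    labels:
      severity: Warning
    annotations:
      summary: "{{ $labels.instance }}: memory usage is too high"
      description: "{{ $labels.instance }}: memory usage is above 90% (current value: {{ $value }})."
```

`MemAvailable` is used here instead of `MemFree` because it accounts for reclaimable page cache, which tends to give a more realistic picture of memory pressure.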


Basic Metrics Summary
