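Cross-VPC Networking: Deploying an etcd Cluster from Binaries

etcd is a distributed key-value store. Kubernetes (k8s) uses it to hold cluster configuration and runtime state.

kubeadm, the official k8s deployment tool, can stand up an etcd cluster quickly, but it has one drawback: etcd management becomes coupled to the k8s cluster.

In production we would rather keep the database independent of the other systems.

Deploying a standalone external etcd cluster first and the k8s cluster afterwards makes a later high-availability setup easier and simplifies operations, so it is the more robust production approach.

An external etcd cluster is very flexible to deploy. It can have a single node or several, and it can run either on k8s cluster nodes or on separate machines outside the k8s cluster.

Besides manual binary deployment there is also an automation tool, etcdadm. At the time of writing it is still a development release and is not recommended for production.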

| Internal IP | Public IP | Hostname | Region |
|:---|:---|:---|:---|
| 172.19.200.6 | 43.129.25.161 | etcd1 | Hong Kong |
| 172.22.0.10 | 43.134.80.11 | etcd2 | Singapore |
| 172.26.0.9 | 43.135.160.117 | etcd3 | United States |

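Note: the private IP ranges of the VPCs must not overlap (do not use the same subnet in two VPCs), otherwise the addresses conflict.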
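The overlap rule in the note can be checked before the VPCs are connected. The sketch below is one way to do it in plain shell. The /16 prefixes are assumptions, because the article lists only the node addresses and not the VPC CIDR blocks, and `ip2int` and `cidr_overlap` are hypothetical helper names.

```shell
# Convert a dotted IPv4 address to a 32-bit integer (runs in a subshell so IFS stays local).
ip2int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Succeed (exit 0) when two CIDR blocks overlap.
# Two blocks overlap exactly when they agree on the shorter of the two prefixes.
cidr_overlap() (
  a_ip=${1%/*}; a_len=${1#*/}
  b_ip=${2%/*}; b_len=${2#*/}
  len=$(( a_len < b_len ? a_len : b_len ))
  mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  [ $(( $(ip2int "$a_ip") & mask )) -eq $(( $(ip2int "$b_ip") & mask )) ]
)

# Assumed VPC ranges: hypothetical /16 blocks around the node IPs in the table.
for pair in "172.19.0.0/16 172.22.0.0/16" \
            "172.22.0.0/16 172.26.0.0/16" \
            "172.19.0.0/16 172.26.0.0/16"; do
  a=${pair% *}; b=${pair#* }
  if cidr_overlap "$a" "$b"; then
    echo "CONFLICT: $a overlaps $b"
  else
    echo "ok: $a and $b are disjoint"
  fi
done
```

Comparing on the shorter prefix also catches the case where one range is nested inside the other, for example a /24 inside a /16.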

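1. On every server, write the internal and public IPs of all three servers, with their hostnames, into the hosts file

If traffic only goes over the internal network, as in this setup, the public entries can be left out.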

cat >>  /etc/hosts << EOF
172.19.200.6   etcd1
43.129.25.161  etcd1

172.22.0.10    etcd2
43.134.80.11   etcd2

172.26.0.9    etcd3
43.135.160.117 etcd3
EOF
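Because the block above appends with `>>`, running it twice leaves duplicate lines in /etc/hosts. A minimal sketch of an idempotent alternative follows. `add_host_entry` is a hypothetical helper, and it takes the file path as a parameter so it can be tried on a scratch file first.

```shell
# Append "IP NAME" to a hosts-style file only if that IP does not already map to NAME.
add_host_entry() {
  file=$1; ip=$2; name=$3
  if ! awk -v ip="$ip" -v name="$name" '
        $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }' "$file" 2>/dev/null; then
    printf '%s %s\n' "$ip" "$name" >> "$file"
  fi
}

# Defaults to a scratch file; set HOSTS_FILE=/etc/hosts once the result looks right.
HOSTS_FILE=${HOSTS_FILE:-$(mktemp)}
add_host_entry "$HOSTS_FILE" 172.19.200.6 etcd1
add_host_entry "$HOSTS_FILE" 172.22.0.10  etcd2
add_host_entry "$HOSTS_FILE" 172.26.0.9   etcd3
cat "$HOSTS_FILE"
```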

2. Create the storage directories (all nodes)

# -p also creates the missing parent directories (/data, /data/etcd)
mkdir -p /data/etcd/data
mkdir -p /data/etcd/bin
mkdir -p /data/etcd/ssl

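3. Create the TLS certificates (primary node)

etcd uses certificates for authentication. They can be created with cfssl or openssl; this article uses cfssl. Install the cfssl suite on the primary node etcd1, generate the certificates there, and then copy them to the other nodes.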

wget https://github.com/cloudflare/cfssl/releases/download/v1.6.2/cfssl_1.6.2_linux_amd64 -O /data/etcd/bin/cfssl
wget https://github.com/cloudflare/cfssl/releases/download/v1.6.2/cfssljson_1.6.2_linux_amd64 -O /data/etcd/bin/cfssljson
wget https://github.com/cloudflare/cfssl/releases/download/v1.6.2/cfssl-certinfo_1.6.2_linux_amd64 -O /data/etcd/bin/cfssl-certinfo


chmod +x /data/etcd/bin/cfssl*

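4. Create the CA root certificate (primary node)

    1. Generate the CSR file

    2. Create the certificate

CSR stands for Certificate Signing Request. It works like an application form that holds the applicant's basic information.

4.1 Fill in the CA's CSR information in JSON format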

# Optional: print a template to see which fields are available
/data/etcd/bin/cfssl print-defaults csr

# Write the CA's CSR file
cat > /data/etcd/ssl/ca-csr.json << EOF
{
    "CN": "etcd",
    "key": {
        "algo": "rsa",
        "size": 2048
    },
    "names": [
        {
            "C": "CN",
            "ST": "Guangdong",
            "L": "GuangZhou",
            "O": "etcd"
        }
    ]
}
EOF

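4.2 Create the CA root certificate with the prefix ca, stored in /data/etcd/ssl/

ca.pem is the CA certificate (it carries the public key) and ca-key.pem is the private key. With the root certificate in place, this server can act as a CA and issue certificates for etcd.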

/data/etcd/bin/cfssl gencert -initca /data/etcd/ssl/ca-csr.json | /data/etcd/bin/cfssljson -bare /data/etcd/ssl/ca


ls /data/etcd/ssl
ca.csr  ca-csr.json  ca-key.pem  ca.pem

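4.3 Configure the signing policy

profiles: sets certificate parameters per role. Only one role, etcd, is defined here; more can be added if needed. The important parameters are the validity period (expiry) and the usages (signing, key encipherment, server auth, client auth).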

/data/etcd/bin/cfssl print-defaults config > /data/etcd/ssl/ca-config.json

cat > /data/etcd/ssl/ca-config.json << EOF
{
    "signing": {
        "default": {
            "expiry": "87600h"
        },
        "profiles": {
            "etcd": {
                "expiry": "87600h",
                "usages": [
                    "signing",
                    "key encipherment",
                    "server auth",
                    "client auth"
                ]
            }
        }
    }
}
EOF

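4.4 Fill in the CSR for the etcd certificate

This CSR has an extra hosts field. It must list every IP address on which the etcd nodes are reached, internal and public (if only the internal network is used, the internal addresses are enough). If the etcd configuration includes the loopback address, add it as well. If etcd nodes are added later, update this CSR first and reissue the certificate.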

/data/etcd/bin/cfssl print-defaults csr > /data/etcd/ssl/etcd-csr.json

cat > /data/etcd/ssl/etcd-csr.json << EOF
{
    "CN": "etcd",
    "hosts": [
        "127.0.0.1",
        "172.19.200.6",
        "43.129.25.161",
        "172.22.0.10",
        "43.134.80.11",
        "172.26.0.9",
        "43.135.160.117"
    ],
    "key": {
        "algo": "rsa",
        "size": 2048
    },
    "names": [
        {
            "C": "CN",
            "ST": "Guangdong",
            "L": "GuangZhou",
            "O": "etcd"
        }
    ]
}
EOF

4.5 Create the etcd certificate with the prefix etcd, stored in /data/etcd/ssl/

/data/etcd/bin/cfssl gencert \
-ca=/data/etcd/ssl/ca.pem \
-ca-key=/data/etcd/ssl/ca-key.pem \
--config=/data/etcd/ssl/ca-config.json --profile=etcd \
/data/etcd/ssl/etcd-csr.json | /data/etcd/bin/cfssljson -bare /data/etcd/ssl/etcd


ls /data/etcd/ssl/
ca-config.json  ca.csr  ca-csr.json  ca-key.pem  ca.pem  etcd.csr  etcd-csr.json  etcd-key.pem  etcd.pem
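Before the certificate is distributed it is worth confirming that every node address made it into the SAN list, since a missing IP only shows up later as a TLS handshake failure. The sketch below assumes openssl is installed (the article itself only uses cfssl). `cert_has_ip` and `check_cert_ips` are hypothetical helpers.

```shell
# Succeed when the certificate in $1 lists IP address $2 as a Subject Alternative Name.
cert_has_ip() {
  openssl x509 -in "$1" -noout -text \
    | tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -Fxq "IP Address:$2"
}

# Report every address from the argument list that is missing from the certificate.
# Returns 0 when all addresses are present, 1 otherwise.
check_cert_ips() {
  cert=$1; shift
  missing=0
  for ip in "$@"; do
    cert_has_ip "$cert" "$ip" || { echo "missing SAN: $ip"; missing=1; }
  done
  return "$missing"
}

# Usage on etcd1 once step 4.5 has produced etcd.pem:
#   check_cert_ips /data/etcd/ssl/etcd.pem 127.0.0.1 \
#       172.19.200.6 172.22.0.10 172.26.0.9 \
#       43.129.25.161 43.134.80.11 43.135.160.117
# The signature chain can be checked the same way:
#   openssl verify -CAfile /data/etcd/ssl/ca.pem /data/etcd/ssl/etcd.pem
```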

4.6 Distribute the certificates to the other etcd nodes. The ssl directory must already exist on etcd2 and etcd3

scp /data/etcd/ssl/*  etcd2:/data/etcd/ssl/
scp /data/etcd/ssl/*  etcd3:/data/etcd/ssl/

5. Deploy the etcd cluster (all nodes)

cd /usr/local/src
wget https://github.com/etcd-io/etcd/releases/download/v3.5.4/etcd-v3.5.4-linux-amd64.tar.gz
tar zxvf etcd-v3.5.4-linux-amd64.tar.gz
cp  etcd-v3.5.4-linux-amd64/etcd*  /data/etcd/bin/
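
  • Member settings: their URLs use addresses that are visible on the node itself (internal IP, loopback IP)

  • ETCD_NAME: the node's hostname

  • ETCD_LISTEN_PEER_URLS: URL on which the local etcd listens for peer traffic

  • ETCD_LISTEN_CLIENT_URLS: URL on which the local etcd listens for client traffic

  • Cluster settings: their URLs may use the internal IP or the public IP

  • ETCD_INITIAL_ADVERTISE_PEER_URLS: peer URL that this member advertises to the rest of the cluster at bootstrap

  • ETCD_INITIAL_CLUSTER: the peer URLs of all etcd nodes in the initial cluster

  • ETCD_ADVERTISE_CLIENT_URLS: client URL that this member advertises to clients

  • The security settings use the certificates created above, and every URL uses the https scheme.

  • The configuration shown here is for etcd1. The other nodes are configured the same way; only ETCD_NAME and the URL variables that refer to the local node change.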

cat > /data/etcd/etcd.conf << EOF
# [Member] member settings
ETCD_NAME="etcd1"
ETCD_DATA_DIR="/data/etcd/data"
ETCD_LISTEN_PEER_URLS="https://172.19.200.6:2380"
ETCD_LISTEN_CLIENT_URLS="https://172.19.200.6:2379,https://127.0.0.1:2379"

# [Cluster] cluster settings
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://43.129.25.161:2380"
ETCD_INITIAL_CLUSTER="etcd1=https://43.129.25.161:2380,etcd2=https://43.134.80.11:2380,etcd3=https://43.135.160.117:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://43.129.25.161:2379"

# [Security] TLS settings
ETCD_PEER_CLIENT_CERT_AUTH="true"
ETCD_PEER_CERT_FILE="/data/etcd/ssl/etcd.pem"
ETCD_PEER_KEY_FILE="/data/etcd/ssl/etcd-key.pem"
ETCD_PEER_TRUSTED_CA_FILE="/data/etcd/ssl/ca.pem"

ETCD_CERT_FILE="/data/etcd/ssl/etcd.pem"
ETCD_KEY_FILE="/data/etcd/ssl/etcd-key.pem"
ETCD_TRUSTED_CA_FILE="/data/etcd/ssl/ca.pem"
EOF
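Since only ETCD_NAME and the node-local URLs differ between nodes, the file can be rendered from three values per node. The sketch below mirrors the etcd1 file above. `render_etcd_conf` is a hypothetical helper, and the cluster list is copied from the article.

```shell
INITIAL_CLUSTER="etcd1=https://43.129.25.161:2380,etcd2=https://43.134.80.11:2380,etcd3=https://43.135.160.117:2380"

# render_etcd_conf NAME LISTEN_IP ADVERTISE_IP
# Prints an etcd.conf for one node to stdout.
render_etcd_conf() {
  name=$1; listen_ip=$2; advertise_ip=$3
  cat << EOF
# [Member] member settings
ETCD_NAME="${name}"
ETCD_DATA_DIR="/data/etcd/data"
ETCD_LISTEN_PEER_URLS="https://${listen_ip}:2380"
ETCD_LISTEN_CLIENT_URLS="https://${listen_ip}:2379,https://127.0.0.1:2379"

# [Cluster] cluster settings
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://${advertise_ip}:2380"
ETCD_INITIAL_CLUSTER="${INITIAL_CLUSTER}"
ETCD_ADVERTISE_CLIENT_URLS="https://${advertise_ip}:2379"

# [Security] TLS settings
ETCD_PEER_CLIENT_CERT_AUTH="true"
ETCD_PEER_CERT_FILE="/data/etcd/ssl/etcd.pem"
ETCD_PEER_KEY_FILE="/data/etcd/ssl/etcd-key.pem"
ETCD_PEER_TRUSTED_CA_FILE="/data/etcd/ssl/ca.pem"

ETCD_CERT_FILE="/data/etcd/ssl/etcd.pem"
ETCD_KEY_FILE="/data/etcd/ssl/etcd-key.pem"
ETCD_TRUSTED_CA_FILE="/data/etcd/ssl/ca.pem"
EOF
}

# Example: print the file for etcd2 (on that node, redirect it to /data/etcd/etcd.conf).
render_etcd_conf etcd2 172.22.0.10 43.134.80.11
```

Generating the three files from one template keeps the shared settings, such as the cluster token and the certificate paths, identical on every node.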

6. Start etcd as a systemd service (all nodes)

cat > /usr/lib/systemd/system/etcd.service << EOF
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
EnvironmentFile=/data/etcd/etcd.conf
WorkingDirectory=/data/etcd/data/
ExecStart=/data/etcd/bin/etcd   
Restart=on-failure
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF
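systemctl daemon-reload
systemctl enable etcd.service
# Start the service on all nodes at about the same time: with Type=notify the
# first member does not report ready until a quorum of members has joined.
systemctl start etcd.service
systemctl status etcd.service # check that etcd is running


# If the service fails to start, run etcd in the foreground to see the error.
# etcd.conf is an environment file, not a YAML file for --config-file, so export its variables first.
( set -a; . /data/etcd/etcd.conf; set +a; exec /data/etcd/bin/etcd )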
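On a cross-region cluster the members may need a few seconds to elect a leader after the service starts, so a scripted deployment usually polls instead of checking once. A minimal retry helper is sketched below. `retry` is a hypothetical name, and the command in the usage comment is the health check from the next step.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is used up; returns the last exit status.
retry() {
  attempts=$1; delay=$2; shift 2
  n=1
  while :; do
    "$@" && return 0
    status=$?
    [ "$n" -ge "$attempts" ] && return "$status"
    n=$((n + 1))
    sleep "$delay"
  done
}

# Usage: poll the local member up to 10 times, 3 seconds apart.
#   retry 10 3 /data/etcd/bin/etcdctl \
#     --cacert=/data/etcd/ssl/ca.pem \
#     --cert=/data/etcd/ssl/etcd.pem \
#     --key=/data/etcd/ssl/etcd-key.pem \
#     --endpoints=https://127.0.0.1:2379 \
#     endpoint health
```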

7. Check the health of the etcd cluster from any etcd node

/data/etcd/bin/etcdctl -w table \
--cacert=/data/etcd/ssl/ca.pem \
--cert=/data/etcd/ssl/etcd.pem \
--key=/data/etcd/ssl/etcd-key.pem \
--endpoints=https://43.129.25.161:2379,https://43.134.80.11:2379,https://43.135.160.117:2379 \
endpoint health  # check whether each endpoint is healthy


+-----------------------------+--------+--------------+-------+
|          ENDPOINT           | HEALTH |     TOOK     | ERROR |
+-----------------------------+--------+--------------+-------+
|  https://43.129.25.161:2379 |   true |  46.344875ms |       |
|   https://43.134.80.11:2379 |   true | 212.562093ms |       |  ---> different region, so latency is higher
| https://43.135.160.117:2379 |   true | 771.006121ms |       |  ---> different region, so latency is higher
+-----------------------------+--------+--------------+-------+


/data/etcd/bin/etcdctl -w table \
--cacert=/data/etcd/ssl/ca.pem \
--cert=/data/etcd/ssl/etcd.pem \
--key=/data/etcd/ssl/etcd-key.pem \
--endpoints=https://43.129.25.161:2379,https://43.134.80.11:2379,https://43.135.160.117:2379 \
member list       # STATUS "started" means the member is running normally
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
|        ID        | STATUS  | NAME  |         PEER ADDRS          |        CLIENT ADDRS         | IS LEARNER |
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
| 7072fcb185597cf2 | started | etcd1 |  https://43.129.25.161:2380 |  https://43.129.25.161:2379 |      false |
| 85c33cde25893026 | started | etcd2 |   https://43.134.80.11:2380 |   https://43.134.80.11:2379 |      false |
| ed52e80ca78eb287 | started | etcd3 | https://43.135.160.117:2380 | https://43.135.160.117:2379 |      false |
+------------------+---------+-------+-----------------------------+-----------------------------+------------+

/data/etcd/bin/etcdctl -w table \
--cacert=/data/etcd/ssl/ca.pem \
--cert=/data/etcd/ssl/etcd.pem \
--key=/data/etcd/ssl/etcd-key.pem \
--endpoints=https://43.129.25.161:2379,https://43.134.80.11:2379,https://43.135.160.117:2379 \
endpoint status   # shows which node in the cluster is the leader
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|  https://43.129.25.161:2379 | 7072fcb185597cf2 |   3.5.4 |   20 kB |      true |      false |         2 |         23 |                 23 |        |
|   https://43.134.80.11:2379 | 85c33cde25893026 |   3.5.4 |   20 kB |     false |      false |         2 |         23 |                 23 |        |
| https://43.135.160.117:2379 | ed52e80ca78eb287 |   3.5.4 |   20 kB |     false |      false |         2 |         23 |                 23 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
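The TLS and endpoint flags are repeated in every command of this step. etcdctl also reads them from ETCDCTL_-prefixed environment variables, so they can be set once per shell session. A minimal sketch using the paths and endpoints from the article:

```shell
# Shared etcdctl settings, equivalent to --cacert / --cert / --key / --endpoints.
export ETCDCTL_API=3
export ETCDCTL_CACERT=/data/etcd/ssl/ca.pem
export ETCDCTL_CERT=/data/etcd/ssl/etcd.pem
export ETCDCTL_KEY=/data/etcd/ssl/etcd-key.pem
export ETCDCTL_ENDPOINTS=https://43.129.25.161:2379,https://43.134.80.11:2379,https://43.135.160.117:2379

# The checks from this step then shrink to:
#   /data/etcd/bin/etcdctl -w table endpoint health
#   /data/etcd/bin/etcdctl -w table member list
#   /data/etcd/bin/etcdctl -w table endpoint status
```

As far as I know, etcdctl refuses to run when a variable and its matching flag are both set, so use one style or the other in a given command.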




8. Raise the system resource limits

cp /etc/security/limits.conf /etc/security/limits.conf.bak
cat >>/etc/security/limits.conf <<EOF
* soft nproc 65535
* hard nproc 65535
* soft nofile 65535
* hard nofile 65535
EOF
echo "ulimit -SHn 65535" >> /etc/profile
echo "ulimit -SHn 65535" >> /etc/rc.local
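Note that /etc/security/limits.conf applies to login sessions through PAM, while a systemd service such as etcd.service takes its file limit from LimitNOFILE in the unit file. To see what a running process actually received, read /proc/<pid>/limits. The helper name below is hypothetical, and the sketch is Linux only.

```shell
# Print the soft "Max open files" limit of the process with PID $1.
open_files_limit() {
  awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}

# Limit of the current shell:
open_files_limit $$

# Limit of the running etcd service (assumes pidof is available):
#   open_files_limit "$(pidof etcd)"
```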

9. Synchronize the system time

mv /etc/localtime /etc/localtime.bak
ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
ntpdate cn.pool.ntp.org && hwclock -w

echo "10 * * * * root /usr/sbin/ntpdate cn.pool.ntp.org >> /var/log/ntpdate.log" >> /etc/crontab

10. Reboot the servers and check the status again

reboot

11. Backing up the etcd cluster

[root@VM-0-12-centos ~]# mkdir -p /data/backup
[root@VM-0-12-centos ~]# /data/etcd/bin/etcdctl  --cacert=/data/etcd/ssl/ca.pem --cert=/data/etcd/ssl/etcd.pem --key=/data/etcd/ssl/etcd-key.pem --endpoints=https://172.19.200.6:2379 snapshot save /data/backup/etcd-202310082136.db
{"level":"info","ts":"2023-10-08T21:36:27.784+0800","caller":"snapshot/v3_snapshot.go:65","msg":"created temporary db file","path":"/data/backup/etcd-202310082136.db.part"}
{"level":"info","ts":"2023-10-08T21:36:27.791+0800","logger":"client","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":"2023-10-08T21:36:27.791+0800","caller":"snapshot/v3_snapshot.go:73","msg":"fetching snapshot","endpoint":"https://172.19.200.6:2379"}
{"level":"info","ts":"2023-10-08T21:36:27.794+0800","logger":"client","caller":"v3/maintenance.go:219","msg":"completed snapshot read; closing"}
{"level":"info","ts":"2023-10-08T21:36:27.796+0800","caller":"snapshot/v3_snapshot.go:88","msg":"fetched snapshot","endpoint":"https://172.19.200.6:2379","size":"20 kB","took":"now"}
{"level":"info","ts":"2023-10-08T21:36:27.796+0800","caller":"snapshot/v3_snapshot.go:97","msg":"saved","path":"/data/backup/etcd-202310082136.db"}
Snapshot saved at /data/backup/etcd-202310082136.db
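A snapshot is only useful if it is taken regularly and old files are cleaned up. The sketch below builds the same etcd-YYYYMMDDHHMM.db file name and keeps only the newest files. The retention count of 7 and the function names are assumptions; the snapshot command is the one from the article.

```shell
BACKUP_DIR=${BACKUP_DIR:-/data/backup}

# Build a file name such as /data/backup/etcd-202310082136.db from the current time.
snapshot_path() {
  echo "$BACKUP_DIR/etcd-$(date +%Y%m%d%H%M).db"
}

# Keep only the newest $2 snapshots in directory $1.
# The timestamp in the name sorts chronologically, so the glob lists the oldest file first.
prune_snapshots() {
  dir=$1; keep=$2
  set -- "$dir"/etcd-*.db
  [ -e "$1" ] || return 0          # no snapshots yet
  while [ "$#" -gt "$keep" ]; do
    rm -f -- "$1"
    shift
  done
}

# Take a snapshot, then prune. Call this from cron on one etcd node.
backup_etcd() {
  mkdir -p "$BACKUP_DIR"
  /data/etcd/bin/etcdctl \
    --cacert=/data/etcd/ssl/ca.pem \
    --cert=/data/etcd/ssl/etcd.pem \
    --key=/data/etcd/ssl/etcd-key.pem \
    --endpoints=https://172.19.200.6:2379 \
    snapshot save "$(snapshot_path)" && prune_snapshots "$BACKUP_DIR" 7
}
```

Pruning runs only after a successful save, so a failing backup never deletes the last good snapshots.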

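12. Restoring the etcd cluster

    1. Locate the etcd snapshot file

    2. Restore it with the commands below. Run them on every node with the same snapshot file, changing --name and --initial-advertise-peer-urls to that node's values

# snapshot restore works offline on local files, so no TLS or endpoint flags are needed.
systemctl stop etcd.service
mv /data/etcd/data /data/etcd/data.bak   # restore will not write into a data directory that already holds data
/data/etcd/bin/etcdctl snapshot restore /data/backup/etcd-202310082136.db \
--name=etcd1 \
--data-dir=/data/etcd/data \
--initial-cluster-token=etcd-cluster \
--initial-advertise-peer-urls=https://43.129.25.161:2380 \
--initial-cluster=etcd1=https://43.129.25.161:2380,etcd2=https://43.134.80.11:2380,etcd3=https://43.135.160.117:2380
systemctl start etcd.service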

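13. Adding an etcd node (scale out)

Example: the new node's internal IP is 172.22.0.8, and the commands below reach it at 43.135.160.118.

    1. Sync the new node's entry into /etc/hosts on all nodes

    2. Add the new node's IPs to the hosts list in /data/etcd/ssl/etcd-csr.json and regenerate the certificate

    3. Copy the certificates to all nodes: scp /data/etcd/ssl/* <node address>:/data/etcd/ssl/

    4. Install the etcd binaries on the new node

    5. Edit etcd.conf: append etcd4=https://43.135.160.118:2380 to ETCD_INITIAL_CLUSTER, sync the change to all nodes, and restart the service. On the new node also set ETCD_INITIAL_CLUSTER_STATE="existing"

    6. Register the new member with member add, as shown below

    7. Start etcd on the new node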

/data/etcd/bin/etcdctl \
--cacert=/data/etcd/ssl/ca.pem \
--cert=/data/etcd/ssl/etcd.pem \
--key=/data/etcd/ssl/etcd-key.pem \
--endpoints="https://43.129.25.161:2379"  \
member add etcd4 --peer-urls=https://43.135.160.118:2380

Run member list -w table to check the member information. The new member stays unstarted until etcd is running on the new node:

/data/etcd/bin/etcdctl -w table \
--cacert=/data/etcd/ssl/ca.pem \
--cert=/data/etcd/ssl/etcd.pem \
--key=/data/etcd/ssl/etcd-key.pem \
--endpoints=https://43.129.25.161:2379,https://43.134.80.11:2379,https://43.135.160.117:2379 \
member list
+------------------+-----------+-------+-----------------------------+-----------------------------+------------+
|        ID        |  STATUS   | NAME  |         PEER ADDRS          |        CLIENT ADDRS         | IS LEARNER |
+------------------+-----------+-------+-----------------------------+-----------------------------+------------+
| 7072fcb185597cf2 |   started | etcd1 |  https://43.129.25.161:2380 |  https://43.129.25.161:2379 |      false |
| 85c33cde25893026 |   started | etcd2 |   https://43.134.80.11:2380 |   https://43.134.80.11:2379 |      false |
| ed52e80ca78eb287 |   started | etcd3 | https://43.135.160.117:2380 | https://43.135.160.117:2379 |      false |
| ee2c6ab6d3897479 | unstarted |       | https://43.135.160.118:2380 |                             |      false |
+------------------+-----------+-------+-----------------------------+-----------------------------+------------+
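When deciding how many nodes to add, keep the Raft majority rule in mind: a cluster of n voting members needs floor(n/2)+1 of them to agree. The arithmetic below shows why growing from 3 to 4 members does not let the cluster survive more failures. The function names are hypothetical.

```shell
# Members that must agree for the cluster to make progress.
quorum()    { echo $(( $1 / 2 + 1 )); }
# Members that can fail while the cluster stays available.
tolerance() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 3 4 5; do
  echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(tolerance "$n")"
done
# members=3 quorum=2 tolerated_failures=1
# members=4 quorum=3 tolerated_failures=1
# members=5 quorum=3 tolerated_failures=2
```

This is why odd cluster sizes are the usual choice: a fourth member adds replication traffic across regions without adding fault tolerance.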

14. Removing an etcd node (scale in)

  1. Get the ID of the node

  2. Remove the member by its ID

/data/etcd/bin/etcdctl -w table \
--cacert=/data/etcd/ssl/ca.pem \
--cert=/data/etcd/ssl/etcd.pem \
--key=/data/etcd/ssl/etcd-key.pem \
--endpoints=https://43.129.25.161:2379,https://43.134.80.11:2379,https://43.135.160.117:2379,https://43.135.160.118:2379 \
endpoint status


+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|  https://43.129.25.161:2379 | 7072fcb185597cf2 |   3.5.4 |   20 kB |      true |      false |         2 |         23 |                 23 |        |
|   https://43.134.80.11:2379 | 85c33cde25893026 |   3.5.4 |   20 kB |     false |      false |         2 |         23 |                 23 |        |
| https://43.135.160.117:2379 | ed52e80ca78eb287 |   3.5.4 |   20 kB |     false |      false |         2 |         23 |                 23 |        |
| https://43.135.160.118:2379 | ee2c6ab6d3897479 |   3.5.4 |   20 kB |     false |      false |        15 |        107 |                107 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+


/data/etcd/bin/etcdctl  \
--cacert=/data/etcd/ssl/ca.pem \
--cert=/data/etcd/ssl/etcd.pem \
--key=/data/etcd/ssl/etcd-key.pem \
--endpoints=https://43.129.25.161:2379,https://43.134.80.11:2379,https://43.135.160.117:2379 \
member remove <ID>
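The <ID> placeholder is the hexadecimal member ID from the ID column. When a node is removed from a script, the ID can be looked up by name from the default (non-table) output of member list, which prints one comma-separated line per member. The helper below is a hypothetical sketch, and the sample input is built from the values in the tables above.

```shell
# Read `etcdctl member list` (default output) on stdin and print the ID of member $1.
member_id_by_name() {
  awk -F', ' -v name="$1" '$3 == name { print $1 }'
}

# Sample input in the default "ID, status, name, peer URLs, client URLs, is-learner" layout.
sample='7072fcb185597cf2, started, etcd1, https://43.129.25.161:2380, https://43.129.25.161:2379, false
85c33cde25893026, started, etcd2, https://43.134.80.11:2380, https://43.134.80.11:2379, false
ed52e80ca78eb287, started, etcd3, https://43.135.160.117:2380, https://43.135.160.117:2379, false'

printf '%s\n' "$sample" | member_id_by_name etcd3
# prints: ed52e80ca78eb287
```

On a live cluster the sample would be replaced by the output of the member list command without -w table, and the result passed to member remove.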
