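etcd is a distributed key-value store that Kubernetes (k8s) uses to hold cluster configuration and runtime state.
kubeadm, the official k8s deployment tool, can bring up an etcd cluster quickly, but it has a drawback: etcd management becomes coupled to the k8s cluster.
In production we would rather keep the database independent of other systems.
Deploying a standalone external etcd cluster first and the k8s cluster afterwards makes later high-availability deployments easier and simplifies operations, so it is the more robust production setup.
An external etcd cluster is flexible to deploy: it can have a single node or several, and its nodes can be k8s cluster nodes or separate machines outside the k8s cluster.
Besides manual deployment from binaries there is an automation tool, etcdadm, but it is still a development release and is not recommended for production.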
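When choosing the node count, it helps to remember that etcd uses Raft, so writes need a majority of members. The sketch below only does that arithmetic; the function names quorum and fault_tolerance are made up for illustration.

```shell
#!/bin/sh
# quorum N: members that must agree in a Raft cluster of N members (a majority).
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# fault_tolerance N: members that can fail while a majority is still reachable.
fault_tolerance() {
  echo $(( ($1 - 1) / 2 ))
}

# Print the trade-off for small cluster sizes.
for n in 1 2 3 4 5; do
  echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

An even member count tolerates no more failures than the odd count just below it, which is why 3 or 5 nodes are the usual choices.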
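| Node name | Private IP | Public IP |
|:---|:---|:---|
| etcd1 | 172.19.200.6 | 43.129.25.161 |
| etcd2 | 172.22.0.10 | 43.134.80.11 |
| etcd3 | 172.26.0.9 | 43.135.160.117 |
Note on private IPs: do not put the nodes in the same subnet, to avoid conflicts.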
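1. On every server, write the private and public IPs of all three servers, with their hostnames, into the hosts file
If the cluster only communicates over the private network, the public IP entries can be left out.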
cat >> /etc/hosts << EOF
172.19.200.6 etcd1
43.129.25.161 etcd1
172.22.0.10 etcd2
43.134.80.11 etcd2
172.26.0.9 etcd3
43.135.160.117 etcd3
EOF
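Because the command above appends with >>, running it twice duplicates every entry. A minimal sketch of an idempotent alternative follows; add_host_entry is a hypothetical helper, not part of any tool, and it takes the target file as an argument so it can be tried on a scratch file first.

```shell
#!/bin/sh
# add_host_entry FILE IP NAME
# Append "IP NAME" to FILE unless that exact line is already there.
add_host_entry() {
  file=$1
  ip=$2
  name=$3
  # -x matches the whole line, -F treats the pattern as a fixed string.
  if ! grep -qxF "$ip $name" "$file" 2>/dev/null; then
    printf '%s %s\n' "$ip" "$name" >> "$file"
  fi
}

# Example (run as root on each node):
#   add_host_entry /etc/hosts 172.19.200.6 etcd1
#   add_host_entry /etc/hosts 172.22.0.10  etcd2
#   add_host_entry /etc/hosts 172.26.0.9   etcd3
```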
2. Create the storage directories (all nodes)
mkdir -p /data/etcd/data /data/etcd/bin /data/etcd/ssl
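3. Prepare the certificate tooling (primary node)
etcd uses certificates for authentication. They can be created with either cfssl or openssl; this article uses cfssl. Install the cfssl suite on the primary node etcd1, create the certificates there, and copy them to the other nodes.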
wget https://github.com/cloudflare/cfssl/releases/download/v1.6.2/cfssl_1.6.2_linux_amd64 -O /data/etcd/bin/cfssl
wget https://github.com/cloudflare/cfssl/releases/download/v1.6.2/cfssljson_1.6.2_linux_amd64 -O /data/etcd/bin/cfssljson
wget https://github.com/cloudflare/cfssl/releases/download/v1.6.2/cfssl-certinfo_1.6.2_linux_amd64 -O /data/etcd/bin/cfssl-certinfo
chmod +x /data/etcd/bin/cfssl*
4. Create the CA root certificate (primary node)
4.1 Fill in the CA's CSR in JSON format
A CSR (Certificate Signing Request) works like an application form: it records the applicant's basic information.
/data/etcd/bin/cfssl print-defaults csr > /data/etcd/ssl/ca-csr.json
vim /data/etcd/ssl/ca-csr.json
{
"CN": "etcd",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "Guangdong",
"L": "GuangZhou",
"O": "etcd"
}
]
}
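4.2 Create the CA root certificate with the prefix ca, stored in /data/etcd/ssl/
ca.pem is the CA certificate (it carries the public key) and ca-key.pem is the private key. With the root certificate in place, this server can act as a CA and issue certificates for etcd.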
/data/etcd/bin/cfssl gencert -initca /data/etcd/ssl/ca-csr.json | /data/etcd/bin/cfssljson -bare /data/etcd/ssl/ca
ls /data/etcd/ssl
ca.csr ca-csr.json ca-key.pem ca.pem
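4.3 Configure the signing policy
profiles: each profile sets certificate parameters for one role. Only an etcd profile is defined here; add more profiles if you need them. The important parameters are the expiry and the usages (signing, key encipherment, server auth, client auth).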
/data/etcd/bin/cfssl print-defaults config > /data/etcd/ssl/ca-config.json
cat > /data/etcd/ssl/ca-config.json << EOF
{
"signing": {
"default": {
"expiry": "87600h"
},
"profiles": {
"etcd": {
"expiry": "87600h",
"usages": [
"signing",
"key encipherment",
"server auth",
"client auth"
]
}
}
}
}
EOF
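4.4 Fill in the CSR for the etcd certificate
Unlike the CA CSR, this one has a hosts field. It must list the private and public IPs of every etcd node (private IPs alone are enough if the cluster never uses the public addresses). If the etcd configuration uses the loopback address, add that as well. If etcd nodes are added later, update the etcd CSR first and regenerate the certificate.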
/data/etcd/bin/cfssl print-defaults csr > /data/etcd/ssl/etcd-csr.json
cat > /data/etcd/ssl/etcd-csr.json << EOF
{
"CN": "etcd",
"hosts": [
"127.0.0.1",
"172.19.200.6",
"43.129.25.161",
"172.22.0.10",
"43.134.80.11",
"172.26.0.9",
"43.135.160.117"
],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "Guangdong",
"L": "GuangZhou",
"O": "etcd"
}
]
}
EOF
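Since the hosts list has to be edited every time a node is added, it can be generated from a list of addresses instead of typed by hand. This is only a sketch; csr_hosts_json is a hypothetical helper name.

```shell
#!/bin/sh
# csr_hosts_json IP...
# Print the given addresses as a JSON array suitable for the "hosts" field.
csr_hosts_json() {
  out=''
  for ip in "$@"; do
    # Prefix a ", " separator for every element after the first.
    out="${out:+$out, }\"$ip\""
  done
  printf '[%s]\n' "$out"
}

# Example: the seven addresses used in etcd-csr.json above.
csr_hosts_json 127.0.0.1 \
  172.19.200.6 43.129.25.161 \
  172.22.0.10 43.134.80.11 \
  172.26.0.9 43.135.160.117
```

The printed array can be pasted into etcd-csr.json, or substituted into the heredoc with $(csr_hosts_json ...).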
4.5 Create the etcd certificate with the prefix etcd, stored in /data/etcd/ssl/
/data/etcd/bin/cfssl gencert \
-ca=/data/etcd/ssl/ca.pem \
-ca-key=/data/etcd/ssl/ca-key.pem \
--config=/data/etcd/ssl/ca-config.json --profile=etcd \
/data/etcd/ssl/etcd-csr.json | /data/etcd/bin/cfssljson -bare /data/etcd/ssl/etcd
ls /data/etcd/ssl/
ca-config.json ca.csr ca-csr.json ca-key.pem ca.pem etcd.csr etcd-csr.json etcd-key.pem etcd.pem
4.6 Distribute the etcd certificates to the other etcd nodes. The ssl directory must already exist on etcd2 and etcd3.
scp /data/etcd/ssl/* etcd2:/data/etcd/ssl/
scp /data/etcd/ssl/* etcd3:/data/etcd/ssl/
5. Deploy the etcd cluster (all nodes)
cd /usr/local/src
wget https://github.com/etcd-io/etcd/releases/download/v3.5.4/etcd-v3.5.4-linux-amd64.tar.gz
tar zxvf etcd-v3.5.4-linux-amd64.tar.gz
cp etcd-v3.5.4-linux-amd64/etcd* /data/etcd/bin/
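Member settings. These URLs use addresses that exist on the local host (private IP, loopback IP):
ETCD_LISTEN_PEER_URLS: URLs this etcd member listens on for peer traffic
ETCD_LISTEN_CLIENT_URLS: URLs this etcd member listens on for client traffic
Cluster settings. These URLs can use private IPs or public IPs:
ETCD_INITIAL_ADVERTISE_PEER_URLS: peer URLs this member advertises to the rest of the cluster at bootstrap
ETCD_INITIAL_CLUSTER: the peer URLs of all etcd nodes in the initial cluster
ETCD_ADVERTISE_CLIENT_URLS: client URLs this member advertises to clients
The security settings use the certificates created above, and every URL uses https.
The configuration below is for etcd1. The other etcd nodes are configured the same way; only ETCD_NAME and the URL variables that carry the node's own address change.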
cat > /data/etcd/etcd.conf << EOF
# [Member] member settings
ETCD_NAME="etcd1"
ETCD_DATA_DIR="/data/etcd/data"
ETCD_LISTEN_PEER_URLS="https://172.19.200.6:2380"
ETCD_LISTEN_CLIENT_URLS="https://172.19.200.6:2379,https://127.0.0.1:2379"
# [Cluster] cluster settings
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://43.129.25.161:2380"
ETCD_INITIAL_CLUSTER="etcd1=https://43.129.25.161:2380,etcd2=https://43.134.80.11:2380,etcd3=https://43.135.160.117:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://43.129.25.161:2379"
# [Security] TLS settings
ETCD_PEER_CLIENT_CERT_AUTH="true"
ETCD_PEER_CERT_FILE="/data/etcd/ssl/etcd.pem"
ETCD_PEER_KEY_FILE="/data/etcd/ssl/etcd-key.pem"
ETCD_PEER_TRUSTED_CA_FILE="/data/etcd/ssl/ca.pem"
ETCD_CERT_FILE="/data/etcd/ssl/etcd.pem"
ETCD_KEY_FILE="/data/etcd/ssl/etcd-key.pem"
ETCD_TRUSTED_CA_FILE="/data/etcd/ssl/ca.pem"
EOF
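Only a handful of variables differ between nodes, so the per-node part can be rendered from a few arguments. The sketch below assumes the layout used in this article (listen on the private IP, advertise the public IP); render_etcd_conf is a hypothetical helper and prints only the node-specific variables.

```shell
#!/bin/sh
# render_etcd_conf NAME LISTEN_IP ADVERTISE_IP INITIAL_CLUSTER
# Print the node-specific etcd environment variables to stdout.
render_etcd_conf() {
  name=$1
  listen_ip=$2
  advertise_ip=$3
  cluster=$4
  cat << EOF
ETCD_NAME="${name}"
ETCD_LISTEN_PEER_URLS="https://${listen_ip}:2380"
ETCD_LISTEN_CLIENT_URLS="https://${listen_ip}:2379,https://127.0.0.1:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://${advertise_ip}:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://${advertise_ip}:2379"
ETCD_INITIAL_CLUSTER="${cluster}"
EOF
}

# Example: the node-specific lines for etcd2.
CLUSTER="etcd1=https://43.129.25.161:2380,etcd2=https://43.134.80.11:2380,etcd3=https://43.135.160.117:2380"
render_etcd_conf etcd2 172.22.0.10 43.134.80.11 "$CLUSTER"
```

The remaining lines (data directory, cluster token, certificate paths) are identical on every node and can be copied unchanged.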
6. Start etcd as a systemd service (all nodes)
cat > /usr/lib/systemd/system/etcd.service << EOF
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
[Service]
Type=notify
EnvironmentFile=/data/etcd/etcd.conf
WorkingDirectory=/data/etcd/data/
ExecStart=/data/etcd/bin/etcd
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable etcd.service
systemctl start etcd.service
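systemctl status etcd.service # check whether etcd is running
set -a; . /data/etcd/etcd.conf; set +a; /data/etcd/bin/etcd # if the service fails to start, run etcd in the foreground with the same environment to troubleshoot (etcd.conf holds environment variables, not the YAML that --config-file expects)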
7. Check the health of the etcd cluster from any etcd node
/data/etcd/bin/etcdctl -w table \
--cacert=/data/etcd/ssl/ca.pem \
--cert=/data/etcd/ssl/etcd.pem \
--key=/data/etcd/ssl/etcd-key.pem \
--endpoints=https://43.129.25.161:2379,https://43.134.80.11:2379,https://43.135.160.117:2379 \
endpoint health # check whether each endpoint is healthy
+-----------------------------+--------+--------------+-------+
| ENDPOINT | HEALTH | TOOK | ERROR |
+-----------------------------+--------+--------------+-------+
| https://43.129.25.161:2379 | true | 46.344875ms | |
| https://43.134.80.11:2379 | true | 212.562093ms | | ---> different region, so latency is somewhat high
| https://43.135.160.117:2379 | true | 771.006121ms | | ---> different region, so latency is somewhat high
+-----------------------------+--------+--------------+-------+
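Every etcdctl call in this article repeats the same four TLS and endpoint flags. One way to cut the repetition is a small wrapper function; this is a sketch, and the names etcdctl_tls, ETCDCTL_BIN, ETCD_SSL_DIR and ETCD_ENDPOINTS are this example's own, not etcd settings.

```shell
#!/bin/sh
# Defaults match the paths and endpoints used in this article; override as needed.
ETCDCTL_BIN=${ETCDCTL_BIN:-/data/etcd/bin/etcdctl}
ETCD_SSL_DIR=${ETCD_SSL_DIR:-/data/etcd/ssl}
ETCD_ENDPOINTS=${ETCD_ENDPOINTS:-https://43.129.25.161:2379,https://43.134.80.11:2379,https://43.135.160.117:2379}

# etcdctl_tls ARGS...
# Run etcdctl with the cluster's TLS flags and endpoints, then the given arguments.
etcdctl_tls() {
  "$ETCDCTL_BIN" \
    --cacert="$ETCD_SSL_DIR/ca.pem" \
    --cert="$ETCD_SSL_DIR/etcd.pem" \
    --key="$ETCD_SSL_DIR/etcd-key.pem" \
    --endpoints="$ETCD_ENDPOINTS" \
    "$@"
}

# Examples:
#   etcdctl_tls -w table endpoint health
#   etcdctl_tls -w table member list
```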
/data/etcd/bin/etcdctl -w table \
--cacert=/data/etcd/ssl/ca.pem \
--cert=/data/etcd/ssl/etcd.pem \
--key=/data/etcd/ssl/etcd-key.pem \
--endpoints=https://43.129.25.161:2379,https://43.134.80.11:2379,https://43.135.160.117:2379 \
member list # a status of "started" shows that the member is running normally
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
| 7072fcb185597cf2 | started | etcd1 | https://43.129.25.161:2380 | https://43.129.25.161:2379 | false |
| 85c33cde25893026 | started | etcd2 | https://43.134.80.11:2380 | https://43.134.80.11:2379 | false |
| ed52e80ca78eb287 | started | etcd3 | https://43.135.160.117:2380 | https://43.135.160.117:2379 | false |
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
/data/etcd/bin/etcdctl -w table \
--cacert=/data/etcd/ssl/ca.pem \
--cert=/data/etcd/ssl/etcd.pem \
--key=/data/etcd/ssl/etcd-key.pem \
--endpoints=https://43.129.25.161:2379,https://43.134.80.11:2379,https://43.135.160.117:2379 \
endpoint status # show which node in the cluster is the leader
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://43.129.25.161:2379 | 7072fcb185597cf2 | 3.5.4 | 20 kB | true | false | 2 | 23 | 23 | |
| https://43.134.80.11:2379 | 85c33cde25893026 | 3.5.4 | 20 kB | false | false | 2 | 23 | 23 | |
| https://43.135.160.117:2379 | ed52e80ca78eb287 | 3.5.4 | 20 kB | false | false | 2 | 23 | 23 | |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
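For scripting it can be handy to pull just the leader's endpoint out of that table. The following sketch parses the table layout shown above with awk; leader_endpoint is a hypothetical helper that reads the table on stdin and relies on IS LEADER being the fifth column.

```shell
#!/bin/sh
# leader_endpoint
# Read "endpoint status -w table" output on stdin and print the endpoint
# whose IS LEADER column is "true".
leader_endpoint() {
  awk -F'|' '
    NF > 6 {
      # With a leading "|", field 2 is ENDPOINT and field 6 is IS LEADER.
      ep = $2; leader = $6
      gsub(/^[ \t]+|[ \t]+$/, "", ep)
      gsub(/^[ \t]+|[ \t]+$/, "", leader)
      if (leader == "true") print ep
    }'
}

# Example:
#   /data/etcd/bin/etcdctl -w table ... endpoint status | leader_endpoint
```

Border lines contain no | characters and the header row never has the value true, so both are skipped.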
8. Raise the system resource limits
cp /etc/security/limits.conf /etc/security/limits.conf.bak
cat >>/etc/security/limits.conf <<EOF
* soft nproc 65535
* hard nproc 65535
* soft nofile 65535
* hard nofile 65535
EOF
echo "ulimit -SHn 65535" >> /etc/profile
echo "ulimit -SHn 65535" >> /etc/rc.local
9. Set the time zone and synchronize the clock
mv /etc/localtime /etc/localtime.bak
ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
ntpdate cn.pool.ntp.org && hwclock -w
echo "10 * * * * root /usr/sbin/ntpdate cn.pool.ntp.org >> /var/log/ntpdate.log" >> /etc/crontab
10. Reboot the servers and check the cluster status again
11. Backing up the etcd cluster
mkdir -p /data/backup
/data/etcd/bin/etcdctl \
--cacert=/data/etcd/ssl/ca.pem \
--cert=/data/etcd/ssl/etcd.pem \
--key=/data/etcd/ssl/etcd-key.pem \
--endpoints=https://172.19.200.6:2379 \
snapshot save /data/backup/etcd-202310082136.db
{"level":"info","ts":"2023-10-08T21:36:27.784+0800","caller":"snapshot/v3_snapshot.go:65","msg":"created temporary db file","path":"/data/backup/etcd-202310082136.db.part"}
{"level":"info","ts":"2023-10-08T21:36:27.791+0800","logger":"client","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":"2023-10-08T21:36:27.791+0800","caller":"snapshot/v3_snapshot.go:73","msg":"fetching snapshot","endpoint":"https://172.19.200.6:2379"}
{"level":"info","ts":"2023-10-08T21:36:27.794+0800","logger":"client","caller":"v3/maintenance.go:219","msg":"completed snapshot read; closing"}
{"level":"info","ts":"2023-10-08T21:36:27.796+0800","caller":"snapshot/v3_snapshot.go:88","msg":"fetched snapshot","endpoint":"https://172.19.200.6:2379","size":"20 kB","took":"now"}
{"level":"info","ts":"2023-10-08T21:36:27.796+0800","caller":"snapshot/v3_snapshot.go:97","msg":"saved","path":"/data/backup/etcd-202310082136.db"}
Snapshot saved at /data/backup/etcd-202310082136.db
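For recurring backups, the file name can be derived from the current time and old snapshots pruned. This is a rough sketch: snapshot_path, prune_backups and BACKUP_DIR are names invented here, and the retention count is an arbitrary choice.

```shell
#!/bin/sh
BACKUP_DIR=${BACKUP_DIR:-/data/backup}

# snapshot_path
# Print a snapshot file name of the form etcd-YYYYmmddHHMM.db under BACKUP_DIR.
snapshot_path() {
  printf '%s/etcd-%s.db\n' "$BACKUP_DIR" "$(date +%Y%m%d%H%M)"
}

# prune_backups KEEP
# Delete the oldest snapshots so that at most KEEP remain. The timestamp in the
# file name sorts chronologically, so a plain sort puts the oldest first.
prune_backups() {
  keep=$1
  total=$(ls -1 "$BACKUP_DIR"/etcd-*.db 2>/dev/null | wc -l)
  excess=$(( total - keep ))
  [ "$excess" -gt 0 ] || return 0
  ls -1 "$BACKUP_DIR"/etcd-*.db | sort | head -n "$excess" |
    while IFS= read -r f; do
      rm -f -- "$f"
    done
}

# Example, e.g. from a cron job:
#   /data/etcd/bin/etcdctl ... snapshot save "$(snapshot_path)" && prune_backups 7
```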
12. Restoring the etcd cluster
/data/etcd/bin/etcdctl \
--cacert=/data/etcd/ssl/ca.pem \
--cert=/data/etcd/ssl/etcd.pem \
--key=/data/etcd/ssl/etcd-key.pem \
--endpoints=https://172.19.200.6:2379 \
snapshot restore /data/backup/etcd-202310082136.db
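Run as shown, the restore writes into a default directory under the current working directory rather than /data/etcd/data, and a multi-node cluster normally needs the restore repeated on each member with that member's own name and peer URL. The sketch below wraps the etcdctl v3.5 restore flags for that purpose; restore_member is a hypothetical helper, and it assumes etcd is stopped and the target data directory does not exist yet.

```shell
#!/bin/sh
ETCDCTL_BIN=${ETCDCTL_BIN:-/data/etcd/bin/etcdctl}

# restore_member SNAPSHOT NAME PEER_URL INITIAL_CLUSTER DATA_DIR
# Restore one member's data directory from a snapshot file.
restore_member() {
  snapshot=$1
  name=$2
  peer_url=$3
  cluster=$4
  data_dir=$5
  "$ETCDCTL_BIN" snapshot restore "$snapshot" \
    --name="$name" \
    --initial-cluster="$cluster" \
    --initial-cluster-token="etcd-cluster" \
    --initial-advertise-peer-urls="$peer_url" \
    --data-dir="$data_dir"
}

# Example for etcd1 (stop etcd and move the old /data/etcd/data aside first):
#   restore_member /data/backup/etcd-202310082136.db etcd1 \
#     https://43.129.25.161:2380 \
#     "etcd1=https://43.129.25.161:2380,etcd2=https://43.134.80.11:2380,etcd3=https://43.135.160.117:2380" \
#     /data/etcd/data
```

The restore reads a local file, so the same snapshot has to be copied to every node before running it there.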
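13. Adding an etcd node (scale out)
For example, the new node's IP is 172.22.0.8.
Add the new node's addresses to the hosts list in /data/etcd/ssl/etcd-csr.json and regenerate the etcd certificate.
Sync the certificates to every node: scp /data/etcd/ssl/* <node address>:/data/etcd/ssl/
Edit etcd.conf and append etcd4=https://43.135.160.118:2380 (the peer URL used in the member add command below) to ETCD_INITIAL_CLUSTER, sync the change to all nodes, and restart the service.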
/data/etcd/bin/etcdctl \
--cacert=/data/etcd/ssl/ca.pem \
--cert=/data/etcd/ssl/etcd.pem \
--key=/data/etcd/ssl/etcd-key.pem \
--endpoints="https://43.129.25.161:2379" \
member add etcd4 --peer-urls=https://43.135.160.118:2380
/data/etcd/bin/etcdctl -w table \
--cacert=/data/etcd/ssl/ca.pem \
--cert=/data/etcd/ssl/etcd.pem \
--key=/data/etcd/ssl/etcd-key.pem \
--endpoints=https://43.129.25.161:2379,https://43.134.80.11:2379,https://43.135.160.117:2379 \
member list
+------------------+-----------+-------+-----------------------------+-----------------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+-----------+-------+-----------------------------+-----------------------------+------------+
| 7072fcb185597cf2 | started | etcd1 | https://43.129.25.161:2380 | https://43.129.25.161:2379 | false |
| 85c33cde25893026 | started | etcd2 | https://43.134.80.11:2380 | https://43.134.80.11:2379 | false |
| ed52e80ca78eb287 | started | etcd3 | https://43.135.160.117:2380 | https://43.135.160.117:2379 | false |
| ee2c6ab6d3897479 | unstarted | | https://43.135.160.118:2380 | | false |
+------------------+-----------+-------+-----------------------------+-----------------------------+------------+
Run ... member list -w table to view the member information.
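The new member stays unstarted until etcd is launched on the new node with settings that tell it to join the existing cluster instead of bootstrapping a new one. As a sketch, these are the cluster-related variables for the new node's etcd.conf, assuming it is named etcd4 and uses the peer URL passed to member add; the remaining variables follow the etcd1 example.

```shell
# Cluster-related variables for the new node (etcd4).
ETCD_NAME="etcd4"
ETCD_INITIAL_CLUSTER="etcd1=https://43.129.25.161:2380,etcd2=https://43.134.80.11:2380,etcd3=https://43.135.160.117:2380,etcd4=https://43.135.160.118:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://43.135.160.118:2380"
# "existing" makes etcd join the running cluster rather than create a new one.
ETCD_INITIAL_CLUSTER_STATE="existing"
```

The member add command prints the same four variables in its output, which is a convenient cross-check.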
14. Checking the expanded cluster and removing an etcd node (scale in)
/data/etcd/bin/etcdctl -w table \
--cacert=/data/etcd/ssl/ca.pem \
--cert=/data/etcd/ssl/etcd.pem \
--key=/data/etcd/ssl/etcd-key.pem \
--endpoints=https://43.129.25.161:2379,https://43.134.80.11:2379,https://43.135.160.117:2379,https://43.135.160.118:2379 \
endpoint status
+--------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+--------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://43.129.25.161:2379 | 7072fcb185597cf2 | 3.5.4 | 20 kB | true | false | 2 | 23 | 23 | |
| https://43.134.80.11:2379 | 85c33cde25893026 | 3.5.4 | 20 kB | false | false | 2 | 23 | 23 | |
| https://43.135.160.117:2379 | ed52e80ca78eb287 | 3.5.4 | 20 kB | false | false | 2 | 23 | 23 | |
| https://43.135.160.118:2379 | ee2c6ab6d3897479 | 3.5.4 | 20 kB | false | false | 15 | 107 | 107 | |
+--------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
/data/etcd/bin/etcdctl \
--cacert=/data/etcd/ssl/ca.pem \
--cert=/data/etcd/ssl/etcd.pem \
--key=/data/etcd/ssl/etcd-key.pem \
--endpoints=https://43.129.25.161:2379,https://43.134.80.11:2379,https://43.135.160.117:2379 \
member remove <ID>
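The ID placeholder has to be filled in from the member list output. A small sketch that looks it up by member name from the table format shown earlier; member_id_by_name is a hypothetical helper and assumes ID is the first column and NAME the third.

```shell
#!/bin/sh
# member_id_by_name NAME
# Read "member list -w table" output on stdin and print the ID of the
# member whose NAME column equals NAME.
member_id_by_name() {
  awk -F'|' -v want="$1" '
    NF > 4 {
      # With a leading "|", field 2 is ID and field 4 is NAME.
      id = $2; name = $4
      gsub(/^[ \t]+|[ \t]+$/, "", id)
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      if (name == want) print id
    }'
}

# Example:
#   id=$(/data/etcd/bin/etcdctl -w table ... member list | member_id_by_name etcd3)
#   /data/etcd/bin/etcdctl ... member remove "$id"
```

Removing a member lowers the cluster size, so it is worth confirming that the remaining members still form a healthy majority before and after the removal.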