Management-Plane HA Transformation Q&A
Keepalived Overview
Keepalived provides load balancing and high availability for Linux systems. The load-balancing capability comes from IPVS (IP Virtual Server), the in-kernel module of the LVS project.
Keepalived runs on Linux and drives the kernel's LVS facility to create virtual servers. For example, if we run a Keepalived instance on each of two servers, LVS exposes a virtual IP (VIP), but only one Keepalived instance holds that VIP at any time, so client requests only reach the master Keepalived node. All traffic therefore lands on a single node, which is configured with the IP addresses and ports of several real servers and spreads the traffic across them using a scheduling algorithm. The other node, the backup Keepalived, stays on standby and receives no traffic.
Keepalived is an open-source project written in C; its main goal is to simplify the configuration of LVS and improve LVS's stability. In short, Keepalived is an extension and enhancement of LVS. Our management plane has no LVS, so a Keepalived-based management-plane HA setup is of limited value.
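As a concrete illustration of the VIP-plus-real-servers setup described above, here is a minimal keepalived.conf sketch; the interface name, all addresses, and the parameters are placeholders rather than values from our deployment:

```conf
! Minimal sketch: one VRRP instance claiming the VIP, plus an IPVS
! virtual server spreading traffic over two real servers.
vrrp_instance VI_1 {
    state MASTER             ! the peer node is configured as BACKUP
    interface eth0           ! NIC carrying VRRP advertisements
    virtual_router_id 51     ! must match on master and backup
    priority 100             ! highest priority wins the VIP
    advert_int 1             ! advertisement interval, in seconds
    virtual_ipaddress {
        192.0.2.100          ! the VIP clients connect to
    }
}

virtual_server 192.0.2.100 80 {
    delay_loop 6             ! health-check interval, in seconds
    lb_algo rr               ! IPVS scheduling algorithm (round robin)
    lb_kind DR               ! direct-routing forwarding mode
    protocol TCP
    real_server 192.0.2.11 80 {
        weight 1
    }
    real_server 192.0.2.12 80 {
        weight 1
    }
}
```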
```mermaid
---
title: Deployment View
---
graph TD;
    subgraph LAN["Local Area Network"]
        switch[("Switch")]
    end
    subgraph Internet["Internet"]
        userDevice1(("User Device 1"))
        userDevice2(("User Device 2"))
        router[("Router")]
        userDevice1 --> router
        userDevice2 --> router
        router --> switch
    end
    subgraph Servers["Server Cluster"]
        master[("Primary Server")]
        backup1[("Backup Server 1")]
        backup2[("Backup Server 2")]
        switch --> master
        switch --> backup1
        switch --> backup2
    end
    VIP(("Virtual IP")) -.-> master
    master -. "failover" .-> backup1
    master -. "failover" .-> backup2
    classDef default fill:#1E90FF,stroke:#333,stroke-width:2px;
    classDef network fill:#708090,stroke:#333,stroke-width:2px;
    class LAN,Internet,Servers network;
```
- Users access the service over the Internet; requests first reach the router and are then forwarded through the switch.
- The primary server holds the virtual IP (VIP), the single external entry point for the service.
- The backup servers stand ready to take over the VIP if the primary server fails, keeping the service continuously available.
- The VIP points to the primary server but can be migrated quickly to a backup server when needed (the VRRP priority sketch after this list shows how the takeover order is configured).
- Failover is drawn as dashed arrows from the primary server to the backup servers.
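Which node takes over, and in what order, is governed by VRRP priority. Below is a hedged backup-side sketch with placeholder values: the backup advertises a lower priority than the master, and the optional nopreempt keeps a recovered master from immediately reclaiming the VIP and causing a second interruption:

```conf
! Backup-node sketch; virtual_router_id and the VIP must match the master's.
vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 90              ! lower than the master's 100
    advert_int 1
    nopreempt                ! optional: suppress automatic failback
    virtual_ipaddress {
        192.0.2.100
    }
}
```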
```mermaid
---
title: Failover View
---
graph TD;
    subgraph Internet["Internet"]
        userDevice1(("User Device 1"))
        userDevice2(("User Device 2"))
        router[("Router")]
        userDevice1 -->|"service request"| VIP
        userDevice2 -->|"service request"| VIP
    end
    subgraph LAN["Local Area Network"]
        switch[("Switch")]
        VIP(("Virtual IP")) --> switch
    end
    subgraph Servers["Server Cluster"]
        master[("Primary Server")]
        backup1[("Backup Server 1")]
        backup2[("Backup Server 2")]
        switch --> master
        switch --> backup1
        switch --> backup2
    end
    VIP -. "currently points to" .-> master
    master -. "fails over to" .-> backup1
    backup1 -. "fails over to" .-> backup2
    classDef default fill:#1E90FF,stroke:#333,stroke-width:2px;
    classDef network fill:#708090,stroke:#333,stroke-width:2px;
    class Internet,LAN,Servers network;
```
Where Keepalived Sits in a Traditional HA Architecture
```mermaid
---
title: Traditional HA Architecture Deployment View
---
graph TD;
    subgraph Internet["Internet"]
        userDevice1(("User Device 1"))
        userDevice2(("User Device 2"))
        userDevice1 -->|"service request"| VIP
        userDevice2 -->|"service request"| VIP
    end
    subgraph LAN["Local Area Network"]
        switch[("Switch")]
        VIP(("Virtual IP")) --> switch
    end
    subgraph LB_Cluster["Load Balancer Cluster (LVS)"]
        LVS_Master[("LVS Master")]
        LVS_Backup1[("LVS Backup 1")]
        LVS_Backup2[("LVS Backup 2")]
        switch --> LVS_Master
        switch ----> LVS_Backup1
        switch ----> LVS_Backup2
    end
    subgraph Keepalived_Cluster["Keepalived Cluster"]
        Keepalived_Master[("Keepalived Master")]
        Keepalived_Backup1[("Keepalived Backup 1")]
        Keepalived_Backup2[("Keepalived Backup 2")]
        LVS_Master --> Keepalived_Master
        LVS_Backup1 --> Keepalived_Backup1
        LVS_Backup2 --> Keepalived_Backup2
    end
    subgraph Backend_Servers["Backend Servers"]
        server1[("Server 1")]
        server2[("Server 2")]
        server3[("Server 3")]
        LVS_Master --> server1
        LVS_Master --> server2
        LVS_Master --> server3
    end
    VIP -. "currently points to" .-> LVS_Master
    LVS_Master -. "fails over to" .-> LVS_Backup1
    Keepalived_Master -. "manages" .-> LVS_Master
    Keepalived_Backup1 -. "standby management" .-> LVS_Backup1
    Keepalived_Backup2 -. "standby management" .-> LVS_Backup2
    classDef default fill:#1E90FF,stroke:#333,stroke-width:2px;
    classDef network fill:#708090,stroke:#333,stroke-width:2px;
    class Internet,LAN,LB_Cluster,Keepalived_Cluster,Backend_Servers network;
```
- User devices send service requests to the virtual IP (VIP) over the Internet.
- The VIP is managed by Keepalived and points to the LVS master node.
- The LVS master and backup nodes form the load-balancer cluster, which distributes requests to the backend servers.
- The Keepalived cluster (a master plus backups) keeps the LVS cluster highly available, migrating the VIP to a backup node quickly if the LVS master fails.
- The backend servers receive and process the requests forwarded by the LVS master; Keepalived also health-checks them (see the sketch after this list).
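Besides moving the VIP between LVS nodes, Keepalived health-checks the backend real servers and evicts failed ones from the IPVS pool. A sketch with placeholder addresses and a hypothetical /healthz endpoint:

```conf
! Health-check sketch: placeholder addresses, hypothetical /healthz path.
virtual_server 192.0.2.100 80 {
    delay_loop 6                 ! seconds between health-check rounds
    lb_algo wrr                  ! weighted round robin across real servers
    lb_kind DR                   ! direct-routing forwarding mode
    protocol TCP
    real_server 192.0.2.11 80 {
        weight 2
        HTTP_GET {
            url {
                path /healthz    ! hypothetical health endpoint
                status_code 200
            }
            connect_timeout 3
            nb_get_retry 3       ! failures tolerated before eviction
            delay_before_retry 2
        }
    }
    real_server 192.0.2.12 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 3    ! plain TCP connect probe
        }
    }
}
```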
Engine HA Data Flow Before the Transformation
```mermaid
---
title: Engine HA Data Flow Before the Transformation
---
graph TD;
    subgraph Internet["Internet"]
        userDevice1(("Client Device 1"))
        userDevice2(("Client Device 2"))
        router_public[("Public Router | Firewall")]
        userDevice1 -->|"access central control"| router_public
        userDevice2 -->|"access central control"| router_public
    end
    subgraph PublicNetwork["Publicly Accessible Network"]
        switch_public[("Public Switch")]
        router_public --> switch_public
    end
    subgraph SC["Central Control"]
        sc[(sc)]
        sv[(sv)]
        switch_public --> sc
    end
    subgraph DataCenter["Data Center Network"]
        router_datacenter[("Data Center Router | Firewall")]
        switch_datacenter[("Data Center Switch")]
        router_datacenter --> switch_datacenter
    end
    subgraph SEM["Management Plane"]
        VIP(("Virtual IP"))
        subgraph SEMNode1["Management Plane Node 1"]
            other_comp1[("Management Plane Components")]
        end
        subgraph SEMNode2["Management Plane Node 2"]
            other_comp2[("Management Plane Components")]
        end
        subgraph SEMNode3["Management Plane Node 3"]
            other_comp3[("Management Plane Components")]
        end
        subgraph KeepalivedSEM["Management Plane Keepalived"]
            keepalived_master_SEM[("Keepalived Master")]
            keepalived_backup_SEM[("Keepalived Backup")]
            keepalived_master_SEM --> keepalived_backup_SEM
            keepalived_backup_SEM -.-> keepalived_master_SEM
        end
        other_comp1 <-->|"management work such as service provisioning"| keepalived_master_SEM
        VIP -.-> other_comp1 -. "failover" .-> keepalived_backup_SEM
    end
    subgraph SES["Scheduling Cluster"]
        VIP_SES(("Business VIP Table"))
        subgraph SES_LVS["Scheduling LVS"]
            LVS_SES_Master[("LVS Master")]
            lvs_policy(("Policy Management"))
            LVS_SES_Backup[("LVS Backup")]
            LVS_SES_Master -->|"scheduling requests"| LVS_SES_Backup
            LVS_SES_Backup -.->|"failover"| LVS_SES_Master
            LVS_SES_Master -.-> lvs_policy
        end
        subgraph KeepalivedSES["Scheduling Keepalived"]
            keepalived_master_SES[("Keepalived Master")]
            keepalived_backup_SES[("Keepalived Backup")]
            keepalived_master_SES --> keepalived_backup_SES
            keepalived_backup_SES -.-> keepalived_master_SES
        end
        VIP_SES -.-> LVS_SES_Master
        LVS_SES_Master --> keepalived_master_SES
    end
    subgraph SEW["Worker Cluster"]
        subgraph Backend_Servers1["Worker Node Cluster"]
            envoy1[("Envoy")]
            polycuber1[("Polycuber")]
        end
        subgraph Backend_Servers2["Worker Node Cluster"]
            envoy2[("Envoy")]
            polycuber2[("Polycuber")]
        end
        subgraph Backend_Servers3["Worker Node Cluster"]
            envoy3[("Envoy")]
            polycuber3[("Polycuber")]
        end
    end
    subgraph UserServer["Business Service Clusters"]
        subgraph ServerCluster1["Business Service Cluster 1"]
            node1[(node)]
            node2[(node)]
        end
        subgraph ServerCluster2["Business Service Cluster 2"]
            node3[(node)]
        end
        subgraph ServerCluster3["Business Service Cluster 3"]
            node4[(node)]
        end
    end
    subgraph UserSpace["User | Service Access Traffic"]
        user1(("User Access"))
        frontend(("Frontend Business"))
    end
    other_comp1 -->|"manages"| SES
    other_comp1 --->|"manages"| SEW
    LVS_SES_Master -- "business traffic forwarding" --> envoy1 & envoy2 & envoy3
    user1 & frontend --> VIP_SES
    envoy1 -->|"reverse proxy"| node1 & node2 & node3
    envoy2 -->|"redirect"| node4
    envoy3 <--> direct_response["Direct Response"]
    switch_public -->|"accepts traffic from SC"| router_datacenter
    switch_datacenter --> VIP
    classDef network fill:#bbf,stroke:#333,stroke-width:2px;
    class Internet,PublicNetwork,DataCenter network;
    class Internet internet;
    class PublicNetwork publicNet;
    class DataCenter dataCenter;
    class SEM management;
    class SES scheduling;
    class SEW workCluster;
    class UserServer userService;
    class UserSpace userAccess;
    classDef internet fill:#bde0fe,stroke:#333,stroke-width:2px;
    classDef publicNet fill:#fed7b2,stroke:#333,stroke-width:2px;
    classDef dataCenter fill:#d4a5a5,stroke:#333,stroke-width:4px;
    classDef management fill:#fbcfe8,stroke:#333,stroke-width:2px;
    classDef scheduling fill:#fef08a,stroke:#333,stroke-width:2px;
    classDef workCluster fill:#bbf7d0,stroke:#333,stroke-width:2px;
    classDef userService fill:#d9f99d,stroke:#333,stroke-width:2px;
    classDef userAccess fill:#a7f3d0,stroke:#333,stroke-width:2px;
    classDef flowLine stroke:#2563eb,stroke-width:2px,stroke-dasharray: 5, 5;
    linkStyle 3,4,17,18,29,30 stroke:#db2777,stroke-width:12px;
    linkStyle 19,20,21,22,23,24,25,26 stroke:#2563eb,stroke-width:8px;
```
Engine HA Data Flow After the Transformation
```mermaid
---
title: Engine HA Data Flow After the Transformation
---
graph TD;
    subgraph Internet["Internet"]
        userDevice1(("Client Device 1"))
        userDevice2(("Client Device 2"))
        router_public[("Public Router | Firewall")]
        userDevice1 -->|"access central control"| router_public
        userDevice2 -->|"access central control"| router_public
    end
    subgraph PublicNetwork["Publicly Accessible Network"]
        switch_public[("Public Switch")]
        router_public --> switch_public
    end
    subgraph SC["Central Control"]
        sc_m_nic(("Management Port"))
        sc[(sc)]
        sv[(sv)]
        sc_core[(sc-core)]
        switch_public <-.-> sc_m_nic --> sc_core
        sc & sv -.-> sc_core
    end
    subgraph DataCenter["Data Center Network"]
        router_datacenter[("Data Center Router | Firewall")]
        switch_datacenter[("Data Center Switch")]
        router_datacenter <-.-> switch_datacenter
    end
    subgraph SEM["Management Plane"]
        subgraph SEMNode1["Management Plane Node 1"]
            m_nic1(("Management Port"))
            other_comp1[("Management Plane Traffic Entry")]
            controller1[("Management Plane Components")]
            apiserver_node1[apiserver]
            m_nic1 <-.-> other_comp1
            other_comp1 --> controller1 --> apiserver_node1
        end
        subgraph SEMNode2["Management Plane Node 2"]
            m_nic2(("Management Port"))
            other_comp2[("Management Plane Traffic Entry")]
            controller2[("Management Plane Components")]
            apiserver_node2[apiserver]
            m_nic2 <-.-> other_comp2
            other_comp2 --> controller2 --> apiserver_node2
        end
        subgraph SEMNode3["Management Plane Node 3"]
            m_nic3(("Management Port"))
            other_comp3[("Management Plane Traffic Entry")]
            controller3[("Management Plane Components")]
            apiserver_node3[apiserver]
            m_nic3 <-.-> other_comp3
            other_comp3 --> controller3 --> apiserver_node3
        end
        subgraph Apiserver["Apiserver Cluster"]
            apiserver1[(Apiserver)]
            apiserver2[(Apiserver)]
            apiserver3[(Apiserver)]
        end
        apiserver_node1 -.- apiserver1
        apiserver_node2 -.- apiserver2
        apiserver_node3 -.- apiserver3
    end
    subgraph SES["Scheduling Cluster"]
        VIP_SES(("Business VIP Table"))
        subgraph SES_LVS["Scheduling LVS"]
            subgraph SESNode1
                ses_m_nic1(("Management Port"))
                ses_d_nic1(("Business Port"))
                LVS_SES_Master[("LVS Master")]
                lvs_policy(("Policy Management"))
                ses_m_nic1 -.- lvs_policy
            end
            subgraph SESNode2
                ses_m_nic2(("Management Port"))
                ses_d_nic2(("Business Port"))
                LVS_SES_Backup[("LVS Backup")]
            end
            LVS_SES_Master -->|"scheduling requests"| LVS_SES_Backup
            LVS_SES_Backup -.->|"failover"| LVS_SES_Master
            LVS_SES_Master -.-> lvs_policy
        end
        subgraph KeepalivedSES["Scheduling Keepalived"]
            keepalived_master_SES[("Keepalived Master")]
            keepalived_backup_SES[("Keepalived Backup")]
            keepalived_master_SES --> keepalived_backup_SES
            keepalived_backup_SES -.-> keepalived_master_SES
        end
        VIP_SES -.- ses_d_nic1 --> LVS_SES_Master
        LVS_SES_Master --> keepalived_master_SES
    end
    subgraph SEW["Worker Cluster"]
        subgraph Backend_Servers1["Worker Node Cluster"]
            sew_m_nic1(("Management Port"))
            sew_d_nic1(("Business Port"))
            envoy1[("Envoy")]
            polycuber1[("Polycuber")]
            sew_m_nic1 & sew_d_nic1 -.- envoy1
        end
        subgraph Backend_Servers2["Worker Node Cluster"]
            sew_m_nic2(("Management Port"))
            sew_d_nic2(("Business Port"))
            envoy2[("Envoy")]
            polycuber2[("Polycuber")]
            sew_m_nic2 & sew_d_nic2 -.- envoy2
        end
        subgraph Backend_Servers3["Worker Node Cluster"]
            sew_m_nic3(("Management Port"))
            sew_d_nic3(("Business Port"))
            envoy3[("Envoy")]
            polycuber3[("Polycuber")]
            sew_m_nic3 & sew_d_nic3 -.- envoy3
        end
    end
    subgraph UserServer["Business Service Clusters"]
        subgraph ServerCluster1["Business Service Cluster 1"]
            node1[(node)]
            node2[(node)]
        end
        subgraph ServerCluster2["Business Service Cluster 2"]
            node3[(node)]
        end
        subgraph ServerCluster3["Business Service Cluster 3"]
            node4[(node)]
        end
    end
    subgraph UserSpace["User | Service Access Traffic"]
        user1(("User Access"))
        frontend(("Frontend Business"))
    end
    ses_m_nic1 & SEW ---> Apiserver
    m_nic1 & m_nic2 & m_nic3 <-.-> switch_datacenter
    LVS_SES_Master -- "business traffic forwarding" --> sew_d_nic1 & sew_d_nic2 & sew_d_nic3
    user1 & frontend --> VIP_SES
    envoy1 -->|"reverse proxy"| node1 & node2 & node3
    envoy2 -->|"redirect"| node4
    envoy3 <--> direct_response["Direct Response"]
    router_datacenter <-.->|"permit flows whose dst is sc"| switch_public
    classDef network fill:#bbf,stroke:#333,stroke-width:2px;
    class Internet,PublicNetwork,DataCenter network;
    class Internet internet;
    class PublicNetwork publicNet;
    class DataCenter dataCenter;
    class SEM management;
    class SES scheduling;
    class SEW workCluster;
    class UserServer userService;
    class UserSpace userAccess;
    classDef internet fill:#bde0fe,stroke:#333,stroke-width:2px;
    classDef publicNet fill:#fed7b2,stroke:#333,stroke-width:2px;
    classDef dataCenter fill:#d4a5a5,stroke:#333,stroke-width:4px;
    classDef management fill:#fbcfe8,stroke:#333,stroke-width:2px;
    classDef scheduling fill:#fef08a,stroke:#333,stroke-width:2px;
    classDef workCluster fill:#bbf7d0,stroke:#333,stroke-width:2px;
    classDef userService fill:#d9f99d,stroke:#333,stroke-width:2px;
    classDef userAccess fill:#a7f3d0,stroke:#333,stroke-width:2px;
    classDef flowLine stroke:#2563eb,stroke-width:2px,stroke-dasharray: 5, 5;
    linkStyle 0 stroke:#db2777,stroke-width:12px,color:red;
```
- User devices reach the central control over the Internet: they first connect to the public router, then pass through the public switch to the central control node.
- The central control node is deployed in a publicly accessible network and accepts connections from the Internet.
- The data center network contains the engine clusters (management plane, scheduling cluster, worker cluster), connected through the data center router and switch.
- Only the scheduling cluster still relies on Keepalived, with a business VIP as its entry point; the management plane nodes no longer use a VIP, and each node's management port connects directly to the data center switch.
- The business VIP routes requests to the scheduling cluster, which then distributes the work to the worker cluster for processing (see the Go sketch after this list).
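Without a management-plane VIP, availability on the management path comes from each node's own entry point and the Apiserver cluster, which implies client-side failover. A hypothetical Go sketch of that pattern, with made-up endpoint addresses and helper names rather than our actual client code:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// Hypothetical apiserver endpoints; a real deployment would discover these.
var endpoints = []string{
	"https://10.0.0.11:6443",
	"https://10.0.0.12:6443",
	"https://10.0.0.13:6443",
}

// getWithFailover tries each endpoint in order instead of relying on a
// VIP; the first healthy endpoint serves the request.
func getWithFailover(path string) (*http.Response, error) {
	client := &http.Client{Timeout: 3 * time.Second}
	var lastErr error
	for _, ep := range endpoints {
		resp, err := client.Get(ep + path)
		if err == nil {
			return resp, nil
		}
		lastErr = err // endpoint unreachable; fall through to the next one
	}
	return nil, fmt.Errorf("all endpoints failed: %w", lastErr)
}

func main() {
	resp, err := getWithFailover("/healthz")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```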
Physical Architecture After the Transformation
Problem Scenarios
Cloud Service Scenarios
In public cloud services, a VIP generally has to be requested and provisioned separately, which imposes additional requirements on service deployment.
Disaster Recovery
Keepalived is designed mainly for high availability within a single data center or between geographically close networks. Under heavy network jitter and fluctuating latency, Keepalived flaps, switching masters frequently.
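VRRP only declares the master dead after roughly three missed advertisements (3 × advert_int, plus a priority-derived skew), so the advertisement interval is the main tuning knob here. A hedged sketch with placeholder values; raising advert_int damps jitter-induced flapping at the cost of slower failover:

```conf
! Tuning sketch: with advert_int 5, a master is only considered dead
! after ~15s of silence, which tolerates jitter but delays failover.
vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 90
    advert_int 5             ! default is 1 second
    virtual_ipaddress {
        192.0.2.100
    }
}
```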
Monitoring and Log Management
Keepalived provides only basic logging; in a complex system, integrating more advanced monitoring and log analysis is usually necessary, and our team lacks the capability to modify Keepalived's source code to add it.
Security
The VRRP protocol itself communicates without encryption, and its multicast group (224.0.0.18) may conflict with multicast already in use on a customer's network.
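If the multicast group is the concern, Keepalived can run VRRP over unicast, so advertisements go only to explicitly listed peers; note this does not add encryption, as the packets are still sent in cleartext. A sketch with placeholder addresses:

```conf
! Unicast VRRP sketch: no 224.0.0.18 multicast on the customer network.
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    unicast_src_ip 192.0.2.10    ! this node's own address
    unicast_peer {
        192.0.2.11               ! the backup node's address
    }
    virtual_ipaddress {
        192.0.2.100
    }
}
```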
Functional Limitations
Keepalived provides high availability at the server level; application-level high availability cannot be built on top of it.
Complexity
For simple high-availability needs, Keepalived's feature set and configuration can be overly complex, especially when there is no load-balancing requirement.
Technology Stack Mismatch
Keepalived is written in ANSI C, while the engine's management plane is written in Golang; the stacks do not match, so we would not be able to maintain it going forward.
Difficult to Adapt to a Web UI
Keepalived can only be configured through text files.
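In practice, a web UI therefore has to generate that text file itself and trigger a reload. A hypothetical Go sketch of that workflow; the struct, template, and values are illustrative only, not an existing tool of ours:

```go
package main

import (
	"os"
	"text/template"
)

// VRRPConfig holds the fields a hypothetical web UI would collect.
type VRRPConfig struct {
	State     string
	Interface string
	RouterID  int
	Priority  int
	VIP       string
}

// tmpl renders the collected fields into keepalived's text format;
// since there is no structured API, regenerate-and-reload is the only path.
const tmpl = `vrrp_instance VI_1 {
    state {{.State}}
    interface {{.Interface}}
    virtual_router_id {{.RouterID}}
    priority {{.Priority}}
    virtual_ipaddress {
        {{.VIP}}
    }
}
`

func main() {
	cfg := VRRPConfig{State: "MASTER", Interface: "eth0", RouterID: 51, Priority: 100, VIP: "192.0.2.100"}
	t := template.Must(template.New("keepalived").Parse(tmpl))
	// Write to stdout here; a real controller would write the config file
	// and then signal keepalived (e.g. SIGHUP) to reload it.
	if err := t.Execute(os.Stdout, cfg); err != nil {
		panic(err)
	}
}
```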