Management-Plane HA Refactoring Q&A

Keepalived Overview

Keepalived provides load balancing and high availability for Linux systems. The load-balancing capability comes from IPVS (IP Virtual Server), the LVS project's module in the Linux kernel.

Keepalived runs on Linux and drives the in-kernel LVS service to create virtual servers. For example, if we start a Keepalived service on each of two servers, LVS exposes a virtual IP (the VIP), but only one Keepalived instance holds the VIP at any time; that is, client requests only reach the master Keepalived node. All traffic therefore lands on one Keepalived node, which is configured with the IP addresses and ports of several real servers and spreads the traffic across them with a load-scheduling algorithm. The other, backup Keepalived node stays on standby and receives no traffic.
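The VIP takeover and real-server scheduling described above map onto a keepalived.conf roughly as follows. This is only a sketch: the interface name, VIP, real-server addresses, ports, and weights are all placeholder assumptions.

```conf
vrrp_instance VI_1 {
    state MASTER              # BACKUP on the standby node
    interface eth0            # assumed NIC name
    virtual_router_id 51
    priority 100              # lower on the standby node, e.g. 90
    advert_int 1
    virtual_ipaddress {
        192.168.1.100         # assumed VIP
    }
}

virtual_server 192.168.1.100 80 {
    delay_loop 6
    lb_algo rr                # round-robin scheduling algorithm
    lb_kind DR                # LVS direct-routing mode
    protocol TCP
    real_server 192.168.1.11 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
        }
    }
    real_server 192.168.1.12 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
        }
    }
}
```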

Keepalived is an open-source project written in C whose main goal is to simplify the configuration of LVS and improve its stability. In short, Keepalived is an extension of LVS. The management plane has no LVS, so a Keepalived-based management-plane HA setup has little reason to exist.

---
title: Deployment view
---

graph TD;
    subgraph LAN["LAN"]
        switch[("Switch")]
    end
    subgraph Internet["Internet"]
        userDevice1(("User device 1"))
        userDevice2(("User device 2"))
        router[("Router")]
        userDevice1 --> router
        userDevice2 --> router
        router --> switch
    end
    subgraph Servers["Server cluster"]
        master[("Primary server")]
        backup1[("Backup server 1")]
        backup2[("Backup server 2")]
        switch --> master
        switch --> backup1
        switch --> backup2
    end
    VIP(("Virtual IP")) -.-> master
    master -. "failover" .-> backup1
    master -. "failover" .-> backup2
    classDef default fill:#1E90FF,stroke:#333,stroke-width:2px;
    classDef network fill:#708090,stroke:#333,stroke-width:2px;
    class LAN,Internet,Servers network;

- Users access the service over the Internet; requests first reach the router and are then forwarded through the switch.
- The primary server holds the virtual IP (VIP) and is the single entry point for external traffic.
- The backup servers stand by to take over the VIP if the primary server fails, keeping the service continuously available.
- The virtual IP (VIP) points to the primary server but can be migrated quickly to a backup server when needed.
- The failover logic is drawn as arrows from the primary server to the backup servers.
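Under VRRP (RFC 5798), which Keepalived implements, the master is the node with the highest priority, and a priority tie is broken by the larger primary IP address. The following Go sketch shows that election rule in isolation; the node names, priorities, and IPs are made-up values.

```go
package main

import (
	"fmt"
	"net"
)

// Node is one Keepalived instance taking part in VRRP master election.
type Node struct {
	Name     string
	Priority int    // 1-254; higher wins
	IP       string // primary interface IP, used as the tie-breaker
}

// ipGreater compares two IPv4 addresses byte by byte.
func ipGreater(a, b net.IP) bool {
	a4, b4 := a.To4(), b.To4()
	for i := 0; i < 4; i++ {
		if a4[i] != b4[i] {
			return a4[i] > b4[i]
		}
	}
	return false
}

// electMaster applies the VRRP rule: highest priority wins,
// and on a priority tie the larger primary IP wins.
func electMaster(nodes []Node) Node {
	best := nodes[0]
	for _, n := range nodes[1:] {
		if n.Priority > best.Priority ||
			(n.Priority == best.Priority && ipGreater(net.ParseIP(n.IP), net.ParseIP(best.IP))) {
			best = n
		}
	}
	return best
}

func main() {
	nodes := []Node{
		{Name: "primary", Priority: 100, IP: "192.168.1.10"},
		{Name: "backup1", Priority: 90, IP: "192.168.1.11"},
		{Name: "backup2", Priority: 90, IP: "192.168.1.12"},
	}
	fmt.Println("master:", electMaster(nodes).Name) // master: primary
	// Simulate the primary failing: the remaining nodes re-elect.
	fmt.Println("after failover:", electMaster(nodes[1:]).Name) // after failover: backup2
}
```

Note that after the primary fails, the two backups have equal priority, so the one with the larger IP (backup2) takes the VIP.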
---
title: Failover view
---
graph TD;
    subgraph Internet["Internet"]
        userDevice1(("User device 1"))
        userDevice2(("User device 2"))
        router[("Router")]
        userDevice1 -->|request service| VIP
        userDevice2 -->|request service| VIP
    end
    subgraph LAN["LAN"]
        switch[("Switch")]
        VIP(("Virtual IP")) --> switch
    end
    subgraph Servers["Server cluster"]
        master[("Primary server")]
        backup1[("Backup server 1")]
        backup2[("Backup server 2")]
        switch --> master
        switch --> backup1
        switch --> backup2
    end
    VIP -. "currently points to" .-> master
    master -. "fails over to" .-> backup1
    backup1 -. "fails over to" .-> backup2
    classDef default fill:#1E90FF,stroke:#333,stroke-width:2px;
    classDef network fill:#708090,stroke:#333,stroke-width:2px;
    class Internet,LAN,Servers network;

Where Keepalived sits in a traditional HA architecture

---
title: Traditional HA architecture deployment view
---
graph TD;
    subgraph Internet["Internet"]
        userDevice1(("User device 1"))
        userDevice2(("User device 2"))
        userDevice1 -->|request service| VIP
        userDevice2 -->|request service| VIP
    end
    subgraph LAN["LAN"]
        switch[("Switch")]
        VIP(("Virtual IP")) --> switch
    end
    subgraph LB_Cluster["Load balancer cluster (LVS)"]
        LVS_Master[("LVS master node")]
        LVS_Backup1[("LVS backup node 1")]
        LVS_Backup2[("LVS backup node 2")]
        switch --> LVS_Master
        switch ----> LVS_Backup1
        switch ----> LVS_Backup2
    end
    subgraph Keepalived_Cluster["Keepalived cluster"]
        Keepalived_Master[("Keepalived master")]
        Keepalived_Backup1[("Keepalived backup 1")]
        Keepalived_Backup2[("Keepalived backup 2")]
        LVS_Master --> Keepalived_Master
        LVS_Backup1 --> Keepalived_Backup1
        LVS_Backup2 --> Keepalived_Backup2
    end
    subgraph Backend_Servers["Backend servers"]
        server1[("Server 1")]
        server2[("Server 2")]
        server3[("Server 3")]
        LVS_Master --> server1
        LVS_Master --> server2
        LVS_Master --> server3
    end
    VIP -. "currently points to" .-> LVS_Master
    LVS_Master -. "fails over to" .-> LVS_Backup1
    LVS_Master -. "fails over to" .-> LVS_Backup2
    Keepalived_Master -. "manages" .-> LVS_Master
    Keepalived_Backup1 -. "standby management of" .-> LVS_Backup1
    Keepalived_Backup2 -. "standby management of" .-> LVS_Backup2
    classDef default fill:#1E90FF,stroke:#333,stroke-width:2px;
    classDef network fill:#708090,stroke:#333,stroke-width:2px;
    class Internet,LAN,LB_Cluster,Keepalived_Cluster,Backend_Servers network;

- User devices send service requests to the virtual IP (VIP) over the Internet.
- The virtual IP (VIP) is managed by Keepalived and points to the LVS master node.
- The LVS master and backup nodes form the load balancer cluster, which distributes requests to the backend servers.
- The Keepalived cluster (master plus backups) keeps the LVS cluster highly available, migrating the VIP to a backup node quickly if the LVS master fails.
- The backend servers receive and process the requests forwarded by the LVS master node.
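The request distribution in the bullets above is done by LVS's scheduling algorithms. As an illustration, here is a Go sketch of LVS-style weighted round-robin (wrr), following the current-weight loop that IPVS uses; the backend names and weights are invented.

```go
package main

import "fmt"

// RealServer is one backend behind the LVS virtual server.
type RealServer struct {
	Addr   string
	Weight int
}

// WRR follows the weighted round-robin loop used by IPVS (ip_vs_wrr):
// it cycles through the servers with a current weight cw that drops by
// gcd(weights) each round, so heavier servers are picked more often.
type WRR struct {
	servers []RealServer
	i, cw   int // last picked index, current weight
	maxW, g int
}

func gcd(a, b int) int {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}

func NewWRR(servers []RealServer) *WRR {
	w := &WRR{servers: servers, i: -1}
	for _, s := range servers {
		if s.Weight > w.maxW {
			w.maxW = s.Weight
		}
		w.g = gcd(w.g, s.Weight)
	}
	return w
}

// Next returns the real server that should receive the next request.
func (w *WRR) Next() RealServer {
	for {
		w.i = (w.i + 1) % len(w.servers)
		if w.i == 0 {
			w.cw -= w.g
			if w.cw <= 0 {
				w.cw = w.maxW
			}
		}
		if w.servers[w.i].Weight >= w.cw {
			return w.servers[w.i]
		}
	}
}

func main() {
	// Invented backends and weights 3:2:1.
	lb := NewWRR([]RealServer{{"server1", 3}, {"server2", 2}, {"server3", 1}})
	for n := 0; n < 6; n++ {
		fmt.Print(lb.Next().Addr, " ")
	}
	fmt.Println() // server1 server1 server2 server1 server2 server3
}
```

Over each cycle of six picks, server1 gets 3 requests, server2 gets 2, and server3 gets 1, matching the configured weights.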

Engine HA data flow before the refactor

---
title: Engine HA data flow before the refactor
---
graph TD;
    subgraph Internet["Internet"]
        userDevice1(("Client device 1"))
        userDevice2(("Client device 2"))
        router_public[("Public router | firewall")]
        userDevice1 -->|access SC| router_public
        userDevice2 -->|access SC| router_public
    end
    subgraph PublicNetwork["Publicly reachable network"]
        switch_public[("Public switch")]
        router_public --> switch_public
    end
    subgraph SC[Central control]
        sc[(sc)]
        sv[(sv)]
        switch_public --> sc
    end
    subgraph DataCenter["Data center network"]
        router_datacenter[("DC router | firewall")]
        switch_datacenter[("DC switch")]
        router_datacenter --> switch_datacenter
    end
    subgraph SEM[Management plane]
        VIP((Virtual IP))
        subgraph SEMNode1[Management node 1]
            other_comp1[(Management components)]
        end
        subgraph SEMNode2[Management node 2]
            other_comp2[(Management components)]
        end
        subgraph SEMNode3[Management node 3]
            other_comp3[(Management components)]
        end
        subgraph KeepalivedSEM[Management-plane Keepalived]
            keepalived_master_SEM[("Keepalived master")]
            keepalived_backup_SEM[("Keepalived backup")]
            keepalived_master_SEM --> keepalived_backup_SEM
            keepalived_backup_SEM -.-> keepalived_master_SEM
        end
        other_comp1 <-->|management work such as service provisioning| keepalived_master_SEM
        VIP -.-> other_comp1 -. "failover" .-> keepalived_backup_SEM
    end
    subgraph SES[Scheduling cluster]
        VIP_SES((Service VIP table))
        subgraph SES_LVS[Scheduling LVS]
            LVS_SES_Master[("LVS master node")]
            lvs_policy((Policy management))
            LVS_SES_Backup[("LVS backup node")]
            LVS_SES_Master -->|scheduling requests| LVS_SES_Backup
            LVS_SES_Backup -.->|failover| LVS_SES_Master
            LVS_SES_Master -.-> lvs_policy
        end
        subgraph KeepalivedSES[Scheduling Keepalived]
            keepalived_master_SES[("Keepalived master")]
            keepalived_backup_SES[("Keepalived backup")]
            keepalived_master_SES --> keepalived_backup_SES
            keepalived_backup_SES -.-> keepalived_master_SES
        end
        VIP_SES -.-> LVS_SES_Master
        LVS_SES_Master --> keepalived_master_SES
    end
    subgraph SEW[Worker cluster]
        subgraph Backend_Servers1["Worker node cluster"]
            envoy1[("Envoy")]
            polycuber1[("Polycuber")]
        end
        subgraph Backend_Servers2["Worker node cluster"]
            envoy2[("Envoy")]
            polycuber2[("Polycuber")]
        end
        subgraph Backend_Servers3["Worker node cluster"]
            envoy3[("Envoy")]
            polycuber3[("Polycuber")]
        end
    end

    subgraph UserServer[Service clusters]
        subgraph ServerCluster1["Service cluster 1"]
            node1[(node)]
            node2[(node)]
        end
        subgraph ServerCluster2["Service cluster 2"]
            node3[(node)]
        end
        subgraph ServerCluster3["Service cluster 3"]
            node4[(node)]
        end
    end
    subgraph UserSpace["User | service access traffic"]
        user1((User access))
        frontend((Frontend service))
    end
    other_comp1 --> |manages| SES
    other_comp1 ---> |manages| SEW

    LVS_SES_Master --service traffic forwarding--> envoy1 & envoy2 & envoy3
    user1 & frontend --> VIP_SES
    envoy1 -->|reverse proxy| node1 & node2 & node3
    envoy2 -->|redirect| node4
    envoy3 <--> direct_response[Direct response]
    switch_public -->|accept traffic from SC| router_datacenter
    switch_datacenter --> VIP

    classDef network fill:#bbf,stroke:#333,stroke-width:2px;
    class Internet,PublicNetwork,DataCenter network;

    class Internet internet
    class PublicNetwork publicNet;
    class DataCenter dataCenter;
    class SEM management;
    class SES scheduling;
    class SEW workCluster;
    class UserServer userService;
    class UserSpace userAccess;
    classDef internet fill:#bde0fe,stroke:#333,stroke-width:2px;
    classDef publicNet fill:#fed7b2,stroke:#333,stroke-width:2px;
    classDef dataCenter fill:#d4a5a5,stroke:#333,stroke-width:4px;
    classDef management fill:#fbcfe8,stroke:#333,stroke-width:2px;
    classDef scheduling fill:#fef08a,stroke:#333,stroke-width:2px;
    classDef workCluster fill:#bbf7d0,stroke:#333,stroke-width:2px;
    classDef userService fill:#d9f99d,stroke:#333,stroke-width:2px;
    classDef userAccess fill:#a7f3d0,stroke:#333,stroke-width:2px;
    classDef flowLine stroke:#2563eb,stroke-width:2px,stroke-dasharray: 5, 5;
    linkStyle 3,4,17,18,29,30 stroke:#db2777,stroke-width:12px;
    linkStyle 19,20,21,22,23,24,25,26 stroke:#2563eb,stroke-width:2px,stroke-width:8px;

Engine HA data flow after the refactor

---
title: Engine HA data flow after the refactor
---
graph TD;
    subgraph Internet["Internet"]
        userDevice1(("Client device 1"))
        userDevice2(("Client device 2"))
        router_public[("Public router | firewall")]
        userDevice1 -->|access SC| router_public
        userDevice2 -->|access SC| router_public
    end
    subgraph PublicNetwork["Publicly reachable network"]
        switch_public[("Public switch")]
        router_public --> switch_public
    end
    subgraph SC[Central control]
        sc_m_nic((Mgmt NIC))
        sc[(sc)]
        sv[(sv)]
        sc_core[(sc-core)]
        switch_public <-.-> sc_m_nic --> sc_core
        sc & sv -.-> sc_core
    end
    subgraph DataCenter["Data center network"]
        router_datacenter[("DC router | firewall")]
        switch_datacenter[("DC switch")]
        router_datacenter <-.-> switch_datacenter
    end
    subgraph SEM[Management plane]
        subgraph SEMNode1[Management node 1]
            m_nic1((Mgmt NIC))
            other_comp1[(Management traffic entry)]
            controller1[(Management components)]
            apiserver_node1[apiserver]
            m_nic1 <-.-> other_comp1
            other_comp1 --> controller1 --> apiserver_node1
        end
        subgraph SEMNode2[Management node 2]
            m_nic2((Mgmt NIC))
            other_comp2[(Management traffic entry)]
            controller2[(Management components)]
            apiserver_node2[apiserver]
            m_nic2 <-.-> other_comp2
            other_comp2 --> controller2 --> apiserver_node2
        end
        subgraph SEMNode3[Management node 3]
            m_nic3((Mgmt NIC))
            other_comp3[(Management traffic entry)]
            controller3[(Management components)]
            apiserver_node3[apiserver]
            m_nic3 <-.-> other_comp3
            other_comp3 --> controller3 --> apiserver_node3
        end
        subgraph Apiserver[Apiserver cluster]
            apiserver1[(Apiserver)]
            apiserver2[(Apiserver)]
            apiserver3[(Apiserver)]
        end
        apiserver_node1 -.- apiserver1
        apiserver_node2 -.- apiserver2
        apiserver_node3 -.- apiserver3
    end
    subgraph SES[Scheduling cluster]
        VIP_SES((Service VIP table))
        subgraph SES_LVS[Scheduling LVS]
            subgraph SESNode1
                ses_m_nic1((Mgmt NIC))
                ses_d_nic1((Data NIC))
                LVS_SES_Master[("LVS master node")]
                lvs_policy((Policy management))
                ses_m_nic1 -.- lvs_policy
            end
            subgraph SESNode2
                ses_m_nic2((Mgmt NIC))
                ses_d_nic2((Data NIC))
                LVS_SES_Backup[("LVS backup node")]
            end

            LVS_SES_Master -->|scheduling requests| LVS_SES_Backup
            LVS_SES_Backup -.->|failover| LVS_SES_Master
            LVS_SES_Master -.-> lvs_policy
        end
        subgraph KeepalivedSES[Scheduling Keepalived]
            keepalived_master_SES[("Keepalived master")]
            keepalived_backup_SES[("Keepalived backup")]
            keepalived_master_SES --> keepalived_backup_SES
            keepalived_backup_SES -.-> keepalived_master_SES
        end
        VIP_SES -.- ses_d_nic1 --> LVS_SES_Master
        LVS_SES_Master --> keepalived_master_SES
    end
    subgraph SEW[Worker cluster]
        subgraph Backend_Servers1["Worker node cluster"]
            sew_m_nic1((Mgmt NIC))
            sew_d_nic1((Data NIC))
            envoy1[("Envoy")]
            polycuber1[("Polycuber")]
            sew_m_nic1 & sew_d_nic1 -.- envoy1
        end
        subgraph Backend_Servers2["Worker node cluster"]
            sew_m_nic2((Mgmt NIC))
            sew_d_nic2((Data NIC))
            envoy2[("Envoy")]
            polycuber2[("Polycuber")]
            sew_m_nic2 & sew_d_nic2 -.- envoy2
        end
        subgraph Backend_Servers3["Worker node cluster"]
            sew_m_nic3((Mgmt NIC))
            sew_d_nic3((Data NIC))
            envoy3[("Envoy")]
            polycuber3[("Polycuber")]
            sew_m_nic3 & sew_d_nic3 -.- envoy3
        end
    end

    subgraph UserServer[Service clusters]
        subgraph ServerCluster1["Service cluster 1"]
            node1[(node)]
            node2[(node)]
        end
        subgraph ServerCluster2["Service cluster 2"]
            node3[(node)]
        end
        subgraph ServerCluster3["Service cluster 3"]
            node4[(node)]
        end
    end
    subgraph UserSpace["User | service access traffic"]
        user1((User access))
        frontend((Frontend service))
    end
    ses_m_nic1 & SEW ---> Apiserver
    m_nic1 & m_nic2 & m_nic3 <-.-> switch_datacenter

    LVS_SES_Master --service traffic forwarding--> sew_d_nic1 & sew_d_nic2 & sew_d_nic3
    user1 & frontend --> VIP_SES
    envoy1 -->|reverse proxy| node1 & node2 & node3
    envoy2 -->|redirect| node4
    envoy3 <--> direct_response[Direct response]
    router_datacenter <-.->|allow flows with dst = sc| switch_public

    classDef network fill:#bbf,stroke:#333,stroke-width:2px;
    class Internet,PublicNetwork,DataCenter network;

    class Internet internet
    class PublicNetwork publicNet;
    class DataCenter dataCenter;
    class SEM management;
    class SES scheduling;
    class SEW workCluster;
    class UserServer userService;
    class UserSpace userAccess;
    classDef internet fill:#bde0fe,stroke:#333,stroke-width:2px;
    classDef publicNet fill:#fed7b2,stroke:#333,stroke-width:2px;
    classDef dataCenter fill:#d4a5a5,stroke:#333,stroke-width:4px;
    classDef management fill:#fbcfe8,stroke:#333,stroke-width:2px;
    classDef scheduling fill:#fef08a,stroke:#333,stroke-width:2px;
    classDef workCluster fill:#bbf7d0,stroke:#333,stroke-width:2px;
    classDef userService fill:#d9f99d,stroke:#333,stroke-width:2px;
    classDef userAccess fill:#a7f3d0,stroke:#333,stroke-width:2px;
    classDef flowLine stroke:#2563eb,stroke-width:2px,stroke-dasharray: 5, 5;
    linkStyle 0 stroke:#db2777,stroke-width:12px;
- User devices reach the central control (SC) over the Internet: they first hit the public router and then go through the public switch to the SC nodes.
- The SC nodes sit in a publicly reachable network, allowing access from the Internet.
- The data center network holds the engine clusters (management plane, scheduling cluster, worker cluster), connected through the DC router and the DC switch.
- The management-plane nodes use Keepalived for high availability, with a virtual IP (VIP) as the entry point.
- The VIP routes requests to the scheduling cluster, which then dispatches work to the worker cluster.

Physical architecture after the refactor

Static topology before the link refactor

Static topology after the link refactor

(figure: sr_mmrefactor_ses)

Link refactor: worker node failover

Problem scenarios

Cloud service scenario

In public clouds a VIP usually has to be requested and enabled separately, which adds extra requirements to service deployment.

Disaster recovery

Keepalived is mainly designed for high availability within a single data center or a geographically close network. Under heavy network jitter and widely varying latency, Keepalived switches masters frequently.
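A backup declares the master dead roughly when no advertisement arrives within about 3 × advert_int (plus skew). The Go sketch below, using made-up advertisement arrival gaps, shows how latency jitter alone can trigger spurious master switches:

```go
package main

import "fmt"

// countSpuriousFailovers simulates the backup's dead-master timer:
// whenever the gap between two advertisements exceeds the timeout
// (about 3 x advert_int), the backup seizes the VIP -- one master switch.
func countSpuriousFailovers(arrivalGaps []float64, timeout float64) int {
	failovers := 0
	for _, gap := range arrivalGaps {
		if gap > timeout {
			failovers++
		}
	}
	return failovers
}

func main() {
	const timeout = 3.0 // seconds, assuming advert_int = 1s

	// Stable LAN: gaps hover around the 1s advertisement interval.
	stable := []float64{1.0, 1.1, 0.9, 1.2, 1.0}
	// Jittery WAN-like network: occasional gaps above the timeout.
	jittery := []float64{1.0, 3.5, 1.2, 4.1, 0.8}

	fmt.Println("stable network, master switches:", countSpuriousFailovers(stable, timeout))   // 0
	fmt.Println("jittery network, master switches:", countSpuriousFailovers(jittery, timeout)) // 2
}
```

The master is alive in both runs; in the jittery case the switches are caused purely by delayed advertisements, which is the flapping described above.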

Monitoring and log management

Keepalived offers only basic logging; complex systems usually need to integrate more advanced monitoring and log analytics, and the team lacks the ability to modify Keepalived's code.

Security

The VRRP protocol itself is unencrypted, and its multicast address (224.0.0.18) may clash with multicast already present in the user's network.
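The point is visible directly in the VRRPv2 packet layout (RFC 2338/3768): with auth_type PASS, the advertisement ends with an 8-byte plaintext password, readable by anyone listening on 224.0.0.18. A Go sketch that parses a hand-crafted advertisement; the VRID, VIP, and password are invented, and the checksum is left zero for brevity:

```go
package main

import (
	"fmt"
	"net"
)

// Advert holds the fields of a VRRPv2 advertisement that travel in
// cleartext (RFC 2338/3768 layout). With auth_type PASS the packet
// ends in an 8-byte plaintext password.
type Advert struct {
	VRID, Priority int
	VIPs           []net.IP
	AuthPass       string
}

// trimZeros strips trailing NUL padding from the auth field.
func trimZeros(b []byte) []byte {
	for len(b) > 0 && b[len(b)-1] == 0 {
		b = b[:len(b)-1]
	}
	return b
}

// parseVRRPv2 decodes an advertisement. Nothing here needs a key:
// any host on the segment can read every field.
func parseVRRPv2(b []byte) (Advert, error) {
	if len(b) < 8 || b[0]>>4 != 2 {
		return Advert{}, fmt.Errorf("not a VRRPv2 packet")
	}
	count := int(b[3])
	if len(b) < 8+count*4+8 {
		return Advert{}, fmt.Errorf("packet too short")
	}
	a := Advert{VRID: int(b[1]), Priority: int(b[2])}
	for i := 0; i < count; i++ {
		off := 8 + i*4
		a.VIPs = append(a.VIPs, net.IPv4(b[off], b[off+1], b[off+2], b[off+3]))
	}
	a.AuthPass = string(trimZeros(b[8+count*4 : 8+count*4+8]))
	return a, nil
}

func main() {
	// Hand-crafted advertisement: VRID 51, priority 100, one VIP,
	// password "secret".
	pkt := []byte{
		0x21, 51, 100, 1, // version 2 | type 1, VRID, priority, VIP count
		1, 1, 0, 0, // auth type PASS, advert interval 1s, checksum (zeroed)
		192, 168, 1, 100, // VIP 192.168.1.100
		's', 'e', 'c', 'r', 'e', 't', 0, 0, // plaintext password
	}
	a, _ := parseVRRPv2(pkt)
	fmt.Printf("VRID=%d priority=%d VIP=%v password=%q\n",
		a.VRID, a.Priority, a.VIPs[0], a.AuthPass)
}
```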

Functional limits

Keepalived provides high availability at the server (IP) level; application-level high availability cannot be built on it alone.

Complexity

For simple high-availability needs, Keepalived's features and configuration can be overkill, especially when no load balancing is required.

Technology stack mismatch

Keepalived is written in ANSI C while the engine's management plane is written in Golang; the stacks do not match, making long-term maintenance impractical.

Hard to adapt to a web UI

Keepalived can only be configured through text files.

References


- 管理面HA改造答疑: https://abrance.github.io/2024/04/07/mdstorage/project/sr/通信链路改造/管理面HA改造答疑/

Author: xiaoy
Posted on: April 7, 2024