Split Brain Scenario and HA Pair (High Availability Pair)

    The “split brain” scenario is an exceptional circumstance where two healthy nodes in an HA pair both believe themselves to be the Active node, and that the other node is the Standby.
    This scenario represents an unhealthy HA pair, and must be resolved.
    The “split brain” scenario can occur after a failed forced failover operation. Specifically:
    1. The healthy Standby node becomes an Active node.
    2. The unhealthy Active node fails to become the Standby node.
    3. The unhealthy Active node is repaired. Both nodes are now healthy and Active, and each also believes the other node in the HA pair to be the Standby node. This is the “split brain” scenario.
    To resolve the “split brain” scenario, perform a Forced Standby operation from the repaired Active node. This forces the repaired Active node to become the Standby node in the HA pair.
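
The sequence above can be sketched as a small state model. This is only an illustration, not a vendor API: the names `Node`, `Role`, `forced_failover`, `forced_standby`, and `is_split_brain` are hypothetical, and the model assumes that an unhealthy node simply never processes its demotion.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


@dataclass
class Node:
    name: str
    role: Role
    healthy: bool = True


def is_split_brain(a: Node, b: Node) -> bool:
    """True when both nodes are healthy and both claim the Active role."""
    return (a.healthy and b.healthy
            and a.role is Role.ACTIVE and b.role is Role.ACTIVE)


def forced_failover(active: Node, standby: Node) -> None:
    """Promote the standby; demote the old active only if it can respond."""
    standby.role = Role.ACTIVE
    if active.healthy:
        active.role = Role.STANDBY
    # Assumption: an unhealthy node never sees the demotion and keeps its role.


def forced_standby(node: Node) -> None:
    """Force the given node into the Standby role."""
    node.role = Role.STANDBY


node_a = Node("A", Role.ACTIVE)
node_b = Node("B", Role.STANDBY)

node_a.healthy = False             # the Active node becomes unhealthy
forced_failover(node_a, node_b)    # steps 1 and 2: B is promoted, A is not demoted
node_a.healthy = True              # step 3: A is repaired and still thinks it is Active
split_before = is_split_brain(node_a, node_b)

forced_standby(node_a)             # resolution: Forced Standby on the repaired node
split_after = is_split_brain(node_a, node_b)

print(split_before, split_after)   # True False
```

The Forced Standby is applied to the repaired node rather than the promoted one, so the node that has been serving since the failover stays Active.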
Original source: Recovering from a Split Brain Scenario

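Manual translation:
    A Split Brain Scenario is an exceptional situation in an HA pair (High Availability pair, a distributed system) in which two healthy nodes each believe that they are the Active node and that the other node is the Standby node.
    This means the HA pair is unhealthy, and the problem must be resolved for the system to run normally.
    A Split Brain Scenario can occur after a failed forced failover (triggered by a network fault or a failure of the Active node). Specifically: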

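  1. The original Active node fails, and the healthy Standby node becomes the Active node.
  2. The failed Active node cannot switch to the Standby role.
  3. Once the failed Active node is repaired, the system contains two healthy Active nodes (each believing itself to be the Master node), and each regards the other as the Standby node. This is the Split Brain Scenario.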

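    To resolve the Split Brain Scenario, run a Forced Standby operation on the repaired Active node. This forces the repaired Active node to become the Standby node, which resolves the Split Brain Scenario.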

    An HA pair is two storage systems (nodes) whose controllers are connected to each other directly. In this configuration, one node can take over its partner’s storage to provide continued data service if the partner goes down.
    You can configure the HA pair so that each node in the pair shares access to a common set of storage, subnets, and tape drives, or each node can own its own distinct set of storage.
    The controllers are connected to each other through an HA interconnect. This allows one node to serve data that resides on the disks of its failed partner node. Each node continually monitors its partner, mirroring the data for each other’s nonvolatile memory (NVRAM or NVMEM). The interconnect is internal and requires no external cabling if both controllers are in the same chassis.
    Takeover is the process in which a node takes over the storage of its partner. Giveback is the process in which that storage is returned to the partner. Both processes can be initiated manually or configured for automatic initiation.
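
Takeover and giveback can be pictured as moving a set of disks between two nodes. The sketch below is a toy model, not the storage system's real interface: `StorageNode`, `takeover`, and `giveback` are hypothetical names, and the model assumes that each node has a fixed set of "home" disks.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    name: str
    home_disks: frozenset                       # disks this node normally owns
    serving: set = field(default_factory=set)   # disks it currently serves

    def __post_init__(self):
        # A node starts out serving exactly its own home disks.
        self.serving = set(self.home_disks)


def takeover(node: StorageNode, partner: StorageNode) -> None:
    """node serves the partner's storage in addition to its own."""
    node.serving |= partner.serving
    partner.serving.clear()


def giveback(node: StorageNode, partner: StorageNode) -> None:
    """node returns the partner's home disks to the partner."""
    returned = node.serving & partner.home_disks
    node.serving -= returned
    partner.serving |= returned


node1 = StorageNode("node1", frozenset({"d1", "d2"}))
node2 = StorageNode("node2", frozenset({"d3", "d4"}))

takeover(node1, node2)
after_takeover = (sorted(node1.serving), sorted(node2.serving))
print(after_takeover)    # (['d1', 'd2', 'd3', 'd4'], [])

giveback(node1, node2)
after_giveback = (sorted(node1.serving), sorted(node2.serving))
print(after_giveback)    # (['d1', 'd2'], ['d3', 'd4'])
```

Because `takeover` does not check whether the partner is down, the same function covers both a manually initiated takeover and one triggered by a failure.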

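Manual translation:
    An HA pair is two storage systems (nodes) whose controllers are directly connected to each other. In this configuration, if one node fails, the other node can take over its partner's storage and provide continued data service (high availability).
    You can manually configure the HA pair so that each node in the pair shares access to a common set of storage, subnets, and tape drives, or so that each node owns its own distinct set of storage.
    The controllers communicate with each other through an HA interconnect. This allows one node to serve data that resides on the disks of its failed partner node. Each node continually monitors its partner and mirrors the data for the other's nonvolatile memory (NVRAM or NVMEM). The HA interconnect is internal and requires no external cabling if both controllers are in the same chassis.
    Takeover is the process in which a node takes over its partner's storage. Giveback is the process in which that storage is returned to the partner. Both processes can be started manually or configured to start automatically.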