...

A multi-instance ONOS deployment is a cluster of one or more ONOS instances, or nodes, each with a unique NodeId. Each node in a cluster can generate events from local information gathered by its own services. These events are shared with all nodes in the cluster through the distributed mechanisms implemented in the various services' Stores.

...

  • Keeping track of the membership of a cluster
  • Delegating identifiers to nodes, in the form of NodeIds
  • Providing the notion of a local node, similar to localhost 
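The three responsibilities listed above can be illustrated with a small in-memory sketch. All names here (SimpleClusterStore, the NodeId record, NodeState) are hypothetical stand-ins chosen for illustration, not ONOS's actual classes, and a real store would replicate this state across nodes rather than keep it in a local map.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical identifier type; ONOS's real NodeId class may differ.
record NodeId(String id) {}

// Hypothetical membership state, for illustration only.
enum NodeState { ACTIVE, INACTIVE }

// Local, non-distributed sketch of a cluster store's three duties.
final class SimpleClusterStore {
    private final NodeId localNode;
    private final Map<NodeId, NodeState> members = new ConcurrentHashMap<>();

    SimpleClusterStore(String localId) {
        this.localNode = new NodeId(localId);
        members.put(localNode, NodeState.ACTIVE); // the local node is always a member
    }

    // The "localhost" analogue: the identity of the node this store runs on.
    NodeId getLocalNode() { return localNode; }

    // Delegate an identifier to a newly joined node and track its membership.
    NodeId addMember(String id) {
        NodeId nodeId = new NodeId(id);
        members.put(nodeId, NodeState.ACTIVE);
        return nodeId;
    }

    // Keep the node in the membership view but mark it unreachable.
    void markInactive(NodeId nodeId) {
        members.computeIfPresent(nodeId, (k, v) -> NodeState.INACTIVE);
    }

    // Read-only view of all known cluster members.
    Set<NodeId> getNodes() { return Collections.unmodifiableSet(members.keySet()); }

    NodeState getState(NodeId nodeId) { return members.get(nodeId); }
}
```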

The DistributedClusterStore currently leverages Hazelcast for cluster membership: it implements Hazelcast's MembershipListener and uses it to translate Hazelcast MembershipEvents into ONOS ClusterEvents. It also relies on Hazelcast to set up and manage the multicast group used for inter-store communication.
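The translation step is essentially an adapter from one event vocabulary to another. The sketch below shows that pattern with hypothetical stand-in types (MembershipNotice, MemberChange, ClusterEventSketch, MembershipTranslator); it deliberately does not use Hazelcast's real MembershipListener or ONOS's real ClusterEvent classes, and treating the member address as the node identifier is an assumption made for brevity.

```java
import java.util.function.Consumer;

// Hypothetical stand-ins for a membership-layer notification.
// These are NOT Hazelcast's or ONOS's actual classes.
enum MemberChange { JOINED, LEFT }
record MembershipNotice(String memberAddress, MemberChange change) {}

// Hypothetical stand-ins for the cluster-level event published to ONOS services.
enum ClusterEventType { INSTANCE_ADDED, INSTANCE_REMOVED }
record ClusterEventSketch(ClusterEventType type, String nodeId) {}

// Adapter: receives notices from the group-membership layer and
// republishes each one as a cluster event to a downstream consumer.
final class MembershipTranslator {
    private final Consumer<ClusterEventSketch> delegate;

    MembershipTranslator(Consumer<ClusterEventSketch> delegate) {
        this.delegate = delegate;
    }

    void onMembershipChange(MembershipNotice notice) {
        ClusterEventType type = switch (notice.change()) {
            case JOINED -> ClusterEventType.INSTANCE_ADDED;
            case LEFT -> ClusterEventType.INSTANCE_REMOVED;
        };
        // Assumption: the member address doubles as the node identifier.
        delegate.accept(new ClusterEventSketch(type, notice.memberAddress()));
    }
}
```

Keeping the translation in a single adapter means the rest of the system only ever sees cluster events, so the underlying membership library could be swapped without touching the consumers.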

...

A node begins in the NONE role. In current implementations, the first node to confirm that 1) the device has no master, and 2) it has a control channel connection to the device becomes the device's master. Any other nodes that subsequently discover the device become STANDBY if they have a connection to the device, or remain NONE otherwise. The latter case occurs when the DeviceService learns of a device indirectly through the distributed store, or when a previously connected device disconnects. The MastershipManager maintains the mapping of roles, nodes, and devices, which is kept in the MastershipStore as a distributed map of DeviceIds to RoleValue model objects.
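The first-come role assignment described above can be sketched with a concurrent map standing in for the distributed MastershipStore map. MastershipStoreSketch, RoleValueSketch, and requestRole are hypothetical names; the real RoleValue and store APIs may look quite different, and a local ConcurrentHashMap only approximates the atomicity a distributed map would have to provide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Mirrors the three roles named in the text.
enum Role { MASTER, STANDBY, NONE }

// Hypothetical, simplified stand-in for RoleValue: a master plus ordered standbys.
final class RoleValueSketch {
    String master;                            // null while the device has no master
    final List<String> standbys = new ArrayList<>();
}

final class MastershipStoreSketch {
    // Local stand-in for the distributed map of DeviceIds to RoleValues.
    private final Map<String, RoleValueSketch> roles = new ConcurrentHashMap<>();

    // Called when a node discovers a device; connected says whether the
    // node has a control channel connection to that device.
    Role requestRole(String nodeId, String deviceId, boolean connected) {
        if (!connected) {
            return Role.NONE;                 // device known only indirectly
        }
        RoleValueSketch value = roles.computeIfAbsent(deviceId, d -> new RoleValueSketch());
        synchronized (value) {                // check-then-set must be atomic
            if (value.master == null) {
                value.master = nodeId;        // first connected node wins
                return Role.MASTER;
            }
            if (value.master.equals(nodeId)) {
                return Role.MASTER;
            }
            if (!value.standbys.contains(nodeId)) {
                value.standbys.add(nodeId);   // later connected nodes queue as standbys
            }
            return Role.STANDBY;
        }
    }
}
```

The order of the standby list matters: it is a natural source for the candidate order used when a master later relinquishes its role.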

The established roles can change as a result of various events. We currently consider the following events:

...

The candidate can choose to become the new master or, when facing a failure scenario, relinquish the role and appoint another candidate. Reelection can occur at most N times, where N is the number of standby nodes for the device. Such a chain of handoffs can arise if the device fully disconnects from the control plane; bounding the number of reelections prevents them from continuing endlessly.
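The bound on reelections follows from offering mastership to each standby at most once. The sketch below assumes a hypothetical canTakeOver predicate that models whether a candidate can actually accept (for example, whether it still has a control channel connection); ReelectionSketch and reelect are illustrative names, not ONOS APIs.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

final class ReelectionSketch {
    // Offers mastership to each standby in order. A candidate that cannot
    // take over relinquishes and the next standby is appointed. Each standby
    // is tried exactly once, so N standbys yield at most N attempts.
    static Optional<String> reelect(List<String> standbys, Predicate<String> canTakeOver) {
        for (String candidate : standbys) {
            if (canTakeOver.test(candidate)) {
                return Optional.of(candidate);    // candidate accepts mastership
            }
        }
        return Optional.empty();                  // all standbys failed: no master
    }
}
```

If the device has fully disconnected from the control plane, every candidate fails and the loop simply ends after N attempts instead of cycling through the nodes forever.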

Handling Split-brain Scenarios

If a cluster splits into two partitions of different sizes, the nodes in the smaller partition relinquish their roles; if they are unable to do so, members of the larger partition force reelections for devices whose masters ended up in the smaller partition. The MastershipManager determines whether its node is in the minority or the majority by querying the ClusterService.

The current ONOS implementation does not handle the case where the two partitions are equal in size, and both are still connected to the network. This is expected to be implemented very soon.
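The minority/majority decision reduces to comparing the number of reachable nodes (including the local one) against half of the full cluster size. The sketch below assumes both counts are already known, for instance from a cluster membership view; SplitBrainSketch, Side, classify, and devicesToReelect are hypothetical names. The TIE case corresponds to the equal-partition scenario that, as noted above, is not yet handled.

```java
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

enum Side { MAJORITY, MINORITY, TIE }

final class SplitBrainSketch {
    // reachableNodes counts the local node plus all peers it can still reach.
    static Side classify(int reachableNodes, int clusterSize) {
        if (reachableNodes < 1 || reachableNodes > clusterSize) {
            throw new IllegalArgumentException("reachableNodes must be in [1, clusterSize]");
        }
        int doubled = 2 * reachableNodes;            // avoids integer-division rounding
        if (doubled > clusterSize) return Side.MAJORITY;
        if (doubled < clusterSize) return Side.MINORITY;
        return Side.TIE;                             // equal halves
    }

    // On the majority side: devices whose master is no longer reachable
    // are the ones that need a forced reelection.
    static Set<String> devicesToReelect(Map<String, String> masterOf, Set<String> reachable) {
        return masterOf.entrySet().stream()
                .filter(e -> !reachable.contains(e.getValue()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());
    }
}
```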