...

The first item is managed by the Cluster subsystem, and the second, by the Mastership subsystem. The remaining sections elaborate on the distributed store, and describe the function of these managers.

...

Depending on the requirements of a service, the contents of a store may be either strongly or eventually consistent. This is made possible by having each service's store implement the appropriate distribution mechanism. Currently, Hazelcast's distributed structures are used for strong consistency, and a gossip protocol implemented on Netty is used for eventual consistency. With the exception of cluster, mastership, and flow rule management, all services employ eventually consistent stores. These mechanisms are used by stores of the same subsystem type (e.g. two GossipDeviceStores on two different nodes) to communicate directly with one another without traversing the ONOS stack.
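As an illustration of how an eventually consistent store can reconcile concurrent updates arriving via gossip, the sketch below compares logical timestamps made of a mastership term and a per-term sequence number. The layout loosely mirrors ONOS's mastership-based timestamps, but the class and method names here are illustrative, not ONOS code.

```java
// Simplified sketch of the logical timestamps an eventually consistent
// store (e.g. a GossipDeviceStore) can use to decide which of two
// concurrent updates wins. Illustrative only, not the ONOS implementation.
public class GossipTimestamp implements Comparable<GossipTimestamp> {
    final long term;      // mastership term of the writing node
    final long sequence;  // per-term event counter

    public GossipTimestamp(long term, long sequence) {
        this.term = term;
        this.sequence = sequence;
    }

    @Override
    public int compareTo(GossipTimestamp other) {
        // A higher term always wins; within a term, the higher sequence wins.
        if (term != other.term) {
            return Long.compare(term, other.term);
        }
        return Long.compare(sequence, other.sequence);
    }

    /** True if an update stamped 'incoming' should replace one stamped 'local'. */
    public static boolean isNewer(GossipTimestamp incoming, GossipTimestamp local) {
        return incoming.compareTo(local) > 0;
    }
}
```

Because the comparison is deterministic, two stores that exchange the same set of updates converge on the same winner regardless of arrival order.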

At the time of this writing, all services with the exception of topology management have access to distributed stores.

Cluster Management

Info

We start by noting that ONOS also uses the term 'cluster' to refer to connected subgraphs of the network topology, which have no association with clusters in the multi-instance sense. When clusters are mentioned in this section, they are strictly in the latter sense.

...

The Cluster subsystem is responsible for the following:

  • Keeping track of the membership of a cluster
  • Delegating identifiers to nodes, in the form of NodeIds
  • Providing the notion of a local node, similar to localhost 
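The three responsibilities above can be sketched as a minimal membership registry. This is a standalone illustration under assumed names (SimpleClusterRegistry is hypothetical, not an ONOS class), showing membership tracking, node identifiers, and the localhost-like notion of a local node.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Minimal sketch of the Cluster subsystem's responsibilities: tracking
// membership, keeping node identifiers, and exposing a local node.
// Class and method names are illustrative, not ONOS APIs.
public class SimpleClusterRegistry {
    private final String localNodeId;                    // the 'localhost'-like notion
    private final Set<String> members = new TreeSet<>(); // current cluster membership

    public SimpleClusterRegistry(String localNodeId) {
        this.localNodeId = localNodeId;
        members.add(localNodeId); // a node always knows about itself
    }

    public String localNode() { return localNodeId; }

    public void memberJoined(String nodeId) { members.add(nodeId); }

    public void memberLeft(String nodeId) { members.remove(nodeId); }

    public Set<String> nodes() { return Collections.unmodifiableSet(members); }
}
```

In ONOS itself, the join/leave notifications would come from the underlying membership facility (currently Hazelcast) rather than direct method calls.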

The DistributedClusterStore currently leverages Hazelcast for its cluster membership facilities by implementing Hazelcast's MembershipListener and using it to translate MembershipEvents into ONOS ClusterEvents. It also relies on Hazelcast for the setup and management of the multicast group used for inter-store communication.

Device Mastership Management

ONOS assigns per-device roles to each node in a cluster. The three roles that a node can take with respect to a device are:

  1. NONE : The node may or may not have knowledge of the device, and cannot interact with it.
  2. STANDBY : The node has knowledge of the device, and can read, but not write to it.
  3. MASTER : The node has knowledge of the device, and has full read-write access to it.

A node can have different roles for different devices. These three roles map to the NONE, SLAVE, and MASTER roles specified by OpenFlow v1.2+, respectively.

A device is free to connect to one or more nodes in a cluster. For a device connected to multiple nodes, only one node is MASTER at any given time, and the rest are either STANDBY or NONE.

These roles are defined by the enum MastershipRole. The mastership subsystem is responsible for guaranteeing that every device has exactly one MASTER, and that the rest are either STANDBY or NONE. The following sections describe how the service assigns and reassigns roles, and recovers role assignments after various types of failures.
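The role set and its OpenFlow mapping can be made concrete with a short sketch. The enum values match the MastershipRole names used in the text; the mapping helper itself is illustrative, not ONOS code.

```java
// The three per-device roles and their OpenFlow v1.2+ counterparts, as
// described above. The mapping helper is an illustrative sketch.
public class Roles {
    public enum MastershipRole { MASTER, STANDBY, NONE }

    /** Maps an ONOS mastership role to the corresponding OpenFlow role name. */
    public static String toOpenFlowRole(MastershipRole role) {
        switch (role) {
            case MASTER:  return "MASTER";
            case STANDBY: return "SLAVE"; // OpenFlow names the read-only role SLAVE
            case NONE:    return "NONE";
            default: throw new IllegalArgumentException("unknown role: " + role);
        }
    }
}
```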

Node Mastership Lifecycle

A node begins in the NONE role. In the current implementation, the first node to confirm that 1) the device has no master tied to it, and 2) it has a control channel connection to the device, becomes its master. Any other nodes that subsequently discover the device become STANDBY if they have a connection to the device, or remain NONE otherwise. The last case occurs when the DeviceService detects a device indirectly through the distributed store, or when a previously connected device disconnects.
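The initial election rule above can be modeled in a few lines. This is an illustrative model of the decision logic, not the MastershipManager implementation; all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the initial election rule: the first node with a control
// channel to a device that has no master becomes MASTER; later
// discoverers become STANDBY if connected, or remain NONE otherwise.
// Illustrative model only, not ONOS code.
public class InitialElection {
    public enum Role { MASTER, STANDBY, NONE }

    private final Map<String, String> masters = new HashMap<>(); // deviceId -> master nodeId

    /** Invoked when 'nodeId' discovers 'deviceId'; returns the role it assumes. */
    public Role discovered(String nodeId, String deviceId, boolean hasConnection) {
        if (!hasConnection) {
            return Role.NONE; // e.g. device learned indirectly via the distributed store
        }
        if (!masters.containsKey(deviceId)) {
            masters.put(deviceId, nodeId); // no master yet: claim mastership
            return Role.MASTER;
        }
        return Role.STANDBY; // connected, but another node already holds mastership
    }
}
```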

The established roles can change as a result of various events. We currently consider the following events:

  • Administrative intervention : an operator manually sets the role of a device
  • Disconnection of/from a device : the node loses control channel connectivity to a device
  • Disconnection from the cluster (Split-brain syndrome)

The MastershipManager responds to these role-changing events with role relinquishment and reelection to maintain the "at most one master per device" policy, and to ensure that a node incapable of properly handling a device doesn't get elected into mastership. 

Info

The terms 'node', 'MastershipManager', and 'MastershipService' are used interchangeably in these sections.

Role relinquishment

A node that relinquishes its role gives up its current role and falls back to NONE. A node will relinquish its role for a device if:

  • It loses its connection to a device, or the device fails 
  • It becomes part of the minority during a split-brain situation
  • An administrative command sets its role to NONE
  • Consistency checks fail, e.g. if an OpenFlow device responds to a RoleRequest with an error, or unanticipated mastership changes occur

Reelection

A node resigning from mastership may elect another node to become the new master for a device. Reasons for reelections include:

  • Failure (role relinquishment) of a master node
  • Device disconnection from a master node
  • Administrative demotion of a master to either STANDBY or NONE

A candidate node is selected from the pool of known standby nodes for a device. Currently, this pool is a list of NodeIds in preference order, so the relinquishing node can simply choose the next node on the list, ensuring that the candidate is the next-best choice.

The candidate can choose to become the new master or, in failure scenarios, appoint another candidate upon relinquishing its own role. Reelection can occur up to N times, where N is the number of standby nodes for the device. Such a chain of handoffs can arise if the device fully disconnects from the control plane, and this bound serves to prevent endless reelections.
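The bounded handoff chain described above can be sketched as a single walk down the preference-ordered standby list. This is an illustrative model under assumed names, not the ONOS reelection code.

```java
import java.util.List;
import java.util.Set;

// Sketch of the bounded reelection chain: the resigning master offers
// mastership down the preference-ordered standby list; each candidate
// either accepts (it can still reach the device) or passes the role on.
// With N standbys there are at most N handoffs, so a fully disconnected
// device cannot trigger endless reelections. Illustrative only.
public class Reelection {
    /**
     * Walks the preference-ordered standby list and returns the first node
     * still connected to the device, or null if none is (device offline).
     */
    public static String electNewMaster(List<String> standbys, Set<String> connected) {
        for (String candidate : standbys) {  // at most N handoffs
            if (connected.contains(candidate)) {
                return candidate;            // candidate accepts mastership
            }
            // candidate relinquishes; the offer moves to the next node
        }
        return null; // no standby can reach the device: no new master
    }
}
```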

Handling Split-brain Scenarios

If a cluster splits into two partitions of different sizes, the nodes in the smaller partition will relinquish their roles; if they are incapable of doing so, members of the larger partition will force reelections for devices whose master nodes became part of the smaller partition. The MastershipManager determines whether it is in the minority or majority by querying the ClusterService.
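The minority/majority decision reduces to a strict-majority test: a node compares the members it can still see against the full cluster size. A minimal sketch, assuming a hypothetical helper name (ONOS obtains the membership view from the ClusterService):

```java
import java.util.Set;

// Sketch of the majority test used after a partition: a node holding a
// strict majority of the cluster keeps its roles; a minority node
// relinquishes them. Illustrative only, not ONOS code.
public class PartitionCheck {
    /** True if the visible partition holds a strict majority of the cluster. */
    public static boolean inMajority(Set<String> visibleMembers, int clusterSize) {
        return visibleMembers.size() * 2 > clusterSize;
    }
}
```

Note that on an exact 50/50 split neither side holds a strict majority, which is precisely the equal-partition case the text flags as currently unhandled.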

The current ONOS implementation does not handle the case where the two partitions are equal in size, and both are still connected to the network. This is expected to be implemented very soon.