This section discusses the distributed aspects of ONOS's network topology representation.
As part of its northbound API, ONOS provides applications with access to a global network topology view. Applications operate on this view to perform various network functions such as path computation and flow provisioning, among others.
A primary concern in global network topology state management is maintaining the consistency of the state replicated across the various ONOS instances. Each controller must expose a view of the entire network even though, at any given point in time, it has direct visibility over only a subset of the network.
There are some properties that a good network topology state management solution should have in a distributed setting: it should give applications fast, local access to the topology view; it should keep the views held by different instances consistent with one another; and it should converge back to the true state of the physical network despite lost messages, partitions, and controller failures.
In ONOS, the entire global network topology state is cached in memory on each instance. This provides applications with low-latency access to topology state. Before we go into the details of how the topology state is kept in sync across instances and with the state of the physical network, it is useful to define a few concepts:
As discussed in Device Subsystem, each network device may have direct TCP connections to one or more ONOS instances, and as described in the previous section, ONOS elects one of those controllers to serve as the master for the device. The system invariant is that at any given point in time a device can have one and only one master. If the current master dies, ONOS elects a new master from amongst the controllers that the device can talk to. A mastership term is a per-device monotonically increasing counter (starting at 0) that is bumped each time a new master is elected. The very first time a switch comes online and a new master is elected, the mastership term value is 1. If that master dies and a new master is elected the term is bumped to 2 and so on.
The information regarding which controller is the current master for a device and the associated term number is tracked in a strongly consistent data store.
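To make this bookkeeping concrete, here is a minimal sketch of the per-device record such a store might hold. The class name MastershipRecord and its methods are illustrative assumptions, not ONOS's actual API.

```java
// Illustrative sketch (not ONOS's actual API) of the per-device record
// kept in the strongly consistent store: the current master and the term.
public final class MastershipRecord {
    private final String masterNodeId; // controller currently mastering the device
    private final long term;           // monotonically increasing mastership term

    public MastershipRecord(String masterNodeId, long term) {
        this.masterNodeId = masterNodeId;
        this.term = term;
    }

    // Produces the record that results from electing a new master:
    // the winner takes over and the term is bumped by one.
    public MastershipRecord electNewMaster(String newMasterNodeId) {
        return new MastershipRecord(newMasterNodeId, term + 1);
    }

    public String masterNodeId() { return masterNodeId; }
    public long term() { return term; }
}
```

Following the sequence described above, a device coming online for the first time would get a record with term 1; if that master dies and another controller wins the election, electNewMaster yields a record with term 2.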
In a given mastership term, the elected master receives various topology events from the device: switch connected, port online, port offline, link down, and so on. The current master maintains a per-switch counter to tag the topology events it receives from the device during its term. The sequence number is thus a monotonically increasing counter that is initialized to 0 at the start of the term and incremented each time the master detects a new topology event.
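As a rough illustration of these per-switch counters, assuming hypothetical names throughout, the master-side logic might look like the following sketch; the first event of a term receives sequence number 1.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch of the per-device sequence counters maintained by
// a master during a single mastership term; not ONOS's actual code.
public final class SequenceCounters {
    // One counter per device this instance currently masters,
    // created lazily at 0 when the term begins for that device.
    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

    // Returns the next sequence number for a topology event from the
    // given device, starting at 1 for the first event of the term.
    public long next(String deviceId) {
        return counters
                .computeIfAbsent(deviceId, id -> new AtomicLong(0))
                .incrementAndGet();
    }
}
```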
Now to the important part: given the mastership term and sequence number concepts, we can assign a logical timestamp to each topology event emanating from the network that lets us totally order the topology events originating from a given device. That timestamp is the tuple (mastership_term, sequence_number).
Take any two events for a device: they can be ordered by first comparing their mastership terms and, if the terms are equal, by comparing their sequence numbers. It is very important to note that these logical timestamps only order events emanating from a single device; two events emanating from different devices are considered independent of each other.
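This ordering rule translates directly into a comparator. The following is a minimal sketch of such a logical timestamp; the class name and fields are assumptions for illustration, and ONOS's internal types may differ.

```java
// Illustrative logical timestamp: (mastership_term, sequence_number).
// Only meaningful for comparing events from the *same* device.
public final class LogicalTimestamp implements Comparable<LogicalTimestamp> {
    private final long term;     // mastership term in which the event occurred
    private final long sequence; // event's sequence number within that term

    public LogicalTimestamp(long term, long sequence) {
        this.term = term;
        this.sequence = sequence;
    }

    @Override
    public int compareTo(LogicalTimestamp other) {
        // Order by term first; within the same term, by sequence number.
        int byTerm = Long.compare(term, other.term);
        return byTerm != 0 ? byTerm : Long.compare(sequence, other.sequence);
    }

    public boolean isNewerThan(LogicalTimestamp other) {
        return compareTo(other) > 0;
    }
}
```

For example, an event stamped (3, 7) is newer than (3, 5) from the same device, and any event from term 4 is newer than every event from term 3, regardless of their sequence numbers.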
Each ONOS instance maintains an in-memory representation of the global network view that is updated as follows: a topology event detected by the local instance (acting as master for the device in question) is tagged with its logical timestamp, applied to the local view, and broadcast to all peers; an event received from a peer is applied only if its timestamp is newer than that of the state it would replace, and is otherwise discarded as stale.
As should be evident by now, the role played by the event logical timestamps is to ensure the topology state machine evolves in the right direction over time by incorporating newer information. This remains true even when messages get lost, delayed or reordered.
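Reusing the LogicalTimestamp sketch above, this update rule can be expressed as a simple compare-and-apply step. This is again an illustrative sketch under assumed names, not ONOS's implementation, and it tracks one timestamp per device for brevity where a real store would track one per piece of device state (each port, link, and so on).

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the apply-if-newer rule that keeps the local
// topology view moving forward despite lost, delayed, or reordered events.
public final class TopologyView {
    // Timestamp of the most recent event applied per device.
    private final Map<String, LogicalTimestamp> lastApplied = new HashMap<>();

    // Applies the event (whether locally generated or received from a
    // peer) only if it carries newer information; stale or duplicate
    // events are silently dropped. Returns true if the view changed.
    public synchronized boolean apply(String deviceId, LogicalTimestamp ts,
                                      Runnable mutation) {
        LogicalTimestamp current = lastApplied.get(deviceId);
        if (current != null && !ts.isNewerThan(current)) {
            return false; // older or duplicate event: ignore it
        }
        mutation.run();                // update the in-memory view
        lastApplied.put(deviceId, ts); // remember the newest timestamp seen
        return true;
    }
}
```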
In the setup described above, it is possible that a controller that is temporarily partitioned away will stop receiving updates from its peers. Even under normal operation, there is no guarantee that every message that is broadcast will be received by all members of the cluster. Left to its own devices, a system based purely on an optimistic replication technique like the one described above will drift progressively out of sync, and that is no good.
Another class of failures pertains to controller crashes, which effectively result in a loss of topology updates. Consider a controller that, on receiving a topology event, promptly crashes before it can replicate that event to the other controllers in the cluster. While ONOS automatically elects another controller as the new master for the device, the original topology event is still effectively lost. If that event signaled a port going offline, the network view in each controller will continue to show the port as up if nothing else changes in the system. This is bad as well.
To detect and fix issues such as the ones described above, ONOS employs a couple of techniques: