...

  1. Completeness: Even though each controller has direct visibility and influence over only a subset of the network, the controllers should all work together to ensure that each controller’s network topology view reflects the state of the entire network.
  2. Accuracy: Switches, ports and links go up and down. Each controller’s network view should always expose the correct state for various network elements. This also means that each controller’s network view should quickly change to reflect any changes in the underlying network.
  3. Low latency access: The network topology view is a heavily consumed piece of state, so it is very important that the chosen mechanism provide low-latency access to it.

Approach

...

In a given mastership term, the elected master receives various topology events from the device. These could be events such as switch connected, port online, port offline, link down, etc. The current master maintains a per-switch counter to tag the topology events it receives from the device during its term. The sequence number is thus a monotonically increasing counter that is initialized to 0 at the start of the term and incremented each time the master detects a new topology event.
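The per-switch counter described above can be sketched as follows. This is only an illustration, not the actual implementation: the class name, the method names and the device id strings are hypothetical, and the choice to increment the counter before tagging (so the first event of a term carries sequence number 1) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class MasterEventTagger:
    """Hypothetical helper: tags topology events for devices this controller masters."""

    # One counter per switch, keyed by an illustrative device id string.
    counters: Dict[str, int] = field(default_factory=dict)

    def start_term(self, device_id: str) -> None:
        # The counter is initialized to 0 at the start of a mastership term.
        self.counters[device_id] = 0

    def tag(self, device_id: str, event: str) -> Tuple[int, str]:
        # Assumption: increment first, then tag, so sequence numbers
        # within a term are 1, 2, 3, ...
        self.counters[device_id] += 1
        return (self.counters[device_id], event)


tagger = MasterEventTagger()
tagger.start_term("of:0001")
first = tagger.tag("of:0001", "SWITCH_CONNECTED")
second = tagger.tag("of:0001", "PORT_ONLINE")
```

Because the counters are kept per switch, events from different devices are numbered independently of each other.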

...

  • An anti-entropy mechanism based on a gossip protocol, which works as follows: at fixed intervals (usually 3-5 seconds), a controller randomly picks another controller and the two synchronize their respective topology views. If one controller has more recent information than the other, they exchange that information, and at the end of the interaction their respective topology views are mutually consistent. Most of the time the anti-entropy interaction will be uneventful, as each controller already knows about every event that happened in the network. But when a controller's state drifts slightly, this mechanism quickly detects that and brings the controllers back in sync. This approach has the added benefit of quickly synchronizing a newly joining controller with the rest. The first anti-entropy interaction that a newly joining controller has with an existing controller will bring it up to speed, without the need for a separate backup/discovery mechanism.
     
  • For detecting and recovering from complete loss of topology updates, each controller periodically probes the devices for which it is the master. If it detects that the device state is different from the information it has, it promptly updates its local topology state and replicates that update to all other controllers in the cluster.
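The anti-entropy exchange in the first bullet above can be sketched as a pairwise merge in which the newer entry for each network element wins. This is a minimal illustration under stated assumptions: the view layout (a dictionary from element key to a timestamped state), the representation of a timestamp as a comparable `(term, sequence number)` pair, and the function names are all hypothetical, and the periodic timer is replaced by an explicit call per round.

```python
import random
from typing import Dict, List, Tuple

# Assumption: a timestamp is a comparable (term, sequence number) pair.
Timestamp = Tuple[int, int]
# Assumption: a view maps an element key to (timestamp, state).
View = Dict[str, Tuple[Timestamp, str]]


def anti_entropy_sync(a: View, b: View) -> None:
    """Exchange entries so that both views hold the newest known state."""
    for key in set(a) | set(b):
        entry_a, entry_b = a.get(key), b.get(key)
        if entry_a is None or (entry_b is not None and entry_b[0] > entry_a[0]):
            a[key] = entry_b  # b has newer (or the only) information
        elif entry_b is None or entry_a[0] > entry_b[0]:
            b[key] = entry_a  # a has newer (or the only) information
        # Equal timestamps: both sides already agree, nothing to do.


def gossip_round(views: List[View], rng: random.Random) -> None:
    """One round: every controller syncs with one randomly chosen peer.

    Requires at least two views.
    """
    for i, view in enumerate(views):
        peer = rng.choice([j for j in range(len(views)) if j != i])
        anti_entropy_sync(view, views[peer])


existing: View = {"port:of:0001/1": ((1, 3), "DOWN"), "link:a-b": ((1, 1), "UP")}
newcomer: View = {}  # a newly joining controller starts with an empty view
anti_entropy_sync(newcomer, existing)
```

After the single exchange, `newcomer` holds the same entries as `existing`, which mirrors how a newly joining controller is brought up to speed by its first anti-entropy interaction.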

...

Previous : Cluster Coordination
Next : Intent Framework

...