From the point of view of the network being controlled, an important attribute of ONOS is its capability to handle a high rate of topology change events. Some level of topology change is expected in a real network as new switches and links are installed or as existing ones fail. The level of change varies with the type and size of the network. This test characterizes, in the worst-case ONOS configuration, the level of change events that the ONOS core can sustain.
Setup and Methods:
We expect ONOS to be able to handle an unusually high number of topology change events, a load that may be difficult to generate artificially from an emulated or real network. We therefore built a tool into the Null (Link) Provider, a Link Flicker Generator, that feeds link change events to ONOS at an adjustable rate. To monitor whether the ONOS core sustains the generator rate, we use the Link and Graph Event Rate meters in the "topology-events-metrics" app. The diagram below depicts the experimental setup.
Link Flicker Generators generate link descriptions of vanishing and detected links, which are consumed directly by the ONOS SB Core API and in turn trigger Link and Graph Events in the core. The rate generated is determined by two factors: 1) the "eventRate" parameter in the Null Link Provider, which sets the rate generated by each generator thread; 2) the number of threads running within a server: each link in the Null Link Provider runs a thread, up to 80% of the total number of CPU cores on the server. The total rate generated is therefore the product of the two. The rate of each flicker thread can be monitored from the ONOS log.
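The rate arithmetic above can be sketched as follows. This is a minimal illustration, not code from the Null Link Provider: the function names are hypothetical, and rounding the 80% cap down to a whole number of threads (with a minimum of one) is an assumption.

```python
def generator_threads(num_links, cpu_cores):
    """Number of flicker threads: one per link, capped at 80% of the CPU cores.

    Assumption: the 80% cap is rounded down, with at least one thread allowed.
    """
    cap = max(1, cpu_cores * 4 // 5)  # integer form of floor(0.8 * cpu_cores)
    return min(num_links, cap)


def total_event_rate(event_rate, num_links, cpu_cores):
    """Total generated rate = per-thread "eventRate" x number of threads."""
    return event_rate * generator_threads(num_links, cpu_cores)


if __name__ == "__main__":
    # Illustrative numbers only: 100 links on a 32-core server, 2000 events/s per thread.
    print(generator_threads(100, 32))       # 25 threads (cap applies)
    print(total_event_rate(2000, 100, 32))  # 50000 events/s
```

With fewer links than the cap allows, the link count is the limiting factor: 10 links on the same server would give 10 threads.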
We monitor the behavior of the Link and Graph Event Rate meters to determine whether the ONOS core is capable of handling the level of events being generated. To do so, we first turn off topology event batching by configuring "org.onosproject.net.topology.impl.DefaultTopologyProvider.cfg" with "maxEvents = 1; maxIdleMs = 0; maxBatchMs = 0", i.e. turning every Link event into a Graph event immediately. This is not a desirable configuration for operating a real network; however, it shows us ONOS characteristics in the worst-case scenario. By monitoring the divergence between the Link and Graph Event rates, we can determine the point at which ONOS can no longer handle the number of events produced by the generator. This point is not absolute, but it provides a reasonable metric of ONOS' topology event handling throughput.
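One way to quantify the divergence between the two meters is the fraction by which the Graph Event rate trails the Link Event rate. The sketch below is an assumption about how that comparison could be computed from sampled meter readings; the function names and the choice to average over a window are hypothetical, and the default threshold is the 1% figure used in the measurement steps.

```python
def divergence(link_rate, graph_rate):
    """Fraction by which the Graph Event rate trails the Link Event rate.

    Returns 0.0 when Graph events keep up (or when no Link events are seen).
    """
    if link_rate <= 0:
        return 0.0
    return max(0.0, (link_rate - graph_rate) / link_rate)


def has_diverged(samples, threshold=0.01):
    """Check a window of (link_rate, graph_rate) samples against the threshold.

    Assumption: rates are averaged over the window so that a single noisy
    sample does not count as a sustained divergence.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    mean_link = sum(link for link, _ in samples) / len(samples)
    mean_graph = sum(graph for _, graph in samples) / len(samples)
    return divergence(mean_link, mean_graph) > threshold


if __name__ == "__main__":
    # Illustrative readings only, in events per second.
    print(has_diverged([(1000, 998), (1000, 996)]))  # Graph rate keeps up
    print(has_diverged([(1000, 980), (1000, 985)]))  # Graph rate trails by more than 1%
```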
The following steps are taken to make the measurement:
- Calibrate the generator so that its rate and the Link Event Rate meter track each other reasonably closely;
- Start the generator at a moderate rate and observe the Link and Graph Event Rate meters tracking each other for an extended period of time (approx. 3 min), confirming that the Graph Event Rate does not consistently trail the Link Event Rate by more than 1%;
- Gradually increase the generator rate through the Null Link Provider config file in moderate steps (e.g. a 10% increase in rate) until the Graph Event Rate diverges from the Link Event Rate by more than 1%;
- Record the rate prior to the divergence as the sustained rate for that cluster scale;
- Repeat the above steps on 1-, 3-, 5-, and 7-node clusters to characterize how ONOS behaves as it scales.
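The ramp procedure in the steps above can be sketched as a simple search loop. This is an illustrative outline under stated assumptions, not the tooling used for the test: `is_sustained` is a hypothetical callback that would set the generator rate, wait for the observation period, and report whether the Graph Event Rate stayed within 1% of the Link Event Rate; `max_rate` is an added safety bound.

```python
def find_sustained_rate(start_rate, is_sustained, step=0.10, max_rate=1e9):
    """Raise the generator rate in `step` increments until divergence appears.

    Returns the last rate at which is_sustained(rate) was True, or None if
    even the starting rate could not be sustained.
    """
    last_good = None
    rate = start_rate
    while rate <= max_rate:
        if not is_sustained(rate):
            break  # divergence observed: the previous rate is the result
        last_good = rate
        rate *= 1 + step  # e.g. a 10% increase per step
    return last_good


def sweep_clusters(cluster_sizes, start_rate, make_probe):
    """Repeat the search for each cluster size.

    make_probe(n) is a hypothetical factory returning the is_sustained
    callback for an n-node cluster.
    """
    return {n: find_sustained_rate(start_rate, make_probe(n)) for n in cluster_sizes}


if __name__ == "__main__":
    # Stand-in probe for demonstration: pretend capacity grows with cluster size.
    fake_probe = lambda n: (lambda rate: rate <= 1500 * n)
    for size, rate in sweep_clusters([1, 3, 5, 7], 1000, fake_probe).items():
        print(size, rate)
```

Because the rate is raised in fixed-percentage steps, the recorded value is accurate only to within one step; a smaller `step` gives a finer estimate at the cost of more observation periods.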