Objective:
This test characterizes the ability of the ONOS core to sustain a load of topology events as the ONOS cluster scales from 1 to 3, 5, and 7 nodes.
Method:
Because the ONOS core handles all topology events (switch, port, and link events) in a similar way, we use link events both as the load and as the metric for characterizing ONOS' capability. When this test was designed, we expected ONOS to handle topology events at a rate high enough that it would be difficult for OpenFlow hardware or emulated devices to generate sufficient load. We therefore instrumented the ONOS Null Provider with a link flickering generator that produces link up/down events at a high, adjustable rate.
While the generator is producing link events, the test should periodically poll "topology-events-metrics" over an extended period (when using the "m1" meters, the load should be sustained for more than 3~4 minutes) until it obtains stable rates of Link and Graph Events. When the link event rate and the graph event rate start to diverge, and/or the metered rate starts to decrease, ONOS is no longer able to sustain the generated load. Past this point, the metric-reported rate may also decrease because of how the generator is implemented: the link event generator runs on ONOS threads, so as the generated rate starts to overwhelm the server, the generator itself may behave erratically and cause the meter to fluctuate.
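The 3~4 minute sustained period follows from how a one-minute meter smooths its input. The sketch below assumes that "m1" behaves like the one-minute rate of a Dropwizard (Codahale) Metrics meter: an exponentially weighted moving average (EWMA) updated every 5 seconds with alpha = 1 - exp(-5/60), already initialized before the load step begins. `seconds_to_reach` is a hypothetical helper that simulates how long the meter takes to cover a given fraction of a step change in the event rate.

```python
import math

# Assumption: "m1" is a Dropwizard/Codahale-style one-minute EWMA,
# ticked every 5 seconds with alpha = 1 - exp(-5/60).
TICK_SECONDS = 5.0
WINDOW_SECONDS = 60.0
ALPHA = 1.0 - math.exp(-TICK_SECONDS / WINDOW_SECONDS)


def seconds_to_reach(fraction: float, start_rate: float = 0.0,
                     target_rate: float = 1000.0) -> float:
    """Simulate the m1 EWMA after the event rate steps from start_rate to
    target_rate; return the seconds until `fraction` of the step is covered."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    rate = start_rate
    allowed_gap = (1.0 - fraction) * abs(target_rate - start_rate)
    ticks = 0
    while abs(target_rate - rate) > allowed_gap:
        # One EWMA update per tick, fed with the constant instantaneous rate.
        rate += ALPHA * (target_rate - rate)
        ticks += 1
    return ticks * TICK_SECONDS


for f in (0.90, 0.95, 0.98, 0.99):
    print(f"{f:.0%} of a constant rate after {seconds_to_reach(f):.0f} s")
```

Under this assumption the meter needs 180 s to reach 95% and 235 s to reach 98% of a new steady rate, which is consistent with sustaining each load level for more than 3~4 minutes before reading the meters.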
We therefore define the ONOS Link Event Throughput as the highest generator rate that is sustained before the metered rate decreases or the Link and Graph event rates diverge.
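This selection rule can be expressed as a small function. It is a sketch under stated assumptions: `Step` and `link_event_throughput` are hypothetical names, each step is assumed to record the plateau "m1" readings at one generator setting, and the 5% divergence tolerance is an illustrative value, since this plan does not give a numeric threshold.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Step:
    """Plateau readings recorded at one generator setting (hypothetical record)."""
    generator_rate: float  # configured link-event rate for this step
    link_m1: float         # "Topology Link Events" m1 rate at plateau
    graph_m1: float        # "Topology Graph Events" m1 rate at plateau


def link_event_throughput(steps: Sequence[Step],
                          max_divergence: float = 0.05) -> Optional[float]:
    """Return the highest generator rate sustained before the metered link
    rate decreases or the link and graph rates diverge.

    Steps must be ordered by increasing generator_rate; max_divergence is an
    assumed relative tolerance between the link and graph m1 rates.
    """
    sustained = None
    previous_link = None
    for step in steps:
        if step.link_m1 <= 0:
            break  # nothing metered at this rate: treat as unstable
        decreased = previous_link is not None and step.link_m1 < previous_link
        diverged = abs(step.link_m1 - step.graph_m1) / step.link_m1 > max_divergence
        if decreased or diverged:
            break  # first unstable step: the previous rate is the answer
        sustained = step.generator_rate
        previous_link = step.link_m1
    return sustained


# Illustrative readings only, not measured results.
example = [Step(1000, 990, 985), Step(2000, 1980, 1970), Step(3000, 2950, 2400)]
print(link_event_throughput(example))  # prints 2000: graph rate lags at 3000
```

The decrease check is strict here; in practice a small noise band may be needed so that ordinary meter jitter is not mistaken for saturation.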
Diagram showing how link flicker events are generated and propagate through the ONOS cluster.
Procedure:
The following steps are used to characterize the performance of an ONOS cluster of 1, 3, 5, and 7 nodes:
1) start the load generator at a moderate rate;
2) periodically check "topology-events-metrics", using the "m1" meter as the metric; the "m1" meter should rise gradually, depending on the load, and saturate within 3~4 minutes of sustained load;
3) the observed "Topology Link Events" and "Topology Graph Events" meters should closely track each other and the generator rate;
4) increase the load generator rate by changing "eventRate" in "org.onosproject.provider.nil.link.impl.NullLinkProvider.cfg";
5) repeat 2) ~ 4) until the "m1" meters no longer rise monotonically or the two meters above are out of sync (note: with the current flicker generator implementation, the generated rate can drop and thrash once the system reaches its saturation point);
6) we take the last rate before the unstable behavior in 5) begins as the rate the ONOS cluster can sustain.
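The per-step conditions in 2) through 5) can be checked mechanically on the polled samples. The sketch below is hypothetical: it assumes the link and graph "m1" values have already been read from "topology-events-metrics" into two lists polled at the same instants, and the function names as well as the noise, window, and tolerance values are illustrative choices, not thresholds defined by this test plan.

```python
from typing import Sequence


def rises_monotonically(samples: Sequence[float], noise: float = 0.01) -> bool:
    """True if no m1 sample falls below the previous one by more than an
    assumed relative noise band."""
    return all(cur >= prev * (1.0 - noise) for prev, cur in zip(samples, samples[1:]))


def in_sync(link: Sequence[float], graph: Sequence[float],
            tolerance: float = 0.05) -> bool:
    """True if every paired link/graph m1 sample differs by at most
    `tolerance`, relative to the link rate."""
    return len(link) == len(graph) and all(
        abs(lk - gr) <= tolerance * max(lk, 1e-9) for lk, gr in zip(link, graph))


def has_plateaued(samples: Sequence[float], window: int = 6,
                  tolerance: float = 0.02) -> bool:
    """True if the last `window` samples stay within `tolerance` of their peak."""
    if len(samples) < window:
        return False
    tail = samples[-window:]
    peak = max(tail)
    return peak > 0 and min(tail) >= peak * (1.0 - tolerance)


def step_outcome(link: Sequence[float], graph: Sequence[float]) -> str:
    """Classify one load step as 'unstable', 'saturated', or 'rising'."""
    if not rises_monotonically(link) or not in_sync(link, graph):
        return "unstable"   # stop condition of 5): keep the previous rate
    if has_plateaued(link):
        return "saturated"  # stable at this rate: raise eventRate as in 4)
    return "rising"         # keep polling as in 2)
```

Splitting the check this way keeps the stop condition (loss of monotonic rise or link/graph divergence) separate from the saturation test that decides when it is safe to move to the next generator rate.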