The goal of the High Availability (HA) test suite is to verify how ONOS performs when there are control plane failures. As HA is a key goal in ONOS, we would like to see ONOS gracefully handle failures and continue to manage the data plane during these failures. In cases where ONOS becomes unavailable, say the entire cluster is rebooted, ONOS should quickly recover any persistent data and get back to running the network.
The general structure of this test suite is to start an ONOS cluster and confirm that it has reached a stable working state. Once this state is reached, we trigger a failure scenario and verify that ONOS correctly recovers from the failure. Below are two tables: the first describes the failure scenarios, and the second describes the functionality and state checks that are performed in each of the HA tests.
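The setup → verify → fail → recover flow described above can be sketched as follows. This is a simplified illustration, not actual TestON code; the cluster object, failure function, and check callbacks are placeholders:

```python
# Simplified sketch of the HA test flow: start a cluster, confirm it
# is stable, run all checks, induce a failure, then verify the same
# checks pass again after recovery. Not real TestON code -- the
# cluster object and callbacks here are placeholders.

def run_ha_test(cluster, induce_failure, checks):
    cluster.start()
    assert cluster.is_stable(), "cluster never reached a stable state"

    # Baseline: every functionality/state check must pass pre-failure.
    baseline = {name: check(cluster) for name, check in checks.items()}
    assert all(baseline.values()), f"pre-failure checks failed: {baseline}"

    induce_failure(cluster)      # e.g. kill 3 of 7 nodes
    cluster.wait_for_recovery()

    # The same checks must pass again once ONOS has recovered.
    results = {name: check(cluster) for name, check in checks.items()}
    assert all(results.values()), f"post-recovery checks failed: {results}"
    return results
```

The HAsanity test follows the same skeleton but replaces `induce_failure` with a 60-second wait.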
| Test | Failure Scenario | TestON Test Name |
|---|---|---|
| Sanity | Runs through all the state and functionality checks of the HA test suite, but waits 60 seconds instead of inducing a failure. Run as a 7 node ONOS cluster. | HAsanity |
| Minority of ONOS Nodes shutdown | Restart 3 of 7 ONOS nodes by gracefully stopping the process once the system is running and stable. | HAstopNodes |
| Minority of ONOS Nodes continuous shutdown | Continuously (1000 times) restart 1 of 7 ONOS nodes iteratively by gracefully stopping the process once the system is running and stable, then verify the node correctly restarts and rejoins the cluster. | HAcontinuousStopNodes |
| Minority of ONOS Nodes kill | Restart 3 of 7 ONOS nodes by killing the process once the system is running and stable. | HAkillNodes |
| Cluster restart | Restart 7 of 7 ONOS nodes by killing the process once the system is running and stable. | HAclusterRestart |
| Single instance restart | Restart 1 of 1 ONOS nodes by killing the process once the system is running and stable. | HAsingleInstanceRestart |
| Full network partition | Partition the control network by creating IP table rules once the system is in a stable state. Checks are performed both during the partition and after the partition is healed. | HAfullNetPartition |
| Dynamic Clustering: Swap nodes | Change the membership of an ONOS cluster at run time. | HAswapNodes |
| Dynamic Clustering: Scale up/down | Change the size of an ONOS cluster at run time. | HAscaling |
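For the full network partition scenario, the control network is split by installing IP table rules on each node that drop traffic from the nodes on the other side of the partition. The sketch below illustrates the idea; the node IPs are placeholders, and the real test drives this through TestON rather than generating commands directly:

```python
# Sketch of partitioning the control network with IP table rules, as
# in HAfullNetPartition. Returns, for each node, the iptables commands
# that node would run to drop traffic from the other group. IPs are
# placeholders; this is an illustration, not the actual test code.

def partition_rules(group_a, group_b):
    rules = {}
    for node in group_a:
        rules[node] = [f"iptables -A INPUT -s {peer} -j DROP"
                       for peer in group_b]
    for node in group_b:
        rules[node] = [f"iptables -A INPUT -s {peer} -j DROP"
                       for peer in group_a]
    return rules

def heal_rules():
    # Flushing the INPUT chain removes the drop rules and heals
    # the partition (assuming no other rules were installed).
    return ["iptables -F INPUT"]
```

Isolating a minority of nodes this way lets the test verify that the majority partition keeps managing the network, and that the isolated nodes rejoin cleanly once the rules are flushed.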
| Description | Passing Criteria |
|---|---|
| Topology Discovery | |
| Device Mastership | |
| Intents | |
| Switch Failure | |
| Link Failure | |
| Leadership Election | Applications can run for leadership of topics. This service should be safe, stable, and fault tolerant. |
| Distributed Sets | Call each of the set APIs and make sure they are functional and cluster wide. In addition, we also check that sets are unaffected by ONOS failures. |
| Distributed Atomic Counters | Call each of the counter APIs and make sure they are functional and cluster wide. In addition, we also check that counters are unaffected by ONOS failures. Note: in-memory counters will not persist across cluster-wide restarts. |
| Cluster Service | |
| Application Service | |
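The distributed-primitive checks (sets, counters) share a common pattern: read the same primitive from every node in the cluster and verify that all replicas agree. A minimal sketch of that pattern, with the node client left as a placeholder for the ONOS CLI/REST access the tests actually use:

```python
# Sketch of a cluster-wide consistency check for a distributed
# primitive (e.g. a distributed set or atomic counter): query every
# node and verify all replicas report the same value. `get_value` is
# a placeholder for reading the primitive from one ONOS node.

def check_cluster_wide(nodes, get_value):
    values = [get_value(node) for node in nodes]
    consistent = all(v == values[0] for v in values)
    return consistent, values
```

The HA tests run a check like this both before and after the failure scenario, so any node that diverges after recovery is caught by the comparison.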