...
The goal of the High Availability (HA) test suite is to verify how ONOS behaves during control plane failures. Because HA is a key goal of ONOS, ONOS should handle these failures gracefully and continue to manage the data plane while they occur. In cases where ONOS becomes unavailable, for example when the entire cluster is rebooted, ONOS should quickly recover any persistent data and resume running the network.
...
High Availability Test Scenarios
Test | Failure Scenario | TestON Test name |
This test runs through all the state and functionality checks of the HA test suite but waits 60 seconds instead of inducing a failure. It runs on a 7-node ONOS cluster. |
HAsanity |
Restart 3 of 7 ONOS nodes by gracefully stopping the process once the system is running and stable. | HAstopNodes |
| Minority of ONOS Nodes continuous shutdown | Repeatedly (1000 times) restart 1 of 7 ONOS nodes by gracefully stopping the process once the system is running and stable. After each restart, verify that the node restarts correctly and rejoins the cluster. | HAcontinuousStopNodes |
Restart 3 of 7 ONOS nodes by killing the process once the system is running and stable. | HAkillNodes |
Restart 7 of 7 ONOS nodes by killing the process once the system is running and stable. | HAclusterRestart |
Restart 1 of 1 ONOS nodes by killing the process once the system is running and stable. | HAsingleInstanceRestart |
Partition the control network by creating iptables rules once the system is in a stable state. During the partition:
After partition is healed:
|
HAfullNetPartition |
| Dynamic Clustering: Swap nodes | Change membership of an ONOS cluster at run time
|
| HAswapNodes |
| Dynamic Clustering: Scale up/down | Change the size of an ONOS cluster at run time
| HAscaling |
| Offline Backup Recovery | Take a backup of ONOS data and restore ONOS using the backup
| HAbackupRecover |
| ISSU | Perform an In-Service Software Upgrade (ISSU) of ONOS
| HAupgrade |
| ISSU - Rollback | Roll back an In-Service Software Upgrade (ISSU) of ONOS
| HAupgradeRollback |
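The partition scenario in the table above can be illustrated with a short sketch. This is not the TestON implementation: the function names `split_cluster` and `partition_plan` and the node addresses are hypothetical, and the sketch only builds the iptables command strings, assuming that dropping traffic in both directions between the two sides is enough to isolate them.

```python
from typing import Dict, List, Tuple


def split_cluster(nodes: List[str], minority_size: int) -> Tuple[List[str], List[str]]:
    """Split nodes into (minority, majority); the minority must lack quorum."""
    quorum = len(nodes) // 2 + 1
    if not 0 < minority_size < quorum:
        raise ValueError("minority_size must be positive and below the quorum")
    return nodes[:minority_size], nodes[minority_size:]


def partition_plan(side_a: List[str], side_b: List[str]) -> Dict[str, List[str]]:
    """Map each node to the iptables commands that cut it off from the other side."""
    plan: Dict[str, List[str]] = {}
    for local_side, remote_side in ((side_a, side_b), (side_b, side_a)):
        for node in local_side:
            rules = []
            for peer in remote_side:
                # Drop packets arriving from the peer and packets sent to it.
                rules.append(f"iptables -A INPUT -s {peer} -j DROP")
                rules.append(f"iptables -A OUTPUT -d {peer} -j DROP")
            plan[node] = rules
    return plan


# Hypothetical addresses for a 7-node cluster.
nodes = [f"10.0.0.{i}" for i in range(1, 8)]
minority, majority = split_cluster(nodes, 3)
plan = partition_plan(minority, majority)
```

Each entry in `plan` would be run on the corresponding node. Healing the partition amounts to deleting the same rules, for example by issuing each command with `-D` in place of `-A`.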
State and Functionality Checks in the HA Test Suite
Description | Passing Criteria |
| Topology Discovery |
|
| Device Mastership |
|
Intents |
|
Switch Failure |
|
| Link Failure |
|
| Leadership Election | Applications can run for leadership of topics. This service should be safe, stable and fault tolerant.
|
Distributed Sets | Call each of the following APIs and make sure they are functional and cluster-wide
In addition, we also check that sets are unaffected by ONOS failures |
Distributed Atomic Counters | Call each of the following APIs and make sure they are functional and cluster-wide
In addition, we also check that counters are unaffected by ONOS failures. Note: In-memory counters will not persist across cluster-wide restarts. |
Cluster Service |
|
Application Service |
|
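Several of the checks above (leadership election, distributed sets, atomic counters) require that a value be consistent cluster-wide. A minimal sketch of such a comparison follows. It assumes the per-node views have already been collected by some other means; the helper name `divergent_nodes` and the node names are hypothetical and are not part of TestON or ONOS.

```python
from collections import Counter
from typing import Dict, Hashable, List


def divergent_nodes(views: Dict[str, Hashable]) -> List[str]:
    """Return the nodes whose view differs from the most common view."""
    if not views:
        return []
    # Treat the most frequently reported value as the cluster's view.
    common_view, _ = Counter(views.values()).most_common(1)[0]
    return sorted(node for node, view in views.items() if view != common_view)


# Hypothetical leadership views: every node reports the same leader.
leader_views = {"onos1": "onos2", "onos2": "onos2", "onos3": "onos2"}

# Hypothetical distributed set contents: one node is missing an element.
set_views = {
    "onos1": frozenset({"a", "b"}),
    "onos2": frozenset({"a", "b"}),
    "onos3": frozenset({"a"}),
}

leader_mismatches = divergent_nodes(leader_views)
set_mismatches = divergent_nodes(set_views)
```

An empty result means all nodes agree. A non-empty result names the nodes to investigate, which is more useful in a test log than a bare pass or fail.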
...