High Availability Test Scenarios
| Test | Failure Scenario | TestON Test name | Roadmap |
| --- | --- | --- | --- |
| HA Sanity Test | This test runs through all the state and functionality checks of the HA test suite, but waits 60 seconds instead of inducing a failure. It runs as a 7-node ONOS cluster. | HATestSanity | now |
| Minority of ONOS nodes restart | Restart 3 of 7 ONOS nodes by killing the process once the system is running and stable. | HATestMinorityRestart | now |
| Entire ONOS cluster restart | Restart 7 of 7 ONOS nodes by killing the process once the system is running and stable. | HATestClusterRestart | now |
| Single-node cluster restart | Restart 1 of 1 ONOS nodes by killing the process once the system is running and stable. | SingleInstanceHATestRestart | now |
| Control network partition | Partition the control network by creating IP table rules once the system is in a stable state.<br>During the partition:<br>- Topology is replicated within the sub-cluster the event originated in<br>- Get-flows will only show flows within a sub-cluster<br>- Intents should only be available in the majority sub-cluster (intent behavior is not fully defined for the Raft implementation)<br>- Mastership behavior has not been defined for the split-brain scenario<br>After the partition is healed:<br>- Topology is consistent across all nodes and reflects the current state of the network (including updates made during the partition)<br>- The flows view is consistent across all nodes<br>- Intents are available on all nodes, including any new intents pushed to the majority sub-cluster<br>- Mastership is consistent across all nodes | | |
| Partial network partition | Partially partition the control network by creating IP table rules once the system is in a stable state (A and B cannot talk, but both can talk to C).<br>- Topology should be consistent across all nodes<br>- The flows view will show reachable controllers (A sees A and C, B sees B and C, and C sees A, B, and C)<br>- Intents: wait for the Raft implementation; behavior will depend on which node is the Raft leader<br>- Mastership: wait for the Raft implementation | | |
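The two partition scenarios above are induced with IP table rules. The following is a minimal sketch of the kind of firewall configuration involved, using hypothetical node addresses; the actual test derives its rules from the cluster configuration.

```shell
# Run on each node of the minority sub-cluster to cut it off from the
# majority. The addresses below are placeholders, not the test's real ones.
MAJORITY="10.0.0.4 10.0.0.5 10.0.0.6 10.0.0.7"

# Induce the partition: drop all traffic to and from the majority nodes.
for peer in $MAJORITY; do
    sudo iptables -A INPUT  -s "$peer" -j DROP
    sudo iptables -A OUTPUT -d "$peer" -j DROP
done

# Heal the partition by deleting the same rules, after which the checks
# listed under "After the partition is healed" are expected to pass.
for peer in $MAJORITY; do
    sudo iptables -D INPUT  -s "$peer" -j DROP
    sudo iptables -D OUTPUT -d "$peer" -j DROP
done
```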
State and Functionality Checks in the HA Test Suite
| Description | Passing Criteria | Roadmap |
| --- | --- | --- |
| Topology Discovery | - All switches, links, and ports are discovered<br>- All information (DPIDs, MACs, port numbers) is correct<br>- ONOS correctly discovers any change in the dataplane topology<br>- Each node in an ONOS cluster has the same correct view of the topology | now |
| Device Mastership | - Each device has one and only one ONOS node as master<br>- Mastership correctly changes when device roles are manually changed<br>- Mastership fails over if the current master becomes unavailable<br>- Each node in an ONOS cluster has the same view of device mastership | now |
| Intents | - Intents can be added between hosts<br>- Hosts connected by intents have dataplane connectivity<br>- Intents remain in the cluster as long as some ONOS nodes are available<br>- Connectivity is preserved during dataplane failures as long as at least one path exists between the hosts | now |
| Switch Failure | - Topology is updated and intents are recompiled | now |
| Link Failure | - Topology is updated and intents are recompiled | now |
| Leadership Election | Applications can run for leadership of topics; this service should be safe, stable, and fault tolerant.<br>- The service is functional before and after failures; nodes can withdraw from and run for election<br>- There is always exactly one leader per topic in an ONOS cluster | now |
| Distributed Sets | Call each of the following APIs and verify they are functional and cluster-wide:<br>- get()<br>- size()<br>- add()<br>- addAll()<br>- contains()<br>- containsAll()<br>- remove()<br>- removeAll()<br>- clear()<br>- retain()<br>In addition, verify that the sets are unaffected by ONOS failures | now |
| Distributed Atomic Counters | Call the atomic counter APIs and verify they are functional and cluster-wide. In addition, verify that the counters are unaffected by ONOS failures. Note: in-memory counters will not persist across cluster-wide restarts | now |
| Cluster Service | - Every ONOS node should be clustered with every other node in the test (unless one is specifically made unavailable) | now |
| Application Service | - Application IDs are unique to an application<br>- Application activation<br>- Application deactivation<br>- Active applications reactivate on restart | now |
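The device-mastership criteria above amount to a simple cluster-wide invariant: every node reports the same device-to-master mapping, and each reported master is a known cluster member. A sketch of such a check in Python (names and data shapes are illustrative, not the actual TestON code):

```python
def check_mastership(views):
    """Check the mastership invariants across an ONOS cluster.

    views: {node: {device_id: master_node}} -- one roles mapping per ONOS
    node. Because each mapping is a dict, "one and only one master per
    device" holds by construction; what remains to check is agreement
    between nodes and that every master is a known cluster member.

    Returns a list of human-readable violations (empty list == pass).
    """
    violations = []
    nodes = list(views)
    reference = views[nodes[0]]
    # Every node must report the same device-to-master mapping.
    for node in nodes[1:]:
        if views[node] != reference:
            violations.append(f"{node} disagrees with {nodes[0]} on mastership")
    # Each device's master must be a node we actually know about.
    for device, master in reference.items():
        if master not in views:
            violations.append(f"{device} is mastered by unknown node {master}")
    return violations
```

The same shape of check, with a `{node: {topic: leader}}` mapping, covers the "exactly one leader per topic" criterion of the leadership-election row.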
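The distributed-set row can be sketched the same way: apply one sequence of operations through every node's view of the set and verify the views converge. Plain Python sets stand in for the ONOS distributed set here; in the real test each call goes through the ONOS CLI on every node. clear() is listed in the criteria but omitted below so the final view is non-empty.

```python
def run_set_checks(node_sets):
    """Exercise the set APIs on every node's view and verify convergence.

    node_sets: {node: set} -- one view of the distributed set per node,
    with plain Python sets standing in for the real distributed store.
    Returns the agreed-upon final contents.
    """
    for s in node_sets.values():
        s.update({"a", "b", "c"})          # add() / addAll()
        assert "a" in s                    # contains()
        assert {"a", "b"} <= s             # containsAll()
        s.discard("c")                     # remove()
        s.difference_update({"b"})         # removeAll()
        s.intersection_update({"a", "z"})  # retain()
        assert len(s) == 1                 # size() / get()
    # Cluster-wide check: every node must report the same contents. In the
    # real test this comparison is what catches replication failures.
    views = list(node_sets.values())
    assert all(v == views[0] for v in views)
    return views[0]
```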