Objectives:
ONOS is a network controller. Applications interact with ONOS through its intent APIs. ONOS controls data networks (e.g., an OpenFlow network) through its southbound adapter layer. In between, the ONOS flow subsystem is the key component for translating application intents into OpenFlow flow rules. ONOS is also a distributed system, and it is essential that its distributed architecture show performance gains as the cluster size increases. This evaluation takes an external view of ONOS as a clustered system and aims to characterize its performance from the perspective of applications and the operational environment.
We designed a set of experiments to characterize ONOS' latencies and throughput under various application and network environments. By analyzing the results, we hope to give network operators and application developers a "first look" at ONOS' performance capability. In addition, the results should help developers identify performance bottlenecks and optimization opportunities. The following diagram illustrates the key performance evaluation points, viewing ONOS' distributed system as a whole.
The performance evaluation points indicated in the diagram include:
- for latencies: A - Switch connect/disconnect; B - Link up/down; C - Intent batch install/withdraw/re-route;
- for throughput: D - Intent operations; E - Link events (test deprecated; new test TBD); F - Burst flow rule installation.
General Experiment Setup:
Performance Measured at Increasing Scale:
ONOS' most prominent characteristic is its distributed system architecture. A key aspect of characterizing ONOS' performance is therefore to analyze and compare performance at various scales. The general theme of all test cases is to take measurements as ONOS scales from 1 node to 3, 5, and 7 nodes.
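One way to organize such a comparison is to normalize each measurement against the single-node baseline. The sketch below is a hypothetical helper, not part of the ONOS test suite; the function name is an assumption, and the sample values are placeholders for illustration only, not measured results.

```python
CLUSTER_SIZES = (1, 3, 5, 7)  # node counts used throughout the test cases


def scaling_factors(throughput_by_size):
    """Return throughput at each cluster size relative to the 1-node baseline.

    throughput_by_size maps a node count to a throughput figure
    (any unit, as long as it is the same for every entry).
    """
    baseline = throughput_by_size[1]
    if baseline <= 0:
        raise ValueError("single-node baseline must be positive")
    return {n: throughput_by_size[n] / baseline
            for n in sorted(throughput_by_size)}


if __name__ == "__main__":
    # Placeholder values for illustration only; NOT measured ONOS results.
    sample = {1: 100.0, 3: 250.0, 5: 400.0, 7: 500.0}
    for nodes, factor in scaling_factors(sample).items():
        print(f"{nodes} node(s): {factor:.2f}x baseline")
```

A factor that stays close to 1.0 as nodes are added would suggest that the measured operation does not benefit from the larger cluster.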
Measurement Instrumentation:
To characterize ONOS' intrinsic performance rather than the limits of the test fixtures, we built a few utilities for the experiments.
For all experiments except the switch- and port-related ones, which require OpenFlow interactions, we implemented in ONOS a set of Null Providers at the adapter level to interact with the ONOS core. The Null Providers act as producers of devices, links, and hosts, as well as a sink for flow rules. By using the Null Providers, we bypass the OpenFlow adapters and eliminate potential performance limits imposed by real or emulated OpenFlow devices.
We also built a number of load generators so that we can drive a high level of load from the application or network interfaces and push ONOS to its performance limits. These generators include:
- an intent performance generator, “onos-app-intent-perf”, that interfaces with the intent API and generates self-adjusting intent install and withdraw operations at the highest rate ONOS can sustain;
- a flow rule installer utility, a Python script that interfaces with the ONOS flow subsystem to install and remove flow rules;
- a link event (flicker) generator in the Null Link provider that sends link up/down descriptions to the ONOS core at an elevated rate, up to the highest rate ONOS can sustain.
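To make the idea of a self-adjusting generator concrete, the sketch below shows a generic feedback loop that raises the offered rate while the system keeps up and backs off when it falls behind. It is an illustration only: the adjustment logic actually used by “onos-app-intent-perf” is not described on this page, and the function names, step factors, and the simulated `submit` callback are all assumptions.

```python
def next_rate(offered, completed, step=1.1, backoff=0.8, tolerance=0.95):
    """Choose the next offered rate (ops/s) from the last interval's result."""
    if completed >= tolerance * offered:
        return offered * step   # system kept up: probe a higher rate
    return offered * backoff    # system fell behind: back off


def find_sustained_rate(submit, start_rate=100.0, rounds=50, tolerance=0.95):
    """Drive `submit` for a number of rounds and return the best sustained rate.

    `submit(rate)` is a hypothetical callback that offers `rate` ops/s for one
    interval and returns the rate that was actually completed.
    """
    rate, best = start_rate, 0.0
    for _ in range(rounds):
        completed = submit(rate)
        if completed >= tolerance * rate:
            best = max(best, completed)
        rate = next_rate(rate, completed, tolerance=tolerance)
    return best


if __name__ == "__main__":
    # Simulated system under test with an arbitrary capacity (not an ONOS figure).
    capacity = 1000.0
    sustained = find_sustained_rate(lambda r: min(r, capacity))
    print(f"sustained rate: {sustained:.0f} ops/s")
```

Multiplicative increase and decrease keeps the loop simple; a real generator would also need to bound the backlog of outstanding operations.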
In addition, we use the meters in the "topology-events-metrics" and "intents-events-metrics" apps for some of the timing- and rate-related tests to capture key event timestamps and processing rates.
We describe the generator setup in more detail in the individual test cases.
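As a rough illustration of how captured timestamps can be turned into latency and rate figures, consider the sketch below. The function names and the choice to take cluster-wide latency as the time until the slowest node records the event are assumptions made for this example; they do not describe the output format of the metrics apps. Comparing timestamps across nodes also assumes synchronized clocks.

```python
def cluster_latency_ms(trigger_ms, completion_ms_by_node):
    """Latency from a trigger until the last node records completion.

    Timestamps are in milliseconds; comparing them across nodes is only
    meaningful when the nodes' clocks are synchronized.
    """
    if not completion_ms_by_node:
        raise ValueError("need at least one node timestamp")
    slowest = max(completion_ms_by_node.values())
    if slowest < trigger_ms:
        raise ValueError("completion precedes trigger; check clock sync")
    return slowest - trigger_ms


def processing_rate(event_timestamps_ms):
    """Average events per second over the span of the given timestamps."""
    if len(event_timestamps_ms) < 2:
        return 0.0
    span_s = (max(event_timestamps_ms) - min(event_timestamps_ms)) / 1000.0
    if span_s == 0:
        return 0.0
    # N timestamps delimit N - 1 intervals.
    return (len(event_timestamps_ms) - 1) / span_s


if __name__ == "__main__":
    # Made-up timestamps for illustration only.
    print(cluster_latency_ms(1000, {"node1": 1012, "node2": 1020}))
    print(processing_rate([0, 500, 1000]))
```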
General Test Environments:
A 7-node bare-metal server cluster is set aside for all the experiments. Each server has the following specs:
Dual Intel Xeon E5-2670v2 2.5GHz Processors - 10 real cores/20 hyper-threaded cores per processor;
32GB 1600MHz DDR3 DRAM;
1Gbps Network interface card;
Ubuntu 14.04 OS;
Time synchronization amongst cluster nodes using ptpd.
ONOS specific software environment includes:
- Java HotSpot(TM) 64-Bit Server VM; version 1.8.0_31
- JAVA_OPTS="${JAVA_OPTS:--Xms8G -Xmx8G}"
- onos-1.1.0 snapshot: commit a31e13471ee626abce2bc43c413fab17586f4fc3
Additional case-specific ONOS parameters are described with each test case.
The following child pages describe further setup details and discuss and analyze the results of each test.