This page is outdated; please refer to the Javadoc of the latest ONOS master.

 

Team

Name              Organization   Email
Damian O'Neill    BTI Systems    doneill@btisystems.com
Kieran McPeake    BTI Systems    kmcpeake@btisystems.com
Hayden Shorter    BTI Systems    hshorter@btisystems.com

Overview

This project adds Fault Management of Network Elements (NEs) to ONOS.

When a fault or event occurs, an NE will typically send a notification to the network operator via SNMP. The network operator may also (or alternatively) poll the NE to retrieve this information. An alarm is a persistent indication of a fault that clears only when the triggering condition has been resolved. This proposal outlines a solution that lets ONOS support such alarms.

For information on Fault Management as it pertains to NETCONF, refer to NETCONF Fault Management.

Proposed work

In ONOS terms, Fault Management is a service, i.e. "a unit of functionality that is comprised of multiple components that create a vertical slice through the tiers as a software stack".

Two layers will be updated to provide fault management:

For context, the following diagram, taken from the ONOS design wiki, illustrates the relationship between the ONOS layers.

Usage

1) Follow the normal steps for Installing and running ONOS, locally or remotely as required.

2) Activate the relevant applications. (The example below shows ONOS running as a standalone instance on a remote host.)

[sdn@psm3 ~]$ /opt/onos/bin/onos-service
onos> app activate org.onosproject.snmp
onos> app activate org.onosproject.faultmanagement

3) SNMP devices are seeded via a config file. The default seed file contains connection details for devices (SNMP agents) reachable over the internet, e.g. demo.snmplabs.com. To use it, copy the sample file into the Karaf etc directory:

cp /opt/onos/apache-karaf-3.0.3/etc/samples/org.onosproject.provider.snmp.device.impl.SnmpDeviceProvider.cfg   /opt/onos/apache-karaf-3.0.3/etc/

4) ONOS will poll these SNMP devices and store their alarms.

5) Users can then view and manipulate the alarms via the REST API, the CLI, or the GUI.

More details on each of these interfaces are given in the sections below.

 

SNMP Provider

A new SNMP southbound provider will be available.

This generic provider is a southbound plugin that handles bi-directional SNMPv2c interaction with NEs. Its core is based on SNMP4J (Reference #1). It uses strongly-typed APIs generated automatically from the NEs' SNMP MIB files: at runtime, ONOS combines the core with additional NE-specific, automatically generated jar libraries to provide a strongly-typed, NE-specific programmatic interface. It is deployed as a Java OSGi application with bundles for:
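To illustrate what a strongly-typed, MIB-derived API means in practice, here is a minimal hand-written sketch of the kind of class an offline MIB-to-Java generator might produce for the MIB-II system group. It assumes the generic core returns a map from OID strings to values (a stand-in for an SNMP4J GET response); the class MibIISystem and its fromVarbinds method are hypothetical and are not part of ONOS or SNMP4J. The three OIDs are the standard MIB-II sysDescr, sysUpTime and sysName instances.

```java
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical strongly-typed view of the MIB-II "system" group, of the kind an
 * offline MIB-to-Java generator might emit. Not an ONOS or SNMP4J class.
 */
record MibIISystem(String sysDescr, long sysUpTimeTicks, String sysName) {

    // Standard MIB-II system group OIDs (scalar instances, hence the trailing .0).
    static final String SYS_DESCR_OID = "1.3.6.1.2.1.1.1.0";
    static final String SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0";
    static final String SYS_NAME_OID = "1.3.6.1.2.1.1.5.0";

    /**
     * Builds the typed view from generic OID -> value bindings, as a generic
     * SNMP core might return them. Fails fast if a required OID is missing.
     */
    static MibIISystem fromVarbinds(Map<String, Object> varbinds) {
        Object descr = Objects.requireNonNull(varbinds.get(SYS_DESCR_OID), "sysDescr missing");
        Object upTime = Objects.requireNonNull(varbinds.get(SYS_UPTIME_OID), "sysUpTime missing");
        Object name = Objects.requireNonNull(varbinds.get(SYS_NAME_OID), "sysName missing");
        // sysUpTime is a TimeTicks value (hundredths of a second), modeled here as a Number.
        return new MibIISystem(descr.toString(), ((Number) upTime).longValue(), name.toString());
    }

    /** Convenience conversion from TimeTicks to whole seconds. */
    long upTimeSeconds() {
        return sysUpTimeTicks / 100;
    }
}
```

The benefit of the typed layer is that callers write system.sysName() instead of looking up "1.3.6.1.2.1.1.5.0" and casting the result themselves.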

Some NEs use SNMP traps to notify an SNMP manager of changes to the NE database, which reduces the amount of SNMP polling required.

SNMP SETs are supported, so the solution can manage NEs that use SNMP SETs to register and deregister their SNMP trap listeners.

Note: trap notifications (whether fault- or configuration-related) are an optimisation. With them, ONOS can receive faults before its next poll, but polling is the only guaranteed way to keep a correct picture of the NEs’ faults, in particular when faults pre-date management by ONOS or when ONOS or the network goes offline temporarily. Usually, alarm polling will be implemented first when supporting an NE variant.
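The note above implies a simple reconciliation rule: traps raise or clear alarms early, but each completed poll is treated as the authoritative list of a device's active faults. The following is a minimal sketch of that rule, assuming alarms are identified by a hypothetical string key (for example source entity plus description); AlarmReconciler and its methods are illustrative names, not ONOS APIs.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical per-device alarm state. Traps give early, best-effort updates;
 * each poll result is authoritative and overrides what traps (or lost traps) implied.
 */
final class AlarmReconciler {

    /** Active alarm keys, e.g. "port:1/11/2/1|Loss of Signal" (illustrative key format). */
    private final Set<String> active = new HashSet<>();
    /** Cleared alarm keys and how the clear was detected; purging is out of scope here. */
    private final Map<String, String> cleared = new HashMap<>();

    /** Trap path: may arrive before the next poll, or may be lost entirely. */
    void onTrapRaised(String key) {
        active.add(key);
        cleared.remove(key);
    }

    void onTrapCleared(String key) {
        if (active.remove(key)) {
            cleared.put(key, "cleared by trap");
        }
    }

    /**
     * Poll path: the polled set is the full truth. Anything held locally but no longer
     * reported was cleared (e.g. while ONOS or the network was down); anything new was
     * raised (e.g. a fault that pre-dates management by ONOS).
     */
    void onPoll(Set<String> polledActive) {
        for (String key : Set.copyOf(active)) {
            if (!polledActive.contains(key)) {
                active.remove(key);
                cleared.put(key, "cleared by poll");
            }
        }
        active.addAll(polledActive);
        polledActive.forEach(cleared::remove);
    }

    Set<String> activeAlarms() {
        return Set.copyOf(active);
    }

    Map<String, String> clearedAlarms() {
        return Map.copyOf(cleared);
    }
}
```

In this sketch a lost clear trap (alarm "A" below) is corrected at the next poll, and a fault that existed before ONOS started managing the NE (alarm "C") is picked up the same way.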

A set of libraries for standard MIBs will be supplied by default (allowing SNMP interaction with, e.g., MIB-II-compliant NEs).

In addition, a mechanism will be provided to manage other NEs (with either standards-based or vendor-specific MIBs): a Java library for such NEs can be generated in an offline step.

Fault Management (FM) Application

A new ONOS Fault Management app will track the state of alarms on each device.

It registers its interest in alarms with the Alarm Provider (e.g. the SNMP provider described above). It is abstracted from provider implementation details, i.e. it is not aware of SNMP.

In future, the NETCONF provider is also expected to be updated to support alarm retrieval and events (this is out of scope for the current work). For a particular NE type, the Alarm Provider (e.g. the SNMP variant) hides the MIB- or vendor-specific mappings from fault tables and fault notifications. All communication between the SNMP provider and the ONOS core (Fault Management) uses the Provider Service interface.
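The provider/service split described above can be sketched as follows: the provider translates vendor-specific fault data into a provider-neutral alarm and hands it to the core through a provider-service callback, so the core never sees SNMP details. VendorTrap, GenericAlarm, AlarmProviderService and ExampleSnmpAlarmProvider are simplified, hypothetical types for illustration only; the actual ONOS fault-management interfaces differ and should be checked in the Javadoc. The vendor severity codes are likewise invented.

```java
import java.util.Map;

/** Hypothetical vendor-specific trap content, as decoded from an SNMP notification. */
record VendorTrap(String deviceId, String ifIndex, int vendorSeverityCode, String text) {}

/** Hypothetical provider-neutral alarm that the core (Fault Management) understands. */
record GenericAlarm(String deviceId, String source, String severity, String description) {}

/** Callback the core hands to each provider; the only channel from provider to core. */
interface AlarmProviderService {
    void alarmRaised(GenericAlarm alarm);
}

/**
 * Hypothetical SNMP-side provider for one NE type: it owns the vendor-specific
 * mapping, so the core stays unaware of SNMP, MIBs and vendor codes.
 */
final class ExampleSnmpAlarmProvider {
    // Illustrative vendor code -> X.733 severity mapping for one NE type.
    private static final Map<Integer, String> SEVERITY = Map.of(
            1, "CRITICAL", 2, "MAJOR", 3, "MINOR", 4, "WARNING");

    private final AlarmProviderService service;

    ExampleSnmpAlarmProvider(AlarmProviderService service) {
        this.service = service;
    }

    void onTrap(VendorTrap trap) {
        // Unknown vendor codes map to X.733 "indeterminate" rather than being dropped.
        String severity = SEVERITY.getOrDefault(trap.vendorSeverityCode(), "INDETERMINATE");
        service.alarmRaised(new GenericAlarm(
                trap.deviceId(), "port:" + trap.ifIndex(), severity, trap.text()));
    }
}
```

Adding a NETCONF provider later would mean writing another class that calls the same AlarmProviderService, with no change on the core side.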

The application's alarm store will include ‘recently cleared’ alarms, but these will be purged regularly. An NE's alarms are also purged if the NE is deleted from ONOS, i.e. undiscovered.
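A minimal sketch of the two purge rules, assuming a simple in-memory store. The proposal does not specify the retention period for cleared alarms or how often purging runs, so both are left to the caller here; StoredAlarm and AlarmStore are illustrative names, not ONOS classes.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Illustrative stored alarm; timeCleared is null while the alarm is active. */
record StoredAlarm(long id, String deviceId, Instant timeCleared) {}

/** Hypothetical in-memory alarm store applying the two purge rules. */
final class AlarmStore {
    private final List<StoredAlarm> alarms = new ArrayList<>();
    private final Duration clearedRetention;

    AlarmStore(Duration clearedRetention) {
        this.clearedRetention = clearedRetention;
    }

    void add(StoredAlarm alarm) {
        alarms.add(alarm);
    }

    /** Rule 1: drop "recently cleared" alarms once they are older than the retention period. */
    void purgeCleared(Instant now) {
        Instant cutoff = now.minus(clearedRetention);
        alarms.removeIf(a -> a.timeCleared() != null && a.timeCleared().isBefore(cutoff));
    }

    /** Rule 2: when a device is removed from ONOS, drop all of its alarms, active or cleared. */
    void purgeDevice(String deviceId) {
        alarms.removeIf(a -> a.deviceId().equals(deviceId));
    }

    List<StoredAlarm> all() {
        return List.copyOf(alarms);
    }
}
```

A scheduler would call purgeCleared periodically, and a device-removed event listener would call purgeDevice.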

The Fault Management application provides several mechanisms to access and update its stored NE alarms:

REST API

Users can retrieve current alarms, filtered by various query parameters, and update some alarm attributes.

Here is the Swagger REST document for the alarms API.
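For illustration, a client could build an alarms query with the JDK's built-in HTTP classes (Java 11+). This sketch only builds the request and does not send it. The resource path /onos/v1/fm/alarms, the port and the query-parameter names used in the example are assumptions, not taken from this page; check the Swagger document for the actual endpoint and parameters. Credentials are supplied by the caller and sent with HTTP Basic authentication.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Builds (but does not send) a GET request for alarms. Path and parameters are assumptions. */
final class AlarmsRestQuery {

    /** Assumed path of the alarms resource; verify against the Swagger document. */
    static final String ALARMS_PATH = "/onos/v1/fm/alarms";

    static URI alarmsUri(String host, int port, Map<String, String> query) {
        // TreeMap gives a stable parameter order, which keeps the URI predictable.
        String qs = new TreeMap<>(query).entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return URI.create("http://" + host + ":" + port + ALARMS_PATH
                + (qs.isEmpty() ? "" : "?" + qs));
    }

    static HttpRequest getRequest(URI uri, String user, String password) {
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(uri)
                .header("Authorization", "Basic " + token)
                .header("Accept", "application/json")
                .GET()
                .build();
    }
}
```

The resulting request can be sent with java.net.http.HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).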

CLI

 

Here is a CLI example:

GUI

Topology View

To enable the alarms overlay in the topology view, click the 'Alarm Overlay' button, highlighted in the bottom left of the screenshot below.

This adds a total alarm count for all devices, plus alarm counts for individual devices.

A keyboard shortcut enables a mouse-over tooltip showing the individual alarm count for each decorated device.

Clicking a device opens an Alarms Summary popup for that device in the bottom right corner. The popup has extra buttons to navigate to the 'All' or 'Device-specific' tabular alarm views.

Currently only total counts are shown; counts by severity may be added later.
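The overlay counts are simple aggregations over the stored alarms. The sketch below computes the total count, per-device counts, and the possible future counts by severity, assuming cleared alarms are excluded from the badges (the page does not say). OverlayAlarm and AlarmCounts are hypothetical names, not ONOS GUI classes.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical minimal alarm view for the topology overlay. */
record OverlayAlarm(String deviceId, String severity, boolean cleared) {}

final class AlarmCounts {
    /** Total count of uncleared alarms across all devices. */
    static long total(List<OverlayAlarm> alarms) {
        return alarms.stream().filter(a -> !a.cleared()).count();
    }

    /** Per-device counts of uncleared alarms, as used to decorate individual devices. */
    static Map<String, Long> byDevice(List<OverlayAlarm> alarms) {
        return alarms.stream()
                .filter(a -> !a.cleared())
                .collect(Collectors.groupingBy(OverlayAlarm::deviceId, Collectors.counting()));
    }

    /** Possible future extension: counts by severity for a single device. */
    static Map<String, Long> bySeverity(List<OverlayAlarm> alarms, String deviceId) {
        return alarms.stream()
                .filter(a -> !a.cleared() && a.deviceId().equals(deviceId))
                .collect(Collectors.groupingBy(OverlayAlarm::severity, Collectors.counting()));
    }
}
```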

 

Tabular Alarm View

The tabular view of all alarms can be accessed via the buttons mentioned above or via the main ONOS menu.

If launched from the topology view's device-specific button, the list is filtered to that device.

Selecting a row opens a popup dialog with more details on the selected alarm, as shown below.

 

GUI Alarms Table View

 

A screencast of the user interface updates can be viewed here.

Alarms Model

 

The persisted alarms model is as follows:

Field                       Notes
Id                          Unique alarm identifier allocated by ONOS.
Acknowledged?               Set to true if an ONOS user has acknowledged this alarm. Default is false.
Description                 From the NE, e.g. “Equipment Missing”, or generated internally by ONOS,
                            e.g. "NE is unreachable".
Device identity             DeviceId.
Source                      AlarmEntityId: an entity within the context of this alarm's device,
                            e.g. port:1/11/2/1. Optional, since it is not used if the deviceId
                            sufficiently identifies the location.
Is Service Affecting?       As defined by ITU-T Recommendation X.733.
Severity                    As specified in ITU-T Recommendation X.733, i.e.
                            indeterminate/critical/major/minor/warning/cleared.
Time Raised                 The time when raised (if supplied by the NE), otherwise the time when the
                            fault was discovered (either by poll or by notification).
Time Updated                The time at which the alarm was most recently updated, due to some change
                            in the device or in ONOS. If the alarm has been cleared, this is the time
                            at which it was cleared.
Time Cleared                If applicable.
Raising Notification Id     If applicable. Not applicable if discovered by poll.
Clearing Notification Id    As above.
User Assigned               ONOS user (if any) to whom this alarm has been assigned.
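The model above maps naturally onto an immutable Java type. The sketch below is illustrative only: field names and types are assumptions derived from the table (the real ONOS Alarm interface should be checked in the Javadoc). Optional fields are modeled as nullable, the severity values follow the X.733 list above, and setting severity to CLEARED on clear is a design choice of this sketch.

```java
import java.time.Instant;
import java.util.Objects;

/** ITU-T X.733 perceived severity values, as listed in the table. */
enum Severity { INDETERMINATE, CRITICAL, MAJOR, MINOR, WARNING, CLEARED }

/**
 * Illustrative immutable alarm mirroring the table fields. Nullable fields:
 * source, timeCleared, raisingNotificationId, clearingNotificationId, assignedUser.
 */
record AlarmRecord(
        long id,
        String deviceId,
        String source,                  // e.g. "port:1/11/2/1"; null if deviceId suffices
        String description,
        Severity severity,
        boolean serviceAffecting,
        boolean acknowledged,           // false for newly raised alarms
        Instant timeRaised,
        Instant timeUpdated,
        Instant timeCleared,            // null while active
        String raisingNotificationId,   // null if discovered by poll
        String clearingNotificationId,
        String assignedUser) {

    AlarmRecord {
        Objects.requireNonNull(deviceId, "deviceId");
        Objects.requireNonNull(severity, "severity");
        Objects.requireNonNull(timeRaised, "timeRaised");
    }

    /** New, unacknowledged, unassigned alarm raised or discovered at the given time. */
    static AlarmRecord raised(long id, String deviceId, String source, String description,
                              Severity severity, boolean serviceAffecting, Instant now,
                              String raisingNotificationId) {
        return new AlarmRecord(id, deviceId, source, description, severity, serviceAffecting,
                false, now, now, null, raisingNotificationId, null, null);
    }

    /** Returns a cleared copy: severity CLEARED, timeUpdated and timeCleared both set to now. */
    AlarmRecord clear(Instant now, String clearingNotificationId) {
        return new AlarmRecord(id, deviceId, source, description, Severity.CLEARED,
                serviceAffecting, acknowledged, timeRaised, now, now,
                raisingNotificationId, clearingNotificationId, assignedUser);
    }

    boolean isCleared() {
        return timeCleared != null;
    }
}
```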

 

A future FM release may support persisting faults over longer timeframes (including those related to NEs that are no longer managed) so that historical data is made available. Support for historical data mining is excluded from this release.

Project Plan

Terminology

There are many resources online that give an overview and definitions of fault management. We use the same definitions as the IETF Alarm MIB, RFC 3877; while we do not want to repeat that document, the following extract may be helpful.

Other terminology used in this proposal:

References

1. SNMP4J, http://www.snmp4j.org/: "SNMP4J is an enterprise class free open source and state-of-the-art SNMP implementation for Java™ SE 1.4 or later. SNMP4J supports command generation (managers) as well as command responding (agents). Its clean object oriented design is inspired by SNMP++, which is a well-known SNMPv1/v2c/v3 API for C++."