Overview

The ONOS Blackbird release introduced a few primitives for managing distributed state. The goal is to make distributed state management and coordination simple and easily accessible to application developers. These primitives cater to different use cases by providing different levels of consistency, redundancy and durability. All of them are designed with correctness, usability and performance at scale as key considerations.

EventuallyConsistentMap

An eventually consistent map provides weaker consistency guarantees in return for superior read/write performance. All reads are served from local state; writes update local state immediately and are propagated to the other replicas in the background. Users can configure this map with a ClockService that is used to timestamp update events. The timestamps are used to reorder updates that arrive out of order, thereby ensuring that the map state across all replicas eventually converges to the correct state.

An eventually consistent map fully replicates all of its state: each node in the cluster holds a complete copy of the map contents. The state is stored in memory on each node, so a full cluster restart results in data loss. In Cardinal, we will be introducing an option to persist data to disk so that the map can survive a full cluster restart.
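
The following is a minimal sketch of how such a map might be created and used from an ONOS application component, assuming a StorageService reference named storageService (e.g. injected via @Reference). The map name is illustrative, and the builder method names are assumptions that may differ between releases (for example, Blackbird used withClockService where later releases use withTimestampProvider).

    // Build an eventually consistent map keyed by DeviceId.
    // The timestamp provider supplies the timestamps used to order updates;
    // builder method names are assumptions and may vary by release.
    EventuallyConsistentMap<DeviceId, String> notes = storageService
            .<DeviceId, String>eventuallyConsistentMapBuilder()
            .withName("device-notes")                                 // hypothetical map name
            .withSerializer(KryoNamespace.newBuilder().register(KryoNamespaces.API))
            .withTimestampProvider((key, value) -> new WallClockTimestamp())
            .build();

    DeviceId deviceId = DeviceId.deviceId("of:0000000000000001");
    notes.put(deviceId, "example value");   // updates local state; replicated in the background
    String value = notes.get(deviceId);     // reads are served from local state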

ConsistentMap

Applications that require stronger consistency guarantees can use the ConsistentMap primitive. ConsistentMap supports java.util.concurrent.ConcurrentMap style conditional update operations and ensures that all operations on any given key in the map are serialized, providing strong consistency. The underlying protocol that lets us achieve this is Raft. Furthermore, the entire key space of the map is partitioned to ensure good scale-out characteristics: each key in the ConsistentMap maps to a single partition or shard, and the consistency of each shard is maintained by a separate Raft consensus cluster. This ensures that operations on keys mapped to different partitions can proceed independently. In an N-node cluster, we create N shards by default. Each shard is replicated on 3 different nodes, ensuring the shard remains available even if one of its nodes fails.
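
A minimal sketch of a ConsistentMap with a ConcurrentMap-style conditional update follows, again assuming an injected StorageService named storageService; the map name is illustrative and builder details may vary between releases.

    // Build a strongly consistent, partitioned map.
    ConsistentMap<String, Integer> counts = storageService
            .<String, Integer>consistentMapBuilder()
            .withName("app-counts")                                   // hypothetical map name
            .withSerializer(Serializer.using(KryoNamespaces.API))
            .build();

    counts.putIfAbsent("flows-installed", 0);                         // conditional insert
    Versioned<Integer> current = counts.get("flows-installed");
    // The replace succeeds only if the value is still the one we read;
    // all operations on this key are serialized by its owning Raft partition.
    counts.replace("flows-installed", current.value(), current.value() + 1);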

LeadershipService

ONOS has a service to facilitate leader election for arbitrary topics. The service ensures that at any given point in time exactly one controller node acts as the leader for a given topic. The LeadershipService can facilitate leader election for multiple topics concurrently, and likewise each controller node can simultaneously be in the leadership race for multiple topics.
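
A minimal sketch of a component entering a leadership race is shown below, assuming an injected LeadershipService named leadershipService; the topic name is illustrative and the listener/event details may differ between releases.

    private static final String TOPIC = "work-assigner";              // hypothetical election topic

    @Activate
    protected void activate() {
        // React to leadership changes for our topic, then enter the race.
        leadershipService.addListener(event -> {
            if (TOPIC.equals(event.subject().topic())) {
                // this node may have just gained or lost leadership for TOPIC
            }
        });
        leadershipService.runForLeadership(TOPIC);
    }

    @Deactivate
    protected void deactivate() {
        leadershipService.withdraw(TOPIC);                             // leave the race on shutdown
    }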

ClusterCommunicationService

This service is more basic in its functionality and lets a controller instance communicate with other instances by making RPCs. A controller can register handlers that get invoked when messages of a certain type are received. The service supports various cluster-wide communication primitives such as unicast, multicast and broadcast.
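
A minimal sketch of subscribing to a message subject and broadcasting to peers is shown below, assuming an injected ClusterCommunicationService named communicationService; the subject name is illustrative, and the exact handler/encoder signatures vary between releases.

    MessageSubject subject = new MessageSubject("app-hello");         // hypothetical subject
    Serializer serializer = Serializer.using(KryoNamespaces.API);
    ExecutorService executor = Executors.newSingleThreadExecutor();

    // Handler invoked whenever an "app-hello" message arrives from a peer.
    communicationService.addSubscriber(subject,
            serializer::decode,                                        // bytes -> String
            (String message) -> {
                // handle the message here
            },
            executor);

    // Send a message to every other node in the cluster.
    communicationService.broadcast("hello", subject, serializer::encode);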

ClusterService

This service can be used to discover other nodes in the cluster and their current state (alive or dead).
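
For example, an application can enumerate the cluster members and check their liveness roughly as follows, assuming an injected ClusterService named clusterService; method and state names may differ slightly between releases.

    for (ControllerNode node : clusterService.getNodes()) {
        // state reports whether the node is currently considered alive
        ControllerNode.State state = clusterService.getState(node.id());
        boolean isLocal = node.equals(clusterService.getLocalNode());
        // act on state/isLocal as needed
    }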

DistributedSet (available in Cardinal)

As the name suggests, this is a data structure that provides set semantics in a distributed context.
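
A minimal sketch, assuming an injected StorageService named storageService and a setBuilder()-style API (the set name is illustrative and builder method names may differ between releases):

    // Build a set whose contents are shared across the cluster.
    DistributedSet<DeviceId> maintenance = storageService
            .<DeviceId>setBuilder()
            .withName("devices-under-maintenance")                    // hypothetical set name
            .withSerializer(Serializer.using(KryoNamespaces.API))
            .build();

    DeviceId deviceId = DeviceId.deviceId("of:0000000000000001");
    maintenance.add(deviceId);
    boolean underMaintenance = maintenance.contains(deviceId);        // visible on all nodes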

AtomicCounter (available in Cardinal)

This is similar to AtomicLong but in a distributed setting. It is useful for vending globally unique counter values.
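
A minimal sketch, assuming an injected StorageService named storageService; the counter name is illustrative and builder method names may differ between releases.

    // Build a cluster-wide counter and take the next unique value.
    AtomicCounter flowIds = storageService
            .atomicCounterBuilder()
            .withName("flow-ids")                                     // hypothetical counter name
            .build();

    long nextFlowId = flowIds.incrementAndGet();                      // unique across the cluster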

LogicalClockService (available in Cardinal)

This service is useful for assigning globally consistent timestamps to various events, which in turn helps order those events in a distributed setting.
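
A minimal sketch, assuming an injected LogicalClockService named clockService and a getTimestamp()-style accessor (the exact method name is an assumption):

    // Stamp an application event with a globally consistent logical timestamp.
    Timestamp ts = clockService.getTimestamp();
    // Timestamps from this service are comparable, e.g. ts.compareTo(otherTs),
    // which allows events observed on different nodes to be ordered.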