Onix SDN Paper Review


Teemu Koponen, Martin Casado, Natasha Gude, Jeremy Stribling, Leon Poutievski, Min Zhu, Rajiv Ramanathan, Yuichiro Iwata, Hiroaki Inoue, Takayuki Hama, and Scott Shenker. 2010. Onix: a distributed control platform for large-scale production networks. In Proceedings of the 9th USENIX conference on Operating systems design and implementation (OSDI'10). USENIX Association, USA, 351–364.

Problem

How can a control platform meet all five requirements of SDN: generality, scalability, reliability, simplicity, and control plane performance?

Introduction

A control platform lets network system implementors write control policies easily. Those policies should extend to clusters of any size and apply to different scenarios. Such a platform is essential for building SDN.

Previous Work

RCP, SANE, Ethane, and the 4D project are all earlier control platform implementations, but none of them generalizes to all types of workloads. NOX is general-purpose but provides no support for scalability. Previous work has also attempted to build a distributed control platform on end hosts.

Implementation

This paper proposes a control platform that enables control implementors to create a range of applications through a more general API. Additionally, Onix provides distribution primitives and several storage options. The storage options let implementors choose the tradeoff between consistency and availability that suits their applications. Finally, Onix is not restricted to running on switches, which is innovative given that servers today have more computing power.

Failures of Onix instances are handled through the coordination service ZooKeeper. The system defines two storage options, both of which allow partitioning and aggregation for state distribution. One favors performance and is implemented as a DHT that supports triggers. The other favors consistency and is implemented as a transactional persistent database. When an application favors performance and its data becomes inconsistent, control policy implementors can specify their own resolution logic by extending the entity class.
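To make the last point concrete, here is a minimal sketch of resolving conflicts by extending an entity class. It is not Onix's actual API: `EventualStore`, `Entity`, `LinkUtilization`, and the `resolve` hook are hypothetical names, and the store is a single-process stand-in for the DHT that simply keeps every conflicting write.

```python
from typing import Dict, List


class EventualStore:
    """Hypothetical stand-in for the performance-oriented store: concurrent
    writes to one key are all kept, so a read may return several values."""

    def __init__(self) -> None:
        self._values: Dict[str, List[int]] = {}

    def put(self, key: str, value: int) -> None:
        self._values.setdefault(key, []).append(value)

    def get_all(self, key: str) -> List[int]:
        return list(self._values.get(key, []))


class Entity:
    """Hypothetical base class; subclasses decide how conflicts are resolved."""

    def __init__(self, key: str, store: EventualStore) -> None:
        self.key = key
        self.store = store

    def resolve(self, candidates: List[int]) -> int:
        raise NotImplementedError("subclasses supply the resolution policy")

    def read(self) -> int:
        candidates = self.store.get_all(self.key)
        if not candidates:
            raise KeyError(self.key)
        return self.resolve(candidates)


class LinkUtilization(Entity):
    """Example policy (an assumption): report the highest utilization seen."""

    def resolve(self, candidates: List[int]) -> int:
        return max(candidates)


store = EventualStore()
store.put("link-1", 40)  # written by one controller instance
store.put("link-1", 55)  # conflicting write from another instance
link = LinkUtilization("link-1", store)
resolved = link.read()   # the subclass policy picks one value
```

The point of the design is that the platform does not pick a winner itself. Each application encodes the policy that is safe for its own data.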

Insights

The paper does not introduce a new design. Instead, it adapts designs from distributed systems to network control. Specifically, the Network Information Base (NIB) is introduced as a distributed storage system that stores the current state of the network and provides the information that control policies need.
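As a rough illustration of the idea, the sketch below models the network information base as an in-memory table of entities that notifies listeners on every update. It is a toy under stated assumptions: the class name, the `subscribe`/`update`/`query` methods, and the entity identifiers are all hypothetical, and nothing here is distributed or replicated.

```python
from typing import Callable, Dict, List

Attributes = Dict[str, object]
Listener = Callable[[str, Attributes], None]


class NetworkInformationBase:
    """Toy single-process stand-in for the NIB (hypothetical interface)."""

    def __init__(self) -> None:
        self._entities: Dict[str, Attributes] = {}
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        # Register control logic that wants to hear about state changes.
        self._listeners.append(listener)

    def update(self, entity_id: str, **attrs: object) -> None:
        # Merge new attributes into the entity, then notify every listener.
        merged = {**self._entities.get(entity_id, {}), **attrs}
        self._entities[entity_id] = merged
        for listener in self._listeners:
            listener(entity_id, dict(merged))

    def query(self, entity_id: str) -> Attributes:
        # Return a copy so callers cannot mutate stored state directly.
        return dict(self._entities[entity_id])


nib = NetworkInformationBase()
seen: List[str] = []
nib.subscribe(lambda entity_id, attrs: seen.append(entity_id))
nib.update("switch-1/port-2", status="up")
nib.update("switch-1/port-2", status="down", speed_mbps=1000)
```

Control logic in this sketch only reads and writes entities. How that state reaches other instances is left to whichever storage option backs the table.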

Evaluation 

The system is evaluated for scalability and reliability with both microbenchmarks and real-world scenarios. For scalability, the authors show that its threading and RPC performance are adequate. For reliability, the authors show that neither link failure (4 times extra time in the worst case) nor end-to-end failure in the real world (a median overhead of 120 ms) leads to excessive overhead.

Onix does not make network control policies entirely easy to implement: implementors still have to weigh the tradeoff between consistency and performance. Additionally, the implementation is not optimized, and future work could focus on areas such as the threading library.

Questions for Authors

  1. How does its performance compare to a hardware-based control policy implementation?
  2. How large is the performance difference between the DHT and the persistent database? How much overhead would implementors incur if they used only the persistent database?
  3. Is it possible to determine the storage option automatically, in a JIT style?