I enjoy Tech Field Day events for the independence and sheer nerdiness that they bring out. Networking Field Day events are held twice a year. I had the privilege of presenting the demo for Infineta Systems at NFD3 and made it through unscathed. There is absolutely no room for ‘marketecture’. When you have sharp people like Ivan Pepeljnak of ipSpace fame and Greg Ferro of Packet Pushers fame questioning you across the protocol stack, you have to be on your toes.
I recently watched the videos for NFD8. This blog post is about the presentation made by Nuage Networks. As an Alcatel-Lucent venture, Nuage focuses on building an open SDN ecosystem based on best of breed. They had also presented last year at NFD6.
To recap what they do, Nuage’s key solution is Virtualized Services Platform (VSP), which is based on the following three virtualized components:
- The Virtualized Services Directory (VSD) is a policy server for high level primitives from Cloud Services. It gets service policies from VMware, OpenStack, and CloudStack and also has a builtin business logic and analytics engine based on Hadoop.
- The Virtualized Services Controller (VSC) is the control plane. It is based on ALU Service Router OS, which was originally developed 12-13 years ago and is deployed in 300,000 routers, now stripped to be relevant as an SDN Controller. The scope of Controller is a domain, but it can be extended to multiple domains or data centers via a BGP-MP federation, thereby supporting IP Mobility. A single availability domain has a single data center zone. High availability domains have two data center zones. A VSC is a 4-core VM with 4 GB memory. VSCs act as clients of BGP route reflectors in order to extend network services.
- The Virtual Routing and Switching module (VRS) is the Data Path agent that does L2-L4 switching, routing, and policies. It integrates to VMware via ESXi, XEN via XAPI, and KVM via libvirt. The libvirt API exposes all the resources needed to manage the support of VMs. (As a side, you can see how it comes into play in this primer on OVS 1.4.0 installation I wrote a while back.) The VRS gets the full profile of the VM from the hypervisor and reports that to the VSC. The VSC then downloads the policy from the VSD and implements them. These could be L2 FIBs, L3 RIBs/ACLs, and/or L4 distributed firewall rules. For VMware, VRS is implemented as a VM with some hooks because ESXi has a limitation of 1M pps.
At NFD8, Nuage discussed a recent customer win that demonstrates its ability to segment clouds. The customer was a Canadian Cloud Service Provider (CSP), OVH, that has deployed 300,000 servers in its Canadian DCs. OVH’s customers can, as a beta service offering, launch their own clouds. In other words, it is akin to Cloud-as-a-Service with the Nuage SDN solution underneath. It’s like a wholesaler of cloud services whereby multiple CSPs could businesses could run their own OpenStack cloud without building it themselves. Every customer of this OVH offering would be running independent Nuage’s services. Pretty cool.
Next came some demos that address following 4 questions about SDN:
- Is proprietary HW needed? The short answer is NO. The demo showed how to achieve Hardware VTEP integration. In the early days of SDN, overlay gateways proved to be a challenge because they were needed to go from the NV domain to the IP domain. As a result VLANs needed to be manually configured between server-based SW gateways and the DC routers – a most cumbersome process. The Nuage solution solves that problem by speaking routing language, uses standard RFC 4797 (GRE encapsulation) on its dedicated TOR gateway to tunnel VXLAN to routers. As covered in NFD6, Nuage has three approaches to VTEP Gateways:
- Software-based – for small DCs with up to 10 Gbps
- White box-based – for larger DCs based on standard L2 OVSDB schema. In NFD8, two partner gateways were introduced – Arista and the HP 5930. Both feature L2 at this point only, but will get to L3 at some point.
- High performance-based (7850 VSG) – 1 Tbps L3 gateway using merchant silicon, and attaining L3 connectivity via MP-BGP
- How well can SDN scale?
The Scaling and Performance demo explained how scaling in network virtualization is far more difficult than scaling in server virtualization. For example, the number of ACLs needed grows quadratically as the number of web servers or database servers increases linearly. The Nuage solution breaks down ACLs into abstractions or policies. I liken this to an Access Control Group, whereby ACLs fall under an Access Control Group. Another way of understanding this is Access Control Entries being part of an Access Control List (for example, an ACL for all web servers or an ACL for all database servers) so that the ACL is more manageable. Any time a new VM is added, it is a new ACE. So, policies are pushed, rather than individual Access Control Entries, which scales much better. Individual VMs are identified by tagging routes, which is accomplished by, you guessed it right, BGP communities (these Nuage folks sure love BGP!).
- Can it natively support any workload? The demo showed multiple workloads including containers in their natural environments without being VMs, i.e. bare metal. Nuage ran their scalability demo on AWS with 40 servers. But instead of VMs, they used Docker containers. Recently, there has been a lot of buzz around Linux containers, especially Docker. The advantage containers hold over VMs is that they have much lower overhead (by sharing certain portions of the host kernel and operating system instance), allow for only a single OS to be managed (albeit Linux on Linux), have better hardware utilization, and have quicker launch times than VMs. Scott Lowe has a good series of writeups on containers and Docker on his blog. Also, Greg Ferro has a pretty detailed first pass on Docker. Nuage CTO Dimitri Stiliadis explained how containers are changing the game as short-lived application workloads are becoming increasingly prevalent. The advantages that Docker brings, as he explained, is to move the processing to the data rather than the other way round. Whereas typically you’d see no more than 40-50 VMs on a physical server, the Nuage demo had 500 Docker container instances per server. So there were 20,000 container instances total. And they showed how to bring them up along with 7 ACLs per container instance (140K ACLs total) in just 8 minutes. That’s 50 containers or VMs per second! For reference, in the demo, they used an AWS c3.4xlarge instance (which has 30GB memory) for the VSD, a c3.2xlarge for the VSC, and 40 c3.xlarge instances for the ‘hypervisors’ where the VRS agents ran. The Nuage solution was able to successfully respond to the rapid and dynamic connectivity requirements of containers. Moreover, since the VRS agent is at the process level (instead of the host levels with VMs), it can implement policies at a very fine control. Really impressive demo.
- How easily can applications be designed?
The Application Designer demo here showed how to bridge the gap between app developers, and infrastructure teams by means of high level policies to make application deployment really easy. In Packet Pushers Show 203, Martin Casado and Tim Hinrichs discussed their work in OpenStack Congress, which attempts to formalize policy-based networking so that a Policy Controller can abstract high level, human-readable primitives (which could be HIPAA, PCI, or SOX as an example), and express them in a language to an SDN Controller. Nuage confirmed that they contribute to Congress. The Nuage demo defined application tiers and showed how to deploy an WordPress container application along with a backend database in seconds. Another demo integrated OpenStack Neutron with extensions. You can create templates to have multiple instantiations of the applications. Another truly remarkable demo.
To summarize, the Nuage solution seems pretty solid and embraces open standards, not for the sake of lip service, but to solve actual customer problems.