Category Archives: Networking Field Day

Deconstructing Big Switch Networks at NFD8

I recently caught up with the presentation made by Big Switch Networks at Networking Field Day 8.

Founder Kyle Forster kicked things off with an introduction to Big Switch. He used the term ‘Hyperscale Data Centers’ to describe the data center of today and tomorrow that Big Switch targets. Big Switch has two products based on the following three principles:

  1. Bare metal switches built on Broadcom ASICs for the hardware
  2. Controller-based design for the software
  3. Core-pod architecture replacing the traditional core-aggregation-edge tiers

The two products are:

  1. Big Tap Monitoring Fabric – Taps for offline network monitoring
  2. Big Cloud Fabric – Clos switching fabric

Next up, CTO Rob Sherwood went into more detail. He described the core-pod architecture as essentially a spine-leaf architecture where a pod consists of up to 16 racks of servers, with two leaf switches at the top of each rack. Each server is dual-connected to the two leaves via 10G links, and the leaves connect up to the spines via 40G links. The leaf switches have 48x10G ports plus 6x40G uplinks; the spine switches have 32x40G ports. So the maximum number of spine switches in a pod is 6 (in a leaf-spine fabric every leaf must connect to every spine), and a maximum of 32 leaves can connect to a spine. These numbers will certainly increase in future generations of switches once Broadcom can produce them at scale. The solution is targeted at Fortune 1000 companies, not so much at smaller enterprises. Pods are very modular and can be added or replaced without disrupting the existing network design.
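
Out of curiosity, I ran the numbers implied by those port counts. This is just my own back-of-the-envelope arithmetic, assuming every leaf downlink goes to a dual-homed server, and not an official Big Switch sizing guide:

    # Pod sizing derived from the port counts quoted above (my own arithmetic).
    LEAF_DOWNLINKS_10G = 48   # server-facing 10G ports per leaf
    LEAF_UPLINKS_40G = 6      # 40G uplinks per leaf -> max spines per pod
    SPINE_PORTS_40G = 32      # 40G ports per spine -> max leaves per pod

    max_spines = LEAF_UPLINKS_40G           # every leaf connects to every spine
    max_leaves = SPINE_PORTS_40G
    max_racks = max_leaves // 2             # two leaves per rack
    servers_per_rack = LEAF_DOWNLINKS_10G   # one 10G link per server to each leaf
    max_servers = max_racks * servers_per_rack

    oversubscription = (LEAF_DOWNLINKS_10G * 10) / (LEAF_UPLINKS_40G * 40)

    print(f"Max spines per pod   : {max_spines}")
    print(f"Max leaves per pod   : {max_leaves} ({max_racks} racks, 2 leaves each)")
    print(f"Max servers per pod  : {max_servers} (dual-homed)")
    print(f"Leaf oversubscription: {oversubscription:.0f}:1")

That works out to 768 dual-homed servers per pod at 2:1 oversubscription with the first-generation switch hardware.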

What I thought was pretty cool was the use of the Open Network Install Environment (ONIE) for provisioning. The switches are shipped to customers from Dell or Accton with a very lightweight OS; when the box is powered on, it netboots from the Controller (which acts as the ONIE server). Both Switch Light (the Big Switch switch OS) and the relevant configuration get downloaded from the Controller to the switch. LLDP is used to auto-discover the links in the topology, and the management software will flag missing or double-connected links.
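
To make the link-validation idea concrete, here is a toy sketch (my own, not Big Switch code) of the kind of check a controller can run once LLDP has reported its neighbor tables: compare discovered links against the expected cabling plan and flag anything missing or double-connected. The switch names are made up:

    from collections import Counter

    expected = {                      # cabling plan: (leaf, spine) pairs
        ("leaf1", "spine1"), ("leaf1", "spine2"),
        ("leaf2", "spine1"), ("leaf2", "spine2"),
    }
    discovered = [                    # links reported via LLDP neighbor info
        ("leaf1", "spine1"), ("leaf1", "spine2"),
        ("leaf2", "spine1"), ("leaf2", "spine1"),   # leaf2 cabled twice to spine1
    ]

    counts = Counter(discovered)
    missing = expected - set(discovered)
    doubled = [link for link, n in counts.items() if n > 1]

    print("Missing links         :", missing or "none")
    print("Double-connected links:", doubled or "none")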

In the first demo, the Automated Fabric Visibility Tool was used to allocate switches and assign them roles in the topology. At that point, any cabling errors would show up in the GUI, which was pretty user-friendly. The Big Cloud Fabric architecture has a dedicated out-of-band (OOB) control/management network that connects to the Controller. Among the management primitives are logical L2 segments (a la VLANs) that have logical ports and endpoints, tenants that are logical groupings of L2/L3 networks, and logical routers that act as the tenant routers for inter-segment or intra-segment routing. Each logical router corresponds to a VRF. VLAN tags can be mixed and matched and added into bridge domains. The use case is analogous to a multi-tenant environment on each ESX instance, where you declare egress VLAN tags on the vSwitch in VMware deployments; you have the choice of making the tag fabric-global or local to the vSwitch. Interestingly, Big Switch had an overlay product two years ago and ended up scrapping it (they see overlays as L2-only solutions, not L3 solutions) in favor of the current approach, which they believe uses the hardware the way it was designed to be used.
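
To keep the primitives straight in my head, I sketched them as a hypothetical data model. This is my own illustration of how the pieces relate (a tenant owns logical L2 segments, and one logical router per tenant maps to a VRF), not Big Switch's actual CLI or REST schema:

    # Hypothetical tenant model (illustrative only, names are made up).
    tenant = {
        "name": "tenant-blue",
        "segments": {                                  # logical L2 segments (a la VLANs)
            "web": {"vlan": 10, "members": ["leaf1:eth10", "vm-web-1"]},
            "db":  {"vlan": 20, "members": ["leaf2:eth7", "vm-db-1"]},
        },
        "logical_router": {                            # one per tenant, corresponds to a VRF
            "interfaces": {"web": "10.0.10.1/24", "db": "10.0.20.1/24"},
        },
    }

    # Inter-segment traffic (web -> db) is routed by the tenant's logical router;
    # intra-segment traffic stays within the L2 segment.
    print(tenant["logical_router"]["interfaces"]["db"])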

The next demo created tenants and assigned servers and VMs to logical segments by VLAN, physical port, or port group to meet the use case of a common two-tier application.

The fabric in Big Cloud Fabric is analogous to a sheet-metal chassis-based switch with fabric backplanes, line cards, and supervisor/management modules: the spine switches are the backplane, the leaf switches are the line cards, and the Controllers are the supervisors. The analogies don't end with the components. Sherwood explained that the proprietary protocols traditional chassis vendors run between their backplanes and their line cards are actually Ethernet, and are therefore no different from the OOB management network between the spine and leaf switches. The control plane and the data plane in Big Cloud Fabric are completely decoupled, so in the event of the management network going down completely, you only lose the ability to change and manage the network; for example, if a new server comes up, routes for that host don't get propagated. By comparison, if both supervisors in a Nexus 7K go down, the whole switch stops working. If both Controllers go down simultaneously, the time needed to bring up a third Controller is about 5 minutes.

Big Cloud Fabric is based on OpenFlow with extensions. The white box switches that Big Switch sells have Broadcom ASICs with several types of forwarding tables (programmable memory that can be populated). Some early implementations of OpenFlow exposed only the ACL table (which had only about 2,000 entries), which didn't scale well. The way Big Switch implements OpenFlow in Switch Light OS is to expose an L2 table and an L3 table, each with over 100,000 entries. They couldn't go into more detail as they are under NDA with Broadcom. Switch Light OS is Big Switch's Indigo OpenFlow agent running on Open Network Linux on x86 or ASIC-based hardware. Whereas traditional networks have clear L2/L3 boundaries in terms of devices, in Big Cloud Fabric L3 packets are routed at the first-hop switch. If a tenant needs to talk to another tenant, packets go through a system router, which resides only on a spine switch.
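
Here is a conceptual sketch of why exposing the big L2 and L3 tables matters: MAC and route lookups land in large exact-match and longest-prefix-match tables, leaving the small ACL table for policy. This is my own toy model of the idea, not Switch Light OS code, and it makes no claim about the actual Broadcom pipeline ordering:

    import ipaddress

    mac_table = {"00:aa:bb:cc:dd:ee": "port-7"}                    # L2 table (~100k entries)
    fib = {ipaddress.ip_network("10.0.20.0/24"): "spine-1",        # L3 table (~100k entries)
           ipaddress.ip_network("0.0.0.0/0"): "system-router"}
    acl = [("deny", "10.0.10.5", "10.0.20.9")]                     # ACL table (~2k entries)

    def forward(src_ip, dst_ip, dst_mac):
        for action, s, d in acl:                                   # small policy table
            if (s, d) == (src_ip, dst_ip) and action == "deny":
                return "drop"
        if dst_mac in mac_table:                                   # L2 exact match
            return mac_table[dst_mac]
        dst = ipaddress.ip_address(dst_ip)                         # L3 longest-prefix match
        matches = [(net, nh) for net, nh in fib.items() if dst in net]
        return max(matches, key=lambda m: m[0].prefixlen)[1]

    print(forward("10.0.10.6", "10.0.20.9", "ff:ff:ff:ff:ff:ff"))  # -> spine-1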

Next up was Service Chaining and Network Functions Virtualization (NFV) support. Service chaining is implemented via next-hop forwarding: at a policy level, if one VM or application needs to talk to another, the traffic can be passed through a service such as a load balancer or firewall (while leveraging the ECMP capability of the hardware) before reaching the destination. The demo showed how to create a policy and then, using a firewall service as the example, how to apply that policy to an ECMP group, which is known as service insertion. However, it is early days for this NFV use case, and for more elaborate needs such as health monitoring the recommendation is to use OpenStack. Interestingly, Big Switch integrates with OpenStack but not with VMware at this time (it is on the roadmap, though).
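
A toy model of next-hop-based service insertion might look like the following. The policy syntax and names are mine, not Big Switch's; the point is simply that a matching policy swaps the destination group for the firewall's ECMP group:

    policies = [
        # (source group, destination group, service chain)
        ("web-tier", "db-tier", ["fw-ecmp-group"]),
    ]

    def next_hop(src_group, dst_group):
        for src, dst, chain in policies:
            if (src, dst) == (src_group, dst_group) and chain:
                return chain[0]          # steer to the first service in the chain
        return dst_group                 # no matching policy: forward directly

    print(next_hop("web-tier", "db-tier"))    # -> fw-ecmp-group (firewall inserted)
    print(next_hop("web-tier", "app-tier"))   # -> app-tier (no service insertion)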

Operational Simplicity was next on the agenda. Here Big Switch kicked off with the choice of control plane interfaces offered to the user – CLI, GUI, or REST API – which, generally speaking, appeal to network engineers, vCenter administrators, and DevOps personnel respectively. High availability is designed so that reactions to outages are localized as much as possible: the loss of a spine only reduces capacity, and the loss of a leaf is covered by the other leaf in the same rack (thanks to a dedicated link between the two), which connects to the same servers, so the servers fail over to the surviving leaf via LACP. The hitless upgrade process is effectively hitless from an application perspective (a few milliseconds of data packets are lost), though capacity is reduced during the upgrade. A feature called Test Path shows the logical (policy-level) as well as physical path a host takes to reach another host.

The final session was on the network monitoring features of Big Switch, namely the Big Tap Monitoring Fabric. Sunit Chauhan, head of Product Marketing, explained that the monitoring infrastructure is built from the same bare metal switches and is managed from a centralized controller. The goal is to monitor and tap every rack and, ideally, every vSwitch; in a campus network that means the ability to filter traffic from all locations to the tools. The Big Tap Monitoring Controller is separate from the Big Cloud Fabric Controller, and the monitoring switches likewise run Switch Light OS. The example he gave was a large mobile operator in Japan that needed thousands of taps. The only solution that scaled (in terms of cost and performance) for such a network was to use bare metal Ethernet switches reporting to a centralized SDN Controller.

The Big Tap Monitoring demo was based on a common design: a production network (which could be Big Cloud Fabric or a traditional L2/L3 network) feeds the monitoring fabric via filter ports, and the fabric in turn connects via delivery ports to the Tool Farm, where all the visibility tools live. Of course, Big Switch eats its own dog food like every noble startup and deploys Big Tap Monitoring Fabric in its own office; they were able to capture the actual video stream of the NFD event that went out to the Internet from their office. Remote data center monitoring is also supported now (though it was not demonstrable at NFD8), which reminded me of RSPAN except that it uses L2-over-GRE tunnels.
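
For the RSPAN-like remote monitoring, the encapsulation concept can be sketched with scapy: the mirrored Ethernet frame rides inside an IP/GRE tunnel toward the remote tool farm. This is my own illustration of L2-over-GRE with made-up addresses, not necessarily byte-for-byte what Big Tap puts on the wire:

    from scapy.all import Ether, IP, TCP, GRE

    # The original frame we want to mirror to a remote tool farm.
    mirrored = (Ether(src="00:aa:bb:cc:dd:01", dst="00:aa:bb:cc:dd:02") /
                IP(src="10.1.1.10", dst="10.1.2.20") / TCP(dport=443))

    # Wrap it in IP/GRE from the filter switch toward the tool farm.
    tunneled = (Ether() /
                IP(src="192.0.2.1", dst="198.51.100.1") /
                GRE(proto=0x6558) /     # 0x6558 = transparent Ethernet bridging
                mirrored)

    tunneled.show()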

A few afterthoughts: Big Switch used the marketing term ‘hyperscale data center’ as if it were an industry standard, and they gave examples of networks that were not hyperscale without explaining how they fell short. In fact, there was a slide dedicated to the terminology used in a demo, but ‘hyperscale’ was not on it. It reminded me of my days at a startup that used that same term in its marketing headlines without ever defining it.

From a personal perspective, in 2010 I worked as a network engineer at a large financial exchange where the Big Tap Monitoring Fabric would have been invaluable. Any trade delayed by even a couple of seconds could potentially cost millions of dollars, and the exchange would be spared that penalty if it could prove the delay was due to the application or the remote network and not the exchange’s own network. At the time we used network monitoring switches to determine where the delay occurred, but the location of those taps was critical, and it simply was not scalable to have taps at every location off of every port. Since it was a reactive (troubleshooting) effort, it was really a Whac-a-Mole exercise. Ultimately, we went with a vendor that built the infrastructure to collect latency data from exchanges and then offered the results to firms so they could monitor data and order-execution latency on those markets. But it was expensive; those investments ran between $10 and $15 million, and ultimately that vendor went out of business. A solution like Big Tap Monitoring Fabric would have been a godsend. If Big Switch can figure out how to keep its costs down, it may have a huge opportunity on its hands.

Tech Field Day events are intended to be very technical, and this one was no different. Slides with Gartner Magic Quadrants are usually met with rolling eyeballs, but I think Big Switch can be forgiven for one reference to an industry analyst. Apparently, according to Dell’Oro, in 2013 more ports were shipped on bare metal switches (from vendors such as Dell, Accton, and Quanta) than from Arista, Juniper, and Extreme combined!

While Big Cloud Fabric competes against the Cisco Nexus 7K product line, Big Tap Monitoring Fabric goes head to head against Gigamon. It was very refreshing to see a startup take on two behemoths armed with nothing but clever engineering and nimble principles.


Deconstructing Nuage Networks at NFD8

I enjoy Tech Field Day events for the independence and sheer nerdiness they bring out. Networking Field Day events are held twice a year. I had the privilege of presenting the demo for Infineta Systems at NFD3 and made it through unscathed. There is absolutely no room for ‘marketecture’: when you have sharp people like Ivan Pepelnjak of ipSpace fame and Greg Ferro of Packet Pushers fame questioning you across the protocol stack, you have to be on your toes.

I recently watched the videos for NFD8. This blog post is about the presentation made by Nuage Networks. An Alcatel-Lucent venture, Nuage focuses on building an open SDN ecosystem based on best-of-breed components. They also presented at NFD6 last year.

To recap what they do, Nuage’s key solution is Virtualized Services Platform (VSP), which is based on the following three virtualized components:

  • The Virtualized Services Directory (VSD) is a policy server for high-level primitives from cloud services. It gets service policies from VMware, OpenStack, and CloudStack, and also has a built-in business logic and analytics engine based on Hadoop.
  • The Virtualized Services Controller (VSC) is the control plane. It is based on the ALU Service Router OS, which was originally developed 12-13 years ago, is deployed in 300,000 routers, and has now been stripped down to serve as an SDN Controller. The scope of a Controller is a domain, but it can be extended to multiple domains or data centers via MP-BGP federation, thereby supporting IP mobility. A single-availability domain has a single data center zone; high-availability domains have two data center zones. A VSC is a 4-core VM with 4 GB of memory. VSCs act as clients of BGP route reflectors in order to extend network services.
  • The Virtual Routing and Switching module (VRS) is the data path agent that does L2-L4 switching, routing, and policy enforcement. It integrates with VMware via ESXi, with Xen via XAPI, and with KVM via libvirt; the libvirt API exposes all the resources needed to manage VMs. (As an aside, you can see how it comes into play in this primer on OVS 1.4.0 installation I wrote a while back; there is also a minimal libvirt sketch right after this list.) The VRS gets the full profile of the VM from the hypervisor and reports it to the VSC. The VSC then downloads the relevant policies from the VSD and implements them; these could be L2 FIBs, L3 RIBs/ACLs, and/or L4 distributed firewall rules. For VMware, the VRS is implemented as a VM with some hooks because ESXi has a limitation of 1M pps.
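
Since the VRS learns about workloads through the hypervisor, here is a minimal sketch of the kind of introspection libvirt makes possible on a KVM host: enumerate the domains and pull their MAC addresses out of the domain XML. This is purely illustrative (it is not Nuage's agent code) and assumes a local qemu:///system connection:

    import libvirt
    from xml.etree import ElementTree as ET

    conn = libvirt.openReadOnly("qemu:///system")    # read-only connection to the local KVM host
    for dom in conn.listAllDomains():
        xml = ET.fromstring(dom.XMLDesc())           # full domain definition as XML
        macs = [m.get("address") for m in xml.findall(".//interface/mac")]
        print(f"{dom.name()}  active={bool(dom.isActive())}  macs={macs}")
    conn.close()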

At NFD8, Nuage discussed a recent customer win that demonstrates its ability to segment clouds. The customer was OVH, a cloud service provider (CSP) that has deployed 300,000 servers in its Canadian DCs. OVH’s customers can, as a beta service offering, launch their own clouds. In other words, it is akin to Cloud-as-a-Service with the Nuage SDN solution underneath: OVH acts as a wholesaler of cloud services, whereby multiple CSPs or businesses could run their own OpenStack clouds without building them themselves. Every customer of this OVH offering would be running independent Nuage services. Pretty cool.

Next came a set of demos addressing the following four questions about SDN:

  1. Is proprietary hardware needed? The short answer is no. The demo showed how to achieve hardware VTEP integration. In the early days of SDN, overlay gateways proved to be a challenge because they were needed to get from the network virtualization domain to the IP domain; as a result, VLANs had to be manually configured between server-based software gateways and the DC routers, a most cumbersome process. The Nuage solution solves that problem by speaking the routers’ language: it uses standard RFC 4797 (GRE encapsulation) on its dedicated ToR gateway to tunnel VXLAN traffic to the routers. As covered at NFD6, Nuage has three approaches to VTEP gateways:
    1. Software-based – for small DCs with up to 10 Gbps
    2. White box-based – for larger DCs, based on the standard OVSDB L2 schema. At NFD8, two partner gateways were introduced – Arista and the HP 5930. Both are L2-only at this point, but will get to L3 at some point.
    3. High performance-based (7850 VSG) – 1 Tbps L3 gateway using merchant silicon, and attaining L3 connectivity via MP-BGP
  2. How well can SDN scale?
    The Scaling and Performance demo explained how scaling network virtualization is far more difficult than scaling server virtualization. For example, the number of ACL entries needed grows quadratically while the number of web or database servers grows only linearly. The Nuage solution breaks ACLs down into abstractions, or policies. I liken this to an Access Control Group under which individual ACLs fall; another way of seeing it is Access Control Entries being part of an Access Control List (for example, an ACL for all web servers or one for all database servers) so that the ACL stays manageable. Any time a new VM is added, it is just a new ACE. Policies are pushed rather than individual Access Control Entries, which scales much better (see the sketch after this list). Individual VMs are identified by tagging routes, which is accomplished by, you guessed it, BGP communities (these Nuage folks sure love BGP!).
  3. Can it natively support any workload? The demo showed multiple workloads, including containers running in their natural environment, i.e. on bare metal rather than inside VMs. Nuage ran their scalability demo on AWS with 40 servers, but instead of VMs they used Docker containers. Recently there has been a lot of buzz around Linux containers, especially Docker. The advantages containers hold over VMs are much lower overhead (they share portions of the host kernel and operating system instance), a single OS to manage (albeit Linux on Linux), better hardware utilization, and quicker launch times. Scott Lowe has a good series of write-ups on containers and Docker on his blog, and Greg Ferro has a pretty detailed first pass on Docker. Nuage CTO Dimitri Stiliadis explained how containers are changing the game as short-lived application workloads become increasingly prevalent; the advantage Docker brings, as he explained it, is moving the processing to the data rather than the other way round. Whereas you would typically see no more than 40-50 VMs on a physical server, the Nuage demo ran 500 Docker container instances per server, for 20,000 container instances in total. They showed how to bring them all up, along with 7 ACLs per container instance (140K ACLs in total), in just 8 minutes, which works out to roughly 40 container instances per second! For reference, the demo used an AWS c3.4xlarge instance (which has 30GB of memory) for the VSD, a c3.2xlarge for the VSC, and 40 c3.xlarge instances for the ‘hypervisors’ where the VRS agents ran. The Nuage solution was able to keep up with the rapid and dynamic connectivity requirements of the containers. Moreover, since the VRS agent operates at the process level (instead of at the host level as with VMs), it can implement policies at a very fine granularity. Really impressive demo.
  4. How easily can applications be designed?
    The Application Designer demo showed how to bridge the gap between app developers and infrastructure teams by means of high-level policies that make application deployment really easy. In Packet Pushers Show 203, Martin Casado and Tim Hinrichs discussed their work on OpenStack Congress, which attempts to formalize policy-based networking so that a policy controller can take high-level, human-readable primitives (HIPAA, PCI, or SOX, for example) and express them in a language an SDN Controller understands. Nuage confirmed that they contribute to Congress. The Nuage demo defined application tiers and showed how to deploy a WordPress container application along with a backend database in seconds; another demo integrated with OpenStack Neutron via extensions. You can also create templates to stamp out multiple instantiations of an application. Another truly remarkable demo.
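
On the scaling point from question 2 above, here is a quick back-of-the-envelope illustration of why group-based policies beat pairwise ACLs. The two toy functions and the numbers are mine, not Nuage's; the point is simply that pairwise ACEs grow quadratically while a group policy plus membership entries grows linearly:

    def pairwise_aces(web, db):
        return web * db                  # one allow rule per (web server, db server) pair

    def group_policy_entries(web, db):
        return 1 + web + db              # one group policy plus one membership entry per VM

    for n in (10, 100, 1000):
        print(f"{n:>5} web + {n:>5} db VMs: "
              f"{pairwise_aces(n, n):>9,} pairwise ACEs vs "
              f"{group_policy_entries(n, n):>5,} policy entries")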

To summarize, the Nuage solution seems pretty solid and embraces open standards, not for the sake of lip service, but to solve actual customer problems.