Ivan Pepelnjak makes an important point in his webinar on Cloud Computing Networking: as a customer, understand the QoS and SLA guarantees that your public cloud provider offers. Whatever Tenant A does should not impact the performance of Tenant B. At a minimum, there should be some guarantees on bandwidth, I/O operations, and CPU cycles for every tenant. You don’t want the noisy neighbor who hogs resources, leaving you no choice but to reboot your VM in the hope of being reassigned to a physical server with less load. An AWS Small Instance is an example of an environment where you might encounter this scenario.
Recently I stumbled upon the blog of David Gee in the UK. He covered the Cavium acquisition of Xpliant as well as Broadcom’s announcement of the StrataXGS Tomahawk chipset less than two months later. The remarkable thing is that both chipsets are capable of 3.2 Tbps and feature programmability, something the Trident II (a 1.28 Tbps chipset) didn’t have. The Trident II is used in Cisco’s Nexus 9000, Juniper’s QFX5100, and HP’s 5930, to name a few switches. There had been great anticipation for the Trident II because it supports VXLAN, which the original Trident did not. However, the most recent tunnel encapsulation protocol, Generic Network Virtualization Encapsulation (GENEVE), isn’t supported on the Trident II. Tomahawk and Xpliant, by virtue of their programmable nature, should be able to support it, at least in theory.
Broadcom’s press announcement page contains an impressive array of quotes from vendors such as Brocade, Big Switch, Cumulus, HP, Juniper, Pica8, and VMware, to name a few. It remains to be seen which vendors will adopt Xpliant.
Earlier this week, news broke on SDNCentral about a new startup called SocketPlane that integrates Docker containers with Open vSwitch (OVS). Docker is one of the hottest areas in enterprise tech these days. At the OpenStack SV event last month, Mirantis CEO Adrian Ionel said that Docker had seen 20 million downloads in the past four months, mainly due to its ease of use and its benefits to developers. He showed a screenshot of Google Trends comparing ‘Docker’ against ‘Virtualization’. That picture is recreated below.
One of the co-founders of SocketPlane is Brent Salisbury, who built a network engineering background in academia before joining Red Hat earlier this year. In recent years he became more involved in the OpenDaylight (ODL) project and is arguably the best-known network engineer turned coder. His blog has a wealth of hands-on guides for installing and integrating OVS, OpenStack, and ODL, which I’ve referred to frequently. Two other prominent contributors to ODL, Madhu Venugopal and Dave Tucker, are the other co-founders of SocketPlane.
I listened to a Class C Block podcast on ODL in November 2013, in which Venugopal and Salisbury spoke at length about their involvement with the project. Definitely worth a listen if you have the time.
Recently, Big Switch Networks earned bragging rights as the first networking vendor to attain the OpenStack Compatible certification. The requirements for this status differ for hardware and software products. Big Switch demonstrated compatibility with both Nova and Neutron networking environments. There are more details on the Big Switch and Mirantis sites. Big Switch differentiates between the two environments as follows:
- In a Neutron implementation, Big Cloud Fabric leverages the BSN ML2 driver, enabling automation and orchestration of the bare-metal, SDN-based Big Cloud Fabric by the OpenStack controller.
- In a Nova implementation, Big Cloud Fabric has optimized configurations and performance enhancements that let it serve as a multi-path leaf/spine Clos fabric delivering 4K VLANs to every edge port. Unlike traditional spanning tree-based switching designs, full cross-sectional bandwidth can be achieved with no performance penalty.
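For readers who haven’t wired up an ML2 mechanism driver before, the Neutron side of this typically boils down to a couple of lines in the plugin config. The sketch below is illustrative only: the driver name `bsn_ml2`, the `[restproxy]` section, and the addresses are my assumptions about how the BSN driver is wired up, not taken from Big Switch’s documentation.

```ini
# /etc/neutron/plugins/ml2/ml2_conf.ini -- hypothetical sketch
[ml2]
type_drivers = vlan
tenant_network_types = vlan
# 'bsn_ml2' stands in for the Big Switch mechanism driver entry point
mechanism_drivers = bsn_ml2

[restproxy]
# Address of the Big Cloud Fabric controller (placeholder value)
servers = 192.0.2.10:8443
```

With a mechanism driver registered like this, Neutron API calls (create network, create port, and so on) are pushed through the driver to the fabric controller, which is what makes the orchestration in the bullet above possible.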
The question I have is this: somebody obviously has to be first, but why aren’t there more products and vendors listed? Specifically, how soon before we see HP, the leading contributor to OpenStack, on that list?
TL;DR – They just follow the laws of physics.
Moore’s Law states that the number of transistors per integrated circuit will double every two years. In a recent interview with Marketplace, Intel CEO Brian Krzanich, the sixth CEO in the company’s history, expressed hope that Moore’s Law would remain alive on the watch of the next couple of CEOs. Intel’s manufacturing process is currently at 14 nm. Krzanich said that with current technology, Intel can keep Moore’s Law alive for another 6-10 years.
Recently, I’ve been reading up on the science behind cricket bat manufacturing. Cricket has increasingly become a batsman’s game, highlighted not only by higher scores, but by bigger hits (more fours and sixes). There are a few prevalent theories about bat manufacturing techniques improving and bats becoming heavier. A heavier bat can result in the ball being hit farther. However, as described by bat-maker Chris King, “the material and design have been pushed to their limits. Like Formula One cars, they operate at the outer edges of what’s possible.” Instead, the psychology of bat-owners is where the innovation lies. Bats, which have always been made of willow, aren’t necessarily getting heavier, but rather bigger and lighter, to give a batsman the impression that they’re heavier. Variations in the balance and the weight distribution make a huge difference. And that is what gives the batsman the sense of security and confidence to go for bigger hits. As King says, “What we’re up against is the belief that a big bat is more powerful than a bat of the same weight that’s smaller, which it isn’t. That’s against the laws of physics.”
I decided to jump on the bandwagon of the 30 Blogs in 30 Days challenge. Om Malik, of GigaOm fame, and Greg Ferro of Packet Pushers fame, are already on it. Let’s see if I can produce something original over 140 characters for a month.
For starters, here’s a photo I took yesterday from where I moved two weeks ago – Santa Cruz, California.
Recently I was listening to an episode of Ivan Pepelnjak’s Software Gone Wild podcast featuring Snabb Switch, which inspired me to write this post. Snabb Switch is an open source program, developed by Luke Gorrie, for processing virtualized Ethernet traffic in greenfield deployments on x86 hardware. It caught my attention because the recent announcements of Intel’s networking capabilities at IDF14 were fresh in my mind. Snabb Switch is a networking framework that defines building blocks for I/O (such as input/Rx and output/Tx links), Ethernet interfaces, and packet processing elements, leveraging x86 servers and Intel NICs. It speaks natively to Ethernet hardware, hypervisors, and the Linux kernel by virtue of being a user-space executable. The cornerstone of Snabb Switch is its super-light footprint, which enables it to process tens of millions of Ethernet packets per second per core. Moreover, it has been known to push 200 Gbps on an x86 server. Pretty impressive for an open source program.
Snabb Switch is written in Lua, a lightweight scripting language, which lets it change configuration and behavior at run time. It leverages LuaJIT, a just-in-time compiler that compiles Lua code to x86 machine code on the fly, even while packets are being switched. This technology is used in the video game industry as well as in high-frequency trading, but it isn’t very prevalent in the networking industry yet. The biggest exception is CloudFlare, the CDN that speeds up website delivery and blocks DoS attacks.
Snabb Switch rides the wave of vast improvements in the hardware performance of x86 servers and NICs. In a nutshell, networking applications on Linux have moved out of the kernel and into user space. It used to be that each packet arriving at the NIC of an x86-based Linux server would be sent up to the kernel, which would have to wake up via an interrupt, process the packet, and send it back out on the network. This was very time-consuming, and it also made it difficult for application developers to write networking code, because doing so required intricate knowledge of the kernel. With faster hardware, developers realized that with so many packets arriving each microsecond, waking up the kernel for each one was too inefficient. Instead, it became more prudent to assume a continuous stream of packets and set aside a dedicated pool of memory for this traffic. In other words, the NIC is mapped directly into the memory of the user process. Snabb Switch does this with its own driver for the NIC (Intel NICs for now), which implements features such as an embedded Ethernet switch and QoS in around 850 lines of Lua code.
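The interrupt-free model above can be sketched in a few lines. This is a toy simulation, not real driver code: the “NIC” DMA-writes packet descriptors into a ring buffer shared with the process, and the process polls that ring in a tight loop instead of sleeping until a kernel interrupt arrives. All names here are my own illustrations.

```python
from collections import deque

RING_SIZE = 8  # capacity of the shared descriptor ring

def nic_dma(ring, packets):
    """Pretend the NIC DMA-writes packets into the shared ring."""
    for pkt in packets:
        if len(ring) < RING_SIZE:
            ring.append(pkt)

def poll_ring(ring, handle):
    """Drain every packet currently in the ring: no interrupts,
    no syscalls, just reading shared memory in a tight loop."""
    processed = 0
    while ring:
        handle(ring.popleft())
        processed += 1
    return processed

ring = deque()
received = []
nic_dma(ring, [b"pkt%d" % i for i in range(5)])
count = poll_ring(ring, received.append)
print(count)  # 5 packets drained in one poll pass
```

The point of the sketch is the shape of the hot path: once the ring lives in the process’s own memory, receiving a packet is just a memory read, which is why per-packet cost drops so dramatically.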
Generally speaking, people with networking backgrounds have traditionally assumed x86-based servers to be limited in their packet-processing capabilities (blaming PCI bus bottlenecks, slow memory, slow CPUs, and so on). In reality, the raw performance that can be extracted from x86-based hardware is quite high: 800 Gbps can be attained from DRAM banks, 600 Gbps from PCI Express, and the interconnect between CPUs is also good for hundreds of Gbps. There is no reason one cannot attain 500 Gbps from a dual-socket Xeon server. The bottleneck is quite clearly the software. Of course, this works best (10 million packets per second per core) for simple cases, such as just sending packets in and out. For slightly more complicated scenarios, such as accessing an unpredictable address in memory, performance can drop by an order of magnitude.
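A quick back-of-the-envelope check of those per-core numbers, using assumed packet sizes rather than measured ones, shows how packets-per-second translates into line rate and what an order-of-magnitude drop looks like:

```python
def gbps(pps, packet_bytes):
    """Convert packets-per-second at a given packet size into Gbps."""
    return pps * packet_bytes * 8 / 1e9

# Simple in/out forwarding: ~10 Mpps per core, assuming 1500-byte packets
fast_path = gbps(10e6, 1500)
print(fast_path)  # 120.0 Gbps from a single core

# An order-of-magnitude drop, e.g. when each packet forces a
# cache-missing lookup at an unpredictable memory address
slow_path = gbps(1e6, 1500)
print(slow_path)  # 12.0 Gbps
```

Even the degraded case is still respectable, which is the underlying argument: the hardware has headroom, and the software decides how much of it you see.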
Snabb Switch is known to have generated 200 Gbps from a single core at just 10% CPU utilization, which is quite incredible. Gorrie did this by reading 32,000 packets from a PCAP file, pushing them out on twenty 10G NIC ports, and programming those ports to replay in a loop.
The outcome of Snabb Switch is quite similar to Intel’s DPDK: user space-based forwarding, no kernel interrupts, and CPUs dedicated to particular NICs. However, Snabb Switch is a lightweight platform for ground-up designs, whereas DPDK is intended to let developers who have written applications that run inside the kernel port their mature code to user space. For newer application designs, user-space development is more prevalent because of the higher traffic levels and performance expectations. Snabb Switch’s modus operandi is to poll for new packets to process rather than relying on kernel interrupts. It runs a scheduler in a polling loop, with multiple parallel traffic processes on separate CPUs.
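A run-to-completion polling scheduler of the kind described above can be sketched as follows. This is loosely modeled on the idea, not on Snabb’s actual API: the class and function names (`App`, `breathe`, `Source`) are my own stand-ins. Each app is polled on every pass of the loop; nothing blocks and nothing waits for an interrupt.

```python
class App:
    """A packet-processing element with an input and output queue."""
    def __init__(self, name):
        self.name = name
        self.inbox = []
        self.outbox = []

    def pull(self):
        """Pull new packets in from a source (NIC, another app)."""
        pass

    def push(self):
        """Process inbox packets and move them to the outbox."""
        self.outbox.extend(self.inbox)
        self.inbox.clear()

def breathe(apps, passes=1):
    """One scheduler 'breath' per pass: poll every app for input,
    then let every app process what it has. No interrupts, no sleeps."""
    for _ in range(passes):
        for app in apps:
            app.pull()
        for app in apps:
            app.push()

class Source(App):
    """Toy traffic source: pretends one packet arrives per poll."""
    def pull(self):
        self.inbox.append(b"pkt")

src = Source("source")
breathe([src], passes=3)
print(len(src.outbox))  # 3 packets flowed through in 3 passes
```

Pin one such loop per CPU core, each owning its own NIC queues, and you get the parallel traffic processes mentioned above without any locking on the hot path.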
Snabb Switch can also run as a high-performance NFV switch for OpenStack environments. It does this by removing the kernel from the forwarding path and allowing the user-space program to talk directly to the device driver of the guest VM. VMs can only address memory that they have allocated themselves, and a software switch cannot allocate memory to a VM. Instead, a separate TX/RX queue is provisioned in the NIC hardware for each VM. When a VM hands over a packet buffer, the buffer is translated from the standard virtio format (in KVM) directly to the hardware format. In other words, when a packet comes in from the network, the NIC determines which VM should get it (typically by looking at the destination MAC address and VLAN ID), picks the hardware queue whose memory belongs to that VM, grabs a buffer, and copies the data from the NIC to that VM. Since Snabb Switch acts as the translation engine between standard virtio and the native hardware of standard Intel NICs, there is no need to write or install a special device driver for guest VMs to access the hardware.
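The NIC’s demux decision described above is essentially a table lookup: (destination MAC, VLAN) selects the per-VM hardware queue whose buffers live in that VM’s own memory. The sketch below is a hypothetical stand-in, with plain Python lists playing the role of hardware queues, purely to show the shape of the lookup:

```python
# Per-VM "hardware" RX queues, keyed by (destination MAC, VLAN ID).
# In real hardware each queue's buffers belong to that VM's memory.
vm_queues = {
    ("52:54:00:aa:bb:01", 100): [],  # VM 1's RX queue
    ("52:54:00:aa:bb:02", 200): [],  # VM 2's RX queue
}

def demux(dst_mac, vlan, payload):
    """Pick the VM queue for an incoming frame; drop if no VM matches."""
    queue = vm_queues.get((dst_mac, vlan))
    if queue is None:
        return False          # no matching VM: drop the frame
    queue.append(payload)     # stands in for the DMA copy into VM memory
    return True

demux("52:54:00:aa:bb:01", 100, b"hello vm1")
print(len(vm_queues[("52:54:00:aa:bb:01", 100)]))  # 1
```

Because the queue is chosen before any copy happens, each VM only ever sees frames landing in buffers it owns, which is what makes the kernel-free forwarding path safe.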
I believe that Snabb Switch has a lot of promise though it may take a while for deployments to be more mainstream.