Harnessing the Raw Performance of x86 – Snabb Switch

Recently I was listening to an episode of Ivan Pepeljnak’s Software Gone Wild podcast featuring Snabb Switch that inspired me to write this post. Snabb Switch is an open source program, developed by Luke Gorrie, for processing virtualized Ethernet traffic for white field deployments using x86 hardware. It caught my attention because the recent announcements of Intel’s networking capabilities at IDF14 were fresh in my mind. Snabb Switch is a networking framework that also defines different building blocks for I/O (such as input/Rx links and output/Tx links), Ethernet interfaces, and packet processing elements leveraging x86 servers and Intel NICs. It speaks natively to Ethernet hardware, Hypervisors, and the Linux kernel by virtue of a user-space executable. The cornerstone of Snabb Switch is its super light footprint, which enables it to process tens of millions of ethernet packets per second per core. Moreover, it has been known to push 200 Gbps on an x86 server. Pretty impressive for an open source program.

Snabb Switch uses the Lua programming language, which is a lightweight scripting language that can make some function calls and change the configuration in real time. It leverages LuaJit, a Just-In-Time compiler that compiles Lua code for x86 in real-time while switching packets. This technology is used in the video games industry as well as high frequency trading in the financial industry, but not very prevalent in the networking industry yet. The biggest exception is CloudFlare, the CDN that optimizes website delivery by blocking DOS attacks.

Snabb Switch rides the wave of the vast improvements in hardware performance on x86 servers and NICs. In a nutshell, networking applications on Linux have been moved out of the kernel and into user space. It used to be that each packet arriving from the network to the NIC of an x86-based Linux server would be sent up to the kernel, which would then have to wake up, via an Interrupt signal, and process them before sending them out on the network. This was a very time-consuming process and it also made it very difficult for application developers to write networking code because it involved intricate knowledge of the kernel. However, with faster hardware, developers realized that with so many packets arriving each microsecond, waking up the kernel to process each packet was too inefficient. Instead, it became more prudent to assume a continuous stream of packets and setting aside a dedicated pool of memory for this traffic. In other words, the NIC is mapped directly with the memory of the user process. Snabb Switch does this by writing their own driver for the NIC (Intel NICs for now) that drives features such as an embedded Ethernet switch and QoS on around 850 lines of Lua code.

Generally speaking, people with networking backgrounds have traditionally assumed x86-based servers to be limited in their packet-processing capabilities (attributed to PCI bus bottlenecks, slow memory, slow CPU, etc). In reality, the raw performance that can be extracted from x86-based hardware is quite high. 800 Gbps can be attained from DRAM banks, 600 Gbps can be attained from PCI Express, and the interconnect between CPUs is also hundreds of Gbps. There is no reason one cannot attain 500 Gbps using a dual core Xeon server. The bottleneck is quite clearly the software. Of course this works best (10 million packets per second per core) for simple cases such as just sending packets in and out. But for slightly more complicated scenarios, such as accessing an unpredictable address in memory, performance can drop by an order of magnitude.

Snabb Switch is known to have generated 200 Gbps out of a single core at just 10% CPU utilization, which is quite incredible. The way that Gorrie did this is by reading in 32,000 packets into a PCAP file, pushing them out on 20 10G NICs, and programming those ports to run in a loop.

The outcome of Snabb Switch is quite similar to Intel’s DPDK, in which there is user space-based forwarding, no Kernel interrupts, and CPUs are dedicated to particular NICs. However, Snabb Switch is a lightweight platform for ground up designs, whereas DPDK is intended to allow developers, who have written applications that run inside the kernel, to port their mature code to user space. For newer application designs, user space development is more prevalent because of the higher traffic levels and performance expectations. Snabb Switch modus operandi is to poll the kernel for new packets to process rather than interrupting it. It runs a scheduler in a polling loop with multiple parallel traffic processes on separate CPUs.

Snabb Switch can also run as a high performance NFV switch for OpenStack environments. The way it can do this is by removing the kernel from the forwarding path and allowing the user space program to talk directly to the device driver on the guest VM. The VMs are only able to address their own memory that they have allocated themselves. A software switch cannot allocate memory to a VM. Instead, for each VM, a separate TX/RX queue in hardware is provisioned in the NIC. So when a VM gives a buffer for packets, the buffer is translated from a standard virtio format (in KVM) directly to hardware format. In other words, when a packet comes in from the network, the NIC determines which VM should get it (typically by looking up the destination MAC address and VLAN ID), picks the appropriate hardware queue with memory that belongs to that VM, grabs a buffer and copies the data from the NIC to that VM. Since Snabb Switch acts as the translation engine between standard virtio and native hardware on the standard Intel NIC, there is no need to write or install a specific device driver for guest VMs to access the hardware.

I believe that Snabb Switch has a lot of promise though it may take a while for deployments to be more mainstream.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s