Tag Archives: ivan pepelnjak

Viptela SEN – DMVPN Done Right

Recently I had the treat of listening to two Layer 3 routing protocol maestros when the CTO of the startup Viptela, Khalid Raza, appeared on Ivan Pepelnjak’s Software Gone Wild podcast. Interestingly, the first time I had ever heard of Khalid or Ivan was through the Cisco Press books that they each authored. Ivan had the famous ‘MPLS and VPN Architectures‘ and Khalid, one of the first CCIEs, wrote ‘CCIE Professional Development: Large Scale IP Network Solutions‘, (which I owned an autographed copy of).

In a nutshell, Viptela’s Secure Extensible Network (SEN) creates hybrid connectivity (VPNs) across the WAN. Their target market is any large retailer or financial company, that has many branches. Khalid and the founder Amir Khan (of Juniper MX product line fame), come from super strong Layer 3 background and, consequently, they don’t purport to have a revolutionary solution. Instead, they have harnessed that background to improve on what DMVPN has been attempting to solve for the past 10 years. In Khalid’s words, they have “evolved MPLS concepts for overlay networks”.

Viptela SEN comprises a controller, VPN termination endpoints, and a proprietary protocol that is heavily inspired by BGP. In fact, one of the advisors of Viptela is Tony Li, author of 29 RFCs (mostly BGP-related), and one of the main architects of BGP. Viptela SEN can discover local site characteristics (such as the IGP) and report them to the controller, which then determines the branch’s connectivity policy. So it essentially reduces the number of control planes, which reduces the number of configurations for the WAN. This looks incredibly similar to what DMVPN sought out to do a decade ago. Viptela calls these endpoints dataplane points, but they still run routing protocols, so to me they’re just routers.

DMVPN, itself, started as a Cisco proprietary solution, spearheaded by Cisco TAC, in particular a gentleman by the name of Mike Sullenberger, who served as an escalation engineer. He has since coauthored an IETF draft on DMVPN. In fact, one of the earliest tech docs on cisco.com touts how ‘for a 1000-site deployment, DMVPN reduces the configuration effort at the hub from 3900 lines to 13 lines’.

Getting back to Viptela SEN, the endpoints (aka routers) authenticate with the controller (through exchange of certificates). Different circuits from different providers (MPLs or broadband) can be balanced through L3 ECMP. Their datapath endpoints are commodity boxes with Cavium processors that can give predictable (AES-256) encryption performance that tunnel to other endpoints (via peer-to-peer keys) as prescribed by the orchestrator/controller. In the event of a site-controller failures, if a site still has dataplane connectivity to another site that it needs to communicate with, then traffic can still forward (provided the keys are still valid) and all is well though the entries are stale.

One of the differentiators between Viptela and others in this space is that they do not build overlay subnet-based routing adjacencies. This allows them to offer each line of business in a large company to have a network topology that is service driven rather than the other way round. Translated in technical terms, each line of business effectively has a VRF with different default routes, but a single peering connection to the controller. In DMVPN terms, the controller is like the headend router, or hub. The biggest difference that I could tell between Viptela SEN and DMVPN is the preference given to L3 BGP over L2 NHRP. One of the biggest advantages of BGP has always been the outbound attribute change in the sense that a hub router could manipulate, via BGP MED, how a site could exit an AS. It is highly customizable. For example, majority of the sites could exit via a corporate DMZ while some branches (like Devtest in an AWS VPC) could exit through a regional exit point. In DMVPN, NHRP (which is a L2 ARP-like discovery protocol) has more authority and doesn’t allow outbound attribute manipulation which BGP, a L3 routing protocol has been doing successfully throughout the Internet for decades. NHRP just isn’t smart enough to provide that level of control-plane complexity.

Viptela SEN allows for each site to have different control policies – it could be a control plane path that says

The flexibility that Viptela SEN extends to a site can be at a control plane path level (e.g. ensure that certain VPNs trombone through a virtual path or service point like a firewall or IDS before exiting, as done in NFV with service chaining ) or data plane level (e.g. PBR). Since it promises easy bring-up and configuration, to alleviate concerns about SOHO endpoint boxes being stolen, they have a GPS installed in these lower end boxes. The controller only allows these boxes to authenticate with it if they are in the prescribed GPS coordinates. If the box is moved, it is flagged as a potentially unauthorized move and second-factor authentication is required in order to be considered as permissible. The controller can permit this but silently monitor the activities of this new endpoint box without its knowledge, akin to a honeypot. That’s innovation!

QoS and SLA Guarantees in the Cloud

Ivan Pepeljnak’s makes an important point in his webinar on Cloud Computing Networking: as a customer, understand the QoS and SLA Guarantees that your public cloud provider offers. Whatever Tenant A does should not impact the performance of Tenant B. At a very minimum, there should be some guarantees on bandwidth, IO operations, and CPU cycles for every tenant. You don’t want to have the noisy neighbor who hogs up resources that leaves you no choice but to reboot your VM with the hope of getting reassigned to a physical server with less load. An AWS Small Instance is an example of an environment where you might encounter this scenario.

Harnessing the Raw Performance of x86 – Snabb Switch

Recently I was listening to an episode of Ivan Pepeljnak’s Software Gone Wild podcast featuring Snabb Switch that inspired me to write this post. Snabb Switch is an open source program, developed by Luke Gorrie, for processing virtualized Ethernet traffic for white field deployments using x86 hardware. It caught my attention because the recent announcements of Intel’s networking capabilities at IDF14 were fresh in my mind. Snabb Switch is a networking framework that also defines different building blocks for I/O (such as input/Rx links and output/Tx links), Ethernet interfaces, and packet processing elements leveraging x86 servers and Intel NICs. It speaks natively to Ethernet hardware, Hypervisors, and the Linux kernel by virtue of a user-space executable. The cornerstone of Snabb Switch is its super light footprint, which enables it to process tens of millions of ethernet packets per second per core. Moreover, it has been known to push 200 Gbps on an x86 server. Pretty impressive for an open source program.

Snabb Switch uses the Lua programming language, which is a lightweight scripting language that can make some function calls and change the configuration in real time. It leverages LuaJit, a Just-In-Time compiler that compiles Lua code for x86 in real-time while switching packets. This technology is used in the video games industry as well as high frequency trading in the financial industry, but not very prevalent in the networking industry yet. The biggest exception is CloudFlare, the CDN that optimizes website delivery by blocking DOS attacks.

Snabb Switch rides the wave of the vast improvements in hardware performance on x86 servers and NICs. In a nutshell, networking applications on Linux have been moved out of the kernel and into user space. It used to be that each packet arriving from the network to the NIC of an x86-based Linux server would be sent up to the kernel, which would then have to wake up, via an Interrupt signal, and process them before sending them out on the network. This was a very time-consuming process and it also made it very difficult for application developers to write networking code because it involved intricate knowledge of the kernel. However, with faster hardware, developers realized that with so many packets arriving each microsecond, waking up the kernel to process each packet was too inefficient. Instead, it became more prudent to assume a continuous stream of packets and setting aside a dedicated pool of memory for this traffic. In other words, the NIC is mapped directly with the memory of the user process. Snabb Switch does this by writing their own driver for the NIC (Intel NICs for now) that drives features such as an embedded Ethernet switch and QoS on around 850 lines of Lua code.

Generally speaking, people with networking backgrounds have traditionally assumed x86-based servers to be limited in their packet-processing capabilities (attributed to PCI bus bottlenecks, slow memory, slow CPU, etc). In reality, the raw performance that can be extracted from x86-based hardware is quite high. 800 Gbps can be attained from DRAM banks, 600 Gbps can be attained from PCI Express, and the interconnect between CPUs is also hundreds of Gbps. There is no reason one cannot attain 500 Gbps using a dual core Xeon server. The bottleneck is quite clearly the software. Of course this works best (10 million packets per second per core) for simple cases such as just sending packets in and out. But for slightly more complicated scenarios, such as accessing an unpredictable address in memory, performance can drop by an order of magnitude.

Snabb Switch is known to have generated 200 Gbps out of a single core at just 10% CPU utilization, which is quite incredible. The way that Gorrie did this is by reading in 32,000 packets into a PCAP file, pushing them out on 20 10G NICs, and programming those ports to run in a loop.

The outcome of Snabb Switch is quite similar to Intel’s DPDK, in which there is user space-based forwarding, no Kernel interrupts, and CPUs are dedicated to particular NICs. However, Snabb Switch is a lightweight platform for ground up designs, whereas DPDK is intended to allow developers, who have written applications that run inside the kernel, to port their mature code to user space. For newer application designs, user space development is more prevalent because of the higher traffic levels and performance expectations. Snabb Switch modus operandi is to poll the kernel for new packets to process rather than interrupting it. It runs a scheduler in a polling loop with multiple parallel traffic processes on separate CPUs.

Snabb Switch can also run as a high performance NFV switch for OpenStack environments. The way it can do this is by removing the kernel from the forwarding path and allowing the user space program to talk directly to the device driver on the guest VM. The VMs are only able to address their own memory that they have allocated themselves. A software switch cannot allocate memory to a VM. Instead, for each VM, a separate TX/RX queue in hardware is provisioned in the NIC. So when a VM gives a buffer for packets, the buffer is translated from a standard virtio format (in KVM) directly to hardware format. In other words, when a packet comes in from the network, the NIC determines which VM should get it (typically by looking up the destination MAC address and VLAN ID), picks the appropriate hardware queue with memory that belongs to that VM, grabs a buffer and copies the data from the NIC to that VM. Since Snabb Switch acts as the translation engine between standard virtio and native hardware on the standard Intel NIC, there is no need to write or install a specific device driver for guest VMs to access the hardware.

I believe that Snabb Switch has a lot of promise though it may take a while for deployments to be more mainstream.