Head End Replication and VXLAN Compliance

Arista Networks recently announced that its implementation of VXLAN no longer requires IP Multicast in the underlay network. Instead, the implementation will now rely on a technique called Head End Replication to forward BUM (Broadcast, Unknown Unicast, and Multicast) traffic in the VLANs that it transports. But first, let’s rewind to the original VXLAN specification.

Virtual eXtensible Local Area Networks were first defined in an Internet draft called draft-mahalingam-dutt-dcops-vxlan-00.txt in August 2011. It took some time for switch vendors to implement it, but now Broadcom’s Trident II supports it. Of course, software overlay solutions such as VMware NSX and Nuage Virtualized Services Platform (VSP) also implement it. Three years later, in August 2014, this draft became RFC 7348. The draft had 9 revisions to it, so it went up to draft-mahalingam-dutt-dcops-vxlan-09.txt, but there are no significant changes with respect to Multicast requirements in the underlay. They all say the same thing in section 4.2:

Consider the VM on the source host attempting to communicate with the destination VM using IP.  Assuming that they are both on the same subnet, the VM sends out an Address Resolution Protocol (ARP) broadcast frame. In the non-VXLAN environment, this frame would be sent out using MAC broadcast across all switches carrying that VLAN.

With VXLAN, a header including the VXLAN VNI is inserted at the beginning of the packet along with the IP header and UDP header. However, this broadcast packet is sent out to the IP multicast group on which that VXLAN overlay network is realized. To effect this, we need to have a mapping between the VXLAN VNI and the IP multicast group that it will use.
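To make the quoted passage concrete, here is a minimal Python sketch of the 8-byte VXLAN header and of the VNI-to-multicast-group lookup the draft calls for. The header layout (flags byte with the I bit set, 24-bit VNI, reserved bits zero) and UDP port 4789 come from RFC 7348. The function names and the group addresses in `VNI_TO_GROUP` are my own assumptions for illustration, not anything a vendor ships.

```python
import struct
from typing import Tuple

VXLAN_UDP_PORT = 4789          # IANA-assigned destination port (RFC 7348)
VXLAN_FLAG_VNI_VALID = 0x08    # the "I" flag; set when the VNI field is valid

# Hypothetical mapping of VNIs to underlay multicast groups (example values).
VNI_TO_GROUP = {5000: "239.1.1.1", 5001: "239.1.1.2"}


def build_vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte VXLAN header for the given 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, 24 reserved bits set to zero.
    # Word 2: VNI in the top 24 bits, low byte reserved.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header, checking the I flag first."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("I flag not set; VNI is not valid")
    return vni_word >> 8


def bum_destination(vni: int) -> Tuple[str, int]:
    """Return the (multicast group, UDP port) a BUM frame in this VNI is sent to."""
    return VNI_TO_GROUP[vni], VXLAN_UDP_PORT
```

In multicast mode the source VTEP sends a single encapsulated copy to the group returned by `bum_destination`, and the underlay takes care of replication.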

In essence, IP multicast is the control plane in VXLAN. But, as we know, IP multicast is very complex to configure and manage.

In June 2013, Cisco deviated from the VXLAN standard in the Nexus 1000V in two ways:

  1. It makes a copy of each packet for every IP address at which the destination MAC address can be found, and sends the copies from the head end of the VXLAN tunnel, the VXLAN Tunnel End Point (VTEP). These packets are then unicast to every VTEP that hosts VMs in the VXLAN segment, thereby removing the need for IP multicast in the core of the network.
  2. The Virtual Supervisor Module (VSM) of the Nexus 1000V acts as the control plane by maintaining the MAC address table of the VMs, which it then distributes, via a proprietary signaling protocol, to the Virtual Ethernet Module (VEM), which, in turn, acts as the data plane in the Nexus 1000V.
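The replication step in the first point can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `FLOOD_LIST` table, its addresses, and the function name are hypothetical, and a real implementation would learn the list of remote VTEPs from its control plane rather than from a static dictionary.

```python
from typing import Dict, List, Tuple

VXLAN_UDP_PORT = 4789  # IANA-assigned VXLAN port (RFC 7348)

# Hypothetical flood list: VNI -> underlay IP addresses of the VTEPs in that segment.
FLOOD_LIST: Dict[int, List[str]] = {
    5000: ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
}


def head_end_replicate(vni: int, inner_frame: bytes,
                       local_vtep: str) -> List[Tuple[str, int, bytes]]:
    """Return one (destination IP, UDP port, payload) tuple per remote VTEP."""
    # 8-byte VXLAN header: I flag set, reserved bits zero, 24-bit VNI.
    header = bytes([0x08, 0, 0, 0]) + vni.to_bytes(3, "big") + b"\x00"
    payload = header + inner_frame
    # The source VTEP makes every copy itself and skips its own address.
    return [
        (vtep, VXLAN_UDP_PORT, payload)
        for vtep in FLOOD_LIST.get(vni, [])
        if vtep != local_vtep
    ]
```

The trade-off is visible in the list comprehension: with N VTEPs in a segment, the head end transmits N-1 unicast copies of every BUM frame, whereas in multicast mode it sends one copy and the underlay replicates it.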

To their credit, Cisco acknowledged that this mode is not compliant with the standard, although they do support a multicast-mode configuration as well. At the time, they expressed hope that the rest of the industry would back their solution. Well, the RFC still states that an IP multicast backbone is needed.

This brings me to the original announcement from Arista. They claim in their press statement:

The Arista VXLAN implementation is truly open and standards based with the ability to interoperate with a wide range of data center switches.

But nowhere else on their website do they state how they actually adhere to the standard. Cisco breaks the standard by conducting Head End Replication. Adam Raffe does a great job in explaining how this works (basically, the source VTEP will replicate the Broadcast or Multicast packet and send to all VMs in the same VXLAN). Arista should explain how exactly their enhanced implementation works.

Linux as a Switch Operating System: Five Lessons Learned

Although this post is nearly a year old, it is still gold. Ken Duda, the CTO of Arista Networks, described five lessons learned while supporting Extensible Operating System (EOS), Arista’s Linux-based switch operating system. They are:

  1. It’s okay to leave the door unlocked.
  2. Preserve the integrity of the Linux core.
  3. Focus on state, not messages.
  4. Keep your hands out of the kernel.
  5. Provide familiar interfaces to ease adoption.

Definitely worth a read.

What the world outside of IT can learn from open source

Earlier this week, the world’s leading drugmaker Johnson & Johnson (J&J) announced that it would join hands with rival GlaxoSmithKline (GSK) to develop a vaccine to combat the Ebola disease. Apparently, both companies had been working on a vaccine separately, but now they are collaborating.

Yawn. Tech companies have been doing that for decades, since the early days of Linux. It’s called Open Source, people. And it’s a beautiful thing. When competitors get together to come up with solutions, obviously much of it is for publicity, but much good does come out of it. The world would be a much better place if other major corporations would follow suit for a change, and come up with ideas together to solve real world problems.

Ethernet Alliance unveils five new speeds

This week Network World laid out some details of the work that the Ethernet Alliance, the industry consortium that promotes IEEE Ethernet standards, is doing with respect to new data rates. As mentioned in this blog post, while there are 5 shipping speeds of Ethernet (100 Mbps, 1 Gbps, 10 Gbps, 40 Gbps, and 100 Gbps), there are 5 new speeds that are currently being worked on (2.5 Gbps, 5 Gbps, 25 Gbps, 50 Gbps, and 400 Gbps). The last time Ethernet got this sexy was when promiscuous mode was introduced.

Some of the drivers for these new speeds are the adoption rates of the older speeds. As detailed in the July 2014 IEEE Call for Interest, initial adoption of 10G, 40G, and 100G was in 2004, 2012, and 2015 (anticipated) respectively, but because these speeds are turning out to be cost-prohibitive, the transition to higher speeds has been slower than previously forecast. For example, the 1G -> 10G transition has repeatedly moved out (from 2012 to 2014 to 2016 now). This creates a window where new technology can provide the higher port speed at lower cost. So, as an example, the SFP+ technology can be leveraged in 25 Gbps as a single lane and 50 Gbps as two lanes.

The 2.5 and 5 Gbps speeds (known as MGBASE-T) address the growing demands of BYOD in campus networks. Many of the newer APs now ship with 802.11ac. This Wi-Fi standard will have a second wave in 2015, in which the uplinks (or backhauls) between the APs and the access switches will run at multi-gigabit rates. The key requirement here is to be able to reuse the existing cabling infrastructure. So Cat 5e and Cat 6 would still be supported over the usual 100 meters, and there would be no need to rip and replace cables.

Ethernet has come a long way since the days of the 2.94 Mbps flavor that Bob Metcalfe had invented. There is very little in common between the types of Ethernet standards we have today from the IEEE and the original specification. One thing that is common, however, is the ability to evolve according to market needs, from single-pair vehicular Ethernet to four-pair PoE and in between. More on this in another post.

Docker to come to Windows Server

Recently, news broke that Windows Server will introduce support for Docker. This is significant because the technology from this ultra-hot company had previously been supported only on Linux (and Azure). One of the major complaints about it was the lack of flexibility when it came to host operating system support. With this news, Microsoft also announced that it will be contributing to Docker’s open source APIs. What a remarkable change from a company that epitomized closed systems.

I’m excited to announce today that Microsoft is partnering with Docker, Inc to enable great container-based development experiences on Linux, Windows Server and Microsoft Azure. Docker is an open platform that enables developers and administrators to build, ship, and run distributed applications. Consisting of Docker Engine, a lightweight runtime and packaging tool, and Docker Hub, a cloud service for sharing applications and automating workflows, Docker enables apps to be quickly assembled from components and eliminates the friction between development, QA, and production environments. Earlier this year, Microsoft released support for Docker containers with Linux on Azure.  This support integrates with the Azure VM agent extensibility model and Azure command-line tools, and makes it easy to deploy the latest and greatest Docker Engine in Azure VMs and then deploy Docker based images within them. – Scott Guthrie, executive vice president of the Microsoft Cloud and Enterprise group.

QoS and SLA Guarantees in the Cloud

Ivan Pepelnjak makes an important point in his webinar on Cloud Computing Networking: as a customer, understand the QoS and SLA guarantees that your public cloud provider offers. Whatever Tenant A does should not impact the performance of Tenant B. At a bare minimum, there should be some guarantees on bandwidth, I/O operations, and CPU cycles for every tenant. You don’t want a noisy neighbor who hogs resources and leaves you no choice but to reboot your VM in the hope of getting reassigned to a physical server with less load. An AWS Small Instance is an example of an environment where you might encounter this scenario.

3.2 Tbps on a single chip – Merchant silicon cranks it up

Recently I stumbled upon the blog of David Gee in the UK. He covered the Cavium acquisition of Xpliant as well as Broadcom’s announcement of the StrataXGS Tomahawk chipset less than two months later. The remarkable thing about the two chipsets is that both are capable of 3.2 Tbps and feature programmability, something which the Trident II (a 1.28 Tbps chipset) didn’t have. The Trident II is used in Cisco’s Nexus 9000, Juniper’s QFX5100, and HP’s 5930, to name a few switches. There had been great anticipation for the Trident II because it contains support for VXLAN, which the Trident did not. However, the most recent tunnel encapsulation protocol, Generic Network Virtualization Encapsulation (GENEVE), isn’t supported on Trident II. Because Tomahawk and Xpliant are programmable, they should, in theory, be able to support it.
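To put those aggregate figures in perspective, a bit of back-of-the-envelope arithmetic helps. The sketch below simply divides a chip's switching capacity by a port speed. It assumes line-rate ports with no oversubscription and ignores how lanes are actually grouped on the silicon, so treat the results as upper bounds rather than datasheet numbers. The function name is my own.

```python
def max_line_rate_ports(chip_gbps: int, port_gbps: int) -> int:
    """Upper bound on the number of line-rate ports at a given speed."""
    return chip_gbps // port_gbps


TRIDENT_II_GBPS = 1280   # 1.28 Tbps
TOMAHAWK_GBPS = 3200     # 3.2 Tbps

for name, capacity in (("Trident II", TRIDENT_II_GBPS), ("Tomahawk", TOMAHAWK_GBPS)):
    for speed in (10, 25, 40, 100):
        ports = max_line_rate_ports(capacity, speed)
        print(f"{name}: up to {ports} x {speed}G")
```

By this arithmetic, 1.28 Tbps works out to 32 ports of 40G, while 3.2 Tbps works out to 32 ports of 100G or 128 ports of 25G.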

Broadcom’s press announcement page contains an impressive array of quotes from vendors such as Brocade, Big Switch, Cumulus, HP, Juniper, Pica8, and VMware, to name a few. It remains to be seen which vendors will build on Xpliant.
