
Head End Replication and VXLAN Compliance

Arista Networks recently announced that its implementation of VXLAN no longer requires IP Multicast in the underlay network. Instead, the implementation will now rely on a technique called Head End Replication to forward BUM (Broadcast, Unknown Unicast, and Multicast) traffic in the VLANs that it transports. But first, let’s rewind to the original VXLAN specification.

Virtual eXtensible Local Area Network (VXLAN) was first defined in an Internet draft called draft-mahalingam-dutt-dcops-vxlan-00.txt in August 2011. Three years later, in August 2014, this draft became RFC 7348. The draft went through 9 revisions along the way, ending at draft-mahalingam-dutt-dcops-vxlan-09.txt, but none of them significantly changed the multicast requirements in the underlay. It took some time for switch vendors to implement VXLAN, but Broadcom’s Trident II now supports it. Of course, software overlay solutions such as VMware NSX and Nuage Virtualized Services Platform (VSP) also implement it. Every version of the specification says the same thing in section 4.2:

Consider the VM on the source host attempting to communicate with the destination VM using IP.  Assuming that they are both on the same subnet, the VM sends out an Address Resolution Protocol (ARP) broadcast frame. In the non-VXLAN environment, this frame would be sent out using MAC broadcast across all switches carrying that VLAN.

With VXLAN, a header including the VXLAN VNI is inserted at the beginning of the packet along with the IP header and UDP header. However, this broadcast packet is sent out to the IP multicast group on which that VXLAN overlay network is realized. To effect this, we need to have a mapping between the VXLAN VNI and the IP multicast group that it will use.

In essence, IP multicast is the control plane in VXLAN. But, as we know, IP multicast is very complex to configure and manage.
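As a rough illustration of the mapping that section 4.2 describes, the following Python sketch builds the 8-byte VXLAN header defined in RFC 7348 and picks the outer destination IP for a frame. The VNI values, multicast group addresses, MAC table layout, and function names are all made up for the example; this is a sketch of the idea, not any vendor's implementation.

```python
import struct
from typing import Dict, Tuple

VXLAN_UDP_PORT = 4789        # IANA-assigned destination port from RFC 7348
VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag: the VNI field is valid

# Hypothetical VNI -> IP multicast group mapping (values are illustrative).
VNI_TO_GROUP: Dict[int, str] = {5000: "239.1.1.1", 5001: "239.1.1.2"}


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags, 24 reserved bits, 24-bit VNI, 8 reserved bits."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def outer_destination(vni: int, dst_mac: str,
                      mac_table: Dict[Tuple[int, str], str]) -> str:
    """Pick the outer IP destination for an encapsulated frame.

    A MAC already learned behind a remote VTEP is sent by unicast to that VTEP.
    Anything else (broadcast, unknown unicast, multicast) goes to the
    multicast group mapped to the VNI.
    """
    remote_vtep = mac_table.get((vni, dst_mac))
    if remote_vtep is not None:
        return remote_vtep
    return VNI_TO_GROUP[vni]


# An ARP broadcast in VNI 5000 is not in the MAC table,
# so it is sent to the multicast group for that VNI.
print(outer_destination(5000, "ff:ff:ff:ff:ff:ff", {}))
```

The point of the sketch is the fallback branch: without a multicast group behind each VNI, a standards-based VTEP has nowhere to send BUM traffic.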

In June 2013, Cisco deviated from the VXLAN standard in the Nexus 1000V in two ways:

  1. The head end of the VXLAN tunnel, or VXLAN Tunnel End Point (VTEP), makes a copy of the packet for each possible IP address at which the destination MAC address can be found. These copies are then unicast to all VMs within the VXLAN segment, which removes the need for IP multicast in the core of the network.
  2. The Virtual Supervisor Module (VSM) of the Nexus 1000V acts as the control plane by maintaining the MAC address table of the VMs, which it then distributes, via a proprietary signaling protocol, to the Virtual Ethernet Module (VEM), which, in turn, acts as the data plane in the Nexus 1000V.
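The replication step in the first point can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the per-VNI flood list, the IP addresses, and the function name are hypothetical, the list is assumed to be filled in by some control plane (the VSM in Cisco's case), and the VXLAN/UDP/IP encapsulation itself is left out.

```python
from typing import Dict, List, Tuple

# Hypothetical per-VNI flood list: the VTEPs known to host VMs in each segment.
# In the Nexus 1000V model this knowledge would come from the VSM.
FLOOD_LIST: Dict[int, List[str]] = {
    5000: ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
}


def head_end_replicate(vni: int, local_vtep: str,
                       inner_frame: bytes) -> List[Tuple[str, bytes]]:
    """Return one (outer unicast destination, frame) pair per remote VTEP.

    The source VTEP does the replication itself instead of sending a single
    packet to a multicast group and letting the underlay copy it.
    """
    copies = []
    for remote_vtep in FLOOD_LIST.get(vni, []):
        if remote_vtep == local_vtep:
            continue  # never send a copy back to ourselves
        copies.append((remote_vtep, inner_frame))
    return copies


# A broadcast frame entering VTEP 10.0.0.1 in VNI 5000 yields two unicast copies.
for destination, frame in head_end_replicate(5000, "10.0.0.1", b"arp-request"):
    print(destination, len(frame))
```

The trade-off is visible in the loop: with N VTEPs in a segment, the source sends N-1 unicast copies of every BUM frame, whereas in multicast mode it sends one packet and the underlay replicates it.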

To their credit, Cisco acknowledged that this mode is not compliant with the standard, although they do support a multicast-mode configuration as well. At that time they expressed hope that the rest of the industry would back their solution. Well, the RFC still states that an IP multicast backbone is needed.

This brings me to the original announcement from Arista. They claim in their press statement:

The Arista VXLAN implementation is truly open and standards based with the ability to interoperate with a wide range of data center switches.

But nowhere else on their website do they state how they actually adhere to the standard. Cisco breaks the standard by conducting Head End Replication. Adam Raffe does a great job of explaining how this works (basically, the source VTEP replicates the Broadcast or Multicast packet and sends it to all VMs in the same VXLAN). Arista should explain how exactly their enhanced implementation works.