Category Archives: Cisco

Bringing Reference Architectures to Multi-Cloud Networking

Recently I attended Aviatrix Certified Engineer training to better understand multi-cloud networking and how Aviatrix is trying to solve its many problems, some of which I have experienced first-hand. Disclaimer: Since 2011, I’ve been an avid listener of the Packet Pushers podcast, where Aviatrix has sponsored 3 shows since December 2019.

Ever since I embarked on the public cloud journey, I have noticed how each of the big four vendors (AWS, Azure, GCP, and OCI) approaches networking in the cloud differently from how it has been done on-premises. They all share many similarities, such as:

  • The concept of a virtual Data Center (VPC in AWS and GCP, VNET in Azure, VCN in OCI).
  • Abstracting Layer 2 away from the user as much as possible (no mention of Spanning Tree or ARP anywhere), even though those protocols never went away.

However, there are many differences as well, such as this one:

  • In AWS, subnets have zonal scope – each subnet must reside entirely within one Availability Zone and cannot span zones.
  • In GCP, subnets have regional scope – a subnet may span multiple zones within a region.
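To make that difference concrete, here is a minimal sketch in Python (all IDs and CIDRs are hypothetical placeholders). In AWS, the boto3 create_subnet call pins the subnet to a single Availability Zone; in GCP there is no zone to choose, because subnets are created per region:

```python
# AWS: a subnet lands in exactly one Availability Zone.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
ec2.create_subnet(
    VpcId="vpc-0123456789abcdef0",     # hypothetical VPC ID
    CidrBlock="10.0.1.0/24",
    AvailabilityZone="us-east-1a",     # zonal scope: this subnet cannot span AZs
)

# GCP, for contrast: a subnet is a regional object that spans all zones in
# the region. The equivalent gcloud command takes --region, not a zone:
#   gcloud compute networks subnets create app-subnet \
#       --network=my-vpc --range=10.0.1.0/24 --region=us-central1
```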

Broadly speaking, the major Cloud Service Providers (CSPs) do a fairly decent job with their documentation, but they don’t make it easy for one to connect clouds together. They give you plenty of rope to hang yourself, and you end up being on your own. Consequently, your multi-cloud network design ends up being unique – a snowflake.

In the pre-public-cloud, on-premises world, we would never have gotten far if it weren't for reference designs. Whether it was the 3-tier Core/Aggregation/Access design that Cisco came out with in the late 1990s, or the more scalable spine-leaf fabric designs that followed a decade later, there has always been a need for cookie-cutter blueprints for enterprises to follow. Otherwise they end up reinventing the wheel and becoming snowflakes. And as any networking engineer worth their salt will tell you, networking is the plumbing of the Internet, of a Data Center, of a Campus, and that is just as true of an application built in the cloud. You don't appreciate it when it is performing well, only when it is broken.

What exacerbates things is that the leading CSP, AWS, does not even acknowledge multiple clouds. In their documentation, they write as if Hybrid IT means only the world of on-premises and AWS. There is only one cloud in AWS' world and that is AWS. But the reality is that there is a growing need for enterprises to be multi-cloud – such as needing the IoT capabilities of AWS but the AI/ML capabilities of GCP, or starting on one cloud and later needing a second because of a merger, acquisition, or partnership. Under such circumstances, an organization has to consider multi-cloud, but in the absence of a common reference architecture, the network becomes incredibly complex and brittle.

Enter Aviatrix with its Multi-Cloud Network Architecture (MCNA). This is a repeatable, 3-layered architecture that abstracts away the complexity of the cloud-native components, regardless of which CSPs are being used. The most important of the 3 layers is the Transit Layer, as it handles intra-region, inter-region, and inter-cloud connectivity.

Aviatrix Multi-Cloud Networking Architecture (MCNA)

Transitive routing is a feature that none of the CSPs support natively. Without it, you need full-mesh designs, which may work fine for a handful of VPCs. But full-mesh peering is an N² problem (more precisely N(N-1)/2 links), which does not scale well. In AWS, customers used to have to address this entirely on their own with Transit VPCs, which were very difficult to manage. In an attempt to address the problem with a managed service, AWS announced Transit Gateways at re:Invent 2018, but that doesn't solve everything either. With a Transit Gateway (TGW), an attached VPC sends its routes to the TGW, but the TGW does not automatically redistribute those routes to the other VPCs attached to it. The repeatable design of the Aviatrix MCNA is able to solve this and many other multi-cloud networking problems.
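To put numbers on the scaling problem, here is a quick sketch (Python, purely illustrative) of how many peering links a full mesh of N VPCs requires:

```python
def full_mesh_peerings(n: int) -> int:
    """Number of point-to-point peering links needed for a full mesh of n VPCs."""
    return n * (n - 1) // 2

for n in (4, 10, 25, 50):
    print(f"{n:>3} VPCs -> {full_mesh_peerings(n):>5} peerings")
# 4 VPCs need 6 peerings; 50 VPCs already need 1,225.
```

The count grows quadratically, which is why a hub-and-spoke transit layer becomes necessary well before you reach dozens of VPCs.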

Aviatrix has a broad suite of features. The ones from the training that impressed me the most were:

  • Simplicity of solution – This is a born-in-the-cloud solution whose components are:
    • a Controller that can even run on a t2.micro instance
    • a Gateway that handles the Data Plane and can scale out or up
    • Cloud native constructs, such as VPC/VNET/VCN
  • High Performance Encryption (HPE) – This is ideal for enterprises that, for compliance reasons, require end-to-end encryption. Throughput for encrypting a private AWS Direct Connect, Azure ExpressRoute, GCP Cloud Interconnect, or OCI FastConnect link cannot exceed about 1.25 Gbps, because virtual routers utilize a single core and establish only one IPsec tunnel. So even if you are paying for 10 Gbps, you are limited by IPsec performance to roughly 1.25 Gbps. Aviatrix HPE is able to achieve line-rate encryption using ECMP (see the back-of-the-envelope sketch after this list).
  • CloudWAN – This takes advantage of the existing investment that enterprises have poured into Cisco WAN infrastructure. When such organizations need optimal latency between branches and apps running in the cloud, Aviatrix CloudWAN is able to log in to these Cisco ISRs and configure VPN and BGP appropriately so that they connect to an Aviatrix Transit Gateway, using the AWS Global Accelerator service for the shortest-latency path to the cloud.
  • Smart SAML User VPN – I wrote a post on this here.
  • Operational Tools – FlightPath is the coolest multi-cloud feature I have ever seen. It is an inter-VPC/VNET/VCN troubleshooting tool that retrieves and displays the Security Groups, route table entries, and Network ACLs along every cloud VPC that traffic traverses, so you can pinpoint where a problem exists along the data plane. Doing this manually would otherwise involve roughly 25 data points to investigate (and that doesn't even include multi-cloud, multi-region, and multi-account scenarios). FlightPath automates all of this. Think traceroute for multi-cloud.
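Regarding the HPE bullet above, a quick back-of-the-envelope sketch (Python, illustrative only; the per-tunnel ceiling is the rough figure cited above, not an Aviatrix specification) shows why a single IPsec tunnel cannot fill a fat pipe and roughly how many parallel tunnels ECMP would need:

```python
import math

PER_TUNNEL_GBPS = 1.25   # rough single-core, single-IPsec-tunnel ceiling cited above

for circuit_gbps in (1, 5, 10):
    tunnels = math.ceil(circuit_gbps / PER_TUNNEL_GBPS)
    print(f"{circuit_gbps:>2} Gbps circuit -> ~{tunnels} IPsec tunnel(s) across cores/ECMP paths")
# A 10 Gbps Direct Connect needs on the order of 8 parallel tunnels to run at line rate.
```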

In the weeks and months to come, I'm hoping to get my hands dirty with some labs and write about my experience here.

Cisco ONE Controller – SDN Startup Killer?

Military nations demonstrate their power by testing nuclear weapons. Pure play networking vendors display their power in the SDN ecosystem by releasing Controllers. ~Anonymous

I sat in today on Cisco's webcast on OpenFlow and the ONE Controller. Cisco's CTO of Engineering and Chief Architect, David Ward, spoke at length about this announcement. Ward is also the Chair of the Technical Advisory Group of the Open Networking Foundation (ONF). The webcast featured two use cases – one in the Enterprise (Indiana University) and one in the Service Provider (NTT Communications) arena.

OpenFlow Model

A typical OpenFlow Controller interfaces with the data plane (the OpenFlow Switch, as defined by the standards) via the OpenFlow Configuration Protocol, OF-Config (configuration that persists across reboots), and the OpenFlow Protocol (the mechanism for adding and deleting flows). But OpenFlow is just one part of SDN.

In a classical router or switch, the fast packet forwarding (data path) and the high level routing decisions (control path) occur on the same device. An OpenFlow Switch separates these two functions. The data path portion still resides on the switch, while high-level routing decisions are moved to a separate controller, typically a standard server. The OpenFlow Switch and Controller communicate via the OpenFlow protocol, which defines messages, such as packet-received, send-packet-out, modify-forwarding-table, and get-stats. – ONF Website
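To make those message types concrete, here is a minimal controller-side sketch written with the open-source Ryu framework (my choice for illustration; Ryu is not part of Cisco's announcement). It reacts to a packet-in ("packet-received"), pushes a flow entry ("modify-forwarding-table"), and replies with a packet-out ("send-packet-out"):

```python
# Minimal OpenFlow 1.3 controller app using the open-source Ryu framework
# (illustrative only; not related to Cisco onePK or the ONE Controller).
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import MAIN_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_3


class HubController(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPPacketIn, MAIN_DISPATCHER)
    def packet_in_handler(self, ev):
        msg = ev.msg                      # "packet-received" from the switch
        dp = msg.datapath
        ofp = dp.ofproto
        parser = dp.ofproto_parser
        in_port = msg.match['in_port']

        # "modify-forwarding-table": install a flow that floods traffic
        # arriving on this port, so future packets are handled on the switch.
        match = parser.OFPMatch(in_port=in_port)
        actions = [parser.OFPActionOutput(ofp.OFPP_FLOOD)]
        inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
        dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=1,
                                      match=match, instructions=inst))

        # "send-packet-out": forward the packet that triggered this event.
        data = msg.data if msg.buffer_id == ofp.OFP_NO_BUFFER else None
        dp.send_msg(parser.OFPPacketOut(datapath=dp, buffer_id=msg.buffer_id,
                                        in_port=in_port, actions=actions,
                                        data=data))
```

Run against any OpenFlow 1.3 switch, this turns it into a simple hub; the point is only to show that the control-plane logic lives entirely on the controller while the switch keeps forwarding packets.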

Cisco ONE Controller Model

The goal of Cisco's ONE Software Controller is to enable flexible, application-driven customization of network infrastructure. It includes the onePK toolkit – an SDK for developers to write custom applications to solve their business needs. So a ONE Controller could speak to other vendors' devices via the OpenFlow standard, or to Cisco devices via the onePK southbound API. At least that is what the diagram shows – onePK and OpenFlow side-by-side. However, during the webcast Q&A it was stated that onePK is an infrastructure that includes support for multiple abstraction protocols; onePK includes OpenFlow. This is probably a semantic distinction.

One of the features described is network slicing. It is intended to provide more than just L2 or L3 segmentation; it is closer to a form of multi-tenancy. The way it was described on the call, instead of making decisions based on just the shortest path, network slicing enables the controller to differentiate based on the lowest-cost path, the highest-bandwidth path, or latency. In a demo at Cisco Live in London, latency was tweaked and the Controller computed a different path accordingly.
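Here is a tiny sketch of that idea using the networkx graph library (illustrative only; this is not how the ONE Controller computes paths). The same four-node topology yields different paths depending on which metric the policy optimizes:

```python
# Path selection by different metrics on the same topology (illustrative).
import networkx as nx

g = nx.Graph()
# Hypothetical link attributes: latency in ms, cost in arbitrary units.
g.add_edge("A", "B", latency=10, cost=1)
g.add_edge("B", "D", latency=10, cost=1)
g.add_edge("A", "C", latency=2,  cost=5)
g.add_edge("C", "D", latency=2,  cost=5)

print(nx.shortest_path(g, "A", "D", weight="cost"))     # ['A', 'B', 'D'] - cheapest
print(nx.shortest_path(g, "A", "D", weight="latency"))  # ['A', 'C', 'D'] - fastest
```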

Another feature presented for the ONE Controller is hybrid-mode SDN, in which network operators can use SDN for specific flows and traditional, integrated control/data planes (i.e. classical routers or switches) for the remaining traffic.

What are the ramifications of this release on the SDN ecosystem? Well, although the new open source consortium Daylight supposedly does not include Cisco onePK on Day 1, it is very likely it will be included in about six months. Cisco has announced platform support roadmaps for the Platform APIs (onePK platforms), Controller Agents, and Overlay Networks such as VXLAN Gateway. Some of these won’t be available until Q3 of this year. That sounds just about the right time for a vendor to provide an end-to-end solution for Daylight. If a pure play hardware networking vendor, such as Cisco, can provide a free open source controller, it will be able to kill the competition from many SDN startups. For example, take Floodlight, the open source OpenFlow controller that was developed by Big Switch and is sold on a freemium licensing model. If ONE Controller is given away for free, why would a customer use Floodlight?

In other words, in Daylight there is no need for Floodlights!

Plethora of Cisco Cloud Announcements – February 2013

I’m writing this post the week after Cisco Live was held in London. I did not attend Cisco Live, but this morning I attended a Cisco event titled Fabric Innovations for the World of Many Clouds. It was kicked off by Cisco's Chief Strategy Officer, Padmasree Warrior, who outlined the company's current fabric vision, summarized in the figure below.

February 4, 2013 Cisco Announcement

The Nexus 6000 is a new product line with very high 10/40 Gbps port density and a port-to-port latency of about 1.2 microseconds. Available today, the 4RU Nexus 6004 has 48x40Gbps ports along with 4 expansion modules, allowing for a total of up to 96x40Gbps ports. Also announced, but available in Q2, is the Nexus 6001 – a 1RU switch with 48x1G/10G ports and 4x10G/40G uplinks. David Yen, Senior VP of Cisco's Data Center Business Unit, said that Cisco could have used merchant silicon, but that they still back their own custom silicon to deliver lower port-to-port latencies, as seen in their Algo Boost technology. To give you an idea of how low 1.2 microseconds is in the industry, Arista has been boasting low-latency switches with as little as 350 nanoseconds port-to-port for several years. But Cisco already has an answer to Arista's ultra-low-latency switches – the Nexus 3548, which boasts port-to-port latencies as low as 190 nanoseconds. Such switches are better suited for financial exchanges, where low switching latencies are critical for conducting electronic trades.

Cisco claims it can maintain the Nexus 6004's 1.2 microsecond latency for as many as 1,500 10G ports. The number 1,500 is attained when the Nexus 6004 is combined with another new product – the Nexus 2248PQ Fabric Extender. Together they can support 1,500 GE or 10GE server ports through Cisco's FEX technology. Assuming 50 VMs per server, those 1,500 FEX ports can support up to 75,000 VMs – an impressive number that shows the scalability of the Nexus 6000 platform.
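A quick back-of-the-envelope check of those figures (Python, illustrative; the 12 ports per expansion module is my inference from the 48-to-96 jump, not a number from the announcement):

```python
# Sanity-checking the quoted Nexus 6004 scale figures.
base_40g_ports = 48
expansion_modules = 4
ports_per_module = 12                     # assumed: (96 - 48) / 4
print(base_40g_ports + expansion_modules * ports_per_module)  # 96 x 40G ports

fex_server_ports = 1500                   # 10G server ports via FEX, per Cisco's claim
vms_per_server = 50
print(fex_server_ports * vms_per_server)  # 75,000 VMs
```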

The Network Analysis Module (NAM) has now formally made its foray into the Nexus lineup. I worked a lot with the first two generations of the NAM in 2004 and was impressed by its robustness (it was one of the few products at the time to be built on Linux) and ease of use. Of course, that was on the Catalyst 6500 platform, which was defibrillated a couple of years ago with the Supervisor 2T. It seems that Cisco is now finally bringing service modules onto the Nexus platform.

The second major announcement was the Nexus 1000V InterCloud, for connecting enterprise clouds to provider clouds in a secure manner. The highlight is that it makes application migrations incredibly simple: no converting VM formats, creating templates, deploying site-to-site tunnels between clouds, or re-configuring network policies by hand. The Nexus 1000V InterCloud is intended to automate all these steps and support all hypervisors. It is managed by Virtual Network Management Center (VNMC) InterCloud. The most interesting part to me was that it hooks into cloud orchestration systems like Cloupia (Cisco's recent acquisition) and Cisco's own Intelligent Automation for Cloud (IAC) via a northbound API. Hybrid cloud deployment solutions are a relatively new area and I will be following how this pans out with great interest.

I was most keen on the third announcement, which was Cisco's ONE Controller. Last year Cisco announced onePK, but there was no product; now, finally, there is the Controller. It features northbound APIs, such as REST and OSGi, and southbound APIs, such as OpenFlow and Cisco's own onePK. Cisco also announced a roadmap for the ONE Controller's compatibility with the existing Nexus and Catalyst product lines.

More information is available from the following links:

Introducing Nexus 6000 Series
Cisco Launches Nexus 1000V InterCloud Part I
Cisco Launches Nexus 1000V InterCloud Part II