Category Archives: Musing

Bringing Reference Architectures to Multi-Cloud Networking

Recently I attended Aviatrix Certified Engineer training to better understand multi-cloud networking and how Aviatrix is trying to solve its many problems, some of which I have experienced first-hand. Disclaimer: Since 2011, I’ve been an avid listener of the Packet Pushers podcast, where Aviatrix has sponsored 3 shows since December 2019.

Ever since I embarked on the public cloud journey, I have noticed how each of the big four providers (AWS, Azure, GCP, and OCI) approaches networking in the cloud differently from how it has been done on-premises. They all share many similarities, such as:

  • The concept of a virtual Data Center (VPC in AWS and GCP, VNET in Azure, VCN in OCI).
  • Abstracting Layer 2 away from the user as much as possible (no mention of Spanning Tree or ARP anywhere), even though these protocols never went away.

However, there are many differences as well, such as this one:

  • In AWS, subnets have zonal scope – each subnet must reside entirely within one Availability Zone and cannot span zones.
  • In GCP, subnets have regional scope – a subnet may span multiple zones within a region.

Broadly speaking, the major Cloud Service Providers (CSPs) do a fairly decent job with their documentation, but they don’t make it easy for one to connect clouds together. They give you plenty of rope to hang yourself, and you end up being on your own. Consequently, your multi-cloud network design ends up being unique – a snowflake.

In the pre-Public Cloud, on-premises world, we would never have gotten far if it weren’t for reference designs. Whether it was the 3-tier Core/Aggregation/Access design that Cisco came out with in the late 1990s, or the more scalable spine-leaf fabric designs that followed a decade later, there has always been a need for cookie-cutter blueprints for enterprises to follow. Otherwise they end up reinventing the wheel and being snowflakes. And as any network engineer worth their salt will tell you, networking is the plumbing of the Internet, of a Data Center, of a Campus – and that is also true of an application that needs to be built in the cloud. You don’t appreciate it when it is performing well, only when it is broken.

What exacerbates things is that the leading CSP, AWS, does not even acknowledge multiple clouds. In their documentation, they write as if Hybrid IT only means the world of on-premises and of AWS. There is only one cloud in AWS’ world, and that is AWS. But the reality is that there is a growing need for enterprises to be multi-cloud – such as needing the IoT capabilities of AWS and some AI/ML capabilities of GCP, or starting on one cloud but later needing a second because of a merger, acquisition, or partnership. Under such circumstances, an organization has to consider multi-cloud, but in the absence of a common reference architecture, the network becomes incredibly complex and brittle.

Enter Aviatrix with its Multi-Cloud Network Architecture (MCNA). This is a repeatable 3-layered architecture that abstracts away the complexity of the cloud-native components, regardless of the CSPs being used. The most important of the 3 layers is the Transit Layer, as it handles intra-region, inter-region, and inter-cloud connectivity.

Aviatrix Multi-Cloud Networking Architecture (MCNA)

Transitive routing is a feature that none of the CSPs support natively. Instead, you need full-mesh designs, which may work fine for a handful of VPCs. But it is an N² problem (more precisely, N(N-1)/2), which does not scale well in distributed systems. In AWS, customers used to have to address this entirely on their own with Transit VPCs, which were very difficult to manage. In an attempt to address this problem with a managed service, AWS announced Transit Gateways at re:Invent 2018, but that doesn’t solve the entire problem either. With Transit Gateways (TGW), an attached VPC sends its routes to the TGW. However, that TGW does not automatically redistribute those routes to the other VPCs attached to it. The repeatable design of the Aviatrix MCNA solves this and many other multi-cloud networking problems.
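To make the scaling concrete, here is a quick sketch (plain Python, purely illustrative) of how the link count grows for a full mesh of N VPCs:

```python
# A full mesh between N VPCs needs one peering link per pair of VPCs,
# i.e. N * (N - 1) / 2 links -- quadratic growth as the footprint grows.
def full_mesh_links(n: int) -> int:
    return n * (n - 1) // 2

for n in (4, 10, 50):
    print(f"{n} VPCs -> {full_mesh_links(n)} peering links")
```

Six links for four VPCs is manageable by hand; 1,225 links for fifty VPCs is not, which is exactly why a hub-and-spoke transit layer matters.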

Aviatrix has a broad suite of features. The ones from the training that impressed me the most were:

  • Simplicity of solution – This is a born-in-the-cloud solution whose components are:
    • a Controller that can even run on a t2.micro instance
    • a Gateway that handles the Data Plane and can scale out or up
    • Cloud native constructs, such as VPC/VNET/VCN
  • High Performance Encryption (HPE) – This is ideal for enterprises that, for compliance reasons, require end-to-end encryption. Throughput for encrypting a private AWS Direct Connect, Azure ExpressRoute, GCP Cloud Interconnect, or OCI FastConnect link cannot exceed 1.25 Gbps because virtual routers utilize a single core and establish only one IPSec tunnel. So even if you are paying for 10 Gbps, you are limited by IPSec performance to 1.25 Gbps. Aviatrix HPE achieves line-rate encryption by spreading traffic across multiple tunnels with ECMP.
  • CloudWAN – This takes advantage of the existing investment that enterprises have poured into Cisco WAN infrastructure. When such organizations need to connect to the cloud with optimal latency between branches and apps running in the cloud, Aviatrix CloudWAN is able to log in to these Cisco ISRs and configure VPN and BGP appropriately so that they connect to an Aviatrix Transit Gateway via the AWS Global Accelerator service for the shortest-latency path to the cloud.
  • Smart SAML User VPN – I wrote a post on this here.
  • Operational Tools – FlightPath is the coolest multi-cloud feature I have ever seen. It is an inter-VPC/VNET/VCN troubleshooting tool that retrieves and displays the Security Groups, Route Table entries, and Network ACLs along all the cloud VPCs that traffic traverses, so you can pinpoint where a problem exists along the data plane. Doing this manually would otherwise involve roughly 25 data points to investigate (and that doesn’t even include multi-cloud, multi-region, and multi-account scenarios). FlightPath automates all of this. Think of it as Traceroute for multi-cloud.
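The HPE arithmetic in the bullet above is worth sketching out (illustrative numbers only, taken from the training):

```python
import math

# A single IPsec tunnel pinned to one core tops out around 1.25 Gbps, so
# filling a faster circuit means load-sharing across parallel tunnels (ECMP).
SINGLE_TUNNEL_GBPS = 1.25

def tunnels_for_line_rate(link_gbps: float) -> int:
    return math.ceil(link_gbps / SINGLE_TUNNEL_GBPS)

print(tunnels_for_line_rate(10))  # 8 parallel tunnels to fill a 10 Gbps link
```

In other words, paying for a 10 Gbps Direct Connect while encrypting over a single tunnel leaves roughly 87% of the circuit idle; ECMP across eight tunnels closes that gap.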

In the weeks and months to come, I’m hoping to get my hands dirty with some labs and write about my experience here.

2013 Goals Revisited

It has been quite a while since my last blog post. I suppose part of the reason is due to my role as a Product Manager. While I’ve learned an incredible amount since taking on the role in April, lately I often wonder whether the source of my knowledge is from generally following the industry or from what I learn at work.

Looking back over the past twelve months at the goals I set for myself before taking on my current role, I can see that I fell quite far short:

  • Home OpenStack lab – not met
  • Document my findings and release them to the public in easy-to-understand videos and screencasts – not met
  • Watch all of Ivan Pepelnjak’s webinars. So far I’ve only watched about a third. – not met. This is one goal that I need to set again for 2014.
  • Attain a working knowledge of Python via Codecademy – not met
  • Recertify my CCIE status – met
  • Play a major role in building a product – met

Overall, I think I might be better served with a little more focus, as some of the goals I set myself for 2013 were related to technical marketing, which is a bit ambitious given that it is not my job function. While I haven’t set goals for 2014, I definitely hope to write more frequently.

Brief Observations on the State of Networking in Pakistan

I was in Karachi, Pakistan recently for a brief visit during the Christmas holidays. Though the purpose of my visit was personal, I did manage to squeeze in some time speaking with professionals who are intimately familiar with the state of networking in Pakistan. In particular, I spoke with one individual at Cisco Pakistan who did not wish to be named, but who is very familiar with the largest networks in Pakistan.

First, a brief word about telecommunications. Mobile networks in Pakistan currently utilize GPRS and EDGE technologies. Plans to roll out 3G and pseudo 4G technologies have been put on hold. However, broadband speeds to homes and offices have improved significantly over the years, with WiMAX deployments common in Karachi.

Amongst the various ISPs in Pakistan, the biggest player by far is PTCL, which, as of 2006, is a semi-private corporation. PTCL is a Cisco shop that is investing heavily in L2 and L3 MPLS backbones for its customers. For many years, up until the mid-2000s, VSAT communications and dialup were the only means of Internet connectivity. So it was refreshing to see this step being taken.

High Availability is a tough ask in Pakistan with very few enterprises deploying redundant links or nodes. The exceptions are the larger banks. Generally speaking, it has been difficult to educate CIOs in Pakistan on the need for high availability. Likewise, structured cabling and cooling in Data Centers is often neglected or simply misunderstood.

On the technology front, the biggest banks in Pakistan are among the few to deploy Nexus 7Ks. Service Providers such as PTCL deploy CRS routers. Virtualization has also yet to make significant penetration in Pakistani Data Centers; only the largest banks carry ESX licenses. My questions about SDN, overlay networks, and private clouds were met with bemused expressions.

However, the country has no shortage of talent. The number of universities that offer degrees in computer science and computer engineering has increased significantly since the early 1990s. The past 20 years have seen some brilliant professionals rise from Hamdard Institute of Information Technology (HIIT), Usman Institute of Technology (UIT), Ghulam Ishaq Khan Institute of Engineering Sciences and Technology (GIK), and the School of Electrical Engineering and Computer Science (SEECS) at the National University of Sciences and Technology (NUST). R&D work in engineering sciences, unlike that in the natural sciences, is of a much higher quality, comparable to that of any international institute.

Take, for example, a startup incubated out of NUST, called xFlow Research, which is doing fantastic work in porting Open vSwitch to the Marvell xCat and LION platforms. On the Open vSwitch mailing list archives, about 10% of the contributions come from Pakistanis.

Clearly, despite all the challenges that Pakistani enterprises face with proprietary offerings from pure-play networking vendors and a politically unstable environment, the open source world offers a lot of potential to Pakistan’s networking industry. I wouldn’t be surprised if 2013 saw some major contributions to OpenStack from Pakistani companies.

Death to the CLI

One of the selling points of Cisco’s Nexus 1000V virtual switch is that it provides network administrators with a familiar way of configuration, namely the CLI. The Nexus 1000V is built on NX-OS and is accessible via SSH. This was intended to give network engineers a familiar look and feel for performing their duties, something they didn’t have with the hypervisor’s native vSwitch.

I understand the need for separation of duties, and that is what any dedicated management interface of a switch provides. And I appreciate that the Nexus 1000V offers many rich features that most soft switches don’t, such as L2/L3 ACLs, link aggregation, port monitoring, and some of the more advanced STP knobs like BPDU Guard. What I don’t appreciate is clinging to an archaic mode of configuration.

When I took my CCIE lab, Cisco provided a single reference CD-ROM, known as UniverCD or DocCD. Many tasks required knowledge of esoteric commands, so one of the first things any competent test-taker would do was use the alias command to define shortcuts. For example, show ip route might become sir. Network engineers often take great pride in the aliases they define and the Expect/Perl/Python scripts they’ve written to automate tasks. They rave about the amount of time saved. Of course, all of this would break whenever the vendor introduced new CLI commands that conflicted with existing aliases.
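For the curious, the IOS alias syntax is alias followed by the command mode, the shortcut, and the full command. A typical test-taker’s config might include entries like these (illustrative):

```
! Define exec-mode shortcuts; typing "sir" now expands to "show ip route"
alias exec sir show ip route
alias exec sib show ip interface brief
```

The fragility comes from the flat namespace: if a later IOS release ships a real command that collides with your shortcut, the alias silently shadows or breaks it.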

In one of my past roles, I was one of five engineers who frequently made firewall rule changes on ASAs. All of us were CCIEs, but none of us used the CLI to make the changes. Instead we preferred ASDM, the GUI element manager. Sure, it was buggy and handled concurrent changes poorly, but at least the changes made were accurate. Adding a single rule isn’t as simple as adding a single line: in most cases you have to edit object groups and make sure there are no overlapping or conflicting rules. Trusting a human to do this accurately every time is like trusting someone to drive five hours a day for work and never get into an accident.
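As a sketch of why this is hard to do by eye, consider just one of the checks involved: whether a new rule is already shadowed by a broader existing rule. The rule format below is hypothetical, not an ASA API; it only illustrates the kind of validation a tool can do mechanically:

```python
import ipaddress

# A new rule is "shadowed" if an existing rule already matches a superset of
# its traffic, making the new line dead weight (or worse, misleading).
def is_shadowed(new_src, new_port, existing_rules):
    src = ipaddress.ip_network(new_src)
    return any(
        src.subnet_of(ipaddress.ip_network(rule_src)) and rule_port == new_port
        for rule_src, rule_port in existing_rules
    )

rules = [("10.0.0.0/8", 443)]
print(is_shadowed("10.1.2.0/24", 443, rules))   # True: covered by the 10/8 rule
print(is_shadowed("192.168.0.0/24", 443, rules))  # False
```

Now multiply that by overlap checks, object-group membership, and rule ordering across five engineers making concurrent changes, and the case for tooling makes itself.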

There is a smarter way to do configuration management: make the network programmable. Offer developers APIs that are stateful and intelligent. Obviously, the rebuttal from Nexus 1000V loyalists is that engineers are familiar with NX-OS and would therefore be more comfortable with the CLI. But that’s a step in the wrong direction. When I look back at how much time network engineers waste creating simple automation such as macros, I realize this is one of the reasons networking has lagged behind compute technologies. Network engineers should not have to write their own scripts to make their lives easier; applications should be doing this for them. Let network engineers focus on their job: getting packets from source to destination as quickly, reliably, and securely as possible.
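To illustrate the direction I mean (the endpoint and schema here are entirely hypothetical, not any real product’s API), a programmatic change expresses intent as structured data that a controller can validate against current state before committing, rather than as CLI lines a human must get right:

```python
import json

# Intent as data: the controller, not the human, resolves object groups and
# checks for conflicts before the change is committed.
def firewall_rule_payload(src, dst, port, action="permit"):
    return json.dumps(
        {"source": src, "destination": dst, "port": port, "action": action},
        sort_keys=True,
    )

# This payload would be handed to a (hypothetical) controller endpoint
# such as POST /api/v1/firewall/rules for validation and commit.
print(firewall_rule_payload("10.1.2.0/24", "192.0.2.10", 443))
```

The point is not the schema itself but the division of labor: the engineer states what should happen, and software owns the mechanical correctness.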

SYN

Welcome to my blog!

I have been planning, designing, building, implementing, analyzing, operating, and supporting networks since 1996. Not to mention the things they glue together. I have worked for networking vendors, been a customer of networking vendors, and delivered professional services to customers of networking vendors. I am CCIE #11857 and rubbed shoulders with Milton Friedman apologists while earning my MBA at Chicago Booth. (I personally subscribe to the Richard Thaler libertarian paternalism school of thought, better known as Nudging.)

With this varied background, occasionally I feel the need for an outlet when I sense déjà vu in the networking industry or process an acquisition announcement or learn about a cool new feature or something along those lines. Hence I’ve started this blog.

I hope you find my musings interesting, informative, and perhaps even helpful. I look forward to discussing topics related to LANs, WANs, SDNs, Data Centers, IPv6, and yes, even Cloud. Moreover, I will try to make it more than just about the technology.

And in case you were wondering, no, my next post will not be entitled SYN-ACK. Unless there has been a man-in-the-middle attack.

— Umair Hoodbhoy