
The Management Plane of Multi-Cloud Networking – Aviatrix CoPilot

Recently, Aviatrix launched a new product called CoPilot to address the dire need for operational visibility in multi-cloud networking. This piqued my interest because none of the Cloud Service Providers (CSPs) provide topology tools for end-to-end visualization, monitoring, and troubleshooting. So I decided to attend the launch event.

Some of the biggest challenges that enterprises face in today’s multi-cloud environments are complexity and lack of visibility (topology and traffic flow). It’s difficult enough managing a single CSP. Add multiple vendors with their proprietary, opaque ways of passing data and it becomes nearly impossible to pinpoint how and where traffic is flowing.

This is critical for enterprises that have SLAs that need to be met. For example, around a decade ago when electronic trading started replacing open outcry transactions in the financial markets, there was a strong need to identify, at millisecond granularity, where delays in electronic trades were occurring. Penalties would be imposed on the Exchange if it could not prove that the delays were on the member trading firm’s side. Monitoring tool companies like Correlix and Corvid (not to be confused with COVID!) were born out of this need.

Of course, that was fine for the on-prem world. In a multi-cloud world, this becomes far more complex. For example, suppose there is a routing issue (not yet identified as an outright outage) in one region of a particular CSP, and an airline is unable to track its passengers’ baggage as it traverses multiple partner airlines (each using its own CSP). How will the airline identify where the fault lies without the right level of operational visibility in a multi-cloud environment? How will it meet its SLAs? CoPilot is able to visually identify such global multi-cloud anomalies.

CoPilot achieves this through its Aviatrix Transit Gateway as well as the native constructs from each CSP. While the Aviatrix Controller is the Control Plane and the Aviatrix Transit Gateway is the Data Plane, Aviatrix CoPilot can in a sense be considered the Management Plane (excluding the domain of IAM). It is more than just passive monitoring, as it allows the user to take action in real time.

The topology below shows AWS, Azure, and GCP clouds along with instances.

Aviatrix CoPilot Topology


The FlowIQ visualization tool makes use of heat maps and Sankey flow diagrams to provide intelligent reports on traffic patterns, trends, and key analytics regarding flow through the multi-cloud network. See this screenshot below.

Aviatrix CoPilot FlowIQ

Other anomalies it can detect include an unusual amount of traffic coming from a certain geo-location. The FlowIQ tool allows the user to search on a given geo-location as well, such as in the screenshot below.


Aviatrix CoPilot Heat Map

The presenter also gave a sneak peek of some very impressive features on their roadmap:

  • Track what resources VPN users are trying to access
  • Show live link latencies – This is an absolute must for SLA testing.
  • Latency Monitor – You will be able to set thresholds for latencies and be notified when the latency is exceeded. See the screenshot below.

Aviatrix CoPilot Live Latency

I believe Aviatrix is only getting warmed up in the world of operational visibility for multi-cloud networking.


Bringing Reference Architectures to Multi-Cloud Networking

Recently I attended Aviatrix Certified Engineer training to better understand multi-cloud networking and how Aviatrix is trying to solve its many problems, some of which I have experienced first-hand. Disclaimer: Since 2011, I’ve been an avid listener of the Packet Pushers podcast, where Aviatrix has sponsored 3 shows since December 2019.

Ever since I embarked on the public cloud journey, I have noticed how each of the big 4 vendors (AWS, Azure, GCP, and OCI) approaches networking in the cloud differently from how it has been done on-premises. That said, they share many similarities, such as:

  • The concept of a virtual Data Center (VPC in AWS and GCP, VNET in Azure, VCN in OCI).
  • Abstracting Layer 2 from the user as much as possible (no mention of Spanning Tree or ARP anywhere), despite the fact that these protocols never went away.

However, there are many differences as well, such as this one:

  • In AWS, subnets have zonal scope – each subnet must reside entirely within one Availability Zone and cannot span zones.
  • In GCP, subnets have regional scope – a subnet may span multiple zones within a region.

Broadly speaking, the major Cloud Service Providers (CSPs) do a fairly decent job with their documentation, but they don’t make it easy for one to connect clouds together. They give you plenty of rope to hang yourself, and you end up being on your own. Consequently, your multi-cloud network design ends up being unique – a snowflake.

In the pre-public-cloud, on-premises world, we would never have gotten far if it weren’t for reference designs. Whether it was the 3-tier Core/Aggregation/Access design that Cisco came out with in the late 1990s, or the more scalable spine-leaf fabric designs that followed a decade later, there has always been a need for cookie-cutter blueprints for enterprises to follow. Otherwise they end up reinventing the wheel and becoming snowflakes. And as any networking engineer worth their salt will tell you, networking is the plumbing of the Internet, of a data center, of a campus, and that is also true of an application that needs to be built in the cloud. You don’t appreciate it when it is performing well, only when it is broken.

What exacerbates things is that the leading CSP, AWS, does not even acknowledge multiple clouds. In their documentation, they write as if Hybrid IT only means the world of on-premises and of AWS. There is only one cloud in AWS’ world and that is AWS. But the reality is that there is a growing need for enterprises to be multi-cloud – such as needing the IoT capabilities of AWS, but some AI/ML capabilities of GCP; or starting on one cloud, but later needing a second because of a merger/acquisition/partnership. Under such circumstances, an organization has to consider multi-cloud, but in the absence of a common reference architecture, the network becomes incredibly complex and brittle.

Enter Aviatrix with its Multi-Cloud Network Architecture (MCNA). This is a repeatable 3-layered architecture that abstracts away the complexity of the cloud-native components, regardless of the CSPs being used. The most important of the 3 layers is the Transit Layer, as it handles intra-region, inter-region, and inter-cloud connectivity.

Aviatrix Multi-Cloud Networking Architecture (MCNA)

Transitive routing is a feature that none of the CSPs support natively. Without it, you need full-mesh designs, which may work fine for a handful of VPCs. But full mesh is an N² problem (N(N-1)/2 links, to be exact), which does not scale well in distributed systems. In AWS, customers originally had to address this completely on their own with Transit VPCs, which were very difficult to manage. In an attempt to address this problem with a managed service, AWS announced Transit Gateways at re:Invent 2018, but that doesn’t solve the entire problem either. With a Transit Gateway (TGW), a peered VPC sends its routes to the TGW it is attached to. However, that TGW does not automatically redistribute those routes to the other VPCs that are attached to it. The repeatable design of the Aviatrix MCNA is able to solve this and many other multi-cloud networking problems.
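To put the full-mesh scaling problem in numbers, here is a quick sketch (a generic back-of-the-envelope calculation, not anything Aviatrix-specific):

```shell
# Peering links needed for a full mesh of N VPCs: N*(N-1)/2
full_mesh_links() {
  echo $(( $1 * ($1 - 1) / 2 ))
}

full_mesh_links 5    # 5 VPCs  -> prints 10
full_mesh_links 50   # 50 VPCs -> prints 1225
```

Going from 5 to 50 VPCs multiplies the number of links to manage by more than 100, which is exactly why hub-and-spoke transit designs exist.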

Aviatrix has a broad suite of features. The ones from the training that impressed me the most were:

  • Simplicity of solution – This is a born-in-the-cloud solution whose components are:
    • a Controller that can even run on a t2.micro instance
    • a Gateway that handles the Data Plane and can scale out or up
    • Cloud native constructs, such as VPC/VNET/VCN
  • High Performance Encryption (HPE) – This is ideal for enterprises who, for compliance reasons, require end-to-end encryption. Throughput for encrypting a private AWS Direct Connect, Azure ExpressRoute, GCP Cloud Interconnect, or OCI FastConnect link cannot exceed 1.25 Gbps because virtual routers utilize a single core and establish only 1 IPSec tunnel. So even if you are paying for 10 Gbps, you are limited by IPSec performance and get only 1.25 Gbps performance. Aviatrix HPE is able to achieve line-rate encryption using ECMP.
  • CloudWAN – This takes advantage of the existing investment that enterprises have poured into Cisco WAN infrastructure. When such organizations need to connect to the cloud with optimal latency between branches and apps running in the cloud, Aviatrix CloudWAN is able to log in to these Cisco ISRs and configure VPN and BGP appropriately so that they connect to an Aviatrix Transit Gateway via the AWS Global Accelerator service for the shortest latency path to the cloud.
  • Smart SAML User VPN – I wrote a post on this here.
  • Operational Tools – FlightPath is the coolest multi-cloud feature I have ever seen. It is an inter-VPC/VNET/VCN troubleshooting tool that retrieves and displays Security Groups, Route table entries, and Network ACLs across all the VPCs that the data traverses, so you can pinpoint where a problem exists along the data path. This would otherwise involve approximately 25 data points to investigate manually (and that doesn’t even include multi-cloud, multi-region, and multi-account). FlightPath automates all of this. Think Traceroute for multi-cloud.
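As a back-of-the-envelope illustration of the HPE point above (my own arithmetic, assuming the ~1.25 Gbps single-tunnel ceiling quoted earlier), spreading traffic across N parallel IPSec tunnels with ECMP scales encrypted throughput roughly linearly:

```shell
# Approximate aggregate throughput of N parallel IPSec tunnels,
# assuming ~1.25 Gbps per single-core tunnel (figure from the text)
aggregate_gbps() {
  awk -v n="$1" 'BEGIN { printf "%.2f", n * 1.25 }'
}

aggregate_gbps 1   # prints 1.25  (the single-tunnel ceiling)
aggregate_gbps 8   # prints 10.00 (enough to fill a 10 Gbps Direct Connect)
```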

In the weeks and months to come, I’m hoping to get my hands dirty with some labs and write about my experience here.

Remote User Access in the Era of COVID-19

The worldwide lockdown due to COVID-19 has given me an opportunity to reflect on many aspects of life and work. Nowadays I’m helping companies and non-profits enable secure remote work (i.e. remote user access, not site-to-site VPN). I was looking into enterprise-grade solutions for secure remote user VPN access when I came across the Smart SAML Remote User VPN solution from Aviatrix.

I have prior experience with inexpensive/free solutions such as Libreswan for site-to-site IPSec VPN and OpenVPN for site-to-site SSL VPN. While OpenVPN also handles remote user VPN, I haven’t come across many solutions that can also handle SAML. SAML stands for Security Assertion Markup Language and, simply put, is a way of allowing identity providers to pass authentication and authorization data to service providers for Single Sign-On (SSO). Facebook is a common example of an Identity Provider. One of the best write-ups I’ve seen that explains how SAML works is on Duo’s site.
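To give a concrete feel for what gets passed around, here is the rough shape of the assertion an Identity Provider sends to a Service Provider (heavily trimmed; all names, IDs, and timestamps here are hypothetical):

```xml
<!-- Minimal sketch of a SAML 2.0 assertion; values are placeholders -->
<saml:Assertion xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion"
                ID="_abc123" Version="2.0"
                IssueInstant="2020-05-01T12:00:00Z">
  <!-- Who is vouching for the user -->
  <saml:Issuer>https://idp.example.com</saml:Issuer>
  <!-- Who the user is -->
  <saml:Subject>
    <saml:NameID Format="urn:oasis:names:tc:SAML:2.0:nameid-format:persistent">
      user@example.com
    </saml:NameID>
  </saml:Subject>
  <!-- When and how the user authenticated -->
  <saml:AuthnStatement AuthnInstant="2020-05-01T12:00:00Z"/>
</saml:Assertion>
```

In practice the assertion is digitally signed by the Identity Provider, which is what lets the Service Provider trust it without ever seeing the user’s password.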

Brief tangent: in mid-2011, Spotify launched a 6-month ad-free trial period in USA. I signed up for it using my Facebook account as the Identity Provider. In January 2012, I converted my account to Premium. Nine years and hundreds of playlists later, I am still a Spotify Premium member, but because of the notoriety Facebook has gained from its stance on privacy, I’ve wanted to dissociate my Spotify account from Facebook, only to learn the hard way that “If your Spotify account was created on Facebook, you can’t disconnect from Facebook.”

Of course, while many end-users use Facebook, LinkedIn, or Google as an Identity Provider so that they don’t have to create multiple accounts, the more common solutions used by Enterprises are Okta, Duo, and Active Directory from Microsoft. Enterprises often use commercial Remote Access VPN clients that correspond to the VPN Concentrator of their choice. Alternatively, they may also use open source based clients, such as OpenVPN.

Aviatrix has an OpenVPN-based client that supports SAML authentication through enterprise-grade Identity Providers. The solution enables remote access for employees, customers, and partners who need to reach private company resources that reside in public clouds as well as on-premises applications. Aviatrix actually has a promotion through June 2020 to credit organizations that use this solution. For a list of other offers/promotions made by tech companies, visit this page on Packet Pushers.

I had heard of Aviatrix for a couple of years as a leader in multi-cloud networking. I’ll save some of those thoughts for my next post.

Creating a Multi-Tier Kubernetes App in GCP and AWS

This post details my experience with creating a simple multi-tier Kubernetes app on Google Cloud Platform (GCP) as well as Amazon Web Services (AWS) by taking advantage of the free tier accounts for each service. The app comprises a Redis master for storage, multiple Redis read replicas (a.k.a. slaves), and load-balanced web frontends. Kubernetes Services act as load balancers that proxy traffic to the appropriate pods: the frontend pods for web traffic, and the Redis slave pods for reads. In order to manage these, I created Kubernetes replication controllers, pods, and services in this sequence:

  1. Create the Redis master replication controller.
  2. Create the Redis master service.
  3. Create the Redis slave replication controller.
  4. Create the Redis slave service.
  5. Create the guestbook replication controller.
  6. Create the guestbook service.
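As an illustration, step 1 above boils down to feeding kubectl a manifest shaped roughly like this (the names and labels follow the public guestbook example; the file name, image, and exact fields are assumptions and may differ in your copy):

```yaml
# redis-master-controller.yaml (sketch; file name is hypothetical)
apiVersion: v1
kind: ReplicationController
metadata:
  name: redis-master
spec:
  replicas: 1            # a single Redis master
  selector:
    app: redis
    role: master
  template:
    metadata:
      labels:
        app: redis
        role: master
    spec:
      containers:
      - name: redis-master
        image: redis     # placeholder; the tutorial pins its own image
        ports:
        - containerPort: 6379
```

Each manifest is applied with kubectl create -f <file>; the remaining five steps follow the same pattern with their own manifests.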


GCP has a few easy-to-follow tutorials for their main services. For Google Kubernetes Engine (GKE), the tutorial walks you through creating a cluster and deploying a simple Guestbook application to it.

AWS launched Amazon Elastic Container Service for Kubernetes (Amazon EKS) at re:Invent 2017 and it became Generally Available in June 2018.

For EKS, I generally followed the EKS documentation with the following observations:

  • Some pre-work was needed in EKS, such as:
    • Installing the latest AWS CLI. By contrast, Google Cloud Shell in GCP makes it really simple to perform operations and administration tasks.
    • Installing a tool to use AWS IAM credentials to authenticate to a Kubernetes cluster. It’s not clear to me why exactly this is needed. The documentation for AWS IAM Authenticator for Kubernetes states “If you are an administrator running a Kubernetes cluster on AWS, you already need to manage AWS IAM credentials to provision and update the cluster. By using AWS IAM Authenticator for Kubernetes, you avoid having to manage a separate credential for Kubernetes access.” However, it appears to be a needless step. Why can’t Kubernetes clusters in AWS EKS leverage AWS IAM credentials directly?
    • In GCP, Google Cloud Shell includes the kubectl CLI utility. In AWS, you need to install it locally. Moreover, GCP made it far easier to configure the kubeconfig file by issuing the gcloud container clusters get-credentials pakdude-kubernetes-cluster --zone us-central1-a command. With EKS, I had to manually edit the kubeconfig file to populate the cluster endpoint (called an API server endpoint in EKS) and auth data (called a Certificate Authority in EKS).
  • Creating a Kubernetes cluster takes about 10 minutes in EKS, compared to just 2-3 minutes with GKE.
  • When creating the Kubernetes cluster in EKS, the documentation states, “You must use IAM user credentials for this step, not root credentials.” I got stuck on this step for a while, but it is good practice anyway to use an IAM user instead of the root credentials.
  • In GKE, I ran into memory allocation issues when creating node pools with the f1-micro machine type (1 shared vCPU, 0.6 GB memory). However, when I created the node pools with machine type g1-small (1 shared vCPU, 1.7 GB memory), things ran more smoothly. In EKS, t1.micro instances are not even offered; the smallest instance type I could specify for the NodeInstanceType was t2.small.
  • The supported Operating Systems for nodes in GKE node pools are Container-Optimized OS (cos) and Ubuntu. In EKS, you have to use an Amazon EKS-optimized AMI, which differs across regions. That difference alone opens up inconsistencies between the two implementations.
  • There is very little one can do at the AWS Console for EKS. Most of the work is done at the CLI or programmatically. For example, there is no way in the AWS Console to check the status of worker nodes; you have to use the kubectl get nodes --watch command.
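For reference, the kubeconfig entries I had to fill in by hand for EKS look roughly like this (all values are placeholders; the exec section reflects the aws-iam-authenticator flow EKS used at the time, and your cluster name and endpoint will differ):

```yaml
apiVersion: v1
kind: Config
clusters:
- name: my-eks-cluster
  cluster:
    server: https://<API-server-endpoint>      # "API server endpoint" in EKS
    certificate-authority-data: <base64-CA>    # "Certificate Authority" in EKS
contexts:
- name: my-eks-cluster
  context:
    cluster: my-eks-cluster
    user: aws-iam-user
current-context: my-eks-cluster
users:
- name: aws-iam-user
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      command: aws-iam-authenticator
      args: ["token", "-i", "my-eks-cluster"]
```

In GKE, the single get-credentials command populates all of these fields for you.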

The sample GKE Frontend/Guestbook code was cloned from GitHub. After setting it up, it gave this output:

umairhoodbhoy@pakdude713:~$ kubectl get pods
NAME                 READY     STATUS    RESTARTS   AGE
frontend-qgghb       1/1       Running   0          23h
frontend-qngcj       1/1       Running   0          23h
frontend-vm7nq       1/1       Running   0          23h
redis-master-j6wc9   1/1       Running   0          23h
redis-slave-plc59    1/1       Running   0          23h
redis-slave-r664d    1/1       Running   0          23h
umairhoodbhoy@pakdude713:~$ kubectl get rc
frontend       3         3         3         23h
redis-master   1         1         1         1d
redis-slave    2         2         2         23h
umairhoodbhoy@pakdude713:~$ kubectl get services
NAME           TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)        AGE
frontend       LoadBalancer   80:30549/TCP   23h
kubernetes     ClusterIP     <none>           443/TCP        1d
redis-master   ClusterIP   <none>           6379/TCP       1d
redis-slave    ClusterIP    <none>           6379/TCP       23h

GCP GKE Guestbook Tutorial

The sample EKS Guestbook app was also cloned from GitHub. After setting it up, it gave this output:

hoodbu@macbook-pro /AWS (608) kubectl get pods
NAME                 READY     STATUS    RESTARTS   AGE
guestbook-9lztg      1/1       Running   0          3h
guestbook-bb7md      1/1       Running   0          3h
guestbook-gx6sr      1/1       Running   0          3h
redis-master-qhk8h   1/1       Running   0          3h
redis-slave-7jlpb    1/1       Running   0          3h
redis-slave-8hsmg    1/1       Running   0          3h
hoodbu@macbook-pro /AWS (609) kubectl get rc
guestbook      3         3         3         3h
redis-master   1         1         1         3h
redis-slave    2         2         2         3h
hoodbu@macbook-pro /AWS (610) kubectl get services
NAME           TYPE           CLUSTER-IP      EXTERNAL-IP        PORT(S)          AGE
guestbook      LoadBalancer     affaf9aae9039...   3000:30189/TCP   3h
kubernetes     ClusterIP      <none>             443/TCP          5h
redis-master   ClusterIP   <none>             6379/TCP         3h
redis-slave    ClusterIP   <none>             6379/TCP         3h
hoodbu@macbook-pro /AWS (611) 

AWS EKS Guestbook Tutorial

Obviously, these are just simple Guestbook applications and I only covered the ease of setting up Kubernetes. A more relevant measure of Kubernetes on either cloud platform would be the performance and scalability. I had no easy way of stress testing the apps. However, after going through this exercise in both AWS and GCP, it is clear where GCP’s strengths lie. AWS may be the dominant Public Cloud player, but launching EKS despite having its own Elastic Container Service (ECS) is an indication of how popular Kubernetes is as a container orchestration system. Running Kubernetes on AWS is incredibly cumbersome compared to running it on GCP. EKS is relatively new and I’m sure AWS will iron out the kinks in the months to come.

Creating a Simple Two-Tier App in GCP and AWS

This post details my experience with creating a simple two-tier app using Google Cloud Platform (GCP) as well as Amazon Web Services (AWS) by taking advantage of the free tier accounts for each service.

GCP has a few easy-to-follow tutorials for their main services. For Google Compute Engine (GCE), the tutorial walks you through creating a two-tier app in which the Frontend VM is a Node.js ToDo Web app and the Backend VM runs MongoDB. For either service, I followed these steps:

  1. Create and configure VMs in GCE or Amazon Elastic Compute Cloud (EC2).
  2. Connect via SSH to each VM to install appropriate packages and run the relevant services.

In GCP, I started by creating each VM instance separately.

In AWS, however, EC2 let me specify 2 instances to create simultaneously with the identical settings other than the IP addresses.

First, I created the Backend VM that ran MongoDB. The only customizations required were selecting a Micro instance (to incur the fewest charges on my Free Tier account), specifying Ubuntu 14.04 LTS as the OS, and opening up HTTP so the Frontend and Backend VMs could communicate. In AWS, I did this by applying a Security Group to the instances that opened up port 80.

Next, I created the Frontend VM that runs the Node.js ToDo application with the same customizations that I applied for the Backend VM.

Once both VM instances were created, I connected to them via SSH.

GCP offers a built-in browser-based terminal utility called Cloud Shell as an alternative to Terminal (Mac OS) or PuTTY (Windows).

gcloud compute --project "pakdude713" ssh --zone "us-east1-b" "backend"

But, for AWS, I used Terminal on my Macbook:

hoodbu@macbook-pro /AWS (502) ssh ubuntu@ -i my-us-east-keypair.pem 

From there, I updated the packages first:

umairhoodbhoy@backend:~$ sudo apt-get update

Then, I installed MongoDB:

umairhoodbhoy@backend:~$ sudo apt-get install mongodb

Next, I configured MongoDB. But first, I had to stop the service:

umairhoodbhoy@backend:~$ sudo service mongodb stop

Then I created a directory for MongoDB and ran the MongoDB service in the background on port 80:

umairhoodbhoy@backend:~$ sudo mkdir $HOME/db
umairhoodbhoy@backend:~$ sudo mongod --dbpath $HOME/db --port 80 --fork --logpath /var/tmp/mongodb

That concludes the work needed on the Backend VM. For the Frontend VM, I SSH’d to the instance and updated the packages first:

umairhoodbhoy@frontend:~$ sudo apt-get update

Next, I installed Git and Node.js on the Frontend VM:

umairhoodbhoy@frontend:~$ sudo apt-get install git nodejs

Then, I installed and ran the Frontend web app by cloning the sample application. The sample application lives in a GCP repository on GitHub, and I used it for the GCP experiment as well as the AWS one:

umairhoodbhoy@frontend:~$ git clone

Next, I installed application dependencies and prepared the web server for port 80 instead of the default 8080:

umairhoodbhoy@frontend:~$ cd todomvc-mongodb; npm install
umairhoodbhoy@frontend:~/todomvc-mongodb$ sed -i -e 's/8080/80/g' server.js

Finally, I was ready to run the ToDo app on the Frontend VM:

umairhoodbhoy@frontend:~/todomvc-mongodb$ sudo nohup nodejs server.js --be_ip --fe_ip &

These IP addresses are the private IP addresses assigned to the GCE instances. For AWS, I replaced them with the ones assigned to the EC2 instances.

And that’s it! In GCP, I launched the ToDo app by visiting

In AWS, I launched the same app by visiting