Category Archives: Security

AppIQ – Unprecedented visibility that Aviatrix CoPilot brings

Earlier in my career, I worked as a Network Engineer in the high-frequency trading industry at a capital market exchange. It was a time when electronic trading was gaining heavy momentum as open outcry receded, thanks in large part to vendors such as Arista, which leveraged merchant silicon from Broadcom to lead the charge in low-latency networking.

Scores of trading firms would set up their equipment in one of the exchange’s many data centers inside the building to practice latency arbitrage. Speed was the name of the game, and livelihoods hinged on the network’s ability to pass packets as quickly as possible.

In the early days, any time there was a significant delay (which could be as little as 1–2 seconds), the exchange would get hit with hefty fines. However, if we could prove that it was not the fault of the network, but rather the application, that caused a trade to execute slowly, then we were off the hook. So my team invested in several network taps and sniffers from NETSCOUT and Gigamon to perform forensic analysis on these low-latency, high-throughput financial systems.

But there were never enough taps. Taps let us pinpoint the location and cause of delays and retransmissions, but only if we were lucky enough to have placed them at the exact spot in the network where the delay was incurred. It was like playing a game of whack-a-mole. Providing evidential data was a nightmare in those days; there was so little visibility.

Did I mention we owned the entire network?

Fast forward to the public clouds of today, which are complete black boxes. They provide very little visibility, and the network has had no way to prove it is not at fault because, until Aviatrix CoPilot came along, there were no tools able to extract meaningful data. CoPilot already had the ability to display NetFlow records to provide exactly that kind of empirical data. Take this screenshot as an example.

If I were to see a flow with a few SYNs coming in and nothing else, for example, I could use that information to ask the application team whether everything is okay on their end. If I saw a SYN followed immediately by a RST, that might point in the direction of a firewall blocking something. And if PSH packets are going through fine and data has been passing for a while, that is another indication that the network is doing its job and the application developer needs to be pulled in. It’s a very powerful feature.
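For anyone who wants to reason about these flags outside CoPilot, the same signals can be read straight off the TCP header. A minimal sketch (flag bit values per the TCP specification; the capture command in the comment is an on-prem equivalent and assumes root and a live interface):

```shell
# TCP flag bits live in byte 13 of the TCP header ("tcp[13]" in tcpdump filter syntax).
FIN=1; SYN=2; RST=4; PSH=8; ACK=16

# The flag combinations discussed above, expressed as tcpdump filter values:
printf 'Bare SYN (connection attempt, no answer yet): tcp[13] == %d\n' "$SYN"
printf 'SYN+ACK (server responded):                   tcp[13] == %d\n' $((SYN + ACK))
printf 'RST (possible firewall reset):                tcp[13] & %d != 0\n' "$RST"
printf 'PSH+ACK (data flowing normally):              tcp[13] == %d\n' $((PSH + ACK))

# On-prem equivalent capture (needs root and a live interface), e.g.:
#   tcpdump -i eth0 'tcp[13] & 4 != 0'   # every RST on the wire
```

Back in the tap-and-sniffer days, filters like these were exactly how we isolated a misbehaving flow; CoPilot surfaces the same information without the capture infrastructure.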

But with the new AppIQ feature released this week in CoPilot, visibility is taken to the next level. AppIQ allows you to generate a comprehensive report of latency, traffic, and performance-monitoring data between any two cloud instances connected via your Aviatrix transit network, as shown here with an SSH test.

Now you can see latencies on a hop-by-hop basis. The AWS us-east-1 (N. Virginia) and us-east-2 (Ohio) regions are about 12 ms apart on average, and each of those green links represents an encrypted tunnel.

End-to-end encryption in the cloud with full visibility: that’s what every network engineer dreams of having.

What’s the Big Deal About Multi-Cloud Networking – Part 2

If you were experiencing issues with Zoom calls today, you were not alone.

But if you take a close look at the incident, it clearly correlates with an AWS outage that occurred the same day.

In fact, most of Zoom runs on AWS, according to AWS, despite Oracle’s claim that millions of users run Zoom on Oracle Cloud. Zoom didn’t state the cause of the outage, but from these two charts it is quite possible that a well-architected transit network, such as the Aviatrix Multi-Cloud Network Architecture, could have prevented it.

Bringing Reference Architectures to Multi-Cloud Networking

Recently I attended Aviatrix Certified Engineer training to better understand multi-cloud networking and how Aviatrix is trying to solve its many problems, some of which I have experienced first-hand. Disclaimer: Since 2011, I’ve been an avid listener of the Packet Pushers podcast, where Aviatrix has sponsored 3 shows since December 2019.

Ever since I embarked on the public cloud journey, I have noticed that each of the big 4 vendors (AWS, Azure, GCP, and OCI) approaches networking in the cloud differently from how it has been done on-premises. They do share many similarities, such as:

  • The concept of a virtual Data Center (VPC in AWS and GCP, VNET in Azure, VCN in OCI).
  • Abstracting Layer 2 away from the user as much as possible (no mention of Spanning Tree or ARP anywhere), despite the fact that these protocols never went away.

However, there are many differences as well, such as this one:

  • In AWS, subnets have zonal scope – each subnet must reside entirely within one Availability Zone and cannot span zones.
  • In GCP, subnets have regional scope – a subnet may span multiple zones within a region.
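The zonal-versus-regional difference shows up directly in each provider’s CLI. A hedged sketch with hypothetical IDs and names (the commands are only printed here, since actually running them requires configured credentials and existing networks):

```shell
# AWS: create-subnet takes an Availability Zone; the subnet lives in that one AZ.
aws_cmd='aws ec2 create-subnet --vpc-id vpc-0abc --cidr-block 10.0.1.0/24 --availability-zone us-east-1a'

# GCP: a subnet is created per region and is usable from every zone in that region.
gcp_cmd='gcloud compute networks subnets create app-subnet --network=app-net --region=us-east1 --range=10.0.1.0/24'

printf '%s\n%s\n' "$aws_cmd" "$gcp_cmd"
```

Note how the AWS command pins the subnet to `us-east-1a`, while the GCP command never mentions a zone at all.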

Broadly speaking, the major Cloud Service Providers (CSPs) do a fairly decent job with their documentation, but they don’t make it easy to connect clouds together. They give you plenty of rope to hang yourself, and you end up on your own. Consequently, your multi-cloud network design ends up being unique – a snowflake.

In the pre-public-cloud, on-premises world, we would never have gotten far if it weren’t for reference designs. Whether it was the 3-tier Core/Aggregation/Access design that Cisco came out with in the late 1990’s, or the more scalable spine-leaf fabric designs that followed a decade later, there has always been a need for cookie-cutter blueprints for enterprises to follow. Otherwise they end up reinventing the wheel and being snowflakes. And as any network engineer worth their salt will tell you, networking is the plumbing of the Internet, of a data center, of a campus – and that is also true of an application that needs to be built in the cloud. You don’t appreciate it when it is performing well, only when it is broken.

What exacerbates things is that the leading CSP, AWS, does not even acknowledge multiple clouds. In their documentation, they write as if Hybrid IT only means the world of on-premises and AWS. There is only one cloud in AWS’ world, and that is AWS. But the reality is that there is a growing need for enterprises to be multi-cloud – for example, needing the IoT capabilities of AWS but also the AI/ML capabilities of GCP, or starting on one cloud but later needing a second because of a merger, acquisition, or partnership. Under such circumstances, an organization has to consider multi-cloud, but in the absence of a common reference architecture, the network becomes incredibly complex and brittle.

Enter Aviatrix with its Multi-Cloud Network Architecture (MCNA). This is a repeatable, 3-layered architecture that abstracts away the complexity of the cloud-native components, regardless of which CSPs are being used. The most important of the 3 layers is the Transit Layer, as it handles intra-region, inter-region, and inter-cloud connectivity.

Aviatrix Multi-Cloud Network Architecture (MCNA)

Transitive routing is a feature that none of the CSPs supports natively. Without it, you need full-mesh designs, which may work fine for a handful of VPCs. But full mesh is an N² problem (more precisely, N(N-1)/2 links), which does not scale well in distributed systems. In AWS, customers used to have to address this entirely on their own with Transit VPCs, which were very difficult to manage. In an attempt to address the problem with a managed service, AWS announced Transit Gateways at re:Invent 2018, but that doesn’t solve the entire problem either: with Transit Gateways (TGW), a peered VPC sends its routes to the TGW it is attached to, but the TGW does not automatically redistribute those routes to the other VPCs attached to it. The repeatable design of the Aviatrix MCNA solves this and many other multi-cloud networking problems.
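The scaling math is easy to sanity-check. A quick sketch of how fast full-mesh peering grows (my own arithmetic for illustration, not an Aviatrix tool):

```shell
# N fully meshed VPCs need N(N-1)/2 point-to-point peerings.
mesh_links() { echo $(( $1 * ($1 - 1) / 2 )); }

for n in 4 10 50 100; do
  printf '%4d VPCs -> %5d peerings\n' "$n" "$(mesh_links "$n")"
done
# 4 VPCs need only 6 peerings, but 100 VPCs already need 4950.
```

Each of those peerings is a route table and security policy to maintain, which is why full mesh collapses under its own weight long before you reach 100 VPCs.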

Aviatrix has a broad suite of features. The ones from the training that impressed me the most were:

  • Simplicity of solution – This is a born-in-the-cloud solution whose components are:
    • a Controller that can even run on a t2.micro instance
    • a Gateway that handles the Data Plane and can scale out or up
    • Cloud native constructs, such as VPC/VNET/VCN
  • High Performance Encryption (HPE) – This is ideal for enterprises that, for compliance reasons, require end-to-end encryption. Throughput when encrypting a private AWS Direct Connect, Azure ExpressRoute, GCP Cloud Interconnect, or OCI FastConnect link cannot exceed 1.25 Gbps, because virtual routers utilize a single core and establish only 1 IPSec tunnel. So even if you are paying for 10 Gbps, you are limited by IPSec performance to 1.25 Gbps. Aviatrix HPE achieves line-rate encryption by spreading traffic across multiple tunnels with ECMP.
  • CloudWAN – This takes advantage of the existing investment that enterprises have poured into Cisco WAN infrastructure. When such organizations need to connect branches to apps running in the cloud with optimal latency, Aviatrix CloudWAN is able to log in to these Cisco ISRs and configure VPN and BGP appropriately so that they connect to an Aviatrix Transit Gateway via the AWS Global Accelerator service for the shortest-latency path to the cloud.
  • Smart SAML User VPN – I wrote a post on this here.
  • Operational Tools – FlightPath is the coolest multi-cloud feature I have ever seen. It is an inter-VPC/VNET/VCN troubleshooting tool that retrieves and displays the Security Groups, route table entries, and Network ACLs along every VPC the traffic traverses, so you can pinpoint where a problem exists along the data path. Doing this manually would involve roughly 25 data points to investigate (and that doesn’t even include multi-cloud, multi-region, and multi-account scenarios). FlightPath automates all of it. Think traceroute for multi-cloud.
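To put the HPE point above in numbers: if a single IPSec tunnel tops out at 1.25 Gbps, filling a fatter pipe means spreading flows across parallel tunnels via ECMP. A back-of-the-envelope sketch (the 1.25 Gbps per-tunnel figure is from the bullet above; the ceiling-division helper is my own illustration):

```shell
# Tunnels needed to fill a link when each IPSec tunnel caps at 1.25 Gbps.
# Works in hundredths of a Gbps to stay within shell integer arithmetic.
tunnels_needed() {                 # $1 = link speed in whole Gbps
  local link=$(( $1 * 100 ))       # link speed in hundredths of a Gbps
  local tunnel=125                 # 1.25 Gbps per IPSec tunnel
  echo $(( (link + tunnel - 1) / tunnel ))   # ceiling division
}

for gbps in 1 10 40; do
  printf '%3d Gbps link -> %2d parallel tunnels\n' "$gbps" "$(tunnels_needed "$gbps")"
done
# A 10 Gbps Direct Connect needs 8 tunnels; a 40 Gbps link needs 32.
```

This is the gap HPE closes: instead of you hand-building and load-balancing those tunnels, the gateway fans traffic out across them automatically.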

In the weeks and months to come, I’m hoping to get my hands dirty with some labs and write about my experience here.

Remote User Access in the Era of COVID-19

The worldwide lockdown due to COVID-19 has given me an opportunity to reflect on many aspects of life and work. Nowadays I’m helping companies and non-profits enable secure remote-access work (i.e. not site-to-site VPN). I was looking into enterprise-grade solutions for secure remote-user VPN access when I came across the Smart SAML Remote User VPN solution from Aviatrix.

I have prior experience with inexpensive/free solutions such as Libreswan for site-to-site IPSec VPN and OpenVPN for site-to-site SSL VPN. While OpenVPN also handles remote-user VPN, I haven’t come across many solutions that can also handle SAML. SAML stands for Security Assertion Markup Language and, simply put, is a way of allowing identity providers to pass authorization credentials to service providers for Single Sign-On (SSO). Facebook is a common example of an identity provider. One of the best write-ups I’ve seen explaining how SAML works is on Duo’s site.

Brief tangent: in mid-2011, Spotify launched a 6-month ad-free trial period in the USA. I signed up for it using my Facebook account as the identity provider. In January 2012, I converted my account to Premium. Nine years and hundreds of playlists later, I am still a Spotify Premium member, but because of the notoriety Facebook has gained from its stance on privacy, I’ve wanted to dissociate my Spotify account from Facebook, only to learn the hard way that “If your Spotify account was created on Facebook, you can’t disconnect from Facebook.”

Of course, while many end-users use Facebook, LinkedIn, or Google as an identity provider so that they don’t have to create multiple accounts, the more common solutions used by enterprises are Okta, Duo, and Microsoft Active Directory. Enterprises often use commercial remote-access VPN clients that correspond to the VPN concentrator of their choice. Alternatively, they may use open-source clients, such as OpenVPN.

Aviatrix has an OpenVPN client that supports SAML authentication through enterprise-grade identity providers. The solution enables remote access for employees, customers, and partners who need to reach private company resources residing in public clouds as well as on-premises applications. Aviatrix actually has a promotion through June 2020 to credit organizations that use this solution. For a list of other offers and promotions from tech companies, visit this page on Packet Pushers.

I had heard of Aviatrix for a couple of years as a leader in multi-cloud networking. I’ll save some of those thoughts for my next post.

Avoiding Shellshock in Mac OS X

Shellshock is a vulnerability in bash (the shell that ships with Mac OS X) that surfaced in late September 2014 and has the potential to do more harm than Heartbleed, which made headlines in April 2014. Apple ships OS X with an old version of bash. According to this site, Shellshock can potentially be used to execute arbitrary code via specially crafted environment variables that are passed to child processes. What follows is my approach to hardening my MacBook.

You know you are vulnerable in OS X if running the following at a Terminal prompt prints “vulnerable”:

hoodbu@pakdude-mbp /~ (499) env x='() { :;}; echo vulnerable' bash -c "echo this is a test"
vulnerable
this is a test
hoodbu@pakdude-mbp /~ (500)

This is because of the version of bash that I had on my MacBook:

hoodbu@pakdude-mbp /~ (501) bash --version
GNU bash, version 3.2.51(1)-release (x86_64-apple-darwin13)
Copyright (C) 2007 Free Software Foundation, Inc.
hoodbu@pakdude-mbp /~ (502)

Following the instructions given at Stack Exchange, I ran the following:

hoodbu@pakdude-mbp /~ (528) mkdir bash-fix
hoodbu@pakdude-mbp /~ (529) cd bash-fix/
hoodbu@pakdude-mbp /bash-fix (530) curl https://opensource.apple.com/tarballs/bash/bash-92.tar.gz | tar zxf -
-bash: /sw/bin/tar: Bad CPU type in executable
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0 4088k    0 16384    0     0   4927      0  0:14:09  0:00:03  0:14:06  4927
curl: (23) Failed writing body (0 != 16384)
hoodbu@pakdude-mbp /bash-fix (531)

This error was because of my version of ‘tar’: ‘/sw/bin/tar’ is a PowerPC-only binary, probably left over from a PowerPC-based Mac I once owned. After migrating to a new machine many years ago, that copy of ‘tar’ somehow never got updated.

hoodbu@pakdude-mbp /bash-fix (534) /usr/bin/tar --version
bsdtar 2.8.3 - libarchive 2.8.3
hoodbu@pakdude-mbp /bash-fix (535) tar --version
-bash: /sw/bin/tar: Bad CPU type in executable
hoodbu@pakdude-mbp /bash-fix (536) which tar
/sw/bin/tar

So I just used ‘/usr/bin/tar’ and decided to deal with ‘/sw/bin/tar’ later. Moving on:

hoodbu@pakdude-mbp /bash-fix (537) curl https://opensource.apple.com/tarballs/bash/bash-92.tar.gz | /usr/bin/tar zxf -
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 4088k  100 4088k    0     0   603k      0  0:00:06  0:00:06 --:--:--  607k
hoodbu@pakdude-mbp /bash-fix (539) cd bash-92/bash-3.2
hoodbu@pakdude-mbp /bash-3.2 (540) curl https://ftp.gnu.org/pub/gnu/bash/bash-3.2-patches/bash32-052 | patch -p0
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3250  100  3250    0     0   2041      0  0:00:01  0:00:01 --:--:--  2042
patching file builtins/common.h
patching file builtins/evalstring.c
patching file variables.c
patching file patchlevel.h
hoodbu@pakdude-mbp /bash-3.2 (541) curl http://alblue.bandlem.com/bash32-053.patch | patch -p0
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1118  100  1118    0     0    803      0  0:00:01  0:00:01 --:--:--   803
patching file parse.y
patching file patchlevel.h
hoodbu@pakdude-mbp /bash-3.2 (542) cd ..
hoodbu@pakdude-mbp /bash-92 (543) xcodebuild
xcode-select: note: no developer tools were found at '/Applications/Xcode.app', requesting install. Choose an option in the dialog to download the command line developer tools.

Apparently I had an ‘xcodebuild’ binary, but not the way Apple wants it. So I installed Xcode from the App Store. At 2.46 GB, it took a while to download, but after installing it, running it as sudo, and agreeing to the EULA, the rest was straightforward:

hoodbu@pakdude-mbp /bash-92 (544) xcodebuild
Agreeing to the Xcode/iOS license requires admin privileges, please re-run as root via sudo.
hoodbu@pakdude-mbp /bash-92 (545) sudo xcodebuild
Password:
You have not agreed to the Xcode license agreements. You must agree to both license agreements below in order to use Xcode.
Hit the Enter key to view the license agreements at '/Applications/Xcode.app/Contents/Resources/English.lproj/License.rtf'
<long EULA skipped>
hoodbu@pakdude-mbp /bash-92 (547) sudo xcodebuild
<long output skipped>
** BUILD SUCCEEDED **
hoodbu@pakdude-mbp /bash-92 (548) sudo cp /bin/bash /bin/bash.old
hoodbu@pakdude-mbp /bash-92 (549) sudo cp /bin/sh /bin/sh.old
hoodbu@pakdude-mbp /bash-92 (550) build/Release/bash --version # GNU bash, version 3.2.53(1)-release
GNU bash, version 3.2.53(1)-release (x86_64-apple-darwin13)
Copyright (C) 2007 Free Software Foundation, Inc.
hoodbu@pakdude-mbp /bash-92 (551) build/Release/sh --version   # GNU bash, version 3.2.53(1)-release
GNU bash, version 3.2.53(1)-release (x86_64-apple-darwin13)
Copyright (C) 2007 Free Software Foundation, Inc.
hoodbu@pakdude-mbp /bash-92 (552) sudo cp build/Release/bash /bin
hoodbu@pakdude-mbp /bash-92 (553) sudo cp build/Release/sh /bin
hoodbu@pakdude-mbp /bash-92 (554) bash --version
GNU bash, version 3.2.53(1)-release (x86_64-apple-darwin13)
Copyright (C) 2007 Free Software Foundation, Inc.
hoodbu@pakdude-mbp /bash-92 (555)

Finally, here is the indication that my MacBook is no longer vulnerable to Shellshock:

hoodbu@pakdude-mbp /bash-92 (555) env x='() { :;}; echo vulnerable' bash -c 'echo hello'
bash: warning: x: ignoring function definition attempt
bash: error importing function definition for `x'
hello
hoodbu@pakdude-mbp /bash-92 (556)

I hope you find this useful.