Introducing ACE Cloud Operations

Recently Aviatrix developed a new course in the Aviatrix Certified Engineer (ACE) program. Aviatrix Certified Engineer – Multi-Cloud Network Operations (or ACE Cloud Ops for short) is catered towards cloud operations practitioners who need to successfully run, operate, and manage business-critical Day-2 workloads in the cloud.

The ACE program recently announced its 10,000th certified engineer. That’s a phenomenal achievement considering our stretch goal for the year 2020 was only 500. It’s amazing how Covid 19 has resulted in expanding our reach to hundreds of students per week.

ACE Cloud Ops takes a unique view on operating cloud infrastructure, which is necessarily different from operating on-prem infrastructure.

Operations in the On-Prem World

In the On-prem world, enterprises own the underlay. They have full control over traffic patterns and have a familiar toolkit regardless of what vendor they use on-prem.

Of course some tools, such as SNMP died away, but ICMP-based tools such Ping and Traceroute are still going strong 40 years after RFC 792. IP doesn’t go away when you move to the cloud and neither should the network engineering toolkit.

Key skills for Infrastructure Operations engineers include:

  • Hardware (knowledge of cables, transceivers, switches, routers, racks, real estate, physical security, power, cooling)
  • Layer 2 (Spanning Tree is the worst use of an Operations Engineer’s time)
  • OSPF, BGP
  • Repeatability achieved by scripting tools such as Expect (which is really screen-scraping), Shell, Perl, Python (still invaluable). This is not true automation.

Capacity planning in the on-prem world often involves ordering the right number of spares to plan for outages, so that there is some form of high availability, although it does result in higher RPOs and RTOs.

We all know the financial benefits (when done well) of moving apps to the cloud. But while it offers great agility for Developers (you can  spin up a database within minutes), networking has been slow to catch up. Moreover, as we see a rapid shift towards multi-cloud, Operations teams are left on their own without guidance.

Operations in the Cloud World

Operations engineers have a harder time doing their job because of the lack of toolsets afforded to them by Cloud Service Providers (CSPs). Each CSP has proprietary tools that are intended to keep their customers locked into their cloud. Moreover, networking is not a source of revenue for CSPs. They don’t make networking easy and their networking tools are, simply put, not enterprise-ready. 

For example, consider what it takes just to view a route table in Azure. An intuitive approach would be to list the routes from the VNet or at least have a direct link to it. However, you would be mistaken into thinking that way.

Instead, buried in a list of connected devices in that VNet, you have to select the appropriate NIC, which may have an obscure ID.

Next, you have to select an even more obscure term called ‘Effective Routes’

Only then can you see the routes.

It is a very clunky approach to a routine task in the On-prem world. Of course the problem grows exponentially when having to deal with the oddities of each cloud when the enterprise goes multi-cloud. Each CSP abandons the networking toolkit and offers their platform as a blackbox to Operations teams.

When moving to the cloud, an Operations Engineer must have these new skills at a minimum:

  • Agile mindset
  • Infrastructure as Code (read Terraform)
  • CI/CD
  • VCS

Capacity planning takes place with cloud-native principles, such as elasticity and auto-scaling. It requires a new way of thinking, not just for Developers, but also for Operations teams. 

ACE Cloud Ops

The ACE Cloud Ops course better equips Cloud Operations teams  to run a multi-cloud network in their daily jobs. It builds on the immensely popular ACE program with some of the most common use cases we see our customers when operating in any cloud:

  • How to Ensure Business Continuity with an Enterprise-class Transit Solution
  • How to Strengthen Compliance and Audit Initiatives by providing Monitoring and Troubleshooting for Cloud Security Appliances
  • How to Efficiently Connect Remote Sites to Cloud
  • How to Improve your Cloud Egress Security posture
  • Best Practices for Platform Operations Management
  • DevOps for Network Engineers

There are also hands on labs focused on break-fix scenarios that are based on this topology:

The source code of the Terraform that built this topology is here.

ACE Associate is a pre-requisite for ACE Cloud Ops. 

Submit interest for taking ACE Cloud Ops here.

Sponsored Post Learn from the experts: Create a successful blog with our brand new courseThe WordPress.com Blog

WordPress.com is excited to announce our newest offering: a course just for beginning bloggers where you’ll learn everything you need to know about blogging from the most trusted experts in the industry. We have helped millions of blogs get up and running, we know what works, and we want you to to know everything we know. This course provides all the fundamental skills and inspiration you need to get your blog started, an interactive community forum, and content updated annually.

AppIQ – Unprecedented visibility that Aviatrix CoPilot brings

Earlier in my career, I worked as a Network Engineer in the high-frequency trading industry at a capital market exchange. It was the time when electronic trading was gaining heavy momentum as open outcry was receding. This was thanks mainly in part to vendors such as Arista who leveraged merchant silicon from Broadcom to lead the charge of low-latency networking.

Scores of trading firms would set up their equipment in one of the exchange’s many data centers inside the building to practice latency arbitrage. Speed was the name of the game and livelihoods were hedged on the network’s ability to pass packets as quickly as possible.

In the early days, any time there was a significant delay (could be as low as 1-2 seconds), the exchange would get hit with hefty fines. However, if we could prove that it was not the fault of the network, but rather the application that caused a trade to execute slowly, then we were off the hook. So my team invested in several network taps and sniffers from NETSCOUT and Gigamon to perform forensic analysis on these low-latency, high-throughput financial systems.

But there were never enough taps. Taps allowed us to pinpoint the location and cause of delays and retransmissions if we were lucky enough to have placed them at the exact spot in the network where the delay was incurred. It was like a playing a game of whack-a-mole. Providing evidential data was a nightmare in those days. There was such little visibility.

Did I mention we owned the entire network?

Fast forward to public clouds today which are complete black boxes. They provide very little visibility and the network has no way to prove it is not at fault because there have been no tools that are able to extract meaningful data until Aviatrix CoPilot came along. It already had the ability to display NetFlow records to provide such empirical data. Take this screenshot as an example.

If I were to see a flow with a few SYNs coming in, for example, I could use that information to ask the Application team whether everything is okay on their end. Or if I see a SYN followed immediately by a RST, that might point in the direction of a firewall blocking something. Or maybe if PSH packets are going through fine and data is being passed for a while, it might be another indication of the network doing its job and the application developer needing to be pulled in. It’s a very powerful feature.

But with the new AppIQ feature released this week in CoPilot, visibility is taken to the next level. AppIQ allows you to generate a comprehensive report of latency, traffic, and performance monitoring data between any two cloud instances connected via your Aviatrix transit network, such as shown here with an SSH test.

Now you can see latencies on a hop-by-hop basis. AWS us-east-1 (N. Virginia) to us-east-2 (Ohio) regions are about 12 ms away on average. And each of those green links represents an encrypted tunnel.

End-to-end encryption in the cloud with the visibility: that’s what every network engineer dreams of having.

Building a Multi-Cloud Network for less than $1 an Hour – Aviatrix Kickstart

This is the post I had been meaning to write for ages. How do you leverage Infrastructure as Code to build a multi-cloud network? It turns out you don’t have to write the code yourself. This is the beauty of Aviatrix Kickstart.

For less than $1 an hour, I was able to build a multi-cloud transit network with 2 spoke VPCs in AWS, 2 spoke VNets in Azure, a transit VPC in AWS, a transit VNet in Azure, and a peered connection between the two. Oh, and also 2 EC2 test instances in AWS for various tests. All within an hour.

The quickest way to build this would have been to script it in Terraform. But with Kickstart, a containerized environment handles this for you. So you don’t need to have Terraform skills or write any code.

Kickstart is a Docker image that you can download from here. All you need to run it are:

  • Docker. You can do this either by installing Docker Desktop on your client PC (Windows or Mac) or running it on an AWS EC2 Linux 2 AMI.
  • An AWS account and optionally an Azure account. I built the entire environment in AWS and Azure Free Tier accounts.

Once you run docker run -it aviatrix/kickstart bash, the script walks you through the process and allows you to configure the region, names of the resources, and CIDRs of the Aviatrix Controller as well as the Multi-Cloud Network Architecture (MCNA) Transits. It then leverages Terraform to issue several Day-Zero activities. Aviatrix Kickstart makes it so easy that I was able to build the following architecture in less than an hour without writing a line of code.

Check out these resources for more details:

  • Guide to building the environment
  • 10-minute video of demo
  • Test cases to try out once you’ve built the environment

Why I Joined Aviatrix

Earlier this month I joined Aviatrix Systems as a Solutions Architect with a focus on growing the Aviatrix Certified Engineer (ACE) program. I had gone through a journey of 2 years of immersing myself in Public Cloud platforms from training sites, such as A Cloud Guru and Linux Academy. Here are some of my observations during that period which led to my decision to join Aviatrix:

  • Cloud Networking is radically different from on-premises networking. For example,
    • In the on-prem world, network architects designed in layers (Core, Aggregation/Access). The world of Public Cloud is flat in order to meet the pace of DevOps.
    • Security principles, such as Defense-in-Depth have led to new constructs, such as IAM, Accounts, Organizations, Subscriptions, which were not prevalent in the on-prem world.
    • Cloud Vendors try their best to abstract the networking underlay constructs so that networking is represented as a black box to the cloud architect. To a certain extent they’ve done well (who honestly misses Spanning Tree?), but just because they don’t offer a mechanism to view these constructs, it doesn’t mean they no longer exist. In fact, Operations needs better visibility now than they did in the on-prem world.
  • While Cloud Vendors offer Networking Specialty certifications, they don’t provide any visibility into Day 2 Operations. And from an Architecture perspective, they trivialize the networking underlay. For example, they don’t provide solutions to real-world problems like overlapping subnets or end-to-end visibility.
  • Cloud vendors are incentivized by lock-in and have no real motivation for multi-cloud.
  • Enterprises find it easier to interpret multi cloud mostly in terms of governance and billing rather than infrastructure.
  • Cloud Training platforms such as A Cloud Guru and Udemy completely lack multi-cloud networking offerings. They have training courses on various cloud-first tools and technologies like Terraform, CloudFormation, Deployment Manager, Docker, Kubernetes, and certification courses for AWS, Azure, and GCP. But when it comes to multi cloud let alone multi cloud networking, they have not yet capitalized on the opportunity.
  • Enterprises need better instruction on the need for multi-cloud networking. Often when Enterprises say they need Cloud Infrastructure Architects, they really mean Cloud Application Architects. Yet, when they cross that bridge of multi-cloud (and they almost inevitably will), then they realize that application performance relies on a rock solid transit. And that is where Aviatrix shines.

Aviatrix is the pioneer in multi-cloud networking and is solving a really hard problem the right way – by simplifying. I’m looking forward to sharing some more of my learnings with you as I embark on this new journey.

What’s the Big Deal About Multi-Cloud Networking – Part 2

If you were experiencing issues with Zoom calls today, you were not alone.

But if you take a close look at today’s outage, it is clear that it was correlated with an AWS outage today.

In fact, most of Zoom runs on AWS, according to AWS. This is despite Oracle’s claim that millions of users run Zoom on Oracle Cloud. Zoom didn’t state the cause of the outage, but it is quite possible from these two charts that a well-architected transit network, such as the Aviatrix Multi-Cloud Network Architecture, could have prevented this outage.

EtherealMind

Software Defined & Intent Based Networking

ipSpace.net Blog Posts

Musings on Cloud, Multi-Cloud, Networking