Today I’m extremely happy to announce the launch of the Rafay Certification program – the industry’s first and only multi-cloud Kubernetes operations certification. This is a unique program for platform teams, infrastructure engineers, SREs, and application developers to develop competencies in application modernization using Kubernetes.
Let’s face it. Kubernetes is difficult! Enterprises are finding it hard to translate the skills learned in the Certified Kubernetes Administrator (CKA) exam offered by CNCF to large-scale environments. It’s hard enough managing a single cluster; managing hundreds of clusters with enterprise-grade best practices, governance, and security can be a daunting task.
But don’t hate k8s. This is where Rafay helps, and where the Rafay Certification program comes in: it provides ongoing education that enables customers to gain rapid efficiencies from Kubernetes in their digital transformation initiatives.
We have partnered with Credly by Pearson to build this program. Credly provides the digital badging platform behind 95% of the top IT certifications. Many of our customers already use Credly to showcase hard-earned accomplishments from other vendors, and the Rafay Certified Associate is the latest entry there.
Recently I made the decision to join Rafay Systems. I had been in Enterprise IT for over two decades (all in networking), and most recently at multicloud networking pioneer Aviatrix Systems. So what made me want to join Rafay? In a nutshell – application modernization.
Although Multicloud Networking has grown to the point where Gartner now has a formal definition for the Multicloud Networking Software market, it’s important to remember that networking will always need to respond to the needs of modern application development. It’s always playing catch-up with the app.
I had always been fascinated by technologies that enable Enterprises to build applications with greater agility, whether they are in the cloud, for IoT, or for 5G. And containerization provides exactly this, along with other benefits such as:
Continuous integration, development, and deployment
Loosely coupled microservices
Cloud and OS distribution portability
However, what containerization alone does not offer is:
Load Balancing
Automated rollouts and rollbacks
Self-healing
This is where Kubernetes fits in. Kubernetes (k8s for short) is a framework for achieving all of the above goals and more to build modern applications.
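As a tiny illustration of what that framework buys you, here is a minimal Deployment manifest sketched as a Python dict (the name and image tag are hypothetical examples, not from any real deployment): the replicas field is what gives self-healing, and the rolling-update strategy is what gives automated rollouts and rollbacks.

```python
# A minimal Kubernetes Deployment manifest, expressed as a Python dict for
# illustration. "replicas" drives self-healing (the control loop recreates
# failed pods to match desired state), and the RollingUpdate strategy drives
# automated rollouts and rollbacks.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web"},          # hypothetical app name
    "spec": {
        "replicas": 3,                    # desired state; k8s reconciles toward it
        "strategy": {"type": "RollingUpdate"},
        "selector": {"matchLabels": {"app": "web"}},
        "template": {
            "metadata": {"labels": {"app": "web"}},
            "spec": {
                "containers": [
                    {
                        "name": "web",
                        "image": "nginx:1.25",   # hypothetical image tag
                        "ports": [{"containerPort": 80}],
                    }
                ]
            },
        },
    },
}
```

Kill a pod and the controller recreates it; change the image and Kubernetes rolls the change out (and can roll it back) replica by replica.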
However, k8s has a very steep learning curve. A Kubernetes cluster consists of a control plane – kube-apiserver, etcd, kube-scheduler, and kube-controller-manager – plus the components that run on every node: kubelet, kube-proxy, and a container runtime.
None of these components are optional! At a small scale (such as POC level), managing this complexity is hard enough. However, when an Enterprise decides to take the plunge, they often find themselves falling down a slippery slope to a bottomless pit. Here are just a few reasons why:
Multicloud reality – While each of the CSPs has its own flavor of managed Kubernetes service, none is incentivized to support multicloud. I’ve previously written about why multicloud networking is a big deal; taking a step back, Enterprises face the same challenge when operationalizing apps across multiple clouds. How do you perform lifecycle management of every cluster type across all clouds, private or public?
Lack of centralized policy management controls – While there are native k8s constructs for network and security policy, they lack unified definition and enforcement across fleets of clusters. How do you configure enterprise-grade policies that can be enforced across all Kubernetes infrastructure while allowing for centralized detection and reporting of policy violations?
Limited Role-Based Access Control (RBAC) – Kubernetes’ native tooling provides only limited RBAC. Commands executed via kubectl are not logged per user account and, generally speaking, kubectl is difficult to access securely from outside firewalls. Moreover, using it to manage entire fleets is cumbersome and error-prone. How do you ensure that developers, QA, DevOps, and Ops/SRE teams have the right access based on their roles and responsibilities?
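To make the policy-drift problem above concrete, here is a hedged sketch (not Rafay’s implementation; the cluster names and settings are made up) of evaluating one central policy across a fleet and collecting violations centrally:

```python
# Hedged sketch of centralized policy checking across a Kubernetes fleet.
# Each cluster reports its settings; one policy dict is evaluated everywhere
# and any deviations are reported from a single place.
policy = {"network_policy_enabled": True, "audit_logging": True}

clusters = {  # hypothetical fleet state
    "prod-eks":  {"network_policy_enabled": True,  "audit_logging": True},
    "dev-gke":   {"network_policy_enabled": False, "audit_logging": True},
    "onprem-01": {"network_policy_enabled": True,  "audit_logging": False},
}

def violations(fleet, policy):
    """Return {cluster: [settings that deviate from policy]} for non-compliant clusters."""
    return {
        name: [k for k, v in policy.items() if state.get(k) != v]
        for name, state in fleet.items()
        if any(state.get(k) != v for k, v in policy.items())
    }

print(violations(clusters, policy))
# prints: {'dev-gke': ['network_policy_enabled'], 'onprem-01': ['audit_logging']}
```

The point of centralization is exactly this shape: one policy definition, fleet-wide enforcement, one place to see drift.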
Here’s how Rafay solves the above problems:
Lifecycle Management of any Kubernetes cluster type, be it public cloud (EKS, AKS, or GKE) or private cloud on-premises. Rafay provides a single pane of glass for operations teams to deploy, manage, and upgrade all of an Enterprise’s Kubernetes clusters across all environments from a single console.
Centralized governance through cluster Blueprints. These ensure that clusters are always in compliance with company policies. Blueprints allow centralized configurations for cluster standardization that can encompass security policies, software add-ons such as service mesh, ingress controllers, monitoring, logging, and backup/restore strategies.
Zero-Trust Access. This service enables controlled, audited access for developers, SREs, and automation systems to the Kubernetes infrastructure. It integrates tightly with enterprise-grade RBAC/SSO solutions and is continuously validated for security configuration and posture to ensure compliance.
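To illustrate the blueprint idea (this is not Rafay’s actual schema; the blueprint name, add-on names, and fields are hypothetical), a blueprint can be thought of as a declarative bundle that each cluster is checked against:

```python
# Illustrative only -- not Rafay's actual blueprint schema. The idea: a
# blueprint declares the add-ons and settings every cluster must run, and
# compliance reduces to a set comparison against what a cluster reports.
blueprint = {
    "name": "prod-standard",  # hypothetical blueprint name
    "required_addons": {"ingress-nginx", "prometheus", "fluentd", "velero"},
    "policy_enforced": True,
}

def compliant(cluster_addons, policy_on):
    """Return (is_compliant, missing add-ons) for one cluster."""
    missing = blueprint["required_addons"] - cluster_addons
    return (not missing and policy_on == blueprint["policy_enforced"], missing)

ok, missing = compliant({"ingress-nginx", "prometheus"}, True)
print(ok, sorted(missing))
# prints: False ['fluentd', 'velero']
```

Standardization then means driving every cluster in the fleet to `(True, set())` against the same blueprint.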
These are just a few of the rich suite of turnkey services that the Rafay Kubernetes Operations Platform provides.
Networking will always hold a special place in my heart, and I’ll still get some of that exposure at Rafay – though it will be less BGP AS-path prepending and more CNI plugins.
At the end of the day, technology features and benefits are one thing, but what really excites me is what app modernization ultimately means for organizations. We all use many of these modern apps every day. Enterprises build them for a number of critical business reasons, such as to serve customers, leverage cloud computing, and better compete in the market.
To date, Kubernetes has been more of a hurdle than an enabler. Rafay’s goal is to change that and help make Kubernetes the accelerator to modernization that it was intended to be. And that movement, to me, is worth joining.
I’m absolutely thrilled to begin my journey in the exciting world of enterprise-grade Kubernetes operations management with Rafay!
Recently, news broke that Windows Server is introducing support for Docker. This is significant because the ultra-hot company’s container platform had previously been supported only on Linux (and on Azure), and one of the major complaints about it was the lack of flexibility in host operating system support. With this news, Microsoft also announced that it will contribute to Docker’s open source APIs. What a remarkable change from a company that once epitomized closed systems.
I’m excited to announce today that Microsoft is partnering with Docker, Inc to enable great container-based development experiences on Linux, Windows Server and Microsoft Azure. Docker is an open platform that enables developers and administrators to build, ship, and run distributed applications. Consisting of Docker Engine, a lightweight runtime and packaging tool, and Docker Hub, a cloud service for sharing applications and automating workflows, Docker enables apps to be quickly assembled from components and eliminates the friction between development, QA, and production environments. Earlier this year, Microsoft released support for Docker containers with Linux on Azure. This support integrates with the Azure VM agent extensibility model and Azure command-line tools, and makes it easy to deploy the latest and greatest Docker Engine in Azure VMs and then deploy Docker based images within them. – Scott Guthrie, executive vice president of the Microsoft Cloud and Enterprise group.
Earlier this week, news broke on SDNCentral about a new startup called SocketPlane that integrates Docker containers with Open vSwitch (OVS). Docker is one of the hottest areas in enterprise tech these days. At the OpenStack SV event last month, Mirantis CEO Adrian Ionel said that Docker had seen 20 million downloads in the past four months, mainly due to its ease of use and its benefits to developers. He showed a screenshot of Google Trends comparing ‘Docker’ against ‘Virtualization’. That picture is recreated below.
One of the co-founders of SocketPlane is Brent Salisbury, who had a network engineering background in academia before joining Red Hat earlier this year. In recent years he became increasingly involved in the OpenDaylight (ODL) project and is arguably the best-known network engineer turned coder. His blog has a wealth of hands-on guides for installing and integrating OVS, OpenStack, and ODL, which I’ve referred to frequently. Two other prominent ODL contributors, Madhu Venugopal and Dave Tucker, are the other co-founders of SocketPlane.
I had listened to a Class C Block podcast on ODL in November 2013, in which Venugopal and Salisbury spoke at length of their involvement with the project. Definitely worth a listen if you have the time.
I recently watched the videos for NFD8. This blog post is about the presentation made by Nuage Networks. As an Alcatel-Lucent venture, Nuage focuses on building an open SDN ecosystem based on best of breed. They had also presented last year at NFD6.
To recap what they do, Nuage’s key solution is Virtualized Services Platform (VSP), which is based on the following three virtualized components:
The Virtualized Services Directory (VSD) is a policy server that takes high-level primitives from cloud services: it gets service policies from VMware, OpenStack, and CloudStack, and also has a built-in business logic and analytics engine based on Hadoop.
The Virtualized Services Controller (VSC) is the control plane. It is based on the ALU Service Router OS – originally developed 12-13 years ago and deployed in 300,000 routers – now stripped down to serve as an SDN controller. The scope of a controller is a domain, but it can be extended to multiple domains or data centers via MP-BGP federation, thereby supporting IP mobility. A single-availability domain has a single data center zone; high-availability domains have two. A VSC is a 4-core VM with 4 GB of memory. VSCs act as clients of BGP route reflectors in order to extend network services.
The Virtual Routing and Switching module (VRS) is the data path agent that does L2-L4 switching, routing, and policies. It integrates with VMware via ESXi, Xen via XAPI, and KVM via libvirt; the libvirt API exposes all the resources needed to manage VMs. (As an aside, you can see how it comes into play in the primer on OVS 1.4.0 installation I wrote a while back.) The VRS gets the full profile of a VM from the hypervisor and reports it to the VSC. The VSC then downloads the corresponding policies from the VSD and implements them, whether L2 FIBs, L3 RIBs/ACLs, or L4 distributed firewall rules. For VMware, VRS is implemented as a VM with some hooks because ESXi has a limitation of 1M pps.
At NFD8, Nuage discussed a recent customer win that demonstrates its ability to segment clouds. The customer, a Canadian Cloud Service Provider (CSP) called OVH, has deployed 300,000 servers in its Canadian DCs. As a beta service offering, OVH’s customers can launch their own clouds – in other words, Cloud-as-a-Service with the Nuage SDN solution underneath. It’s like a wholesaler of cloud services, whereby businesses can run their own OpenStack cloud without building it themselves. Every customer of this OVH offering runs its own independent instance of Nuage’s services. Pretty cool.
Next came some demos addressing the following four questions about SDN:
Is proprietary HW needed? The short answer is NO. The demo showed how to achieve hardware VTEP integration. In the early days of SDN, overlay gateways proved to be a challenge because they were needed to cross from the network virtualization domain to the IP domain; as a result, VLANs had to be manually configured between server-based software gateways and the DC routers – a most cumbersome process. The Nuage solution solves that problem by speaking the language of routing: it uses standard RFC 4797 (GRE encapsulation) on its dedicated TOR gateway to tunnel VXLAN traffic to routers. As covered at NFD6, Nuage has three approaches to VTEP gateways:
Software-based – for small DCs with up to 10 Gbps
White box-based – for larger DCs, based on the standard L2 OVSDB schema. At NFD8, two partner gateways were introduced: Arista and the HP 5930. Both are L2-only at this point, but will get to L3 eventually.
High performance-based (7850 VSG) – 1 Tbps L3 gateway using merchant silicon, and attaining L3 connectivity via MP-BGP
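To show what the GRE encapsulation step amounts to on the wire, here is a hedged sketch of RFC 2784-style GRE framing as used by RFC 4797 (the payload bytes are made up, and real gateways do this in silicon): the base header is just two bytes of flags/version followed by the payload’s EtherType.

```python
import struct

# Hedged sketch of a basic GRE header (RFC 2784) as used by RFC 4797
# (MPLS-in-GRE for BGP/MPLS IP VPNs). The payload here is a placeholder;
# a real gateway would carry an MPLS label stack and inner packet, and the
# outer IP header (protocol 47) is omitted for brevity.
GRE_PROTO_MPLS = 0x8847  # EtherType for MPLS unicast

def gre_encap(payload, proto=GRE_PROTO_MPLS):
    # Flags/version all zero: no checksum, no key, no sequence number.
    header = struct.pack("!HH", 0x0000, proto)
    return header + payload

frame = gre_encap(b"inner packet")  # placeholder payload
print(frame[:4].hex())
# prints: 00008847
```

The gateway’s job is then routing-plane plumbing: stitching these tunnels to the routers that already speak BGP, rather than hand-configuring VLANs.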
How well can SDN scale?
The Scaling and Performance demo explained why scaling network virtualization is far harder than scaling server virtualization. For example, the number of ACL entries needed grows quadratically as the number of web or database servers grows linearly. The Nuage solution breaks ACLs down into abstractions, or policies: think of Access Control Entries belonging to a group (for example, one group for all web servers and another for all database servers), so that the ACL stays manageable. Any time a new VM is added, it is just a new entry in its group. Policies are pushed rather than individual Access Control Entries, which scales much better. Individual VMs are identified by tagging routes, which is accomplished by, you guessed it, BGP communities (these Nuage folks sure love BGP!).
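The quadratic-growth argument is easy to see in a few lines of Python (a sketch of the counting argument, not Nuage code):

```python
# With per-VM ACL entries, allowing N web VMs to reach M database VMs needs
# one entry per (web, db) pair: N*M entries, so growing both tiers linearly
# grows the rule count quadratically. A group-based policy needs one rule
# ("web-group may reach db-group") no matter how many members each group has;
# adding a VM only updates group membership.
def per_vm_entries(n_web, n_db):
    return n_web * n_db  # one ACL entry per (web, db) pair

def group_policy_rules(n_web, n_db):
    return 1             # a single group-to-group rule

for n in (10, 100, 1000):
    print(n, per_vm_entries(n, n), group_policy_rules(n, n))
# prints: 10 100 1 / 100 10000 1 / 1000 1000000 1
```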
Can it natively support any workload? The demo showed multiple workloads, including containers in their natural environment, i.e. on bare metal rather than inside VMs. Nuage ran their scalability demo on AWS with 40 servers, but instead of VMs they used Docker containers. Recently there has been a lot of buzz around Linux containers, especially Docker. The advantages containers hold over VMs are much lower overhead (they share portions of the host kernel and operating system instance), a single OS to manage (albeit Linux on Linux), better hardware utilization, and quicker launch times. Scott Lowe has a good series of write-ups on containers and Docker on his blog, and Greg Ferro has a pretty detailed first pass on Docker. Nuage CTO Dimitri Stiliadis explained how containers are changing the game as short-lived application workloads become increasingly prevalent: the advantage Docker brings, as he explained, is moving the processing to the data rather than the other way round. Whereas you’d typically see no more than 40-50 VMs on a physical server, the Nuage demo ran 500 Docker container instances per server, for 20,000 container instances in total. They brought them up, along with 7 ACLs per container instance (140K ACLs total), in just 8 minutes – more than 40 containers per second! For reference, the demo used an AWS c3.4xlarge instance (30GB of memory) for the VSD, a c3.2xlarge for the VSC, and 40 c3.xlarge instances for the ‘hypervisors’ where the VRS agents ran. The Nuage solution successfully kept up with the rapid, dynamic connectivity requirements of containers. Moreover, since the VRS agent operates at the process level (instead of the host level, as with VMs), it can implement policies with very fine-grained control. Really impressive demo.
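The demo arithmetic can be checked quickly from the numbers quoted above:

```python
# Checking the NFD8 demo numbers: 40 servers x 500 containers each,
# 7 ACLs per container, all brought up in 8 minutes.
servers, per_server, acls_each = 40, 500, 7
containers = servers * per_server   # 20,000 container instances
acls = containers * acls_each       # 140,000 ACLs
rate = containers / (8 * 60)        # launches per second over 8 minutes
print(containers, acls, round(rate, 1))
# prints: 20000 140000 41.7
```

So the sustained bring-up rate works out to roughly 42 containers (with their ACLs) per second.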
How easily can applications be designed?
The Application Designer demo showed how to bridge the gap between app developers and infrastructure teams by means of high-level policies that make application deployment really easy. In Packet Pushers Show 203, Martin Casado and Tim Hinrichs discussed their work on OpenStack Congress, which attempts to formalize policy-based networking so that a policy controller can take high-level, human-readable primitives (HIPAA, PCI, or SOX, for example) and express them in a language an SDN Controller understands. Nuage confirmed that they contribute to Congress. The Nuage demo defined application tiers and showed how to deploy a WordPress container application along with a backend database in seconds. Another demo integrated OpenStack Neutron with extensions. You can create templates to instantiate applications multiple times. Another truly remarkable demo.
To summarize, the Nuage solution seems pretty solid and embraces open standards, not for the sake of lip service, but to solve actual customer problems.