All posts by Umair Hoodbhoy

Challenges that Plexxi Faces

This week, the Lean Startup conference was held in San Francisco. The Lean Startup philosophy borrows its roots from the lean manufacturing method, with Kanban, or just-in-time processing, at the center of its design principles. It essentially advocates that startups minimize outside funding, not strive for the perfect product (think Minimum Viable Product), stay flexible (think Pivot), and tailor the product completely to the customer’s needs, all with the goal of being a highly efficient company.

However, as noted investor Marc Andreessen, who spoke at the conference, warns,

Not all startups can be Lean Startups

Indeed, some startups cannot afford to employ a Pivot. Infrastructure or hardware companies come to mind, especially when they have already taken $50 million of investment. This is what Plexxi has done, without a complete solution to show for it yet.

I was listening to the recent Packet Pushers show #126 sponsored by Plexxi. While their approach is a creative one, I’m not sure whether it is viable. In a nutshell, Plexxi brings optical technology, in the form of WDM, to the Data Center and flattens traditional hierarchical network designs. When I first started learning about network designs, the classical approach was the 3-tier Core-Distribution-Access model. In the mid-2000s this got reduced to a Collapsed Core. What Plexxi proposes is a flat topology, eliminating the need for Core switches in the Data Center.

Plexxi adopted the SDN approach of a programmable controller (a virtual appliance) that pushes policies to its switches. The policies are intended to optimize data path flow for affinitized traffic. Applications that are more sensitive to certain resources are classified into Affinity Networks. Some examples of the constraints, or sensitivities, that Plexxi’s Director of Product Management, Marten Terpstra, described include:

  • Hop-count
  • Bandwidth

Plexxi switches use merchant silicon (Broadcom ASICs) to form an Ethernet ring on top of WDM lightwaves. By changing lambdas, Layer-1 connections between switches can be rearranged according to application requirements.

Plexxi uses its own closed APIs for communication between its switch interfaces and its controller in order to convey affinity information. However, it opens up its proprietary northbound API for user-to-controller communication so that users can write scripts against it, for example via REST. Interestingly, Plexxi is a member of the Open Networking Foundation. The Controller places TCAM entries in switches based on application requirements for affinitized traffic.
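
For a sense of what scripting against such a northbound REST API might look like, here is a minimal sketch; the controller URL, resource path, and payload fields are hypothetical, since Plexxi has not published its API details:

```python
# Hypothetical sketch: creating an affinity group via a controller's
# northbound REST API. The URL, endpoint, and field names below are
# illustrative only, not Plexxi's actual API.
import requests

CONTROLLER = "https://plexxi-controller.example.com/api/v1"

affinity = {
    "name": "iscsi-storage",
    "members": ["10.1.1.0/24", "10.1.2.0/24"],   # endpoints to affinitize
    "constraints": {
        "max_hop_count": 1,          # keep storage traffic one hop away
        "min_bandwidth_gbps": 10,    # reserve bandwidth along the ring
    },
}

resp = requests.post(f"{CONTROLLER}/affinities", json=affinity, verify=False)
resp.raise_for_status()
print("Affinity created:", resp.json())
```

The point is that the user expresses intent (hop count, bandwidth) and the controller translates that intent into TCAM entries on the switches.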

Terpstra discussed two use cases:

  1. Affinitized iSCSI traffic for the most bandwidth with the fewest hops
  2. Cloud provider – Use a Plexxi ring as a premium service to affinitize traffic.

In neither case were any results mentioned.

So far, then, Plexxi’s solution is a 1 RU box that can prioritize traffic based on hop count and bandwidth. I fail to see much of a business case there. Any network engineer worth his or her salt will tell you that there is more to traffic classification and prioritization than just hop count and bandwidth. Financial trading institutions, for instance, would be more concerned about latency guarantees. Hop count alone is a flimsy criterion for classifying important traffic, regardless of whether a cute term like Affinity Network is given to that classification. High availability is a critical issue that a ring topology exacerbates. As Doug Gourlay of Arista mentions, unnecessary downtime is introduced any time you add new nodes, because the ring is broken. Moreover, the network is reduced to a split-brain model in the event of just two nodes going down. Depending on where the Controller is placed, this could have adverse outcomes, and the thing about outages is that we can never control where they occur. As Gourlay rightly puts it:

I thought Token Ring died for good reasons… why is someone trying to bring it back?

Getting back to the Lean Startup idea, Terpstra said, “Our Layer-3 affinities are coming.” Plexxi is targeting Christmas 2012 for version 1.0 of its Layer-3 capabilities. Until then, Plexxi has only a Layer-2 switch and no quotable results to show for $50 million in investment. Not a good time to Pivot.

Reports of the death of the Core switch in the Data Center have been greatly exaggerated.

Cisco UCS Manager – Orchestration Done Right

Earlier this month Cisco announced General Availability of UCS Central – a manager of managers. Given Cisco’s string of failures when venturing outside its comfort zone (think Flip and Cius), few expected anything different when Cisco entered the server market in 2009. Instead, it has been a resounding success. UCS has become the fastest-growing business in Cisco history, with over fifteen thousand customers and over $1 billion in annual revenue. It is already the #2 blade server market share leader in North America. Why has it done so well?

I believe one of the main reasons is that its management software, UCS Manager, delivers one thing that SDN also promises – high-level orchestration. Before UCS Manager, administrators such as Chris Atkinson had to write their own scripts to configure and maintain BIOS, RAID, and firmware configurations. With UCS Manager, this information is kept in a Service Profile. When a blade dies, the blade does not need to be removed and ports do not need to be updated; just move the Service Profile to a new blade via software. Need to repurpose a physical server for a different department? Just create a new Service Profile and reassign the blade server to it, without recabling or moving metal, all within a matter of seconds. You would expect such agility when moving virtual machines, not physical machines!

In one of my past lives I worked very closely with CiscoWorks. UCS Manager does a far better job at managing servers than CiscoWorks did at managing routers and switches. CiscoWorks was rigid and far too dependent on the CLI (Telnet/SSH) and SNMP MIB modules for accessing and managing devices. UCS Manager, with its single-wire management and XML API, is more flexible and integrates with third-party systems management tools, which allows for more agile deployments. My understanding is that the same APIs are opened up in UCS Central. Let’s see whether it can be as successful as UCS Manager.
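
For a flavor of that XML API, here is a rough sketch of associating a Service Profile with a blade over plain HTTP. The class and attribute names (aaaLogin, configConfMo, lsBinding, pnDn) are approximations from memory rather than verified calls, so treat this as illustrative only:

```python
# Rough sketch of driving the UCS Manager XML API with plain HTTP.
# Object and attribute names are approximations, not a verified request.
import requests
import xml.etree.ElementTree as ET

UCS = "https://ucs-manager.example.com/nuova"

# 1. Authenticate; the response carries a session cookie in an attribute
login_resp = requests.post(
    UCS, data='<aaaLogin inName="admin" inPassword="secret" />', verify=False
)
cookie = ET.fromstring(login_resp.text).get("outCookie")

# 2. Associate Service Profile "web-server-01" with a physical blade
payload = (
    f'<configConfMo dn="org-root/ls-web-server-01" cookie="{cookie}">'
    f'<inConfig><lsBinding rn="pn" pnDn="sys/chassis-1/blade-3" /></inConfig>'
    f'</configConfMo>'
)
print(requests.post(UCS, data=payload, verify=False).text)
```

The same request can just as easily come from a third-party orchestration tool, which is exactly the integration point the paragraph above describes.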

Death to the CLI

One of the selling points of Cisco’s Nexus 1000V virtual switch is that it provides network administrators with a familiar way of configuring it, namely the CLI. The Nexus 1000V is built on NX-OS and is accessible via SSH. This was intended to give network engineers a familiar look and feel for performing duties they could not perform with the hypervisor’s native vSwitch.

I understand the need for separation of duties, and that is what any dedicated switch management interface provides. And I appreciate that the Nexus 1000V offers many rich features that most soft switches don’t, such as L2/L3 ACLs, link aggregation, port monitoring, and some of the more advanced STP knobs like BPDU Guard. What I don’t appreciate is clinging to an archaic mode of configuration.

When I took my CCIE lab, Cisco provided a single reference CD-ROM, known as the UniverCD or DocCD. Many tasks required knowledge of esoteric commands. One of the first steps any competent test-taker would take was to use the alias command to define shortcuts; for example, show ip route might become sir. Network engineers often take great pride in the aliases they define and the Expect/Perl/Python scripts they have written to automate tasks, and they rave about the amount of time saved. Of course, all of this would break whenever the vendor introduced new CLI commands that conflicted with existing aliases.
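
The scripts themselves were rarely sophisticated. Something like the following pexpect sketch (hostname, credentials, and prompt pattern are placeholders) is representative of the screen-scraping automation engineers ended up maintaining:

```python
# Minimal sketch of the kind of ad-hoc CLI automation described above,
# using pexpect to script an SSH session to a router.
import pexpect

child = pexpect.spawn("ssh admin@router1.example.com")
child.expect("assword:")
child.sendline("secret")
child.expect("#")

# Run the same command the 'sir' alias would wrap
child.sendline("show ip route")
child.expect("#")
print(child.before.decode())

child.sendline("exit")
```

Every one of these scripts is tied to specific prompts and command syntax, which is exactly why they break when the CLI changes.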

In one of my past roles I was one of five engineers who frequently made firewall rule changes on ASAs. All of us were CCIEs, but none of us used the CLI to make the changes. Instead we preferred ASDM, the GUI element manager. Sure, it was buggy and handled concurrent changes poorly, but at least the changes it made were accurate. Adding a single rule isn’t as simple as adding a single line; in most cases you have to edit object groups and make sure there are no overlapping or conflicting rules. Trusting a human to do this accurately every time is like trusting someone to drive five hours a day for work and never get into an accident.

There is a smarter way to do configuration management: make the network programmable and offer developers APIs that are stateful and intelligent. Obviously, the rebuttal from Nexus 1000V loyalists is that engineers are familiar with NX-OS and would therefore be more comfortable with the CLI. But that’s a step in the wrong direction. When I look back at how much time network engineers waste creating simple automation such as macros, I realize this is one of the reasons networking has lagged behind compute technologies. Network engineers should not have to write their own scripts to make their lives easier; applications should be doing this for them. Let network engineers focus on their job, which is getting packets from source to destination as quickly, reliably, and securely as possible.
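
As a hedged illustration of the alternative, consider the ASA rule change from earlier expressed against a hypothetical stateful API. The endpoint and payload below are made up; the idea is that the rule is an object the system can validate for conflicts before committing, rather than raw CLI lines:

```python
# Hypothetical sketch: a firewall rule expressed as structured data and
# pushed through a made-up REST endpoint that rejects overlapping or
# conflicting rules before they ever reach the device.
import requests

rule = {
    "name": "allow-web-to-db",
    "source_objects": ["web-servers"],       # existing object group
    "destination_objects": ["db-servers"],   # existing object group
    "service": "tcp/1433",
    "action": "permit",
}

resp = requests.post(
    "https://firewall-manager.example.com/api/rules",
    json=rule,
    timeout=10,
)

if resp.status_code == 409:
    # The controller, not the engineer, detects the conflict
    print("Rule overlaps an existing entry:", resp.json()["conflicts_with"])
else:
    resp.raise_for_status()
    print("Rule committed:", resp.json()["id"])
```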

SDN – What’s in a Name? Part 3

This is the third part of my series of posts on vendors’ attempts to latch on to the SDN bandwagon. For more background, refer to Parts 1 and 2. In this post I discuss how a few of the services vendors have responded to the buzz around Software Defined Networking.

WAN Optimization

Riverbed claimed it was riding the SDN wave at VMworld 2012 when it announced the latest release of Cascade. However, it went by the second definition I used in Part 1, which says that SDN decouples physical and virtual networks, or overlays (not the Control Plane and Data Plane, as the other definition emphasizes). The difference is subtle. Riverbed partnered with VMware to develop an IPFIX record format that can provide VXLAN tenant traffic information as well as VXLAN tunnel endpoint (VTEP) information. Thus, they claim, Cascade is SDN-ready because it is VXLAN-aware.
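
To make “VXLAN-aware” concrete, the sketch below shows the kind of fields such a flow record would need in order to tie tenant traffic back to the tunnel endpoints carrying it. The field names are illustrative, not the actual record format Riverbed and VMware defined:

```python
# Illustrative only: fields a VXLAN-aware flow record would need to map
# overlay (tenant) flows to the underlay tunnel endpoints.
from dataclasses import dataclass

@dataclass
class VxlanFlowRecord:
    vni: int                # VXLAN Network Identifier (the tenant segment)
    inner_src_ip: str       # tenant VM addresses, visible only after decap
    inner_dst_ip: str
    outer_src_vtep: str     # tunnel endpoints carrying the encapsulated flow
    outer_dst_vtep: str
    byte_count: int
    packet_count: int

record = VxlanFlowRecord(
    vni=5001,
    inner_src_ip="172.16.10.5",
    inner_dst_ip="172.16.20.9",
    outer_src_vtep="10.0.0.11",
    outer_dst_vtep="10.0.0.42",
    byte_count=482000,
    packet_count=610,
)
print(record)
```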

Silver Peak also thumped its chest at VMworld 2012 with its Agility announcement, a plug-in for VMware vCenter. Agility allows administrators to enable acceleration between workloads within vCenter using the vSphere interface that server administrators are familiar with. This requires Silver Peak’s Virtual Appliance to already exist within vCenter. Almost three months after the announcement, details are extremely thin. Silver Peak has been drooling over Nicira ever since the SDN champion was acquired by VMware. Indeed, all you have to do is Google Nicira and Silver Peak and observe all the enthusiasm that Silver Peak has shown for Nicira. But the feeling is not mutual. Silver Peak claims it is working with Nicira and leveraging Nicira’s Open vSwitch, but Nicira/VMware have made no such announcements. In fact, there are no further details about this relationship on Silver Peak’s own website.

Exinda is so far behind on the SDN learning curve that the only mention of SDN on its website is a hyperlink to the SDN Wikipedia page in a blog post, written by its VP of Marketing, reporting his observations from Interop 2012. Clearly Exinda has a long way to go before its SDN strategy can be taken seriously.

Load Balancers

As of VMworld 2012, F5’s BIG-IP products support native VXLAN functionality and will have VXLAN virtual tunnel endpoint capabilities in the first half of 2013. What exactly that means is vague at this time. The press statement I linked to is the only mention of BIG-IP’s current SDN capabilities. My guess is that they’ve opened up some APIs to VMware to allow programmability.

Embrane takes an under-the-cloud approach, offering cloud providers a platform that delivers elastic Layer 4-7 network services to their customers. The services include load balancing, firewalling, and VPN. Embrane’s heleos architecture is a radical solution comprising the Elastic Services Manager (a provisioning tool) and the Distributed Virtual Appliance, the latter being a logical network services container instantiated across a pool of general-purpose x86 servers. The issue likely to raise eyebrows is that each service in the platform is a wrapper around an open-source distribution. I haven’t heard of too many providers willing to vouch for ipchains as a firewall.

Firewalls

Palo Alto Networks earned its stripes by building its firewall appliances with merchant silicon. To stay ahead in the SDN era, it announced a technology partnership with Citrix in October 2012, but it has not yet released a product offering.

Big Switch Networks, a leading SDN player, announced its ecosystem of Technology Alliance Partners on November 13, 2012, and the list included Palo Alto Networks. However, Palo Alto Networks has not mentioned this on its website, which is odd given that partnerships are what have made Palo Alto the hugely successful company it is today. One would expect them to be on top of it.

So there are no SDN-friendly firewalls currently on the market other than Cisco’s Nexus 1000V portfolio, which includes the VSG and the ASA 1000V Cloud Firewall.

Summary

It appears from these observations that partnerships are key to ousting incumbents in the SDN world. Much of the SDN support at this time is just hype, but sellable products will come out soon. The industry also needs the open-source movement to challenge the VMware-centric ecosystem, enable a higher level of interoperability, and allow for more flexible orchestration and programmability. OpenStack is that approach. More to follow in future posts.

SDN – What’s in a Name? Part 2

In Part 1 of this series I outlined two of the more commonly accepted definitions of SDN. In this post I discuss how pure-play networking vendors have tried to create solutions and package them as SDN.

Cisco announced onePK, a developer kit for its new Open Network Environment (ONE), which, in turn, it announced at Cisco Live this year. As of this post, onePK has yet to actually be released. It is essentially a set of APIs that developers can use to interact with their Cisco gear, as opposed to the Northbound and Southbound APIs that I referred to in Part 1. With onePK, an OpenFlow agent can run in IOS, IOS XR, or NX-OS and speak with an OpenFlow controller on the ‘north’ side and the onePK API on the ‘south’ side. As you can surmise, this leaves the Control Plane and the Data Plane still in the Cisco device. The reasons for Cisco doing this are quite clear: Cisco feels threatened by SDN’s potential.

The biggest networking news item of 2012 was VMware’s $1.26 billion acquisition of Nicira. Nicira was, after all, the pioneer of OpenFlow and the SDN movement. People began to realize that, after a decade of slow progress, networking was finally growing up. It manifested the networking industry’s readiness to keep up with server virtualization. However, that didn’t mean that VMware started outselling Cisco overnight. Contrary to popular belief, the biggest revolution to hit the networking industry in the past five years is not Software Defined Networking. It is the advent of merchant silicon.

Merchant silicon is the reason why firewall vendors such as Palo Alto Networks, WAN optimization vendors such as Infineta, and data center switch vendors such as Arista can exist. By using off-the-shelf silicon, they can deliver superior value by focusing on software. Pure-play giants like Cisco, who have invested a lot of time and money in custom ASICs, are seeing their margins plummet because competitors can offer comparable value at much lower prices. Recently, Alcatel-Lucent outbid Cisco by $100 million to win a network infrastructure refresh project with the 23-campus California State University. Clearly, trends like SDN, VM mobility, or DCI are not high priorities for everyone.

The insecurity that Cisco feels from SDN is the reason it wants to remain at the center of the ecosystem. With Cisco onePK, control remains on the Cisco device, as Omar Sultan of Cisco describes. Thus, the controller that communicates with an OpenFlow agent is quite different from the centralized controller envisioned by the Open Networking Foundation (ONF). Cisco will make several announcements of other environments, platforms, and products that are, in reality, iterative changes, to demonstrate that it is playing along. However, it will not relinquish control of its market share by, for example, making a dumb switch that runs an OpenFlow agent and whose forwarding tables can be manipulated through open standards.

In October 2012, Cisco acquired vCider as part of its SDN strategy, specifically to enhance its involvement in OpenStack. Of course, there is also the Cisco spin-off Insieme, now rumored to have over 150 employees dedicated to building SDN solutions and platforms from the ground up.

Brocade, another pure-play networking giant, claimed that its November 2012 acquisition of Vyatta was an SDN win. Brent Salisbury agrees. However, as Greg Ferro put it, the products are not SDN today.

In Part 3, I will wrap up this series of posts on vendors who have claimed SDN compliance by discussing some vendors that focus on services, such as WAN Optimization, Firewalls, and Load Balancers.