Challenges that Plexxi Faces

This week, the Lean Startup conference was held in San Francisco. The Lean Startup philosophy has its roots in the lean management method of manufacturing, with Kanban or just-in-time processing at its center. It basically advocates that startups minimize outside funding, not strive for the perfect product (think Minimum Viable Product), stay flexible (think Pivot), and cater the product completely to the customer’s needs, all with the goal of being a highly efficient company.

However, as noted investor Marc Andreessen, who spoke at the conference, warns,

Not all startups can be Lean Startups

Indeed, some startups cannot afford to employ a Pivot. Infrastructure or hardware companies come to mind, especially when they’ve already taken $50 million of investment. This is what Plexxi has done, without a complete solution to show for it yet.

I was listening to the recent Packet Pushers show #126 sponsored by Plexxi. While their approach is a creative one, I’m not sure whether it is viable. In a nutshell, Plexxi brings optical technology, in the form of WDM, to the Data Center and flattens traditional hierarchical network designs. When I first started learning about network designs, the classical approach was the 3-tier Core-Distribution-Access model. In the mid-2000s this got reduced to a Collapsed Core. What Plexxi proposes is a flat topology, eliminating the need for Core switches in the Data Center.

Plexxi adopted the SDN approach of a programmable controller (a virtual appliance) that pushes policies to its switches. The policies are intended to optimize data path flow for affinitized traffic. Applications that are more sensitive to certain resources are classified into Affinity Networks. Some examples of the constraints or sensitivities that Plexxi’s Director of Product Management, Marten Terpstra, described include:

  • Hop-count
  • Bandwidth
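To make the idea concrete, here is a minimal sketch of how a controller might pick a path for affinitized traffic using only these two constraints. It is my own illustration, not Plexxi's algorithm: the `Affinity` class, the function names, and the plain bidirectional ring with per-link bandwidths are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Affinity:
    """Hypothetical constraint set for one class of application traffic."""
    name: str
    max_hops: int
    min_bandwidth_gbps: float


def ring_paths(n, src, dst):
    """Return the clockwise and counterclockwise paths on an n-node ring."""
    clockwise = [(src + i) % n for i in range((dst - src) % n + 1)]
    counter = [(src - i) % n for i in range((src - dst) % n + 1)]
    return [clockwise, counter]


def path_bandwidth(path, link_gbps):
    """Bottleneck bandwidth of a path: the slowest link along it."""
    links = [frozenset(pair) for pair in zip(path, path[1:])]
    return min((link_gbps[link] for link in links), default=float("inf"))


def pick_path(n, src, dst, link_gbps, affinity):
    """Pick the feasible path with the fewest hops, then the most bandwidth."""
    candidates = []
    for path in ring_paths(n, src, dst):
        hops = len(path) - 1
        bw = path_bandwidth(path, link_gbps)
        if hops <= affinity.max_hops and bw >= affinity.min_bandwidth_gbps:
            candidates.append((hops, -bw, path))
    return min(candidates)[2] if candidates else None


# Six switches (assumed n >= 3), 10 Gbps links, one congested 1 Gbps link.
links = {frozenset({i, (i + 1) % 6}): 10.0 for i in range(6)}
links[frozenset({0, 1})] = 1.0

iscsi = Affinity("iscsi", max_hops=4, min_bandwidth_gbps=10.0)
print(pick_path(6, 0, 2, links, iscsi))
```

In this example the short clockwise path crosses the slow link, so the sketch falls back to the longer way around the ring. Tighten `max_hops` to 2 and no path qualifies at all, which is the kind of trade-off the two constraints force.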

Plexxi switches use merchant silicon (Broadcom ASICs) to form an Ethernet ring on top of a WDM lightwave. By changing lambdas, Layer-1 connections between switches can be changed according to the application requirements.

Plexxi uses its own closed APIs for communication between its switches and its controller to convey affinity information. However, it opens up its proprietary northbound API for user-to-controller communication, so users can write scripts against it, for example over REST. Interestingly, Plexxi is a member of the Open Networking Foundation. The Controller places TCAM entries in switches based on application requirements for affinitized traffic.

Terpstra discussed two use cases:

  1. Affinitized iSCSI traffic for the most bandwidth with the fewest hops
  2. Cloud provider – Use a Plexxi ring as a premium service to affinitize traffic.

In neither case are the results mentioned.

So far, Plexxi’s solution is a 1 RU box that can prioritize traffic based on hop-count and bandwidth. I fail to see much of a business case there. Any network engineer worth his or her salt will tell you that there is more to traffic classification and prioritization than just hop-count and bandwidth. Financial trading institutions would be more concerned about latency guarantees. Hop count alone is a flimsy criterion to classify important traffic, regardless of whether a cute term like Affinity Network is given to that classification. High Availability is a critical issue that a ring topology exacerbates. As Doug Gourlay of Arista mentions, unnecessary downtime is introduced any time you add new nodes because the ring is broken. Moreover, the network is reduced to a split brain model in the event of just two nodes going down. Depending on the Controller placement, this could have adverse outcomes. The thing about outages is that we can never control where they occur. As Gourlay rightly puts it:

I thought Token Ring died for good reasons… why is someone trying to bring it back?
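The split-brain concern can be checked with a small sketch. It assumes a simple physical ring in which each switch connects only to its two neighbors. That is my assumption for illustration, and Plexxi's actual optical interconnect may well differ. The function name `ring_partitions` is hypothetical.

```python
def ring_partitions(n, failed):
    """Group the surviving switches of an n-node ring into reachable segments.

    Assumes a plain ring: switch i is linked only to (i - 1) % n and (i + 1) % n.
    """
    failed = set(failed)
    alive = [i for i in range(n) if i not in failed]
    if not alive:
        return []
    if not failed:
        return [alive]

    # Start scanning just after a failed switch so no segment wraps around.
    start = (min(failed) + 1) % n
    segments, current = [], []
    for k in range(n):
        node = (start + k) % n
        if node in failed:
            if current:
                segments.append(current)
                current = []
        else:
            current.append(node)
    if current:
        segments.append(current)
    return segments


print(ring_partitions(8, {2}))     # one failure: still a single segment
print(ring_partitions(8, {2, 6}))  # two non-adjacent failures: two islands
```

Under this assumption a single failure leaves every surviving switch reachable, while two non-adjacent failures split the ring into two islands, only one of which can contain the Controller. Two adjacent failures still leave one segment, so where the failures land matters.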

Getting back to the Lean Startup idea, Terpstra said, “Our Layer-3 affinities are coming”. Plexxi is targeting Christmas 2012 for version 1.0 of its Layer-3 capabilities. Until then, Plexxi only has a Layer-2 switch with no quotable value to show for $50 million in investment. Not a good time to Pivot.

Reports of the death of the Core switch in the Data Center have been greatly exaggerated.


4 thoughts on “Challenges that Plexxi Faces”

  1. My two cents…I think you sell their solution somewhat short, because not every possible detail that could have been discussed was covered in a 45 minute podcast. Plexxi did bring up latency as another prioritization element you can apply to an affinity, and I believe it came up when we recorded? (I won’t say for certain, as I’ve had a few conversations with them, and I muddle what we talked about with them in the podcast vs. what I’ve talked to them about in person.) Financials are one of the verticals they are targeting and that has shown interest in Plexxi.

    As I understand it, the Plexxi controller is a place to get started – an example of what is possible, while not being limiting or defining. The use-cases Marten brought up were a couple of simple ones to get the idea across of what could be done, as opposed to a complete list of capabilities.

    As far as the ring topology goes, there is redundant functionality built in for when a break occurs, I believe. I recall speaking to them about that, but as it was months ago, I don’t remember the details. IIRC, there is more complexity to the ring than “if it breaks, there’s an outage.” That’s worth a question to Plexxi, though. Doug’s extremely personable, but I’ve heard him present enough to know that he has no qualms using hyperbole or generalization to denigrate competitor solutions. And while he’s usually quite funny when he lets fly a zinger, he obviously has an agenda that will bias his assessments. His comments at GigaOm are typical of that approach. (Does he really think designers would build a single Plexxi domain to handle multiplied thousands of 10G ports? Or that the current Plexxi solution is even intended to fit in the spaces where such a thing would be a requirement? Of course he doesn’t. Hyperbole.)

    RE: split-brain. In your estimation, what elements are at risk, assuming 2 breaks in the ring? What devices will assert themselves, and in what role, to cause an issue? As currently an L2 topology only, I don’t see much of an issue here (other than the obvious host isolation and/or confusion if MLAG is split between an isolated and non-isolated switch). A Plexxi switch can forward independent of a controller. As you summarized, it’s just an Ethernet switch with a WDM optical interconnect and TCAMs that can take flow entries that are similar to (but not in fact) OpenFlow. Depending on what the L3 implementation looks like, a segmented L3-capable ring could present a larger concern in my mind.

    I’m coming off like a Plexxi apologist here, and I don’t mean to be. I think they have an uphill battle to climb as they are a displacement technology. But I do think it’s an interesting solution. I think WDM is useful as employed and the concept could be scaled up rather large in future switches without too much imagination. I think eliminating a layer of switches *is* plausible in certain scenarios, especially in pods or small, predictable deployments. And I think the orchestration/API functionality is obviously useful, if not especially a differentiator for the long haul.

  2. Thanks for visiting my blog. I agree, it is an interesting solution and I appreciate the controller-based model. I see that Plexxi has written a detailed riposte to Doug’s comments on GigaOM. They do clarify some concepts for me. In some ways Plexxi should be thanking Doug for his comments. These are concerns that I’m sure many people have. Networking evolves and weaker protocols such as Token Ring get phased out by the more durable ones such as Ethernet. People would want to know why ring-like topologies are back.
