Plexxi Capacity Planning Done Right
I try to get a daily dose of at least a few minutes of a Packet Pushers podcast to keep up with the latest networking products and trends. In Show 126 – Plexxi & Affinity Networking With Marten Terpstra – Greg and Ethan meet with Marten Terpstra of Plexxi and dig into the new GA product offering from Plexxi.
Plexxi Affinity Networking
Plexxi defines “affinitized traffic” as traffic that matches a proactively installed rule. That rule forwards the data plane traffic based on a TCAM entry rather than a destination MAC address looked up in a MAC-address-to-port key->value mapping. This differs from OpenFlow in how the controller interacts with the switch. In OpenFlow, the first packet of a flow comes into the switch and creates a packet-in event: the switch encapsulates the first packet's fields/tuples and punts it to the controller. Plexxi instead focuses on configuration management by pre-populating forwarding/flow tables.
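For contrast, here is a hypothetical sketch of the two models: a reactive handler that installs a rule only after a packet-in event, versus proactive pre-population of the table before any traffic arrives. The classes and method names are illustrative only, not Plexxi's or any real controller's API.

```python
# Hypothetical sketch: reactive (OpenFlow-style) vs. proactive (Plexxi-style)
# flow installation. FlowTable and the controller object are illustrative only.

class FlowTable:
    def __init__(self):
        self.rules = []                      # ordered list of (match, action)

    def install(self, match, action):
        self.rules.append((match, action))   # programmed into the hardware table

    def lookup(self, pkt):
        for match, action in self.rules:
            if all(pkt.get(k) == v for k, v in match.items()):
                return action
        return None                          # table miss


def reactive_pipeline(table, controller, pkt):
    """OpenFlow model: a table miss punts the packet to the controller,
    which then installs a rule for subsequent packets of the flow."""
    action = table.lookup(pkt)
    if action is None:
        match, action = controller.decide(pkt)   # packet-in -> flow-mod
        table.install(match, action)
    return action


def proactive_pipeline(table, policies):
    """Plexxi-style model: rules derived from affinity policy are pushed
    ahead of time; the data plane never waits on the controller."""
    for match, action in policies:
        table.install(match, action)
```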
Broadcom ASICs in the Switch, x86 in the Controller
Plexxi uses Broadcom ASICs, presumably the Trident+ or Trident II chipset. Flow table scale is still a problem in today's merchant silicon; off-the-shelf top-of-rack silicon only supports roughly 3,000-6,000 wildcard flows. Binary CAM (BCAM) is what a typical MAC address table uses: a key, such as a MAC address, either matches an entry or it does not (binary). If a frame comes in destined for a MAC address that is not in the table, the switch floods all ports for that destination until it is learned. Ternary CAM (TCAM) can return a match, a non-match, or something often referred to as a “don't care” bit. The don't care bit is denoted with a wildcard.
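To illustrate the distinction, here is a small sketch of my own (not vendor code) of binary exact matching versus ternary matching, where a mask marks the don't care bits:

```python
# Illustrative sketch of BCAM vs. TCAM matching on a 48-bit MAC address.

def bcam_match(table, mac):
    """Binary CAM: the key either matches an entry exactly or it does not."""
    return table.get(mac)          # e.g. {mac: egress_port}

def tcam_match(entries, mac):
    """Ternary CAM: each entry carries a value and a mask; bits where the
    mask is 0 are 'don't care' (wildcarded)."""
    for value, mask, action in entries:
        if (mac & mask) == (value & mask):
            return action
    return None

# One TCAM entry matching any MAC whose upper 24 bits are 00:00:C0
# (lower 24 bits wildcarded):
entries = [(0x0000C0000000, 0xFFFFFF000000, "forward-to-uplink")]
print(tcam_match(entries, 0x0000C0FFFFEE))   # -> "forward-to-uplink"
```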
Example of TCAM Flow Table Wildcard Matching in SDN
If I have a simple four-tuple rule consisting of src_mac, dest_mac, src_ip and dest_ip, a flow table entry could match:
- src_mac: *
- dest_mac: 00:00:C0:FF:FF:EE
- src_ip: *
- dest_ip: 172.16.1.20
In that example src_mac and src_ip are wildcarded, meaning those values are ignored, while the two destination addresses have explicit values. That rule takes up one entry in the flow table. Roughly 5,000 flow entries drain very quickly considering a single host can have thousands of flows at any given time. Presumably, just as in OpenFlow programming and design for scale, coarse flow rules are installed to match large swaths of traffic rather than large numbers of individual flows. This is very similar to access lists in traditional networks.
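As a minimal sketch (field names and the per-host numbers are my own assumptions drawn from the ranges above), here is how one wildcard entry covers many distinct flows while per-flow entries would not fit:

```python
# Minimal sketch of the four-tuple wildcard rule above, plus rough arithmetic
# on why coarse rules matter against a ~5,000-entry table.

RULE = {
    "src_mac": "*",
    "dest_mac": "00:00:C0:FF:FF:EE",
    "src_ip": "*",
    "dest_ip": "172.16.1.20",
}

def matches(rule, pkt):
    """A '*' field is a don't care; explicit fields must match exactly."""
    return all(v == "*" or pkt.get(k) == v for k, v in rule.items())

# Two different flows from two different sources hit the same single entry:
flow_a = {"src_mac": "00:00:C0:AA:AA:01", "dest_mac": "00:00:C0:FF:FF:EE",
          "src_ip": "10.0.0.11", "dest_ip": "172.16.1.20"}
flow_b = {"src_mac": "00:00:C0:BB:BB:02", "dest_mac": "00:00:C0:FF:FF:EE",
          "src_ip": "10.0.0.12", "dest_ip": "172.16.1.20"}
print(matches(RULE, flow_a), matches(RULE, flow_b))   # True True

# Installing per-flow exact entries instead would exhaust the table quickly:
hosts, flows_per_host, tcam_entries = 40, 2000, 5000
print(hosts * flows_per_host > tcam_entries)   # True: 80,000 entries vs 5,000
```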
Lambda Driven Fabric
The new twist that Plexxi offers is built-in Wavelength Division Multiplexing (WDM), Coarse WDM specifically. A simple way of thinking about WDM is passing light through a prism: each color that breaks out is a wave with an associated wavelength measured in nanometers (nm), and the spacing between those wavelengths defines the channels. In traditional chassis and stackable (virtual and physical) switches, control channels are set up for management and for FIB or flow table downloads, almost always using proprietary encapsulation or messaging. I am assuming Plexxi sets up a lambda or two (if not in a physical ring topology) for controller-to-switch communications.
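For reference, the standard CWDM grid defined in ITU-T G.694.2 spaces 18 channels 20 nm apart between 1271 nm and 1611 nm; how many of those channels Plexxi actually lights is something I cannot confirm, but the grid itself looks like this:

```python
# ITU-T G.694.2 CWDM grid: 18 nominal center wavelengths, 20 nm apart.
# Which (and how many) of these channels Plexxi uses is left open here.

CWDM_GRID_NM = list(range(1271, 1612, 20))   # 1271, 1291, ..., 1611

print(len(CWDM_GRID_NM))                 # 18 channels
print(CWDM_GRID_NM[0], CWDM_GRID_NM[-1]) # 1271 1611
```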
Figure 1. Plexxi Controller Architecture.
What Plexxi is
- Traffic Engineering on Steroids.
- Affordable physical overlays via lambdas.
- Layer2 Ethernet flooding and learning.
- Capacity planning in the data center with WDM.
- Network management through a proprietary API (RESTful) that will be published. That said, there is no such thing as a “standardized” northbound API today other than the OpenStack Quantum API.
- A bus made up of Ethernet waves.
- One channel/wave/lambda.
- Broadcom ASICs.
What Plexxi isn’t
- Reactive forwarding. All flows are proactively installed by the controller into the switch's TCAM.
- Controllers are not involved in the data path other than the initial download or updates of the switch's proactive TCAM flows.
- Layer3 forwarding until Q1. It may already be supported; I have an inquiry out and will update when I hear back.
- L4 port matching, though it is on the roadmap.
- Does it do L2-L4 header rewrites? I will also find that out and update the post.
- Does the Plexxi logic have any sense of the local/normal FIB pipeline in a hybrid fashion?
My Thoughts
I had one of those “oh duh” moments. This is truly bandwidth on demand. Operationally, capacity planning is a train wreck today. QoS is unmanageable, and unless operations are highly disciplined and agile, capacity does not get added as thresholds are exceeded. Adding capacity, whether scaling out in bundles or scaling up by replacing links, requires not only technical efficiency but also budgetary planning.
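As a toy illustration of the threshold-driven part (my own numbers, nothing vendor-specific), the scale-out math is simple even if the budgeting and change windows are not:

```python
# Toy capacity check: how many 10G members does a bundle need to keep
# per-link utilization under a planning threshold? Assumes perfect hashing.
import math

def members_needed(offered_gbps, link_gbps=10, max_util=0.7):
    """Scale out: LAG member count that keeps each link below the threshold."""
    return math.ceil(offered_gbps / (link_gbps * max_util))

print(members_needed(25))    # 4 members to stay under 70% on 10G links
```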
Photonic switching, which is a fancy way of saying lambda switching, eliminates the need for O-E-O conversions (O being Optical and E being Electrical). Every time a frame or packet is forwarded in and out of a switch or router fed via optics, that conversion takes place. Photonic switching eliminates the conversion, which has economic and, theoretically, performance implications. Since Plexxi plumbs a wave through multiple switch hops in a ring, it is an optical path from A to Z. Cabling and 3-tier/2-tier architectures are all conceptually simplified.
As the cost of optics continues to go down, WDM should continue to be a viable solution for bandwidth scaling in data centers. Optical networking is already pervasive in WAN architectures. Fiber availability continues to grow scarcer. NSPs are still relying on fiber builds from the dotcom boom of the 1990s where possible to avoid the CapEx associated with investing in optical infrastructure. Fiber scarcity will continue to increase as mobile backhaul drains availability. NSP CTO types are bullish on SDN. I think what Plexxi is doing in the data center is a precursor to what service providers are looking for out of SDN. It is creating a multi-layer SDN strategy.
Flow-based forwarding, while conceptually straightforward, is not easy to program. There are not piles of reusable code lying around GitHub to implement it. It is networking from scratch. The nuances of ARP, broadcast, and multicast traffic that need to be dynamically allowed for basic operations are not trivial. That is a primary reason I lean towards open source, or dare I say standardized, approaches to pave the way if new abstraction layers are going to be the future.
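To give a flavor of the “networking from scratch” problem, here is a hypothetical sketch of the baseline rules a flow-based design has to seed before any unicast policy even matters; the match/action vocabulary is illustrative, not any real controller's API.

```python
# Hypothetical baseline rules a flow-based controller must install just to
# make a segment behave like Ethernet. Field names and actions are
# illustrative, not a real controller API.

ETHERTYPE_ARP = 0x0806
BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"

baseline_rules = [
    # ARP has to flood (or be proxied) or hosts never resolve each other.
    ({"eth_type": ETHERTYPE_ARP}, "flood"),
    # L2 broadcast similarly needs an explicit rule in a proactive model.
    ({"dst_mac": BROADCAST_MAC}, "flood"),
    # Multicast control traffic (e.g. IPv6 ND, routing protocols) as well.
    ({"dst_mac": "33:33:*"}, "flood"),
    # Everything else misses and is either punted or dropped by policy.
    ({}, "punt-to-controller"),
]

for match, action in baseline_rules:
    print(match, "->", action)
```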
Plexxi brings something different, which is the optical twist. The promise of application awareness is a good vision, but I don't see anything from anyone yet that solves the problem at the edge of data classification through analytics. I expect other vendors, especially ones that have optical business units, to follow soon. As someone who gets aggravated about saturated links that should never happen, where the long funnel of scaling links out or up is often not executed in time, I appreciate the simplicity of the approach. I will be interested to see whether Plexxi stays with its current proprietary controller-to-hardware proactive architecture *if* vendors loosely agree on a framework and carriers adopt multi-layer SDN strategies that integrate optical, MPLS and IP. That is the beauty of commercial silicon: the differentiation is in software, along with the flexibility for vendors to adjust their architectures in software rather than heading to the foundry to re-spin ASICs. Service provider architectures scale and they buy in large volume, that's why I run MPLS in data centers, doh did I say that? 🙂
Additional Resources
More Plexxi Information:
- Show 126 – Plexxi & Affinity Networking With Marten Terpstra
- Discuss the show at the Packet Pushers Forum
- Backed by $48M, startup Plexxi unveils SDN wares – Jim Duffy
- Plexxi Website and Whitepapers
I also recommend a few traditional switch deep-dive podcasts since they are very similar conceptually.
- Show 118 – Juniper MX Series
- Show 64 – Catalyst 6500 Supervisor 2T Deep Dive With Cisco TME’s Patrick Warichet + Scott Hodgdon
- Show 11 – Brocade VDX 8770 – Technical Deep Dive
Some Related OpenFlow Topics:
- Show 128 – Big Switch Networks Demos Big Virtual Switch & Big Tap
- SDN Use Cases for Service Providers (NFV)
- OpenFlow Review: Traditional Network Devices-IosHints
Thanks for stopping by.
Hi Brent,
This is a pretty smart write-up on Plexxi. You figured out a lot more about what we are doing than most people, who think we added token-ring to an Ethernet switch. A couple of quick comments:
– Lambda driven fabric – we often refer to Plexxi rings in our designs as a fitting domain for the algorithms, but yes, it is really a reconfigurable fabric. A fabric that is diurnal and where the orchestration can be expressed through the API from Plexxi Control.
– We built our own controller (called Plexxi Control) because the performance and computational requirements are far beyond what has been attempted in prior commercially available or demonstrated control systems. We wanted the performance to scale with each switch in the network, not degrade, so we federated the piece of the controller that manages local conditions and flow setup, and we centralized the planning capabilities, which are abstracted by the affinity configuration. We manage state on every switch with a dedicated on-board high-performance microprocessor. Plexxi Control (the central controller part) exists out of the data path, as you stated, to calculate topology based on user-directed affinity guidance. State and topology calculation are separate functions. In the Plexxi fabric (i.e. the ring), 100% of the capacity is used, and how it is configured, deployed, reconfigured and allocated is influenced by Plexxi Control. Because the controller function is separated and distributed with the switches, a Plexxi network can work autonomously without Plexxi Control.
– To say we do not do reactive mode is not correct, but since we are not using OF, it is not an apples-to-apples comparison. For all residual traffic (where the user has not defined specific Affinity policies), Plexxi Control can program the switch forwarding hardware (the actual switching ASIC) to best leverage all the diverse paths across the ring (direct optical paths, indirect switch-hop paths, and combinations of the two) intelligently without using TCAM entries. The federated co-controller manages flow setup in essentially a reactive mode; it just happens to reside on the switch. For specific traffic requirements that require an end-to-end path, Control creates a path through the network to meet those needs using a combination of TCAM and reprogramming of the fabric as necessary. Both of these modes operate simultaneously, so you do not have to decide about reactive or proactive ahead of time; the system manages that based on what is needed by the applications.
– We do not use OpenFlow and have some basic fundamental (philosophical) differences from the OF model. Instead of trying to abstract switch internals via an API (a la OF), our model is to provide a set of tools and inputs to users (meaning applications or orchestration management systems) to influence or directly control the behavior of the entire network. We manage the switch internals ourselves, including managing TCAM entries directly, etc. These internals are not exposed to the user except via our Affinity controls, which by their nature are much higher level controls – they allow the user to describe workload characteristics that will influence the underlying network connectivity.
– We think of applications as programs that use the network. Applications can express needs, desires, and preferences to the network via Plexxi Control. Well done on your part figuring this out, and yes, we can be smart about how we populate TCAM tables because we have central control and state control on the switches.
/wrk
Hi Bill, appreciate you taking time to clarify some of the salient points.
“Reactive” is certainly relative today. Interesting about the CPU/SoC glued onto the board; I didn't hear that in the Packet Pushers episode. Must say I am curious about that now (payload?). Also wondering if the pipeline between modes in a hybrid config can be flipped: fast-path first, slow-path second, or vice versa, depending on optimization.
From an altruistic standpoint, it would be nice to see fragmentation north of the controller rather than south of the controller, but I suppose it's all flexible with commodity parts and can be adjusted if there is consolidation on wire protocol X.
I am interested in learning more about the CP/DP algorithm. @buraglio and I are going to chat with Mat for some Q&A.
Thanks for the extra info and comments.
Respect,
-Brent
Interesting system to be sure, but for the same price couldn’t I build a full-bisection-bandwidth electrical fabric and just not worry about bandwidth? This seems like a case where the money saved by innovation is exactly offset by the cost of that innovation.
Wes – the cost of a full bisection bandwidth fabric does not necessarily scale linearly, so what part of the scale spectrum we are talking about makes a bit of a difference. Certainly for 10G hosts at any interesting scale, costs become prohibitive for full bisection very quickly. We also see lots of different proposed physical topologies that get better scale with moderate oversubscription, but many of these schemes have better characteristics for specific applications, forcing customers to have different physical topologies and manage them separately.
I would also question the ease with which any of these fabrics are managed and controlled. Making sure a full bisection network provides full bisection at L2 isn't straightforward, and at L3 it creates segmentation issues that may not be tenable for some environments.
The network we are talking about offers variable oversubscription, controlled by the needs of applications, at the cost of a mid-OSR conventional fabric.