How I Learned to Hate the DCI: The Layer 2 data center interconnect (DCI) is still alive and well. I blame VMware with vMotion, and now every other hypervisor vendor on the planet. Live workload migration is certainly vital to most operations. The further up the stack that migration happens, the more flexibility you have in choosing which layers beneath you to reach for in the toolkit.
The problem that needs solving is how to move a live workload that a client is communicating with from one hypervisor (a physical piece of gear) to another without losing TCP state or interrupting service delivery. You are essentially replicating memory and draining the session from one VM to another, and to do that you have to keep the IP address the same for the client. The band-aid is the overlay and tunneling mechanisms being spun up daily to extend the broadcast domain from one location to another without native end-to-end Layer 2 connectivity. Basically, this goes against every sane networking principle we have implemented for the past 10 years.
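To make the "replicating memory and draining the session" part concrete, here is a toy model of the pre-copy approach most hypervisors use: copy all memory pages while the guest keeps running, then re-copy whatever the guest dirtied, and repeat until the remainder is small enough to move during a brief pause. The function name, thresholds, and dirty-rate model are my own illustration, not any vendor's algorithm.

```python
def pre_copy_migrate(pages, dirty_rate=0.1, max_rounds=10, stop_threshold=8):
    """Toy model of pre-copy live migration.

    pages: total guest memory pages to move.
    dirty_rate: fraction of copied pages the running guest re-dirties
    per round (assumed constant here for simplicity).
    Returns (rounds, pause_pages): iterative rounds run, and the pages
    left for the final stop-and-copy pause.
    """
    remaining = pages
    rounds = 0
    while remaining > stop_threshold and rounds < max_rounds:
        copied = remaining
        # While we copy, the guest dirties some of those pages,
        # so they must be sent again in the next round.
        remaining = int(copied * dirty_rate)
        rounds += 1
    # Pause the VM briefly, copy the last dirty pages, and resume on the
    # destination with the same IP/MAC -- which is exactly why the
    # broadcast domain has to span both sites.
    return rounds, remaining
```

With 100,000 pages and a 10% dirty rate, the copy set shrinks by 10x per round, so the migration converges in a handful of rounds; a dirty rate near 100% never converges, which is why write-heavy workloads migrate poorly.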
Two huge problems over distance:
- An approximately 5 ms round-trip latency budget for live migration.
- Storage must be replicated between both sites.
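The 5 ms budget turns into a hard geography problem once you do the arithmetic. Light in single-mode fiber travels at roughly c divided by the refractive index (~1.47), about 204 km per millisecond one way, and that is propagation delay alone. A quick sketch:

```python
# Rough speed of light in single-mode fiber: c / refractive index (~1.47),
# giving about 204 km per millisecond, one way.
FIBER_KM_PER_MS = 299_792.458 / 1.47 / 1000  # ~203.9 km/ms

def rtt_ms(fiber_km):
    """Propagation-only round-trip time over a fiber path, in ms.
    Ignores queuing, serialization, and regeneration delay, so real
    numbers will always be worse."""
    return 2 * fiber_km / FIBER_KM_PER_MS

def max_dci_km(rtt_budget_ms=5.0):
    """Longest one-way fiber run that fits inside an RTT budget."""
    return rtt_budget_ms * FIBER_KM_PER_MS / 2
```

A 5 ms round-trip budget caps the fiber path at roughly 500 km in the theoretical best case, and real fiber rarely runs point to point, so the practical radius is much smaller. That is the "fighting time and space" problem in one number.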
The two figures at the bottom depict the networking gear managing the tunneling in one case, and the hypervisors managing the tunnel in the other. Today I prefer the networking components handling that piece for stability reasons such as containing unicast flooding, but in my opinion that will flip fairly quickly in order to distribute the load for massive scale.
IP routing scales; the example below is the Internet.
Figure 1. The Internet scales. Obviously debatable but it beats bridging.
We have basically two options if we want to move live workloads between data centers. Disclaimer: trying to interconnect data centers beyond the same city for real-time replication is fairly risky, since you begin fighting time and space as latency grows with every fiber mile you add to your DCI.
The first option: leave the DCI to the networking hardware and nail up tunnels. MPLS pseudowires/VPLS/MPLS-TP win in my mind every time. Guess why that doesn't work for many people? The Nexus 7000 does not support VPLS. So do you add yet another layer, or succumb to OTV, a pre-standard cousin of TRILL?
The second option is more interesting. Not necessarily better, but for a hyperscale provider the potential is critical as we head toward open-standard, XML-based API cloud as-a-service movements like OpenStack/CloudStack. As those continue to grow, the networking piece will be pushed for more flexibility in automation and provisioning.
The goal of reducing or demolishing operational staffing over the next decade or two (no-ops) will require programmatic flexibility through APIs, so the entire operational process can be handed to an orchestration component. Today's network provisioning and automation isn't quite there, beyond scripting from orchestration solutions and SNMP-style configuration that has never really gone anywhere. Abstract it with the rest of the stack into software on the hypervisor soft switch, and scale out and up on the same framework. Workloads and data sets are what make it tricky in my opinion, especially outside of the cloud and content provider space: a typical enterprise delivers every app possible at a smaller scale, and thus needs a very flexible compute platform.
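What "handing provisioning to an orchestration component" might look like can be sketched in a few lines. Everything below is hypothetical illustration in the spirit of OpenStack-style tenant networking: the class names, the VLAN allocation scheme, and the fake driver are mine, not a real API. A real driver would push configuration over NETCONF or a CLI rather than mutate a dict.

```python
class FakeSwitchDriver:
    """Stands in for a vendor device driver; a real one would talk
    NETCONF/CLI to the switch instead of updating a dict."""
    def __init__(self):
        self.vlans = {}

    def create_vlan(self, vlan_id, name):
        self.vlans[vlan_id] = name
        return {"vlan": vlan_id, "name": name, "status": "active"}


class Orchestrator:
    """The component the no-ops model hands control to: tenants request
    networks through an API, and it provisions them with no human in
    the loop."""
    def __init__(self, driver):
        self.driver = driver
        self.next_vlan = 100  # hypothetical tenant VLAN pool start

    def create_tenant_network(self, tenant):
        vlan = self.next_vlan
        self.next_vlan += 1
        return self.driver.create_vlan(vlan, f"tenant-{tenant}")


orch = Orchestrator(FakeSwitchDriver())
net = orch.create_tenant_network("acme")
```

The point is the shape, not the code: once the network exposes this kind of interface, the same orchestration framework that places VMs can place their networks, which is exactly the abstraction the hypervisor soft-switch camp is betting on.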
I still say the further up the stack you go, the more options you have in most cases. Global server load balancing (GSLB) offers scale and performance. Google's DNS play has been interesting to watch take shape, and today, in my opinion, it points the way toward global scale.
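GSLB sidesteps the Layer 2 problem entirely: instead of stretching a broadcast domain, you answer a DNS query with the VIP of the nearest healthy data center and let routing do the rest. A toy version of that decision, with made-up site names, coordinates, and health flags:

```python
import math

# Hypothetical site inventory: each data center advertises its own VIP.
SITES = {
    "us-east": {"vip": "203.0.113.10", "lat": 39.0, "lon": -77.5, "healthy": True},
    "us-west": {"vip": "203.0.113.20", "lat": 45.6, "lon": -121.2, "healthy": True},
    "eu-west": {"vip": "203.0.113.30", "lat": 53.3, "lon": -6.3, "healthy": False},
}

def distance(lat1, lon1, lat2, lon2):
    """Crude planar distance in degrees; good enough to rank sites."""
    return math.hypot(lat1 - lat2, lon1 - lon2)

def resolve(client_lat, client_lon):
    """Answer a query with the VIP of the nearest healthy site,
    or None if every site is down."""
    healthy = [s for s in SITES.values() if s["healthy"]]
    if not healthy:
        return None
    best = min(healthy,
               key=lambda s: distance(client_lat, client_lon,
                                      s["lat"], s["lon"]))
    return best["vip"]
```

A client in London gets steered to us-east here because eu-west is marked down; new TCP sessions land at the surviving site, no stretched VLAN required. The trade-off is that existing sessions break on failover, which is the price of working above the network instead of inside it.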