In light of recent IaaS provider outages, it is easy to understand that organizations are hesitant to move critical infrastructure into the cloud. Yet, the flexibility and potential cost savings are too attractive to just dismiss. So, how can a responsible organization go about moving part of its network and server infrastructure into the cloud, without exposing itself to undue risk and without putting all eggs into a single IaaS provider’s basket?
The answer is to pursue a multi cloud strategy: Use more than one cloud provider. For example, instead of having all your servers with Amazon EC2, also have some with Rackspace. Or at least, have servers in multiple EC2 geographic regions. Ideally, configure one to be the backup for the other, so that in case of failure a seamless and automatic switch over may take place. This, however, is not made easy, due to proprietary management interfaces for each provider, and because traditional network tools often cannot be used across provider networks.
In this article, we will show how we can accomplish rapid failover of cloud resources, even across provider network boundaries. We use a simple example to show how to construct a seamless and secure extension of your enterprise network into the cloud, with built-in automatic failover between servers located in different cloud provider’s networks.
Setup and tools overview
For our example, we bring up two servers, one in the Amazon EC2 cloud, the other in the Rackspace data center. This demonstrates the point of being able to cross provider boundaries, but of course, if you prefer you could also just have a setup in multiple geographic regions of the same provider.
In addition, we use vCider’s virtual network technology to construct a single network – a virtual layer 2 broadcast domain – on which to connect the two servers as well as agateway that will be placed into the enterprise network. The gateway acts as the router between the local network and the virtual network in the cloud, securely encrypting all traffic before it leaves the safety of the corporate environment.
Finally, we use Linux-HA’s Heartbeat to configure automatic address failover between those servers in the cloud, in case one of them should disappear. The IP address failover facilitated by Linux-HA requires the cluster machines to be connected via a layer 2 broadcast domain. This rules out deployment on IaaS providers like Amazon EC2 and others, which do not offer any layer 2 networking capabilities. However, as we will see, it works perfectly fine on the layer 2 broadcast domain provided by vCider.
The following graphic summarizes the configuration:
Figure 1: Seamless IP addresses failover of cloud based resources: Client requests continue to be sent to a working server.
In figure 1, we can see an enterprise network at the bottom, with various clients issuing requests to address 220.127.116.11. This address is a “floating address”, which may fail over between the two servers at Rackspace and Amazon EC2. The gateway machine (in green) is part of the enterprise network as well as the vCider network and acts as router between the two. We will see in a moment that the address failover is rapid and client request continue to be served, without major disruption.
Step 1: Constructing the virtual network
After bringing up two Ubuntu servers, one in the Amazon EC2 cloud and one at Rackspace, we now construct our virtual network. If you do not yet have an account with vCider, please go here to create one now.
Go to the download page to pick up our installation package and follow the instructions there. In our example, we install it in two cloud based nodes and in one host within our enterprise network. We configure them into a single virtual network with the 172.168.1/24 address range. After we are done, the network looks like this in the vCider control panel:
Figure 2: The vCider control panel after the nodes have been added to the virtual network.
We can see here that virtual IP addresses have been assigned to each node, which can already start to send and receive packets using those addresses.
Step 2: Configuring IP address failover (installing Heartbeat)
Our floating IP address will be managed by Linux-HA’s Heartbeat, a well established and trusted solution for high-availability clusters with server and IP address failover. Thanks to vCider’s virtual layer 2 broadcast domain, Heartbeat can finally also be used in IaaS provider networks that do not natively support layer 2 broadcast.
Heartbeat requires a little bit of configuration. Let’s go through it step by step:
Add your hosts to the
/etc/hosts file and change hostname
First, add these two entries to your
/etc/hosts file on BOTH your server hosts:
This allows us to configure Hertbeat by referring to the cluster nodes via easy to remember server names. Please note that we are using the vCider virtual addresses here. Now change the hostname on each node via the
$ hostname rackspace_server
Do the same on the Amazon EC2 server, respectively.
On your Rackspace and Amazon EC2 server, install Heartbeat:
$ sudo apt-get install heartbeat
On the Rackspace server, create the configuration file
/etc/ha.d/ha.cf with the following contents:
ucast vcider0 172.16.1.2
Note that we are listing both our cluster nodes by name and refer to the IP address of the other node, as well as ‘vcider0′, the name of the vCider network device. For more details about the Heartbeat configuration options, please refer to Heartbeat’sdocumentation.
On the second node, the Amazon EC2 server, create an exact copy of this file, except that the ‘ucast vcider0′ line should refer to the IP address of the Rackspace server. like so:
We now need to establish the authentication specification for both cluster nodes, so that they know how to authenticate themselves to each other. Since all communication on a vCider network is fully encrypted and secured, and since with vCider we can easily cloak our network from the public Internet, we use a simplified setup here, which saves us the creation and exchange of keys. Please create the file
/etc/ha.d/authkeys on both cluster nodes, with the following contents:
Then set the permissions on these files:
$ sudo chmod 600 /etc/ha.d/authkeys
Heartbeat can now be started (on both nodes):
$ sudo /etc/init.d/heartbeat start
Heartbeat comes with its own configuration command:
$ sudo crm configure edit
In the text editor, you will see a few basic lines. Edit the file, so that it looks something like this:
node $id="923cbacf-00af-4b6c-a8ca-2e4aae780038" amazon-ec2-server
node $id="aca44dea-e6f8-4c4c-94ab-087568c32e36" rackspace-server
primitive ip1 ocf:heartbeat:IPaddr2 \
params ip="172.16.1.99" nic="vcider0:1" \
op monitor interval="5s"
primitive ip1arp ocf:heartbeat:SendArp \
params ip="172.16.1.99" nic="vcider0:1"
group FailoverIp ip1 ip1arp
order ip-before-arp inf: ip1:start ip1arp:start
property $id="cib-bootstrap-options" \
In particular, make sure that your two cluster nodes are mentioned and that you define the two ‘primitives’ for the IP address and the IP failover, which mention our floating IP address 172.16.1.99. Also note the definition of the ‘FailoverIp’ group and the ‘order’. Use the
crm_mon command to ensure that Heartbeat reports both cluster nodes in working order.
Within a few seconds, you will notice that Heartbeat has configured the floating address on one of the cluster nodes. You can see it as an alias on the vcider0 network device when using the
ifconfig command. This is also reflected in the vCider control panel:
Figure 3: Heartbeat has configured the floating IP address on one of the cluster nodes, which is now shown in the vCider control panel.
Allowing gratuitous ARP to be accepted
A so-called ‘gratuitous ARP’ packet is sent out by Heartbeat in case of an IP address failover. This packet updates the ARP cache on the gateway machine (and any other attached device, for that matter). The ARP cache is what allows a network connected device to translate an IP address to a local layer 2 address, which is needed for delivery of packets in the local network. Normally, an ARP request is sent by a host in order to learn a local machine’s MAC address. The response back then updates the ARP cache of the sender. A gratuitous ARP, however, is an unsolicited response, sent to the layer 2 broadcast address in the LAN and this seen by all connected devices. It updates everyone’s cache, even without them having to ask for it.
Acceptance of gratuitous ARP packets is disabled by default on most systems. Therefore, on our gateway machine, we need to switch it on. As root, issue this command:
# echo 1 > /proc/sys/net/ipv4/conf/all/arp_accept
Dealing with send_arp
Normally, Heartbeat would be ready to go at this point. However, there is a a small problem with the ‘send_arp’ utility, which comes as part of Heartbeat. This utility is used during IP address failover to send the gratuitous ARP packet as a layer 2 broadcast to all other devices connected on the layer 2 network, in order to inform them about the new location of the floating IP address. A small bug in send_arp prevents it from working 100% correctly under all circumstances. The Heartbeat developers have recently fixed this issue, but that fix is not in all distros’ repositories yet. For example, Fedora 16 already uses the latest version by default, while Ubuntu 11.10 still does not have this fix. Therefore, to be absolutely sure, we simply replace send_arp with a similar utility, called ‘arping’. Just follow these steps:
$ sudo apt-get install iputils-arping # 'yum install arping' on RPM systems
$ sudo mv /usr/lib/heartbeat/send_arp /usr/lib/heartbeat/send_arp.bak
Create a new file in
/usr/lib/heartbeat/send_arp and give it execute permissions. The content of this file should be:
$ARPING -U -b -c $REPEAT -I $INTERFACE -s $IPADDRESS $IPADDRESS
Step 3: Testing our setup
All is configured and in order now, so let’s test the failover. First, log into the gateway machine and have a look at the ARP cache, using the
arp -n command:
Figure 4: The ARP cache on the gateway machine before failover.
We see that the MAC address (‘HWaddress’) for the floating IP address is the same as for 18.104.22.168, which in our example is the MAC address of the vCider interface on the Rackspace server. We will cause an IP address failover in a moment, by issuing a command to put one of the cluster nodes into standby mode. But before we do so, start a
ping 172.16.1.99 either on the gateway node or on one of the enterprise clients. We will observe what it does during the failover.
While the ping is running, let’s log into any one of the cluster nodes in a different terminal and let’s take down the node that currently holds the floating IP address:
$ sudo crm node standby rackspace-server
If you now take a look at the ongoing ping output, you see something like this:
Figure 5: Even during IP address failover, the floating IP address can still be reached.
The red arrow marks the moment at which we took down the first cluster node and Heartbeat moved the floating IP address over to the second node. Our gateway machine is located on the US east coast, the Rackspace server in Chicago, while the EC2 server is at the US west coast. Because the floating IP addresses failed over to a node that’s further away, the network round trip time naturally went up. The key takeaway point, however, is that the floating IP address remained accessible to clients, even as it failed over into the cloud of another IaaS provider!
To see what happened, we can take a look again at the ARP cache of the gateway:
Figure 6: The gateway’s ARP cache after failover.
In figure 6 we highlighted the new MAC address of the floating IP address, showing that it is now identical to the vCider interface’s MAC address of the second cluster node. As a result, all clients connected in the vCider virtual network are able to continue to access the floating IP address.
One aspect we have left out of this blog for brevity’s sake is the replication of content across the failover servers. To accomplish a transparent failover, these machines have to be able to serve identical content to clients. Naturally, the same server software needs to be installed and the content replication itself can be accomplished in a number of ways: Static file duplication, maybe via rsync, a full cluster setup of whatever application or database you are running, or even block level replication via DRBD or similar. This is a concept we hope to explore more in a future blog post.
Linux-HA with Heartbeat is a trusted, reliable high-availability cluster solution, which can ensure continuous availability of resources even in the face of server failure. Key to this functionality is the seamless, rapid update of everyone’s ARP cache via a gratuitous ARP packet, which is sent on the local network as layer 2 broadcast.
In cloud networks, and especially across geographic regions and providers, you do not have a local network and therefore, such an IP address failover is normally not possible. Without a layer 2 broadcast domain, Heartbeat is not able to update ARP caches in case of an IP address failover, which results in service interruption until the ARP cache entries finally time out, thereby restricting applicability of Linux-HA in cloud environments and limiting administrators’ ability to setup high-availability clusters.
Because vCider provides a true layer 2 broadcast domain, sending the gratuitous ARP is possible again, no matter where the nodes in the virtual network are located. Therefore, Linux-HA with Heartbeat can now be used to facilitate IP address failover, even across geographic regions or IaaS provider network boundaries.
Big, Flat Layer 2 Networks Still Need Routing
[ 0 ]March 9, 2011 | Chris Marino
There’s a lot of talk these days about how the solution to all networking problems simply boils down to flattening the network and building Big Flat Layer 2 Networks (BFL2Ns). Juniper is one of the most explicit about their strategy to flatten the network, but all the vendors are making a big deal about they layer 2 strategies.
Big Flat Networks
The appeal of this approach is very compelling. Layer 2 is fast and supports plug-and-play administrative simplicity. Furthermore, some of the most compelling virtualization features including live VMotion require layer 2 adjacency (although there is some confusion on this point) and the single-hop performance it provides is critical for converged storage/data networks.
What you don’t read much about, though, is that there were some very good reasons to split layer 2 broadcast domains and most of those reasons have not gone away. The problems invariably involve some kind of runaway multicast or broadcast storm. Most network admins have experienced the frustration of struggling to figuring out the root cause of these kinds of floods.
When you stretch layer 2 over the WAN, you’re asking for trouble since this relatively low bandwidth link would be the first to go.
Part of the problem is the simplistic nature of how links are used at layer 2. The Spanning Tree Protocol (STP) prevents forwarding loops, but at the same time can funnel traffic into hotspots that can easily overwhelm a single link. TRILL, the proposed replacement as well as other vendor-specific technologies address some of these problems and are already part of vendors’ flat layer 2 roadmaps.
From a technical perspective, all this makes sense and big, flat layer 2 networks have the potential to deliver their promised goals. Nevertheless, I remain skeptical. But for reasons that are completely unrelated to the technology.
I’ve read about technical problems of big layer 2 networks based on the increased likelyhood of failure due to unintentional broadcast storms and other error conditions. While these risks are certainly real, I haven’t read anything that could not be solved with good engineering.
No, what troubles me about these kinds of networks is how they will be used and the expectation of those that deploy them. By this I mean, when you actually go and build one of these BFL2Ns, you’re very likely going to need to segment it into several Smaller, Nearly Flat Networks, bringing you back almost to where you started.
Why? Already, today’s smallish layer 2 networks are routinely segmented into VLANs. What people are looking for from these BFL2Ns is the flexibility not only to segment it into VLANs, but also to enable every endpoint to potentially gain access though any edge device. VMotion from anywhere, to anywhere is a simple way to think about this.
And this will be possible with a BFL2N simply by making every port a trunking port for every VLAN. But if you think about that for a minute, you’ll notice that once you do that, you’ve undermined the VLAN segmentation that you started with and have essentially built giant LAN.
Of course, if you don’t want that, then you’ll prune them, and restrict parts of your BFL2N from other parts by not trunking some VLANs certain places, and that should work just fine.
But don’t forget there is a name for this kind of networking: Routing.
I’m sure there are other administrative benefits from building a BFL2N, but you also have to remember that there are lots of benefits to layer 3 networks as well. And you’ve probable already got one of those.
Why Should I Care More About OpenFlow than Quantum?
[ 3 ]October 24, 2011 | Chris Marino
That’s the question I want answered this week at theApplied OpenFlow Symposium.
I hope I get a good answer.
I was not able to attend last week’s Open Networking Summit, but since some of thepresentations are on YouTube already, I’m going to try to watch them all. I just finished watching Martin’s presentation and it provided a nice historical background on OpenFlow and a great perspective on how it fits in the broader vision of Software Defined Networking. I highly recommend it.
The slide that says it all is shown around 20:50. There Martin says:
OpenFlow is ‘a’ interface to the switch. And in a fully built system it’s of very little consequence. If you changed it, nothing would know.
I’ve recreated one of Martin’s slides here to show how he illustrated this. As you can see SDNs can be implemented through a variety of mechanisms, with OpenFlow or not.
This should not come as a surprise to anyone that’s been following OpenFlow, but its a point that I think has gotten lost in all the media attention surrounding SDNs.
One of the things I wanted to learn more about at the Open Networking Summit was what work is being done on these other interfaces? It seems to me that the application interface is much more important to users than the internal protocol to the hardware. Maybe some of the other presentations will address this directly.
I know companies are working on them. Cisco, VMware for sure. BigSwitch Networks has been talking about these kinds of applications for a while now and Nicira is one of the major contributors to OpenStack’s Quantum Network API.
It seems to me that a standard Quantum-like API would be more valuable toward achieving the objectives of a SDN, than OpenFlow.
Virtual Networks can run Cassandra up to 60% faster….
[ 0 ]September 19, 2011 | Chris Marino
In my previous post I described some of the challenges in running a noSQL database like Cassandra on EC2 and how a virtual network could help. Presented here is a performance comparison between running Cassandra on EC2 using the native interfaces vs. interfaces on a private, vCider virtual network.
Cassandra performance is generally measured by how fast it can insert into the database new key-value pairs. These pairs become rows in a column and can be any size, although they are typically pretty small, often less than 1K. Database performance is frequently tested with small record sizes to isolate database performance and not the performance of the I/O, disk or network components of the system.
However, in this case we want to isolate the impact of network performance on overall database performance. So we measured performance over a range of network intensive runs until other bottlenecks began to appeared. Our network tests ran with increasing column sizes and replication factors. The larger data size runs began to show disk and I/O bottlenecks when there were more than 256 bytes (with a replication factor of 3) so we varied column widths from 32 up to a maximum 256 bytes.
One important point to note: Database performance is highly application dependent, and users often tune their environment to optimize performance. Our tests are in no way comprehensive, but do illustrate the performance impact of a virtual network on the system within the range of configurations and column sizes actually tested. Obviously, for other situations that are either more or less network bound, your performance may vary.
We set up a 4-node Cassandra cluster in a single EC2 region. For our baseline measurements we configured it with the Listen, Seed and RPC interfaces all on the node’s private interface. The client was a fifth EC2 instances running a popular Cassandra stress test load generator. The client was running in the same EC2 availability zone.
For the virtual network configurations, we set up the Seed and RPC interfaces on one virtual network, and the Listen interface on a second. This gave us the flexibility to run tests where we could encrypt only the node traffic while letting client traffic remain unencrypted. The interfaces on the virtual network are actually virtual interfaces. so in reality they all use the private interface on the instance as well.
For the tests we chose to run unencrypted, fully encrypted, and node-only encrypted networks. We also ran tests inserting a single copy of the data (R1, replication factor = 1) and again when writing 3 copies (R3, replication factor 3) spread across 3 nodes in the cluster. Since the nodes are responsible for replicating data, running with R3 generates a lot of internode traffic, while with R1, most of the traffic is client traffic.
The summary results for running a 4 node cluster are:
Cassandra Performance on vCider Virtual Network
Replication Factor 1 32 64 128 192 256 byte cols.
v. Unencrypted: 8.2% 0.8% -2.3% -2.3% -6.7%
v. Encrypted: 63.8% 55.4% 60.0% 53.9% 61.7%
v. Node Only Encryption: -0.7% -5.0% 1.9% 5.4% 4.7%
Replication Factor 3 32 64 128 192 256 byte cols
v. Unencrypted: -4.5% -4.7% -5.8% -4.5% -1.5%
v. Encrypted: 31.5% 29.6% 31.4% 27.3% 29.9%
v. Node Only Encryption: 3.8% 3.9% 6.1% 8.3% 4.0%
The complete data and associated charts are available here.
There is tremendous EC2 performance variability and our experiments tried to adjust for that by running 10 trials for each column size and averaging them. Averaged across all column widths, the performance was:
Replication Factor 1
v. Unencrypted: -3.7%
v. Encrypted: +59%
v. Node Only Encryption: +1.3%
Replication Factor 3
v. Unencrypted: -4.2%
v. Encrypted: +30%
v. Node Only Encryption: +5.2%
As you might expect, the performance while running on a virtual network was a little slower than running on the native interfaces.
However, when you encrypt communications (both node and client) the performance of the virtual network was faster by nearly 60% (30% with R3). Since this measurement is primarily an indication of the client encryption performance, we also measured performance of the somewhat unrealistic configuration when only node communications were encrypted. Here the virtual network performed better by between 1.3% and 5.2%.
The overall improvement for the virtual network from -4.2% to -3.7% for unencrypted R3 v. R1 is understandable since R3 is more network intensive than R1. However, since the vCider virtual network performs encryption in the kernel (which seems to be faster than what Cassandra can do natively) when encryption is turned on, the virtual network performance gains are greater with R3 since more data needs to be encrypted.
We expect similar performance characteristics across regions. However, these gains will only be visible when the cluster is configured to hide all of the WAN latency by requiring only local concurrency. The virtual network lets you assign your own private IPs for all Cassandra interfaces so the standard Snitch can be used everywhere as well.
Once we finish these multi-region test, we’ll publish them too. We’ll also put everything in a public repository that includes all Puppet configuration modules as well as the collection of scripts that automate nearly all of the testing described here.
So, netting this out, if you’re running Cassandra in EC2 (or any other public cloud) and want encrypted communications, running on virtual network is a clear winner. Here, not only is it 30-60% faster, but you don’t have to bother with the point-to-point configurations of setting up a third party encryption technique. Since these run in user space, its not surprising that dramatic performance gains can be achieved with the kernel based approach of the virtual network.
If you are running on one of the other popular noSQL databases, similar results may be seen as well. If you have any data on this, we’ll love to hear from you. If you want to try vCider with Cassandra or any other noSQL database you can register for an account at my.vcider.com. Its free!
VEPA Loopback Traffic Will Overwhelm the Network
[ 1 ]January 31, 2011 | Chris Marino
In my previous post I wrote that under most reasonable assumptions about east-west traffic patterns and virtualization density growth, VEPA loopback traffic could overwhelm your network. Here are some numbers that illustrate the point.
Let’s say you had 24 applications, each requiring 12 servers
for deployment for a total of 288 workloads. Lets also assume that each server required 0.5G of bandwidth for acceptable performance. For simplicity, lets further assume that this 0.5G is distributed uniformly among the 11 other servers required for the application (this is a bad assumption, but we’ll relax it later to show the conclusion remains the same).
We now virtualize this environment in a rack of 12 3U vHosts, each with 4 10G NICs. Each system has 24 cores (6 sockets @ 4 cores/socket) running only one VM per core, for a total of 288 VMs (one for each workload). In this configuration, each NIC supports 6VMs, providing 1.67Gbps (10G/6VMs) per VM of network bandwidth capacity.
So far, all is good. 1.67G capacity per VM when the app only uses 0.5G. This is even more than the 1G capacity that the NICs had when they were running on physical systems before virtualization. But now lets take a look at how much of that traffic gets looped back.
Best case, there could be zero loopback. This would occur when one workload from each app was assigned to one of the 12 available hosts (i.e. app workloads uniformly distributed across hosts). Since all traffic is between workloads running on other hosts, nothing needs to be looped back.
Worst case would be when an entire app is run on a single host and all traffic would have to loop back. Since 24 cores can run 24 workloads, two complete applications can run within a single host. Each workload produces 0.5G of traffic, two complete apps would produce 12Gbps (24 x 0.5G). Looping that back would require a total of 24 Gbps. Spreading that across 4 10GE NICs would consume 24G/40G or 60% of total capacity. Everything still looks fine.
Here the uniform traffic distribution assumption does not matter since even with 100% looped back there is excess network capacity.
Now fast forward a year or two to when virtualization densities grow so that you can run 48 cores on each system and each can run 2 VMs. Instead of 12 vHosts, you now only need 3. Here, with uniform traffic, the best case would assign 4 workloads from each app on each host (again, distributing the workloads uniformly across the hosts). Each of these workloads would need to loopback 3/11 of their traffic totaling 1.1Gbps per app (4 workloads x 3/11 *0.5G total x 2 for loopback). With 24 apps, that would total about 26.2Gbps, just for the loopback traffic.
That’s more than 65% of the peak capacity of the 4 10GE NICs on the system. The rest of the traffic would require 1.45Gbps per app (4 workloads x 8/11 *0.5G total) or 34.9GB. Together this is over 61Gbps or more than 150% of the peak theoretical performance of the NICs! And that’s the best case.
Now lets say you were lucky enough to have a traffic pattern that allowed for the possibility of distributing the VMs so that there was zero traffic looped back. Although highly unlikely, this situation might be possible with a multi-tiered apps where workloads communicated with only a small subset of other workloads and workloads were deliberately placed to reduce loopback traffic. Even then, the total bandwidth necessary per app would be 2Gbps (4 workloads * 0.5G), or 48G total for the host. Still more than 120% of peak capacity.
Of course you can add more servers and/or more NICs to get this all to work, but that reduces virtualization density and increases the required number of switch ports. To get back to the numbers of the first example you would need to quadruple the number of NICs, negating all the benefit of higher virtualization densities on the network.
You can play around with the numbers, but no mater what you do, as virtualization densities grow, the amount of network capacity consumed by hairpin traffic is going to quickly dominate. To me, the only reasonable way to address this is for the hosts to be smarter about how they handle network traffic.
We’re very pleased to have been selected for the GigaOm Structure Launchpad Event, where we’ll be one of a handful of start-ups introducing new solutions for the cloud. What’s our solution?
We’re building the industry’s first on-demand multi-layer distributed virtual switch for the cloud. Using our switch, you will be able to connect all of your systems, wherever they may be located, in a single layer 2 broadcast domain. This gives the network-address control and security you’re used to having in your data center, but now in a cloud or hybrid infrastructure.
Today, cloud providers don’t offer layer 2 connectivity among their systems. Worse yet, even if they did, you wouldn’t be able to extend your internal LAN out to the cloud without all sorts of complicated gateways and NAT devices along the way. And if you wanted to just connect systems between IaaS providers, you’re pretty much dead in the water.
With our solution you’ll be able to build a virtual network that can do all of this. If you’re interested in trying it our, send an email to email@example.com and we’ll send you an invite to our beta users program as soon as we’re ready.
Secure Virtual Network Gateway for Hybrid Clouds
Secure access to your vCider VPC is provided through a Virtual Network Gateway.
A Virtual Network Gateway is an on-premises system that has been added to a virtual network. The gateway system must also be on the local enterprise network that requires access to the VPC.
Setting up a virtual network gateway is fast and easy.
vCider software automatically detects which physical networks are accessible from each system on the virtual network. The vCider Management Console presents a list of all these potential network connections. Through the console, the user then select the system (or systems) to be configured as a gateway.
Once the gateway is specified, that system is configured to route packets from the secure encrypted external virtual network on to the physical network it is connected to.
vCider then automatically configures all the other systems on the virtual network with routes that specify the gateway system as the path to the enterprise LAN.
Internal to the enterprise, the firewall must be configured to enable access to the appropriate networks and a route must be specified.
Once the Virtual Network Gateway has been configured, all that remains is to ‘cloak
‘ the virtual network to complete the creation of a Virtual Private Cloud (VPC).
Moving to the cloud? Virtual networks keep you in control
[ 0 ]July 13, 2011 | Juergen Brendel
Multi-tier applications, cluster setups, failover configurations. These are all daily concerns for any server or network administrator responsible for deploying and maintaining non-trivial projects. Yet, it is exactly these things, which are greatly complicated when you consider moving your applications to cloud-based IaaS providers, such as Amazon EC2. It is somewhat ironic that in those heavily virtualized environments one of the most important types of virtualization is largely missing: The virtualization of the network topology. Since we at vCider provide solutions for the creation of virtualized networks, I want to take a moment and talk about why you need the ability to create your own virtual network topologies and how this can greatly simplify your life when you are moving into the cloud.
In the cloud, you have no control over the network
There are two fundamental problems. First, there is the simple fact that you don’t have any control over the IP address assignment when you start a machine instance on Amazon or Rackspace. You don’t control the network or how the IaaS provider is using it. Second, because you don’t control the network, you also cannot set up your own layer 2 broadcast domains either. Let’s take the example of simply wanting to setup a duplicate or clone of your on-premise network, maybe for redundancy and availability. The clone should be an exact replica of your original network, if possible.
This is difficult for a number of reasons. Let’s start with the IP addresses, over which you normally do not have any control in most IaaS environments. Some allow you to fix your public addresses (such as Amazon’s “elastic IP addresses”), but realistically there are usually only very few public ‘entry points’ to your site. Most of your site’s or network’s internal traffic will use internal IP addresses. Your front-end server uses an internal, private address to connect to your application server, your application server uses an internal address to connect to your database server, and so on. You want to use internal addresses for communication wherever possible: In most IaaS environments it is faster, cheaper and possibly also more secure to do so. But you can’t just configure those addresses in your web-server, application server or database configuration files, since whenever a machine reboots, its addresses are randomly chosen for you – most likely not even from the same subnet. People try to work around this by using dynamic DNS, with each node having to register itself first, but then of course you are adding yet another moving piece to the puzzle, further complicating your architecture.
This point about subnets leads us to the second issue: Your inability to create layer 2 broadcast domains. If you had your own data center, you would control the network. You would deploy switches to create layer 2 networks, and define specific routing rules for traffic between those layer 2 networks. In other words: There you can create a network topology of your choice. But in the cloud, you have lost this ability. For the most part, everything has to be routed, you need to live with whatever topology the IaaS provider offers you.
Why is it important to control your network?
But why is that a problem? Or to turn the question around: Why do you need or want the ability to assign your own addresses or create your own layer 2 networks? There are several good reasons.
If you control IP address assignment, it greatly simplifies configuration of your site. For example, you can just refer to you application or database servers by well-known, static yet private IP addresses, which may appear in various configuration files or scripts. In fact, with a reliable IP address assignment you can simply maintain a single, static /etc/hosts file for many of your servers. But as mentioned earlier, the dynamic addresses of Amazon EC2 instances will change with every reboot. So, what address do you write down in your configuration files? As mentioned, dynamic DNS is sometimes proposed as a solution here, or even on-the-fly re-writing of server configuration files, but who wants to deal with such an unnecessary complication?
Virtual network topologies for performance and security
If you control the network topology, you can create broadcast domains as a means to architect and segment your site, with controlled routing and security between your layer 2 networks. Being able to create broadcast domains therefore is a means to secure the tiers of your site, to keep broadcast traffic at manageable levels, to reflect your site’s architecture in the underlying network topology and of course to deploy cluster software that relies on the presence of a layer 2 network. Without this ability, you are restricted to the use of security groups (on Amazon) to secure different sections of the site, which are a good start but which of course are not a 100% replacement for stateful firewalls that could be deployed as routers between actual layer 2 networks. And naturally, you still don’t get all the other benefits of actual layer 2 networks.
Ability to move your network to different clouds
Different IaaS providers offer different means and capabilities for securing a network, expressing the architecture and topology, configuring load balancing and so on. Therefore, it is difficult to take your architecture and move it to a different data center, or even a different provider. What if you would like to spread your site across two data centers, one operated by Amazon, the other by Rackspace, for maximum availability? You would have to replicate the setup you developed for one provider and then adapt it to the changed network topology of the other. What if instead you could take the entire network topology and transfer and replicate it – exactly as is – to the other data center, including the IP addresses, broadcast domains, configured routing between tiers and all? This would offer you true mobility across clouds: You would be able to avoid a great deal of provider lock-in and gain true mobility for your entire site: All you need to define your entire network architecture in one handy package. Some efforts are under way to develop standards for this: As part of the OpenStack project there is an initiative, called ‘Network Containers’, which is concerned exactly with this, but those standards are well in their infancy at this point.
Virtual networks give you control over your network topology
So, users of IaaS offerings are forced to deal with a network infrastructure and topology over which they do not have control. This mandated topology may not match your site’s architecture very well. Furthermore, if we already have existing site configurations in more traditional hosting or network environments, we will be hard pressed to move them quickly and seamlessly into the cloud. This is exactly where virtual networks come into play. What is a virtual network? It is a network over which you have control, running on top of a network over which you do not have control.
With virtual networks you control the IP address assignment of your nodes, you can create (virtual) layer 2 switches, even in otherwise fully routed cloud environments. Your switches may even span data centers or providers, or may stretch from your on-premise systems to cloud-based systems. Those virtual layer 2 switches can do everything a real switch can do, including broadcast or running non IP protocols. You can also set up routers and firewalls between your broadcast domains (subnets) exactly the way you wish. There are of course different ways to create virtual networks. Here at vCider we believe we have a particularly user-friendly approach.
To explore how virtual networks can make your cloud deployments easier, visit vcider.com or just sign up and try it out . We also have a screencast to show how quick and simple it can be create your own virtual network.
High Performance Computing Deployments in a Secure Virtual Private Cloud
Securing Scalable Cloud Services for Computational Chemistry and Pharma Research
Mind the Byte is a Barcelona-based cloud application provider, offering computational services, applications, and data sets to biochemistry researchers in universities and research institutes, including researchers for the pharmaceutical industry. The company offers a cloud service called iMols, which compiles molecules, proteins and activities databases, becoming an in silico laboratory for storing chemogenomics data, building data sets, and sharing data and ideas with fellow researchers. The company also offers general consulting and computational services for researchers.
While simplifying computational chemistry for its customers, Mind the Byte faces its own challenges in the area of cloud security and management. The company stores vast quantities of biochemical data in Amazon S3. It accesses and processes this data via iMols running on EC2 instances in Amazon’s Virginia data center. iMols runs standard computations using the Chemistry Development Kit Java library (CDK), as well as Mind the Byte’s own proprietary calculations and analysis. Mind the Byte loads computations into Cassandra for fast, efficient interactive queries. For performance, cost, and security reasons, Cassandra instances and the iMols front end run in a data center in the Netherlands, close to Mind the Byte’s European customers. Mind the Byte manages operations from their in Barcelona.
The company needed a fast, efficient way to connect all these resources in a secure network that would be easy to manage.
The Solution: a vCider Virtual Private Cloud
Mind the Byte implemented a vCider Virtual Private Cloud to create a fast, secure private network for its cloud services and data. vCider’s Virtual Private Cloud (VPC) service enables cloud application developers to configure secure private networks that span data centers and cloud service providers.
Results: Fast, Secure Cloud Networking and More Time for Development
“Data security is vital for our customers, especially our Pharma customers. With vCider, we know that our cloud resources are safe.”
— Dr. Alfons Nonell-Canals, Mind the Byte
- • A fast, efficient, and secure cloud network spanning cloud service providers on different continents
- • Security for Pharma data and other biochemical research
- • Flexibility to add and remove nodes through an easy-to-use management console
- • Faster performance for Cassandra data stores
- • Graphical reporting on network health through the vCider management console
- • More time for chemistry research and customer management, now that cloud management has been simplified
SDN Factoid offered for your consideration…
[ 0 ]April 15, 2012 | Chris Marino
Saw the other day that Infoblox was getting closer to their IPO and had set a price range for their offering. This got me to looking into their S-1 to see how they are doing.
Turns out that they are doing quite well. Revenue up to more than a $160M/yr run rate and growing nicely. If you’re not familiar with Infoblox, they have a DNS appliance as well as a number of other network configuration management products. In their S-1 they say:
We are a leader in automated network control and provide an appliance-based solution that enables dynamic networks and next-generation data centers. Our solution combines real-time IP address management with the automation of key network control and network change and configuration management processes in purpose-built physical and virtual appliances. It is based on our proprietary software that is highly scalable and automates vital network functions, such as IP address management, device configuration, compliance, network discovery, policy implementation, security and monitoring. Our solution enables our end customers to create dynamic networks, address burgeoning growth in the number of network-connected devices and applications, manage complex networks efficiently and capture more fully the value from virtualization and cloud computing.
Sounds a lot like Software Defined Networking? Right?
I think by any definition what Infoblox provides is a vital aspect of SDN. Pretty sure they’re aware of this too since they are a member of the Open Networking Foundation. This is a pretty big commitment since membership requires a $30,000 annual fee.
So, you’d think that SDN, Software Defined Networking, ONF and/or Open Networking Foundation would be promentent in the offering documents, right?
A quick search of the these terms in their S-1 reveals the following tally:
- Software Defined Networking: zero
- SDN: zero
- Open Networking Foundation: zero
- ONF: zero
SaaS Applications Running in a Secure Virtual Private Cloud
Securing Multi-Provider SaaS Services for a Reservation and Ticketing Company
“Better reservations” is the goal of Betterez, a SaaS provider that helps passenger transport operators to increase revenues and improve customer service. The Betterez solution includes a full-featured reservations and ticketing engine that permits direct sales via websites, Facebook, and a back-office Web application, as well as intelligence and analytics that help operators market more effectively and operate more efficiently.
The Betterez service runs in the cloud and relies on proven high performance platforms such as Node.js and MongoDB. To ensure that transport operators and passengers have continuous access to critical ticketing and operations systems, Betterez has implemented a fully redundant and scalable service. They have taken a multi-data-center approach that spans two cloud providers: Amazon AWS and Rackspace Cloud.
The company chose Rackspace as its cloud provider for MongoDB databases. They concluded that Rackspace’s disk performance was more reliable and better suited for their purposes. Rackspace also provides better support, which is important for Betterez’s mission-critical needs. All other services run in various Amazon AWS regions. Betterez needed secure communications between all its servers as well as way to support dynamic scaling of services based on changing demand.
The Solution: vCider Virtual Private Cloud
Betterez decided to implement a vCider Virtual Private Cloud to create a fast, secure and scalable private network for its cloud infrastructure.
Results: Fast, Efficient Secure Networking and Reduced Operational Overhead
“vCider is the glue that keeps all servers and services communicating securely and in well-organized subnets, regardless of their data center location. Accomplishing this without vCider would be a difficult and tedious challenge. vCider saves us time and lets us focus on what we do best.”
—Mani Fazeli, co-founder
- • A fast, efficient, and secure cloud network spanning data centers at both Amazon and Rackspace
- • Flexibility to add and remove nodes through an easy-to-use management console
- • Encrypted communication between MongoDB instances in the multi-data-center replica set
- • Graphical reporting on network health through the vCider management console
- • Reduced operational overhead and demand on system engineers
Open Networking Foundation and the Promise of Inter-Controllable SDNs
[ 1 ]March 23, 2011 | Chris Marino
Big news in the networking world this week with the announcement of the Open Networking Foundation (ONF).
The Open Networking Foundation is a nonprofit organization dedicated to promoting a new approach to networking called Software-Defined Networking (SDN). SDN allows owners and operators of networks to control and manage their networks to best serve their needs. ONF’s first priority is to develop and use the OpenFlow protocol. Through simplified hardware and network management, OpenFlow seeks to increase network functionality while lowering the cost associated with operating networks.
This is an important and ambitious goal. The news got widespread coverage in the trade press (here, here, here and here) as well as a nice write up in the NYT.
There aren’t a lot of detail on how this is all going to work, other than OpenFlow will be the foundation of the approach. Recall that OpenFlow is an approach that separates the control plane from the data plane in network equipment, enabling controllers to manipulate the forwarding functions of the device. Once separate, the hope is that smarter, cheaper networks will be possible since fast, inexpensive forwarding engines can then be controlled by external software.
Its the external control of networks that I find so exciting. OpenFlow is just one technique for Software Defined Networks (SDN) that have the potential to revolutionize the way networks are build and managed. Clearly, virtual networks (VNs) are SDNs.
The vision of ONF is not only that the networks be interoperable, but that they also be inter-controllable. I remember Interop back in the late ’90s where the plugfests were the highlights of the conference. We take that for granted now, but there was a time when it wasn’t uncommon for one vendor’s router to be unable to route another vendor’s packets.
I guess we can start looking forward to control-fests.
The participation of all the major vendors clearly indicates the importance of SDNs. Cisco, Brocade, Broadcom, Ciena, Juniper and Marvell are all on board. I like that large network operators including Google, Facebook, Yahoo, Microsoft, Verizon and Deutsche Telekom make up the board, and not the vendors. This tells me that direction will be set by what users want.
Although I have to admit that I remain doubtful that we we’ll be running inter-controllable OpenFlow-enabled devices any time soon. Anyone that remembers theUnix Wars knows that big companies can’t agree on very much when it comes to their competitive advantage. Heck, they couldn’t even agree on byte order back then.
The unfortunate reality is that each of these vendors can support OpenFlow 100%, while at the same time pursue their own independent SDN strategy, which could result in networks that are no more inter-controllable than they are today.
Nevertheless, it will be really interesting to see what they cook up for the Interop OpenFlow Lab in May.
//End Of vCider Cache
Just to reiterate that was not my content other than the commentary at the beginning of the post. There were quite a few images to upload on this so if I missed one or two apologies. Thanks for stopping by!