Ethernet Fabric As Core - A Modest Proposal
We have traditionally built networks around a pair of massively redundant core switches, using additional access and aggregation layers of switches to funnel traffic to the core. This design was driven by the limitations of previous generations of smaller switches, the Spanning Tree Protocol, and yesterday’s traffic patterns.
Just a few years ago, much of the traffic in an enterprise was users running two-tier client-server applications and accessing file services. This resulted in large volumes of so-called north-south traffic between user PCs, external systems, and servers. Today, much of the traffic is server to server, or east-west, as web and application servers access databases. Virtualization adds to the east-west traffic pattern with live migration traffic and virtual desktops.
If we’re building a network for east-west traffic, why are we connecting ToR (Top of Rack) switches through oversubscribed uplinks to the core? If you have 100 to 300 or so servers with dual 10 gigabit connections, you could just build a full mesh of 48-, 60-, or 96-port ToR switches.
Let’s take a full mesh of eight 60-port switches with 20Gbps interconnects as an example. Each switch would use 14 of its 10Gbps ports to connect to its seven peers, providing 140Gbps of fabric bandwidth. That would leave 46 ports on each switch for server and storage connections, which would easily support 150 or so servers with dual connections and plenty of storage connections to boot. Each server would be no more than two switch hops from any other, and the inter-switch links would be only about 3:1 oversubscribed (46:14).
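The arithmetic behind that example is easy to check. Here is a minimal sketch in Python; the function name and its parameters are my own for illustration, and it assumes every switch is identical and uses two 10Gbps ports for the 20Gbps link to each peer.

```python
def full_mesh(switches, ports_per_switch, links_per_peer=2, port_gbps=10):
    """Port budget for one switch in a full mesh of identical switches.

    Assumes each switch connects to every other switch with
    `links_per_peer` ports of `port_gbps` each.
    """
    peers = switches - 1
    fabric_ports = peers * links_per_peer
    edge_ports = ports_per_switch - fabric_ports
    if edge_ports <= 0:
        raise ValueError("mesh links would consume every port on the switch")
    return {
        "fabric_ports": fabric_ports,                    # ports used for inter-switch links
        "fabric_gbps": fabric_ports * port_gbps,         # fabric bandwidth per switch
        "edge_ports": edge_ports,                        # ports left for servers and storage
        "total_edge_ports": edge_ports * switches,       # across the whole mesh
        "oversubscription": edge_ports / fabric_ports,   # edge ports per fabric port
    }


mesh = full_mesh(switches=8, ports_per_switch=60)
print(mesh["fabric_ports"], mesh["fabric_gbps"], mesh["edge_ports"])
print(f"{mesh['oversubscription']:.1f}:1 oversubscribed")
```

With these inputs the mesh offers 368 edge ports in total, so 150 dual-connected servers take 300 of them and leave 68 for storage.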
A more conventional design would use a pair of core switches to which all the ToR switches uplink. If we use 40Gbps uplinks, we boost the oversubscription level from about 3:1 to 6.5:1 (52 server ports to 8 uplink ports) and add another switch hop to the data path between any two servers connected to different switches. Using dedicated storage switches, also connected to the core, would further stress the uplinks. We may be able to save a few bucks on ToR switches, but we’d have to spend several times that much to buy line cards for the core, let alone the cost of a pair of Nexus 7000 type core switches.
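To put the two designs side by side, here is a rough comparison in Python. The helper name is hypothetical, and the eight uplink ports for the core design are my reading of the 52:8 ratio above (a 60-port switch with eight 10Gbps ports' worth of uplink).

```python
def oversubscription(ports_per_switch, uplink_ports):
    """Ratio of server-facing ports to uplink or fabric ports on one ToR switch."""
    edge_ports = ports_per_switch - uplink_ports
    return edge_ports / uplink_ports


# Core design: 8 of 60 ports go to the core pair (assumption taken from 52:8).
core_ratio = oversubscription(ports_per_switch=60, uplink_ports=8)

# Full mesh of eight switches: 14 of 60 ports go to the seven peers.
mesh_ratio = oversubscription(ports_per_switch=60, uplink_ports=14)

print(f"core: {core_ratio:.1f}:1, mesh: {mesh_ratio:.1f}:1")
```

The ratio only counts ports. It doesn't capture the extra hop through the core or the cost of the core switches themselves.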
As you add layer 2 multipath capable ToR switches, consider using the fabric itself as the core. It could save you more than a few dollars while providing reliability and performance comparable to a more conventional design. Of course, full meshes can only scale so large. Every switch you add to the mesh takes more inter-switch link ports on all the others, so you eventually reach the point where adding another switch actually reduces the number of usable ports in the fabric. So if you’re building a network for 1000 servers, those core switches with high port counts might pay off.
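You can see where that tipping point falls with a few lines of Python. This is a sketch under the same assumptions as the example mesh (identical 60-port switches, two ports per peer link); the function name is mine, not anything from a vendor tool.

```python
def usable_ports(switches, ports_per_switch=60, links_per_peer=2):
    """Total server/storage ports in a full mesh of identical switches."""
    fabric_ports = (switches - 1) * links_per_peer
    # Once mesh links consume every port, nothing is left for servers.
    return switches * max(ports_per_switch - fabric_ports, 0)


sizes = range(2, 32)
best = max(sizes, key=usable_ports)  # first mesh size that reaches the peak

for n in (8, 15, 16, 17):
    print(n, "switches:", usable_ports(n), "usable ports")
```

Under these assumptions the fabric peaks at 480 usable ports with 15 or 16 switches, and a 17th switch drops it to 476. That peak is enough for roughly 240 dual-connected servers before any storage, which is why the mesh suits a few hundred servers rather than a thousand.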
So would you build a data center network without a core? Comments solicited.
Disclaimer: Brocade, who makes switches that could be used to build such a mesh network, is a client of DeepStorage.net.