Computation and Communication Systems Need Advanced Switching
By Gary Lee
Communication devices at the network edge now include software-intensive functions such as spam filtering and virus scanning. To provide these functions, communications computing devices need multiple computing units clustered together in a single chassis and interconnected using efficient data transport protocols. This computing requirement, along with higher bandwidth needs, is driving CPU chip-set manufacturers such as Intel to provide switched fabric interfaces on their devices. Of the many competing fabric technologies, Advanced Switching Interconnect provides the best match to the needs of communications applications.
At a high level, all multiprocessor systems are essentially large distributed computers. They contain most of the components found in a PC today (CPUs, memory, disk drives, and I/O) linked together through an interconnect structure that can take on many forms. In conventional computer systems, memory connects directly to the CPUs through the front side bus or a Memory Controller Hub (MCH). The connection for I/O and disk drives comes through an I/O Controller Hub (ICH). Systems with multiple CPUs, a lot of I/O, or a large amount of memory may be distributed across several cards.
The key to the performance of a distributed computer lies at the more detailed level: the interconnect structure. This structure must allow the various computing, memory, and I/O elements within the design to exchange data efficiently and at high bit rates. Traditional parallel bus structures, in which all elements connect through a single common bus, are proving unable to meet the growing bandwidth needs of modern communication system designs.
One increasingly popular alternative is a switched fabric, which uses a matrix of switches to establish a dedicated connection between two design elements for the duration of a data transfer. The switches allow the system elements to share a common set of physical links without the loading and timing issues that arise when all the elements are permanently connected to a common bus. By creating temporary, point-to-point connections, the switched fabric provides the connectivity of a shared bus with the performance of a dedicated line. Many different switched fabric structures have arisen in the last few years, including PCI-Express, switched Ethernet, RapidIO, and Advanced Switching Interconnect (ASI).
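To make the contrast between a shared bus and a switched fabric concrete, the sketch below counts how many transfer slots a set of transfers needs on each. It is a toy model under stated assumptions, not a description of any particular product: every transfer takes one slot, a shared bus carries one transfer per slot, and the switch is modeled as an idealized crossbar that can carry any set of transfers whose source and destination ports are all distinct. The element names are hypothetical.

```python
from typing import List, Tuple

Transfer = Tuple[str, str]  # (source element, destination element)


def bus_cycles(transfers: List[Transfer]) -> int:
    """Shared bus: only one transfer can own the bus in each slot."""
    return len(transfers)


def crossbar_cycles(transfers: List[Transfer]) -> int:
    """Idealized crossbar: in each slot, greedily grant every pending transfer
    whose source and destination ports are not already in use."""
    pending = list(transfers)
    cycles = 0
    while pending:
        busy_src, busy_dst = set(), set()
        deferred = []
        for src, dst in pending:
            if src in busy_src or dst in busy_dst:
                deferred.append((src, dst))  # port conflict: retry next slot
            else:
                busy_src.add(src)  # temporary point-to-point connection
                busy_dst.add(dst)
        pending = deferred
        cycles += 1
    return cycles


# Hypothetical workload: three independent transfers plus one that reuses cpu0.
transfers = [("cpu0", "mem0"), ("cpu1", "io0"), ("cpu2", "disk0"), ("cpu0", "io0")]
print(bus_cycles(transfers), crossbar_cycles(transfers))  # 4 2
```

The greedy grant order is a simplification chosen for readability; real switches use more elaborate arbitration, but the point stands that disjoint pairs of elements can exchange data at the same time.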
Each of these switched fabric structures has its merits, but only ASI offers the combination of features that suits communications computing applications. For example, PCI-Express is a point-to-point, serial, packet-based interconnect that is gaining popularity as a replacement for the traditional PCI bus in desktop computing systems. Its large market gives it the significant advantage that PCI-Express components will become inexpensive and widely available. The drawback lies in the fabric topology it implements.
PCI-Express is designed to emulate a PCI tree structure, with the PCI devices connected through a virtual bus inside the PCI-Express switch. The typical PCI tree topology, shown in Figure 1, allows only one master CPU in the system. This master CPU is known as the root complex. Other CPUs can be connected to the PCI tree, but only through a non-transparent bridge device that makes them appear as leaf nodes to the root complex. If the primary root complex fails, a CPU connected through the non-transparent bridge can take over system control and become the new root complex.

Figure 1: Distributed computer systems have essentially the same high-level architecture: they are composed of components or modules that communicate through some interconnect structure. The details of that structure determine the system's performance.
PCI Tree Complicates Multiprocessing
This structure has many drawbacks for multiprocessing designs. In the standard PCI tree, all communication between peer devices must pass through the root complex; no direct peer-to-peer communication is available. This need to handle all communications causes a bandwidth bottleneck at the root complex. In addition, because the transactions must traverse the tree in both directions, the structure has high latency. Finally, the non-transparent bridging to additional CPUs, while suitable for parent-child interactions, is difficult to implement for peer-to-peer communications among CPUs in multiprocessing. These drawbacks make PCI-Express unsuitable for communications computing, which needs high bandwidth, low latency, and good peer-to-peer interaction.
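The bottleneck can be illustrated with a small model of a PCI tree that follows the description above, in which all peer-to-peer traffic must cross the root complex. The topology, device names, and flow rates below are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical tree: child -> parent. "rc" is the root complex.
PARENT: Dict[str, str] = {
    "sw1": "rc", "sw2": "rc",
    "cpu1": "sw1", "nic": "sw1",
    "cpu2": "sw2", "disk": "sw2",
}


def depth(node: str) -> int:
    """Number of links between a node and the root complex."""
    d = 0
    while node != "rc":
        node = PARENT[node]
        d += 1
    return d


def tree_hops(src: str, dst: str) -> int:
    """Links crossed when a peer transfer goes up to the root complex and back down."""
    return depth(src) + depth(dst)


def root_load(flows: List[Tuple[str, str, float]]) -> float:
    """Every peer flow passes through the root complex, so it carries their sum (Gbit/s)."""
    return sum(rate for _, _, rate in flows)


flows = [("cpu1", "nic", 1.0), ("cpu2", "disk", 1.5), ("cpu1", "cpu2", 0.5)]
print(tree_hops("cpu1", "nic"))  # 4 links, even though both devices sit under sw1
print(root_load(flows))          # 3.0 Gbit/s funneled through a single node
```

In a single-switch fabric the same cpu1-to-nic transfer would cross two links (device to switch, switch to device), and no single endpoint would have to relay every flow.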
For peer-to-peer applications using a backplane, Ethernet switches have recently been adopted as industry-standard solutions. Recent industry standards include PICMG 2.16 for CompactPCI and PICMG 3.1 for ATCA. The IEEE 802.3 Backplane Ethernet Study Group is also working to define Ethernet switching standards. At first glance, Ethernet backplanes seem an attractive solution because they are readily implemented. Many CPU chip sets contain Ethernet MACs (media access controllers), and many system I/O cards contain Ethernet ports.
The structure of a switched Ethernet backplane, as shown in Figure 2, calls for all the system elements to connect to the Ethernet switch. MAC devices are needed at most backplane endpoints. These MACs may also need Memory Controller (MC) or Disk Controller (DC) logic to connect to peripheral devices, which employ protocols such as PCI or SCSI. The CPUs often connect through MACs that are also TCP/IP Offload Engines (TOEs) because most applications transport data across an Ethernet backplane using the TCP/IP socket layer or RDMA. Without TOEs, implementing these protocols reduces the CPU performance available for applications software.

Figure 2: The Ethernet backplane approach has all its elements connect to a common Ethernet switch and typically uses the TCP/IP protocol for communications. Its drawback is that it does not offer standardized quality of service (QoS) mechanisms.
Ethernet Struggles with QoS
Unfortunately, Ethernet backplanes have shortcomings in the areas of link utilization, quality of service (QoS), high availability (HA), and latency. Ethernet frames carry more overhead than the packets of other backplane technologies, so an Ethernet backplane needs extra bandwidth to carry the same payload. Because Ethernet link speeds step from 1 Gbit/s directly to 10 Gbit/s, a line card that needs to support a rate above 1 Gbit/s must use a 10-Gbit/s backplane link.
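The cost of Ethernet framing for small backplane transactions can be quantified with the standard Ethernet framing sizes: a 14-byte header, a 4-byte FCS, padding to a 64-byte minimum frame, plus an 8-byte preamble/start delimiter and a 12-byte inter-frame gap on the wire. The sketch below assumes untagged frames; an 802.1Q VLAN tag would add 4 more bytes.

```python
ETH_HEADER = 14       # destination MAC, source MAC, EtherType
ETH_FCS = 4           # frame check sequence
ETH_MIN_PAYLOAD = 46  # short payloads are padded so the frame reaches 64 bytes
PREAMBLE_SFD = 8      # preamble plus start-of-frame delimiter
INTERFRAME_GAP = 12   # idle time between frames, expressed in byte times


def ethernet_wire_bytes(payload: int) -> int:
    """Bytes of link time consumed to send one untagged frame carrying `payload` bytes."""
    frame = ETH_HEADER + max(payload, ETH_MIN_PAYLOAD) + ETH_FCS
    return frame + PREAMBLE_SFD + INTERFRAME_GAP


def efficiency(payload: int) -> float:
    """Fraction of link time that carries useful payload."""
    return payload / ethernet_wire_bytes(payload)


for size in (4, 64, 256, 1500):
    print(size, ethernet_wire_bytes(size), f"{efficiency(size):.1%}")
```

A 4-byte transaction occupies 84 byte times on the link, so less than 5% of the link carries payload; the overhead only becomes small for large frames, which backplane control traffic rarely uses.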
In addition, Ethernet lacks effective mechanisms for flow control, congestion management, and high availability. Although Ethernet provides three priority bits in the VLAN tag that can be used to build such mechanisms, there is no industry standard on how to use these bits. Ethernet also has no class-based flow control mechanism, and supports only an XON/XOFF mechanism (the IEEE 802.3x PAUSE frame) in some applications.
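Why a link-wide XON/XOFF is a blunt tool can be shown with a toy model, offered as an illustration rather than a description of any standard's exact behavior. Two hypothetical traffic classes share a port; when the receiver signals congestion for one class, a link-level pause stops both, while per-class flow control stops only the congested one. The class names and queue depths are assumptions.

```python
from typing import Dict, Set


def drain(queues: Dict[str, int], paused: Set[str], per_class: bool) -> Dict[str, int]:
    """Send at most one frame from each class queue in this time slot.

    queues:    frames waiting, per traffic class
    paused:    classes whose receiver has signaled congestion
    per_class: False models a single XON/XOFF for the whole link
    """
    link_paused = bool(paused) and not per_class  # one XOFF halts every class
    sent = {}
    for cls, depth in queues.items():
        blocked = link_paused or cls in paused
        sent[cls] = 0 if blocked or depth == 0 else 1
    return sent


queues = {"control": 5, "bulk": 5}
congested = {"bulk"}
print(drain(queues, congested, per_class=False))  # {'control': 0, 'bulk': 0}
print(drain(queues, congested, per_class=True))   # {'control': 1, 'bulk': 0}
```

With only a link-wide pause, latency-sensitive control traffic stalls behind a congested bulk transfer, which is one reason designers fall back on over-provisioning.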
Most Ethernet backplane system designers attempt to solve the QoS problem by over-provisioning the backplane bandwidth, or by relying on higher layer TCP/IP processing with its added processing overhead. High availability also relies on the TCP/IP layer to send traffic around points of failure. Unfortunately, many of today’s systems only utilize layer 2 Ethernet switching, which does not have these QoS or HA features.
The limitations of Ethernet and PCI-Express have focused attention on alternative fabrics based on the serialization and deserialization (serdes) of byte-wide signals for transport over serial "lanes." One emerging approach comes from the RapidIO specification. Parallel RapidIO has been available for more than four years and was originally developed as a chip-to-chip interconnect technology. Today it is available on a few devices as a front-side-bus interconnect for CPUs and distributed memory systems. The Serial RapidIO specification, released in 2002, allows 1 or 4 lanes of 1.25, 2.5, or 3.125 Gbaud serdes, making it suitable for a switched fabric. Work has now started on a RapidIO fabric specification called RapidFabric. The current proposal includes SAR (segmentation and reassembly) functions, multicast, internetworking, as many as 256 traffic classes, millions of flows, and end-to-end flow control.
RapidFabric Still in Development
Because work on the RapidFabric specification has just begun, however, it is not yet clear whether it will meet the needs of communications computing. The current RapidIO specification, for example, has no provisions for traffic isolation through virtual channels or for bandwidth scheduling. Also, some of the intended traffic management functions, such as end-to-end flow control, are useful for network edge devices but impose unnecessary complexity on applications such as storage, blade servers, and network appliances. Further, the current Serial RapidIO specification allows for only 1 or 4 lanes, which yield a top user bandwidth of 10 Gbit/s (four lanes at 3.125 Gbaud, less 8b/10b coding overhead). Switch fabrics require backplane speed-up to compensate for factors such as cell overhead and temporary traffic congestion. Because there is no backplane speed-up planned for Serial RapidIO, the achievable user bandwidth will be less than 10 Gbit/s.
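The 10 Gbit/s ceiling follows from the serdes arithmetic: Serial RapidIO uses 8b/10b line coding, so each lane delivers 8 data bits for every 10 bits on the wire. The sketch below computes the user bandwidth of a four-lane link at the top rate and the resulting speed-up relative to a 10-Gbit/s line card; the function names are illustrative.

```python
def user_bandwidth_mbps(lanes: int, baud_mbaud: int) -> float:
    """Payload bit rate in Mbit/s after 8b/10b coding (8 data bits per 10 line bits)."""
    return lanes * baud_mbaud * 8 / 10


def speedup(fabric_mbps: float, line_card_mbps: float) -> float:
    """Ratio of fabric user bandwidth to the line-card rate it must carry."""
    return fabric_mbps / line_card_mbps


top = user_bandwidth_mbps(4, 3125)  # four lanes at 3.125 Gbaud
print(top)                          # 10000.0
print(speedup(top, 10_000))         # 1.0: no headroom for cell overhead or congestion
```

A speed-up of exactly 1.0 means any fabric overhead at all pushes the deliverable rate below the line-card rate, which is the point the paragraph makes.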
The Advanced Switching Interconnect (ASI) fabric combines some of the best features of the alternatives, making it an excellent match to the needs of communications computing. It achieves low cost and wide availability of components by leveraging the physical and data link layers of PCI-Express. Unlike PCI-Express, however, it specifies a fabric topology similar to switched Ethernet for efficient peer-to-peer transactions, as shown in Figure 3. One CPU in the system serves as the fabric manager; another can optionally serve as the backup fabric manager. ASI further provides an integrated data and control plane, eliminating the cost and complexity of supporting a separate interconnect structure for control traffic. ASI also supports tunneling of various other communication protocols, such as PCI-Express and Ethernet, and provides native transaction mechanisms such as simple load store (SLS), simple queuing (SQ), and socket data transfer (SDT).

Figure 3: Advanced Switching (ASI) has a structure similar to the Ethernet backplane, but uses the PCI-Express hardware links and includes built-in flow control. CPUs from companies such as Intel already have ASI interfaces built in.
ASI has several advantages over Ethernet backplanes. For one, ASI uses serial links that can scale from 1 to 32 lanes, providing user bandwidth in units of 2 Gbit/s per lane. Ethernet speeds jump in much larger steps, forcing designers to overprovision. Sending a 2.5-Gbit/s traffic flow through an Ethernet backplane, for instance, requires a 10-Gbit/s backplane link. In ASI, the same flow would require only two lanes of 2 Gbit/s (user bandwidth), for a total of 4 Gbit/s. Further, ASI frame overhead is much lower than Ethernet's. For example, a 4-byte backplane transaction would require a 64-byte Ethernet frame, while the same transaction would require only a 23-byte ASI packet.
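The link-sizing comparison above can be expressed as a small calculation. The sketch assumes ASI links use the same widths as PCI Express links (x1, x2, x4, x8, x12, x16, x32) and the 2 Gbit/s of user bandwidth per lane quoted above, and that the Ethernet backplane choices are the 1-Gbit/s and 10-Gbit/s steps discussed earlier. Rates are in Mbit/s so the arithmetic stays in integers.

```python
PCIE_LINK_WIDTHS = (1, 2, 4, 8, 12, 16, 32)  # assumed: ASI reuses PCI Express link widths
ASI_LANE_USER_MBPS = 2000                    # user bandwidth per lane, as quoted above
ETHERNET_SPEEDS_MBPS = (1000, 10_000)        # backplane Ethernet steps


def asi_link(rate_mbps: int) -> int:
    """Smallest assumed ASI link width (in lanes) that carries rate_mbps."""
    lanes_needed = -(-rate_mbps // ASI_LANE_USER_MBPS)  # ceiling division
    for width in PCIE_LINK_WIDTHS:
        if width >= lanes_needed:
            return width
    raise ValueError("rate exceeds a x32 link")


def ethernet_link(rate_mbps: int) -> int:
    """Slowest Ethernet backplane speed (Mbit/s) that carries rate_mbps."""
    for speed in ETHERNET_SPEEDS_MBPS:
        if speed >= rate_mbps:
            return speed
    raise ValueError("rate exceeds 10 Gbit/s")


flow = 2500
lanes = asi_link(flow)
print(lanes, lanes * ASI_LANE_USER_MBPS, ethernet_link(flow))  # 2 4000 10000
```

Because lane counts grow in small steps, the provisioned bandwidth tracks the required rate closely; with only two Ethernet steps, anything above 1 Gbit/s jumps straight to 10 Gbit/s.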
Another advantage of ASI over Ethernet is ASI's built-in ability to support QoS control. Ethernet has no defined QoS standards for switched fabric applications. Instead, Ethernet provides a true best-effort service with latencies greater than 10 µs, roughly 10 times what storage control traffic or server-clustering traffic can tolerate. Adding flow control mechanisms that would lower this latency requires changes to the Ethernet standard, and using non-standard Ethernet components would remove some of the cost advantages gained through Ethernet's economies of scale.
RapidFabric is implementing some of the features found in ASI, but it is still on the drawing board. Further, RapidFabric will not provide enough backplane speed-up to support the use of 10G line cards until the next generation of RapidIO serdes devices becomes available. An ASI fabric can support 10G line cards now, using eight serdes lanes that provide 16 Gbit/s of user bandwidth, a speed-up factor of 1.6x. Next-generation PCI-Express serdes will double the per-lane rate, allowing ASI to support 10G line cards with the same speed-up using only four serdes lanes.
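The speed-up figures in the paragraph above can be checked directly. The sketch reads the next-generation improvement as a doubling of the per-lane rate (from 2 to 4 Gbit/s of user bandwidth per lane), which is an assumption about what the "2x" refers to; the constant names are illustrative.

```python
GEN1_LANE_MBPS = 2000  # current PCI-Express serdes: user bandwidth per lane
GEN2_LANE_MBPS = 4000  # assumed next generation: per-lane rate doubled


def speedup(lanes: int, lane_mbps: int, line_card_mbps: int = 10_000) -> float:
    """Fabric user bandwidth divided by the line-card rate it must carry."""
    return lanes * lane_mbps / line_card_mbps


print(speedup(8, GEN1_LANE_MBPS))  # 1.6 (8 lanes x 2 Gbit/s = 16 Gbit/s for a 10G card)
print(speedup(4, GEN2_LANE_MBPS))  # 1.6 (4 lanes x 4 Gbit/s = 16 Gbit/s for a 10G card)
```

Doubling the per-lane rate halves the lane count needed for the same headroom, which is why the next serdes generation lets a 10G line card use a four-lane link.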
ASI Support Grows
The ASI approach is gaining support in the industry and a robust ecosystem has already started to develop. CPU subsystems are being developed today with ASI ports that can directly connect to an ASI switch element. Endpoint devices that allow peripherals to communicate over the ASI switch fabric are also under development. In addition, multiple vendors have announced products for ASI switches and bridges to be released in 2005.
Some of this support stems from the anticipated leverage of the PCI-Express physical layer. Industry expectation is that PCI-Express will emerge as a dominant interconnect technology. Connector, cable, and ATCA chassis vendors will spend a lot of time and effort to ease the use and lower the cost of PCI-Express interconnect technology for system designers. ASI will take advantage of this effort while enhancing PCI-Express functionality to allow true peer-to-peer interconnect without the need for such non-standard methodologies as non-transparent bridging.
The principal factor in the support for ASI, however, is its versatility in meeting the needs of high-bandwidth computing applications. For example, by using redundant CPUs as file servers and controllers along with a dual ASI switching element, developers can create a highly available network attached storage system that can automatically switch around hardware failures. Blade servers can be designed with I/O control CPUs that include a direct ASI interface to the switch element, and then be coupled with I/O cards that use ASI Host Bus Adapters. The adapters can perform Ethernet tunneling to the server blades, eliminating the need for protocol conversions. If server clustering is required, the ASI Simple Load Store (SLS) and Socket Data Transfer (SDT) protocols can be used to provide low-latency peer-to-peer transactions between the server blade memories.
ASI is also applicable to smaller systems, such as network appliances, that may not contain dedicated switch cards. Instead, small ASI switch elements can be located on each I/O and processing card and interconnected in a mesh structure. The switch elements on the I/O cards can contain embedded endpoints that serve as Ethernet-to-ASI bridges that can load balance traffic to the processing cards. The processing cards would have direct ASI interfaces to their local switch elements, allowing them to use Ethernet, SLS, Simple Queuing (SQ), or Socket Data Transfer (SDT) to exchange information.
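The article does not say how an Ethernet-to-ASI bridge would load balance traffic, so the following is only one plausible sketch: hash each packet's flow identifiers and use the result to pick a processing card, so that every packet of a flow reaches the same card. The flow-key layout, card names, and choice of CRC-32 as the hash are all assumptions for illustration.

```python
import zlib
from typing import List, Tuple

# Hypothetical flow key: source IP, destination IP, source port, destination port, protocol.
FlowKey = Tuple[str, str, int, int, str]


def pick_card(flow: FlowKey, cards: List[str]) -> str:
    """Map a flow to one processing card; the same flow always maps to the same card."""
    key = "|".join(map(str, flow)).encode()
    return cards[zlib.crc32(key) % len(cards)]


cards = ["proc0", "proc1", "proc2"]  # hypothetical processing cards in the mesh
flow = ("10.0.0.1", "10.0.0.2", 40000, 80, "tcp")
print(pick_card(flow, cards))
```

Hashing keeps per-flow packet order intact without any state in the bridge; the trade-off is that a few heavy flows can still land on the same card.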
ASI, then, meets all the requirements of modern communications computing systems that use distributed computer architectures. It provides the efficient interconnect structure that the system components need to exchange data, building on PCI Express to minimize cost while providing peer-to-peer connectivity. Although PCI-Express, Ethernet and RapidFabric each provide some of the features required by communications computing systems, Advanced Switching is the only approach that provides low latency, scalable bandwidth, flexible interfaces, high availability, and quality of service features at a cost comparable to Ethernet.
Gary Lee is the director of Intelligent Switch Fabric marketing at Vitesse Semiconductor. Vitesse designs, develops, and markets semiconductors for communications and storage networks and is an Associate Member of the Intel® Communications Alliance.












