Enabling Next-Generation Storage Solutions Using Linux and Intel XScale® Technology

By William von Hagen

Dynamic data storage devices such as hard disk drives have steadily dropped in cost while their capacity and performance have continued to increase. The challenge now is to make ever-increasing amounts of storage available to today’s computer systems, and to be able to dynamically add and reallocate storage as needed. Intel has introduced a line of Intel XScale® technology-based I/O Processors that simplify the process of creating such modern, flexible storage solutions. When powered by the Linux operating system, these provide a firm foundation for the development of off-the-shelf and custom storage devices.

The amount of data required and retained by businesses today continues to escalate. In an increasingly global market, online sales, catalogs, product information, ordering mechanisms, and product delivery have become the lifeblood of many businesses, which has resulted in continuing growth in online data requirements. Similarly, analytical and introspective business practices such as data mining and associated data warehousing require that ever increasing amounts of historical information be readily available in local storage.

While today’s drives offer higher performance and higher capacity, making the best use of that capacity still requires a system design in which the operating system, a low-level storage virtualization system, and a higher-level storage management system cooperate. To create such a system, designers need hardware that comes with a complete spectrum of support for storage devices, storage controllers, storage protocols, and higher-level file system protocols at the operating system level, handling both local and on-line network-attached storage scenarios.


Figure 1: I/O Processor boards such as Intel’s IQ80315 can offload storage management functions from the host motherboard, using Serial ATA connections to both boost performance and increase the number of drives attached to the host.

Serial Architectures Arise

In local storage scenarios, where storage must be directly attached to a computer system, the traditional approach has been to use parallel drive architectures such as ATA, Ultra ATA and SCSI. These architectures are limited, however, both in terms of speed and the number of devices that can be attached to a single controller. The growing demand for storage capacity has prompted the development of new, serial architectures.

Serial architectures, such as Serial ATA, provide higher performance by transferring multiple bits of information in packets over a high-speed connection. Unlike parallel architectures, they have no inherent limitation on the number of drives that can be connected to a single controller. Most Serial ATA-based motherboards, for example, provide either 2 or 4 SATA connectors, and SATA cards with 8 additional SATA drive connectors are readily available. These new serial technologies thus simplify the attachment of more storage to single systems and allow users to access the data more quickly.

The Intel® IQ80315 and IQ80219 development kits were among the first to provide on-board SATA connectors, and are being used as reference designs for storage controllers and complete storage solutions from leading manufacturers. These boards are complete single-board computers (SBCs) with PCI-X connectors that enable them to be connected to modern PCI-X motherboards. This provides the data handling performance that modern storage systems need.

Hardware No

Boards are only half of the picture, however. Software is the other half, and the Linux operating system can complete the picture. High-performance reference Linux distributions for the Intel I/O processor-based reference boards, for instance, are readily available from embedded Linux and tools providers such as TimeSys. Running Linux on these boards enables them to perform most storage processing and management tasks locally, rather than leaving these tasks to the operating system running on the motherboard. Much like the TCP Offload Engines (TOEs) used on modern networking hardware, running Linux on an I/O Processor (IOP) provides a cooperative co-processing environment that reduces the load on a storage server.

Linux Support for Storage Controllers, Formats, and Protocols

One of the advantages of freely available, open source operating systems such as Linux is the wide range of software technologies that developers have made available for them. This includes a variety of software technologies for storage virtualization and management, including support for various data storage formats (file systems). Linux provides rich support for many different types of file systems, each with their own advantages. Above the device driver level, Linux provides two distinct levels of operating system software support for storage virtualization and delivery.

At the lowest level, Linux supports logical volumes, which are configurable sets of physical storage partitions that the operating system can address, format, and deliver data from as a single entity. Though logical volumes are often associated with specific hardware technologies such as disk arrays or RAID, Linux provides operating system support for them that is independent of any particular hardware interface. Linux can format logical volumes as any type of file system that the OS supports, including fast-restart journaling file systems such as EXT3, JFS, ReiserFS and Reiser4, and XFS.
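The idea of several partitions being addressed as a single entity can be illustrated with a minimal sketch. It models only a simple linear (concatenated) layout; the class names, partition names, and block counts are hypothetical and are not taken from the Linux logical volume implementation.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """A physical partition contributing blocks to a volume (hypothetical model)."""
    name: str
    blocks: int


class LinearVolume:
    """Concatenates partitions so they appear as one contiguous block range."""

    def __init__(self, partitions):
        self.partitions = list(partitions)
        # starts[i] is the first logical block served by partitions[i].
        self.starts = []
        total = 0
        for part in self.partitions:
            self.starts.append(total)
            total += part.blocks
        self.total_blocks = total

    def locate(self, logical_block):
        """Return (partition name, block offset within that partition)."""
        if not 0 <= logical_block < self.total_blocks:
            raise IndexError("logical block out of range")
        index = bisect_right(self.starts, logical_block) - 1
        return self.partitions[index].name, logical_block - self.starts[index]

    def extend(self, partition):
        """Grow the volume by appending another partition at the end."""
        self.starts.append(self.total_blocks)
        self.partitions.append(partition)
        self.total_blocks += partition.blocks


# Illustrative partition names and sizes only.
volume = LinearVolume([Partition("sda1", 1000), Partition("sdb1", 500)])
print(volume.locate(999))   # last block of the first partition
print(volume.locate(1000))  # first block of the second partition
volume.extend(Partition("sdc1", 2000))
print(volume.locate(1500))  # first block of the newly added partition
```

Because callers see only logical block numbers, adding a partition enlarges the volume without changing how existing blocks are addressed.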

At a higher level, Linux distributions provide integrated support for a variety of data-transfer protocols, ranging from low-level protocols such as iSCSI to higher-layer file system protocols such as CIFS (the Common Internet File System, the “new name” for an enhanced version of Microsoft’s traditional SMB/NetBIOS file system), NFS (the Network File System, originally developed by Sun Microsystems and now widely used across all Unix-like operating systems), and AFP (the AppleTalk Filing Protocol traditionally used on Apple Macintosh systems prior to Mac OS X). Linux also provides built-in support for software RAID (redundant array of independent disk drives), although software RAID is relatively uninteresting in the storage device context because of the integrated hardware RAID support provided by I/O Processor (IOP) hardware.

Another of the core advantages of an open source operating system such as Linux is that it provides out-of-the-box support for a wide variety of integrated and add-on storage controller hardware. This support ranges from traditional controllers, such as IDE/EIDE/ATA-100 and SCSI controllers, to more modern onboard hardware such as SATA, IEEE 1394, and USB-attached storage. In addition, the absence of licensing costs for Linux, its support for a variety of local and networked storage technologies, and its rich support for off-the-shelf hardware make it the operating system of choice for Pentium-class computer systems that primarily function as traditional file servers. Having both the host system and any attached I/O processors run Linux provides an especially convenient scenario that simplifies software development by utilizing a common basis.


Figure 2: Use of the PCI-X bus in boards such as Intel’s IQ80219 I/O Processor reference design allows a high-speed connection to the motherboard in network-attached storage (NAS) applications.

Linux-Based Network-Attached Storage Solutions

While local storage is important for data mining, many enterprise applications require on-line access to data. One easy and popular mechanism for quickly adding storage to an enterprise environment is to use Network-Attached Storage (NAS) devices on the corporate network. NAS devices are computers that are dedicated to providing additional storage. Because they are accessed over the network, they must support many popular network file systems, such as CIFS, NFS, and AFP. As with local storage, Linux’s rich support for different types of file systems and file system protocols makes it a natural choice to power NAS devices.

Most Linux-based CIFS storage is provided by an application suite known as Samba, which supports integration into existing Microsoft Windows domains and can also function as a Primary Domain Controller (PDC) if necessary. Similarly, NFS-based storage is easily exported from Linux systems through its support for the NFS 2, 3, and 4 protocols, and its integrated support for Network Information Service (NIS) authentication. Finally, AFP support is available through an application suite known as netatalk, which supports a variety of User Authentication Modules for traditional Macintosh authentication. AFP support is important for Macintosh clients running versions of the Mac OS prior to Mac OS X. The Unix foundation for Mac OS X further simplifies adding networked storage for Macintosh clients, because Mac OS X provides integrated support for the CIFS and NFS protocols, which largely removes the need for AFP support.

To simplify the development of NAS devices, Intel provides a NAS development board known as the Intel® EP80219 Development Kit. This 600 MHz Intel XScale® technology-based board features two 10/100 Ethernet interfaces and four SATA connectors, and ships with 16 MB of Flash and 128 MB of DRAM, which is expandable. This board provides a test bed for developing and fine-tuning NAS solutions. The Intel® IQ80219 development kit is a more fully featured version of this board, adding a PCI-X bridge and expansion connector, four additional SATA ports, and dual Gigabit Ethernet ports instead of the 10/100 ports found on the EP80219.

Into the Future: Storage Area Networks

While NAS provides a quick and easy way to add storage to a networked enterprise environment, NAS storage is somewhat inflexible because it must be added as specific volumes that are located on a specific file server. This makes it easy to add storage space to an existing network, but migrating existing data from one file server to another is still a tedious, manual process. Once data has been migrated to a NAS device, existing volumes can easily be expanded, provided that new disks can be added to the storage controllers and that Linux logical volumes underpin the exported storage volumes. In the long term, however, more flexible approaches to enterprise storage, such as the increasingly virtualized storage provided by a Storage Area Network (SAN) solution, are more appropriate for dynamically increasing and reallocating storage.

Storage Area Networks (SANs) provide pools of storage that can be allocated to any connected host by using sophisticated storage management and storage virtualization software. Because SANs provide logical storage, they offer a flexible solution in enterprise environments where the storage requirements of specific systems may change quickly. For those working on SAN storage solutions today, the Intel® IOP321, IOP331, and IOP315 I/O processors are available on reference boards such as the IQ80321, the IQ80331, and the IQ80315. Here, too, Linux distributions and complete development and testing tools are available for these reference boards.
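As a rough illustration of pooled storage that can be allocated to any connected host, the following sketch tracks capacity in a shared pool. It is a toy model under stated assumptions: the class, the host names, and the gigabyte figures are hypothetical, and real SAN management software does far more than this bookkeeping.

```python
class StoragePool:
    """Toy model of a shared pool whose capacity (in GB) is assigned per host."""

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.allocations = {}  # host name -> GB currently assigned

    @property
    def free_gb(self):
        return self.capacity_gb - sum(self.allocations.values())

    def allocate(self, host, size_gb):
        """Assign more capacity from the pool to a host."""
        if size_gb <= 0:
            raise ValueError("size must be positive")
        if size_gb > self.free_gb:
            raise ValueError("not enough free capacity in the pool")
        self.allocations[host] = self.allocations.get(host, 0) + size_gb

    def release(self, host, size_gb):
        """Return capacity from a host to the pool so another host can use it."""
        current = self.allocations.get(host, 0)
        if size_gb <= 0 or size_gb > current:
            raise ValueError("cannot release more than the host holds")
        remaining = current - size_gb
        if remaining:
            self.allocations[host] = remaining
        else:
            del self.allocations[host]

    def add_capacity(self, size_gb):
        """Grow the pool, for example after new drives are attached."""
        if size_gb <= 0:
            raise ValueError("size must be positive")
        self.capacity_gb += size_gb


# Hypothetical hosts and sizes: storage moves from one host to another
# without either host owning specific physical drives.
pool = StoragePool(capacity_gb=1000)
pool.allocate("mail-server", 600)
pool.allocate("web-server", 300)
pool.release("mail-server", 200)
pool.allocate("web-server", 250)
print(pool.allocations, pool.free_gb)
```

The point of the sketch is that reallocation is a change to the pool's records rather than a physical move of drives between servers.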

SANs rely on communicating with network-attached storage at the transport or block level rather than the file system protocol level. SAN storage typically uses shared connections to multiple hosts over high-speed links such as Fibre Channel. SAN solutions are available today over dedicated high-speed interfaces such as Fibre Channel, traditional TCP/IP network interfaces, and an increasing number of high-speed network interfaces such as Gigabit Ethernet. These higher-speed controller and networking connections are increasingly required to satisfy system data requirements that can no longer be serviced by traditional local storage controllers and directly attached devices.
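The difference between block-level and file-level access can be sketched in a few lines. In the example below, an in-memory buffer stands in for a block device, and requests name a block number and count rather than a file path. The 512-byte block size and the helper names are assumptions made for illustration; the sketch does not implement any SAN transport.

```python
import io

BLOCK_SIZE = 512  # assumed block size for this illustration


def read_blocks(device, first_block, count):
    """Read `count` whole blocks starting at block number `first_block`."""
    device.seek(first_block * BLOCK_SIZE)
    data = device.read(count * BLOCK_SIZE)
    if len(data) != count * BLOCK_SIZE:
        raise IOError("read past the end of the device")
    return data


def write_blocks(device, first_block, data):
    """Write whole blocks of data starting at block number `first_block`."""
    if len(data) % BLOCK_SIZE:
        raise ValueError("data must be a whole number of blocks")
    device.seek(first_block * BLOCK_SIZE)
    device.write(data)


# An 8-block in-memory buffer stands in for a block device.
device = io.BytesIO(bytes(8 * BLOCK_SIZE))
write_blocks(device, 2, b"\xab" * BLOCK_SIZE)
print(read_blocks(device, 2, 1) == b"\xab" * BLOCK_SIZE)
print(read_blocks(device, 3, 1) == bytes(BLOCK_SIZE))
```

A file-level protocol such as NFS or CIFS would instead name a path and let the server decide which blocks hold the data; at the block level, the host that issues the requests runs its own file system on top of them.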

One of the high-speed links being developed is iSCSI, the Internet Engineering Task Force draft specification for the SCSI protocol over TCP/IP networks. Linux already provides integrated support for iSCSI through projects such as the Linux-iSCSI Project (http://linux-iscsi.sourceforge.net/) and ArdisTech’s Linux iSCSI target implementation (http://www.ardistech.com/iscsi/). It is only a matter of time until iSCSI support is integrated into the mainstream Linux kernel, making Linux as effective at empowering SAN solutions as it is at powering the majority of today’s NAS offerings.

Conclusion

Linux provides a powerful foundation for current and future storage technologies. Linux drivers and protocol support are constantly being improved. Other work in the Linux space, such as the OSDL’s Carrier Grade Linux (CGL) specification originally developed for the telecommunications and networking space, is introducing additional Linux capabilities that are equally useful to the NAS and SAN markets, such as core feature sets for serviceability, maintainability, and increased uptime. A CGL-compliant reference distribution is already available from TimeSys.

New storage technologies such as Serial ATA (SATA) are high-speed, modern mechanisms for attaching high-capacity and high-performance storage devices to Linux systems, facilitating both today’s Network Attached Storage (NAS) devices and the development of tomorrow’s Storage Area Networks (SAN). Intel XScale technology-based I/O processors provide the power to store, retrieve, and deliver data quickly, and help improve general system performance by offloading data management and storage from the system’s primary CPU(s). The combination of the power of Linux and Intel XScale technology-based I/O Processors thus provides working storage solutions today, with even more powerful solutions and technologies on the horizon.


William von Hagen is a Senior Product Manager at TimeSys Corp. TimeSys, an embedded Linux developer, is a member of the Open Systems Group and the CE Linux Consortium.