Accelerate Deep Packet Inspection with Standard Servers

The Intel Platform for Communications Infrastructure offers OEMs and platform developers a powerful alternative to traditional NPU offerings, with resources that will help developers deliver first-to-market solutions while saving time and money.

By Austin Hipes, Vice President of Technology, NEI

In the past, organizations looking for high-performance deep packet inspection (DPI) typically turned to dedicated network processing units (NPUs). This specialized hardware presented numerous drawbacks, such as difficult programming models and infrequent silicon upgrades. OEMs can solve these challenges with the Intel® Platform for Communications Infrastructure (formerly code-named “Crystal Forest”), which combines multicore Intel® architecture (IA) packet processing with new encryption and compression hardware acceleration, all on standard server platforms. This optimized platform delivers multi-gigabit network security performance without dedicated NPUs. For instance, in Layer 3 packet forwarding, the platform is capable of over 160 million packets per second compared with the typical NPU’s 100 million packets per second.

This new platform gives developers the ability to scale their solutions from low-power, low-cost, single-core designs with sub-1 Gbps of bulk encryption capability and dual DDR3 memory channels, all the way up to large 16-core platforms with 80 Gbps+ of bulk encryption capability and 8 DDR3 memory channels. This scalability allows independent software vendors (ISVs) to develop once and deploy equipment achieving various performance and price points, quickly evolving their platforms in concert with Intel’s industry-leading “tick-tock” product cycle. What’s more, OEMs and ISVs can leverage products and services from Intel® Intelligent Systems Alliance members like NEI to further accelerate development, helping them deliver first-to-market solutions.

The Growing Need for Deep Packet Inspection (DPI)
Packet-based data continues to increase year after year. The growing use of 4G mobile devices, cloud computing, cloud storage and streaming video is pushing networks to their limits, requiring network operators to use their resources more efficiently. DPI allows network operators to take a more comprehensive look into the data traveling on their networks and, when necessary, control how each data packet is handled. By inspecting each packet in real time as it travels through the network, content rules can be more accurately enforced, security threats more easily identified, high-priority traffic more effectively prioritized, and usage statistics more accurately gathered.

For example, a corporate IT department may have a policy prohibiting streaming video on the company network. A standard firewall and policy enforcement tool without DPI typically limits the IT administrator to blocking certain sites such as YouTube, and blocking common TCP ports used for streaming video such as port 554. Adding DPI enables the enforcement tool to inspect the packet structure all the way down to the application layer, and thereby detect, block and report services like the real-time streaming protocol (RTSP) regardless of the port or web address it attempts to use.
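To make the distinction concrete, the following sketch (illustrative Python, not production DPI code; the helper names are our own) contrasts a port-based filter with a payload-based check for RTSP:

```python
# Sketch: port-based vs. payload-based (DPI-style) detection of RTSP.
# A real DPI engine uses optimized pattern-matching hardware or libraries,
# not simple string checks; this only illustrates the principle.

RTSP_METHODS = (b"OPTIONS", b"DESCRIBE", b"SETUP", b"PLAY", b"TEARDOWN")

def port_based_block(dst_port):
    """Naive firewall rule: block only the well-known RTSP port."""
    return dst_port == 554

def dpi_based_block(payload):
    """DPI rule: flag any packet whose payload looks like an RTSP request,
    regardless of the port it travels on."""
    first_line = payload.split(b"\r\n", 1)[0]
    return first_line.startswith(RTSP_METHODS) and b"RTSP/1." in first_line

# An RTSP SETUP request smuggled over port 8080:
payload = b"SETUP rtsp://example.com/stream RTSP/1.0\r\nCSeq: 1\r\n\r\n"
print(port_based_block(8080))    # False -- the port filter misses it
print(dpi_based_block(payload))  # True  -- payload inspection catches it
```

Because the DPI rule keys on the protocol signature rather than the transport port, moving the stream to an unusual port or address gains the user nothing.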

Other uses for DPI include stateful load balancing, where each session needs to “stick” to the original server processing it until that session completes. Such solutions are common to transaction processing and video on demand (VoD) services.
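One simple way to implement that stickiness is to hash each session's 5-tuple so that every packet of a flow consistently maps to the same back-end server. The sketch below (illustrative Python; the server addresses are placeholders, and a real stateful load balancer would also maintain session tables and track server health) shows the idea:

```python
# Sketch: session "stickiness" via a deterministic hash of the flow 5-tuple.
import hashlib

SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # placeholder back-end pool

def pick_server(src_ip, src_port, dst_ip, dst_port, proto="TCP"):
    """Map a flow's 5-tuple to one server; same tuple -> same server."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}-{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return SERVERS[int.from_bytes(digest[:4], "big") % len(SERVERS)]

# Every packet of the same session lands on the same server:
first = pick_server("192.0.2.7", 51000, "203.0.113.5", 80)
later = pick_server("192.0.2.7", 51000, "203.0.113.5", 80)
assert first == later
```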

DPI-based solutions are also widely used for network security by allowing network traffic to be scanned for viruses, worms and spyware in real time. With the increased use of mobile devices and notebook computers connecting into corporate networks through VPNs, new threats can be introduced into the network directly if they are not discovered first. DPI can allow for security policy enforcement at all layers of the network, allowing for a much more robust security solution.

The Increasing Need for Encrypted Data
The need for data encryption in data-plane services is increasing as networks load up with mobile devices, cloud computing and cloud storage. Increasingly, sensitive data must traverse the Internet or other unsecured networks between secure endpoints. Whether it is a salesperson accessing the company’s cloud-deployed customer relationship management (CRM) system from a smartphone, an IT professional using a remote terminal application on a tablet to control a corporate server, or a home user backing up important data to cloud-based storage, encryption is an important part of the way we live and work today. Because of the tremendous growth in these types of encryption-driven network services, DPI and encryption often go hand in hand. That is, DPI implementations often need to decrypt and encrypt network data in real time in order to analyze packet payloads and make intelligent data-plane traffic decisions.
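The decrypt-inspect-re-encrypt path can be pictured as follows. This Python sketch uses a deliberately toy XOR "cipher" purely to show the data flow; real deployments use hardware-accelerated ciphers such as AES, which is exactly the work the platform's accelerators are designed to offload:

```python
# Sketch of the decrypt -> inspect -> re-encrypt path a DPI node performs
# on encrypted traffic. The XOR "cipher" is a toy stand-in for real
# cryptography and must never be used for actual security.
def xor_cipher(data, key):
    """Toy symmetric 'cipher': XOR with a repeating key (self-inverse)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def inspect_and_forward(ciphertext, key, banned):
    plaintext = xor_cipher(ciphertext, key)   # decrypt
    if banned in plaintext:                   # inspect the payload
        return None                           # drop the packet
    return xor_cipher(plaintext, key)         # re-encrypt and forward

key = b"secret"
ok  = inspect_and_forward(xor_cipher(b"GET /index.html", key), key, b"worm")
bad = inspect_and_forward(xor_cipher(b"worm payload", key), key, b"worm")
assert ok is not None and bad is None
```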

Traditional Approaches
Until now, high-performance DPI and encryption required NPUs specifically designed for these tasks. These devices can be found in network devices ranging from intrusion detection and prevention systems (IDPS) to session border controllers and network monitoring systems. In addition to employing an NPU (or several NPUs), many such platforms incorporate traditional CPU-based server hardware to provide full system-level control plane functionality.

While NPUs offer certain advantages, they also have important drawbacks. Most notably, NPUs use proprietary architectures with difficult programming models that demand specialized skills. As a result, networking hardware often requires two separate programming teams—one for NPU software and another for CPU software. Coordinating these teams can be a major challenge, and the disparate code bases significantly limit design flexibility. NPUs also complicate hardware design, which can raise system cost, and NPU silicon is refreshed on a longer timeline than typical mainstream microprocessors and peripherals—potentially leaving OEMs stuck with “trailing edge” technology. Recent advances in server hardware, however, now allow OEMs to reach multi-gigabit DPI performance levels without specialized and costly NPU solutions.

The Intel® Platform for Communications Infrastructure
Enter the Intel Platform for Communications Infrastructure. Specifically designed for workload consolidation, the platform combines multicore IA processors with new Intel® QuickAssist Technology hardware accelerators for encryption and data compression/decompression (Figure 1). The platform can run application, control plane and data plane workloads concurrently at very high throughput. In Layer 3 packet forwarding, for example, the platform is capable of over 160 million packets per second, compared with the typical NPU’s 100 million packets per second. (For more details on the platform, see “Tech Review: Intel® Platform for Communications Infrastructure”.)

Figure 1: The Intel® Platform for Communications Infrastructure pairs multicore processors with Intel® QuickAssist Technology accelerators. The chipset incorporates 1 GbE MACs, but the platform can be configured with external 10 GbE MACs for faster throughput.

This new platform allows OEMs to achieve a new level of DPI and encryption design flexibility while often achieving cost reduction compared with expensive NPU-based add-in cards. By using a single, common architecture, software developers can more effectively spend their time on application development and less on learning new hardware.

The processors used in this platform provide the basis for excellent performance. A dual-socket configuration with the Intel® Xeon® processor E5-2600 series can provide up to 16 cores, up to eight DDR3-1600 memory channels, and a PCI Express (PCIe) Gen 3.0 root complex with up to 80 lanes, giving the platform the lowest memory latency and highest available I/O throughput of any mainstream Intel platform to date.

Adding to the optimized design is the Intel® Communications Chipset 89xx Series (formerly code-named “Cave Creek”). This chipset combines traditional platform controller hub (PCH) compute I/O functions with communications hardware accelerators. It can be used as a PCH in an embedded motherboard design (as shown in Figure 1), or as a standalone PCIe device added to a traditional server platform (Figure 2). In both cases, the accelerators can offload both encryption and data compression from the system’s main CPUs so that they, in turn, are free to perform other work. Spared from these workloads, the IA cores are now available for processing both DPI and application tasks, effectively performing work on both data and control planes.
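The offload pattern itself is straightforward: the CPU submits a job to the accelerator and keeps doing other packet work until the result is ready. The sketch below imitates that pattern in Python, with a thread pool standing in for the accelerator; it does not use the actual Intel QuickAssist API, which is quite different, and only illustrates why offload frees CPU cycles:

```python
# Sketch of the submit/poll offload pattern: hand a compression job to an
# "accelerator" (a worker thread here, purely as a stand-in) and keep the
# CPU busy with other work until the result is ready.
import zlib
from concurrent.futures import ThreadPoolExecutor

accelerator = ThreadPoolExecutor(max_workers=1)  # stand-in for the hardware

payload = b"packet payload " * 1000
job = accelerator.submit(zlib.compress, payload)  # offload the compression

cpu_work_done = 0
while not job.done():       # CPU stays free for other packet processing
    cpu_work_done += 1      # stand-in for DPI / forwarding work

compressed = job.result()
accelerator.shutdown()
assert zlib.decompress(compressed) == payload
```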


Figure 2: An Intel® Communications Chipset 89xx Series-based PCI Express (PCIe) add-in card transforms a legacy server into a high-performance DPI/encryption network appliance. Shown here is the Intel PCIe card code-named Granite Hill.

The platform gives developers the ability to scale their solutions from low-power, low-cost, single-core designs all the way up to large 16-core platforms. The chipset offers similar scalability, with configurations ranging from a single device with 5 Gbps of bulk encryption capability up to quad-device configurations with 80 Gbps+ of bulk encryption capability. (Note that total system throughput depends on both the processor and accelerator. Thus, total throughput starts at <1 Gbps for implementations with single-core processors.)

This huge range of performance is all available using the same code base, which allows OEMs and ISVs to address multiple market segments simply by scaling the number of CPU cores and accelerators in a given platform, without a significant investment in new code. As well, existing server platforms can be transformed into high-performance DPI and encryption machines by adding PCI Express cards containing the Intel Communications Chipset 89xx Series (as shown in Figure 2).

Intel® Data Plane Development Kit (DPDK)
But the scalable hardware is only part of the story: How does one best take advantage of this new design? The answer is the Intel® Data Plane Development Kit (Intel® DPDK). Designed specifically to assist developers in migrating traditional NPU packet processing applications to IA, Intel DPDK helps maximize application throughput and minimize development time. The software kit is a series of highly optimized libraries and programming primitives that, when packaged within a virtual machine, allows developers to easily scale by instantiating multiple virtual machines (VMs).

As an example, Figure 3 shows multiple virtual data plane processors and a single control plane processor spread across two four-core processors. Seven of the cores run Intel DPDK instances dedicated to data-plane tasks such as DPI, packet forwarding and routing. Four of those cores control 1/10 Gigabit Ethernet ports, while the remaining core runs Linux for control plane or higher-level “housekeeping” functions. The CPUs are connected via Intel® QuickPath Interconnect (Intel® QPI) for high-speed out-of-band data transfers.
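The polling model behind this layout can be sketched as follows: each data-plane core runs a tight run-to-completion loop, pulling bursts of packets from its own receive ring with no interrupts or kernel involvement. In the Python sketch below, deques stand in for NIC RX/TX rings; the real Intel DPDK is a C API with optimized burst receive/transmit calls, so this only conveys the shape of the loop:

```python
# Sketch of the DPDK-style polling model: a core polls its RX ring,
# processes a burst of packets to completion, and hands them to a TX ring.
from collections import deque

BURST_SIZE = 4  # packets pulled per poll, mimicking burst RX

def data_plane_loop(rx_ring, tx_ring):
    """Poll the RX ring, process each burst, push results to the TX ring."""
    while rx_ring:
        burst = [rx_ring.popleft()
                 for _ in range(min(BURST_SIZE, len(rx_ring)))]
        for pkt in burst:
            tx_ring.append(pkt.upper())  # stand-in for DPI/forwarding work

rx = deque(["pkt1", "pkt2", "pkt3", "pkt4", "pkt5"])
tx = deque()
data_plane_loop(rx, tx)
print(list(tx))  # ['PKT1', 'PKT2', 'PKT3', 'PKT4', 'PKT5']
```

Because each core owns its own rings and runs its work to completion, cores scale out with minimal locking or cross-core coordination, which is the essence of the Intel DPDK approach.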


Figure 3: In this example of Intel® Data Plane Development Kit (Intel® DPDK) running in Linux user space, multiple instances of Intel DPDK virtual machines (VMs) are running across different cores assisting in data plane packet processing.

Intel DPDK is designed to be compatible with any IA platform, allowing OEMs and platform developers to choose the class of CPU that best meets their performance needs at the most effective price point. As future processors become available, OEMs can quickly take advantage of new advances in processor performance and core counts.

Accelerating Design with the Alliance
OEMs and ISVs can extend the advantages of the Intel platform by working with Alliance members like NEI, a high-level systems integrator specializing in upfront consulting and design services in the security, telecom and storage markets. With more than a decade of experience in designing security appliances and more than eight years specifically designing DPI-based appliances, NEI understands the market needs and technology trends specific to DPI applications:

  • Solution Design: Targeting physical, virtual and cloud system designs, NEI services include hardware selection, OS customization, image creation, remote management and regulatory and compliance certifications.
  • System Integration: Full manufacturing and integration capabilities backed by robust process controls to ensure predictable, repeatable results.
  • Application Management: Unique software that delivers automated updates, monitors system health, performs back-ups, and manages deployed appliances.
  • Global Logistics: Comprehensive services that ease delivery and inventory management.
  • Support and Maintenance: Service programs that extend product life span and maximize uptime.

Over the coming months, NEI will release a wide variety of Intel Communications Chipset 89xx series-based systems, including small form factor appliances, enterprise- and carrier-grade rack mount servers, and ATCA reference solutions. Two examples include NEI’s E1800 R3 and E2900 R3 server platforms, both based on the Intel® Xeon® processor E5-2600 series, which can integrate the Intel Communications Chipset 89xx series via PCIe add-in cards (Figure 4). These scalable 1U and 2U rack mount platforms support up to six full-height, full-length PCIe expansion cards and up to 24 disk drives, providing excellent performance for medium- to high-density applications. Integrated RAID, hot-swap drives, and redundant hot-swap power supplies ensure maximum reliability and uptime. What’s more, NEI’s design services can help OEMs and ISVs rapidly create frame-level products based on these servers.


Figure 4: NEI’s E1800 R3 and E2900 R3 server platforms provide excellent flexibility and scalability for medium- to high-density applications.

On the software side, NEI offers a virtual appliance model designed to optimize deployment across physical, virtual and cloud models (Figure 5). NEI creates a single controlled image that incorporates a virtualization layer. This virtual appliance can be deployed across physical, virtual and cloud models with the same performance and lifecycle, saving valuable time and money. In addition, the use of a single image enables NEI’s unique application management, which delivers automated remote health, update and backup tasks.


Figure 5: NEI offers both virtual and hardware appliance deployment models.

Accelerating DPI Design
The Intel Platform for Communications Infrastructure offers OEMs and platform developers a powerful alternative to traditional NPU offerings. By using a single, common architecture, developers can spend less time on hardware and software basics and more time creating innovative applications. Likewise, system designers can more effectively scale their solutions for different market segments by adjusting the CPU class and number of hardware accelerators. OEMs and ISVs can extend the advantages of the Intel platform by working with Alliance members like NEI, which provide hardware, software and services that can significantly accelerate design. By taking advantage of the Alliance’s expertise, developers can deliver first-to-market solutions while saving time and money.

This article first appeared in the Embedded Innovator newsletter (10th edition, 2012) published by the Intel® Intelligent Systems Alliance.



Austin Hipes is vice president of technology at NEI, where his primary focus has been on designing systems for network equipment providers requiring carrier-grade solutions. He was previously director of technology at Alliance Systems and a field applications engineer for Arrow Electronics. He received his bachelor’s degree from the University of Texas at Dallas.