Making the Hard Call: Know Your Packet Processing Options
The broadband industry is under intense pressure. Competition is fierce, and demand for increased capacity is quickly reaching a breaking point. At a time when one miscalculation could lead to serious repercussions, how do developers make design decisions that will lead to long-term success with the flexibility to evolve with ever-changing technologies?
Selecting packet processing chips for a new line card or pizzabox design can be a complicated decision with many variables at play. Being aware of the most important factors to consider will yield a system that not only meets the highest standards in functionality, but also adapts to new technologies and improves with each change rather than becoming obsolete.
Do Your Homework – The “Four Ps”
A technology purchasing process is a major task and thorough research as well as proper due diligence is called for to eliminate any sort of buyer’s remorse. Vetting proposed changes against the concepts of programmability, processing, power and price will serve as a very good validation guideline.
Programmability
One of the most important considerations when making a decision is addressable market and product lifespan. Programmability of the device adds the ability to customize features required by demanding service provider customers, and to enhance functionality over time as standards evolve. When the data plane cannot support new features required for new services, upgrades could be unnecessarily delayed until budget permits or until competition makes the change necessary. Programmability and capacity headroom hold the promise of extending product lifetime, starting at three years and in many cases reaching as much as nine years, depending on the circumstances.
Processing
In high-performance communications systems, wirespeed processing for all packet sizes is another key objective. The number of services simultaneously supported at wirespeed operation is ultimately what counts when service providers evaluate network platforms. Careful consideration needs to be given to service density; in other words, the number of operations available for each packet.
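To make "wirespeed for all packet sizes" concrete, the sketch below computes the packet rate a device must sustain to keep an Ethernet link full, and the time available per packet. It assumes standard Ethernet framing overhead (preamble, start-of-frame delimiter and inter-frame gap); the function names are illustrative only.

```python
# Per-frame overhead on the wire for standard Ethernet:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
ETHERNET_WIRE_OVERHEAD_BYTES = 20


def wirespeed_pps(line_rate_bps: float, frame_bytes: int) -> float:
    """Packets per second needed to keep the link full at a given frame size."""
    bits_on_wire = (frame_bytes + ETHERNET_WIRE_OVERHEAD_BYTES) * 8
    return line_rate_bps / bits_on_wire


def time_budget_ns(line_rate_bps: float, frame_bytes: int) -> float:
    """Time available per packet, in nanoseconds, at wirespeed."""
    return 1e9 / wirespeed_pps(line_rate_bps, frame_bytes)


if __name__ == "__main__":
    # Smallest, mid-size and largest untagged Ethernet frames on a 100 Gbit/s link.
    for size in (64, 512, 1518):
        pps = wirespeed_pps(100e9, size)
        budget = time_budget_ns(100e9, size)
        print(f"{size:>5} B: {pps / 1e6:7.2f} Mpps, {budget:6.2f} ns per packet")
```

The smallest frames set the hardest requirement: at 100 Gbit/s, 64-byte frames arrive at roughly 148.8 million packets per second, leaving under 7 ns per packet. Every operation a service needs has to fit into whatever the architecture can do in that window.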
Power
For both operational cost and environmental reasons, service providers are carefully seeking the highest performance per watt. Given the special characteristics of packet processing, the most efficient chip should be measured as the highest performance per watt at wirespeed performance.
Price
In this competitive market, price will ultimately be dependent on cost of manufacturing. It is important to note, however, that a high chip price doesn’t necessarily result in a high price for the target hardware. In modern ‘line-card-on-a-chip’ designs, many functions and memory types have been integrated into the packet processing chips, reducing the overall bill of materials.

The Dataflow Architecture features a single pipeline of hundreds of PISC processor cores with embedded Engine Access Points. The Execution Context holds packet specific data, which traverses with the packet through the pipeline.
Making the Hard(ware) Call
So, what routes may system vendors take to select packet processing chips on new line card designs? Developers have three main choices:
Fixed-function Ethernet Switches
Fixed-function Ethernet switches are ASICs developed for a large market. They come with a feature set defined by the ASIC vendor and can be configured for different types of data plane services. Fixed-function Ethernet switches come with integrated MACs and PHYs but with limited access to off-chip memory, as they are primarily designed for enterprise and data center networking. Systems for the carrier market, therefore, tend to add custom ASICs or FPGAs to gain features and to scale tables and look-up rates.
As Ethernet switches are strictly defined by the vendor, their usage is particularly good in networking environments that are mature and where the pace of innovation is limited. In the carrier business, continuously changing business and operational models tend to limit the lifetime of Ethernet switch-based products. In addition, the system vendor doesn’t have the ability to freely differentiate the products to add value, thus tends to compete on price rather than on new features and adaptability to service provider requirements. This has significant impact on the addressable market for the product and limits the system vendor’s support of the service provider’s future roll-out plans.
FPGAs
FPGAs are generic devices that can be programmed to behave as packet processing devices with configurable functions. They serve a huge market and the latest generations are quite powerful. The drawbacks with FPGA-based packet processors include meeting the required price and power budgets for carrier systems. FPGAs are programmable at a very low level. It is therefore a rather complex engineering task to design, validate and test all functions. Microcode to memory interfaces have to be implemented and integrated by the vendor. Extending the functionality of the packet processing part causes interdependencies between subsystems, requiring a new wave of validation.
Network Processors
Network processors are an attractive alternative to these approaches. These are specialized devices that can be programmed for different types of data plane applications. The inherent flexibility of these devices adds value to carrier systems, providing the ability to handle diverse and continually evolving customer requirements. The result is enhanced product lifespan.
Ranging from 2.5G to 100G in performance, they come in different form factors and are designed to be in the ‘sweet spot’ for different types of line cards. When comparing network processors, the evaluation process starts with a check-box exercise on what core functions are integrated on the device: traffic management; types, speeds and numbers of traffic and system interfaces; I/O memory types and performance; SerDes lane speeds and the sharing of those lanes between external interfaces; and so on. Comparing these raw data specifications is rather straightforward. It becomes a bit more complicated when looking at the packet processing performance. Here, you need to dive into the architecture to compare the processing capabilities.

Multicore architectures feature shared resources and a high-speed bus or cross bar for interconnection of processor cores and these resources.
Comparing NPU Architectures
Advances in programmability, processing and power are going to happen on the network processing unit, but there are many options to consider. When comparing architectures, you need to weigh the strengths and weaknesses of each route.
It is important to remember that comparing processor performance is difficult, as theoretical maximum values are often referred to with little real-world relevance. Moreover, performance is impacted by the ability to efficiently use the available processing capabilities, as well as by how well the I/O memory can be utilized in relation to the processing capacity.
The comparison must therefore start at the chip design level. Let us start by looking at a generic multicore architecture designed for communications applications. (See illustration.) Stemming from general-purpose processors, a multicore network processor (NPU) would want to leverage a higher degree of parallelism by increasing the number of processor cores. This can be achieved by decreasing the complexity and removing unnecessary features (e.g., floating-point instructions are not required for packet processing) found in today’s general-purpose processor architectures.
A multicore NPU architecture comes with a unique organization of processor cores. These cores are grouped into parallel pools or pipelined together in a serial manner. The organization can be tightly controlled by the architecture, as designed by the NPU vendor, to optimize performance.
If loosely defined, the organization allows programmers to more freely divide tasks between the cores, ultimately providing greater flexibility at the cost of performance control. In many cases, multicore NPUs end up as a hybrid architecture of pipelines and pools. This architecture will have critical bottlenecks when executing tasks that cannot be parallelized. These have to be clearly understood in the evaluation process.
The organization of processor cores has a fundamental impact on the programming model. Parallel pools come with an associated multi-threaded programming model, where every processor core may run one or more threads. Essentially, the program takes a packet and executes a series of operations on it.
Once completed with the packet, the program is ready to take on the next in line. The programmer utilizes the processing resources by scheduling packets to the different pools. Synchronization across threads will be a key systemization task for the programmer.
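The pool model can be sketched in a few lines. This is only an analogy: Python threads stand in for processor cores, and the pool size, packet contents and statistics counters are hypothetical. Each worker takes a packet, runs it to completion, then takes the next one in line. The lock around the shared counters is the kind of synchronization the programmer has to get right.

```python
import queue
import threading

NUM_WORKERS = 4  # hypothetical pool size


def process_pool(packets, num_workers=NUM_WORKERS):
    """Run every packet to completion on whichever worker thread picks it up."""
    work = queue.Queue()
    for pkt in packets:
        work.put(pkt)

    stats = {"packets": 0, "bytes": 0}  # state shared by all threads
    stats_lock = threading.Lock()       # protects the shared counters

    def worker():
        while True:
            try:
                pkt = work.get_nowait()
            except queue.Empty:
                return  # nothing left in line
            # Classification, modification and forwarding would all happen
            # here, on the same thread, before the next packet is taken.
            length = len(pkt)
            with stats_lock:
                stats["packets"] += 1
                stats["bytes"] += length

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats
```

Without the lock, the read-modify-write updates to the shared counters could interleave across threads and lose counts. On a real multicore NPU the same concern applies to shared tables, meters and packet ordering.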
The pipelined model takes the data-plane application and divides it into separate processing tasks, for example, classification, metering, statistics counting and forwarding. Each task is then mapped onto separate processor cores and the execution is either enforced by the architecture or left to the programmer. One challenge traditionally involves efficiently dividing the tasks among the cores, as the throughput is limited by the slowest stage.
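The effect of the slowest stage can be illustrated with a small calculation. The per-stage times below are hypothetical, chosen only to show the effect; they are not measurements from any device.

```python
def pipeline_throughput_pps(stage_times_ns):
    """Steady-state packets per second: set by the slowest stage alone."""
    bottleneck_ns = max(stage_times_ns.values())
    return 1e9 / bottleneck_ns


def pipeline_latency_ns(stage_times_ns):
    """Time for one packet to pass through every stage, ignoring queueing."""
    return sum(stage_times_ns.values())


# Hypothetical per-stage processing times in nanoseconds.
stages = {
    "classification": 10.0,
    "metering": 4.0,
    "statistics": 3.0,
    "forwarding": 5.0,
}

if __name__ == "__main__":
    print(f"throughput: {pipeline_throughput_pps(stages) / 1e6:.1f} Mpps")
    print(f"latency:    {pipeline_latency_ns(stages):.1f} ns")
```

With these numbers the classification stage caps the pipeline at 100 million packets per second, even though the other three stages sit idle more than half the time. If the same 22 ns of work were split evenly across the four cores, the rate would be about 182 million packets per second, which is why task partitioning matters so much in this model.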
Packets in a generic multicore architecture are typically stored in a shared memory area. In this case, the programmer has to divide the classification and packet modification tasks between the pools and pipelines of processing resources.
The Deterministic Dataflow Architecture
The dataflow architecture takes a unique approach and features a single pipeline of processor cores. (See illustration.) The architecture has been designed to be fully deterministic and ultra-efficient. It includes a packet instruction set computer (PISC) and an engine access point (EAP) in addition to the execution context.
The data-plane program is compiled to instruction memory located in the processor cores, eliminating the need to fetch instructions to the processor cores from a shared memory during program execution. Moreover, this leads to significant gains in performance and power dissipation.
The programming model is identical to the well-known sequential uni-processor model, wherein programmers write each packet program in sequential order, avoiding the hassles of multi-parallel programming (e.g., memory consistency, coherence and synchronization). When the software is compiled, the code is automatically mapped to the single pipeline of processor cores. One VLIW instruction occupies one processor core in the pipeline.
A significant benefit of the architecture and programming model is that it enforces wirespeed operation. Every type of packet has a guaranteed number of operations and classification resources.
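The following toy simulation illustrates the idea of a sequential packet program mapped one instruction per core, with the execution context travelling alongside the packet. It is a sketch under simplifying assumptions, not the vendor's toolchain or instruction set: the three "instructions" are hypothetical Python functions, engine access points are left out, and one packet enters the pipeline per cycle.

```python
def run_dataflow(program, packets):
    """Simulate a single pipeline holding one instruction per core.

    `program` is a non-empty list of functions, each taking and returning an
    execution context (a dict). Returns (context, entry_cycle, exit_cycle)
    tuples in the order packets leave the pipeline.
    """
    if not program:
        raise ValueError("program must contain at least one instruction")
    cores = [None] * len(program)  # context currently held by each core
    pending = list(packets)
    results = []
    cycle = 0
    while pending or any(slot is not None for slot in cores):
        # Every context moves one core down the pipeline; a new packet enters.
        entering = ({"packet": pending.pop(0)}, cycle) if pending else None
        cores = [entering] + cores[:-1]
        # Each core executes its single instruction on the context it holds.
        for index, slot in enumerate(cores):
            if slot is not None:
                ctx, entry_cycle = slot
                cores[index] = (program[index](ctx), entry_cycle)
        # A context in the last core has now executed the whole program.
        if cores[-1] is not None:
            ctx, entry_cycle = cores[-1]
            results.append((ctx, entry_cycle, cycle))
            cores[-1] = None
        cycle += 1
    return results


# A hypothetical three-instruction packet program, written sequentially.
def parse(ctx):
    ctx["ethertype"] = ctx["packet"][12:14]
    return ctx


def classify(ctx):
    ctx["is_ipv4"] = ctx["ethertype"] == b"\x08\x00"
    return ctx


def decide(ctx):
    ctx["action"] = "forward" if ctx["is_ipv4"] else "drop"
    return ctx


PROGRAM = [parse, classify, decide]
```

In this model every packet spends exactly `len(PROGRAM)` cycles in the pipeline and one packet leaves per cycle, regardless of what the packets contain. That fixed budget is the sense in which the throughput is deterministic, and the programmer never writes any thread synchronization.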
Reduced Complexity, Greater Performance
Multicore architectures are inherently more flexible than the dataflow architecture and can be used for more types of services. Because they are less efficient than the dataflow architecture for layer 2-4 processing, they are better suited to applications requiring processing at layers 5-7.
By reducing the complexity and fully optimizing the architecture for layer 2-4, the dataflow architecture’s design scales to several hundreds of processor cores, supporting 100 Gbits/s and 150 million packets per second with strict wirespeed guarantees.
Know Your Facts
It is clear that consumer demand for increased bandwidth will continue to drive major industry changes. Careful research and planning when making major carrier equipment decisions will yield positive results in terms of both economics and overall system performance. A healthy combination of programmability, performance and processing headroom will ensure competitive technology that grows and adapts to new innovations.
While you learned the Three Rs in grade school, it’s time to add the Four Ps to your design selection vocabulary. Making every decision in terms of programmability, processing, power and price will yield sound results and ensure long-term success.

Per Lembre is director of product marketing at Xelerated. He has more than 10 years of experience in product management and marketing in the data networking and communications industry. For more information, email per.lembre@xelerated.com or visit www.xelerated.com.

