Accelerate Server-based Media Processing

A PCI Express media processing accelerator card using DSP technology offers benefits over additional servers for high-density voice and video processing in network applications.

“These go to eleven,” were the legendary words of Spinal Tap lead guitarist Nigel Tufnel when explaining why the band’s amplifiers made them louder, and therefore better, than other rock bands. The expression now has its own entry in the Oxford English Dictionary and has come to mean the act of taking something to the extreme. When talking about communications networks, taking the performance of an application to eleven usually means adding more equipment. This is especially true when adding functionality without compromising performance of the original applications on equipment such as rack-mounted servers and appliances.

For example, high-density voice and video processing is increasingly in demand for network applications and the current typical solution is some kind of commercial host media processing (HMP) solution that draws on the existing processor resources of the server.

This article describes an alternative approach, using PCI Express media acceleration cards—which are increasingly available off the shelf—and embedded voice/video firmware to offer dramatically improved performance while taking up less space, consuming less power and costing less. In short, a solution that can take a server-based media processing application to eleven.

As communications networks transition to an all-IP environment, service providers and network operators are finding a need for IP media servers and new advanced flow management devices such as session border controllers, QoS analytic engines and intelligent flow optimizers.

Many of these are developed and deployed on 1U or 2U standard rack-mounted server architectures for simplicity. The role of IP media gateways and media servers is clear, but as the developers and users of border flow management devices consider where to go next, one obvious step is to build advanced media stream processing into the platform. One key concern is scalability. According to most analysts, mobile data, and especially mobile video, is expected to grow exponentially over the next three to five years, so the pressure is on to find cost- and power-efficient ways to scale media processing to suit. The following sections examine some of the issues that confront equipment developers.

Adding Voice Transcoding to a Session Border Controller
A good example of a flow management application is the session border controller (SBC), an often-quoted example of a class of equipment known as network security gateways. These are “bump in the wire” devices that form a bridge between trusted and untrusted networks or enterprises. Their job is to analyze and characterize incoming IP traffic, block undesirable or unauthorized flows and let through approved traffic. In communications networks, much of this traffic consists of media streams.

Figure 1: Session border controllers are gateways between heterogeneous networks

As this is a gateway point, many SBC users are also interested in providing media format translation alongside stream management. Even simple requirements like DTMF tone monitoring require that the media streams be decoded and analyzed.

The ability to perform voice transcoding within the box simplifies the communications flow for an operator, and hence provides a competitive advantage for the equipment vendor. Unfortunately, real-time voice (and especially video) stream processing at high channel counts is a strenuous task, so adding this function can significantly reduce the processing power available to the main service, and with it the capacity of the device.

Possible Solutions
Adding media processing functionality to an application can be done in a number of ways:

  1. An additional system or device linked to the original appliance
  2. An internal software solution, adding functionality to existing software
  3. An internal media processing accelerator offering hardware-accelerated transcoding

In the SBC plus voice transcoding example above, using an external media gateway is perhaps the simplest to envisage. The border gateway terminates principal traffic streams, and redirects media to the external gateway for transcoding via external ports. Media can come back into the border gateway for egress filtering. The disadvantages are that this approach is costly, uses rack space and extra power, takes up valuable physical network interfaces on the border gateway, and still requires application development to control and configure media stream handling on a stream-by-stream basis.

Similarly, for a media server adding HD video, using an external HD conferencing device would be complex to manage, would take up additional rack space and power, and could be costly. The service application would need to manage both systems in parallel, potentially increasing complexity, management overhead, and OPEX. Upgrade paths to newer compression schemes such as H.265 may also be limited.

The other two solutions bring this function inside the box.

An internal software solution, for instance using commercially available host media processing software, necessarily makes use of internal processing resources.

In the case of voice transcoding, this may be a great solution for a moderate number of simultaneous channels; however, it does not scale effectively. At upwards of 1200 simultaneous channels of G.729 encoding, the software solution approaches 50% utilization of a typical server, starving the original application of processing resources. Effectively this means that additional servers would be required to offer higher densities of voice transcoding, and the cost of the commercial software, which is usually charged on a per-channel basis, soon mounts up.

Although it is possible to add more servers to address this issue, accepting a reduction in capacity even for an improvement in functionality is often difficult to manage from a product line perspective. It results in a downgrade of capacity within the same product offering, so cannot really be viewed as adding functionality. Matters get even worse when considering field upgrades since a customer must accept that a given installation would no longer be able to carry the same traffic.

The Solution
A more elegant solution to the problem is to use a plug-in media processing accelerator to offload both audio and video processing from the server host.

Figure 2: A conventional host media processing (HMP) solution can compromise server performance

This keeps the function internal to the network element and avoids the loss of central processing resources that would otherwise be required to run a fully software-based solution. Ideally, the accelerator would also accommodate new voice and video compression schemes as they emerge. In this case, a plug-in media processing accelerator offers a true upgrade path.

DSP Offload Card
It is now possible to deploy PCI Express media processing boards that offer high-performance voice and video transcoding based on digital signal processing (DSP) technology. Some boards even offer voice and video processing firmware optimized for their DSP array. Application developers can interact with these boards via a simple object-oriented application programming interface (API). The transcoding performance scales linearly with the number of DSPs fitted; options from 4 to 12 DSPs are available. But even with 4 DSPs and consuming less than 25W of power, cards are available that deliver voice transcoding performance comparable to that of a typical server consuming 300W or more.
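As an illustration only, a control-plane interaction with such a card might look like the following sketch. The `DspCard` and `Channel` classes, their methods, and the per-DSP channel count are all invented for this example; they do not represent any particular vendor’s API.

```python
# Hypothetical sketch of an object-oriented control API for a DSP
# accelerator card. All names and figures here are illustrative only.

class Channel:
    """One transcoding channel mapped onto the DSP array (hypothetical)."""
    def __init__(self, src_codec, dst_codec):
        self.src_codec = src_codec
        self.dst_codec = dst_codec
        self.active = False

    def start(self):
        # A real API would program the DSP firmware here.
        self.active = True

    def stop(self):
        self.active = False


class DspCard:
    """Hypothetical accelerator card exposing a fixed channel budget."""
    def __init__(self, num_dsps, channels_per_dsp):
        # Capacity scales linearly with the number of DSPs fitted.
        self.capacity = num_dsps * channels_per_dsp
        self.channels = []

    def open_channel(self, src_codec, dst_codec):
        if len(self.channels) >= self.capacity:
            raise RuntimeError("card fully loaded")
        ch = Channel(src_codec, dst_codec)
        self.channels.append(ch)
        return ch


# Usage: a 12-DSP card; 625 channels per DSP is an assumed figure chosen
# to match the ~7500-stream capacity quoted later in this article.
card = DspCard(num_dsps=12, channels_per_dsp=625)
ch = card.open_channel("G.711", "G.729AB")
ch.start()
```

The point of the object model is that the application allocates and controls channels; the card, not the host CPU, does the signal processing.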

An Example Application
An example may help illustrate the value of using acceleration. Consider a packet processing application that, in a server based on dual Intel® Xeon® processors, can support 4000 concurrent sessions or streams. The market now demands the addition of voice transcoding capability.

As outlined above, one option is to use a commercial host media processing solution. This requires approximately 50% of a dual Intel Xeon server’s capacity for 2000 transcoded streams. As a consequence, adding this capability halves the processing power available to the original application, and the resulting solution is now only a 2000-stream device. To get back to the 4000-stream capacity, a customer must buy two units, doubling power consumption and rack space.
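The fleet-level arithmetic behind this comparison can be sketched in a few lines; the figures are simply those quoted above, not measurements.

```python
# Back-of-envelope sizing using the figures quoted in this article.

SESSIONS_PER_SERVER = 4000   # packet-processing capacity, dual Xeon server
HMP_TRANSCODE_SHARE = 0.5    # HMP uses ~50% of the server for 2000 streams

# Option 1: host media processing on the same server.
# Each server now handles only half its original session load.
hmp_sessions_per_server = SESSIONS_PER_SERVER * (1 - HMP_TRANSCODE_SHARE)
hmp_servers_needed = SESSIONS_PER_SERVER / hmp_sessions_per_server

# Option 2: add an accelerator card; the host keeps its full capacity.
accel_sessions_per_server = SESSIONS_PER_SERVER
accel_servers_needed = SESSIONS_PER_SERVER / accel_sessions_per_server

print(hmp_servers_needed)    # 2.0 -> two servers to restore 4000 sessions
print(accel_servers_needed)  # 1.0 -> one server plus one card
```

The doubling of rack space and power in the HMP case falls straight out of the first ratio.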

The alternative is to add a PCI Express media accelerator card. This takes care of the processing-intensive workload, thus maintaining the original performance. In fact, compared to a host media processing solution that is limited to approximately 2000 sessions per server, a single PCI Express media accelerator card may be capable of transcoding over 7500 bidirectional voice streams or over 300 mobile video streams in hardware, and multiple boards can be fitted to a single server.

Figure 3: Emerson’s PCIE-8120 is an example of a PCI Express media processing accelerator card

Voice Capability
When considering the PCI Express accelerator card route, design engineers should ensure their shortlisted solutions support the following 3GPP, ITU-T, IETF and other voice codecs:

  • Uncompressed telephony: G.711 μ-law/A-law with Appendices I and II
  • Narrowband compression: G.729AB, G.729.1, G.723.1A, G.726, G.727
  • Wideband compression: G.722, G.722.1
  • Wireless network: GSM EFR, AMR and AMR-Wideband; EVRC and EVRC-B
  • Internet voice: iLBC, SILK (Skype), Opus [roadmap]

In addition, each voice channel should support echo cancellation, announcements, conferencing, mixing, and a full range of tone detection and relay functions.

Video Capability
HD (or other) video streams can be redirected within an appliance to a PCI Express accelerator card, where transcoding and conferencing take place without using any of the host’s processing resources. For example, some PCI Express media accelerator cards can handle up to six 4-party video conference bridges in which each participant uses H.264 720p at 30fps. There are also cards that can handle resizing to and from 1080p.
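To put those conferencing figures in perspective, a rough pixel-rate calculation gives a sense of the workload being offloaded. This uses only standard 720p geometry and the bridge counts quoted above, not vendor performance data.

```python
# Rough aggregate pixel-rate arithmetic for the conferencing example:
# six 4-party bridges, each participant sending H.264 720p at 30 fps.

width, height, fps = 1280, 720, 30
bridges, parties = 6, 4

streams = bridges * parties                 # 24 participant streams
pixels_per_frame = width * height           # 921,600 pixels per frame
pixels_per_stream = pixels_per_frame * fps  # ~27.6 Mpixel/s per stream
total_pixels = pixels_per_stream * streams  # aggregate decode-side load

print(streams)       # 24
print(total_pixels)  # 663,552,000 pixels per second
```

Sustaining roughly two-thirds of a billion pixels per second is the kind of load that would otherwise fall on the host CPUs.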

Design engineers should ensure the solution they choose supports the most common video compression schemes used in communications, such as H.263 (legacy) and MPEG-4 at CIF resolution, and H.264 at resolutions up to 1080p, and is easily upgradeable as newer compression schemes emerge.

Many rack-mount servers are available in fully NEBS-compliant, hardened versions, so the accelerator card should be designed for both NEBS carrier-grade and data center environments, offering a common solution for enterprise and telecom deployments.


A Better Solution
High density voice and video processing is increasingly in demand for applications such as session border controllers, media gateways/servers or media resource functions, video or content optimization, video communications servers, and interactive voice and video response systems. We can see that using a PCI Express media processing accelerator card rather than additional servers has a lot of benefits:

  • It takes up less space
  • It consumes much less power
  • It can easily be retro-fitted to existing deployed systems as a true feature addition
  • It costs less than a comparable server + commercial host media processing combination for the same performance

Consequently, it offers a lower total cost of ownership and a much simpler upgrade and deployment experience. In the words of Spinal Tap’s Nigel, “They go to eleven. They’re one better.”


Brian Carr is strategic marketing manager for the Embedded Computing business of Emerson Network Power, with a particular focus on communications markets and applications including wireless, wireline and service delivery. A widely published author and accomplished speaker on AdvancedTCA technology and applications, Carr has also represented Emerson on conference advisory boards and industry consortia. He holds master’s degrees in engineering from Cambridge University and in information technology from Essex University.
