High Density IP Voice Conferencing with Mixed Narrowband and Wideband Channels
The leap from traditional voice conferencing to IP-enabled voice conferencing brings
with it a number of technical challenges that need addressing. We’ll start by describing
the state of the art in traditional conferencing algorithms. Next, we’ll discuss the
challenges that must be overcome in taking the leap to IP conferencing. Then, we’ll piece
together an IP conferencing system on a DSP chip. We’ll conclude by listing the
capabilities of a few Texas Instruments DSP chips in terms of conferencing channels and
features that each can support.
Traditional High-Density Conferencing
A conferencing algorithm combines the active conference input signals together to form
a composite signal. Before sending the composite signal back to each conference party,
that party’s transmission is removed from the composite
signal to avoid the perception of echo.
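The summation and removal step described above can be sketched as follows. This is a minimal illustration, not a production mixer: the function names `mix_minus` and `saturate` are hypothetical, samples are assumed to be 16-bit linear PCM, and every party is assumed to deliver a frame of the same length at the same sampling rate.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def saturate(sample):
    """Clamp a sample to the signed 16-bit range instead of letting it wrap."""
    return max(INT16_MIN, min(INT16_MAX, sample))


def mix_minus(frames):
    """Return one output frame per party: the composite minus that party's own input.

    frames: list of equal-length lists of 16-bit PCM samples, one list per party.
    """
    frame_len = len(frames[0])
    # Composite signal: sample-by-sample sum of all inputs.
    # Python integers do not overflow, so the sum itself is exact.
    composite = [sum(frame[i] for frame in frames) for i in range(frame_len)]
    # Each party receives the composite with its own contribution removed,
    # clamped to 16 bits (an assumption about the output format).
    return [
        [saturate(composite[i] - frame[i]) for i in range(frame_len)]
        for frame in frames
    ]


outputs = mix_minus([[100, -200], [300, 400], [-50, 20000]])
```

Forming the composite once and subtracting each party's input costs one sum plus one subtraction per party, rather than a separate sum for every listener.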
As the number of conference participants increases, we run into a few more issues. For
example, each participant presumably has some level of background noise. The noise level
may be low, as in an office environment; high, as is the case for a person on a cell phone
while driving; or anywhere in between. If we added all the input signals blindly, the noise
would accumulate as the number of participants increased. Furthermore, when using fixed-point
arithmetic, the summation of many signals, including speech signals, can cause overflow or
clipping, a very undesirable condition.
A conferencing algorithm can use several techniques to combat these issues. For
example, only a few “dominant” speakers’ signals are added to the conference at any given
time. This reduces the number of signals being added. Furthermore, noise suppression can
be employed on all input channels. So when there is significant background noise that
would otherwise bleed into the conference sum, the noise suppressor reduces the extent of
such noise. Automatic Level Control can be employed to combat overflow and clipping as
well as to compensate for different amounts of loss seen in the party’s input signal.
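One way to picture the "dominant speaker" technique is the sketch below. It ranks parties by the energy of the current frame and sums only the loudest few. The names `frame_energy`, `select_dominant`, and `mix_dominant`, and the default of three speakers, are illustrative assumptions. A real conferencing algorithm would more likely rely on voice activity detection and level estimates smoothed over time rather than a single-frame energy ranking.

```python
def frame_energy(frame):
    """Sum of squared samples, used here as a crude loudness measure."""
    return sum(s * s for s in frame)


def select_dominant(frames, max_speakers=3):
    """Return the indices of the loudest parties, at most max_speakers of them."""
    ranked = sorted(
        range(len(frames)),
        key=lambda i: frame_energy(frames[i]),
        reverse=True,
    )
    return sorted(ranked[:max_speakers])


def mix_dominant(frames, max_speakers=3):
    """Sum only the dominant speakers; a party never hears its own input."""
    dominant = select_dominant(frames, max_speakers)
    frame_len = len(frames[0])
    outputs = []
    for party in range(len(frames)):
        contributors = [i for i in dominant if i != party]
        outputs.append(
            [sum(frames[i][n] for i in contributors) for n in range(frame_len)]
        )
    return outputs


frames = [[1000, -1000], [10, 10], [500, 500], [5, -5]]
loudest = select_dominant(frames, max_speakers=2)   # parties 0 and 2
```

Because at most `max_speakers` signals are ever added, the worst-case sum is bounded no matter how many parties join, which limits both noise buildup and the risk of overflow.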
IP Conferencing – Challenges
A traditional voice conferencing system bridges together multiple traditional telephone
channels which occupy 3.3 kHz of audio bandwidth, and are sampled at 8 kHz. VoIP channels
can be either narrowband or wideband. Wideband channels have an audio bandwidth of 7 kHz
and a sampling rate of 16 kHz. Both narrowband and wideband channels can be sent either in
uncompressed or compressed format over a VoIP link.

Challenge #1: Compression
Table 1 compares some of the more commonly used narrowband and wideband compression
standards. It also lists the processor utilization of each algorithm on a Texas
Instruments C64X DSP.
Challenge #2: Adding Wideband Channels
In describing state-of-the-art conferencing algorithms, we mentioned a number of signal
processing algorithms that are used. We can perform some of these algorithms, such as
noise reduction and voice activity detection, on the individual channels at their native
sampling rates. But before we sum a mixture of 8 kHz and 16 kHz sampled data streams, we
need to convert them to a common sampling rate. The two logical choices are 8 kHz and
16 kHz. If we use 8 kHz, the 16 kHz channels must be passed through a 2:1 decimation
filter. If we choose the 16 kHz sampling rate, the 8 kHz channels must be passed through a
1:2 interpolation filter.
The advantage of using an 8 kHz common sampling rate is the reduction in the signal
processing load because portions of the conferencing algorithm will operate on half the
number of samples. The disadvantage is that we will lose the audio quality benefit that is
afforded by wideband audio channels. The pros and cons are reversed when using a 16 kHz
common sampling rate.
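The two conversions can be sketched in a few lines. The example below is a simplified illustration under stated assumptions: it uses a 31-tap Hamming-windowed sinc low-pass filter with its cutoff at 4 kHz (one quarter of the 16 kHz rate) and floating-point arithmetic. The filter length, the window, and all function names are illustrative choices, not the filters used in any particular product. A DSP implementation would typically use fixed-point polyphase filters to avoid computing samples that are discarded or known to be zero.

```python
import math


def halfband_lowpass(num_taps=31):
    """Hamming-windowed sinc low-pass with cutoff at one quarter of the sampling rate."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 0.5 if x == 0 else math.sin(math.pi * x / 2) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalize to unity gain at DC


def fir_filter(signal, taps):
    """Direct-form FIR filter; samples before the start of the signal count as zero."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * signal[n - k]
        out.append(acc)
    return out


def decimate_2to1(signal_16k, taps):
    """16 kHz -> 8 kHz: low-pass below 4 kHz, then keep every second sample."""
    return fir_filter(signal_16k, taps)[::2]


def interpolate_1to2(signal_8k, taps):
    """8 kHz -> 16 kHz: insert a zero after each sample, then low-pass the result."""
    stuffed = []
    for s in signal_8k:
        stuffed.extend((s, 0.0))
    # Zero stuffing halves the signal level, so the filter output is scaled by 2.
    return [2.0 * y for y in fir_filter(stuffed, taps)]


taps = halfband_lowpass()
tone_6k = [1000 * math.sin(2 * math.pi * 6000 * n / 16000) for n in range(400)]
narrowband = decimate_2to1(tone_6k, taps)  # the 6 kHz tone is filtered out
```

The 6 kHz tone in the usage lines lies above the 4 kHz limit of an 8 kHz stream, so the decimation filter removes it. This is the wideband content that is lost when 8 kHz is chosen as the common rate.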
Challenge #3: VoIP Packet Loss
IP networks are not designed to carry real-time traffic. The end-to-end delay across a
VoIP network varies from one packet to the next, and sometimes packets arrive too late to
be decoded in real-time. Late packets are no better than lost packets when dealing with
VoIP. Furthermore, packets arriving out of order must be resequenced.
Techniques to deal with these issues already exist in VoIP systems. A jitter buffer
compensates for variations in packet delay. RTP sequence numbers allow out-of-sequence
packets to be reordered. Packet loss concealment attempts to smooth over lost packets by
looking at the recent signal history and filling in the missing pieces.
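As a rough illustration of resequencing and late-packet handling, the sketch below keeps packets in a heap ordered by sequence number and releases one per playout tick. The class name `JitterBuffer` and its methods are hypothetical. Sequence-number wraparound, timestamps, and adaptive buffer depth are deliberately left out.

```python
import heapq


class JitterBuffer:
    """Toy resequencing buffer keyed on a packet sequence number."""

    def __init__(self, first_seq=0):
        self.next_seq = first_seq   # sequence number due at the next playout tick
        self.heap = []              # min-heap of (seq, payload)
        self.late = 0               # packets that arrived after their playout slot

    def push(self, seq, payload):
        if seq < self.next_seq:
            self.late += 1          # too late to play: no better than a lost packet
            return
        heapq.heappush(self.heap, (seq, payload))

    def pop(self):
        """Return the payload due now, or None so the caller can conceal the loss."""
        # Drop stale entries such as duplicates of packets already played.
        while self.heap and self.heap[0][0] < self.next_seq:
            heapq.heappop(self.heap)
        payload = None
        if self.heap and self.heap[0][0] == self.next_seq:
            payload = heapq.heappop(self.heap)[1]
        self.next_seq += 1
        return payload


jb = JitterBuffer()
for seq, payload in [(1, "b"), (0, "a"), (3, "d")]:   # arrival order differs from send order
    jb.push(seq, payload)
played = [jb.pop(), jb.pop(), jb.pop()]               # third slot is empty: packet 2 is missing
```

A `None` result is the point at which packet loss concealment would be invoked for that channel.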
The reason we mention these issues is that in high-density conferencing systems, these
effects are magnified. For example, if we have a 100-channel conference call and each
channel has an average packet loss rate of 1%, it is likely that in any given frame, one
or more channels will experience a lost packet! Since all channels except the offending
channel will hear the effects, it is almost like having a 100% packet loss rate on a
single channel.
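The arithmetic behind this example is easy to check. The sketch below assumes that losses are independent from channel to channel and that every channel contributes one packet per frame. Both are simplifying assumptions, and the function name `prob_any_loss` is illustrative.

```python
def prob_any_loss(num_channels, loss_rate):
    """Probability that at least one channel loses its packet in a given frame."""
    return 1.0 - (1.0 - loss_rate) ** num_channels


p_conference = prob_any_loss(100, 0.01)   # about 0.634
p_single = prob_any_loss(1, 0.01)         # 0.01 for a single channel
```

Under these assumptions, roughly 63% of frames in the 100-channel case contain at least one lost packet, compared with 1% for a single channel. If only the dominant speakers are summed, a loss is audible only when it falls on one of those speakers, so the audible impact depends on how many channels are actually mixed.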
Challenge #4: VoIP Echo Cancellation
Continuing the same line of reasoning with respect to echo cancellation, the situation
gets worse. When echo exists on a VoIP channel, it is worse than on a TDM channel because
the round-trip delay is longer due to the latency through the IP network. People are more
sensitive to echo as the round-trip delay increases.
Using the previous example, where there are 100 parties in a VoIP conference call, assume
that one party has uncancelled echo on his/her line. When any of the other 99 people
speaks, all 100 parties will hear an echo. In a typical two-party call, the solution is to
hang up and dial again. In a 100-party conference call, you first need to identify the
offending party by determining whose speech does not cause echo. That person must then
hang up and dial back in, hopefully this time on an echo-free circuit.
Running into this problem is far more likely in a 100-party conference call than in a
two-party call. It is therefore much more important to ensure that a VoIP-enabled echo
canceller is used in an IP conferencing system.

Piecing Together a VoIP Conferencing System on a DSP Chip
Figure 1 is a block diagram of a mixed narrowband/wideband VoIP conference system on a
chip. The packet interface block handles RTP, jitter
buffering, and both narrowband and wideband speech encode and decode functions. The packet
echo canceller cancels echo that may be present on the opposite side of the packet network
for narrowband channels only. It is assumed that wideband channels use four-wire
interfaces at the far-end and therefore have no hybrid echo.
Using a similar argument, there is a line echo canceller connected to the narrowband
(8 kHz) TDM interface, but not to the wideband (16 kHz) TDM interface.
The sampling rate converters perform sampling rate conversion with appropriate
filtering on narrowband and wideband signals which include both PCM and packet channels.
Note that it might not be necessary
to perform both conversions because the conferencing algorithm can run at either sampling
rate. But if tone detection/generation
is to be performed, it tends to be done at the narrowband sampling rate so additional rate
conversion may still be necessary.
The elastic store is a buffering
mechanism that compensates for possible varying frame sizes in the packet channels.
Finally, the conference module performs the actual conferencing, including voice
activity detection, noise suppression, dominant speaker identification, AGC, and
conference summation.
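The elastic store described above can be pictured as a FIFO that accepts frames of whatever size the decoders produce and hands fixed-size frames to the conference module. The sketch below is a minimal illustration. The class name `ElasticStore` and the 80-sample output frame (10 ms at 8 kHz) are assumptions made for the example, not parameters taken from the system in Figure 1.

```python
from collections import deque


class ElasticStore:
    """FIFO that accepts frames of any length and releases fixed-length frames."""

    def __init__(self, out_frame_len):
        self.out_frame_len = out_frame_len
        self.samples = deque()

    def write(self, frame):
        """Append one decoded frame of arbitrary length."""
        self.samples.extend(frame)

    def read(self):
        """Return one output frame, or None if too few samples are buffered."""
        if len(self.samples) < self.out_frame_len:
            return None
        return [self.samples.popleft() for _ in range(self.out_frame_len)]


store = ElasticStore(out_frame_len=80)
store.write(list(range(240)))       # e.g. one 30 ms packet at 8 kHz
first = store.read()                # samples 0..79
second = store.read()               # samples 80..159
```

One store per packet channel lets the conference module run on a single frame size even when the channels use codecs with different frame durations.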
Conferencing Channel Densities
Each conferencing application
may have unique requirements, making a programmable DSP an ideal chip on which to host the
conferencing solution. Some applications
may not need tone processing; others may use external echo cancellers. By using a
programmable DSP, features can be added or removed, so we use only as much of
the DSP’s resources (MIPS and memory) as the required feature set needs. This
allows us to use the appropriate (smallest, least expensive, lowest power, etc.) DSP.
Stated differently, by removing the unneeded features, we can squeeze more conferences and
more ports into a single DSP.

Table 2 lists a few sample conference configurations
along with the achievable port density when running on different TI DSPs.
Summary
Achieving good voice quality in large conference calls is a challenge even when
the problem is confined to the traditional narrowband, TDM-based world. When VoIP channels
and wideband channels are added to the mix, it is imperative that the right algorithms are
used and that they are integrated properly. Doing so efficiently on a programmable
DSP is not only cost-effective, but can also get your product to market in a timely
manner.

Contact Information

Adaptive Digital Technologies, Inc.
525 Plymouth MeetingSuite 316
Plymouth Meeting, PA, 19462
USA
tele: 610.825.0182 x120
fax: 610.825.7616
sales@adaptivedigital.com
www.adaptivedigital.com













