Pumping CD-quality stereo audio over digital networks in real-time

Digital audio has become synonymous with high-fidelity sound, but storing and “streaming” CD-quality stereo audio can be resource-hungry in terms of processing power, programming effort, and energy consumption. In telecommunications, much research has been conducted over many years to develop signal processing systems capable of transmitting speech at lower bit rates than PCM. Here we present a predictive coding algorithm that can be implemented in commodity FPGA devices and allows CD-quality audio to be transported in real time but at lower bit rates over packet-based digital networks, from Bluetooth wireless links to Internet Protocol networks.

The promotion of "digital lifestyles" over recent years has significantly raised consumer expectations in relation to video definition and audio fidelity. The latter challenge has been met with compact disc recordings, immersive surround sound for home theater, and, more recently, satellite radio. But the inexpensive delivery of multichannel digital audio with PCM quality in real time over bandwidth-limited channels still poses a serious challenge to engineers. The telecommunications industry has spent many years researching and developing audio compression schemes capable of transporting intelligible human speech at considerably lower bit rates than PCM. Likewise, the consumer electronics industry has adopted both sophisticated compression standards put forward by international working groups and proprietary techniques developed and marketed by commercial organizations.

Sidebar 1

CD and other high-end digital audio systems typically use 16-bit linear PCM. As its name implies, Adaptive Differential Pulse-Code Modulation (ADPCM) is a technique that re-codes the difference between actual and predicted audio samples, using quantization step-sizes that adapt to the magnitude of the prediction error. In this way ADPCM can provide a similar audio quality to linear PCM but at a much-reduced bit rate. The apt-X system (see sidebar) aims to transparently code 24-bit PCM audio with a fixed compression ratio of 4:1 and is based on an implementation of sub-band ADPCM.
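The ADPCM principle described above can be illustrated with a minimal single-band sketch. This is not the apt-X algorithm: the predictor here is simply the previous reconstructed sample, and the 4-bit code width, the initial step size, and the step adaptation factors (1.5 and 0.9) are illustrative assumptions rather than values from any codec specification. The function names are hypothetical.

```python
def _update(predicted, step, code, max_code):
    """Shared encoder/decoder state update (illustrative values only)."""
    reconstructed = predicted + code * step
    # Adapt the step size to the magnitude of the quantized prediction error.
    if abs(code) >= max_code - 1:
        step *= 1.5          # large errors: widen the quantizer
    elif abs(code) <= 1:
        step *= 0.9          # small errors: narrow the quantizer
    step = min(max(step, 1.0), 4096.0)
    return reconstructed, step


def adpcm_encode(samples, bits=4, step=16.0):
    """Code each sample as a small signed integer (toy ADPCM)."""
    max_code = (1 << (bits - 1)) - 1
    min_code = -(1 << (bits - 1))
    predicted = 0.0
    codes = []
    for x in samples:
        error = x - predicted                       # prediction error
        code = max(min_code, min(max_code, round(error / step)))
        codes.append(code)
        # Track the decoder's reconstruction, not the true input, so that
        # both sides adapt from the same information (backward adaptation).
        predicted, step = _update(predicted, step, code, max_code)
    return codes


def adpcm_decode(codes, bits=4, step=16.0):
    """Rebuild samples from the codes using the same update rule."""
    max_code = (1 << (bits - 1)) - 1
    predicted = 0.0
    out = []
    for code in codes:
        predicted, step = _update(predicted, step, code, max_code)
        out.append(predicted)
    return out
```

Because the encoder updates its predictor and step size only from the transmitted codes, the decoder can follow the same adaptation without any side information, which is what keeps the bit rate low.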

To better understand how this predictive type of audio codec can be implemented as an IP core in FPGA (for example, the Altera Cyclone II and the Xilinx Spartan-3E) or SoC/ASIC using industry-standard design tools and methodologies, let’s examine the encoding and decoding processes.

Encode side

The audio signal must first be converted to a 24-bit/sample PCM digital signal. This signal is then presented to the encoding algorithm (as seen in Figure 1), where successive time blocks of four PCM samples are first filtered into four equal-bandwidth sub-bands. These signals, still in 24-bit format, are then simultaneously processed in four separate signal chains, each incorporating a backward linear prediction loop that provides an estimate of the input signal. The prediction, based on the history of previous PCM samples, is subtracted from the input to yield a difference signal that is commonly termed the error signal. It is this 24-bit error signal that is then quantized for each sub-band. The quantized values are packed, and embedded non-audio data is added to allow decoder synchronization via the Autosync feature. This data packing results in a 24-bit codeword that represents the content of the original 4 x 24-bit linear PCM samples per channel (96 bits). The coded stream therefore runs at one quarter of the original PCM data rate, a rate reduction of 4:1.
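The 4:1 figure follows directly from the block arithmetic: four 24-bit samples go in and one 24-bit codeword comes out. The short calculation below works this through for a stereo stream. The 48 kHz sample rate is an assumption chosen for illustration (the text does not state one), and the function names are hypothetical.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample=24, channels=2):
    """Bit rate of uncompressed linear PCM in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def coded_bit_rate(sample_rate_hz, channels=2,
                   samples_per_block=4, codeword_bits=24):
    """Bit rate after coding each block of samples into one codeword."""
    blocks_per_second = sample_rate_hz / samples_per_block
    return blocks_per_second * codeword_bits * channels


if __name__ == "__main__":
    rate = 48_000  # assumed sample rate, not taken from the article
    pcm = pcm_bit_rate(rate)
    coded = coded_bit_rate(rate)
    print(f"PCM:   {pcm / 1000:.0f} kbit/s")
    print(f"Coded: {coded / 1000:.0f} kbit/s")
    print(f"Ratio: {pcm / coded:.0f}:1")
```

Under that assumption, 24-bit stereo PCM needs 2304 kbit/s and the coded stream needs 576 kbit/s. The ratio stays at 4:1 for any sample rate, since it depends only on the block size and codeword width.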

Figure 1

Decode side

The input to the decoder is a 24-bit word. Because of the embedded Autosync information added during encoding, the decoder can detect the boundaries of the 24-bit words even after transmission across networks that are inherently unframed. Once decoder synchronization has been achieved, each input word is demultiplexed into the four low-bit-resolution codewords that are fed into four separate sub-band inverse quantizer chains. The output from each inverse quantizer is then combined with the output of an adaptive predictor of similar structure to the encoder predictor. This generates four reconstructed 24-bit bandwidth-limited samples, one per sub-band, and the four sub-bands are inverse filtered to recreate four 24-bit PCM samples. The decoder output stream has the same data rate as the original PCM signal at the input to the encoder. If required, these samples can be converted to an analog audio signal using an external converter.
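The multiplexing of four sub-band codes into one 24-bit word, and the matching demultiplexing in the decoder, can be sketched as simple bit-field packing. The per-sub-band bit widths used here (10, 6, 4, 4) are a hypothetical allocation chosen only because they sum to 24; the article does not give the real allocation. How the Autosync data is embedded is also not described, so it is not modeled.

```python
# Hypothetical bit allocation per sub-band (lowest band first); sums to 24.
SUBBAND_BITS = (10, 6, 4, 4)


def pack_codeword(codes, widths=SUBBAND_BITS):
    """Pack one unsigned code per sub-band into a single integer word."""
    if len(codes) != len(widths):
        raise ValueError("need exactly one code per sub-band")
    word = 0
    for code, width in zip(codes, widths):
        if not 0 <= code < (1 << width):
            raise ValueError(f"code {code} does not fit in {width} bits")
        word = (word << width) | code
    return word


def unpack_codeword(word, widths=SUBBAND_BITS):
    """Split a packed word back into its per-sub-band codes."""
    codes = []
    for width in reversed(widths):
        codes.append(word & ((1 << width) - 1))  # take the low bits
        word >>= width
    return tuple(reversed(codes))


if __name__ == "__main__":
    codes = (0x3FF, 0x2A, 0x5, 0xC)
    word = pack_codeword(codes)
    print(f"{word:06x}", unpack_codeword(word) == codes)
```

Giving more bits to the lowest sub-band reflects a common design choice in sub-band coders, where most signal energy sits at low frequencies, but the exact split is a codec-specific decision.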


The problem at hand is to identify the most elegant means of encoding audio for transport over a bandwidth-compromised channel. An ADPCM-based solution has been selected for its inherent benefits: low latency (no sample buffering), resilience in the presence of noise and drop-outs, and manageable complexity. What remains is to determine the most appropriate means of implementation. Here, no DSP versus FPGA conundrum exists, as both device architectures are suitably evolved to cope with the modest demands that enhanced ADPCM algorithms place on digital resources (compared to computation- and memory-intensive "psychoacoustic" audio codecs such as MP3, AAC, and their myriad variants).

The physical implementation of signal processing functions and algorithms should be a straightforward process. The choice of target device (DSP, FPGA, ASIC, or SoC) in which to integrate the IP core is a decision left open to the system-level design engineer. Various factors influence this choice: bill of materials cost, upgrade flexibility, and power consumption, to name a few. Audio codecs for design-in are typically available in a variety of delivery formats, from C code for a RISC processor to a gate-level netlist for a Field Programmable Gate Array device.

Audio codec in FPGA

Tables 1 and 2 show the key parameters from FPGA integration of the audio codec IP core under discussion. This implementation supports two channels of encode and two channels of decode concurrently. Hence, the core can function as a full-duplex stereo codec. These figures are indicative: they vary depending on the enforced design constraints and the setup of the logic synthesis tool. (This data was obtained using the Quartus II 7.1 SP1 synthesizer and Xilinx ISE 9.2i SP1.)

Table 1

Table 2

Audio codec in SoC/ASIC

Implementation of the full-duplex stereo apt-X codec described in the previous section requires approximately 200K ASIC gates. The core architecture for ASIC is particularly efficient if a large number of audio channels is required, as additional channels can be supported by scaling the core clock frequency and adding a relatively small amount of additional on-chip memory. Lower power operation is made possible by use of specific power-saving ASIC design techniques.

Typically, the power consumption of the core on ASIC is 25 percent to 40 percent of the power consumption on FPGA, assuming comparable technology nodes. Evidently, structured ASIC technology offers an attractive trade-off between price, performance, and power consumption.

Audio codec in DSP-RISC

The apt-X algorithm can be readily implemented as software on a variety of DSP and RISC processors. On modern DSPs, apt-X typically uses 8 KB of program storage, a fixed data memory block of 7 KB, and an incremental data memory block of 0.5 KB per audio channel supported. Each channel of apt-X encoding or decoding requires about 17 MIPS.

RISC processors can offer a lower-cost solution, but their internal architecture is less well suited to efficient implementation of the DSP algorithms within apt-X. The processing requirement therefore rises to 20-27 MIPS per audio channel, depending on the specific processor used.
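As a rough planning aid, the figures quoted above can be turned into a resource estimate for a given channel count. The sketch below is only arithmetic on those published numbers. It assumes that each encode or decode channel counts as one "channel" for both the MIPS and the incremental memory figures, which the text does not state explicitly, and the function name is hypothetical.

```python
def software_budget(channels, mips_per_channel=17.0):
    """Estimate memory (KB) and MIPS for a software apt-X build.

    Uses the figures quoted in the text: 8 KB program storage,
    7 KB fixed data memory, 0.5 KB of data memory per channel,
    and a per-channel MIPS load (17 on a DSP, 20-27 on a RISC).
    """
    program_kb = 8.0
    data_kb = 7.0 + 0.5 * channels
    mips = mips_per_channel * channels
    return program_kb, data_kb, mips


if __name__ == "__main__":
    # Full-duplex stereo: two encode channels plus two decode channels.
    channels = 4
    print("DSP: ", software_budget(channels))
    print("RISC:", software_budget(channels, 20.0)[2], "to",
          software_budget(channels, 27.0)[2], "MIPS")
```

Under these assumptions, a full-duplex stereo codec needs about 68 MIPS and 9 KB of data memory on a DSP, and between 80 and 108 MIPS on a RISC processor.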

Product application

Audio codec IP cores are integrated in digital communications applications from professional broadcast equipment (Figure 2) for Audio-over-IP contribution between studios, to microphones, active speakers, and Bluetooth A2DP Stereo headsets.

Figure 2

David Trainor is Engineering Manager within APT’s R&D licensing group and is responsible for new audio coding algorithms for hardware and software platforms. He designed the ADEPT core, a synthesizable platform for licensable audio codec IP. Previously, David led a team designing security systems for pay-TV at Latens Systems, and spent six years with Amphion Semiconductor (now NXP) designing standards-based IP cores for wireless communications. David earned his doctorate and master’s in EE from the Queen’s University of Belfast, Northern Ireland.

+44 (0)28 9067-7200