Selecting the right peripheral for DSP applications

Embedded systems interact with the environment and other components within a larger system. Much of the bottleneck in meeting real-time demands comes from delays in getting data in and out of the system, rather than processor speed or software implementation. Knowing how to select the right mix of peripherals for embedded applications and how to utilize these peripherals to optimize system performance can alleviate this bottleneck.

Choosing peripherals

DSP developers must understand when and where peripheral trade-offs can be advantageous for a given application. Selecting a device is not just an exercise in choosing the processor with the most MIPS or MACS. Inefficient or poorly chosen peripherals can significantly degrade system performance.

Embedded processors vary in how closely they are customized for the problem at hand. They range from general purpose processors that handle a wide variety of applications, to application-specific processors such as DSPs, which target a particular application class such as signal processing, to single purpose processors, which are customized for one very specific function.

A single purpose processor is a digital circuit designed and implemented to execute a very precise program. In a digital camera, for example, a single purpose processor is often used to implement a JPEG codec, which can then be used to perform compression and decompression on video frames.

Many so-called single purpose processors are predesigned and integrated onto a more complicated System on a Chip (SoC). Other common names for these single purpose processors are accelerators or peripherals [1].

There are benefits, as well as drawbacks, to using peripherals in embedded systems processors. On the one hand, performance is high, power consumption for the functions executing on the peripheral is minimized, and the size required per function is reduced. On the other hand, a custom peripheral implementation is less flexible across applications, and unit cost may be higher due to the increased customization.

Modern DSPs often have several peripherals integrated on chip. These include serial ports, UARTs, USB ports, and video ports. Some of these peripherals require significant software support to operate efficiently, so engineers should consider whether the vendor supplies device drivers for them, since ready-made drivers shorten overall system development time. Because DSPs are designed to support hard real-time processing, with streaming data samples as the critical path for many applications, these processors are optimized to move data quickly and efficiently from peripherals to the DSP core for processing and then back out to the environment [2].

To meet these demanding I/O requirements, DSPs have dedicated serial ports that connect to customized peripherals. The interaction between the peripheral and the serial port is synchronous and controlled mostly in hardware.

DSPs also provide one or more Direct Memory Access (DMA) controllers to reduce the overhead of the many interrupts that would otherwise be generated over the serial port. Intelligent use of the DMA under software control allows multiple samples to be buffered and then transferred to on-chip memory without involving the DSP core. The DSP is interrupted when the buffer is full of data, rather than on every data sample. Because of these efficiency gains, most DSP peripherals use DMA to move data on and off chip (Figure 1).
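The effect of block buffering on interrupt load can be illustrated with a small software model. The sketch below is only a simulation, not vendor driver code: the block size of 64 samples, the ping-pong buffer arrangement, and all names (`dma_model_t`, `dma_model_push`, `simulate_stream`) are assumptions made for illustration. It counts how many times the core would be interrupted when the DMA signals once per full buffer instead of once per sample.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 64u /* samples per DMA block (assumed value) */

/* Hypothetical software model of one DMA channel with ping-pong buffers. */
typedef struct {
    int16_t       buf[2][BLOCK_SIZE]; /* ping and pong buffers in on-chip memory */
    unsigned      active;             /* buffer the DMA is currently filling */
    size_t        fill;               /* samples already in the active buffer */
    unsigned long interrupts;         /* block-complete interrupts raised so far */
    long          sum;                /* stand-in for real signal processing */
} dma_model_t;

/* Stand-in for the block-complete ISR: the core handles a whole block at once. */
void on_block_complete(dma_model_t *d, const int16_t *block)
{
    d->interrupts++;
    for (size_t i = 0; i < BLOCK_SIZE; i++) {
        d->sum += block[i];
    }
}

/* Stand-in for the DMA hardware storing one serial-port sample in memory. */
void dma_model_push(dma_model_t *d, int16_t sample)
{
    assert(d != NULL && d->fill < BLOCK_SIZE);
    d->buf[d->active][d->fill++] = sample;
    if (d->fill == BLOCK_SIZE) {
        unsigned full = d->active;
        d->active ^= 1u; /* DMA switches to the other buffer */
        d->fill = 0;
        on_block_complete(d, d->buf[full]);
    }
}

typedef struct {
    unsigned long interrupts; /* interrupts the core had to service */
    long          sum;        /* result of the stand-in processing */
} stream_stats_t;

/* Feed n_samples synthetic samples (values 0..7 repeating) through the model. */
stream_stats_t simulate_stream(size_t n_samples)
{
    dma_model_t d = {0};
    for (size_t i = 0; i < n_samples; i++) {
        dma_model_push(&d, (int16_t)(i % 8));
    }
    stream_stats_t s = { d.interrupts, d.sum };
    return s;
}
```

In this model, `simulate_stream(1024)` reports 16 interrupts, where a per-sample interrupt scheme would have raised 1024. Samples in a partially filled buffer stay unprocessed until the block completes, which is the latency cost paid for the lower interrupt rate.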

Peripherals – performance and function

Peripherals are often referred to as hardware accelerators. In general, this refers to the replacement of a software algorithm with a hardware component, taking advantage of the intrinsic speed of hardware. From a programmer's perspective, the two are not very different: interfacing to a hardware accelerator is similar to calling a function. The main difference is that the function is implemented in hardware, and this is transparent to the calling function.

Using hardware-accelerated peripherals can improve execution time by an order of magnitude, and by up to 100x for some algorithms. Hardware-accelerated peripherals are more efficient at performing certain mathematical functions, moving data from one place to another, and performing the same operation many times over.

Using a peripheral requires the programmer to write data to its registers, which on most peripherals are memory-mapped. The computation is then performed outside the CPU, so the CPU can continue executing code while the peripheral processes in parallel. Setup and initialization of a peripheral is not too difficult, and usually requires several instructions to write to the control registers, status registers, and data registers, as well as a few instructions to read the result [3].
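The register sequence described above might look like the following sketch. The register layout, bit definitions, and function names here are hypothetical and do not correspond to any real device; on actual silicon the register block would sit at a fixed address taken from the data sheet, whereas this example places it in ordinary memory and adds a small `accel_model_step` function that plays the role of the hardware so the code can run on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register layout of a multiply accelerator (not a real device). */
typedef struct {
    volatile uint32_t ctrl;      /* control register: bit 0 starts an operation */
    volatile uint32_t status;    /* status register: bit 0 set when result is ready */
    volatile uint32_t operand_a; /* data register */
    volatile uint32_t operand_b; /* data register */
    volatile uint32_t result;    /* result register */
} accel_regs_t;

#define ACCEL_CTRL_START  (1u << 0)
#define ACCEL_STATUS_DONE (1u << 0)

/* Simulation only: does what the hardware would do once START is set. */
void accel_model_step(accel_regs_t *r)
{
    if (r->ctrl & ACCEL_CTRL_START) {
        r->result = r->operand_a * r->operand_b;
        r->ctrl &= ~ACCEL_CTRL_START;
        r->status |= ACCEL_STATUS_DONE;
    }
}

/* Driver-style wrapper: to the caller this looks like an ordinary function. */
uint32_t accel_multiply(accel_regs_t *r, uint32_t a, uint32_t b)
{
    assert(r != NULL);
    r->status = 0;              /* clear any stale DONE flag */
    r->operand_a = a;           /* write the data registers */
    r->operand_b = b;
    r->ctrl = ACCEL_CTRL_START; /* write the control register to start */

    /* On real hardware the CPU could run other code here instead of polling. */
    while (!(r->status & ACCEL_STATUS_DONE)) {
        accel_model_step(r);    /* simulation only: hardware runs by itself */
    }
    return r->result;           /* read the result register */
}

/* Convenience wrapper that supplies a RAM-backed register block. */
uint32_t accel_demo(uint32_t a, uint32_t b)
{
    accel_regs_t regs = {0};
    return accel_multiply(&regs, a, b);
}
```

Calling `accel_demo(6, 7)` returns 42. The `volatile` qualifier matters on real hardware because it stops the compiler from caching or reordering accesses to registers that the peripheral changes on its own.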

As an example of the power of peripherals, consider the latest generation of DSPs for the video application space. Until recently, DSPs were not fast enough to process images and video in real time. The main requirement for video applications is to be fast enough to keep up with smooth, continuous video and still be able to extract useful information.

Peripheral integration

The Texas Instruments TMS320DM642 processor (Figure 2) has a set of peripherals designed for this application space. The combination of an application-specific DSP core and a set of application-centric peripherals allows the DSP device to perform extremely fast calculations and algorithms specialized for image processing. By integrating the key audio/visual connectivity peripherals on chip, overall system cost is reduced.

The DM642 device has three configurable video port peripherals (VP0, VP1, and VP2). These video port peripherals provide a glueless interface to common video decoder and encoder devices. They are designed to support multiple resolutions and video standards such as ITU-R BT.656 and SMPTE 125M. The ITU-R BT.656 standard describes the fundamentals of the video digitization process, while SMPTE 125M defines the parameters required to generate and distribute component video signals on a parallel interface.

The video port peripherals are configurable and can support video capture and/or video display modes. Each video port consists of two channels, A and B, with a 5120-byte capture/display buffer that is splittable between the two channels.

In addition to the video port peripherals, the DM642 also has a multichannel audio serial port peripheral to support up to eight stereo lines over 16 channels, an Ethernet MAC to connect the system to IP packet networks, and a PCI peripheral to connect to a backplane chassis or PCI bus.

As shown in Figure 3, video port data transfers take place using the DMA. DMA requests are based on buffer thresholds. The preferred transfer size is often one entire line of data because this allows the most flexibility in terms of frame buffer line pitch. Some modes of operation for the highest display rates may require more frequent DMA requests, such as on a half or quarter line basis [4].
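To make the line-based transfer sizes concrete, the arithmetic can be sketched as below. The numbers are assumptions chosen for illustration, not DM642 register settings: a 720 x 480 active picture at 2 bytes per pixel (as with 4:2:2 YCbCr) and a frame buffer line pitch of 2048 bytes. The function names are likewise hypothetical. The sketch computes the size and count of DMA requests for whole-line, half-line, and quarter-line thresholds, and shows how a line pitch larger than the line length determines each request's destination offset.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t bytes_per_request;  /* payload moved by each DMA request */
    uint32_t requests_per_frame; /* DMA requests needed for one frame */
} dma_plan_t;

/* Split each video line into requests_per_line equal DMA transfers. */
dma_plan_t plan_video_dma(uint32_t pixels_per_line, uint32_t bytes_per_pixel,
                          uint32_t lines_per_frame, uint32_t requests_per_line)
{
    uint32_t bytes_per_line = pixels_per_line * bytes_per_pixel;
    /* The line must divide evenly into the requested number of transfers. */
    assert(requests_per_line > 0 && bytes_per_line % requests_per_line == 0);

    dma_plan_t plan;
    plan.bytes_per_request  = bytes_per_line / requests_per_line;
    plan.requests_per_frame = lines_per_frame * requests_per_line;
    return plan;
}

/* Byte offset in the frame buffer where a given request's data lands.
 * line_pitch is the distance in bytes between the starts of two lines. */
uint32_t request_dst_offset(uint32_t line, uint32_t part,
                            uint32_t line_pitch, uint32_t bytes_per_request)
{
    return line * line_pitch + part * bytes_per_request;
}
```

Under these assumptions, `plan_video_dma(720, 2, 480, 1)` gives 480 requests of 1440 bytes per frame, while a quarter-line threshold (`requests_per_line` of 4) gives 1920 requests of 360 bytes. With one request per line, each transfer simply starts at `line * line_pitch`, which is why whole-line transfers leave the most freedom in choosing the pitch.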

Embedded engineers must select the right processor for the application. Even though this rarely results in a perfect match, the engineer must be able to make intelligent decisions based on some important considerations, such as:

  • Which peripherals are available on a given device
  • Whether an existing peripheral mix closely matches the requirements for the application
  • What performance trade-offs the onboard peripherals involve
  • Whether some peripherals are better placed off the chip
  • How easy it is to move data among the cores and on-chip peripherals