Performance, flexibility, and efficiency make the case for market-specific DSPs

Eran discusses programmable and semi-programmable approaches to DSP implementations on Systems-on-Chip (SoCs), including CPUs, MCUs, and native DSPs.

Recent advances in wireless communications, with 4G channels now reaching 100 Mbps, have significantly changed the tasks carried out by DSPs. Modulating and demodulating such high bit rates requires advanced techniques, including multiple antennas (also known as MIMO), multicarrier modulation, and QAM schemes. Correspondingly, DSP engines have evolved to meet these requirements in various ways, turning into application-specific (or communication-specific) DSPs.

A similar progression has been witnessed in other DSP fields such as video processing. Screen resolution has increased steadily from VGA to Full HD (also known as 1080p) in recent years, and complex video standards have emerged to deal with these higher resolutions while keeping bit rates as low as possible. Traditional DSP engines found it difficult to tackle these requirements, leading to the introduction of multicore designs and Single Instruction, Multiple Data (SIMD) processing. Here too, DSPs have evolved into application-specific processors, specializing in multimedia tasks while maintaining their universality.

More than one way to obtain DSP functionality

There are several ways to obtain DSP functionality in an SoC. The hardwired method, whereby algorithms are implemented completely in hardware, will not be covered in this article. Instead, we will focus on programmable and semi-programmable approaches to DSP implementations, including CPUs, MCUs, and native DSPs.

CPUs such as Pentium and PowerPC processors are extremely high-performance, general-purpose devices that run at clock speeds of 2 GHz or more. Although they can handle signal processing, their high clock frequencies and high power consumption shorten battery life, so they are usually not suitable for embedded applications.

MCUs are slower, smaller counterparts to CPUs such as the Pentium; companies including ARM Holdings and MIPS Technologies offer such cores for embedded applications. They are mainly suited to control tasks and are capable of running operating systems such as Windows Mobile and Linux. Generally, MCUs lack typical DSP support, as their main focus is on the control plane.

The third way to obtain DSP functionality is the native DSP. Native DSPs are well suited for math-intensive and data-centric processing tasks, and are designed for embedded applications, ensuring that power consumption and circuit size are suitable for use in an SoC. Typical examples of such native DSPs are TI’s C55 and C64, ADI’s Blackfin, and CEVA’s TeakLite and CEVA-X families of DSP cores.

Figure 1 maps these various DSP solutions on an X-Y space, describing DSP performance and flexibility. Hardwired accelerators are also mapped. The color coding reflects power consumption of each of the solutions.

Figure 1: DSP Performance

Fully programmable and general-purpose in nature, MCUs can perform various functions and are suitable for different applications in an SoC. However, MCUs do not achieve high performance in DSP functions, not being designed for this purpose.

Compared to MCUs, CPUs are faster at performing DSP functions, if you run them in the 1 GHz to 2 GHz range. But, unless you have sufficient battery capacity, it is highly unlikely that a CPU can be used for DSP functions in a handheld product. Hardwired engines obviously provide the highest DSP performance at the lowest power consumption; however, flexibility is significantly impaired.

Thus far, native DSPs have provided a good trade-off between all three aspects – flexibility, performance, and power consumption. With recent developments in various DSP applications, the embedded processor landscape has evolved with new subcategories.

DSP tasks are evolving, as are DSP processors

In recent years, DSP requirements in various applications such as home audio have significantly evolved:

· The number of channels per stream has increased from 2 (stereo) to 5.1 (typical Dolby Digital devices) to 7.1 (typical Blu-ray Disc players)

· More streams need to be decoded and mixed together: from 1 (typical A/V receiver) to 2 (typical DVD use case) to 3 (Blu-ray Disc use case)

· The bit rate of each decoded stream has risen from 48 kbps (MP3) to 24 Mbps (DTS-HD MA)

· Audio codecs and post-processing functions have grown in complexity


A similar pattern can be found in video applications and the requirements they pose on today’s DSPs, where higher resolutions, higher bit rates, and more complex video standards and toolboxes have been introduced.

The bit rate evolution from 3G to 4G wireless standards follows the same pattern, but the increases are exponential.

Figure 2 depicts these advancements in all three DSP fields.

Figure 2: The burden on DSPs has grown dramatically in a number of areas.


In order to meet the mounting requirements for DSP horsepower, each of the three processor-based alternatives has evolved in somewhat different directions.

MCU vendors have approached the rising requirements by adding some DSP capabilities into their architectures and extending the ISA of these processors with dedicated DSP instructions. The ARM Cortex-A8 is a typical example, where a SIMD accelerator (the ARM NEON) is attached to the processor core. Some vendors take a slightly different direction, allowing their programmers to extend their MCUs with their own instructions. In both cases, such DSP extensions could provide a good system trade-off for basic DSP functions. However, for mid- to high-level processing requirements, such MCUs are far from a viable solution, and they also present severe pitfalls for product roadmaps and code reuse.

CPU vendors have come to understand that simply running the processor faster does not suffice to meet the rising requirements. Instead, multiprocessor architectures provide an alternative solution, enabling greater parallelism by spreading work across cores, although this approach strains their ability to meet stringent power budgets in portable devices. Other CPU vendors have simply decided to focus on their core expertise and concede the heavy-duty DSP applications to other solutions. For example, no CPU (or multicore CPU) would be capable of effectively decoding video at 1080p resolution or running LTE baseband. Such tasks are left to market-specific DSPs.

While general-purpose DSPs still serve some markets, the higher tiers require an ISA that supports market-specific features, such as the 4x4 matrix calculations required by the latest wireless communication standards. Such market-specific DSPs can serve a complete market more precisely, providing the much-needed horsepower without limiting the reusability of the architecture across solutions in that market. One example would be a wireless communication DSP that can be efficiently used for LTE, WiMAX, HSPA+, EV-DO, and the like (Figure 3).

Figure 3: Evolving DSP solutions mapped onto the X-Y space shown in Figure 1


Adding DSP functions to MCUs – is this a good practice?

For MCU vendors, one way to move up the performance ladder is to offer DSP instructions that extend their architectures. Some vendors offer MCUs with a predefined DSP ISA extension (for example, ARM NEON); others offer a set of possible DSP ISA extensions and let the user choose; yet others allow licensees to come up with their own DSP instructions. In all cases, since these DSP ISA extensions are not inherently embedded into the processor, they are better regarded as slave accelerators to a central microcontroller.

Pitfalls of extending an MCU for advanced DSP chores

Adding functional units (such as MAC units and adders) is not effective unless memory accesses are handled just as carefully. Typically, MCUs support only a single memory access at a time into large, flat memories, with various data hazard restrictions. This is very different from even the most basic DSP architectures and quickly erodes the added value of any DSP extension.
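The memory bottleneck is easiest to see in the inner loop of an FIR filter. The following is a minimal sketch in plain C, not taken from any vendor's library; the function name fir_acc and the choice of 16-bit (Q15) data are assumptions made for illustration. Every multiply-accumulate consumes two operands from memory, so a core that can issue only one memory access per cycle cannot sustain one MAC per cycle on this loop, no matter how many MAC units are added.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Accumulate one FIR output sample: sum of coeff[k] * samples[k].
 * Hypothetical helper for illustration; samples[] is assumed to be
 * ordered to match coeff[] (newest sample first).
 * The result is returned as the raw 64-bit sum of 16x16-bit products. */
int64_t fir_acc(const int16_t *coeff, const int16_t *samples, size_t taps)
{
    assert(coeff != NULL && samples != NULL);

    int64_t acc = 0;
    for (size_t k = 0; k < taps; k++) {
        int16_t c = coeff[k];     /* memory access 1: coefficient */
        int16_t s = samples[k];   /* memory access 2: data sample */
        acc += (int32_t)c * s;    /* one multiply-accumulate      */
    }
    return acc;
}
```

With two loads feeding each MAC, a single-access memory interface leaves the MAC unit idle roughly every other cycle on this loop, which is why native DSPs typically pair their MAC units with multiple data buses or memory banks.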

MCUs usually rely heavily on cached memory architectures with privilege modes and virtual memory support, due to the nature of the control functions they need to support. DMAs, on the other hand, are less commonly used in MCUs; where they are offered, it is typically as a basic add-on. In highly data-intensive DSP applications, advanced DMAs are a prerequisite due to the real-time nature of the application and its requirement for deterministic processing. Therefore, as opposed to extended MCUs, advanced DSPs inherently support DMA mechanisms within the processor architecture.

Extending an MCU with DSP instructions usually does not involve any major changes in the addressing mechanism. Hence, unique addressing modes that are very typical in DSP applications due to their predictable data access patterns (for example, cyclic buffers and bit-reverse) are usually not supported by such extended MCUs.
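To make the cost concrete, here is a minimal C sketch of what software must do when these two addressing modes are not available in hardware. The helper names circ_next and bit_reverse are hypothetical, chosen for this example only. A DSP address generator typically performs the equivalent wrap or bit reversal as part of the memory access itself, whereas here each access pays for extra compares, subtracts, or a loop.

```c
#include <assert.h>
#include <stdint.h>

/* Step an index through a cyclic (modulo) buffer of length len.
 * Written so that idx + step never overflows 32 bits. */
uint32_t circ_next(uint32_t idx, uint32_t step, uint32_t len)
{
    assert(len > 0 && idx < len && step <= len);

    if (step >= len - idx)
        return step - (len - idx);   /* wrapped past the end */
    return idx + step;               /* no wrap needed       */
}

/* Reverse the lowest `bits` bits of idx, as used to reorder the
 * inputs or outputs of a radix-2 FFT of size 2^bits. */
uint32_t bit_reverse(uint32_t idx, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);

    uint32_t rev = 0;
    for (unsigned b = 0; b < bits; b++) {
        rev = (rev << 1) | (idx & 1u);   /* move LSB of idx into rev */
        idx >>= 1;
    }
    return rev;
}
```

For an 8-entry delay line, circ_next(6, 3, 8) wraps to index 1; for an 8-point FFT, bit_reverse(1, 3) maps index 1 to index 4.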

DSP extensions are not an inherent part of the MCU, so they require compiler intrinsics in order to make good use of these DSP capabilities. This means straightforward C code can’t be easily compiled into optimized assembly code running on such DSP extensions. For native DSP processors, such compiler support is essentially embedded in the compiler-architecture mutual design.
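The difference shows up even in a dot product. The sketch below, written for illustration only, gives the same computation twice: once as straightforward C, and once using ARM NEON intrinsics from arm_neon.h. The function names are hypothetical, the NEON path is compiled only when the compiler defines __ARM_NEON, and inputs are assumed small enough that the sum fits in 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Straightforward C. Whether this loop ends up using the SIMD
 * extension depends entirely on the compiler's auto-vectorizer. */
int32_t dot_plain(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)a[i] * b[i];
    return acc;
}

/* Same result, with the SIMD operations spelled out by hand.
 * n must be a multiple of 4. Falls back to dot_plain when NEON
 * is not available on the build target. */
int32_t dot_intrinsics(const int16_t *a, const int16_t *b, size_t n)
{
    assert(n % 4 == 0);
#if defined(__ARM_NEON)
    int32x4_t acc = vdupq_n_s32(0);          /* four 32-bit partial sums */
    for (size_t i = 0; i < n; i += 4) {
        int16x4_t va = vld1_s16(a + i);      /* load 4 x int16 */
        int16x4_t vb = vld1_s16(b + i);
        acc = vmlal_s16(acc, va, vb);        /* 4 widening MACs at once */
    }
    /* Combine the four partial sums into one scalar. */
    return vgetq_lane_s32(acc, 0) + vgetq_lane_s32(acc, 1)
         + vgetq_lane_s32(acc, 2) + vgetq_lane_s32(acc, 3);
#else
    return dot_plain(a, b, n);
#endif
}
```

The second version is tied to one vendor's extension and must be rewritten for any other, which is the maintenance burden the intrinsics approach carries.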

MCUs typically do not support the unique DSP data types required by some algorithms, such as the 10- and 12-bit elements common in advanced wireless applications. In addition, numeric accuracy in an MCU is usually limited to 32 bits. Some applications require a larger dynamic range, up to 72-bit data, along with guard bits, saturation hardware, and rounding mechanisms. These are typically supported by market-specific DSPs, according to requirements unique to an application.
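As a rough illustration of what guard bits, saturation, and rounding do, the following C sketch emulates a 40-bit accumulator (32 bits plus 8 guard bits) in a 64-bit integer. The 40-bit width, the Q15 data format, and the function names are assumptions chosen for the example, not a description of any particular DSP. On a DSP these steps are typically performed by dedicated hardware; in software on a 32-bit core each one costs extra instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Limits of an assumed 40-bit signed accumulator: 32 bits + 8 guard bits. */
#define ACC40_MAX ((int64_t)1 << 39)

/* Multiply-accumulate two Q15 values into the emulated accumulator.
 * The guard bits let the running sum exceed the 32-bit range
 * temporarily without losing information. */
int64_t acc40_mac(int64_t acc, int16_t a, int16_t b)
{
    acc += (int32_t)a * b;
    assert(acc >= -ACC40_MAX && acc < ACC40_MAX);  /* still within 40 bits */
    return acc;
}

/* Convert the accumulator (Q30 after Q15 x Q15) back to Q15:
 * round to nearest, then saturate instead of wrapping around. */
int16_t acc40_to_q15(int64_t acc)
{
    int64_t t = acc + ((int64_t)1 << 14);   /* add half an LSB to round */
    int64_t q = t / 32768;                  /* drop 15 fractional bits  */
    if (t % 32768 < 0)
        q -= 1;                             /* floor for negative sums  */

    if (q > INT16_MAX) return INT16_MAX;    /* saturate high */
    if (q < INT16_MIN) return INT16_MIN;    /* saturate low  */
    return (int16_t)q;
}
```

For example, 0.5 x 0.5 (16384 x 16384 in Q15) converts back to 8192, which is 0.25, while -1.0 x -1.0 (-32768 x -32768) saturates to 32767 rather than wrapping to a negative value.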

In such extended MCUs, limited parallelism usually prevents the programmer from using the MCU and its DSP extensions at the same time. Thus, even though the processing horsepower might be available, it cannot be fully utilized.

For extended MCUs that offer you a set of various DSP extensions to choose from, or even the option to extend the ISA with DSP instructions, another issue is non-standardization. Such an à la carte processor design could be appealing at first glance; however, software maintenance will be extremely difficult. You will need to port your code over and over again from one architecture to another, without being able to reuse it.

This also means that code developed by a third-party software vendor will not necessarily run on your configuration, given that some instructions and mechanisms could be different. Since the result is effectively a proprietary processor architecture, the MCU vendor will not be able to maintain a roadmap for it. Instead, as the user (and creator) of this specific MCU configuration, you will need to maintain your own processor roadmap in order to sustain a complete product roadmap.


While traditional DSP engines are a viable solution for many applications where general-purpose DSP horsepower is required, a new market-specific approach is needed to meet challenges such as 4G wireless communications processing and HD video/audio processing. Multicore designs boost performance but also increase power consumption. Some MCU vendors are offering various DSP ISA extensions. DSP vendors, on the other hand, are turning their new DSP designs into market-specific DSPs. The performance, flexibility, and efficiency with which these alternative solutions deliver this functionality remain the key issues.

As discussed, unlike market-specific DSPs, extended MCUs treat DSP capabilities as slave accelerators, rather than as an inherent part of the architecture. Furthermore, extended MCUs that let customers define their own set of DSP instructions can be considered single-point solutions that cannot be used and reused for various products in the same product line. They lack roadmap and code compatibility and ultimately lack the universality of market-specific DSPs, whereby a single DSP architecture can serve a complete product line for a specific market. In essence, these are the critical advantages that market-specific DSPs, such as the CEVA-TeakLite-III for advanced audio applications and CEVA-XC for wireless communications, deliver, and they are why market-specific DSPs will form the basis of next-generation SoCs.

Eran Briman serves as Vice President of Corporate Marketing for CEVA. Previously, Eran served as Senior Director of Marketing, specializing in wireless communications and multimedia applications. Prior to that, he was the Chief Architect for CEVA, with overall responsibility for the research and development of next-generation DSP cores. Before joining CEVA, he had been with DSP Group since 1995, holding various engineering and R&D management positions. Eran holds a B.Sc. in Electronic Engineering from Tel Aviv University and an MBA from the Kellogg School of Management at Northwestern University, and holds several patents on DSP technology.