For floating-point processing, a new choice arrives with the new decade

Floating-point processors are a key element of digital signal processing (DSP) applications. For the last couple of decades, the AltiVec floating-point unit provided in PowerPC architectures has stood as the de facto industry standard, having unseated dedicated, specialized processors. Recent advances in Intel microprocessors have now introduced an attractive alternative to AltiVec in the COTS military arena.

Military digital signal processing applies algorithms to acoustic, radio, or optical sensor data. Interpreting this digital representation of the physical world requires mathematical processing. Many years ago, analog processing was the norm, using analog circuits for tasks such as simple filtering. Today, we collect multiple digital samples and put numerical algorithms to work on them. In the sampled digital domain, addition, subtraction, multiplication, and division, together with transcendental functions such as sines and cosines, form the building blocks of these algorithms. The problem for computers is one of precision.

A piece of paper theoretically offers infinite precision. In comparison, the fixed size of integers (32 or 64 bits) limits numerical operations done inside a computer. On paper, complex algorithms involving the multiplication of numerous exponents, for example, can be performed without concern for numeric “overrun.” Computers, on the other hand, are subject to overflows (and underflows). The solution is floating-point processing: because a floating-point number stores an exponent alongside its significant digits, the range of values it can express is very large, which makes overflow much less likely. Curtiss-Wright Controls’s DSP customers strongly prefer floating-point arithmetic, which is well known for simplifying algorithm development compared with fixed-point formats and their attendant management of overflow and underflow conditions.
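The contrast between the two number formats can be sketched in a few lines of C. This is an illustration only, not code from any Curtiss-Wright product; the function names `mul_i32_checked` and `mul_f32` are hypothetical. The integer version has to detect the overflow explicitly, while the single-precision version simply returns a result, at the cost of keeping only about seven significant decimal digits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-point style: multiply two 32-bit integers and report overflow
 * instead of letting the result wrap. Returns true when the product fits. */
bool mul_i32_checked(int32_t a, int32_t b, int32_t *out)
{
    assert(out != NULL);
    int64_t wide = (int64_t)a * (int64_t)b;   /* exact in 64 bits */
    if (wide > INT32_MAX || wide < INT32_MIN) {
        return false;                         /* product needs more than 32 bits */
    }
    *out = (int32_t)wide;
    return true;
}

/* Floating-point style: the exponent field absorbs the growth in
 * magnitude, so no explicit range management is needed here. */
float mul_f32(float a, float b)
{
    return a * b;
}
```

Multiplying 100,000 by 100,000 gives 10,000,000,000, which exceeds the largest 32-bit signed integer (2,147,483,647), so `mul_i32_checked` returns false and the caller must rescale or saturate. The same product passed to `mul_f32` comes back as roughly 1.0e10 with no special handling.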

Since the 1990s, processors from the PowerPC family, also known as Power Architecture, with their AltiVec floating-point vector math unit, have been the dominant choice for open-system COTS boards used in high-performance embedded military DSP applications. These applications include radar, signal intelligence, and sonar. Previously, such systems were largely implemented with specialized processors such as the Intel i860, the Texas Instruments 320C40, and the Analog Devices SHARC. These processors were popular because of their floating-point performance.

In the late 1990s a lack of software compatibility with their predecessors stymied acceptance of follow-on processors from Analog Devices and Texas Instruments (the TigerSHARC and 320C6701 respectively). The COTS market then turned to the PowerPC processor. PowerPC, developed by the Apple/IBM/Motorola alliance, was intended for personal computer use to compete with the Intel x86. Its RISC architecture was touted to be the future of high-performance microprocessors, but it was the introduction of the AltiVec instruction unit in the Motorola PowerPC 7400 (“G4”) that changed the signal processing landscape.

Signal processing experts soon realized that the floating-point-capable AltiVec unit could greatly accelerate the inner-loop processing found in common functions such as Fast Fourier Transforms (FFTs). The ability to perform up to four simultaneous floating-point multiplies and additions was revolutionary. Since then, the PowerPC with AltiVec has dominated the military signal processing market with a continuous succession of faster processors, ending with the 8640/8641. From the early 2000s until now, system developers have enjoyed a steadily evolving series of COTS products that offer more features and higher performance while providing a high degree of software compatibility.

Curtiss-Wright Controls’s most recently introduced multiprocessor board, the CHAMP-AV6, is based on the Freescale 8641 processor. Freescale, though, has decided not to include the AltiVec unit in its latest high-performance processor, the QorIQ P4080. Announced last year, the P4080 is an excellent CPU for single board computer designs, with eight cores, integrated memory controllers, and a Serial RapidIO (SRIO) interface. Its floating-point capability, however, is a conventional scalar unit rather than the vector processor needed to attain the floating-point performance that signal processing applications require.

Meanwhile, Intel has continued over the years to develop the floating-point capability of its own processors. Intel’s processors feature a vector-processing unit generically known as Streaming SIMD Extensions (SSE), first introduced in the Pentium III processor. Since then, Intel has continually added features and new instructions, culminating in the current implementation, SSE4.2. Like AltiVec, SSE is a 128-bit-wide processing unit, capable of operating on four 32-bit floating-point values simultaneously. SSE also supports double-precision floating point, a feature that was never included in AltiVec. In Intel processors, each core has its own SSE unit, so raw floating-point performance scales with the number of cores. Intel x86 processors are classic CISC processors, and successive generations continue to dispatch more instructions per clock. Because more instructions complete per clock cycle and the code density is higher, Intel processors are able to perform more than twice as much useful work per clock cycle as a Freescale RISC processor.
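To make the four-wide operation concrete, the following is a minimal sketch of a multiply-accumulate loop written with the standard SSE intrinsics from `<xmmintrin.h>`. The function name `vec_mul_add_sse` and the requirement that the length be a multiple of four are assumptions made for brevity; production signal processing code would normally rely on a tuned vector library rather than hand-written loops like this one.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>   /* SSE single-precision intrinsics */

/* Computes y[i] += a[i] * x[i], four floats per loop iteration.
 * Assumption for this sketch: n is a multiple of 4. */
void vec_mul_add_sse(const float *a, const float *x, float *y, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);          /* load 4 floats, no alignment needed */
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        vy = _mm_add_ps(vy, _mm_mul_ps(va, vx));  /* 4 multiplies, then 4 adds */
        _mm_storeu_ps(y + i, vy);                 /* store 4 results */
    }
}
```

Each 128-bit `__m128` register holds four 32-bit floats, so one `_mm_mul_ps` followed by one `_mm_add_ps` performs eight floating-point operations. The unaligned load and store forms are used here so that the sketch works on any buffer; aligned variants exist for data on 16-byte boundaries.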

The good news for the COTS signal processing system designer is that, beginning with Intel Core i7 dual-core processors (Figure 1), the low-power, high-performance advantages of Intel Architecture processor technology can be used for the first time to design products such as DSP engines for the rugged deployed COTS signal processing space. Intel’s SSE unit offers a high-performance alternative to AltiVec floating-point processing for military signal processing applications. Expect to see COTS products in the near future that take full advantage of this alternative.

Figure 1: The Intel Core i7 micro-architecture can execute 6 operations/cycle, including two SSE packed floating-point operations, for a total of 8 FLOPs/cycle. This performance scales with the number of cores in the processor.
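The arithmetic behind the caption can be written out as a small helper. Two packed SSE operations per cycle, each working on four single-precision values, give 8 FLOPs per cycle per core; multiplying by the core count and clock rate gives a theoretical peak. The function name `peak_gflops` is hypothetical, and the 2.0 GHz clock used in the note below is an assumed figure for illustration, not a specification of any particular part.

```c
#include <assert.h>

/* Theoretical peak single-precision GFLOPS:
 * cores x clock (GHz) x packed operations per cycle x lanes per operation.
 * Sustained performance on real code will be lower than this bound. */
double peak_gflops(int cores, double clock_ghz,
                   int packed_ops_per_cycle, int lanes_per_op)
{
    assert(cores > 0 && clock_ghz > 0.0);
    assert(packed_ops_per_cycle > 0 && lanes_per_op > 0);
    return cores * clock_ghz * packed_ops_per_cycle * lanes_per_op;
}
```

For a dual-core processor at an assumed 2.0 GHz, `peak_gflops(2, 2.0, 2, 4)` evaluates to 32 GFLOPS. Doubling the number of cores doubles the figure, which is the scaling the caption describes.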

 

Robert Hoyecki is Director of Advanced Multi-Computing at Curtiss-Wright Controls. Rob has 15 years of experience in embedded computing with a focus on signal processing products. He has held numerous leadership positions, including application engineering manager and product marketing manager. Rob earned a Bachelor of Science degree in Electrical Engineering Technology from Rochester Institute of Technology.

Rob can be reached at info@cwcembedded.com.