Intel tries DSP again...using a "soft" approach

Software is key to Intel's DSP plans.

Intel had a quasi-successful digital signal processor with its i860 in the early 1990s. Now the company is trying again using software, FPGAs, and standard CPUs.

[Application Feature]

Launched in 1989, the Intel 80860 (“i860”) digital signal processor set a number of trends for its time but never achieved real commercial success. Notably, the VLIW architecture included a 32-bit ALU and a three-part 64-bit floating-point unit. (At the time, 16-bit CISC CPUs were still shipping in volume.) According to the folks at Answers.com and this writer’s first-hand knowledge, the i860 competed for market share and Intel resources against the company’s 80960Kx RISC CPUs. Recall that these were the heady days when every major semiconductor company built its own CISC processor, and the migration to RISC was an approaching sea change. Intel end-of-lifed the i860 in the 1990s, followed by the ’960 family, freeing up the company to focus on the then-recently introduced (and hugely successful) Pentium family. So you could say that Intel’s foray into DSP didn’t last too long.

Today, of course, DSP is used in practically every digital doodad from MP3 players to smart phones to automotive engine management units. Intel’s plans for DSP no longer include discrete, stand-alone devices; rather, DSP is built into the company’s Nehalem (Core i7 and follow-on) architecture as well as Intel’s System-on-Chip (SoC) product line that started with the EP80579 multimedia processor for set-top boxes and high-end HDTVs. Intel implements DSP in on-chip functional units such as MPEG-4 decoders, WiMAX radios, and various other codecs that deal with audio, imaging, or software-defined radios. But beyond highly integrated Application-Specific Standard Products (ASSPs) based on SoCs, what is the company doing for general-purpose DSP implementations?

Intel is following the same strategy I’ve postulated with regard to the company’s Wind River acquisition: using software to drive chip sales. Its two biggest tactical initiatives are among the company’s best-kept secrets: 1) a three-piece toolkit for converting PowerPC AltiVec code to Intel Architecture (IA) SSE, and 2) the FPGA-based QuickAssist Technology.

Partnered with and partially funding the UK company NA Software Limited (NASL), Intel now offers three tools that move AltiVec DSP applications to Intel processors. In the general-purpose DSP market, which includes automotive and what Intel calls Military, Aerospace, and Government (MAG), Freescale’s PPC has been the market leader in DSP for more than 10 years. With native signal processing instructions for FFTs and other vector operations built into the AltiVec engine, the PowerPC offered the best balance between general processing (“housekeeping”) and signal processing (“number crunching”) in a single device.

But Freescale was slow to move to multicore CPUs, and the company’s last discrete PowerPC – the MPC8641D dual-core – never gained market traction. This is partly due to Intel’s onslaught of dual-core Core Duo and Core 2 Duo CPUs during the past two years. Intel sees it this way: “Problem: AltiVec roadmap products are uncertain. Opportunity: Intel multicore processors are very effective for DSP applications.”

I’ll say. Using “Tool 1,” a VSIPL library for IA, a 1K-point FFT is faster on a 2.16 GHz Core 2 Duo than on a 1 GHz 8641D (400 MHz bus). In each pair of figures that follows, the first is the 8641D and the second is IA. FFT times, in microseconds: Real-to-complex: 7.5 versus 4.5; Complex-to-real: 7.7 versus 4.6; Complex-to-complex out-of-place: 10.9 versus 6.8 (running Linux). For vector routines, where the AltiVec really shines, IA was equivalent or better: Vector square root: 1.4 versus 1.3; Complex vector multiply: 2.4 versus 1.8; and Polar convert: 15.6 versus 10.7. Vector cosine, however, ran slower on IA due to the hardwired instructions in the AltiVec: 3 versus 6.4. (Note: Code was not optimized for multithreading or multicore on IA. One might expect better results with tweaking.)
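To put those pairs on a common footing, here is a minimal Python sketch that turns each one into a speedup ratio (8641D time divided by IA time). The timings are the figures quoted above; the names `TIMINGS`, `speedup`, and `summarize` are made up for illustration and come from neither Intel nor NASL. Only the FFT figures are explicitly stated to be in microseconds, so the sketch reports ratios rather than absolute times.

```python
# Timings quoted in the article, as (8641D, IA) pairs.
# Only the FFT rows are explicitly given in microseconds, so we
# compute unitless ratios and make no claim about absolute units.
TIMINGS = {
    "FFT real-to-complex": (7.5, 4.5),
    "FFT complex-to-real": (7.7, 4.6),
    "FFT complex-to-complex (out-of-place)": (10.9, 6.8),
    "Vector square root": (1.4, 1.3),
    "Complex vector multiply": (2.4, 1.8),
    "Polar convert": (15.6, 10.7),
    "Vector cosine": (3.0, 6.4),
}


def speedup(ppc_time, ia_time):
    """Return how many times faster IA is; below 1.0 means IA is slower."""
    return ppc_time / ia_time


def summarize(timings):
    """Map each routine name to its IA speedup, rounded to two decimals."""
    return {name: round(speedup(ppc, ia), 2) for name, (ppc, ia) in timings.items()}


if __name__ == "__main__":
    for name, ratio in summarize(TIMINGS).items():
        verdict = "IA faster" if ratio > 1.0 else "AltiVec faster"
        print(f"{name:40s} {ratio:5.2f}x  ({verdict})")
```

These are raw ratios only. They take no account of the difference in clock speed between the two parts (2.16 GHz versus 1 GHz).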

“Tool 2” is an altivec.h header file for IA that allows users to take AltiVec code unchanged and convert it to SSE SIMD for SSE2-SSE4 IA processors. By the time you read this, the version for VxWorks 6.6 should be available – do you see how Wind River again plays into Intel’s plans? And finally, for PPC designs with lots of hand-coded “bare metal” optimizations (such as loop unrolling), “Tool 3” is an AltiVec Assembler/Compiler for IA. Stuff AltiVec code in one end, turn the crank, and out pops Intel SSE assembler code. Again, Linux and VxWorks versions should be available as you read this.

It’s amazing that Intel has never made much noise about these three tools because the implications are huge! You can find more information from NASL or in an Intel webinar at http://edc.intel.com/Video-Player.aspx?id=2315.

As for QuickAssist Technology, Intel has created an API for SSE that bolts off-chip FPGA-based coprocessors to the CPU’s FSB. Now turned into a “community” of more than 20 third-party vendors such as XtremeData, Celoxica, and GE Fanuc Intelligent Platforms, QuickAssist also includes hooks into Intel’s next-gen CPU roadmap. I doubt that Intel’s getting into the FPGA business anytime soon, but through QuickAssist it has endorsed what the market has already decided: FPGAs are the best way to do flexible and programmable DSP routines.

Add up the impressive DSP on-chip capabilities of IA processors, plus tools that convert from PowerPC legacy designs into IA devices, then toss in direct-connect FPGA coprocessors, and it’s clear that Intel is back in the DSP business. In all cases, software remains the key to the company’s DSP renaissance.