# FPGA coprocessors accelerate the performance of 3-D stereo image processing

Three-dimensional (3-D) image processing systems have reached the mainstream and are now embedded in a wide range of products, including security and surveillance devices, industrial robots, and autonomous vehicles. These systems, which come in single- and multi-camera configurations, use a diverse array of techniques to generate 3-D information. Examples include projecting structured light to recover surface topography and using lasers to measure either the angle of the reflected beam or its time of flight. The most widely used technique is stereo image processing with two cameras, which mimics the way human eyes perceive depth and avoids the need for special lighting or expensive lasers.


Additionally, the availability of standard interfaces such as Gigabit Ethernet, IEEE-1394 (FireWire), and USB 2.0 means an entire system can be assembled simply by plugging the cameras into a commercially available desktop PC. However, this simplicity comes at a price: executing the complex algorithms involved requires drastically higher digital signal processing performance. That performance can be achieved by combining discrete DSP processors with FPGA coprocessors in a 3-D stereo image processing system.

**Implementation**
Early dedicated image processing hardware was slow by today's standards and inflexible, but it still outperformed software implementations of the time. Advances in computer hardware have since improved image processing performance significantly, and several recent developments, particularly FPGA-based coprocessors, promise even greater leaps.

Flexibility is considered a key strength of software-based systems and is extremely important, since requirements and algorithms are constantly being refined and enhanced. However, flexibility is meaningless if the system cannot perform all required functions in the required time. Device performance is also difficult to quantify, since it depends heavily upon the overall system requirements of the application, including I/O and the underlying algorithms.

DSP processors can achieve performance comparable to common desktop PCs, even at lower clock speeds. The Altera Stratix FPGA family further extends the performance of DSP processor-based systems while providing the flexibility needed to support varied system architectures. For data-flow algorithms with minimal control processing, FPGAs can be roughly 10 times more efficient than DSP processor-based implementations. This efficiency stems from the inherent parallelism of an FPGA, in contrast to the serial fetch, compute, and store cycle of a DSP processor. A pure DSP processor-based implementation needs many clock cycles to execute computations that an FPGA coprocessor completes in a single cycle, as shown in Figure 1.

For this reason, the DSP performance of FPGA coprocessors can exceed that of discrete DSP processors by an order of magnitude or more. Some FPGAs include dedicated hardwired DSP blocks that deliver even higher performance for computationally intensive functions. For example, Altera’s Stratix II family offers up to 384 18 x 18-bit multipliers running at 450 MHz, for a peak throughput of 173 billion multiply-accumulate operations per second (GMACS).
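The quoted peak figure follows from assuming that every multiplier completes one multiply-accumulate per clock cycle, an ideal-utilization assumption rather than a measured result:

$$
384 \times 450 \times 10^{6}\ \text{MAC/s} = 172.8 \times 10^{9}\ \text{MAC/s} \approx 173\ \text{GMACS}
$$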

The ideal solution combines a DSP processor and an FPGA to reap the benefits of both. On its own, the DSP processor offers coding simplicity and impressive performance. Teamed with the parallel processing power of an FPGA, it delivers the highest level of performance.

A product that exemplifies these capabilities is the Valde Systems VS1502 stereovision processor, which pairs a DSP processor with an Altera Stratix II FPGA coprocessor. To meet typical image processing system requirements, the VS1502 provides two GbE digital camera interfaces for the video input streams, an industry-standard 10/100BASE-T Ethernet port for communication, and discrete I/O for interacting with external devices.

**Lens distortion**
As with any camera, the images produced by 3-D image processing systems, even those equipped with adjustable or higher-quality lenses, are subject to some degree of lens distortion. This distortion degrades crucial steps such as disparity map generation, leading to erroneous results. A common way to correct it is to rectify each image by remapping its pixels.

Remapping is done either with a look-up table (LUT) of predefined coordinates or by calculating each pixel's source position with a geometric algorithm. The LUT method is faster, but as image resolution increases, the memory required to store the table becomes prohibitive. Calculating the positions on the fly is straightforward and needs no table memory, but across two images it consumes significant processing time. The most cost-effective way to run this algorithm in real time is to implement it in an FPGA, which can execute many of these calculations simultaneously.
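To make the LUT-versus-calculation trade-off concrete, here is a minimal sketch in Python. It assumes a two-coefficient radial distortion model (k1, k2, in the style of the Brown-Conrady model) with focal length f and principal point (cx, cy). These parameters, the function names, and the nearest-neighbour sampling are illustrative assumptions, not the method used in the article's system. `lut_bytes` estimates the table size if each pixel's source (x, y) is stored as two 32-bit values.

```python
def lut_bytes(width, height, bytes_per_coord=4):
    """Memory for a full remap LUT: one (x, y) source coordinate per pixel.
    Assumes bytes_per_coord bytes per coordinate (4 = 32-bit)."""
    return width * height * 2 * bytes_per_coord


def undistort_nearest(img, k1, k2, f, cx, cy):
    """Remap without a LUT: compute every output pixel's source position
    on the fly with a hypothetical two-term radial model, then sample the
    distorted input with nearest-neighbour lookup.

    img is a list of rows of grey levels; pixels whose source falls
    outside the input image are set to 0.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            # Normalised coordinates of the ideal (undistorted) pixel.
            x = (u - cx) / f
            y = (v - cy) / f
            r2 = x * x + y * y
            scale = 1.0 + k1 * r2 + k2 * r2 * r2
            # Position where that ray landed in the distorted input.
            su = round(cx + f * x * scale)
            sv = round(cy + f * y * scale)
            if 0 <= su < w and 0 <= sv < h:
                out[v][u] = img[sv][su]
    return out


# A 1280 x 1024 table for one camera already needs 10,485,760 bytes.
print(lut_bytes(1280, 1024))
```

Every output pixel's computation is independent of the others, which is what makes the calculated approach a good fit for parallel FPGA hardware.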

**Image correspondence**
Typically, one of the first functions performed by a stereo image processing system is image correspondence: finding the same part of an object in both images. If an object's top corner is chosen as a reference point in one image, the same top corner must be located in the other. This matters because the same points must be identified before any depth calculation can be performed, since the positional difference between them (the disparity) is what encodes depth.

A common method for determining correspondence is the traditional Sum-of-Squared-Differences (SSD) algorithm, which works as follows:

- The matching cost is the squared difference of intensity values at a given disparity.
- Aggregation is performed by summing matching cost over square windows with constant disparity.
- The disparity at each pixel is the one whose aggregated cost is minimal (the "winning" value).
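The three SSD steps above can be sketched in plain Python. This minimal, unoptimized version assumes a rectified image pair, so a point at column x in the left image appears at column x - d in the right image on the same row. The function name `ssd_disparity` and its border handling (border pixels left at 0) are illustrative choices, not taken from the article.

```python
def ssd_disparity(left, right, max_disp, radius):
    """Winner-takes-all SSD block matching on a rectified image pair.

    left, right: equal-size lists of rows of grey levels.
    Returns a list of rows of integer disparities. Pixels whose window or
    search range would leave the image stay at 0 (a sketch simplification).
    """
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(radius, h - radius):
        # Leave room for the window on the right, and for the largest
        # shift plus the window on the left.
        for x in range(radius + max_disp, w - radius):
            best_d, best_cost = 0, None
            for d in range(max_disp + 1):
                cost = 0
                # Matching cost: squared intensity difference at disparity d,
                # aggregated over a square window at constant disparity.
                for j in range(-radius, radius + 1):
                    for i in range(-radius, radius + 1):
                        diff = left[y + j][x + i] - right[y + j][x + i - d]
                        cost += diff * diff
                # Keep the minimal (winning) aggregated cost.
                if best_cost is None or cost < best_cost:
                    best_d, best_cost = d, cost
            disp[y][x] = best_d
    return disp


# Synthetic pair: the right image is the left image shifted 2 pixels left.
W, H = 12, 5
left = [[x * x for x in range(W)] for _ in range(H)]
right = [[left[y][x + 2] if x + 2 < W else 0 for x in range(W)] for y in range(H)]
print(ssd_disparity(left, right, max_disp=4, radius=1)[2])
```

The four nested loops are exactly the repeated, independent arithmetic that an FPGA can unroll into parallel hardware.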

Correspondence is another function best performed by the FPGA in an image processing system. Although each step is relatively simple, image correspondence is repeated millions of times across the pixels of an image, a challenging workload for a DSP processor that executes operations serially.
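A quick count shows why the workload grows so fast. The frame size, disparity range, window size, and frame rate below are illustrative assumptions, not figures from the article. The count is for the brute-force method; implementations that reuse overlapping window sums need far fewer operations.

```python
def ssd_ops_per_frame(width, height, disparities, window):
    """Squared-difference evaluations for brute-force SSD matching:
    each pixel, at each candidate disparity, sums a window x window block."""
    return width * height * disparities * window * window


# Illustrative parameters only: VGA frames, 64 candidate disparities,
# a 7 x 7 window, 30 frames per second.
per_frame = ssd_ops_per_frame(640, 480, 64, 7)
per_second = per_frame * 30
print(f"{per_frame:,} per frame, {per_second:,} per second")
# prints: 963,379,200 per frame, 28,901,376,000 per second
```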

**Epipolar geometry**
Epipolar geometry describes the geometric relationship between two camera views, and it is often used to reduce the correspondence search from two dimensions to one, minimizing the computation required. Figure 2 illustrates the concept of epipolar point projection. A given 3-D point P = [x, y, z] is projected onto the two images at points p1 = [u1, v1] and p2 = [u2, v2]. The epipolar plane is defined by the point P and the two camera optical centers C1 and C2. This plane (PC1C2) intersects the two image planes along the two epipolar lines, Ep1 and Ep2.

Ep1 passes through two points: E1 and p1. E1 and E2, called epipolar points (epipoles), are the points where the baseline C1C2 intersects each image plane. For a point in the first image, its match in the second image must lie on the corresponding epipolar line in the second image. This epipolar constraint reduces the correspondence search from two dimensions to one. Calculating the epipolar lines involves geometric projections that an FPGA implementation can compute at high speed.
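The epipolar constraint can be sketched with the essential matrix E = [t]x R. This version assumes normalized image coordinates (intrinsic calibration already removed) and a second camera whose frame satisfies X2 = R X1 + t. The helper names are hypothetical. For a point p1 in image 1, the product E [u1, v1, 1] gives the epipolar line (a, b, c) in image 2, and its match p2 must satisfy a*u2 + b*v2 + c = 0.

```python
import math


def skew(t):
    """Cross-product matrix [t]x, so that skew(t) applied to v equals t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]


def matmul3(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec3(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def essential_matrix(r, t):
    """E = [t]x R for a second camera with X2 = R X1 + t (assumed convention)."""
    return matmul3(skew(t), r)


def epipolar_line(e, p1):
    """Line (a, b, c), meaning a*u + b*v + c = 0, in image 2 on which the
    match of image-1 point p1 = (u1, v1) must lie."""
    u1, v1 = p1
    return matvec3(e, [u1, v1, 1.0])


def project(x):
    """Pinhole projection to normalized image coordinates (focal length 1)."""
    return (x[0] / x[2], x[1] / x[2])


# Example: second camera rotated 0.1 rad about y and offset along x.
theta = 0.1
R = [[math.cos(theta), 0.0, math.sin(theta)],
     [0.0, 1.0, 0.0],
     [-math.sin(theta), 0.0, math.cos(theta)]]
t = [-1.0, 0.0, 0.0]
P1 = [0.5, 0.2, 2.0]
P2 = [sum(R[i][k] * P1[k] for k in range(3)) + t[i] for i in range(3)]
a, b, c = epipolar_line(essential_matrix(R, t), project(P1))
u2, v2 = project(P2)
print(a * u2 + b * v2 + c)  # ~0: the match lies on the epipolar line
```

With R equal to the identity and t along the x axis (a rectified rig), the line reduces to v = v1, so the search for a match runs along a single image row.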

**Disparity maps**
A disparity map is an image in which each pixel's intensity represents the disparity at that point. Because disparity is inversely proportional to distance (for a rectified camera pair, depth Z = f·B/d, where f is the focal length, B the baseline, and d the disparity), the map encodes the depth of the scene from the cameras. Accurate depth measurement is particularly crucial for autonomous vehicles avoiding collisions and for robots positioning grippers. Disparity maps often contain gaps where correspondences are missing, for example because of perspective distortions. These gaps are filled using linear interpolation and median filters, which again are best implemented in an FPGA that parallelizes pixel operations effectively.
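A minimal sketch of the two gap-filling steps, assuming missing correspondences are marked with a hypothetical `INVALID` value of 0 (real systems may use a different marker, since a disparity of 0 can also mean a point at infinity). Gaps are filled by row-wise linear interpolation between the nearest valid neighbours, and the result is then smoothed with a 3 x 3 median filter that leaves border pixels unchanged. Both function names are illustrative.

```python
from statistics import median

INVALID = 0  # hypothetical marker for pixels with no correspondence


def fill_row(row):
    """Linearly interpolate runs of INVALID pixels between valid neighbours.
    Runs that touch the row's ends copy the nearest valid value."""
    out = list(row)
    valid = [i for i, v in enumerate(row) if v != INVALID]
    if not valid:
        return out
    for i in range(len(row)):
        if row[i] != INVALID:
            continue
        left = max((k for k in valid if k < i), default=None)
        right = min((k for k in valid if k > i), default=None)
        if left is None:
            out[i] = row[right]
        elif right is None:
            out[i] = row[left]
        else:
            out[i] = row[left] + (i - left) * (row[right] - row[left]) / (right - left)
    return out


def median3x3(img):
    """3 x 3 median filter; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [list(r) for r in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + j][x + i] for j in (-1, 0, 1) for i in (-1, 0, 1)]
            out[y][x] = median(window)
    return out


disp = [[4, 0, 0, 10], [4, 5, 30, 10], [4, 6, 8, 10]]
print(median3x3([fill_row(r) for r in disp]))
```

Each output pixel depends only on a small neighbourhood, so both filters map naturally onto parallel pixel pipelines.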

**Two processors: Big visual payoff**
3-D stereo image processing is just one of many video and imaging applications that increasingly demand very high signal processing performance, along with the flexibility to meet system performance and cost requirements. Going forward, developers of these systems will use the combined power of DSP processors and high-performance FPGAs to execute demanding algorithms in real time.

A current example of this trend is the Valde Systems VS1502 customizable stereo image processor, which uses an FPGA to perform computationally intensive functions, reducing the processing load on the DSP. This approach enables high performance, low power consumption, and a compact form factor, making the VS1502 well suited to industrial automation applications such as inspection and vision-guided robotics, as well as facial recognition and surveillance cameras.