Big images, complex processing? Think objects
Got a 4K x 4K image to process? Equalizing, correcting, interpolating, adjusting, and filtering every pixel at 35 fps? The Field Programmable Object Array enters the fray and shrinks the processing hardware required.
Today's real-time image processing hardware is feeling the pinch. Gigabit Ethernet links feed multi-million pixel, full color space video to frame grabbers. Multi-core processors, fed by high bandwidth PCI Express ports, consume image-processed data as fast as it can be delivered. It's no wonder that traditional reprogrammable devices are straining to keep pace across a range of automated inspection, security/surveillance, and professional video applications.
The Field Programmable Object Array (FPOA) is a high performance device that operates at speeds up to 1 GHz and is programmed at the object level. FPOAs are especially well suited to meet the requirements of ultra-fast, high resolution image processing applications. The key to their performance is hundreds of objects that pass data and control to each other through a patented interconnect fabric.
The MathStar Arrix family of FPOAs provides 256 Arithmetic Logic Unit (ALU), 80 Register File (RF), and 64 Multiply Accumulator (MAC) objects. The objects and the interconnect fabric run on a common core clock, operating deterministically at 1 GHz in the Arrix family's fastest device. This determinism enables a designer to select a core clock that meets the desired memory, I/O, and processing requirements.
An image processing example
In a high-end image processing implementation, 4K x 4K pixel color images are fed to an FPOA from the electro-optics of a high performance color CCD or CMOS camera as a single color channel at 35 frames per second. Use of a 900 MHz Arrix device enables external memory transfers at the FPOA's 300 MHz top memory bandwidth.
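The data rates implied by these numbers are easy to sanity-check with a quick calculation (a Python sketch; the 600 M and 1.8 G figures quoted later in the article are roundings of the values computed here):

```python
# Back-of-the-envelope throughput check for the 4K x 4K, 35 fps example.
width, height, fps = 4096, 4096, 35

pixels_per_frame = width * height      # 16,777,216 pixels per frame
pixel_rate = pixels_per_frame * fps    # ~587 Mpixel/s (rounded to 600 M in the text)

# After demosaicing, each pixel carries three color components,
# tripling the component rate through the downstream stages.
component_rate = pixel_rate * 3        # ~1.76 G components/s (~1.8 G in the text)

print(f"{pixel_rate / 1e6:.0f} Mpixel/s, {component_rate / 1e9:.2f} G components/s")
```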
The implementation has several stages:
- Non-linearities from process variations in the sensor array are corrected in the flat-field processing stage, equalizing pixel gain across the array and correcting for dark current flow in the sensor array.
- The Bayer demosaicing process generates RGB pixels by interpolating the color value for each pixel based on the color filter array architecture. Demosaicing increases the number of color channels to a total of three, effectively tripling the processing load.
- RGB data is then adjusted across all color-space components for the ambient lighting qualities or targeted display qualities in the color balancing processing block.
- Lastly, the image is filtered using a 3 x 3 convolution kernel before being driven out of the FPOA at the pixel rate.
Flat-field correction, Bayer demosaicing, color balancing and spatial filtering are mapped to the FPOA, as shown in Figure 1.
Flat-field correction is used to adjust image-sensor output data to ensure that constant intensity images generate constant pixel values at various levels of image intensity. This process addresses three types of pixel-based non-uniformities: gain, dark current offset, and defective pixels.
To rectify these non-uniformities, both a calibration and correction process must be performed. The calibration process determines the correction factors for pixel gain and offset and generates a defective pixel map. The correction process calculates an appropriate value for non-uniform pixels, and includes a dead pixel correction unit. Gain and offset correction can be performed using a single MAC object running at the core clock rate. Correction factor memory resides off-chip and can be accessed as two correction factor pairs every memory clock cycle.
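The per-pixel arithmetic of this stage can be modeled in a few lines of Python. This is an illustrative software sketch, not MathStar's object-level code; the standard flat-field form corrected = gain x (raw - dark offset) and the stand-in dead-pixel replacement value are assumptions for the example (a real correction unit would typically interpolate a defective pixel from its neighbors):

```python
# Software model of the gain/offset correction performed by a single MAC
# object, plus a simplified stand-in for the dead pixel correction unit.
def flat_field_correct(raw, gain, offset, defective, replacement=0):
    """corrected[i] = gain[i] * (raw[i] - offset[i]), per pixel.

    raw, gain, offset: equal-length lists (one entry per pixel).
    defective: set of pixel indices from the calibration-time defect map.
    Defective pixels get a placeholder value instead of being corrected.
    """
    out = []
    for i, p in enumerate(raw):
        if i in defective:
            out.append(replacement)            # dead pixel substitution
        else:
            out.append(gain[i] * (p - offset[i]))
    return out

# Example: pixel 2 is marked defective; the rest get gain/offset applied.
corrected = flat_field_correct([100, 120, 999, 90],
                               gain=[1.1, 0.9, 1.0, 1.2],
                               offset=[10, 10, 10, 10],
                               defective={2})
```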
A Bayer color filter array is an optical filter that routes specific wavelengths of light to a particular photosensor in a digital camera's sensor array. Each photosensor detects the intensity of a certain light band. After filtering, the classical Bayer filter decomposes light into red, green, and blue components.
The digital processing associated with Bayer filters determines the missing color values for a given pixel (for example, the red and green levels for a blue-sensitive sensor). Reconstructing the missing pixel colors of the color filter array (CFA) is known as demosaicing. This processing stage interpolates the missing color values of the CFA sensor pattern and generates a fully populated RGB image.
Pixel size at the input is a single red, green, or blue value; at the output, each pixel consists of red, green, and blue components. Given a frame rate of 35 fps for the 4K x 4K image, this interpolation stage increases the processing throughput from 600 M to 1.8 G color components/sec. Eighteen ALUs support a 600 Mpixel/sec input, generating three parallel outputs for each pixel (shown in Figure 2), interpolating in a 3 x 3 region about the pixel of interest.
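A simplified software model of 3 x 3 demosaicing is sketched below. The RGGB tiling and the window-averaging rule are assumptions for illustration (the article does not specify the CFA layout or interpolation weights, and true bilinear demosaicing would keep each pixel's native sample unchanged rather than averaging it with same-color neighbors):

```python
# Simplified 3 x 3 demosaicing sketch for an assumed RGGB Bayer pattern.
# Pure-Python model for clarity; edge pixels are skipped.
def bayer_color(y, x):
    """Which color the sensor at (y, x) captured, RGGB tiling assumed."""
    if y % 2 == 0:
        return 'R' if x % 2 == 0 else 'G'
    return 'G' if x % 2 == 0 else 'B'

def demosaic(cfa):
    """Estimate each channel as the mean of that channel's samples
    in the 3 x 3 window about the pixel of interest."""
    h, w = len(cfa), len(cfa[0])
    rgb = [[None] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            sums = {'R': [0, 0], 'G': [0, 0], 'B': [0, 0]}
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    c = bayer_color(y + dy, x + dx)
                    sums[c][0] += cfa[y + dy][x + dx]
                    sums[c][1] += 1
            rgb[y][x] = tuple(sums[c][0] / sums[c][1] for c in 'RGB')
    return rgb
```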
The color balancing operation utilizes a nine-entry color transform matrix to correct for monitor or lighting irregularities. A fully parallel color balancer implementation at the core clock rate utilizes three MAC objects for each row of the color transform. A single RF per MAC for table distribution utilizes the local memory resources, localizes data into each MAC, and supports up to 64 correction tables.
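The nine-entry transform is simply a 3 x 3 matrix applied to each RGB pixel; each coefficient corresponds to one multiply-accumulate, which is why three MACs cover one matrix row. A minimal Python model (the identity matrix stands in for a real correction table, whose coefficients would come from calibration):

```python
# Color balancing as a 3 x 3 matrix multiply: one multiply-accumulate
# per matrix entry, three per output color component.
def color_balance(rgb, m):
    """Apply a 3 x 3 color transform m (row-major nested lists) to one pixel."""
    r, g, b = rgb
    return tuple(m[i][0] * r + m[i][1] * g + m[i][2] * b for i in range(3))

# The identity transform leaves the pixel unchanged; a real correction
# table would hold monitor- or illuminant-specific coefficients.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
balanced = color_balance((200, 150, 100), identity)
```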
Image filtering for smoothing, gradient or edge enhancement, and sharpening is accomplished with a two-dimensional convolution using a 3 x 3 pixel mask. A fully parallel implementation of the filter is capable of calculating the result for one input every core clock. Since each color component needs to be processed separately with potentially separate masks per color component, a total of three filter processing blocks are required.
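The filter stage can likewise be modeled as a direct 3 x 3 convolution. This sequential Python sketch computes what the fully parallel hardware produces once per core clock; the box (smoothing) kernel is just one illustrative mask choice, and as the text notes, each color plane would be run through its own instance:

```python
# Direct 3 x 3 convolution over one color plane (edges left at zero).
def convolve3x3(img, kernel):
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    acc += kernel[ky][kx] * img[y + ky - 1][x + kx - 1]
            out[y][x] = acc
    return out

# A smoothing (box) mask; gradient or sharpening masks drop in the same way.
box = [[1 / 9] * 3 for _ in range(3)]
```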
Performing flat-field correction, demosaicing, balancing, and filtering stages for a 4K x 4K, 35 frame per second, full color space frame grabber is a tall order, but is easily handled by a high performance FPOA. Further, this image processing chain utilizes just 50 percent of the objects in a single FPOA, providing room for additional image processing functions.