Joint FPGA-DSP grab, squeeze, and send effort sees video compression success
Tim walks us through the three main phases of video compression and explains how addressing the requirements of each stage with a hybrid FPGA-DSP approach adds efficiency.
Video compression in the embedded computing market is little different from any other application, in that the requirement is to achieve maximum performance in an environment that is constrained in terms of size, weight, and power. Inevitably, trade-offs must be made, but combining DSP and FPGA technology helps keep their impact on performance to a minimum.
There are three primary steps in video compression: video capture (Figure 1), compression, and encapsulation and transmission. A hybrid FPGA-DSP solution efficiently meets the requirements for each stage.
Step 1: Video capture
Video transmission standards continue to evolve as new technologies are developed. In embedded systems markets, however, legacy standards must be supported alongside newly emerging ones. An existing military system might have numerous sensors integrated into the fuselage, and a new processing platform must support whatever video standard was originally implemented.
A second argument for supporting a broad range of video standards is that certain military programs must adhere to strict guidelines regarding the standards. STANAG 3350 video standards, for instance, require specific video capture capabilities. Supporting multiple video standards, while still presenting the data in a common format to a compression engine, is crucial. An intelligent processing node must be used to accept data from any of the dissimilar inputs and present a common output to the compression engine.
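As a rough illustration of what presenting a common output involves, the Python sketch below converts frames that arrive as RGB into a single YCbCr representation before they reach the compression engine. The Frame descriptor, the "rgb"/"ycbcr" labels, and the choice of studio-range BT.601 YCbCr as the common format are assumptions made for this example. A real FPGA performs this conversion per pixel on a streaming interface, not on Python lists.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical frame descriptor used only for this illustration."""
    width: int
    height: int
    color_space: str   # "rgb" or "ycbcr" (illustrative labels)
    pixels: list       # row-major list of 8-bit (c0, c1, c2) tuples


def rgb_to_ycbcr_bt601(r: int, g: int, b: int) -> tuple:
    """Full-range 8-bit RGB to studio-range (16-235 / 16-240) BT.601 YCbCr."""
    y = 16 + (65.481 * r + 128.553 * g + 24.966 * b) / 255
    cb = 128 + (-37.797 * r - 74.203 * g + 112.0 * b) / 255
    cr = 128 + (112.0 * r - 93.786 * g - 18.214 * b) / 255
    return round(y), round(cb), round(cr)


def to_common_format(frame: Frame) -> Frame:
    """Return the frame in the assumed common format (BT.601 YCbCr)."""
    if frame.color_space == "ycbcr":
        return frame  # already in the common format: pass through unchanged
    if frame.color_space == "rgb":
        converted = [rgb_to_ycbcr_bt601(*p) for p in frame.pixels]
        return Frame(frame.width, frame.height, "ycbcr", converted)
    raise ValueError(f"unsupported color space: {frame.color_space}")
```

Other normalization steps (re-timing, frame-size changes) would follow the same pattern: each input-specific path ends in the one format the compression engine expects.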
Step 2: Compression
The compression engine (Figure 2) implements the core video codec standard, such as H.264 or JPEG2000, which converts the video data into a bitstream. The resulting stream is anywhere from about three times smaller (in lossless compression examples) to 100 or more times smaller than the raw video. The key measurements of any embedded compression engine are throughput, latency, and flexibility. Popular codecs have included MPEG-2, JPEG, MPEG-4 Part 2, VC-1, JPEG2000, and ITU-T H.264 (aka MPEG-4 AVC). MPEG-2 was the earliest codec to achieve a broad installed base of applications. Today, H.264 is the preferred codec when a broad range of compression ratios is necessary and/or the bandwidth available for transferring the compressed data may be severely restricted; it has thus become the most common choice for embedded applications. JPEG2000 comes into play when lossless compression or frame subsampling is needed. An additional consideration in the choice of codec is the complexity of the decompression task: some codec standards require more processing power to decompress than others.
To meet the varying demands for embedded video compression, convenient switching between several video codecs – notably H.264, JPEG2000, and MPEG-2 – while retaining the same hardware platform is a significant plus. This approach also means that a system supplier only needs to qualify and validate a single hardware platform, which can then be software-configured for a range of video codec requirements.
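The single-platform, software-configured approach can be sketched as a codec setting on one compression node. The selection rules follow the guidance given for the compression engine (JPEG2000 when lossless compression is needed, H.264 for a broad range of ratios or constrained bandwidth). The MPEG-2 branch for legacy receivers, the LinkRequirements flags, and the CompressionNode class are hypothetical names for illustration only, not a vendor API.

```python
from dataclasses import dataclass

# Codecs named in the text that one platform might be configured for.
SUPPORTED_CODECS = ("H.264", "JPEG2000", "MPEG-2")


@dataclass(frozen=True)
class LinkRequirements:
    """Hypothetical requirement flags, for illustration only."""
    lossless: bool = False           # must the decoded video be bit-exact?
    legacy_mpeg2_only: bool = False  # assumption: receivers decode MPEG-2 only


def select_codec(req: LinkRequirements) -> str:
    """Pick a codec using the rules of thumb described for the engine."""
    if req.lossless:
        return "JPEG2000"   # lossless mode
    if req.legacy_mpeg2_only:
        return "MPEG-2"     # large installed base of decoders
    return "H.264"          # wide ratio range, tight bandwidth


@dataclass
class CompressionNode:
    """One hardware platform; the codec is purely a software setting."""
    codec: str = "H.264"

    def reconfigure(self, codec: str) -> None:
        if codec not in SUPPORTED_CODECS:
            raise ValueError(f"codec not available in this build: {codec}")
        # On real hardware this step would load a different codec image
        # onto the DSP; here it only records the selection.
        self.codec = codec
```

The point of the design is that qualification covers CompressionNode once; changing codecs is a call to reconfigure, not a new board.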
Step 3: Bitstream encapsulation
The encapsulation phase (Figure 3) occurs after the compression engine generates the bitstream. Its purpose is to create a sequence of packets that can be properly routed to, and interpreted and decompressed by, a remote display. The typical communication mechanism is RTP over UDP on Ethernet. This combination uses bandwidth efficiently enough to stream real-time compressed video, but because UDP does not retransmit data, packets can be lost or dropped along the way. The encapsulation method must therefore tolerate lost packets without significant disruption to the video display. The encapsulated bitstream is called a Transport Stream (TS), and the most widely adopted standard is the MPEG-2 TS. MPEG-2 TS was developed alongside MPEG-2 compression systems to distribute compressed video to receiver units in a standard format. It is still widely used today for newer compression schemes because of the large installed base of systems that know how to interpret MPEG-2 TS packets.
With the advent of H.264 and JPEG2000 compression, the transport stream question is being revisited. H.264 can be carried in an MPEG-2 transport stream, but JPEG2000 cannot. Alternative RTP/UDP payload formats exist for both H.264 and JPEG2000, often used in conjunction with parallel RTP Control Protocol (RTCP) and/or Real Time Streaming Protocol (RTSP) streams. The JPEG 2000 Interactive Protocol (JPIP) has also been developed to optimize bandwidth utilization between a sender and receiver in a JPEG2000 system. The design of a video compression solution should allow for various encapsulation engines, depending on the codec used and the communication protocols supported by the receivers of the compressed data.
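To show why MPEG-2 TS copes reasonably well with dropped packets, the following Python sketch splits one payload unit into fixed 188-byte TS packets and then uses the 4-bit continuity counter in each packet header to count packets missing on a PID. It assumes the payload is an already-formed PES packet and omits the PES layer itself, the PAT/PMT tables, and the PCR timing that a complete multiplexer needs. The PID in the test is arbitrary.

```python
TS_PACKET_SIZE = 188
TS_PAYLOAD_SIZE = TS_PACKET_SIZE - 4  # 4-byte header leaves 184 payload bytes
SYNC_BYTE = 0x47


def packetize(payload: bytes, pid: int, start_cc: int = 0) -> list[bytes]:
    """Split one payload unit (e.g. a PES packet) into 188-byte TS packets."""
    packets = []
    cc = start_cc
    for offset in range(0, len(payload), TS_PAYLOAD_SIZE):
        chunk = payload[offset:offset + TS_PAYLOAD_SIZE]
        pusi = 1 if offset == 0 else 0  # payload_unit_start_indicator
        header = bytes([SYNC_BYTE, (pusi << 6) | ((pid >> 8) & 0x1F), pid & 0xFF])
        if len(chunk) == TS_PAYLOAD_SIZE:
            afc, body = 0b01, chunk      # payload only
        else:
            # Short final chunk: pad with an adaptation field of stuffing bytes.
            afc = 0b11                   # adaptation field + payload
            af_len = TS_PAYLOAD_SIZE - 1 - len(chunk)
            stuffing = b"" if af_len == 0 else b"\x00" + b"\xff" * (af_len - 1)
            body = bytes([af_len]) + stuffing + chunk
        packets.append(header + bytes([(afc << 4) | cc]) + body)
        cc = (cc + 1) & 0x0F             # 4-bit counter wraps at 16
    return packets


def count_missing(packets: list[bytes]) -> int:
    """Count lost packets on one PID from gaps in the continuity counter.

    Assumes every packet carries payload. A repeated counter value is a
    permitted duplicate; a run of exactly 16 lost packets is undetectable.
    """
    missing, prev = 0, None
    for pkt in packets:
        cc = pkt[3] & 0x0F
        if prev is not None and cc != prev:
            missing += (cc - prev - 1) & 0x0F
        prev = cc
    return missing
```

Because each packet is self-delimiting and the counter exposes gaps, a receiver can resynchronize on the next 0x47 sync byte and conceal the damage instead of losing the whole stream.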
A proposed hybrid FPGA/DSP solution
High performance and flexibility characterize ideal video compression, and this is where a combination of FPGA and DSP technology enters the picture. Current solutions for embedded video compression typically pair an FPGA with an ASIC: the ASIC performs the compression and returns a bitstream. Alternatively, the compression algorithm can be implemented inside a larger FPGA with embedded DSP processing blocks. In either case, the bitstream must still be transferred to a host CPU to complete the encapsulation task, and changing from one codec engine implementation to another (for example, from H.264 to JPEG2000) requires either new hardware or a new firmware load. Figure 4 shows an alternative approach using a hybrid DSP device, such as a Texas Instruments DaVinci DSP processing platform, which also includes a general-purpose ARM9 core.
Step 1 solution: Video capture
A hybrid FPGA-DSP solution successfully meets the requirements described earlier for a number of reasons.
Many of the video standards described earlier require specialized ADC or decoder circuitry to convert them into a digital format that a compression algorithm can process. Supporting a large number of dissimilar input standards creates a very large multiplexing task, which a low-cost FPGA handles effectively. The FPGA can receive various color spaces, pixel frequencies, and frame sizes while performing de-interlacing and re-timing, and then forward the data, in a common format, to the compression engine.
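The de-interlacing and frame-size normalization described above can be illustrated with two of the simplest possible algorithms: weave, which interleaves the two fields of an interlaced frame, and nearest-neighbor resizing. This Python sketch works on lists of pixel rows purely for clarity. It is an assumption that these particular methods would be used; production FPGA designs commonly use motion-adaptive de-interlacing and filtered scaling to avoid combing and aliasing artifacts.

```python
def weave(top_field: list[list[int]], bottom_field: list[list[int]]) -> list[list[int]]:
    """Interleave two fields into one progressive frame (weave de-interlacing)."""
    if len(top_field) != len(bottom_field):
        raise ValueError("both fields must have the same number of lines")
    frame = []
    for top_row, bottom_row in zip(top_field, bottom_field):
        frame.append(list(top_row))     # even output lines come from the top field
        frame.append(list(bottom_row))  # odd output lines come from the bottom field
    return frame


def resize_nearest(frame: list[list[int]], out_w: int, out_h: int) -> list[list[int]]:
    """Scale a frame to out_w x out_h by nearest-neighbor sampling."""
    in_h, in_w = len(frame), len(frame[0])
    return [[frame[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]
```

For example, two 240-line fields weave into one 480-line frame, which resize_nearest could then bring to whatever common frame size the compression engine is configured for.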
Step 2 solution: Compression engine
The compression engine can be implemented in various ways:
· An ASIC that hard-codes a specific algorithm can be used.
· The logic and memory available inside an FPGA can be leveraged.
· A DSP device can be used.
The approach taken in this hybrid DSP-FPGA solution is to implement the compression engine in a DSP device. A DSP is well suited to this task because of its built-in image manipulation primitives and pipelined architecture, and tools exist that make it relatively easy to customize the DSP engine for different compression standards. The compression engine can be changed with a single software update; unlike an ASIC or FPGA solution, no new board design is required. The TI DaVinci chipset is specifically designed for image processing and video compression. In some cases, newer versions of the hybrid DSP-CPU device are pin-compatible with older ones and can be substituted without significant changes to the PCB design, which makes this architecture an excellent choice for implementing the compression engine.
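As an example of the kind of image-processing primitive a DSP pipeline accelerates, here is the 4x4 forward core integer transform used by H.264, written in the add-and-shift butterfly form that maps well onto DSP instructions. This is a sketch of one kernel only. It is not the DaVinci codec implementation, and it leaves out the scaling that H.264 folds into quantization, as well as intra prediction, motion estimation, and entropy coding.

```python
def _core_1d(x0: int, x1: int, x2: int, x3: int) -> list[int]:
    """1-D H.264 forward core transform using only adds and shifts."""
    s0, s1 = x0 + x3, x1 + x2   # even-part butterflies
    d0, d1 = x0 - x3, x1 - x2   # odd-part butterflies
    return [s0 + s1, (d0 << 1) + d1, s0 - s1, d0 - (d1 << 1)]


def forward_core_4x4(block: list[list[int]]) -> list[list[int]]:
    """Compute Cf * X * Cf^T for a 4x4 residual block X.

    Cf = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]].
    Rows are transformed first, then columns (the transform is separable).
    """
    rows = [_core_1d(*row) for row in block]
    cols = [_core_1d(*(rows[r][c] for r in range(4))) for c in range(4)]
    # cols[c][r] holds output element (r, c); transpose back to row-major.
    return [[cols[c][r] for c in range(4)] for r in range(4)]
```

A flat block of ones, for instance, transforms to a single DC coefficient of 16 with all other coefficients zero, which is exactly the energy compaction the later quantization and entropy-coding stages exploit.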
Step 3 solution: Encapsulation engine
The final step in the process is encapsulating the bitstream into a transport stream that can be distributed over Ethernet or another communication link. This step often requires inserting metadata into the stream, such as timestamps and GPS location, and can also require transmitting audio data along with the compressed video stream. In the hybrid DSP-FPGA solution, the DSP device itself (such as the TI DaVinci chipset) contains both a DSP core and a general-purpose CPU, so it can perform this part of the task as well. After the DSP core has finished processing the video data and the bitstream is available, application software running on the ARM9 can implement the encapsulation engine and send the resulting stream out through the device's Ethernet MAC. This is a compact, convenient, and efficient solution to the complete video compression task (Figure 5).
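A minimal sketch of the timestamping part of such an encapsulation engine, assuming MPEG-2 TS carried over RTP: it groups seven 188-byte TS packets per datagram (1,316 bytes, which keeps the IP/UDP/RTP packet under a 1,500-byte Ethernet MTU) and prefixes each group with a 12-byte RTP header carrying payload type 33 (the static MPEG-2 TS assignment) and a 90 kHz timestamp. The function names, SSRC, and send time are illustrative assumptions. For simplicity every datagram produced by one call shares a timestamp; GPS or other metadata insertion and audio multiplexing are not shown.

```python
import struct

RTP_VERSION = 2
PT_MP2T = 33              # static RTP payload type for MPEG-2 TS
RTP_CLOCK_HZ = 90_000     # 90 kHz RTP clock used for MPEG-2 TS
TS_PACKET_SIZE = 188
TS_PACKETS_PER_RTP = 7    # 7 x 188 = 1316 payload bytes per datagram


def rtp_header(seq: int, timestamp: int, ssrc: int,
               payload_type: int = PT_MP2T, marker: bool = False) -> bytes:
    """Build a 12-byte RTP fixed header (no padding, extension, or CSRCs)."""
    byte0 = RTP_VERSION << 6                          # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def to_rtp_timestamp(seconds: float) -> int:
    """Convert seconds to a wrapped 32-bit 90 kHz RTP timestamp."""
    return int(round(seconds * RTP_CLOCK_HZ)) & 0xFFFFFFFF


def packetize_rtp(ts_packets: list[bytes], ssrc: int, first_seq: int,
                  send_time_s: float) -> list[bytes]:
    """Group TS packets into RTP datagrams, up to 7 TS packets each."""
    if any(len(p) != TS_PACKET_SIZE for p in ts_packets):
        raise ValueError("every TS packet must be 188 bytes")
    timestamp = to_rtp_timestamp(send_time_s)  # simplification: one per call
    datagrams = []
    for i in range(0, len(ts_packets), TS_PACKETS_PER_RTP):
        seq = first_seq + i // TS_PACKETS_PER_RTP  # wrapped to 16 bits in header
        group = ts_packets[i:i + TS_PACKETS_PER_RTP]
        datagrams.append(rtp_header(seq, timestamp, ssrc) + b"".join(group))
    return datagrams
```

Each returned datagram would then be handed to a UDP socket (for example with socket.sendto) for transmission; the RTP sequence number lets the receiver detect loss at the datagram level, complementing the per-PID TS continuity counter.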
The task of interfacing with numerous dissimilar video standards, compressing them to a common standard, and transmitting the result over an Ethernet link is well suited to a solution that implements an FPGA followed by a hybrid device combining a general-purpose CPU with a DSP engine. One company taking this hybrid approach is GE Intelligent Platforms, which has significant experience and a broad product range in combining FPGA and DSP technologies for demanding graphics and video processing applications. Beyond efficiency and flexibility, the benefits include eliminating compression and encapsulation overhead on the host CPU. Software-only updates allow for easy upgrades and maintenance. System-level demands are also well managed, especially requirements for minimal size, maximum processing power, and limited power dissipation. The system described can be implemented with less than 15 W of total power dissipation while supporting compression at 1080p (or 1600x1200) frame sizes.
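To put the 1080p figure and the compression ratios quoted for the compression engine in perspective, the short calculation below estimates uncompressed and compressed bit rates. The 30 frames/s rate and 8-bit 4:2:2 sampling (16 bits per pixel) are assumptions; the article does not state them.

```python
def raw_bitrate_bps(width: int, height: int, fps: int, bits_per_pixel: int = 16) -> int:
    """Uncompressed video rate; 16 bpp assumes 8-bit YCbCr 4:2:2 sampling."""
    return width * height * bits_per_pixel * fps


def compressed_bitrate_bps(raw_bps: float, ratio: float) -> float:
    """Average bit rate after compressing by the given ratio (e.g. 3 for 3:1)."""
    return raw_bps / ratio


if __name__ == "__main__":
    for name, (w, h) in {"1080p": (1920, 1080), "1600x1200": (1600, 1200)}.items():
        raw = raw_bitrate_bps(w, h, fps=30)
        print(f"{name}: raw {raw / 1e6:.1f} Mbit/s, "
              f"3:1 {compressed_bitrate_bps(raw, 3) / 1e6:.1f} Mbit/s, "
              f"100:1 {compressed_bitrate_bps(raw, 100) / 1e6:.2f} Mbit/s")
```

Under these assumptions, 1080p works out to roughly 995 Mbit/s uncompressed, about 332 Mbit/s at 3:1, and about 10 Mbit/s at 100:1. That suggests why lossless streams need a Gigabit Ethernet link, while a heavily compressed H.264 stream fits easily on a 100 Mbit/s link.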
Tim Klassen is a Professional Engineer (P.Eng). He graduated from the University of Waterloo in electrical engineering/computer engineering, and joined Focus Automation Systems as a developer of image processing algorithms in 1997. In 2004, he joined SBS – since acquired by GE Intelligent Platforms – where he is Video Products Team Leader, working in graphics product engineering. He can be reached at firstname.lastname@example.org.
GE Intelligent Platforms