The DSP solution spectrum for medical imaging applications using FPGAs
One such instrument that requires complex DSP algorithms is the Computerized Axial Tomography (CAT) scanner, which captures a series of cross-sectional views of body projections that can be intelligently assembled into three-dimensional images. Very large computational bandwidths are required when multiple projections are transformed into two- and three-dimensional views. The processing performance of the image-rendering system determines how quickly a physician gets the necessary data to diagnose and treat the patient.
The first stage of a CAT scanner is data acquisition, where radiation-level samples from multiple detectors are converted from analog to digital, digitally filtered, and collected for further processing. The second stage, called back projection, uses the inverse Radon transform, a multiplication-intensive processing method in which the samples representing each angular projection are assembled into a two-dimensional slice of the body. As the X-ray source moves around the body, different angular projections generate new samples and thus new slices, which are ultimately combined to form a three-dimensional image.
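To make the back projection stage concrete, here is a minimal sketch in Python. It is a simplification under stated assumptions, not the scanner's actual algorithm: it assumes parallel-beam geometry, picks the nearest detector sample instead of interpolating, and omits the filtering step. The function name `back_project` and its parameters are hypothetical.

```python
import math

def back_project(sinogram, angles_deg, size):
    """Unfiltered back projection onto a size x size slice (illustrative only).

    sinogram[k][d] is the sample from detector d at angle angles_deg[k].
    Assumes parallel-beam geometry and nearest-neighbour sampling.
    """
    n_det = len(sinogram[0])
    centre = (size - 1) / 2.0        # image centre in pixel units
    det_centre = (n_det - 1) / 2.0   # detector array centre
    image = [[0.0] * size for _ in range(size)]
    for samples, angle in zip(sinogram, angles_deg):
        c = math.cos(math.radians(angle))
        s = math.sin(math.radians(angle))
        for row in range(size):
            y = row - centre
            for col in range(size):
                x = col - centre
                # Detector position crossed by the ray through (x, y):
                # two multiplies per pixel per angle, hence the heavy load.
                t = x * c + y * s + det_centre
                d = math.floor(t + 0.5)  # nearest detector index
                if 0 <= d < n_det:
                    image[row][col] += samples[d]
    return image
```

The inner statement runs once per pixel per projection angle, so the multiply count grows as the product of image area and number of projections. That growth is the source of the computational bandwidth described above.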
Traditionally, algorithms have been developed, coded and improved in software, primarily in the C language, and implemented on a General Purpose Processor (GPP) CPU. When more performance was required, the C code was ported to a DSP processor such as an Application Specific Standard Part (ASSP) and optimized in assembly code. If still higher performance was required, the software implementation was transformed into a high-level hardware description language and implemented in an Application Specific Integrated Circuit (ASIC). Each successive processing option required more implementation time and complexity as it diverged from the initial algorithm optimization, particularly in an ASIC.
Today, a new spectrum of processing options using Field Programmable Gate Arrays (FPGAs) exists, providing higher performance than GPPs and DSP ASSPs alone. Algorithm development is continually improved in software and then implemented with a mix of DSPs and/or FPGAs. One effective option uses a PCI-based FPGA accelerator board that is added to a GPP computer. For more performance without resorting to ASIC implementations, the additional spectrum of FPGA solutions includes:
- Using FPGAs to distribute and collect data to and from a matrix of DSP ASSPs
- Using the FPGA as a coprocessor or custom peripheral to a DSP
- Using embedded soft CPUs and the parallel DSP processing structures within the FPGA to eliminate the DSP ASSPs
Figure 1 shows these processing options. Let us take a deeper look at each.
The DSP matrix
In general, CAT scanners and similar diagnostic instruments collect huge amounts of data, requiring extremely complex algorithms to convert the data into meaningful images. Traditionally, DSPs have been used to process these algorithms. In a DSP ASSP approach, engineers have only two options to increase performance:
- Replace the DSP with a higher frequency model
- Rewrite portions of the software into pipelined assembly code
A third option, leveraging FPGAs, creates a DSP matrix where multiple DSPs are arrayed together to deliver parallel DSP processing. One or more FPGAs receive the entire data stream and distribute it among the DSPs, which perform the system-state machine management, compute linear pixel-to-pixel increments in the projection plane, and control the memory-and-accumulate module. The processed pixels are then sent to another FPGA for final accumulation, image reconstruction, and output to the monitor.
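The data flow of such a DSP matrix can be sketched in software. The following Python model is an illustration under assumptions, not a description of any real system: the round-robin distribution policy, the parallel-beam geometry, and all function names (`fpga_distribute`, `dsp_partial`, `fpga_accumulate`) are hypothetical. Each simulated DSP steps the detector coordinate by a fixed increment from one pixel to the next, which replaces a per-pixel multiply with an add.

```python
import math

def dsp_partial(sinogram, angles_deg, size):
    """One simulated DSP back-projects its share of the projections."""
    n_det = len(sinogram[0]) if sinogram else 0
    centre = (size - 1) / 2.0
    det_centre = (n_det - 1) / 2.0
    image = [[0.0] * size for _ in range(size)]
    for samples, angle in zip(sinogram, angles_deg):
        c = math.cos(math.radians(angle))
        s = math.sin(math.radians(angle))
        for row in range(size):
            # Detector coordinate of the first pixel in this row
            t = (0 - centre) * c + (row - centre) * s + det_centre
            for col in range(size):
                d = math.floor(t + 0.5)
                if 0 <= d < n_det:
                    image[row][col] += samples[d]
                t += c  # linear pixel-to-pixel increment: an add, no multiply
    return image

def fpga_distribute(sinogram, angles_deg, n_dsps):
    """Simulated front-end FPGA: deal projections round-robin to the DSPs."""
    return [(sinogram[i::n_dsps], angles_deg[i::n_dsps]) for i in range(n_dsps)]

def fpga_accumulate(partials):
    """Simulated back-end FPGA: sum the partial images pixel by pixel."""
    size = len(partials[0])
    return [[sum(p[r][c] for p in partials) for c in range(size)]
            for r in range(size)]
```

A possible usage: `fpga_accumulate([dsp_partial(s, a, 5) for s, a in fpga_distribute(sinogram, angles, 2)])`. Because back projection is a sum over angles, the accumulated result matches what a single DSP would produce from all projections.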
The core difficulty with using DSPs for all pixel processing relates to cost. DSPs are essentially serial machines, processing one element of the signal chain at a time. In some high-end DSPs, a small number of instructions can be processed simultaneously, providing a small degree of parallelization. However, the cost of these DSPs can be ten times that of a non-parallel version. Although multiple DSPs can be used to obtain more parallel processing, the costs rise very quickly as DSPs are added. Software for these systems becomes much more complex, requiring a real-time operating system or very careful interprocessor communications schemes. Even then, the degree of true parallelism is still relatively low and comes at a significant price in terms of design time, cost of goods, and time-to-market.
A better approach than using multiple DSPs combines a DSP ASSP with an FPGA. This approach leverages the existing code base of a single DSP software image together with the massively parallel processing resources embedded within the FPGA. The DSP continues to execute the majority of the code, including all complex control plane processing, while the FPGA acts as a custom peripheral or coprocessor that accelerates any processing-intensive code within the imaging algorithm, as shown in Figure 2. When the DSP ASSP reaches a processing-intensive section of code, it instructs the coprocessor to gather the data stored within the memory space via direct memory access, process the image, and notify the DSP ASSP when it completes its task.
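The handoff between the DSP and the coprocessor can be modelled in software as a rough sketch. The Python simulation below is only an analogy and makes several assumptions: a thread stands in for the FPGA, a list stands in for the shared memory space, a slice stands in for the DMA transfer, and a `threading.Event` stands in for the completion notification. The class and function names are hypothetical.

```python
import threading

class Coprocessor:
    """Software stand-in for an FPGA coprocessor (illustrative only)."""

    def __init__(self, memory):
        self.memory = memory           # shared memory space, modelled as a list
        self.done = threading.Event()  # completion signal back to the DSP
        self.result = None

    def start(self, base, length, coeffs):
        """Called by the DSP: hand over a job descriptor and return at once."""
        self.done.clear()
        worker = threading.Thread(target=self._run, args=(base, length, coeffs))
        worker.start()
        return worker

    def _run(self, base, length, coeffs):
        block = self.memory[base:base + length]  # 'DMA' gather from memory
        # Multiply-accumulate kernel, the part worth accelerating
        self.result = sum(x * k for x, k in zip(block, coeffs))
        self.done.set()                          # notify the DSP

def dsp_main(memory):
    """Simulated DSP: offload the kernel, keep doing control work, then wait."""
    copro = Coprocessor(memory)
    worker = copro.start(base=4, length=4, coeffs=[1, 2, 3, 4])
    control_steps = sum(range(10))  # control-plane code keeps running
    copro.done.wait()               # block until the completion signal
    worker.join()
    return control_steps, copro.result
```

The point of the pattern is that the DSP only passes a descriptor (where the data lives and how much of it there is) and never moves the image data itself.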
Today’s FPGAs have distributed banks of DSP blocks containing multitudes of multipliers and accumulators. The key to hardware acceleration is to leverage these DSP resources by finding software loops in the algorithm which can be made massively parallel. However, DSP blocks alone are not sufficient to accelerate performance. In medical imaging applications, such as CAT scanner and Magnetic Resonance Imaging (MRI) equipment, the input data consists of large blocks of scanned image projections residing in memory. The distributed and aggregated blocks of memories in FPGAs enable the large blocks of image data to be processed concurrently within the DSP blocks.
Finally, the use of reconfigurable logic and routing elements, the core building blocks within an FPGA, integrates all the parallel memory blocks and DSP elements together into an imaging co-processing engine. These FPGA features provide the flexibility necessary to create parallel processing engines for any imaging algorithm, with performance far beyond that of DSP ASSPs alone. Many equipment designers have converted a cascaded set of operations in a DSP requiring thousands of clock cycles into a parallel FPGA structure that operates in a few clock cycles at 200 MHz or more, an improvement of at least two orders of magnitude.
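A toy cycle-count model gives a feel for where an improvement of this size can come from. The Python sketch below rests on idealized assumptions that are not from the measurements cited in this article: one multiply-accumulate per cycle on the serial machine, a pipelined binary adder tree in the FPGA, and no memory or clock-rate limits. The function names are hypothetical.

```python
import math

def serial_mac_cycles(n_taps):
    """Cycles for a single MAC unit, assuming one multiply-accumulate per cycle."""
    return n_taps

def parallel_mac_cycles(n_taps, n_multipliers):
    """Cycles when n_multipliers work side by side and feed a binary adder tree.

    Idealized assumption: one multiply pass per cycle, plus the latency of a
    pipelined adder tree with ceil(log2(n_multipliers)) stages.
    """
    passes = math.ceil(n_taps / n_multipliers)
    tree_depth = math.ceil(math.log2(n_multipliers))
    return passes + tree_depth

# Example: a 2048-term sum of products
serial = serial_mac_cycles(2048)               # 2048 cycles
wide = parallel_mac_cycles(2048, 2048)         # 1 pass + 11 tree stages = 12
narrower = parallel_mac_cycles(2048, 256)      # 8 passes + 8 tree stages = 16
```

Under these assumptions the fully parallel structure needs 12 cycles instead of 2048, a ratio of about 170. Real designs are bounded by the number of available DSP blocks and by memory bandwidth, so the figure is indicative only.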
With today’s capability to instantiate one or more soft CPUs within the FPGA resources (soft CPUs utilize only a portion of logic resources and thus multiple copies can be placed anywhere within the FPGA), the entire DSP function can be integrated in a single device. The complex control plane software in C can be ported to one FPGA CPU, while other custom peripherals and CPUs act as slave processors.
There are three primary methods of hardware acceleration within the FPGA:
- Custom Processor (CusP) or Application Specific Instruction Processor (ASIP), a software-programmed processor consisting of building block functions with reconfigurable interconnections between the blocks
- Custom instruction(s) using hardware extensions of the soft CPU instruction set, such as a floating-point instruction implemented in hardware
- Custom peripheral or coprocessor (can be used with internal or external CPU) as described above
These methods are described and implemented in much more detail in our references. Table 1 gives a quick summary of the performance metrics of the three methods executing the same algorithm. Note that the ASIP provides performance similar to a GPP, with an FPGA implementation that is smaller and more power efficient.
While hardware acceleration sounds easy conceptually, tying together multiple processors and peripherals into an efficient architecture can be a time- and resource-consuming endeavor without advanced FPGA system building tools. Ultimately, the use of massive hardware acceleration in FPGA technology justifies the effort, providing lower system costs and much faster image processing, benefiting both physicians and patients.
The flexibility of FPGAs helps designers create systems that were previously impossible because of time-to-market issues, prohibitive cost-of-goods, or computational loads that traditional DSPs could not handle. FPGAs offer additional functionality, faster execution speeds, and lower power requirements than DSP-only solutions, not to mention declining cost. These factors enable medical equipment manufacturers to take advantage of highly optimized silicon without having to assume the expense and risk of developing an ASIC, which can add many months to a product design schedule.
A growing advantage of an FPGA-based system is that engineers can upgrade features by sending software and hardware improvements. This is possible because FPGA technology is reconfigurable at any time during development, production or even in the field. The ability to upgrade an expensive piece of medical imaging equipment by a simple file transfer means a longer time-in-market for instrument manufacturers and cost savings for healthcare establishments that would otherwise need to replace equipment. For patients, this advance means more timely access to the best possible diagnosis and treatment.
For these reasons, equipment manufacturers are increasingly relying on programmable logic to provide the flexibility required to respond to shifting market demands, while maintaining the necessary cost and performance requirements. This reliance is reflected in the growth rate of FPGAs within medical imaging equipment, which is double that of the systems themselves.
FPGA vendors are directing major efforts to address DSP applications for medical imaging. Consequently, FPGA solutions will increasingly provide the processing power that new medical imaging applications require to handle challenges such as ultra-high signal processing performance, very high memory bandwidth, and increased interconnectivity between processing elements. The complementary capabilities of DSPs and FPGAs integrated into medical imaging systems will continue to enable designers to deliver lower equipment cost and better image resolution for tomorrow’s health providers.
- Mehta, Tapan A., FPGA Co-Processing Solutions for High-Performance Signal Processing Applications, GSPx, September 27, 2004.
- Seely, Joel A., Using Hardware Acceleration Units in Software Defined Radio Modem Functions, COTS Journal, January 14, 2005.
- Sidhu, Adesh, Programmable Logic Devices offer co processor muscle for high bandwidth image processing Part I and II, Planet Analog, September 10, 2003.