Integrating High-Level Synthesis designs into FPGA SoCs with less effort and risk
As FPGA SoCs gain traction, High-Level Synthesis (HLS) tools are becoming a common choice for hardware development, but designers must overcome several challenges to realize their benefits.
SoCs have been the rage in ASIC markets for years, but are now becoming common in FPGAs, with a soft-core or external processor serving as the CPU. FPGA vendors estimate that approximately 50 percent of FPGA designs are integrated with a CPU in some way, and they are starting to offer CPU cores integrated with FPGA fabric on a single chip. The industry now uses the term “FPGA SoCs” for these devices, and given their substantial benefits to end users, their adoption will undoubtedly continue as a long-term industry trend. If implemented correctly, HLS tools can reduce the effort and risk of integrating custom hardware accelerators into FPGA SoCs.
HLS tools have become popular for creating FPGA hardware with less effort and risk, especially when starting from very high-level design environments like Simulink. Although design productivity gains from HLS are often in the range of 2-5x, the exploration, verification, and re-use benefits can be much higher (over 10x). This is especially true for model-based HLS tools with a highly abstracted model library and with HLS-optimizable IP such as multirate FIRs, IIRs, FFTs, and other application-specific functions. Without such IP, these functions would require deep design expertise and effort to implement with high quality and capacity across different FPGA device families and vendors. HLS tools do present challenges for designers in system integration, verification, and validation, but there are methods that ease these challenges so designers can reap the benefits.
Interface integration and verification
HLS can work very well for algorithm and datapath content, but challenges arise when integrating that content into the rest of the system. Other system components, such as interconnects, interfaces, CPUs, and memory controllers, are typically available only in Hardware Description Languages (HDLs) like Verilog or VHDL, written at the Register-Transfer Level (RTL). To integrate an HLS design into an SoC, designers must therefore provide a standard interface that complies with the SoC interconnect. Because that interface must be built at the RTL, it creates problems in the HLS-to-SoC flow (Figure 1).
For example, a memory-mapped read/write programming port is almost always required to set configuration parameters in the HLS design, and designs often have hundreds or even thousands of these parameters. The manual integration steps can be formidable, as outlined in the following list and illustrated in Figure 1.
Example integration effort for programmable memory-mapped interfaces:
- Start from an HLS design with configuration parameters (often hundreds or thousands of fixed-point variables)
- Ensure the HLS-generated HDL has explicit ports for these parameters
- Convert the fixed-point parameters to the unsigned integer type of the data bus (32-bit or 16-bit)
- Implement clock domain crossings (for multirate designs)
- Create interface HDL with the memory map and the parameter inputs and outputs
- Connect the interface to the converted ports of the HLS-generated HDL
- Modify the HLS verification scripts to initialize parameters through the interface
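The memory-map step in the list above is conceptually simple, even though doing it by hand for hundreds of parameters is tedious. The following C sketch models such a register file, assuming one 32-bit word per parameter at consecutive word-aligned byte offsets. `PARAM_COUNT`, `param_regfile_t`, `regfile_write`, and `regfile_read` are hypothetical names chosen for illustration, not output from any tool.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register file: one 32-bit bus word per configuration
 * parameter, placed at consecutive word-aligned byte offsets. */
#define PARAM_COUNT 256u

typedef struct {
    uint32_t regs[PARAM_COUNT];
} param_regfile_t;

/* Decode a bus write. Returns false for misaligned or out-of-range
 * addresses, which a real interface would report as a bus error. */
bool regfile_write(param_regfile_t *rf, uint32_t byte_addr, uint32_t data)
{
    if (byte_addr & 0x3u) return false;      /* not word aligned */
    uint32_t index = byte_addr >> 2;         /* byte offset -> word index */
    if (index >= PARAM_COUNT) return false;  /* outside the parameter map */
    rf->regs[index] = data;
    return true;
}

/* Decode a bus read with the same address checks. */
bool regfile_read(const param_regfile_t *rf, uint32_t byte_addr, uint32_t *data)
{
    if (byte_addr & 0x3u) return false;
    uint32_t index = byte_addr >> 2;
    if (index >= PARAM_COUNT) return false;
    *data = rf->regs[index];
    return true;
}
```

In a real flow each `regs[i]` would also drive the matching HLS port, after any fixed-point and clock domain conversion.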
Another complication is that many HLS designs have multiple sample rates that differ and may be unrelated in frequency. Because HW architecture exploration may change the sample clocks, interface integration must include Clock Domain Crossings (CDCs), and each exploration case requires manual rework of every affected CDC, which limits exploration.
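One common way to move a quasi-static parameter across a clock domain boundary is a toggle-flag handshake: the source domain holds the data stable and flips a single flag bit, and the destination domain synchronizes only that flag through two flip-flops before sampling the data. The cycle-level C sketch below illustrates this generic technique. It is not presented as the CDC scheme any particular HLS tool generates, it assumes the source does not issue a new write until the previous one has been captured (real designs usually add an acknowledge path for that), and all names in it are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Cycle-level sketch of a toggle-flag CDC for one quasi-static parameter. */
typedef struct {
    /* source-domain state */
    uint32_t src_data;
    bool     src_toggle;
    /* destination-domain state */
    bool     sync1, sync2, sync2_prev;
    uint32_t dst_data;
} cdc_param_t;

/* Called on a source-clock edge when the bus writes a new value:
 * latch the data and flip the single-bit flag. */
void cdc_src_write(cdc_param_t *c, uint32_t value)
{
    c->src_data   = value;
    c->src_toggle = !c->src_toggle;
}

/* Called on every destination-clock edge. Registers are updated in
 * reverse order so every flop sees its input from before the edge,
 * mimicking simultaneous RTL updates. Returns true on the edge where
 * the new value is captured for the datapath. */
bool cdc_dst_clock(cdc_param_t *c)
{
    bool changed = (c->sync2 != c->sync2_prev);
    if (changed)
        c->dst_data = c->src_data;  /* safe only if src_data is still held stable */
    c->sync2_prev = c->sync2;
    c->sync2      = c->sync1;       /* second synchronizer flop */
    c->sync1      = c->src_toggle;  /* first flop samples the asynchronous flag */
    return changed;
}
```

Modeling the crossing at this level makes the added latency explicit: in this sketch the new value reaches the datapath on the third destination clock edge after the write.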
Verification and validation of the integrated HLS design are then required, and because the design mixes high-level and RTL components, simulation and debugging are difficult. The first challenge is setting up the RTL simulation, which requires new testbenches or a reworking of the HLS testbench to accommodate the RTL interfaces, and this manual effort must be repeated for each HW exploration case. The second is debugging an HLS design at the RTL, which can take significant effort because it often requires the expertise of both the HLS/algorithm designer and the HW engineer.
HLS with RTL encapsulation and C model generation
Instantiating RTL directly into the HLS model can address the interface integration problem. For this to work well within an HLS flow, the embedded RTL must integrate easily and simulate with higher performance than standard RTL simulators.
For example, designers can embed RTL in a Simulink model using the Synopsys RTL Encapsulation (RTLE) feature. The RTLE block achieves high-performance simulation by using unique RTL modeling technologies and optimizations under the hood. This means no external RTL simulators are required and simulation bottlenecks are reduced.
As shown in Figure 2, this capability allows interface specification and verification to be done more easily in the high-level environment and eliminates the effort and risk of RTL integration and debugging. Integration at the higher level is also easier with higher abstraction features such as vector and array notation for signal banks, and multirate tools for managing the CDCs.
System verification, debug, and validation
Increasingly, the industry is turning to higher performance C-based system-level modeling to simulate and debug the integrated HLS design and full system. This is especially useful for architecture validation and early hardware/software verification before going directly to hardware. Sometimes it is the only choice when the FPGA devices are not available or a board solution is not ready. Furthermore, a good system modeling environment can provide better visibility for this type of system-level validation and debug.
Automatic C models for system validation and virtual platforms
To address system validation, designers would ideally like to automatically create a high-performance C model of the entire HLS design, including the RTLE blocks. They would then have a bus-functional, high-performance C model of their specific HLS design, plus system interfaces, that can be used in C-based simulation environments and virtual platforms. Such a capability would eliminate the need to create C models manually, a task that can otherwise add weeks or months of project delay.
For flexibility, such a C model generation tool should automatically create wrappers for various simulator technologies, so that the model can be compiled to run as a direct executable, in an RTL simulator like ModelSim or VCS, in Simulink, or in a SystemC simulator. This covers most popular system simulators and virtual platform tools.
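The wrapper approach works because the model core can be kept independent of any simulator. The sketch below assumes a hypothetical model API (`model_reset`, `model_bus_write`, `model_step`) around a one-parameter datapath, plus the simplest possible wrapper: a standalone loop that drives the model directly, as a direct executable would. Wrappers for an HDL simulator, Simulink, or SystemC would call the same entry points through that environment's own foreign-function mechanism; those wrappers are not shown, and none of these names come from an actual generated model.

```c
#include <stdint.h>

/* Hypothetical simulator-neutral model core: no simulator headers,
 * only plain C state and entry points that any wrapper can call. */
typedef struct {
    uint32_t gain;  /* programmable parameter written over the bus */
    int64_t  out;   /* registered datapath output */
} model_state_t;

void model_reset(model_state_t *m)
{
    m->gain = 1u;
    m->out  = 0;
}

/* Bus-functional register write; a one-register map keeps the sketch short. */
void model_bus_write(model_state_t *m, uint32_t addr, uint32_t data)
{
    if (addr == 0x0u) m->gain = data;
}

/* Advance the datapath by one clock cycle with a new input sample. */
void model_step(model_state_t *m, int32_t in)
{
    m->out = (int64_t)in * (int64_t)m->gain;
}

/* Standalone wrapper: program the parameter, stream samples through
 * the model, and return the sum of the outputs. */
int64_t run_standalone(const int32_t *samples, int n, uint32_t gain)
{
    model_state_t m;
    model_reset(&m);
    model_bus_write(&m, 0x0u, gain);
    int64_t acc = 0;
    for (int i = 0; i < n; ++i) {
        model_step(&m, samples[i]);
        acc += m.out;
    }
    return acc;
}
```

Keeping reset, bus access, and clock stepping as separate calls is what lets each wrapper map them onto its host simulator's notion of time and transactions.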
AXI digital radio HLS system integration
A digital radio designed with the Synphony Model Compiler (SMC) library and the RTL Encapsulation feature illustrates these system integration benefits. The design includes an AXI-compliant host interface for programmable parameters using the Synopsys DesignWare AXI interconnect core. Normally, such a design would require days or weeks of manual RTL integration and debugging to map the interfaces to the parameters in the multi-clock datapath, but with RTLE interface integration this is done much more easily in the high-level Simulink model. The design flow steps for this example are described next.
High-level design of the digital radio multirate algorithm
Capturing the algorithm is simple using the SMC IP model library, which supports multirate, fixed-point signal processing, and the Simulink environment is used to simulate and verify the algorithm.
AXI interfaces and connection to the multirate datapath
The user creates an AXI slave interface in RTL from the Synopsys DesignWare AXI interconnect IP core. In this case, a simple write-only, 32-bit AXI bus was used to generate the RTL, which was then instantiated into the SMC model using the RTL Encapsulation block.
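To show what a write-only slave does with each transaction, the following transaction-level C sketch applies AXI writes to a bank of 32-bit parameter registers. It assumes AXI4-Lite-style single-beat writes, where each bit of WSTRB enables one byte lane of WDATA and the slave answers with a BRESP code of OKAY or SLVERR. The register count of 82 matches this example design; the function names and the choice to answer out-of-range addresses with SLVERR are assumptions for illustration.

```c
#include <stdint.h>

/* AXI BRESP encodings used by this sketch. */
enum { AXI_RESP_OKAY = 0x0, AXI_RESP_SLVERR = 0x2 };

#define NUM_REGS 82u  /* parameter count of the example design */

/* Merge one write beat into a 32-bit register: each WSTRB bit
 * enables the corresponding byte lane of WDATA. */
static uint32_t apply_wstrb(uint32_t old, uint32_t wdata, uint8_t wstrb)
{
    uint32_t mask = 0;
    for (int lane = 0; lane < 4; ++lane)
        if (wstrb & (1u << lane))
            mask |= 0xFFu << (8 * lane);
    return (old & ~mask) | (wdata & mask);
}

/* One call represents a completed write address (AW) and write data (W)
 * handshake pair; the return value is the BRESP sent back to the master. */
uint8_t axi_slave_write(uint32_t regs[NUM_REGS], uint32_t awaddr,
                        uint32_t wdata, uint8_t wstrb)
{
    uint32_t index = awaddr >> 2;  /* 32-bit bus: 4 bytes per register */
    if (index >= NUM_REGS)
        return AXI_RESP_SLVERR;    /* assumed policy for unmapped addresses */
    regs[index] = apply_wstrb(regs[index], wdata, wstrb);
    return AXI_RESP_OKAY;
}
```

For example, writing 0x11223344 with WSTRB = 0x1 to a register holding 0xAABBCCDD changes only the lowest byte, leaving 0xAABBCC44.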
The AXI interface runs at 20 MHz and is easily connected to 82 parameters using the high-level rate conversion blocks. The 18-bit fixed-point data types, with 15 fractional bits, are likewise easily converted to the unsigned 32-bit type of the bus using conversion blocks in the SMC library.
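The fixed-point conversion amounts to quantizing each value and packing its raw bit pattern into the low bits of the bus word. The C sketch below assumes the 18-bit types are signed two's complement with 15 fractional bits (so one LSB is 2^-15 and the range is -4 to just under 4) and uses round-half-away-from-zero with saturation. The text does not specify signedness or rounding mode, and `fx_to_bus` and `bus_to_fx` are hypothetical helpers, not SMC library blocks.

```c
#include <stdint.h>

/* Assumed format: signed two's complement, 18 bits total, 15 fractional. */
#define FX_WIDTH 18
#define FX_FRAC  15
#define FX_MASK  ((1u << FX_WIDTH) - 1u)

/* Quantize a real value and place the raw 18-bit pattern in the low
 * bits of an unsigned 32-bit bus word (upper bits zero). */
uint32_t fx_to_bus(double x)
{
    const int32_t max_raw = (1 << (FX_WIDTH - 1)) - 1;  /*  131071 */
    const int32_t min_raw = -(1 << (FX_WIDTH - 1));     /* -131072 */
    double scaled = x * (double)(1 << FX_FRAC);
    int32_t raw;
    if (scaled != scaled) return 0u;                    /* NaN: map to zero */
    if (scaled >= (double)max_raw)      raw = max_raw;  /* saturate high */
    else if (scaled <= (double)min_raw) raw = min_raw;  /* saturate low */
    else /* round half away from zero; the cast truncates toward zero */
        raw = (int32_t)(scaled + (scaled >= 0.0 ? 0.5 : -0.5));
    return (uint32_t)raw & FX_MASK;                     /* keep 18 bits */
}

/* Recover the real value from a bus word by sign-extending bit 17. */
double bus_to_fx(uint32_t word)
{
    uint32_t bits = word & FX_MASK;
    int32_t raw = (bits & (1u << (FX_WIDTH - 1)))
                      ? (int32_t)bits - (1 << FX_WIDTH)
                      : (int32_t)bits;
    return (double)raw / (double)(1 << FX_FRAC);
}
```

For instance, 0.5 packs to 0x4000, -1.0 packs to 0x38000, and any value at or above 4.0 saturates to 0x1FFFF.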
Hardware implementation and exploration using HLS
The SMC HLS engine can create optimized architectures from this high-level model. Even with the interfaces included, a wide range of power, area, and throughput tradeoffs can be explored for this multirate algorithm. For example, a highly parallel architecture with dedicated clocks for the slower sample rates can use more than 50 percent less power, at the cost of more area. Lower-area architectures are possible using a sequential implementation of the datapath, but they require more power. HLS makes this type of multi-clock HW architecture exploration possible while keeping the same 20 MHz interfaces.
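The area side of this tradeoff can be estimated with first-order folding arithmetic: if the datapath clock runs F times faster than the sample rate, each multiplier can be reused F times per sample. The C sketch below applies that to a hypothetical N-tap FIR; it ignores control, storage, and pipelining overhead, and `fir_multipliers` is an illustrative helper, not part of any HLS tool.

```c
#include <stdint.h>

/* First-order folding estimate for a time-multiplexed FIR datapath:
 * each multiplier performs fclk/fs multiplies per input sample, so an
 * N-tap filter needs ceil(N / (fclk/fs)) multipliers.
 * Returns 0 when the clock is slower than the sample rate (infeasible). */
uint32_t fir_multipliers(uint32_t taps, uint64_t fs_hz, uint64_t fclk_hz)
{
    if (fs_hz == 0) return 0;                     /* guard against divide by zero */
    uint64_t fold = fclk_hz / fs_hz;              /* reuse factor per multiplier */
    if (fold == 0) return 0;                      /* cannot keep up with the samples */
    return (uint32_t)((taps + fold - 1) / fold);  /* ceiling division */
}
```

For a 64-tap filter at a 5 MHz sample rate, a fully parallel version clocked at 5 MHz needs 64 multipliers, while a sequential version clocked at 100 MHz needs 4. Because dynamic power scales roughly with switching frequency, the sequential version trades area for a faster, more power-hungry clock, which matches the pattern described above.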
Automatic RTL testbenches
For each HW architecture generated using HLS, a complete testbench is generated to verify that the RTL matches the high-level Simulink simulation, including the interface behavior. This allows the integrated interface hardware to be verified almost immediately, with little effort.
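At its core, such a testbench compares RTL outputs bit-exactly against the reference produced by the high-level simulation. The minimal C sketch below shows that comparison for a single output stream, assuming the RTL output is delayed by a known, fixed pipeline latency; `first_mismatch` is a hypothetical helper, and a generated testbench does considerably more, including checking the interface behavior.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-exact comparison of an RTL output trace against the reference
 * trace. `latency` is the RTL pipeline delay relative to the reference.
 * Only overlapping samples are compared. Returns the reference index of
 * the first mismatch, or -1 if every compared sample matches. */
long first_mismatch(const int32_t *ref, size_t ref_len,
                    const int32_t *rtl, size_t rtl_len, size_t latency)
{
    for (size_t i = 0; i < ref_len && i + latency < rtl_len; ++i)
        if (rtl[i + latency] != ref[i])
            return (long)i;
    return -1;
}
```

Reporting the first mismatching index, rather than a pass/fail flag, gives the designer a concrete cycle to inspect when a generated architecture diverges from the model.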
Automatic C model generation for system validation
Finally, a C model can be generated that simulates the integrated design much faster than RTL simulators. The C model can be used in Simulink, SystemC, or other simulators, and it includes hardware-accurate behavior, including the interfaces, so it can be used to validate architecture bandwidth utilization, bus transactions, and HW/SW partitioning choices.
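A simulated bus-utilization result is easier to trust when it can be checked against a back-of-envelope estimate. The C sketch below computes how long it takes to write every parameter once and what fraction of bus cycles a periodic reprogramming would consume. The cycles-per-write figure and the reprogramming period are assumptions supplied by the caller, not properties of the DesignWare core, and `bus_budget` is a hypothetical helper.

```c
#include <stdint.h>

/* Back-of-envelope bus budget for reprogramming the parameter map. */
typedef struct {
    double program_time_us;  /* time to write every parameter once */
    double utilization;      /* fraction of bus cycles used (0..1) */
} bus_budget_t;

/* cycles_per_write is an assumed cost per transaction (handshakes plus
 * write response); the caller must pass positive clock and period values. */
bus_budget_t bus_budget(uint32_t num_params, uint32_t cycles_per_write,
                        double bus_clk_hz, double reprogram_period_s)
{
    bus_budget_t b;
    double cycles = (double)num_params * (double)cycles_per_write;
    b.program_time_us = cycles / bus_clk_hz * 1e6;
    b.utilization     = cycles / (bus_clk_hz * reprogram_period_s);
    return b;
}
```

For example, with the 82 parameters and 20 MHz bus of this design, an assumed 3 cycles per write, and a full reprogram every 1 ms, `bus_budget(82, 3, 20e6, 1e-3)` gives about 12.3 µs per full update and a utilization of about 1.2 percent.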
As shown in Figure 3, the design and verification are relatively easy to capture using SMC and Simulink, even with the interfaces included. From this integrated, verified high-level model, designers can still perform full hardware exploration of datapath area, speed, and power tradeoffs using HLS.
Making HLS integration simpler and more efficient
HLS brings many benefits to FPGA system design, but it can also create challenges for integration into FPGA SoCs. However, designers can use two technologies to address these system integration problems. The first is embedding RTL into HLS models, which specifies cycle-accurate interfaces and control logic where required while maintaining the benefits of HLS for the rest of the design. The second is automated C model generation for both the RTL and HLS parts of the design, producing models that run in standard system simulator environments. These approaches eliminate manual interface integration and debugging and allow system integration and verification tasks to be performed earlier in the project. The result is an HLS-to-SoC flow that increases productivity and eliminates manual effort, errors, and risk.