Easing the integration headaches of FPGAs into heterogeneous embedded systems

Military systems integrators are increasingly turning to FPGAs and serial switched fabrics for their systems. While these technologies are powerful, they introduce new challenges that commercial off-the-shelf (COTS) board vendors need to address. Out-of-band control and embedded debugging can ease FPGA design issues.

Out-of-band command communications

One of the prime concerns facing embedded system developers is how best to meet the real-time requirements of their application. Key to the problem is ensuring that data arrives where it is needed, when it is needed. To address this issue, many system designers choose to keep commands separate from data. While this is often achievable between boards by using Gigabit Ethernet (GbE) for commands, leaving the primary communications fabric for data, it is typically not practical for processor/FPGA heterogeneous systems because the burden of encoding and decoding TCP/IP packets is rarely the best use of FPGA resources. A much better approach for an out-of-band command path is a direct connection between the FPGA and a local general-purpose processor.

Many processors implement a local bus for communicating with devices such as flash memories. The Freescale 8641D is one such processor. While not a particularly high-performance interface at 31.25 MHz/32 bits, the 8641D local bus makes a fine "command bus" when connected to a user-programmable FPGA, as seen in Figure 1. The processor's memory space is large enough to give developers complete access to the FPGA and its attached memories. Board Support Package (BSP) functions such as Direct Memory Access (DMA) commands and interrupt control can be handled over this command bus, leaving primary fabric connections such as Serial RapidIO or PCI Express completely free for high-speed data streams. This command bus could even be extended to mezzanine cards via the XMC connectors.

Figure 1

Fabric port monitoring

Once a system has been implemented and the application developer has data flowing between the microprocessors and FPGAs within the system, the next questions to be addressed are typically "Are the data flows correct?" and "Where can I optimize data flows to get better performance from my system?" In traditional bus-based systems such as VME or CompactPCI it was easier to address these concerns because integrators had the benefit of bus analyzers to examine not only the logical behavior of their data transfers but the physical characteristics of the bus itself. This made it relatively easy for developers to tune their system for performance or to identify subtle bugs related to interprocessor communications.

With the recent introduction of the VPX (VITA 46) standard and high-speed switched serial fabrics in embedded multicomputing systems, integrators face a problem they never had to deal with in bus-based systems: mesh fabric connectivity. VPX not only uses high-speed serial links to transfer data between boards, it also supports mesh topologies in which functionality is distributed across the cards in the system and each card has a dedicated communication channel to every other board. While this offers system developers tremendous I/O bandwidth, it also presents several challenges to integrators, such as:

  • A distributed fabric with numerous high-speed serial links means there is no single point at which an analyzer can be placed to see all of the interactions between components.
  • Placing multiple protocol analysis monitors is prohibitive from the standpoint of test equipment cost, board space requirements, and the impact on the signal integrity of high-speed serial links.
  • Understanding the interaction between components in a multi-stage switched interconnect, and how that interaction affects application-level performance, requires a high-level view of dataflow within the system.

These issues are further complicated by the increasing use of the RapidIO multicast transport capabilities in signal processing applications. While the use of multicast can simplify the distribution of data in complex multiprocessing systems, it complicates the optimization of traffic and flow control in systems that take advantage of it. A key concern involves the ability of misbehaving nodes to lock up the system with multicast traffic streams.

As one response to these challenges, a third-party partner has implemented a RapidIO protocol capture feature directly in a Serial RapidIO endpoint FPGA core for the Xilinx Virtex-5 (used in Curtiss-Wright's Serial RapidIO Endpoint Block). The block gives users the ability to monitor and capture Tx and Rx traffic directly at the RapidIO interface.

The protocol capture block can be programmed by the user at a high level to capture traffic based on complex traffic sequences. These sequences can consist of traffic events at the physical, transport, and logical layers of the RapidIO protocol. Since traffic is monitored and captured in both the transmit and receive directions, problems with flow control and protocol handshaking can be analyzed. The ability to monitor flow control can be crucial when analyzing multicast traffic issues.

These capabilities can be controlled and monitored either by the 8641D processor located on the card, or remotely by any other processing element that has access to the RapidIO fabric. Using the remote capture capabilities, traffic capture on several endpoint nodes can be coordinated by a single management entity, as shown in Figure 2. This facilitates the debugging of complex problems involving multiple RapidIO endpoints on multiple cards.

Figure 2

This protocol capture capability is supported by a library of software routines to set up capture sequences and decode captured data into several easily understood formats. When combined with system-level debug tools, the visibility afforded by this capability provides a powerful solution for system debugging and optimization.

Both of these features, an out-of-band control bus and a RapidIO protocol capture capability, are found on Curtiss-Wright's CHAMP-FX2 FPGA-based processor board (Figure 3). The CHAMP-FX2 features two large Virtex-5 based FPGA nodes and a dual-core 8641D processor, all connected by a Serial RapidIO communications fabric with an onboard switch. The 8641D's local bus is tied to both of the FPGA processing nodes, as well as to the mezzanine site, so that boards like Curtiss-Wright's XMC-442 can be commanded in the same fashion. The Serial RapidIO endpoint block utilizes a Serial RapidIO core with its protocol capture port, providing enhanced visibility into system dataflow.

Figure 3

The common theme for today's military systems is faster time to deployment and the ability to diagnose complex problems in the field. As embedded heterogeneous multicomputing systems become more common and more complex, better visibility into system operation is becoming critical to system integration and optimization. Embedding protocol capture and analysis in endpoint nodes within a RapidIO fabric gives greater visibility, reduces cost, and extends debug capabilities from the lab to field deployments. Over time, one can expect board vendors to introduce new and innovative features to give developers better visibility and debugging capabilities.

Mark Littlefield is the Product Marketing Manager for Curtiss-Wright Controls Embedded Computing's FPGA computing products. He has more than 15 years of experience in the embedded computing industry, first as an engineer developing robot vision systems for NASA, then later as a field applications engineer, technical program manager, and product manager. Mark has a BS and an MS in Control Systems Engineering from the University of West Florida.

Curtiss-Wright Controls Embedded Computing