Serial RapidIO (sRIO) architectures in embedded systems

Modern embedded applications are driving performance requirements sharply upward. They need sophisticated, direct peer-to-peer communication, high reliability, quality of service, distributed processing topologies and interoperability. High-performance digital signal processors (DSPs) with fast and flexible I/O are needed to meet these real-time processing challenges. Embedded devices now integrate industry-standard Serial RapidIO (sRIO) bus technology on-chip to provide a high-performance, packet-switched interconnect that addresses the industry's need for reliability, higher bandwidth and faster bus speed. This open-standard, high-bandwidth, system-level interconnect decreases overall system cost by reducing the need for additional devices used for switching and processor aggregation.

Understanding necessary requirements for multiprocessing

Embedded systems contain multiple processing elements spread across a single system-on-chip (SoC) device. These processing elements perform different functions of the overall application, including signal processing, data processing and control functions. There are also systems with many discrete processing elements on different devices spread across a board. These devices must communicate to share data and status with each other in order to complete the system processing task. With more devices come more communication links, and this drives the requirements for the communication interconnect between processors.

Interconnect requirements fall into two classes: functional and non-functional. Functional requirements include peer-to-peer communication and support for different device topologies. Non-functional requirements cover low latency, scalable bandwidth and emergent concerns such as reliability, flexibility and efficiency (in terms of both hardware and software performance). Flexibility is of particular importance because there are many system design alternatives depending on the application space.

RapidIO for interconnect problems

Today, there are a few common strategies that address the need for internetworking, including:

  • Bridging: external devices perform protocol and physical translation.
  • Flexible interfaces: devices can be configured to support multiple interconnects and buses.
  • Device versions: chip manufacturers release devices tailored to support different interfaces.

Many multi-processor algorithms, such as DSP algorithms, depend on the flow of large data sets among the various compute nodes that share a problem. Often, the performance of the transfers across the system interconnect limits the overall system performance. One solution to this interconnect problem is RapidIO (RIO, www.rapidio.org). RapidIO is an industry-standard, high-speed, switched-packet interconnect that is becoming more common in embedded systems whose applications require this model of interconnect and communication. The main design goals of RIO are to offer a lightweight protocol, limit software impact on the system CPU and focus on ‘inside-the-box’ communications. This widely compatible architecture provides reliable, high-performance, packet-switched technology at increased bandwidth for chip-to-chip as well as board-to-board communications (Figure 1).

Figure 1

In order to allow scalability and future enhancements while maintaining backward compatibility, RapidIO has a three-layered hardware architecture: physical, transport and logical. More specifically, this allows the flexibility of adding new transaction types to the logical specification without requiring modification to the transport or physical layer specifications.

The physical layer defines the electrical signaling and link-level handshaking mechanisms as well as CRC-based error detection. Parallel RapidIO is an interface that is either 8 bits or 16 bits wide and runs at 250 MHz to 1 GHz with data clocked on both edges. The transport layer dictates how packets are routed in the switched environment. RIO uses destination-based routing to deliver packets to the correct device based on a device ID embedded in the packet. The logical layer is the highest layer and defines packet types and functions. RIO packets can carry payloads of up to 256 bytes.
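The transport-layer behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not the RapidIO specification's packet format: the `Packet` and `Switch` classes, the `split_into_payloads` helper and the port numbers are all hypothetical names chosen for the example. The only facts taken from the text are that a switch forwards on the destination device ID and that a payload is at most 256 bytes.

```python
from dataclasses import dataclass

MAX_PAYLOAD_BYTES = 256  # maximum RIO packet payload stated in the text


@dataclass
class Packet:
    """Hypothetical, simplified packet: real RIO packets carry more fields."""
    dest_id: int    # device ID of the target endpoint
    payload: bytes

    def __post_init__(self):
        if len(self.payload) > MAX_PAYLOAD_BYTES:
            raise ValueError("payload exceeds 256 bytes")


class Switch:
    """Hypothetical switch model: one lookup table, device ID -> output port."""

    def __init__(self):
        self.routing_table = {}

    def add_route(self, dest_id, port):
        self.routing_table[dest_id] = port

    def output_port(self, packet):
        # Forwarding looks only at the destination ID, never at the payload.
        if packet.dest_id not in self.routing_table:
            raise LookupError(f"no route for device ID {packet.dest_id}")
        return self.routing_table[packet.dest_id]


def split_into_payloads(data):
    """Split a large buffer into chunks that each fit in one packet."""
    return [data[i:i + MAX_PAYLOAD_BYTES]
            for i in range(0, len(data), MAX_PAYLOAD_BYTES)]


if __name__ == "__main__":
    switch = Switch()
    switch.add_route(0x12, 3)  # assumed: device 0x12 sits behind port 3
    for chunk in split_into_payloads(bytes(600)):
        packet = Packet(dest_id=0x12, payload=chunk)
        print(len(packet.payload), "bytes -> port", switch.output_port(packet))
```

Keeping the forwarding decision to a single table lookup on the device ID is what lets the logical layer add new transaction types without the switch having to understand them.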

On-chip serial RapidIO (sRIO)

There is a serial version of the RapidIO technology as well. Serial RapidIO is a point-to-point switched serial interconnect. sRIO is physically incompatible with parallel RapidIO technology, so the developer must select the exact mix of parallel and serial connections. The signaling rate for sRIO is 1.25, 2.5 or 3.125 Gbps per differential transmit and receive pair (also called a lane). After 8B/10B coding overhead, the fastest rate provides up to 312.5 MBps in each direction per lane. Each sRIO port is configured as one or four lanes, so a four-lane port can provide a maximum data rate of 1.25 GBps (10 Gbps) in each direction.
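The per-lane figure of 312.5 MBps follows from simple arithmetic once 8B/10B line coding is taken into account. The sketch below works through it for each signaling rate; the function names are made up for this example, and the calculation ignores packet header and control-symbol overhead, so real payload throughput will be somewhat lower.

```python
# Sketch: effective sRIO data rate from the per-lane signaling rate.
# Assumes 8B/10B line coding (8 data bits carried in every 10 line bits)
# and ignores packet header and control-symbol overhead.

def lane_data_rate_gbps(signaling_gbps):
    """Data rate of one lane in one direction, in Gbit/s."""
    return signaling_gbps * 8 / 10


def lane_data_rate_mbytes(signaling_gbps):
    """Data rate of one lane in one direction, in MB/s."""
    return lane_data_rate_gbps(signaling_gbps) * 1000 / 8


def port_data_rate_gbps(signaling_gbps, lanes):
    """Data rate of a 1x or 4x port in one direction, in Gbit/s."""
    if lanes not in (1, 4):
        raise ValueError("this sketch only models 1x and 4x ports")
    return lane_data_rate_gbps(signaling_gbps) * lanes


if __name__ == "__main__":
    for rate in (1.25, 2.5, 3.125):
        print(f"{rate} Gbps signaling: "
              f"{lane_data_rate_mbytes(rate)} MB/s per lane, "
              f"{port_data_rate_gbps(rate, 4)} Gbit/s per 4x port")
```

At the top signaling rate a four-lane port therefore carries 10 Gbit/s, or 1.25 GB/s, of data in each direction.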

sRIO makes it easier to support multiprocessing and, with its ability to support a variety of topologies, eliminates the need for aggregation logic. The physical layer uses SERDES (analog serializer/deserializer) technology to perform clock recovery from the data stream and incorporates 8B/10B coding. The serial specification supports 1-lane (1x) and 4-lane (4x) port sizes. In video infrastructure applications, for example, a 1x sRIO link is fast enough to send two channels of HD 1080i raw video between devices, and a 4x link can easily send four channels of HD 1080p raw video between devices with bandwidth to spare.
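A rough check of the video claim can be done by comparing raw video bit rates with link data rates. The numbers below rest on assumptions that the article does not state: 1920x1080 active pixels, 4:2:2 sampling at 8 bits (16 bits per pixel), 30 frames per second for 1080i, 60 frames per second for 1080p, and links running at 3.125 Gbps per lane with 8B/10B coding. Blanking intervals and packet overhead are ignored, and the function names are hypothetical.

```python
# Sketch: does raw HD video fit on 1x and 4x sRIO links?
# All video format parameters here are assumptions, not from the article.

def raw_video_mbps(width, height, frames_per_second, bits_per_pixel=16):
    """Raw bit rate of the active picture in Mbit/s."""
    return width * height * frames_per_second * bits_per_pixel / 1e6


def link_data_mbps(lanes, signaling_gbps=3.125):
    """Link data rate in Mbit/s after 8B/10B coding, one direction."""
    return signaling_gbps * 8 / 10 * lanes * 1000


def fits(channels, per_channel_mbps, lanes):
    """True if the given number of channels fits within the link data rate."""
    return channels * per_channel_mbps <= link_data_mbps(lanes)


if __name__ == "__main__":
    hd_1080i = raw_video_mbps(1920, 1080, 30)  # 60 fields/s = 30 frames/s
    hd_1080p = raw_video_mbps(1920, 1080, 60)
    print("2 x 1080i on 1x:", 2 * hd_1080i, "of", link_data_mbps(1), "Mbit/s")
    print("4 x 1080p on 4x:", 4 * hd_1080p, "of", link_data_mbps(4), "Mbit/s")
    print(fits(2, hd_1080i, lanes=1), fits(4, hd_1080p, lanes=4))
```

Under these assumptions both cases fit, with roughly 20 percent of the link data rate left over. A different pixel format or frame rate would change the margin.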

This serial architecture provides a maximum payload of 256 bytes per packet and does not provide the scalability of other interconnects like PCI Express or parallel RIO. For instance, in computing applications, the PCI bus is frequently used to connect multiple disk channels to a system. As disk throughput increases, so does the need for high system throughput, which can be achieved with higher bus frequencies. This leads to a smaller number of supportable devices per bus segment, but in order to connect the same number of devices, more bus segments are required. PCI-to-PCI bridge devices could be used to solve this problem but only within a tree-shaped hierarchy that increases system latency and cost as more PCI devices are added to the system. Instead, higher system level performance can be achieved by using RapidIO. RIO’s point-to-point topology enables the removal of devices with little or no electrical impact to neighboring devices or subsystems.

As an example of the growing use of RIO and sRIO in embedded applications, Texas Instruments’ TMS320C6455 DSP device (Figure 2) supports an sRIO bus interface. This interface can boost performance and I/O bandwidth in high-end and multi-channel applications such as video and voice transcoding, videoconferencing servers, high-definition (HD) video encoding and wireless base station transceivers. System performance can be increased up to 12x because sRIO eliminates I/O bottlenecks by providing a low-latency, high-bandwidth (10 Gbps full duplex) and low-pin-count interconnect. The on-board sRIO can also provide communications connectivity to third-party tools, FPGAs, sRIO switches and sRIO-equipped embedded processors.

Figure 2

Software and infrastructure

In addition to the raw performance offered by sRIO, software developers can also develop applications without having to do a lot of low-level device programming. Several embedded processor vendors provide support for sRIO in the kernel-level software layer. For example, TI’s RTOS and DSP/BIOS software kernel foundation have an sRIO Message Queue API, which allows application developers to develop software applications at a higher level of abstraction.

The use of sRIO in infrastructure applications with large "DSP farms" will also lead to a reduction in system cost in terms of device count, board size and/or device cost. For example, embedded systems development now includes the increased use of Direct Memory Access (DMA) as well as other smart peripherals that can move data at extremely high rates. Current multi-drop interconnects cannot support the required bandwidths without using more signals/pins and connectors, which increases overall system cost. sRIO lets DSP farms meet these bandwidth requirements without that added cost.

Since different applications may require different system topologies, sRIO is highly flexible in that a system developer can arrange an sRIO-based network in both ring and mesh topologies (Figure 3). Because sRIO routes each packet toward its destination device ID rather than broadcasting it, only the path between the sender and the receiver in the topology is actually burdened with the transaction. This leaves more bandwidth available in the topology for other DSP devices in the system to communicate with each other concurrently. In addition, multiple processing elements can be connected through a switch, with or without local connections to one another and to ASICs and FPGAs. Multiple processing elements can also be connected in a star topology (five DSPs are all connected to one another in Figure 3).
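The point that a transaction loads only the links on its own path can be illustrated with a small model of a five-node ring. This is a sketch under stated assumptions: the `ring`, `shortest_path`, `links_on_path` and `all_links` helpers are hypothetical, and a breadth-first search stands in for whatever routes the system's switches are actually programmed with.

```python
from collections import deque


def ring(node_count):
    """Adjacency map for a ring: each node links to its two neighbors."""
    return {i: [(i - 1) % node_count, (i + 1) % node_count]
            for i in range(node_count)}


def shortest_path(topology, src, dst):
    """Breadth-first search; returns the node sequence from src to dst."""
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for neighbor in topology[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    if dst not in previous:
        raise ValueError("no path between the two nodes")
    path = []
    node = dst
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


def links_on_path(path):
    """Undirected links used by consecutive hops on the path."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def all_links(topology):
    """Every undirected link in the topology."""
    return {frozenset((a, b)) for a, nbrs in topology.items() for b in nbrs}


if __name__ == "__main__":
    dsp_ring = ring(5)
    path = shortest_path(dsp_ring, 0, 2)
    busy = links_on_path(path)
    idle = all_links(dsp_ring) - busy
    print("path:", path)
    print("busy links:", len(busy), "idle links:", len(idle))
```

In this model a transfer from node 0 to node 2 occupies two of the five ring links, so nodes 2, 3 and 4 can still exchange data with one another at the same time.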

Figure 3

Lastly, interoperability work is a critical milestone that will help further ease the deployment of RapidIO technology-based systems in applications. The interoperability tests build on the trade association’s RapidIO Interconnect Specification Device Interoperability and Compliance Checklists, which were developed as a guide for engineers building multi-processing, multi-channel signal-processing solutions. For instance, the first step of the device interoperability tests ensures that “Device A” is proven to pass a given test with “Device B.” sRIO development systems are available so that developers can evaluate and prototype sRIO systems prior to committing to a board design. Thus, all key system elements are available, enabling designers to build sRIO systems.