Remotely Connecting FPGAs through PCI Express

An approach that lowers latency and increases throughput can extend benefits to such applications as medical imaging and financial trading.

Relatively recent developments in PCI Express (PCIe), the well-known serial expansion bus standard for computers, have brought significant connectivity enhancements. FPGAs can now be remotely connected through PCI Express over cables or between backplane segments for higher throughput and lower latency. New techniques such as Device Lending from Dolphin Interconnect Solutions, combined with enhancements to existing PCIe features such as multicast and peer-to-peer transfers, give FPGAs faster communication with remote resources. Today, PCIe-attached FPGAs can transfer data faster with less overhead, or be shared as resources in a cluster in exciting new ways.

Figure 1: A standard capability of peer-to-peer (P2P) is for FPGAs to be set up to do direct access/RDMA operations between FPGAs within a system.


PCIe connectivity enhancements are profound if an application’s goal is high throughput or extremely low latency; real-time imaging, time-sensitive transactions, and other demanding functions all benefit. A discernible lag in virtual reality isn’t acceptable, and neither is a lag in imaging during a colonoscopy when the scope has already crept past what is identifiable on the screen as a polyp. One company, Dolphin Interconnect Solutions, has made inroads into this technology. Dolphin has enhanced the potential of streaming peer-to-peer (P2P) technology, enabled devices to be dynamically lent between systems, and extended the capability of PCIe multicast, all while achieving latencies as low as 0.54 microseconds for data transfers.

Herman Paraison of Dolphin Interconnect Solutions explains it this way, “Dolphin specializes in several areas that have opened up around PCIe technology. From device lending to PCIe multi-cast, PCIe enables you to do more with limited resources, ultimately saving money while improving latency and throughput. FPGAs, GPUs, NVMes, or another high-performance computing device can take advantage of the inherent performance feature and flexibility of PCIe.”

Remote FPGA Peer-to-Peer Transfers

There have been many implementations of peer-to-peer with FPGAs communicating with GPUs and NVMe drives. Some PCIe chipsets and systems support peer-to-peer communication between slots in a single system. Technologies such as GPUDirect enable peer-to-peer transfers across PCIe, allowing direct access to GPU memory within the same system. A standard capability of peer-to-peer is for FPGAs to be set up to do direct-access/RDMA operations between FPGAs within a system. This is illustrated in Figure 1.
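The essence of such a transfer can be shown with a toy simulation. Here each “device” exposes a memory window (a bytearray standing in for a PCIe BAR), and a P2P write copies from one window straight into the other without staging through a separate host buffer. All names are illustrative, not a real driver API:

```python
# Toy model of a peer-to-peer transfer between two PCIe devices.
# A bytearray stands in for each device's BAR/memory window; the
# point is that the copy never touches an intermediate host buffer.

class Device:
    def __init__(self, name, bar_size=4096):
        self.name = name
        self.bar = bytearray(bar_size)  # stand-in for a PCIe BAR window

def p2p_write(src: Device, dst: Device, offset: int, length: int) -> None:
    """Copy bytes from src's window directly into dst's window."""
    dst.bar[offset:offset + length] = src.bar[offset:offset + length]

fpga_a = Device("fpga0")
fpga_b = Device("fpga1")
fpga_a.bar[0:4] = b"\xde\xad\xbe\xef"   # payload produced by fpga0
p2p_write(fpga_a, fpga_b, 0, 4)          # direct window-to-window copy
assert fpga_b.bar[0:4] == b"\xde\xad\xbe\xef"
```

On real hardware the windows would be actual BAR mappings and the copy would be performed by a DMA engine, but the data path — device to device, bypassing host memory — is the same.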

Dolphin extends this concept to remote peer-to-peer across multiple systems or backplane segments. Figure 2 illustrates a remote peer-to-peer configuration.

A PCIe network connects the two systems via a fiber or copper cable, and they can then use remote P2P to communicate over PCIe. As with standard PCIe peer-to-peer transfers, neither the host CPU nor host memory is required in the data path: data is transferred directly to a memory region in the remote FPGA. The target device can vary; whether it is a CPU, GPU, or remote memory, remote peer-to-peer can be used. Distributed FPGA systems can now take advantage of the low latency of PCIe. This reduced-overhead way of communicating will benefit real-time distributed systems such as financial trading platforms seeking the lowest possible transaction latency.
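The mechanism that makes this possible is address translation across the PCIe link: a local window is associated with a memory segment on the remote side, and writes into the window are forwarded there. The sketch below is a purely conceptual simulation of that idea (the class names are invented for illustration); in a real system the mapping is set up once by the chipset and driver, outside the data path:

```python
# Conceptual model of remote P2P over a non-transparent PCIe link:
# a local window is bound to a remote memory segment, and writes
# into the window are forwarded (address-translated) to the remote
# side without involving either host CPU.

class RemoteSegment:
    def __init__(self, size=4096):
        self.mem = bytearray(size)      # memory exported by the remote FPGA

class NtWindow:
    """Local window whose writes land in a remote segment."""
    def __init__(self, remote: RemoteSegment):
        self.remote = remote            # mapping established at setup time
    def write(self, offset: int, data: bytes) -> None:
        # A plain store into the window; forwarding is the fabric's job.
        self.remote.mem[offset:offset + len(data)] = data

remote_fpga = RemoteSegment()
window = NtWindow(remote_fpga)          # set up once, out of the data path
window.write(0x10, b"trade")            # local write lands in remote memory
assert remote_fpga.mem[0x10:0x15] == b"trade"
```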

Figure 2: A remote peer-to-peer configuration.


Multi-casting, or Reflective Memory

Multicasting is another PCIe capability that should not be left unexplored, as it establishes a deterministic, low-latency, high-speed interface for sharing data. Also referred to as “reflective memory” or “reflected memory,” multicasting is an advanced PCIe solution that provides near-instantaneous, reliable data distribution. It enables an FPGA to deliver data to multiple nodes without incurring extra overhead: a single FPGA can transmit data to many other devices simultaneously, as if it were communicating with only one. The heavy lifting is done by advanced PCIe features in the fabric. Dolphin’s PCIe multicast solution delivers significantly higher performance at a lower cost than similar solutions, and connections over copper, long-distance fiber optics, or a mix of the two are possible.
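The “write once, deliver everywhere” behavior can be sketched as a toy model: the sender posts a single write to a multicast group, and replication into every subscriber’s memory is the fabric’s job, not the sender’s. The class and method names here are invented for illustration:

```python
# Toy model of PCIe multicast ("reflective memory"): one posted write
# from the sender is replicated by the fabric into every subscriber's
# buffer, so the sender does no extra work per node.

class McastGroup:
    def __init__(self):
        self.subscribers = []           # each subscriber has its own buffer
    def join(self, size: int = 64) -> bytearray:
        buf = bytearray(size)
        self.subscribers.append(buf)
        return buf
    def write(self, offset: int, data: bytes) -> None:
        # The sender issues one write; fan-out is "hardware's" job here.
        for buf in self.subscribers:
            buf[offset:offset + len(data)] = data

group = McastGroup()
node_a, node_b, node_c = group.join(), group.join(), group.join()
group.write(0, b"tick")                 # a single write from the FPGA...
assert node_a[0:4] == node_b[0:4] == node_c[0:4] == b"tick"
```

In a market-data feed, for example, one FPGA publishing a price tick this way delivers it to every consumer node in essentially the time it takes to reach one.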

Device Lending

The flexibility to pool or move resources within a cluster is very attractive for system architects seeking to maximize the usage of high-value resources such as FPGAs. Device lending is a concept targeted at maximizing device utilization. With device lending, FPGAs, GPUs, NVMe drives, or other devices can be dynamically lent to remote systems from other nodes in a PCIe network. Device lending doesn’t require a CPU in the data path; PCIe enables direct configuration of resources into a PCIe domain, so no changes are required to operating systems, device drivers, or applications. Devices that support PCIe Hot-Add and Hot-Removal can be lent out, removed, and re-added to an OS at run time. Clusters of FPGAs, GPUs, and CPUs can all share the same device resource pool.
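The bookkeeping behind lending can be pictured as a simple pool: borrowing a device “hot-adds” it to the borrower’s system, and returning it “hot-removes” it and puts it back in the pool. This is a minimal sketch of that idea with invented names, not Dolphin’s actual software:

```python
# Minimal sketch of device-lending bookkeeping: a node's unused
# devices form a pool; borrow() models hot-adding a device to the
# borrower's PCIe domain, give_back() models hot-removing it.

class DevicePool:
    def __init__(self, devices):
        self.free = list(devices)
        self.lent = {}                  # device -> current borrower

    def borrow(self, borrower: str) -> str:
        if not self.free:
            raise RuntimeError("no free devices to lend")
        dev = self.free.pop()
        self.lent[dev] = borrower       # hot-added to the borrower's system
        return dev

    def give_back(self, dev: str) -> None:
        del self.lent[dev]              # hot-removed from the borrower
        self.free.append(dev)

pool = DevicePool(["gpu0", "gpu1", "fpga0"])
dev = pool.borrow("thin-client")        # e.g. an operating-theater client
assert dev in pool.lent and dev not in pool.free
pool.give_back(dev)                     # freed for the next borrower
assert dev in pool.free
```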

For example, real-time, computer-aided diagnosis-support applications benefit from being able to apply more computing resources as needed for a more accurate diagnosis. With device lending, compute resources can be added to a system in the PCIe cluster from another system in the cluster without physically installing the device. In the case of computer-aided diagnosis, GPUs can be added to a thin client in an operating theater running, for example, polyp-detection software. The GPUs can be installed in a remote system in a server room; the thin client in the operating theater borrows the required number of GPUs from the remote server, and the polyp-detection software uses them as if they were locally installed. The advantage is that the thin client avoids the power, size, and noise of the remote server. The software doesn’t need to change, and the GPUs can be returned to the remote server for use by another client in another operating room.

Dolphin Interconnect Solutions’ expertise in turning the newest PCIe technologies to competitive advantage has enhanced networking and benefitted many embedded applications. Financial applications have adopted Dolphin’s solutions as a means of achieving ultra-low latency in high-speed, high-stakes trading.

The growth in complexity of FPGAs stems from increasingly demanding applications as markets seek to gain an edge. PCIe, although considered by many as a simple connectivity solution for personal computers, has the potential to be the tipping point for any application where low latency, high-speed throughput, and extremely low computing overhead are at the top of the “must-haves” list.


Lynnette Reese is Executive Editor, Embedded Systems Engineering and Embedded Intel Solutions, and has been working in various roles as an electrical engineer for over two decades. She is interested in open source software and hardware, the maker movement, and in increasing the number of women working in STEM so she has a greater chance of talking about something other than football at the water cooler.
