Waste Not—Fully Utilize PCIe Resources with Device Lending



Device lending through PCIe adds the flexibility to maximize system use with little effort.

Why waste unused resources in a system? Sharing resources between systems over PCIe lets applications take full advantage of high-value peripherals such as Graphics Processing Units (GPUs), FPGAs, and NVMe drives, and helps maximize the investment in them. New methods and technologies are being developed that make it more flexible to share PCIe components between systems and to add or configure high-value components on the fly, based on application requirements. Devices in one system can be used by other systems on the network with no impact on drivers or application software.

Figure 1: Device Lending works seamlessly within PCIe and requires only hot add capability in the drivers. Single Root I/O Virtualization (SR-IOV) interfaces are also supported. (Source: Jonas Markussen, Simula Research Laboratory and University of Oslo)

Advantages of Distributing Resources
As systems become more complex, there are several advantages to “disaggregating” resources, including cost, flexibility, power, and performance. It’s possible to distribute resources so that applications can take maximum advantage of high-value devices. One example of a prime resource these days is the GPU. GPUs are ultra-fast, specialized processors that are ideal for crunching repetitive data, including real-time image processing and, most recently, the explosion of neural networks that support computer vision and machine learning. GPUs enable a dramatic boost in throughput for platform workloads. System managers and application developers need to manage these GPU resources in their environments to account for peak GPU demand, changing application requirements, and better utilization of valuable resources. Developers of new applications can also face the challenge of adding these powerful resources in environments that have space, power, or cooling constraints. Being able to use GPU resources distributed among systems goes a long way toward meeting these challenges.

Device Lending
Fortunately, there’s a way to manage PCIe devices between systems and improve availability without adding high cost or complex application changes. PCIe-enabled devices can be “borrowed” or “lent” between remote systems using standard PCIe. For systems with GPUs, this means that GPU resources can be moved between systems. This PCIe-enabled device lending solution is simple and straightforward for the IT department, with no complex middleware or alternate interconnect transport mechanisms in the data path. The concept is called “Device Lending” and is part of Dolphin Interconnect Solutions’ SmartIO technology. Device lending is used to set up the transfer of devices (such as GPUs, FPGAs, and NVMe drives) between systems. Once a device is transferred, the borrowing system recognizes the remote device as local; the heavy lifting is done underneath the PCIe hood, if you will.

Device lending is employed in eXpressWare SmartIO software from Dolphin Interconnect Solutions, a company with roots in Oslo, Norway, that has extensive experience in developing leading and innovative computer interconnect technology. Dolphin has been involved with industry standards (including PCI, ASI, and PCIe) since the 1990s. Device lending is Dolphin’s most recent innovative and elegant solution that yields a win-win for all parties.

Device lending simply makes good sense because dynamically sharing devices reduces cost and adds enormous flexibility. There are many advantages in being able to dynamically add and drop a remote resource. System designers can centralize resources as a pool and distribute them as needed to remote systems, or they can do the reverse and distribute resources within the network and centralize them when necessary. Using PCIe confers the benefits of low latency, high bandwidth, and a substantial history of support by a stable standards body that is steadily improving throughput while maintaining backward compatibility.

Applications
Device lending enables the use of remote resources while preserving low latency and full bandwidth. Military applications often require remote operation with high availability in addition to a fast response. They also tend to have size and power constraints. Disaggregating means there’s no single point of failure. For example, a remote military application may need additional compute resources or need to transfer a large amount of data. Device lending enables GPU compute resources or NVMe drives to be added to the systems during operations. Upon completion, the device can be released back to the original system. In the case of an NVMe file transfer, data can be moved rapidly to another location by simply passing the NVMe device instead of copying data to a remote device. For shared GPUs, CUDA applications run as if they are local. Better yet, the added GPU resource requires no special programming. The remote device uses a standard PCIe driver, and there is no change to the operating system or NVIDIA drivers.

Universities and research laboratory facilities can also benefit from this easy-to-implement solution. GPU resources can be used during the day for classroom and student resources. During off hours, GPUs can be re-assigned for research. Applications can be assigned these resources for better utilization.

Device lending has been successfully implemented in real-time medical diagnosis. A real-time colonoscopy screener can detect polyps during a colonoscopy exam. Banks of GPUs are not practical in the examination room itself. Device lending allows a computer-aided medical diagnostic system to remotely access GPUs and employ image processing for real-time vision in the operating room. The number of GPUs needed can be assigned prior to the screening, based on the processing requirements of the application. For complex screenings, more GPUs can be assigned, and when the screening is complete, they can be released back to the remote host system.

The Role of PCIe
Device lending and PCIe networking are advanced applications of the PCIe standard. PCIe is a stable, standard technology that is widely implemented and thus readily available in most existing systems. With Gen 5, a x16 PCIe link is also set to reach roughly 128 GB/s of combined bandwidth across both directions. PCIe Gen 5 will be backward compatible and meet the ever-increasing performance needs of devices such as GPUs and NVMe drives. PCIe has latencies as low as 300 ns end-to-end, dominates I/O bus technology, and is prolific in server, storage, mobile, and other markets.
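The Gen 5 figure above can be checked with a little arithmetic. The sketch below is an illustration, not part of any Dolphin tool: it multiplies the per-lane signalling rate by the line-encoding efficiency and the lane count. The function and table names are made up for this example, and the result ignores packet and protocol overhead, so real throughput is somewhat lower.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency for each
# PCIe generation. Gen 1/2 use 8b/10b encoding; Gen 3 onward use 128b/130b.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
}


def link_bandwidth_gbytes(gen: int, lanes: int = 16) -> float:
    """Bandwidth in GB/s for ONE direction, before protocol overhead."""
    rate_gt, efficiency = PCIE_GENERATIONS[gen]
    return rate_gt * efficiency * lanes / 8  # 8 bits per byte


if __name__ == "__main__":
    for gen in sorted(PCIE_GENERATIONS):
        one_way = link_bandwidth_gbytes(gen)
        print(f"Gen {gen} x16: {one_way:6.2f} GB/s per direction, "
              f"{2 * one_way:6.2f} GB/s both directions")
```

For a Gen 5 x16 link this works out to about 63 GB/s in each direction, or about 126 GB/s in both directions combined, which is commonly rounded to 128 GB/s.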

Figure 2: Dolphin’s roadmap supports PCIe Gen 4 when parts and systems become available.

The Role of NTBs in Device Lending
PCIe provides a mechanism called non-transparent bridging (NTB). Jon Mason, the sole developer of a PCIe non-transparent bridge (NTB) device driver for the Linux kernel, describes the operation of the NTB succinctly. “To communicate across the non-transparent bridge, each NTB endpoint has one (or more) apertures exposed to the local system. Writes to these apertures are mirrored to memory on the remote system.” The apertures act as entry points for establishing communication channels, so the NTB can carry traffic to a remote system while the local hosts remain unencumbered. Mason goes on to state that the system of NTB drivers and subsequent communication channels “…provide a reliable way of transferring data from one side to the other, making it accessible so that the operating system can transfer data from one system to the other in a standard way [1].” By “standard way,” we mean that device lending works under the hood, transparently connecting resources via the low-latency, ubiquitous PCIe fabric and utilizing NTBs. PCIe networks are formed by using NTBs between systems or processors. These connections can be made between two NTB adapters with a cable, between multiple hosts with a switch-based architecture, or over a backplane architecture. NTBs are available as add-ons for any PCIe-based machine and are integrated in newer Intel® Xeon™ processors.
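The aperture idea in the quotation can be pictured with a toy model. The sketch below is only an illustration of address-window translation; it is not the Linux NTB driver API or Dolphin's software, and the class name, method names, and addresses are all hypothetical. Remote memory is stood in for by a plain bytearray.

```python
class NtbAperture:
    """Toy model of one NTB aperture: a local address window whose
    writes land in a region of another system's memory."""

    def __init__(self, local_base: int, size: int,
                 remote_memory: bytearray, remote_base: int):
        if remote_base + size > len(remote_memory):
            raise ValueError("aperture extends past the end of remote memory")
        self.local_base = local_base
        self.size = size
        self.remote_memory = remote_memory
        self.remote_base = remote_base

    def translate(self, local_addr: int) -> int:
        """Map a local address inside the window to a remote memory offset."""
        offset = local_addr - self.local_base
        if not 0 <= offset < self.size:
            raise ValueError(f"address {local_addr:#x} is outside the aperture")
        return self.remote_base + offset

    def write(self, local_addr: int, data: bytes) -> None:
        """Write through the window; the bytes appear in remote memory."""
        if not data:
            return
        start = self.translate(local_addr)
        self.translate(local_addr + len(data) - 1)  # bounds-check the last byte
        self.remote_memory[start:start + len(data)] = data


if __name__ == "__main__":
    remote_ram = bytearray(4096)  # stands in for the remote system's memory
    aperture = NtbAperture(local_base=0xD000_0000, size=1024,
                           remote_memory=remote_ram, remote_base=2048)
    aperture.write(0xD000_0010, b"hello")
    print(remote_ram[2064:2069])  # the locally written bytes, seen remotely
```

In real hardware the translation is done by the bridge itself, so no software sits in the data path; the model only shows which address ends up where.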

Figure 3: Two types of functions, lending and borrowing, are implemented with device lending.

How Does Device Lending Work?
Since Device Lending uses standard PCIe, it doesn’t add any software overhead to the communication path. As outlined above, standard PCIe transactions are used between the systems. Dolphin’s eXpressWare software manages the connection and is responsible for setting up the NTB mappings.

Two types of functions are implemented with device lending (Figure 3). Described below are the lending function and the borrowing function:

Lending involves making devices available on the network for temporary access. The PCIe devices themselves remain physically located within the lending system.

The borrowing function can look up available devices. A device can then be borrowed temporarily for as long as it is required. When the borrower has finished with the device, it is released and can be borrowed by other systems on the network or returned to local use.
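The lend, look up, borrow, and release cycle described above can be sketched as a small piece of bookkeeping. This is a hypothetical model written for this article, not the interface of Dolphin's Device Lending tools; every class, method, and device name here is an assumption chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Device:
    name: str                       # illustrative label such as "gpu0"
    lender: str                     # host that physically holds the device
    borrower: Optional[str] = None  # host using it remotely, if any


@dataclass
class LendingRegistry:
    """Hypothetical bookkeeping for lend / look up / borrow / release."""
    devices: Dict[str, Device] = field(default_factory=dict)

    def lend(self, name: str, lender: str) -> None:
        # The device stays installed in the lender; it is only advertised.
        if name in self.devices:
            raise ValueError(f"{name} is already on offer")
        self.devices[name] = Device(name, lender)

    def available(self) -> List[str]:
        # Look up the devices that nobody is currently borrowing.
        return sorted(n for n, d in self.devices.items() if d.borrower is None)

    def borrow(self, name: str, borrower: str) -> Device:
        dev = self.devices[name]
        if dev.borrower is not None:
            raise RuntimeError(f"{name} is already borrowed by {dev.borrower}")
        dev.borrower = borrower
        return dev

    def release(self, name: str) -> None:
        # After release the device can be borrowed again or used locally.
        self.devices[name].borrower = None


if __name__ == "__main__":
    registry = LendingRegistry()
    registry.lend("gpu0", lender="hostA")
    registry.lend("nvme0", lender="hostA")
    registry.borrow("gpu0", borrower="hostB")
    print(registry.available())   # gpu0 is in use, so only nvme0 is listed
    registry.release("gpu0")
    print(registry.available())   # both devices are on offer again
```

A higher-level resource manager could wrap the same four operations to assign devices ahead of a job and hand them back afterward.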

The Dolphin Device Lending software enables this process to be controlled using a set of tools and options. The tools can be used directly or integrated into any other higher-level resource management system. The Device Lending software is very flexible and does not require any particular boot order or power-on sequencing. PCIe devices borrowed from a remote system can be used as if they were local devices until they are given back. Device lending does not require any changes to standard device drivers or to the Linux kernel.

Device lending also enables an SR-IOV device to be shared as an MR-IOV device. Individual SR-IOV functions can be borrowed by any system in the PCIe network, so a single device can be shared by multiple systems at once. This maximizes the use of SR-IOV devices such as 100 Gbit Ethernet cards.
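To see which devices in a Linux host expose SR-IOV functions in the first place, the standard PCI sysfs attributes `sriov_numvfs` and `sriov_totalvfs` can be read. The sketch below is independent of Dolphin's software and only inspects the local machine; the function name and the `sysfs_root` parameter are choices made for this example.

```python
from pathlib import Path
from typing import Dict, Tuple


def sriov_devices(sysfs_root: str = "/sys/bus/pci/devices") -> Dict[str, Tuple[int, int]]:
    """Return {PCI address: (enabled VFs, maximum VFs)} for SR-IOV capable devices."""
    result: Dict[str, Tuple[int, int]] = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return result  # not Linux, or sysfs is not mounted
    for dev in sorted(root.iterdir()):
        total = dev / "sriov_totalvfs"
        enabled = dev / "sriov_numvfs"
        # Only SR-IOV capable physical functions carry these attributes.
        if total.is_file() and enabled.is_file():
            result[dev.name] = (int(enabled.read_text()), int(total.read_text()))
    return result


if __name__ == "__main__":
    for address, (enabled, maximum) in sriov_devices().items():
        print(f"{address}: {enabled} of {maximum} virtual functions enabled")
```

On a machine without SR-IOV hardware, or on a non-Linux system, the function simply returns an empty dictionary.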

This might sound complicated, and few have time to indulge themselves in becoming experts at everything they touch. Sometimes you need something to just work, and quickly. Dolphin Interconnect Solutions has a turnkey implementation for anyone who wants to capitalize on an existing PCIe interconnect fabric for sharing GPUs, Ethernet devices, FPGAs, NVMe drives, or any other PCIe-addressable devices for transparent, dynamic sharing.

Complex systems that include GPUs, FPGAs, and NVMe drives are a hot prospect but come with an equally hot price tag that of course increases with scale. Device lending not only provides a new tool for working through challenging application requirements but is suitable for determining the right mix of components to maximize application performance before purchasing additional resources. PCIe provides the ability to create complex systems from small to large. Device lending adds the flexibility to maximize the use of these systems.

References:

  1. https://lwn.net/Articles/506761/