The Evolution of the PCI Express® Architecture: Going Strong 15 Years and Five Generations Later



Delivering low power and scalable performance to next-generation designs

The Peripheral Component Interconnect (PCI) architecture was originally developed in the early 1990s, as a bus-based I/O interface for PCs. It evolved through width and speed increases, sustaining the I/O needs of the rapidly evolving computing industry for more than a decade. After hitting a performance wall with the bus-based PCI architecture, PCI-SIG® moved to a serial, point-to-point, full-duplex, differential interconnect link called PCI Express® (PCIe®) architecture, with the release of the PCIe 1.0 specification in 2002. PCI-SIG ensured software compatibility between the PCI and PCIe architectures to allow for a seamless transition from the bus-based to the serial-based interface.

The PCI/PCIe architecture has served as the backbone for I/O connectivity, enabling power-efficient, high-bandwidth, and low-latency communication between components. The PCIe specification, currently in its fifth generation, doubles bandwidth every generation while maintaining full backwards compatibility. It continues to outpace competing technologies in market share, capacity, and bandwidth.

PCI-SIG®, a consortium of 730+ member companies, owns and manages PCIe technology as an open industry standard. Through a robust compliance program and frequent workshops, PCI-SIG member companies can submit their products for interoperability and compliance testing to help ensure seamless plug-and-play capabilities. Over time, the PCIe architecture has demonstrated its ability to adapt to revolutionary trends in the computing continuum, spanning a wide range of platforms including hand-held, client, server, storage, and communications systems.

Cost-effective Power and Performance Scalability
The PCIe architecture scales bandwidth through two mechanisms: width and speed. The PCIe specification supports multiple lane widths: x1, x2, x4, x8, and x16. Further, a multi-lane PCIe link can be partitioned into multiple independent PCIe links of narrower widths, offering SKU flexibility when the same system on chip (SoC) must support different widths on different platforms. For example, the 16 lanes in a CPU can be configured as one x16; two x8s; one x8 and two x4s; four x4s; eight x2s; one x8, one x4, and two x2s; and so on. The PCIe architecture also supports fan-out to multiple links through the use of switches. Figure 1 shows the platform evolution of the PCI and PCIe architectures with connectivity for multiple I/O devices.
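
To make the width arithmetic concrete, the C sketch below enumerates every way a 16-lane port could in principle be partitioned into links of the spec-defined widths listed above. It is purely a counting illustration; real SoCs expose only a vendor-chosen subset of these bifurcation options.

    #include <stdio.h>

    /* Link widths the PCIe specification supports: x1, x2, x4, x8, x16.
     * Descending order makes partitions non-increasing, avoiding duplicates. */
    static const int widths[] = { 16, 8, 4, 2, 1 };
    #define NWIDTHS ((int)(sizeof(widths) / sizeof(widths[0])))

    /* Recursively split the remaining lanes, never using a width wider than
     * the previous part, and print each complete configuration. */
    static void partition(int remaining, int start, int *parts, int depth)
    {
        if (remaining == 0) {
            for (int i = 0; i < depth; i++)
                printf("x%d%s", parts[i], i + 1 < depth ? " + " : "\n");
            return;
        }
        for (int i = start; i < NWIDTHS; i++) {
            if (widths[i] <= remaining) {
                parts[depth] = widths[i];
                partition(remaining - widths[i], i, parts, depth + 1);
            }
        }
    }

    int main(void)
    {
        int parts[16];
        partition(16, 0, parts, 0);  /* all splits of a 16-lane port */
        return 0;
    }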

Figure 1: Evolution of PCI-based technologies over three decades

Data rate increases account for the second aspect of bandwidth scaling. The PCIe architecture doubles the data rate every generation, as shown in Figure 2. Each new generation of the PCIe specification is fully backwards compatible with all prior generations, offering investment protection for the industry and end users. While products conforming to the PCIe 4.0 specification are currently shipping, PCI-SIG is working to deliver the PCIe 5.0 specification at 32 GT/s, far exceeding the initial expectations for this technology. In addition to these speed increases, the PCIe specification provides protocol and power enhancements that keep pace with device design trends.
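
Those raw rates translate to delivered bandwidth once line-encoding overhead is accounted for: the first two generations use 8b/10b encoding, while PCIe 3.0 and later use 128b/130b. A short C sketch of the per-direction arithmetic (packet-level protocol overhead would lower these numbers slightly further):

    #include <stdio.h>

    /* Raw signaling rate (GT/s) and line-encoding efficiency per generation:
     * PCIe 1.x/2.x use 8b/10b encoding, PCIe 3.0 onward use 128b/130b. */
    struct gen { const char *name; double gts; double enc; };

    int main(void)
    {
        const struct gen gens[] = {
            { "PCIe 1.x",  2.5,   8.0 / 10.0  },
            { "PCIe 2.x",  5.0,   8.0 / 10.0  },
            { "PCIe 3.x",  8.0, 128.0 / 130.0 },
            { "PCIe 4.0", 16.0, 128.0 / 130.0 },
            { "PCIe 5.0", 32.0, 128.0 / 130.0 },
        };

        printf("%-9s %6s %15s %12s\n", "Gen", "GT/s", "GB/s per lane", "GB/s x16");
        for (int i = 0; i < 5; i++) {
            /* One transfer moves one bit per lane; scale by encoding
             * efficiency and divide by 8 bits per byte. */
            double lane = gens[i].gts * gens[i].enc / 8.0;
            printf("%-9s %6.1f %15.3f %12.2f\n",
                   gens[i].name, gens[i].gts, lane, lane * 16.0);
        }
        return 0;
    }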

The steady and consistent evolution of PCIe technology has led to its integration into the CPU package, making it a ubiquitous interconnect that delivers high bandwidth with low latency and low power.

Figure 2: Evolution of PCIe technology

PCIe technology is built on a layered architecture (Figure 3), where each layer has a well-defined functionality that remains invariant through its evolution. This allows each layer to evolve independently while preserving full backwards compatibility.

  • Transaction Layer: Offers a fully packetized, split-transaction protocol designed to deliver maximum link efficiency. A well-defined producer-consumer ordering model defines the software-hardware interface for data consistency. Virtual channels and credit-based flow control are used to ensure quality of service (a simplified model of the credit mechanism is sketched after this list).
  • Data Link Layer: Provides reliable data transport through a combination of CRC and retry.
  • Logical Physical Layer: Responsible for functions such as encoding and scrambling.
  • Electrical Physical Layer: Handles the analog functions of the link.
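
As a rough illustration of the Transaction Layer's credit mechanism, the C sketch below models a single credit pool gating transmission. The structure and functions are invented for this sketch; the actual protocol tracks separate header and data credits for posted, non-posted, and completion traffic on each virtual channel, and returns credits through flow-control DLLPs.

    #include <stdbool.h>
    #include <stdio.h>

    /* Simplified, single-pool model of credit-based flow control. */
    struct fc_state {
        unsigned advertised;  /* receiver buffer space granted, in credits */
        unsigned consumed;    /* credits used by transmitted packets       */
    };

    /* Transmitter: a packet may be sent only if enough credits remain,
     * guaranteeing the receiver's buffer can never overflow. */
    static bool try_send(struct fc_state *fc, unsigned cost)
    {
        if (fc->consumed + cost > fc->advertised)
            return false;                 /* stall until credits return */
        fc->consumed += cost;
        return true;
    }

    /* Receiver: as buffers drain, additional credits are granted. */
    static void return_credits(struct fc_state *fc, unsigned freed)
    {
        fc->advertised += freed;
    }

    int main(void)
    {
        struct fc_state fc = { .advertised = 4, .consumed = 0 };
        for (int pkt = 0; pkt < 6; pkt++) {
            if (try_send(&fc, 1)) {
                printf("packet %d sent\n", pkt);
            } else {
                printf("packet %d stalled, waiting for credits\n", pkt);
                return_credits(&fc, 2);   /* receiver frees buffer space */
                pkt--;                    /* retry the same packet */
            }
        }
        return 0;
    }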

Capitalizing on interfaces such as the PHY Interface for PCI Express (PIPE), companies realize economies of scale with off-the-shelf IP blocks, both analog front-ends and logical layer stacks. The resources freed up allow companies to innovate with their own value-add IP. Flexibility is assured by a variety of mechanical form factors, such as CEM, U.2, M.2, cable, and soldered-down motherboard designs, with the same components serving across multiple platforms spanning multiple market segments.

Figure 3: A Layered Architecture

The PCIe architecture, like its PCI predecessor, is a load/store architecture, unlike any other open industry I/O standard. PCIe devices can be mapped directly into the system memory space and can access system memory directly, enabling energy-efficient performance. The specification defines a consistent interface between software and hardware for essential system functions including device discovery, device driver association, error handling, event notification, hot-plug, and power management. This ensures seamless interoperability across multiple platforms and operating systems for components designed to the PCIe specification. Advanced error detection, recovery, and reporting mechanisms, along with hot-plug support, provide the framework for meeting the demanding reliability, availability, and serviceability (RAS) requirements of the server and storage segments.
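
Device discovery rests on the architected configuration space, whose header layout the specification fixes (Vendor ID at offset 0x00, Device ID at offset 0x02, both little-endian 16-bit fields). Below is a minimal C sketch assuming a Linux host, which exposes configuration space through sysfs; the bus/device/function address used is illustrative.

    #include <stdint.h>
    #include <stdio.h>

    /* Read the start of a PCIe function's configuration-space header via
     * Linux sysfs. Substitute a device address reported by lspci. */
    int main(void)
    {
        const char *path = "/sys/bus/pci/devices/0000:00:00.0/config";
        FILE *f = fopen(path, "rb");
        if (!f) {
            perror("open config space");
            return 1;
        }
        uint8_t hdr[4];
        if (fread(hdr, 1, sizeof hdr, f) != sizeof hdr) {
            fprintf(stderr, "short read\n");
            fclose(f);
            return 1;
        }
        fclose(f);
        /* Architected layout: Vendor ID at 0x00, Device ID at 0x02. */
        uint16_t vendor = (uint16_t)(hdr[0] | (hdr[1] << 8));
        uint16_t device = (uint16_t)(hdr[2] | (hdr[3] << 8));
        printf("vendor 0x%04x device 0x%04x\n", vendor, device);
        return 0;
    }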

To extend channel reach at higher data rates (e.g., PCIe 4.0 architecture and above), PCIe technology defines a Retimer specification, which comprises the PHY logical and electrical layers. Up to two Retimers may be used in a Link, as shown in Figure 4. To keep Retimers simple, they are not mapped directly into the configuration space; instead, the specification provides an optional mechanism to access Retimer registers indirectly through the Downstream Port registers. The same mechanism is used to perform, in the Retimers, the Receiver margining that is mandatory at PCIe 4.0 data rates and above, as described below.

Figure 4: Retimer extends the channel reach for high data rates.

To determine the margin in each Link segment on real platforms at run time, PCIe technology provides an architected mechanism, using configuration registers, to perform and report the timing and voltage margin observed at each Receiver. Software initiates margining by accessing the architected configuration registers and instructing a Receiver to perform it; the Receivers in a Retimer are accessed through the Downstream Port registers. The Receiver applies the margin command and continuously reports the error count in an architected register. Margining is non-destructive: the Receiver stops margining if the error count exceeds a software-programmed threshold and reports failure at that margin point.
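
The margining flow can be sketched as a simple search loop. The helper functions below are hypothetical stand-ins for reads and writes of the architected margining registers, not a real driver API; they simulate a receiver whose eye closes beyond a fixed timing offset.

    #include <stdio.h>

    /* Hypothetical stand-ins for accesses to the architected margining
     * registers; here they simulate a receiver whose eye closes beyond
     * timing step 7. */
    static unsigned issue_margin_step(int lane, int step)
    {
        (void)lane;
        return step > 7 ? 100u : 0u;   /* simulated error count */
    }

    static void clear_margin_command(int lane)
    {
        (void)lane;                    /* return receiver to normal sampling */
    }

    /* Walk timing offsets outward from the eye center until the error count
     * crosses the software-programmed threshold. Margining is non-destructive:
     * live traffic keeps flowing while the sampling point is offset. */
    static int measure_timing_margin(int lane, unsigned threshold, int max_step)
    {
        int margin = 0;
        for (int step = 1; step <= max_step; step++) {
            if (issue_margin_step(lane, step) > threshold)
                break;                 /* failure reported at this margin point */
            margin = step;
        }
        clear_margin_command(lane);
        return margin;                 /* in hardware-defined step units */
    }

    int main(void)
    {
        printf("lane 0 timing margin: %d steps\n",
               measure_timing_margin(0, 4, 16));
        return 0;
    }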

PCIe Specification: High Performance at Low Power
PCIe technology offers the best low-power solution in both active and idle modes. IP providers have reported active power at 5 pJ/bit and idle power at 10 microwatts, with idle exit latency in the tens of milliseconds. The PCIe architecture has introduced deep low-power sub-states, achieving the microwatt-level idle power the hand-held segment demands. It also provides an architected mechanism for system software and hardware to autonomously transition I/O devices into low-power link and device states. The PCIe architecture enables system software to statically or dynamically allocate different power levels to various I/O subsystems, making performance-power trade-offs based on system I/O usage patterns possible.
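
As a sanity check on such figures, energy per bit multiplied by line rate gives active power. A short C calculation applying the reported 5 pJ/bit to a single lane at the 16 GT/s PCIe 4.0 rate (an illustrative pairing of those two numbers):

    #include <stdio.h>

    int main(void)
    {
        /* Energy per bit times bit rate yields power. */
        double joules_per_bit = 5e-12;   /* 5 pJ/bit, as reported by IP vendors */
        double bits_per_sec   = 16e9;    /* 16 GT/s, one lane, one direction    */
        double lane_watts = joules_per_bit * bits_per_sec;

        printf("one lane, one direction: %.0f mW\n", lane_watts * 1e3);  /* 80 mW  */
        printf("x16 link, both directions: %.2f W\n", lane_watts * 32);  /* 2.56 W */
        return 0;
    }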

Continuous Evolution = Relevance
Continuous evolution to meet compute platforms’ emerging and evolving needs has been the key to PCIe technology’s success. I/O virtualization was introduced to enable device sharing among multiple virtual machines, taking advantage of dense compute and a platform’s ability to host multiple independent system images. This approach has been widely adopted in client and server systems.

Transaction processing hints enable networking devices to move data directly to and from the CPU caches, eliminating memory and coherency-management traffic. This function helps to address the aggressive networking bandwidth demand in the data center. Accelerators can deliver high throughput by taking advantage of atomic operations to work collaboratively with each other, as well as the main CPU, without the intervention of host software.

The storage industry is increasing its adoption of PCIe technology, benefitting from power-efficient performance, high-bandwidth scalability, and the direct-attach model enabled by high fan-out from the CPU. Separate Refclk Independent SSC (SRIS) clocking enables cost-effective cables for both storage and backplane interconnects.

Conclusion
The PCIe architecture is well positioned to remain the ubiquitous I/O for a wide range of computing applications for the foreseeable future. Its strength stems from being an open standard, backed by the collective expertise of 730+ member companies. PCI-SIG’s rich and successful experience navigating several technology transitions over the past three decades will ensure that PCIe technology continues to offer open, scalable, cost-effective, power-efficient, and leading-edge solutions across all market segments and usage models in the compute continuum.


Debendra Das Sharma is a Senior Principal Engineer and Director of the I/O Technology and Standards group at Intel, where he oversees interconnect technologies spanning USB, PCIe, coherency, MCP, SoC, and Rack Scale Architecture. He is also a member of the PCI-SIG Board of Directors, leads the PHY Logical Sub-team, and is an active contributor to multiple generations of the PCI Express specification. Debendra joined Intel in 2001 and led the development of several server chipsets, including the integration of PCIe Gen 3 into server CPUs. Prior to joining Intel, he was with Hewlett-Packard working on the development of server chipsets, including Superdome.
