Delivering 1 GIOPS Per Server with ReRAM

Tackling server design with low-latency, low-power storage subsystems based on new storage-class memory that removes the bottleneck on the compute/storage side.

Hyper-converged infrastructure (HCI) is disrupting the traditional storage and data center markets because it creates a way for organizations to consolidate their systems to cut back on costs and management. According to Gartner, “The market for hyper-converged integrated systems (HCIS) will grow 79 percent to reach almost $2 billion in 2016, propelling it toward mainstream use in the next five years. HCIS will be the fastest-growing segment of the overall market for integrated systems, reaching almost $5 billion, which is 24 percent of the market, by 2019.”

What Is HCI?
Hyper-converged infrastructure (HCI) is a framework that combines storage, computing, and networking into a single system, so organizations can reduce the complexity of their data centers through virtualization and software-defined storage and networking.

Pain Points of Hyper-Converged Infrastructure
The evolution of HCI brings new challenges. One is that hyper-converged infrastructure changes the scale-out dynamic: the basic elements of compute and storage can no longer scale independently. As a result, scale-out is achieved by adding a new node, which introduces new bottlenecks.

Hyper-converged applications require storage performance of multiple millions of IOPS because of their intensive I/O workloads. Yet current SSD technologies based on NAND Flash memory introduce significant latency, 100µs to 200µs for a read I/O. To compensate, IT architects developed techniques such as massive parallelization and workload distribution, which split data storage accesses across multiple NAND Flash components. As servers move toward hyper-convergence, it will become very challenging to hide the inherent limitations of NAND Flash from the application level.

ReRAM Revolutionizing Data Storage Technology
New technologies such as resistive RAM (ReRAM) are coming to market that will slash latency to less than 10 microseconds, enabling new products such as ultra-fast NVMe SSDs. Latencies will drop even further if designers use the memory bus rather than PCIe as the physical interface. Storage devices that sit on the memory bus will be NV-DIMMs, providing sub-microsecond latencies.
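The link between device latency and the parallelism needed to sustain a given IOPS rate follows from Little's law (mean requests in flight = throughput × latency). The minimal sketch below applies it to the latency figures quoted above. The 1 MIOPS target, the function name, and the dictionary labels are illustrative assumptions, and the 10 µs ReRAM value is the upper bound given in the text, not a measured figure.

```python
def ios_in_flight(target_iops: int, latency_us: float) -> float:
    """Little's law: mean I/Os in flight = throughput x latency."""
    return target_iops * latency_us / 1_000_000

# Assumed target for illustration only (the article says "multi-million IOPS").
TARGET_IOPS = 1_000_000

# Read latencies in microseconds, taken from the article.
READ_LATENCY_US = {
    "NAND Flash SSD (best case)": 100,
    "NAND Flash SSD (worst case)": 200,
    "ReRAM NVMe SSD (upper bound)": 10,
}

required = {
    name: ios_in_flight(TARGET_IOPS, latency)
    for name, latency in READ_LATENCY_US.items()
}

for name, depth in required.items():
    print(f"{name}: about {depth:.0f} I/Os in flight for {TARGET_IOPS:,} IOPS")
```

Under these assumptions, a tenfold drop in latency cuts the required concurrency tenfold, which is why lower-latency media depend less on deep queues and wide striping.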

While the substantial performance and power benefits of ReRAM can address the storage part of the equation, this new product category will require a fresh look at the CPU/compute side as well. System resources will continue to be consumed by the storage I/O. A new architecture will be necessary to ensure compute capabilities meet application and network interface needs while keeping power consumption low and bandwidth high.

Figure 1 illustrates a bottleneck on the compute/storage side: most resources are consumed by storage I/O, leaving too little computing capability for the application and the network interface. Managing 1 MIOPS with NVMe devices requires about 3.3 cores of a high-end CPU running full time, which is expensive in both power and cost. A typical 2U server integrates 24 SSDs, which yields 18 MIOPS with 750k IOPS SSDs, assuming the application sustains a high queue depth. I/O management alone therefore requires 18 × 3.3 ≈ 60 cores, or 75% of the resources of a high-end 4-CPU architecture, as Figure 1 shows. If those IOPS need to go over the network, the corresponding throughput is 18M × 4096 × 8 ≈ 600 Gbit/s, which corresponds to fifteen 40GbE ports.
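The arithmetic behind the flash-based example can be reproduced with a short sketch. All inputs come from the paragraph above; the variable names are illustrative, and the 80-core total is only inferred from the stated 75% share, not given in the article.

```python
import math

# Inputs taken from the article's flash-based 2U server example.
SSDS_PER_SERVER = 24
IOPS_PER_SSD = 750_000
CORES_PER_MIOPS = 3.3      # high-end CPU cores needed per 1 MIOPS over NVMe
IO_SIZE_BYTES = 4096       # I/O size used in the article's throughput formula
PORT_SPEED_GBITS = 40      # one 40GbE port

total_iops = SSDS_PER_SERVER * IOPS_PER_SSD               # 18,000,000
io_cores = total_iops / 1_000_000 * CORES_PER_MIOPS       # about 59.4
network_gbits = total_iops * IO_SIZE_BYTES * 8 / 1e9      # about 589.8
ports_needed = math.ceil(network_gbits / PORT_SPEED_GBITS)

# Inference, not a figure from the article: if 60 cores are 75% of the
# machine, the 4-CPU system has roughly 80 cores in total.
implied_total_cores = 60 / 0.75

print(f"Total IOPS:          {total_iops:,}")
print(f"Cores for I/O:       {io_cores:.1f} (rounded up: {math.ceil(io_cores)})")
print(f"Network throughput:  {network_gbits:.1f} Gbit/s")
print(f"40GbE ports needed:  {ports_needed}")
print(f"Implied total cores: {implied_total_cores:.0f}")
```

The exact values (59.4 cores, 589.8 Gbit/s) round to the 60 cores and 600 Gbit/s quoted in the text.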

Figure 1: 18 MIOPS Flash-based NVMe storage system

An Arm® RISC CPU provides enough computing capability for I/O management while keeping power consumption low and leaving enough bandwidth for the application and network driver. The combination of Crossbar ReRAM, through NVMe or NV-DIMM storage devices, and Arm RISC CPUs addresses both IOPS and power consumption. Assuming that an Arm RISC CPU with 64 cores and 4 memory channels becomes available within a reasonable power budget, we estimate that a hyper-converged node can reach 12.5 MIOPS within a 100W power budget (Figure 2). Because accessing a DIMM interface is simpler than accessing a PCIe device, we also estimate that the storage software driver will execute faster than the NVMe driver, leading to 1 MIOPS per core for I/O management. Thanks to the small form factor, about 20 such nodes could be integrated in a 2U chassis, yielding a 250 MIOPS 2U hyper-converged server with a 2kW power budget.
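As a rough check on these estimates, the sketch below scales one node up to a full 2U chassis. Every input is one of the article's own assumptions (12.5 MIOPS and 100W per node, 1 MIOPS per core, about 20 nodes per chassis); the variable names are illustrative.

```python
# Per-node estimates from the article (assumptions, not measurements).
NODE_MIOPS = 12.5
NODE_WATTS = 100
MIOPS_PER_CORE = 1.0       # estimated I/O management rate via the DIMM interface
NODES_PER_CHASSIS = 20     # "about 20" nodes in a 2U chassis

# Cores each node spends on I/O management at the estimated driver speed.
io_cores_per_node = NODE_MIOPS / MIOPS_PER_CORE

# Aggregate figures for the 2U chassis.
chassis_miops = NODE_MIOPS * NODES_PER_CHASSIS
chassis_watts = NODE_WATTS * NODES_PER_CHASSIS

print(f"I/O cores per node: {io_cores_per_node}")
print(f"Chassis throughput: {chassis_miops} MIOPS")
print(f"Chassis power:      {chassis_watts} W")
```

The result matches the 250 MIOPS and 2kW quoted for the chassis; both scale linearly with the node count, so a different packing density changes them proportionally.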

Figure 2: 12.5 MIOPS Hyper-Converged Node

In this case, carrying the IOPS over the network requires very high bandwidth: 250M × 4096 × 8 ≈ 8 Tbit/s, even though only 18% of the CPU resources are used for I/O management.
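To put that bandwidth in perspective, the following sketch converts an IOPS rate into network throughput and into an equivalent number of 40GbE ports. The helper name is hypothetical, and the port count for the ReRAM server is a derived illustration rather than a figure from the article.

```python
import math

def network_gbits(iops: int, io_size_bytes: int = 4096) -> float:
    """Throughput in Gbit/s needed to carry `iops` I/Os of the given size."""
    return iops * io_size_bytes * 8 / 1e9

PORT_SPEED_GBITS = 40  # one 40GbE port

flash_gbits = network_gbits(18_000_000)    # flash-based 2U server
reram_gbits = network_gbits(250_000_000)   # ReRAM-based 2U server

flash_ports = math.ceil(flash_gbits / PORT_SPEED_GBITS)
reram_ports = math.ceil(reram_gbits / PORT_SPEED_GBITS)

print(f"Flash server: {flash_gbits:.1f} Gbit/s, {flash_ports} x 40GbE")
print(f"ReRAM server: {reram_gbits / 1000:.3f} Tbit/s, {reram_ports} x 40GbE")
```

At 4096-byte I/Os the ReRAM server would need more than 200 such ports.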

Back at the user level, take virtualization as an example use case: such a server can run 83,000 virtual machines (VMs) in parallel at 3,000 IOPS per VM. A current flash-based 2U hyper-converged server integrating 24 2.5″ SSDs at 750k IOPS per SSD can run only 6,000 VMs, so 14 servers are needed to run the same number of VMs. Crossbar ReRAM provides about a 15x improvement in I/O performance density (up to 125 MIOPS/U) and performance efficiency (125k IOPS/W) at the server level.
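The VM-density comparison can be worked through as follows. The inputs are the article's figures; the variable names are illustrative, and the computed IOPS ratio is shown unrounded next to the article's "about 15x".

```python
import math

IOPS_PER_VM = 3_000
RERAM_SERVER_IOPS = 250_000_000      # 2U ReRAM-based server
FLASH_SERVER_IOPS = 24 * 750_000     # 2U flash-based server, 24 SSDs
RACK_UNITS = 2
RERAM_SERVER_WATTS = 2_000

# Number of VMs each server can sustain at 3,000 IOPS per VM.
reram_vms = RERAM_SERVER_IOPS // IOPS_PER_VM
flash_vms = FLASH_SERVER_IOPS // IOPS_PER_VM

# Flash-based servers needed to host the same number of VMs.
flash_servers_needed = math.ceil(reram_vms / flash_vms)

# Density and efficiency of the ReRAM-based server.
miops_per_u = RERAM_SERVER_IOPS / RACK_UNITS / 1e6
kiops_per_watt = RERAM_SERVER_IOPS / RERAM_SERVER_WATTS / 1e3

iops_ratio = RERAM_SERVER_IOPS / FLASH_SERVER_IOPS

print(f"ReRAM server VMs:     {reram_vms:,}")
print(f"Flash server VMs:     {flash_vms:,}")
print(f"Flash servers needed: {flash_servers_needed}")
print(f"Density:              {miops_per_u:.0f} MIOPS/U")
print(f"Efficiency:           {kiops_per_watt:.0f}k IOPS/W")
print(f"IOPS ratio:           {iops_ratio:.1f}x")
```

The unrounded VM count is 83,333, which the article rounds to 83,000, and the raw IOPS ratio between the two servers comes out just under 14x.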

Figure 3: 83,000 VMs on Hyper-Converged Servers

For the same number of VMs, users will benefit from a reduced TCO because a more integrated solution delivers the same performance with less space, less power, and fewer software licenses.

Table 1

R&D efforts are still required on two fronts. On the compute/network side, network interfaces need to reach the range of a few Tb/s. On the software side, storage driver execution time must be reduced, because Crossbar ReRAM enables smaller I/Os (512B or even smaller, see Table 1) that can be used in big data analytics and OLTP database applications, leading to 1 GIOPS per U in the server.

Sylvain Dubois is Vice President of Strategic Marketing and Business Development at Crossbar, Inc., where he joined the management team in 2013. With over 17 years of semiconductor experience in business development and strategic product marketing, he brings a proven ability to analyze market trends, identify new, profitable business opportunities, and create precise product positioning that is in sync with market demands to drive market share leadership and business results.

Dubois holds a Master of Science in Microelectronics from E.S.I.E.E. (Paris), University of Southampton (UK) and Universidad Pontificia Comillas (Spain).