Delivering Carrier Grade OCP to Telco Data Centers



Carriers and web service companies have much in common, and some notable differences, when it comes to capitalizing on a common set of hardware solutions.

Many communications service providers (CSPs) are looking to adopt COTS hardware and virtualize many of their applications, deploying these diverse workloads on a common pool of hardware resources (Figure 1). The potential savings from COTS computing and networking hardware are also creating great interest in the latest data center innovation: open compute technologies.

Pioneered and promoted by the Open Compute Project (OCP), these technologies focus on the most efficient and economical ways of scaling COTS computing infrastructure. Founded by Facebook, the OCP’s original objective was to guide, from the ground up, the design of the most cost-efficient data center infrastructure.

Figure 1:  Virtualization enables CSPs to run many applications on cost-effective, standardized, multi-vendor hardware. Deploying cloud technologies on this COTS hardware enables greater agility in service delivery.

Facebook sought a new rack-scale architecture that would use generic servers to make its new data centers as low cost and efficient as possible. The company viewed traditional rackmount servers as falling short in several areas.

Terms and Acronyms
POD—A logical and/or physical collection of racks within a shared infrastructure management domain.
POD Manager—The software that manages logical groupings of functionality across all infrastructure in a pod.
RMM—Rack Management Module. A physical system element that is responsible for managing the rack, which normally assigns IDs for the instances of PSME in the rack, and manages rack power and cooling.
PSME—Pooled System Management Engine. System management software that runs on the DMC and is responsible for the configuration of pooled storage modules by the Pooled Node Controller (PNC), the network (SDN), the compute modules, and the switches.
MMC—Module Management Controller. The controller that manages the blades in the module.
BMC—Baseboard Management Controller. A specialized service processor that monitors the physical state of a computer and provides services to monitor and control certain compute/storage module operations.
ME—Management Engine. A physical hardware resource that gives access to hardware features at the baseboard level below the operating system.
BIOS—Basic Input/Output System. Firmware that initializes and tests compute/storage module hardware components and loads a boot loader or an operating system from a mass memory device.

Traditional Rackmount Server Drawbacks

  • Each has a unique form factor, I/O, and management system, locking a company into a single supplier.
  • These servers typically have individual AC-DC power supplies, increasing cabling and power costs and limiting the ability for centralized management.
  • Real estate is in short supply, curtailing attempts to increase functionality.

Facebook wanted a new rack-scale solution in which all servers would be identical no matter what company manufactures them. Servers needed to be powered, plugged into the rack, and cabled in the same manner. Determined to remove anything that didn’t contribute to efficiency, Facebook even had manufacturers remove server faceplates and other metalwork, choosing to handle the regulatory EMC shielding at the facility level instead of at the server or rack level.

Today, the goal of the OCP is to spark a collaborative dialog and effort among peers on OCP technology, collectively developing the most efficient computing infrastructure possible. Project focus includes addressing servers, storage, networking, hardware management, Open Rack (a rack standard), data center design, and certifications for solution providers.

Leveraging Open Compute and Open Rack
Some applications must be hosted in central offices or similar environments at the edge of the network. Many CSPs want to apply the principles of the OCP in their network infrastructure, but the specification doesn’t lend itself to CSPs’ maintenance and other equipment practices.

So, a group of carriers and technology vendors collaborated to use Open Compute and Open Rack as a base model and adapt them to telecom central office environments and carrier grade building practices. The result is the CG-OpenRack-19 specification, which has been designated OCP-ACCEPTED™.

Figure 2:  Front and back of a CG-OpenRack-19 system (image courtesy of Pentair-Schroff).

CG-OpenRack-19 is a scalable carrier grade rack-level system that integrates high-performance compute, storage, and networking in a standard rack (Figure 2). It brings the OCP to carriers, tracking (but decoupled from) the changes driven by web companies and allowing compute, storage, and acceleration to scale independently. Capital expense (CAPEX) is driven down by flattening the supply chain, leveraging OCP economies of scale, and driving competition through an open source specification, while OPEX benefits from lower power consumption and reduced maintenance overhead.

Figure 3:  The anatomy of a typical CG-OpenRack-19 compute sled.

Functional Elements
There are six major system elements (Figures 3 and 4):

  1. System rack (19-inch)
  2. Power conversion and distribution via dual 12V bus bars
  3. White-box top-of-rack switches for optimized cable handling
  4. Two sizes of open bays for compute and storage elements (full- and half-width)
  5. Full- or half-width sleds, each with a single optical header on the back for connectivity
  6. Pre-wired blind mate optical backplane

Figure 4:  Functional elements of the CG-OpenRack-19 systems.

CG-OpenRack-19 Specification Management Strategy
The CG-OpenRack-19 specification requires each sled to have a dedicated baseboard management controller (BMC), fully compliant with IPMI 2.0 and DCMI 1.5, for various out-of-band platform management services. The specification goes on to stipulate certain conditions that the BMC should meet, but implementation details are left to the developer.
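Because the specification leaves BMC implementation details to the developer, the following is only an illustrative sketch of how an operator might poll a sled’s BMC over the out-of-band network from Python using the standard ipmitool utility, which covers both IPMI 2.0 and DCMI commands. The host address, credentials, and helper name are placeholders, and ipmitool is assumed to be installed on the management host.

import subprocess

def query_sled_bmc(host, user, password):
    """Poll a sled BMC over IPMI 2.0 (lanplus) for basic health data.

    The address and credentials are placeholders; any IPMI 2.0 / DCMI
    capable BMC reachable on the out-of-band management network should
    answer these standard requests.
    """
    base = ["ipmitool", "-I", "lanplus",
            "-H", host, "-U", user, "-P", password]

    # Chassis power state (standard IPMI chassis command)
    power = subprocess.run(base + ["chassis", "power", "status"],
                           capture_output=True, text=True, check=True)

    # Platform power reading (DCMI extension)
    reading = subprocess.run(base + ["dcmi", "power", "reading"],
                             capture_output=True, text=True, check=True)

    return power.stdout.strip(), reading.stdout.strip()

if __name__ == "__main__":
    state, watts = query_sled_bmc("192.0.2.10", "admin", "secret")  # placeholder values
    print(state)
    print(watts)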

Beyond the CG-OpenRack-19 specification is another open, industry-standard platform management scheme called Redfish®, managed by the Distributed Management Task Force (DMTF). See Figure 5.

Redfish is a hierarchical pod/data center management interface, where a pod is a pool of compute resources. It recognizes that the scale-out hardware usage model differs from that of traditional enterprise platforms and requires a new approach to management.

While Redfish is not a part of the CG-OpenRack-19 specification, it is one of the most popular platform management approaches and is a requirement for a solution to be Intel® Rack Scale Design (Intel® RSD) compliant.
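To make the Redfish model more concrete, the sketch below walks the standard Redfish service root exposed by a management controller and lists the systems it manages. It assumes a Redfish-capable controller (something CG-OpenRack-19 itself does not mandate) at a placeholder address with placeholder credentials.

import requests

BMC = "https://192.0.2.10"      # placeholder management controller address
AUTH = ("admin", "secret")      # placeholder credentials

# Every Redfish service publishes a versioned service root at /redfish/v1/.
# verify=False is only for this sketch; production code should validate TLS.
root = requests.get(f"{BMC}/redfish/v1/", auth=AUTH, verify=False).json()

# The Systems collection enumerates the compute nodes behind this service.
systems_uri = root["Systems"]["@odata.id"]
systems = requests.get(f"{BMC}{systems_uri}", auth=AUTH, verify=False).json()

for member in systems["Members"]:
    system = requests.get(f"{BMC}{member['@odata.id']}",
                          auth=AUTH, verify=False).json()
    # Name, PowerState, and Status/Health are standard Redfish properties.
    print(system.get("Name"),
          system.get("PowerState"),
          system.get("Status", {}).get("Health"))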

Figure 5: Redfish® pod logical hierarchy. See terms and acronyms.

 

Comparing CG-OpenRack-19 to OCP
There are some key differences between the OCP specification and carrier grade OCP as implemented in CG-OpenRack-19. For example, while web companies are comfortable with rack-level field replaceable units (FRUs), carriers with many more physical sites need to reduce costly on-site installation, maintenance, and repair activities. CG-OpenRack-19 cuts these costs through simpler hardware design, one example being sled-level field replaceable units with a minimal-touch environment for service personnel.

Other differences found when comparing CG-OpenRack-19 and OCP include the following:

Physical:

  • Suitable for current central office and new telco data center environments
  • 19-inch rack-mounting defined (versus 21-inch in OCP)
  • Standard rack unit (RU) spacing
  • 1000 to 1200 mm cabinet depth, supporting GR-3160 spatial requirements
  • EMI shielding at the sled level
  • Terabit-capable blind mate optical cabled backplane with the ability to individually hot swap sleds
  • Consistent hardware user interface across different vendors to shorten the learning curve
  • Option for central office seismic, acoustic, and safety standards (NEBS)

System Management

  • Ethernet based out-of-band (OOB) device management network connecting all nodes and power shelf via a top-of-rack (TOR) switch
  • Ethernet based OOB application management network connecting all nodes via a TOR switch
  • Optional rack-level platform manager

Networking/Interconnect

  • One or more Ethernet TOR switches for I/O aggregation to nodes
  • Pre-cabled design—fiber cables in rack, blind mate to node with flexible interconnect mapping
  • Durable, blind-mating coupling connectors provide for rapid insertion/extraction and prevent accidental damage or incorrect placement

Figure 6: CG19-GPU sled from Artesyn Embedded Technologies.

What do Carriers See in CG-OpenRack-19?

Volume
As an example, an operator with 150 data centers might install 40 racks in each location, a total of 6,000 racks, which is still tiny compared with web-scale companies such as Facebook, Google, and Amazon. An open specification therefore lets many such customers adopt the same design and combine their purchasing power to drive down cost. A homogeneous payload form factor simplifies training and maintenance, prevents vendor lock-in, and facilitates technology upgrades.

Time-to-market
Verizon has estimated that by adopting the CG-OpenRack-19 specification, it has cut the deployment time for a typical 30-frame installation from two months to just two days.[1] This has significant benefits for the agility of the business and the services that the carrier provides.

Multiple processor architectures
The speed with which different processor architectures can be implemented in the specification supports the different workloads of a carrier versus the more homogeneous workloads of a typical web company.

Agility
Bringing open source principles to hardware sourcing enables CSPs to move to a software-centric business model and focus on the applications and services that will differentiate them. A disaggregated hardware architecture allows individual elements, such as compute and storage, to be scaled independently according to the needs of the application.

Multi-vendor interoperability
The CG-OpenRack-19 initiative aims not just to build an ecosystem, but to ensure interoperability between elements from different vendors. Artesyn designed a GPU accelerator sled with up to four NVIDIA cards for a CG-OpenRack-19 rack deployed by a third party. The carrier plugged in the sled, and it worked correctly the first time. Anyone with experience in telecom system integration will know that doesn’t happen every time!

A proven model
Facebook has reported that its OCP data centers are 38 percent more energy efficient, doing the same work as its other data centers at 24 percent less cost.

Over three years, the company achieved enough energy savings to power 80,000 homes. Carbon emission reductions of around 400,000 metric tons are equivalent to taking 95,000 cars off the road.[2]

Suitability
CG-OpenRack-19 was conceived with the deployment and maintenance needs of communication service providers in mind. An example of the ease of maintenance is that sleds can be inserted/extracted in a matter of seconds. An example of the scalability of the networking infrastructure is that the Ethernet interfaces between compute, storage, and other sleds can support from 1G to 100G, as needed.

Co-location is also an issue for many CSPs, who wish to deploy open compute technologies in existing data centers, side-by-side with other equipment. In that context, the equipment must have its own EMI shielding.

One Infrastructure. Any Workload.
CSPs are embarking on an exciting period of business transformation. The ability to use high-volume COTS servers to implement cloud technologies, such as virtualization and OpenStack, will help them reduce CAPEX and OPEX, unleash new flexibility and elasticity in their operations, and radically improve their time to market for new services.

While OCP provides an excellent solution for the enterprise data center, CSPs require a higher grade of hardware platform designed to meet their more challenging needs for low-latency performance, bandwidth scalability, reliability and serviceability, and regulatory and safety compliance.

Considering how critical and specialized their current equipment is, the shift requires a carefully managed transition.

Working with the OCP and other industry bodies, companies like Artesyn have developed solutions that will meet CSP requirements with standardized architectures such as CG-OpenRack-19 (Figure 6). Through a performance-optimized solution that maximizes data flows to virtualized applications while maintaining high reliability, CSPs in the future should easily and confidently be able to implement open compute solutions.

These solutions will enable open compute successes not only for their enterprise data center needs, but also for NFV solutions and a range of new innovative services that will help them better compete in the cloud provider and communications industries.


Todd Wynia is Vice President of Communications Products for Artesyn Embedded Technologies. He has written a number of white papers on industry standards and the telecom industry, has served on the boards of CP-TA and VITA, and participates extensively in the PCI Industrial Computer Manufacturers Group (PICMG). Todd is a graduate of the University of Wisconsin, where he earned his B.S. in economics with a math emphasis.

[1] https://www.rcrwireless.com/20170310/carriers/verizon-touts-data-center-simplification-from-carrier-grade-rack-platform-tag2

[2] https://www.facebook.com/notes/facebook-engineering/building-efficient-data-centers-with-the-open-compute-project/10150144039563920/
