Multicore Comes of Age

The move to multicore is now well under way, in applications from smartphones to networking equipment, and the door is even cracking open for safety-critical applications.

Multicore is becoming an expectation in nearly every type of embedded system. While hurdles still exist in the most stringent safety-critical arenas, the technology is beginning to make inroads even there. Our roundtable panel discusses the ongoing challenges for developers, the new specifications and tools emerging to address them, and exciting developments in virtualization and high-speed interconnects. Participants in this roundtable are Bill Graham, director of product marketing for Wind River; Pekka Varis, CTO for ARM and DSP Processors at Texas Instruments; and Mark Throndson, director of business development at Imagination.

EECatalog: As recently as 4 years ago we had dual cores and programmers were still confused as to how to program them. Fast-forward to today’s 8-core processors—how are developers coping?

Bill Graham, Wind River: Although challenges still exist with programming for the true parallelism that multicore processors bring, embedded developers are realizing that multicore processors offer architecture options that weren't available before. Rather than trying to reprogram their applications to make use of 4, 8, 16 or more cores, they are porting their applications as-is to a single core and leveraging virtualization and the improved processing-to-power ratio to consolidate multiple systems. Dealing with multicore at the higher architecture level is paying off in significant ways despite the new programming challenges.

Pekka Varis, Texas Instruments: Developers for certain applications are coping just fine, but there is no single silver bullet. Take networking, for example: several implementations achieve linear or nearly linear scaling with the number of cores. One of the more commonly known is the 6WINDGate networking stack, but similar approaches have been used before and are in use in parallel. In the ARM® world, Linaro™ is standardizing this under the name OpenDataPlane (ODP). In the embedded world with a more computational focus (for example, scaling fast Fourier transforms (FFTs)), standardization is happening with OpenMP on TI KeyStone multicore DSPs. However, applications that rely on heavy reuse of software written years ago remain a problem for developers.

Mark Throndson, Imagination: The long-standing methods of achieving higher performance through increasing clock frequency, process migration and maximizing single-thread CPU performance have yielded diminishing returns for some time. The move to multicore was a natural progression to enable performance scaling. Today, there is enough parallelism in many software workloads to readily make use of multiple cores or threads in a CPU. And given that this is a key path for increasing CPU performance, there will be a continued focus on increasing the parallelism in the software. However, thanks to advances in heterogeneous compute, developers today can also use the GPU for the heavy lifting in an algorithm. Because the GPU is inherently a massively multi-threaded machine, it can handle these types of tasks much more efficiently.

EECatalog: What is the impact of the Multicore Association’s new SHIM spec? What else is in the works that will improve the efficiency of multicore/many-core programming?

Graham, Wind River: Although it's an emerging specification, SHIM promises an open standard for defining multicore architecture descriptions for software and tools. As we enter the many-core era, programming will require tool support in order to succeed; SHIM is working to make this possible. SHIM and the other standards the Multicore Association is working on will create an open, standards-based environment for programmers, tools developers and vendors. The Multicore Association's other working groups, such as MCAPI, are furthering this standardization effort. In a similar vein, Intel has been promoting various programming options for multicore and many-core, such as OpenMP, Intel Threading Building Blocks and, most recently, Cilk Plus.

Varis, Texas Instruments: The Multicore Association's SHIM is at an early stage. Unifying the low-level interface so that tools and compilers can leverage whatever infrastructure a given multicore device provides should let tools focus on the bigger issues rather than SoC-specific nuances. However, for SHIM to have a significant impact it will be important that it becomes part of a widely adopted standard.

EECatalog: What is changing in the use of multicore processors in safety-critical applications? How are new RTOSes with separate kernel and hypervisor impacting that?

Graham, Wind River: The significant change is the acceptance of multicore-based systems by safety certification agencies around the world. A significant hurdle is designing and developing multicore systems that meet strict safety standards such as IEC 61508 and DO-178B/C. Solutions that offer already-proven separation technology such as ARINC-653 on multicore platforms are the most likely to succeed. Similarly, bare-metal hypervisors are also being accepted in safety-critical design. In many ways, these virtualization and partitioning technologies, already proven on single-core systems, are the key to success in the multicore transition for safety-critical systems.

Throndson, Imagination: Traditionally, ensuring reliability and security in many safety-critical applications was done by separating tasks onto multiple independent CPUs. By running an RTOS with separate kernel and hypervisor on a CPU platform, the separation and prioritization of tasks can potentially be done securely and reliably on one CPU. This can be done as a software-only, para-virtualized implementation; however, using a CPU family with hardware virtualization can minimize the overhead as well as leverage existing software. This doesn't mean that multicore isn't necessary; it simply means that performance increases in importance as a motivation for a multicore implementation.

EECatalog: What interesting virtualization capabilities relying on multicore are taking place?

Graham, Wind River: Multicore processors are the enabling technology for virtualization. Although completely functional on single-core systems, the improved processing-to-power (and size and weight) ratio means that use cases for virtualization are more compelling today. The addition of hardware support for virtualization is removing most of the overhead, and real-time responsiveness with low latency is now possible with embedded virtualization. Probably the most exciting thing that virtualization brings is the new architecture options embedded developers enjoy. Consolidation of multiple systems into one is now practical: for example, industrial control systems with hundreds of programmable logic controllers, user interfaces, data acquisition and network gateways can be integrated on a single platform. In networking infrastructure, network functions virtualization (NFV) has taken off as a way to consolidate multiple, complex and expensive custom hardware pieces into equivalent software services running on IT, server-grade hardware. Multicore processing is the key enabler that has made virtualization a reality in these new applications.

Varis, Texas Instruments: In the embedded space, multicore devices have always needed an element of virtualization to allow sharing of peripherals and accelerators. Initial approaches relied on multiple sets of registers, one per core, or later on, one per virtual machine. But for higher-end cores such as the ARM® Cortex®-A series with deep out-of-order pipelines, any access outside cached memory carries a performance cost.

Ring-type structures in memory, kept coherent between the high-end cores and the I/O, seem promising, but for simpler cores and deterministic applications, hardware queues make sense. Regardless, the hardware and the associated direct memory access engines must be able to parse and understand the same structures as the software.

Throndson, Imagination: Virtualization has traditionally been used in servers needing many nodes operating separately in parallel, but the use of this technology is expanding into an increasing variety of applications. It can also be used for mixed-mode Linux and real-time environments in separate domains, with open-source application processing in one and real-time, latency-sensitive tasks, with QoS/priority, running in another.

There are many high-volume, consumer-oriented devices depending on multicore levels of performance today, including smartphones, tablets, high-end TVs and more. Virtualization becomes compelling as a solution in these applications to address the growing requirement for a scalable security implementation. It provides a scheme for independent, secure domains for DRM, content protection, transactions, identity protection and so on. We've recognized the importance of hardware virtualization, and that's why it's a foundational technology across the entire range of our MIPS Series5 Warrior CPUs.

EECatalog: What is the impact of inter-processor communication (IPC) buses like Intel's QuickPath or Xilinx's RocketIO, or even Serial RapidIO with its RDMA? Do they extend the concept of closely coupled multicore to non-contiguous/non-homogeneous “multiple cores”?

Graham, Wind River: These interconnect technologies promise the same or better data rates than processor buses such as PCIe and fully buffered memory. This means that multicore can extend beyond a single physical processor to many processors without significant communication overhead. So yes, these extremely fast buses will extend multicore to many of the same or different (heterogeneous multicore) types of CPUs and I/O devices. As the consolidation use case becomes more and more prevalent, combining multiple systems with multiple CPU architectures on the same circuit board is increasingly desirable. QuickPath, RocketIO and SRIO make this a reality.

Varis, Texas Instruments: Interchip buses have been used in embedded systems for a while, in everything from base stations built from dozens of processors connected with SRIO to supercomputers with thousands of processors. Typically, a proprietary interconnect such as the two above or TI's HyperLink leverages high-speed SERDES and imposes some restrictions in order to achieve more bandwidth, or lower power per unit of bandwidth, than a standard interconnect. However, this approach creates fragmentation and interoperability barriers.

Serial RapidIO has a lot of good attributes, and it is a standard that has been proven to interwork, although some of the players in the market might prefer the fragmentation and success of proprietary technologies.

Cheryl Berglund Coupé is managing editor of this publication. Her articles have appeared in EE Times, Electronic Business, Microsoft Embedded Review and Windows Developer's Journal, and she has developed presentations for the Embedded Systems Conference and ICSPAT. She has held a variety of production, technical marketing and writing positions within technology companies and agencies in the Northwest.
