New HSA Spec Legitimizes AMD’s CPU+GPU Approach

Nearly three years after the formation of the Heterogeneous System Architecture (HSA) Foundation, the consortium has released version 1.0 of the Architecture Spec, Programmer’s Reference Manual, Runtime Specification and a Conformance Plan.

Note: This blog is sponsored by AMD.


UPDATE 3/17/15: Added Imagination Technologies as one of the HSA founders. C2

No one doubts the wisdom of AMD’s Accelerated Processing Unit (APU) approach, which combines an x86 CPU with a Radeon GPU. After all, one SoC does it all: it makes CPU decisions and drives multiple screens, right?

True. Both AMD’s G-Series and R-Series do all that, and more. But that misses the point.

In laptops this is how one uses the APU. But in embedded applications, such as the IoT of the future that increasingly relies on high performance embedded computing (HPEC) at the network’s edge, the GPU functions as a coprocessor. CPU + GPGPU (general-purpose graphics processing unit) is a powerful combination of decision-making plus parallel/algorithm processing that does local, at-the-node processing, reducing the burden on the cloud. This, according to AMD, is how the IoT will reach tens of billions of units so quickly.

Trouble is, HPEC programming is difficult. Coding the GPU requires a “ninja programmer”, as AMD’s VP of embedded Scott Aylor quipped during his keynote at this year’s Embedded World Conference in Germany. (Video of the keynote is here.) Worse still, capitalizing on the CPU + GPGPU combination requires passing data between the two architectures, which don’t share a unified memory architecture. (It’s not that AMD’s APU couldn’t be designed that way; rather, the processors require different memory architectures for maximum performance. In short: they’re different for a reason.)

AMD’s Scott Aylor giving keynote speech at Embedded World, 2015. His message: some IoT nodes demand high-performance heterogeneous computing at the edge.


AMD realized this limitation years ago and in 2012 catalyzed the HSA Foundation with several companies including ARM, Texas Instruments, Imagination Technologies, MediaTek, Qualcomm, Samsung and others. The goal was to create a set of specifications that not only define heterogeneous hardware architectures but also create an HPEC programming paradigm for CPU, GPU, DSP and other compute elements. Collectively, the goal was to make designing, programming, and power optimizing easy for heterogeneous SoCs (Figure).

Heterogeneous systems architecture (HSA) specifications version 1.0 by the HSA Foundation, March 2015.

The HSA Foundation’s goals are realized by making the coder’s job easier using tools, such as an HSA version of the LLVM open source compiler, that integrate multiple cores’ ISAs. (Courtesy: HSA Foundation; all rights reserved.)

After three years of work, the HSA Foundation has just released its specifications at version 1.0:

  • HSA System Architecture Spec: defines hardware and OS requirements, the memory model (important!), the signaling paradigm, and fault handling.
  • Programmer’s Reference Manual: essentially a virtual ISA for parallel computing; defines an output format for HSA language compilers.
  • HSA Runtime Spec: an application library for running HSA applications; defines initialization, user queues, and memory management.
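To make the runtime’s role a little more concrete, here is a toy Python sketch of the user-queue idea: the application places work packets in a queue, and a compute agent drains them. It is an illustration under assumptions only. `UserQueue`, `Agent`, `enqueue`, `drain`, and `scale` are hypothetical names, not the HSA Runtime API, and a real HSA queue would be consumed by hardware rather than by a Python loop.

```python
from collections import deque

# Hypothetical names throughout; this is not the HSA Runtime API.


class UserQueue:
    """A queue the application fills directly with work packets."""

    def __init__(self):
        self.packets = deque()

    def enqueue(self, kernel, *args):
        # A packet here is just a callable plus its arguments.
        self.packets.append((kernel, args))


class Agent:
    """Stand-in for a compute element (CPU, GPU, or DSP)."""

    def __init__(self, name):
        self.name = name

    def drain(self, queue):
        # Run packets in submission order and collect their results.
        results = []
        while queue.packets:
            kernel, args = queue.packets.popleft()
            results.append(kernel(*args))
        return results


def scale(values, factor):
    """Example 'kernel': multiply every element by a factor."""
    return [v * factor for v in values]


queue = UserQueue()
queue.enqueue(scale, [1, 2, 3], 2)
queue.enqueue(sum, [4, 5, 6])
gpu = Agent("gpu0")
results = gpu.drain(queue)
```

The point of the sketch is the division of labor: the application only fills the queue, and whichever agent owns the queue decides when the work actually runs.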

With HSA, the magic really does happen under the hood, where the devil’s in the details. For example, the HSA version of the LLVM open source compiler emits a vendor-agnostic HSA intermediate language (HSAIL) that’s essentially the instruction set of a low-level virtual machine. From there, “finalizers” compile HSAIL into vendor-specific ISAs, such as those of AMD or Qualcomm Snapdragon parts. It’s at this point that low-level libraries can be added for specific silicon implementations (such as VSIPL for vector math). This programming model uses vendor-specific tools, yet it lets novice programmers start in C++ and end up with optimized, performance-oriented, low-power code for the heterogeneous combination of CPU+GPU or DSP.
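The two-stage flow described above (one portable intermediate form, then a per-vendor finalizer) can be sketched in a few lines of Python. Everything here is invented for illustration: the tuple-based intermediate program is not real HSAIL, and `vendor_a`, `vendor_b`, and their mnemonics are hypothetical placeholders rather than any actual ISA.

```python
# Toy model of "compile once to a portable IL, finalize per vendor".
# The IL format, vendor names, and mnemonics are all hypothetical.

# Portable intermediate program: (opcode, dest, src1, src2).
# It computes y = a * x + b in two steps.
PORTABLE_IL = [
    ("mul", "t0", "a", "x"),
    ("add", "y", "t0", "b"),
]

# Each hypothetical finalizer maps portable opcodes to its own mnemonics.
FINALIZERS = {
    "vendor_a": {"mul": "VMUL", "add": "VADD"},
    "vendor_b": {"mul": "fmul.v", "add": "fadd.v"},
}


def finalize(il, target):
    """Translate the portable program into one target's instruction text."""
    table = FINALIZERS[target]
    return [f"{table[op]} {dst}, {s1}, {s2}" for op, dst, s1, s2 in il]


for target in FINALIZERS:
    print(target, finalize(PORTABLE_IL, target))
```

The same `PORTABLE_IL` feeds both targets unchanged, which is the property that lets the front-end compiler stay vendor-agnostic while each silicon vendor supplies only the last step.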

Currently 43 companies, 16 universities, and three working groups are involved with HSA (and they’re already working on version 1.1). Look at the participants, think of their market positions, and you’ll see they have a vested interest in making this a success.

In AMD’s case, as the only x86 and ARM + GPU APU supplier to the embedded market, the company sees even bigger successes as more embedded applications leverage heterogeneous parallel processing.

One example where HSA could be leveraged, said Phil Rogers, President of the HSA Foundation, is for multi-party video chatting. An HSA-compliant heterogeneous architecture would allow the processors to work in a single (virtual) memory pool and avoid the multiple data set copies—and processor churn—prevalent in current programming models.
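A rough way to picture the difference is to count the bytes that move. The Python toy below contrasts a copy-based hand-off with processing a frame in place through a shared view. It does not call any HSA runtime; `brighten_with_copies` and `brighten_shared` are hypothetical names, and the loop merely stands in for a GPU kernel.

```python
# Toy model only: contrasts copy-based hand-off with a shared buffer.
# Nothing here touches a real GPU or HSA runtime; names are hypothetical.


def brighten_with_copies(frame):
    """Copy the frame to 'device' memory, process it, and copy it back."""
    bytes_copied = 0
    device_buf = bytearray(frame)            # host -> device copy
    bytes_copied += len(device_buf)
    for i in range(len(device_buf)):         # stand-in for the GPU kernel
        device_buf[i] = min(device_buf[i] + 10, 255)
    result = bytearray(device_buf)           # device -> host copy
    bytes_copied += len(result)
    return result, bytes_copied


def brighten_shared(frame):
    """Process the frame in place through a view of the same memory."""
    view = memoryview(frame)                 # same memory, no copy made
    for i in range(len(view)):               # stand-in for the GPU kernel
        view[i] = min(view[i] + 10, 255)
    return frame, 0


frame = bytearray([0, 100, 250])
copied_result, copied_bytes = brighten_with_copies(frame)
shared_result, shared_bytes = brighten_shared(frame)
```

In the copy-based version every frame crosses the boundary twice, so the traffic grows with both frame size and the number of chat participants; in the shared version the only cost is the processing itself.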

With key industry players supporting HSA including AMD, ARM, Imagination Technologies, Samsung, Qualcomm, MediaTek and others, a lot of x86, ARM, and MIPS-based SoCs are likely to be compliant with the specification. That should kick off a bunch of interesting software development leading to a new wave of high performance applications.

SMARC: ARM’d for a Power Play

ARM is migrating into the embedded board market, at the expense of x86 designs.


In the world of multicore, it’s hard to get more cores than the quads now shipping in the latest smartphones, most of which are based upon ARM. But what about the board-level embedded market that I follow more closely?

You know it’s a foregone conclusion that ARM’s going to win the low power wars here too when even the x86 PC/104 vendors start musing about the need for ARM roadmaps.

 

WinSystems VP Bob Burckle spins a PC/104 board. The company is considering adding ARM processors to its predominantly x86-based boards.


In my discussion with WinSystems, a company that helps drive usually Intel-focused x86 trade consortia, Bob Burckle pondered an open standard form factor for ARM-based single board computers.

I’ve come to learn that ADLINK, Congatec, Kontron and others have pushed the very concept of ARM-based SBCs through the Standardization Group for Embedded Technologies (SGET) in a computer-on-module (COM) standard they’re calling Smart Mobility ARChitecture (SMARC) version 1.0.


Smart Mobility Architecture (SMARC) is a COM processor module ideally suited for ARM processors. (Courtesy: Standardization Group for Embedded Technologies, SGET.org.)

It comes in 82mm x 50mm and 82mm x 80mm flavors, and Kontron is already implementing it for aircraft passenger In-Flight Entertainment (IFE) systems.

Figure 2: Kontron IFE plane cut-away.

Look for ARM processors on PC/104, VME, COM Express…and SMARC boards soon. Choices will be from Texas Instruments, Atmel, Qualcomm, NVIDIA, Xilinx, and even AMD (which licensed ARM technology for security engines in its APUs).

Kontron SMARC-sAT30 is a low profile platform based on the SMARC specification that integrates the 1.2 GHz NVIDIA Tegra 3 quad-core ARM processor (Cortex A9).
