AMD Targets Embedded Graphics

As the PC market flounders, AMD continues focus on embedded, this time with three (3) new GPU families.

The widescreen LCD digital sign at my doctor’s office tells me today’s date, that it’s flu season, and that various health maintenance clinics are available if only I’d sign up. I feel guilty every time.

An electronic digital sign, mostly text based. (Courtesy: Wikimedia Commons.)

An electronic digital sign, mostly text based. (Courtesy: Wikimedia Commons.)

These kind of static, text-only displays are not the kind of digital sign that GPU powerhouses like AMD are targeting. Microsoft Windows-based text running in an endless loop requires no graphics or imaging horsepower at all.

Instead, high performance is captured in those Minority Report multimedia messages that move with you across multiple screens down a hallway; the immersive Vegas-style electronic gaming machines that attract senior citizens like moths to a flame; and the portable ultrasound machine that gives a nervous mother the first images of her baby in HD. These are the kinds of embedded systems that need high-performance graphics, imaging, and encode/decode hardware.

AMD announced three new embedded graphics families, spanning low power (4 displays) ranging up to 6 displays and 1.5 TFLOPs of number crunching for high-end GPU graphics processing.

AMD announced three new embedded graphics families, spanning low power (4 displays) ranging up to 6 displays and 1.5 TFLOPs of number crunching for high-end GPU graphics processing.

Advanced Micro Devices wants you to think of their GPUs for your next embedded system.

AMD just announced a collection of three new embedded graphics processor families using 28nm process technology designed to span the gamut from multi-display and low power all the way up to a near doubling of performance at the high end.  Within each new family, AMD is looking to differentiate from the competition at both the chip- and module/board-level. Competition comes mostly from Nvidia discrete GPUs, although some Intel processors and ARM-based SoCs cross paths with AMD. As well, AMD is pushing its roadmap quickly away from previous generation 40nm GPU devices.

Comparison between AMD 40nm and 28nm embedded GPUs.

Comparison between AMD 40nm and 28nm embedded GPUs.

A Word about Form Factors

Sure, AMD’s got PC-card plug-in boards in PCI Express format—long ones, short ones, and ones with big honking heat sinks and fans and plenty of I/O connections. AMD’s high-end embedded GPUs like the new E8870 Series are available on PCIe and boast up to 1500 GFLOPs (single precision) and 12 Compute Units. They’ll drive up to 6 displays and burn up to 75W of power without an on-board fan, yet since they’re on AMD’s embedded roadmap—they’ll be around for 5 years.

An MXM (Mobile PCIe Module) format PCB containing AMD’s mid-grade E8950 GPU.

An MXM (Mobile PCIe Module) format PCB containing AMD’s mid-grade E8950 GPU.

Compared to AMD’s previous embedded E8860 Series, the E8870 has 97% more 3DMark 11 performance when running from 4GB of onboard memory. Interestingly, besides the PCIe version—which might only be considered truly “embedded” when plugged into a panel PC or thin client machine—AMD also supports the MXM format.  The E8870 will be available on the Type B Mobile PCI Express Module (MXM) that’s a mere 82mm x 105mm and complete with memory, GPU, and ancillary ICs.

Middle of the Road

For more of a true embedded experience, AMD’s E8950MXM still drives 6 displays and works with AMD’s EyeFinity capability of stitching multiple displays together in Jumbotron fashion. Yet the 3000 GFLOPs (yes, that’s 3000 GFLOPs peak, single precision) little guy still has 32 Compute Units, 8 GB of GPU memory, and is optimized for 4K (UHD) code/decoding. If embedded 4K displays are your thing, this is the GPU you need.

Hardly middle of the road, right? Depending upon the SKU, this family can burn up to 95W and is available exclusively on one of those MXM modules described above. In embedded version, the E8950 is available for 3 years (oddly, two fewer than the others).

Low Power, No Compromises

Yet not every immersive digital sign, MRI machine, or arcade console needs balls-to-the-wall graphics rendering and 6 displays. For this reason, AMD’s E6465 series focuses on low power and small form factor (SFF) footprint. Able to drive 4 displays and having a humble 2 Compute Units, the series still boasts 192 GFLOPs (single precision), 2 GB of GPU memory, 5 years of embedded life, but consumes a mere 20W.

The E6465 is available in PCIe, MXM (the smaller Type A size at 82mm x 70mm), and a multichip module. The MCM format really looks embedded, with the GPU and memory all soldered on the same MCM substrate for easier design-in onto SFFs and other board-level systems.

More Than Meets the Eye

While AMD is announcing three new embedded GPU families, it’s easy to think the story stops with the GPU itself. It doesn’t. AMD doesn’t get nearly enough recognition for the suite of graphics, imaging, and heterogeneous processing software available for these devices.

For example, in mil/aero avionics systems AMD has a few design wins in glass cockpits such as with Airbus. Some legacy mil displays don’t always follow standard refresh timing, so the new embedded GPU products support custom timing parameters. Clocks like Timing Standard, Front Porch, Refresh Rate and even Pixel Clocks are programmable—ideal for the occasional non-standard military glass cockpit.

AMD is also a strong supporter of OpenCL and OpenGL—programming and graphics languages that ease programmers’ coding efforts. They also lend themselves to creating DO-254 (hardware) and DO-178C (software) certifiable systems, such as those found in Airbus military airframes. Airbus Defence has selected AMD graphics processors for next-gen avionics displays.

Avionics glass cockpits, like this one from Airbus, are prime targets for high-end embedded graphics. AMD has a design win in one of Airbus' systems.

Avionics glass cockpits, like this one from Airbus, are prime targets for high-end embedded graphics. AMD has a design win in one of Airbus’ systems.

Finally, AMD is the founding member of the HSA Foundation, an organization that has released heterogeneous system standard (HSA) version 1.0, also designed to make programmers’ jobs way easier when using multiple dissimilar “compute engines” in the same system. Companies like ARM, Imagination, MediaTek and others are HSA Foundation supporters.



New HSA Spec Legitimizes AMD’s CPU+GPU Approach

After nearly 3 years since the formation of the Heterogeneous System Architecture (HSA) Foundation, the consortium releases 1.0 version of the Architecture Spec, Programmer’s Reference Manual, Runtime Specification and a Conformance Plan.

Note: This blog is sponsored by AMD.

HSA banner


UPDATE 3/17/15: Added Imagination Technologies as one of the HSA founders. C2

No one doubts the wisdom of AMD’s Accelerated Processing Unit (APU) approach that combines x86 CPU with a Radeon graphic GPU. Afterall, one SoC does it all—makes CPU decisions and drives multiple screens, right?

True. Both AMD’s G-Series and the AMD R-Series do all that, and more. But that misses the point.

In laptops this is how one uses the APU, but in embedded applications—like the IoT of the future that’s increasingly relying on high performance embedded computing (HPEC) at the network’s edge—the GPU functions as a coprocessor. CPU + GPGPU (general purpose graphics processor unit) is a powerful combination of decision-making plus parallel/algorithm processing that does local, at-the-node processing, reducing the burden on the cloud. This, according to AMD, is how the IoT will reach tens of billions of units so quickly.

Trouble is, HPEC programming is difficult. Coding the GPU requires a “ninja programmer”, as quipped AMD’s VP of embedded Scott Aylor during his keynote at this year’s Embedded World Conference in Germany. (Video of the keynote is here.) Worse still, capitalizing on the CPU + GPGPU combination requires passing data between the two architectures which don’t share a unified memory architecture. (It’s not that AMD’s APU couldn’t be designed that way; rather, the processors require different memory architectures for maximum performance. In short: they’re different for a reason.)

AMD’s Scott Aylor giving keynote speech at Embedded World, 2015. His message: some IoT nodes demand high-performance heterogeneous computing at the edge.

AMD’s Scott Aylor giving keynote speech at Embedded World, 2015. His message: some IoT nodes demand high-performance heterogeneous computing at the edge.

AMD realized this limitation years ago and in 2012 catalyzed the HSA Foundation with several companies including ARM, Texas Instruments, Imagination Technology, MediaTek, Qualcomm, Samsung and others. The goal was to create a set of specifications that define heterogeneous hardware architectures but also create an HPEC programming paradigm for CPU, GPU, DSP and other compute elements. Collectively, the goal was to make designing, programming, and power optimizing easy for heterogeneous SoCs (Figure).

Heterogeneous systems architecture (HSA) specifications version 1.0 by the HSA Foundation, March 2015.

The HSA Foundation’s goals are realized by making the coder’s job easier using tools—such as an HSA version LLVM open source compiler—that integrates multiple cores’ ISAs. (Courtesy: HSA Foundation; all rights reserved.) Heterogeneous systems architecture (HSA) specifications version 1.0 by the HSA Foundation, March 2015.

After three years of work, the HSA Foundation just released their specifications at version 1.0:

  • HSA System Architecture Spec: defines H/W, OS requirements, memory model (important!), signaling paradigm, and fault handling.
  • Programmers Reference Guide: essentially a virtual ISA for parallel computing, defines an output format for HSA language compilers.
  • HSA Runtime Spec: is an application library for running HSA applications; defines INIT, user queues, memory management.

With HSA, the magic really does happen under the hood where the devil’s in the details. For example, the HSA version LLVM open source compiler creates a vendor-agnostic HSA intermediate language (HSAIL) that’s essentially a low-level VM. From there, “finalizers” compile into vendor-specific ISAs such as AMD or Qualcomm Snapdragon. It’s at this point that low-level libraries can be added for specific silicon implementations (such as VSIPL for vector math). This programming model uses vendor-specific tools but allows novice programmers to start in C++ but end up with optimized, performance-oriented, and low-power efficient code for the heterogeneous combination of CPU+GPU or DSP.

There are currently 43 companies involved with HSA, 16 universities, and three working groups (and they’re already working on version 1.1). Look at the participants, think of their market positions, and you’ll see they have a vested interest in making this a success.

In AMD’s case, as the only x86 and ARM + GPU APU supplier to the embedded market, the company sees even bigger successes as more embedded applications leverage heterogeneous parallel processing.

One example where HSA could be leveraged, said Phil Rogers, President of the HSA Foundation, is for multi-party video chatting. An HSA-compliant heterogeneous architecture would allow the processors to work in a single (virtual) memory pool and avoid the multiple data set copies—and processor churn—prevalent in current programming models.

With key industry players supporting HSA including AMD, ARM, Imagination Technologies, Samsung, Qualcomm, MediaTek and others, a lot of x86, ARM, and MIPS-based SoCs are likely to be compliant with the specification. That should kick off a bunch of interesting software development leading to a new wave of high performance applications.

Does Altera Have “Big Data” Communications on the Brain?

In wireless, wireline and financial “big data” applications, moving all those packets needs prodigious FPGA resources, not all of which Altera had before their recent series of acquisitions, partnerships, and otherwise wheeling-and-dealing.

Chris Balough of Altera (left) interviewed by Andy Frame from ARM. (Courtesy: YouTube.)

Chris Balough of Altera (left) interviewed by Andy Frame from ARM. (Courtesy: YouTube.)

I caught up with an old friend at April’s DESIGN West 2013 conference in San Jose: Chris Balough, Sr Director, Product Marketing for SoC products. I knew Chris from when he was at Triscend (purchased by Xilinx). Chris is now in charge of Altera’s SoC products which are Arria V, Stratix V and Cyclone FPGAs with ARM cores in them which compete with Xilinx’s Zynq devices. Chris shed some light on some of these announcements, but remained mum on what they all might mean taken collectively. I think they add up to something big in “Big Data”.

(Fun facts: Altera’s first “SoC” was Excalibur, no longer recommended for new designs. Altera’s most popular SoC processor is the soft Nios II, sold in roughly 30 percent of production SoCs, says Balough.)

X before A? We’ll See

Subconsciously I think of Xilinx first when the word “FPGA” is flashed in front of me, but Altera’s the company pushing more boundaries of late. Their rat-a-tat machine gun announcements this year got my attention.

In the summer of 2012, I did an interview with Altera’s Sr VP of R&D Brad Howe and he spread out as much of the roadmap on the table as he could. Things like HSA, OpenCL, and better gigabit transceivers were all on the horizon.  Shortly thereafter, Altera extended their  relationship with TSMC to 20nm for Arria and Cyclone FPGAs. Then in early 2013, they rocked the industry by locking up an exclusive FPGA relationship with Intel for the industry’s only production 14nm tri-gate FinFETs.

Spring Cleaning ; Altera’s Getting Ready For…?

Now in Spring 2013, Altera is making headlines like these:

- FPGA Design in the Cloud–Try It, You’ll Like It, Says Plunify. At DAC, Altera and Plunify are pushing cloud-based FPGA design tools. (See our February 2013 article with Plunify here.)

- Altera and AppliedMicro will Cooperate on Joint Solutions for High Growth Data Center Market.  Combines Stratix FPGAs and AppliedMicro’s Server on a Chip devices targeting data centers and optical transport networks (OTN).

- Altera Expands OTN Solution Capabilities with Acquisition of TPACK. Altera buys TPAK from AMCC to provide IP for FPGAs used in OTN for tasks like cross-bar switches used in 10/40/100Gbps PHYs.

- Altera Stratix V GX FPGAs Achieve PCIe Gen3 Compliance and Listing on PCI-SIG Integrators List. Right now, Gen2 and Gen3 PCIe is critical to data centers, cellular base stations, and all manner of high-speed long-haul/back-haul telco gear. Within 12 months, PCIe Gen2/3 will be “table stakes” in all manner of high-performance embedded systems like ATCA- or VME/VPX-based DSP systems for radar, sonar, SIGINT (signals intelligence) or data mining.

- Altera to Deliver Breakthrough Power Solutions for FPGAs with Acquisition of Power Technology Innovator Enpirion.  Maybe Enpirion’s DC-DC converter PowerSoCs with integrated inductors may some day end up inside a Stratix package (perhaps like Xilinx’s stacked chip interposer technology), but for now the two-chip solution reduces board space by 1/7 and simplifies system design considerably. The programmable DC-DC converters provide the multiple power rails–and power-up sequences–needed for big FPGAs.

The blue regions show places where FPGAs are used in wireless basestations.

The blue regions show places where FPGAs are used in wireless LTE basestations. (Courtesy: Altera.)

My Take: Altera’s Move in Big Data

Analysts estimate that nearly 50 percent of the revenue in  FPGAs comes from high end, high density, costly FPGAs like the Xilinx Vertex 7 and Altera Stratix V. Segments like wireless and wireline packet processing, plus financial or image processing algorithm processors increasingly rely on these kinds of FPGAs in lieu of ASICs, GPGPUs, or proprietary network processors. So every advantage in IP, process technology, or partnership that Altera has, gets the company one step closer to more design wins.

We’ll see what Altera does with all of these recent announcements. I’d expect to see something shake loose before the traditional “summer doldrums” set in when the semiconductor industry goes on its annual vacation next month in July.