AMD Targets Embedded Graphics

As the PC market flounders, AMD continues its focus on embedded, this time with three new GPU families.

The widescreen LCD digital sign at my doctor’s office tells me today’s date, that it’s flu season, and that various health maintenance clinics are available if only I’d sign up. I feel guilty every time.

An electronic digital sign, mostly text based. (Courtesy: Wikimedia Commons.)


Static, text-only displays like this aren’t the kind of digital sign that GPU powerhouses like AMD are targeting. Microsoft Windows-based text running in an endless loop requires no graphics or imaging horsepower at all.

Instead, AMD is after the high-performance applications: those Minority Report-style multimedia messages that follow you across multiple screens down a hallway; the immersive Vegas-style electronic gaming machines that attract senior citizens like moths to a flame; and the portable ultrasound machine that gives a nervous mother the first images of her baby in HD. These are the kinds of embedded systems that need high-performance graphics, imaging, and encode/decode hardware.

AMD announced three new embedded graphics families, spanning low-power parts that drive 4 displays up to parts that drive 6 displays and deliver 1.5 TFLOPs of number crunching for high-end GPU graphics processing.


Advanced Micro Devices wants you to think of their GPUs for your next embedded system.

AMD just announced a collection of three new embedded graphics processor families, built on 28nm process technology and designed to span the gamut from multi-display and low power all the way up to a near doubling of performance at the high end. Within each new family, AMD is looking to differentiate itself from the competition at both the chip and module/board level. Competition comes mostly from Nvidia discrete GPUs, although some Intel processors and ARM-based SoCs cross paths with AMD. AMD is also moving its roadmap quickly away from its previous-generation 40nm GPU devices.

Comparison between AMD 40nm and 28nm embedded GPUs.


A Word about Form Factors

Sure, AMD’s got PC-card plug-in boards in PCI Express format: long ones, short ones, and ones with big honking heat sinks and fans and plenty of I/O connections. AMD’s high-end embedded GPUs like the new E8870 Series are available on PCIe and boast up to 1500 GFLOPs (single precision) and 12 Compute Units. They’ll drive up to 6 displays and burn up to 75W of power without an on-board fan, and because they’re on AMD’s embedded roadmap, they’ll be around for 5 years.

An MXM (Mobile PCIe Module) format PCB containing AMD’s mid-grade E8950 GPU.


Compared to AMD’s previous embedded E8860 Series, the E8870 delivers 97% higher 3DMark 11 performance when running from 4GB of onboard memory. Interestingly, besides the PCIe version (which might only be considered truly “embedded” when plugged into a panel PC or thin client machine), AMD also supports the MXM format. The E8870 will be available on a Type B Mobile PCI Express Module (MXM) measuring a mere 82mm x 105mm, complete with memory, GPU, and ancillary ICs.

Middle of the Road

For more of a true embedded experience, AMD’s E8950MXM still drives 6 displays and works with AMD’s Eyefinity capability of stitching multiple displays together in Jumbotron fashion. Yet the 3000 GFLOPs (yes, that’s 3000 GFLOPs peak, single precision) little guy also packs 32 Compute Units and 8 GB of GPU memory, and is optimized for 4K (UHD) encode/decode. If embedded 4K displays are your thing, this is the GPU you need.

Hardly middle of the road, right? Depending on the SKU, this family can burn up to 95W, and it is available exclusively on one of the MXM modules described above. As an embedded product, the E8950 will be available for 3 years (oddly, two fewer than the others).

Low Power, No Compromises

Yet not every immersive digital sign, MRI machine, or arcade console needs balls-to-the-wall graphics rendering and 6 displays. For this reason, AMD’s E6465 series focuses on low power and a small form factor (SFF) footprint. Able to drive 4 displays with a humble 2 Compute Units, the series still boasts 192 GFLOPs (single precision), 2 GB of GPU memory, and 5 years of embedded life, yet consumes a mere 20W.

The E6465 is available as a PCIe card, on MXM (the smaller Type A size at 82mm x 70mm), and as a multichip module (MCM). The MCM format really looks embedded, with the GPU and memory all soldered onto the same MCM substrate for easier design-in on SFFs and other board-level systems.
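To put the three families side by side, here is a back-of-the-envelope calculation that uses only the peak single-precision GFLOPs and maximum power figures quoted above. Treat it as a rough sketch: dividing peak FLOPs by maximum board power says nothing about sustained performance on a real workload, and the `FAMILIES` table and `gflops_per_watt` function are illustrative names, not anything from AMD.

```python
# Peak single-precision GFLOPs and maximum power (W), as quoted in this article.
# These are vendor peak figures; sustained throughput will be lower.
FAMILIES = {
    "E8950 (MXM)": (3000, 95),
    "E8870 (PCIe/MXM)": (1500, 75),
    "E6465 (PCIe/MXM/MCM)": (192, 20),
}


def gflops_per_watt(gflops: float, watts: float) -> float:
    """Peak GFLOPs divided by maximum power: a crude efficiency ceiling."""
    return gflops / watts


for name, (gflops, watts) in FAMILIES.items():
    print(f"{name:22s} {gflops_per_watt(gflops, watts):5.1f} GFLOPs/W")
```

By this crude measure the E8950 comes out around 31.6 GFLOPs/W, the E8870 at 20, and the E6465 at 9.6. That is a reminder that the E6465’s selling point is its absolute 20W budget and small footprint, not FLOPs per watt.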

More Than Meets the Eye

While AMD is announcing three new embedded GPU families, it’s easy to think the story stops with the GPU itself. It doesn’t. AMD doesn’t get nearly enough recognition for the suite of graphics, imaging, and heterogeneous processing software available for these devices.

For example, in mil/aero avionics AMD has a few design wins in glass cockpits, including one with Airbus. Some legacy military displays don’t follow standard refresh timing, so the new embedded GPU products support custom timing parameters. The timing standard, front porch, refresh rate, and even the pixel clock are programmable, which is ideal for the occasional non-standard military glass cockpit.
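The arithmetic tying those programmable parameters together is simple: the pixel clock has to cover every pixel of the total raster, active area plus blanking (front porch, sync pulse, back porch), once per refresh. The minimal Python sketch below shows that relationship. The `DisplayTiming` class and its field names are hypothetical and not part of any AMD driver API; the sanity check uses the standard CEA-861 1920x1080 at 60 Hz timing, and the 50 Hz variant is a made-up non-standard panel.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DisplayTiming:
    """Raster timing for one video mode (hypothetical field names).
    Horizontal values are in pixels, vertical values in lines."""
    h_active: int
    h_front_porch: int
    h_sync: int
    h_back_porch: int
    v_active: int
    v_front_porch: int
    v_sync: int
    v_back_porch: int
    refresh_hz: float

    @property
    def h_total(self) -> int:
        # Active pixels plus horizontal blanking.
        return self.h_active + self.h_front_porch + self.h_sync + self.h_back_porch

    @property
    def v_total(self) -> int:
        # Active lines plus vertical blanking.
        return self.v_active + self.v_front_porch + self.v_sync + self.v_back_porch

    def pixel_clock_hz(self) -> float:
        # One clock per pixel of the total raster, once per frame.
        return self.h_total * self.v_total * self.refresh_hz


# Standard CEA-861 1920x1080 @ 60 Hz: a 2200 x 1125 total raster.
cea_1080p60 = DisplayTiming(1920, 88, 44, 148, 1080, 4, 5, 36, 60)
print(cea_1080p60.pixel_clock_hz() / 1e6)  # 148.5 (MHz)

# Hypothetical non-standard panel: same raster, refreshed at 50 Hz.
custom_50hz = replace(cea_1080p60, refresh_hz=50)
print(custom_50hz.pixel_clock_hz() / 1e6)  # 123.75 (MHz)
```

Change any porch, sync width, or the refresh rate and the required pixel clock moves with it, which is why a GPU that only supports a fixed menu of standard clocks can’t drive an oddball cockpit display.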

AMD is also a strong supporter of OpenCL and OpenGL, the open compute and graphics APIs that ease programmers’ coding efforts. They also lend themselves to creating DO-254 (hardware) and DO-178C (software) certifiable systems, such as those found in Airbus military airframes. Airbus Defence has selected AMD graphics processors for next-gen avionics displays.

Avionics glass cockpits, like this one from Airbus, are prime targets for high-end embedded graphics. AMD has a design win in one of Airbus' systems.


Finally, AMD is a founding member of the HSA Foundation, an organization that has released version 1.0 of the Heterogeneous System Architecture (HSA) specification, also designed to make programmers’ jobs way easier when using multiple dissimilar “compute engines” in the same system. Companies like ARM, Imagination, MediaTek and others are HSA Foundation supporters.



What’s the Nucleus of Mentor’s Push into Industrial Automation?

Mentor’s once nearly orphaned Nucleus RTOS forms the foundation of a darned impressive software suite for controlling meat-packing or nuclear power plants.

Everyone appreciates an underdog: the pale, wimpy kid with glasses and a brown polyester sweater who gets routinely beaten up by the popular boys, but sticks it out day after day and eventually grows up to create a tech start-up everyone loves. (Part of this story is my personal history; I’ll let you guess which part.)

So it is with Mentor’s Nucleus RTOS, which the company says forms the basis of its new Industrial Automation (I.A.) initiative. Announced this week at the ARC Industry Forum in Orlando is Mentor’s “Embedded Solution for Industrial Automation” (Figure 1). A cynic might look at this figure as a collection of existing Mentor products…slightly rearranged to make a compelling argument for a “solution” in the I.A. space. That skinny kid Nucleus is right there, listed on the diagram. Oh, how many times have I asked Mentor why they keep Nucleus around, only to watch it get beaten up by the big RTOS kids!

Figure 1: Mentor’s Industrial Automation Solution for embedded, IoT-enabled systems relies on the Nucleus RTOS, including a secure hypervisor and enhanced security infrastructure.


After all, you’ll recognize Mentor’s Embedded Linux, the Nucleus RTOS I just mentioned, and the company’s Sourcery debug/analyzer/IDE product suite. All of these have been around for a while, although Nucleus is the grown-up kid in this bunch. (Pop quiz: True or False…Did all three of these products come from Mentor acquisitions? Bonus question: From what company(ies)?)

Into this mix, Mentor is adding new security tools from our friends at Icon Labs, plus hooks to Qt, a popular GUI/HMI framework for automation. (Full disclosure: Icon Labs founder Alan Grau is one of our security bloggers; however, we were taken by surprise at this recent Mentor announcement!)

Industry 4.0: I.A. meets IoT

According to Mentor’s Director of Product Management for Runtime Solutions, Warren Kurisu (whose last name is pronounced just like my first name in Japanese: Ku-ri-su), I.A. is gaining traction, big time. There’s a term for it: “Industry 4.0”. The large industrial automation vendors, like GE, Siemens, Schneider Electric, and others, have long been collecting factory data and feeding it into the enterprise, seeking to reduce costs, increase efficiency, and tie systems into the supply chain. Today we call this concept the Internet of Things (IoT), and Industry 4.0 is basically the promise of interoperability between currently bespoke (and proprietary) I.A. systems and smart, connected IoT devices, with a layer of cyber security thrown in.

Mentor’s Kurisu points out that what’s changed is not only the kinds of devices that will connect into I.A. systems, but also how they’ll connect: in more ways than via serial SCADA or FieldBus links. Industrial automation will soon include all the IoT pipes we’re reading about: Wi-Fi, Bluetooth LE, various mesh topologies, Ethernet, and cellular. Basically, whatever works and is secure.

The Skinny Kid Prevails

Herein lies the secret of Mentor’s Industrial Automation Solution. It just so happens the company has most of what you’d need to connect legacy I.A. systems to the IoT, plus add new kinds of smart embedded sensors into the mix. What’s driving the whole market is cost. According to a recent ARC survey, reduced downtime, improved process performance, reduced machine lifecycle costs: all of these, and more, are leading I.A. customers and vendors to upgrade their factories and systems.

Additionally, says Mentor’s Kurisu, consolidating multiple pieces of equipment, reducing power, improving safety, and adding more local, operator-friendly graphics are all criteria for investing in new equipment, sensors, and systems.

Mentor brings something to the party in each of these areas:

- machine or system convergence, either by improved system performance or reduced footprint

- capabilities and differentiation, allowing I.A. vendors to create systems different from “the other guys”

- faster time-to-money, through increased productivity, system design and debug, or anything else that reduces the effort of I.A. vendors and their customers


Figure 2: Industrial automation a la Mentor. The embedded pieces rely on Nucleus RTOS, or variations thereof. New Qt software for automation GUIs plus security gateways from Icon Labs bring security and IoT into legacy I.A. installations.

Figure 2 sums up the Mentor value proposition, but notice how most of the non-enterprise blocks in the diagram are built upon the Nucleus RTOS.

Nucleus, for example, has achieved safety certification by TÜV SÜD, complete with certification artifacts (called Nucleus SafetyCert). Mentor’s Embedded Hypervisor, a foundational component of some versions of Nucleus, can be used to create a secure partitioned environment on either a multicore processor or multiple processors (heterogeneous or homogeneous), in which multiple operating systems can run without cross-contaminating one another in the event of a virus or other fault.

New to the Mentor offering is an industry-standard Qt GUI running on Linux, or Qt optimized for embedded instantiations running on (wait for it) Nucleus RTOS. Memory and other performance optimizations reduce the footprint and speed up boot, and there are versions now for popular IoT processors such as ARM’s Cortex-Mx cores.

Playground Victory: The Take-away

So if the next step in Industrial Automation is Industry 4.0 (the rapid build-out of industrial systems that reduce cost and add IoT capabilities with secure interoperability), then Mentor has a pretty compelling offering. The consolidation and emphasis on low power I mentioned above can be had for free via capabilities already built into Nucleus.

For example, embedded systems based on Nucleus can intelligently turn off I/O and displays and even rapidly drive multicore processors into their deepest sleep modes. One example Mentor’s Kurisu explained to me showed an ARM-based big.LITTLE system that ramped up performance when needed but kept power to a minimum. This is made possible, in part, by Mentor’s power-aware drivers for an entire embedded I.A. system under the control of Nucleus.
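Mentor didn’t detail the policy behind that big.LITTLE demo, so what follows is only a hypothetical sketch of the general idea, using the simplest big.LITTLE model: migrating work between a LITTLE (efficient) cluster and a big (fast) cluster. The key trick is hysteresis, a gap between the “go up” and “come down” thresholds. The `next_cluster` function and the 80%/30% thresholds are illustrative assumptions, not Nucleus APIs or Mentor values.

```python
from enum import Enum


class Cluster(Enum):
    LITTLE = "LITTLE"  # energy-efficient cores
    BIG = "big"        # high-performance cores


def next_cluster(current: Cluster, load: float,
                 up_threshold: float = 0.80,
                 down_threshold: float = 0.30) -> Cluster:
    """Hypothetical cluster-migration policy with hysteresis.

    load is the utilization (0.0 to 1.0) of the currently active cluster.
    Thresholds are illustrative, not Mentor or Nucleus values.
    """
    if current is Cluster.LITTLE and load > up_threshold:
        return Cluster.BIG      # ramp performance when work arrives
    if current is Cluster.BIG and load < down_threshold:
        return Cluster.LITTLE   # drop back to save power when idle
    return current              # inside the hysteresis band: stay put


# Simulate a burst of work followed by idle time.
cluster = Cluster.LITTLE
history = []
for load in [0.2, 0.5, 0.9, 0.6, 0.4, 0.1, 0.1]:
    cluster = next_cluster(cluster, load)
    history.append(cluster.value)
print(history)
# ['LITTLE', 'LITTLE', 'big', 'big', 'big', 'LITTLE', 'LITTLE']
```

Without the gap between thresholds, a load hovering near a single threshold would bounce the system back and forth between clusters, and each migration itself costs time and energy.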

And in the happy ending we all hope for, it looks like the maybe-forgotten Nucleus RTOS, so often ignored by editors like me writing glowingly about Wind River’s VxWorks or Green Hills’ INTEGRITY, may have grown up. It’s the RTOS ready to run the factory of the future. Perhaps your electricity is being generated right now under the control of the nerdy little RTOS that made it big.

PCI Express Switch: the “Power Strip” of IC Design

Need more PCIe channels in your next board design? Add a PCIe switch for more fanout.

Editor’s notes:

1. Although Pericom Semiconductor sponsors this particular blog post, your author learned while writing it that he actually knows very little about the complexities of PCIe.

2. Blog updated 3-27-14 to correct the link to Pericom P/N PI7C9X2G303EL.

Perhaps you’re like me: power cords everywhere. Anyone who has more than one mobile doodad—from smartphone to iPad to Kindle and beyond—is familiar with the ever-present power strip.

An actual power strip from under my desk. Scary...


The power strip is a modern version of the age-old extension cord: it expands one wall socket into three, five or more. Assuming there’s enough juice (AC amperage) to power it all, the power strip meets our growing hunger for more consumer devices (or rather: their chargers).


And so it is with IC design. PCI Express Gen 2 has become the most common interoperable, on-board way to add peripherals such as SATA ports, CODECs, GPUs, WiFi chipsets, USB hubs and even legacy peripherals like UARTs. The wall socket analogy applies here too: most new CPUs, SoCs, MCUs or system controllers lack sufficient PCI Express (PCIe) ports for all the peripheral devices designers need. Plus, as IC geometries shrink, system controllers also have lower drive capability per PCIe port and signals degrade rather quickly.

The solution to these host controller problems is a PCIe switch to increase fanout by adding two, three, or even eight additional PCIe ports with ample per-lane current sourcing capability.

Any Port in a Storm?

While our computers and laptops strangle everything in sight with USB cables, inside embedded boxes PCIe is the routing mechanism of choice. Just about any standalone peripheral a system designer could want is available with a PCIe interface. Even esoteric peripherals, such as 4K complex FFT, range-finding, or OFDM algorithm IP blocks, usually come with a PCIe 2.0 interface.

Too bad, then, that modern device/host controllers are painfully short on PCIe ports. I did a little Googling and found that if you choose an Intel or AMD CPU, you’re in good shape. A 4th Gen Intel Core i7 with Intel 8 Series Chipset has six PCIe 2.0 ports spread across 12 lanes. Wow. Similarly, an AMD A10 APU has four general-purpose PCIe lanes, configurable as one x4 port or four x1 ports. But these are desktop/laptop processors, and they’re not so common in embedded.

AMD’s new G-Series SoC for embedded is an APU with a boatload of peripherals, yet it’s got only one PCIe Gen 2 port (x4). As for Intel’s new Bay Trail-based Atom processors powering the latest red-hot laptop/tablet 2-in-1s, I couldn’t find an external PCIe port on the block diagram.

Similarly…Qualcomm Snapdragon 800? Nvidia Tegra 4 or even the new K1? Datasheets on these devices are closely held, for customers only, but I found developer references that point to one PCIe port at best. ARM-based Freescale processors such as the i.MX6, popular in set-top boxes from Comcast and others, have one lone PCIe 2.0 port (Figure 1).

What to do if a designer wants to add more PCIe-based stuff?

Figure 1: Freescale i.MX ARM-based CPU is loaded with peripheral I/O, yet has only one PCIe 2.0 port. (Courtesy: Freescale Semiconductor.)


‘Mo Fanout

A PCIe switch solves the one-to-many dilemma. Add a redriver at the Tx and Rx ends, and signal integrity problems over long traces and connectors all but disappear. Switches from companies like Pericom come in many flavors, from simple lane switches that are essentially PCIe muxes, to packet switches with intelligent routing functions.

One simple example of a Pericom PCIe switch is the PI7C9X2G303EL. This three-port, three-lane PCIe 2.0 switch has one x1 upstream port and two x1 downstream ports, so hanging it off the lone PCIe port of the i.MX6 shown in Figure 1 would give the processor two downstream PCIe ports. This particular device, aimed at those low-power consumer doodads I mentioned earlier, boasts some advanced power-saving modes and consumes under 0.7W.
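A quick calculation shows what such a switch buys, and what it doesn’t. The switch occupies the host’s single PCIe port as its upstream link and exposes two downstream ports, but every downstream device still shares that one upstream x1 link to the processor. The Python sketch below works through the numbers using the standard PCIe Gen 1/Gen 2 signaling rates (2.5 and 5 GT/s per lane, both 8b/10b encoded). `switch_summary` is a hypothetical helper for illustration, not a Pericom tool, and it ignores packet protocol overhead.

```python
from __future__ import annotations

# Per-lane signaling rate in GT/s. Gen 1 and Gen 2 both use 8b/10b line
# encoding, so 8 of every 10 bits on the wire carry data.
GT_PER_S = {1: 2.5, 2: 5.0}


def lane_mb_per_s(gen: int) -> float:
    """Data bandwidth per lane, per direction, in MB/s after 8b/10b
    encoding (TLP/DLLP protocol overhead would lower it further)."""
    data_gbit_per_s = GT_PER_S[gen] * 8 / 10
    return data_gbit_per_s * 1000 / 8  # Gb/s -> MB/s


def switch_summary(gen: int, up_lanes: int, down_lanes: list[int]) -> dict:
    """Hypothetical helper: what a host gains from one PCIe switch.

    The switch uses one host root port for its upstream link and exposes
    one downstream port per entry in down_lanes. All downstream traffic
    to the host funnels through the upstream link.
    """
    up_bw = up_lanes * lane_mb_per_s(gen)
    down_bw = sum(down_lanes) * lane_mb_per_s(gen)
    return {
        "downstream_ports": len(down_lanes),
        "upstream_mb_per_s": up_bw,
        "oversubscription": down_bw / up_bw,
    }


# PI7C9X2G303EL as described above: Gen 2, one x1 up, two x1 down.
print(switch_summary(gen=2, up_lanes=1, down_lanes=[1, 1]))
# {'downstream_ports': 2, 'upstream_mb_per_s': 500.0, 'oversubscription': 2.0}
```

A 2:1 oversubscription is often acceptable for bursty peripherals like the UARTs, Wi-Fi chipsets, and USB hubs mentioned earlier; it only bites when both downstream devices want full bandwidth at the same time.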

Hook Me Up

Upon researching this for Pericom, I was surprised to learn of all the nuances and variables to consider with PCIe switches. I won’t cover them here, other than mentioning some of the designer’s challenges: PCIe Gen 1 vs Gen 2, data packet routing, latency, CRC verification (for data integrity), transaction layer packet (TLP) inspection, automatic retransmission, and so on.

It seems that PCIe switches come in all flavors, from the simplest “power strip” to what is essentially an intelligent router-on-a-chip. And for maximum interoperability, all of them need to be compliant with the PCI-SIG specifications, as verified at a plugfest.

So if you’re an embedded designer, the solution to your PCIe fanout problem is adding a PCI Express switch.