AMD Targets Embedded Graphics

As the PC market flounders, AMD continues its focus on embedded, this time with three new GPU families.

The widescreen LCD digital sign at my doctor’s office tells me today’s date, that it’s flu season, and that various health maintenance clinics are available if only I’d sign up. I feel guilty every time.

An electronic digital sign, mostly text based. (Courtesy: Wikimedia Commons.)

These kinds of static, text-only displays are not the kind of digital sign that GPU powerhouses like AMD are targeting. Microsoft Windows-based text running in an endless loop requires no graphics or imaging horsepower at all.

Instead, high performance is captured in those Minority Report multimedia messages that move with you across multiple screens down a hallway; the immersive Vegas-style electronic gaming machines that attract senior citizens like moths to a flame; and the portable ultrasound machine that gives a nervous mother the first images of her baby in HD. These are the kinds of embedded systems that need high-performance graphics, imaging, and encode/decode hardware.

AMD announced three new embedded graphics families, spanning from low-power parts driving 4 displays up to high-end GPUs driving 6 displays with 1.5 TFLOPs of number crunching.

Advanced Micro Devices wants you to think of their GPUs for your next embedded system.

AMD just announced a collection of three new embedded graphics processor families using 28nm process technology designed to span the gamut from multi-display and low power all the way up to a near doubling of performance at the high end.  Within each new family, AMD is looking to differentiate from the competition at both the chip- and module/board-level. Competition comes mostly from Nvidia discrete GPUs, although some Intel processors and ARM-based SoCs cross paths with AMD. As well, AMD is pushing its roadmap quickly away from previous generation 40nm GPU devices.

Comparison between AMD 40nm and 28nm embedded GPUs.

A Word about Form Factors

Sure, AMD’s got PC-card plug-in boards in PCI Express format—long ones, short ones, and ones with big honking heat sinks and fans and plenty of I/O connections. AMD’s high-end embedded GPUs like the new E8870 Series are available on PCIe and boast up to 1500 GFLOPs (single precision) and 12 Compute Units. They’ll drive up to 6 displays and burn up to 75W of power without an on-board fan, yet since they’re on AMD’s embedded roadmap—they’ll be around for 5 years.

An MXM (Mobile PCIe Module) format PCB containing AMD’s mid-grade E8950 GPU.

Compared to AMD’s previous embedded E8860 Series, the E8870 has 97% more 3DMark 11 performance when running from 4GB of onboard memory. Interestingly, besides the PCIe version—which might only be considered truly “embedded” when plugged into a panel PC or thin client machine—AMD also supports the MXM format.  The E8870 will be available on the Type B Mobile PCI Express Module (MXM) that’s a mere 82mm x 105mm and complete with memory, GPU, and ancillary ICs.

Middle of the Road

For more of a true embedded experience, AMD’s E8950 MXM still drives 6 displays and works with AMD’s EyeFinity capability of stitching multiple displays together in Jumbotron fashion. Yet the 3000 GFLOPs (yes, that’s 3000 GFLOPs peak, single precision) little guy still has 32 Compute Units, 8 GB of GPU memory, and is optimized for 4K (UHD) encoding/decoding. If embedded 4K displays are your thing, this is the GPU you need.

Hardly middle of the road, right? Depending upon the SKU, this family can burn up to 95W and is available exclusively on one of those MXM modules described above. In its embedded version, the E8950 is available for 3 years (oddly, two fewer than the others).

Low Power, No Compromises

Yet not every immersive digital sign, MRI machine, or arcade console needs balls-to-the-wall graphics rendering and 6 displays. For this reason, AMD’s E6465 series focuses on low power and small form factor (SFF) footprint. Able to drive 4 displays and having a humble 2 Compute Units, the series still boasts 192 GFLOPs (single precision), 2 GB of GPU memory, 5 years of embedded life, but consumes a mere 20W.

The E6465 is available in PCIe, MXM (the smaller Type A size at 82mm x 70mm), and a multichip module. The MCM format really looks embedded, with the GPU and memory all soldered on the same MCM substrate for easier design-in onto SFFs and other board-level systems.
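As a back-of-the-envelope check, the quoted peak single-precision GFLOPs and Compute Unit counts imply each family’s GPU clock. A quick sketch, assuming (my assumption, not from AMD’s datasheets) the usual GCN arithmetic of 64 shaders per CU and 2 FLOPs per shader per cycle via fused multiply-add:

```python
# Back out the GPU clock implied by AMD's peak GFLOPs figures.
SHADERS_PER_CU = 64   # assumed GCN value, not quoted by AMD
FLOPS_PER_SHADER = 2  # one fused multiply-add = 2 FLOPs per cycle

def implied_clock_mhz(peak_gflops, compute_units):
    """Clock speed that would yield the quoted peak GFLOPs."""
    flops_per_cycle = compute_units * SHADERS_PER_CU * FLOPS_PER_SHADER
    return peak_gflops * 1e9 / flops_per_cycle / 1e6

# Figures quoted in the article: (family, peak GFLOPs, Compute Units)
for name, gflops, cus in [("E8870", 1500, 12),
                          ("E8950", 3000, 32),
                          ("E6465", 192, 2)]:
    print(f"{name}: ~{implied_clock_mhz(gflops, cus):.0f} MHz")
```

The implied clocks all land in the plausible 700 MHz to 1 GHz range for 28nm embedded GPUs, which suggests the quoted GFLOPs numbers are straight peak-rate arithmetic rather than measured throughput.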

More Than Meets the Eye

While AMD is announcing three new embedded GPU families, it’s easy to think the story stops with the GPU itself. It doesn’t. AMD doesn’t get nearly enough recognition for the suite of graphics, imaging, and heterogeneous processing software available for these devices.

For example, in mil/aero avionics systems AMD has a few design wins in glass cockpits such as with Airbus. Some legacy mil displays don’t always follow standard refresh timing, so the new embedded GPU products support custom timing parameters. Values like timing standard, front porch, refresh rate and even pixel clock are programmable—ideal for the occasional non-standard military glass cockpit.
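Those timing parameters relate to each other in a standard way: the pixel clock is simply the total pixels per frame (active plus blanking on both axes) times the refresh rate. A minimal sketch, sanity-checked against the well-known CEA-861 1080p60 timing:

```python
# Pixel clock = horizontal total x vertical total x refresh rate,
# where each "total" includes active pixels plus blanking
# (front porch + sync width + back porch).
def pixel_clock_hz(h_active, h_front, h_sync, h_back,
                   v_active, v_front, v_sync, v_back, refresh_hz):
    h_total = h_active + h_front + h_sync + h_back
    v_total = v_active + v_front + v_sync + v_back
    return h_total * v_total * refresh_hz

# CEA-861 1080p60: 2200 x 1125 total pixels at 60 Hz -> 148.5 MHz
clk = pixel_clock_hz(1920, 88, 44, 148, 1080, 4, 5, 36, 60)
print(clk / 1e6, "MHz")  # 148.5 MHz
```

A non-standard military display would simply plug its own porch, sync, and refresh values into the same formula, which is why programmable timing registers cover so many legacy cases.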

AMD is also a strong supporter of OpenCL and OpenGL—programming and graphics languages that ease programmers’ coding efforts. They also lend themselves to creating DO-254 (hardware) and DO-178C (software) certifiable systems, such as those found in Airbus military airframes. Airbus Defence has selected AMD graphics processors for next-gen avionics displays.

Avionics glass cockpits, like this one from Airbus, are prime targets for high-end embedded graphics. AMD has a design win in one of Airbus' systems.

Finally, AMD is a founding member of the HSA Foundation, an organization that has released the Heterogeneous System Architecture (HSA) specification version 1.0, also designed to make programmers’ jobs way easier when using multiple dissimilar “compute engines” in the same system. Companies like ARM, Imagination, MediaTek and others are HSA Foundation supporters.

A Sign of the Times

AMD’s FirePro series lights up Godzilla-sized Times Square digital sign.

[Editor's note: blog updated 8-18-15 to remove "Radeon" and make other corrections.]

They say the lights are bright on Broadway, and they ain’t kidding.  A new AMD-powered digital sign makes a stadium Jumbotron look small.

I’ve done a few LAN parties and appreciate an immersive, high-res graphics experience. But nothing could have prepared me for the whopping 25,000 square feet of graphics in Times Square powered by AMD’s FirePro series (1535 Broadway, between 45th and 46th Streets).

The UltraHD media wall is the ultimate digital sign, comprising the equivalent of about 24 million RGB LED pixels. The media wall is a full city block long by 8 stories high! Designed and managed by Diversified Media Group, the sign is thought to be the largest of its kind in the world, and certainly the largest in the U.S.

An AMD-powered digital sign, America’s largest, will soon grace Times Square.

Three AMD FirePro UltraHD graphics cards drive the largest digital sign in the world.
This view of Times Square shows the commercial importance of high-res digital signs. [1 Times Square night 2013, by Chensiyuan; Licensed under GFDL via Wikimedia Commons.]

The combined 10,048 x 2,368 pixel “display” is powered by a mere three AMD FirePro graphics cards. Each card drives six sections of the overall display wall. The whole UHD experience is so realistic because of AMD’s Graphics Core Next architecture that executes billions of operations in parallel per cycle.
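The article’s numbers check out with a little arithmetic: the stated resolution works out to roughly 24 million pixels, and dividing across three cards and six EyeFinity zones per card gives the per-zone load. A quick sketch:

```python
# Sanity-check the Times Square media wall figures quoted above.
width, height = 10_048, 2_368
total = width * height            # total pixels on the wall
per_card = total / 3              # three FirePro cards
per_zone = per_card / 6           # six EyeFinity zones per card

print(f"{total:,} pixels total")       # ~23.8 million, i.e. "about 24 million"
print(f"{per_card:,.0f} per card")
print(f"{per_zone:,.0f} per zone")     # roughly one 1600x900 monitor's worth
```

So each of the six zones a card drives is only about 1.3 megapixels, which is why three cards suffice for a display that dwarfs a Jumbotron.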

The Diversified Media Group’s Times Square digital sign is powered by AMD FirePro graphics, shown here under construction. [Courtesy: Diversified Media Group.]

AMD’s well-proven EyeFinity capability sends partitioned images to various display zones (up to six), all coordinated across the three graphics cards using the FirePro S400 synchronization module.

The FirePro graphics family was introduced at NAB2014 specifically for high-res, media-intensive applications like this. There’s 16 GB of GDDR5 memory, PCIe 3.0 for high-speed I/O, and the 28nm process technology used in the Graphics Core Next architecture balances 3D rendering with GPGPU computation. It all adds up to the performance needed for the Times Square “mombo-tron” skyscraper display.

Only three of AMD’s FirePro W600 graphics cards like these power America’s largest digital sign in Times Square.

According to the New York Times, approximately 300,000 people each day will see the sign, advertising that might sell for as much as $2.5 million for four weeks–certainly some pretty expensive real estate, even for NYC. So the sign must look astounding and work flawlessly.

This blog was sponsored by AMD.

New HSA Spec Legitimizes AMD’s CPU+GPU Approach

Nearly three years after the formation of the Heterogeneous System Architecture (HSA) Foundation, the consortium releases version 1.0 of the Architecture Spec, Programmer’s Reference Manual, Runtime Specification, and a Conformance Plan.

Note: This blog is sponsored by AMD.

UPDATE 3/17/15: Added Imagination Technologies as one of the HSA founders.

No one doubts the wisdom of AMD’s Accelerated Processing Unit (APU) approach that combines an x86 CPU with a Radeon GPU. After all, one SoC does it all—makes CPU decisions and drives multiple screens, right?

True. Both AMD’s G-Series and the AMD R-Series do all that, and more. But that misses the point.

In laptops this is how one uses the APU, but in embedded applications—like the IoT of the future that’s increasingly relying on high performance embedded computing (HPEC) at the network’s edge—the GPU functions as a coprocessor. CPU + GPGPU (general purpose graphics processor unit) is a powerful combination of decision-making plus parallel/algorithm processing that does local, at-the-node processing, reducing the burden on the cloud. This, according to AMD, is how the IoT will reach tens of billions of units so quickly.

Trouble is, HPEC programming is difficult. Coding the GPU requires a “ninja programmer”, quipped AMD’s VP of Embedded Solutions Scott Aylor during his keynote at this year’s Embedded World Conference in Germany. (Video of the keynote is here.) Worse still, capitalizing on the CPU + GPGPU combination requires passing data between the two architectures, which don’t share a unified memory architecture. (It’s not that AMD’s APU couldn’t be designed that way; rather, the processors require different memory architectures for maximum performance. In short: they’re different for a reason.)

AMD’s Scott Aylor giving keynote speech at Embedded World, 2015. His message: some IoT nodes demand high-performance heterogeneous computing at the edge.

AMD realized this limitation years ago and in 2012 catalyzed the HSA Foundation with several companies including ARM, Texas Instruments, Imagination Technology, MediaTek, Qualcomm, Samsung and others. The goal was to create a set of specifications that define heterogeneous hardware architectures but also create an HPEC programming paradigm for CPU, GPU, DSP and other compute elements. Collectively, the goal was to make designing, programming, and power optimizing easy for heterogeneous SoCs (Figure).

Heterogeneous systems architecture (HSA) specifications version 1.0 by the HSA Foundation, March 2015.

The HSA Foundation’s goals are realized by making the coder’s job easier using tools—such as an HSA-enabled LLVM open source compiler—that integrate multiple cores’ ISAs. (Courtesy: HSA Foundation; all rights reserved.)

After three years of work, the HSA Foundation just released their specifications at version 1.0:

  • HSA System Architecture Spec: defines H/W, OS requirements, memory model (important!), signaling paradigm, and fault handling.
  • Programmer’s Reference Manual: essentially a virtual ISA for parallel computing; defines an output format for HSA language compilers.
  • HSA Runtime Spec: is an application library for running HSA applications; defines INIT, user queues, memory management.

With HSA, the magic really does happen under the hood where the devil’s in the details. For example, the HSA version LLVM open source compiler creates a vendor-agnostic HSA intermediate language (HSAIL) that’s essentially a low-level VM. From there, “finalizers” compile into vendor-specific ISAs such as AMD or Qualcomm Snapdragon. It’s at this point that low-level libraries can be added for specific silicon implementations (such as VSIPL for vector math). This programming model uses vendor-specific tools but allows novice programmers to start in C++ but end up with optimized, performance-oriented, power-efficient code for the heterogeneous combination of CPU+GPU or DSP.

There are currently 43 companies involved with HSA, 16 universities, and three working groups (and they’re already working on version 1.1). Look at the participants, think of their market positions, and you’ll see they have a vested interest in making this a success.

In AMD’s case, as the only x86 and ARM + GPU APU supplier to the embedded market, the company sees even bigger successes as more embedded applications leverage heterogeneous parallel processing.

One example where HSA could be leveraged, said Phil Rogers, President of the HSA Foundation, is for multi-party video chatting. An HSA-compliant heterogeneous architecture would allow the processors to work in a single (virtual) memory pool and avoid the multiple data set copies—and processor churn—prevalent in current programming models.
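To make Rogers’ point concrete, here is a toy model of the per-frame data movement in that video-chat scenario. The frame size, party count, and stage count are illustrative assumptions of mine, not AMD measurements; the point is only that the copy-based discrete model moves data on every CPU/GPU hand-off while shared virtual memory moves none:

```python
# Toy model: bytes copied per frame time in a multi-party video chat.
FRAME_BYTES = 1920 * 1080 * 4   # one RGBA 1080p frame (assumed)
PARTIES = 4                     # four-way chat (assumed)
STAGES = 2                      # e.g. decode on GPU, composite on CPU (assumed)

def bytes_copied(shared_memory: bool) -> int:
    if shared_memory:
        # HSA model: CPU and GPU share one virtual memory pool,
        # so pointers are passed instead of data.
        return 0
    # Discrete model: each stage hand-off copies every party's frame
    # between CPU and GPU memory.
    return PARTIES * STAGES * FRAME_BYTES

print("discrete model:", bytes_copied(False) / 1e6, "MB per frame time")
print("HSA model:     ", bytes_copied(True), "bytes")
```

Even this crude model shows tens of megabytes of pure copy traffic per frame time disappearing under a shared-memory model, which is the “processor churn” Rogers is describing.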

With key industry players supporting HSA including AMD, ARM, Imagination Technologies, Samsung, Qualcomm, MediaTek and others, a lot of x86, ARM, and MIPS-based SoCs are likely to be compliant with the specification. That should kick off a bunch of interesting software development leading to a new wave of high performance applications.

Virtual, Immersive, Interactive: Performance Graphics and Processing for IoT Displays

Vending machines outside Walmart

Current-gen machines like these will give way to smart, IoT connected machines with 64-bit graphics and virtual reality-like customer interaction.

Not every IoT node contains a low-performance processor, sensor and slow comms link. Sure, there may be tens of billions of these, but estimates by IHS, Gartner, and Cisco still point to the need for billions of smart IoT nodes with hefty processing needs. These intelligent IoT platforms are best left to 64-bit algorithm processors like AMD’s G- and R-Series of Accelerated Processing Units (APU). AMD’s claim to fame is 64-bit cores combined with on-board Radeon graphics processing units (GPU) and tons of I/O.

As an example, consider this year’s smart vending machine. It may dispense espresso or electronic toys, or maybe show the customer wearing virtual custom-fit clothing. Suppose the machine showed you–at that very moment–using or drinking the product in the machine you were just staring at seconds before.

Far fetched? Far from it. It’s real.

These machines require a multi-media, sensor fusion experience. Multiple iPad-like touch screens may present high-def product options while cameras track customers’ eye movements, facial expressions, and body language in three-space.

This “visual compute” platform will tailor the display information to best interact with the customer in an immersive, gesture-based experience. Fusing all these inputs, processing the data in real-time, and driving multiple displays is best handled by 64-bit APUs with closely-coupled CPU and GPU execution units, hardware acceleration, and support for standards like DirectX 11, HSA 1.0, OpenGL and OpenCL.

For heavy lifting in visual compute-intensive IoT platforms, keep an eye on AMD’s graphics-ready APUs.

If you are attending Embedded World February 24-26, be sure to check out the keynote “Heterogeneous Computing for an Internet of Things World,” by Scott Aylor, Corporate VP and General Manager, AMD Embedded Solutions, on Wednesday the 25th at 9:30.

This blog was sponsored by AMD.

PCI Express Switch: the “Power Strip” of IC Design

Need more PCIe channels in your next board design? Add a PCIe switch for more fanout.

Editor’s notes:

1. Despite the fact that Pericom Semiconductor sponsors this particular blog post, your author learns that he actually knows very little about the complexities of PCIe.

2. Blog updated 3-27-14 to correct the link to Pericom P/N PI7C9X2G303EL.

Perhaps you’re like me; power cords everywhere. Anyone who has more than one mobile doodad—from smartphone to iPad to Kindle and beyond—is familiar with the ever-present power strip.

An actual power strip from under my desk. Scary...

The power strip is a modern version of the age-old extension cord: it expands one wall socket into three, five or more.  Assuming there’s enough juice (AC amperage) to power it all, the power strip meets our growing hunger for more consumer devices (or rather: their chargers).

And so it is with IC design. PCI Express Gen 2 has become the most common interoperable, on-board way to add peripherals such as SATA ports, CODECs, GPUs, WiFi chipsets, USB hubs and even legacy peripherals like UARTs. The wall socket analogy applies here too: most new CPUs, SoCs, MCUs or system controllers lack sufficient PCI Express (PCIe) ports for all the peripheral devices designers need. Plus, as IC geometries shrink, system controllers also have lower drive capability per PCIe port and signals degrade rather quickly.

The solution to these host controller problems is a PCIe switch to increase fanout by adding two, three, or even eight additional PCIe ports with ample per-lane current sourcing capability.

Any Port in a Storm?

While our computers and laptops strangle everything in sight with USB cables, inside those same embedded boxes it’s PCIe as the routing mechanism of choice. Just about any standalone peripheral a system designer could want is available with a PCIe interface. Even esoteric peripherals—such as 4K complex FFT, range-finding, or OFDM algorithm IP blocks—usually come with a PCIe 2.0 interface.

Too bad then that modern device/host controllers are painfully short on PCIe ports. I did a little Googling and found that if you choose an Intel or AMD CPU, you’re in good shape. A 4th Gen Intel Core i7 with Intel 8 Series Chipset has six PCIe 2.0 ports spread across 12 lanes. Wow. Similarly, an AMD A10 APU has four PCIe lanes (configurable as one x4 port or as four x1 ports). But these are desktop/laptop processors and they’re not so common in embedded.

AMD’s new G-Series SoC for embedded is an APU with a boatload of peripherals, and it’s got only one PCIe Gen 2 port (x4). As for Intel’s new Bay Trail-based Atom processors running the latest red-hot laptop/tablet 2-in-1s: I couldn’t find an external PCIe port on the block diagram.

Similarly…Qualcomm Snapdragon 800? Nvidia Tegra 4 or even the new K1? Datasheets on these devices are closely held for customers only, but I found Developer References that point to at best one PCIe port. ARM-based Freescale processors such as the i.MX6, popular in set-top boxes from Comcast and others, have one lone PCIe 2.0 port (Figure 1).

What to do if a designer wants to add more PCIe-based stuff?

Figure 1: Freescale i.MX ARM-based CPU is loaded with peripheral I/O, yet has only one PCIe 2.0 port. (Courtesy: Freescale Semiconductor.)

‘Mo Fanout

A PCIe switch solves the one-to-many dilemma. Add in a redriver at the Tx and Rx end, and signal integrity problems over long traces and connectors all but disappear. Switches from companies like Pericom come in many flavors, from simple lane switches that are essentially PCIe muxes, to packet switches with intelligent routing functions.

One simple example of a Pericom PCIe switch is the PI7C9X2G303EL. This PCIe 2.0 three-port/three-lane switch has one x1 upstream port and two x1 downstream ports, and would add two ports to the i.MX6 shown in Figure 1. This particular device, aimed at those low-power consumer doodads I mentioned earlier, boasts some advanced power-saving modes and consumes under 0.7W.
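For context on what each of those x1 ports delivers: a PCIe Gen 2 lane signals at 5 GT/s, and the 8b/10b line coding means only 8 of every 10 transferred bits carry payload. A quick calculation:

```python
# Effective per-lane, per-direction bandwidth of PCIe Gen 2.
GEN2_GT_PER_S = 5e9            # 5 gigatransfers/s, one bit per transfer
ENCODING_EFFICIENCY = 8 / 10   # 8b/10b line coding overhead

def lane_bandwidth_mb_s(gt_per_s, efficiency):
    # transfers/s -> payload bits/s -> bytes/s -> MB/s
    return gt_per_s * efficiency / 8 / 1e6

print(lane_bandwidth_mb_s(GEN2_GT_PER_S, ENCODING_EFFICIENCY),
      "MB/s per lane per direction")  # 500.0
```

That 500 MB/s per direction per lane (before protocol overhead) is why even a humble x1 downstream port comfortably feeds a SATA port, WiFi chipset, or USB hub.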

Hook Me Up

Upon researching this for Pericom, I was surprised to learn of all the nuances and variables to consider with PCIe switches. I won’t cover them here, other than mentioning some of the designer’s challenges: PCIe Gen 1 vs Gen 2, data packet routing, latency, CRC verification (for QoS), TLP layer inspection, auto re-send, and so on.

PCIe switches come in all flavors, from the simplest “power strip” to what is essentially an intelligent router-on-a-chip. And for maximum interoperability, all of them need to be compliant with the PCI-SIG specs, as verified at a plugfest.

So if you’re an embedded designer, the solution to your PCIe fanout problem is adding a PCI Express switch. 

Does Altera Have “Big Data” Communications on the Brain?

In wireless, wireline and financial “big data” applications, moving all those packets needs prodigious FPGA resources, not all of which Altera had before their recent series of acquisitions, partnerships, and otherwise wheeling-and-dealing.

Chris Balough of Altera (left) interviewed by Andy Frame from ARM. (Courtesy: YouTube.)

I caught up with an old friend at April’s DESIGN West 2013 conference in San Jose: Chris Balough, Sr Director of Product Marketing for SoC products. I knew Chris from when he was at Triscend (purchased by Xilinx). Chris is now in charge of Altera’s SoC products—Arria V, Stratix V and Cyclone FPGAs with ARM cores in them—which compete with Xilinx’s Zynq devices. Chris shed some light on some of these announcements, but remained mum on what they all might mean taken collectively. I think they add up to something big in “Big Data”.

(Fun facts: Altera’s first “SoC” was Excalibur, no longer recommended for new designs. Altera’s most popular SoC processor is the soft Nios II, sold in roughly 30 percent of production SoCs, says Balough.)

X before A? We’ll See

Subconsciously I think of Xilinx first when the word “FPGA” is flashed in front of me, but Altera’s the company pushing more boundaries of late. Their rat-a-tat machine gun announcements this year got my attention.

In the summer of 2012, I did an interview with Altera’s Sr VP of R&D Brad Howe and he spread out as much of the roadmap on the table as he could. Things like HSA, OpenCL, and better gigabit transceivers were all on the horizon. Shortly thereafter, Altera extended their relationship with TSMC to 20nm for Arria and Cyclone FPGAs. Then in early 2013, they rocked the industry by locking up an exclusive FPGA relationship with Intel for the industry’s only production 14nm tri-gate FinFETs.

Spring Cleaning; Altera’s Getting Ready For…?

Now in Spring 2013, Altera is making headlines like these:

- FPGA Design in the Cloud–Try It, You’ll Like It, Says Plunify. At DAC, Altera and Plunify are pushing cloud-based FPGA design tools. (See our February 2013 article with Plunify here.)

- Altera and AppliedMicro will Cooperate on Joint Solutions for High Growth Data Center Market.  Combines Stratix FPGAs and AppliedMicro’s Server on a Chip devices targeting data centers and optical transport networks (OTN).

- Altera Expands OTN Solution Capabilities with Acquisition of TPACK. Altera buys TPACK from AMCC to provide IP for FPGAs used in OTN for tasks like cross-bar switches used in 10/40/100Gbps PHYs.

- Altera Stratix V GX FPGAs Achieve PCIe Gen3 Compliance and Listing on PCI-SIG Integrators List. Right now, Gen2 and Gen3 PCIe is critical to data centers, cellular base stations, and all manner of high-speed long-haul/back-haul telco gear. Within 12 months, PCIe Gen2/3 will be “table stakes” in all manner of high-performance embedded systems like ATCA- or VME/VPX-based DSP systems for radar, sonar, SIGINT (signals intelligence) or data mining.

- Altera to Deliver Breakthrough Power Solutions for FPGAs with Acquisition of Power Technology Innovator Enpirion. Enpirion’s DC-DC converter PowerSoCs with integrated inductors may some day end up inside a Stratix package (perhaps like Xilinx’s stacked chip interposer technology), but for now the two-chip solution reduces board space by 1/7 and simplifies system design considerably. The programmable DC-DC converters provide the multiple power rails–and power-up sequences–needed for big FPGAs.

The blue regions show places where FPGAs are used in wireless LTE basestations. (Courtesy: Altera.)

My Take: Altera’s Move in Big Data

Analysts estimate that nearly 50 percent of the revenue in FPGAs comes from high-end, high-density, costly FPGAs like the Xilinx Virtex-7 and Altera Stratix V. Segments like wireless and wireline packet processing, plus financial or image processing algorithm processors increasingly rely on these kinds of FPGAs in lieu of ASICs, GPGPUs, or proprietary network processors. So every advantage in IP, process technology, or partnership that Altera has gets the company one step closer to more design wins.

We’ll see what Altera does with all of these recent announcements. I’d expect to see something shake loose before the traditional “summer doldrums” set in when the semiconductor industry goes on its annual vacation next month in July.

AMD’s Single Chip Embedded SoC: Upward and to the Right

Monolithic AMD embedded G Series SoCs combine x86 multicore, Radeon graphics, and a Southbridge. It’s one-stop-shopping, and it’s a flood targeting Intel again.

The little arrow-like “a” AMD logo once represented an “upward and to the right” growth strategy, back in the 1980s when the company was striving for $1.0B and I worked there just out of university.

In 2013, AMD is focusing on the embedded market with a vengeance and it’s “upward and to the right” again. The stated target is for AMD to grow embedded revenues from 5% in Q3 2012 to 20% of the total by Q4 2013. Wow. I’m excited about the company’s prospects, though I know they’ve had decades of false starts and technology successes that were later sold off in favor of their personal war with Intel for PC dominance. (Flash memories and Vantis? The first DSP telephone modem Am7910? Telecom line cards? Alchemy “StrongMIPS”? All gone.)

Know what? PCs are in the tank right now, embedded is the market, and AMD might just be better positioned than Intel. They’re certainly saying all the right things. Take this week’s DESIGN West announcement of the new embedded G Series “SoCs”. Two years ago AMD invented the term Accelerated Processing Unit (APU) as a differentiated x86 CPU with an ATI GPU.

An AMD Accelerated Processing Unit merges a multicore x86 CPU with a Radeon GPU.

This week’s news is how the APU mind-melds with all of the traditional x86 Southbridge I/O to become a System-on-Chip (SoC).

The AMD G Series “SoC” does more real estate sleight-of-hand by eliminating the Southbridge to bring all peripherals on-board the APU.

The G Series SoCs meld AMD’s latest 28 nm quad-core “Jaguar” with the ATI Radeon 8000 series GPU and claim a 113 percent CPU and 20 percent GPU performance jump. More importantly, the single-chip SoC concept reduces footprint by 33 percent by eliminating a whole IC. On-board peripherals are HDMI/DVI/LVDS/VGA, PCIe, USB 2.0/3.0, SATA 2.x/3.x, SPI, SD card reader interface, and more. You know, the kind of stuff you’d expect in an all-in-one.

Available in 2- and 4-core flavors, the G Series SoC saves up to 33% board real estate, and even drives dual displays and high-res.

Available in 2- and 4-core flavors, the G Series SoC saves up to 33% board real estate, and even drives dual displays and high-res.

AMD is clearly setting their sights on embedded, and Intel is once again in the crosshairs. The company claims a 3x (218 percent) overall performance advantage with the GX-415GA SKU (quad core, 1.5 GHz, 2 MB L2) over Intel’s Atom D525 running Sandra Engineering 2011 Dhrystone ALU, Sandra Engineering 2011 Whetstone iSSE3, and other benchmarks such as those from EEMBC. Although AMD’s talking trash about the Atom, they’re disclosing all of their benchmarks, the hardware they were run on, and the OS assumptions. (The only thing that maybe seems hinky to me is that the respective motherboards use 4 GB DRAM (AMD) versus 1 GB DRAM (Intel).)

AMD CPU performance graph 1

And then there’s the built-in ECC, which targets critical applications such as military, medical, financial, and casino gaming. The single-chip SoC is also designed from the ground up to run at -40 to +85°C (operation) and will fit the bill in many rugged, defense, and medical applications requiring really good horsepower and graphics performance. Fan-less designs are the sweet spot with a 9W to 25W TDP, with all I/Os blazing. Your mileage may vary, and AMD claims a much-better-than-Intel performance-per-Watt number of 19 vs 9 as shown below. There are more family members to follow, some with sub-9W power consumption. Remember, that’s for CPU+GPU+peripherals combined. Again, read the fine print.

AMD performance per Watt 1

I’m pretty enthused about AMD’s re-entry into the embedded market. Will Intel counter with something similar? Maybe not, but their own ultra low power Atom-based SoCs are winning smartphone designs left and right and have plenty of horsepower to run MPEG4 decode, DRM, and dual screen displays a la Apple’s AirPlay. So it’s game on, boys and girls.

The AMD vs Intel battle has always been good for the entire industry as it has “lifted all boats”. Here’s to a flood of new devices in embedded.