AMD Targets Embedded Graphics

As the PC market flounders, AMD continues its focus on embedded, this time with three new GPU families.

The widescreen LCD digital sign at my doctor’s office tells me today’s date, that it’s flu season, and that various health maintenance clinics are available if only I’d sign up. I feel guilty every time.

An electronic digital sign, mostly text based. (Courtesy: Wikimedia Commons.)

These kinds of static, text-only displays are not the kind of digital sign that GPU powerhouses like AMD are targeting. Microsoft Windows-based text running in an endless loop requires no graphics or imaging horsepower at all.

Instead, high performance is captured in those Minority Report multimedia messages that move with you across multiple screens down a hallway; the immersive Vegas-style electronic gaming machines that attract senior citizens like moths to a flame; and the portable ultrasound machine that gives a nervous mother the first images of her baby in HD. These are the kinds of embedded systems that need high-performance graphics, imaging, and encode/decode hardware.

AMD announced three new embedded graphics families, spanning from low-power parts driving 4 displays up to high-end GPUs driving 6 displays with 1.5 TFLOPs of number crunching.

Advanced Micro Devices wants you to think of their GPUs for your next embedded system.

AMD just announced a collection of three new embedded graphics processor families built on 28nm process technology, designed to span the gamut from multi-display, low-power parts all the way up to a near doubling of performance at the high end. Within each new family, AMD is looking to differentiate from the competition at both the chip and module/board level. Competition comes mostly from Nvidia discrete GPUs, although some Intel processors and ARM-based SoCs cross paths with AMD. AMD is also pushing its roadmap quickly away from previous-generation 40nm GPU devices.

Comparison between AMD 40nm and 28nm embedded GPUs.

A Word about Form Factors

Sure, AMD’s got PC-card plug-in boards in PCI Express format—long ones, short ones, and ones with big honking heat sinks, fans and plenty of I/O connections. AMD’s high-end embedded GPUs like the new E8870 Series are available on PCIe and boast up to 1500 GFLOPs (single precision) from 12 Compute Units. They’ll drive up to 6 displays and burn up to 75W of power without an on-board fan, and since they’re on AMD’s embedded roadmap, they’ll be around for 5 years.

An MXM (Mobile PCIe Module) format PCB containing AMD’s mid-grade E8950 GPU.

Compared to AMD’s previous embedded E8860 Series, the E8870 has 97% more 3DMark 11 performance when running from 4GB of onboard memory. Interestingly, besides the PCIe version—which might only be considered truly “embedded” when plugged into a panel PC or thin client machine—AMD also supports the MXM format. The E8870 will be available on the Type B Mobile PCI Express Module (MXM), a mere 82mm x 105mm and complete with memory, GPU, and ancillary ICs.

Middle of the Road

For more of a true embedded experience, AMD’s E8950MXM still drives 6 displays and works with AMD’s EyeFinity capability of stitching multiple displays together in Jumbotron fashion. Yet the 3000 GFLOPs (yes, that’s 3000 GFLOPs peak, single precision) little guy still has 32 Compute Units, 8 GB of GPU memory, and is optimized for 4K (UHD) encode/decode. If embedded 4K displays are your thing, this is the GPU you need.

Hardly middle of the road, right? Depending upon the SKU, this family can burn up to 95W and is available exclusively on one of those MXM modules described above. In its embedded version, the E8950 is available for 3 years (oddly, two fewer than the others).

Low Power, No Compromises

Yet not every immersive digital sign, MRI machine, or arcade console needs balls-to-the-wall graphics rendering and 6 displays. For this reason, AMD’s E6465 series focuses on low power and a small-form-factor (SFF) footprint. Able to drive 4 displays from a humble 2 Compute Units, the series still boasts 192 GFLOPs (single precision), 2 GB of GPU memory, and 5 years of embedded life, yet consumes a mere 20W.
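Those GFLOPs numbers are easy to sanity-check: in AMD's GCN architecture each Compute Unit contains 64 shaders, and each shader can retire one fused multiply-add (two floating-point ops) per cycle. Here's a quick check in C; the engine clocks below are inferred from the published figures, not taken from AMD datasheets:

```c
#include <stdio.h>

/* Peak single-precision GFLOPS for a GCN-style GPU:
 * CUs x 64 shaders x 2 ops (FMA) per cycle x clock (GHz). */
static double peak_gflops(unsigned cus, double clock_ghz)
{
    return cus * 64 * 2 * clock_ghz;
}

int main(void)
{
    /* Clocks are back-calculated from the published GFLOPs, not official. */
    printf("E6465 (2 CUs @ ~0.75 GHz):  %.0f GFLOPS\n", peak_gflops(2, 0.75));
    printf("E8870 (12 CUs @ ~1.0 GHz):  %.0f GFLOPS\n", peak_gflops(12, 1.0));
    printf("E8950 (32 CUs @ ~0.73 GHz): %.0f GFLOPS\n", peak_gflops(32, 0.73));
    return 0;
}
```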

The E6465 is available in PCIe, MXM (the smaller Type A size at 82mm x 70mm), and a multichip module. The MCM format really looks embedded, with the GPU and memory all soldered on the same MCM substrate for easier design-in onto SFFs and other board-level systems.

More Than Meets the Eye

While AMD is announcing three new embedded GPU families, it’s easy to think the story stops with the GPU itself. It doesn’t. AMD doesn’t get nearly enough recognition for the suite of graphics, imaging, and heterogeneous processing software available for these devices.

For example, in mil/aero avionics systems AMD has a few design wins in glass cockpits such as with Airbus. Some legacy mil displays don’t always follow standard refresh timing, so the new embedded GPU products support custom timing parameters. Parameters like timing standard, front porch, refresh rate and even pixel clock are programmable—ideal for the occasional non-standard military glass cockpit.
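For a sense of what "programmable timing" involves, here's a minimal sketch in C of how the pixel clock falls out of the other timing parameters; the struct and example numbers are illustrative, not AMD's actual interface:

```c
#include <stdio.h>

/* Illustrative display timing parameters -- not AMD's actual API. */
struct display_timing {
    unsigned h_active, h_front_porch, h_sync, h_back_porch;  /* pixels */
    unsigned v_active, v_front_porch, v_sync, v_back_porch;  /* lines  */
    double   refresh_hz;
};

/* Pixel clock = total pixels per frame (active + blanking) x refresh rate. */
static double pixel_clock_mhz(const struct display_timing *t)
{
    unsigned h_total = t->h_active + t->h_front_porch + t->h_sync + t->h_back_porch;
    unsigned v_total = t->v_active + t->v_front_porch + t->v_sync + t->v_back_porch;
    return (double)h_total * v_total * t->refresh_hz / 1e6;
}

int main(void)
{
    /* A non-standard cockpit display: XGA blanking but refreshed at 72 Hz. */
    struct display_timing t = {1024, 24, 136, 160, 768, 3, 6, 29, 72.0};
    printf("required pixel clock: %.1f MHz\n", pixel_clock_mhz(&t));
    return 0;
}
```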

AMD is also a strong supporter of OpenCL and OpenGL—open compute and graphics APIs that ease programmers’ coding efforts. They also lend themselves to creating DO-254 (hardware) and DO-178C (software) certifiable systems, such as those found in Airbus military airframes. Airbus Defence has selected AMD graphics processors for next-gen avionics displays.

Avionics glass cockpits, like this one from Airbus, are prime targets for high-end embedded graphics. AMD has a design win in one of Airbus’ systems.

Finally, AMD is a founding member of the HSA Foundation, the organization that has released the Heterogeneous System Architecture (HSA) specification version 1.0, also designed to make programmers’ jobs way easier when using multiple dissimilar “compute engines” in the same system. Companies like ARM, Imagination, MediaTek and others are HSA Foundation supporters.


Quiz question: I’m an embedded system, but I’m not a smartphone. What am I?

In the embedded market, there are smartphones, automotive, consumer…and everything else. I’ve figured out why AMD’s G-Series SoCs fit perfectly into the “everything else”.

Since late 2013 AMD has been talking about their G-Series of Accelerated Processing Unit (APU) x86 devices that mix an Intel-compatible CPU with a discrete-class GPU and a whole pile of peripherals like USB, serial, VGA/DVI/HDMI and even ECC memory. The devices sounded pretty nifty—in either SoC flavor (“Steppe Eagle”) or without the GPU (“Crowned Eagle”). But it was a head-scratcher where they would fit. After all, we’ve been conditioned by the smartphone market to think that any processor “SoC” that didn’t contain an ARM core wasn’t an SoC.

AMD’s Stephen Turnbull, Director of Marketing, Thin Client markets.

Yes, ARM dominates the smartphone market; no surprise there.

But there are plenty of other professional embedded markets that need CPU/GPU/peripherals where the value proposition is “Performance per dollar per Watt,” says AMD’s Stephen Turnbull, Director of Marketing, Thin Clients. In fact, AMD isn’t even targeting the smartphone market, according to General Manager Scott Aylor in his many presentations to analysts and the financial community.

AMD instead targets systems that need “visual compute”: any business-class embedded system that mixes computation with single- or multi-display capability at a “value price”. What this really means is: x86-class processing—and all the goodness associated with the Intel ecosystem—plus one or more LCDs. Even better if those LCDs are high-def, need 3D graphics or other fancy rendering, and if there’s industry-standard software being run such as OpenCL, OpenGL, or DirectX. AMD G-Series SoCs run from 6W up to 25W; the low end of this range is considered very power thrifty.

What AMD’s G-Series does best is cram an entire desktop motherboard and peripheral I/O, plus graphics card onto a single 28nm geometry SoC. Who needs this? Digital signs—where up to four LCDs make up the whole image—thin clients, casino gaming, avionics displays, point-of-sale terminals, network-attached-storage, security appliances, and oh so much more.

G-Series SoC on the top with peripheral IC for I/O on the bottom.

According to AMD’s Turnbull, the market for thin client computers is growing at 6 to 8 percent CAGR (per IDC), and “AMD commands over 50 percent share of market in thin clients.” Recent design wins with Samsung, HP and Fujitsu validate that using a G-Series SoC in the local box provides more-than-ample horsepower for data movement, encryption/decryption of central server data, and even local on-the-fly video encode/decode for Skype or multimedia streaming.

Typical use cases include government offices where all data is server-based, bank branch offices, and “even classroom learning environments, where learning labs standardize content, monitor students and centralize control of the STEM experience,” says AMD’s Turnbull.

Samsung LFDs (large format displays) use AMD APUs for flexible display features, like sending content to multiple displays via a network. (Courtesy: Samsung.)

But what about other x86 processors in these spaces? I’m thinking about various SKUs from Intel such as their recent Celeron and Pentium offerings (legacy brand names now based on modern Ivy Bridge and Haswell architectures) and various Atom flavors in both dual- and quad-core colors. According to AMD’s published literature, G-Series SoCs outperform dual-core Atoms by 2x (multi-display) or 3x (overall performance) running industry-standard benchmarks for standard and graphics computation.

And then there’s that on-board GPU. If AMD’s Jaguar-based CPU core isn’t enough muscle, the system can load-balance (in performance and power) to move algorithm-heavy loads to the GPU for General Purpose GPU (GPGPU) number crunching. This is the basis for AMD’s efforts to bring the Heterogeneous System Architecture (HSA) spec to the world. Even companies like TI and ARM have jumped onto this one for their own heterogeneous processors.
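What does that offload look like in practice? Below is a minimal OpenCL host-plus-kernel sketch in C, one industry-standard route to the on-chip Radeon GPU mentioned above. It's illustrative only (error handling is omitted), but every API call is standard OpenCL 1.x:

```c
/* Minimal OpenCL vector-add: the kind of data-parallel work an APU
 * can shift from its x86 CPU cores to the on-chip Radeon GPU.
 * Illustrative sketch: error checks are omitted for brevity. */
#include <stdio.h>
#include <CL/cl.h>

static const char *src =
    "__kernel void vadd(__global const float *a,"
    "                   __global const float *b,"
    "                   __global float *c) {"
    "    int i = get_global_id(0);"
    "    c[i] = a[i] + b[i];"
    "}";

int main(void)
{
    enum { N = 1024 };
    float a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

    cl_platform_id plat; cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "vadd", NULL);

    /* Copy inputs to the GPU, run N work-items, read the result back. */
    cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               sizeof(a), a, NULL);
    cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               sizeof(b), b, NULL);
    cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof(c), NULL, NULL);

    clSetKernelArg(k, 0, sizeof(da), &da);
    clSetKernelArg(k, 1, sizeof(db), &db);
    clSetKernelArg(k, 2, sizeof(dc), &dc);

    size_t global = N;
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, dc, CL_TRUE, 0, sizeof(c), c, 0, NULL, NULL);

    printf("c[10] = %.1f (expect 30.0)\n", c[10]);
    return 0;
}
```

Note the explicit buffer copies between CPU and GPU memory: exactly the overhead the HSA effort discussed below aims to eliminate.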

G-Series: more software than hardware.

In a nutshell, after two years of reading about (and writing about) AMD’s G-Series SoCs, I’m beginning to “get religion” that the market isn’t all about smartphone processors. Countless business-class embedded systems need Intel-compatible processing, multiple high-res displays, lots of I/O, myriad industry-standard software specs…and all for a price/Watt that doesn’t break the bank.

So the answer to the question posed in the title above is simply this: I’m a visually-oriented embedded system. And I’m everywhere.

This blog was sponsored by AMD.


A Sign of the Times

AMD’s FirePro series lights up Godzilla-sized Times Square digital sign.

[Editor's note: blog updated 8-18-15 to remove "Radeon" and make other corrections.]

They say the lights are bright on Broadway, and they ain’t kidding.  A new AMD-powered digital sign makes a stadium Jumbotron look small.

I’ve done a few LAN parties and appreciate an immersive, high-res graphics experience. But nothing could have prepared me for the whopping 25,000 square feet of graphics in Times Square powered by AMD’s FirePro series (1535 Broadway, between 45th and 46th Streets).

The UltraHD media wall is the ultimate digital sign, comprising the equivalent of about 24 million RGB LED pixels. The media wall is a full city block long by 8 stories high! Designed and managed by Diversified Media Group, the sign is thought to be the largest of its kind in the world, and certainly the largest in the U.S.

Three AMD FirePro UltraHD graphics cards drive the largest digital sign in the world.
This view of Times Square shows the commercial importance of high-res digital signs. [1 Times Square night 2013, by Chensiyuan; Licensed under GFDL via Wikimedia Commons.]

The combined 10,048 x 2,368 pixel “display” is powered by a mere three AMD FirePro graphics cards. Each card drives six sections of the overall display wall. The whole UHD experience is so realistic because of AMD’s Graphics Core Next architecture, which executes billions of operations per second in parallel.
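The arithmetic is easy to verify. The quick check below assumes the 18 outputs carve the wall into equal vertical slices, which is my assumption about the layout, not a published detail:

```c
#include <stdio.h>

int main(void)
{
    const unsigned width = 10048, height = 2368;   /* total wall pixels   */
    const unsigned cards = 3, zones_per_card = 6;  /* EyeFinity outputs   */

    unsigned long long pixels = (unsigned long long)width * height;
    printf("total pixels: %llu (~%.1f million)\n", pixels, pixels / 1e6);

    /* If the 18 zones were equal vertical slices (an assumption): */
    unsigned zones = cards * zones_per_card;
    printf("each of %u zones: ~%u x %u pixels\n",
           zones, width / zones, height);
    return 0;
}
```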

The Diversified Media Group’s Times Square digital sign is powered by AMD FirePro graphics, shown here under construction. [Courtesy: Diversified Media Group.]

AMD’s well-proven EyeFinity capability sends partitioned images to various display zones (up to six), all coordinated across the three graphics cards using the FirePro S400 synchronization module.

The FirePro graphics family was introduced at NAB2014 specifically for high-res, media-intensive applications like this. There’s 16 GB of GDDR5 memory, PCIe 3.0 for high-speed I/O, and the 28nm process technology used in the Graphics Core Next architecture balances 3D rendering with GPGPU computation. It all adds up to the performance needed for the Times Square “mombo-tron” skyscraper display.

Only three of AMD’s W600 FirePro graphics cards like these power America’s largest digital sign in Times Square.

According to the New York Times, approximately 300,000 people each day will see the sign, and its advertising might sell for as much as $2.5 million per four-week slot: certainly some pretty expensive real estate, even for NYC. So the sign must look astounding and work flawlessly.

This blog was sponsored by AMD.


AMD’s “Beefy” APUs Bulk Up Thin Clients for HP, Samsung

There are times when a tablet is too light, and a full desktop too much. The answer? A thin client PC powered by an AMD APU.

Note: this blog is sponsored by AMD.

A desire to remotely access my Mac and Windows machines from somewhere else got me thinking about thin client architectures. A thin “client” machine has sufficient processing for local storage and display—plus keyboard, mouse and other I/O—and is remotely connected to a more beefy “host” elsewhere. The host may be in the cloud or merely somewhere else on a LAN, sometimes intentionally inaccessible for security reasons.

Thin client architectures—or just “thin clients”—find utility in call centers, kiosks, hospitals, “smart” monitors and TVs, military command posts and other multi-user, virtualized installations. At times they’ve been characterized as low performance or limited in functionality, but that’s changing quickly.

They’re getting additional processing and graphics capability thanks to AMD’s G-Series and A-Series Accelerated Processing Units (APUs). By some analysts’ measures, AMD is number one in thin clients, and the company keeps winning designs with its highly integrated x86-plus-Radeon-graphics SoCs: most recently with HP and Samsung.

HP’s t420 and mt245 Thin Clients

HP’s ENERGY STAR-certified t420 is a fanless thin client for call centers, Desktop-as-a-Service and remote kiosk environments (Figure 1). Intended to mount on the back of a monitor such as the company’s ProDisplays (like you see at the doctor’s office), the unit runs HP’s ThinPro 32 or Smart Zero Core 32 operating system, has either 802.11n Wi-Fi or Gigabit Ethernet, 8 GB of Flash and 2 GB of DDR3L SDRAM.

Figure 1: HP’s t420 thin client is meant for call centers and kiosks, mounted to a smart LCD monitor. (Courtesy: HP.)

USB ports for keyboard and mouse supplement the t420’s dual display capability (DVI-D and VGA)—made possible by AMD’s dual-core GX-209JA running at 1 GHz.

Says AMD’s Scott Aylor, corporate vice president and general manager, AMD Embedded Solutions: “The AMD Embedded G-Series SoC couples high performance compute and graphics capability in a highly integrated low power design. We are excited to see innovative solutions like the HP t420 leverage our unique technologies to serve a broad range of markets which require the security, reliability and low total cost of ownership offered by thin clients.”

The whole HP thin client consumes a mere 45W and, according to StorageReview.com, will retail for $239.

Along the lines of a lightweight mobile experience, HP has also chosen AMD for their mt245 Mobile Thin Client (Figure 2). The thin client “cloud computer” resembles a 14-inch (1366 x 768 resolution) laptop; with up to 4GB of SDRAM and a 16 GB SSD, the unit runs Windows Embedded Standard 7P (64-bit) on AMD’s quad-core A6-6310 APU with Radeon R4 GPU. There are three USB ports, 1 VGA and 1 HDMI, plus Ethernet and optional Wi-Fi.

Figure 2: HP’s mt245 is a thin client mobile machine, targeting healthcare, education, and more. (Courtesy: HP.)

Like the t420, the mt245 consumes a mere 45W and is intended for employee mobility but is configured for a thin client environment. AMD’s director of thin client product management, Stephen Turnbull, says the mt245 targets “a whole range of markets, including education and healthcare.”

At the core of this machine, pun intended, is the Radeon GPU that provides heavy-lifting graphics performance. The mt245 can not only take advantage of virtualized cloud computing, but has local moxie to perform graphics-intensive applications like 3D rendering. Healthcare workers might, for example, examine ultrasound images. Factory technicians could pull up assembly drawings, then rotate them in CAD-like software applications.

Samsung Cloud Displays

An important part of Samsung’s displays business involves “smart” displays, monitors and televisions. Connected to the cloud or operating autonomously as a panel PC, many Samsung displays need local processing such as that provided by AMD’s APUs.

Samsung’s recently announced (June 17, 2015) 21.5-inch TC222W and 23.6-inch TC242W also use AMD G-Series devices in thin client architectures. The dual-core 2.2 GHz GX222 with Radeon HD6290 graphics powers both displays at 1920 x 1080 (Full HD), provides six USB ports and Ethernet, and runs Windows Embedded 7 out of 4GB of RAM and a 32 GB SSD.

Figure 3: Samsung’s Cloud Displays also rely on AMD G-Series APUs.

Said Seog-Gi Kim, senior vice president, Visual Display Business, Samsung Electronics, “Samsung’s powerful Windows Thin Client Cloud displays combine professional, ergonomic design with advanced thin-client technology.” The displays rely on the company’s Virtual Desktop Infrastructure (VDI) through a centrally managed data center that increases data security and control (Figure 3). Applications include education, business, healthcare, hospitality or any environment that requires virtualized security with excellent local processing and graphics.

Key to the design wins is the performance density of the G-Series APUs, coupled with legacy x86 software interoperability. The APUs, for both HP and Samsung, add more beef to thin clients.


Move Over Arduino, AMD and GizmoSphere Have a “Jump” On You with Graphics

The UK’s National Videogame Arcade relies on CPU, graphics, I/O and openness to power interactive exhibits.

Editor’s note: This blog is sponsored by AMD.

When I was a kid I was constantly fascinated with how things worked. What happens when I stick this screwdriver in the wall socket? (Really.) How come the dinner plate falls down and not up?

We humans have to try things for ourselves in order to fully understand them; this sparks our creativity and for many of us becomes a life calling.

Attempting to catalyze visitors’ curiosity, the UK’s National Videogame Arcade (NVA) opened in March 2015 with the sole intention of getting children and adults interested in videogames through the use of interactive exhibits, most of which are hands-on. The hope is that young people will first be stimulated by the games, and secondly that they someday unleash their creativity on the videogame and tech industries.

The UK’s National Videogame Arcade promotes gaming through hands-on exhibits powered by GizmoSphere embedded hardware.

Might As Well “Jump!”

The NVA is located in a corner building with lots of curbside windows—imagine a fancy New York City department store but without the mannequins in the street-side windows. Spread across five floors and a total of 33,000 square feet, the place is a cooperative effort between GameCity (a nice bunch of gamers), the Nottingham City Council, and local Nottingham Trent University.

The goal of pulling in 60,000 visitors a year is partly achieved by the NVA’s signature exhibit “Jump!” that allows visitors to experience gravity (without the plate) and how it affects videogame characters like those in Donkey Kong or Angry Birds. Visitors actually get to jump on the Jump-o-tron, a physics-based sensor that’s controlled by GizmoSphere’s Gizmo 2 development board.

The Jump-o-tron uses AMD’s G-Series SoC, combining an x86 CPU and a Radeon GPU.

The heart of Gizmo 2 is AMD’s G-Series APU, combining a 64-bit x86 CPU and Radeon graphics processor. Gizmo 2 is the latest creation from the GizmoSphere nonprofit open source community which seeks to “bring the power of a supercomputer and the I/O capabilities of a microcontroller to the x86 open source community,” according to www.gizmosphere.org.

The open source Gizmo 2 runs Windows and Linux, bridging PC games to the embedded world.

“Jump!” allows visitors to experience—and tweak—gravity while examining the effect upon on-screen characters. The combination requires extensive processing—up to 85 GFLOPS worth—plus video manipulation and display. What’s amazing is that “Jump!”, along with many other NVA exhibits, isn’t powered by rackmount servers but rather by the tiny 4 x 4 inch Gizmo 2, which supports DirectX 11.1, OpenGL 4.2, and OpenCL 1.2, and runs Windows and Linux.
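The gravity tweak at the heart of “Jump!” boils down to a few lines of game-loop physics. Here's a toy sketch in C, not the NVA's actual code, showing how changing one constant changes a character's hang time:

```c
#include <stdio.h>

/* Toy game-loop physics: how a tweakable gravity constant changes a
 * character's jump. Not the NVA's code -- just the idea. */
int main(void)
{
    const double dt = 1.0 / 60.0;                 /* 60 fps time step        */
    const double gravities[] = {9.8, 3.7, 24.8};  /* Earth, Mars, Jupiter-ish */

    for (int g = 0; g < 3; g++) {
        double y = 0.0, v = 8.0;                  /* height, jump velocity    */
        double air_time = 0.0;
        while (y >= 0.0) {                        /* integrate until landing  */
            v -= gravities[g] * dt;
            y += v * dt;
            air_time += dt;
        }
        printf("g = %4.1f m/s^2 -> hang time %.2f s\n",
               gravities[g], air_time);
    }
    return 0;
}
```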

AMD’s “G” Powers Gizmo 2

Gizmo 2 is a dense little package, sporting HDMI, Ethernet, PCIe, USB (2.0 and 3.0), plus myriad other A/V and I/O such as A/D/A—all of them essential for NVA exhibits like “Jump!” Says Ian Simons of the NVA, “Gizmo 2 is used in many of our games…and there are plans for even more games embedded into the building,” including furniture and even street-facing window displays.

Gizmo 2’s small size and support for open source software and hardware—plus the ability to develop on the gamer’s Unity engine—makes Gizmo 2 the preferred choice. Yet the market contains ample platforms from which to choose. Arduino comes to mind.

Gizmo 2’s schematic. The x86 G-Series SoC is loaded with I/O.

Compared to Arduino, the AMD G-Series SoC (GX-210HA) powering Gizmo 2 is orders of magnitude more powerful; plus it’s x86-based and runs at 1.0GHz (the integral GPU runs at 300 MHz). This makes the world’s cache of Intel-oriented, Windows-based software and drivers available to Gizmo 2—including some server-side programs. “NVA can create projects with Gizmo 2, including 3D graphics and full motion video, with plenty of horsepower,” says Simons. He’s referring to some big projects already installed at the NVA, plus others in the planning stages.

“One of things we’d like to do,” Simons says, “is continue to integrate Gizmo 2 into more of the building to create additional interactive exhibits and displays.” The small size of Gizmo 2, plus the wickedly awesome performance/graphics rendering/size/Watt of the AMD G-Series APU, allows Gizmo 2 to be embedded all over the building.

See Me, Feel Me

With a nod to The Who’s rock opera Tommy, the NVA building will soon have more Gizmo 2 modules wired into the infrastructure, mixing images and sound. There are at least three projects in the concept stage:

  • DMX addressable logic in the central stairway.  With exposed cables and beams, visitors would be able to control the audio, video, and possibly LED lighting of the stairwell area using a series of switches. The author wonders if voice or other tactile feedback would create all manner of immersive “psychedelic” A/V in the stairwell central hall.
  • Controllable audio zones in the rooftop garden. The NVA’s Yamaha-based sound system already includes 40 zones. Adding AMD G-Series horsepower to these zones would allow visitors to create individually customized light/sound shows, possibly around botanical themes. Has there ever been a Little Shop of Horrors videogame where the plants eat the gardener? I wonder.
  • Sidewalk animation that uses all those street-facing windows to animate the building, possibly changing the building’s façade (Star Trek cloak, anyone?) or even individually controlling games inside the building from outside (or presenting inside activities to the outside). Either way, all those windows, future LCDs, and reams of I/O will require lots more Gizmo 2 embedded boards.

The Gizmo 2 costs $199 and is available from several retailers such as Element14. With Gerber files, schematics, and all the board-focused software open source, it’s no wonder this x86 embedded board is attractive to gamers. With AMD’s G-Series APU onboard, the all-in-one HDK/SDK is an ideal choice for embedded designs—and those future gamers playing with the Gizmo 2 at the UK’s NVA.

BTW: The Who hailed from London, not Nottingham.

AMD on a Design Win Roll: GE and Samsung, Recent Examples

AMD is announcing several design wins per week as second-gen APUs show promise.

Note: AMD is a sponsor of this blog.

I follow many companies on Twitter, but lately it’s AMD that’s tweeting the loudest with weekly design wins. The company’s APUs—accelerated processing units—seem to be gaining traction in systems where PC functionality with game-like graphics is critical. Core to both of these—pun intended!—is the x86 ISA with its PC compatibility and rich software ecosystem.

Here’s a look at two of AMD’s recent design wins, one for an R-Series and the other for the all-in-one G-Series APU.

Samsung’s “set-back box” adds high-res graphics and PC functions to their digital signage displays. (Courtesy: Samsung.)

Samsung Digital Signs on to AMD

In April Samsung and AMD announced that AMD’s second-gen embedded R-Series APU, previously codenamed “Bald Eagle,” is powering Samsung’s latest set-back box (SBB) digital media players. I had no idea what a set-back box is until I looked it up.

Turns out it’s a slim embedded “pizza box” computer, 310mm x 219mm x 32mm (12.2in x 8.6in x 1.3in), that’s inserted into the back (“set-back”) of a Samsung Large Format Display (LFD). These industrial-grade LFDs range in size from 32in to 82in and are used in digital signage applications.

Samsung LFDs (large format displays) use AMD R-Series APUs for flexible display features, like sending content to multiple displays via a network. (Courtesy: Samsung.)

What makes them so compelling is also the reason Samsung chose AMD’s R-Series APU: the SBB is a complete networked PC, alleviating the need for a separate box. The players are remotely controlled by Samsung’s MagicInfo software, which allows up to 192 displays to be linked with same- or stitched-display information.

That is, one can build a video wall where the image is split across the displays—relying on AMD’s EyeFinity graphics feature—or content can be streamed across networked displays depending upon the retailer’s desired effect. Key to Samsung’s selling differentiation is remote management, RS232 control, and network-based self-diagnostics and active alert notification of problems.
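Under the hood, “stitching” means each panel crops its own slice of one large source image. Here's a toy sketch in C of that bookkeeping; it's illustrative, not Samsung's MagicInfo or AMD's EyeFinity code:

```c
#include <stdio.h>

/* For a video wall of rows x cols panels showing one stitched image,
 * compute the source-image rectangle each panel should display.
 * Illustrative of what video-wall software must work out. */
struct rect { unsigned x, y, w, h; };

static struct rect panel_crop(unsigned img_w, unsigned img_h,
                              unsigned rows, unsigned cols,
                              unsigned r, unsigned c)
{
    struct rect out = { c * img_w / cols, r * img_h / rows,
                        img_w / cols, img_h / rows };
    return out;
}

int main(void)
{
    /* A 4x4 wall of panels sharing one 7680x4320 source image. */
    for (unsigned r = 0; r < 4; r++)
        for (unsigned c = 0; c < 4; c++) {
            struct rect q = panel_crop(7680, 4320, 4, 4, r, c);
            printf("panel (%u,%u): crop %ux%u at (%u,%u)\n",
                   r, c, q.w, q.h, q.x, q.y);
        }
    return 0;
}
```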

Samsung is using the RX-425BB APU with integrated AMD Radeon R6 GPU. Per the datasheet, this version has a 35W TDP with four x86 cores and six GPU cores at 654 MHz, and is based on AMD’s latest “Steamroller” 64-bit CPU architecture plus the same graphics architecture found in the Embedded Radeon E8860 discrete GPU. Each R-Series APU can drive four 3D, 4K, or HD displays (up to 4096 x 2160 pixels) while running DirectX 11.1, OpenGL, and AMD’s Mantle gaming SDK.

As neat as all of this is—it’s a super high-end embedded LAN-party “gaming” PC system, after all—it’s the support for the latest HSA Foundation specs that makes the R-Series (and companion G-Series SOC) equally compelling for deeply embedded applications. HSA allows mixed CPU and GPU computation, which is especially useful in industrial control with its combination of general-purpose, machine-control, and display requirements.

GE Chooses AMD SOC for SFF

The second design win for AMD was back in February and it wasn’t broadcast widely: I stumbled across it while working on a sponsored piece for GE Intelligent Platforms (Disclosure: GE-IP is a sponsor of this blog.)

The AMD G-Series is now a monolithic, single-chip SOC that combines x86 CPU and Radeon graphics. (Courtesy: GE; YouTube.)

Used in a rugged, COM Express industrial controller, the AMD G-Series SOC met GE’s needs for low power and all-in-one processing, said Tommy Swigart, Global Product Manager at GE Intelligent Platforms. The “Jaguar” core in the SOC can sip as little as 5W TDP, yet still offers 3x PCIe, 2x GigE, 4x serial, plus HD audio and video, 10 USB (including 2x USB 3.0) and 2 SATA interfaces. What a Swiss Army knife of capability it is.

GE chose AMD’s G-Series APU for a rugged COM Express module for use in GE’s Industrial Internet. (Courtesy: GE Intelligent Platforms, YouTube.)

GE’s going all-in with the GE Industrial Internet, the company’s version of the IoT. Since the company is so diversified, GE can wring cost efficiencies for its customers by predicting aircraft maintenance, reducing energy in office HVAC installations, and interconnecting telemetry from locomotives to reduce track traffic and downtime. AMD’s G-Series APU brings computation, graphics, and bundles of I/O in a single-chip SOC—ideal for use in GE’s rugged SFF.

GE’s Industrial Internet runs on AMD’s G-Series APU. (Courtesy: GE; YouTube.)


New HSA Spec Legitimizes AMD’s CPU+GPU Approach

Nearly three years after the formation of the Heterogeneous System Architecture (HSA) Foundation, the consortium has released version 1.0 of its Architecture Spec, Programmer’s Reference Manual, Runtime Specification and a Conformance Plan.

Note: This blog is sponsored by AMD.


UPDATE 3/17/15: Added Imagination Technologies as one of the HSA founders.

No one doubts the wisdom of AMD’s Accelerated Processing Unit (APU) approach that combines an x86 CPU with a Radeon GPU. After all, one SoC does it all—makes CPU decisions and drives multiple screens, right?

True. Both AMD’s G-Series and the AMD R-Series do all that, and more. But that misses the point.

In laptops this is how one uses the APU, but in embedded applications—like the IoT of the future that’s increasingly relying on high performance embedded computing (HPEC) at the network’s edge—the GPU functions as a coprocessor. CPU + GPGPU (general purpose graphics processor unit) is a powerful combination of decision-making plus parallel/algorithm processing that does local, at-the-node processing, reducing the burden on the cloud. This, according to AMD, is how the IoT will reach tens of billions of units so quickly.

Trouble is, HPEC programming is difficult. Coding the GPU requires a “ninja programmer,” as AMD’s VP of embedded Scott Aylor quipped during his keynote at this year’s Embedded World Conference in Germany. (Video of the keynote is here.) Worse still, capitalizing on the CPU + GPGPU combination requires passing data between the two architectures, which don’t share a unified memory architecture. (It’s not that AMD’s APU couldn’t be designed that way; rather, the processors require different memory architectures for maximum performance. In short: they’re different for a reason.)

AMD’s Scott Aylor giving keynote speech at Embedded World, 2015. His message: some IoT nodes demand high-performance heterogeneous computing at the edge.

AMD realized this limitation years ago and in 2012 catalyzed the HSA Foundation with several companies including ARM, Texas Instruments, Imagination Technologies, MediaTek, Qualcomm, Samsung and others. The goal was to create a set of specifications that not only define heterogeneous hardware architectures but also create an HPEC programming paradigm for CPU, GPU, DSP and other compute elements. Collectively, the goal was to make designing, programming, and power optimizing easy for heterogeneous SoCs (Figure).

Heterogeneous System Architecture (HSA) specifications version 1.0 by the HSA Foundation, March 2015. The HSA Foundation’s goals are realized by making the coder’s job easier using tools—such as an HSA-enabled LLVM open source compiler—that integrate multiple cores’ ISAs. (Courtesy: HSA Foundation; all rights reserved.)

After three years of work, the HSA Foundation just released their specifications at version 1.0:

  • HSA System Architecture Spec: defines H/W, OS requirements, memory model (important!), signaling paradigm, and fault handling.
  • Programmer’s Reference Guide: essentially a virtual ISA for parallel computing; defines an output format for HSA language compilers.
  • HSA Runtime Spec: is an application library for running HSA applications; defines INIT, user queues, memory management.

With HSA, the magic really does happen under the hood, where the devil’s in the details. For example, the HSA version of the LLVM open source compiler creates a vendor-agnostic HSA intermediate language (HSAIL) that’s essentially a low-level VM. From there, “finalizers” compile into vendor-specific ISAs such as AMD’s or Qualcomm Snapdragon’s. It’s at this point that low-level libraries can be added for specific silicon implementations (such as VSIPL for vector math). This programming model uses vendor-specific tools but allows novice programmers to start in C++ and end up with optimized, performance-oriented, power-efficient code for the heterogeneous combination of CPU+GPU or DSP.

There are currently 43 companies involved with HSA, 16 universities, and three working groups (and they’re already working on version 1.1). Look at the participants, think of their market positions, and you’ll see they have a vested interest in making this a success.

In AMD’s case, as the only x86 and ARM + GPU APU supplier to the embedded market, the company sees even bigger successes as more embedded applications leverage heterogeneous parallel processing.

One example where HSA could be leveraged, said Phil Rogers, President of the HSA Foundation, is multi-party video chatting. An HSA-compliant heterogeneous architecture would allow the processors to work in a single (virtual) memory pool and avoid the multiple data-set copies—and processor churn—prevalent in current programming models.

With key industry players supporting HSA including AMD, ARM, Imagination Technologies, Samsung, Qualcomm, MediaTek and others, a lot of x86, ARM, and MIPS-based SoCs are likely to be compliant with the specification. That should kick off a bunch of interesting software development leading to a new wave of high performance applications.

Virtual, Immersive, Interactive: Performance Graphics and Processing for IoT Displays

Vending machines outside Walmart

Current-gen machines like these will give way to smart, IoT connected machines with 64-bit graphics and virtual reality-like customer interaction.

Not every IoT node contains a low-performance processor, sensor and slow comms link. Sure, there may be tens of billions of these, but estimates by IHS, Gartner, and Cisco still point to the need for billions of smart IoT nodes with hefty processing needs. These intelligent IoT platforms are best left to 64-bit algorithm processors like AMD’s G- and R-Series of Accelerated Processing Units (APUs). AMD’s claim to fame is 64-bit cores combined with on-board Radeon graphics processing units (GPUs) and tons of I/O.

As an example, consider this year’s smart vending machine. It may dispense espresso or electronic toys, or maybe show the customer wearing virtual custom-fit clothing. Suppose the machine showed you, at that very moment, using or drinking the product in the machine you were just staring at seconds before.

Far fetched? Far from it. It’s real.

These machines require a multi-media, sensor fusion experience. Multiple iPad-like touch screens may present high-def product options while cameras track customers’ eye movements, facial expressions, and body language in three-space.

This “visual compute” platform will tailor the display information to best interact with the customer in an immersive, gesture-driven experience. Fusing all these inputs, processing the data in real-time, and driving multiple displays is best handled by 64-bit APUs with closely-coupled CPU and GPU execution units, hardware acceleration, and support for standards like DirectX 11, HSA 1.0, OpenGL and OpenCL.

For heavy lifting in visual compute-intensive IoT platforms, keep an eye on AMD’s graphics-ready APUs.

If you are attending Embedded World February 24-26, be sure to check out the keynote “Heterogeneous Computing for an Internet of Things World,” by Scott Aylor, Corporate VP and General Manager, AMD Embedded Solutions, on Wednesday the 25th at 9:30.

This blog was sponsored by AMD.

PCI Express Switch: the “Power Strip” of IC Design

Need more PCIe channels in your next board design? Add a PCIe switch for more fanout.

Editor’s notes:

1. Despite the fact that Pericom Semiconductor sponsors this particular blog post, your author learns that he actually knows very little about the complexities of PCIe.

2. Blog updated 3-27-14 to correct the link to Pericom P/N PI7C9X2G303EL.

Perhaps you’re like me: power cords everywhere. Anyone who has more than one mobile doodad—from smartphone to iPad to Kindle and beyond—is familiar with the ever-present power strip.

An actual power strip from under my desk. Scary…

The power strip is a modern version of the age-old extension cord: it expands one wall socket into three, five or more.  Assuming there’s enough juice (AC amperage) to power it all, the power strip meets our growing hunger for more consumer devices (or rather: their chargers).


And so it is with IC design. PCI Express Gen 2 has become the most common interoperable, on-board way to add peripherals such as SATA ports, CODECs, GPUs, WiFi chipsets, USB hubs and even legacy peripherals like UARTs. The wall socket analogy applies here too: most new CPUs, SoCs, MCUs or system controllers lack sufficient PCI Express (PCIe) ports for all the peripheral devices designers need. Plus, as IC geometries shrink, system controllers also have lower drive capability per PCIe port and signals degrade rather quickly.

The solution to these host controller problems is a PCIe switch to increase fanout by adding two, three, or even eight additional PCIe ports with ample per-lane current sourcing capability.

Any Port in a Storm?

While our computers and laptops strangle everything in sight with USB cables, inside those same embedded boxes it’s PCIe as the routing mechanism of choice. Just about any standalone peripheral a system designer could want is available with a PCIe interface. Even esoteric peripherals—such as 4K complex FFT, range-finding, or OFDM algorithm IP blocks—usually come with a PCIe 2.0 interface.

Too bad then that modern device/host controllers are painfully short on PCIe ports. I did a little Googling and found that if you choose an Intel or AMD CPU, you’re in good shape. A 4th Gen Intel Core i7 with Intel 8 Series Chipset has six PCIe 2.0 ports spread across 12 lanes. Wow. Similarly, an AMD A10 APU has four PCIe lanes (configurable as one x4 port or four x1 ports). But these are desktop/laptop processors and they’re not so common in embedded.

AMD’s new G-Series SoC for embedded is an APU with a boatload of peripherals, and it’s got only one PCIe Gen 2 port (x4). As for Intel’s new Bay Trail-based Atom processors running the latest red-hot laptop/tablet 2-in-1s: I couldn’t find an external PCIe port on the block diagram.

Similarly…Qualcomm Snapdragon 800? Nvidia Tegra 4 or even the new K1? Datasheets on these devices are closely held for customers only, but I found Developer References that point to at best one PCIe port. ARM-based Freescale processors such as the i.MX6, popular in set-top boxes from Comcast and others, have one lone PCIe 2.0 port (Figure 1).

What to do if a designer wants to add more PCIe-based stuff?

Figure 1: Freescale i.MX ARM-based CPU is loaded with peripheral I/O, yet has only one PCIe 2.0 port. (Courtesy: Freescale Semiconductor.)

‘Mo Fanout

A PCIe switch solves the one-to-many dilemma. Add in a redriver at the Tx and Rx end, and signal integrity problems over long traces and connectors all but disappear. Switches from companies like Pericom come in many flavors, from simple lane switches that are essentially PCIe muxes, to packet switches with intelligent routing functions.

One simple example of a Pericom PCIe switch is the PI7C9X2G303EL. This PCIe 2.0 three-port/three-lane switch has one x1 upstream port and two x1 downstream ports, and would add two ports to the i.MX6 shown in Figure 1. This particular device, aimed at those low-power consumer doodads I mentioned earlier, boasts some advanced power-saving modes and consumes under 0.7W.
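On a Linux-based embedded target, you can verify the added fanout by walking sysfs and looking for the new bridge functions; each switch port enumerates as a PCI-to-PCI bridge. A quick sketch (Linux-specific, error handling minimal):

```c
/* List PCI(e) functions the kernel has enumerated, by walking sysfs.
 * After adding a switch, its upstream and downstream ports show up
 * as additional bridge devices (class 0x0604). Linux-specific sketch. */
#include <stdio.h>
#include <string.h>
#include <dirent.h>

int main(void)
{
    const char *base = "/sys/bus/pci/devices";
    DIR *d = opendir(base);
    if (!d) { perror(base); return 1; }

    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.')
            continue;
        char path[512], cls[16] = "";
        snprintf(path, sizeof(path), "%s/%s/class", base, e->d_name);
        FILE *f = fopen(path, "r");
        if (f) {
            if (fgets(cls, sizeof(cls), f))
                cls[strcspn(cls, "\n")] = '\0';
            fclose(f);
        }
        /* Class 0x0604 is a PCI-to-PCI bridge: switch ports land here. */
        printf("%s  class=%s%s\n", e->d_name, cls,
               strncmp(cls, "0x0604", 6) == 0 ? "  (bridge/switch port)" : "");
    }
    closedir(d);
    return 0;
}
```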

Hook Me Up

Upon researching this for Pericom, I was surprised to learn of all the nuances and variables to consider with PCIe switches. I won’t cover them here, other than mentioning some of the designer’s challenges: PCIe Gen 1 vs Gen 2, data packet routing, latency, CRC verification (for QoS), TLP layer inspection, auto re-send, and so on.

PCIe switches come in all flavors, from the simplest “power strip” to what is essentially an intelligent router-on-a-chip. And for maximum interoperability, all of them need to be compliant with the PCI-SIG specs, as verified at a plugfest.

So if you’re an embedded designer, the solution to your PCIe fanout problem is adding a PCI Express switch. 

PCI-SIG “nificant” Changes Brewing in Mobile

PCI-SIG Developers Conference, June 25, 2013, Santa Clara, CA

Of five significant PCI Express announcements made at this week’s PCI-SIG Developers Conference, two are aimed at mobile embedded.

From PCI to PCI Express to Gen3 speeds, the PCI-SIG is one industry consortium that lets no grass grow for long. As the embedded, enterprise and server industries roll out PCIe Gen3 and 40G/100G Ethernet, the PCI-SIG and its key constituents like Cadence, Synopsys, LeCroy and others are readying for another speed doubling to 16 GT/s (gigatransfers per second) by 2015. The next step, PCIe 4.0, evolves per-lane bandwidth to 16 Gb/s, for a whopping 64 GB/s (big “B”) of total bandwidth at x16 width. PCIe 4.0 Rev 0.5 will be available Q1 2014 with Rev 0.9 targeted for Q1 2015.
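For the curious, the bandwidth math works out as follows; note the 64 GB/s headline figure counts both directions of the x16 link. A quick check in C using the specs' published encoding overheads:

```c
#include <stdio.h>

/* Per-direction PCIe link bandwidth: lanes x rate x encoding efficiency.
 * Gen3/Gen4 use 128b/130b encoding; Gen1/Gen2 used 8b/10b. */
static double gbytes_per_s(int lanes, double gt_per_s, double efficiency)
{
    return lanes * gt_per_s * efficiency / 8.0;  /* bits -> bytes */
}

int main(void)
{
    double gen3 = gbytes_per_s(16, 8.0, 128.0 / 130.0);
    double gen4 = gbytes_per_s(16, 16.0, 128.0 / 130.0);
    printf("PCIe 3.0 x16: %.1f GB/s per direction\n", gen3);
    printf("PCIe 4.0 x16: %.1f GB/s per direction (~%.0f GB/s bidirectional)\n",
           gen4, 2 * gen4);
    return 0;
}
```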

Table of major PCI-SIG announcements at Developers Conference 2013.

Yet as “SIG-nificant” as this announcement is, PCI-SIG president Al Yanes said it’s only one of five major news items. The others include: a PCIe 3.1 specification that consolidates a series of ECNs in the areas of power, performance and functionality; PCIe Outside the Box, which uses a 1-3 meter “really cheap” copper cable called PCIe OCuLink with an 8 Gbit/s rate; plus two embedded and mobile announcements that I’m particularly enthused about. Refer to the table for a snapshot.

New M.2 Specification

The new M.2 specification is a small, mobile embedded form factor designed to replace the previous PCI Express Mini Card (“Mini PCIe”) in its Mini Card and Half Mini Card sizes. The newer, as-yet-publicly-unreleased M.2 card will be smaller in size and volume but is intended to provide scalable PCIe performance, allowing designers to tune SWaP and I/O requirements. PCI-SIG marketing workgroup chair Ramin Neshati told me that M.2 is part of the PCI-SIG’s increased focus on mobile.

The scalable M.2 card is designed as an I/O plug in for Bluetooth, Wi-Fi, WAN/cellular, SSD and other connectivity in platforms including ultrabook, tablet, and “maybe even smartphone,” said Neshati. At Rev 0.7 now, Rev 0.9 will be released soon and the final (Rev 1.0?) spec will become public by Q4 2013.

The PCI-SIG’s impending M.2 form factor is designed for mobile embedded ultrabooks, tablets, and possibly smartphones. The card will have a scalable PCIe interface and is designed for Wi-Fi, Bluetooth, cellular, SSD and more. (Courtesy: PCI-SIG.)

Mobile PCIe (M-PCIe)

The momentum in mobile and the interest in a PCIe on-board interconnect led the PCI-SIG to work with the MIPI Alliance and create Mobile PCI Express: M-PCIe. The specification is now available to PCI-SIG members and creates an “adapted PCIe architecture” bridge between regular PCIe and MIPI M-PHY.

The Mobile PCI Express (M-PCIe) specification targets mobile embedded devices like smartphones to provide high-speed, on-board PCIe connectivity. (Courtesy: PCI-SIG.)

Using the MIPI M-PHY physical layer allows smartphone and mobile designers to stick with one consistent software interface across multiple platforms, including already-existing OS drivers. PCIe support is “baked into Windows, iOS, Android,” and others, says PCI-SIG’s Neshati. PCI Express also has a major advantage when it comes to interoperability testing, which runs from the protocol stack all the way down to the electrical interfaces. Taken collectively, PCIe brings huge functionality and compliance benefits to the mobile space.

M-PCIe supports MIPI’s Gear 1 (1.25-1.45 Gbps), Gear 2 (2.5-2.9 Gbps) and Gear 3 (5.0-5.8 Gbps) speeds. As well, the M-PCIe spec provides power optimization for short-channel mobile platforms; it’s primarily aimed at WWAN front-end radios and modem IP blocks, and could possibly replace the Universal Flash Storage (UFS) mass storage interface (administered by JEDEC).

M-PCIe by the PCI-SIG can be used in multiple high speed paths in a smartphone mobile device. (Courtesy: PCI-SIG and MIPI Alliance.)

M-PCIe by the PCI-SIG can be used in multiple high speed paths in a smartphone mobile device. (Courtesy: PCI-SIG and MIPI Alliance.)

PCI Express Ready for More

More information on these five announcements will be rolling out soon. But it’s clear that the PCI-SIG sees mobile and embedded as the next target areas for PCI Express in the post-PC era, while still not abandoning the standard’s bread and butter in PCs and high-end/high-performance servers.