Quiz question: I’m an embedded system, but I’m not a smartphone. What am I?

In the embedded market, there are smartphones, automotive, consumer….and everything else. I’ve figured out why AMD’s G-Series SoCs fit perfectly into the “everything else”.

amd-embedded-solutions-g-series-logo-100xSince late 2013 AMD has been talking about their G-Series of Accelerated Processing Unit (APU) x86 devices that mix an Intel-compatible CPU with a discrete-class GPU and a whole pile of peripherals like USB, serial, VGA/DVI/HDMI and even ECC memory. The devices sounded pretty nifty—in either SoC flavor (“Steppe Eagle”) or without the GPU (“Crowned Eagle”). But it was a head-scratcher where they would fit. After-all, we’ve been conditioned by the smartphone market to think that any processor “SoC” that didn’t contain an ARM core wasn’t an SoC.

AMD’s Stephen Turnbull, Director of Marketing, Thin Client markets.

AMD’s Stephen Turnbull, Director of Marketing, Thin Client markets.

Yes, ARM dominates the smartphone market; no surprise there.

But there are plenty of other professional embedded markets that need CPU/GPU/peripherals where the value proposition is “Performance per dollar per Watt,” says AMD’s Stephen Turnbull, Director of Marketing, Thin Clients. In fact, AMD isn’t even targeting the smartphone market, according to General Manager Scott Aylor in his many presentations to analysts and the financial community.

AMD instead targets systems that need “visual compute”: which is any business-class embedded system that mixes computation with single- or multi-display capabilities at a “value price”. What this really means is: x86-class processing—and all the goodness associated with the Intel ecosystem—plus one or more LCDs. Even better if those LCDs are high-def, need 3D graphics or other fancy rendering, and if there’s industry-standard software being run such as OpenCL, OpenGL, or DirectX. AMD G-Series SoCs run from 6W up to 25W; the low end of this range is considered very power thrifty.

What AMD’s G-Series does best is cram an entire desktop motherboard and peripheral I/O, plus graphics card onto a single 28nm geometry SoC. Who needs this? Digital signs—where up to four LCDs make up the whole image—thin clients, casino gaming, avionics displays, point-of-sale terminals, network-attached-storage, security appliances, and oh so much more.

G-Series SoC on the top with peripheral IC for I/O on the bottom.

G-Series SoC on the top with peripheral IC for I/O on the bottom.

According to AMD’s Turnbull, the market for thin client computers is growing at 6 to 8 percent CAGR (per IDC), and “AMD commands over 50 percent share of market in thin clients.” Recent design wins with Samsung, HP and Fujitsu validate that using a G-Series SoC in the local box provides more-than-ample horsepower for data movement, encryption/decryption of central server data, and even local on-the-fly video encode/decode for Skype or multimedia streaming.

Typical use cases include government offices where all data is server-based, bank branch offices, and “even classroom learning environments, where learning labs standardize content, monitor students and centralize control of the STEM experience,” says AMD’s Turnbull.

Samsung LFDs (large format displays) use AMD R-Series APUs for flexible display features, like sending content to multiple displays via a network. (Courtesy: Samsung.)

Samsung LFDs (large format displays) use AMD APUs for flexible display features, like sending content to multiple displays via a network. (Courtesy: Samsung.)

But what about other x86 processors in these spaces? I’m thinking about various SKUs from Intel such as their recent Celeron and Pentium M offerings (which are legacy names but based on modern versions of Ivy Bridge and Haswell architectures) and various Atom flavors in both dual- and quad-core colors. According to AMD’s  published literature, G-Series SoC’s outperform dual-core Atoms by 2x (multi-display) or 3x (overall performance) running industry-standard benchmarks for standard and graphics computation.

And then there’s that on-board GPU. If AMD’s Jaguar-based CPU core isn’t enough muscle, the system can load-balance (in performance and power) to move algorithm-heavy loads to the GPU for General Purpose GPU (GPGPU) number crunching. This is the basis for AMD’s efforts to bring the Heterogeneous System Architecture (HSA) spec to the world. Even companies like TI and ARM have jumped onto this one for their own heterogeneous processors.

G-Series: more software than hardware.

G-Series: more software than hardware.

In a nutshell, after two years of reading about (and writing about) AMD’s G-Series SoCs, I’m beginning to “get religion” that the market isn’t all about smartphone processors. Countless business-class embedded systems need Intel-compatible processing, multiple high-res displays, lots of I/O, myriad industry-standard software specs…and all for a price/Watt that doesn’t break the bank.

So the answer to the question posed in the title above is simply this: I’m a visually-oriented embedded system. And I’m everywhere.

This blog was sponsored by AMD.



New HSA Spec Legitimizes AMD’s CPU+GPU Approach

After nearly 3 years since the formation of the Heterogeneous System Architecture (HSA) Foundation, the consortium releases 1.0 version of the Architecture Spec, Programmer’s Reference Manual, Runtime Specification and a Conformance Plan.

Note: This blog is sponsored by AMD.

HSA banner


UPDATE 3/17/15: Added Imagination Technologies as one of the HSA founders. C2

No one doubts the wisdom of AMD’s Accelerated Processing Unit (APU) approach that combines x86 CPU with a Radeon graphic GPU. Afterall, one SoC does it all—makes CPU decisions and drives multiple screens, right?

True. Both AMD’s G-Series and the AMD R-Series do all that, and more. But that misses the point.

In laptops this is how one uses the APU, but in embedded applications—like the IoT of the future that’s increasingly relying on high performance embedded computing (HPEC) at the network’s edge—the GPU functions as a coprocessor. CPU + GPGPU (general purpose graphics processor unit) is a powerful combination of decision-making plus parallel/algorithm processing that does local, at-the-node processing, reducing the burden on the cloud. This, according to AMD, is how the IoT will reach tens of billions of units so quickly.

Trouble is, HPEC programming is difficult. Coding the GPU requires a “ninja programmer”, as quipped AMD’s VP of embedded Scott Aylor during his keynote at this year’s Embedded World Conference in Germany. (Video of the keynote is here.) Worse still, capitalizing on the CPU + GPGPU combination requires passing data between the two architectures which don’t share a unified memory architecture. (It’s not that AMD’s APU couldn’t be designed that way; rather, the processors require different memory architectures for maximum performance. In short: they’re different for a reason.)

AMD’s Scott Aylor giving keynote speech at Embedded World, 2015. His message: some IoT nodes demand high-performance heterogeneous computing at the edge.

AMD’s Scott Aylor giving keynote speech at Embedded World, 2015. His message: some IoT nodes demand high-performance heterogeneous computing at the edge.

AMD realized this limitation years ago and in 2012 catalyzed the HSA Foundation with several companies including ARM, Texas Instruments, Imagination Technology, MediaTek, Qualcomm, Samsung and others. The goal was to create a set of specifications that define heterogeneous hardware architectures but also create an HPEC programming paradigm for CPU, GPU, DSP and other compute elements. Collectively, the goal was to make designing, programming, and power optimizing easy for heterogeneous SoCs (Figure).

Heterogeneous systems architecture (HSA) specifications version 1.0 by the HSA Foundation, March 2015.

The HSA Foundation’s goals are realized by making the coder’s job easier using tools—such as an HSA version LLVM open source compiler—that integrates multiple cores’ ISAs. (Courtesy: HSA Foundation; all rights reserved.) Heterogeneous systems architecture (HSA) specifications version 1.0 by the HSA Foundation, March 2015.

After three years of work, the HSA Foundation just released their specifications at version 1.0:

  • HSA System Architecture Spec: defines H/W, OS requirements, memory model (important!), signaling paradigm, and fault handling.
  • Programmers Reference Guide: essentially a virtual ISA for parallel computing, defines an output format for HSA language compilers.
  • HSA Runtime Spec: is an application library for running HSA applications; defines INIT, user queues, memory management.

With HSA, the magic really does happen under the hood where the devil’s in the details. For example, the HSA version LLVM open source compiler creates a vendor-agnostic HSA intermediate language (HSAIL) that’s essentially a low-level VM. From there, “finalizers” compile into vendor-specific ISAs such as AMD or Qualcomm Snapdragon. It’s at this point that low-level libraries can be added for specific silicon implementations (such as VSIPL for vector math). This programming model uses vendor-specific tools but allows novice programmers to start in C++ but end up with optimized, performance-oriented, and low-power efficient code for the heterogeneous combination of CPU+GPU or DSP.

There are currently 43 companies involved with HSA, 16 universities, and three working groups (and they’re already working on version 1.1). Look at the participants, think of their market positions, and you’ll see they have a vested interest in making this a success.

In AMD’s case, as the only x86 and ARM + GPU APU supplier to the embedded market, the company sees even bigger successes as more embedded applications leverage heterogeneous parallel processing.

One example where HSA could be leveraged, said Phil Rogers, President of the HSA Foundation, is for multi-party video chatting. An HSA-compliant heterogeneous architecture would allow the processors to work in a single (virtual) memory pool and avoid the multiple data set copies—and processor churn—prevalent in current programming models.

With key industry players supporting HSA including AMD, ARM, Imagination Technologies, Samsung, Qualcomm, MediaTek and others, a lot of x86, ARM, and MIPS-based SoCs are likely to be compliant with the specification. That should kick off a bunch of interesting software development leading to a new wave of high performance applications.

PCI Express Switch: the “Power Strip” of IC Design

Need more PCIe channels in your next board design? Add a PCIe switch for more fanout.

Editor’s notes:

1. Despite the fact that Pericom Semiconductor sponsors this particular blog post, your author learns that he actually knows very little about the complexities of PCIe.

2. Blog updated 3-27-14 to correct the link to Pericom P/N PI7C9X2G303EL.

Perhaps you’re like me; power cords everywhere. Anyone who has more than one mobile doodad—from smartphone to iPad to Kindle and beyond—is familiar with the ever-present power strip.

An actual power strip from under my desk. Scary...

An actual power strip from under my desk. Scary…

The power strip is a modern version of the age-old extension cord: it expands one wall socket into three, five or more.  Assuming there’s enough juice (AC amperage) to power it all, the power strip meets our growing hunger for more consumer devices (or rather: their chargers).


And so it is with IC design. PCI Express Gen 2 has become the most common interoperable, on-board way to add peripherals such as SATA ports, CODECs, GPUs, WiFi chipsets, USB hubs and even legacy peripherals like UARTs. The wall socket analogy applies here too: most new CPUs, SoCs, MCUs or system controllers lack sufficient PCI Express (PCIe) ports for all the peripheral devices designers need. Plus, as IC geometries shrink, system controllers also have lower drive capability per PCIe port and signals degrade rather quickly.

The solution to these host controller problems is a PCIe switch to increase fanout by adding two, three, or even eight additional PCIe ports with ample per-lane current sourcing capability.

Any Port in a Storm?

While our computers and laptops strangle everything in sight with USB cables, inside those same embedded boxes it’s PCIe as the routing mechanism of choice. Just about any standalone peripheral a system designer could want is available with a PCIe interface. Even esoteric peripherals—such as 4K complex FFT, range-finding, or OFDM algorithm IP blocks—usually come with a PCIe 2.0 interface.

Too bad then that modern device/host controllers are painfully short on PCIe ports. I did a little Googling and found that if you choose an Intel or AMD CPU, you’re in good shape. A 4th Gen Intel Core i7 with Intel 8 Series Chipset has six PCIe 2.0 ports spread across 12 lanes. Wow. Similarly, an AMD A10 APU has four PCIe (1x as x4, or 4x as x1). But these are desktop/laptop processors and they’re not so common in embedded.

AMD’s new G-Series SoC for embedded is an APU with a boatload of peripherals and it’s got only one PCIe Gen 2 port (x4). As for Intel’s new Bay Trail-based Atom processors running the latest red-hot laptop/tablet 2:1’s:  I couldn’t find an external PCIe port on the block diagram.

Similarly…Qualcomm Snapdragon 800? Nvidia Tegra 4 or even the new K1? Datasheets on these devices are closely held for customers only but I found Developer References that point to at best one PCIe port. ARM-based Freescale processors such as the i.MX6, popular in set-top boxes from Comcast and others have one lone PCIe 2.0 port (Figure 1).

What to do if a designer wants to add more PCIe-based stuff?

Figure 1: Freescale i.MX ARM-based CPU is loaded with peripheral I/O, yet has only one PCIe 2.0 port. (Courtesy: Freescale Semiconductor.)

Figure 1: Freescale i.MX ARM-based CPU is loaded with peripheral I/O, yet has only one PCIe 2.0 port. (Courtesy: Freescale Semiconductor.)

‘Mo Fanout

A PCIe switch solves the one-to-many dilemma. Add in a redriver at the Tx and Rx end, and signal integrity problems over long traces and connectors all but disappear. Switches from companies like Pericom come in many flavors, from simple lane switches that are essentially PCIe muxes, to packet switches with intelligent routing functions.

One simple example of a Pericom PCIe switch is the PI7C9X2G303EL. This PCIe 2.0 three port/three lane switch has one x1 Up and two x1 Down and would add two ports to the i.MX6 shown in Figure 1. This particular device, aimed at those low power consumer doodads I mentioned earlier, boasts some advanced power saving modes and consumes under 0.7W.

Hook Me Up

Upon researching this for Pericom, I was surprised to learn of all the nuances and variables to consider with PCIe switches. I won’t cover them here, other than mentioning some of the designer’s challenges: PCIe Gen 1 vs Gen 2, data packet routing, latency, CRC verification (for QoS), TLP layer inspection, auto re-send, and so on.

It seems that PCIe switches seem to come in all flavors, from the simplest “power strip”, to essentially an intelligent router-on-a-chip. And for maximum interoperability, of them need to be compliant to the PCI-SIG specs as verified by a plugfest.

So if you’re an embedded designer, the solution to your PCIe fanout problem is adding a PCI Express switch. 

The man asked: “Did Intel Lose Altera?”

Altera has made hay around Intel’s 14nm tri-gate (FinFET) process advantages. Have Intel’s Broadwell delays pushed Altera away?

In a recent post entitled “Did Intel Lose Altera?” blogger Ashraf Eassa muses at investment site The Motley Fool that Altera “is crawling back to Taiwan Semiconductor [TSMC]” for Altera’s high end Stratix 10 devices. His post is based upon an article originally written in DigiTimes which I’ve been unable to locate. (This article is similar, but speculates about Apple turning to Intel.)

Altera's Stratix 10 relies on multicore ARM Cortex A53s, DSP blocks, OpenCL, and Intel's 14nm Tri-Gate process.

Altera’s Stratix 10 relies on multicore ARM Cortex A53s, DSP blocks, OpenCL, and Intel’s 14nm Tri-Gate process.

The point, I assume, is Intel’s recent stumble with 14-nm Broadwell CPUs which were originally planned for Q4 2013, with production in Q1 2014 (now), but could possibly be postponed to Q4 2014 (says DigiTimes here).  3D transistors at this fine geometry are approaching rocket science so any delay even by the mighty Intel is not surprising and I’d consider pretty uneventful.

Except I’m not Intel–with a predictable Tick-Tock roadmap and that whole Moore’s Law thing–nor am I running Altera. The FPGA company Altera, of course, is trying to one-up Xilinx who so far is sticking with TSMC’s 20nm goodness.

  • I offer some good insight into Altera’s Stratix 10 plans for Intel’s foundry here.
  • For some insight on Xilinx’s UltraScale products in TSMC’s process, read here.

All of this is playing out under the microscope of Intel’s shallow penetration into the mobile (smartphone and tablet) markets where ARM-based SoCs from Nvidia, Qualcomm and others seem to obviate the dramatic advances Intel has made with their Bay Trail and Quark roadmaps.  So Intel can’t afford any bad press to the financial community.

I’m a fan of Intel, believe they’ll dominate again, and give them kudos for countless new-generation Atom design wins in Windows 8.1, Android, Tizen and other mobile devices. And I’m stoked about how Intel is evolving the all-day battery UltraBook into the 2:1 laptop/tablet. Still, ARM’s licensees dominate this landscape.

So we’ll keep an eye on this story since it intersects several others we’ve posted in the past few months.


Intel’s Atom Roadmap Makes Smartphone Headway

After being blasted by users and pundits over the lack of “low power” in the Atom product line, new architecture and design wins show Intel’s making progress.

Intel EVP Dadi Permutter revealing early convertible tablet computer at IDF2012.

Intel EVP Dadi Permutter revealing early convertible tablet computer at IDF2012.

A 10-second Google search on “Intel AND smartphone” reveals endless pundit comments on how Intel hasn’t been winning enough in the low power, smartphone and tablet markets.  Business publications wax endlessly on the need for Intel’s new CEO Brian Krzanich to make major changes in company strategy, direction, and executive management in order to decisively win in the portable market. Indications are that Krzanich is shaking things up, and pronto.

Forecasts by IDC (June 2013) and reported by CNET.com (http://news.cnet.com/8301-1035_3-57588471-94/shipments-of-smartphones-tablets-and-oh-yes-pcs-to-top-1.7b/) peg the PC+smartphone+tablet TAM at 1.7B units by 2014, of which 82 percent (1.4B units, $500M USD) are low power tablets and smart phones. And until recently, I’ve counted only six or so public wins for Intel devices in this market (all based upon the Atom Medfield SoC with Saltwell ISA I wrote about at IDF 2012). Not nearly enough for the company to remain the market leader while capitalizing on its world-leading tri-gate 3D fab technology.

Behold the Atom, Again

Fortunately, things are starting to change quickly. In June, Samsung announced that the Galaxy Tab 3 10.1-inch SKU would be powered by Intel’s Z2560 “Clover Trail+” Atom SoC running at 1.2GHz.  According to PC Magazine, “it’ll be the first Intel Android device released in the U.S.” (http://www.pcmag.com/article2/0,2817,2420726,00.asp)and it complements other Galaxy Tab 3 offerings with competing processors. The 7-inch SKU uses a dual-core Marvell chip running Android 4.1, while the 8-inch SKU uses Samsung’s own Exynos dual-core Cortex-A9 ARM chip running Android 4.2. The Atom Z2560 also runs Android 4.2 on the 10.1-incher. Too bad Intel couldn’t have won all three sockets, especially since Intel’s previous lack of LTE cellular support has been solved by the company’s new XMM 7160 4G LTE chip, and supplemented by new GPS/GNSS silicon and IP from Intel’s ST-Ericsson navigation chip acquisition.

The Z2560 Samsung chose is one of three “Clover Trail+” platform SKUs (Z2760, Z2580, Z2560) formerly known merely as “Cloverview” when the dual-core, Saltwell-based, 32-nm Atom SoCs were leaked in Fall 2012. The Intel alphabet soup starts getting confusing because the Atom roadmap looks like rush hour traffic feeding out of Boston’s Sumner tunnel. It’s being pushed into netbooks (for maybe another quarter or two); value laptops and convertible tablets as standalone CPUs; smartphones and tablets as SoCs; and soon into the data center to compete against ARM’s onslaught there, too.

Clover Trail+ replaces Intel’s Medfield smartphone offering and was announced at February’s MWC 2013. According to Anandtech.com (thank you, guys!) Intel’s aforementioned design wins with Atom used the 32nm Medfield SoC for smartphones. Clover Trail is still at 32nm using the Saltwell microarchitecture but has targeted Windows 8 tablets, while Clover Trail+ targets only smartphones and non-Windows Tablets. That explains the Samsung Galaxy Tab 3 10.1-inch design win. The datasheet for Clover Trail+ is here, and shows a dual-core SoC with multiple video CODECs, integrated 2D/3D graphics, on-board crypto, multiple multimedia engines such as Intel Smart Sound, and it’s optimized for Android and presumably, Intel/Samsung’s very own HTML5-based Tizen OS (Figure 1).

Figure 1: Intel Clover Trail+ block diagram used in the Atom Z2580, Z2560, and Z2520 smartphone SoCs. This is 32nm geometry based upon the Saltwell microarchitecture and replaces the previous Medfield single core SoC. (Courtesy: Intel.)

Figure 1: Intel Clover Trail+ block diagram used in the Atom Z2580, Z2560, and Z2520 smartphone SoCs. This is 32nm geometry based upon the Saltwell microarchitecture and replaces the previous Medfield single core SoC. (Courtesy: Intel.)

I was unable to find meaningful power consumption numbers for Clover Trail+, but it’s 32nm geometry compares favorably to ARM’s Cortex-A15 28nm geometry so Intel should be in the ballpark (vs Medfield’s 45nm). Still, the market wonders if Intel finally has the chops to compete. At least it’s getting much, much closer–especially once the on-board graphics performance gets factored into the picture compared to ARM’s lack thereof (for now).

Silvermont and Bay Trail and…Many More Too Hard to Remember

But Intel knows they’ve got more work to do to compete against Qualcomm’s home-grown Krait ARM-based ISA, some nVidia offerings, and Samsung’s own in-house designs. Atom will soon be moving to 22nm and the next microarchitecture is called Silvermont. Intel is finally putting power curves up on the screen, and at product launch I’m hopeful there will be actual Watt numbers shown, too.

For example, Intel is showing off Silvermont’s “industry-leading performance-per-Watt efficiency” (Figure 2). Press data from Intel says the architecture will offer 3x peak performance, or 5x lower power compared to the Clover Trail+ Saltwell microarchitecture. More code names to track: the quad-core Bay Trail SoC for 2013 holiday tablets; Merrifield with increased performance and battery life; and finally Avoton that provides 64-bit energy efficiency for micro servers and boasts ECC, Intel VT and possibly vPro and other security features. Avoton will go head-to-head with ARM in the data center where Intel can’t afford to lose any ground.

Figure 2: The 22nm Atom microarchitecture called Silvermont will appear in Bay Trail, Avoton and other future Atom SoCs from "Device to Data Center", says Intel. (Courtesy: Intel.)

Figure 2: The 22nm Atom microarchitecture called Silvermont will appear in Bay Trail, Avoton and other future Atom SoCs from “Device to Data Center”, says Intel. (Courtesy: Intel.)

Oh Yeah? Who’s Faster Now?

As Intel steps up its game because it has to win or else, the competition is not sitting still. ARM licensees have begun shipping big.LITTLE SoCs, and the company has announced new graphics, DSP, and mid-range cores. (Read Jeff Bier and BDTi’s excellent recent ARM roadmap overview here.)

A recent report by ABI Research (June 2013) tantalized (or more appropriately galvanized) the embedded and smartphone markets with the headline “Intel Apps Processor Outperforms NVIDA, Qualcomm, Samsung”. In comparison tests, ABI Research VP of engineering Jim Mielke noted that that Intel Atom Z2580  ”not only outperformed the competition in performance but it did so with up to half the current drain.”

The embedded market didn’t necessarily agree with the results, and UBM Tech/EETimes published extensive readers’ comments with colorful opinions.  On a more objective note, Qualcomm launched its own salvo as we went to press, predicting “you’ll see a whole bunch of tablets based upon the Snapdragon 800 in the market this year,” said Raj Talluri, SVP at Qualcomm, as reported by Bloomberg Businessweek.

Qualcomm  has made its Snapdragon product line more user-friendly and appears to be readying the line for general embedded market sales in Snapdragon 200, 400, 600, and “premium” 800 SKU versions. The company has made available development tools (mydragonboard.org/dev-tools) and is selling COM-like Dragonboard modules through partners such as Intrinsyc.

Intel Still Inside

It’s looking like a sure thing that Intel will finally have competitive silicon to challenge ARM-based SoCs in the market that really matters: mobile, portable, and handheld. 22nm Atom offerings are getting power-competitive, and the game will change to an overall system integration and software efficiency exercise.

Intel has for the past five years been emphasizing a holistic all-system view of power and performance. Their work with Microsoft has wrung out inefficiencies in Windows and capitalizes on microarchitecture advantages in desktop Ivy Bridge and Haswell CPUs. Security is becoming important in all markets, and Intel is already there with built-in hardware, firmware, and software (through McAfee and Wind River) advantages. So too has the company radically improved graphics performance in Haswell and Clover Trail+ Atom SoCs…maybe not to the level of AMD’s APUs, but absolutely competitive with most ARM-based competitors.

And finally, Intel has hedged its bets in Android and HTML5. They are on record as writing more Android code (for and with Google) than any other company, and they’ve migrated past MeeGo failures to the might-be-successful HTML5-based Tizen OS which Samsung is using in select handsets.

As I’ve said many times, Intel may be slow to get it…but it’s never good to bet against them in the long run. We’ll have to see how this plays out.

PCI-SIG “nificant” Changes Brewing in Mobile

PCI-SIG Developers Conference, June 25, 2013, Santa Clara, CA

Of five significant PCI Express announcements made at this week’s PCI-SIG Developers Conference, two are aimed at mobile embedded.

From PCI to PCI Express to Gen3 speeds, the PCI-SIG is one industry consortium that lets no grass grow for long. As the embedded, enterprise and server industries roll out PCIe Gen3 and 40G/100G Ethernet, the PCI-SIG and its key constituents like Cadence, Synopsis, LeCroy and others are readying for another speed doubling to 16 GT/s (giga transfers/second) by 2015. The PCIe 4.0 next step evolves bandwidth to 16Gb/s or a whopping 64 GB/s (big “B”) total lane bandwidth in x16 width. PCIe 4.0 Rev 0.5 will be available Q1 2014 with Rev 0.9 targeted for Q1 2015.

Table of major PCI-SIG announcements at Developers Conference 2013

Table of major PCI-SIG announcements at Developers Conference 2013

Yet as “SIG-nificant” as this announcement is, PCI-SIG president Al Yanes said it’s only one of five major news items. The others include: a PCIe 3.1 specification that consolidates a series of ECNs in the areas of power, performance and functionality; PCIe Outside the Box which uses a 1-3 meter “really cheap” copper cable called PCIe OCuLink with an 8G bit rate; plus two embedded and mobile announcements that I’m particularly enthused about. Refer to the table for a snapshot.

New M.2 Specification

The new M.2 specification is a small, mobile embedded form factor designed to replace the previous “Mini PCI” in Mini Card and Half Mini Card sizes. The newer, as-yet-publicly-unreleased M.2 card will be smaller in size and volume but is intended to provide scalable PCIe performance to allow designers to tune SWaP and I/O requirements. PCI-SIG marketing workgroup chair Ramin Neshati told me that M.2 is part of the PCI-SIG’s increased focus on mobile.

The scalable M.2 card is designed as an I/O plug in for Bluetooth, Wi-Fi, WAN/cellular, SSD and other connectivity in platforms including ultrabook, tablet, and “maybe even smartphone,” said Neshati. At Rev 0.7 now, Rev 0.9 will be released soon and the final (Rev 1.0?) spec will become public by Q4 2013.

PCI-SIG M.2 card form factor

The PCI-SIG’s impending M.2 form factor is designed for mobile embedded ultrabooks, tablets, and possibly smartphones. The card will have a scalable PCIe interface and is designed for Wi-Fi, Bluetooth, cellular, SSD and more. (Courtesy: PCI-SIG.)

Mobile PCIe (M-PCIe)

Seeing the momentum in mobile and the interest in a PCIe on-board interconnect lead the PCI-SIG to work with the MIPI Alliance and create Mobile PCI Express: M-PCIe. The specification is now available to PCI-SIG members and creates an “adapted PCIe architecture” bridge between regular PCIe and MIPI M-PHY.

The Mobile PCI Express (M-PCIe) specification targets mobile embedded devices like smartphones to provide high-speed, on-board PCIe connectivity. (Courtesy: PCI-SIG.)

The Mobile PCI Express (M-PCIe) specification targets mobile embedded devices like smartphones to provide high-speed, on-board PCIe connectivity. (Courtesy: PCI-SIG.)

Using the MIPI M-PHY physical layer allows smartphone and mobile designers to stick with one consistent user interface across multiple platforms, including already-existing OS drivers. PCIe support is “baked into Windows, iOS, Android,” and others, says PCI-SIG’s Neshati.  PCI Express also has a major advantage when it comes to interoperability testing, which runs from the protocol stack all the way down to the electrical interfaces. Taken collectively, PCIe brings huge functionality and compliance benefits to the mobile space.

M-PCIe supports MIPI’s Gear 1 (1.25-1.45 Gbps), Gear 2 (2.5-2.9 Gbps) and Gear 3 (5.0-5.8 Gbps) speeds. As well, the M-PCIe spec provides power optimization for short channel mobile platforms, primarily aimed at WWAN front end radios, modem IP blocks, and possibly replacing MIPI’s own universal file storage UFS mass storage interface (administered by JEDEC).

M-PCIe by the PCI-SIG can be used in multiple high speed paths in a smartphone mobile device. (Courtesy: PCI-SIG and MIPI Alliance.)

M-PCIe by the PCI-SIG can be used in multiple high speed paths in a smartphone mobile device. (Courtesy: PCI-SIG and MIPI Alliance.)

PCI Express Ready for More

More information on these five announcements will be rolling out soon. But it’s clear that the PCI-SIG sees mobile and embedded as the next target areas for PCI Express in the post-PC era, while still not abandoning the standard’s bread and butter in PCs and high-end/high-performance servers.


Does Altera Have “Big Data” Communications on the Brain?

In wireless, wireline and financial “big data” applications, moving all those packets needs prodigious FPGA resources, not all of which Altera had before their recent series of acquisitions, partnerships, and otherwise wheeling-and-dealing.

Chris Balough of Altera (left) interviewed by Andy Frame from ARM. (Courtesy: YouTube.)

Chris Balough of Altera (left) interviewed by Andy Frame from ARM. (Courtesy: YouTube.)

I caught up with an old friend at April’s DESIGN West 2013 conference in San Jose: Chris Balough, Sr Director, Product Marketing for SoC products. I knew Chris from when he was at Triscend (purchased by Xilinx). Chris is now in charge of Altera’s SoC products which are Arria V, Stratix V and Cyclone FPGAs with ARM cores in them which compete with Xilinx’s Zynq devices. Chris shed some light on some of these announcements, but remained mum on what they all might mean taken collectively. I think they add up to something big in “Big Data”.

(Fun facts: Altera’s first “SoC” was Excalibur, no longer recommended for new designs. Altera’s most popular SoC processor is the soft Nios II, sold in roughly 30 percent of production SoCs, says Balough.)

X before A? We’ll See

Subconsciously I think of Xilinx first when the word “FPGA” is flashed in front of me, but Altera’s the company pushing more boundaries of late. Their rat-a-tat machine gun announcements this year got my attention.

In the summer of 2012, I did an interview with Altera’s Sr VP of R&D Brad Howe and he spread out as much of the roadmap on the table as he could. Things like HSA, OpenCL, and better gigabit transceivers were all on the horizon.  Shortly thereafter, Altera extended their  relationship with TSMC to 20nm for Arria and Cyclone FPGAs. Then in early 2013, they rocked the industry by locking up an exclusive FPGA relationship with Intel for the industry’s only production 14nm tri-gate FinFETs.

Spring Cleaning ; Altera’s Getting Ready For…?

Now in Spring 2013, Altera is making headlines like these:

- FPGA Design in the Cloud–Try It, You’ll Like It, Says Plunify. At DAC, Altera and Plunify are pushing cloud-based FPGA design tools. (See our February 2013 article with Plunify here.)

- Altera and AppliedMicro will Cooperate on Joint Solutions for High Growth Data Center Market.  Combines Stratix FPGAs and AppliedMicro’s Server on a Chip devices targeting data centers and optical transport networks (OTN).

- Altera Expands OTN Solution Capabilities with Acquisition of TPACK. Altera buys TPAK from AMCC to provide IP for FPGAs used in OTN for tasks like cross-bar switches used in 10/40/100Gbps PHYs.

- Altera Stratix V GX FPGAs Achieve PCIe Gen3 Compliance and Listing on PCI-SIG Integrators List. Right now, Gen2 and Gen3 PCIe is critical to data centers, cellular base stations, and all manner of high-speed long-haul/back-haul telco gear. Within 12 months, PCIe Gen2/3 will be “table stakes” in all manner of high-performance embedded systems like ATCA- or VME/VPX-based DSP systems for radar, sonar, SIGINT (signals intelligence) or data mining.

- Altera to Deliver Breakthrough Power Solutions for FPGAs with Acquisition of Power Technology Innovator Enpirion.  Maybe Enpirion’s DC-DC converter PowerSoCs with integrated inductors may some day end up inside a Stratix package (perhaps like Xilinx’s stacked chip interposer technology), but for now the two-chip solution reduces board space by 1/7 and simplifies system design considerably. The programmable DC-DC converters provide the multiple power rails–and power-up sequences–needed for big FPGAs.

The blue regions show places where FPGAs are used in wireless basestations.

The blue regions show places where FPGAs are used in wireless LTE basestations. (Courtesy: Altera.)

My Take: Altera’s Move in Big Data

Analysts estimate that nearly 50 percent of the revenue in  FPGAs comes from high end, high density, costly FPGAs like the Xilinx Vertex 7 and Altera Stratix V. Segments like wireless and wireline packet processing, plus financial or image processing algorithm processors increasingly rely on these kinds of FPGAs in lieu of ASICs, GPGPUs, or proprietary network processors. So every advantage in IP, process technology, or partnership that Altera has, gets the company one step closer to more design wins.

We’ll see what Altera does with all of these recent announcements. I’d expect to see something shake loose before the traditional “summer doldrums” set in when the semiconductor industry goes on its annual vacation next month in July.

AMD’s Single Chip Embedded SoC: Upward and to the Right

Monolithic AMD embedded G Series SoCs combine x86 multicore, Radeon graphics, and a Southbridge. It’s one-stop-shopping, and it’s a flood targeting Intel again.

AMD arrow logoThe little arrow-like “a” AMD logo once represented an “upward and to right” growth strategy, back in the 1980s as the company was striving for $1.0B and I worked there just out of university.

In 2013, AMD is focusing on the embedded market with a vengeance and it’s “upward and to the right” again. The stated target is for AMD to grow embedded revenues from 5% in Q3 2012 to 20% of the total by Q4 2013. Wow. I’m excited about the company’s prospects, though I know they’ve had decades of false starts or technology successes that were later to sold off in favor of their personal war with Intel for PC dominance. (Flash memories and Vantis? The first DSP telephone modem Am7910? Telecom line cards? Alchemy “StrongMIPS”? All gone.)

Know what? PCs are in the tank right now, embedded is the market, and AMD might just be better positioned than Intel. They’re certainly saying all the right things. Take this week’s DESIGN West announcement of the new embedded G Series “SoCs”. Two years ago AMD invented the term Accelerated Processing Unit (APU) as a differentiated x86 CPU with an ATI GPU.

An AMD Accelerated Processing Unit merges a multicore x86 CPU with a Radeon GPU.

An AMD Accelerated Processing Unit merges a multicore x86 CPU with a Radeon GPU.

This week’s news is how the APU mind-melds with all of the traditional x86 Southbridge I/O to become a System-on-Chip (SoC).

The AMD G Series “SoC” does more real estate slight-of-hand by eliminating the Southbridge to bring all peripherals on-board the APU.

The AMD G Series “SoC” does more real estate slight-of-hand by eliminating the Southbridge to bring all peripherals on-board the APU.

The G Series SoCs meld AMD’s latest 28 nm quad-core “Jaguar” with the ATI Radeon 8000 series GPU and claim a 113 percent CPU and 20 percent GPU performance jump. More importantly, the single-chip SoC concept reduces footprint by 33 percent by eliminating a whole IC. On-board peripherals are HDMI/DVI/LVDS/VGA, PCIe, USB 2.0/3.0, SATA 2.x/3.x, SPI, SD card reader interface, and more. You know, the kind of stuff you’d expect in an all-in-one.

Available in 2- and 4-core flavors, the G Series SoC saves up to 33% board real estate, and even drives dual displays and high-res.

Available in 2- and 4-core flavors, the G Series SoC saves up to 33% board real estate, and even drives dual displays and high-res.

AMD is clearly setting their sites on embedded, and Intel is once again in the crosshairs. The company claims a 3x (218 percent) overall performance advantage with the GX-415GA SKU (quad core, 1.5 GHz, 2 MB L2) over Intel’s Atom D525 running Sandra Engineering 2011 Dhrystone ALU, Sandra Engineering 2011 Whetstone iSSE3, and other benchmarks such as those from EEMBC. Although AMD’s talking trash about the Atom, they’re disclosing all of their benchmarks, the hardware they were run on, and the OS assumptions. (The only thing that maybe seems hinky to me is that the respective motherboards use 4 GB DRAM (AMD) versus 1 GB DRAM (Intel).)

AMD CPU performance graph 1

And then there’s the built-in ECC which targets critical applications such as military, medical, financial, and casino gaming. The single-chip SoC is also designed ground-up to run -40 to +85C (operation) and will fit the bill in many rugged, defense, and medical applications requiring really good horsepower and graphics performance. Fan-less designs are the sweet spot with a 9W to 25W TDP, with all I/O’s blazing. Your mileage may vary, and AMD claims a much-better-than-Intel Performance-per-Watt number of 19 vs 9 as shown below. There are more family members to follow, some with sub 9W power consumption. Remember, that’s for CPU+GPU+Peripherals combined. Again, read the fine print.

AMD performance per Watt 1

I’m pretty enthused about AMD’s re-entry into the embedded market. Will Intel counter with something similar? Maybe not, but their own ultra low power Atom-based SoCs are winning smartphone designs left and right and have plenty of horsepower to run MPEG4 decode, DRM, and dual screen displays a la Apple’s AirPlay. So it’s game on, boys and girls.

The AMD vs Intel battle has always been good for the entire industry as it has “lifted all boats”. Here’s to a flood of new devices in embedded.