
New Generation of Heterogeneous Systems for AI Applications

Tuesday, January 15th, 2019

Open computing platforms can take AI past the training-wheels stage, with benefits for automotive, healthcare, industrial applications, and more.

Today’s AI applications will touch every aspect of our lives—including transport, finance, retail, healthcare, smart manufacturing, education, and the services industries. AI technologies will be at the forefront of digitally connected cars, smart manufacturing, and medical image recognition. The question to ask ourselves is: how can we leverage the power of AI across today’s diverse systems and protocols? The answer lies in an emerging ecosystem designed to unite many of today’s heterogeneous “pieces of computing power.”

Bringing Abstraction to Heterogeneous Platforms
Because heterogeneous processors are widely available, new platforms will be expected to leverage a huge amount of computing power, including acceleration units such as GPUs, DSPs, and FPGAs. Understandably, artificial intelligence, machine learning, and neural networks are at the forefront of this new computing paradigm. New architectures are also needed to harness the massive computing capability of CPU cluster-based systems. Migrating this approach to the mainstream presents a challenge, principally because heterogeneous programming models have not been standardized and therefore lack portability.

Enter HSA’s Open Computing Platform
The challenge facing many industries is that existing architectures are inadequate for today’s AI and big data workloads. An open computing platform based on the Heterogeneous System Architecture (HSA) offers a viable solution. This new breed of architecture will open an entirely new realm of opportunities, including autonomous driving, greater computing power, and more robust data centers. Systems designers will finally have an efficient new ecosystem, one designed specifically to address today’s burgeoning array of computer architectures and protocols.

Easier Programming of Hetero Devices
The HSA Foundation’s consortium of semiconductor companies, tools/IP providers, software vendors, and academic institutions develops royalty-free standards and open-source software that make it dramatically easier to program heterogeneous computing devices. The ecosystem reduces the complexity of heterogeneous systems by specifying elements such as runtime and system architecture APIs that build on cache-coherent shared virtual memory hardware. Time-consuming operating system calls are no longer needed; systems now run at the user level. With single-source programming, both control and compute code reside in the same file or project, so expert programmers no longer have to decipher separate toolchains for each processor.

Programming in Standard Languages
Another key benefit for AI application developers is that the HSA platform supports a variety of programming languages. Compilation tools are available from both proprietary and open-source projects (LLVM and GCC), and HSA compilers are available for C/C++, OpenCL, OpenMP, C++ AMP, Python, and more. This flexibility vastly extends the power and reach of the AI applications now on many drawing boards.
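As a rough illustration of what programming a heterogeneous device from a standard language can look like, the sketch below offloads a vector addition to whatever accelerator the runtime exposes. It uses OpenCL through the third-party pyopencl package as a stand-in for the general host-plus-device pattern; it is not HSA runtime code, and the device selection and kernel are illustrative assumptions.

```python
# Minimal sketch: offloading a vector add from Python to an accelerator via OpenCL.
# Assumes the third-party pyopencl package and at least one OpenCL-capable device;
# it illustrates the general heterogeneous-offload pattern, not the HSA runtime itself.
import numpy as np
import pyopencl as cl

a = np.random.rand(1_000_000).astype(np.float32)
b = np.random.rand(1_000_000).astype(np.float32)

ctx = cl.create_some_context()      # picks a CPU, GPU, or other accelerator
queue = cl.CommandQueue(ctx)

mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

# The device kernel sits alongside the host code in the same source file.
program = cl.Program(ctx, """
__kernel void add(__global const float *a,
                  __global const float *b,
                  __global float *out) {
    int i = get_global_id(0);
    out[i] = a[i] + b[i];
}
""").build()

program.add(queue, a.shape, None, a_buf, b_buf, out_buf)

result = np.empty_like(a)
cl.enqueue_copy(queue, result, out_buf)
assert np.allclose(result, a + b)   # same answer as the host would compute
```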

Leveraging Developer Productivity
Defined as a productivity engine that leverages the power and potential of heterogeneous computing, HSA removes many of the barriers of traditional heterogeneous programming. Developers can finally focus on their algorithms without having to micro-manage system resources. The goal is to support applications that seamlessly blend scalar processing with high-performance computing on CPUs, GPUs, DSPs, image signal processors, VLIWs, neural network processors, FPGAs, and more.

There’s little doubt that AI applications will impact how we live, work, and play. AI technologies will be at the forefront of digitally connected transportation, smart manufacturing, and medical technologies. But it will be the power and flexibility of heterogeneous computing that will make these AI breakthroughs feasible and change the face of our world.

Dr. John Glossner is president of the HSA Foundation.

A Brief Tutorial on Artificial Neural Networks and their Training

Thursday, November 15th, 2018

A machine learning algorithm is only a computer program, but it works by improving its performance with every piece of clearly identified data. Artificial neural networks are merely fitting the parameters of a complex function to huge data sets using mathematical models and statistics to make decisions.

Early efforts at defining Artificial Intelligence began in the 1950s, showing promise as computers became more powerful. AI is not new, but how it’s done (and why it’s better) is fairly new. AI has finally found a successful path due to affordable (and thus widely accessible) computational power (i.e., high-performance computing), an abundance of very large sets of identified data, and the maturation of AI algorithms.

Machine learning is a subset of the field of artificial intelligence. Machine learning was defined by IBM’s Arthur Samuel, a pioneer in AI, as “a field of study that gives computers the ability to learn without being explicitly programmed.”[i]  One type of machine learning, patterned on the neural networks of the human brain, is proving successful. The term for this concept in programming is, perhaps unfortunately, borrowed directly from medical terminology: such systems are often called “neural networks,” although some refer to them as “Artificial Neural Networks” (ANNs).

A biological neuron accepts inputs at the dendrites and produces output through the axon (see Figure 1). Axons fan out and connect through synapses to dendrites on other neurons. A mathematical model of the brain’s neurons is the basis for neural networks in machine learning; neural networks in a machine are abstract constructs created within a computer program. In the human brain, a signal travels along an axon, which is modeled as x0 in Figure 1. When the signal crosses a synapse, it is modeled as picking up a multiplier w0. The weighted contribution from each synapse combines with those from other synapses (e.g., w1x1, w2x2), exerting a positive or negative influence according to each weight. All of the values are summed at the cell body, and if the total exceeds a certain threshold, the neuron fires. In the mathematical model, the timing of the “neuron firing” is not considered important, only the firing rate (f).[ii]
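To make the model above concrete, here is a minimal sketch of a single artificial neuron in Python: the inputs x are multiplied by the synaptic weights w, summed at the “cell body” with a bias, and passed through a squashing function that stands in for the firing rate f. The specific input values, weights, and choice of a sigmoid are illustrative assumptions, not part of the figure.

```python
import numpy as np

def neuron(x, w, b):
    """One artificial neuron: weighted sum of inputs plus a bias,
    passed through a sigmoid that stands in for the firing rate f."""
    z = np.dot(w, x) + b             # w0*x0 + w1*x1 + w2*x2 + ... + b
    return 1.0 / (1.0 + np.exp(-z))  # output between 0 (quiet) and 1 (firing)

x = np.array([0.5, -1.2, 3.0])       # signals arriving from other neurons
w = np.array([0.8,  0.1, -0.4])      # synaptic weights (strength and sign)
print(neuron(x, w, b=0.2))           # a single activation value
```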

Biological neurons are, of course, much more complex and dynamic, and involve far more than static weights. The electronic version of “neural networks” is woefully simplistic in comparison. Since the human brain has on the order of 86 billion neurons and an estimated 10^15 synapses, it makes sense that AI required accessible computational power before this technology could blossom.

Figure 1: Biological neuron (left) and a coarse and rudimentary mathematical model of the biological neuron (right). (Image: Efficient Processing of Deep Neural Networks: A Tutorial and Survey).

Conceptually speaking, the machine learning version of a neural network is made up of a few connected layers of these weighted “neurons.” Deep Neural Networks (DNNs) have many more layers, so they can handle more complex problems. Computer vision using a DNN may assign a single neuron to each pixel, for instance. Weights assigned to neurons are stored in the computer program as a matrix. Fast, multicore processors are desirable for DNNs so that algorithms do not take long to compute; speed is especially important when the DNN has many layers. The inner layers are “hidden,” meaning that they do their work without anyone seeing the many dynamic changes that occur as weights influence the decision of each artificial neuron.


Figure 2: Left: A two-layer artificial neural network with three inputs, one hidden layer of four neurons, and one output layer of two neurons. Right: A three-layer neural network with three inputs, two hidden layers of four neurons each, and one output layer. Note that there are connections between neurons across layers, but not within a layer. (Image: CS231n)
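A forward pass through a small network like the one on the right of Figure 2 is just a chain of matrix multiplications, with each layer’s weights stored as a matrix as described above. The sketch below assumes a single output neuron, random (untrained) weights, and a sigmoid activation; it only shows the shape of the computation.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Weight matrices for a 3-4-4-1 network like the right-hand side of Figure 2.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # 3 inputs  -> hidden layer 1 (4 neurons)
W2, b2 = rng.normal(size=(4, 4)), np.zeros(4)  # hidden 1  -> hidden layer 2 (4 neurons)
W3, b3 = rng.normal(size=(1, 4)), np.zeros(1)  # hidden 2  -> output (1 neuron, assumed)

def forward(x):
    h1 = sigmoid(W1 @ x + b1)   # each hidden neuron is a weighted sum + activation
    h2 = sigmoid(W2 @ h1 + b2)
    return sigmoid(W3 @ h2 + b3)

print(forward(np.array([0.2, -0.5, 1.0])))  # e.g., a "cat / not cat" style score
```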

A machine learning algorithm is only a computer program, but it works by improving its performance with every piece of clearly identified data. Data that is not clearly identified leads the DNN to make mistakes. For a simple example, images with and without cats in them would ideally be identified as “cat” or “not cat.” The label, or property, for the training (data) set in this case is “cat.” For accurate identification after training (known as “inference”), extremely large training data sets are needed; a training set can easily contain anywhere from hundreds to more than a million images.
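As a toy illustration of this training-then-inference flow, the sketch below fits the weights of a single neuron to a small labeled data set by gradient descent and then scores a new example with the trained weights. The features and labels are made up and tiny; real image training sets are vastly larger. It is meant only to show the mechanics of learning from clearly identified data.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy "training set": each row is a feature vector; label 1 = "cat", 0 = "not cat".
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(float)  # made-up ground truth

w, b, lr = np.zeros(3), 0.0, 0.1
for _ in range(500):                    # training: nudge weights to reduce error
    p = sigmoid(X @ w + b)              # current predictions on the labeled data
    w -= lr * (X.T @ (p - y)) / len(y)  # gradient of the cross-entropy loss
    b -= lr * np.mean(p - y)

# Inference: score a new, unseen example with the trained weights.
new_example = rng.normal(size=3)
print("cat probability:", sigmoid(new_example @ w + b))
```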

The availability of more clearly identified images generally makes the DNN more accurate. For instance, if you were to train a DNN to identify a ping pong ball in an image, you would feed the DNN a training set full of images labeled “ping pong ball” and “not ping pong ball.” However, if you only include images that also have a paddle in them (very common), you will likely get an identification of “ping pong ball” even when an image contains a paddle but no ball. The DNN will have been trained on a set whose creator neglected to include images of ping pong balls without paddles. Therefore, the data sets we use to train a neural network are critical to how the network later makes its decisions. You may have heard the saying from an Intel executive that “data is the new oil.”[iii]  It’s noteworthy that companies are collecting data like never before. Facebook collects in the area of 350 million images every day through normal user activity. Google’s YouTube has over 1.3 billion active users in a one-month period, on average, with 300 hours of video uploaded every minute.[iv]  Not all data is good, however. Data needs to be clearly identified, classified, and labelled for it to be of use in AI.

There are many approaches to AI for use in smart machines across various areas. The main categories of machine learning algorithms are unsupervised learning, supervised learning, reinforcement learning, and Deep Learning (DL). Unsupervised learning is useful when you need to find non-obvious relationships in an unlabeled data set, i.e., a data set with no pre-assigned labels. Supervised learning is just as the name implies: training progress is monitored, and feedback is injected into the process (e.g., perhaps because a label is missing for a portion of the training data set). Reinforcement learning sits somewhere between unsupervised and supervised learning; it is like learning from experience, or by perceiving general patterns. In training, the model is penalized for incorrect decisions and “rewarded” for correct ones. AlphaGo, the machine that beat the world champion of the game Go, began with supervised learning and later self-trained using reinforcement learning.[v]  Note that reinforcement learning involves some delay, because whether an action was correct or incorrect must first be determined before feedback can be given. The most common Deep Learning algorithms in use today are the Convolutional Neural Network (CNN), the Recurrent Neural Network (RNN), and Reinforcement Learning (RL). The Deep Neural Network (DNN) and the Restricted Boltzmann Machine (RBM) are also DL algorithms. Some applications use more than one DL technique to obtain results.
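To make the reward/penalty idea behind reinforcement learning slightly more concrete, here is a minimal tabular Q-learning sketch on a made-up five-state corridor: the agent is rewarded only when it reaches the goal, so the feedback for each action arrives with a delay, and over many episodes it shapes the value table. The environment, reward values, and hyperparameters are all illustrative assumptions, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(2)
n_states, n_actions = 5, 2           # toy corridor: action 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))  # learned value of each action in each state
alpha, gamma, eps = 0.5, 0.9, 0.2    # learning rate, discount, exploration rate

for episode in range(500):
    s = 0                                        # start at the left end
    while s != n_states - 1:                     # episode ends at the goal state
        explore = rng.random() < eps
        a = rng.integers(n_actions) if explore else int(np.argmax(Q[s]))
        s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        r = 1.0 if s_next == n_states - 1 else 0.0   # reward arrives only at the goal
        # Q-learning update: delayed feedback is folded back into earlier states.
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(np.argmax(Q, axis=1))  # learned policy: "right" (1) for the non-terminal states
```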

An artificial neural network is nowhere near as complex as the human brain, and neural networks today are not able to provide true AI by any means. Artificial neural networks are merely fitting the parameters of a complex function to huge data sets, using statistics and mathematical models to make decisions based on the patterns that they find. The dangers in relying on AI to do critical jobs for us lie within the training data set and in assuming that AI will adapt to changes without intervention. At the end of the day, AI is another form of computer programming, which is only as good as the programmer.

Lynnette Reese is Editor-in-Chief, Embedded Intel Solutions and Embedded Systems Engineering, and has been working in various roles as an electrical engineer for over two decades.

[i] Puget, Jean François. “What Is Machine Learning? (IT Best Kept Secret Is Optimization).” IBM Cognitive Advantage Reports, IBM Corporation, 18 May 2016.

[ii] Karpathy, Andrej. “CS231n Convolutional Neural Networks for Visual Recognition.” 2018.

[iii] Gharib, Susie. “Intel CEO Says Data Is the New Oil.” Fortune, 7 June 2018.

[iv] Aslam, Salman. “YouTube by the Numbers (2018): Stats, Demographics & Fun Facts.” 5 Feb. 2018.

[v] Silver, David, et al. “Mastering the Game of Go without Human Knowledge.” Nature News, Nature Publishing Group, 18 Oct. 2017.

MIPS Technologies: Q&A with Majid Bemanian and Mark Throndson

Tuesday, May 15th, 2018

The venerable MIPS architecture excels at real-time response, enabling AI for functional safety compliance, automotive ADAS, robotics, and much more.

Editor’s Note: Embedded Systems Engineering (ESE) sat down with two engineers from MIPS Technologies: Majid Bemanian, Director of Marketing, and Mark Throndson, Director of Processor Marketing. MIPS was one of the first RISC architectures. MIPS was acquired by Imagination Technologies in 2013 and spun off as an independent company in late 2017. With MIPS’ recent boost in VC funding, it is returning to the embedded space with an eye on Artificial Intelligence (AI). ESE discussed recent developments, a potential threat from open source RISC-V, and the hardware hacking threat known as Spectre.

Lynnette Reese, Embedded Systems Engineering (LR): I read a fairly recent press release from MIPS Technologies. The release made note of the MIPS architecture’s multithreading capabilities and other features related to Artificial Intelligence (AI). Multithreaded MIPS CPUs excel at real-time response.

Majid Bemanian, MIPS Technologies (MB): Yes, one example of a capability that the autonomous driving/ADAS/AI sector finds of interest is management of its accelerators. The response time that you have to achieve for some of these applications is shrinking, so they need efficiencies in response time, and multithreading offers that. With multithreading you can create a more real-time environment, and you can also bring more efficiency to the process. Every microsecond counts. And the multithreading capability we offer can respond to interrupts, for example, at a much faster rate than traditional CPUs that are not multithreaded. As an example, the MIPS I6500-F is a highly scalable 64-bit MIPS multiprocessing solution that has been stringently assessed and validated to meet functional safety compliance for the ISO 26262 and IEC 61508 standards.

Image: MIPS Technologies

LR: AI would be just one area.

MB: Right, for instance, MediaTek uses MIPS in its LTE modem. A good portion of the phones MediaTek ships use MIPS today to handle the LTE traffic. MediaTek has been running into the fact that data rates keep going up from LTE to 5G, but at the same time it can’t really afford higher power consumption at the higher data rates. They need more efficient processing management. At the same time, they need more real-time response. We have been able to help MediaTek with its architecture. With multithreading they can handle the traffic with much higher efficiency while maintaining quality of service and class of service. And for MediaTek, again, it’s a question of real-time response, getting more efficiency out of your data path.

Mark Throndson, MIPS Technologies (MT): Smartphone modem technology is growing, delivering more and more bandwidth to and from the phone and thus creating more parallelism in the wireless data paths used to transmit and receive data. That in turn has led to the creation of more carriers and the bundling of those carriers together to create a wider pipe. So, there’s a narrowing of the time slippage on each carrier that data is being sent over, helping deliver Quality-of-Service-type features in addition to pure data bandwidth. As Majid said, there’s a real-time responsiveness requirement being driven by this. And in addition to the real-time responsiveness aspect, there is a performance aspect. The hardware multithreading technology in our cores addresses both.

We can get more performance with multithreading out of a similar-size CPU. The cost of adding a second thread, for instance, is roughly 10 percent, but we can deliver anywhere from 30 to 50 percent in performance gains. In part, multithreading is a replication of an execution state within the CPU: a whole register set, and instruction queues at various points in the pipeline. So, you can switch between threads on a cycle-by-cycle basis. When an event comes up that needs a very fast response, the MIPS CPU can immediately switch over to a second, third, or fourth thread—without any overhead. This brings both of those key values—fast response with low overhead—to an application that requires them.

This capability has applications in smartphone communications. You’ve probably heard about 5G for some time, but it’s taking a while to get to the point where it’s actually rolled out in production services. The industry has to grow into these new technologies. But when it comes down to the device level plugging into that larger network, our multithreaded processor cores running the protocol stack and control plane of the modem implementation inside the smartphone SoC are very well suited for this kind of problem.

LR: Taking a bit of a fork in the road, here, wasn’t part of the fallibility with the Spectre flaw that each discipline is used to working in their own world, hardware or software? Nobody can hack physical traces and a PCB unless they have physical access. But this was something that was a vulnerability…. What are your thoughts on that?

MT: People are trying to hack security all the time, over networks, and in devices. A lot of that has to do with trying to figure out ways to get around hardware-implemented items. The Spectre and Meltdown vulnerabilities with respect to speculative executing CPUs are not necessarily fundamentally different than other attempts to find back doors.

LR: I think it’s commendable that MIPS had only two cores that were potentially vulnerable.

MB: A look at the Intel architecture would indicate that Intel likely runs speculative execution on many of its processors. As you move on to more of an embedded, and let’s say IoT architecture, for that matter, the problem starts to diminish. It depends on the architecture that you have around it.

LR: Speaking of other architectures, do you perceive that RISC-V, being open source, is a threat?

MT: RISC-V is clearly a viable option, and a lot of momentum has gathered around it, including name brands. But when you look at a CPU and its multiple components from a licensing perspective, the architecture is an insignificant portion. There are a lot of challenges associated with implementation of the architecture that translate to significant amounts of dollars in cost and maintenance. It takes time to develop a good ecosystem, as well. In my experience, it takes five to seven years for the ecosystem around an available architecture or microarchitecture to become stable. While there is momentum behind RISC-V, it is fragmented today. Companies have to expend a lot of effort to approach it from many microarchitectural angles, with the aim of bringing it up to the levels that Arm, MIPS, or other architectures have been at for decades. At MIPS, we continue to innovate. As for other architectures, there are probably five or six different ISAs out there at any given time that may not last in the long term.

MB: I agree that RISC-V is fragmented to a great extent. There are a lot of boutique shops and a lot of talk about cores that are being developed around it.

MT: However, there’s a difference between momentum and completeness. RISC-V as a technology, or as a full specification, is not as complete as what’s available on other architectures. As an ISA, RISC-V is starting from scratch, and although people are offering various RISC-V solutions, they also have to fill in the blanks somehow, and that opens up the possibility of filling in those blanks in different ways. And RISC-V is not free. The idea is to provide an open source alternative to Arm and Intel, which are probably the top options, but that simply allows companies to collaborate at the architecture level. You don’t build architectures, however; you build microarchitectures, and you build chips. And you build the software that runs on top of those systems and all the optimizations that happen to the architectures.

LR: Arm enjoys some of the benefits of open source in that both are widespread and have a huge ecosystem.

MT: Lots of open source software runs on Arm, but a lot of open source software also runs on MIPS and is readily available on MIPS. In fact, when it comes to something like Linux as a focal point, a very large portion of MIPS-based designs run Linux or other open source software.

Lynnette Reese is Editor-in-Chief, Embedded Intel Solutions and Embedded Systems Engineering, and has been working in various roles as an electrical engineer for over two decades.





Q&A with Dave Ditzel, CEO, Esperanto Technologies

Friday, February 16th, 2018

Gaming, artificial intelligence, superior graphics performance, and more have put RISC-V in the catbird seat—industry standard tools and languages can help keep it there.

Editor’s Note: At the time of the 7th RISC-V Workshop, Esperanto Technologies announced plans for RISC-V based computing solutions, and EECatalog spoke with company president and CEO Dave Ditzel. Edited excerpts of our conversation follow:

EECatalog:  For our readers who may not be closely familiar with RISC-V, please give us a bit of context.

Dave Ditzel, Esperanto Technologies

Dave Ditzel, Esperanto Technologies: RISC-V started at universities with a bunch of people who were frustrated that there was no open instruction set to work with. There are new chips every year, but software lasts forever, so having some kind of common instruction set that the software can run on is very important. So, some of the folks at UC Berkeley have done several generations of RISC processors. They are now on their fifth generation, which is why this new instruction set is called RISC-V.

The term “RISC microprocessor” was coined in 1980 with a paper called “The Case for the Reduced Instruction Set Computer” by UC Berkeley professor David Patterson and myself. I was one of his first graduate students. And then I went off and did RISC processors for AT&T Bell Labs, Sun Microsystems, Transmeta, and Intel.

EECatalog: You’ve noted certain questions need to be fielded if RISC-V is to be taken seriously.

Dave Ditzel, Esperanto Technologies: Yes, you’ve got folks asking, “Well, if RISC-V is really going to take off, where is the high end?  Where are chips at leading edge process nodes like 7 nanometer CMOS? What do we do for graphics on the chip? Where can I get a RISC-V design based on industry standard tools and languages like Verilog?”

EECatalog: What approach is Esperanto Technologies taking to win over the folks asking the questions you mention?

Ditzel, Esperanto Technologies: We’ve put together a team of top processor designers to show a high-end RISC-V that is more compelling than some of the alternatives out there, so people don’t see RISC-V just as a low-end embedded play. Using leading edge 7nm TSMC CMOS, our goal is to have the highest single-thread RISC-V performance as well as the best teraflops per watt. And we’ll do this using industry standard CAD tools. Esperanto will sell chips and offer IP. When we offer IP we are going to offer it in Verilog, but, more important, in a human-readable synthesizable Verilog so it’s easy to maintain. We are also going to do a strong physical design effort, optimizing for 7nm technology.

EECatalog: What are some of the details our readers should be aware of?

Ditzel, Esperanto Technologies: We are going to make a chip that incorporates not one, but two different kinds of RISC-V processors. One of the RISC-V processors we are doing is called the ET-Maxion, and its goal is to be the highest single-thread-performance RISC-V processor out there, with performance comparable to what you would find from any other IP vendor, such as Arm.

One of the reasons the RISC-V community needs a high-end CPU is that if you only have low-end CPUs, then some customer has to build a chip or put in a CPU from Arm or others, and you’re not going to be viewed as a loyal Arm customer if you are mixing these.

We wanted to make sure there was availability of a high-end RISC-V processor.

We are doing a second CPU as well, our ET-Minion, which is also a full 64-bit RISC-V instruction-set-compatible processor. It also has an integrated vector floating point unit, so it can issue multiple floating point operations per cycle—we are not saying exactly how many yet. It’s also optimized for energy efficiency, which is key because we are running into a power wall with our chips.

When you have any kind of a power limit, whether it’s a 10-watt limit on an embedded part or maybe a 100-watt limit in a high-end system, it is all about how many teraflops you can cram into that particular power limit. That is what reducing the amount of energy per operation is all about.

EECatalog: You’ve mentioned applicability to machine learning.

Ditzel, Esperanto Technologies: Esperanto has enhanced the RISC-V instruction set with Tensor instructions. Tensors are important data types used in machine learning. We have also added a few special instructions that allow us to run graphics codes faster. This small processor, the Minion, has interfaces so we can add hardware accelerators.

As with our Maxion processor, we are going to use the Minion processor in our product as well as making it available as a licensable core. Then what we are going to do is put a number of these processors on a single chip in seven nanometer. It is going to have 16 of our high-performance 64-bit Maxion cores, but it is also going to have over 4,000 full 64-bit ET-Minion RISC-V cores—each of which will have its own vector floating point unit.

If you look at multicore chips today, it is not that uncommon to find an Intel chip with 16 to 28 cores for servers. On your desktop, you’d be lucky to find six cores, maybe. But RISC-V is so simple that we can put 4,000 microprocessors on a single chip.

EECatalog: What machine learning trends are you keeping an eye on, and are there chicken-and-egg problems you have suggestions for solving?

Ditzel, Esperanto Technologies: I think the trend is just how to get more and more and more teraflops into a particular system and how to do it at reasonable power levels.

We’re really focused on keeping the power levels down. We see a lot of people doing 300-watt chips. That just seems very scary. If you are doing an embedded application, you probably can’t put in a 300W chip; a lot of embedded applications have to be fanless, so there is probably about a 10-watt limit there.

One of the things we’re doing is making the Esperanto solution very scalable for performance but also in power. We can get down to systems using just a few watts. And our goal is to get as many teraflops into those few watts as we can.

EECatalog: How are you setting yourselves apart from companies working on similar projects?

Ditzel, Esperanto Technologies: We see a distinction between what we are doing and what others are doing out there. We see some other companies proposing special purpose hardware for machine learning using proprietary instruction sets. [They say] “Oh, we have the latest thing for machine learning, but it doesn’t use any standards, it is not x86, Arm or RISC-V.”  Esperanto thinks a better approach to building chips for artificial intelligence is to base all the processing on RISC-V, therefore we can use all the software that is being developed for RISC-V. There is Linux for RISC-V; we will have operating systems, compilers, and other applications.

And, in order to make the machine learning performance even better, what we’ll do is have a few extensions on top of RISC-V, as mentioned earlier. Our approach to AI is to use thousands of these very energy-efficient RISC-V cores, each including vector acceleration.

Those who are experts in machine learning will recognize the benchmark here (Figure 1), called Resnet50. It is very well known for image processing; it will recognize a picture of your cat, or a bicycle, or something similar.

Figure 1: The Resnet50 Deep Neural Network benchmark—RISC-V plus Tensor extensions running on Esperanto’s ET-Minion Verilog RTL; inference on one batch of eight images, running all layers.

Each red dot in the bottom right shows one of the four thousand 64-bit RISC-V microprocessors; where it shows bright red it is very busy, and where it’s white it is not so busy. The video from which the Figure 1 image is taken shows each of the convolution layers running. It goes through 50 different layers here in order to try to recognize what the picture shows. We already have full Verilog RTL of our Minion cores up and working, and we already have the software compiling the benchmarks into RISC-V instructions that run on top of that Verilog RTL.

Our chip is not done yet, but we are far enough along that we wanted to share this with the RISC-V community to get feedback from them. It shows that we have done a lot of very serious work towards making a great AI processor based on RISC-V.

EECatalog: What is the applicability to high-end graphics?

Ditzel, Esperanto Technologies: High-end graphics chips typically have thousands of shader processors, and those processors aren’t too different from RISC-V processors.

If you want to run a high-end video game you need a shader compiler, so we wrote one and also the software that can distribute all the graphics computation across our 4,000 cores, and that works pretty well.


Figure 2

Here we have two high-end video game demos, one called Crysis and another called Adam (Figure 2). Adam has a lot of robots in it. We are rendering scenes from those video games on our processors. Of course, this is still just in simulation. We don’t have silicon, but it shows we have a pretty good graphics solution here, and it is all based on the open RISC-V instruction set.


EECatalog: Do you feel you have a strong financial plan for supporting continued research and innovation?

Ditzel, Esperanto Technologies: The same question was asked in the early days of Linux; Red Hat is one company that comes to mind. We are like the Red Hat of hardware.

When we look at licensing, if we did not license our cores, people would still buy our chips, but they might go out and use an Arm core or some other core. And we want to make the entire RISC-V ecosystem more popular, so we think there is more to gain by sharing and licensing our cores than there is to lose by holding them tightly proprietary and having a smaller market.

Right now the number of RISC-V users is small; we need to get more adopters. The more adopters there are, the more software there is, and the more software there is, the more people want to buy chips.

I encourage everybody to take a hard look at RISC-V. It is being adopted very quickly, and a lot of people are not familiar with it yet, but it is going to have a lot of impact in the same way that Linux had impact many years ago. When everybody gets together to support an open standard, they are doing it for an important reason. RISC-V has been designed very carefully; it’s a great general-purpose RISC instruction set, and by adding special purpose extensions we can make it even better for AI and graphics.

EECatalog: Anything to add before we wrap up?

Ditzel, Esperanto Technologies: We believe that using general purpose RISC-V processors is a better solution for AI than using these special purpose chips with some new custom proprietary implementation. Part of what is happening in the field of machine learning is the algorithms are still changing rapidly.

So, if you go off and build special-purpose hardware to do one kind of machine learning algorithm, you may find out six to 12 months from now that because the algorithms are changing, you have built the wrong hardware. In the past, those of us who have designed processors have seen the world moving toward more general-purpose chips—that is why Intel has done so well with its x86 chip.

Our approach is to use a general-purpose RISC-V instruction set as a base and where necessary add extra domain specific extensions. These are things like Tensor instructions or hardware accelerators or merely using the standard RISC-V vector instruction set.

The nice thing about RISC-V is that anybody is free to innovate on it in any way they want, whereas an instruction set like Arm’s is controlled by Arm itself. Arm doesn’t allow you to go off and play with the instructions without getting a special license and incurring extra costs, so much of the innovative research that is happening is going on with RISC-V today.

So, this is a call to the community to say, “Hey, you know there are all these people running special purpose machine learning processors, but we can make something just as good or better if we build on top of RISC-V rather than making the whole thing proprietary.”

We’re going to build advanced computing chips and systems, and we think that this can be the best system for machine learning, not just the best one using RISC-V. And when we can do that on top of RISC-V, people will say, “Wow!”, and it will help accelerate the adoption of RISC-V itself.

Because we are building a number of new kinds of RISC-V cores along the way, when we have those cores done, it will be very easy to license those out to other people. We are looking at licensing our Maxion and Minion cores as well as our solution for graphics here—and again, all this is optimized for TSMC 7nm CMOS but can be used in other older technologies as well. We will provide the most highly optimized version starting in 7nm. As it turns out, in 7nm there are a lot of reasons to have the physical design be very specifically optimized for that technology to obtain best results.

In addition to the licensable IP, we will probably also support some free IP. There is a RISC-V design called BOOM from UC Berkeley that we have adopted, and since we hired the person who did it, everybody wants to be sure it stays supported and maintained—and yes, we are going to do that.

Our call to the RISC-V community is, “Hey, this is what we are doing, we want to let you know, let’s all work together and make the piece of the pie bigger that we get for RISC-V. Then we all win.”




Beyond Automation: Building the Intelligent Factory

Friday, December 1st, 2017

Why the fate of factories and that of machine learning are intertwined.

Factories already have a lot in common with living beings. They consume raw materials, require energy, and have interlocking systems that all move in a complex choreographed dance toward a shared goal. Automation and computationally driven designs have given us factory equipment that can perform repetitive tasks with some variation based on operating conditions and control signals.

But today’s factories can’t learn from their own mistakes, innovate autonomously, or teach themselves how to optimize existing processes. That day is coming soon, on a wave of machine learning that will drive the intelligent factory of the near future.

Machine learning, combining distributed artificial intelligence (AI), advanced sensors, and precision robotics, is taking manufacturing into Industry 4.0. It will be the fourth major era for manufacturing, following steam power, assembly lines, and automation.

Crucial Technologies for the Intelligent Factory
A number of significant advances are coming together at the right time to make learning machines and intelligent factories a reality. Wireless networking meshes have reached a degree of speed and reliability such that hundreds or even thousands of devices in a single factory can quickly and safely exchange information with each other and with central data stores. Data mining and analysis have advanced to help both human and AI analysts find patterns hidden in the records, uncover buried inefficiencies, and drive errors out of the workflow. Cloud technologies can store untold amounts of data and perform constant analysis. And small ultra-low-power networked sensors are capable of accurate measurements well below 1mm and can distinguish between materials such as plastic, drywall, and fabric.

Meanwhile, the huge investment in self-driving automobiles benefits manufacturing with machine-vision breakthroughs, making computers better than ever at recognizing objects and correctly manipulating them. Computationally powerful but energy-efficient multicore processors are small and affordable, and can be programmed and repurposed by a wide range of coders worldwide. All of these elements are the building blocks of automated systems that will guide, control, and educate the next generation of manufacturing capital.

How Data Becomes Wisdom in the Intelligent Factory
For decades, data has been essential to safe and efficient operations in any factory. Human operators already collect and analyze raw facts and figures about inputs, outputs, waste, duty cycles, and mechanical failures. Advances in AI and big data processing make it possible to create machines that can not only generate more raw data, but also process that data into meaningful information, understanding its content and applying that information as learned wisdom. These machines will come together in intelligent factories and learn how to avoid mistakes, correct imbalances, and improve processes.

Today’s “smart” machines are only as adaptable as their programming. Even a thorough coder cannot account for all of the contingencies and variations a typical factory environment can face. Wear and tear; variation in raw material quality; and environmental factors like temperature, dust and grime can cause yields to fall and components to fail, forcing costly slowdowns, repairs, and adjustments.

With access to massive cloud data storage and computation, as well as high-speed integrated processors, machines can start learning from conditions as they occur. Distributed intelligence networks can analyze every robot’s position and activity and every sensor’s report on temperature, proximity, orientation, chemical composition, distance, and more.

Figure 1:  A self-controlled machine acts based on wisdom distilled from lower levels, ultimately arising from massive amounts of data.

Instead of just collecting data for later analysis, intelligent factories will be able to apply AI to reach conclusions, make informed judgments, and take corrective action. Robots will compensate for drift as parts heat up or bearings wear down. Chemical control systems will optimize recipes as conditions change, analyzing slight variances in supply batches. Re-tuned and synchronized motors will work more efficiently on cooperative jobs.

The Impact of the Intelligent Factory
When they do come online, intelligent factories will become a new engine of growth and profitability as self-healing, self-improving centers of innovation. Manufacturing excellence in the Industry 4.0 world will belong to those who give their machines the data and resources they need to perceive and report on the work they are doing, with enough computational heft to translate that data into wisdom and act automatically.

Combining AI with advances in both machine vision and voice-activated agents will make robots not only more powerful and productive, but also safer and more reliable.

The intelligent factory doesn’t mean the end of human labor. In fact, industrial intelligence could enable people and large-scale robots to work together much more closely. Instead of being separated by safety barriers, intelligent factory robots will be able to automatically detect people nearby and adapt their own work to take greater precautions. As the safety barriers around robots shrink or come down entirely, further work on power conditioning and signal isolation will ensure that robots have steady and reliable power sources that pose little risk to people and other machinery in proximity.

We’ve yet to imagine the impact of intelligent factories. No one could have invented Kanban or computer numerical control (CNC) without first seeing an assembly line. In the same way, it’s safe to say that many of the processes uniquely well-suited to intelligent factories won’t be invented until the machines themselves start coming online. Human imagination, unbounded by the need to rigidly program robots for specific tasks and contingencies, will play a huge role in shaping the increasingly complex mechanical, chemical, and biological products made possible by intelligent factories.

Matthieu Chevrier is Systems Manager, PLC systems. Chevrier leads the system team based in Freising, Germany, which supports a worldwide base of PLC (Programmable Logic Controller) customers. He brings to his role his extensive experience in embedded system designs in both hardware (power management, mixed signal, and so on) and software (such as low-level drivers, RTOS, and compilers). He earned his master of science in electrical engineering (MSEE) from Supélec, a leading engineering school in France. Chevrier holds patents from IPO, EPO, and USPTO.


Tobias Puetz is a Systems Engineer in the Texas Instruments Factory Automation and Control team, where he is focusing on robotics and Programmable Logic Controllers (PLCs). Puetz brings to this role his expertise in various sensing technologies, power design, and wireless charging, as well as software design. He earned his master’s degree in electrical engineering and information technology at the Karlsruhe Institute of Technology (KIT), Germany, in 2014.


The Machine Learning Group at Arm

Thursday, November 2nd, 2017

An Interview with Jem Davies, Arm Fellow and Arm’s new VP of Machine Learning.

Arm has established a new Machine Learning (ML) group. Putting this within context, machine learning is a subset of AI, and deep learning is a subset of ML. Neural networks are a way of organizing computational capabilities that is particularly effective for delivering the results that we see with machine learning. With machine learning, computers “learn” rather than get programmed. Machine learning is accomplished by feeding the machine an extensive data set of known-good examples of what the computer scientist wants it to learn.


Figure 1: Deep learning, using Neural Networks (NN), attempts to model real life with data using multiple processing layers that build on each other. Examples of algorithms integral to ML include Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and others. (Credit: Arm, Jem Davies)

Arm has published some of its viewpoints about Artificial Intelligence (AI) online.

According to Jem Davies, Arm Fellow and Arm’s new VP of Machine Learning, Client Line of Business, machine learning is already a large part of video surveillance in the prevention of crime. Davies’ prior role as general manager and Fellow, Media Processing Group at Arm, is an excellent segue into ML, as Graphics Processing Units (GPUs) play a primary role in accelerating the computational algorithms needed for ML. ML requires large amounts of good data and computational power that is fast at processing repetitive algorithms. Accelerators like GPUs, and now FPGAs, are used to off-load CPUs so the entire ML process is accelerated.

Davies is the kind of good-humored, experienced engineer whom everyone wants to work with; his sense of humor is just one tool in his arsenal for encouraging others with an upbeat attitude. I had an opportunity to meet with Davies at Arm TechCon. Edited excerpts of our interview follow.

Lynnette Reese (LR): Artificial Intelligence (AI) as a science has been around for a very long time. What improvements in technology do you think contributed the most to the recent maturing of AI? Is it the low cost of compute power?

Jem Davies, Arm

Jem Davies: Really, it’s the compute power that’s available at the edge. In the server, there’s no real change, but the compute power available at the edge has been transformed over the last five years, and that’s made a huge difference. What’s fired up interest in the neural network world is the availability of good quality data. Neural networks are a technique that’s more than 50 years old; what we’ve got now is the training data—good-quality, correlated data. For example, the application that drove it initially was image recognition. In order to train a neural network to do image recognition, you have to have vast quantities of images that are labeled. Where are you going to find those? As it turns out, Google and Facebook have all of your pictures. You’ve tagged all of those pictures, and you clicked on the conditions that said they could do what they wanted with them. The increasing capability of computing, particularly in devices, has led to the explosion in data.

LR: You said that the explosion of data is the impetus for machine learning, and this is clear with image recognition, perhaps, but where else do we see this?

Davies: The computational linguists are having a field day. Nowadays we have access to all sorts of conversations that take place on the internet, with free, easy access to this data. If you want to work out how people talk to each other, look on the web. It turns out that they do it in all sorts of different languages, and it’s free to take. So, the data is there.

LR: So, applying successful ML to any problem first requires good data?

Davies: If you haven’t got the data, it’s difficult to build a neural network algorithm. They are working on that; there is research being done on training with much smaller amounts of data, which is interesting because it opens up training at the device level. We are doing training on-device now, but in a relatively limited way; you don’t need six trillion pictures of cats to accomplish cat identification.

LR: In your Arm talk about computer vision last year, you said there were 6 million CCTV cameras in the U.K. What do you imagine AI will be doing with CCTV images 20 years from now? For instance, do you think we will be able to combat terrorism much more efficiently?

Davies: It is being done today. We are analyzing gait, suspicious behavior; there are patterns people have that give themselves away. This is something an observational psychologist already knows. People give themselves away by the way they stand; the way they hold themselves.

LR: What about sensing beyond visual recognition? For example, can you use an IR sensor to determine whether a facial prosthesis is in use?

Davies: When engineering moves beyond the limited senses that humans possess, you can throw more information at the problem. Many activities work much better using IR than in the visible spectrum. IR poses fewer issues with shadows, for instance. One example of challenges we face with a security camera is that the camera might have to cover an area where the sun is streaming down, and there’s a shadow at the other end of the frame. If you are tracking someone from one side to the other of the frame, shadows can interfere with obtaining consistent detail in such situations. Move that to the IR domain, and it gets a whole lot easier. But why stop there? You can add all sorts of other things to it as well. Why not add radar, microwaves? You can do complete contour mapping.

LR: So, you could get very detailed with this? Adding additional sensors can give more data.

Davies: Yes, sensor fusion is the way forward. We [humans] are adding together the input from all our senses all the time. And our brains sometimes tell us, “That input doesn’t fit, just ignore it.” I can turn my head one way and think I can still see someone in my peripheral vision. But actually, you can’t. The spot right in front of you is the area that you can see in any detail. The rest is just your brain filling things in for you.

LR:  What’s Arm doing to innovate for AI?

Davies: We are doing everything: hardware, software, and working with the ecosystem. If you look at hardware, we are making our existing IP—our current processors—better at executing machine learning workloads. We are doing that for our CPUs and our GPUs. On the interconnect side, things like DynamIQ [technology] enable our partners to connect other devices into SoCs containing our IP. This involves a considerable amount of software, because people do not want to get deep into the details.

If you look at the Caltrain example, where an image recognition model for trains was developed with data from Google Images and used on a 32-bit Arm-based Raspberry Pi, it’s becoming quite easy to apply ML techniques. The developer just downloaded stuff off the web; he didn’t know anything about it. He’s not an image recognition specialist, he doesn’t know anything about neural networks, and why should he? If we [Arm] do our jobs properly, if we provide the software to people, it just works. It turns out there’s a lot of software involved; probably half my engineers are software engineers. The Arm Compute Library is given away [as] open source; it has optimized routines to do all the things that go into machine learning. That is what powers the implementations on our devices. Google’s TensorFlow, Facebook’s Caffe2, and others plug into that, and so you end up with a force multiplier effect. We do the work, give it to the ecosystem, and Facebook has now got millions of devices that are optimized to run on Arm CPUs and Arm Mali GPUs. As you can see, there’s a lot of hardware development, software development, and a significant amount of working with the ecosystem. Everybody’s getting involved.

LR: What can you tell me about Arm’s new Machine Learning business? Do you have any industry numbers?

Davies: Industry numbers are hard to get. What I will say is that it’s going to be huge. It’s going to affect everything we do. One of the reasons why we formed the machine learning business as it is, is that it cuts across all new lines of business.

LR: Not that you should take sides, but what would you say about using FPGAs vs. GPUs in AI?

Davies: Arm doesn’t take sides; Arm always plays both sides. FPGAs are flexible; you can reconfigure the hardware to great benefit. But that comes at the cost of much less density and much more power. People [physically] get burnt touching an FPGA. For us, it’s a trade-off. If you can implement something in an FPGA that’s absolutely, specifically tailored to the problem, presumably it will be more efficient. But an FPGA is bigger, much more expensive, and uses more power. Which way does that balance come down? It depends on the problem. For pretty much anything battery powered, the answer is that FPGAs are a bust in that space; they don’t fit there, not in small, portable electronic devices. For environments that are much bigger and less power constrained, maybe there’s a place for them. However, note that both Altera and Xilinx have products with Arm cores now.

LR: What would you say to young engineers starting out today that want to go into Machine Learning?

Davies: “Come with us, we are going to change the world,” which is precisely what I said in an all-hands meeting with my group just last week. And I don’t think that’s too grand. Look at what Arm did with graphics. We started a graphics business in 2006; we had zero market share. Yet our partners shipped over a billion chips last year containing Arm Mali GPUs.

LR: Billions of people are tapping on their devices using Arm’s technology.

Davies: Yes. If I look back on what we have achieved at Arm, the many hundreds of people doing this, you can easily say that Arm has changed the world.

LR: So, Arm is not a stodgy British company? Everyone needs good talent, and Arm is changing the way we live?

Davies: Absolutely, we are a talent business. Don’t tell the accountants, but the assets of the company walk out the door every night. If you treat people well, they come back in the next morning.

LR: It sounds like Arm values employees very much.

Davies: Well, we definitely try to. Clearly, as any company, we occasionally get things wrong, but we put a lot of effort into looking after people because together we make the company. As with any company, our ambition is limited by the number of people we have. Effectively, we are no longer limited by money [due to Softbank’s vision upon acquiring Arm].

LR: So now you can build cafeterias with free food and busses to carry employees around?

Davies: Right, I am still waiting for my private jet…but seriously, that’s what I was talking about, that we are changing the world. I think [new graduates] would want to be part of that.

Lynnette Reese is Editor-in-Chief, Embedded Intel Solutions and Embedded Systems Engineering, and has been working in various roles as an electrical engineer for over two decades. She is interested in open source software and hardware, the maker movement, and in increasing the number of women working in STEM so she has a greater chance of talking about something other than football at the water cooler.


Questions to Ask When Specifying a Frame Grabber for Machine Vision Applications

Tuesday, May 30th, 2017
Over the last two decades there has been a general trend toward the standardization of machine vision interfaces. Not long ago, engineers were required to purchase unique and expensive cables for every camera-to-frame grabber combination, or even every camera-to-PC interface. In some cases, different cameras from the same manufacturer required an entirely different set of cables, resulting in costly upgrades and unsatisfied customers.
Standardization has changed that, making life easier for engineers and manufacturers alike. Camera Link became a universally accepted interface in 2001 and is still going strong. CoaXPress (2011), USB3 (2013), GigE Vision (2006), and Camera Link HS (2012) are also now universally accepted interfaces for machine vision solutions. Others, such as FireWire, have been in the market since the late ’80s but are being replaced by newer interfaces. Thunderbolt®, a technology developed by Intel in collaboration with Apple, is still on the periphery and not as widely developed or accepted in the market.
Nowhere has standardization been more keenly observed than in frame grabbers. Along with cameras and cables, frame grabbers are essential components in most high-end machine vision systems. Frame grabbers are essential for these high-end applications because the data rates exceed anything that can be provided by a non-frame-grabber solution. They are also required when complex I/O signals are introduced into the vision system, such as quadrature encoders, strobes, and triggers of various types.
With the introduction of high-speed communication links like Ethernet, FireWire, and USB, pundits forecast the end of the frame grabber. After all, a smart digital video camera was capable of packaging information into packets for direct feed into a PC’s communication ports, so there was no longer a need for a frame grabber, right?
Not so fast. For all their hype, direct-to-PC standards are, at best, adequate for lower-end applications. Cameras are evolving at a rapid pace and now produce megapixel images at extremely fast frame and line rates, far exceeding the 120MB/s serial-interface limit. Vision engineers have found that frame grabbers offer advantages that continue to make them necessary, perhaps more now than ever before.
The main component in a machine vision system that determines the frame grabber is the sensor. To find the right sensor, the customer asks themselves three questions about their machine vision solution:
  • What do I need to image?
  • How do I need to image it?
  • When do I need to image it?
Upon choosing a sensor, the customer then goes about finding which companies can offer this sensor in a package; in other words, a camera. Next, they must determine their application’s imaging requirements to choose what performance features are needed in a frame grabber, including:
  • Is there a large volume of data that needs to be acquired from the camera?
  • Is there a high speed of data acquisition involved?
  • What about timing?
  • Are interrupts OK?
  • Can the system deal with dropped or lost frames?
  • Are there other components to consider such as encoders or strobes? Other I/O?
  • Is it a multi-camera system?
  • What is the maximum distance between the camera and the PC?
Other important questions for frame grabbers are: “How do I hook up my encoder?” and “How do I test various I/O options?” For this reason it is important that the frame grabber be capable of hosting a number of optional components, be it a DIN-mounted I/O test fixture or a cable to bring the I/O outside the PC to connect to external I/O.
As important as these questions are, frame grabber cost can be as much of a factor as performance. For some, performance is the end-all and as such, they will specify the frame grabber that does precisely what is required no matter the price point. For others, price will dictate just how “exact” they want their system to be, or how much they can operate with certain limitations such as bandwidths and distances.

Frame grabbers are no longer exclusively used in machine vision; they are today an essential component of dozens of industries. It is therefore important that the frame grabber manufacturer is involved in standards committees and other groups monitoring the evolution of this fast-changing technology.
It is equally critical that the manufacturer works closely with camera manufacturers, cable companies, and image processing software developers to ensure that the customer will be able to integrate their choice of components with a specific frame grabber. BitFlow has been in the business of frame grabbers since 1993. Over that time, BitFlow has advanced and adopted the various machine vision interfaces to best serve the needs of the customer. The company’s frame grabber interfaces now include Camera Link, CoaXPress, and Differential, and are coupled with powerful software and APIs compatible with popular image processing software packages.
To learn more, please visit
Affordable yet powerful, the BitFlow Aon-CXP is optimized for use with the newest generation of smaller, cooler-operating CXP single-link cameras popular in the IIoT.

About BitFlow

BitFlow has been developing reliable, high-performance Frame Grabbers for use in imaging applications since 1993. BitFlow is the leader in Camera Link frame grabbers, building the fastest frame grabbers in the world, with the highest camera/frame grabber densities, triggering performance, and price. With thousands of boards installed throughout the world, into hundreds of imaging applications, BitFlow is dedicated to using this knowledge and experience to provide customers with the best possible image acquisition and application development solutions. BitFlow, located in Woburn, MA, has distributors and resellers located all over the world including Asia, the Americas, and Europe. Visit our website at

Contact Information

BitFlow, Inc.

400 West Cummings Park Suite 5050
Woburn, MA, 01801

tele: 781-932-2900
fax: 781-932-2900

Consumer Robot Sales to Surpass 50 Million Units Annually by 2022, According to Tractica

Wednesday, May 24th, 2017

Rising Adoption of Household, Toy, and Educational Robots Will Drive a Five-Fold Increase in Consumer Robot Shipments in 5 Years

The consumer robotics market is undergoing a period of significant evolution, with rising demand for household, toy, and educational robots being fueled by several key market developments including a proliferation of robotics startups with innovative products, rapidly declining prices due to lower component costs, the growth of connected smart devices as an enabler for consumer robots, and demographic trends around the world.

According to a new report from Tractica, worldwide shipments of consumer robots will increase from 10.0 million in 2016 to 50.7 million units annually by 2022. During that period, the market intelligence firm forecasts that the consumer robotics market will grow from $3.8 billion in 2016 to $13.2 billion by 2022.

“Consumer robotics is shifting from a phase of being largely dominated by cleaning robots, into robotic personal assistants or family companions,” says research analyst Manoj Sahi. “In addition, robotic toys, which, until now, were largely gimmicks, are transforming into interactive connected play devices that have virtually limitless possibilities, as well as useful educational tools as a part of science, technology, engineering, and math (STEM)-based curriculum.”

Tractica’s report, “Consumer Robotics”, examines the global market trends for consumer robots and provides market sizing and forecasts for shipments and revenue during the period from 2016 through 2022. The report focuses on crucial market drivers and challenges, in addition to assessing the most important technology issues that will influence market development. In total, 109 key and emerging industry players are profiled. An Executive Summary of the report is available for free download on the firm’s website.

About Tractica

Tractica is a market intelligence firm that focuses on human interaction with technology. Tractica’s global market research and consulting services combine qualitative and quantitative research methodologies to provide a comprehensive view of the emerging market opportunities surrounding Artificial Intelligence, Robotics, User Interface Technologies, Wearable Devices, and Digital Health. For more information, visit or call +1-303-248-3000.

Contact Information


1111 Pearl Street
Suite 201
Boulder, CO , 80302
