Memory-Centric SoCs and AI



What can be realized when high-performing, low-power memory-centric SoCs pair with IoT devices and cloud-based servers?

According to a recent Stanford report, artificial intelligence (AI) will not replace the need for humans any time soon, but it will have a profound impact on everyday life, transforming industries from transportation and education to healthcare and entertainment. The exciting developments that deep neural network algorithms and machine learning can produce will result in new intelligent interfaces, new virtual assistants, and advancements in a wide range of applications across multiple industries: capabilities once only dreamed about in research laboratories are now becoming possible for consumer adoption.

Figure 1: A Go board. (Image courtesy of Donarreiskoffer, own photograph, via https://de.wikipedia.org/wiki/Go_(Spiel)#/media/File:Go_board.jpg)

The first examples of machine learning algorithms reached a larger audience when computers started to defeat chess players and, more recently, the best Go players (Figure 1). However, there are many more possibilities. We are now entering an era where surveillance cameras and autonomous driving are real-life applications of advanced computer vision powered by machine learning techniques. Voice recognition and smart sensors using deep learning algorithms can increase context awareness for robotics applications. Cloud-based analytics for e-commerce and advertisement recommendations, business analytics, and medical treatment recommendations are other examples, and the list goes on.

To move AI beyond experiments, the user experience needs to improve in order to accelerate adoption and enable companies to make money. AI, and more specifically machine learning, is at the core of how nearly every business of the future will make money and differentiate itself from the competition. In the past, storing data was a cost center; today it can serve as a profit center, training algorithms and enabling new applications.

Algorithms require large data sets for training, the algorithms needed to process that data are increasingly complex, and everything needs to happen in real time. For several years, many companies have understood that the data they gather is the most valuable asset of their business model; there is no mystery about why social media companies provided free, convenient platforms for sharing pictures and posting comments. This huge amount of qualified data is now a very valuable asset and helps justify these companies’ high valuations. Much of the data collected by their applications and devices is fed directly into deep neural networks to train them. Zettabytes of data storage that can be quickly accessed to train machine-learning algorithms will enable even more innovative products and applications.

Figure 2: AI can turn data into experience.

IoT as Stepping Stone to AI

IoT is a natural stepping-stone to AI, and eventually the two will merge when all these connected devices become smarter and more predictive. While IoT is about data collection, AI is about data consumption. The more data, the smarter the AI algorithms will be.

Voice-activated virtual assistants such as Apple’s Siri, Amazon’s Alexa, and Google Assistant are gaining more and more traction. Experiences similar to Samantha, the intelligent virtual assistant in Spike Jonze’s romantic science-fiction film “Her,” may not be far away. In the nearer term, AI is growing in surveillance cameras, where data has to be analyzed locally: the performance and cost of uploading 4K video streams to the cloud for processing, and then downloading the results, would prevent the required action from being taken in time. Imagine a driving-assistance system malfunctioning whenever the wireless connection is poor. Most smart devices need a better understanding of their environment and of consumers’ habits to perform according to users’ expectations. Evolution is based on learning from our past experiences and improving our future behavior.

For these technologies to produce rich sets of data that can be analyzed and acted upon, innovative memory technologies are needed to deliver high performance and low energy.

Non-volatile memory technologies such as Crossbar RRAM are helping address the performance and energy challenges of embedded IoT by delivering lower-power and lower-voltage operation, monolithic integration with computing cores, faster reads, and byte-addressable writes. RRAM is among the lowest-energy memory technologies for IoT applications and can be integrated with processing cores in a single-chip solution.
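To see why byte-addressable writes matter, here is a back-of-envelope sketch in Python (the block and update sizes are illustrative assumptions, not Crossbar specifications): NAND Flash typically rewrites an entire erase block to change a few bytes, while a byte-addressable memory touches only the bytes that actually changed.

```python
# Back-of-envelope sketch of the cost of a small in-place update.
# Block and update sizes below are illustrative assumptions, not vendor specs.

FLASH_ERASE_BLOCK = 4096   # assumed NAND erase-block size in bytes
UPDATE_SIZE = 16           # bytes the application actually changed

def flash_update_cost(update_bytes, block_bytes=FLASH_ERASE_BLOCK):
    """Flash-style update: read the whole block, erase it, reprogram it all."""
    return block_bytes + block_bytes  # bytes read + bytes reprogrammed

def byte_addressable_update_cost(update_bytes):
    """Byte-addressable NVM: rewrite only the bytes that changed."""
    return update_bytes

flash = flash_update_cost(UPDATE_SIZE)
nvm = byte_addressable_update_cost(UPDATE_SIZE)
print(f"Flash touches {flash} bytes to update {UPDATE_SIZE}; "
      f"byte-addressable NVM touches {nvm} ({flash // nvm}x less work).")
```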

On-chip integration with dedicated logic makes RRAM well suited to accelerating deep neural network algorithms. Data coming from sensors can be stored on-chip and fed directly through deep neural networks to trigger actions: objects become smart not by fetching lines of software but by reacting to external data coming from various sensors. Data collection and processing could be integrated into a single-chip solution with embedded RRAM. By integrating high-density memory on-chip with the processor at the same process node, the inherent latencies of moving data from the processor to off-chip memory sub-systems and back again are reduced. As a result, RRAM technology is an important innovation for accelerating the new big-data, artificial-intelligence universe, enabling a multitude of applications that can speed performance, dramatically improve energy efficiency, enable advanced security, and reduce chip count and size.
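To put rough numbers on the data-movement argument, here is a minimal sketch using order-of-magnitude per-access energies of the kind quoted in the computer-architecture literature (the exact values are assumptions for illustration, not figures for any specific RRAM device):

```python
# Rough energy budget for streaming a network's weights once per inference.
# Per-access energies are order-of-magnitude literature figures used as
# assumptions; they are not measurements of any specific memory device.

PJ_PER_32BIT_OFFCHIP = 640.0   # assumed pJ per 32-bit off-chip DRAM access
PJ_PER_32BIT_ONCHIP = 5.0      # assumed pJ per 32-bit on-chip memory access

def weight_stream_energy_mj(num_weights, pj_per_access):
    """Energy in millijoules to fetch every 32-bit weight once."""
    return num_weights * pj_per_access * 1e-9

WEIGHTS = 10_000_000  # a hypothetical 10M-parameter network

off_chip = weight_stream_energy_mj(WEIGHTS, PJ_PER_32BIT_OFFCHIP)
on_chip = weight_stream_energy_mj(WEIGHTS, PJ_PER_32BIT_ONCHIP)
print(f"off-chip: {off_chip:.1f} mJ per pass, on-chip: {on_chip:.2f} mJ per pass "
      f"(~{off_chip / on_chip:.0f}x difference)")
```

Even under these loose assumptions, keeping weights on-chip cuts data-movement energy by roughly two orders of magnitude, which is the case this paragraph makes for embedding the memory next to the logic.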

Figure 3: Non-volatile memory and computing logic on the same silicon speeds data access and improves energy efficiency.

New Architecture Needed for AI

Moore’s Law is screeching to a halt and CPU refreshes are less frequent. The von Neumann memory bottleneck of today’s processors could be solved with new, more memory-centric system architectures. The performance gap between storage technologies and computing has to be reduced: traditional Flash-based storage solutions deliver read latencies in the 100 µs range, while Crossbar 3D RRAM reaches 1 µs. In addition to Crossbar’s RRAM, several other initiatives powered by emerging memory technologies, such as Intel and Micron’s 3D XPoint PCM and the MRAM efforts of Everspin and Avalanche, are trying to solve this challenge.
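Using the latencies quoted above, a quick sketch of the best-case synchronous read rate each technology allows:

```python
# Upper bound on synchronous (queue-depth-1) reads per second, derived from
# the latencies quoted in the article; real systems add controller and
# software overheads that this sketch ignores.

FLASH_READ_LATENCY_S = 100e-6   # ~100 us, per the article
RRAM_READ_LATENCY_S = 1e-6      # ~1 us, per the article

def max_sync_reads_per_sec(latency_s):
    """Best case when each read must complete before the next is issued."""
    return 1.0 / latency_s

print(f"Flash: {max_sync_reads_per_sec(FLASH_READ_LATENCY_S):,.0f} reads/s")
print(f"RRAM:  {max_sync_reads_per_sec(RRAM_READ_LATENCY_S):,.0f} reads/s")
```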

Traditional architectures in data centers usually have three distinct and stand-alone parts:

• Computing with best-in-class processing cores and attached DRAM memories
• Data storage with lowest cost per bit SSDs or HDDs
• Networking to interconnect the computing part and data storage part

Analysts are observing a massive trend toward higher integration in data centers, where these three parts (computing, storage, and networking) are condensed into a compact form factor called the hyperconverged server.

Recent reports indicate that 40% of enterprise data centers already use hyperconverged units, and adoption is expected to grow to nearly 80% over the next five years.

The data center infrastructure market is evolving rapidly to deliver lower latencies across the various elements of a server. Bringing all components into the same box reduces latency, total cost of ownership, and power. In hyperconverged servers, compute, storage, and networking can now be interconnected in a more efficient way. Industry consortiums are emerging to handle high-bandwidth, low-latency data accesses across processors, storage, and I/O. In high-performance computing applications, Intel and Micron’s 3D XPoint PCM and Crossbar 3D RRAM are expected to provide significant improvements by reducing the performance gap between storage and computing.

Memory technologies that can be integrated directly on-chip with the processing logic will enable brand-new memory-centric SoC architectures. When non-volatile memory and computing logic share the same silicon, the performance bottleneck of the external memory bus is removed. One particular application for embedded persistent memory is deep learning hardware acceleration built on a memory-centric SoC. Artificial intelligence and deep learning represent the most probable evolution of computing in the coming decade, and a deep neural network is all about data and how trained algorithms react to new sets of data. There is tremendous interest from the industry in developing new computing platforms whose massive parallelism comes from multiple processing engines with dedicated embedded RRAM cores. Several companies are already working on using RRAM cells as synapses in neuromorphic processing architectures.
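The synapse idea can be sketched numerically. In a resistive crossbar, weights are stored as cell conductances, inputs are applied as row voltages, and Ohm’s law plus Kirchhoff’s current law make each column current a dot product of inputs and weights. Below is a minimal, idealized NumPy simulation of that analog matrix-vector multiply; real arrays must also contend with negative weights (e.g., differential cell pairs), wire resistance, and device variability, all ignored here.

```python
import numpy as np

# Idealized simulation of a resistive crossbar computing y = W @ x in analog.
# Weights live as cell conductances (siemens); inputs are row voltages.

rng = np.random.default_rng(0)

N_IN, N_OUT = 8, 4
G = rng.uniform(1e-6, 1e-4, size=(N_OUT, N_IN))  # one RRAM cell per synapse
v = rng.uniform(0.0, 0.2, size=N_IN)             # input voltages on the rows

# Ohm's law per cell (I = G * V) plus Kirchhoff's current law per column
# (cell currents sum on the shared column wire) give each column current
# as a dot product: the crossbar performs the multiply-accumulate "for free".
i_col = G @ v  # column currents (amps) = analog result of the MVM

print("column currents (uA):", np.round(i_col * 1e6, 2))
```

Because every column accumulates its current simultaneously, the entire matrix-vector product happens in one parallel step, which is what makes RRAM arrays attractive as the synaptic fabric of neuromorphic processing engines.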

Figure 4: An RRAM-centric parallel computing platform for new AI experiences.

Bringing these high-performing, low-power memory-centric SoCs to IoT devices and cloud-based servers will make data and computing ubiquitous, available to users and applications whenever and wherever they are needed. These solutions will deliver not only “big” data to the cloud, in terms of capacity and scale, but also the energy efficiency, security, and blazing low-latency performance that embedded applications require to make new artificial intelligence applications possible.

________________________________________________________________________________________________

Sylvain Dubois joined the Crossbar management team in 2013 as Vice President of Strategic Marketing and Business Development. Prior to joining Crossbar, Mr. Dubois led strategic product positioning and market engagement for new products at Spansion. He holds a Master of Science in Microelectronics from E.S.I.E.E. (Paris), the University of Southampton (UK), and Universidad Pontificia Comillas (Spain).
