The Smallest 64-bit Processor for the Next Billion Smartphone Users
By Caroline Hayes, European Editor
When it was introduced at ARM® TechCon 2015, the ARM® Cortex®-A35 was described as the smallest, lowest power and most efficient ARMv8-A processor the company had ever built. This is quite a statement, following as it did the introductions of the ARM Cortex-A5 in 2009 and the ARM Cortex-A7 in 2011.
Whereas the Cortex-A5 has features designed for mobile computing, and the Cortex-A7’s multi-tasking suits smartphones, this latest processor is based on the ARMv8-A architecture. It supports both 32- and 64-bit compute capabilities but consumes 10 percent less active power than the Cortex-A7.
Nandan Nayampally, VP Marketing, CPU Group, ARM, is looking forward to continued growth in the mobile phone market. Two billion entry-level smartphones equipped with Cortex-A5 and Cortex-A7 processors have already shipped. He estimates there will be an eight percent compound annual growth rate (CAGR) in the period 2015 to 2020. However, most of the growth will be in entry-level smartphones, he cautions.
To drive growth in the next level of devices, the Cortex-A35 offers the same footprint as the Cortex-A7 (“still the efficiency benchmark,” he points out), but with 10 percent lower power and a performance increase of between six and 40 percent, depending on the application, “all with 64-bit capability,” says Nayampally. He adds that the processor gives the industry the means to bring 64-bit capability to the next generation of entry-level phones: “It’s plus, plus, plus.”
The Cortex-A35’s eight-stage pipeline features 64-bit compute capabilities and a redesigned instruction fetch operation that makes fewer cache accesses, lowering power consumption. The instruction fetch bandwidth is optimized to accommodate the new branch prediction techniques. An instruction queue, balanced between the fetch and execute units, also accelerates throughput while minimizing area and power costs.
Another architectural feature is the pipelined, double precision multiplier, which helps to double the speed of single precision floating point operations and to deliver five times the speed on double precision floating point operations, compared with the Cortex-A7.
Mobile devices are prized for web browsing, audio, image, video and gaming capabilities, all of which are memory-intensive. The Cortex-A35 has a mechanism to automatically pre-fetch multiple streams of data, improving memory streaming compared with the single stream capability of the Cortex-A7, and allowing it to write big sets of data faster. The increase in memory processing speed compared with the Cortex-A7 processor is 375 percent, a significant increase for mobile web browsing, video and gaming operations.
The first processor based on the ARMv8-A architecture was the Cortex-A53. The Cortex-A35’s core is 25 percent smaller, for a typical configuration that includes 32 kbyte L1 caches, ARM NEON™ (the 128-bit Single Instruction, Multiple Data [SIMD] architecture extension used to accelerate consumer and multimedia applications), and crypto blocks. Microarchitecture improvements alone have resulted in a 10 percent reduction in dynamic power, compared with the Cortex-A7 (Figure 2). Per core, the Cortex-A35 consumes 35 percent less power and is 25 percent more efficient than the Cortex-A53, making it the smallest, lowest power and most efficient ARMv8-A processor to date.
Internet browsing is the mainstay of mobile device use, and here the Cortex-A35 delivers a 16 percent improvement over the Cortex-A7 when both are clocked at the same frequency. Furthermore, a Cortex-A35 clocked at 2.0 GHz will improve browsing performance by 84 percent compared with a Cortex-A7 running at 1.2 GHz.
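As a rough sanity check on the two browsing figures, the sketch below compares the quoted 84 percent gain with what ideal linear frequency scaling would predict. The linear-scaling model and the function name are assumptions made for illustration, not ARM's benchmarking method.

```python
def ideal_scaled_gain(same_clock_gain: float, new_ghz: float, old_ghz: float) -> float:
    """Fractional gain if performance scaled linearly with clock frequency.

    Linear scaling is an assumption; real workloads rarely achieve it.
    """
    return (1.0 + same_clock_gain) * (new_ghz / old_ghz) - 1.0


# Figures quoted in the article: 16% at the same clock, 84% at 2.0 GHz vs 1.2 GHz.
ideal = ideal_scaled_gain(0.16, 2.0, 1.2)
quoted = 0.84

print(f"ideal linear scaling: {ideal:.1%}, quoted: {quoted:.0%}")
# prints "ideal linear scaling: 93.3%, quoted: 84%"
```

The quoted figure sits a little below the ideal one, which is what would be expected if part of the browsing workload does not speed up in step with the core clock. The article does not give the cause.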
Changes to ARM NEON technology also result in 16 percent faster video performance with popular formats, such as mp4 files. A 36 percent improvement on floating point workloads boosts gaming engine performance.
New idle power management capabilities have been introduced that can be integrated in SoC designs for power-sensitive mobile devices. One of the capabilities introduced with the Cortex-A35 is a standardized mechanism for managing a device’s idle state, using Q-channels (Figure 3). A Q-channel is a standardized set of four signals that can be connected to the power controller. It is an evolution of the AXI low-power interface and can simplify integration and the power management software used to manage idle states for the CPU cluster.
Four new power modes are supported, with the CPU, NEON, L2 cache and top-level logic power domains each controlled independently.
A governor block adds logic to support automatic entry to and exit from retention mode. This is an important feature for the Cortex-A35’s power management, as implementing retention states simplifies power management across the CPU cluster to save vital mW of power without affecting operation. It works like this: if, for example, a NEON instruction does not enter the pipeline for a programmable number of cycles, the governor block can interact with the power controller, using the dedicated Q-channel described above, allowing the NEON block to move into a retention state. Equally, if a NEON instruction then enters the pipeline, the governor block interacts with the power controller so that the NEON block returns to a run state and executes the operation.
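The retention behavior described here can be pictured as a small state machine with an idle counter. The following is a minimal software sketch of that idea only; the class, state and parameter names are hypothetical, and the Q-channel handshake with the power controller is collapsed into a single state change rather than modeled signal by signal.

```python
from enum import Enum


class PowerState(Enum):
    RUN = "run"
    RETENTION = "retention"


class RetentionGovernor:
    """Toy model of a governor for one power domain, such as NEON.

    Hypothetical names throughout; not ARM's implementation.
    """

    def __init__(self, idle_threshold: int):
        self.idle_threshold = idle_threshold  # the programmable cycle count
        self.idle_cycles = 0
        self.state = PowerState.RUN

    def tick(self, instruction_present: bool) -> PowerState:
        """Advance one cycle and return the resulting power state."""
        if instruction_present:
            # Work arrived: return to run state and restart the idle count.
            self.idle_cycles = 0
            self.state = PowerState.RUN
        else:
            self.idle_cycles += 1
            if self.idle_cycles >= self.idle_threshold:
                # Idle long enough: request retention (handshake not modeled).
                self.state = PowerState.RETENTION
        return self.state


gov = RetentionGovernor(idle_threshold=3)
trace = [gov.tick(busy).value for busy in (True, False, False, False, True)]
print(trace)  # prints ['run', 'run', 'run', 'retention', 'run']
```

In real hardware the entry and exit requests take time to be accepted by the power controller, so the transitions would not be instantaneous as they are in this sketch.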
Power consumption is a critical consideration in mobile devices, and the Cortex-A35 is ARM’s lowest power 64-bit application processor. The smallest configuration of the Cortex-A35 consumes less than 6 mW when running at 100 MHz. This is a significant advance in power, bearing in mind that a typical core configuration, on a 28 nm process node and running at 1 GHz, consumes less than 90 mW total power.
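To put the two power figures on a common footing, they can be normalized to milliwatts per megahertz. This is a back-of-envelope sketch: both inputs are quoted as upper bounds ("less than"), the two figures refer to different core configurations, and the helper name is hypothetical.

```python
def mw_per_mhz(power_mw: float, freq_mhz: float) -> float:
    """Power normalized by clock frequency, in mW per MHz."""
    return power_mw / freq_mhz


# Upper bounds quoted in the article.
smallest = mw_per_mhz(6.0, 100.0)    # smallest configuration at 100 MHz
typical = mw_per_mhz(90.0, 1000.0)   # typical configuration, 28 nm, at 1 GHz

print(f"smallest: < {smallest:.2f} mW/MHz, typical: < {typical:.2f} mW/MHz")
# prints "smallest: < 0.06 mW/MHz, typical: < 0.09 mW/MHz"
```

Because the configurations differ, the gap between the two numbers reflects the extra logic and cache in the typical configuration as well as the operating point, so it should not be read as a like-for-like comparison.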
The increased mobility offered by today’s smartphones and devices means that more applications are executed online, but some are more sensitive than others. Take banking, as an example. Here, efficient, secure operation is vital on a mobile device.
The Cryptography Extension feature in the ARMv8-A architecture enables software-based implementation of algorithms such as Advanced Encryption Standard (AES) and Secure Hash Algorithm (SHA) for secure operation. This optional extension boosts performance by 350 percent for executing SHA-1 algorithms and by 1100 percent for AES.
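Percentage boosts of this size are easier to compare as speedup factors. The sketch below assumes that "boosts performance by N percent" means N percent on top of the baseline; if the figures were instead meant as N percent of the baseline, the factors would be 3.5 and 11. The function name is hypothetical.

```python
def speedup_factor(percent_increase: float) -> float:
    """Convert an 'N percent faster' claim into a multiplicative speedup.

    Assumes the percentage is an increase over the baseline (baseline = 1.0).
    """
    return 1.0 + percent_increase / 100.0


sha1 = speedup_factor(350)   # 4.5
aes = speedup_factor(1100)   # 12.0

print(f"SHA-1: {sha1}x, AES: {aes}x")
# prints "SHA-1: 4.5x, AES: 12.0x"

# Under the same assumption, time for a fixed workload shrinks to 1/factor.
print(f"AES time vs baseline: {1 / aes:.1%}")
# prints "AES time vs baseline: 8.3%"
```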
The ability to scale from one to four cores per cluster, and to support configurable cache sizes, means that the Cortex-A35 can meet a number of mobile and embedded application requirements. It also supports optional cache protection: parity in the L1 instruction caches and error correcting code (ECC) in the L1 and L2 data caches. The level of reliability that this support brings makes the processor equally suitable for automotive and industrial applications.
For use in the mobile market, it can be configured as a cluster of four processors, each with 32 kbyte L1 caches, NEON and crypto-engines. For area sensitive embedded use cases, it can be scaled down to as little as one tenth of that size, occupying an area of less than 0.4 mm².
Backwards compatible with earlier ARMv7-A software, the Cortex-A35 adds the benefits of the new ARMv8-A architecture, with its support for 64-bit compute capability. The value of this 64-bit capability shows not only in applications, which are becoming increasingly sophisticated, but also in the ability to process large files efficiently. Mobile devices are now expected to move data quickly to and from the Cloud, as well as to run compute-intensive mobile and microserver applications.
This article was sponsored by ARM.