How to Verify an SoC Meets Your Power Budget
Power consumption is becoming a critical aspect of hardware design. No longer does verifying an SoC solely mean answering the question “Does it work?” Now designers must also answer the question, “Does it meet my power budget?”
Correct assessment of an SoC’s power consumption requires analysis of real application stimuli and correlation with the software running on the device. This is a huge challenge for traditional methods that rely on software simulators. Emulation platforms, with their high capacity and performance, offer the promise of handling this work, provided that the relevant information can be extracted from the machine and properly interpreted. This is the case for Veloce®, which not only has the capacity to handle the largest SoCs and run realistic software loads, but can also efficiently collect switching activity data and model power while providing visibility into the software that is running. The activity information correlates directly to power consumption and allows the verification team to find power peaks (periods of high power consumption) and hotspots (regions of high power consumption). The Veloce platform also has several tools for debugging software, including a non-intrusive method, which is needed when collecting power data.
Switching Activity Generation
Veloce comes with built-in hardware that collects switching activity for all the nets of a design at RTL; the same can be done on gate-level netlists for improved accuracy. This activity data can be collected for the complete design on every clock cycle. Collection can also be limited to a subset of the design, or it can be sparsely sampled: collected not on every clock cycle but only on a subset of cycles during execution, typically 1 kilocycle every 8, 96, or 1024 kilocycles (see Figure 1). Sparse sampling combines fast execution with a statistically accurate view of switching activity in the design, while the cycle-accurate approach enables very fine-grained analysis.
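The sparse-sampling idea can be illustrated with a minimal sketch. It is not Veloce's implementation: the function names, the synthetic random toggle counts, and the 96-kilocycle run length are all assumptions made for illustration. It captures 1 kilocycle out of every 8 and compares the resulting average toggle count with the full-capture average.

```python
import random

def sampled_cycles(total_cycles, window=1_000, period=8_000):
    """Yield the cycle indices that fall inside a capture window.

    `window` cycles are captured at the start of every `period` cycles,
    e.g. 1 kilocycle every 8 kilocycles.
    """
    for start in range(0, total_cycles, period):
        yield from range(start, min(start + window, total_cycles))

def mean_toggles(toggles_per_cycle, cycles):
    """Average toggle count over the given cycle indices."""
    cycles = list(cycles)
    return sum(toggles_per_cycle[c] for c in cycles) / len(cycles)

# Synthetic stand-in for per-cycle toggle counts (assumption: real data
# would come from the emulator, not from a random generator).
random.seed(0)
TOTAL = 96_000
toggles_per_cycle = [random.randint(0, 100) for _ in range(TOTAL)]

full = mean_toggles(toggles_per_cycle, range(TOTAL))
sparse = mean_toggles(toggles_per_cycle, sampled_cycles(TOTAL))
print(f"full capture: {full:.2f}  sparse (1k of every 8k): {sparse:.2f}")
```

On activity that is roughly stationary, as in this synthetic data, the two averages land close together while the sparse run touches only one eighth of the cycles. Short events that fall entirely between capture windows would be missed, which is why the cycle-accurate mode remains useful for fine-grained analysis.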
Since the overall activity is a complex consequence of software activity (including the OS), the full software stack needs to be considered when verifying power consumption. However, traditional software debug solutions for emulation are intrusive: while they do the job, they cause millions of additional clock cycles to be executed, exercise the debug channels, and even flush processor caches when interacting with the processor, all of which distorts the switching activity being measured.
Non-Intrusive Visibility into Software Execution
Veloce supports a non-intrusive debug methodology using Codelink®, a hardware/software debug environment. Codelink traces the activity of the processors as they execute code. This trace data is passed to the co-model host, where it is processed into a database that can reconstruct the state of the processor and its surrounding memory at any point in time. This can be used to display the state of the code in a traditional software debugger. Most importantly, it can correlate a specific line of software with a given point in time in the hardware execution. This makes it possible to see what all the processors were doing during or immediately prior to periods of unexpectedly high power consumption.
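The correlation step amounts to asking, for a given hardware cycle, which source line each core had most recently reached. The sketch below shows one way such a lookup could work. It does not reflect Codelink's database format or API: the trace layout, the `location_at` helper, and the file names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical trace: for each core, a cycle-sorted list of
# (cycle, source location) events recorded as the core executes.
trace = {
    0: [(0, "main.c:10"), (1_200, "periph.c:42"), (5_000, "periph.c:77")],
    1: [(0, "main.c:10"), (3_400, "periph.c:42")],
}

def location_at(trace, core, cycle):
    """Return the last source location the core reached at or before `cycle`."""
    events = trace[core]
    cycles = [c for c, _ in events]
    i = bisect_right(cycles, cycle) - 1
    return events[i][1] if i >= 0 else None

# What was each core doing at the cycle where a power spike was observed?
spike_cycle = 4_000
for core in sorted(trace):
    print(core, location_at(trace, core, spike_cycle))
```

Because the lookup only reads recorded trace data, it adds no clock cycles to the run being analyzed, which is the property that makes a trace-based approach non-intrusive.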
A Real-World Example
The following is an example of how this can be applied to real-world verification scenarios. It concerns a design where a physical prototype had been created. Using the physical prototype, an ammeter was attached to the power supply to determine the power consumption. Most of the time the system performed as expected with respect to power consumption. However, about 10 percent of the time the system quickly drained the batteries. After significant debugging on the prototype, it was determined that one of the peripherals was left running unnecessarily.
Unable to determine the root cause on the physical prototype, the developers moved back to emulation on Veloce, where the increased visibility enabled them to find it faster. They collected the switching activity of the design and viewed it in the activity plot. The initial plot showing the problem can be seen in Figure 2.
The design was configured to run two processes: one was using peripheral A, the other was using peripheral A and peripheral B. As can be seen in the graph, one peripheral is accessed at one frequency, creating one set of spikes in switching activity. The second process accesses both peripherals, but less frequently, producing the taller set of spikes.
Figure 2 shows that at some point the spikes on peripheral A disappear: peripheral A gets left on when peripheral B gets turned on. From this point on, the block runs constantly even though it is needed only from time to time. Close examination of the system showed that the signal controlling peripheral A in the resource allocation system was indeed kept active.
Correlating Switching Activity to Possible Bugs
With Codelink and Veloce, the designers were able to correlate where the cores were, in terms of software execution, relative to the changes in switching activity shown in the activity plot. Figure 3 shows a correlation cursor in the activity plot near a point where peripheral A gets turned on, along with the code running on the processor cores in the Codelink debugger window.
The problem was related to stopping a peripheral, so the Codelink correlation cursor was set to where the system should have switched off peripheral A (see Figure 4).
At this point, there were two processes active on two different cores that were both turning off peripheral A at the same time (see Figure 5).
Since this system comprises multiple processes running on multiple processors, all needing a different mix of peripherals enabled at different times, a system of reference counts is used. When a process starts, it reads a reference count register for each of the peripherals it needs. If it reads a 0, then there are no current users of the block, and the process turns it on. It also increments the reference count and writes it back to the reference count register.
When the process exits and no longer needs the peripheral to run, it reverses these steps: it decrements the count and switches off the block if the count reaches zero.
At any point in time, the reference count shows the number of processes currently running that need the peripheral running.
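The reference-count scheme described above can be sketched as follows. This is an illustrative model, not the design's actual driver code: the `Peripheral` class and the `acquire` and `release` names are assumptions.

```python
class Peripheral:
    """Models one peripheral with its reference count register (hypothetical)."""
    def __init__(self, name):
        self.name = name
        self.ref_count = 0    # stands in for the reference count register
        self.powered = False  # stands in for the block's enable signal

def acquire(p):
    """Called when a process starts and needs the peripheral."""
    count = p.ref_count       # read the register
    if count == 0:            # no current users: turn the block on
        p.powered = True
    p.ref_count = count + 1   # write the incremented count back

def release(p):
    """Called when a process exits and no longer needs the peripheral."""
    count = p.ref_count       # read the register
    if count == 1:            # last user: the count reaches zero, turn it off
        p.powered = False
    p.ref_count = count - 1   # write the decremented count back

a = Peripheral("A")
acquire(a); acquire(a)        # two processes start
release(a)                    # one exits: block stays on
release(a)                    # the other exits: block turns off
print(a.ref_count, a.powered)
```

Note that both `acquire` and `release` are read-modify-write sequences on a shared register. They are only correct if each sequence completes without another process touching the register in between.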
Single Stepping through Problem Code
Using Codelink, the developers were able to single step through the section of code where the block got stuck in the “ON” position. What they saw were two processes, each on a different core, both releasing the same resource. They both read “2” from the reference count register, meaning there were two active processes using the peripheral. Next, both cores decided not to turn off the peripheral, as each saw that another process was actively using it, and both set the counter to “1”. This left the system in a state where no process was using the peripheral, but it was turned on. As a result, power was wasted on unnecessary toggling until the system was rebooted or ran out of power.
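The failing interleaving can be replayed deterministically in a few lines. The sketch below splits the release into its read step and its write step so that the ordering seen in the debugger can be reproduced. The dictionary standing in for the registers and the function names are assumptions for illustration.

```python
def release_read(regs):
    """First half of a release: read the reference count register."""
    return regs["ref_count"]

def release_write(regs, seen):
    """Second half: act on the value read earlier and write the new count."""
    if seen == 1:                  # believed to be the last user
        regs["powered"] = False
    regs["ref_count"] = seen - 1

def two_releases(interleaved):
    """Two processes release the same peripheral, starting from a count of 2."""
    regs = {"ref_count": 2, "powered": True}
    if interleaved:
        seen0 = release_read(regs)   # core 0 reads 2
        seen1 = release_read(regs)   # core 1 also reads 2
        release_write(regs, seen0)   # core 0 writes 1, leaves the block on
        release_write(regs, seen1)   # core 1 writes 1, leaves the block on
    else:
        release_write(regs, release_read(regs))  # core 0 completes first
        release_write(regs, release_read(regs))  # then core 1
    return regs

print("interleaved:", two_releases(interleaved=True))
print("serialized: ", two_releases(interleaved=False))
```

In the interleaved case the count ends at 1 with the block still powered and no users left. That matches the stuck state described above. In the serialized case the count reaches 0 and the block is switched off.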
On the surface, this appears to be a standard race condition. To prevent it, the bus accesses to the reference count register need to be exclusive accesses. However, it turns out that the software was, in fact, using an exclusive access instruction to reach the reference count register. The hardware team had implemented support for the Advanced eXtensible Interface (AXI) “Exclusive Access” bus cycle. During an exclusive access the slave is required to note which master performed the read. If the next cycle is an exclusive access from that same master, the cycle is allowed. If any other cycle occurs, either a read or a write, then the exclusive access is cancelled. Any subsequent exclusive write is not performed, and an error is returned, thus theoretically preventing race conditions.
On closer examination, it turned out that the AXI fabric was implementing the notion of “master” as the AXI master ID from the fabric. Since the processor had four cores, the traffic on the AXI bus for all four cores was coming from the same master port. From the fabric’s perspective and the slave’s perspective, the reads and writes were all originating from the same master—so the accesses were allowed. An exclusive access from one core could be followed by an exclusive access from another core in the same cluster (see Figure 6). This was the crux of the bug.
The ID of the core that originates an AXI transaction is coded into part of the transaction ID. By including this core ID in the master identity used to determine the exclusivity of accesses to the reference count register, the design was able to process the exclusive accesses correctly.
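The effect of the fix can be shown with a toy exclusive-access monitor whose notion of “who holds the reservation” is pluggable. This is a deliberately simplified model based on the description in this article, not on the full AXI specification. The `ExclusiveRegister` class, the key functions, the retry loop, and the choice that a failed exclusive write leaves the reservation untouched are all assumptions.

```python
class ExclusiveRegister:
    """Toy register with a single exclusive-access reservation (simplified)."""
    def __init__(self, key_fn, value):
        self.key_fn = key_fn   # maps (master, core) to a requester identity
        self.value = value
        self.owner = None      # identity holding the reservation

    def exclusive_read(self, master, core):
        self.owner = self.key_fn(master, core)
        return self.value

    def exclusive_write(self, master, core, value):
        if self.owner != self.key_fn(master, core):
            return False       # error response, nothing is written
        self.value = value
        return True

def by_master_port(master, core):       # buggy: all cores look identical
    return master

def by_master_and_core(master, core):   # fixed: core ID is part of the identity
    return (master, core)

def release_race(key_fn):
    """Two cores on one master port release a peripheral with count 2."""
    reg = ExclusiveRegister(key_fn, value=2)
    powered = True
    seen = {0: reg.exclusive_read("cpu", 0),   # both cores read before
            1: reg.exclusive_read("cpu", 1)}   # either one writes
    for core in (0, 1):
        value = seen[core]
        while not reg.exclusive_write("cpu", core, value - 1):
            value = reg.exclusive_read("cpu", core)  # write refused: retry
        if value == 1:                               # last user switches off
            powered = False
    return reg.value, powered

print("keyed by master port:    ", release_race(by_master_port))
print("keyed by master and core:", release_race(by_master_and_core))
```

With the reservation keyed by master port alone, both stale writes are accepted and the count ends at 1 with the block still on. With the core ID included, each stale write is refused and retried with a fresh read, so the count reaches 0 and the block is switched off.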
The Veloce emulator gave the developers the performance needed to run the algorithm to the point where the problem could be reproduced. Codelink delivered the debug visibility needed to discover the cause of the problem. The activity plot is an indispensable feature that lets developers understand the relative power consumption of their designs. Together, these give engineers the information and the means to make higher-performing, more efficient designs.
Guillaume Boillet is a specialist for power products in the Emulation Product Marketing group at Mentor, a Siemens business. He has 15 years of experience in low power design and power analysis, working in the mobile chip industry and then in EDA. Boillet holds two MSEEs from Supelec in Paris and Ecole Polytechnique de Montreal, and earned his MBA from Grenoble Ecole de Management in 2012.
Jean-Marie Brunet is the Senior Marketing Director for the Emulation Division at Mentor, a Siemens business. He has served for over 20 years in application engineering, marketing and management roles in the EDA industry, and has held IC design and design management positions at STMicroelectronics, Cadence, and Micron among others. Jean-Marie holds a Master’s degree in Electrical Engineering from I.S.E.N Electronic Engineering School in Lille, France. Jean-Marie Brunet can be reached at firstname.lastname@example.org