Ultra-Low Power Hands-Free Solution for Voice Powered Smart TV Remotes

Remember the story of Ali Baba and the Forty Thieves? The thieves open their treasure cave in the middle of a forest with the voice trigger “Open Sesame.” I wonder if the captain of the thieves ever had to worry about replacing batteries inside the cave. Still, that fictional voice interface is a perfect analogy for what modern consumers expect from their voice assistants. Unfortunately, signal processing algorithms are power hungry: the devices that run them either have to be tethered to a wall outlet or, at the very least, need frequent charging.

A smart TV remote control with push-to-talk is one such use case: power constraints keep it from delivering the futuristic user experience of a voice-activated remote. Finding and selecting content on the TV is not so different from opening Ali Baba’s treasure, yet it is a complex task to carry out with keypad controls alone, given the numerous viewing options on both broadcast and IPTV platforms. Finding a remote buried somewhere under the couch cushions and then hunting down a specific program has always been a challenge, even for the average viewer, let alone for less tech-savvy or less agile couch potatoes.

Hands-free voice functionality is therefore an indispensable feature for a TV remote control, whether you want to catch up on your favorite news channel over a morning cup of coffee or watch a movie on a Friday night, lying on the couch with a bowl of popcorn. With rapid advances in the smart TV industry, it might appear that integrating far-field voice pickup into the TV itself is a viable option. However, a TV that is always connected to the internet and always listening raises security concerns as well as technical challenges, such as distinguishing user commands from TV playback and other background noise. Moreover, traditional always-on, always-listening solutions consume high standby power, which erodes the battery life that the push-to-talk option achieves on current voice remotes.

A typical TV remote with a push-to-talk feature has a battery life of six months to one year in normal everyday operation. What the TV industry needs now is a seamless user experience with a battery life comparable to that of current push-to-talk voice remotes. Do we have to reinvent the wheel to find an alternative to power-hungry always-listening solutions? Not necessarily: technologies already on the market offer simple, power-efficient ways to solve the battery life challenge for a hands-free remote.

One way to replace push-to-talk remotes with an always-listening solution is to build a far-field voice remote around a conventional capacitive MEMS microphone. Such a design continuously runs a voice activity detector to separate voiced from non-voiced frames based on the speech activity level, as shown in the large dashed box in Figure 1. The active voice frames then trigger the rest of the system, beginning with a wake word detection engine that checks whether a wake word such as “Alexa” was spoken.

Figure 1: Flow chart for Wake word detection using Wake On Sound Mode

Once a wake word is detected, the voice command that follows it is transmitted over a low-energy transport protocol such as Bluetooth Low Energy or Zigbee. For the voice activity detector to monitor the environment continuously, the microphones at the front end need to stay powered in standby, consuming 200 µA on average even when there is no sound to be heard.

Total system standby current, from voice activity detection (VAD) through wake word check and command execution, therefore adds up to the order of milliamps. In addition, the microphone array needed for high response accuracy in far-field scenarios multiplies the standby consumption of the entire system. The wake word detection front end therefore becomes the main bottleneck in a system whose low-power transport, such as Bluetooth Low Energy, consumes only 0.1 µA on standby. A push-to-talk system, on the other hand, is active only while the user presses the microphone button, and so consumes very little power overall. For a far-field remote to achieve long battery life, we need microphones with ultra-low standby power consumption, a fast wake-up, and simple detection logic to wake the DSP on voice activity. Vesper’s Zero Power Listening™ technology is designed to address exactly these challenges.
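To make the standby arithmetic concrete, here is a minimal Python sketch that sums the standby current of an always-listening front end. The 200 µA per microphone and the 0.1 µA Bluetooth Low Energy figure come from the text above; the function name, the two-microphone array, and the 600 µA placeholder for VAD and wake word processing are assumptions chosen purely for illustration, not measured values.

```python
MIC_STANDBY_UA = 200.0  # per capacitive MEMS microphone, from the text
BLE_STANDBY_UA = 0.1    # Bluetooth Low Energy transport, from the text


def standby_budget_ua(num_mics, processing_ua):
    """Return (total standby current in µA, fraction drawn by the transport)."""
    if num_mics < 1:
        raise ValueError("need at least one microphone")
    total = num_mics * MIC_STANDBY_UA + processing_ua + BLE_STANDBY_UA
    return total, BLE_STANDBY_UA / total


# Hypothetical example: 2-microphone array, 600 µA assumed for VAD/wake word.
total_ua, ble_share = standby_budget_ua(num_mics=2, processing_ua=600.0)
print(f"total standby: {total_ua / 1000:.2f} mA, BLE share: {ble_share:.4%}")
```

Whatever value is plugged in for the processing stage, the transport's share stays negligible next to the microphones, which is why the listening front end is the place to save power.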

Zero Power Listening (ZPL) is a new power-optimized architecture for always-on listening systems, built around an ultra-low-power sound detector. The technology is implemented in Vesper’s VM1010 microphone, which switches between two modes based on the acoustic activity in the environment:

  1. Wake on Sound (WoS) mode, with a current consumption of only 8 µA, in which the microphone looks for sound activity within the voice band that exceeds a set sound pressure level (SPL) threshold.
  2. Normal mode, with a current consumption of 85 µA, entered once sound activity is detected.

The additional Wake on Sound mode in the VM1010 acts as an acoustic watchdog that wakes up the rest of the system when there is sound activity in the environment (Figure 1). In other words, WoS mode sits below even the lowest-power voice activity detect mode, and the system moves into voice activity detect mode only when the environment becomes noisy. The threshold, configurable between 65 and 78 dBSPL(A), provides an additional adjustment for tuning WoS mode to the expected acoustic level of the surroundings. In the case of a voice remote, where the user is 1 to 2 m from the remote and the remote itself is, for instance, on a couch another 2 m from the TV, the threshold offers the flexibility to fine-tune the listening level of the microphone against the TV playback volume. The minimum threshold of 65 dB avoids false triggering from non-speech activity within the voice band. In a remote control scenario, background noise from TV playback is the main cause of unwanted switching from WoS to normal mode, so the maximum threshold of 78 dB is the most suitable setting for avoiding false triggers in this case. Figure 2 shows the mode triggers recorded with a WoS microphone over a 24-hour period; switching between the two modes happens only during the most active periods of the day in a typical household. This selective triggering saves standby power because the rest of the TV remote, including converters and voice processors, stays in sleep mode.
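The mode switching described above can be sketched as a small behavioral model in Python. This is only an illustration under stated assumptions, not the VM1010's actual interface or firmware: the function names, the made-up SPL trace, and the policy of returning to WoS mode after a few consecutive quiet samples (`hold_samples`) are all hypothetical. The current figures and the 65 to 78 dB threshold range are taken from the text.

```python
WOS_CURRENT_UA = 8.0      # Wake on Sound mode, from the text
NORMAL_CURRENT_UA = 85.0  # normal mode, from the text
MIN_THRESHOLD_DB = 65.0   # configurable threshold range, from the text
MAX_THRESHOLD_DB = 78.0


def simulate_modes(spl_samples_db, threshold_db, hold_samples=3):
    """Return the mode ("wos" or "normal") for each SPL sample.

    The model leaves WoS mode when a sample exceeds the threshold and drops
    back after `hold_samples` consecutive quiet samples (an assumed policy).
    """
    if not MIN_THRESHOLD_DB <= threshold_db <= MAX_THRESHOLD_DB:
        raise ValueError("threshold outside the 65-78 dB range")
    modes, quiet_run, mode = [], 0, "wos"
    for spl in spl_samples_db:
        if spl > threshold_db:
            mode, quiet_run = "normal", 0
        elif mode == "normal":
            quiet_run += 1
            if quiet_run >= hold_samples:
                mode, quiet_run = "wos", 0
        modes.append(mode)
    return modes


def average_current_ua(modes):
    """Average microphone current over a non-empty simulated trace, in µA."""
    currents = [WOS_CURRENT_UA if m == "wos" else NORMAL_CURRENT_UA
                for m in modes]
    return sum(currents) / len(currents)


# Made-up SPL trace: quiet room, a loud spoken command, then quiet again.
trace = [50, 55, 80, 82, 60, 58, 57, 52]
modes = simulate_modes(trace, threshold_db=78.0)
print(modes, average_current_ua(modes))
```

Raising the threshold keeps the model in WoS mode for more of the trace, which is the effect the threshold adjustment is meant to have against TV playback.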

Figure 2: Logged data from VM1010 (x-axis shows time in a 24-hr period)

Case Study
Figure 3 shows the charge depleted by a TV remote using ZPL technology compared with alternate listening solutions. For the case study, the activity of a VM1010 microphone was recorded over a 24-hour period and then used to calculate the power numbers, assuming two AA batteries (3 V, 2400 mAh). WoS mode saves 80 percent in total energy compared with an alternate listening solution that uses capacitive MEMS microphones. At the same time, the power consumed with the WoS microphone is comparable to that of a push-to-talk solution. These energy savings translate directly into longer battery life for the voice-powered remote, as shown in Figure 4. The WoS microphone increases the standby life of the remote by 10x and, with typical daily use, provides an overall battery life 4x that of alternate listening solutions.

Figure 3: Energy depletion with Wake on Sound as compared to alternate listening solutions

Figure 4: Battery life savings with wake on sound as compared to alternate listening solutions

The power savings obtained with the WoS remote are directly proportional to the number of hours per day spent in Wake on Sound mode, i.e., the share of the day during which the TV playback level stays below the Wake on Sound threshold. Assuming three hours of TV playback per day, Figure 5 shows that WoS mode increases battery life fivefold compared with having no WoS mode, even when the VM1010 microphone stays in normal mode for the whole of TV playback. The savings improve further when the Wake on Sound threshold is adjusted so that the microphone is in normal mode for only half of the playback time.
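The relationship between Wake on Sound hours and battery life can be approximated with a short Python sketch. It is a simplified, microphone-only model under idealized assumptions: it uses the 8 µA and 85 µA mode currents and the 2400 mAh capacity quoted in this article, ignores battery self-discharge and every other component in the remote, and the function names are hypothetical. It is meant to show the trend, not to reproduce Figure 5.

```python
BATTERY_CAPACITY_MAH = 2400.0  # two AA cells, as assumed in the case study
WOS_CURRENT_UA = 8.0           # Wake on Sound mode current, from the text
NORMAL_CURRENT_UA = 85.0       # normal mode current, from the text


def mic_average_current_ua(wos_hours_per_day):
    """Time-weighted average microphone current over one day, in µA."""
    if not 0 <= wos_hours_per_day <= 24:
        raise ValueError("WoS hours per day must be between 0 and 24")
    normal_hours = 24 - wos_hours_per_day
    return (wos_hours_per_day * WOS_CURRENT_UA
            + normal_hours * NORMAL_CURRENT_UA) / 24


def battery_life_days(average_current_ua, capacity_mah=BATTERY_CAPACITY_MAH):
    """Idealized battery life in days for a constant average current draw."""
    hours = capacity_mah * 1000.0 / average_current_ua  # mAh -> µAh
    return hours / 24


# 0 h: no WoS mode; 21 h: normal mode for all 3 h of playback;
# 22.5 h: normal mode for only half of the playback time.
baseline = battery_life_days(mic_average_current_ua(0))
for wos_hours in (0, 21, 22.5):
    life = battery_life_days(mic_average_current_ua(wos_hours))
    print(f"{wos_hours:>4} WoS h/day: {life / baseline:.2f}x the no-WoS life")
```

Under these microphone-only assumptions, 21 WoS hours per day gives roughly 4.8 times the life of the no-WoS case, which is in the same range as the fivefold figure quoted above, although that figure comes from the full case study rather than from this simplified model.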

Figure 5: Battery Life Vs. Wake on Sound hours per day

Vesper’s piezoelectric microphones offer additional advantages for the voice remote use case. Piezoelectric MEMS microphones have a quick startup time of 50 µs, about 1000x shorter than that of a capacitive MEMS microphone, which helps keyword detection accuracy because less of the spoken wake word is missed while the microphone wakes up. Piezoelectric material is inherently resistant to environmental contaminants such as water, dust, and even kitchen oil or popcorn butter, and so offers robust long-term performance. With Vesper’s ZPL technology, a far-field voice remote also eliminates the need for the accelerometer that push-to-talk remotes use as a wake-up trigger to detect when a user picks up the remote. This could reduce cost and BOM for remote manufacturers. For additional details on the voice remote case study and Vesper’s product portfolio, please reach out.

Udaynag Pisipati is a Sr. Field Applications Engineer at Vesper Technologies Inc. and has over 12 years of experience working in speech and voice applications for wireless devices. He holds a Master’s degree in Electrical Engineering from the University of Missouri – Columbia and an MBA from Santa Clara University. A firm believer in speech as a natural user interface for human-machine interaction, he is interested in everything related to speech processing, including microphones and speakers, signal processing, and machine learning.
