Voice Initiated: Processing Voice Signals for Intelligent Applications



Improving the clarity of speech signals is critical to today’s hands-free, voice-initiated intelligent applications. As Internet of Things (IoT) technologies penetrate our homes and daily lives, intelligent voice control has become a critical component of these technologies.

 

Parks Associates (www.parksassociates.com) named voice control the No. 1 consumer IoT trend at CES® 2017, stating that “Voice control is vying to become the primary user interface for the smart home and connected lifestyle” (Parks Associates announces “Top 10 Consumer IoT Trends in 2017” and players to watch in the new year, 2016).

Voice-assist applications have expanded well beyond the smartphone into diverse markets, vying for inclusion in the home and business. User experience differentiates and defines the better product, and consumers’ expectation of reliable, accurate voice control is driving development.

Intelligent voice-assist applications, both on the market and in development, include but are not limited to AI voice assistants, doorbells and intercoms, smart-home and home security systems, smoke detectors and alarms, medical information relay systems, baby monitors, remote classrooms, and smart appliances.

In a voice-assist application, the speech generally originates at some distance from the microphone. This is referred to as far-field speech. In many voice-assist applications, a keyword, sometimes referred to as a wake-word, must be recognized by the application.

 

Speech recognition performance degrades drastically in noisy and reverberant environments. In any home, office, or outdoor setting, sound is all around us. The greater the distance between the talker and the microphone, the more the speech is distorted by ambient noise. Background noises, such as a running dishwasher, a television set, children playing, or dogs barking, need to be removed from the sound stream so that the application can distinguish the keyword from other speech signals.

Mixing background noise and reverberation with the speech of interest causes a dramatic decline in speech recognition accuracy. This is especially true when the background noise is itself speech. The effect worsens as the distance between the talker and the microphone increases.
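To make the distance effect concrete, here is a rough back-of-the-envelope sketch. It assumes an idealized free-field point source, where the direct speech level falls by about 6 dB for each doubling of distance, and an ambient noise level that stays constant. Real rooms add reverberation, so the numbers are illustrative only. The function name and the example levels are assumptions, not measurements.

```python
import math

def snr_at_distance(speech_db_at_1m, noise_db, distance_m):
    """Estimate SNR (dB) at the microphone for a talker distance_m away.

    Assumes free-field spreading (inverse-square law) and a noise level
    that does not depend on where the talker stands.
    """
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    speech_db = speech_db_at_1m - 20.0 * math.log10(distance_m)
    return speech_db - noise_db

# Hypothetical levels: 60 dB speech measured at 1 m, 45 dB ambient noise.
for d in (0.5, 1.0, 2.0, 4.0):
    print(f"{d:>4} m: SNR = {snr_at_distance(60.0, 45.0, d):5.1f} dB")
```

Under these assumptions, moving the talker from 1 m to 4 m costs roughly 12 dB of signal-to-noise ratio before any reverberation is taken into account.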

Introducing mixed-signal Far-Field Voice Input Processing: Adaptive Digital Technologies’ TMS320C5517 HD AEC Clear Speech Solution.

Adaptive Digital’s high definition acoustic echo canceller (HD AEC) is exceptional at handling echo. HD AEC integrates both Noise Reduction and Automatic Gain Control (AGC) into its AEC algorithm.
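For readers unfamiliar with acoustic echo cancellation, the general idea can be sketched with a textbook normalized LMS (NLMS) adaptive filter. This is not Adaptive Digital’s HD AEC algorithm, whose internals are not described here. It is a minimal illustration in which the function name, tap count, step size, and the toy echo path are all assumptions. A production canceller would also need double-talk detection, much longer filters, and residual echo suppression.

```python
import random

def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the loudspeaker echo from mic.

    far_end: samples sent to the loudspeaker (the echo reference).
    mic:     samples captured by the microphone (echo plus near-end sound).
    Returns the residual (echo-reduced) signal.
    """
    weights = [0.0] * taps   # running estimate of the echo path
    history = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate
        # Normalizing by the reference power keeps the update stable.
        power = sum(h * h for h in history) + eps
        step = mu * e / power
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(e)
    return residual

def energy(samples):
    return sum(v * v for v in samples)

# Toy check with a made-up 3-tap echo path and no near-end talker.
random.seed(0)
far = [random.uniform(-1.0, 1.0) for _ in range(4000)]
path = [0.6, -0.3, 0.1]
mic = [sum(path[k] * far[n - k] for k in range(len(path)) if n - k >= 0)
       for n in range(len(far))]
out = nlms_echo_cancel(far, mic)
print("echo energy before:", energy(mic[-500:]))
print("echo energy after: ", energy(out[-500:]))
```

In this noise-free toy case the filter converges on the echo path, so the residual energy in the last 500 samples is far below the echo energy at the microphone.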

Adaptive Digital’s algorithms recognize the dominant voice and suppress background chatter. The Far-Field Voice Input Processing software first detects far-field speech, then reduces the clutter in the voice stream so that the voice input application can send a clear voice signal or distinguish a wake-word from other noise sources.

For certain environments, a microphone array may be employed for voice capture. In a microphone array, a number of microphones are arranged in a circular or linear pattern and used to pick up speech signals via phase steering. Essentially, the microphones, while not physically pointing in any specific direction, point acoustically in one or many directions. When a voice command emanates from a particular direction, clutter noise arriving from outside that direction is reduced or not picked up at all by the array.

The number of microphones in the array and the distance between them affect the accuracy, usable frequency range, and direction of the beam.
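One well-known way the spacing limits frequency is the half-wavelength rule for a uniform linear array: above the frequency whose half wavelength equals the microphone spacing, the array can no longer tell some directions apart (spatial aliasing). The short sketch below works through that arithmetic. The function names, the example spacings, and the choice of 343 m/s for the speed of sound are assumptions made for illustration.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C

def max_unaliased_frequency(spacing_m, c=SPEED_OF_SOUND):
    """Highest frequency a uniform linear array can steer without
    spatial aliasing (half-wavelength spacing rule)."""
    return c / (2.0 * spacing_m)

def aperture(num_mics, spacing_m):
    """End-to-end length of the array; longer arrays give narrower beams."""
    return (num_mics - 1) * spacing_m

for spacing_cm in (2.0, 4.0, 8.0):
    d = spacing_cm / 100.0
    print(f"{spacing_cm:.0f} cm spacing: alias-free up to "
          f"{max_unaliased_frequency(d):.0f} Hz, "
          f"4-mic aperture {aperture(4, d) * 100:.0f} cm")
```

The trade-off is visible in the output: wider spacing gives a longer aperture and a narrower beam for the same number of microphones, but lowers the highest frequency that can be steered cleanly.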

Beamforming software improves the signal-to-noise ratio of speech signals by exploiting the differences in the phase of the speech as it arrives at each microphone. The beamforming algorithm maximizes the array gain in the direction of the dominant speaker’s voice. By increasing the gain in that direction while reducing the gain in the direction of the reflective paths, the signal-to-interferer ratio is increased, which reduces the reverb effect. Acoustic beamforming software is attached as a pre-processor. Beamforming adds processing load, so a higher-performance platform is required. Adaptive Digital provides a TMS320C6748 Clear Speech Solution with Acoustic Beamforming for high processor performance applications.
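The directional gain described above can be illustrated with the classic delay-and-sum beamformer, the simplest member of this family. The sketch below computes the gain of a uniform linear array at a single frequency for a far-field source. It is not a description of Adaptive Digital’s beamforming software. The function name, the array geometry, and the 2 kHz test frequency are assumptions chosen for the example.

```python
import cmath
import math

def delay_and_sum_gain(num_mics, spacing_m, freq_hz, steer_deg, source_deg,
                       c=343.0):
    """Gain (0..1) of a uniform linear array steered to steer_deg for a
    far-field plane wave arriving from source_deg.

    Angles are measured from broadside (perpendicular to the array).
    """
    k = 2.0 * math.pi * freq_hz / c  # wavenumber in rad/m
    delta = (math.sin(math.radians(source_deg))
             - math.sin(math.radians(steer_deg)))
    # After the steering delays, microphone n is left with this phase error.
    total = sum(cmath.exp(1j * k * n * spacing_m * delta)
                for n in range(num_mics))
    return abs(total) / num_mics

# Hypothetical 4-microphone array, 4 cm spacing, steered to broadside.
for angle in (-60, -30, 0, 30, 60):
    g = delay_and_sum_gain(4, 0.04, 2000.0, steer_deg=0, source_deg=angle)
    print(f"source at {angle:>4} deg: "
          f"gain {20.0 * math.log10(max(g, 1e-12)):6.1f} dB")
```

Sound from the steered direction adds in phase and passes at full gain, while sound from other directions adds partly out of phase and is attenuated. Changing steer_deg moves the beam without physically moving any microphone.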

The sound stream is cleaned up by applying noise reduction/suppression to any sound that is not voice.
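As a deliberately simple illustration of suppressing non-voice sound, the sketch below implements a frame-based noise gate: it estimates the background level and attenuates frames that do not rise clearly above it. This is far cruder than production noise reduction, which typically works per frequency band, and it is not Adaptive Digital’s method. The function names, frame length, margin, and the assumption that the first few frames contain noise only are all illustrative choices.

```python
import math
import random

def frame_rms(frame):
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def noise_gate(samples, frame_len=160, noise_frames=5, margin=2.0,
               floor_gain=0.1):
    """Attenuate frames whose level is close to the estimated noise floor.

    The noise floor is taken from the first noise_frames frames, which are
    assumed (for this sketch) to contain background noise only.
    """
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples), frame_len)]
    noise_rms = sum(frame_rms(f) for f in frames[:noise_frames]) / noise_frames
    out = []
    for f in frames:
        gain = 1.0 if frame_rms(f) > margin * noise_rms else floor_gain
        out.extend(s * gain for s in f)
    return out

# Toy signal at 8 kHz: 0.2 s of low-level noise, then 0.2 s of a tone
# standing in for speech, with the same noise underneath.
random.seed(1)
noise = [random.uniform(-0.01, 0.01) for _ in range(3200)]
tone = [0.5 * math.sin(2.0 * math.pi * 200.0 * n / 8000.0)
        for n in range(1600)]
noisy = [noise[n] + (tone[n - 1600] if n >= 1600 else 0.0)
         for n in range(3200)]
cleaned = noise_gate(noisy)
print("noise-only RMS before:", frame_rms(noisy[:1600]))
print("noise-only RMS after: ", frame_rms(cleaned[:1600]))
```

The noise-only section is turned down by the floor gain while the louder section passes unchanged. A gate like this cannot remove noise that overlaps the speech itself, which is why real systems use more selective techniques.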

The talker’s distance from the microphone affects the intensity of the voice signal, and human speech itself varies widely: deep or high pitched, soft or loud. A gain-adjusting algorithm is applied to bring the voice signal to a consistent level regardless of the intensity of the original voice stream.
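A minimal sketch of such level adjustment is a frame-based automatic gain control that measures each frame’s RMS level and moves a smoothed gain toward whatever value would hit a target level. The target level, frame length, gain limit, and smoothing factor below are assumptions for illustration, and the code does not represent Adaptive Digital’s AGC.

```python
import math

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def agc(samples, target_rms=0.1, frame_len=160, max_gain=20.0,
        smoothing=0.9):
    """Scale each frame toward target_rms, smoothing gain changes so the
    level does not jump abruptly from one frame to the next."""
    out = []
    gain = 1.0
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        level = rms(frame)
        if level > 1e-9:  # leave the gain alone during digital silence
            desired = min(target_rms / level, max_gain)
            gain = smoothing * gain + (1.0 - smoothing) * desired
        out.extend(s * gain for s in frame)
    return out

def tone(amplitude, n_samples=16000, freq_hz=200.0, rate_hz=8000.0):
    """Test tone standing in for a talker at a given loudness."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * n / rate_hz)
            for n in range(n_samples)]

for label, amp in (("quiet talker", 0.02), ("loud talker", 0.8)):
    levelled = agc(tone(amp))
    print(f"{label}: input RMS {rms(tone(amp)):.4f}, "
          f"output RMS (last frame) {rms(levelled[-160:]):.4f}")
```

Both the quiet and the loud input settle near the same output level, which is the behavior a downstream recognizer benefits from. The max_gain limit keeps the sketch from amplifying very quiet background sound without bound.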

The clean and enhanced speech signal can then be recognized by the application, allowing speech detection/recognition to take place.

The future of voice recognition technologies will lie in the detection of inflection and emotion. Adaptive Digital’s clear speech algorithms will aid in the advancement of these technologies.

Other applications for the Adaptive Digital HD AEC clear voice solution include multiple-microphone video conferencing systems, soft phones, Bluetooth speakers, IP cameras, automobile cabins, USB headsets, voice command and control applications (voice-enabled smart speakers, voice-enabled smart gateways), and security, to name a few.


About Adaptive Digital Technologies, Inc.
Adaptive Digital (www.adaptivedigital.com) is an industry-leading provider of voice technology. Adaptive Digital’s products include high definition acoustic echo cancellation (HD AEC), HD stereo acoustic echo cancellation, packet, line, and network echo cancellation, Voice Engine with SIP, high-density conference engines, speech compression, telephony, and voice-quality algorithms across many platforms. Adaptive Digital’s customers include Cisco Systems Inc., Foxconn, General Dynamics, Harris, Motorola, Northrop Grumman, Sonus, Starleaf, Texas Instruments, Windstream, and Yealink. Adaptive Digital is a member of the Texas Instruments Design Network.

Contact Information

Adaptive Digital Technologies, Inc.

525 Plymouth Road
Suite 316
Plymouth Meeting, PA, 19462
USA

tele: 610-825-0182 x120
toll-free: 1-800-340-2066
sales@adaptivedigital.com
www.adaptivedigital.com
