Audio Front End:
The Key to Making AR Glasses Usable in the Real World
Daniel Shefer

Voice control is a crucial feature of augmented reality (AR) glasses, allowing users to interact with the digital world hands-free. Many companies, including Apple and Magic Leap, have integrated speech recognition into their AR glasses, but widespread usability and adoption depend on the capabilities of the audio front-end technology.

Healthcare illustrates the need: using AR glasses for hands-free documentation, rapid diagnostics, and similar use cases only works when speech recognition is accurate.

The audio front end in smart AR glasses captures and processes the user’s voice, filtering background noise and transmitting the signal to voice ID or communication modules. Accurate voice control lets users operate AR glasses hands-free to make phone calls, record video, navigate, and use other components of the glasses.

The Case Against Beamforming

Voice-enabled AR glasses and voice user interfaces (VUIs) have traditionally used beamforming technology to reduce background noise and focus on the speaker’s voice. This technology separates signals based on their direction of arrival at the microphone array. Companies such as Qualcomm, NXP, MediaTek, and DSP provide beamforming solutions.
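
To make the direction-of-arrival idea concrete, here is a minimal sketch of the simplest textbook beamformer, delay-and-sum, for microphones on a straight line. It is an illustration only, not the implementation used by any of the vendors named above. The function names, the sign convention, and the far-field (plane wave) assumption are choices made for this example.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at room temperature


def arrival_delays(mic_positions_m, angle_deg):
    """Relative arrival times (seconds) of a far-field plane wave at each microphone.

    mic_positions_m: positions along a straight line, in metres.
    angle_deg: direction of arrival measured from broadside (0 = straight ahead),
               positive toward increasing position. A microphone closer to the
               source hears the wavefront earlier, hence the minus sign.
    """
    sin_theta = math.sin(math.radians(angle_deg))
    return [-x * sin_theta / SPEED_OF_SOUND_M_S for x in mic_positions_m]


def delay_and_sum(channels, delays_s, sample_rate_hz):
    """Undo each channel's arrival delay (rounded to whole samples) and average.

    Sound from the steered direction adds up coherently, while sound from
    other directions is misaligned and partially cancels.
    """
    shifts = [round(d * sample_rate_hz) for d in delays_s]
    base = min(shifts)
    shifts = [s - base for s in shifts]  # make every shift non-negative
    length = min(len(ch) - s for ch, s in zip(channels, shifts))
    return [
        sum(ch[n + s] for ch, s in zip(channels, shifts)) / len(channels)
        for n in range(length)
    ]
```

Calling `arrival_delays` with the look direction and passing the result to `delay_and_sum` steers the array toward that direction. Practical beamformers use fractional delays and adaptive weights, which this sketch leaves out.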

However, beamforming has limitations. Performance degrades as the microphones are placed closer together, and in AR glasses the spacing is constrained by the size of the frame. As a general guideline, beamforming can provide ~N^2 dB of noise reduction for N microphones in the array without adding distortions.

Additionally, beamforming struggles with echoes and with situations where noise and the desired speech arrive from the same direction. Some beamforming solutions also limit the number of microphones they can support.
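
The effect of microphone spacing can be illustrated with a short calculation. The sketch below computes how much a plain delay-and-sum beamformer, steered straight ahead, attenuates a single tone arriving from the side. It assumes a uniform linear array and a far-field source. The spacings and frequency in the usage note are hypothetical values chosen for illustration, not measurements of any product.

```python
import cmath
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air


def off_axis_attenuation_db(n_mics, spacing_m, freq_hz, noise_angle_deg):
    """Attenuation (dB) applied by a broadside delay-and-sum beamformer to a
    far-field tone arriving from noise_angle_deg (0 = the look direction)."""
    # Phase difference between neighbouring microphones for this tone.
    phase_step = (
        2 * math.pi * freq_hz * spacing_m
        * math.sin(math.radians(noise_angle_deg)) / SPEED_OF_SOUND_M_S
    )
    # Average of the per-microphone phasors: 1.0 means no attenuation at all.
    response = sum(cmath.exp(1j * k * phase_step) for k in range(n_mics)) / n_mics
    return -20 * math.log10(max(abs(response), 1e-12))


wide = off_axis_attenuation_db(2, 0.14, 500, 90)   # hypothetical 14 cm spacing
close = off_axis_attenuation_db(2, 0.02, 500, 90)  # hypothetical 2 cm spacing
```

For these example values, `wide` is larger than `close`: with the microphones only a couple of centimetres apart, a 500 Hz tone reaches both with almost the same phase, so averaging the channels barely reduces it.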

Kardome’s Voice AI Breaks Away from Beamforming

Kardome has developed a unique spotforming technology that leverages reverberation to separate sound (speech) from different locations. The voice AI company bases its technology on a 3D neural network model specifically designed to address the challenges of noisy, multi-speaker environments.

Kardome’s Spatial Hearing software is a comprehensive voice stack that utilizes its patented spotforming technology to provide superior speech recognition accuracy and enable noise reduction, source separation, audio zooming, wake word capabilities, and biometric identification directly on the AR glasses’ processor. These capabilities unlock AR glasses’ potential for a better voice-user experience.

Kardome also uses AI to achieve highly accurate speech recognition in noisy and reverberant environments. Its voice AI constantly analyzes and adapts to environmental noise sources, forming a virtual bubble around the desired sound source. Its Spatial Hearing software captures sounds from different paths, focusing on the preferred source’s location. As a result, the signal-to-noise ratio (SNR) improves by up to 35 dB without distortion.

Kardome’s AI-driven spotforming technology significantly improves speech recognition performance at SNRs below 10 dB. In challenging scenarios with an SNR of -15 dB or lower, Kardome can be the difference between a non-functioning automatic speech recognition (ASR) system and a seamless user experience.
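
For readers less used to decibels, the sketch below shows the arithmetic behind these figures. It treats the "up to 35 dB" improvement as a best case and assumes, for illustration, that the improvement adds directly to the input SNR. The function names are made up for this example.

```python
import math


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels from two power values."""
    return 10 * math.log10(signal_power / noise_power)


def db_to_power_ratio(db):
    """Convert a decibel figure back to a plain power ratio."""
    return 10 ** (db / 10)


def snr_after_enhancement(input_snr_db, improvement_db):
    """Output SNR, assuming the improvement adds directly to the input SNR."""
    return input_snr_db + improvement_db


# At -15 dB SNR the noise carries about 31.6 times the power of the speech.
noise_to_speech = db_to_power_ratio(15)

# A best-case 35 dB improvement would lift a -15 dB input to 20 dB.
best_case_output_db = snr_after_enhancement(-15, 35)
```

A 35 dB change corresponds to a power ratio of roughly 3,000 to 1, which is why a scenario that starts well below 0 dB can end up comfortably above the 10 dB region mentioned above.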

Three Key Areas Where Kardome Provides Benefits

  • Communication
    AR glasses must support multiple voice use cases, including hands-free telephony, speech recognition, and video recording. Human listeners prefer noise reduction, while ASR engines prefer distortionless speech. Kardome addresses both needs by attenuating interfering signals by up to 35 dB without distorting the speech, enabling speech recognition on AR glasses in challenging acoustic environments.
  • Biometrics
    Voice-enabled devices must prevent unauthorized access. Beamforming cannot attenuate outside sound arriving from every direction, so voice biometrics is needed to identify speakers quickly and accurately. Kardome’s Spatial Voice Biometrics delivers 95% accuracy for utterances as short as 1 second in any acoustic environment.
  • Recording
    AR glasses can record and share videos of what the user sees. When the user focuses on a specific area, Kardome’s Audio Zoom technology can home in on the desired speaker’s voice, eliminating background noise and other people talking to provide clear audio to accompany the video recording.


Kardome is the premier voice technology provider for flawless speech and voice recognition in any environment, quiet or chaotic. Its patented technology enables consumer electronics, automotive, and enterprise hardware developers to create next-generation voice experiences for consumers.

Daniel Shefer, Kardome VP Business Development, North America