Deep Learning Assisted Denoising to Enhance Speakerphone Call Quality: A Comprehensive Literature Review
Author: Tanish Dogra
Abstract
Speakerphone calls are hard to keep clear and intelligible: background noise, reverberation, distracting surfaces, echo, and several people speaking at once all degrade the user experience. This work focuses on applying machine learning methods to speech enhancement and noise reduction for speakerphone calls. We build an online, real-time noise reduction system for VoIP and mobile phones using recent Transformer models, Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs). The experiments show that the system is effective at suppressing background noise and improving voice clarity.
Keywords
Noise Suppression, Speech Enhancement, CNNs, RNNs, Transformer Models, AEC, DSP, STFT
Conclusion
This research has established the effectiveness of deep learning-based speech denoising approaches in enhancing speakerphone call quality. Traditional noise reduction methods, such as spectral subtraction and Wiener filtering, struggle with non-stationary noise and tend to introduce artifacts. Deep learning approaches, such as CNNs, RNNs, and GANs, have consistently achieved substantial improvements in signal-to-noise ratio (SNR), speech intelligibility (STOI), and perceptual quality (PESQ and MOS scores). Deep learning models improve speech intelligibility by 10-25% compared to traditional methods. GAN-based models achieve the highest PESQ scores, reflecting better speech clarity and naturalness. Optimized models support real-time denoising with inference times under 50 ms per frame, which makes deployment on mobile devices feasible. Despite these advances, high computational cost and real-time latency remain significant concerns. Model compression, hybrid solutions, and Edge AI deployment are future research areas that can improve the efficiency of deep learning-based noise suppression systems. By combining deep learning with efficient inference algorithms, this effort paves the way for real-time noise cancellation technologies that improve clarity in speakerphone calls, hearing aids, and other speech-processing devices.