September 2019 | White Paper
WEnets: A Convolutional Framework for Evaluating Audio Waveforms
Cite This Publication
.
Abstract:
In this white paper, we describe a new convolutional framework for waveform evaluation, WEnets, and build a Narrowband Audio Waveform Evaluation Network, or NAWEnet, using this framework. NAWEnet is single-ended (or no-reference) and was trained three separate times in order to emulate PESQ, POLQA, or STOI with testing correlations 0.95, 0.92, and 0.95, respectively when training on only 50% of available data and testing on 40%. Stacks of 1-D convolutional layers and non-linear downsampling learn which features are important for quality or intelligibility estimation. This straightforward architecture simplifies the interpretation of its inner workings and paves the way for future investigations into higher sample rates and accurate no-reference subjective speech quality predictions.
Keywords: speech quality; no reference (NR); speech intelligibility; CNN; neural nets
For technical information concerning this report, contact:
Stephen D. Voran
Institute for Telecommunication Sciences
(303) 497-3839
svoran@ntia.gov
For funding information concerning this report, click this link.
Funding Information
Performing Agency
Funding Agency
Disclaimer:
Certain commercial equipment, components, and software may be identified in this report to specify adequately the technical aspects of the reported results. In no case does such identification imply recommendation or endorsement by the National Telecommunications and Information Administration, nor does it imply that the equipment or software identified is necessarily the best available for the particular application or uses.
For questions or information on this or any other NTIA scientific publication, contact the ITS Publications Office at ITSinfo@ntia.gov or 303-497-3572.