Speech enhancement algorithms for audiological applications
- Ayllón Álvarez, David
- Manuel Rosa Zurera Director
- Roberto Gil Pita Co-director
Defended at: Universidad de Alcalá
Date of defense: 29 November 2013
- Saturnino Maldonado Bascón Chair
- Manuel Utrilla Manso Secretary
- Alberto González Salvador Member
- Nicolás Ruiz Reyes Member
- Hamid Krim Member
Type: Thesis
Abstract
Improving speech intelligibility is a long-standing problem that remains open. The recent boom in applications such as hands-free communication and automatic speech recognition, together with the ever-increasing demands of the hearing-impaired community, has given a decisive impulse to research in this area. This PhD thesis focuses on speech enhancement for audiological applications. Most of the research addresses the improvement of speech intelligibility in hearing aids, taking into account the restrictions and limitations imposed by this type of device. Combining source separation and spatial filtering with machine learning and evolutionary computation has produced the novel algorithms included in this thesis.
The thesis is divided into two main parts. The first contains a preliminary study of the problem and a thorough review of the state of the art, from which the goals of the thesis are defined. The second describes the research conducted to fulfill those goals, including the experimental work and the results obtained.
In a first stage, the speech enhancement problem is formally described and studied in the time-frequency domain, and the particular engineering constraints and requirements of hearing aids are defined. The state of the art is then reviewed. The review covers existing solutions to both the single-channel and the multichannel speech enhancement problem, considering the noise reduction and the source separation approaches, as well as the application of such algorithms in hearing aids.
The first problem addressed in this thesis is the sound source separation of underdetermined mixtures in the time-frequency domain, without any computational restriction.
The performance of the DUET algorithm, which separates speech using only two microphones, has been evaluated in a variety of scenarios, including linear and binaural anechoic mixtures, echoic mixtures, and mixtures of speech with other types of sources such as noise and music. The study reveals the lack of robustness of the original DUET algorithm, whose performance drops notably in echoic and binaural mixtures and when speech is mixed with noise or music. To overcome this problem, a novel source separation algorithm that combines mean shift clustering with the principles of DUET has been proposed. The clustering step in DUET, which is based on a weighted histogram, is replaced by a mean shift algorithm with a weighted Gaussian kernel, derived for the problem at hand. The results show that the proposed algorithm clearly outperforms the original DUET and a modification of it that uses k-means. Additionally, the proposed algorithm has been extended to any number of microphones and any array geometry.
The automatic speech source enumeration problem, which is related to source separation, has also been tackled. A novel algorithm based on information-theoretic criteria and on the estimation of the source delays between the signals received by two microphones has been proposed. The algorithm obtains very good results and proves robust when enumerating anechoic mixtures of up to 5 speech sources. Its potential to enumerate sources in echoic mixtures has also been demonstrated.
The remainder of the thesis focuses on hearing aids. The first hearing-aid problem addressed is the improvement of speech intelligibility in monaural hearing aids. First, a study of the computational resources available for signal processing in state-of-the-art commercial hearing aids has been carried out.
The result of this study has been used to bound the computational cost of the speech enhancement algorithms for hearing aids proposed in this thesis. Next, a low-cost algorithm for single-channel speech enhancement has been proposed. The algorithm combines a generalized version of the least-squares (LS) estimator with a tailored feature selection algorithm based on evolutionary computation, with the purpose of estimating a time-frequency soft mask that maximizes the output PESQ value, a metric highly correlated with intelligibility. The mask is estimated using a novel set of features extracted from the short-time Fourier transform (STFT) of the mixture. Excellent results are obtained even at low SNRs.
The next work addresses speech enhancement in wirelessly connected binaural hearing aids. In this case, the two devices are connected through a wireless link, which increases power consumption. The objective is to design low-cost speech enhancement algorithms that increase the energy efficiency of such hearing aids. First, an extremely low-cost binaural speech separation system that maximizes the W-disjoint orthogonality (WDO) has been proposed. It is based on a quadratic discriminant that uses the interaural level difference (ILD) and interaural time difference (ITD) cues to classify each time-frequency point as speech or noise. The weights of the discriminant are computed with a tailored evolutionary algorithm. The second low-cost algorithm uses information from neighboring time-frequency points to estimate the ideal binary mask (IBM) with a generalized version of LS-LDA, introducing a weighted MSE metric that allows estimating the IBM and maximizing the WDO factor at the same time. For both algorithms, a transmission scheme that improves the energy efficiency of the wireless system has been proposed. The scheme quantizes the amplitude and phase values of each frequency band with a different number of bits, and the bit distribution among frequency bands is optimized by evolutionary computation.
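As an illustration of the two quantities the binaural algorithms above revolve around, the following minimal Python sketch computes an ideal binary mask (IBM) and the W-disjoint orthogonality (WDO) factor it achieves. It assumes the usual definition from the DUET literature, in which WDO is the speech energy kept by the mask minus the noise energy kept by the mask, divided by the total speech energy. The function names and the toy power values are hypothetical and are not taken from the thesis, which estimates the mask from ILD and ITD cues rather than from the separate speech and noise signals used here.

```python
def ideal_binary_mask(speech_pow, noise_pow):
    """Return 1 where speech power exceeds noise power at a time-frequency point, else 0."""
    return [[1 if s > n else 0 for s, n in zip(s_row, n_row)]
            for s_row, n_row in zip(speech_pow, noise_pow)]


def masked_energy(mask, power):
    """Sum the power of the time-frequency points that the mask lets through."""
    return sum(m * p
               for m_row, p_row in zip(mask, power)
               for m, p in zip(m_row, p_row))


def wdo(mask, speech_pow, noise_pow):
    """WDO = (kept speech energy - kept noise energy) / total speech energy."""
    total_speech = sum(sum(row) for row in speech_pow)
    kept_speech = masked_energy(mask, speech_pow)
    kept_noise = masked_energy(mask, noise_pow)
    return (kept_speech - kept_noise) / total_speech


# Toy power spectrograms (rows: frequency bands, columns: frames).
# These values are made up for illustration only.
speech_pow = [[4.0, 0.1, 3.0],
              [0.2, 5.0, 0.1]]
noise_pow = [[1.0, 2.0, 0.5],
             [1.5, 0.5, 2.0]]

ibm = ideal_binary_mask(speech_pow, noise_pow)
all_pass = [[1, 1, 1], [1, 1, 1]]  # mask that performs no separation at all

wdo_ibm = wdo(ibm, speech_pow, noise_pow)
wdo_all_pass = wdo(all_pass, speech_pow, noise_pow)
print(ibm, wdo_ibm, wdo_all_pass)
```

On this toy input the IBM scores a higher WDO than the all-pass mask, which is the sense in which a classifier that approximates the IBM also pushes the WDO factor up.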
Finally, the last work included in this thesis concerns the design of beamformers for hearing aids fitted to a specific person. The beamforming filter coefficients can easily be fitted to a specific subject as long as that person's head-related transfer function (HRTF) is known. Unfortunately, this information is not available for every person who needs a new device, and its absence causes gain reduction and distortion. With this problem in mind, three approaches to optimizing the beamforming filter coefficients when the HRTF is unknown have been proposed. All three aim to maximize the average array gain while minimizing the average speech distortion over a design dataset. The experimental work has demonstrated that the proposed methods significantly decrease the gain reduction and distortion caused by computing the filter coefficients without knowing the subject's HRTF.
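The gain reduction caused by an unknown HRTF can be made concrete with a small sketch. The Python example below is not one of the three methods proposed in the thesis. It only assumes a two-microphone array at a single frequency, spatially white noise, and a simple matched (delay-and-sum style) beamformer, and it uses the standard array gain |w^H d|^2 / (w^H w) for that noise model. The steering vectors `d_true` and `d_generic` are hypothetical stand-ins for a subject's actual HRTF and for a generic one used when the subject's HRTF is not available.

```python
import cmath


def matched_weights(d):
    """Matched beamformer w = d / (d^H d), designed for steering vector d."""
    norm = sum(abs(di) ** 2 for di in d)
    return [di / norm for di in d]


def array_gain(w, d):
    """Array gain |w^H d|^2 / (w^H w), assuming spatially white noise."""
    response = sum(wi.conjugate() * di for wi, di in zip(w, d))
    noise_power = sum(abs(wi) ** 2 for wi in w)
    return abs(response) ** 2 / noise_power


# Hypothetical single-frequency steering vectors for two microphones.
d_true = [1.0 + 0j, cmath.exp(-1j * 0.8)]     # the subject's actual response
d_generic = [1.0 + 0j, cmath.exp(-1j * 0.2)]  # generic response used for the design

gain_known = array_gain(matched_weights(d_true), d_true)
gain_unknown = array_gain(matched_weights(d_generic), d_true)
print(gain_known, gain_unknown)
```

With the true steering vector the gain equals the number of microphones, and it drops as soon as the coefficients are designed for a mismatched response. Averaging this gain over many subjects in a design dataset gives the kind of objective that the abstract describes.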