Automated Speech Processing using Filter Banks and MFCCs

This article was written by Haytham Fayek.

Speech processing plays an important role in any speech system, whether it is Automatic Speech Recognition (ASR), speaker recognition, or something else. Mel-Frequency Cepstral Coefficients (MFCCs) were very popular features for a long time, but more recently filter banks have been gaining ground. In this post, I will discuss filter banks and MFCCs, and why filter banks are becoming increasingly popular.

Computing filter banks and MFCCs involves largely the same procedure: in both cases filter banks are computed, and MFCCs can be obtained with a few extra steps. In a nutshell, a signal goes through a pre-emphasis filter; it is then sliced into (overlapping) frames and a window function is applied to each frame; afterwards, we take a Fourier transform of each frame (more specifically, a Short-Time Fourier Transform) and calculate the power spectrum; and subsequently we compute the filter banks. To obtain MFCCs, a Discrete Cosine Transform (DCT) is applied to the filter banks, retaining a number of the resulting coefficients while the rest are discarded. A final step in both cases is mean normalization.
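The pipeline described above can be sketched in a few lines of NumPy. This is only a minimal illustration under stated assumptions, not a definitive implementation: the function name, the parameter values (a 0.97 pre-emphasis coefficient, 25 ms frames with a 10 ms stride, a 512-point FFT, 40 triangular Mel filters, 12 retained coefficients), the choice of a Hamming window, and the hand-written unnormalized DCT-II are all assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def filter_banks_and_mfccs(signal, sample_rate, pre_emphasis=0.97,
                           frame_size=0.025, frame_stride=0.010,
                           nfft=512, n_filters=40, n_ceps=12):
    """Hypothetical helper: returns (log filter banks, MFCCs), both mean-normalized."""
    signal = np.asarray(signal, dtype=float)

    # 1. Pre-emphasis: y[t] = x[t] - a * x[t-1]
    emphasized = np.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])

    # 2. Framing: slice into overlapping frames, zero-padding the tail
    frame_len = int(round(frame_size * sample_rate))
    frame_step = int(round(frame_stride * sample_rate))
    n_frames = 1 + int(np.ceil(max(len(emphasized) - frame_len, 0) / frame_step))
    pad_len = (n_frames - 1) * frame_step + frame_len
    padded = np.append(emphasized, np.zeros(pad_len - len(emphasized)))
    idx = np.arange(frame_len)[None, :] + frame_step * np.arange(n_frames)[:, None]
    frames = padded[idx]

    # 3. Window: a Hamming window is assumed here
    frames = frames * np.hamming(frame_len)

    # 4. Fourier transform of each frame and power spectrum
    magnitude = np.abs(np.fft.rfft(frames, nfft))
    power = (magnitude ** 2) / nfft

    # 5. Triangular filters spaced evenly on the Mel scale
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, nfft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):       # rising slope
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):      # falling slope
            fbank[m - 1, k] = (right - k) / (right - center)
    energies = power @ fbank.T
    energies = np.where(energies == 0, np.finfo(float).eps, energies)  # avoid log(0)
    log_fbanks = 10.0 * np.log10(energies)  # log filter bank energies in dB

    # 6. MFCCs: unnormalized DCT-II over the filter axis; keeping coefficients
    #    1..n_ceps and dropping the 0th is an assumption made for this sketch
    n = np.arange(n_filters)
    k = np.arange(n_filters)[:, None]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    mfccs = (log_fbanks @ dct_basis.T)[:, 1:n_ceps + 1]

    # 7. Mean normalization: subtract each feature's mean over all frames
    log_fbanks = log_fbanks - log_fbanks.mean(axis=0)
    mfccs = mfccs - mfccs.mean(axis=0)
    return log_fbanks, mfccs

# Usage on a synthetic one-second signal (a 440 Hz tone plus a little noise)
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
rng = np.random.default_rng(0)
demo_signal = np.sin(2 * np.pi * 440.0 * t) + 0.01 * rng.standard_normal(sample_rate)
demo_fbanks, demo_mfccs = filter_banks_and_mfccs(demo_signal, sample_rate)
print(demo_fbanks.shape, demo_mfccs.shape)
```

Both outputs have one row per frame; the filter bank features have one column per Mel filter, while the MFCCs keep only the retained DCT coefficients.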

Table of contents:

Setup
Pre-Emphasis
Framing
Window
Fourier-Transform and Power Spectrum
Filter Banks
Mel-frequency Cepstral Coefficients (MFCCs)
Mean Normalization
Filter Banks vs MFCCs
Conclusion

To read the whole article, click here.
