
Automated Speech Processing using Filter Banks and MFCCs

This article was written by Haytham Fayek.

Speech processing plays an important role in any speech system, whether it is Automatic Speech Recognition (ASR), speaker recognition, or something else. Mel-Frequency Cepstral Coefficients (MFCCs) were very popular features for a long time, but more recently filter banks have been gaining ground. In this post, I will discuss filter banks and MFCCs, and why filter banks are becoming increasingly popular.

Computing filter banks and MFCCs involves largely the same procedure: in both cases filter banks are computed, and a few extra steps yield MFCCs. In a nutshell, a signal goes through a pre-emphasis filter, is sliced into (overlapping) frames, and a window function is applied to each frame. We then take a Fourier transform of each frame (more specifically, a Short-Time Fourier Transform) and calculate the power spectrum, from which the filter banks are computed. To obtain MFCCs, a Discrete Cosine Transform (DCT) is applied to the filter banks, retaining a number of the resulting coefficients and discarding the rest. A final step in both cases is mean normalization.
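The pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `filter_banks_and_mfccs`, the Hamming window, and every default value (0.97 pre-emphasis, 25 ms frames with a 10 ms stride, a 512-point FFT, 40 triangular mel filters, 12 retained coefficients) are choices made here for the example and are not specified in the text above. The DCT is written out by hand as an orthonormal DCT-II so the block needs nothing beyond NumPy.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def filter_banks_and_mfccs(signal, sample_rate, pre_emphasis=0.97,
                           frame_size=0.025, frame_stride=0.010,
                           nfft=512, n_filters=40, n_ceps=12):
    """Return (filter_banks, mfccs), one row per frame. All defaults are assumptions."""
    signal = np.asarray(signal, dtype=float)

    # 1. Pre-emphasis: y[t] = x[t] - a * x[t-1]
    emphasized = np.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])

    # 2. Slice into overlapping frames, zero-padding the tail so the last frame is full
    frame_len = int(round(frame_size * sample_rate))
    frame_step = int(round(frame_stride * sample_rate))
    n_frames = 1 + max(0, int(np.ceil((len(emphasized) - frame_len) / frame_step)))
    pad_len = (n_frames - 1) * frame_step + frame_len
    padded = np.append(emphasized, np.zeros(pad_len - len(emphasized)))
    idx = np.arange(frame_len)[None, :] + frame_step * np.arange(n_frames)[:, None]

    # 3. Apply a window function (Hamming, assumed) to each frame
    frames = padded[idx] * np.hamming(frame_len)

    # 4. Short-time Fourier transform and power spectrum
    power = (np.abs(np.fft.rfft(frames, nfft)) ** 2) / nfft

    # 5. Triangular filters spaced evenly on the mel scale
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, nfft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / (right - center)
    banks = power @ fbank.T
    banks = np.where(banks == 0, np.finfo(float).eps, banks)  # avoid log(0)
    banks = 20.0 * np.log10(banks)  # log-scaled filter bank energies

    # 6. MFCCs: orthonormal DCT-II over the filter axis, keep coefficients 1..n_ceps
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(n, 2 * n + 1) / (2.0 * n_filters))
    scale = np.full(n_filters, np.sqrt(2.0 / n_filters))
    scale[0] = np.sqrt(1.0 / n_filters)
    mfccs = (banks @ (basis * scale[:, None]).T)[:, 1:n_ceps + 1]

    # 7. Mean normalization: subtract each coefficient's mean over all frames
    banks = banks - banks.mean(axis=0)
    mfccs = mfccs - mfccs.mean(axis=0)
    return banks, mfccs
```

Calling `filter_banks_and_mfccs(samples, 16000)` on a one-dimensional array of audio samples returns two arrays with one row per frame: the normalized filter banks (40 columns with these defaults) and the MFCCs (12 columns). The only difference between the two feature sets is step 6, which is the point the paragraph above makes.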

[Figure: the signal in the time domain]

Table of contents:

  • Setup
  • Pre-Emphasis
  • Framing
  • Window
  • Fourier-Transform and Power Spectrum
  • Filter Banks
  • Mel-Frequency Cepstral Coefficients (MFCCs)
  • Mean Normalization
  • Filter Banks vs MFCCs
  • Conclusion

To read the whole article, click here.