This formula-free summary provides a short overview of how PCA (principal component analysis) works for dimension reduction, that is, to build k features (also called variables) from a larger set of n features, with k much smaller than n. This set of k features built with PCA is optimal in the sense that it minimizes the residual variance left over when the data is projected onto a k-dimensional linear subspace; equivalently, it retains as much of the original variance as possible. Note that PCA does not select a subset of the original features: it transforms them into new ones that are linear combinations of the original features.
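As a quick illustration of what "k new features" means in practice, here is a minimal sketch using scikit-learn; the library choice and the random data are assumptions for illustration, since the article itself names no tooling.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))   # 100 observations, n = 10 original features

k = 3                            # keep k new features, k << n
pca = PCA(n_components=k)
Z = pca.fit_transform(X)         # Z holds the k new features (principal components)

print(Z.shape)                   # (100, 3)
# Each new feature is a linear combination of the 10 original features:
print(pca.components_.shape)     # (3, 10) -- one row of weights per component
```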
Steps for PCA
The PCA algorithm (covariance method) proceeds as follows; a code sketch of these steps appears after the list:

1. Standardize the data: center each feature on its mean (and, if the features are on different scales, divide by the standard deviation).
2. Compute the covariance matrix of the features.
3. Compute the eigenvalues and eigenvectors of the covariance matrix.
4. Sort the eigenvectors by decreasing eigenvalue; the top k eigenvectors define the new features (the principal components).
5. Project the data onto these k eigenvectors to obtain the reduced data set.
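A from-scratch sketch of these steps using NumPy; the function name and data layout are illustrative assumptions, not from the article.

```python
import numpy as np

def pca_covariance(X, k):
    """Project X (observations x features) onto its top-k principal components."""
    Xc = X - X.mean(axis=0)                   # step 1: center each feature
    C = np.cov(Xc, rowvar=False)              # step 2: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)      # step 3: eigen-decomposition (C is symmetric)
    order = np.argsort(eigvals)[::-1]         # step 4: sort by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    Z = Xc @ eigvecs[:, :k]                   # step 5: project onto the top-k eigenvectors
    return Z, eigvals, eigvecs
```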
The proportion of the variance that each eigenvector represents can be calculated by dividing the eigenvalue corresponding to that eigenvector by the sum of all eigenvalues.
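For instance, this self-contained sketch computes those proportions from the eigenvalues of the covariance matrix; the random data is an assumption for illustration.

```python
import numpy as np

X = np.random.default_rng(1).normal(size=(200, 5))
C = np.cov(X - X.mean(axis=0), rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(C))[::-1]   # eigenvalues, largest first

ratios = eigvals / eigvals.sum()                 # eigenvalue / sum of all eigenvalues
print(ratios)                                    # variance share of each component
print(np.cumsum(ratios))                         # cumulative share, useful for choosing k
```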
If the original features are highly correlated, the solution will be numerically unstable. Also, since the new features are linear combinations of the original ones, they may be hard to interpret (see the sketch below). The data does not need to be multivariate normal, unless you use this technique for predictive modeling with normal models to compute confidence intervals.
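To illustrate the interpretability caveat, this sketch prints each principal component as a weighted mix of the original features, using scikit-learn's bundled Iris data (an assumed dataset, chosen only because its feature names are familiar).

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
pca = PCA(n_components=2).fit(iris.data)

for i, weights in enumerate(pca.components_):
    mix = " + ".join(f"{w:+.2f}*{name}" for w, name in zip(weights, iris.feature_names))
    print(f"PC{i + 1} = {mix}")   # every PC blends all four original features
```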
The Wikipedia article on principal component analysis covers the implementation details. It is a very long article, but you can focus on the section entitled Computing PCA using the covariance method.