# .

This formula-free summary gives a short overview of how PCA (principal component analysis) works for dimension reduction, that is, how it replaces a larger set of n features (also called variables) with k new features, with k much smaller than n. Note that PCA does not pick a subset of the original features: it transforms them into new ones that are linear combinations of the original features. The k features built with PCA are the best such set of k features, in the sense that they minimize the variance of the residual noise when fitting the data to a linear model; equivalently, they retain as much of the total variance as k linear combinations can.

Steps for PCA

The PCA algorithm proceeds as follows:

1. Normalize the original features: remove the mean from each feature, so that each column of the data set is centered at zero.
2. Compute the covariance matrix on the normalized data. This is an n x n symmetric matrix, where n is the number of original features, and the element in row i and column j is the covariance between the i-th and j-th columns of the data set.
3. Calculate the eigenvectors and eigenvalues of the covariance matrix. These eigenvectors must be unit eigenvectors, that is, their lengths are 1. This is the most intricate step, but most software packages can do it automatically.
4. Choose the k eigenvectors with the largest eigenvalues.
5. Compute the final k features, associated with the k largest eigenvalues: for each one, multiply the (centered) data set matrix by the associated eigenvector. Here we assume that the eigenvector has one column and n rows (n is the number of original variables), while the data set matrix has n columns and m rows (m is the number of observations). Thus each resulting feature has m rows and one column: it provides the values of the new feature at each of the m observations.
6. You may want to put back the mean that was removed in step #1, for instance when mapping the reduced data back to the original feature space.
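The steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `pca`, the made-up data, and the `mixing` matrix are all hypothetical, and step 6 is read here as "add the mean back after mapping the k features to the original feature space", which is only one possible interpretation.

```python
import numpy as np


def pca(X, k):
    """Sketch of covariance-method PCA.

    X: (m, n) array with m observations in rows and n features in columns.
    Returns (scores, components, eigenvalues, mean).
    """
    X = np.asarray(X, dtype=float)

    # Step 1: remove the mean from each feature (column).
    mean = X.mean(axis=0)
    Xc = X - mean

    # Step 2: n x n covariance matrix; rowvar=False treats columns as variables.
    cov = np.atleast_2d(np.cov(Xc, rowvar=False))

    # Step 3: eigh is designed for symmetric matrices. It returns eigenvalues
    # in ascending order and unit-length eigenvectors as columns.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # Step 4: keep the k eigenvectors with the largest eigenvalues.
    order = np.argsort(eigenvalues)[::-1][:k]
    eigenvalues = eigenvalues[order]
    components = eigenvectors[:, order]  # shape (n, k)

    # Step 5: (m, n) centered data times (n, k) eigenvectors -> (m, k) features.
    scores = Xc @ components
    return scores, components, eigenvalues, mean


# Made-up data: 200 observations of 3 correlated features.
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.5, 0.1],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing + np.array([10.0, -5.0, 3.0])

scores, components, eigenvalues, mean = pca(X, k=2)

# Step 6 (one reading of it): map the k features back to the original
# feature space and add the mean back to get an approximation of X.
X_approx = scores @ components.T + mean
```

Each column of `scores` is one new feature with m values, as described in step 5. With k equal to n, `X_approx` reproduces the original data up to rounding; with smaller k it is only an approximation.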

The proportion of the variance that each eigenvector represents can be calculated by dividing the eigenvalue corresponding to that eigenvector by the sum of all eigenvalues.
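As a small illustration of this calculation, here is a sketch with made-up eigenvalues. The helper name `explained_variance_ratio` and the numbers are assumptions chosen for the example, not values from any real data set.

```python
import numpy as np


def explained_variance_ratio(eigenvalues):
    """Share of the total variance carried by each eigenvector."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    return eigenvalues / eigenvalues.sum()


# Made-up eigenvalues of a 4 x 4 covariance matrix, largest first.
eigenvalues = np.array([4.0, 3.0, 2.0, 1.0])

ratios = explained_variance_ratio(eigenvalues)  # roughly 0.4, 0.3, 0.2, 0.1

# Share of the variance retained when keeping only the first k = 2 features.
retained = ratios[:2].sum()  # roughly 0.7
```

Here the sum of all eigenvalues is 10, so the first eigenvector accounts for about 40% of the variance, and keeping the first two retains about 70%.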

Caveats

If the original features are highly correlated, the solution can be very unstable: small changes in the data may noticeably change some of the eigenvectors, especially those whose eigenvalues are close to one another. Also, the new features are linear combinations of the original features, and thus may be hard to interpret. The data does not need to be multivariate normal, unless you use this technique for predictive modeling with normal models to compute confidence intervals.
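To make the interpretation issue concrete, the sketch below prints the weight that each original feature receives in the first new feature. The feature names and the data are hypothetical, made up for illustration only. The sign of an eigenvector is arbitrary, so the weights may come out with all signs flipped.

```python
import numpy as np

feature_names = ["height", "weight", "age"]  # hypothetical labels

# Made-up data in which the first two features are correlated.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
X[:, 1] += 0.8 * X[:, 0]

Xc = X - X.mean(axis=0)
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(Xc, rowvar=False))

# Unit eigenvector associated with the largest eigenvalue.
first = eigenvectors[:, np.argmax(eigenvalues)]

# Each entry is the weight of one original feature in the new feature.
for name, weight in zip(feature_names, first):
    print(f"{name}: {weight:+.3f}")

# The new feature is exactly this weighted sum of the centered columns.
new_feature = Xc @ first
weighted_sum = sum(w * Xc[:, j] for j, w in enumerate(first))
```

When several weights are of similar size, the new feature mixes several original variables and has no obvious name of its own, which is the interpretation problem mentioned above.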

See the Wikipedia article on principal component analysis for the implementation details. It is a very long article, but you can focus on the section entitled Computing PCA using the covariance method.



Comment

Comment by Zhongmin Luo on May 26, 2017 at 12:03am

PCA has many applications in the finance industry; I plan to write a paper in this area. This paper used PCA to detect and mitigate the risk of highly correlated feature variables (which is common in finance): https://ssrn.com/abstract=2967184

Comment by Lance Norskog on April 26, 2017 at 5:32pm

Well, PCA is not "based on" rotation matrices (it's from 1900 after all) but "an example of" a rotation matrix.

Comment by Lance Norskog on April 26, 2017 at 5:31pm

PCA became clear to me when I realized that it is based on the computer graphics tool called a rotation matrix.

https://en.wikipedia.org/wiki/Rotation_matrix

A rotation matrix rotates a shape in 2-space or 3-space, keeping the same area (volume). Take a 100x3 matrix, consider the rows as points in 3-space, and the columns as dimensions x, y, and z.  PCA creates a rotation matrix which gives the largest distance in (x,y) between points in (x,y,z). This gives the most dramatic visualization when you plot in 2D using (x,y) and dropping (z).