This article is written by Giuseppe Bonaccorso.
One of the most interesting effects of PCA (Principal Component Analysis) is to decorrelate the input covariance matrix C, by computing the eigenvectors and operating a base change using a matrix V:
The eigenvectors are sorted in descending order considering the corresponding eigenvalue, therefore Cpca is a diagonal matrix where the non-null elements are λ1 >= λ2 >= λ3 >= … >= λn. By selecting the top p eigenvalues, it’s possible to operate a dimensionality reduction by projecting the samples in the new sub-space determined by the p top eigenvectors (it’s possible to use Gram-Schmidt orthonormalization if they don’t have a unitary length). The standard PCA procedure works with a bottom-up approach, obtaining the decorrelation of C as a final effect, however, it’s possible to employ neural networks, imposing this condition as an optimization step. One the most effective model has been proposed by Rubner and Tavan (and it’s named after them). Its generic structure is:
where V (n × n) is a lower-triangular matrix with all diagonal elements to 0 and W has a shape (n × m). Moreover, it’s necessary to store the y(t) in order to compute y(t+1). This procedure must be repeated until the output vector has been stabilized. In general after k < 10 iterations, the modifications are under a threshold of 0.0001, however, it’s important to check this value in every real application.
The training process is managed with two update rules:
The first one is Hebbian based on the Oja’s rule, while the second is anti-Hebbian because its purpose is to reduce the correlation between output units. In fact, without the normalization factor, the update is in the form dW = -αy(i)y(k), so to reduce the synaptic weight when two output units are correlated (same sign).