Suppose I have a large set S of 'pure' text documents.
I apply a spelling normalisation method to every document in S, which yields a normalised set S'.
Then I use a chosen method M (which one?) to cluster S, obtaining a clustering result C.
Then I use the same method M to cluster S', obtaining a clustering result C'.
Finally, I need to test whether the differences between C and C' are statistically significant.
Can anyone help me identify which technique or method (M) I should use to cluster the text documents?
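
To make the comparison step concrete, here is a minimal sketch of how C and C' could be compared once M has produced them. It assumes each clustering is a list of cluster labels, one per document, in the same document order. The adjusted Rand index is only one possible agreement measure, chosen here for illustration, and the label lists are made up. It measures agreement between two clusterings; it is not a significance test by itself.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Agreement between two clusterings of the same documents.

    1.0 means identical partitions (cluster names may differ);
    values near 0.0 mean agreement no better than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must cover the same documents")

    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two documents: nothing to disagree on

    # Document pairs placed together in both clusterings, in C, and in C'.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs  # chance-level agreement
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both
    return (together_both - expected) / (maximum - expected)


# Hypothetical labels: C from clustering S, C' from clustering S'.
C = [0, 0, 1, 1, 2, 2]
C_prime = [1, 1, 0, 0, 2, 2]  # same partition, different cluster names
print(adjusted_rand_index(C, C_prime))
```

Because cluster names are arbitrary, the index compares which documents are grouped together rather than the labels themselves. To talk about significance, one option would be to repeat the whole pipeline on resampled subsets of S and look at the spread of the index, but that depends on how stable M is between runs.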