Another important point is to check the SVM algorithm parameters. Like many machine learning algorithms, SVM has parameters that must be tuned to get good performance. This is very important: SVM is very sensitive to the choice of parameters, and even close parameter values can lead to very different classification results. To find the best values for your problem, you should test several candidates. A great tool for this job in R is the tune.svm() function from the e1071 package. It tests every combination of the values you give it and returns the one that minimizes the classification error, estimated by default with 10-fold cross-validation.
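As a rough sketch of such a search, the snippet below tunes gamma and cost on R's built-in iris data set. The data set, the candidate grids and the seed are assumptions made for illustration only; replace them with your own data and ranges.

```r
library(e1071)  # provides svm() and tune.svm()

set.seed(42)  # the cross-validation folds are drawn at random

# iris stands in for your own data; Species is the class column (a factor)
tuned <- tune.svm(Species ~ ., data = iris,
                  kernel = "radial",
                  gamma = 10^(-3:0),  # candidate gamma values (assumed grid)
                  cost  = 10^(0:2))   # candidate cost (C) values (assumed grid)

summary(tuned)          # cross-validated error for every gamma/cost pair
tuned$best.parameters   # the pair with the lowest cross-validated error
tuned$best.performance  # the error reached by that pair
```

Because the folds are random, the best pair can change from run to run unless the seed is fixed. A common habit is to start with a coarse grid in powers of ten and then search more finely around the best pair.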
Example of tune.svm() output:
The parameter γ (gamma) has to be tuned to better fit the hyperplane to the data. It controls how curved the hyperplane appears in the original feature space, which is why it is not used with linear kernels. The smaller γ is, the more the hyperplane looks like a straight line. If γ is too large, the hyperplane becomes very curvy and can follow the training data too closely, which leads to overfitting.
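To see where this effect comes from, it helps to look at the radial kernel itself, which e1071 documents as exp(-gamma * |u - v|^2). The sketch below is plain R written for illustration: the rbf() helper and the two points are made up and are not part of any package. It computes the similarity between the same two points for several values of γ.

```r
# Radial (RBF) kernel as documented for e1071: exp(-gamma * |u - v|^2)
# rbf() is a hypothetical helper written only for this illustration
rbf <- function(u, v, gamma) exp(-gamma * sum((u - v)^2))

u <- c(0, 0)  # two made-up points at squared distance 2
v <- c(1, 1)

gammas <- c(0.01, 0.1, 1, 10)
similarity <- sapply(gammas, function(g) rbf(u, v, g))

data.frame(gamma = gammas, similarity = similarity)
```

With a small γ the two points still look similar, so each training point influences a wide region and the boundary stays smooth. With a large γ the similarity drops to almost zero, so each point only influences its immediate neighbourhood and the boundary can wrap tightly around individual points.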
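Another parameter to tune for better accuracy is C (the cost argument of svm()). It controls the size of the "soft margin" of the SVM. The soft margin is a "gray" area around the hyperplane in which training points are allowed to sit, or even to fall on the wrong side, at a penalty that grows with C. Points inside the margin are still assigned to a class according to the side of the hyperplane they fall on. The smaller the value of C, the wider the soft margin and the more such violations are tolerated. A large C penalizes them heavily and gives a narrower margin.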
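One way to observe the effect of C is to count the support vectors, which are the training points lying on or inside the margin. The sketch below fits the same model with three cost values on the built-in iris data. The data set and the cost values are assumptions chosen for illustration.

```r
library(e1071)

costs <- c(0.01, 1, 100)  # assumed values, from a very soft to a very hard margin

# Fit one model per cost and record its total number of support vectors
n_sv <- sapply(costs, function(cst) {
  model <- svm(Species ~ ., data = iris, kernel = "radial", cost = cst)
  model$tot.nSV
})

data.frame(cost = costs, support_vectors = n_sv)
```

A wider margin typically captures more points, so the count usually falls as the cost grows. The exact numbers depend on the data and on the kernel.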
How to Prepare Data
The svm() function in R expects a matrix or data frame with one column identifying the class of each row and several feature columns that describe it. The following table shows an example with two classes, 0 and 1, and some features. Each row is a data entry.
class f1 f2 f3
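In R, a data set with this layout might be built as in the sketch below. All values are made up purely to show the structure. Only the column names class, f1, f2 and f3 come from the table header.

```r
# Made-up values, shown only to illustrate the expected layout
my_data <- data.frame(
  class = factor(c(0, 0, 1, 1)),  # class label stored as a factor
  f1 = c(0.2, 0.4, 1.5, 1.7),
  f2 = c(3.1, 2.9, 0.8, 1.1),
  f3 = c(10, 12, 25, 23)
)

str(my_data)  # one factor column and three numeric feature columns
```

Storing the class as a factor matters because svm() picks classification or regression based on the type of the response column.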
The call to svm() could be:
> my_data$class <- as.factor(my_data$class)  # svm() only classifies when the response is a factor
> svm(class ~ ., data = my_data, kernel = "radial", gamma = 0.1, cost = 1)
Here "class" is the name of the column that holds the classes of your data and "my_data" is your dataset. The class column should be a factor: if it is numeric, svm() fits a regression instead of a classifier. The gamma and cost values should be the ones best suited to your problem.
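Putting the pieces together, a minimal end-to-end sketch might look like the following. The data are simulated here so the example runs on its own. The sample size, the 70/30 split and the seed are assumptions, and gamma = 0.1 and cost = 1 are simply the values from the call above rather than tuned ones.

```r
library(e1071)

set.seed(1)  # make the simulated data and the split reproducible

# Simulated two-class data in the layout described above (hypothetical values)
n <- 100
my_data <- data.frame(
  class = factor(rep(c(0, 1), each = n / 2)),
  f1 = c(rnorm(n / 2, mean = 0), rnorm(n / 2, mean = 2)),
  f2 = c(rnorm(n / 2, mean = 0), rnorm(n / 2, mean = 2)),
  f3 = rnorm(n)  # a feature that carries no class information
)

# Hold out 30 rows so the model is judged on data it has not seen
train_idx <- sample(n, 70)
train <- my_data[train_idx, ]
test  <- my_data[-train_idx, ]

model <- svm(class ~ ., data = train, kernel = "radial", gamma = 0.1, cost = 1)

pred <- predict(model, newdata = test)

table(predicted = pred, actual = test$class)  # confusion matrix
mean(pred == test$class)                      # share of correct predictions
```

Evaluating on held-out rows gives a more honest picture than the training error, which a curvy boundary can drive close to zero.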