Choice of K in K-fold Cross Validation for Classification in Financial Markets

Cross validation is often used as a tool for model selection across classifiers. As discussed in detail in the paper at https://ssrn.com/abstract=2967184, K-fold cross validation is typically performed in the following steps:

  • Step 1: Divide the original sample into K subsamples, typically of equal size; each subsample is referred to as a fold, giving K folds in total.
  • Step 2: Holding out one fold as the validation sample, train the classifier on the remaining K-1 folds and evaluate it on the held-out fold. Repeat this for K iterations, so that each fold serves as the validation sample exactly once.
  • Step 3: The performance statistic (e.g., misclassification error), averaged over the K iterations, gives the overall K-fold cross validation performance of a given classifier.
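The three steps can be sketched in plain Python as follows. This is a minimal illustration, not the paper's implementation: it assumes scalar features and uses a 1-nearest-neighbour rule as a hypothetical stand-in classifier, and the function names `k_fold_indices`, `one_nn_predict` and `k_fold_error` are illustrative only.

```python
import random
from typing import List, Sequence


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Step 1: shuffle the indices 0..n-1 and split them into k near-equal folds."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # The first n % k folds receive one extra observation.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(idx[start:start + size])
        start += size
    return folds


def one_nn_predict(train_x: Sequence[float], train_y: Sequence[int], x: float) -> int:
    """Hypothetical stand-in classifier: label of the nearest training point."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]


def k_fold_error(xs: Sequence[float], ys: Sequence[int], k: int, seed: int = 0) -> float:
    """Steps 2-3: hold out each fold once, train on the other k-1 folds,
    and return the mean of the per-fold misclassification rates."""
    folds = k_fold_indices(len(xs), k, seed)
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        wrong = sum(one_nn_predict(train_x, train_y, xs[i]) != ys[i] for i in held_out)
        errors.append(wrong / len(held_out))
    return sum(errors) / len(errors)
```

For example, `k_fold_error([0.1, 0.2, 0.3, 0.4, 1.1, 1.2, 1.3, 1.4], [0, 0, 0, 0, 1, 1, 1, 1], k=4)` runs 4-fold cross validation on a toy, made-up data set; swapping in a different classifier only requires replacing `one_nn_predict`.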

However, one question often comes up: how should K be chosen in K-fold cross validation? The rule of thumb often suggested in the literature, which is based on non-financial-market data, is K=10. Does this rule hold for financial markets?
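To make concrete what K controls mechanically, here is a small sketch (not taken from the paper): with N observations, each model is trained on about N(K-1)/K of them, validated on about N/K, and K models are fitted in total. The helper name `k_fold_budget` and the sample size N=1000 are arbitrary illustrative assumptions.

```python
from typing import Tuple


def k_fold_budget(n: int, k: int) -> Tuple[float, float, int]:
    """Return (approx. training size, approx. validation size, number of fits)."""
    validation = n / k            # one fold is held out per iteration
    training = n - validation     # equals n * (k - 1) / k
    return training, validation, k


# Compare a few candidate values of K for a hypothetical sample of 1000 observations.
for k in (2, 5, 10, 20):
    train, val, fits = k_fold_budget(1000, k)
    print(f"K={k:>2}: train~{train:.0f}, validate~{val:.0f}, fits={fits}")
```

Larger K brings each training set closer to the full sample at the cost of more model fits; whether K=10 is the right balance for financial data is the empirical question the paper addresses.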

In the paper, in the context of financial markets, we compare a range of choices for K in K-fold cross validation for the following eight most popular classifiers:

  • Neural Network
  • Support Vector Machine
  • Ensemble
  • Discriminant Analysis
  • Naïve Bayes
  • K-nearest Neighbours
  • Decision Tree
  • Logistic Regression

For those who want to know more, the paper is available at: https://ssrn.com/abstract=2967184





Comment by Regi Mathew on May 10, 2020 at 12:17am

Thanks a lot, Zhongmin.

Comment by Zhongmin Luo on May 9, 2020 at 11:38pm

Hi Regi,

Thanks for the comments regarding the choice of K in KNN. The paper "CDS Rate Construction Methods by Machine Learning Techniques" goes beyond examining the parameterization of the 8 most popular ML algorithms: I applied all of them to solve a real-world problem that banks, as well as other participants in financial markets, are facing. Hence the paper's title.

The associated discussion of KNN is presented in bullet #4 of Section 3.2, and the relevant results are available in Figure 8 and Table 16 in the Appendix.

The version of the paper below is compact; the published version has been expanded into two parts and published in the Journal of Financial Data Science, Vol. No. 2, 2019:



Comment by Regi Mathew on May 9, 2020 at 10:29pm

Could you please check the article reference? It opens the paper below:

"CDS Rate Construction Methods by Machine Learning Techniques"


© 2021 TechTarget, Inc.
