A Guide for Applying Machine Learning Techniques in Finance

Does it sound familiar? To get an idea of how to choose a parameter for a given classifier, you have to cross-reference a number of papers or books, which often present competing arguments for or against a particular parameterization but offer few applications to real-world problems.

For example, you may find a few papers discussing the optimal selection of K in K-Nearest Neighbours. One supports the so-called square-root method (K set to roughly the square root of the sample size N), while another selects K based on how well the classifier performs on cross-validation samples. Parameterization choices have a significant impact on classifier performance, so it is important to get them right. As the paper below shows, each of the 8 most popular classification algorithms can perform significantly differently depending on how it is parameterized.
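The two ways of choosing K can be compared on a toy problem. The sketch below is plain Python on synthetic two-cluster data; it is not the paper's data or implementation. The odd-number adjustment to the square-root rule, the 5-fold split and the candidate list of K values are illustrative assumptions.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Classify x by majority vote among its k nearest training points (Euclidean distance)."""
    neighbours = sorted(
        ((math.dist(p, x), label) for p, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def sqrt_rule_k(n):
    """Square-root heuristic: K close to sqrt(N), bumped to an odd number (assumption) to avoid tied votes."""
    k = max(1, round(math.sqrt(n)))
    return k if k % 2 == 1 else k + 1


def cv_accuracy(X, y, k, folds=5):
    """Cross-validated accuracy of K-NN; observation i is held out in fold i % folds."""
    correct = 0
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        test = [i for i in range(len(X)) if i % folds == f]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test)
    return correct / len(X)


def cv_select_k(X, y, candidates, folds=5):
    """Return the candidate K with the highest CV accuracy (ties go to the smaller K)."""
    return max(candidates, key=lambda k: (cv_accuracy(X, y, k, folds), -k))


def make_two_clusters(n_per_class, seed=0):
    """Synthetic two-class data: Gaussian clouds around (0, 0) and (1.5, 1.5)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, (cx, cy) in ((0, (0.0, 0.0)), (1, (1.5, 1.5))):
        for _ in range(n_per_class):
            X.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_two_clusters(100)
    k_sqrt = sqrt_rule_k(len(X))
    k_cv = cv_select_k(X, y, candidates=[1, 3, 5, 7, 9, 11, 15, 21])
    print("sqrt rule: K =", k_sqrt, "CV accuracy:", round(cv_accuracy(X, y, k_sqrt), 3))
    print("CV choice: K =", k_cv, "CV accuracy:", round(cv_accuracy(X, y, k_cv), 3))
```

Running the script prints both choices side by side; whether they agree depends on the data. The cross-validation route costs more computation because it refits the classifier once per fold and per candidate K.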

The following 51-page paper introduces the 8 most popular classifiers in machine learning and illustrates each with an example based on real-world financial data. It can serve as a guide for applying machine learning techniques to problems faced by the finance industry: https://ssrn.com/abstract=2967184.

For a summary of the classification techniques used in the finance industry, see the presentation slides: https://ssrn.com/abstract=2973065.


Tags: Analysis, Bayes, Classification, Decision, Discriminant, Ensemble, Financial, KNN, Learning, Logistic, Machine, Market, Naive, Network, Neural, Regression, SVM, Tree



Comment by Zhongmin Luo on June 18, 2017 at 10:16am

Also, I'm very happy you like it. Thanks for the comments.

Comment by Zhongmin Luo on June 18, 2017 at 9:24am

I agree that academics and practitioners have not paid much attention to correlations between feature variables; in many applications, such correlations may not be as problematic as they are in finance.

In finance, especially for variables with a term structure such as yield curves and CDS curves (the focus of our paper), we have to investigate how correlation affects classification performance. That is why we use Naive Bayes, which assumes the features are conditionally independent given the class, as our entry point; you can find the rest in our paper.
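Why correlated features matter for Naive Bayes can be seen in a few lines: the classifier adds one log-likelihood ratio per feature, so two features carrying the same information (for example, adjacent tenor points on a CDS curve) are counted twice. The minimal sketch below assumes Gaussian class-conditional densities and uses made-up spread parameters; it illustrates the mechanism only and is not taken from the paper's experiments.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log density of a normal distribution N(mean, std^2) at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def naive_bayes_log_odds(features, params, prior1=0.5):
    """Log posterior odds of class 1 versus class 0 under Gaussian Naive Bayes.

    params[j] = ((mean0, std0), (mean1, std1)) for feature j. The per-feature
    log-likelihood ratios are simply added, which is exactly the assumption that
    the features are conditionally independent given the class.
    """
    log_odds = math.log(prior1 / (1 - prior1))
    for x, ((m0, s0), (m1, s1)) in zip(features, params):
        log_odds += gaussian_logpdf(x, m1, s1) - gaussian_logpdf(x, m0, s0)
    return log_odds


def posterior(log_odds):
    """Convert log odds to P(class 1 | features)."""
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    # Hypothetical class-conditional (mean, std) of one CDS spread in bp: class 0, class 1
    spread_params = ((100.0, 20.0), (140.0, 20.0))
    observed = 125.0
    for copies in (1, 2, 5):
        # Perfectly correlated features: the same spread value repeated `copies` times
        lo = naive_bayes_log_odds([observed] * copies, [spread_params] * copies)
        print(copies, "feature(s): P(class 1) =", round(posterior(lo), 3))
```

With these parameters each copy adds 0.5 to the log odds (0.5, 1.0 and 2.5 for 1, 2 and 5 copies), so the posterior grows more confident even though no new information was added.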

Comment by Ann Kang on June 18, 2017 at 9:02am

It's well understood that multicollinearity is an issue in regression with multiple explanatory variables: when it occurs, the coefficient estimates and the performance of the regression are not reliable.

However, the analogous issue, i.e., correlation and its impact on classification performance, is rarely investigated as far as I know. I believe the paper adds value in this regard.
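For the regression case mentioned in this comment, a standard way to quantify multicollinearity is the variance inflation factor (VIF). The sketch below is restricted to two regressors, where VIF = 1/(1 - r^2) with r the Pearson correlation; the CDS spread figures are hypothetical, and the VIF diagnostic is a swapped-in illustration rather than something taken from the paper.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_regressors(x1, x2):
    """Variance inflation factor 1 / (1 - R^2).

    With only two regressors, the R^2 from regressing one on the other is the
    squared Pearson correlation, so VIF = 1 / (1 - r^2). A VIF of 1 means the
    regressors are uncorrelated; larger values mean each coefficient's variance
    is inflated by that factor relative to the uncorrelated case.
    """
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r ** 2)


if __name__ == "__main__":
    # Hypothetical 3-year and 5-year CDS spreads (bp) for five names
    spread_3y = [100, 120, 90, 150, 130]
    spread_5y = [110, 128, 101, 161, 139]
    print("r =", round(pearson(spread_3y, spread_5y), 4))
    print("VIF =", round(vif_two_regressors(spread_3y, spread_5y), 1))
```

A common rule of thumb treats VIF values above 10 as a sign of serious multicollinearity; adjacent points on a term structure can easily exceed that.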

Comment by Ann Kang on June 18, 2017 at 8:23am


I like the paper; it's very well written and full of interesting examples and illustrations, with a comprehensive introduction to the financial applications of a range of classifiers and to machine learning in general.

Two questions. First, I don't follow why correlation and its impact on classification performance are discussed in the paper. Second, what are the directions for further research in this area?



Comment by Zhongmin Luo on June 12, 2017 at 11:44am

Hi, I am glad you like it. You are right: each piece of qualitative information is encoded as an indicator function, i.e., a dummy variable that takes the value 0 or 1, and the regression is then run on the data with these dummy variables.
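The encoding described in this reply can be sketched as follows. The helper names, the choice of dropping one reference level per category, and the sample values are assumptions for illustration; the paper's exact specification on page 3 is not reproduced here.

```python
def dummy_encode(values, reference=None):
    """Turn a categorical column into 0/1 indicator columns, one per non-reference level.

    The reference level (default: first level in sorted order) gets no column:
    with an intercept in the regression, a full set of indicators would sum to the
    intercept column and be perfectly collinear (the "dummy-variable trap").
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }


def design_matrix(numeric, categorical):
    """Build regression rows [1, numeric..., dummies...] plus the matching column names.

    numeric: dict of name -> list of numbers; categorical: dict of name -> list of labels.
    """
    n = len(next(iter(numeric.values())))
    columns = {"intercept": [1] * n}
    columns.update(numeric)
    for name, values in categorical.items():
        for level, indicator in dummy_encode(values).items():
            columns[f"{name}={level}"] = indicator
    names = list(columns)
    rows = [[columns[c][i] for c in names] for i in range(n)]
    return names, rows


if __name__ == "__main__":
    # Hypothetical data: one numeric regressor plus sector and region labels
    names, rows = design_matrix(
        numeric={"log_spread": [4.6, 4.8, 4.5, 5.0]},
        categorical={
            "sector": ["Financials", "Energy", "Utilities", "Energy"],
            "region": ["Europe", "Asia", "Europe", "North America"],
        },
    )
    print(names)
    for row in rows:
        print(row)
```

Each qualitative attribute becomes a handful of 0/1 columns, so an ordinary regression routine can use them alongside numeric regressors; the reference level is absorbed into the intercept.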

When you fit the regression on all the data, you are building one regression model for the whole dataset. The alternative is to build a separate regression model for each segment defined by the qualitative information (e.g., one per sector), which inevitably reduces the sample size available to each model; that is not desirable in our context.
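To make the sample-size argument concrete, here is a minimal sketch assuming an ordinary linear regression with one numeric regressor and segment dummies that only shift the intercept; the segment labels and numbers are hypothetical. The pooled model estimates one shared slope from all N observations, while each separate model sees only its own segment.

```python
from collections import defaultdict


def _split_by_segment(segments, x, y):
    """Group (x, y) observations by segment label."""
    groups = defaultdict(lambda: ([], []))
    for s, xi, yi in zip(segments, x, y):
        groups[s][0].append(xi)
        groups[s][1].append(yi)
    return groups


def _centred_sums(xs, ys):
    """Return (Sxy, Sxx, mean_x, mean_y) for one segment."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy, sxx, mx, my


def pooled_with_dummies(segments, x, y):
    """OLS of y on x plus one 0/1 indicator per segment (no separate global intercept).

    All observations share one slope; each segment gets its own intercept. By the
    Frisch-Waugh-Lovell theorem the slope equals the within-segment estimator
    sum(Sxy) / sum(Sxx) taken over segments.
    """
    groups = _split_by_segment(segments, x, y)
    stats = {s: _centred_sums(xs, ys) for s, (xs, ys) in groups.items()}
    slope = sum(v[0] for v in stats.values()) / sum(v[1] for v in stats.values())
    intercepts = {s: my - slope * mx for s, (_, _, mx, my) in stats.items()}
    return slope, intercepts, len(x)


def separate_models(segments, x, y):
    """Simple OLS fitted independently per segment: (slope, intercept, n_obs) each."""
    fits = {}
    for s, (xs, ys) in _split_by_segment(segments, x, y).items():
        sxy, sxx, mx, my = _centred_sums(xs, ys)
        slope = sxy / sxx
        fits[s] = (slope, my - slope * mx, len(xs))
    return fits


if __name__ == "__main__":
    segments = ["Energy"] * 4 + ["Financials"] * 3
    x = [1, 2, 3, 4, 1, 2, 3]
    y = [3.1, 4.9, 7.2, 8.8, 7.0, 9.1, 10.9]
    slope, intercepts, n = pooled_with_dummies(segments, x, y)
    print(f"pooled: slope={slope:.3f} intercepts={intercepts} using n={n}")
    for s, (b, a, n_s) in separate_models(segments, x, y).items():
        print(f"{s}: slope={b:.3f} intercept={a:.3f} using n={n_s}")
```

The design trade-off: the pooled model assumes the slope is common across segments, and in exchange every coefficient is estimated from the full sample rather than from a few observations per segment.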

Hope it helps.



Comment by Jean-Charles Forszpaniak on June 12, 2017 at 3:07am


First things first: very nice paper. Papers on finance/corporate finance are, unfortunately, often under-represented.

A very short question, sorry if it's trivial: on page 3 the formula is clear, but how do you handle your indicator variables, given that they map qualitative information (sector, region) and are hence discrete by nature? I am not sure I grasp the whole principle here...

Best Regards,

© 2021 TechTarget, Inc.
