
Machine Learning with C++ - Polynomial Regression (CPU)

There are a lot of articles about how to use Python for solving Machine Learning problems. With this article I start a series of materials on how to use modern C++ for solving the same problems, and on which libraries can be used for them. I assume that readers are already familiar with Machine Learning concepts, so I will concentrate on programming issues only.

The first part is about creating a Polynomial Regression model with the XTensor library. XTensor is a C++ library for numerical analysis with multi-dimensional array expressions, and its containers are inspired by NumPy. Many functions in this library also have semantics similar to NumPy, so it should be easier to start with this library than with Eigen or ViennaCL if you are already familiar with NumPy.

I start with a simple polynomial regression to make a model that predicts the amount of traffic passed through the system at some time point. Our prediction will be based on data gathered over some time period. The X data values correspond to time points, and the Y data values correspond to the amount of traffic at those time points.
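Once the coefficients of such a model are known, predicting the traffic for a given time point only requires evaluating the polynomial. The sketch below does that in plain standard C++ rather than XTensor, so it compiles without extra dependencies. The function name predict and the convention of storing coefficients lowest power first are assumptions for illustration, not taken from the article's source code. It uses Horner's method, which needs only n multiplications and n additions for a polynomial of degree n.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: evaluates y = b0 + b1*x + ... + bn*x^n with Horner's method.
// Assumption: coeffs stores b0..bn in ascending order of power.
double predict(const std::vector<double>& coeffs, double x) {
    assert(!coeffs.empty());
    double result = 0.0;
    // Walk from the highest power down: ((bn*x + b(n-1))*x + ...)*x + b0
    for (auto it = coeffs.rbegin(); it != coeffs.rend(); ++it) {
        result = result * x + *it;
    }
    return result;
}
```

For example, with coefficients {1.0, 2.0, 3.0} (that is, 1 + 2x + 3x^2), predict(coeffs, 2.0) returns 17.0.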

For this tutorial I chose the XTensor library because of its API, which is made as similar to NumPy as possible. There are a lot of other linear algebra libraries for C++, like Eigen or ViennaCL, but this one allows you to convert NumPy samples to C++ with minimum effort.

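  1. Short polynomial regression definition

    Polynomial regression is a form of linear regression in which the relationship between the independent variable x and the dependent variable y is modeled as an n-th degree polynomial in x:

    $$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_n x^n$$

    Because our training data consist of multiple samples, we can rewrite this relation in matrix form:

    $$\hat{Y} = X\beta$$

    where

    $$X = \begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^n \\ 1 & x_2 & x_2^2 & \cdots & x_2^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_k & x_k^2 & \cdots & x_k^n \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_n \end{pmatrix}, \quad \hat{Y} = \begin{pmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_k \end{pmatrix}$$

    and k is the number of samples in the training data. So the goal is to estimate the parameter vector $\beta$. In this tutorial I will use gradient descent for this task. First, let's define a cost function:

    $$J(\beta) = \frac{1}{k} \sum_{i=1}^{k} (\hat{y}_i - y_i)^2 = \frac{1}{k} (X\beta - Y)^T (X\beta - Y)$$

    where Y is the vector of values from our training data. Next, we take the partial derivatives with respect to each coefficient $\beta_j$ of the polynomial:

    $$\frac{\partial J}{\partial \beta_j} = \frac{2}{k} \sum_{i=1}^{k} (\hat{y}_i - y_i)\, x_i^j, \quad j = 0, 1, \dots, n$$

    or, in matrix form:

    $$\nabla_\beta J = \frac{2}{k} X^T (X\beta - Y)$$

    and use these derivatives to update the vector $\beta$ on each learning step:

    $$\beta := \beta - l \, \nabla_\beta J$$

    where l is the learning rate.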
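The derivation above maps directly onto a few loops. The following is a minimal sketch of batch gradient descent for polynomial regression in plain standard C++, written without XTensor so that it is self-contained; it is not the article's code. The names design_matrix and fit_polynomial, and the choice of starting from an all-zero coefficient vector, are hypothetical. The sketch builds X row by row, computes the residual X*beta - Y, forms the gradient (2/k) * X^T * (X*beta - Y), and applies the update with learning rate l.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds the k x (n+1) design matrix, row i = [1, x_i, x_i^2, ..., x_i^n].
std::vector<std::vector<double>> design_matrix(const std::vector<double>& x, std::size_t degree) {
    std::vector<std::vector<double>> X(x.size(), std::vector<double>(degree + 1));
    for (std::size_t i = 0; i < x.size(); ++i) {
        double power = 1.0;  // x_i^0
        for (std::size_t j = 0; j <= degree; ++j) {
            X[i][j] = power;
            power *= x[i];
        }
    }
    return X;
}

// Hypothetical helper: fits beta by batch gradient descent on J(beta) = 1/k * ||X*beta - Y||^2.
std::vector<double> fit_polynomial(const std::vector<double>& x, const std::vector<double>& y,
                                   std::size_t degree, double learning_rate, std::size_t epochs) {
    assert(x.size() == y.size() && !x.empty());
    const auto X = design_matrix(x, degree);
    const std::size_t k = x.size();
    std::vector<double> beta(degree + 1, 0.0);  // assumption: start from zeros
    std::vector<double> residual(k);
    std::vector<double> grad(degree + 1);

    for (std::size_t epoch = 0; epoch < epochs; ++epoch) {
        // residual = X * beta - Y
        for (std::size_t i = 0; i < k; ++i) {
            double pred = 0.0;
            for (std::size_t j = 0; j <= degree; ++j) pred += X[i][j] * beta[j];
            residual[i] = pred - y[i];
        }
        // grad = (2 / k) * X^T * residual
        for (std::size_t j = 0; j <= degree; ++j) {
            double sum = 0.0;
            for (std::size_t i = 0; i < k; ++i) sum += X[i][j] * residual[i];
            grad[j] = 2.0 * sum / static_cast<double>(k);
        }
        // beta := beta - l * grad
        for (std::size_t j = 0; j <= degree; ++j) beta[j] -= learning_rate * grad[j];
    }
    return beta;
}
```

As a usage example, fitting the five points x = 0, 0.25, 0.5, 0.75, 1 with y = 1 + 2x + 3x^2, degree 2, learning rate 0.5 and 20000 epochs brings the coefficients very close to {1, 2, 3}. The time points are deliberately kept in [0, 1]: with raw timestamps the columns x^j grow very quickly and a fixed learning rate tends to make gradient descent diverge, so rescaling the X values before training is usually needed. In XTensor or Eigen the explicit loops could be replaced by matrix expressions.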

Continue reading the article and source code here. Please feel free to leave a comment or create an issue in the repository if you find any mistakes.


Comment by Kyrylo Kolodiazhnyi on June 25, 2018 at 1:39am

Juan,

This example requires a C++14-compatible compiler.

Comment by Juan Carlos Flores on June 14, 2018 at 3:48pm

I like C++ a lot for being able to manage the CPU at a low level, and it was one of my first battle languages. Just curious to know which version you are using?

Comment by Kyrylo Kolodiazhnyi on April 25, 2018 at 10:28am

Yura,

You are definitely right that code in R or Python will be much simpler for this task. But in some cases you need more precise control over the computational resources of a target machine, and C++ is the right tool for such cases, because you can control the size of the binary, the memory usage strategy, the CPU or GPU load balance, the number of computational threads, and so on. So for the model selection and creation steps other languages will be a more reasonable choice, but at the deployment stage using C++ may also make sense.

Comment by Yura Boyko on April 23, 2018 at 5:35pm

It's very nice! Maybe it would be simpler in R?

Comment by Kyrylo Kolodiazhnyi on April 21, 2018 at 10:34pm

Thanks, I'm going to continue this series.

Comment by Dr S Kotrappa on April 19, 2018 at 7:15pm

Nice, good article, thank you! Expect many more!

© 2018 Data Science Central ®