Blaine Bateman's Blog (4)

Exploring Kaggle Titanic data with R Packages naniar and UpSetR

Recently (6/8/2018), I saw a post about a new R package, "naniar", which, according to the package documentation, "provides data structures and functions that facilitate the plotting of missing values and examination of imputations. This allows missing data dependencies to be explored with minimal deviation from the common work patterns of 'ggplot2' and tidy data." naniar is authored…

Added by Blaine Bateman on July 3, 2018 at 10:30am

Simple automated feature selection using lm() in R

There are many good and sophisticated feature selection algorithms available in R. Feature selection refers to the machine learning case where we have a set of predictor variables for a given dependent variable, but we don’t know a priori which predictors are most important, or whether a model can be improved by eliminating some of them. In linear regression, many students are taught to fit a model to a data set using so-called “least squares”. In most…

Added by Blaine Bateman on April 30, 2018 at 7:30am

Extending churn analysis to revenue forecasting using R

In this article we will review the application of clustering to customer order data in three parts. First, we will define the approach to developing the cluster model, including derived predictors and dummy variables; second, we will extend beyond a typical “churn” model by using the model in a cumulative fashion to predict customer reordering in future periods defined by a set of time cutoffs; last, we will use the cluster model to forecast actual revenue by estimating the ordering parameter…

Added by Blaine Bateman on March 27, 2018 at 10:00am

Weighted Linear Regression in R

If you are like me, back in engineering school you learned linear regression as a way to “fit a line to data” and probably called it “least squares”. You probably extended it to multiple variables affecting a single dependent variable. In a statistics class you had to calculate a bunch of stuff and estimate confidence intervals for those lines. And that was probably about it for a long time, unless you were focusing on math or statistics. You may have…

Added by Blaine Bateman on March 23, 2018 at 6:30am

© 2019 Data Science Central ®