
In this blog, I will introduce an R package for fully automated heterogeneous ensemble learning (classification and regression). It significantly lowers the barrier for practitioners to apply heterogeneous ensemble learning techniques to their everyday predictive problems without needing expert knowledge.

Before we delve into the package details, let’s start with a few basic concepts.

**Why Ensemble Learning?**

Generally, predictions become unreliable when the input sample lies outside the training distribution, when the model is biased toward the data distribution, or when it is prone to noise. Most remedies require changes to the network architecture, fine-tuning, balanced data, a larger model, and so on. Further, the choice of algorithm plays a vital role, while scalability and learning ability decrease on complex datasets. Combining multiple learners is an effective approach that has been applied to many real-world problems. Ensemble learners combine a diverse collection of predictions from individual base models to produce a composite predictive model that is more accurate and robust than its components. With meta ensemble learning, one can reduce generalization error to some extent irrespective of the data distribution, number of classes, choice of algorithm, number of models, complexity of the datasets, and so on. In summary, the predictive models will be able to generalize better.
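To make the "more accurate than its components" claim concrete, here is a minimal base R sketch. It does not use metaEnsembleR; the five "models" are simulated as the true signal plus independent noise, which is an assumption made purely for illustration. For squared error, the averaged prediction can never do worse than the average of the individual errors.

```r
# Illustration only: simulated base models, no metaEnsembleR involved.
set.seed(111)

n <- 200
truth <- sin(seq(0, 2 * pi, length.out = n))  # target values

# Five hypothetical base models: the truth plus independent noise.
n_models <- 5
preds <- sapply(seq_len(n_models), function(i) truth + rnorm(n, sd = 0.5))

mse <- function(pred, actual) mean((pred - actual)^2)

# Error of each model on its own (one value per column of preds).
individual_mse <- apply(preds, 2, mse, actual = truth)

# Error of the simple average of all five predictions.
ensemble_mse <- mse(rowMeans(preds), truth)

print(round(individual_mse, 3))
print(round(ensemble_mse, 3))
```

Real base learners are correlated, so the gain in practice is usually smaller than with independent noise.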

How can we build models in a more stable fashion while minimizing under-fitting and overfitting, both of which are critical to the overall outcome? The solution is ensemble meta-learning of a heterogeneous collection of base learners.

**Common Ensemble Learning Techniques**

The popular ensemble techniques are shown in the figure below. Stacked generalization is a general method of using a high-level model to combine lower-level models to achieve greater predictive accuracy. In the bagging method, independent base models are fitted on bootstrap samples of the original dataset. The boosting method grows an ensemble iteratively and in a dependent fashion, adjusting the weight of each observation based on past predictions. There are several extensions of bagging and boosting.
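As a rough illustration of bagging, the base R sketch below fits a linear model on each of several bootstrap samples and averages the predictions. The dataset (`cars`), the number of bags, and the choice of `lm` as the base model are assumptions chosen to keep the example dependency-free; in practice bagging is mostly used with unstable learners such as decision trees.

```r
# Bagging sketch in base R (illustrative, not metaEnsembleR code).
set.seed(111)
data(cars)  # built-in dataset with columns speed and dist

# Hold out 15 of the 50 rows for testing.
train_idx <- sample(nrow(cars), 35)
train <- cars[train_idx, ]
test <- cars[-train_idx, ]

n_bags <- 25
bag_preds <- sapply(seq_len(n_bags), function(b) {
  # Bootstrap sample: draw rows from the training set with replacement.
  boot <- train[sample(nrow(train), replace = TRUE), ]
  fit <- lm(dist ~ speed, data = boot)
  predict(fit, newdata = test)
})

# Bagged prediction: average over the independently fitted models.
bagged <- rowMeans(bag_preds)
rmse <- sqrt(mean((bagged - test$dist)^2))
print(rmse)
```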

**Overview**

metaEnsembleR is an R package for automated meta-learning (classification and regression). Its functionality includes predictive modeling driven by simple user input, a choice of algorithms, a train-validation-test split, model evaluation, and guided prediction on unseen data, all of which help users build stacked ensembles on the go. The core aim of the package is to cater to a broad audience. metaEnsembleR significantly lowers the barrier for practitioners to apply heterogeneous ensemble learning techniques to their everyday predictive problems without needing expert knowledge.
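The sketch below shows one generic way a train-validation-test split can feed a stacked ensemble: base learners are fitted on the training part, a final (meta) learner is fitted on their validation predictions, and the test part is kept for evaluation. This post does not show how metaEnsembleR implements this internally, so treat the code as a plain base R illustration with assumed names (`base1`, `base2`, `meta`, `base_preds`) and an assumed dataset (`mtcars`).

```r
# Generic stacking sketch in base R (not the package's internals).
set.seed(111)
data(mtcars)

# 60/20/20 split of shuffled row indices.
n <- nrow(mtcars)
idx <- sample(n)
n_train <- round(0.6 * n)
n_valid <- round(0.2 * n)
train <- mtcars[idx[1:n_train], ]
valid <- mtcars[idx[(n_train + 1):(n_train + n_valid)], ]
test <- mtcars[idx[(n_train + n_valid + 1):n], ]

# Base learners are fitted on the training part only.
base1 <- lm(mpg ~ wt, data = train)
base2 <- lm(mpg ~ hp + disp, data = train)

# Collect base-learner predictions as features for the meta learner.
base_preds <- function(newdata) {
  data.frame(p1 = predict(base1, newdata), p2 = predict(base2, newdata))
}

# The meta learner is fitted on validation-set predictions.
meta_train <- cbind(base_preds(valid), mpg = valid$mpg)
meta <- lm(mpg ~ p1 + p2, data = meta_train)

# The test part is used only for the final evaluation.
stacked <- predict(meta, newdata = base_preds(test))
rmse <- sqrt(mean((stacked - test$mpg)^2))
print(rmse)
```

Fitting the meta learner on data the base learners have not seen keeps it from simply rewarding whichever base model overfits the training set most.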

**Using metaEnsembleR**

The package consists of the following components:

- Ensemble Classifiers Training and Prediction
- Ensemble Regressor Training and Prediction
- Model evaluation, model results (observed vs. predicted on test data), prediction on new unseen data, and writing performance charts and prediction results to disk

All these functions are intuitive, and their use is illustrated below with examples covering both classification and regression problems.

**Getting Started**

The package can be installed directly from CRAN.

Install from the R console:

install.packages("metaEnsembleR")

The latest stable version (if any) can also be found on GitHub and installed using the devtools package.

Install from GitHub:

if(!require(devtools)) install.packages("devtools")

devtools::install_github(repo = 'ajayarunachalam/metaEnsembleR', ref = 'main')
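After installing, it can be handy to confirm that the package is actually available before running the examples. The helper `is_installed()` below is a hypothetical convenience wrapper around base R's `requireNamespace()`, not something the package provides.

```r
# Hypothetical helper: TRUE if a package can be loaded, FALSE otherwise.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

if (is_installed("metaEnsembleR")) {
  message("metaEnsembleR version: ",
          as.character(utils::packageVersion("metaEnsembleR")))
} else {
  message("metaEnsembleR is not installed; use one of the install commands above.")
}
```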

**Usage**

library("metaEnsembleR")

set.seed(111)

- Training the ensemble classification model is a one-line call to the **ensembler.classifier** function. You can pass either a CSV file read directly or an imported data frame. The arguments, in order, are: the dataset, the outcome/response variable index, the base learners, the final learner, the train-validation-test split ratios, and the unseen data.

ensembler_return <- ensembler.classifier(iris[1:130,], 5, c('treebag','rpart'), 'gbm', 0.60, 0.20, 0.20, read.csv('./unseen_data.csv'))

OR

unseen_new_data_testing <- iris[130:150,]

ensembler_return <- ensembler.classifier(iris[1:130,], 5, c('treebag','rpart'), 'gbm', 0.60, 0.20, 0.20, unseen_new_data_testing)
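The three split ratios in the calls above (0.60, 0.20, 0.20) add up to 1. The sketch below shows one way to sanity-check ratios before passing them on, and how such ratios could translate into row indices. Both `check_split()` and `split_rows()` are hypothetical helpers written for this illustration; whether the package validates or partitions the data this way is an assumption.

```r
# Hypothetical helper: are the three ratios positive and do they sum to 1?
# all.equal() is used instead of == to tolerate floating-point rounding.
check_split <- function(train, valid, test) {
  ratios <- c(train, valid, test)
  all(ratios > 0) && isTRUE(all.equal(sum(ratios), 1))
}

# Hypothetical helper: partition n shuffled row indices by the given ratios.
split_rows <- function(n, train, valid, test) {
  stopifnot(check_split(train, valid, test))
  idx <- sample(n)
  n_train <- round(n * train)
  n_valid <- round(n * valid)
  list(
    train = idx[seq_len(n_train)],
    valid = idx[n_train + seq_len(n_valid)],
    test  = utils::tail(idx, n - n_train - n_valid)  # whatever remains
  )
}

set.seed(111)
parts <- split_rows(nrow(iris[1:130, ]), 0.60, 0.20, 0.20)
print(lengths(parts))
```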

The function returns four items, in this order: the test data with predictions, the prediction labels, the model results, and the predictions on the unseen data.

testpreddata <- data.frame(ensembler_return[1])

table(testpreddata$actual_label)

table(ensembler_return[2])

#### Performance comparison ####

modelresult <- ensembler_return[3]

modelresult

#### Unseen data ####

unseenpreddata <- data.frame(ensembler_return[4])

table(unseenpreddata$unseenpreddata)
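Beyond the per-class counts from `table()`, a confusion matrix and an overall accuracy are easy to derive once actual and predicted labels sit side by side. The sketch below uses a small mock data frame with made-up values. It assumes the test output has the `actual_label` column used above and a `predictions` column (the name that appears in the plotting example later in this post), so check `str(testpreddata)` on real output first.

```r
# Mock stand-in for testpreddata; the values are invented for illustration.
species <- c("setosa", "versicolor", "virginica")
mock_testpreddata <- data.frame(
  actual_label = factor(c("setosa", "setosa", "versicolor",
                          "versicolor", "virginica", "virginica"),
                        levels = species),
  predictions  = factor(c("setosa", "setosa", "versicolor",
                          "virginica", "virginica", "virginica"),
                        levels = species)
)

# Rows are the actual classes, columns the predicted classes.
confusion <- table(actual = mock_testpreddata$actual_label,
                   predicted = mock_testpreddata$predictions)

# Accuracy is the share of observations on the diagonal.
accuracy <- sum(diag(confusion)) / sum(confusion)

print(confusion)
print(accuracy)
```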

- Training the ensemble regression model is likewise a one-line call to the **ensembler.regression** function. You can pass either a CSV file read directly or an imported data frame. The arguments, in order, are: the dataset, the outcome/response variable index, the base learners, the final learner, the train-validation-test split ratios, and the unseen data.

house_price <- read.csv(file = './data/regression/house_price_data.csv')

unseen_new_data_testing_house_price <- house_price[250:414,]

write.csv(unseen_new_data_testing_house_price, 'unseen_house_price_regression.csv', fileEncoding = 'UTF-8', row.names = FALSE)

ensembler_return <- ensembler.regression(house_price[1:250,], 1, c('treebag','rpart'), 'gbm', 0.60, 0.20, 0.20, read.csv('./unseen_house_price_regression.csv'))

OR

ensembler_return <- ensembler.regression(house_price[1:250,], 1, c('treebag','rpart'), 'gbm', 0.60, 0.20, 0.20, unseen_new_data_testing_house_price)

The function returns four items, in this order: the test data with predictions, the prediction values, the model results, and the unseen data with predictions.

testpreddata <- data.frame(ensembler_return[1])
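For regression, the usual way to summarize test predictions is with error metrics such as RMSE, MAE, and R-squared. The sketch below computes them on a mock data frame with made-up numbers. The column names `actual` and `predictions` are assumptions, since this post does not show the columns of the regression output; inspect the real `testpreddata` with `str()` and adapt the names.

```r
# Mock stand-in for the regression test output; values are invented.
mock_testpreddata <- data.frame(
  actual      = c(10, 20, 30, 40),
  predictions = c(12, 18, 33, 39)
)

errors <- mock_testpreddata$predictions - mock_testpreddata$actual

rmse <- sqrt(mean(errors^2))   # root mean squared error
mae  <- mean(abs(errors))      # mean absolute error

# R-squared: 1 minus residual sum of squares over total sum of squares.
ss_res <- sum(errors^2)
ss_tot <- sum((mock_testpreddata$actual - mean(mock_testpreddata$actual))^2)
r2 <- 1 - ss_res / ss_tot

print(c(rmse = rmse, mae = mae, r2 = r2))
```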

#### Performance comparison ####

modelresult <- ensembler_return[3]

modelresult

write.csv(modelresult[[1]], "performance_chart.csv")
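The `modelresult[[1]]` in the `write.csv()` call is needed because single-bracket indexing on a list returns a one-element list rather than the element itself. The sketch below demonstrates the difference on a mock list; its structure and contents are made up and only loosely mirror the four returned items described above.

```r
# Mock return value: a list of four items with invented contents.
mock_return <- list(
  data.frame(actual_label = c("a", "b"), predictions = c("a", "a")),
  c("a", "a"),
  data.frame(model = c("base", "stack"), score = c(0.8, 0.9)),
  data.frame(unseenpreddata = c("a", "b"))
)

as_sublist <- mock_return[3]    # single bracket: still a list of length 1
as_element <- mock_return[[3]]  # double bracket: the data frame itself

print(class(as_sublist))
print(class(as_element))

# Unwrapping the sub-list gives back the same object.
print(identical(as_sublist[[1]], as_element))
```

Using `ensembler_return[[3]]` directly would therefore avoid the extra `[[1]]` step, assuming the return value is an ordinary list.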

#### Unseen data ####

unseenpreddata <- data.frame(ensembler_return[4])

**Examples**

library("metaEnsembleR")

data("iris")

unseen_new_data_testing <- iris[130:150,]

write.csv(unseen_new_data_testing, 'unseen_check.csv', fileEncoding = 'UTF-8', row.names = FALSE)

ensembler_return <- ensembler.classifier(iris[1:130,], 5, c('treebag','rpart'), 'gbm', 0.60, 0.20, 0.20, unseen_new_data_testing)

testpreddata <- data.frame(ensembler_return[1])

table(testpreddata$actual_label)

table(ensembler_return[2])

#### Performance comparison ####

modelresult <- ensembler_return[3]

modelresult

library("ggplot2") # provides qplot() and ggsave()

library("gridExtra") # provides tableGrob() and grid.arrange()

act_mybar <- qplot(testpreddata$actual_label, geom = "bar")

act_mybar

pred_mybar <- qplot(testpreddata$predictions, geom = "bar")

pred_mybar

act_tbl <- tableGrob(t(summary(testpreddata$actual_label)))

pred_tbl <- tableGrob(t(summary(testpreddata$predictions)))

ggsave("testdata_actual_vs_predicted_chart.pdf", grid.arrange(act_tbl, pred_tbl))

ggsave("testdata_actual_vs_predicted_plot.pdf", grid.arrange(act_mybar, pred_mybar))

#### Unseen data ####

unseenpreddata <- data.frame(ensembler_return[4])

table(unseenpreddata$unseenpreddata)

table(unseen_new_data_testing$Species)

library("metaEnsembleR")

data("rock")

unseen_rock_data <- rock[30:48,]

ensembler_return <- ensembler.regression(rock[1:30,], 4, c('lm'), 'rf', 0.40, 0.30, 0.30, unseen_rock_data)

testpreddata <- data.frame(ensembler_return[1])

#### Performance comparison ####

modelresult <- ensembler_return[3]

modelresult

write.csv(modelresult[[1]], "performance_chart.csv")

#### Unseen data ####

unseenpreddata <- data.frame(ensembler_return[4])

**Comprehensive Examples**

More demo examples can be found in the Demo.R file. To see the results, run Rscript Demo.R in the terminal.

**Contact**

If there is an implementation you would like to see here, or examples you would like to add, feel free to get in touch. You can always reach me at [email protected]

**Always Keep Learning & Sharing Knowledge!!!**

Posted 10 May 2021
