An opinion from a team of experts tends to give better results, and more confidence, than the opinion of a single person. That is exactly what ensemble techniques do: they build multiple models and combine their results to improve the outcome.
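A quick worked example makes the intuition concrete. The sketch below assumes an idealised case that rarely holds in practice: binary classification, an odd number of models, and models whose errors are fully independent, each correct with probability p. Under those assumptions a majority vote is correct whenever more than half of the models are, which is a binomial tail sum. The function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb

def majority_vote_accuracy(n_models: int, p: float) -> float:
    """Probability that a strict majority of n_models independent
    classifiers, each correct with probability p, is correct."""
    need = n_models // 2 + 1  # votes required for a strict majority
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(need, n_models + 1)
    )

print(round(majority_vote_accuracy(1, 0.7), 3))  # 0.7
print(round(majority_vote_accuracy(5, 0.7), 3))  # 0.837
```

With five independent models at 70% accuracy, the vote is right about 83.7% of the time. Real models trained on the same data make correlated mistakes, so actual gains are smaller, which is why bagging and random forests deliberately inject randomness to decorrelate their members.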

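Here are a few popular techniques. The decision tree comes first because it is the base learner that the ensembles below are built from.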

**Decision Tree**

A flowchart-like tree structure in which each internal node tests a feature, each branch represents a decision rule, and each leaf node represents an outcome. The tree is built by repeatedly dividing the feature space into smaller regions, choosing at each step the split that best separates the classes.

Code:

from sklearn.tree import DecisionTreeClassifier

DT_model = DecisionTreeClassifier(criterion='entropy')

DT_model = DT_model.fit(x_train, y_train)

*criterion='entropy' scores each candidate split by its information gain, i.e. how much the split increases the purity of the resulting nodes
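To make the entropy criterion concrete, here is a minimal pure-Python sketch (not scikit-learn's optimized implementation, which differs in details) that scores every threshold on a single numeric feature by information gain: the parent's entropy minus the size-weighted entropy of the two children. The helper names `entropy` and `best_split` are hypothetical. A full tree runs this search over every feature and then recurses into each child.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def best_split(xs, ys):
    """Try each threshold t on one numeric feature (split: x <= t vs x > t)
    and return (best_threshold, information_gain)."""
    parent = entropy(ys)
    best = (None, 0.0)
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        # children's entropy, weighted by how many samples each child receives
        children = (len(left) * entropy(left) + len(right) * entropy(right)) / len(ys)
        gain = parent - children
        if gain > best[1]:
            best = (t, gain)
    return best

xs = [1, 2, 3, 4, 5, 6]
ys = ["a", "a", "a", "b", "b", "b"]
print(entropy(ys))         # 1.0
print(best_split(xs, ys))  # (3, 1.0)
```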

**Random Forest**

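A meta-estimator that fits a number of decision tree classifiers on various bootstrap sub-samples of the dataset and averages their predictions to improve predictive accuracy and control over-fitting. Each tree also considers only a random subset of the features at each split, which makes the trees less correlated.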

Code:

from sklearn.ensemble import RandomForestClassifier

RF_model = RandomForestClassifier(n_estimators=100)

RF_model = RF_model.fit(x_train, y_train)

*n_estimators = number of trees in the forest
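The two sources of randomness in a random forest can be sketched in plain Python. This is a simplified illustration under stated assumptions, not scikit-learn's algorithm: each "tree" is a depth-1 stump (so picking random features per tree is the same as per split), the stump minimises training errors rather than entropy, and predictions are combined by hard majority vote, whereas `RandomForestClassifier` averages the trees' predicted class probabilities. All function names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y, features):
    """Depth-1 tree: choose the (feature, threshold) among `features` with the
    fewest training errors when each side predicts its majority class."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_cls = Counter(left).most_common(1)[0][0]
            r_cls = Counter(right).most_common(1)[0][0] if right else l_cls
            errors = sum(lab != l_cls for lab in left) + sum(lab != r_cls for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_cls, r_cls)
    _, f, t, l_cls, r_cls = best
    return lambda row: l_cls if row[f] <= t else r_cls

def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        feats = rng.sample(range(d), max_features)  # random subset of features
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return trees

def predict_forest(trees, row):
    """Combine the trees by majority vote."""
    return Counter(tree(row) for tree in trees).most_common(1)[0][0]

X = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50], [6, 60]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y)
print(predict_forest(forest, [1, 10]), predict_forest(forest, [6, 60]))
```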

**Adaboost Classifier**

A method that combines weak learners into a strong learner. Boosting algorithms add models sequentially: after each round, AdaBoost increases the weights of the training samples that were misclassified, so the next weak learner focuses on them, and it weights each learner's vote by its accuracy. This reduces the bias of the model and typically improves accuracy.

Code:

from sklearn.ensemble import AdaBoostClassifier

AB_model = AdaBoostClassifier(n_estimators=100, learning_rate=0.01)

AB_model = AB_model.fit(x_train, y_train)

*learning_rate = weight applied to each classifier at each boosting iteration; a smaller value shrinks each classifier's contribution, so more estimators are usually needed
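The reweighting idea can be sketched as classic binary AdaBoost with decision stumps on one feature and labels in {-1, +1}. Assumptions: the learner weight uses the original binary formula alpha = learning_rate * 0.5 * ln((1 - err) / err); scikit-learn's `AdaBoostClassifier` uses the multi-class SAMME variant, so its weights differ. Function names are hypothetical.

```python
import math

def fit_weighted_stump(xs, ys, w):
    """Best 1-D stump under sample weights w: predict `sign` if x > t else -sign."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            preds = [sign if x > t else -sign for x in xs]
            err = sum(wi for wi, p, yi in zip(w, preds, ys) if p != yi)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best

def adaboost(xs, ys, n_estimators=10, learning_rate=1.0):
    """Classic binary AdaBoost; ys must be -1 or +1."""
    n = len(xs)
    w = [1.0 / n] * n                          # start with uniform sample weights
    ensemble = []                              # (alpha, threshold, sign) per stump
    for _ in range(n_estimators):
        err, t, sign = fit_weighted_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
        alpha = learning_rate * 0.5 * math.log((1 - err) / err)  # better stumps get larger weight
        ensemble.append((alpha, t, sign))
        # multiply by exp(+alpha) if misclassified, exp(-alpha) if correct, then renormalise
        w = [wi * math.exp(-alpha * yi * (sign if x > t else -sign))
             for wi, x, yi in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * (sign if x > t else -sign) for alpha, t, sign in ensemble)
    return 1 if score >= 0 else -1

xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]  # no single threshold separates these classes
model = adaboost(xs, ys, n_estimators=3)
print([adaboost_predict(model, x) for x in xs])
```

On this toy data (a positive, negative, positive pattern) no single stump can separate the classes, but three boosted stumps classify all six points correctly.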

**Bagging Classifier**

Leverages the bootstrap procedure to draw random subsamples of the training data (with replacement) and trains one model on each. For prediction, a bagging classifier outputs the class that receives the most votes across the models, and a bagging regressor outputs the average of all the models' predictions.

Code:

from sklearn.ensemble import BaggingClassifier

BC_model = BaggingClassifier(n_estimators=100, max_samples=0.7)

BC_model = BC_model.fit(x_train, y_train)

*max_samples = number (int) or fraction (float) of samples to draw from X to train each base estimator; 0.7 means 70% of the training rows
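Bootstrap sampling and the two aggregation rules can be sketched independently of any particular model. In this hypothetical sketch, `bag` trains one model per bootstrap sample drawn with replacement (`max_samples=0.7` is read as 70% of the rows, rounded down), `vote` aggregates like a bagging classifier and `average` like a bagging regressor. The 1-nearest-neighbour base learner is an assumption chosen only to keep the example short; `BaggingClassifier` uses a decision tree by default.

```python
import random
from collections import Counter
from statistics import mean

def bootstrap(rows, max_samples, rng):
    """Draw int(max_samples * n) rows with replacement (at least one)."""
    k = max(1, int(max_samples * len(rows)))
    return [rng.choice(rows) for _ in range(k)]

def bag(fit, rows, n_estimators=10, max_samples=0.7, seed=0):
    """Train one model per bootstrap sample."""
    rng = random.Random(seed)
    return [fit(bootstrap(rows, max_samples, rng)) for _ in range(n_estimators)]

def vote(models, x):
    """Bagging classifier: the prediction with the most votes."""
    return Counter(m(x) for m in models).most_common(1)[0][0]

def average(models, x):
    """Bagging regressor: the mean of all predictions."""
    return mean(m(x) for m in models)

def fit_1nn(rows):
    """Hypothetical base learner: 1-nearest neighbour on a single numeric feature."""
    return lambda x: min(rows, key=lambda r: abs(r[0] - x))[1]

labels = [(1, "a"), (2, "a"), (3, "a"), (7, "b"), (8, "b"), (9, "b")]
print(vote(bag(fit_1nn, labels, n_estimators=25), 1.5))

targets = [(1, 1.0), (2, 2.0), (3, 3.0), (4, 4.0)]
print(average(bag(fit_1nn, targets), 2.5))
```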

**Gradient Boost Classifier**

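An additive, tree-based classification method in which trees are built in series: each new tree is fit to the errors (the negative gradient of the loss) left by the trees before it, and its output, scaled by learning_rate, is added to the ensemble's prediction. Candidate splits are compared using a mathematically derived score, and each leaf contributes a weighted value to the final prediction.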

Code:

from sklearn.ensemble import GradientBoostingClassifier

GBC_model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)

GBC_model = GBC_model.fit(x_train, y_train)
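The "built in series" idea is easiest to see with squared-error regression, where the negative gradient of the loss is simply the residual, so each new tree is fit to what the ensemble still gets wrong. The sketch below is a simplified illustration, not `GradientBoostingClassifier` itself: it boosts least-squares stumps on one feature, whereas the classifier fits regression trees (depth 3 by default) to gradients of the log-loss. Function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump on one feature: (threshold, left_value, right_value)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]

def gradient_boost(xs, ys, n_estimators=100, learning_rate=0.1):
    base = sum(ys) / len(ys)  # initial model: predict the mean
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_estimators):
        # squared-error loss: the negative gradient equals the residual y - prediction
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        # add the new stump's output, shrunk by the learning rate
        preds = [p + learning_rate * (lv if x <= t else rv) for x, p in zip(xs, preds)]
    return base, learning_rate, stumps

def gb_predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lv if x <= t else rv for t, lv, rv in stumps)

xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.8]

def train_mse(model):
    return sum((gb_predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

print(train_mse(gradient_boost(xs, ys, n_estimators=1)))
print(train_mse(gradient_boost(xs, ys, n_estimators=100)))
```

Because each stump is a least-squares fit to the current residuals and the learning rate is at most 1, no step increases the training error; smaller learning rates make each step more cautious and usually need more estimators.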

**Evaluate the score for each of the above models using cross_val_score**

from sklearn.model_selection import KFold, cross_val_score

k_fold = KFold(n_splits=10, shuffle=True, random_state=42)  # assumed definition of k_fold: 10 shuffled folds

DecisionTree = cross_val_score(DT_model, x_train, y_train, cv=k_fold, n_jobs=1, scoring='accuracy').mean()

RandomForest = cross_val_score(RF_model, x_train, y_train, cv=k_fold, n_jobs=1, scoring='accuracy').mean()

Adaboost = cross_val_score(AB_model, x_train, y_train, cv=k_fold, n_jobs=1, scoring='accuracy').mean()

Bagging = cross_val_score(BC_model, x_train, y_train, cv=k_fold, n_jobs=1, scoring='accuracy').mean()

GradientBoost = cross_val_score(GBC_model, x_train, y_train, cv=k_fold, n_jobs=1, scoring='accuracy').mean()
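`cv=k_fold` asks `cross_val_score` to fit a fresh copy of the model on k-1 folds, score it on the held-out fold, and repeat k times. A plain-Python sketch of that loop follows; `kfold_indices`, `cross_val_accuracy`, and the majority-class baseline are hypothetical helpers, and the fold sizes mirror scikit-learn's `KFold` (the first n % k folds get one extra sample).

```python
import random
from collections import Counter

def kfold_indices(n, n_splits=5, shuffle=True, seed=42):
    """Yield (train_idx, test_idx); every sample lands in exactly one test fold."""
    idx = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(idx)
    # like scikit-learn's KFold, the first n % n_splits folds get one extra sample
    sizes = [n // n_splits + (1 if i < n % n_splits else 0) for i in range(n_splits)]
    start = 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size

def cross_val_accuracy(fit, X, y, n_splits=5):
    """Mean held-out accuracy, analogous to cross_val_score(...).mean()."""
    scores = []
    for train, test in kfold_indices(len(X), n_splits):
        model = fit([X[i] for i in train], [y[i] for i in train])  # fresh model per fold
        correct = sum(model(X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)

def fit_majority(X, y):
    """Hypothetical baseline: always predict the most common training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label

X = list(range(20))
y = [0] * 14 + [1] * 6
print(cross_val_accuracy(fit_majority, X, y))
```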

**Print and compare the results**

import pandas as pd

models = pd.DataFrame({'Models': ['GradientBoost', 'Bagging', 'Adaboost', 'DecisionTree', 'RandomForest'], 'Score': [GradientBoost, Bagging, Adaboost, DecisionTree, RandomForest]})

print(models.sort_values(by='Score', ascending=False))

Hope this article helps.

© 2020 TechTarget, Inc.
