
An opinion from a team of experts generally yields better results, and gives us more confidence, than a single person's opinion. That is exactly what ensemble techniques do: multiple models are built and their results are combined, giving us improved outcomes.

Here are a few popular techniques.
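The snippets below assume training data x_train and y_train (and a held-out x_test, y_test) already exist; they are not defined in the original post. As a minimal illustrative setup, you could generate a synthetic dataset like this:

# Hypothetical setup so the snippets below can run end to end;
# substitute your own dataset in practice.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

x, y = make_classification(n_samples=1000, n_features=20, random_state=42)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)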

Decision Tree

A flowchart-like tree structure where each internal node represents a feature, each branch represents a decision rule, and each leaf node represents the outcome. It repeatedly splits the feature space into sub-regions by identifying decision boundaries. A single tree is not an ensemble by itself, but it is the base learner for most of the ensemble methods below.

Code:

from sklearn.tree import DecisionTreeClassifier

DT_model = DecisionTreeClassifier(criterion='entropy')

DT_model = DT_model.fit(x_train, y_train)

 

*entropy measures the impurity of a node, so splits are chosen to maximize the resulting purity (information gain)
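To make the entropy criterion concrete, here is a small sketch (not part of the original snippet) that computes the entropy of a set of labels; a pure node scores 0, and a 50/50 two-class split scores 1 bit:

import numpy as np

def entropy(labels):
    # H = -sum(p * log2(p)) over the class proportions p
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

print(entropy([0, 0, 0, 0]))  # pure node: entropy 0
print(entropy([0, 0, 1, 1]))  # 50/50 split: entropy 1 bit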

 

Random Forest

A meta-estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.

Code:

from sklearn.ensemble import RandomForestClassifier

RF_model = RandomForestClassifier(n_estimators=100)

RF_model = RF_model.fit(x_train, y_train)

 

*n_estimators = number of trees (base estimators) in the ensemble
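To see the averaging at work, you can combine the per-tree probability estimates yourself. This sketch (using the fitted RF_model and the x_test assumed in the setup above) should reproduce what the forest's predict_proba does internally:

import numpy as np

# Class-probability estimates of each fitted tree: (n_trees, n_samples, n_classes).
per_tree = np.array([tree.predict_proba(x_test) for tree in RF_model.estimators_])

# Averaging over trees should match the forest's own prediction.
avg_proba = per_tree.mean(axis=0)
print(np.allclose(avg_proba, RF_model.predict_proba(x_test)))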

 

AdaBoost Classifier

A method that converts weak learners into a strong learner. Boosting algorithms add models sequentially, re-weighting the training samples that earlier learners misclassified so that later learners focus on them, and weighting each learner's vote by its accuracy. This reduces the model's bias and typically improves accuracy.

Code:

from sklearn.ensemble import AdaBoostClassifier

AB_model = AdaBoostClassifier(n_estimators=100, learning_rate=0.01)

AB_model = AB_model.fit(x_train, y_train)

 

*learning_rate = shrinks the contribution of each weak learner by the value defined
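Because boosting is sequential, scikit-learn exposes staged_predict to inspect the partial ensemble after each iteration. A short sketch, using the x_test and y_test assumed in the setup above, to watch the accuracy evolve:

from sklearn.metrics import accuracy_score

# Accuracy of the partial ensemble after each boosting iteration.
staged_acc = [accuracy_score(y_test, pred) for pred in AB_model.staged_predict(x_test)]
print(staged_acc[0], staged_acc[-1])  # first weak learner vs. full ensemble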

 

Bagging Classifier

Leverages the bootstrap procedure to draw different random subsamples from the training data and fits one base model per subsample. For prediction, a bagging classifier takes the majority vote across the models to produce its output, while a bagging regressor averages the models' outputs.

Code:

from sklearn.ensemble import BaggingClassifier

BC_model = BaggingClassifier(n_estimators=100, max_samples=.7)

BC_model = BC_model.fit(x_train, y_train)

 

*max_samples = the fraction (or number) of samples to draw from X to train each base estimator
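The vote can be reproduced by hand from the fitted base estimators. An illustrative sketch, assuming integer class labels and the x_test from the setup above (note that scikit-learn's BaggingClassifier actually averages predicted probabilities when the base estimator supports them, which amounts to a soft vote):

import numpy as np

# Predictions of each base estimator: shape (n_estimators, n_samples).
all_preds = np.array([est.predict(x_test) for est in BC_model.estimators_]).astype(int)

# Hard majority vote per sample.
majority = np.apply_along_axis(lambda votes: np.bincount(votes).argmax(), 0, all_preds)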

 

Gradient Boosting Classifier

An additive training method where trees are built in series, each new tree fitted to the residual errors (the gradient of the loss) of the ensemble so far. Candidate splits are chosen using a mathematically derived score, and the trees' contributions are combined through weighted leaf scores.

Code:

from sklearn.ensemble import GradientBoostingClassifier

GBC_model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)

GBC_model = GBC_model.fit(x_train, y_train)
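To illustrate the "trees built in series" idea, here is a hand-rolled two-stage sketch of gradient boosting for regression, where under squared loss the gradient is simply the residual. This is a toy illustration, not scikit-learn's implementation:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

lr = 0.1
# Stage 0: start from a constant prediction (the mean of the targets).
pred = np.full(len(y_train), y_train.mean(), dtype=float)

# Each stage fits a small tree to the current residuals and adds a
# shrunken version of its output to the ensemble's prediction.
for _ in range(2):
    residuals = y_train - pred
    tree = DecisionTreeRegressor(max_depth=3).fit(x_train, residuals)
    pred += lr * tree.predict(x_train)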

 

Evaluate the score for each of the above models using cross_val_score:

 

from sklearn.model_selection import KFold, cross_val_score

# k_fold was not defined in the original snippet; 10-fold CV is assumed here.
k_fold = KFold(n_splits=10, shuffle=True, random_state=42)

 

DecisionTree = (cross_val_score(DT_model, x_train, y_train, cv=k_fold, n_jobs=1, scoring='accuracy').mean())

RandomForest = (cross_val_score(RF_model, x_train, y_train, cv=k_fold, n_jobs=1, scoring='accuracy').mean())

Adaboost = (cross_val_score(AB_model, x_train, y_train, cv=k_fold, n_jobs=1, scoring='accuracy').mean())

Bagging = (cross_val_score(BC_model, x_train, y_train, cv=k_fold, n_jobs=1, scoring='accuracy').mean())

GradientBoost = (cross_val_score(GBC_model, x_train, y_train, cv=k_fold, n_jobs=1, scoring='accuracy').mean())

 

Print and compare the results:

 

import pandas as pd

models = pd.DataFrame({'Models': ['GradientBoost', 'Bagging', 'Adaboost', 'DecisionTree', 'RandomForest'],
                       'Score': [GradientBoost, Bagging, Adaboost, DecisionTree, RandomForest]})

models.sort_values(by='Score', ascending=False)

Hope this article helps.

Tags: #ensembletechniques, #machinelearning
