Surrogate models can help explain machine learning models of medium to high complexity. A surrogate is a simpler model that is used to explain a more complex one. It is assumed to be indicative of the internal mechanisms of the complex model, but it cannot perfectly represent the underlying response function, nor can it capture every complex feature relationship. Surrogates help users understand how the model’s predictions trend as selected independent variables vary. From past real-world experience, users often have clear expectations about how the model’s outputs should change when the inputs are varied in a particular way, and surrogate models capture these input-output relations. With simple plots of the input-output relations generated from the surrogate models, you can explain the model’s response to selected attributes within a specific range. To explain the model fully, you need to train multiple surrogate models, each using one or more inputs selected from a set of important attributes of the model.
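
A minimal sketch of such a single-attribute surrogate, in standard-library Python: `black_box_predict` is a hypothetical stand-in for the complex model, the data are synthetic, and the surrogate is assumed to be a straight line fitted to the black box’s predictions as a function of one attribute, `x1`. The printed table plays the role of the input-output plot.

```python
import random


def black_box_predict(row):
    # Hypothetical stand-in for an opaque model: a nonlinear function of two attributes.
    x1, x2 = row
    return 3.0 * x1 + 0.5 * x1 * x2 + x2 ** 2


def fit_line(xs, ys):
    # Ordinary least squares for y ≈ intercept + slope * x.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


random.seed(0)
data = [(random.uniform(0, 10), random.uniform(0, 2)) for _ in range(500)]
preds = [black_box_predict(row) for row in data]

# One-attribute surrogate: explain the black box's response to x1 only.
intercept, slope = fit_line([row[0] for row in data], preds)

# Tabulate the surrogate's response over the range of interest (a text "plot").
for x1 in range(0, 11, 2):
    print(f"x1={x1:2d}  surrogate prediction={intercept + slope * x1:6.2f}")
```

Because `x1` enters the hypothetical model through an interaction with `x2`, the fitted slope summarizes an average trend rather than the exact response, which is the trade-off described above.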

Training a surrogate model is one of the easiest ways to interpret the behavior of an existing machine learning model. You don’t need to know anything about the internals of the production model; you can treat it as a black box that receives input data and returns an output. Training a surrogate model requires only the following:

- An existing machine learning model.

- Input data that can be processed by the existing model. This can be real-world data from the production environment.

Follow these steps:

1. Pass the data (independent variables) into the black box model and get its predictions.

2. Train the surrogate model, using the independent variables from the input data as features and the black box’s predictions as the dependent variable.

3. Measure how closely the surrogate reproduces the black box: calculate the surrogate model’s prediction error with respect to the black box’s predictions (not the true outcomes). The smaller the error, the better the surrogate model explains the black box.
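
The three steps can be sketched in standard-library Python as follows. The black box (`black_box_predict`), the synthetic inputs, and the choice of a linear surrogate fitted by least squares are all assumptions for illustration; step 3 uses R², computed against the black box’s predictions rather than true labels, as one common way to express how closely the surrogate reproduces the black box.

```python
import random


def black_box_predict(row):
    # Hypothetical stand-in for the production model; only its outputs are used.
    x1, x2, x3 = row
    return 2.0 * x1 - 1.0 * x2 + 0.1 * x3 ** 2


def solve(a, b):
    # Gaussian elimination with partial pivoting for a small system a @ w = b.
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_linear(rows, targets):
    # Least squares via the normal equations (X^T X) w = X^T y, with an intercept column.
    X = [[1.0, *row] for row in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(X, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict_linear(w, row):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))


def r_squared(reference, approx):
    # Share of the reference (black-box) variance that the approximation reproduces.
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - a) ** 2 for r, a in zip(reference, approx))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


random.seed(42)
inputs = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(1000)]

# Step 1: query the black box for its predictions.
bb_preds = [black_box_predict(row) for row in inputs]
# Step 2: train the surrogate on the black box's predictions, not on true labels.
weights = fit_linear(inputs, bb_preds)
# Step 3: measure how closely the surrogate reproduces the black box.
surrogate_preds = [predict_linear(weights, row) for row in inputs]
fidelity = r_squared(bb_preds, surrogate_preds)

print("weights:", [round(w, 2) for w in weights])
print(f"R^2 against black-box predictions: {fidelity:.3f}")
```

In practice the fitting would usually be done with a library such as scikit-learn; the hand-written normal-equation solver only keeps the sketch dependency-free.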

When we obtain a surrogate model with an acceptable prediction error, we can inspect its parameters to understand which features are important and how the black box model works. Because surrogate models are trained only on the black box model’s predictions rather than on the real outcomes, they explain the model, not the real data.
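
The text does not prescribe a surrogate family. As one possible choice, this sketch swaps in a depth-one regression tree (a decision stump), whose few parameters can be read directly: the feature it splits on, the threshold, and the two leaf values. The black box, the feature names `income` and `age`, and the data are all hypothetical.

```python
import random


def black_box_predict(row):
    # Hypothetical opaque scoring model with a hard cut-off on income.
    income, age = row
    return (0.9 if income > 50 else 0.2) + 0.001 * age


def fit_stump(rows, targets):
    # Exhaustive search for the single split that minimizes squared error
    # against the black box's predictions.
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(row[j] for row in rows))
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [t for row, t in zip(rows, targets) if row[j] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[j] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((t - left_mean) ** 2 for t in left)
                   + sum((t - right_mean) ** 2 for t in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    return best[1:]


random.seed(1)
feature_names = ["income", "age"]
rows = [(random.uniform(0, 100), random.uniform(18, 80)) for _ in range(300)]
bb_preds = [black_box_predict(row) for row in rows]

feature, threshold, left_value, right_value = fit_stump(rows, bb_preds)

# Check fidelity before trusting the parameters: R^2 against the black box's predictions.
stump_preds = [left_value if row[feature] <= threshold else right_value for row in rows]
mean_bb = sum(bb_preds) / len(bb_preds)
ss_res = sum((b - s) ** 2 for b, s in zip(bb_preds, stump_preds))
ss_tot = sum((b - mean_bb) ** 2 for b in bb_preds)
fidelity = 1 - ss_res / ss_tot

print(f"split on {feature_names[feature]} at {threshold:.1f}: "
      f"{left_value:.2f} below, {right_value:.2f} above (R^2 = {fidelity:.3f})")
```

Here the split feature points to the attribute that drives the hypothetical black box, and the leaf values describe how its output changes across the threshold.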

In a global surrogate model, the globally interpretable attributes of a simple model are used to explain the global behavior of a more complex model. Like global surrogate models, local surrogate models are simple models of complex models, but they are trained only on certain interesting rows of data (for instance, the best customers in a dataset, or the pieces of equipment most likely to fail according to some model’s predictions). A global surrogate aims to explain the whole logic of the model, while a local surrogate is only interested in explaining predictions within a limited range of the input variables.
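
A sketch of the difference, under assumed conditions: a hypothetical risk model (`black_box_predict`) that grows quadratically with a single input, `load`, a global linear surrogate fitted to all rows, and a local linear surrogate fitted only to the 10% of rows with the highest predicted risk.

```python
import random


def black_box_predict(load):
    # Hypothetical failure-risk model: risk grows quadratically with operating load.
    return 0.01 * load ** 2


def fit_line(xs, ys):
    # Ordinary least squares for y ≈ intercept + slope * x.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


random.seed(7)
loads = [random.uniform(0, 10) for _ in range(1000)]
risks = [black_box_predict(x) for x in loads]

# Global surrogate: one line fitted to every row.
_, global_slope = fit_line(loads, risks)

# Local surrogate: fitted only to the interesting rows, here the 10% of
# machines with the highest predicted risk.
cutoff = sorted(risks)[int(0.9 * len(risks))]
top_rows = [(x, r) for x, r in zip(loads, risks) if r >= cutoff]
_, local_slope = fit_line([x for x, _ in top_rows], [r for _, r in top_rows])

print(f"global slope: {global_slope:.3f}, local slope (riskiest 10%): {local_slope:.3f}")
```

Because the hypothetical model is steeper at high loads, the local slope should come out noticeably larger than the global one: the global surrogate averages over the whole input range, while the local surrogate describes behavior only where the interesting rows lie.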

See you next time.

Janardhanan PS

Machine Learning Evangelist

© 2020 Data Science Central ®
