
Surrogate models can help explain machine learning models of medium to high complexity. A surrogate is a simpler model trained to approximate a more complex one. It is assumed to be indicative of the internal mechanisms of the complex model, but it cannot perfectly represent the underlying response function or capture every complex feature relationship. Surrogates help users understand how the model's predictions trend as selected attributes from the set of independent variables are varied. From real-world experience, users usually have clear expectations about how the model outputs should change when the inputs are varied in a particular way, and surrogate models capture these input-output relations. With simple plots of the input-output relations generated from a surrogate model, you can explain the response of the model to selected attributes over a specific range. To explain the model fully, you need to train multiple surrogate models, each using one or more inputs selected from the model's important attributes.

Training a surrogate model is one of the easiest ways to interpret the behavior of an existing machine learning model. To train a surrogate model, you don’t need to know anything about the internals of the production model; you can treat it as a black box that takes input data and returns an output. Training a surrogate model requires only the following:

- An existing machine learning model.

- Input data that can be processed by the existing model. This can be real-world data from the production environment.

Follow these steps:

1. Pass the data (the independent variables) into the black box model and record its predictions.

2. Train the surrogate model, using the independent variables from the input data as features and the predictions from the black box as the dependent variable.

3. Calculate the prediction error of the surrogate model, measured against the predictions of the black box rather than the real outcomes. The smaller the error, the better the surrogate model explains the black box.
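The three steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `black_box` is a hypothetical stand-in for the production model, the input data is randomly generated, and the surrogate is an ordinary least-squares linear model fitted with a small hand-written solver. In practice you would more likely use a library such as scikit-learn.

```python
import random

def black_box(x1, x2):
    """Hypothetical stand-in for the production model (treated as opaque)."""
    return 3.0 * x1 - 2.0 * x2 + 0.5 * x1 * x2

def fit_linear(rows, targets):
    """Ordinary least squares with an intercept, solved via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * t for x, t in zip(X, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(coefs, row):
    """Linear prediction: intercept plus weighted sum of the features."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))

def r_squared(y_ref, y_hat):
    """Share of the variance in y_ref that y_hat reproduces."""
    mean = sum(y_ref) / len(y_ref)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_ref, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y_ref)
    return 1.0 - ss_res / ss_tot

random.seed(0)
rows = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(500)]

# Step 1: get the black box predictions for the input data.
bb_pred = [black_box(*r) for r in rows]

# Step 2: train the surrogate on (inputs, black box predictions).
coefs = fit_linear(rows, bb_pred)  # [intercept, weight_x1, weight_x2]

# Step 3: measure how closely the surrogate follows the black box.
sur_pred = [predict(coefs, r) for r in rows]
fidelity = r_squared(bb_pred, sur_pred)

print("surrogate coefficients:", coefs)
print("fidelity (R^2 against the black box):", fidelity)
```

Here the fidelity stays slightly below 1 because the linear surrogate cannot represent the `x1 * x2` interaction in the hypothetical black box, which is the kind of complex feature relationship a simple surrogate is expected to miss.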

Once we have a surrogate model with an acceptable prediction error, we can inspect its parameters to understand which features are important and how the black box model behaves. Because surrogate models are trained on the predictions of the black box model rather than on the real outcomes, they can only interpret the model, not the real data.
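The difference between interpreting the model and interpreting the data can be shown with a small, contrived sketch. Everything here is an assumption made for illustration: the real outcome depends on both `x1` and `x2`, while the hypothetical `black_box` ignores `x2`. A one-feature linear surrogate fitted to the black box predictions then reports that `x2` has almost no effect, which is true of the model but not of the data.

```python
import random

def black_box(x1, x2):
    """Hypothetical production model that ignores x2 entirely."""
    return 2.0 * x1

def slope(xs, ys):
    """Slope of a one-feature least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

random.seed(1)
x1 = [random.uniform(-1, 1) for _ in range(2000)]
x2 = [random.uniform(-1, 1) for _ in range(2000)]

# Assumed ground truth: the real outcome does depend on x2.
real_outcome = [2.0 * a + 1.5 * b for a, b in zip(x1, x2)]
bb_pred = [black_box(a, b) for a, b in zip(x1, x2)]

# Surrogate view: effect of x2 on the black box predictions.
surrogate_slope_x2 = slope(x2, bb_pred)
# Data view: effect of x2 on the real outcome.
data_slope_x2 = slope(x2, real_outcome)

print("x2 effect according to the surrogate:", surrogate_slope_x2)
print("x2 effect in the real data:", data_slope_x2)
```

The surrogate faithfully reports what the black box does with `x2`, and says nothing about whether the black box is right.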

In a global surrogate, the globally interpretable attributes of a simple model are used to explain the global attributes of a more complex model. Like global surrogate models, local surrogate models are simple models of complex models, but they are trained only on certain interesting rows of data (for instance, the best customers in a dataset or the most-likely-to-fail pieces of equipment according to some model’s predictions). A global surrogate aims to explain the whole logic of the model, while a local surrogate is only concerned with explaining predictions within a limited range of the input variables.
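To make the global versus local distinction concrete, here is a minimal sketch using only the Python standard library. The `black_box` function, the input range, and the "interesting" region between 1 and 2 are all assumptions chosen for illustration. A single straight line is a poor global surrogate for this nonlinear model, but a line fitted only to the rows in the region of interest follows it closely there.

```python
import random

def black_box(x):
    """Hypothetical nonlinear model, treated as opaque."""
    return x * x

def fit_line(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def r_squared(xs, ys, intercept, slope):
    """Share of the variance in ys reproduced by the fitted line."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

random.seed(0)
xs = [random.uniform(-2, 2) for _ in range(1000)]
ys = [black_box(x) for x in xs]

# Global surrogate: one line fitted to all rows.
g_int, g_slope = fit_line(xs, ys)
global_r2 = r_squared(xs, ys, g_int, g_slope)

# Local surrogate: the same kind of model, fitted only to the rows of interest.
local_xs = [x for x in xs if 1.0 <= x <= 2.0]
local_ys = [black_box(x) for x in local_xs]
l_int, l_slope = fit_line(local_xs, local_ys)
local_r2 = r_squared(local_xs, local_ys, l_int, l_slope)

print("global fidelity:", global_r2)
print("local fidelity:", local_r2, "local slope:", l_slope)
```

The local slope is only meaningful inside the selected region; it says nothing about how the black box behaves for inputs outside it.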

See you next time.

Janardhanan PS

Machine Learning Evangelist


Posted 12 April 2021
