Model interpretability — Making your model confess: Surrogate models
For an introduction to the subject, see my previous post, Introduction to Model Interpretability.
As we saw in my Introduction to Model Interpretability, the most straightforward way to get an interpretable machine learning model is to use an algorithm that produces interpretable models by design, such as Linear Models or Decision Trees.
General Idea
Surrogate models extend this idea: we train an interpretable model to “mimic” the behavior of a black-box model, hoping that by understanding the mimic we gain an understanding of how the black-box model behaves. Surrogate models are therefore a model-agnostic method, since they do not require any information about the inner workings of the black-box model.
Local vs Global
This method is usually called a “global” surrogate model, since the interpretable model tries to approximate the behavior of the black-box model over the entire input space. There is also a variation called “local” surrogate models, where the interpretable model only approximates the behavior of the black-box in a restricted region of the input space (i.e. the neighbourhood of a specific sample). The popular framework LIME belongs to this category. Check my post about LIME for a description of the method.
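To give a rough feel for the local flavor, here is a minimal sketch using the lime package together with scikit-learn. The iris data and the random-forest black-box are illustrative assumptions chosen only to make the example runnable; the method itself is described in the LIME post.

```python
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# A black-box classifier trained on the iris data (purely illustrative).
data = load_iris()
X, y = data.data, data.target
black_box = RandomForestClassifier(random_state=0).fit(X, y)

# LIME fits a local surrogate around one specific sample.
explainer = LimeTabularExplainer(
    X, feature_names=data.feature_names, mode="classification"
)
explanation = explainer.explain_instance(
    X[0], black_box.predict_proba, num_features=4
)

# Feature contributions for this single prediction only.
print(explanation.as_list())
```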

To train a surrogate for a black-box model M that was trained on a dataset X with target y, we have to (a code sketch follows the list):
- Run the model M on the training dataset X to generate its predictions ŷ
- Generate a new dataset by taking X and replacing the target column y with the predictions column ŷ
- Select an interpretable model to be the surrogate model. This can be Linear Models, Logistic Regression, Decision Trees, Naïve Bayes, or K-nearest neighbors.
- Train the interpretable model on the new dataset (the target is ŷ)
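Here is a minimal sketch of these steps with scikit-learn. The dataset and model choices (the diabetes data, a gradient-boosted ensemble as the black-box M, a shallow decision tree as the surrogate) are assumptions made only to keep the example self-contained.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor, export_text

# A dataset X with target y (diabetes is just a convenient example).
X, y = load_diabetes(return_X_y=True)

# The "black-box" model M, trained on X and y as usual.
black_box = GradientBoostingRegressor(random_state=0).fit(X, y)

# Step 1: run M on the training data to get its predictions ŷ.
y_hat = black_box.predict(X)

# Steps 2-4: keep the same features X, replace the target y with ŷ,
# and fit an interpretable model (here a shallow decision tree) on (X, ŷ).
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y_hat)

# R² of the surrogate against the black-box predictions: a rough measure
# of how faithfully the surrogate mimics M on the training data.
print("fidelity (R² vs. black-box):", surrogate.score(X, y_hat))

# The surrogate's structure can now be inspected directly.
print(export_text(surrogate))
```

Inspecting the surrogate tree (its splits and thresholds) then serves as a proxy for understanding how the black-box model behaves.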