Model interpretability — Making your model confess: Surrogate models
For an introduction to the subject, see my previous post, Introduction to Model Interpretability.
As we saw in my Introduction to Model Interpretability, the most straightforward way to get an interpretable machine learning model is to use an algorithm that produces interpretable models by design, such as Linear Models or Decision Trees.
General Idea
Surrogate models extend this idea: we train an interpretable model to “mimic” the behavior of a black-box model, hoping that by understanding the mimic we gain an understanding of how the black-box model behaves. Surrogate models are therefore a model-agnostic method, since they do not require any information about the inner workings of the black-box model.
Local vs Global
This method is usually called a “global” surrogate model, since the interpretable model tries to approximate the behavior of the black-box model over the entire input space. There is also a variation called “local” surrogate models, where the interpretable model only approximates the behavior of the black-box in a restricted region of the input space (i.e. the neighbourhood of a specific sample). The popular framework LIME belongs to this category. Check my post about LIME for a description of the method.
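To give a rough feel for the local flavor, here is a minimal sketch using the lime package together with scikit-learn. The iris data and the random-forest black-box are illustrative assumptions chosen only to make the example runnable; the method itself is described in the LIME post.

```python
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# A black-box classifier trained on the iris data (purely illustrative).
data = load_iris()
X, y = data.data, data.target
black_box = RandomForestClassifier(random_state=0).fit(X, y)

# LIME fits a local surrogate around one specific sample.
explainer = LimeTabularExplainer(
    X, feature_names=data.feature_names, mode="classification"
)
explanation = explainer.explain_instance(
    X[0], black_box.predict_proba, num_features=4
)

# Feature contributions for this single prediction only.
print(explanation.as_list())
```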

To train a surrogate for a black-box model M that was trained on a dataset X with target y, we have to (a code sketch follows the list):
- Run the model M on the training dataset X to generate its predictions ŷ
- Generate a new dataset by taking X and replacing the target column y with the predictions column ŷ
- Select an interpretable model to be the surrogate model. This can be Linear Models, Logistic Regression, Decision Trees, Naïve Bayes, or K-nearest neighbors.
- Train the interpretable model on the new dataset (the target is ŷ)
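Here is a minimal sketch of these steps with scikit-learn. The dataset and model choices (the diabetes data, a gradient-boosted ensemble as the black-box M, a shallow decision tree as the surrogate) are assumptions made only to keep the example self-contained.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor, export_text

# A dataset X with target y (diabetes is just a convenient example).
X, y = load_diabetes(return_X_y=True)

# The "black-box" model M, trained on X and y as usual.
black_box = GradientBoostingRegressor(random_state=0).fit(X, y)

# Step 1: run M on the training data to get its predictions ŷ.
y_hat = black_box.predict(X)

# Steps 2-4: keep the same features X, replace the target y with ŷ,
# and fit an interpretable model (here a shallow decision tree) on (X, ŷ).
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y_hat)

# R² of the surrogate against the black-box predictions: a rough measure
# of how faithfully the surrogate mimics M on the training data.
print("fidelity (R² vs. black-box):", surrogate.score(X, y_hat))

# The surrogate's structure can now be inspected directly.
print(export_text(surrogate))
```

Inspecting the surrogate tree (its splits and thresholds) then serves as a proxy for understanding how the black-box model behaves.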