Model interpretability — Making your model confess: Saliency maps

How to interpret NLP models using Saliency Maps: an example with transformers.

Facundo Santiago
12 min read · Jun 8, 2022

For an introduction to the topic of interpretability, see:

Introduction

Despite their performance, machine learning models are imperfect. This is not news to anyone in the ML field, where the question usually reduces to: how wrong can a model be before it stops being useful? Answering that question, however, is not so simple for deep learning models. Their opaque nature and their ability to fail in counterintuitive ways sometimes leave us clueless about their behavior.

Interpretability techniques try to alleviate this problem by explaining why a given prediction turned out the way it did. So far, we have explored techniques that make strong assumptions about a model's inputs (its features) and freely perturb them to build intuition about what the model is doing. However, such methods can't tell apart the effects of perturbing the data, a misbehaving model, and a misbehaving interpretability technique.
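To make the contrast concrete, the rest of this article turns to gradient-based saliency maps, which attribute a prediction directly to the model's inputs instead of perturbing them. Below is a minimal sketch of that idea for a transformer text classifier; the checkpoint, the example sentence, and the choice of the gradient's L2 norm as the per-token score are illustrative assumptions, not the exact implementation used later.

```python
# Minimal vanilla-gradient saliency sketch for a transformer classifier.
# The checkpoint and sentence below are illustrative assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

text = "The movie was surprisingly good."
inputs = tokenizer(text, return_tensors="pt")

# Embed the tokens ourselves so gradients can flow back to the embeddings.
embeddings = model.get_input_embeddings()(inputs["input_ids"]).detach()
embeddings.requires_grad_(True)

logits = model(inputs_embeds=embeddings,
               attention_mask=inputs["attention_mask"]).logits
predicted_class = logits.argmax(dim=-1).item()

# Backpropagate the score of the predicted class to the input embeddings.
logits[0, predicted_class].backward()

# Saliency per token: L2 norm of the gradient across the embedding dimension.
saliency = embeddings.grad.norm(dim=-1).squeeze(0)
for token, score in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]),
                        saliency.tolist()):
    print(f"{token:>12}  {score:.4f}")
```

Tokens with larger scores are those to which the predicted class score is locally most sensitive, which gives an explanation tied to the model itself rather than to an external perturbation scheme.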
