Model interpretability — Making your model confess: Saliency maps

How to interpret NLP models using Saliency Maps: an example with transformers.

Facundo Santiago
12 min read · Jun 8, 2022

For an introduction to the topic of interpretability, see:

Introduction

Despite their performance, machine learning models are imperfect. This is not news to anyone in the ML field, where the question usually reduces to: how wrong can a model be before it stops being useful? Answering that question, however, is not so simple for deep learning models. Their opaque nature and their ability to fail in counterintuitive ways sometimes leave us clueless about their behavior.

Interpretability techniques try to alleviate this problem by explaining why a given prediction turned out the way it did. So far, we have explored techniques that make strong assumptions about a model's inputs (its features) and freely perturb them to build intuition about what the model is doing. However, such methods can't tell apart the effects of perturbing the data, a misbehaving model, and a misbehaving interpretability technique.
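To make the contrast concrete, the rest of this article turns to gradient-based saliency maps, which attribute a prediction directly to the model's inputs instead of perturbing them. Below is a minimal sketch of that idea for a transformer text classifier; the checkpoint, the example sentence, and the choice of the gradient's L2 norm as the per-token score are illustrative assumptions, not the exact implementation used later.

```python
# Minimal vanilla-gradient saliency sketch for a transformer classifier.
# The checkpoint and sentence below are illustrative assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

text = "The movie was surprisingly good."
inputs = tokenizer(text, return_tensors="pt")

# Embed the tokens ourselves so gradients can flow back to the embeddings.
embeddings = model.get_input_embeddings()(inputs["input_ids"]).detach()
embeddings.requires_grad_(True)

logits = model(inputs_embeds=embeddings,
               attention_mask=inputs["attention_mask"]).logits
predicted_class = logits.argmax(dim=-1).item()

# Backpropagate the score of the predicted class to the input embeddings.
logits[0, predicted_class].backward()

# Saliency per token: L2 norm of the gradient across the embedding dimension.
saliency = embeddings.grad.norm(dim=-1).squeeze(0)
for token, score in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]),
                        saliency.tolist()):
    print(f"{token:>12}  {score:.4f}")
```

Tokens with larger scores are those to which the predicted class score is locally most sensitive, which gives an explanation tied to the model itself rather than to an external perturbation scheme.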
