Model interpretability — Making your model confess: Saliency maps
How to interpret NLP models using Saliency Maps: an example with transformers.
For an introduction to interpretability, see:
Introduction
Despite their performance, machine learning models are imperfect. This is not news to anyone in the ML field, where the question usually boils down to: how wrong can a model be before it stops being useful? Answering that question, however, is far from simple for deep learning models. Their opaque nature and their ability to fail in counterintuitive ways can leave us clueless about their behavior.
Interpretability techniques try to alleviate this problem by offering explanations of why a given prediction turned out the way it did. So far, we have explored techniques that make strong assumptions about a model's inputs (its features) and freely reshape them to build intuition about what the model is doing. However, such methods cannot distinguish between the effects of perturbing the data, a misbehaving model, and a misbehaving interpretability technique.
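To make concrete what "reshaping the inputs" looks like, here is a minimal sketch of one such perturbation-based explanation for a transformer classifier: each word is occluded in turn and the drop in the predicted class probability is taken as its importance. The model name, example sentence, and `occlusion_scores` helper are illustrative assumptions, not part of the original discussion.

```python
# Sketch of a perturbation-based explanation: delete one word at a time and
# measure how the prediction changes. Model choice is an assumption.
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

def occlusion_scores(sentence: str) -> list[tuple[str, float]]:
    """Score each word by the drop in the top-class probability when it is removed."""
    words = sentence.split()
    base = classifier(sentence)[0]  # e.g. {'label': 'POSITIVE', 'score': 0.99}
    scores = []
    for i in range(len(words)):
        perturbed = " ".join(words[:i] + words[i + 1:])
        pred = classifier(perturbed)[0]
        # For a binary model, if the predicted label flips, the probability of
        # the original class is 1 - score of the new top class.
        prob = pred["score"] if pred["label"] == base["label"] else 1.0 - pred["score"]
        scores.append((words[i], base["score"] - prob))
    return scores

print(occlusion_scores("The movie was surprisingly good despite its slow start"))
```

Note that each score here mixes two things: the model's behavior and the artificial sentences created by deleting words. That is precisely the ambiguity described above.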