$$ \newcommand{\pmi}{\operatorname{pmi}} \newcommand{\inner}[2]{\langle{#1}, {#2}\rangle} \newcommand{\Pb}{\operatorname{Pr}} \newcommand{\E}{\mathbb{E}} \newcommand{\RR}{\mathbf{R}} \newcommand{\script}[1]{\mathcal{#1}} \newcommand{\Set}[2]{\{{#1} : {#2}\}} \newcommand{\argmin}[2]{\underset{#1}{\operatorname{argmin}} {#2}} \newcommand{\optmin}[3]{ \begin{align*} & \underset{#1}{\text{minimize}} & & #2 \\ & \text{subject to} & & #3 \end{align*} } \newcommand{\optmax}[3]{ \begin{align*} & \underset{#1}{\text{maximize}} & & #2 \\ & \text{subject to} & & #3 \end{align*} } \newcommand{\optfind}[2]{ \begin{align*} & {\text{find}} & & #1 \\ & \text{subject to} & & #2 \end{align*} } $$

This paper addresses three questions:

- Why do we want interpretability in the context of supervised machine learning?
- What is meant by *interpretability* in the context of supervised machine learning?
- What tools do we have to study the interpretability of machine learning models, especially deep neural networks?

Its thesis is three-fold:

- The desire for interpretability arises from a desire to *trust* machine learning systems, to demonstrate *causality*, to be assured of *transferability* to unseen data, to uncover latent structure in the data (*informativeness*), and to facilitate *ethical decision-making*.
- There are two types of useful interpretations: those that afford *transparency* and those that provide for *post-hoc* explanations.
- Linear models are not necessarily more interpretable than deep neural networks.

Of the desiderata in (1), trust is ill-defined and causality is, I think,
a straw man. Ethical decision-making is crucial: the ethics of machine
learning algorithms are woefully underexamined when such algorithms
are deployed. Lipton mentions the questionable use of machine learning models
for predicting the chances of recidivism in courts of law. Also pressing, in
my mind, is the use of machine learning to create hyper-personalized
information filters that would cast individuals into static molds and,
more worrisome, treat them as objects to be optimized. But in this latter example,
ethical decision-making is not a matter of *interpretability* — the
models for filtering information are interpretable enough, at least at a macro
level; it is rather a matter of whether filtering *as such* is ethical.

The paper proposes two types of interpretability.

*Transparency* takes the form of *simulatability*, in that a *human* should
be able to simulate the model by hand; *decomposability*, in that
each part of the model admits an intuitive explanation; and *algorithmic
transparency*, in that the model should converge to a unique solution. The first point is
somewhat silly — the point of computers is to automate tasks that humans cannot
do in a reasonable amount of time; the second point is fine; the third point
is silly — unique solutions (one example of algorithmic transparency provided
by the author) are not even guaranteed in convex land, and
algorithmic determinism (another example provided by the author) is beside the
point in machine learning.

A model is *post-hoc interpretable* if its predictions admit retrospective
explanations. This section surveys a few standard techniques for querying the
activations of a trained model (t-SNE of learned representations, sensitivity
analyses, etc.). All of these post-hoc techniques are also *ad hoc*.

Lipton makes a wholly unconvincing argument that linear models are not
more interpretable than neural networks. He qualifies his argument by saying
that they are not *strictly* more interpretable, but this much is of course
obvious, for deep neural networks include linear models as a special case.
Lipton makes absolutely no appeal to the statistical properties that
accompany linear models, a surprising oversight. He states that we do not
have a theoretical reason why neural networks underperform linear models
in studying the natural world. This is false; the parameters in
a linear regression, for example, carry with them information about statistical
significance (if certain assumptions about the data hold true).
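To make the point concrete, here is a sketch of the classical inference that comes for free with ordinary least squares, assuming i.i.d. Gaussian noise; the data are synthetic and the coefficient values are made up for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
# Design matrix: intercept plus two random features.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 3.0, 0.0])  # third coefficient is truly zero
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# OLS fit and classical inference under the Gaussian-noise assumption.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
dof = n - X.shape[1]
sigma2 = resid @ resid / dof                 # unbiased noise-variance estimate
cov = sigma2 * np.linalg.inv(X.T @ X)        # covariance of beta_hat
t_stats = beta_hat / np.sqrt(np.diag(cov))
p_values = 2 * stats.t.sf(np.abs(t_stats), df=dof)
```

Each `p_values[j]` tests the hypothesis that coefficient `j` is zero; nothing comparable falls out of a deep network's weights.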

Lipton also argues that linear models are just as susceptible to spoofing as deep neural networks. This too is false. As a rule of thumb, the more complex a system, the easier it is to spoof.
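Part of what makes a linear classifier easier to reason about is that its susceptibility to spoofing is exactly quantifiable: the smallest perturbation that moves a point to the decision boundary is its orthogonal distance to the separating hyperplane. A sketch, with made-up weights `w`, `b` and input `x`:

```python
import numpy as np

def minimal_spoof(w, b, x):
    """Smallest L2 perturbation that moves x onto the decision
    boundary of the linear classifier sign(w @ x + b)."""
    margin = (w @ x + b) / (w @ w)
    return -margin * w  # step orthogonally toward the hyperplane

w, b = np.array([3.0, 4.0]), -1.0
x = np.array([1.0, 1.0])      # classified positive: w @ x + b = 6
delta = minimal_spoof(w, b, x)
# x + delta lies exactly on the boundary; the perturbation's norm
# equals |w @ x + b| / ||w||, the distance to the hyperplane.
```

No such closed-form characterization exists for a deep network, which is why its robustness must be probed empirically.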

In my mind, there are two things that matter with respect to interpretability:

- Does this model generalize to unseen data? A model is *interpretable* if hypotheses about generalization hold true after attempts to falsify them. I suppose this is more or less the same as Lipton's transferability.
- Is the model *secure*, i.e., is it resistant to spoofing?

Determining the degree to which a model is secure is an open question; perhaps
it can be studied by using some of the tools that Lipton surveys in his paper.
It is clear to me, however, that validating the security of a neural network
model is *significantly* harder than doing the same for linear ones.