$$ \newcommand{\pmi}{\operatorname{pmi}} \newcommand{\inner}[2]{\langle{#1}, {#2}\rangle} \newcommand{\Pb}{\operatorname{Pr}} \newcommand{\E}{\mathbb{E}} \newcommand{\RR}{\mathbf{R}} \newcommand{\script}[1]{\mathcal{#1}} \newcommand{\Set}[2]{\{{#1} : {#2}\}} \newcommand{\argmin}[2]{\underset{#1}{\operatorname{argmin}} {#2}} \newcommand{\optmin}[3]{ \begin{align*} & \underset{#1}{\text{minimize}} & & #2 \\ & \text{subject to} & & #3 \end{align*} } \newcommand{\optmax}[3]{ \begin{align*} & \underset{#1}{\text{maximize}} & & #2 \\ & \text{subject to} & & #3 \end{align*} } \newcommand{\optfind}[2]{ \begin{align*} & {\text{find}} & & #1 \\ & \text{subject to} & & #2 \end{align*} } $$
This paper addresses three questions:
Its thesis is three-fold:
Of the desiderata in (1), trust is ill-defined and causality is, I think, a straw man. Ethical decision-making is crucial: the ethics of machine learning algorithms are woefully underexamined when such algorithms are deployed. Lipton mentions the questionable use of machine learning models for predicting the chances of recidivism in courts of law. Also pressing, in my mind, is the use of machine learning to create hyper-personalized information filters that would cast individuals into static molds and, more worrisome, treat them as objects to be optimized. But in this latter example, ethical decision-making is not a matter of interpretability. The models for filtering information are interpretable enough, at least at a macro level; the question is rather whether filtering as such is ethical.
The paper proposes two types of interpretability.
Transparency takes three forms: simulatability, in that a human should be able to simulate the model by hand; decomposability, in that each part of the model admits an intuitive explanation; and algorithmic transparency, in that the model should converge to a unique solution. The first point is somewhat silly, since the point of computers is to automate tasks that humans cannot do in a reasonable amount of time. The second point is fine. The third point is silly: unique solutions (one example of algorithmic transparency provided by the author) are not guaranteed even in convex land, and algorithmic determinism (another example provided by the author) is beside the point in machine learning.
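A minimal illustration of the non-uniqueness point (my own example, not one taken from the paper): least squares with a duplicated feature is a convex problem, yet it has an entire line of minimizers.

$$
\underset{w \in \mathbf{R}^2}{\text{minimize}} \;\; (w_1 + w_2 - 1)^2,
\qquad
\text{set of minimizers} = \{\, w \in \mathbf{R}^2 : w_1 + w_2 = 1 \,\}
$$

The objective is the square of an affine function, hence convex, and it attains its minimum value of zero at every point on that line. Which minimizer a solver returns then depends on details such as initialization or an added regularizer, not on the problem itself.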
A model is post-hoc interpretable if its predictions admit retrospective explanations. This section surveys a few standard techniques for querying the activations of a trained model (t-SNE of learned representations, sensitivity analyses, etc.). All of these post-hoc techniques are also ad hoc.
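To give a sense of what a sensitivity analysis looks like in practice, here is a minimal sketch. It is not Lipton's procedure: `tiny_model` is a hypothetical stand-in for a trained network with made-up weights, and `sensitivity` is an illustrative helper that estimates how much the output moves when each input is nudged.

```python
import math


def tiny_model(x):
    """Hypothetical stand-in for a trained model: a fixed 2-2-1 tanh network.

    The weights are made up for illustration; nothing here is trained.
    """
    w_hidden = [[0.8, -0.5], [0.3, 1.2]]
    w_out = [1.5, -0.7]
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    return sum(w * h for w, h in zip(w_out, hidden))


def sensitivity(f, x, eps=1e-5):
    """Central finite-difference estimate of the partial derivatives of f at x."""
    grads = []
    for i in range(len(x)):
        up = list(x)
        up[i] += eps
        down = list(x)
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


point = [0.2, -0.4]
# One number per input feature: larger magnitude means the prediction at
# this particular point reacts more strongly to that feature.
print(sensitivity(tiny_model, point))
```

The output is local to the chosen `point`: a different input can yield a very different ranking of features, which is one reason such explanations are ad hoc.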
Lipton makes a wholly unconvincing argument that linear models are not more interpretable than neural networks. He qualifies his argument by saying that they are not strictly more interpretable, but this much is of course obvious, for deep neural networks include linear models as a special case. Lipton makes absolutely no appeal to the statistical properties that accompany linear models, a surprising oversight. He states that we do not have a theoretical reason why neural networks underperform linear models in studying the natural world. This is false; the parameters in a linear regression, for example, carry with them information about statistical significance (if certain assumptions about the data hold true).
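The point about statistical significance can be sketched with a one-feature regression. The following is an illustrative example rather than anything from the paper: `simple_ols` is a hypothetical helper, the data are made up, and reading the t-statistic as a significance test assumes the usual conditions (independent errors with constant variance, roughly Gaussian).

```python
import math


def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares.

    Returns (slope, intercept, standard error of slope, t-statistic of slope).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)
    std_err = math.sqrt(sigma2 / sxx)
    return slope, intercept, std_err, slope / std_err


# Made-up data for illustration only.
x = list(range(10))
noise = [0.5 if i % 2 == 0 else -0.5 for i in range(10)]
y_signal = [1.0 + 2.0 * xi + e for xi, e in zip(x, noise)]  # depends on x
y_noise = noise                                             # unrelated to x

for name, y in [("signal", y_signal), ("noise only", y_noise)]:
    slope, intercept, std_err, t_stat = simple_ols(x, y)
    print(f"{name}: slope={slope:.3f}, std_err={std_err:.3f}, t={t_stat:.2f}")
```

A t-statistic far from zero suggests the coefficient reflects a real relationship, while one near zero suggests it is indistinguishable from noise. The weights of a neural network come with no comparable built-in diagnostic.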
Lipton also argues that linear models are just as susceptible to spoofing as deep neural networks. This too is false. As a rule of thumb, the more complex a system is, the easier it is to spoof.
In my mind, there are two things that matter with respect to interpretability:
Determining the degree to which a model is secure remains an open problem; perhaps it can be studied using some of the tools that Lipton surveys in his paper. It is clear to me, however, that validating the security of a neural network model is significantly harder than doing the same for a linear one.
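One way to make the contrast concrete, under the assumption that "secure" means robust to small bounded perturbations of the input (my reading, not a definition from the paper): for a linear classifier the worst case has a closed form. The helper names below are hypothetical and the weights are made up.

```python
def linear_score(weights, bias, x):
    """Score of a linear classifier; the predicted class is the sign of the score."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def linf_robust_radius(weights, bias, x):
    """Smallest max-norm perturbation size that can drive the score to zero.

    A perturbation bounded by eps in every coordinate changes the score by at
    most eps * sum(|w_i|), so the prediction cannot flip for eps below
    |score| / sum(|w_i|).
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        return float("inf")  # a constant model cannot be flipped by changing x
    return abs(linear_score(weights, bias, x)) / l1_norm


def worst_case_input(weights, bias, x, eps):
    """Input within max-norm distance eps of x that pushes the score hardest toward zero."""
    direction = 1.0 if linear_score(weights, bias, x) > 0 else -1.0

    def sign(w):
        return (w > 0) - (w < 0)

    return [xi - eps * direction * sign(w) for w, xi in zip(weights, x)]


# Made-up weights and input for illustration only.
weights, bias, x = [2.0, -1.0], 0.5, [1.0, 1.0]
radius = linf_robust_radius(weights, bias, x)
attacked = worst_case_input(weights, bias, x, radius)
print(radius, linear_score(weights, bias, attacked))
```

No comparable formula is available for a general deep network, where computing such a radius exactly typically requires search or optimization over the input space.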