OptNet: Differentiable Optimization as a Layer in Neural Networks (Amos 2017)

Nut graf: OptNet is a neural network architecture that includes embedded within it quadratic programs. The paper’s main contributions are: (1) an architecture that contains as layers (the solutions to) quadratic programs; (2) a method to differentiate through quadratic programs; (3) a method that reuses a matrix factorization used in solving a quadratic program to render insignificant the cost of computing gradients for said program; and (4) an efficient GPU-accelerated implementation of a primal-dual QP solver. The authors acknowledge that training OptNets can be quite challenging (many parameter changes are just reductions between problems), but provide evidence for their claim that OptNet might better suited than traditional neural networks to learn hard constraints via experimental validation on learning to play mini-Sudoku.

Differentiating through quadratic programs

The method differentiates through the KKT optimality conditions for quadratic programs. This method is well-defined because the solution of a quadratic program is subdifferentiable almost everywhere. Backpropagation reuses the LU factorization of the (symmetrized) KKT matrix (computed during the solve) at the solution when obtaining gradients, making the complexity of gradient computation quadratic.