Transformation of linearized computational graphs driven by the associativity of the chain rule
Mosenkis, Viktor; Naumann, Uwe (Thesis advisor); Steihaug, Trond (Thesis advisor)
Aachen (2017, 2018)
Dissertation / PhD Thesis
Dissertation, RWTH Aachen University, 2017
The eﬃcient computation of derivatives of mathematical functions which are implemented as computer programs has become more important in various branches of industry and academia. While derivative information is highly desired, its eﬃcient and reliable computation remains a big challenge. In the past ﬁnite diﬀerences was a very popular approach to approximate derivatives This approach fails in more and more real world problems for two reasons. The ﬁrst one is that for many functions approximating tangents by secants does not provide satisfactory results. The second problem of using ﬁnite diﬀerences is that the computational costs for computing the full dense Jacobian grows linearly with the number of inputs of the function. Therefore Algorithmic (or Automatic) Diﬀerentiation (AD) becomes the method of choice for computing derivatives as it promises to address both problems. AD is a collection of techniques allowing computation of accurate derivative information of a function F : Rn →Rm that is implemented as a computer program in some high level programming language. Often the algorithms for AD are expressed as elimination methods on the linearized directed acyclic graphs or l-DAGs. L-DAGs are computational graphs, where the edge weights are equal to the local partial derivatives of the respective operations. The most popular elimination methods are edge and vertex elimination. To compute the Jacobian with the help of edge or vertex eliminations one need to transform the l-DAG into a bipartite graph. Both techniques are based on local exploitation of the chain rule. Vertex elimination is a special case of edge elimination as it can be expressed by a sequence of edge eliminations. Associativity of the chain rule allows to perform the edge and vertex eliminations in diﬀerent orders. The problem of ﬁnding an order of vertex [edge] eliminations that minimizes the number of multiplications involved in the computation of the Jacobian is known as vertex [edge] elimination problem (VE [EE]). Another closely related problem is the minimum edge count problem (MEC). This problem arises when Jacobian-vector products are needed and computation of full Jacobian is not required. The MEC problem aims to ﬁnd a sequence of vertex (edge) eliminations such that the resulting l-DAG has the smallest amount of edges. VE, EE and MEC problems are conjectured to be NP-hard. Therefore in practice the solution of these problems is approximated by greedy like heuristics or computed by branch and bound based methods if the exact solution is of interest. The main aim of this thesis is to support further development of algorithms for solving these problems. Therefore this thesis tries to provide a better understanding of the problem structure by proving several properties of the l-DAGs. These properties help to reduce the search space and establish the base for other outcomes of the thesis. Building blocks for new algorithms are provided by introducing a new approach for computing lower bounds and proving various optimality preserving eliminations for all three problems. It is shown that the lower bounds computed according to this new approach are in most cases superior to the existing lower bounds. For the MEC problem this is the ﬁrst algorithm for computing lower bounds so far. Better lower bounds allow to perform earlier cuts in the branch and bound type of algorithms and also help to understand the quality of the given solution. Optimality preserving eliminations is another useful technique for branch and bound and greedy like algorithms. They allow the search space for the branch and bound algorithm to be reduced and can also be used to improve the greedy heuristics. The underlying search space of the edge elimination is signiﬁcantly bigger compared to that of vertex elimination, therefore edge elimination is almost never used in applications. In this thesis several optimality preserving elimination are proved. The optimality preserving eliminations for edge eliminations are generalized versions of the results for vertex eliminations. Thus mixing of edge and vertex elimination in algorithms is suggested to improve the results for approximative algorithm and to reduce the corresponding search space if the exact solution is required, demonstrating that edge eliminations should not be neglected in praxis.