Exploitation of structural sparsity in algorithmic differentiation

  • Ausnutzung der strukturellen D├╝nnbesetztheit im algorithmischen Differenzieren

Varnik, Ebadollah; Naumann, Uwe (Thesis advisor)

Aachen : Publikationsserver der RWTH Aachen University (2011)
Dissertation / PhD Thesis

Aachen, Techn. Hochsch., Diss., 2011

Abstract

The background of this thesis is algorithmic differentiation (AD) of in practice very computationally expensive vector functions given as computer programs. Traditionally, most AD software provide forward and reverse modes of AD to calculating the Jacobian matrix accurately at a given point on some kind of internal representation kept on memory or hard disk. In fact, the storage is known to be the bottleneck of AD to handle larger problems efficiently, especially, in reverse mode. The forward mode AD can be implemented very cheaply in terms of memory by single forward propagation of directional derivatives at runtime. However, the reverse mode needs to store some data in the so-called forward sweep to allow the data flow reversal needed for backward propagation of adjoints. The latter is recently the focus of ongoing research activities of the AD community for scalar functions as a single application of reverse mode is enough to accumulate the entire gradient of the target function. To handle the memory bottleneck, checkpointing schedules e.g. revolve have been developed for time-dependent problems. However, they require user's knowledge in both the function as well as the reverse mode AD. In this context, we aim to provide a tool, which minimizes non-AD user's effort in application of the reverse mode AD on their problems for large dimensions. Moreover, we investigate methods to improve the exploitation of structural sparsity of in general derivative tensors such as Jacobians and Hessians. Existing methods are based on the knowledge of the nonzero pattern of target derivative structures, where a compression is usually achieved by the application of some coloring algorithms to a graphical representation. We consider partial distance-2 coloring and star/acyclic coloring of the bipartite and adjacency graph of Jacobians and Hessians, respectively, provided by the coloring package ColPack. To achieve better compression, we distinguish between variable and constant nonzeros, where the latter is supposed to be unchanged at all those points of interest with fix flow of control. Hence, only the former is needed to be computed at runtime. Therefore, general runtime algorithms are provided to compute the variable pattern and the constant entries. We test also their performance in both runtime and achieved colors in the process of sparse Jacobian and Hessian computation. Furthermore, we present an algorithm to overestimate the Hessian sparsity pattern that is referred to as the conservative Hessian pattern estimation. It is gained by exploiting the partial separability of the underlying function. We present numerical results on the computational cost as well as the coloring performance in terms of runtime and achieved colors of the conservative pattern and compare them with those of the exact (nonzero) Hessian pattern. The computational complexity of the latter is known to be quadratic. Finally, the conservative algorithm is refined to a recursive version that is referred to as the recursive Hessian pattern estimation. The recursive algorithm is supposed to converge to the exact one in both runtime and the resulting pattern for sufficiently large recursion level. Thereby, the recursion level one yields exactly the same pattern as the conservative one.

Identifier