A Data Locality Optimizing Algorithm

A Data Locality Optimizing Algorithm

June 26-28, 1991 | Michael E. Wolf and Monica S. Lam
This paper proposes an algorithm that improves the locality of a loop nest by transforming the code via interchange, reversal, skewing and tiling. The algorithm is based on two concepts: a mathematical formulation of reuse and locality, and a loop transformation theory that unifies the various transforms as unimodular matrix transformations. The algorithm has been implemented in the SUIF compiler and is successful in optimizing codes such as matrix multiplication, successive over-relaxation (SOR), LU decomposition without pivoting, and Givens QR factorization. Performance evaluation indicates that locality optimization is especially crucial for scaling up the performance of parallel code. The paper discusses the importance of tiling in improving data locality, using matrix multiplication as an example. Tiling reduces the number of intervening iterations and thus data fetched between data reuses, allowing reused data to still be in the cache or register file, and hence reducing memory accesses. The tile size can be chosen to allow the maximum reuse for a specific level of memory hierarchy. The paper also discusses the problem of finding the best combination of loop interchanges, skewing, reversal and tiling that maximizes data locality within loop nests. It introduces a loop transformation theory that unifies common loop transforms as unimodular matrix transforms. This theory allows the legality of all compound transformations to be reduced to satisfying the same simple constraints. The paper also introduces a formulation of an objective function for data locality, capturing the potential of data locality optimization for a given loop nest. The paper presents an algorithm to improve locality by applying a combination of loop interchanges, skewing, reversal and tiling. The algorithm uses the evaluation function and the legality constraints to reduce the search space of the transforms. It is successful in tiling numerical algorithms such as matrix multiplication, successive over-relaxation (SOR), LU decomposition without pivoting, and Givens QR factorization. The algorithm is implemented in the SUIF compiler and handles non-rectangular loop bounds, generating non-uniform tiles to handle non-perfectly loop nests.This paper proposes an algorithm that improves the locality of a loop nest by transforming the code via interchange, reversal, skewing and tiling. The algorithm is based on two concepts: a mathematical formulation of reuse and locality, and a loop transformation theory that unifies the various transforms as unimodular matrix transformations. The algorithm has been implemented in the SUIF compiler and is successful in optimizing codes such as matrix multiplication, successive over-relaxation (SOR), LU decomposition without pivoting, and Givens QR factorization. Performance evaluation indicates that locality optimization is especially crucial for scaling up the performance of parallel code. The paper discusses the importance of tiling in improving data locality, using matrix multiplication as an example. Tiling reduces the number of intervening iterations and thus data fetched between data reuses, allowing reused data to still be in the cache or register file, and hence reducing memory accesses. The tile size can be chosen to allow the maximum reuse for a specific level of memory hierarchy. The paper also discusses the problem of finding the best combination of loop interchanges, skewing, reversal and tiling that maximizes data locality within loop nests. It introduces a loop transformation theory that unifies common loop transforms as unimodular matrix transforms. This theory allows the legality of all compound transformations to be reduced to satisfying the same simple constraints. The paper also introduces a formulation of an objective function for data locality, capturing the potential of data locality optimization for a given loop nest. The paper presents an algorithm to improve locality by applying a combination of loop interchanges, skewing, reversal and tiling. The algorithm uses the evaluation function and the legality constraints to reduce the search space of the transforms. It is successful in tiling numerical algorithms such as matrix multiplication, successive over-relaxation (SOR), LU decomposition without pivoting, and Givens QR factorization. The algorithm is implemented in the SUIF compiler and handles non-rectangular loop bounds, generating non-uniform tiles to handle non-perfectly loop nests.
Reach us at info@study.space