This paper introduces a highly efficient learning-based method for computing good approximations of optimal sparse codes. Sparse coding (SC) is a method for reconstructing input vectors using a sparse linear combination of basis vectors. SC has become popular for feature extraction from data, particularly when the dictionary of basis vectors is learned from unlabeled data. However, the inference algorithm for SC is computationally expensive, making it unsuitable for real-time applications. To address this, the authors propose two versions of a fast algorithm that produces approximate sparse codes, which can be used to compute good visual features or initialize exact iterative algorithms.
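Concretely, sparse coding is usually cast as minimizing an energy that trades reconstruction error against an L1 sparsity penalty. A standard formulation (using the common convention of input $x$, dictionary $W_d$, code $z$, and sparsity weight $\alpha$; the symbols are illustrative, not quoted from the paper) is:

```latex
E(x, z) = \tfrac{1}{2}\, \| x - W_d z \|_2^2 + \alpha \| z \|_1
```

The optimal code $z^* = \arg\min_z E(x, z)$ has no closed form, which is why iterative algorithms such as ISTA and CoD are needed, and why a fast feed-forward approximation is attractive.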
The main idea is to train a non-linear, feed-forward predictor with a specific architecture and fixed depth to approximate the sparse code. This predictor is differentiable and can be integrated into globally-trained recognition systems. The architecture supports approximate "explaining away" during inference: when two sets of basis vectors could reconstruct the input equally well, one set is selected and the other suppressed.
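For reference, the iterative algorithm these predictors approximate can be sketched as follows. This is a minimal ISTA implementation in NumPy; the names `Wd` and `alpha` and the details of the sketch are illustrative assumptions following the standard ISTA formulation, not code from the paper:

```python
import numpy as np

def soft_threshold(v, theta):
    """Shrinkage nonlinearity: shifts each coefficient toward zero by theta."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def ista(x, Wd, alpha, n_iter=100):
    """Minimize 0.5*||x - Wd z||^2 + alpha*||z||_1 by proximal gradient steps."""
    L = np.linalg.norm(Wd, 2) ** 2        # Lipschitz constant of the gradient
    z = np.zeros(Wd.shape[1])
    for _ in range(n_iter):
        grad = Wd.T @ (Wd @ z - x)        # gradient of the reconstruction term
        z = soft_threshold(z - grad / L, alpha / L)
    return z
```

Each iteration costs a multiply by the dictionary and its transpose; the paper's point is that many such iterations may be needed for an accurate code, whereas a trained encoder of fixed, small depth produces comparable codes in a handful of steps.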
The paper presents two encoder architectures: Learned ISTA (LISTA) and Learned Coordinate Descent (LCoD). LISTA has the structure of a truncated Iterative Shrinkage and Thresholding Algorithm (ISTA), while LCoD has the structure of a truncated Coordinate Descent (CoD) algorithm; in both, the matrices that the original algorithms derive from the dictionary are instead learned so as to minimize the code approximation error. LISTA reaches a given error with roughly one-tenth the computation of the original ISTA, and LCoD is about 20 times faster than CoD when the number of iterations is small.
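A sketch of the LISTA encoder's forward pass, assuming the standard parameterization: a filter matrix `We`, a mutual-inhibition matrix `S`, and a threshold `theta`, all of which would be trained by backpropagation. Here they are merely initialized from a dictionary `Wd` rather than learned, so this shows the architecture, not the trained predictor:

```python
import numpy as np

def soft_threshold(v, theta):
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def lista_encode(x, We, S, theta, n_iter=3):
    """Fixed-depth LISTA forward pass: B = We x, then a few inhibition steps."""
    B = We @ x
    z = soft_threshold(B, theta)
    for _ in range(n_iter):
        # S @ z implements "explaining away": active units suppress competitors
        z = soft_threshold(B + S @ z, theta)
    return z

def init_from_dictionary(Wd, alpha):
    """Initialization under which an untrained LISTA equals truncated ISTA."""
    L = np.linalg.norm(Wd, 2) ** 2
    We = Wd.T / L
    S = np.eye(Wd.shape[1]) - Wd.T @ Wd / L
    return We, S, alpha / L
```

With this initialization the encoder reproduces `n_iter + 1` steps of ISTA started from zero; training `We`, `S`, and `theta` end-to-end on (input, optimal code) pairs is what yields the reported speedups.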
Experiments show that LISTA and LCoD significantly reduce the number of iterations needed to achieve a given code prediction error. For example, LISTA with one iteration achieves the same error as FISTA with 18 iterations for m = 100 and 35 iterations for m = 400. LCoD with 5 iterations achieves the same error as CoD with 100 iterations. These methods are also effective for object recognition: on the MNIST dataset, LCoD with 10 iterations achieves recognition performance close to that obtained with exact sparse codes.
The paper concludes that learning the filters and mutual inhibition matrices of truncated versions of FISTA and CoD leads to a dramatic reduction in the number of iterations needed to reach a given code prediction error. This method enables the use of sparse feature extraction in real-time vision and pattern recognition systems.