The paper explores the misalignment between learning by reconstruction and learning for perception in deep learning models. It demonstrates that features learned through reconstruction are often uninformative for perception tasks, even though they are interpretable and useful for reconstruction. The study identifies three main reasons for this misalignment: (1) the features with the most reconstructive power are the least informative for perceptual tasks, (2) the features useful for perception are learned last, and (3) different model parameters can produce the same reconstruction error yet exhibit significant performance gaps on perception tasks. The paper also shows that some denoising strategies, such as masking, alleviate the misalignment, while other noise distributions, such as additive Gaussian noise, are not beneficial. The findings highlight the need for careful design of denoising tasks to improve the alignment between reconstruction and perception, and provide insights into the limitations and potential improvements of reconstruction-based learning methods.
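Reason (1) can be illustrated with a small linear toy example. The sketch below is not the paper's experiment: the synthetic data, the helper names (`make_data`, `pca_basis`, `nearest_centroid_accuracy`, `run`), and the nearest-centroid probe are all assumptions chosen for illustration. Two high-variance input dimensions are independent of the label, while one low-variance dimension carries it. PCA stands in for a reconstruction-optimal linear encoder.

```python
import numpy as np


def make_data(n=2000, seed=0):
    """Synthetic data (an assumption for illustration): 2 nuisance dims + 1 label dim."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    # High-variance dimensions that are independent of the label.
    nuisance = rng.normal(0.0, 10.0, size=(n, 2))
    # Low-variance dimension whose mean depends on the label (-1 or +1).
    signal = (2 * y - 1) + rng.normal(0.0, 0.3, size=n)
    return np.column_stack([nuisance, signal]), y


def pca_basis(x):
    """Return centered data and principal directions, sorted by decreasing variance."""
    xc = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc, vt


def nearest_centroid_accuracy(z, y):
    """Fit class centroids on the first half, report accuracy on the second half."""
    half = len(y) // 2
    c0 = z[:half][y[:half] == 0].mean(axis=0)
    c1 = z[:half][y[:half] == 1].mean(axis=0)
    d0 = np.linalg.norm(z[half:] - c0, axis=1)
    d1 = np.linalg.norm(z[half:] - c1, axis=1)
    pred = (d1 < d0).astype(int)
    return float((pred == y[half:]).mean())


def run(seed=0):
    x, y = make_data(seed=seed)
    xc, vt = pca_basis(x)
    top, bottom = vt[:2], vt[2:]  # top-2 vs. remaining principal direction
    z_top, z_bottom = xc @ top.T, xc @ bottom.T
    # Fraction of total variance captured by the top-2 subspace.
    explained_top = float((z_top ** 2).sum() / (xc ** 2).sum())
    return (
        explained_top,
        nearest_centroid_accuracy(z_top, y),
        nearest_centroid_accuracy(z_bottom, y),
    )


if __name__ == "__main__":
    explained, acc_top, acc_bottom = run()
    print(f"variance explained by top-2 subspace: {explained:.3f}")
    print(f"probe accuracy on top-2 features:     {acc_top:.3f}")
    print(f"probe accuracy on remaining feature:  {acc_bottom:.3f}")
```

In this construction the top-2 subspace accounts for nearly all of the reconstruction quality yet supports only about chance-level classification, whereas the single discarded direction is the one a perception task needs. Real image data is far less clean-cut, so this should be read only as intuition for the claim, not as evidence for it.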