3 Apr 2024 | Druv Pai*, Ziyang Wu, Sam Buchanan, Yaodong Yu, Yi Ma
This paper introduces CRATE-MAE, a white-box, transformer-like architecture for unsupervised representation learning. The authors leverage the connection between diffusion, compression, and (masked) completion to derive a deep transformer-like masked autoencoder in which each layer is explicitly identified and designed to perform a specific task: transforming the data distribution to and from structured representations. Evaluated on large-scale image datasets, CRATE-MAE achieves promising performance with significantly fewer parameters than standard masked autoencoders, and its learned representations are both structured and semantically meaningful, demonstrating the effectiveness of the white-box design paradigm for unsupervised learning. The code for CRATE-MAE is available on GitHub.
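As a rough illustration of the masked-autoencoding setup the summary describes (mask most patches, compress the visible ones into token representations, then complete the missing ones from those representations), here is a minimal PyTorch sketch. The class `MaskedAutoencoderSketch`, its layer sizes, and its use of generic `nn.TransformerEncoder` blocks are illustrative assumptions only; CRATE-MAE's actual layers are the paper's derived white-box operators, not these standard transformer layers.

```python
import torch
import torch.nn as nn

class MaskedAutoencoderSketch(nn.Module):
    """Toy masked autoencoder: encode visible patches, reconstruct the masked ones.

    Hypothetical sketch for illustration; not the CRATE-MAE architecture.
    Positional embeddings and other practical details are omitted for brevity.
    """
    def __init__(self, patch_dim=768, embed_dim=384, depth=4, num_heads=6):
        super().__init__()
        self.embed = nn.Linear(patch_dim, embed_dim)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(embed_dim, num_heads, batch_first=True), depth)
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(embed_dim, num_heads, batch_first=True), depth)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.head = nn.Linear(embed_dim, patch_dim)  # predict raw patch values

    def forward(self, patches, mask_ratio=0.75):
        # patches: (batch, num_patches, patch_dim), already flattened image patches.
        B, N, D = patches.shape
        num_keep = int(N * (1 - mask_ratio))

        # Random per-sample shuffle; the first `num_keep` shuffled patches stay visible.
        noise = torch.rand(B, N, device=patches.device)
        ids_shuffle = noise.argsort(dim=1)
        ids_restore = ids_shuffle.argsort(dim=1)
        ids_keep = ids_shuffle[:, :num_keep]
        visible = torch.gather(patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))

        # Encode only the visible tokens (the "compression" half).
        z = self.encoder(self.embed(visible))

        # Append mask tokens, undo the shuffle, and decode (the "completion" half).
        mask_tokens = self.mask_token.expand(B, N - num_keep, -1)
        z_full = torch.cat([z, mask_tokens], dim=1)
        z_full = torch.gather(
            z_full, 1, ids_restore.unsqueeze(-1).expand(-1, -1, z_full.shape[-1]))
        recon = self.head(self.decoder(z_full))

        # Reconstruction loss computed only on the masked patches.
        mask = torch.ones(B, N, device=patches.device)
        mask[:, :num_keep] = 0
        mask = torch.gather(mask, 1, ids_restore)
        loss = (((recon - patches) ** 2).mean(dim=-1) * mask).sum() / mask.sum()
        return loss, recon
```

For example, `loss, recon = MaskedAutoencoderSketch()(torch.randn(2, 196, 768))` runs the sketch on a batch of two 196-patch inputs and returns the masked-reconstruction loss along with the reconstructed patch sequence.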