2024 | Jerry Yao-Chieh Hu, Pei-Hsuan Chang, Haozheng Luo, Hong-Yu Chen, Weijian Li, Wei-Po Wang, Han Liu
The paper introduces the Outlier-Efficient Modern Hopfield Model (OutEffHop) to address outlier inefficiency in large transformer-based models. Its main contribution is a novel associative memory model that supports outlier-efficient memory retrieval. The model is tied to an outlier-efficient attention mechanism (Softmax1), and on this basis the paper introduces outlier-efficient Hopfield layers as alternatives to traditional attention mechanisms. The theoretical analysis shows that OutEffHop retains the desirable properties of standard modern Hopfield models, including fixed-point convergence and exponential storage capacity. Empirically, OutEffHop reduces average kurtosis by 22% and maximum infinity norm by 26%, averaged across four large-scale models (BERT, OPT, ViT, and STanHop-Net). It also improves these two metrics by 3% and 4%, respectively, compared to other variants of STanHop-Net. Together, the theoretical analysis and experimental validation indicate that OutEffHop reduces outliers and improves model performance.
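To make the quantities in the summary concrete, here is a minimal sketch of Softmax1 and of the two outlier metrics. It assumes the commonly cited definition of Softmax1, exp(x_i) / (1 + sum_j exp(x_j)), which adds an implicit zero logit to the denominator so that the weights may sum to less than one. The function names `softmax1`, `kurtosis`, and `inf_norm` are illustrative and are not taken from the paper's code; the paper's exact metric definitions (for example, which tensors are measured) may differ.

```python
import math


def softmax1(scores):
    """Softmax with an extra implicit zero logit (assumed Softmax1 form).

    Computes exp(x_i) / (1 + sum_j exp(x_j)) in a numerically stable way.
    """
    # Shift by max(scores, 0); after the shift the implicit zero logit
    # contributes exp(-m) to the denominator instead of 1.
    m = max(max(scores), 0.0)
    exps = [math.exp(s - m) for s in scores]
    denom = math.exp(-m) + sum(exps)
    return [e / denom for e in exps]


def kurtosis(values):
    """Fourth standardized moment; heavier tails give larger values.

    Assumes the values are not all identical (non-zero variance).
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    fourth = sum((v - mean) ** 4 for v in values) / n
    return fourth / (var ** 2)


def inf_norm(values):
    """Largest absolute entry, i.e. the infinity norm of a vector."""
    return max(abs(v) for v in values)


if __name__ == "__main__":
    # With all-zero scores, standard softmax gives [0.5, 0.5];
    # Softmax1 gives [1/3, 1/3], leaving 1/3 of the mass unassigned.
    print(softmax1([0.0, 0.0]))
    # Very negative scores let every weight go to (almost) zero.
    print(sum(softmax1([-50.0, -50.0])))
    # A single large entry inflates both outlier metrics.
    activations = [0.1, -0.2, 0.05, 0.0, 8.0]
    print(kurtosis(activations), inf_norm(activations))
```

The design point the sketch illustrates is that standard softmax must distribute a total weight of exactly one, whereas the extra term in the denominator lets all weights shrink toward zero when no score is large.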