Outlier-Efficient Hopfield Layers for Large Transformer-Based Models

OutEffHop is an outlier-efficient modern Hopfield model designed to address the outlier inefficiency problem in large transformer-based models. It introduces an associative memory mechanism that retrieves stored memories efficiently while minimizing the impact of outliers. The key contribution is the Outlier-Efficient Hopfield layer (OutEffHop), which serves as an alternative to the traditional attention mechanism and performs better after quantization. Theoretical analysis shows that OutEffHop retains and improves the desirable properties of standard modern Hopfield models, including fixed-point convergence and exponential memory capacity.
Empirical results show that OutEffHop significantly reduces the average kurtosis and the maximum infinity norm across BERT, OPT, ViT, and STanHop-Net. It also reduces the outliers that arise in the layers of these models during pretraining. Together, these results make OutEffHop a promising alternative to traditional attention, with strong outlier-reducing capability and improved post-quantization performance. The implementation is available on GitHub for further research and development.
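The two outlier metrics named above can be illustrated with a short sketch. This is not the authors' evaluation code. It assumes kurtosis means the fourth standardized moment computed per layer output and then averaged, and that the maximum infinity norm is the largest absolute activation over all layers. The helper names (`kurtosis`, `max_inf_norm`, `outlier_stats`) and the synthetic activations are hypothetical.

```python
import numpy as np

def kurtosis(x):
    """Fourth standardized moment of a flattened array (about 3 for Gaussian data)."""
    x = np.asarray(x, dtype=np.float64).ravel()
    centered = x - x.mean()
    variance = np.mean(centered ** 2)
    return float(np.mean(centered ** 4) / variance ** 2)

def max_inf_norm(x):
    """Largest absolute entry of an array."""
    return float(np.max(np.abs(x)))

def outlier_stats(layer_outputs):
    """Average per-layer kurtosis and the maximum infinity norm across layers.

    Assumption: `layer_outputs` is an iterable with one activation array per layer.
    """
    kurtoses = [kurtosis(a) for a in layer_outputs]
    norms = [max_inf_norm(a) for a in layer_outputs]
    return {"avg_kurtosis": float(np.mean(kurtoses)), "max_inf_norm": float(max(norms))}

# Synthetic stand-in for activations: 4 layers, 64 tokens, 128 hidden units.
rng = np.random.default_rng(0)
clean = rng.normal(size=(4, 64, 128))
spiky = clean.copy()
spiky[:, 0, 7] = 40.0  # inject one large activation per layer

print(outlier_stats(clean))
print(outlier_stats(spiky))
```

A single large activation per layer is enough to raise both statistics sharply, which is why they are used as proxies for how difficult a model's activations are to quantize.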

2024 | Jerry Yao-Chieh Hu, Pei-Hsuan Chang, Haozheng Luo, Hong-Yu Chen, Weijian Li, Wei-Po Wang, Han Liu