The paper introduces a novel test-time adaptation (TTA) method called Forward-Optimization Adaptation (FOA), which adapts trained models to unseen test samples under potential distribution shifts without using backpropagation. FOA is designed for settings that traditional TTA methods cannot serve, such as resource-limited devices (e.g., FPGAs) and quantized models, where the computational power and memory required for backpropagation are unavailable. The key contributions of FOA include:
1. **Input-Level Adaptation**: FOA learns a new prompt at the model's input using a derivative-free covariance matrix adaptation (CMA) evolution strategy, which restricts optimization to a low-dimensional solution space and leaves the model weights untouched (see the CMA-ES sketch after this list).
2. **Output Feature-Level Adaptation**: A back-to-source activation shifting mechanism directly adjusts the activations of out-of-distribution (OOD) test samples, aligning them with those of the source in-distribution (ID) data (see the shifting sketch after this list).
3. **Fitness Function**: A novel unsupervised fitness function evaluates candidate solutions by combining model prediction entropy with the discrepancy between test-time and source activation statistics (see the fitness sketch after this list).
4. **Efficiency and Memory Reduction**: On ImageNet-C, FOA achieves up to a 24-fold memory reduction compared with gradient-based TTA methods while also outperforming them in accuracy and expected calibration error (ECE).
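To make the input-level step concrete, the following is a minimal sketch of derivative-free prompt search with CMA-ES, using the `cma` package. The `forward_with_prompt` closure (which would prepend candidate prompt tokens to the test batch and return logits) and the pluggable `fitness_fn` are hypothetical names, not the paper's API; a combined entropy-plus-statistics fitness is sketched further below.

```python
import numpy as np
import cma  # pip install cma -- derivative-free CMA evolution strategy

def search_prompt(forward_with_prompt, fitness_fn, prompt_dim,
                  sigma0=0.1, iterations=10):
    """Optimize a low-dimensional prompt using forward passes only.

    forward_with_prompt: maps a flat prompt vector to model logits (hypothetical).
    fitness_fn: scores logits (lower is better, fully unsupervised).
    """
    es = cma.CMAEvolutionStrategy(np.zeros(prompt_dim), sigma0)
    while not es.stop() and es.countiter < iterations:
        candidates = es.ask()                  # sample candidate prompts
        scores = [fitness_fn(forward_with_prompt(np.asarray(c)))
                  for c in candidates]         # forward passes only, no gradients
        es.tell(candidates, scores)            # update search mean/covariance
    return np.asarray(es.result.xbest)         # best prompt found
```

Keeping `prompt_dim` small matters here: CMA-ES maintains a full covariance matrix over the search space, so its cost grows quadratically with dimension, which is why FOA optimizes a compact prompt rather than the model weights.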
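The output feature-level step can be pictured as a mean shift in feature space applied just before the classifier head. The sketch below assumes a pre-computed source feature mean (e.g., gathered offline from ID data); the step size `lam` is an illustrative parameter, not a value from the paper.

```python
import torch

def back_to_source_shift(test_feats: torch.Tensor,
                         source_mean: torch.Tensor,
                         lam: float = 1.0) -> torch.Tensor:
    """Shift OOD test activations toward the source (ID) feature statistics.

    test_feats: [batch, dim] activations of the current test batch.
    source_mean: [dim] feature mean collected offline on ID data.
    lam: shift strength (1.0 fully re-centers the batch mean).
    """
    batch_mean = test_feats.mean(dim=0, keepdim=True)    # [1, dim]
    return test_feats + lam * (source_mean.unsqueeze(0) - batch_mean)
```

With `lam = 1.0`, the shifted batch mean coincides exactly with the source mean.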
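Finally, a hedged sketch of what the unsupervised fitness might look like: prediction entropy plus the discrepancy between test and source activation statistics. The L2 distance between feature means and the `alpha` weighting are simplifying assumptions; the paper's exact statistics and weighting may differ.

```python
import torch
import torch.nn.functional as F

def fitness(logits: torch.Tensor,
            test_feats: torch.Tensor,
            source_mean: torch.Tensor,
            alpha: float = 1.0) -> float:
    """Lower is better: confident predictions plus ID-like feature statistics."""
    log_probs = F.log_softmax(logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()   # mean Shannon entropy
    discrepancy = torch.norm(test_feats.mean(dim=0) - source_mean, p=2)
    return (entropy + alpha * discrepancy).item()
```

Because this objective is a scalar computed from forward passes alone, it can drive the CMA-ES loop sketched above without any gradients.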
Experiments on four benchmarks (ImageNet-C, ImageNet-R, ImageNet-V2, ImageNet-Sketch) and on both full-precision and quantized models (8-bit and 6-bit ViT) demonstrate the effectiveness of FOA. The method achieves superior accuracy and ECE, making it suitable for deployment on resource-constrained devices and quantized models.