Understanding Exploration of Activation Fault Reliability in Quantized Systolic Array-Based DNN Accelerators

This paper presents a methodology for jointly assessing the three-way impact of quantization on model accuracy, activation fault reliability, and hardware efficiency in systolic-array-based DNN accelerators. It introduces a fully automated framework that applies various quantization techniques, performs fault injection, and generates a hardware implementation to measure hardware parameters such as area and latency. The framework focuses on transient faults in the network's activations, and it includes a novel lightweight protection technique to support dependable deployment of the final systolic-array-based FPGA implementation. Experiments on established benchmarks demonstrate the analysis flow and show that quantization has a strong effect on reliability, hardware performance, and network accuracy. The methodology comprises quantization-aware training, post-training quantization, range extraction, and design space exploration. A fault injection engine injects faults into the activations of the DNNs in the systolic architecture. The reliability analysis step then compares the accuracy of the network under test in the faulty run (with faults) against the golden run (without faults), using metrics such as the SDC (Silent Data Corruption) rate. The fault mitigation technique detects out-of-range outputs and corrects them by reassigning them to the respective upper- or lower-bound values. It is implemented using specialized hardware units and is designed to be easily replaced with other protection methods without compromising the overall versatility of the framework. The results show that the proposed protection technique significantly improves the reliability of the network in the presence of faults.
For example, protection Method 3 improves the reliability of the network by more than 34.23% in the worst case for LeNet-5 and by more than 51.79% for AlexNet. These results demonstrate that the protection technique reduces the criticality of faults in both networks. The experiments also show that the DNNs used in this study are not suitable for safety-critical applications due to their vulnerability to faults in activations and logic. Overall, the framework provides a comprehensive analysis of the trade-off between reliability, accuracy, and required computational resources for different quantization levels.
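
The fault injection, range-based protection, and golden-versus-faulty comparison summarized above can be illustrated with a small software sketch. This is not the authors' implementation: the function names, the 8-bit signed activation format, the use of the minimum and maximum of the fault-free activations as the extracted range, and the definition of the SDC rate as the fraction of inputs whose faulty prediction differs from the golden one are all assumptions made for illustration.

```python
def flip_bit(value, bit, width=8):
    """Flip one bit of a signed two's-complement integer that is `width` bits wide."""
    if not 0 <= bit < width:
        raise ValueError("bit index lies outside the word")
    raw = (value & ((1 << width) - 1)) ^ (1 << bit)
    # Reinterpret the raw bit pattern as a signed value.
    return raw - (1 << width) if raw >= 1 << (width - 1) else raw


def inject_fault(activations, index, bit, width=8):
    """Return a copy of `activations` with a single bit flip at `index` (hypothetical helper)."""
    faulty = list(activations)
    faulty[index] = flip_bit(faulty[index], bit, width)
    return faulty


def extract_range(golden_activations):
    """Assumed range extraction: bounds observed in a fault-free (golden) run."""
    return min(golden_activations), max(golden_activations)


def clamp_to_range(activations, lower, upper):
    """Reassign out-of-range values to the nearest bound, as the summary describes."""
    return [min(max(a, lower), upper) for a in activations]


def sdc_rate(golden_predictions, faulty_predictions):
    """Assumed metric: fraction of inputs whose faulty prediction differs from the golden one."""
    if not golden_predictions or len(golden_predictions) != len(faulty_predictions):
        raise ValueError("prediction lists must be non-empty and of equal length")
    mismatches = sum(g != f for g, f in zip(golden_predictions, faulty_predictions))
    return mismatches / len(golden_predictions)


# Toy layer output from a fault-free run (made-up values).
golden = [3, 17, 0, 9]
lower, upper = extract_range(golden)

# Flip bit 6 of the first activation, then apply the range-based protection.
faulty = inject_fault(golden, index=0, bit=6)
protected = clamp_to_range(faulty, lower, upper)
```

In this toy run the corrupted activation is pulled back to the upper bound rather than restored to its original value, so clamping limits the magnitude of the error without removing it. Whether that is enough to preserve the final classification is what a reliability analysis of the kind described in the summary would have to measure.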

Exploration of Activation Fault Reliability in Quantized Systolic Array-Based DNN Accelerators

17 Jan 2024 | Mahdi Taheri¹, Natalia Cherezova¹, Mohammad Saeed Ansari², Maksim Jenihhin¹, Ali Mahani³,⁴, Masoud Daneshtalab¹,⁵, Jaan Raik¹