This article provides a comprehensive tutorial and survey on the efficient processing of deep neural networks (DNNs). It covers the background of DNNs, their applications, and the challenges posed by their high computational complexity. It then discusses the various hardware platforms and architectures that support DNNs, highlighting key trends in reducing computation cost through changes to hardware design and through joint hardware/DNN-algorithm co-design. It also summarizes development resources, benchmarking metrics, and design considerations for evaluating DNN hardware implementations. Readers will gain insight into the design considerations for DNNs, learn how to evaluate different hardware implementations, understand the trade-offs between architectural approaches, and grasp recent implementation trends and opportunities. Throughout, the article emphasizes the importance of efficient processing for DNN inference, especially in resource-constrained environments such as embedded devices.
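To make the scale of this computational complexity concrete, the following minimal sketch (illustrative, not taken from the article) estimates two workload metrics commonly used in such benchmarking: the number of multiply-accumulate (MAC) operations and the number of weights for a single convolutional layer. The layer dimensions are hypothetical, chosen only to show why even one layer can demand hundreds of millions of operations per inference.

    # Illustrative sketch: per-layer cost of a 2-D convolution.
    # All layer dimensions below are assumptions for illustration.

    def conv_layer_cost(h_out, w_out, c_in, c_out, k):
        """Return (MACs, weights) for a conv layer with square k x k filters.

        Each of the h_out * w_out * c_out output activations requires
        k * k * c_in multiply-accumulate operations.
        """
        macs = h_out * w_out * c_out * (k * k * c_in)
        weights = c_out * c_in * k * k
        return macs, weights

    if __name__ == "__main__":
        # Hypothetical layer: 56x56 output feature map, 64 input channels,
        # 128 output channels, 3x3 filters.
        macs, weights = conv_layer_cost(56, 56, 64, 128, 3)
        print(f"MACs:    {macs:,}")     # 231,211,008 MACs for this one layer
        print(f"Weights: {weights:,}")  # 73,728 weights

A modern DNN stacks tens to hundreds of such layers, which is why the hardware design changes and hardware/algorithm co-design techniques surveyed in the article target exactly these operation and data-movement costs.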