20 May 2018 | Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy
TVM is an automated end-to-end optimizing compiler for deep learning that enables performance portability across diverse hardware back-ends. It addresses optimization challenges specific to deep learning deployment, such as high-level operator fusion, mapping to arbitrary hardware primitives, and memory latency hiding. TVM uses a novel learning-based cost-modeling method to automatically optimize low-level programs for hardware characteristics. Experimental results show that TVM delivers performance competitive with state-of-the-art hand-tuned libraries for low-power CPUs, mobile GPUs, and server-class GPUs. TVM also demonstrates the ability to target new accelerator back-ends, such as an FPGA-based generic deep learning accelerator. The system is open sourced and in production use inside several major companies.
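To make the operator-fusion idea concrete, here is a minimal toy sketch (not TVM's actual graph rewriter): fusing two elementwise operators into a single loop avoids materializing the intermediate tensor, which is the main memory-traffic saving fusion buys.

```python
# Toy illustration of operator fusion: an elementwise add followed by a
# ReLU, first as two separate passes, then fused into one loop.

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def relu(a):
    return [max(x, 0.0) for x in a]

def unfused(a, b):
    # Two passes over memory; the result of `add` is materialized
    # as an intermediate buffer before `relu` reads it back.
    return relu(add(a, b))

def fused_add_relu(a, b):
    # One pass: each element is added and clamped in a single loop,
    # the kind of kernel an operator-fusion pass would emit.
    return [max(x + y, 0.0) for x, y in zip(a, b)]
```

Both versions compute identical results; the fused kernel simply touches memory once instead of twice, which matters most for bandwidth-bound elementwise chains.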
TVM provides a compiler that takes a high-level specification of a deep learning program from existing frameworks and generates low-level optimized code for a diverse set of hardware back-ends. It includes three key modules: (1) a tensor expression language to build operators and provide program transformation primitives, (2) an automated program optimization framework guided by a machine learning-based cost model, and (3) a graph rewriter that takes full advantage of high-level and operator-level optimizations. TVM can take model descriptions from existing deep learning frameworks and perform joint high-level and low-level optimizations to generate hardware-specific optimized code for back-ends such as CPUs, GPUs, and FPGA-based accelerators.
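The key idea behind the tensor expression language is the separation of *compute* (what each output element is) from *schedule* (how the loops that produce it are structured). The following pure-Python sketch illustrates this split in spirit, using loop tiling, a classic schedule transformation, on a matrix multiply; it is an illustration, not TVM's API.

```python
# Compute/schedule separation, illustrated: the compute rule for C[i][j]
# is fixed, while the loop structure (the schedule) can be transformed.

N = 8  # square matrix size for this demo

def matmul_naive(A, B):
    # Default schedule: three nested loops in i, j, k order.
    C = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            for k in range(N):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_tiled(A, B, tile=4):
    # Transformed schedule: each loop is split into an outer tile loop
    # and an inner loop, improving cache locality. The compute rule
    # (and hence the result) is unchanged.
    C = [[0.0] * N for _ in range(N)]
    for io in range(0, N, tile):
        for jo in range(0, N, tile):
            for ko in range(0, N, tile):
                for i in range(io, io + tile):
                    for j in range(jo, jo + tile):
                        for k in range(ko, ko + tile):
                            C[i][j] += A[i][k] * B[k][j]
    return C
```

Because the compute rule is declared once, many such schedules (tiling, reordering, vectorization, unrolling) can be generated and compared automatically, which is exactly the search space the cost model navigates.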
TVM's contributions include identifying major optimization challenges in providing performance portability, proposing and implementing a machine learning-based optimization system, and building an end-to-end compilation and optimization stack that allows deep learning workloads specified in high-level frameworks to be deployed to diverse hardware back-ends.
TVM is evaluated on real-world workloads across server-class GPUs, embedded GPUs, embedded CPUs, and an FPGA-based accelerator. Experimental results show that TVM offers portable performance across back-ends and achieves speedups ranging from 1.2× to 3.8× over existing frameworks backed by hand-optimized libraries.
TVM's architecture includes a tensor expression language, a schedule space, and a graph rewriter. It supports various hardware back-ends and provides a user API for deploying deep learning models. TVM's automated optimization includes a machine learning-based cost model and a schedule explorer that proposes new configurations. TVM's evaluation shows that it can optimize deep learning workloads across multiple platforms and outperforms existing frameworks on various back-ends. It also supports new, emerging workloads in deep learning, such as depthwise convolution and low-precision operations. TVM's ability to support new specialized accelerators is demonstrated through its evaluation on a generic inference accelerator design.
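The cost-model-plus-explorer loop can be sketched as follows. This is a hypothetical, self-contained illustration of the idea, not TVM's implementation: a predictor ranks candidate configurations, only the most promising ones are actually measured, and the measurements feed back into the model. The `measure` function here is a stand-in for a real hardware run, and the synthetic "optimum" at tile=16, unroll=4 is assumed purely for the demo.

```python
import random

def measure(config):
    # Stand-in for running a schedule on hardware; pretend runtime is
    # best at tile=16, unroll=4 (an assumption made up for this demo).
    tile, unroll = config
    return abs(tile - 16) + abs(unroll - 4) + 1.0

def predict(model, config):
    # Trivial "learned" cost model: mean measured cost of the nearest
    # configs seen so far. A real system would use something like
    # gradient-boosted trees trained on low-level program features.
    if not model:
        return 0.0
    nearest = sorted(model, key=lambda m: abs(m[0][0] - config[0]))[:3]
    return sum(cost for _, cost in nearest) / len(nearest)

def explore(rounds=5, batch=8, topk=2, seed=0):
    rng = random.Random(seed)
    space = [(t, u) for t in (1, 2, 4, 8, 16, 32) for u in (1, 2, 4, 8)]
    model, best = [], (None, float("inf"))
    for _ in range(rounds):
        candidates = rng.sample(space, batch)
        # Rank cheaply with the model; measure only the top-k on "hardware".
        candidates.sort(key=lambda c: predict(model, c))
        for cfg in candidates[:topk]:
            cost = measure(cfg)
            model.append((cfg, cost))   # new training data for the model
            if cost < best[1]:
                best = (cfg, cost)
    return best
```

The design point this sketch captures is that measurements are expensive while model predictions are nearly free, so a rough learned model that merely *ranks* candidates well can cut the number of hardware runs dramatically.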