5 Feb 2018 | Atılım Güneş Baydin, Barak A. Pearlmutter, Alexey Andreyevich Radul, Jeffrey Mark Siskind
The paper "Automatic Differentiation in Machine Learning: a Survey" by Atılm Güneş Baydin provides an overview of automatic differentiation (AD) and its applications in machine learning. AD, also known as algorithmic differentiation or autodiff, is a technique for efficiently and accurately computing derivatives of numeric functions expressed as computer programs. The paper highlights the importance of derivatives in machine learning, particularly in the form of gradients and Hessians, and discusses the limitations of traditional methods such as manual differentiation, numerical differentiation, and symbolic differentiation.
AD is introduced as a powerful alternative, capable of handling complex control flow and expressive code, making it suitable for modern machine learning frameworks. The paper covers the two main modes of AD: forward accumulation and reverse accumulation (the adjoint mode, of which backpropagation is a special case). Forward accumulation is efficient for functions with few inputs and many outputs, while reverse accumulation is more efficient for functions with many inputs and few outputs, the typical case when computing the gradient of a scalar loss. The paper also traces the historical development of AD and surveys its applications in optimization, neural networks, deep learning, computer vision, natural language processing, and probabilistic inference.
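To make forward accumulation concrete, here is a minimal Python sketch (my own illustration using the standard dual-number formulation, not code from the paper): each value carries a tangent alongside its primal, and one evaluation with a chosen input's tangent seeded to 1 yields the exact derivative with respect to that input. Getting a full gradient this way takes one pass per input, which is why reverse accumulation is preferred for the many-parameter, scalar-output objectives typical of machine learning.

```python
class Dual:
    """Minimal forward-mode AD value: carries a primal value and a tangent (derivative)."""
    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: tangents are propagated exactly, not approximated.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

def forward_derivative(f, x):
    """Evaluate f on a Dual seeded with tangent 1 to get f(x) and f'(x) in one pass."""
    out = f(Dual(x, 1.0))
    return out.value, out.deriv

# Example: f(x) = x * (x + 3), so f'(x) = 2x + 3 and f'(2) = 7.
print(forward_derivative(lambda x: x * (x + 3), 2.0))  # (10.0, 7.0)
```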
In the context of machine learning, AD is particularly useful for gradient-based optimization: reverse accumulation delivers exact gradients at a small constant multiple of the cost of evaluating the original function. The paper provides examples of how AD has been applied in these areas, including training neural networks, optimizing deep learning models, and solving inverse graphics problems in computer vision. It concludes by discussing emerging terminology in the deep learning community, such as "define-and-run" versus dynamic computational graphs, and the concept of differentiable programming, which emphasizes building models from differentiable functions and training them with gradient-based optimization.
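The following sketch shows this idea end to end with a toy scalar reverse-mode implementation driving a gradient-descent loop (again my own minimal illustration, not the survey's code or any particular framework's API): the forward evaluation records local partial derivatives, and a single backward sweep from the scalar loss accumulates its gradient with respect to the parameter.

```python
class Var:
    """Minimal reverse-mode AD node: records parents and local partial derivatives
    so one backward sweep can accumulate d(output)/d(input) for every input."""
    def __init__(self, value, parents=()):
        self.value, self.parents, self.grad = value, parents, 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __sub__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value - other.value, [(self, 1.0), (other, -1.0)])

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self, seed=1.0):
        # Propagate the adjoint from this node to everything it depends on.
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)

# Toy least-squares objective: find w such that w * 3 ≈ 6.
w = 0.0
for step in range(50):
    wv = Var(w)
    loss = (wv * 3.0 - 6.0) * (wv * 3.0 - 6.0)  # build the graph forward
    loss.backward()                             # one reverse sweep gives d(loss)/dw
    w -= 0.05 * wv.grad                         # gradient-descent update
print(w)  # approaches 2.0
```

A production system would record operations on a tape and traverse it in topological order rather than recursing eagerly, but the adjoint arithmetic is the same, and this is the mechanism that frameworks scale up to millions of parameters.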