TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems


November 9, 2015 | Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng
TensorFlow is an interface for expressing machine learning algorithms and an implementation for executing them. Computations expressed with TensorFlow can run on a wide variety of heterogeneous systems, from mobile devices to large-scale distributed clusters. The system is flexible enough to express both training and inference for deep neural networks, and it has been used in research and production across many domains, including speech recognition, computer vision, and robotics. The paper describes the TensorFlow interface and implementation, which was released as open source under the Apache 2.0 license in November 2015. TensorFlow is a second-generation system for large-scale machine learning, built on experience with its predecessor, DistBelief, and it maps computations described in a dataflow-like model onto many hardware platforms.

TensorFlow computations are expressed as stateful dataflow graphs, and the system is designed to be both flexible and high-performance. It supports a wide range of models and hardware platforms, and it enables several kinds of parallelism through replication and parallel execution of a core model dataflow. The paper first presents the programming model and its basic concepts: operations, kernels, sessions, variables, and devices; a minimal sketch of these pieces follows.
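Below is a minimal sketch of that graph-and-session style, written against the TensorFlow 1.x API (available as tf.compat.v1 on current installs). The shapes, names, and the CPU device string are illustrative assumptions, not details from the paper.

```python
# A minimal sketch of the stateful dataflow-graph model, assuming the
# TensorFlow 1.x graph API (tf.compat.v1 on TensorFlow 2.x installs).
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Graph construction: a placeholder feeds input, a Variable holds mutable
# state, and each operation (matmul, add) becomes a node in the graph.
with tf.device("/cpu:0"):                       # explicit device placement
    x = tf.placeholder(tf.float32, shape=[None, 4], name="x")
    W = tf.Variable(tf.random_normal([4, 2]), name="W")
    b = tf.Variable(tf.zeros([2]), name="b")
    y = tf.matmul(x, W) + b                     # ops run as per-device kernels

# A Session owns runtime state; run() executes just the requested subgraph.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0, 4.0]]}))
```

Nothing executes during graph construction; the division between building the graph and running it through a session is what lets the runtime place, partition, and optimize the computation before execution.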
On the implementation side, the paper covers single-device and distributed execution, the algorithm for placing operations on devices, cross-device communication, and fault tolerance. It then describes several extensions to the basic programming model: gradient computation, partial execution, device constraints, control flow, input operations, queues, and containers; a sketch of graph-based gradient computation appears below.

The paper also discusses optimizations such as common-subexpression elimination, careful control of data communication and memory usage, asynchronous kernels, optimized libraries for kernel implementations, and lossy compression of data transferred between devices. It reports on the status of the system, including its open-source release, documentation, tutorials, and examples, and it distills lessons learned from migrating machine learning models from DistBelief to TensorFlow: build tools to analyze model parameters, start small and scale up, keep the training objective consistent across implementations, match a single-machine implementation before debugging a distributed one, guard against numerical errors, and measure the magnitude of the numerical error that remains.

The paper then presents common programming idioms, including data-parallel training, model-parallel training, and concurrent steps that pipeline model computation; a data-parallel sketch appears after the gradient example. Finally, it discusses performance and tools, including TensorBoard for visualizing graph structures and summary statistics; a summary-writing sketch closes this page.
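The gradient-computation extension adds nodes to the graph that compute derivatives of one tensor with respect to others. The sketch below uses tf.gradients from the same TensorFlow 1.x API; the model, learning rate, and data are illustrative assumptions.

```python
# A sketch of graph-based gradient computation, assuming the TF 1.x API.
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

x = tf.placeholder(tf.float32, shape=[None, 4])
target = tf.placeholder(tf.float32, shape=[None, 2])
W = tf.Variable(tf.random_normal([4, 2]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, W) - target))

# tf.gradients walks the graph backward from loss, adding gradient nodes.
grad_W, = tf.gradients(loss, [W])
train_op = W.assign_sub(0.1 * grad_W)           # one manual SGD step

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    _, l = sess.run([train_op, loss],
                    feed_dict={x: [[1., 0., 0., 0.]], target: [[1., 0.]]})
    print("loss:", l)
```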
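For the data-parallel idiom, the model graph is replicated across devices, each replica computes gradients on its shard of a batch, and the averaged gradient updates a shared set of parameters. This sketch assumes two GPUs, with soft placement falling back to CPU when they are absent; the device names and sizes are illustrative.

```python
# A hedged sketch of synchronous data-parallel training, assuming TF 1.x.
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

devices = ["/gpu:0", "/gpu:1"]                  # assumed device names
x = tf.placeholder(tf.float32, shape=[8, 4])    # fixed batch, split evenly
target = tf.placeholder(tf.float32, shape=[8, 1])
W = tf.Variable(tf.zeros([4, 1]))               # shared parameters

# Replicate the loss/gradient subgraph once per device, one shard each.
grads = []
for dev, xs, ts in zip(devices, tf.split(x, 2), tf.split(target, 2)):
    with tf.device(dev):
        loss = tf.reduce_mean(tf.square(tf.matmul(xs, W) - ts))
        grads.append(tf.gradients(loss, [W])[0])

# Average the replica gradients and apply a single synchronous update.
train_op = W.assign_sub(0.1 * tf.add_n(grads) / len(devices))

# allow_soft_placement lets the runtime fall back to CPU without GPUs.
sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
sess.run(tf.global_variables_initializer())
```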
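TensorBoard consumes summary events written during training. A minimal sketch of emitting scalar summaries with the TF 1.x summary API follows; the log directory and the fake loss values are assumptions for illustration.

```python
# A sketch of writing TensorBoard summaries, assuming the TF 1.x summary API.
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

loss = tf.placeholder(tf.float32, shape=[], name="loss")
tf.summary.scalar("loss", loss)                 # register a scalar summary
merged = tf.summary.merge_all()

with tf.Session() as sess:
    # FileWriter records the graph plus summary events for TensorBoard.
    writer = tf.summary.FileWriter("/tmp/tf_logs", sess.graph)  # assumed dir
    for step in range(3):
        s = sess.run(merged, feed_dict={loss: 1.0 / (step + 1)})
        writer.add_summary(s, step)
    writer.close()
# Then view the results with: tensorboard --logdir /tmp/tf_logs
```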