MediaPipe: A Framework for Building Perception Pipelines

14 Jun 2019 | Camillo Lugaresi, Jiuqiang Tang, Hadon Nash, Chris McClanahan, Esha Uboweja, Michael Hays, Fan Zhang, Chuo-Ling Chang, Ming Guang Yong, Juhyun Lee, Wan-Teh Chang, Wei Hua, Manfred Georg and Matthias Grundmann
MediaPipe is a framework for building perception pipelines that process arbitrary sensory data. It allows developers to combine existing perception components into prototypes, advance those prototypes to polished cross-platform applications, and measure system performance and resource consumption on target platforms. The framework lets developers focus on algorithm or model development, using MediaPipe as an environment for iteratively improving their applications with reproducible results across devices and platforms. MediaPipe is open-sourced at https://github.com/google/mediapipe.

MediaPipe is designed for machine learning practitioners, including researchers, students, and software developers, who implement production-ready ML applications, publish code accompanying research work, or build technology prototypes. Its main use case is rapid prototyping of perception pipelines from inference models and other reusable components; it also facilitates deploying perception technology into real-world applications.

MediaPipe lets developers build a pipeline incrementally. A pipeline is defined as a directed graph of components, where each component is a Calculator. The graph is specified using a GraphConfig protocol buffer and then run using a Graph object. Calculators are connected by Streams, each of which carries a time series of data Packets; together, the calculators and streams define the data-flow graph. Packets flowing through the graph are collated by their timestamps within the time series. Because the framework operates on arbitrary data types and has native support for streaming time-series data, it is well suited to analyzing audio and sensor data.
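To make the graph model concrete, the sketch below is adapted from the hello-world example in the open-source repository: it registers a trivial pass-through Calculator, declares a one-node GraphConfig in protocol-buffer text format, and drives it with a CalculatorGraph. The name MyPassThroughCalculator is illustrative (the repository ships an equivalent PassThroughCalculator), and exact headers, macros, and status aliases may differ between MediaPipe releases.

```cpp
#include <string>

#include "absl/status/status.h"
#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/port/logging.h"
#include "mediapipe/framework/port/parse_text_proto.h"

namespace mediapipe {

// A trivial calculator that forwards its input packets unchanged.
// GetContract() declares one input and one output stream of matching type.
class MyPassThroughCalculator : public CalculatorBase {
 public:
  static absl::Status GetContract(CalculatorContract* cc) {
    cc->Inputs().Index(0).SetAny();
    cc->Outputs().Index(0).SetSameAs(&cc->Inputs().Index(0));
    return absl::OkStatus();
  }
  absl::Status Process(CalculatorContext* cc) override {
    // Forward the current input packet, preserving its timestamp.
    cc->Outputs().Index(0).AddPacket(cc->Inputs().Index(0).Value());
    return absl::OkStatus();
  }
};
REGISTER_CALCULATOR(MyPassThroughCalculator);

absl::Status RunGraph() {
  // The pipeline topology is declared as a GraphConfig text proto.
  CalculatorGraphConfig config =
      ParseTextProtoOrDie<CalculatorGraphConfig>(R"pb(
        input_stream: "in"
        output_stream: "out"
        node {
          calculator: "MyPassThroughCalculator"
          input_stream: "in"
          output_stream: "out"
        }
      )pb");

  CalculatorGraph graph;
  MP_RETURN_IF_ERROR(graph.Initialize(config));
  ASSIGN_OR_RETURN(OutputStreamPoller poller,
                   graph.AddOutputStreamPoller("out"));
  MP_RETURN_IF_ERROR(graph.StartRun({}));

  // Feed ten string packets, each stamped with an increasing timestamp.
  for (int i = 0; i < 10; ++i) {
    MP_RETURN_IF_ERROR(graph.AddPacketToInputStream(
        "in", MakePacket<std::string>("hello").At(Timestamp(i))));
  }
  MP_RETURN_IF_ERROR(graph.CloseInputStream("in"));

  // Drain the output stream; packets arrive in timestamp order.
  Packet packet;
  while (poller.Next(&packet)) {
    LOG(INFO) << packet.Get<std::string>();
  }
  return graph.WaitUntilDone();
}

}  // namespace mediapipe
```

Because every packet carries a timestamp, a calculator with several input streams receives input sets that are aligned by timestamp rather than by arrival order.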
MediaPipe consists of three main parts: (a) a framework for inference from sensory data, (b) a set of tools for performance evaluation, and (c) a collection of reusable inference and processing components called calculators. The framework supports GPU compute and rendering nodes, and it allows multiple GPU nodes to be combined as well as mixed with CPU-based nodes.

MediaPipe also provides tools for evaluating graph performance. The Tracer module follows individual packets across a graph and records timing events along the way, while the Visualizer helps users understand the topology and overall behavior of their pipelines. Tracing is enabled declaratively in the graph's configuration, as sketched below.

MediaPipe has been used successfully inside Google for over six years. Much of that success can be attributed to its ecosystem of reusable calculators and graphs. Following the open-source release, the focus will shift to community support, including third-party development of calculators and curation of a set of recommended calculators and graphs, along with further improvements to the tooling so that performance and quality evaluation are easy for users.
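A minimal sketch of enabling the Tracer, assuming the ProfilerConfig fields shipped with the open-source release (field names may vary between versions, and the log path is only an example):

```proto
# Appended to a GraphConfig in text-proto form.
profiler_config {
  trace_enabled: true                      # Tracer: record per-packet timing events
  enable_profiler: true                    # collect per-calculator runtime statistics
  trace_log_path: "/tmp/mediapipe_trace_"  # example prefix for the emitted trace logs
}
```

The emitted trace can then be inspected together with the graph definition in the Visualizer, relating the pipeline's topology to the recorded timing of individual packets.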