14 Jun 2019 | Camillo Lugaresi, Jiuqiang Tang, Hadon Nash, Chris McClanahan, Esha Uboweja, Michael Hays, Fan Zhang, Chuo-Ling Chang, Ming Guang Yong, Juhyun Lee, Wan-Teh Chang, Wei Hua, Manfred Georg and Matthias Grundmann
MediaPipe is a framework for building perception pipelines that process sensory data such as video and audio. The framework abstracts and connects individual perception models into maintainable pipelines, enabling rapid prototyping and deployment across different hardware platforms. Key features include:
1. **Graph-Based Architecture**: Perception pipelines are built as directed graphs of modular components, including model inference, media processing algorithms, and data transformations; a sample graph configuration is sketched after this list.
2. **Modular Components**: The core building blocks are calculators, components that each perform a specific task. Calculators can be combined into pipelines and are highly customizable (see the calculator sketch after this list).
3. **Scheduling and Synchronization**: The framework manages calculator execution with a scheduler that ensures efficient and deterministic processing, synchronizing time-series data across input streams by timestamp. It supports GPU compute and rendering nodes, allowing parallel execution and efficient resource utilization (the driver sketch after this list shows packets flowing through a graph).
4. **Performance Evaluation**: Tools like the tracer and visualizer help developers analyze and optimize the performance of their pipelines, providing insights into timing events and visualizing pipeline behavior.
5. **Cross-Platform Support**: MediaPipe enables developers to develop applications on workstations and deploy them on mobile devices, ensuring consistent behavior across different platforms.
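To make the graph-based architecture concrete, the sketch below shows a minimal graph configuration in the protobuf text format MediaPipe uses (`CalculatorGraphConfig`). The two calculator names follow the edge-detection example in the paper and are illustrative rather than a prescribed pipeline:

```proto
# A two-node pipeline: convert video frames to luminance, then run Sobel
# edge detection. Named streams connect the calculators.
input_stream: "input_video"
output_stream: "output_video"

node {
  calculator: "LuminanceCalculator"
  input_stream: "input_video"
  output_stream: "luma_video"
}

node {
  calculator: "SobelEdgesCalculator"
  input_stream: "luma_video"
  output_stream: "output_video"
}
```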
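Each node in a graph is backed by a calculator implemented in C++. The sketch below is modeled on the framework's simple pass-through calculator: `GetContract()` declares the packet types a calculator consumes and produces, and `Process()` is called once per input timestamp. Exact type and macro names vary across MediaPipe versions, so treat this as illustrative:

```cpp
#include "mediapipe/framework/calculator_framework.h"

namespace mediapipe {

// Forwards every input packet to the output unchanged. GetContract()
// declares stream types; Process() runs once per input timestamp.
class PassThroughCalculator : public CalculatorBase {
 public:
  static Status GetContract(CalculatorContract* cc) {
    // Accept any packet type on input 0; the output carries the same type.
    cc->Inputs().Index(0).SetAny();
    cc->Outputs().Index(0).SetSameAs(&cc->Inputs().Index(0));
    return OkStatus();
  }

  Status Process(CalculatorContext* cc) override {
    // Re-emit the input packet, preserving its timestamp.
    cc->Outputs().Index(0).AddPacket(cc->Inputs().Index(0).Value());
    return OkStatus();
  }
};
REGISTER_CALCULATOR(PassThroughCalculator);

}  // namespace mediapipe
```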
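Finally, a host application builds and runs a graph through the `CalculatorGraph` API; the framework's scheduler decides when each calculator executes as packets arrive. The driver below is a sketch assuming the status macros and poller API from MediaPipe's hello-world example (these have been renamed in some framework versions); it feeds integer packets through the pass-through graph from the previous sketch and reads results back in timestamp order:

```cpp
#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/port/parse_text_proto.h"

mediapipe::Status RunPassThroughGraph() {
  auto config =
      mediapipe::ParseTextProtoOrDie<mediapipe::CalculatorGraphConfig>(R"pb(
        input_stream: "in"
        output_stream: "out"
        node {
          calculator: "PassThroughCalculator"
          input_stream: "in"
          output_stream: "out"
        }
      )pb");

  mediapipe::CalculatorGraph graph;
  MP_RETURN_IF_ERROR(graph.Initialize(config));
  ASSIGN_OR_RETURN(mediapipe::OutputStreamPoller poller,
                   graph.AddOutputStreamPoller("out"));
  MP_RETURN_IF_ERROR(graph.StartRun({}));

  // Feed packets with increasing timestamps; the scheduler runs each
  // calculator as soon as its inputs for a timestamp are available.
  for (int i = 0; i < 10; ++i) {
    MP_RETURN_IF_ERROR(graph.AddPacketToInputStream(
        "in", mediapipe::MakePacket<int>(i).At(mediapipe::Timestamp(i))));
  }
  MP_RETURN_IF_ERROR(graph.CloseInputStream("in"));

  // Output packets arrive in timestamp order, deterministically.
  mediapipe::Packet packet;
  while (poller.Next(&packet)) {
    // Consume packet.Get<int>() here.
  }
  return graph.WaitUntilDone();
}
```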
The paper also discusses related work, MediaPipe's architecture and implementation details, and provides examples of perception applications such as object detection and face landmark detection. MediaPipe is open-sourced to facilitate further development and community contributions.