This paper proposes a taxonomy for workflow management systems in Grid computing that classifies and characterizes the various approaches to building and executing workflows on Grids. It surveys several representative Grid workflow systems from projects worldwide to demonstrate the comprehensiveness of the taxonomy. The taxonomy highlights the design and engineering similarities and differences among state-of-the-art Grid workflow systems and identifies areas requiring further research.
Grid workflow systems are essential for managing and executing complex scientific applications on distributed resources. A workflow is a collection of tasks processed in a defined order to achieve a specific goal. Workflow management techniques have been developed for over 20 years, particularly in business management and office automation. However, Grid-based scientific workflows differ from conventional workflows in aspects such as long execution times, large data flows, heterogeneous resources, multiple administrative domains, and dynamic resource availability.
The taxonomy classifies Grid workflow management systems based on five elements: workflow design, information retrieval, workflow scheduling, fault tolerance, and data movement. Each element is further categorized into sub-taxonomies. For example, workflow design includes workflow structure, workflow model/specification, workflow composition system, and workflow QoS constraints. Workflow structure can be sequence, parallelism, choice, or iteration. Workflow models are divided into abstract and concrete models, with abstract models being more flexible and suitable for users who do not need to specify resource details.
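The distinction between abstract and concrete workflow models can be illustrated with a minimal Python sketch. It is not taken from the paper or from any surveyed system: the task names, the site names, and the round-robin binding are all hypothetical. The abstract workflow lists only tasks and their dependencies; a separate binding step attaches resources to produce a concrete workflow.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical abstract workflow: each task maps to the tasks it depends on.
# No resource details appear here, which is what makes the model abstract.
abstract_workflow = {
    "preprocess": set(),
    "analyze_a": {"preprocess"},
    "analyze_b": {"preprocess"},
    "merge": {"analyze_a", "analyze_b"},
}

def execution_levels(workflow):
    """Group tasks into levels. Levels run in sequence; tasks inside
    one level have no mutual dependencies and may run in parallel."""
    sorter = TopologicalSorter(workflow)
    sorter.prepare()
    levels = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        levels.append(ready)
        sorter.done(*ready)
    return levels

def bind(workflow, resources):
    """Produce a concrete workflow by mapping each task to a resource.
    Round-robin is used purely for illustration."""
    tasks = [task for level in execution_levels(workflow) for task in level]
    return {task: resources[i % len(resources)] for i, task in enumerate(tasks)}

levels = execution_levels(abstract_workflow)
concrete_workflow = bind(abstract_workflow, ["site_a", "site_b"])
```

Here `levels` exposes the sequence and parallelism structures (one task, then two parallel tasks, then one task), while `concrete_workflow` is the same workflow with resource details filled in. Choice and iteration structures are outside the scope of this sketch, since a plain dependency graph cannot express them.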
Workflow composition systems allow users to assemble components into workflows and fall into two main types: user-directed and automatic. Workflow QoS constraints cover time, cost, fidelity, reliability, and security. Within workflow scheduling, the scheduling architecture is categorized as centralized, hierarchical, or decentralized. Scheduling decisions are either local or global, and global decisions can improve overall workflow performance. Planning schemes are either static or dynamic, with dynamic schemes using both static and dynamic information. Scheduling strategies include performance-driven, market-driven, and trust-driven approaches. Performance estimation techniques include simulation, analytical modeling, historical data, on-line learning, and hybrid approaches. Fault tolerance techniques operate at either the task level or the workflow level. Intermediate data movement is categorized into centralized, mediated, and peer-to-peer approaches.
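As a rough illustration of a performance-driven strategy built on local decisions, the Python sketch below assigns each task in turn to the resource with the earliest estimated finish time. It is a simplified heuristic written for this summary, not an algorithm from the paper or from any surveyed system. The function name, the task and resource names, and the runtime estimates (standing in for historical data) are hypothetical, and the tasks are assumed to be independent.

```python
def schedule_greedy(tasks, est_runtime, resources):
    """Assign each task, in the given order, to the resource with the
    earliest estimated finish time. Every choice is a local decision:
    it considers only the current task, not the rest of the workflow."""
    ready_at = {r: 0.0 for r in resources}  # when each resource becomes free
    plan = {}
    for task in tasks:
        best = min(resources, key=lambda r: ready_at[r] + est_runtime[task][r])
        ready_at[best] += est_runtime[task][best]
        plan[task] = best
    return plan, max(ready_at.values())

# Hypothetical per-resource runtime estimates, in arbitrary time units.
est_runtime = {
    "t1": {"fast": 2.0, "slow": 3.0},
    "t2": {"fast": 2.0, "slow": 3.0},
    "t3": {"fast": 2.0, "slow": 3.0},
}

plan, makespan = schedule_greedy(["t1", "t2", "t3"], est_runtime, ["fast", "slow"])
```

A global-decision scheduler would instead evaluate whole assignments against the full workflow, typically at a higher planning cost. A market-driven or trust-driven strategy would replace the finish-time key with a cost or trust measure.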
The paper also surveys several Grid workflow systems, including Condor DAGMan, Pegasus in GriPhyN, and Triana. These systems demonstrate the diversity of approaches in Grid workflow management and highlight the importance of the proposed taxonomy in understanding and improving Grid workflow systems.