This paper proposes ProFL, a novel progressive training framework for Federated Learning (FL) that addresses memory constraints in heterogeneous environments. ProFL partitions the global model into blocks based on its original architecture and trains them progressively: it first trains the front blocks, safely freezes them after convergence, and then trains the next block. This process gradually builds the full model while reducing the peak memory footprint, enabling deployment on resource-constrained devices. ProFL comprises two stages: progressive model shrinking and progressive model growing. During shrinking, output modules are constructed to help each block learn the expected feature representation and to obtain initialization parameters, which are then used in the growing stage to train the full model. A novel metric, effective movement, assesses the learning status of each block, enabling it to be frozen safely upon convergence. Theoretical analysis proves the convergence of ProFL, and experiments on representative models and datasets show that it reduces peak memory usage by up to 57.4% and improves model accuracy by up to 82.4%. ProFL is also compatible with large-scale datasets and existing FL algorithms, demonstrating its scalability and practicality.
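The block-wise train-then-freeze loop can be illustrated with a minimal sketch. The toy `train_step` update and the average-absolute-change proxy used here for effective movement are illustrative assumptions, not the paper's actual training procedure or metric definition:

```python
def train_step(block, target, lr=0.3):
    # Toy local update: nudge each weight toward its target value.
    return [w + lr * (t - w) for w, t in zip(block, target)]

def effective_movement(old, new):
    # Proxy for the paper's effective-movement metric:
    # average absolute parameter change in one round.
    return sum(abs(a - b) for a, b in zip(old, new)) / len(old)

def progressive_train(blocks, targets, threshold=1e-3, max_rounds=200):
    # Train blocks one at a time; freeze each block once its
    # per-round movement falls below the convergence threshold.
    frozen = []
    for block, target in zip(blocks, targets):
        for _ in range(max_rounds):
            new = train_step(block, target)
            moved = effective_movement(block, new)
            block = new
            if moved < threshold:
                break  # block has converged; freeze it
        frozen.append(block)  # frozen blocks are no longer updated
    return frozen
```

Because only the currently trained block is updated, peak memory at any point is bounded by one block's training state plus the frozen prefix, which is the intuition behind ProFL's memory savings.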