The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
Jonathan Frankle and Michael Carbin of MIT CSAIL propose that dense, randomly-initialized neural networks contain subnetworks ("winning tickets") that, when trained in isolation, reach test accuracy comparable to the original network in a similar number of iterations. These winning tickets owe their trainability to their initializations: the surviving weights happen to start at values that make training particularly effective. The authors present an algorithm for identifying winning tickets (sketched below) and conduct experiments supporting the hypothesis. They consistently find winning tickets that are less than 10-20% of the size of several fully-connected and convolutional architectures for MNIST and CIFAR10. When trained in isolation, these subnetworks learn faster and reach higher test accuracy than the original networks. When randomly reinitialized, however, their performance drops significantly, indicating that their success depends on the original initialization rather than on the sparse structure alone.

The study also shows that iterative pruning finds smaller winning tickets than one-shot pruning, and it examines how different pruning strategies and learning rates affect whether winning tickets can be identified at all. The hypothesis extends to convolutional networks, where similar results are observed.

The authors argue that the lottery ticket hypothesis provides a new perspective on how neural networks are composed, emphasizing the role of initialization in training performance and generalization. The findings have implications for improving training efficiency, designing better networks, and deepening the theoretical understanding of neural networks: sparse subnetworks can be trained independently to performance comparable to the full network, provided they are properly initialized. The authors conclude that the hypothesis offers a complementary perspective on the relationship between network size, training, and generalization, suggesting that larger networks may contain simpler representations that are more effective for learning.
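The identification procedure described above is, at its core, iterative magnitude pruning with a reset of the surviving weights to their original initialization. The following is a minimal sketch of that idea, assuming PyTorch; build_model, train, the per-round pruning rate, and the mask bookkeeping are illustrative placeholders, not the authors' released code.

```python
# Sketch of winning-ticket identification: train, prune the smallest-magnitude
# weights, rewind survivors to their original initial values, and repeat.
import copy
import torch

def make_masks(model):
    """Start with all-ones masks (no weights pruned)."""
    return {name: torch.ones_like(p) for name, p in model.named_parameters()
            if "weight" in name}

def prune_by_magnitude(model, masks, rate=0.2):
    """Zero out the smallest-magnitude surviving weights in each layer."""
    for name, p in model.named_parameters():
        if name not in masks:
            continue
        mask = masks[name]
        alive = p[mask.bool()].abs()
        k = int(rate * alive.numel())
        if k == 0:
            continue
        threshold = alive.sort().values[k - 1]
        masks[name] = torch.where(p.abs() > threshold, mask, torch.zeros_like(mask))
    return masks

def apply_masks_and_rewind(model, init_state, masks):
    """Reset surviving weights to their ORIGINAL initial values (theta_0)."""
    model.load_state_dict(init_state)
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])

def find_winning_ticket(build_model, train, rounds=5, rate=0.2):
    model = build_model()
    init_state = copy.deepcopy(model.state_dict())  # save theta_0
    masks = make_masks(model)
    for _ in range(rounds):
        train(model, masks)                          # train the masked network
        masks = prune_by_magnitude(model, masks, rate)
        apply_masks_and_rewind(model, init_state, masks)
    return model, masks
```

Note that training the masked network requires keeping pruned weights at zero (for example, by re-applying the masks or zeroing their gradients after each optimizer step), which is why the placeholder train function receives the masks.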
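The random-reinitialization control discussed above can be expressed in the same style: keep the winning ticket's sparsity pattern but discard its original weights. This is again a hedged sketch using the placeholder helpers from the previous block, not the paper's experiment code.

```python
# Control experiment: same sparse structure, fresh random initialization.
# Uses the placeholder build_model/train helpers from the sketch above.
import torch

def random_reinit_control(build_model, train, masks):
    model = build_model()                  # freshly sampled weights, not theta_0
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])        # impose the winning ticket's mask
    train(model, masks)                    # per the paper, this underperforms the rewound ticket
    return model
```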