This essay explores the limitations of using compute thresholds as a governance strategy for managing risks associated with Generative AI models. The author, Sara Hooker, argues that while the concept of "bigger is better" in computing has been a driving force in technological progress, the relationship between compute and risk is highly uncertain and rapidly changing. She critiques the current implementation of compute thresholds, which are based on hard-coded FLOP (floating-point operations) limits, as shortsighted and likely to fail in mitigating risks effectively.
Hooker highlights several key points:
1. **The Uncertain Relationship Between Compute and Risk**: The relationship between compute and performance is complex and evolving. Smaller models are becoming increasingly performant, and optimization techniques can significantly improve performance without increasing compute.
2. **Data Quality and Optimization**: Better data quality and optimization techniques can compensate for more compute, reducing the need for larger models.
3. **Architecture Impact**: New architectural designs can fundamentally change the relationship between compute and performance, rendering compute thresholds irrelevant.
4. **Challenges with FLOP as a Metric**: FLOP is not a reliable proxy for overall compute: it does not account for post-training improvements, and it is difficult to track across the model lifecycle (see the sketch after this list).
5. **Policy Implications**: The choice of compute thresholds has far-reaching implications, and current policies lack clear guidance on how to measure and apply these thresholds effectively.
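To make concrete how coarse a proxy training FLOP is (point 4 above), here is a minimal sketch using the widely cited approximation that training a dense transformer costs roughly 6 × parameters × training tokens FLOP. The model sizes and token counts are illustrative assumptions, and the estimate deliberately ignores fine-tuning, distillation, retrieval, tool use, and inference-time compute.

```python
# Rough training-compute estimate via the common approximation
# C ≈ 6 * N * D FLOP for a dense transformer
# (N = parameter count, D = training tokens).
# Illustrative numbers only; post-training and inference compute are ignored,
# which is part of why a training-FLOP threshold is a crude risk proxy.

def training_flop(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOP for a dense transformer."""
    return 6 * n_params * n_tokens

# Hypothetical models: a large model vs. a smaller one trained on more data.
estimates = {
    "large_model": training_flop(n_params=70e9, n_tokens=2e12),   # ~8.4e23 FLOP
    "small_model": training_flop(n_params=8e9, n_tokens=15e12),   # ~7.2e23 FLOP
}

# e.g. the 10^26 FLOP reporting threshold in the 2023 US Executive Order
THRESHOLD = 1e26

for name, flop in estimates.items():
    print(f"{name}: {flop:.2e} FLOP, above threshold: {flop > THRESHOLD}")
```

Under this approximation, a much smaller model trained on more tokens can land at roughly the same compute as a far larger one, and neither number says anything about post-training enhancements or downstream capability.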
Hooker concludes that relying solely on compute thresholds is problematic and suggests moving away from hard-coded thresholds. She recommends dynamic thresholds that adjust based on model properties and a risk index composed of multiple performance measures. Additionally, she emphasizes the need for governments to be transparent about their concerns and resource allocation, and to avoid over-reliance on compute as a sole indicator of risk.
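The essay does not prescribe a concrete formula, but the sketch below illustrates what a risk index built from multiple normalized measures might look like; the measure names, weights, and threshold are hypothetical assumptions, not Hooker's proposal.

```python
# Hypothetical sketch of a multi-measure risk index that could replace a
# single hard-coded FLOP threshold. All names and weights are illustrative.
from dataclasses import dataclass

@dataclass
class ModelAssessment:
    benchmark_capability: float  # 0-1, normalized score on capability evals
    autonomy_eval: float         # 0-1, normalized agentic-behaviour eval score
    misuse_eval: float           # 0-1, normalized dangerous-capability eval score
    deployment_reach: float      # 0-1, normalized estimate of user exposure

# Weights would be set, and periodically revised, by the governing body.
WEIGHTS = {
    "benchmark_capability": 0.30,
    "autonomy_eval": 0.25,
    "misuse_eval": 0.30,
    "deployment_reach": 0.15,
}

def risk_index(assessment: ModelAssessment) -> float:
    """Weighted combination of normalized measures; higher means more scrutiny."""
    return sum(w * getattr(assessment, name) for name, w in WEIGHTS.items())

# A dynamic threshold can be revised as measurement practice improves,
# rather than being hard-coded into legislation.
REVIEW_THRESHOLD = 0.6

model = ModelAssessment(
    benchmark_capability=0.8,
    autonomy_eval=0.4,
    misuse_eval=0.5,
    deployment_reach=0.7,
)
score = risk_index(model)
print(f"risk index = {score:.3f}, triggers review: {score > REVIEW_THRESHOLD}")
```

The point of such an index is that no single measure (compute included) determines scrutiny on its own, and the weights and threshold can be adjusted as the relationship between compute, capability, and risk continues to shift.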