Training Compute Thresholds: Features and Functions in AI Regulation

6 Aug 2024 | Lennart Heim, Leonie Koessler
Training compute thresholds are increasingly used by regulators in the US and EU to identify general-purpose artificial intelligence (GPAI) models that may pose large-scale societal risks. Training compute refers to the total number of computational operations used to train an AI model. It is a quantifiable metric that is relatively simple and cheap to calculate, can be estimated before development and measured before deployment, and is externally verifiable. Training compute is also robust to attempts at circumvention: reducing the amount of compute used to train a model generally reduces its capabilities and therefore its risks. These features make it a useful metric for GPAI regulation.
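To make the metric concrete, here is a minimal sketch of how training compute is typically estimated, using the common back-of-the-envelope rule of roughly 6 FLOP per parameter per training token for dense transformer models. The parameter and token counts are illustrative assumptions, not figures from the paper.

```python
def estimate_training_compute(n_params: float, n_tokens: float) -> float:
    """Estimate total training compute in FLOP for a dense transformer.

    Uses the common back-of-the-envelope rule of roughly 6 FLOP per
    parameter per training token (about 2 for the forward pass and 4 for
    the backward pass). Real training runs can deviate from this, e.g.
    for sparse or multimodal architectures.
    """
    return 6 * n_params * n_tokens

# Illustrative example: a hypothetical 70-billion-parameter model
# trained on 15 trillion tokens.
compute = estimate_training_compute(70e9, 15e12)
print(f"~{compute:.2e} FLOP")  # ~6.30e+24 FLOP
```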
Because training compute correlates with model capabilities, it serves as a proxy for risk and can flag GPAI models that warrant regulatory oversight. It is an imperfect proxy, however, and should not be used in isolation. Instead, compute thresholds should act as an initial filter: models that cross them trigger oversight measures, such as notification requirements, and further scrutiny, such as model evaluations and risk assessments. These assessments can then inform which mitigation measures are appropriate.

The US AI Executive Order 14110 and the EU AI Act both use compute thresholds in this way. The US Executive Order requires companies to notify the government about models trained with more than 10^26 operations and to report on measures taken to ensure the physical security and cybersecurity of model weights. The EU AI Act requires providers of GPAI models that cross a threshold of 10^25 operations to notify the European Commission, conduct model evaluations, assess and mitigate systemic risks, and ensure cybersecurity.

Despite these advantages, training compute has limitations. It is only a crude proxy for risk and may become a worse one in the future. In particular, improvements in algorithmic efficiency reduce the training compute required to reach a given level of capability, potentially weakening the link between a fixed compute threshold and the corresponding level of risk over time, though this shift is likely to occur gradually rather than abruptly.

Compute thresholds should therefore serve as an initial filter that identifies GPAI models warranting oversight and further scrutiny, complemented with other metrics, such as capability thresholds, to determine appropriate mitigation measures. In a full regulatory framework for AI, most requirements should not hinge on the amount of training compute. Overall, while imperfect, compute thresholds are currently a key tool in GPAI regulation.
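The sketch below illustrates both points under stated assumptions: it checks a compute estimate against the two regulatory thresholds named above, and shows how algorithmic progress would erode the capability level a fixed threshold corresponds to. The one-year efficiency doubling time is purely illustrative, not an estimate from the paper.

```python
# Thresholds as set by the US Executive Order 14110 and the EU AI Act.
US_EO_THRESHOLD = 1e26      # total training operations
EU_AI_ACT_THRESHOLD = 1e25  # total training operations

def crossed_thresholds(training_compute: float) -> dict[str, bool]:
    """Report which regulatory compute thresholds an estimate crosses."""
    return {
        "EU AI Act (1e25)": training_compute >= EU_AI_ACT_THRESHOLD,
        "US EO 14110 (1e26)": training_compute >= US_EO_THRESHOLD,
    }

def capability_equivalent_compute(physical_compute: float,
                                  years_elapsed: float,
                                  doubling_time_years: float) -> float:
    """Capability-equivalent compute after algorithmic progress.

    Illustrative assumption: algorithmic efficiency doubles every
    `doubling_time_years`, so the same physical compute buys
    exponentially more capability over time, weakening a fixed
    threshold as a proxy for risk.
    """
    return physical_compute * 2 ** (years_elapsed / doubling_time_years)

print(crossed_thresholds(6.3e24))
# {'EU AI Act (1e25)': False, 'US EO 14110 (1e26)': False}

# With a hypothetical 1-year efficiency doubling time, a model trained at
# the EU threshold three years later matches ~8e25 of today's compute.
print(f"{capability_equivalent_compute(1e25, 3, 1.0):.1e}")  # 8.0e+25
```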