The paper "TRUCE: Private Benchmarking to Prevent Contamination and Improve Comparative Evaluation of LLMs" addresses data contamination in large language models (LLMs), where benchmark test data ends up in pretraining or fine-tuning data, which has raised concerns about the reliability of benchmarking studies. The authors propose Private Benchmarking, in which test datasets are kept private so that the models never see the test data. They describe several trust scenarios and, for each, present ways to avoid data contamination through private benchmarking, including confidential computing and cryptographic techniques. The paper introduces TRUCE, an end-to-end system that enables private benchmarking, and demonstrates its practical feasibility through experimental evaluation. The authors also discuss ways to audit private benchmarks to ensure their quality. The paper highlights the importance of preventing benchmark contamination and encourages collaboration across disciplines to address this challenge.