This paper investigates data contamination and evaluation malpractices in closed-source large language models (LLMs), focusing on OpenAI's GPT-3.5 and GPT-4. Because access to model details, and in particular to training data, is limited, researchers have raised concerns about data contamination. A systematic analysis of 255 papers finds that these models have been exposed to approximately 4.7 million samples from 263 benchmarks. The analysis highlights indirect data leakage, in which data submitted by users (for example, benchmark samples sent to the models during evaluation) can be used to further improve the models, as well as evaluation malpractices such as unfair or missing baseline comparisons and reproducibility issues. Many papers also lack transparency about their data usage, prompting calls for more rigorous evaluation practices.

The findings are released as a collaborative project at https://leak-llm.github.io/, where researchers can contribute to further investigations. The paper argues for fair and objective evaluation of closed-source LLMs, recommending practices such as avoiding data leakage, using open-source alternatives, and ensuring reproducibility. Addressing these contamination and evaluation issues is essential to the credibility and fairness of LLM research.
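To make the indirect-leakage concern concrete, the sketch below shows a typical benchmark evaluation loop against the OpenAI API, the kind of workflow the surveyed papers describe. It is an illustrative assumption rather than code from the paper: the benchmark file name, prompt format, and field names are hypothetical, and whether transmitted samples are retained for model improvement depends on the provider's data-usage terms at the time.

```python
# Minimal sketch (assumptions, not from the paper) of how routine benchmark
# evaluation can become indirect leakage: every test sample below is
# transmitted to the provider, which, depending on its data-usage terms,
# may retain such traffic for further model improvement.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical benchmark test split in JSONL format with an "input" field.
with open("benchmark_test_split.jsonl") as f:
    test_samples = [json.loads(line) for line in f]

predictions = []
for sample in test_samples:
    # The benchmark input (and its gold label, if included in the prompt)
    # leaves the researcher's control the moment this request is sent.
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": sample["input"]}],
    )
    predictions.append(response.choices[0].message.content)
```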