TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets

30 Jun 2024 | Jintai Chen, Yaojun Hu, Yue Wang, Yingzhou Lu, Xu Cao, Miao Lin, Hongxia Xu, Jian Wu, Cao Xiao, Jimeng Sun, Lucas Glass, Kexin Huang, Marinka Zitnik, Tianfan Fu
This paper introduces TrialBench, a suite of AI-ready clinical trial datasets intended to support the development of AI models for clinical trial design. The datasets are multi-modal, combining drug molecule structures, disease codes, textual descriptions, and categorical/numerical features, and they address eight prediction challenges in clinical trial design: trial duration, patient dropout rate, serious adverse events, mortality rate, trial approval outcome, trial failure reason, drug dose finding, and eligibility criteria design. The data are curated from ClinicalTrials.gov, DrugBank, TrialTrove, and the ICD-10 coding system to ensure high quality and reliability.

The paper outlines the background and significance of clinical trials, highlighting their importance in developing new medical treatments while acknowledging the risks and challenges of designing and running them. It emphasizes the potential of AI to reduce these risks by providing insights that guide trial design. The datasets are designed to support cross-disciplinary research, drawing on the expertise of both data scientists and AI experts.

The methods section details data acquisition, curation, and feature organization. Each dataset is split into training, validation, and testing sets so that AI models can be evaluated without bias. The experimental results show that multi-modal deep neural networks can predict a range of clinical trial outcomes, which the authors take as validation of the quality and AI-readiness of the curated datasets.

TrialBench is publicly available at <https://github.com/M2Health/ML2ClinicalTrials/tree/main/AI4Trial>, and the authors plan to continuously expand and maintain the platform to include new learning tasks, datasets, and leaderboards.
The datasets are intended for healthcare, biomedical, and AI researchers and data scientists who aim to innovate and improve clinical trial design and outcomes.
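As a rough illustration of the multi-modal records and the training/validation/testing split described in the summary, the following Python sketch may help. It is not the authors' code: the `TrialRecord` class, its field names, the `split_records` helper, the 70/10/20 proportions, and the seeded random shuffle are all assumptions made for this example, and the summary does not state how TrialBench actually partitions its data. The sample values are placeholders, not data from TrialBench.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TrialRecord:
    """One hypothetical multi-modal trial example (field names are illustrative)."""
    trial_id: str
    drug_smiles: List[str]   # drug molecule structures, here as SMILES strings
    icd10_codes: List[str]   # disease codes
    description: str         # free-text trial description
    tabular: Dict[str, float] = field(default_factory=dict)  # categorical/numerical features
    label: float = 0.0       # task target, e.g. trial duration


def split_records(records, val_frac=0.1, test_frac=0.2, seed=0):
    """Shuffle with a fixed seed, then cut into train/validation/test lists."""
    if not 0 <= val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be in [0, 1)")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Placeholder records purely for demonstration.
records = [
    TrialRecord(f"TRIAL-{i:03d}", ["CCO"], ["E11"], "placeholder text",
                {"phase": 2.0}, float(i))
    for i in range(10)
]
train, val, test = split_records(records)
print(len(train), len(val), len(test))
```

Fixing the seed makes the partition repeatable, so different models are compared on the same held-out trials. A time-based or sponsor-based split would be an equally plausible design choice; the random split here is only the simplest option to show.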