Understanding TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets

TrialBench is a collection of 23 AI-ready, multi-modal clinical trial datasets designed to support the development of artificial intelligence (AI) models for clinical trial design. The datasets cover 8 key challenges in trial design: trial duration forecasting, patient dropout rate prediction, serious adverse event prediction, mortality rate prediction, trial approval prediction, trial failure reason identification, eligibility criteria design, and drug dose finding. The benchmark is also designed to support the generalization of AI models to new clinical trials.

The data are curated primarily from ClinicalTrials.gov and linked with DrugBank, TrialTrove, and the ICD-10 coding system. After processing, the datasets come in AI-ready input and output formats, with multi-modal features that include drug molecule structures, disease codes, free text, categorical and numerical features, and MeSH terms. Evaluation metrics and baseline models are included to ensure usability and reliability. The datasets are publicly available at https://github.com/ML2Health/ML2ClinicalTrials/tree/main/AI4Trial.

The authors expect TrialBench to accelerate the development of advanced AI approaches for clinical trial design, and in turn to advance clinical trial research and the development of new medical solutions.
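To make the list of modalities and tasks concrete, here is a minimal sketch of one way a single multi-modal trial record, and a task-specific (input, output) pair built from it, might be represented in Python. Everything in it is an assumption for illustration: the names `Task`, `TrialRecord`, `Example`, and `make_example`, every field name, and the sample values are hypothetical and are not TrialBench's actual schema. The repository at https://github.com/ML2Health/ML2ClinicalTrials/tree/main/AI4Trial is the authority on the released file formats.

```python
from dataclasses import dataclass, field
from enum import Enum


class Task(Enum):
    # The eight trial-design challenges named in the summary.
    DURATION = "trial duration forecasting"
    DROPOUT = "patient dropout rate prediction"
    SERIOUS_AE = "serious adverse event prediction"
    MORTALITY = "mortality rate prediction"
    APPROVAL = "trial approval prediction"
    FAILURE_REASON = "trial failure reason identification"
    ELIGIBILITY = "eligibility criteria design"
    DOSE = "drug dose finding"


@dataclass
class TrialRecord:
    # HYPOTHETICAL field names: TrialBench's released files may differ.
    trial_id: str                                              # trial identifier
    drug_smiles: list[str] = field(default_factory=list)       # drug molecule structures (SMILES)
    icd10_codes: list[str] = field(default_factory=list)       # disease codes
    mesh_terms: list[str] = field(default_factory=list)        # MeSH terms
    text: dict[str, str] = field(default_factory=dict)         # free-text fields
    categorical: dict[str, str] = field(default_factory=dict)  # e.g. trial phase
    numerical: dict[str, float] = field(default_factory=dict)  # e.g. planned enrollment

    def modalities(self) -> list[str]:
        """Return the names of the modalities populated for this trial."""
        present = {
            "drug_molecule": bool(self.drug_smiles),
            "disease_code": bool(self.icd10_codes),
            "mesh": bool(self.mesh_terms),
            "text": bool(self.text),
            "categorical": bool(self.categorical),
            "numerical": bool(self.numerical),
        }
        return [name for name, ok in present.items() if ok]


@dataclass
class Example:
    # One (input, output) pair: a trial record plus its label for one task.
    task: Task
    record: TrialRecord
    label: object


def make_example(record: TrialRecord, task: Task, label: object) -> Example:
    # Refuse to build a training pair when the trial has no label for this task.
    if label is None:
        raise ValueError(f"{record.trial_id} has no label for {task.value}")
    return Example(task=task, record=record, label=label)


# Usage with placeholder values (not a real trial).
rec = TrialRecord(
    trial_id="NCT00000000",
    drug_smiles=["CC(=O)OC1=CC=CC=C1C(=O)O"],  # aspirin
    icd10_codes=["I21.9"],                      # acute myocardial infarction, unspecified
    text={"eligibility": "Adults aged 18 to 75 with a confirmed diagnosis."},
    categorical={"phase": "Phase 3"},
)
ex = make_example(rec, Task.SERIOUS_AE, label=1)
print(ex.record.modalities())  # ['drug_molecule', 'disease_code', 'text', 'categorical']
```

Keeping each modality in its own field, rather than concatenating everything into one string, lets a model route SMILES strings to a molecule encoder, codes and MeSH terms to embedding tables, and free text to a language model. This is a common design for multi-modal models in general, not a description of TrialBench's baselines.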

TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets

30 Jun 2024 | Jintai Chen, Yaojun Hu, Yue Wang, Yingzhou Lu, Xu Cao, Miao Lin, Hongxia Xu, Jian Wu, Cao Xiao, Jimeng Sun, Lucas Glass, Kexin Huang, Marinka Zitnik, and Tianfan Fu