2024 | Anna Pawłowska, Anna Ćwierz-Pieńkowska, Agnieszka Domalik, Dominika Jagus, Piotr Kasprzak, Rafał Matkowski, Łukasz Fura, Andrzej Nowicki & Norbert Żołek
The paper introduces the BrEaST dataset, a comprehensive collection of breast ultrasound scans for research and development in breast disease detection, tumor segmentation, and classification. The dataset includes 256 breast scans from 256 patients; each scan was manually annotated by experienced radiologists with freehand annotations and labeled with features from the BI-RADS lexicon. Histopathological classifications are provided for patients who underwent biopsy. According to the authors, BrEaST is the first dataset to combine patient-level labels, image-level annotations, and tumor-level labels, with all cases confirmed by follow-up care or biopsy results. It is publicly available under the CC BY 4.0 license to support research on breast ultrasound imaging and machine learning applications.
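The label hierarchy summarized above (a patient-level diagnosis confirmed by biopsy or follow-up, freehand annotations on each image, and tumor-level BI-RADS labels) can be sketched as a small record schema. This is a minimal Python sketch under stated assumptions: the class and field names (`Case`, `Lesion`, `Confirmation`, `outline`, `features`, `histopathology`) are hypothetical and do not describe the dataset's actual file layout, and the allowed BI-RADS values are the standard ACR assessment categories, not necessarily the subset used in BrEaST.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple

# Standard ACR BI-RADS assessment categories (category 4 split into 4A-4C).
# Assumption: the dataset may use only a subset of these.
BIRADS_CATEGORIES = {"0", "1", "2", "3", "4A", "4B", "4C", "5", "6"}


class Confirmation(Enum):
    """How the diagnosis was confirmed: by biopsy or by follow-up care."""
    BIOPSY = "biopsy"
    FOLLOW_UP = "follow-up"


@dataclass
class Lesion:
    """Hypothetical tumor-level record: freehand outline plus BI-RADS labels."""
    outline: List[Tuple[int, int]]  # freehand contour as (x, y) pixel vertices
    birads: str  # assessment category, e.g. "4A"
    features: Dict[str, str] = field(default_factory=dict)  # e.g. {"shape": "oval"}

    def __post_init__(self) -> None:
        # Reject categories outside the assumed BI-RADS set.
        if self.birads not in BIRADS_CATEGORIES:
            raise ValueError(f"unknown BI-RADS category: {self.birads!r}")
        # A closed contour needs at least three vertices.
        if len(self.outline) < 3:
            raise ValueError("a freehand outline needs at least 3 vertices")


@dataclass
class Case:
    """Hypothetical patient-level record (one scan per patient)."""
    patient_id: str
    image_path: str
    confirmation: Confirmation
    lesions: List[Lesion] = field(default_factory=list)
    histopathology: Optional[str] = None  # present only for biopsied patients

    def __post_init__(self) -> None:
        # Histopathology is expected exactly when the case was confirmed by biopsy.
        biopsied = self.confirmation is Confirmation.BIOPSY
        if biopsied != (self.histopathology is not None):
            raise ValueError("histopathology must be given exactly when a biopsy was done")
```

For example, `Case("p1", "p1.png", Confirmation.BIOPSY, [Lesion([(10, 10), (40, 12), (30, 35)], "4A")], "fibroadenoma")` builds a valid record (all values illustrative), while a follow-up case that also carries a histopathology label raises `ValueError`. Encoding the biopsy/histopathology rule in `__post_init__` catches inconsistent labels at load time rather than during training.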
By providing detailed annotations and labels, the dataset addresses limitations of existing breast ultrasound datasets, making it a more reliable and useful resource for developing and evaluating algorithms. The paper also describes the data collection, anonymization, annotation, and validation procedures used to ensure the quality and integrity of the dataset.