Understanding Robustness of cancer microbiome signals over a broad range of methodological variation

The authors address concerns raised by Gihawi et al. regarding the robustness of cancer-specific microbial signals identified in The Cancer Genome Atlas (TCGA). They conducted extensive re-analyses of their original data and developed new methods to validate their findings. Key points include:

1. **Batch Correction**: They tested batch correction using Voom-SNM and ConQuR, finding no systematic bias and equivalent ML model performance between raw and corrected data. This suggests that batch correction did not introduce artificial biases.
2. **Database Contamination**: They used Conterminator to assess human contamination in their databases, finding low levels (<1%) of human sequences. They also developed Exhaustive, a more sensitive method for cleaning databases, which confirmed minimal human contamination.
3. **Microbial Read Differences**: They demonstrated that the reduction in microbial reads was due to sequential host depletion steps, not database contamination. They showed that the number of microbial reads was significantly correlated with the number of input non-human reads, indicating that the true level of microbial signal was being asymptotically approximated.
4. **Cancer Type-Specificity**: They repeated their analyses using updated methods and found that cancer type-specific microbial signatures remained robust.

The authors conclude that their original findings are valid: the re-analyses and updated methods support the conclusions about cancer type-specific microbiomes, and the cancer type-specific microbial signals identified in TCGA are robust to methodological variation.
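
The third key point rests on a correlation between per-sample microbial read counts and the non-human reads used as input. A minimal sketch of such a check follows, assuming a Spearman rank correlation (the summary does not name the statistic the authors used). The function names and the toy counts are hypothetical illustrations, not values or code from the study.

```python
def rank(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rho: the Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


# Hypothetical toy counts per sample, NOT data from the study.
non_human_reads = [1200, 5400, 800, 15000, 3300, 9700]
microbial_reads = [90, 410, 70, 1000, 460, 640]

rho = spearman(non_human_reads, microbial_reads)
print(f"Spearman rho = {rho:.3f}")
```

A rank-based statistic is used here because read counts are typically skewed across samples. A real analysis would also report a p-value and would use the study's own per-sample counts.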