The paper introduces a method for combining independent pieces of evidence, expressed as p-values, to assess the significance of matches in sequence homology searches. The p-values of a sequence's matches to multiple motifs are multiplied, and the statistical significance of that product measures how well the sequence matches a family of sequences. In a homology search, the goal is to identify sequences that are homologous to the family characterized by a given set of motifs. The method is implemented in the MAST algorithm, which multiplies the p-values obtained for the individual motifs and converts the product into a final p-value for the sequence. This final p-value is used to assess the significance of the sequence's match to the family.
The method rests on the statistical distribution of the product of independent p-values, which follows from the distribution of the product of independent, uniform random variables. This distribution is computed with the QFAST algorithm, which is efficient and accurate. For a product $ p $ of $ n $ independent p-values, the algorithm evaluates the formula $ F_{n}(p) = p \sum_{i=0}^{n-1} \frac{(-\ln p)^i}{i!} $ for $ 0 < p \leq 1 $, which gives the probability that such a product is less than or equal to $ p $.
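The formula can be evaluated directly with a short loop. The following Python function is a minimal sketch of that evaluation, not the paper's QFAST implementation; the name `product_pvalue` and the input checks are assumptions made for illustration. It works with the logarithm of the product and builds each term of the sum from the previous one, so no factorials are computed.

```python
import math


def product_pvalue(pvalues):
    """Evaluate F_n(p) = p * sum_{i=0}^{n-1} (-ln p)^i / i! for p = product of pvalues.

    Hypothetical helper for illustration; assumes the p-values are independent.
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 <= p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in [0, 1]")
    if any(p == 0.0 for p in pvalues):
        return 0.0  # the product is 0, outside the formula's domain 0 < p <= 1

    log_prod = sum(math.log(p) for p in pvalues)  # ln of the product, always <= 0
    x = -log_prod                                 # -ln p, always >= 0

    term = 1.0   # the i = 0 term: x^0 / 0!
    total = 1.0
    for i in range(1, n):
        term *= x / i        # x^i / i! obtained from x^(i-1) / (i-1)!
        total += term

    return min(1.0, math.exp(log_prod) * total)
```

With a single p-value the sum has only its first term, so the function returns the p-value unchanged. With several p-values the result is always at least as large as the raw product, because the raw product is smaller than any of its factors and cannot be read as a p-value on its own.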
The method is validated by comparing the p-values calculated by the QFAST algorithm with the expected distribution of p-values. The results show that the QFAST p-values are accurate and effective for classifying sequences into families, and that the method improves the sensitivity and selectivity of sequence homology searches. The method is most useful when the motifs are independent and the match scores are continuous. When the motifs are correlated, however, the p-values may be overestimated, and the method may need to be adjusted to account for this.
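One simple way to illustrate this kind of accuracy check is a simulation: if the individual p-values are independent and uniform, the combined p-value should itself be uniform, so the fraction of combined p-values at or below a threshold should be close to that threshold. The Python sketch below is an illustration under those assumptions and does not reproduce the paper's experiments; the function names, trial count, and seed are all hypothetical.

```python
import math
import random


def combined_pvalue(pvalues):
    """F_n(p) for the product p of n independent p-values (each must be > 0)."""
    log_prod = sum(math.log(p) for p in pvalues)
    x = -log_prod
    term = total = 1.0
    for i in range(1, len(pvalues)):
        term *= x / i
        total += term
    return math.exp(log_prod) * total


def empirical_rate(n_motifs, alpha, trials=20000, seed=0):
    """Fraction of simulated null cases whose combined p-value is <= alpha."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # 1 - random() lies in (0, 1], which avoids log(0)
        pvals = [1.0 - rng.random() for _ in range(n_motifs)]
        if combined_pvalue(pvals) <= alpha:
            hits += 1
    return hits / trials
```

Calling `empirical_rate(3, 0.05)` should return a value near 0.05, up to sampling noise. The simulation draws the p-values independently, so it says nothing about the correlated case.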
The method is applied to a variety of sequence families and is shown to be effective in identifying homologous sequences. The results demonstrate that the product of p-values provides a more accurate and sensitive measure of sequence similarity than other methods, and that it reduces the number of false positives in sequence homology searches. The MAST algorithm, which implements the method, is available for use and download. It is efficient and accurate, and it provides a statistically valid measure of the significance of sequence similarity. The method is therefore a valuable tool for sequence homology searches, with the potential to improve the accuracy and sensitivity of sequence comparisons.