The simple Bayesian classifier can be optimal under zero-one loss even when its attributes are not independent given the class, contrary to the common assumption that attribute independence is a precondition for good performance. This article shows that the classifier's region of optimality under zero-one loss is far larger than its region of optimality under quadratic loss: zero-one loss depends only on which class receives the highest score, so the classifier can choose the correct class even when its probability estimates are badly miscalibrated. Conjunctions and disjunctions illustrate the point: both concept classes violate the independence assumption, yet the article shows the simple Bayesian classifier to be globally optimal for learning them.

Consistent with this analysis, empirical studies find that the Bayesian classifier often outperforms more sophisticated classifiers in many domains where its assumptions do not hold. The article further argues that detecting attribute dependences is not necessarily the best way to improve the classifier, since its accuracy also depends on the distribution of examples and the structure of the target concept. Finally, the article derives necessary and sufficient conditions for the classifier's local and global optimality under zero-one loss, concluding that its range of applicability is far broader than previously thought and that it has significant practical value.
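To fix ideas, the local-optimality condition can be stated as follows, where $p$ denotes the true posterior $P(+ \mid E)$ of the positive class for an example $E$ with attribute values $a_1, \dots, a_m$, and $r$ and $s$ denote the classifier's scores for the positive and negative class. This rendering is a sketch in our own notation; the article's statement should be consulted for the precise form:

    r = P(+)\prod_{i=1}^{m} P(a_i \mid +), \qquad
    s = P(-)\prod_{i=1}^{m} P(a_i \mid -)

    \text{locally optimal for } E \iff
    \left(p \ge \tfrac{1}{2} \,\wedge\, r \ge s\right)
    \;\lor\;
    \left(p \le \tfrac{1}{2} \,\wedge\, r \le s\right)

Global optimality then amounts to local optimality on every example, so the classifier only needs to rank the classes correctly everywhere; it never needs $r$ and $s$ to equal the true posteriors.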
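As a concrete illustration of optimality on conjunctions, the following sketch (ours, not from the article; it assumes three Boolean attributes, a uniform distribution over all eight examples, and unsmoothed maximum-likelihood estimates) trains a naive Bayes model on the full truth table of the conjunction x1 AND x2 AND x3 and checks its predictions:

    from itertools import product

    # Full truth table of the conjunction c(x) = x1 AND x2 AND x3,
    # with all eight examples weighted uniformly (an illustrative setup).
    examples = list(product([0, 1], repeat=3))
    labels = [int(all(x)) for x in examples]

    # Unsmoothed maximum-likelihood estimates of the class prior and the
    # per-attribute class-conditional probabilities.
    n = len(examples)
    prior = {c: labels.count(c) / n for c in (0, 1)}
    cond = {}  # cond[(i, v, c)] = estimated P(x_i = v | class = c)
    for c in (0, 1):
        members = [x for x, y in zip(examples, labels) if y == c]
        for i in range(3):
            for v in (0, 1):
                cond[(i, v, c)] = sum(x[i] == v for x in members) / len(members)

    def nb_score(x, c):
        # Unnormalized naive Bayes score: P(c) * prod_i P(x_i | c).
        score = prior[c]
        for i, v in enumerate(x):
            score *= cond[(i, v, c)]
        return score

    for x, y in zip(examples, labels):
        scores = {c: nb_score(x, c) for c in (0, 1)}
        pred = max(scores, key=scores.get)
        p_pos = scores[1] / (scores[0] + scores[1])
        print(x, "true:", y, "predicted:", pred, "estimated P(+|x) = %.3f" % p_pos)

Every prediction matches the true label, so zero-one loss is zero even though the independence assumption is violated. Yet for the single positive example (1, 1, 1) the estimated posterior is roughly 0.645 rather than the true value of 1, so the same model is not optimal under quadratic loss: exactly the gap between the two optimality regions described above.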