This paper introduces three new heterogeneous distance functions—Heterogeneous Value Difference Metric (HVDM), Interpolated Value Difference Metric (IVDM), and Windowed Value Difference Metric (WVDM)—to handle both nominal and continuous attributes in instance-based learning. The Value Difference Metric (VDM) was previously used for nominal attributes but required discretization for continuous attributes, which could degrade generalization accuracy. The new functions address this by directly handling continuous attributes without discretization.
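For concreteness, here is a minimal Python sketch of VDM for a single nominal attribute; the (value, class) training representation and the function name are illustrative assumptions, not taken from the paper.

```python
# A minimal sketch of VDM for one nominal attribute, assuming the training
# set is given as (value, class) pairs; names are illustrative.
from collections import Counter, defaultdict

def vdm_distance(train_pairs, x, y, q=2):
    """Value Difference Metric between nominal values x and y:
    vdm(x, y) = sum_c |P(c | x) - P(c | y)|^q,
    with P(c | v) estimated from counts in train_pairs."""
    counts = defaultdict(Counter)   # counts[value][cls] = N_{a,v,c}
    totals = Counter()              # totals[value] = N_{a,v}
    for value, cls in train_pairs:
        counts[value][cls] += 1
        totals[value] += 1
    classes = {cls for _, cls in train_pairs}
    dist = 0.0
    for c in classes:
        p_x = counts[x][c] / totals[x] if totals[x] else 0.0
        p_y = counts[y][c] / totals[y] if totals[y] else 0.0
        dist += abs(p_x - p_y) ** q
    return dist
```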
HVDM combines a normalized linear difference for continuous (linear) attributes with a normalized VDM for nominal attributes, aggregating the per-attribute distances in Euclidean fashion. Its normalization schemes are designed so that both attribute types contribute comparably to the overall distance. Experiments show that HVDM achieves higher classification accuracy than previous distance functions on datasets with both nominal and continuous attributes.
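The following sketch illustrates this combination, assuming the needed statistics (a standard deviation for each linear attribute, conditional class probabilities for each nominal attribute) have been precomputed from training data; the `attr_info` layout is an assumption of this example, not the paper's code.

```python
# A hedged sketch of HVDM over mixed attribute vectors.
import math

def hvdm(x, y, attr_info):
    """attr_info[a] is either ('linear', sigma_a) or
    ('nominal', probs_a) with probs_a[value][cls] = P(cls | value)."""
    total = 0.0
    for a, info in enumerate(attr_info):
        kind = info[0]
        if x[a] is None or y[a] is None:        # unknown values get distance 1
            d = 1.0
        elif kind == 'linear':
            sigma = info[1]
            d = abs(x[a] - y[a]) / (4 * sigma)  # normalized linear difference
        else:
            probs = info[1]
            px, py = probs.get(x[a], {}), probs.get(y[a], {})
            classes = set(px) | set(py)
            d = math.sqrt(sum((px.get(c, 0.0) - py.get(c, 0.0)) ** 2
                              for c in classes))  # normalized VDM
        total += d * d
    return math.sqrt(total)
```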
IVDM estimates class probabilities for continuous values by interpolation: discretization is used during learning to compute probabilities at the midpoints of each discretized range, but at generalization time IVDM interpolates linearly between those midpoints, yielding a continuous approximation of the probability function. This recovers much of the information that hard discretization discards.
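A rough sketch of the interpolation step, assuming precomputed, sorted range midpoints `mids` and per-range class probabilities `probs`; the clamping at the extremes and all names here are illustrative assumptions.

```python
# Interpolated class probability for one continuous attribute (IVDM-style).
def interpolated_prob(x, mids, probs, c):
    """Linearly interpolate P(c | x) between the midpoints of the two
    discretized ranges that bracket x; probs[u][c] = P(c | range u)."""
    if x <= mids[0]:                  # clamp below the first midpoint
        return probs[0].get(c, 0.0)
    if x >= mids[-1]:                 # clamp above the last midpoint
        return probs[-1].get(c, 0.0)
    for u in range(len(mids) - 1):
        if mids[u] <= x < mids[u + 1]:
            frac = (x - mids[u]) / (mids[u + 1] - mids[u])
            p_lo = probs[u].get(c, 0.0)
            p_hi = probs[u + 1].get(c, 0.0)
            return p_lo + frac * (p_hi - p_lo)
    return probs[-1].get(c, 0.0)      # defensive fallback for float edge cases

def ivdm_distance(x, y, mids, probs, classes, q=2):
    # ivdm(x, y) = sum_c |p_c(x) - p_c(y)|^q
    return sum(abs(interpolated_prob(x, mids, probs, c)
                   - interpolated_prob(y, mids, probs, c)) ** q
               for c in classes)
```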
WVDM further improves upon IVDM by sampling the class probabilities at many more points, estimating them from a sliding window of training instances centered on each value rather than only at range midpoints. This yields a closer approximation to the true probability function and therefore more accurate distances between continuous values.
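A simplified sketch of the windowed estimate, assuming a fixed window width `w` and training data for one attribute as (value, class) pairs; this is a toy rendering of the idea, not the paper's exact procedure.

```python
# Windowed class-probability estimate (WVDM-style), illustrative only.
def windowed_prob(x, train_pairs, c, w):
    """Estimate P(c | x) from training instances whose attribute value
    falls within the window [x - w/2, x + w/2]."""
    in_window = [cls for v, cls in train_pairs if abs(v - x) <= w / 2]
    if not in_window:
        return 0.0
    return in_window.count(c) / len(in_window)

def wvdm_distance(x, y, train_pairs, classes, w, q=2):
    return sum(abs(windowed_prob(x, train_pairs, c, w)
                   - windowed_prob(y, train_pairs, c, w)) ** q
               for c in classes)
```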
Experiments on 48 datasets show that the new distance functions improve generalization accuracy over previous approaches. HVDM achieves over 3% higher average accuracy than the Euclidean and HOEM metrics on datasets with nominal attributes, and IVDM and WVDM also outperform a discretized version of VDM. These results suggest that the new distance functions are better suited to applications involving both nominal and continuous attributes.