This paper introduces three new heterogeneous distance functions—Heterogeneous Value Difference Metric (HVDM), Interpolated Value Difference Metric (IVDM), and Windowed Value Difference Metric (WVDM)—to handle both nominal and continuous attributes in instance-based learning. The Value Difference Metric (VDM) was previously used for nominal attributes but required discretization for continuous attributes, which could degrade generalization accuracy. The new functions address this by directly handling continuous attributes without discretization.
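For concreteness, here is a minimal Python sketch of VDM for a single nominal attribute; the (value, class) training representation and the function name are illustrative assumptions, not taken from the paper.

```python
# A minimal sketch of VDM for one nominal attribute, assuming the training
# set is given as (value, class) pairs; names are illustrative.
from collections import Counter, defaultdict

def vdm_distance(train_pairs, x, y, q=2):
    """Value Difference Metric between nominal values x and y:
    vdm(x, y) = sum_c |P(c | x) - P(c | y)|^q,
    with P(c | v) estimated from counts in train_pairs."""
    counts = defaultdict(Counter)   # counts[value][cls] = N_{a,v,c}
    totals = Counter()              # totals[value] = N_{a,v}
    for value, cls in train_pairs:
        counts[value][cls] += 1
        totals[value] += 1
    classes = {cls for _, cls in train_pairs}
    dist = 0.0
    for c in classes:
        p_x = counts[x][c] / totals[x] if totals[x] else 0.0
        p_y = counts[y][c] / totals[y] if totals[y] else 0.0
        dist += abs(p_x - p_y) ** q
    return dist
```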
HVDM combines a normalized linear difference for continuous (linear) attributes with a normalized VDM for nominal attributes, aggregating the per-attribute distances in Euclidean fashion. Its normalization schemes are designed so that both attribute types contribute comparably to the overall distance. Experiments show that HVDM achieves higher classification accuracy than previous distance functions on datasets with both nominal and continuous attributes.
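The following sketch illustrates this combination, assuming the needed statistics (a standard deviation for each linear attribute, conditional class probabilities for each nominal attribute) have been precomputed from training data; the `attr_info` layout is an assumption of this example, not the paper's code.

```python
# A hedged sketch of HVDM over mixed attribute vectors.
import math

def hvdm(x, y, attr_info):
    """attr_info[a] is either ('linear', sigma_a) or
    ('nominal', probs_a) with probs_a[value][cls] = P(cls | value)."""
    total = 0.0
    for a, info in enumerate(attr_info):
        kind = info[0]
        if x[a] is None or y[a] is None:        # unknown values get distance 1
            d = 1.0
        elif kind == 'linear':
            sigma = info[1]
            d = abs(x[a] - y[a]) / (4 * sigma)  # normalized linear difference
        else:
            probs = info[1]
            px, py = probs.get(x[a], {}), probs.get(y[a], {})
            classes = set(px) | set(py)
            d = math.sqrt(sum((px.get(c, 0.0) - py.get(c, 0.0)) ** 2
                              for c in classes))  # normalized VDM
        total += d * d
    return math.sqrt(total)
```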
IVDM estimates class probabilities for continuous values by interpolation: discretization is used during learning to compute probabilities at the midpoints of each discretized range, but at generalization time IVDM interpolates linearly between those midpoints, yielding a continuous approximation of the probability function. This recovers much of the information that hard discretization discards.
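A rough sketch of the interpolation step, assuming precomputed, sorted range midpoints `mids` and per-range class probabilities `probs`; the clamping at the extremes and all names here are illustrative assumptions.

```python
# Interpolated class probability for one continuous attribute (IVDM-style).
def interpolated_prob(x, mids, probs, c):
    """Linearly interpolate P(c | x) between the midpoints of the two
    discretized ranges that bracket x; probs[u][c] = P(c | range u)."""
    if x <= mids[0]:                  # clamp below the first midpoint
        return probs[0].get(c, 0.0)
    if x >= mids[-1]:                 # clamp above the last midpoint
        return probs[-1].get(c, 0.0)
    for u in range(len(mids) - 1):
        if mids[u] <= x < mids[u + 1]:
            frac = (x - mids[u]) / (mids[u + 1] - mids[u])
            p_lo = probs[u].get(c, 0.0)
            p_hi = probs[u + 1].get(c, 0.0)
            return p_lo + frac * (p_hi - p_lo)
    return probs[-1].get(c, 0.0)      # defensive fallback for float edge cases

def ivdm_distance(x, y, mids, probs, classes, q=2):
    # ivdm(x, y) = sum_c |p_c(x) - p_c(y)|^q
    return sum(abs(interpolated_prob(x, mids, probs, c)
                   - interpolated_prob(y, mids, probs, c)) ** q
               for c in classes)
```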
WVDM further improves upon IVDM by sampling the class probabilities at many more points, estimating them from a sliding window of training instances centered on each value rather than only at range midpoints. This yields a closer approximation to the true probability function and therefore more accurate distances between continuous values.
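A simplified sketch of the windowed estimate, assuming a fixed window width `w` and training data for one attribute as (value, class) pairs; this is a toy rendering of the idea, not the paper's exact procedure.

```python
# Windowed class-probability estimate (WVDM-style), illustrative only.
def windowed_prob(x, train_pairs, c, w):
    """Estimate P(c | x) from training instances whose attribute value
    falls within the window [x - w/2, x + w/2]."""
    in_window = [cls for v, cls in train_pairs if abs(v - x) <= w / 2]
    if not in_window:
        return 0.0
    return in_window.count(c) / len(in_window)

def wvdm_distance(x, y, train_pairs, classes, w, q=2):
    return sum(abs(windowed_prob(x, train_pairs, c, w)
                   - windowed_prob(y, train_pairs, c, w)) ** q
               for c in classes)
```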
Experiments on 48 datasets show that the new distance functions improve generalization accuracy over previous approaches. HVDM achieves over 3% higher average accuracy than the Euclidean and HOEM metrics on datasets with nominal attributes, and IVDM and WVDM also outperform a discretized version of VDM. These results suggest that the new distance functions are better suited to applications involving both nominal and continuous attributes.