Multicollinearity and misleading statistical results

December 2019 | Jong Hae Kim
Multicollinearity refers to a high degree of linear correlation among the explanatory variables in a multiple regression model, and it makes the regression results unreliable. The standard diagnostics are the variance inflation factor (VIF), the condition index, the condition number, and the variance decomposition proportions (VDPs).

The VIF of an explanatory variable is 1/(1-R²), where R² is the coefficient of determination from a regression with that variable as the response and the remaining explanatory variables as predictors. A VIF greater than 5-10 indicates multicollinearity.

Each condition index is the square root of the ratio of the largest eigenvalue to each eigenvalue of the correlation matrix of the standardized explanatory variables; the largest condition index is the condition number. A condition index above 10-30 suggests multicollinearity. The VDPs show how much of each coefficient's inflated variance is attributable to each condition index: when two or more VDPs associated with the same high condition index exceed 0.8-0.9, the corresponding variables are multicollinear.

Multicollinearity can be addressed by excluding variables, combining collinear variables into a single variable, replacing collinear variables with an equation relating them, or employing ridge regression. Removing multicollinear variables improves the stability of the regression model.

A numerical example, using data from a study on liver regeneration after liver transplantation, illustrates these diagnostics: several variables showed high VIFs and condition indices, indicating multicollinearity, and removing them stabilized the model, reducing the remaining VIFs and increasing the significance of the regression coefficients.

In summary, multicollinearity distorts regression results by inflating the variances of the coefficient estimates, which makes the resulting probability values and confidence intervals unreliable. VIFs and condition indices signal that multicollinearity is present, while VDPs identify the specific variables involved. Removing multicollinear variables improves model reliability.
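The diagnostics above can be computed directly from the data. A minimal sketch in Python with NumPy, using simulated data (the variable names and values are hypothetical, not from the study) in which one predictor is nearly the sum of the other two:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# hypothetical data: x3 is almost exactly x1 + x2, so the three are collinear
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + rng.normal(scale=0.05, size=n)
X = np.column_stack([x1, x2, x3])

# standardize the explanatory variables
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# VIF_j = 1 / (1 - R_j^2); equivalently, the j-th diagonal entry of the
# inverse of the correlation matrix of the explanatory variables
R = np.corrcoef(Z, rowvar=False)
vif = np.diag(np.linalg.inv(R))

# condition indices: sqrt(largest eigenvalue / each eigenvalue);
# the largest of these is the condition number
eig = np.linalg.eigvalsh(R)
cond_index = np.sqrt(eig.max() / eig)

# variance decomposition proportions via the SVD of Z:
# rows are variables, columns are singular components (the last column
# corresponds to the smallest singular value, i.e. the highest condition
# index); each row sums to 1
U, d, Vt = np.linalg.svd(Z, full_matrices=False)
phi = (Vt.T ** 2) / d ** 2
vdp = phi / phi.sum(axis=1, keepdims=True)

print("VIFs:", vif)                 # all three far above the 5-10 threshold
print("condition number:", cond_index.max())
print("VDPs, weakest component:", vdp[:, -1])
```

Here all three VIFs are large, and the VDPs of all three variables load on the component with the highest condition index, flagging x1, x2, and x3 as jointly multicollinear.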
However, excluding variables without a proper analysis can bias the results. Ridge regression is an alternative that handles multicollinearity while keeping all explanatory variables in the model. These points underscore the importance of detecting and addressing multicollinearity to ensure accurate regression analysis.
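As a sketch of the ridge-regression remedy, the closed-form ridge estimate (XᵀX + αI)⁻¹Xᵀy can be applied to simulated collinear data; the data and the penalty value α = 10 below are hypothetical choices for illustration, not values from the study:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # nearly collinear with x1
X = np.column_stack([np.ones(n), x1, x2])  # intercept plus two predictors
y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)

def ridge(X, y, alpha):
    """Closed-form ridge estimate: solve (X'X + alpha*I) b = X'y."""
    penalty = alpha * np.eye(X.shape[1])
    penalty[0, 0] = 0.0                    # leave the intercept unpenalized
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

beta_ols = ridge(X, y, 0.0)     # alpha = 0 gives ordinary least squares
beta_ridge = ridge(X, y, 10.0)  # penalized, more stable coefficients
```

Because x1 and x2 carry nearly the same information, least squares can split their combined effect between them almost arbitrarily, producing unstable coefficients with inflated variances; the ridge penalty keeps both variables in the model while pulling their coefficients toward a stable, nearly equal split.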