Variance Inflation Factor (VIF)

What Is a Variance Inflation Factor (VIF)?

A variance inflation factor (VIF) is a measure of multicollinearity in regression analysis. Multicollinearity occurs when independent variables in a multiple regression model are correlated with one another, which makes the estimated coefficients less reliable. The VIF estimates how much the variance of a regression coefficient is inflated by multicollinearity.

Key Takeaways

The VIF measures multicollinearity among independent variables in a multiple regression model.
Identifying multicollinearity is important because even though it does not reduce the explanatory power of the model, it diminishes the statistical significance of independent variables.
A high VIF for an independent variable indicates a collinear relationship with other variables, requiring consideration or adjustment in the model’s structure and variable selection.

Understanding the VIF

A VIF helps identify the degree of multicollinearity. Multiple regression is employed to test the impact of various variables on a specific outcome. The outcome, referred to as the dependent variable, is affected by independent variables, which serve as inputs for the model. Multicollinearity exists when there is a linear relationship or correlation between two or more independent variables.

The Problem of Multicollinearity

Multicollinearity creates issues in the multiple regression model, as the inputs influence one another. Consequently, the inputs are not truly independent, making it challenging to determine how the combination of independent variables affects the dependent variable within the model.

Although multicollinearity does not diminish the overall predictive power of a model, it can lead to regression coefficients that lack statistical significance. It can be seen as a form of double-counting within the model.

In statistical terms, a multiple regression model with high multicollinearity makes it challenging to estimate the relationship between each independent variable and the dependent variable. When two or more independent variables closely relate or measure similar aspects, the underlying effect they measure is accounted for multiple times across the variables. Consequently, it becomes difficult to determine which variable influences the dependent variable.

Even small changes in the data or model equation can lead to significant and volatile changes in the estimated coefficients of independent variables. This presents a problem since many econometric models aim to test this statistical relationship between independent and dependent variables.

Tests to Address Multicollinearity

To ensure proper model specification and functioning, tests for multicollinearity can be conducted. The VIF is one such tool. It measures how much the variance of an independent variable's estimated coefficient is inflated by that variable's correlation with the other independent variables.

The VIF provides a rapid means to evaluate how a variable contributes to the standard error in the regression. When significant multicollinearity exists, the VIF value for the involved variables will be high. Once these variables are identified, various approaches can be employed to eliminate or combine collinear variables, thereby resolving the multicollinearity issue.
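The link to the standard error can be made concrete with the standard ordinary least squares result below, offered as a sketch rather than a full derivation. The symbols are introduced here for illustration and assume a model with an intercept: $\hat{\beta}_i$ is the estimated coefficient on the ith independent variable, $\sigma^2$ is the error variance, $x_{ki}$ is the kth observation of that variable, and $R_i^2$ is the R-squared from regressing it on the other independent variables.

```latex
\operatorname{Var}(\hat{\beta}_i)
  = \frac{\sigma^2}{\sum_{k=1}^{n} (x_{ki} - \bar{x}_i)^2} \cdot \frac{1}{1 - R_i^2}
  = \frac{\sigma^2}{\sum_{k=1}^{n} (x_{ki} - \bar{x}_i)^2} \cdot \mathrm{VIF}_i
```

Because the standard error is the square root of the variance, it grows by a factor of $\sqrt{\mathrm{VIF}_i}$. A VIF of 4, for instance, doubles the standard error relative to the case with no collinearity.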

Formula and Calculation of VIF

The formula for VIF is:

$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2}$$

where $R_i^2$ is the unadjusted coefficient of determination from regressing the ith independent variable on the remaining ones.
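The calculation can be sketched in Python with NumPy as follows. This is a minimal illustration, not a reference implementation: the function name `variance_inflation_factors` and the synthetic data are assumptions made for the example, and the auxiliary regressions include an intercept.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return one VIF per column of X (rows are observations, columns are independent variables)."""
    X = np.asarray(X, dtype=float)
    n_obs, n_vars = X.shape
    vifs = []
    for i in range(n_vars):
        target = X[:, i]
        others = np.delete(X, i, axis=1)
        # Auxiliary regression: the ith variable on the remaining ones, plus an intercept.
        design = np.column_stack([np.ones(n_obs), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        ss_res = float(residuals @ residuals)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        r_squared = 1.0 - ss_res / ss_tot  # unadjusted R-squared
        vifs.append(1.0 / (1.0 - r_squared))
    return vifs


# Synthetic illustration: x2 is built to track x1 closely, x3 is unrelated noise.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)
vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print([round(v, 1) for v in vifs])
```

In this setup the first two variables should show large VIFs, while the third should sit close to 1. Perfectly collinear columns would make the denominator zero, so real code would need to handle that case.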

What Can VIF Tell You?

When $R_i^2$ is equal to 0, the VIF and the tolerance ($1 - R_i^2$, the reciprocal of the VIF) both equal 1. In that case, the ith independent variable is not correlated with the remaining ones, indicating the absence of multicollinearity for that variable.
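The arithmetic relating R-squared, tolerance, and VIF is simple enough to check directly. The short sketch below uses hypothetical helper names (`tolerance`, `vif_from_r_squared`) and a few illustrative R-squared values chosen for this example.

```python
def tolerance(r_squared):
    """Tolerance is the share of a variable's variance not explained by the other variables."""
    return 1.0 - r_squared


def vif_from_r_squared(r_squared):
    """VIF is the reciprocal of tolerance."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R-squared must be in [0, 1) for the VIF to be finite")
    return 1.0 / tolerance(r_squared)


# Illustrative R-squared values from hypothetical auxiliary regressions.
for r2 in (0.0, 0.5, 0.8, 0.9):
    print(f"R^2 = {r2:.1f}  tolerance = {tolerance(r2):.2f}  VIF = {vif_from_r_squared(r2):.2f}")
```

An R-squared of 0.8 corresponds to a VIF of about 5, and an R-squared of 0.9 to a VIF of about 10, which is how the usual cutoffs translate back into correlation among the independent variables.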

In general terms:

VIF equal to 1 = variables are not correlated
VIF between 1 and 5 = variables are moderately correlated
VIF greater than 5 = variables are highly correlated

A higher VIF suggests a greater likelihood of multicollinearity, requiring further investigation. When VIF exceeds 10, significant multicollinearity needs correction.
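These rules of thumb can be written as a small lookup, sketched below. The function name `interpret_vif` is hypothetical, and treating a VIF of exactly 5 as "moderately correlated" and exactly 10 as not yet requiring correction are assumptions, since the text does not say which side those boundary values fall on.

```python
import math


def interpret_vif(vif):
    """Map a VIF value to the rule-of-thumb labels listed above."""
    if vif < 1.0 and not math.isclose(vif, 1.0):
        raise ValueError("a VIF cannot be below 1")
    if math.isclose(vif, 1.0):
        return "not correlated"
    if vif <= 5.0:  # boundary handling is an assumption
        return "moderately correlated"
    if vif <= 10.0:  # boundary handling is an assumption
        return "highly correlated"
    return "highly correlated, correction needed"


for value in (1.0, 3.2, 7.5, 12.0):
    print(value, "->", interpret_vif(value))
```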

Example of Using VIF

Suppose an economist aims to test the statistical relationship between the unemployment rate (independent variable) and the inflation rate (dependent variable). Including additional independent variables related to the unemployment rate, such as new initial jobless claims, could introduce multicollinearity into the model.

The overall model might show strong explanatory power, yet fail to determine whether the effect is largely due to the unemployment rate or new initial jobless claims. This is where the VIF can help identify the issue. A high value suggests either removing one of the variables or consolidating them to capture their joint effect, depending on the specific hypothesis being tested.
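The scenario can be illustrated with a few lines of Python. The figures below are made up for this sketch and are not real economic data. With only two independent variables, the R-squared of the auxiliary regression equals the squared Pearson correlation between them, so both variables share the same VIF.

```python
import math

# Hypothetical values, invented for illustration only.
unemployment_rate = [3.6, 3.8, 4.1, 4.5, 5.0, 5.6, 6.1, 6.4]   # percent
jobless_claims = [210, 218, 235, 250, 282, 310, 330, 352]      # thousands


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(unemployment_rate, jobless_claims)
r_squared = r ** 2                 # equals the auxiliary-regression R-squared with two predictors
vif = 1.0 / (1.0 - r_squared)
print(f"r = {r:.3f}, R^2 = {r_squared:.3f}, VIF = {vif:.0f}")
```

Because the two made-up series move almost in lockstep, the resulting VIF is far above the usual cutoffs, which signals that the model could not separate their individual effects.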

What Is a Good VIF Value?

As a rule of thumb, a VIF of three or below is not a cause for concern. As the VIF rises above that level, the regression results become less reliable.

What Does a VIF of 1 Mean?

A VIF of one indicates that an independent variable is not correlated with the other independent variables, so multicollinearity does not affect its coefficient in the regression model.

What Is VIF Used For?

VIF measures the strength of correlation between independent variables in regression analysis. This correlation, known as multicollinearity, can pose problems for regression models.

The Bottom Line

While a moderate level of multicollinearity is acceptable in a regression model, significant multicollinearity should raise concerns.

Various measures can be taken to address high multicollinearity. First, one or more highly correlated variables can be removed since their information is redundant. Second, principal components analysis or partial least squares regression can be used instead of ordinary least squares regression. These methods either reduce the variables to a smaller set of uncorrelated components or create new uncorrelated variables, which can improve the model's predictions.
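The principal components idea can be sketched with NumPy as follows. This is an illustrative example on synthetic data, not a full modeling workflow: the variable names are assumptions, and a real analysis would go on to regress the dependent variable on the retained components.

```python
import numpy as np

# Synthetic data: x2 is constructed to be nearly a copy of x1.
rng = np.random.default_rng(1)
x1 = rng.normal(size=300)
x2 = x1 + 0.1 * rng.normal(size=300)
X = np.column_stack([x1, x2])

# Standardize each column, then eigendecompose the covariance matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
cov = np.cov(Z, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Sort components from largest to smallest variance.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

scores = Z @ eigenvectors                   # principal component scores
explained = eigenvalues / eigenvalues.sum() # share of variance per component
original_corr = np.corrcoef(X, rowvar=False)[0, 1]
score_corr = np.corrcoef(scores, rowvar=False)[0, 1]

print("correlation between original variables:", round(float(original_corr), 3))
print("correlation between component scores:", round(float(score_corr), 6))
print("share of variance in first component:", round(float(explained[0]), 3))
```

The component scores are uncorrelated by construction, and here the first component carries nearly all of the variance, so keeping it alone would retain most of the information in the two collinear inputs.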