In an attempt to construct an Engel curve for alcohol consumption, I collected state-level, cross-sectional data for the United States in 2007 on income (INCTH) and alcohol consumption (ALC). I chose 2007 because unemployment was at a trough that year, and I expected it to capture a snapshot of the economy at elevated consumption levels, before the spike in unemployment and the economic downturn. The income data are real median income by state, in 2014 dollars, from the US Census Bureau. I divided income by 1,000 in SAS so that INCTH is measured in thousands of dollars, which makes the regression coefficients easier to interpret. Alcohol consumption is measured in gallons of ethanol per capita per year and comes from a 2011 study by the National Institute on Alcohol Abuse and Alcoholism. The Engel curve for this bivariate regression is shown below, along with the corresponding equation.
$$\mathrm{ALC}_i = \beta_0 + \beta_1\,\mathrm{INCTH}_i + u_i$$
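For readers who want to see what the bivariate estimate involves, the OLS slope has a closed form: the sample covariance of income and alcohol consumption divided by the sample variance of income. The sketch below is a minimal Python version (the paper's estimates come from SAS); the function name `ols_bivariate` and the five-state data values are hypothetical, made up for illustration rather than taken from the Census or NIAAA data. It also checks the rescaling described above: dividing income by 1,000 multiplies the slope by 1,000 and leaves the intercept unchanged.

```python
from statistics import fmean

def ols_bivariate(x, y):
    """Closed-form OLS for y = b0 + b1*x + u; returns (b0, b1)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    x_bar, y_bar = fmean(x), fmean(y)
    # b1 = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation; slope is undefined")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar  # the fitted line passes through (x_bar, y_bar)
    return b0, b1

# Hypothetical values for five states, made up for illustration only.
income_dollars = [48_000, 52_000, 55_000, 61_000, 67_000]
alc_gallons = [2.0, 2.3, 2.2, 2.6, 2.7]
income_thousands = [v / 1_000 for v in income_dollars]  # analogous to INCTH

b0_d, b1_d = ols_bivariate(income_dollars, alc_gallons)
b0_k, b1_k = ols_bivariate(income_thousands, alc_gallons)
print(f"slope per $1:     {b1_d:.8f}")
print(f"slope per $1,000: {b1_k:.5f}")  # equals 1,000 x the slope per $1
print(f"intercepts: {b0_d:.5f} vs {b0_k:.5f}")  # unchanged by rescaling
```

Rescaling the regressor changes only the units of the slope, not the fit, so R² and t-statistics are identical in both versions.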
The Engel curve is upward sloping, suggesting alcohol is a normal good. This bivariate regression resulted in the following output:
The regression shows that each $1,000 increase in income is associated with an increase in alcohol consumption of 0.02135 gallons of ethanol per capita per year. To express this effect as a percentage change, I took the natural log of alcohol consumption, which produced the following model and output:
$$\ln \mathrm{ALC}_i = \beta_0 + \beta_1\,\mathrm{INCTH}_i + u_i$$
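In a log-linear model, a change of Δx in INCTH changes ln ALC by β1·Δx, so ALC is multiplied by exp(β1·Δx) (holding u fixed). The familiar reading of 100 × β1 as a percentage change is a first-order approximation to that. The sketch below compares the two; it assumes the 0.81% figure reported for this model is 100 × β̂1, i.e. β̂1 = 0.0081, and the function names are hypothetical.

```python
import math

def pct_change_exact(b1, delta_x=1.0):
    """Exact % change in y implied by ln(y) = b0 + b1*x + u when x rises by delta_x."""
    # ln(y1) - ln(y0) = b1 * delta_x  =>  y1 / y0 = exp(b1 * delta_x)
    return 100.0 * math.expm1(b1 * delta_x)

def pct_change_approx(b1, delta_x=1.0):
    """Usual first-order approximation: 100 * b1 * delta_x."""
    return 100.0 * b1 * delta_x

# Assumption: the reported 0.81% corresponds to b1_hat = 0.0081.
b1_hat = 0.0081
for dx in (1, 10):  # INCTH is in $1,000s, so dx = 10 is a $10,000 increase
    print(f"+${dx},000: approx {pct_change_approx(b1_hat, dx):.3f}%, "
          f"exact {pct_change_exact(b1_hat, dx):.3f}%")
```

For a single $1,000 step the two readings agree to two decimal places; the gap only becomes noticeable for larger income differences, such as $10,000.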
This log-linear model suggests that a $1,000 increase in income is associated with a roughly 0.81% increase in alcohol consumption, since 100 × β1 approximates the percentage change in ALC for a one-unit change in INCTH. Disappointingly, both models yielded a low R², and the log-linear model left just under 87% of the variation in alcohol consumption unexplained. However, in both models income was significant at the 5% level. So income appears to be one determinant of alcohol consumption, but other factors clearly help determine it as well. A low R² does not by itself imply bias; however, if any of these omitted factors are also correlated with income, the estimated income coefficient suffers from omitted variable bias. To address this, I looked for other independent variables that might help account for alcohol consumption. I created a dummy variable, ABC, for whether a state has some form of government-controlled ABC (alcoholic beverage control) store, coded 0 if the state has an ABC presence and 1 if it does not. I also included the state unemployment rate, UNEMP, on the hypothesis that states with higher unemployment may have higher alcohol consumption. The new model is shown below.
$$\ln \mathrm{ALC}_i = \beta_0 + \beta_1\,\mathrm{INCTH}_i + \beta_2\,\mathrm{ABC}_i + \beta_3\,\mathrm{UNEMP}_i + u_i$$
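With several regressors, OLS solves the normal equations (X′X)b = X′y, where X has a column of ones, the INCTH column, the 0/1 ABC column, and the UNEMP column. The sketch below is a dependency-free Python version for illustration only (the paper's estimates come from SAS, and a package such as SAS PROC REG also reports standard errors and p-values, which this omits). The function names and six-state data are hypothetical. It also computes adjusted R², 1 − (1 − R²)(n − 1)/(n − k − 1), which penalizes each added regressor and can therefore fall when a variable contributes little.

```python
def ols(columns, y):
    """OLS with an intercept via the normal equations (X'X) b = X'y.

    columns: list of regressor columns, each a list of length n.
    Returns [b0, b1, ..., bk].
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    A = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    c = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    # Gaussian elimination with partial pivoting (rough absolute singularity check).
    for j in range(k):
        piv = max(range(j, k), key=lambda r: abs(A[r][j]))
        if abs(A[piv][j]) < 1e-12:
            raise ValueError("regressors are perfectly collinear")
        A[j], A[piv] = A[piv], A[j]
        c[j], c[piv] = c[piv], c[j]
        for r in range(j + 1, k):
            f = A[r][j] / A[j][j]
            for q in range(j, k):
                A[r][q] -= f * A[j][q]
            c[r] -= f * c[j]
    # Back substitution on the upper-triangular system.
    b = [0.0] * k
    for p in reversed(range(k)):
        b[p] = (c[p] - sum(A[p][q] * b[q] for q in range(p + 1, k))) / A[p][p]
    return b

def r_squared(columns, y, b):
    """Share of the variation in y explained by the fitted values."""
    n = len(y)
    fitted = [b[0] + sum(bj * col[i] for bj, col in zip(b[1:], columns))
              for i in range(n)]
    y_bar = sum(y) / n
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ssr / sst

def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k slope regressors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Hypothetical six-state data, made up for illustration (not the paper's sample).
# ABC follows the paper's coding: 1 = no government-controlled ABC presence.
inc = [40.0, 45.0, 50.0, 55.0, 60.0, 65.0]   # INCTH, in $1,000s
abc = [0, 1, 0, 1, 1, 0]
unemp = [4.0, 5.5, 3.8, 6.1, 4.9, 5.2]       # unemployment rate, percent
noise = [0.02, -0.03, 0.01, 0.0, -0.02, 0.02]
ln_alc = [0.1 + 0.008 * a + 0.05 * d - 0.01 * u + e
          for a, d, u, e in zip(inc, abc, unemp, noise)]

cols = [inc, abc, unemp]
coefs = ols(cols, ln_alc)
r2 = r_squared(cols, ln_alc, coefs)
print("b0..b3:", [round(v, 4) for v in coefs])
print(f"R^2 = {r2:.4f}, adjusted R^2 = "
      f"{adjusted_r_squared(r2, len(ln_alc), len(cols)):.4f}")
```

Solving the normal equations directly is fine for a handful of regressors like these; with many or nearly collinear regressors, a QR-based solver is numerically safer.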
Unfortunately, adding these two variables barely improved the fit: the new model explains only 14.47% of the variation in alcohol consumption (R² = 0.1447), and neither new variable is significant at any conventional level. Because R² can never fall when regressors are added, even this small gain overstates the improvement. Income remained significant across all three models. The output of the regression above is shown below:
To better understand the variation in alcohol consumption, we clearly need independent variables beyond income. One possible shortcoming of this analysis is that it uses only a single year (2007) of data. A different approach would be a panel data study covering a longer period, keeping ABC and UNEMP as independent variables while also controlling for state-level measures such as the high school graduation rate, family structure, or population density.