My Engel curve examines the relationship between median household income (by state) and per capita consumption spending on motor vehicles and parts. The median household income by state is taken from a 2010 report by the Census Bureau, and the consumption data were taken from the FRED database, state by state, in 2010 dollars.

Model: Initial Regression

Analysis of Variance

                            Sum of          Mean
Source            DF       Squares        Square    F Value    Pr > F
Model              1     221858259     221858259      11.79    0.0012
Error             49     921998248      18816291
Corrected Total   50    1143856507

Root MSE          4337.77486    R-Square    0.1940
Dependent Mean         40271    Adj R-Sq    0.1775
Coeff Var           10.77149

Parameter Estimates

                               Parameter      Standard
Variable    Label         DF    Estimate         Error    t Value    Pr > |t|
Intercept   Intercept      1       27843    3669.93928       7.59      <.0001
Expense     Expense        1    10.61806       3.09225       3.43      0.0012
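As a quick sanity check on this output, the summary statistics can be recomputed from the reported sums of squares. This is only a sketch using the figures printed above; the symbols SS and MS are shorthand introduced here for the sums of squares and mean squares. With a single regressor, the F statistic should equal the square of the slope's t statistic, which it does up to rounding.

```latex
\begin{align*}
R^2 &= \frac{SS_{\text{Model}}}{SS_{\text{Total}}}
     = \frac{221858259}{1143856507} \approx 0.1940 \\[4pt]
F   &= \frac{MS_{\text{Model}}}{MS_{\text{Error}}}
     = \frac{221858259}{18816291} \approx 11.79 \\[4pt]
t^2 &= \left(\frac{10.61806}{3.09225}\right)^2
     \approx (3.434)^2 \approx 11.79 = F
\end{align*}
```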

While at first glance this regression appears to be statistically significant, it is important to check whether any unaccounted-for dynamic could be throwing the model off. To check for this, I ran a White test for heteroskedasticity, to make sure the variance of the residuals is similar across the whole range of X values.

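    /* Fit the original model and save the residuals */
    proc reg;
      model income = expense;
      output out=resids residual=e;
    run;

    /* Build the squared residual and the squared term */
    data white;
      set resids;
      title 'White Test';
      e2 = e**2;
      income2 = income**2;
    run;

    /* Auxiliary regression: n times its R-square is the test statistic.
       Note: White's test conventionally uses the model's regressor here
       (expense and expense**2); this run uses income and income2. */
    proc reg data=white;
      model e2 = income income2;
    run;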

Running this code gives a new regression, with an R^2 value of 0.0267.

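This value (0.0267) is then multiplied by n, the sample size (51), giving a test statistic of 1.3617.

Under the null hypothesis of homoskedasticity, this statistic follows a chi-square distribution with 2 degrees of freedom, whose 5% critical value is 5.99. Since 1.3617 is well below that, we cannot reject the null hypothesis of the White test, suggesting that heteroskedasticity may not be a problem.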
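The decision above can be reproduced with a short DATA step. This is a minimal sketch, not part of the original run: the variable names (n, r2, df, stat, pval, crit) are made up for illustration, and the inputs are simply the values reported in the text. It relies on the built-in PROBCHI (chi-square CDF) and CINV (chi-square quantile) functions.

```sas
/* Sketch: White test decision from the auxiliary-regression R-square.
   n and r2 are the values reported in the text; df = 2 because the
   auxiliary regression has two regressors. */
data _null_;
  n    = 51;
  r2   = 0.0267;
  df   = 2;
  stat = n * r2;                  /* n * R-square test statistic      */
  pval = 1 - probchi(stat, df);   /* upper-tail chi-square probability */
  crit = cinv(0.95, df);          /* 5% critical value                 */
  put stat= pval= crit=;          /* results are written to the log    */
run;
```

The p-value should come out at roughly 0.5, far above any conventional significance level, which agrees with the failure to reject.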

That said, I ran the regression again after taking the log of both variables, because we were supposed to transform the data to try for a better fit. The output is below, but even with both sides logged the R^2 value barely increases, from 0.1940 to 0.2002.

Model: Double Log

Dependent Variable: lnincome

Analysis of Variance

                            Sum of          Mean
Source            DF       Squares        Square    F Value    Pr > F
Model              1       0.13141       0.13141      12.26    0.0010
Error             49       0.52504       0.01072
Corrected Total   50       0.65645

Root MSE             0.10351    R-Square    0.2002
Dependent Mean      10.59681    Adj R-Sq    0.1839
Coeff Var            0.97684

Parameter Estimates

                   Parameter      Standard
Variable    DF      Estimate         Error    t Value    Pr > |t|
Intercept    1       8.38966       0.63042      13.31      <.0001
lnexpense    1       0.31299       0.08937       3.50      0.0010
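One way to read the double-log estimates is as an elasticity, since the slope in a log-log regression measures the percentage change in the dependent variable associated with a one percent change in the regressor. The worked example below is a sketch based only on the coefficients reported above; the 10% change is an arbitrary illustration, and the relationship is an association rather than a causal effect.

```latex
\begin{align*}
\ln(\text{income}) &= 8.38966 + 0.31299\,\ln(\text{expense}) \\[4pt]
\frac{d\ln(\text{income})}{d\ln(\text{expense})} &= 0.31299 \\[4pt]
\text{10\% higher expense:}\quad
1.10^{\,0.31299} - 1 &\approx 0.030 \quad (\text{about } 3\% \text{ higher income})
\end{align*}
```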

While the t and F values appear to suggest that this Engel curve has a statistically significant linear relationship, the R^2 is only about 0.20 in both specifications, so I would suggest there may be omitted variables throwing the model off.