My Engel curve examines the relationship between median household income (by state) and per capita consumption spending on motor vehicles and parts. Median household income by state is taken from a 2010 Census Bureau report, and the consumption data were collected state by state from the FRED database, in 2010 dollars.

 

Model: Initial Regression

 

Analysis of Variance

                               Sum of           Mean
Source              DF        Squares         Square    F Value    Pr > F
Model                1      221858259      221858259      11.79    0.0012
Error               49      921998248       18816291
Corrected Total     50     1143856507

Root MSE           4337.77486    R-Square     0.1940
Dependent Mean          40271    Adj R-Sq     0.1775
Coeff Var            10.77149

Parameter Estimates

                                 Parameter      Standard
Variable     Label        DF      Estimate         Error   t Value   Pr > |t|
Intercept    Intercept     1         27843    3669.93928      7.59     <.0001
Expense      Expense       1      10.61806       3.09225      3.43     0.0012

While at first glance this regression appears to be statistically significant, it is important to check whether a violated assumption could be throwing the model off. To check for this, I ran a White test for heteroskedasticity, which tests whether the variance of the residuals is constant across the range of X values.

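/* Step 1: fit the original model and save the residuals as e */
proc reg;
  model income = expense;
  output out=resids residual=e;
run;

/* Step 2: build the squared terms for the auxiliary regression */
data white;
  set resids;
  e2 = e**2;
  income2 = income**2;
run;

/* Step 3: auxiliary regression of e2 on income and income2.      */
/* n times its R-square is the test statistic. (The textbook      */
/* White test uses the regressor, expense, and its square here.)  */
title 'White Test';
proc reg data=white;
  model e2 = income income2;
run;
quit;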

 

Running this code gives a new regression, with an R^2 value of 0.0267.

This value (0.0267) is then multiplied by the sample size n (51), giving a test statistic of 1.3617.

With 2 degrees of freedom, 1.3617 is well below the 5% chi-square critical value of 5.99, so we cannot reject the White test's null hypothesis of homoskedasticity. This suggests that heteroskedasticity is not a problem.
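The comparison against the chi-square distribution can be sketched in SAS as follows. This is only an illustration: the data step plugs in the n and R^2 values reported above, and the second step assumes the resids data set created earlier is still available. The SPEC option on the MODEL statement asks PROC REG for White's test directly.

```sas
/* Sketch: turn the n * R-square statistic into a decision.        */
/* n and r2 are the values reported in the text, not recomputed.   */
data _null_;
  n    = 51;                    /* sample size                           */
  r2   = 0.0267;                /* R-square of the auxiliary regression  */
  df   = 2;                     /* regressors in the auxiliary model     */
  lm   = n * r2;                /* test statistic                        */
  crit = cinv(0.95, df);        /* 5% chi-square critical value          */
  p    = 1 - probchi(lm, df);   /* upper-tail p-value                    */
  put lm= crit= p=;             /* written to the SAS log                */
run;

/* Cross-check (assumes resids still holds income and expense):    */
/* SPEC requests White's specification test from PROC REG itself.  */
proc reg data=resids;
  model income = expense / spec;
run;
quit;
```

The data step should report a statistic below the critical value of about 5.99, with a p-value of roughly 0.51. The SPEC test builds its statistic from expense and its square, so its value need not equal the hand-computed 1.3617.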

That said, I reran the regression after taking the log of both variables, since the assignment called for transforming the data to try for a better fit. The output is below, but even with both sides logged the R^2 value barely increases (from 0.1940 to 0.2002).

Model: Double Log

Dependent Variable: lnincome

Analysis of Variance

                               Sum of           Mean
Source              DF        Squares         Square    F Value    Pr > F
Model                1        0.13141        0.13141      12.26    0.0010
Error               49        0.52504        0.01072
Corrected Total     50        0.65645

Root MSE              0.10351    R-Square     0.2002
Dependent Mean       10.59681    Adj R-Sq     0.1839
Coeff Var             0.97684

Parameter Estimates

                      Parameter       Standard
Variable     DF        Estimate          Error    t Value    Pr > |t|
Intercept     1         8.38966        0.63042      13.31      <.0001
lnexpense     1         0.31299        0.08937       3.50      0.0010
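For reference, the double-log regression above could be produced with a short data step followed by PROC REG. This is a sketch under stated assumptions: the input data set name states is hypothetical (the original code does not name its data set), while lnincome and lnexpense match the variable names in the output.

```sas
/* Sketch: log-transform both variables, then refit the model.     */
/* "states" is a hypothetical name for the original data set.      */
data logged;
  set states;
  lnincome  = log(income);      /* natural log of median income    */
  lnexpense = log(expense);     /* natural log of vehicle spending */
run;

title 'Double Log';
proc reg data=logged;
  model lnincome = lnexpense;
run;
quit;
```

In a double-log model the slope can be read as an elasticity, so the estimate of 0.31299 would mean that a 1% increase in expense is associated with roughly a 0.31% increase in income.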

While the t and F values suggest a statistically significant linear relationship for this Engel curve, the low R^2 (about 0.20) suggests that omitted variables may be throwing the model off.

Engel Curve: Median Income and Spending on Motor Vehicles and Parts (by State)