My Engel curve examines the relationship between median household income by state and per capita consumption spending on motor vehicles and parts. Median household income by state is taken from a 2010 Census Bureau report, and the consumption data were taken from the FRED database, state by state, in 2010 dollars.
Model: Initial Regression
Analysis of Variance

                             Sum of          Mean
Source             DF       Squares        Square   F Value   Pr > F
Model               1     221858259     221858259     11.79   0.0012
Error              49     921998248      18816291
Corrected Total    50    1143856507

Root MSE          4337.77486    R-Square    0.1940
Dependent Mean         40271    Adj R-Sq    0.1775
Coeff Var           10.77149
Parameter Estimates

                              Parameter      Standard
Variable    Label       DF     Estimate         Error   t Value   Pr > |t|
Intercept   Intercept    1        27843    3669.93928      7.59     <.0001
Expense     Expense      1     10.61806       3.09225      3.43     0.0012
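As a quick consistency check on the output above, the summary statistics can be recomputed from the sums of squares and the slope estimate. This is only a sketch in Python (not the SAS used for the analysis), with the numbers copied by hand from the tables; the variable names are my own.

```python
import math

# Values copied from the SAS output above (variable names are illustrative).
ss_model = 221858259
ss_error = 921998248
df_model, df_error = 1, 49
slope, slope_se = 10.61806, 3.09225

ss_total = ss_model + ss_error            # corrected total sum of squares
r_square = ss_model / ss_total            # share of variation explained
ms_error = ss_error / df_error            # mean square error
f_value = (ss_model / df_model) / ms_error
root_mse = math.sqrt(ms_error)
t_value = slope / slope_se                # t statistic for the slope

print(f"R-Square = {r_square:.4f}")
print(f"F Value  = {f_value:.2f}")
print(f"Root MSE = {root_mse:.2f}")
print(f"t Value  = {t_value:.2f} (t squared = {t_value ** 2:.2f})")
```

With a single regressor the F statistic equals the square of the slope's t statistic, which is why both carry the same p-value of 0.0012.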
While at first glance this regression appears to be statistically significant, it is important to check whether an unaccounted-for dynamic could be throwing the model off. To check for this, I ran a White test for heteroskedasticity, to see whether the variance of the residuals is roughly constant across the range of X values.
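proc reg;
  model income = expense;
  output out=resids residual=e;   /* save the residuals as e */
run;

data white;
  set resids;
  e2 = e**2;                      /* squared residuals */
  income2 = income**2;            /* squared income */
run;

title 'White Test';
proc reg data=white;
  /* auxiliary regression as I ran it; note that the textbook White test
     uses the model's regressor (expense) and its square on the right-hand side */
  model e2 = income income2;
run;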
Running this code gives a new regression, with an R^2 value of 0.0267.
This value (0.0267) is then multiplied by the sample size n (51), giving a test statistic of 1.3617.
With 2 degrees of freedom, 1.3617 is well below the 5% chi-square critical value of 5.99, so we cannot reject the White test's null hypothesis of homoskedasticity, suggesting that heteroskedasticity may not be a problem.
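The arithmetic behind this decision can be sketched as follows. This is a Python illustration rather than part of the SAS run, and the helper names are my own. It relies on the fact that, for exactly 2 degrees of freedom, the chi-square upper-tail probability has the closed form exp(-x/2), so no statistics library is needed.

```python
import math

r_square_aux = 0.0267   # R^2 of the auxiliary regression, from the text
n = 51                  # sample size, from the text

test_stat = n * r_square_aux   # White test statistic n * R^2


def chi2_sf_2df(x):
    """Upper-tail probability of a chi-square with 2 degrees of freedom."""
    return math.exp(-x / 2)


def chi2_critical_2df(alpha):
    """Critical value of a chi-square with 2 degrees of freedom at level alpha."""
    return -2 * math.log(alpha)


alpha = 0.05                        # assumed significance level
critical = chi2_critical_2df(alpha)
p_value = chi2_sf_2df(test_stat)
reject = test_stat > critical

print(f"test statistic = {test_stat:.4f}")
print(f"critical value = {critical:.4f}")
print(f"p-value        = {p_value:.4f}")
print("reject homoskedasticity" if reject else "fail to reject homoskedasticity")
```

The 5% level is an assumption on my part; any conventional level gives the same conclusion here because the p-value is roughly one half.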
That said, I ran the regression again after taking the log of both variables, because we were supposed to transform the data to try for a better fit. The output is below, but even after logging each side the R^2 value barely increases (from 0.1940 to 0.2002).
Model: Double Log
Dependent Variable: lnincome
Analysis of Variance

                             Sum of          Mean
Source             DF       Squares        Square   F Value   Pr > F
Model               1       0.13141       0.13141     12.26   0.0010
Error              49       0.52504       0.01072
Corrected Total    50       0.65645

Root MSE             0.10351    R-Square    0.2002
Dependent Mean      10.59681    Adj R-Sq    0.1839
Coeff Var            0.97684
Parameter Estimates

                    Parameter      Standard
Variable     DF      Estimate         Error   t Value   Pr > |t|
Intercept     1       8.38966       0.63042     13.31     <.0001
lnexpense     1       0.31299       0.08937      3.50     0.0010
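In a double-log model the slope can be read as an elasticity: here, a 1% higher expense is associated with roughly a 0.31% higher income in the fitted model. The sketch below (Python, with a helper name of my own choosing) turns the reported coefficient into implied percentage changes; it illustrates the interpretation and is not additional estimation.

```python
# lnexpense coefficient copied from the SAS output above
elasticity = 0.31299


def pct_change_income(pct_change_expense):
    """Implied % change in fitted income for a given % change in expense."""
    ratio = 1 + pct_change_expense / 100
    return (ratio ** elasticity - 1) * 100


for pct in (1, 10):
    print(f"{pct}% higher expense -> {pct_change_income(pct):.2f}% higher fitted income")
```

For small changes the implied effect is close to the coefficient itself; for larger changes the exact power formula above is slightly smaller than the linear approximation.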
While the t and F values suggest that this Engel curve has a statistically significant linear relationship, the low R^2 (about 0.20 in both specifications) suggests that there may be omitted variables throwing the model off.