__Engel Curve Assignment__

For this assignment I was interested in how aggregate food expenditures vary with aggregate disposable income. From the website of the United States Department of Agriculture (USDA) I obtained annual aggregate food expenditures, in 1988 dollars, for 1953-2014. Also from the USDA, I obtained a dataset on the annual per capita available food supply, adjusted for loss and measured in kilocalories, for 1953-2014. From the FRED database of the Federal Reserve Bank of St. Louis I obtained the annual U.S. civilian noninstitutional population for 1953-2014. Also from FRED, I obtained seasonally adjusted real disposable personal income and the personal savings rate as a percentage of disposable income, both for 1953-2014 and both aggregated to annual frequency using end-of-period values.

My initial OLS model was as follows:

*CPEXP = INTERCEPT + RDI + POP + SUPPLY + SAVINGR*

Here *CPEXP* is annual aggregate food expenditures in millions of 1988 dollars, *RDI* is annual seasonally adjusted real disposable income in billions of chained 2009 dollars, *POP* is the annual civilian noninstitutional population of the U.S. in thousands of persons, *SUPPLY* is the loss-adjusted annual supply of kilocalories available per capita in the U.S., and *SAVINGR* is annual personal savings as a percentage of personal disposable income. This model yielded the following SAS output:
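To make the structure of this regression concrete, the following is a minimal sketch of an OLS fit with an intercept and four regressors, solved through the normal equations. It is not the SAS code used for this assignment. The function names and the toy data are assumptions made purely for illustration, and the numbers are invented, not the USDA or FRED series.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] on copies of the inputs.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest pivot into place for stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols_fit(rows, y):
    """Return [intercept, b1, ..., bk] minimizing the sum of squared residuals.

    rows: one list of regressor values per observation (no intercept column).
    """
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical toy data: columns play the roles of RDI, POP, SUPPLY, SAVINGR.
TOY_ROWS = [
    [1, 2, 0, 1],
    [2, 1, 1, 0],
    [3, 5, 2, 2],
    [4, 3, 1, 5],
    [5, 8, 3, 1],
    [6, 4, 5, 2],
]
# Made-up "true" coefficients used only to generate the toy response.
TRUE_BETA = [2.0, 3.0, -1.0, 0.5, 4.0]
TOY_Y = [
    TRUE_BETA[0] + sum(b * v for b, v in zip(TRUE_BETA[1:], row))
    for row in TOY_ROWS
]

estimates = ols_fit(TOY_ROWS, TOY_Y)
```

Because the toy response is generated without noise, the fit should recover the made-up coefficients up to rounding error. With real data the estimates would come with standard errors and p-values, which this sketch does not compute.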

As can be seen from the output, the R-squared and adjusted R-squared indicate that my model fits well, with over 99% of the variance in *CPEXP* explained by the independent variables. Since this assignment concerns an Engel curve, I will only briefly discuss *POP*, *SUPPLY*, and *SAVINGR* and focus instead on the effect of *RDI* on *CPEXP*. In my initial model the effect of *RDI* is statistically significant at both the 5% and 1% levels, as indicated by the p-value. The parameter estimate implies that an increase of $1 billion in aggregate real disposable income is associated with an increase of $21.27 million in aggregate food expenditures, holding the food supply, the civilian noninstitutional population, and the personal savings rate constant. *POP* is also significant at both the 5% and 1% levels, but *SUPPLY* and *SAVINGR* are both insignificant at the 5% level.
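The units in this interpretation are easy to mix up, so here is a small arithmetic sketch. It takes the 21.27 estimate reported above, with *CPEXP* in millions of dollars and *RDI* in billions of dollars. The helper names are hypothetical, and the per-dollar figure ignores the fact that the two series use different base years (1988 versus 2009 dollars), so it is only a rough guide.

```python
MILLION = 1_000_000
BILLION = 1_000_000_000

# Estimate from the initial OLS model: millions of food dollars
# per one billion dollars of real disposable income.
RDI_COEF = 21.27


def food_spending_change_millions(rdi_change_billions, coef=RDI_COEF):
    """Predicted change in CPEXP (millions) for a change in RDI (billions),
    holding the other regressors constant."""
    return coef * rdi_change_billions


def food_dollars_per_income_dollar(coef=RDI_COEF):
    """Convert the coefficient to dollars of food spending per dollar of income.

    Assumption: ignores the 1988 vs 2009 base-year difference between series.
    """
    return coef * MILLION / BILLION


change_for_ten_billion = food_spending_change_millions(10)
per_dollar = food_dollars_per_income_dollar()
```

Under these assumptions, the estimate corresponds to roughly two cents of additional food spending per additional dollar of disposable income.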

However, the residual plots against the independent variables hint at autocorrelation, given the cyclical pattern of the residuals. To check for this I conducted a Durbin-Watson test and obtained the following output:

Given this result, I can conclude that my model suffers from positive autocorrelation, since the Durbin-Watson statistic of 0.693 falls below the lower critical value (dL = 1.455, dU = 1.729 at the 5% significance level). To correct for this, I ran a second regression with the same variables as in the model above, using the SAS *autoreg* procedure with maximum likelihood estimation and a lag of 3. This autoregressive model yields the following parameter estimates:
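The decision rule applied here can be written out as a short sketch. The statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals, and it is compared against the bounds dL and dU. The function names are assumptions for illustration, and the critical values must still be looked up in a Durbin-Watson table for the given sample size and number of regressors.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences of the
    residuals divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


def dw_decision(d, d_lower, d_upper):
    """Classify a Durbin-Watson statistic d using the bounds dL and dU."""
    if d < d_lower:
        return "positive autocorrelation"
    if d > 4 - d_lower:
        return "negative autocorrelation"
    if d_upper <= d <= 4 - d_upper:
        return "no evidence of first-order autocorrelation"
    return "inconclusive"


# Values reported above for the initial OLS model.
initial_model_result = dw_decision(0.693, d_lower=1.455, d_upper=1.729)
```

A statistic near 2 suggests no first-order autocorrelation, values well below 2 point to positive autocorrelation, and values well above 2 point to negative autocorrelation.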

The Durbin-Watson statistic of 1.9230 lies between dU and 4 - dU (dL = 1.455 and dU = 1.729 at the 5% significance level, so 4 - dU = 2.271), which indicates that the autocorrelation has been corrected. The parameter estimate for *RDI*, which is significant at both the 5% and 1% levels, indicates that an increase of $1 billion in real disposable income is associated with an additional $21.5 million in aggregate food expenditures, holding the food supply, the civilian noninstitutional population, and the personal savings rate constant. *POP* is also statistically significant at both the 1% and 5% levels, and *SAVINGR* is now significant at the 5% level. However, *SUPPLY* remains insignificant at the 5% level. Additionally, the R-squared of this model indicates that 99% of the variation in *CPEXP* is explained by the independent variables.

Finally, to visualize the Engel curve for aggregate food expenditures in the U.S., I created a scatter plot with *RDI* on the vertical axis and *CPEXP* on the horizontal axis. The output is given below and shows a clear positive relationship between the two variables: