Sam Felling
I was intrigued by water consumption and by the effect income has on consumption levels across the 50 states, so I decided to focus my SAS assignment on estimating this Engel curve. Relatively little data is available on water consumption, so I had to adjust accordingly. The U.S. Department of the Interior and the U.S. Geological Survey produced a report on both domestic and industrial water use across the 50 states in 2010. I used the data released in this report as the dependent variable in my OLS regression. Because these data measure average daily water consumption in millions of gallons in 2010, my independent variables also had to be statewide data for 2010. The variables I focused on were "income", the median household income in each state in 2010; "unemployment", the average unemployment rate in each state in 2010; "population", the total population of each state in 2010; and finally "bachelors", the percentage of each state's population aged 25 and older who held a bachelor's degree or higher in 2010. After running the initial regression, I received the following output.
The SAS System
The REG Procedure
Model: MODEL1
Dependent Variable: consumtion consumtion

Number of Observations Read | 51
Number of Observations Used | 51

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 4 | 1621962144 | 405490536 | 34.77 | <.0001 |
| Error | 46 | 536483889 | 11662693 | | |
| Corrected Total | 50 | 2158446032 | | | |

Root MSE | 3415.06855 | R-Square | 0.7514
Dependent Mean | 6888.80588 | Adj R-Sq | 0.7298
Coeff Var | 49.57417

Parameter Estimates

| Variable | Label | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|---|---|
| Intercept | Intercept | 1 | 7707.91450 | 4178.30478 | 1.84 | 0.0715 |
| income | income | 1 | -0.11330 | 0.09324 | -1.22 | 0.2305 |
| unemployment | unemployment | 1 | 103.92834 | 278.62980 | 0.37 | 0.7109 |
| population | population | 1 | 0.00081677 | 0.00007821 | 10.44 | <.0001 |
| Bachelors | Bachelors | 1 | -36.34565 | 132.01076 | -0.28 | 0.7843 |
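The fit statistics in the output above follow directly from its Analysis of Variance table. The short Python sketch below is an illustrative check, not part of the original SAS work: it recomputes R-Square, Adj R-Sq, and the F value from the sums of squares using the standard OLS definitions, with n = 51 observations and k = 4 slope coefficients. The function name `fit_stats` is hypothetical.

```python
def fit_stats(ss_model: float, ss_error: float, n: int, k: int) -> dict:
    """Recompute OLS fit statistics from an ANOVA table.

    n is the number of observations; k is the number of slope
    coefficients (the intercept is not counted in k).
    """
    ss_total = ss_model + ss_error
    df_model = k
    df_error = n - k - 1
    r2 = ss_model / ss_total
    # Adjusted R-squared penalizes additional regressors
    adj_r2 = 1 - (1 - r2) * (n - 1) / df_error
    # F statistic: model mean square over error mean square
    f_value = (ss_model / df_model) / (ss_error / df_error)
    return {"r2": r2, "adj_r2": adj_r2, "f": f_value}


# Sums of squares copied from the first SAS output (levels model)
stats = fit_stats(ss_model=1621962144, ss_error=536483889, n=51, k=4)
print({name: round(value, 4) for name, value in stats.items()})
```

The same function can be applied to the bivariate model (k = 1) and the double log model (k = 4) by passing in their Model and Error sums of squares.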
Population was the only significant parameter estimate in this model, which seemed to be a given. For every other variable, we fail to reject the null hypothesis that its coefficient is zero. Population's p-value was so small (<.0001) that it is significant even at the 1 percent level. The model as a whole was nonetheless significant (F = 34.77, p < .0001), and approximately 75 percent of the variation in water consumption (R-Square = 0.7514) was explained by income, unemployment, education, and population. Below is a graph of the Engel curve of water consumption when income is the only explanatory variable.
As shown in the graph above, when income is the only explanatory variable the model predicts a non-significant downward slope, implying an inverse relationship between income and water consumption. This may be because preferences change as income rises, and because higher-income households can afford appliances and other items that use water more efficiently. However, the p-value shown below once again indicates a non-significant relationship. Shown below is the output from the bivariate regression. The p-value on income was 0.45, which is very weak evidence. In the regression with all of the variables (income, unemployment, population, and education), the p-value on income was still high at 0.23 and still insignificant.
The SAS System
The REG Procedure
Model: MODEL1
Dependent Variable: consumtion consumtion

Number of Observations Read | 51
Number of Observations Used | 51

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 1 | 25225530 | 25225530 | 0.58 | 0.4502 |
| Error | 49 | 2133220502 | 43535112 | | |
| Corrected Total | 50 | 2158446032 | | | |

Root MSE | 6598.11430 | R-Square | 0.0117
Dependent Mean | 6888.80588 | Adj R-Sq | -0.0085
Coeff Var | 95.78023

Parameter Estimates

| Variable | Label | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|---|---|
| Intercept | Intercept | 1 | 11256 | 5810.63794 | 1.94 | 0.0585 |
| income | income | 1 | -0.08737 | 0.11478 | -0.76 | 0.4502 |
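The p-values for income in the bivariate and full regressions come from t statistics, each equal to the parameter estimate divided by its standard error, compared against a Student t distribution with the error degrees of freedom (49 and 46 here). The sketch below is an illustrative reconstruction using only the Python standard library, not how SAS computes these values internally: it integrates the t density numerically with Simpson's rule. The helper names `t_pdf` and `two_sided_p` are hypothetical.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value P(|T| > |t|) via Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # Simpson weights 4, 2, 4, ..., 4
        total += weight * t_pdf(i * h, df)
    central_area = total * h / 3  # approximates P(0 < T < |t|)
    return 1 - 2 * central_area


# Income coefficient: (estimate, standard error, error DF) from the SAS output
cases = {
    "bivariate model": (-0.08737, 0.11478, 49),
    "full levels model": (-0.11330, 0.09324, 46),
}
for name, (estimate, se, df) in cases.items():
    t_value = estimate / se
    print(f"{name}: t = {t_value:.2f}, p = {two_sided_p(t_value, df):.4f}")
```

The printed values can be compared with the Pr > |t| column of each Parameter Estimates table; small differences can arise because the inputs are the rounded estimates SAS displays.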
To correct for problems with my model, I first had to identify them. After retrieving these results, my first concern was the unusually high standard errors for unemployment and Bachelors. I believed these high standard errors might be due to correlation among the independent variables (multicollinearity). As such, I decided to estimate a double log model, which I expected to help reduce this problem. In this model, "lncon" is the natural log of 2010 water consumption across states, "lnincome" is the natural log of 2010 median household income, "lnunem" is the natural log of the 2010 unemployment rate, "lnpop" is the natural log of 2010 population, and "lnbach" is the natural log of the 2010 percentage of the population aged 25 and over who held a bachelor's degree or higher. The model results are shown below.
Water consumption Engel Curve
The REG Procedure
Model: MODEL1
Dependent Variable: lncon lncon

Number of Observations Read | 51
Number of Observations Used | 51

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 4 | 100.09811 | 25.02453 | 17.37 | <.0001 |
| Error | 46 | 66.25850 | 1.44040 | | |
| Corrected Total | 50 | 166.35660 | | | |

Root MSE | 1.20017 | R-Square | 0.6017
Dependent Mean | 8.22613 | Adj R-Sq | 0.5671
Coeff Var | 14.58969

Parameter Estimates

| Variable | Label | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|---|---|
| Intercept | Intercept | 1 | -15.41774 | 16.03432 | -0.96 | 0.3413 |
| lnincome | lnincome | 1 | 2.49133 | 1.78913 | 1.39 | 0.1705 |
| lnunem | lnunem | 1 | -1.56430 | 0.85442 | -1.83 | 0.0736 |
| lnpop | lnpop | 1 | 1.25736 | 0.19299 | 6.52 | <.0001 |
| lnbach | lnbach | 1 | -5.71094 | 1.46567 | -3.90 | 0.0003 |
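Written out as an equation, the estimated double log model in the output above is the following, with coefficients copied from the Parameter Estimates table. Because both sides are in natural logs, each slope coefficient is the estimated elasticity of water consumption with respect to that variable, holding the others fixed.

```latex
\widehat{\ln(\text{con})} = -15.41774 + 2.49133\,\ln(\text{income}) - 1.56430\,\ln(\text{unem})
  + 1.25736\,\ln(\text{pop}) - 5.71094\,\ln(\text{bach})

\varepsilon_{x} = \frac{\partial \ln(\text{con})}{\partial \ln(x)} = \beta_{x}
```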
The double log model produces more significant coefficients. (Its standard errors are not directly comparable with those of the first model, because the variables are now measured in logs.) In this model, income is now upward sloping (2.49133) but still insignificant, although its p-value is lower (0.1705 versus 0.2305). While the R-Square value (0.6017) is lower than in the first model (0.7514), this model seems to give a more realistic explanation of water consumption. Here lnpop and lnbach are significant at the 1 percent level, and lnunem is significant at the 10 percent level. The following graphs are the fit plots and residual-by-regressor plots for the double log model.
Ultimately, though the double log model explains less, it makes more sense. Population is, of course, going to affect levels of consumption: if population increases, water consumption is expected to increase along with it. Because this is a double log model, the coefficient on lnpop (1.25736) is an elasticity, so if population increases by 10 percent, the model predicts, holding all else equal, that water consumption will increase by about 12.6 percent. Education has a downward slope: people who are more educated may be more inclined to conserve water because of cultural values. If a state has higher unemployment, the model also predicts a decline in water usage. This makes sense, because when unemployment is higher, fewer people can afford to use more water. An interesting note is that in this model income is not significant, but it is estimated as upward sloping.
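The population interpretation relies on reading a double log coefficient as an elasticity. The sketch below is illustrative, and the function name `pct_change` is hypothetical: it compares the usual linear approximation (coefficient times the percent change) with the exact change implied by a double log model, (1 + change)^beta - 1, for a 10 percent increase in each logged regressor.

```python
def pct_change(beta: float, pct_increase: float) -> tuple[float, float]:
    """Percent change in consumption implied by a double log coefficient.

    Returns (approximate, exact): the linear approximation
    beta * pct_increase, and the exact value
    ((1 + pct_increase / 100) ** beta - 1) * 100.
    """
    approx = beta * pct_increase
    exact = ((1 + pct_increase / 100) ** beta - 1) * 100
    return approx, exact


# Slope coefficients from the double log model output
coefficients = {
    "lnpop": 1.25736,
    "lnunem": -1.56430,
    "lnbach": -5.71094,
    "lnincome": 2.49133,
}
for name, beta in coefficients.items():
    approx, exact = pct_change(beta, 10)
    print(f"{name}: approx {approx:+.1f}%, exact {exact:+.1f}%")
```

For lnpop the two versions are close (about 12.6 versus 12.7 percent), but for the large lnbach coefficient they diverge sharply (about -57 versus -42 percent), so the linear reading should be used with care for large coefficients.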
Ultimately, when it comes to determining the Engel curve of water consumption, the results from my study were inconclusive. Other variables were found to be significant, but income was not. This, however, is telling: water consumption may be income-inelastic. Once again, though, one model found an upward-sloping relationship and the other a downward-sloping one, and neither estimate had a significant p-value. One potential flaw in this study is that there is no clear indicator or variable for the price of water across the states. If a variable for the price of water across the states in 2010 were added, the results could become more significant. However, I could not find a usable proxy for price. Only a few select cities have data available on the average price of water, and it seemed illogical to use a few cities as data for entire states.
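The concern about a missing price variable can be stated with the standard textbook omitted-variable-bias result. As an illustration only (simplified to income and price, with price never observed or estimated in this study), suppose the true model includes price. If price is left out, the expected slope from regressing consumption on income alone, written with a tilde below, picks up part of the price effect:

```latex
\begin{aligned}
\text{con}_i &= \beta_0 + \beta_1\,\text{income}_i + \beta_2\,\text{price}_i + u_i \\
\mathrm{E}\big[\tilde{\beta}_1 \mid \text{income}, \text{price}\big] &= \beta_1 + \beta_2\,\tilde{\delta}_1
\end{aligned}
```

Here the tilde-delta term is the slope from regressing price on income. The bias is nonzero only if price affects consumption and is correlated with income, and its sign depends on the signs of both.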
Data obtained from:
https://pubs.usgs.gov/circ/1405/pdf/circ1405.pdf
https://www.bls.gov/lau/lastrk10.htm