global basedir http://personalpages.manchester.ac.uk/staff/mark.lunt
global datadir $basedir/stats/5_LinearModels1/data

use $datadir/anscombe
scatter Y1 x1, xlab(0 (5) 20) ylab(0 (5) 15)
scatter Y2 x1, xlab(0 (5) 20) ylab(0 (5) 15)
scatter Y3 x1, xlab(0 (5) 20) ylab(0 (5) 15)
scatter Y4 x2, xlab(0 (5) 20) ylab(0 (5) 15)
regress Y1 x1
regress Y2 x1
regress Y3 x1
regress Y4 x2

sysuse auto, clear
regress mpg weight
* 2.1 Yes: the coefficient for weight is very significantly different from 0
* 2.2 65.15%: this is given by R-squared
* 2.3 A reduction of 0.006 mpg
lincom _cons + 3000 * weight
* 2.4 21.4 mpg, with a 95% CI of (20.6, 22.2)
* 2.5 No, because there are no vehicles this light in the dataset

use "$datadir/constvar", clear
regress y x
* 3.1 Yes, p < 0.001
predict rstand, rstand
predict yhat
scatter rstand yhat
graph export graph1.eps, replace
* 3.2 The variance (the spread of the data) increases as the fitted value increases
hettest
* 3.3 hettest confirms that the variance is not constant
rvfplot
* 3.4 Yes: there is very little difference between these two plots
graph export graph2.eps, replace
gen ly = ln(y)
regress ly x
predict rstand2, rstand
predict yhat2
scatter rstand2 yhat2
graph export graph3.eps, replace
* 3.5 There is no longer evidence of changing variance
hettest
* 3.6 This is confirmed by hettest

use $datadir/wood73, clear
scatter Y x1
graph export graph4.eps, replace
scatter Y x2
graph export graph5.eps, replace
regress Y x1 x2
cprplot x1
graph export graph6.eps, replace
* 3.9 Y against x1 looks non-linear
cprplot x2
graph export graph7.eps, replace
* 3.9 Y against x2 looks reasonably linear
gen x3 = x1^2
regress Y x1 x2 x3
* 3.10 Yes, the coefficient for x3 is highly significant, so after adjusting for x1 and x2, it is a significant predictor
cprplot x1
graph export graph8.eps, replace
cprplot x2
graph export graph9.eps, replace
cprplot x3
graph export graph10.eps, replace
* 3.11 No, the non-linearity has been removed
predict Yhat
scatter Y Yhat
graph export graph11.eps, replace
* 3.12 The correlation between observed and predicted values is extremely high, so the regression model is producing excellent predictions
* This is to be expected, since R-squared was well over 99%

use $datadir/lifeline, clear
regress age lifeline
* 3.13 Yes: p = 0.009
scatter age lifeline
graph export graph12.eps, replace
* 3.14 There is a single outlier in the bottom right corner of the plot
* 3.15 This point has high leverage, and so should have a large effect on the regression
predict predage
predict cooksd, cooksd
scatter cooksd predage
graph export graph13.eps, replace
* 3.16 Certainly 1, possibly 2
summarize cooksd, det
regress age lifeline if cooksd < 1
* 3.17 Effect of lifeline is no longer significant
regress age lifeline if cooksd < 0.1
* 3.18 The association between age and lifeline is still not significant
* 3.19 There is no association between age and lifeline in general; the apparent association was caused by a single unusual observation
regress age lifeline
predict rstand, rstand
qnorm rstand
* 3.20 The plot is reasonably linear: no points stand out as being unusual
swilk rstand
* 3.21 Yes: there is no evidence against the null hypothesis of a normal distribution

use $datadir/hsng, clear
regress rent hsngval hsnggrow hsng faminc
* 4.1 50
* 4.2 All 4
* 4.3 0.65 (0.45, 0.84)
* 4.4 For each 1% increase in housing growth, the mean rent increases by about 65 cents
* The true rent increase is probably between 45 and 84 cents
* 4.5 R-squared is 0.9, so the model accounts for 90% of the variation in rents
predict rstand, rstand
predict pred_val
scatter rstand pred_val
graph export graph14.eps, replace
hettest
* 4.6 There is a slight suggestion of less variation for smaller fitted values, but it is only slight
* Using hettest, it is of borderline significance
rvfplot
graph export graph15.eps, replace
* 4.7 This plot is very similar to the previous one
cprplot faminc
graph export graph16.eps, replace
cprplot hsng
graph export graph17.eps, replace
cprplot hsnggrow
graph export graph18.eps, replace
cprplot hsngval
graph export graph19.eps, replace
* 4.8 There is no sign of non-linearity in any of the plots
predict cooksd, cooksd
scatter cooksd pred_val
graph export graph20.eps, replace
* 4.9 There is one point with a large Cook's distance
list if cooksd > 0.4
* 4.10 Alaska
regress rent hsngval hsnggrow hsng faminc
regress rent hsngval hsnggrow hsng faminc if cooksd < 0.5
* 4.11 They all change slightly, but all remain significant, in the same direction, and with nearly the same magnitude
predict pred2
scatter pred2 pred_val
* 4.12 No: the predicted values including and excluding Alaska are very nearly the same
qnorm rstand
scatter pred2 pred_val
graph export graph21.eps, replace
qnorm rstand
graph export graph22.eps, replace
* 4.13 Yes, the residuals appear to be normally distributed
swilk rstand
* 4.14 Yes, there is no evidence against the null hypothesis of a normal distribution
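
* Optional sketch, not part of the original solutions: the comparison described
* in 4.11 can also be viewed side by side rather than by reading two separate
* regression tables. This assumes the hsng data and the cooksd variable created
* earlier are still in memory. The stored-estimate names "full" and "noalaska"
* are illustrative choices only.
quietly regress rent hsngval hsnggrow hsng faminc
estimates store full
quietly regress rent hsngval hsnggrow hsng faminc if cooksd < 0.5
estimates store noalaska
* One column per model, showing coefficients and standard errors
estimates table full noalaska, b se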