global basedir http://personalpages.manchester.ac.uk/staff/mark.lunt
global datadir $basedir/stats/6_LinearModels2/data

sysuse auto, clear
regress weight foreign
* 1.1 foreign vehicles are, on average, 1000 lbs lighter than US vehicles
*     The difference is significant, p = 0.000
xi: regress weight i.foreign
* 1.2 This makes no difference at all
ttest weight, by(foreign)
* 1.3 the mean difference and standard error are exactly the same
*     (except for the minus sign)
graph box weight, by(foreign)
graph export graph1.eps, replace
* 1.4 There is a wider spread of weights for Domestic cars compared to Foreign cars, i.e. greater variance
by foreign: summ weight
* 1.5 the SD is much higher for Domestic (~700) compared to Foreign (~430)
hettest
* 1.6 The difference in variance is significant. Therefore, a linear model is inappropriate

use $datadir/soap, clear
graph box appearance, by(operator)
graph export graph2.eps, replace
* 1.7 Operator 3 has the highest scores: 25% of scores are above 9
sort operator
by operator: summ appearance
xi: regress appearance i.operator
* 1.9 Yes: Prob > F = 0.0000 is testing the null hypothesis that all operators are the same.
* 1.10 p = 0.0000
* 1.11 Operator 1 is the baseline: _Ioperator_1 omitted
lincom _cons + _Ioperator_2
* 1.12 This is the same as we have already seen
lincom _Ioperator_2 - _Ioperator_3
* 1.13 Yes: t = -6.04, p = 0.000

use $datadir/cadmium, clear
scatter capacity age
graph export graph3.eps, replace
regress capacity age
* 2.2 The regression coefficient for age is negative, showing that capacity decreases as age increases.
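* Sketch, not part of the original answers: overlaying the fitted line from
* the regression above on the scatter plot may make the negative slope
* described in 2.2 easier to see. Assumes the cadmium data are still in memory.
twoway scatter capacity age || lfit capacity age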
gen cap1 = capacity if exposure == 1
gen cap2 = capacity if exposure == 2
gen cap3 = capacity if exposure == 3
scatter cap1 cap2 cap3 age
graph export graph4.eps, replace
xi: regress capacity i.exposure
* 2.3 It's borderline, p = 0.09
xi: regress capacity age i.exposure
testparm _I*
* 2.4 There are now no significant differences between groups
predict ppred, xb
gen ppred1 = ppred if exposure == 1
gen ppred2 = ppred if exposure == 2
gen ppred3 = ppred if exposure == 3
scatter cap1 cap2 cap3 age || line ppred1 age || line ppred2 age || /*
*/ line ppred3 age
graph export graph5.eps, replace
xi: regress capacity i.exposure*age
testparm _IexpX*
* 2.5 Yes, the slopes in the different exposure groups are different
predict ipred, xb
gen ipred1 = ipred if exposure == 1
gen ipred2 = ipred if exposure == 2
gen ipred3 = ipred if exposure == 3
scatter cap1 cap2 cap3 age || line ipred1 age || line ipred2 age || /*
*/ line ipred3 age
graph export graph6.eps, replace
* 2.6 The least steep slope is in the baseline (least exposed) group
*     The steepest slope is in the most exposed group
lincom age + _IexpXage_3

use $datadir/hald, clear
sw regress y x1 x2 x3 x4, pe(0.05)
* 3.1 x1 & x4 are retained
sw regress y x1 x2 x3 x4, pr(0.05)
* 3.2 This time x1 & x2 are retained
sw regress y x1 x2 x3 x4, pe(0.05) pr(0.0500005)
* 3.3 This is the same as the backwards model
corr x*
* 3.4 Correlation between x2 & x4 is -0.97
* 3.5 x2 & x4 are very strongly correlated: they contain the same information, so they are largely interchangeable
regress y x1 x2 x3 x4
* 3.6 The F statistic says that the model is very highly significant: data like these would be very unlikely to arise if all coefficients were 0
* 3.7 98% of the variance is explained
* 3.8 None of the coefficients are significant, due to the strong correlations between them

use $datadir/growth, clear
scatter weight week
graph export graph7.eps, replace
* 4.1 The line does not look quite straight: there appears to be some curvature
regress weight week
cprplot week
* 4.2 There is definitely curvature around the line
gen week2 = week * week
regress weight week week2
* 4.3 week2 is very highly significant (p = 0.000)
predict pred2, xb
twoway scatter weight week || line pred2 week
graph export graph8.eps, replace
* 4.4 The curved predictor fits the data very well
gen week3 = week2*week
regress weight week week2 week3
* 4.5 week3 is not significant
corr week*
* 4.6 Correlation between week and week2 is 0.97
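* Sketch, not part of the original answers: the high correlation in 4.6 arises
* because week and its square both increase over the observed range. Centring
* week on its mean before squaring usually reduces that correlation.
* The names cweek and cweek2 are hypothetical, introduced for illustration only.
summarize week, meanonly
gen cweek = week - r(mean)
gen cweek2 = cweek * cweek
corr cweek cweek2
regress weight cweek cweek2
* This is a reparameterisation of the quadratic model in 4.3: the fitted curve
* is unchanged, only the linear coefficient and the constant differ.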