global basedir http://personalpages.manchester.ac.uk/staff/mark.lunt
global datadir $basedir/stats/7_binary/data

use $datadir/epicourse, clear

tab hip_p sex, co
* 1.1 Prevalence is 9.84% in men, 15.23% in women

tab hip_p sex, co chi2
* 1.2 The difference in prevalence between men and women is very significant

cs hip_p sex, or
* 1.3 Confidence interval is (1.37, 1.97)
* 1.4 The odds ratio and the relative risk are very similar
* 1.5 Yes, the confidence interval for the risk difference does not contain 0,
*     which is its value under the null hypothesis

logistic hip_p sex
* 1.6 The odds ratio is exactly the same as that produced by cs
* 1.7 The confidence intervals are the same to 3 decimal places (the methods
*     used to calculate them differ, but generally give very similar results)

egen agegp = cut(age), at(0 30(10)100)
label define age 0 "<30" 30 "30-39" 40 "40-49" 50 "50-59"
label define age 60 "60-69" 70 "70-79" 80 "80-89" 90 "90+", modify
label values agegp age

tab agegp hip_p, chi2
* 2.1 Yes: chi2 is very significant

logistic hip_p age sex
* 2.2 Yes: p < 0.001 (displayed by Stata as 0.000)
* 2.3 Odds of hip pain are multiplied by 1.03 (an increase of about 3%)
*     for each additional year of age

logistic hip_p i.sex##c.age
* 2.4 No: the interaction term i.sex#c.age is not significant (p=0.118)

logistic hip_p sex i.agegp
* 2.5 Odds for a man aged 50-59 are 7.74 times the odds for a man aged less than 30

logistic hip_p age sex
estat gof
* 3.1 Yes. However, this is not really appropriate, since there are so many
*     covariate patterns. It would be better to use only 10 groups

estat gof, group(10)
* 3.2 In this case, there is evidence that the predicted and observed values
*     differ more than can be explained by random variation

lroc
graph export graph1.eps, replace

logistic hip_p i.agegp sex
estat gof
estat gof, group(10)
* 3.3 Yes, this model is adequate

lroc
graph export graph2.eps, replace

gen age2 = age*age
logistic hip_p age age2 sex
estat gof, group(10) table
* 3.5 Yes, the coefficient for age2 is highly significant, and there is
*     no longer evidence of lack of fit.

lroc
graph export graph3.eps, replace
* 3.6 The area under the curve with this model is similar to that of the
*     model using age as a categorical predictor.

predict p
predict db, dbeta
scatter db p
graph export graph4.eps, replace
* 4.1 No, there are no points that are obvious outliers
*     However, there are 4 points that may be worth checking

predict d, ddeviance
scatter d p
graph export graph5.eps, replace
* 4.2 Again, there is no evidence of any outliers

scatter p age
graph export graph6.eps, replace
* 4.3 The two lines are the predicted prevalences in men and women

graph twoway scatter p age || lowess hip_p age if sex == 1 || lowess hip_p age if sex == 0
graph export graph7.eps, replace
* 4.4 The fit is good for men, but poor for women over 80
*     The quadratic model is reasonable for men, not women

use $datadir/chd, clear
sort agegrp
by agegrp: egen agemean = mean(age)
by agegrp: egen chdprop = mean(chd)
label var agemean "Mean age"
label var chdprop "Proportion of subjects with CHD"
scatter chdprop agemean
graph export graph8.eps, replace

logistic chd age
* 5.1 Odds ratio is about 1.12 per year

predict p
predict db, dbeta
predict d, ddeviance
scatter db p
graph export graph9.eps, replace
* 5.3 Yes, there is one influential point, with db ~ 0.25

summ db, detail
logistic chd age if db < 0.2
* 5.4 The effect on the odds ratio is small: a very slight increase to 1.13
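
* Optional check, not part of the original solutions: a minimal sketch of how
* one might inspect the influential point from 5.3 and set the two fits from
* 5.1 and 5.4 side by side. It assumes db from the predict above is still in
* memory; the stored-estimate names full and trimmed are arbitrary choices.
* Stata computes dbeta per covariate pattern, so all subjects of the same age
* share one value, and the condition db < 0.2 drops that whole pattern.
list age chd db if db >= 0.2 & !missing(db)

* Refit on all subjects, then without the influential covariate pattern
logistic chd age
estimates store full
logistic chd age if db < 0.2
estimates store trimmed

* eform reports exponentiated coefficients, i.e. odds ratios per year of age
estimates table full trimmed, eform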