class: center, middle, inverse, title-slide

# Analytical Epidemiology II
## Statistical Inference
### David Selby
### 30th March 2021
---
layout: true
background-image: url(cfe-logo.jpg)
background-position: 97% 97%
background-size: 70px

<style type="text/css">
dd { font-size: 90%; }
.purple { color: #644BA5; }
.pink { color: #EB64A0; }
.blue { color: #0073BE; }
.green { color: #37A53C; }
.yellow { color: #FAB900; }
.red { color: #E61E32; }
ul, ol { padding-left: 0; }
ol li { margin-top: .5em; }
ul ul, dl, ol ul { padding-left: 1em; }
code { color: #644BA5; }
.remark-slide-number { position: inherit; }
.remark-slide-number .progress-bar-container { position: absolute; bottom: 0; height: 4px; display: block; left: 0; right: 0; }
.remark-slide-number .progress-bar { height: 100%; background-color: #0073BE; }
</style>

---

## Statistical inference

1. Sampling
  - sample mean estimator
  - normal quantiles
2. Confidence intervals
  - non-normal probability distributions
3. Null hypothesis significance testing
4. Examples

.small[https://personalpages.manchester.ac.uk/staff/david.selby/analysis.html]

???

Questions or comments about last week's worksheet

---

## Principles of sampling

- Often, it is not practical to measure every subject in a population.
- A reduced number of subjects—a .red[**sample**]—is measured instead.
  - cheaper
  - quicker
  - can be more thorough
- The sample needs to be chosen in such a way as to be representative of the __target population__
  - e.g. uniform random sampling

---

## Estimating from random samples

Suppose we have a population in which a variable `\(X\)` has an (unknown) mean `\(\mu\)` and standard deviation `\(\sigma\)`.

--

We want to estimate `\(\mu\)` from a random sample of size `\(n\)`:

- the sample mean `\(\bar x\)` should be close to the population mean `\(\mu\)`
- from sample to sample, the value of `\(\bar x\)` will vary

--

The variation in `\(\bar x\)` depends on

- the variance in the population, `\(\sigma^2\)`
- random variation due to the sample size, `\(n\)`

---

### Example: population

Consider a population comprising `\(1000\)` copies of each of the digits `\(0, 1, \dots, 9\)`.

--

```r
uniform <- data.frame(digit = rep(0:9, each = 1000))
```

<img src="lecture2_files/figure-html/uniform-1.png" width="100%" />

---

### Example: samples

1. Sample `\(n = 5, 25, 100\)` digits from the population
2. Draw 2000 uniform random samples of each size
3. Compute the sample mean `\(\bar x = \frac1n\sum x_i\)` of each sample
4. Visualise the distribution of the sample means

--

<img src="lecture2_files/figure-html/mean-dist-1.png" width="100%" />

---

### Properties of the `\(\bar X\)` estimator

- `\(\mathbb{E}(\bar X) = \mu\)`: the average sample mean is equal to the true population mean (*unbiased*)
- `\(\operatorname{Var}(\bar X) = \frac{\sigma^2}n\)`: uncertainty increases with `\(\sigma\)`, decreases with `\(n\)` (*consistent*)
- `\(\bar X\)` becomes normally distributed as `\(n\)` increases (*Central Limit Theorem*)

For large `\(n\)`,
`$$\bar X \sim \text{Normal}\Bigl(\mu, \frac{\sigma^2}{n}\Bigr).$$`

---

## Standard error

- The standard deviation of the .blue[sampling distribution] of a statistic
- The sampling distribution: the distribution of a statistic as sampling is repeated
- All statistics have sampling distributions
- Statistical inference is based on the standard error

e.g.
`$$\text{s.e.}(\bar X) = \frac{\sigma}{\sqrt n}$$`
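One way to simulate these sampling distributions in R, as a minimal sketch: it reuses the `uniform` population defined earlier, and the helper name `sample_means` is purely illustrative.

```r
sigma <- sd(uniform$digit)  # approximately 2.87 for the digits 0-9

# Means of `reps` random samples of size n drawn from the population
sample_means <- function(n, reps = 2000) {
  replicate(reps, mean(sample(uniform$digit, n)))
}

# Compare the simulated mean and s.d. of the sample means
# with the theoretical values mu and sigma / sqrt(n)
for (n in c(5, 25, 100)) {
  xbar <- sample_means(n)
  cat(n, mean(xbar), sd(xbar), sigma / sqrt(n), "\n")
}
```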
---

### Example: sampling distribution of `\(\bar X\)`

Population mean `\(\mu = 4.5\)`

Population s.d. `\(\sigma = 2.87\)`

.small[From our 2000 random samples:]

| `\(n\)`| `\(\mu\)`| `\(\frac{\sigma}{\sqrt{n}}\)`| `\(\text{mean}(\bar x)\)`| `\(\text{sd}(\bar x)\)`|
|---:|-----:|-------------------------:|---------------------:|-------------------:|
|   5|   4.5|                      1.28|                   4.5|                1.26|
|  25|   4.5|                      0.57|                   4.5|                0.57|
| 100|   4.5|                      0.29|                   4.5|                0.29|

- average sample mean very close to `\(\mu\)`
- standard deviation very close to `\(\frac{\sigma}{\sqrt{n}}\)`

---

## Normal quantiles

<img src="lecture2_files/figure-html/dnorm-1.png" width="100%" />

`$$X \sim \text{Normal}(\mu, \sigma^2)$$`

---

## Normal quantiles

<img src="lecture2_files/figure-html/dnorm-1sigma-1.png" width="100%" />

`$$X \sim \text{Normal}(\mu, \sigma^2)$$`

???

One standard deviation away from the mean.

---

## Normal quantiles

<img src="lecture2_files/figure-html/dnorm-2sigma-1.png" width="100%" />

`$$X \sim \text{Normal}(\mu, \sigma^2)$$`

???

Two standard deviations away from the mean. Remember this, as it'll come up again.

---

## Normal quantiles

<img src="lecture2_files/figure-html/dnorm-3sigma-1.png" width="100%" />

`$$X \sim \text{Normal}(\mu, \sigma^2)$$`

???

Three standard deviations away from the mean.

The normal distribution is unbounded, so to get (exactly) 100% coverage you'd need to cover an infinite distance in both directions.

---

## Normal quantiles

What intervals would cover

1. 50%
2. 75%
3. 95%

of a normal distribution with mean `\(\mu\)` and variance `\(\sigma^2\)`?

.small[These inter-quantile ranges are sometimes called .blue[**reference ranges**].]

???

From the graphs on the previous slides, we could hazard a guess that the answer to (3) is approximately `\(2 \sigma\)`.

---

## Normal quantiles

Looking up the quantiles in R:

```r
qnorm(c(.75, .875, .975))
```

```
[1] 0.67 1.15 1.96
```

So, by symmetry of the normal distribution:

1. 50%: `\((\mu - 0.67 \sigma, \mu + 0.67 \sigma)\)`
2. 75%: `\((\mu - 1.15 \sigma, \mu + 1.15 \sigma)\)`
3. 95%: `\((\mu - 1.96 \sigma, \mu + 1.96 \sigma)\)`

???

Here we are using the fact that you can subtract any normal distribution's mean and divide by its standard deviation to get a standard normal: if
`$$X \sim N(\mu, \sigma^2)$$`
then
`$$Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$`
and conversely, `\(X = \sigma Z + \mu\)`.

---
layout: false
class: inverse, middle, center

# Confidence intervals

<!--*After the break:* **quantitative data**-->

---
layout: true
background-image: url(cfe-logo.jpg)
background-position: 97% 97%
background-size: 70px

---

## Confidence intervals

- In frequentist statistics, the population mean `\(\mu\)` is assumed to be a fixed, unknown value.

--

- `\(\bar X \sim \text{Normal}(\mu, \frac{\sigma^2}{n})\)` .small[for sufficiently large *n*]

--

So if we could sample repeatedly from the population, 95% of sample means would fall in the range `\(\mu \pm 1.96 \frac{\sigma}{\sqrt{n}}\)`.

--

Therefore `\(\mu\)` lies within 95% of such intervals
`$$\bigl(\bar x - 1.96 \frac{\sigma}{\sqrt{n}}, \bar x + 1.96 \frac{\sigma}{\sqrt{n}} \bigr).$$`

--

This is the .blue[**95% confidence interval**] for the mean.

---

### Example: confidence intervals

From our 2000 samples of `\(n\)` digits, generate confidence intervals for the mean:

`$$\Bigl(\bar x - 1.96 \times \frac{2.87}{\sqrt{n}}, \: \bar x + 1.96 \times \frac{2.87}{\sqrt{n}}\Bigr)$$`

How many of them contain the true population mean `\(\mu = 4.5\)`?
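One way to count this, as a minimal R sketch for `\(n = 25\)` (it assumes the `uniform` population from earlier, with the known values `\(\mu = 4.5\)` and `\(\sigma = 2.87\)` plugged in):

```r
# For each of 2000 samples, does the 95% CI contain mu = 4.5?
covers <- replicate(2000, {
  x <- sample(uniform$digit, 25)
  abs(mean(x) - 4.5) < 1.96 * 2.87 / sqrt(25)
})
sum(covers)  # should be roughly 0.95 * 2000 = 1900
```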
| `\(n\)`| CIs containing `\(\mu\)`| %|
|---:|--------------------:|--:|
|   5|                 1924| 96|
|  25|                 1902| 95|
| 100|                 1899| 95|

---

### Example: confidence intervals

From our 2000 samples of `\(n\)` digits, generate confidence intervals for the mean:

`$$\Bigl(\bar x - 1.96 \times \frac{2.87}{\sqrt{n}}, \: \bar x + 1.96 \times \frac{2.87}{\sqrt{n}}\Bigr)$$`

<img src="confint2.gif" width="80%" />

---

### Example: primary biliary cirrhosis

In 216 patients with primary biliary cirrhosis, serum albumin had a mean value of `\(\bar x = 34.46~\text{g/l}\)`. The population standard deviation is known to be `\(\sigma = 5.84~\text{g/l}\)`.

What is the 95% confidence interval for the mean serum albumin level?

--

1. The standard error is `\(\sigma/\sqrt{n} = 5.84 / \sqrt{216} = 0.397\)`.

--

2. The 95% confidence interval for the mean is given by
`$$\bar x \pm 1.96 \sigma/\sqrt{n} = 34.46 \pm 1.96 \times 0.397$$`

--

3. Hence, mean serum albumin in the population is *probably* between `\(33.68~\text{g/l}\)` and `\(35.24~\text{g/l}\)`.

---

### Example: primary biliary cirrhosis

<img src="lecture2_files/figure-html/cirrhosis-example-1.png" width="100%" />

---

## Other sampling distributions

So far we've seen the sampling distribution of the sample mean `\(\bar X\)` where the population variance is known. But there are other data types to consider.

1. Means with *unknown* variance
2. Comparing two sample means
3. Counts and proportions data
4. Contingency tables

---

## Other sampling distributions

If a statistic `\(T\)` has mean `\(\mu\)` and standard deviation `\(\sigma\)`, then `\(\frac{T-\mu}{\sigma}\)` has distribution `\(\text{Normal}(0, 1)\)`.

But in practice we *rarely* know the population variance; we only have the estimate
`$$s = \sqrt{\sum_{i=1}^n \frac{(x_i - \bar x)^2}{n-1}}.$$`

What is the distribution of `\(\frac{T - \mu}{s}\)`?

---

### Student's `\(t\)`-distribution

`\(\dfrac{T - \mu}{s}\)` has a ***t*-distribution** with `\(n-1\)` degrees of freedom.

- A `\(t\)`-distribution looks like a normal distribution, but with *fatter tails*, allowing for more extreme values.
- At small sample sizes where `\(\sigma\)` is unknown, a *normal* 95% confidence interval will cover `\(\mu\)` less often than claimed
- As the sample size increases, the `\(t\)`-distribution approaches a normal

.footnote[W. S. Gosset, a brewer at Guinness, developed the *t*-distribution for analysing small samples of barley, under the pseudonym ‘Student’]

---

<img src="tdist.gif" width="92%" />

---

### Student's `\(t\)`-distribution

If the variance is unknown, simply plugging in `\(s\)` for `\(\sigma\)` will result in confidence intervals that are too narrow!

2000 × 95% confidence intervals on the sampled digits data:

.pull-left[
`$$\bar x \pm 1.96 \frac{s}{\sqrt{n}}$$`

| `\(n\)`| Contains `\(\mu\)`| %|
|---:|--------------:|--:|
|   5|           1743| 87|
|  25|           1875| 94|
| 100|           1893| 95|
]

.pull-right[
`$$\bar x \pm t_{n-1}(2.5\%) \frac{s}{\sqrt{n}}$$`

| `\(n\)`| Contains `\(\mu\)`| %|
|---:|--------------:|--:|
|   5|           1871| 94|
|  25|           1899| 95|
| 100|           1894| 95|
]

---

### Proportional data

Binary counts follow a **binomial distribution** with success probability `\(p\)` and number of trials `\(n\)`.
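A quick simulation sketch of this distribution (the values `\(n = 100\)` and `\(p = 0.65\)` are chosen to match the example on the next slide):

```r
# 2000 simulated counts of successes out of n = 100 trials with p = 0.65
counts <- rbinom(2000, size = 100, prob = 0.65)
c(mean(counts), var(counts))  # close to np = 65 and np(1-p) = 22.75
```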
--

When `\(n\)` is large, we can use the approximation `\(\text{Normal}(np, np(1-p))\)`, and so proportions are distributed
`$$\text{Normal}\Bigl(p, \frac{p(1-p)}{n}\Bigr).$$`

--

If `\(x\)` is the observed number of successes (outcomes), an approximate 95% confidence interval is
`$$\frac{x}{n} \pm 1.96 \times \sqrt{\frac{\frac{x}n(1-\frac{x}n)}{n}}.$$`

---

### Proportional data: example

100 subjects each receive two analgesics, `\(A\)` and `\(B\)`, for 1 week, in random order. They then state a preference for one drug: 65 prefer `\(A\)` and the remaining 35 prefer `\(B\)`.

Calculate a 95% confidence interval for the proportion preferring `\(A\)`.

--

`$$\frac{65}{100} \pm 1.96 \times \sqrt{\frac{\frac{65}{100}\cdot\frac{35}{100}}{100}} = (0.56, 0.74)$$`

So we estimate that between 56% and 74% of people tend to prefer analgesic `\(A\)`.

???

Equivalently, the underlying probability of preferring `\(A\)` is likely to fall between 0.56 and 0.74, with 95% confidence.

---

### Proportional data: example

```r
binom.test(65, 100)
```

```

	Exact binomial test

data:  65 and 100
number of successes = 65, number of trials = 100, p-value = 0.004
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.55 0.74
sample estimates:
probability of success 
                  0.65 
```

---

### Proportional data: example

```r
prop.test(65, 100)
```

```

	1-sample proportions test with continuity correction

data:  65 out of 100, null probability 0.5
X-squared = 8, df = 1, p-value = 0.004
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.55 0.74
sample estimates:
   p 
0.65 
```

---

### Proportional data: example

<img src="lecture2_files/figure-html/prop-data-1.png" width="100%" />

---

### Normal approximation to binomial

<img src="binomdist.gif" width="100%" />

---

### The `\(\chi^2\)` distribution

The sum of squares of `\(k\)` independent standard normally-distributed random variables has a `\(\chi^2(k)\)` distribution.

Useful as the sampling distribution for

- the sample variance, `\(S^2 = \frac1{n-1}\sum_{i} (X_i - \bar X)^2\)`
- comparing contingency tables, `\(X^2 = \sum_i \frac{(O_i-E_i)^2}{E_i}\)`

<img src="lecture2_files/figure-html/chisq-1.png" width="80%" />
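A quick way to see this by simulation, as a sketch (the choice `\(k = 3\)` is arbitrary):

```r
# 2000 sums of squares of k = 3 standard normal variates
ssq <- replicate(2000, sum(rnorm(3)^2))
hist(ssq, freq = FALSE, breaks = 50)
curve(dchisq(x, df = 3), add = TRUE)  # theoretical chi-squared density
```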
---
layout: false
class: inverse, middle, center

# Null hypothesis significance testing

---
layout: true
background-image: url(cfe-logo.jpg)
background-position: 97% 97%
background-size: 70px

---
class: middle

### Null hypothesis significance testing

Don't do it!

---

### Null hypothesis significance testing

We can never say what the population parameter `\(\mu\)` is without measuring the .blue[*entire*] population.

Confidence intervals suggest the population parameter *probably* lies within a particular range.

A hypothesis test suggests the population parameter is probably .red[*not*] a particular value.

--

In practice: is a posited value inside our `\(100(1-\alpha)\%\)` confidence interval or not?

---

### Null hypothesis significance testing

1. Start with an interesting research hypothesis: e.g. ‘Drug A lowers blood pressure more than Drug B’.
2. Form the opposite (.blue[null]) hypothesis, i.e. ‘Drugs A and B have the same effect on blood pressure’.
3. Assess the weight of evidence .red[*against*] the .blue[*null*] hypothesis.
4. If there is sufficient evidence against the null hypothesis, it can be rejected.

???

If we can show that, under various assumptions, the null hypothesis is an implausible explanation of our data, then we *reject* it, and use this to support the alternative hypothesis, `\(H_1\)`: that Drugs A and B have different effects on blood pressure.

---

### Null and alternative hypotheses

#### The null hypothesis

- Simplest acceptable model, denoted `\(H_0\)`
- If the null hypothesis is true, the world is uninteresting
- Must be expressible numerically (via a .blue[test statistic], `\(T\)`)
  - the sampling distribution of the test statistic must be known

#### The alternative hypothesis

- Alternative model, denoted `\(H_1\)`
- `\(\neg H_0 \implies H_1\)`
- Any possibility not covered by the null hypothesis

???

The thinking goes: if `\(H_0\)` is an implausible explanation for `\(T\)`, then `\(H_1\)` must, by default, be a better explanation.

---

### Example: rat weights data

.small[Data are uterine weights (in mg) for a sample of **20 rats**. Previous work suggests that the mean uterine weight for the stock from which the sample was drawn was **24 mg**. Does this sample confirm that?]

```r
x <- c(9, 14, 15, 15, 16, 18, 18, 19, 19, 20,
       21, 22, 22, 24, 24, 26, 27, 29, 30, 32)
```

--

```r
mean(x)
```

```
[1] 21
```

```r
sd(x)
```

```
[1] 5.912
```

???

Null hypothesis: `\(H_0: \mu = 24\)`

Alternative hypothesis: `\(H_1: \mu \neq 24\)`

---

### Example: rat weights data

.small[Data are uterine weights (in mg) for a sample of **20 rats**. Previous work suggests that the mean uterine weight for the stock from which the sample was drawn was **24 mg**. Does this sample confirm that?]

Test statistic: `\(\frac{\bar X - \mu}{s/\sqrt{n}} \sim t_{n-1}\)`

```r
mean(x) + qt(c(.025, .975), 19) * sd(x) / sqrt(20)
```

```
[1] 18.23 23.77
```

The 95% confidence interval for `\(\bar X\)` .red[does not contain] `\(\mu = 24\)`: Reject `\(H_0\)` at the 5% level.

*Implausible* that the sample data were drawn from a population with mean 24 mg.

???

The test statistic `\(T\)` here is really the *difference* between the sample mean and the proposed population mean. The mean of this difference would be zero. But you can rearrange the formula to give a confidence interval for the sample mean (rather than the mean difference), which has a nicer interpretation.

---

### Example: rat weights data

.small[Data are uterine weights (in mg) for a sample of **20 rats**. Previous work suggests that the mean uterine weight for the stock from which the sample was drawn was **24 mg**. Does this sample confirm that?]

Test statistic: `\(\frac{\bar X - \mu}{s/\sqrt{n}} \sim t_{n-1}\)`

```r
(mean(x) - 24) + qt(c(.025, .975), 19) * sd(x) / sqrt(20)
```

```
[1] -5.7667 -0.2333
```

The 95% confidence interval for `\((T - \mu)\)` .red[does not contain] 0: Reject `\(H_0\)` at the 5% level.

*Implausible* that the sample data were drawn from a population with mean 24 mg.

???

If the population (stock) really had a mean uterine weight of 24 mg and we took repeated random samples from it, then fewer than 5 samples in 100 would have a sample mean this much smaller (or greater) than 24 mg.
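That tail probability can be computed directly; a minimal sketch:

```r
t_stat <- (mean(x) - 24) / (sd(x) / sqrt(20))  # observed t statistic
2 * pt(-abs(t_stat), df = 19)                  # two-sided p-value, approx. 0.035
```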
---

### Example: rat weights data

.small[Data are uterine weights (in mg) for a sample of **20 rats**. Previous work suggests that the mean uterine weight for the stock from which the sample was drawn was **24 mg**. Does this sample confirm that?]

```r
t.test(x - 24)
```

```

	One Sample t-test

data:  x - 24
t = -2.3, df = 19, p-value = 0.04
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -5.7667 -0.2333
sample estimates:
mean of x 
       -3 
```

---

### Example: rat weights data

.small[Data are uterine weights (in mg) for a sample of **20 rats**. Previous work suggests that the mean uterine weight for the stock from which the sample was drawn was **24 mg**. Does this sample confirm that?]

```r
model <- lm(x ~ 1)
confint(model)
```

```
            2.5 % 97.5 %
(Intercept) 18.23  23.77
```

```r
confint(model) - 24
```

```
             2.5 %  97.5 %
(Intercept) -5.767 -0.2333
```

---

### Example: rat weights data

<img src="lecture2_files/figure-html/rats-1.png" width="100%" />

???

It's outside the confidence interval, yes, but do we care about a difference of 3 mg? This is one of the reasons why showing a confidence interval can be more useful than a `\(p\)`-value.

---

### Unpaired, two-sample `\(t\)`-test

For comparing two sample means. If we have two samples, `\(x\)` of size `\(n_x\)` and `\(y\)` of size `\(n_y\)`,

`$$\begin{aligned}
H_0: \mu_x &= \mu_y \iff \mu_x - \mu_y = 0 \\
H_1: \mu_x &\neq \mu_y \iff \mu_x - \mu_y \neq 0
\end{aligned}$$`

Test statistic
`$$T = \frac{\bar x - \bar y}{\operatorname{se}(\bar x - \bar y)} \sim t_\nu.$$`

If we assume `\(\sigma_x \neq \sigma_y\)` (Welch's test, the default in R), then `\(\operatorname{se}(\bar x - \bar y) = \sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}\)` and the degrees of freedom `\(\nu\)` are estimated from the data; assuming equal variances gives `\(\nu = n_x + n_y - 2\)`.

???

The rats data compared one sample to a hypothetical population mean value. Another common situation is to compare the means of two (independent) samples: are they drawn from a population with a common mean? Equivalently: are they drawn from two populations with the same mean?

---

### Example: comparing heights

Is the mean height (in cm) the same for male and female patients in the `htwt` dataset?

--

<img src="lecture2_files/figure-html/htwt-1.png" width="100%" />

???

Visualise first! A good safeguard.

---

### Example: comparing heights

In R, use the `t.test` function:

```r
t.test(nurseht ~ sex, data = htwt)
```

```

	Welch Two Sample t-test

data:  nurseht by sex
t = -20, df = 359, p-value <2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -14.51 -11.86
sample estimates:
mean in group female   mean in group male 
               159.8                173.0 
```

---

### Example: comparing heights

Or fit a linear model and look at the fitted parameters:

```r
model <- lm(nurseht ~ sex, data = htwt)
confint(model)
```

```
             2.5 % 97.5 %
(Intercept) 158.91 160.64
sexmale      11.87  14.49
```

---
layout: false
class: inverse, middle, center

# `\(p\)`-values

---
layout: true
background-image: url(cfe-logo.jpg)
background-position: 97% 97%
background-size: 70px

---

## `\(p\)`-values

A `\(p\)`-value is the probability, under the null hypothesis, of obtaining your test statistic or a more extreme value. It's the area under the null curve *at or beyond* the observed sample mean.

Conventionally,

- `\(p < 0.05 \Rightarrow\)` ‘reject the null hypothesis’
- ‘effect is statistically significant at the 5% level’

A small `\(p\)`-value supposedly means more evidence against the null (but it conflates effect size with sample size)

???

Reject the null hypothesis *at the 5% level*. More generally, if `\(p < \alpha\)` then we reject at the `\(\alpha\)` significance level.

---

## Type I and Type II errors

There are two ways of getting it wrong.

1. The null hypothesis is true but we conclude it isn't.
2. The null hypothesis is false but we fail to reject it.
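A small simulation sketch of error (1): when `\(H_0\)` is in fact true, `\(p < 0.05\)` should occur for about 5% of samples. The sample size and distribution chosen here are arbitrary.

```r
# Sample repeatedly from a population where H0 (mu = 0) is true
pvals <- replicate(2000, t.test(rnorm(25), mu = 0)$p.value)
mean(pvals < 0.05)  # proportion of false positives, close to 0.05
```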
--

.pull-left[
### Type I error, `\(\alpha\)`
.small[
- `\(H_0\)` is true but we reject it
- False positive: `\(p < \alpha\)`
- Significance level of CI or test
- Increases with *multiple testing*
]]

.pull-right[
### Type II error, `\(\beta\)`
.small[
- `\(H_0\)` is false but we don't reject it
- False negative: `\(p > \alpha\)`
- Depends on study size and effect size
- .blue[Power] = `\(1 - \beta\)`
]]

???

Testing several hypotheses at once increases the probability that at least one of them will be incorrectly found to be statistically significant. Several corrections are available for multiple testing; Bonferroni's is the most commonly used, easiest and least accurate.

---

### `\(p\)`-values: digits data example

Observe `\(\bar x = 4.64, s = 3.04, n = 25\)`

Test the null hypothesis `\(H_0: \mu = 3.0\)` vs. the alternative hypothesis `\(H_1: \mu \neq 3.0\)`

<img src="lecture2_files/figure-html/p-value-1.png" width="100%" />

???

The `\(p\)`-value is a measure of how far into the _critical region_ the sample mean is, under the null distribution.

---

### `\(p\)`-values: digits data example

Observe `\(\bar x = 4.64, s = 3.04, n = 25\)`

Test the null hypothesis `\(H_0: \mu = 3.0\)` vs. the alternative hypothesis `\(H_1: \mu \neq 3.0\)`

<img src="lecture2_files/figure-html/p-value2-1.png" width="100%" />

???

The `\(p\)`-value is a measure of how far outside the confidence interval the null value is.

---

### `\(p\)`-values: digits data example

Observe `\(\bar x = 4.64, s = 3.04, n = 25\)`

Test the null hypothesis `\(H_0: \mu = 3.0\)` vs. the alternative hypothesis `\(H_1: \mu \neq 3.0\)`

```r
qt(c(.025, .975), 24) * (3.04 / 5) + 4.64  # does not contain 3.0
```

```
[1] 3.385 5.895
```

```r
pt((3 - 4.64) / (3.04 / 5), 24) * 2  # p-value < 0.05
```

```
[1] 0.01258
```

???

This is how you'd look up the quantiles or distribution function in R. But in practice you'd get the confidence interval (or `\(p\)`-value) from a linear model.

---

### `\(p\)`-values: digits data example 2

Observe `\(\bar x = 2.8, s = 3.04, n = 25\)`

Test the null hypothesis `\(H_0: \mu = 3.0\)` vs. the alternative hypothesis `\(H_1: \mu \neq 3.0\)`

<img src="lecture2_files/figure-html/p-value3-1.png" width="100%" />

---

### `\(p\)`-values: digits data example 2

Observe `\(\bar x = 2.8, s = 3.04, n = 25\)`

Test the null hypothesis `\(H_0: \mu = 3.0\)` vs. the alternative hypothesis `\(H_1: \mu \neq 3.0\)`

```r
qt(c(.025, .975), 24) * (3.04 / 5) + 2.8  # contains 3.0
```

```
[1] 1.545 4.055
```

```r
pt((2.8 - 3) / (3.04 / 5), 24) * 2  # p-value > 0.05
```

```
[1] 0.745
```

---

### One and two-sided tests

<dl>
<dt>Two-sided test</dt>
<dd>Probability of obtaining a value <em>as extreme as</em> the sample statistic, under the null distribution</dd>
<dt>One-sided test</dt>
<dd>Probability of obtaining a value <em>greater (less) than</em> the sample statistic, under the null distribution</dd>
</dl>

.pull-left[
<img src="lecture2_files/figure-html/two-sided-1.png" width="90%" />
]
.pull-right[
<img src="lecture2_files/figure-html/one-sided-1.png" width="90%" />
]

.small[It is easier to get a ‘significant’ result with a one-sided test. Rarely justified!]

---

### Pearson's `\(\chi^2\)` test

An alternative to the normal approximation for proportional data.

1. Calculate the values .blue[*expected*] in the table if the outcome were independent of the exposure.
2. Compare these with the observed values.

For a `\(2 \times 2\)` contingency table,

`$$X^2 = \sum_{i} \frac{(O_{i} - E_{i})^2}{E_{i}} \sim \chi^2(1),$$`

where the sum is over the cells of the table. .small[In general, the degrees of freedom are `\((\text{rows} - 1) \times (\text{columns} - 1)\)`.]
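A sketch of step 1 for the back-pain example on the next slide: under independence, each expected count is (row total × column total) / grand total. The matrix values are taken from the observed table that follows.

```r
observed <- matrix(c(1694, 1739, 637, 445), nrow = 2,
                   dimnames = list(sex = c("F", "M"), pain = c("no", "yes")))
# Expected counts under independence of back pain and sex
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
```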
---

### Pearson's `\(\chi^2\)` test

Back pain data by sex (`epicourse` dataset)

`\(H_0: p_m = p_f\)`, the probability of back pain is independent of sex

`\(H_1: p_m \neq p_f\)`, different probabilities of back pain

.pull-left[
#### Observed

|   |   no| yes|
|:--|----:|---:|
|F  | 1694| 637|
|M  | 1739| 445|
]

.pull-right[
#### Expected

|   |   no|   yes|
|:--|----:|-----:|
|F  | 1772| 558.6|
|M  | 1661| 523.4|
]

---

### Pearson's `\(\chi^2\)` test

```r
Xsq <- sum((observed - expected)^2 / expected)
```

```
[1] 29.91
```

```r
qchisq(.95, df = 1)  # 3.84 < 29.91: the statistic falls in the critical region
```

```
[1] 3.841
```

<img src="lecture2_files/figure-html/chisq-p-value-1.png" width="85%" />

???

You can confirm this with `chisq.test` and/or `prop.test`.

Key idea here: what information does the `\(p\)`-value convey that isn't better shown with either:

a. comparing the `\(X^2\)` statistic with the critical region,

b. drawing a confidence interval around the estimated (difference in) proportion?
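A sketch of that confirmation, reusing the `observed` matrix from before (`correct = FALSE` switches off Yates' continuity correction so that the statistic matches the hand calculation):

```r
chisq.test(observed, correct = FALSE)  # X-squared = 29.91, df = 1
```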
---
layout: false
class: inverse, middle, center

*Optional*

# Epistemology and Reproducibility

---
layout: true
background-image: url(cfe-logo.jpg)
background-position: 97% 97%
background-size: 70px

---
class: middle

<img src="xkcd1132.png" width="100%" />

.footnote[https://xkcd.com/1132/]

---

## Intro to Epidemiology

---

## Intro to Epi.purple[stemology]

--

Much of published research is .red[*not reproducible*].

1. Poor reporting (not fully describing methods and code)
2. The ‘.blue[*file drawer problem*]’
3. Null hypothesis significance testing *itself*

--

> `\(p\)` is the probability of a value ≥ the observed one, given that `\(H_0\)` is true, .red[and that there is no source of bias in the data collection, measurement or analytical process].sup[*]

...even when using intervals (not just `\(p\)`-values) for testing

.footnote[.sup[*]https://doi.org/10.1093/aje/kwx261]

???

> "the migration from significance testing to interval estimation in the way that results are presented has come with a systematic and erroneous judgment in the way that results are interpreted. Because the interval estimation migration was not accompanied by a simultaneous abandonment of null hypothesis significance testing, so that results excluding the null from their confidence intervals continue to receive the vast majority of the attention, the unintended consequence has been the erroneous belief that the point estimate is an unbiased estimate of the truth. It is not."
>
> https://doi.org/10.1093/aje/kwx261

---

## Intro to Epi.purple[stemology]

**Popper**: we can never *justify* theories, only *refute* them

**Fisher**: you cannot prove `\(H_0\)` but you can (almost) refute it.

--

**Gelman** (and others): the null hypothesis is a .blue[*straw man*]

???

The opposite of **reject** is not **accept** but rather **do not reject**. Failing to reject the null hypothesis is not evidence in favour of the null hypothesis.

[Andrew Gelman argues](https://statmodeling.stat.columbia.edu/2014/09/05/confirmationist-falsificationist-paradigms-science/) that NHST is not really *falsificationist* in the Popperian sense, because the researcher is not trying to falsify their actual hypothesis, but the null. Rather, by seeking to falsify the null hypothesis, they are seeking to confirm their own hypothesis, which is the alternative. This is *confirmationist*, not *falsificationist*.

---

## Alternatives to NHST

> Discard the current culture of null hypothesis significance testing and its detrimental influences on study planning, data analysis, presentation of results, and inference.
>
> Focus on estimation, which requires designing studies that yield a precise estimate of association, quantitatively adjusting point estimates and uncertainty intervals for systematic sources of error and quantitatively incorporating prior information..sup[*]

.footnote[.sup[*]https://doi.org/10.1093/aje/kwx261]

???

This quotation is from the editor-in-chief of *Epidemiology*, a journal that has banned the publication of `\(p\)`-values for over 20 years.

https://doi.org/10.1097/00001648-199801000-00004

---
layout: false
class: inverse, middle, center

# The End