Learning Objectives
In this lab, we will learn:
- how to source data from the Human Mortality Database (HMD) website.
- how to build period life tables.
- how to visualise parts of life tables using the ggplot2 package.
To start with, install and load the packages required for this lab. The package "HMDHFDplus" helps to source the data from the Human Mortality Database (HMD) website, the packages "dplyr", "tidyr" and "haven" help to manipulate and clean the data, and the package "ggplot2" is used for visualisations.
install.packages("HMDHFDplus")
install.packages("dplyr")
install.packages("tidyr")
install.packages("haven")
install.packages("ggplot2")
library("HMDHFDplus")
library("dplyr")
library("tidyr")
library("haven")
library("ggplot2")
We are going to build a period life table for Swedish females between 2010 and 2014. To source data from the HMD, you need a username and a password. Go to the HMD website and click on New user at the top left of the screen. Then, click New User again to read and accept the User Agreement. After this, you will be asked to provide some details, including an email address, your name, your institution and your role. When you finish, click on Enter and you will receive a confirmation of your registration at the email address you provided. Once you have your username and password, insert them in the following code where indicated:
# Replace the username and password with your own HMD credentials
temp <- readHMDweb(CNTRY = "SWE",
                   item = "fltper_5x5",
                   username = "username@email.com",
                   password = "password",
                   fixup = FALSE)
The function readHMDweb() comes from the package HMDHFDplus. Besides your username and password, this function takes other arguments, which you need to define. For our purposes, we specify:
- CNTRY = "SWE", which refers to Sweden. The list of available countries and series is on the HMD website.
- item = "fltper_5x5", which corresponds to the period life table for females by 5-year age groups and 5-year intervals.
- fixup = FALSE, which returns the columns as they appear in the HMD file instead of tidying them (for example, by converting Age to integers). The code below relies on these original labels, such as "2010-2014" for Year and "1-4" or "110+" for Age.
Before building the period life table for Swedish females between 2010 and 2014, you need to clean the data using the following code:
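my_data <- temp %>%
  filter(Year == "2010-2014") %>%   # keep the 2010-2014 period only
  select(Year, Age, mx, ax) %>%      # keep the columns needed for the life table
  mutate(Age = as_factor(Age))       # Age as a factor, in its original order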
With this code, you filter the rows of the dataset by the required 5-year interval and select the columns corresponding to Year, Age, and the age-specific mortality rate of each 5-year age group, mx. We also select the column ax, which refers to the average number of years lived within the age group by those who die in it. For most age groups, it is sensible to assume that deaths are spread evenly over the age interval, which gives an ax close to 2.5 (half of a 5-year interval). However, this assumption does not hold for the youngest and the oldest ages, which is why we take the ax values from the HMD rather than fixing them at 2.5.
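To see where ax enters the calculation, the rate-to-probability conversion used in the life-table code can be derived from two standard identities: mx = dx / Lx, and Lx = n l(x+n) + ax dx, where those who die in the interval live ax years of it on average and l(x+n) = lx - dx. This is a sketch of the algebra, written with the same symbols as the code:

$$
\begin{aligned}
L_x &= n\,l_{x+n} + a_x\,d_x = n\,l_x - (n - a_x)\,d_x \\
d_x &= m_x\,L_x \;\Longrightarrow\; d_x\bigl(1 + (n - a_x)\,m_x\bigr) = n\,m_x\,l_x \\
q_x &= \frac{d_x}{l_x} = \frac{n\,m_x}{1 + (n - a_x)\,m_x}
\end{aligned}
$$

With deaths spread evenly over a 5-year group (n = 5, ax = 2.5), this reduces to qx = 5mx / (1 + 2.5mx).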
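The call to mutate() replaces Age with a factor created by as_factor(), which keeps the age groups in the order in which they appear in the data (0, 1-4, 5-9, ...) instead of sorting them alphabetically. This gives sensible plots later. The pipe operator %>% tells R to take the result of the expression on its left and pass it as the first argument of the function on its right.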
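A quick way to see why as_factor() gives sensible plots is to compare it with base R's factor() on a few HMD-style age labels. This sketch assumes a current version of haven, where as_factor() is the forcats generic, so character input keeps its order of appearance; the labels below are only examples.

```r
library(haven)  # as_factor() is available through haven

# A few HMD-style age labels, in the order they appear in the data
ages <- c("0", "1-4", "5-9", "10-14", "110+")

# Base R factor(): levels are sorted as text, so "5-9" ends up last
levels(factor(ages))

# as_factor(): levels follow the order of appearance, i.e. the age order
levels(as_factor(ages))
```

With factor(), the x axis of the plots would jump from "110+" back to "5-9"; with as_factor(), it runs from the youngest to the oldest age group.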
To build the period life table for Swedish females between 2010 and 2014, use the following chunk of code, where:
- n is the number of years in each age interval.
- qx is the probability of dying within the age interval starting at age x.
- px is the probability of surviving the age interval starting at age x.
- lx is the number of people still alive at exact age x, out of an initial population (radix) of 1.
- dx is the number of deaths between ages x and x+n.
- Lx is the number of person-years lived between ages x and x+n.
- Tx is the total number of person-years left to live beyond age x.
- ex is the life expectancy at age x.
perLT_SE <- my_data %>%
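  mutate(
    # Width of each age interval: 1 year for age 0, 4 for 1-4, 5 otherwise
    n  = case_when(Age == "0" ~ 1,
                   Age == "1-4" ~ 4,
                   TRUE ~ 5),
    qx = n * mx / (1 + (n - ax) * mx),
    # Everyone alive at the start of the open interval dies within it
    qx = ifelse(Age == "110+", 1, qx),
    px = 1 - qx,
    lx = lag(cumprod(px), default = 1),   # radix of 1
    dx = lx - lead(lx, default = 0),
    Lx = n * lead(lx, default = 0) + (ax * dx),
    Lx = ifelse(Age == "110+", lx / mx, Lx),  # open interval: lx / mx
    Tx = rev(cumsum(rev(Lx))),              # sum Lx from the oldest age down
    ex = Tx / lx
  )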
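To repeat these steps for another country or period, the same calculation can be wrapped in a function. The sketch below uses a hypothetical helper, build_period_lt(), that expects a data frame with the columns Age, mx and ax, and a parameter open_age naming the last (open) age group; it is exercised on made-up rates, not HMD data.

```r
library(dplyr)

# Hypothetical helper: builds a period life table from columns Age, mx, ax,
# following the same steps as the code above
build_period_lt <- function(data, open_age = "110+") {
  data %>%
    mutate(
      n  = case_when(Age == "0" ~ 1, Age == "1-4" ~ 4, TRUE ~ 5),
      qx = n * mx / (1 + (n - ax) * mx),
      qx = ifelse(Age == open_age, 1, qx),   # everyone dies in the open interval
      px = 1 - qx,
      lx = lag(cumprod(px), default = 1),    # radix of 1
      dx = lx - lead(lx, default = 0),
      Lx = n * lead(lx, default = 0) + ax * dx,
      Lx = ifelse(Age == open_age, lx / mx, Lx),
      Tx = rev(cumsum(rev(Lx))),
      ex = Tx / lx
    )
}

# Made-up rates for a three-row table (not HMD data)
toy <- tibble(
  Age = c("0", "1-4", "5+"),
  mx  = c(0.01, 0.002, 0.05),
  ax  = c(0.1, 1.5, 20)
)
toy_lt <- build_period_lt(toy, open_age = "5+")
toy_lt
```

Calling build_period_lt(my_data) would apply the same steps to the Swedish data. Because lx starts at 1 and all remaining deaths fall in the open interval, the dx column always sums to 1, which is a handy sanity check.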
Use the function ggplot() from the package ggplot2 to visualise different parts of the period life table for Swedish females between 2010 and 2014 that you have just created.
qx and px
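To visualise the probability of dying (qx) and the probability of surviving (px) at the same time, you need to stack these two columns into a single column of values, with a second column indicating which probability each value belongs to. The function gather() does this, as the following code shows: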
perLT_SE %>%
  gather(Probability, Value, qx:px) %>%   # stack qx and px into long format
  ggplot(aes(x = Age, y = Value, group = Probability, color = Probability)) +
  geom_line(linewidth = 2) +   # linewidth replaces size for lines (ggplot2 >= 3.4.0)
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
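In current versions of tidyr, gather() is superseded by pivot_longer(), which does the same reshaping with named arguments. The sketch below shows the equivalent call on a small table of made-up probabilities (not HMD data); on the real table it would be perLT_SE %>% pivot_longer(cols = c(qx, px), names_to = "Probability", values_to = "Value"). It assumes ggplot2 3.4.0 or later for the linewidth argument.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up probabilities for three age groups (not HMD data)
toy_lt <- tibble(
  Age = factor(c("0", "1-4", "5-9"), levels = c("0", "1-4", "5-9")),
  qx  = c(0.002, 0.0004, 0.0003)
) %>%
  mutate(px = 1 - qx)

# pivot_longer() stacks qx and px into one Value column, with a
# Probability column recording which probability each value belongs to
toy_long <- toy_lt %>%
  pivot_longer(cols = c(qx, px), names_to = "Probability", values_to = "Value")

p <- ggplot(toy_long, aes(x = Age, y = Value, group = Probability, color = Probability)) +
  geom_line(linewidth = 2) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
p
```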
After running the previous code, you should get the following figure:
dx
To visualise the number of deaths between ages x and x+n (dx), the code is similar to the previous one, but you no longer need gather(): map dx to the y argument and set group = 1 so that geom_line() joins the points across the age groups, as follows:
perLT_SE %>%
  ggplot(aes(x = Age, y = dx, group = 1)) +
  geom_line(linewidth = 2) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
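Because lx starts at 1 and everyone still alive dies in the open interval, dx is a distribution of deaths by age that adds up to 1. The sketch below, on made-up dx values (not HMD data), shows two quick checks you could run on perLT_SE: that dx sums to 1, and which age group has the most deaths once deaths at age 0 are set aside (the modal age group at death).

```r
library(dplyr)

# Made-up dx values for a five-group table (not HMD data); they sum to 1
toy_lt <- tibble(
  Age = c("0", "1-4", "5-9", "10-14", "15+"),
  dx  = c(0.05, 0.01, 0.04, 0.55, 0.35)
)

# Deaths should add up to the radix of 1
total_deaths <- sum(toy_lt$dx)

# Modal age group at death, skipping the first row (deaths at age 0)
modal_age <- toy_lt %>%
  slice(-1) %>%
  slice_max(dx, n = 1) %>%
  pull(Age)
```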
ex
One of the most important outcomes of a life table is the life expectancy column. Use the following code to plot ex:
perLT_SE %>%
  ggplot(aes(x = Age, y = ex, group = 1)) +
  geom_line(linewidth = 2) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
qx, px, dx, Lx, Tx, and ex.