Learning Objectives
In this lab, we will learn:
- how to source data from the Human Mortality Database (HMD) website.
- how to build period life tables.
- how to visualise parts of life tables using the ggplot2 package.

To start with, install and load the packages required for this lab. The package "HMDHFDplus" helps to source the data from the Human Mortality Database (HMD) website, while the packages "dplyr", "tidyr" and "haven" help to manipulate and clean our data. The package "ggplot2" is for visualisations.

install.packages("HMDHFDplus")
install.packages("dplyr")
install.packages("tidyr")
install.packages("haven")
install.packages("ggplot2")
library("HMDHFDplus")
library("dplyr")
library("tidyr")
library("haven")
library("ggplot2")

1 Sourcing data from the HMD website

We are going to build a period life table for Swedish females between 2010 and 2014. To source data from the HMD, you need to obtain a username and a password. Go to the HMD website and click on New user at the top left of the screen. Then, click New User again to read and accept the User Agreement. After this, you will be asked to provide some details, including an email address, your name, your institution and your role. When you finish, click on Enter and you will receive a confirmation of your registration at the email address you provided. Once you have your username and password, insert them in the corresponding places in the following code:

temp <- readHMDweb(CNTRY = "SWE", 
                  item = "fltper_5x5",
                  username = "username@email.com",
                  password = "password", 
                  fixup = FALSE)

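The function readHMDweb() comes from the package HMDHFDplus. Besides your username and password, this function uses other arguments, which you need to define. For our purposes, we specify:

- CNTRY = "SWE", the HMD country code for Sweden.
- item = "fltper_5x5", the period life table for females (fltper) by 5-year age groups and 5-year time intervals (5x5).
- fixup = FALSE, which keeps the columns as downloaded, so that Year and Age remain labels such as "2010-2014" and "1-4".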

Before building the period life table for Swedish females between 2010 and 2014, you need to clean the data using the following code:

my_data <- temp %>% 
    filter(Year=="2010-2014") %>%
    select(Year, Age, mx, ax) %>%
    mutate(Age = as_factor(Age))

With this code, you filter the rows of the dataset to the required 5-year period and select the columns Year, Age and mx, the age-specific mortality rate of each age group. We also select the column ax, the average number of years lived within the interval by those who die in each age group. For most age groups, it is sensible to assume that deaths are spread evenly over the age interval and, therefore, that ax equals 2.5, half the width of a 5-year interval. However, this assumption does not hold at the youngest and the oldest ages, so the ax values are taken directly from the HMD.

The function mutate() converts Age to a factor whose levels keep the order in which the age groups appear in the data. This will be used to get sensible plots later. The pipe operator %>% tells R to take the result of the preceding line and pass it to the following function.
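To see why the factor conversion matters for plotting, the sketch below compares alphabetical sorting of age labels with factor levels kept in order of appearance. It uses base R's factor() with levels = unique() as a stand-in for as_factor(), which behaves the same way for character input; the vector age is a made-up subset of HMD-style labels, not downloaded data.

```r
# Illustrative subset of HMD-style age labels (not real data)
age <- c("0", "1-4", "5-9", "10-14", "100-104", "110+")

# Sorting the labels as text puts "5-9" after "10-14" and "100-104",
# which is the order a plot would use for a plain character column
sort(age)

# A factor with levels in order of appearance keeps the demographic order
age_f <- factor(age, levels = unique(age))
levels(age_f)
```

With Age stored this way, ggplot2 draws the age groups along the x axis in the order of the levels rather than alphabetically.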

2 Building a period life table

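To build the period life table for Swedish females between 2010 and 2014, use the next chunk of code, where:

- n is the width of each age interval in years (1 for age 0, 4 for ages 1-4 and 5 otherwise).
- qx is the probability of dying between ages x and x+n, and px = 1 - qx is the probability of surviving the interval.
- lx is the proportion of the initial cohort still alive at exact age x, starting from 1 at age 0.
- dx is the number of deaths between x and x+n.
- Lx is the number of person-years lived between x and x+n. For the open-ended age group 110+, it is calculated as lx/mx.
- Tx is the number of person-years lived above age x.
- ex is the life expectancy at age x.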

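perLT_SE <- my_data %>% 
    mutate(n = case_when(                 # width of each age interval in years
              Age=="0" ~ 1,
              Age=="1-4" ~ 4,
              TRUE ~ 5),
           # probability of dying in the interval; everyone dies in the open-ended group
           qx = ifelse(Age=="110+", 1, n*mx/(1+(n-ax)*mx)),
           px = 1 - qx,                                 # probability of surviving the interval
           lx = lag(cumprod(px), default=1),            # survivors to exact age x (starts at 1)
           dx = lx - lead(lx, default = 0),             # deaths in the interval
           Lx = n * lead(lx, default = 0) + (ax* dx),   # person-years lived in the interval
           Lx = ifelse(Age=="110+",lx/mx,Lx),           # open-ended group: Lx = lx/mx
           Tx = rev(cumsum(rev(Lx))),                   # person-years lived above age x
           ex = Tx / lx                                 # life expectancy at age x
        )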
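The conversion from mx to qx used in the code can be derived from the other columns. The following is a sketch in standard life table notation, writing $l_{x+n}$ for the survivors at the start of the next age group (lead(lx) in the code) and using $d_x = l_x - l_{x+n}$ and $q_x = d_x / l_x$:

```latex
\begin{align*}
L_x &= n\,l_{x+n} + a_x\,d_x
     = n\,(l_x - d_x) + a_x\,d_x
     = n\,l_x - (n - a_x)\,d_x, \\
m_x &= \frac{d_x}{L_x}
     = \frac{d_x}{n\,l_x - (n - a_x)\,d_x}
     = \frac{q_x}{n - (n - a_x)\,q_x}, \\
q_x &= \frac{n\,m_x}{1 + (n - a_x)\,m_x}.
\end{align*}
```

The last line follows from solving the second for $q_x$. In the open-ended age group all survivors die, so $d_x = l_x$ and $m_x = d_x / L_x$ gives $L_x = l_x / m_x$, which is the special case applied to the group 110+.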

3 Plots

Use the function ggplot from the package ggplot2 to visualise different parts of the period life table for Swedish females between 2010 and 2014 that you have just created.

3.1 Plotting qx and px

To visualise the probability of dying qx and the probability of surviving px in the same plot, you first need to collapse these two columns into a single column of values plus a column that identifies the probability. The function gather() from tidyr does this, as the next code shows:

perLT_SE %>%
  gather(Probability, Value, qx:px) %>%
  ggplot(aes(x = Age, y = Value, group = Probability, color = Probability)) +
  geom_line(linewidth = 2) +   # linewidth replaces size for lines in ggplot2 3.4.0 and later
  theme_bw() + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

Running this code should produce a figure with two lines across the age groups, one for qx and one for px.
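The reshaping step can also be written with pivot_longer(), which tidyr recommends over gather() for new code. The sketch below compares the two on a toy table; the data frame toy and its probabilities are made up for illustration and are not HMD values.

```r
library(tidyr)

# Toy table with made-up probabilities (illustrative values, not HMD data)
toy <- data.frame(
  Age = factor(c("0", "1-4", "5-9"), levels = c("0", "1-4", "5-9")),
  qx  = c(0.002, 0.001, 0.0005)
)
toy$px <- 1 - toy$qx

# Long format via gather(), as in the lab code
long_gather <- gather(toy, Probability, Value, qx:px)

# The same reshaping via pivot_longer()
long_pivot <- pivot_longer(toy, cols = c(qx, px),
                           names_to = "Probability", values_to = "Value")
```

Both results have the columns Age, Probability and Value and the same values. Only the row order differs, which does not affect the plot because ggplot2 groups the rows by Probability.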

3.2 Plotting dx

To visualise the number of deaths between x and x+n (dx), the code is similar, but no reshaping is needed. Map dx to the argument y and set group = 1 so that a single line is drawn across the age groups:

perLT_SE %>%
  ggplot(aes(x = Age, y = dx, group = 1)) +
  geom_line(linewidth = 2) + 
  theme_bw() + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

3.3 Plotting ex

One of the most important outcomes of a life table is the column related to the life expectancy. Use the next code to plot ex:

perLT_SE %>%
  ggplot(aes(x = Age, y = ex, group = 1)) +
  geom_line(linewidth = 2) +
  theme_bw() + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

4 Homework

  1. Go back to the HMD website and select a country of interest to you. Then, source the corresponding data.
  2. Calculate a period life table for the country of your selection.
  3. Visualise qx, px, dx, Lx, Tx, and ex.
  4. Compare the results and plots of Sweden and the country of your choice. Are they what you expected?