Caffeine in Pregnancy, Weak Identifiability, and Working with the Media

A recent Paper in BMJ Open deals with the question of maternal caffeine intake and childhood obesity.

I worked with the Science Media Centre to produce a Press Release in which I made the comment:

"This study makes the headline claim that in a population of over 50,000 mother-child pairs, caffeine intake is positively associated with risk of childhood obesity. Let us make a ‘back of the envelope’ calculation on one of the outcome measures using the cup of instant coffee per day as a measure of the mother’s caffeine intake.

If we take 100 children whose mother group had the lowest caffeine intake (1 cup or less) then at age three, around 11 will be obese; in the next group (1-4 cups) there will be one additional obese child (12 in total); in the next (4-6 cups) there will be an additional two (14 in total) and in the group with highest caffeine intake (6 or more cups) there will be an additional three (17 in total).

The main problem with the study is that maternal caffeine intake is strongly correlated with other relevant factors, and so these should be controlled for. In the authors’ statistical analyses, the ‘best fit’ models they’ve used suggest around half of the additional cases of obesity in children of age 3 are explained by caffeine rather than, say, calorie intake of mothers. But since the caffeine and calorie intake are correlated, the existence of a ‘best fit’ model with equal weighting on both means we actually can’t say whether a model looking at just calorie intake of mothers would be an equally good explanation of child obesity. It also says nothing about whether including a new explanatory measure, such as the child’s calorie intake, would change the model.

As such, the evidence provided in the study for a causal effect is extremely weak and the statement from the authors that 'complete avoidance might actually be advisable' seems unjustified, particularly when we consider the effects that such a restriction might have on wellbeing of mothers."

This story was quite widely covered, but most papers took the same approach as iNews and only quoted my main conclusion, but not the reasoning.

So let's consider a simple exaple that exhibits the main problem, in particular the following contingency table. This is much smaller than the original study, but the same phenomenon is exhibited by larger populations with more explanatory factors:

  Caffeine Intake
Low High
Calorie Intake Low Healthy Overweight Healthy Overweight
63 7 3 2
High Healthy Overweight Healthy Overweight
2 3 5 15

Note that 68% of the high caffeine group is overweight, while only 13% of the low caffeine group is, but we note that calorie intake may be an issue so let us perform logistic regression as the authors did, letting \[\mathrm{Pr}(\mathrm{Overweight}) = 1/(1+ \mathrm{exp}(-\beta_0 - \beta_1 \times \mathrm{HighCalorie} - \beta_2 \times \mathrm{HighCaffeine})) \mathrm{ ,}\] where \(\beta_0\) is the intercept, \(\beta_1\) is the log odds due to calories and \(\beta_2\) is the log odds due to caffeine. Setting the intercept at its maximum likelihood value, the maximum likelihood values for the other parameters are shown as the cross below, which attributes roughly equal importance to the two factors:

Likelihood surface

Behind this cross is a shaded region with intensity equal to the likelihood of different parameter combinations - and we note that there is still a reasonable amount of shading in the area where \(\beta_2 \approx 0\). This is a special case of a general phenomenon called weak identifiability, and it is something I've done a bit of work on - there are various ways to deal with this, but its presence is the reason that evidence for a causal effect is weak in the original study.

Visualising the Wishart Distribution

This semester, I have been teaching Multivariate Statistics, and one isse that comes up is how to think about the Wishart distribution, which generalises the \(\chi^2 \) distribution. This has probability density function

$$ f_{\chi^2_k}(x) = \frac{1}{2\,\Gamma\!\left(\frac{k}{2}\right)} \left(\frac{x}{2}\right)^{\frac{k}{2}-1} {\rm e}^{-\frac{x}{2}} . $$

To visualise this at different degrees of freedom \(k \), we use the following Python code.

%matplotlib inline
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
np.random.seed(42)
Sig1 = np.array([[1,0.75],[0.75,2]])
n=1000
x = np.linspace(0, 10, 1000)
plt.figure(figsize=(8,4))
colors = plt.cm.Accent(np.linspace(0,1,4))
i=0
for k in (1,2,5,10):
    c = st.chi2.pdf(x, k);
    plt.plot(x,c, label='k = ' + str(k), color=colors[i],linewidth=3)
    i += 1
plt.xlim((0,10))
plt.ylim((0,0.5))
plt.xlabel('x')
plt.ylabel('f(x)')
plt.legend()
../output_2_1.png

The Wishart emerges when we make \(k \) observations each with \(p \) variables; the sampling distribution for the covariance matrix of these data if the population is multivariate normal is Wishart and has probability density function

$$ f_{\mathcal{W}_p(k,\boldsymbol{\Sigma})}(\boldsymbol{V}) = \frac{\mathrm{det}(\boldsymbol{V})^{\frac{k - p - 1}{2}}}{2^{ \frac{k p}{2} } \mathrm{det}(\boldsymbol{\Sigma})^\frac{k}{2} \Gamma_p\!\left(\frac{k}{2} \right)} \exp\left( -\frac12 \mathrm{Tr}(\boldsymbol{\Sigma}^{-1} \boldsymbol{V}) \right) . $$

In general, this cannot be visualised, but for the \(p=2 \) case, one option is to pick random numbers from this distribution and then produce a three-dimensional scatter plot of these. The code and results below show that this can be seen as behaving somewhat like the chi-squared distribution that it generalises as the degrees of freedom \(k \) are increased.

x=np.zeros(n)
y=np.zeros(n)
z=np.zeros(n)
for i in range(0,n):
    M=st.wishart.rvs(2,scale=Sig1,size=1)
    x[i]=M[0][0]
    y[i]=M[1][1]
    z[i]=M[1][0]
fig = plt.figure(figsize=(8,5))
ax = fig.add_subplot(111, projection='3d')
ax.scatter3D(x,y,z, marker='o', c=z, cmap='seismic')
ax.set_xlabel('V11')
ax.set_ylabel('V22')
ax.set_zlabel('V12=V21')
ax.set_xlim((0,30))
ax.set_ylim((0,70))
ax.set_zlim((0,30))
plt.tight_layout()
../output_3_0.png
x=np.zeros(n)
y=np.zeros(n)
z=np.zeros(n)
for i in range(0,n):
    M=st.wishart.rvs(5,scale=Sig1,size=1)
    x[i]=M[0][0]
    y[i]=M[1][1]
    z[i]=M[1][0]
fig = plt.figure(figsize=(8,5))
ax = fig.add_subplot(111, projection='3d')
ax.scatter3D(x,y,z, marker='o', c=z, cmap='seismic')
ax.set_xlabel('V11')
ax.set_ylabel('V22')
ax.set_zlabel('V12=V21')
ax.set_xlim((0,30))
ax.set_ylim((0,70))
ax.set_zlim((0,30))
plt.tight_layout()
../output_4_0.png
x=np.zeros(n)
y=np.zeros(n)
z=np.zeros(n)
for i in range(0,n):
    M=st.wishart.rvs(10,scale=Sig1,size=1)
    x[i]=M[0][0]
    y[i]=M[1][1]
    z[i]=M[1][0]
fig = plt.figure(figsize=(8,5))
ax = fig.add_subplot(111, projection='3d')
ax.scatter3D(x,y,z, marker='o', c=z, cmap='seismic')
ax.set_xlabel('V11')
ax.set_ylabel('V22')
ax.set_zlabel('V12=V21')
ax.set_xlim((0,30))
ax.set_ylim((0,70))
ax.set_zlim((0,30))
plt.tight_layout()
../output_5_0.png

Mathematics of Memes

Models of epidemics can be adapted to consider the spread of internet Memes - for more information see the University Press Release. Below is a figure showing some of these Memes 'in the wild'! We hope that the work will be possible to extend to more substantial health-related behaviours, but Memes provide a relatively pure signal of the kind of social influence that must be accounted for in such contexts, and as such are a good way to develop methods.

Memes considered in the paper

I also recently gave a talk at the local Galois Group (undergraduate mathematics) on the work:

Operationalising Modern Mathematical Epidemiology

This month Liz Buckingham-Jeffery is joining me and Tim Kinyanjui to complete the team for the EPSRC-funded project Operationalising Modern Mathematical Epidemiology, which is a Healthcare Technologies Impact Fellowship I hold (picture of us all below). The project is about development of some recent ideas at developed by mathematical scientists to generate benefits in the healthcare system. We have non-academic partners from three major Manchester hospitals (Central Manchester, North Manchester and the Christie), Public Health England's national Real-time Syndromic Surveillance group, and a data analytics firm Spectra Analytics. The first papers from this project are starting be be submitted, and we have an ambitious programme of research and impact-related activity over the next couple of years to ensure that the potential benefits to population health from theoretical developments are realised.

Me, Liz and Tim

Testing Preferential Attachment

I've recently had a New paper out in EPJ Data Science concerning preferential attachment (the 'rich get richer' effect) in social networks.

This is a relatively controversial area, and the aim of the paper was to try to be as systematic / objective as possible. The simplest way to explain our results, suggested by one of two pleasingly constructive reviewers, is that we estimate around 40% of social contacts arise due to the existence of existing social contacts (e.g. your friends introduce you to their friends) rather than from direct social activity. If this figure had been much larger, it would have created an epidemiological paradox - no infectious disease would be controllable.

The main innovation was to use a general finite-state-space Markov chain for the non-preferentially attached links, which generates a mechanistically interpretable phase-type distribution as a way to capture other sources of heterogeneity in contact numbers.

The journal's editor (quite a 'big name' with many followers) was also nice enough to promote the work on social media:

Spreading of Mood on Social Networks

Together with Ed Hill and Frances Griffiths, I had a paper out in Proceedings B yesterday. There has been a bit of media interest in the paper (e.g. The Atlantic and the Daily Telegraph) but I assume that anyone reading this page is more interested in how the mathematics works. This is given in relatively full detail in the paper and Supplementary Material, so below I give a more informal account.

A ubiquitous problem in epidemiology is confounding, which is (very loosely speaking) where something else correlates with the actual causal mechanism for disease. In the current context, the issue is that there may be clusters of ill people on a social network due to spreading of the illness on the network, or it could be that people who share a risk factor tend to be linked on the network. So for depression (which we studied) it could be that people who drink heavily tend to be friends, but it is the drinking that causes the clusters of depression rather than the friendships.

There is a lot of interesting discussion of such questions in the context of social networks in a Special issue of Statistics in Medicine. Personally, I tend to think that while it is never possible to eliminate all forms of confounding in observational studies, these offer information that is not possible to obtain in experiments. Therefore the task for methodologists is to try to come up with methods that allow information to be reliably extracted from observational data. In the paper in question, we tried to reduce the possible role for confounding in the following way.

Figure 3 from the paper

Figure 3 from the paper. D and N in the paper are equivalent to 1 and 0 in the discussion here respectively.

Suppose we have a Markov chain \(\mathbf{X}_t = (X^i_t)\), where \(X^i_t=1\) if individual \(i\in\{1,2,\ldots,N\}\) is ill at time \(t\) and \(X^i_t=0\) otherwise. If there is no social influence, then a standard modelling approach is to let \[\mathrm{Pr}(X^i_t=1) = f(\boldsymbol{\theta}_i) \mathrm{ ,}\] where \(\boldsymbol{\theta}_i\) is a vector of individual-level properties / behaviours for individual \(i\), and the function \(f\) is usually a logit transform on \(\boldsymbol{\beta}\cdot\boldsymbol{\theta}_i\). Here there is a lot of room for confounding if different elements of the vectors in \(\{\boldsymbol{\theta}_i\}_{i=1}^{N}\) are correlated with each other.

Now add the social network - a matrix with elements \[G_{i,j} = \begin{cases} 1 & \text{ if } j \text{ is } i \text{'s friend,} \\ 0 & \text{ otherwise.}\end{cases}\] Clustering in such a network is a tendency for friends to be in the same disease state - \(\mathrm{Pr}(X^i_t = X^j_t \vert G_{i,j}=1) \gt \mathrm{Pr}(X^i_t = X^j_t \vert G_{i,j}=0)\). The issue of confounding then becomes whether the links \(\{G_{i,j}\}_{i,j=1}^{N}\) (usually thought of as Bernoulli random variables like the disease states) are correlated with the \(\{\boldsymbol{\theta}_i\}_{i=1}^{N}\), or whether there is evidence for "spreading" of the disease as in Figure 3 of the paper.

Our way of answering this rests on the bit of stochastic process theory that says that distinct Markov chains can share the same stationary distribution, meaning that observations from this distribution alone cannot distinguish them, but may have different transition probabilities. A direct observation of transitions between disease states therefore allows us to consider determine model probabilities like \(\mathrm{Pr}(X^i_{t+1} = 1 \vert \mathbf{X}_t,\mathbf{G})\), and thereby distinguish between spreading over a network, and correlation of other properties / behaviours. The full details are given in the paper and supplement, but the basic idea that a lot of confounding is related to equivalent stationarity of stochastic processes behind the hypotheses to be considered, and could be mitigated by fitting to temporal observations (or other features that do distinguish those processes) is, I believe, a relatively general one.

Science Review Paper Out

I was a long-term participant in the Infectious Disease Dynamics programme at the Newton Institute.

This experience had a large effect on my thinking about the field, and allowed me to start new collaborations. One of the things that we did together is write the review paper Modeling infectious disease dynamics in the complex landscape of global health, which has just appeared in its final form. While the lead author, Hans Heesterbeek, did the really hard work of pulling everything together, the paper comes remarkably close to a consensus view of the most exciting directions in infectious disease modelling, and it's a good starting point for someone thinking about making contributions to this field.

Hello

Welcome to my new personal website. It's powered by Nikola, and contains information about my research and teaching.