Kikwit Ebola Analysis

Model Fitting

Model fitting is done using the mcmcSEIR function. The first argument to this function is a list that contains the important features of the data -

Istar - the number of new cases over time
S0 - the number of susceptible individuals at the beginning of the epidemic
E0 - the number of exposed individuals at the beginning of the epidemic
I0 - the number of infectious individuals at the beginning of the epidemic
N - the population size

As three individuals were consecutively reported on the first three days of the epidemic, the initial conditions used are \(N\) = 5,363,500, \(S_0\) = 5,363,497, \(E_0 = 3\), \(I_0 = 0\), and \(R_0 = 0\).

N <- 5363500
E0 <- 3
S0 <- N - E0
I0 <- 0
ebolaDat <- list(Istar = ebola$Count[-c(1:3)],
                 S0 = S0,
                 E0 = E0,
                 I0 = I0,
                 N = N)

The matrix X is used to describe the intensity process. As in Lekone and Finkenstadt (2006), we formulate X to describe constant epidemic intensity until the time of an intervention, after which transmission decays exponentially. We scale the intervention vector by 100 to improve estimation.

tau <- length(ebolaDat$Istar)
interventionTime <- which(ebola$Date[-c(1:3)] == as.Date('1995-05-10'))

X <- cbind(1, cumsum(1:tau > interventionTime) / 100)

Finally, we must determine our MCMC specifications:

niter - the number of iterations to run the MCMC algorithm for
nburn - the number of burn-in iterations to be discarded

A full analysis should implement multiple chains using different initial values, but we will run only one chain for illustration.

niter <- 100000
nburn <- 20000

The rest of the parameters are specific to the type of model being fit. For each type of model, we need to specify the initial values, prior distributions, and any model specific parameters.

Exponential infectious period

The model with exponential infectious periods has four parameters: \(\beta_1\), \(\beta_2\), \(\rho_E\), and \(\rho_I\).

Normal priors were used for the \(\boldsymbol{\beta}\) parameters. To allow the potential for the intervention to have a strong effect the prior on \(\beta_2\) has a large variance. An informative prior was used for the \(\rho_E\) parameter, corresponding to a mean length of this period to be between 9 - 10 days. The prior on the \(\rho_I\) parameter represents a mean infectious period of 6 days.

Finally, we specify WAIC = TRUE to tell the function to compute the WAIC, so we can compare the model fits later.

initsList <- list(beta = c(1, -1), rateE = 0.1, rateI = 0.3)

betaPrior <- function(x) {
  dnorm(x[1], mean = 0, sd = 4, log = T) + 
    dnorm(x[2], mean = 0, sd = 10, log = T)
}

rateEPrior <- function(x) {
  dgamma(x, 11, 100, log = T)
}

rateIPrior <- function(x) {
  dgamma(x, 16, 100, log = T)
}


priorList <- list(betaPrior = betaPrior,
                  rateEPrior = rateEPrior,
                  rateIPrior = rateIPrior)

postResExp <- mcmcSEIR(dat = ebolaDat, X = X, 
                       inits = initsList, 
                       niter = niter, nburn = nburn,
                       infPeriodSpec = 'exp',
                       priors = priorList,
                       WAIC = T)

We can look at the trace plots to assess convergence.

postParamsExp <- postResExp$fullPost

par(mfrow = c(2,2))
plot(postParamsExp$beta1, type = 'l')
plot(postParamsExp$beta2, type = 'l')
plot(postParamsExp$rateE, type = 'l')
plot(postParamsExp$rateI, type = 'l')

We can also calculate the posterior distribution of \(R_0(t)\), and plot the mean and credible intervals. We thin the posterior by taking every 10th iteration to speed computation.

postBurninIter <- seq(nburn + 1, niter, 10)

R0Post <- vapply(1:length(postBurninIter), function(i) 
  getR0(infPeriodSpec = 'exp',
        beta = c(postParamsExp$beta1[i],
                 postParamsExp$beta2[i]),
        X = X, N = N,
        infExpParams = list(rateI = postParamsExp$rateI[i])),
  numeric(nrow(X)))


R0PostSummary <- data.frame(mean = rowMeans(R0Post),
                            lower = apply(R0Post, 1, quantile, probs = 0.025),
                            upper = apply(R0Post, 1, quantile, probs = 0.975))

plot(R0PostSummary$mean, type = 'l', lwd = 2, 
     main = 'Reproductive number over time', 
     xlab = 'Epidemic time', ylab = 'R0', ylim = c(0, 2.5))
lines(R0PostSummary$lower, lty = 2, lwd = 2)
lines(R0PostSummary$upper, lty = 2, lwd = 2)

The exponential model estimates a reproductive number starting around 1.8, which then declines due to the intervention, ultimately going to 0 when the epidemic ends.

IDD transmissibility using the gamma pdf for the IDD curve

The next model we will implement is the infectious duration-dependent (IDD) model of Ward et al. (2022), using the gamma probability density function (PDF) to describe the IDD transmissibility curve. The \(\beta\) and \(\rho_E\) priors are specified identically to the previous model. It is difficult to choose informative priors for the shape and rate parameters of the gamma PDF, so Gamma(1, 1) priors are used for both. Generally, individuals infected with EVD are infectious for 4 to 10 days, but we set the maximum length of the infectious period to 21 days to be conservative.

initsList <- list(beta = c(1, -1), rateE = 0.1, 
                  iddParams = list(shape = 0.2, rate = 0.8))

betaPrior <- function(x) {
  dnorm(x[1], mean = 0, sd = 4, log = T) + 
    dnorm(x[2], mean = 0, sd = 10, log = T)
}

rateEPrior <- function(x) {
  dgamma(x, 11, 100, log = T)
}

iddParamsPrior <- function(x) {
  dgamma(x['shape'], 1, 1, log = T) +
    dgamma(x['rate'], 1, 1, log = T)
}


priorList <- list(betaPrior = betaPrior,
                  rateEPrior = rateEPrior,
                  iddParamsPrior = iddParamsPrior)

maxInf <- 21
iddFun <- dgammaIDD

postResIDDGamma <- mcmcSEIR(dat = ebolaDat, X = X, 
                            inits = initsList, 
                            niter = niter, nburn = nburn,
                            infPeriodSpec = 'IDD',
                            priors = priorList,
                            iddFun = iddFun, maxInf = maxInf,
                            WAIC = T)

The trace plots for each parameter:

postParamsIDDGamma <- postResIDDGamma$fullPost

par(mfrow = c(2,3))
plot(postParamsIDDGamma$beta1, type = 'l')
plot(postParamsIDDGamma$beta2, type = 'l')
plot(postParamsIDDGamma$rateE, type = 'l')
plot(postParamsIDDGamma$shape, type = 'l')
plot(postParamsIDDGamma$rate, type = 'l')

Calculating the posterior distribution of \(R_0(t)\) we get the following:

postBurninIter <- seq(nburn + 1, niter, 10)

R0Post <- vapply(1:length(postBurninIter), function(i) 
  getR0(infPeriodSpec = 'IDD',
        beta = c(postParamsIDDGamma$beta1[i],
                 postParamsIDDGamma$beta2[i]),
        X = X, N = N,
        infIDDParams = list(iddFun = iddFun,
                            iddParams = list(shape = postParamsIDDGamma$shape[i],
                                             rate = postParamsIDDGamma$rate[i]),
                            maxInf = maxInf)),
  numeric(nrow(X)))


R0PostSummary <- data.frame(mean = rowMeans(R0Post),
                            lower = apply(R0Post, 1, quantile, probs = 0.025),
                            upper = apply(R0Post, 1, quantile, probs = 0.975))

plot(R0PostSummary$mean, type = 'l', lwd = 2, 
     main = 'Reproductive number over time', 
     xlab = 'Epidemic time', ylab = 'R0', ylim = c(0, 2.5))
lines(R0PostSummary$lower, lty = 2, lwd = 2)
lines(R0PostSummary$upper, lty = 2, lwd = 2)

The posterior mean indicates the reproductive number is around 1.5 before going to 0 by the end of the epidemic.

Finally, we may also be interested in the posterior distribution of the IDD transmissibility curve. We can also compute this using the posteriod samples.

iddCurvePost <- vapply(1:length(postBurninIter), function(i) 
  do.call(iddFun, args = list(x = 1:maxInf,
                              params = list(shape = postParamsIDDGamma$shape[i],
                                            rate = postParamsIDDGamma$rate[i]))),
  numeric(maxInf))

iddCurvePostSummary <- data.frame(mean = rowMeans(iddCurvePost),
                                  lower = apply(iddCurvePost, 1, quantile, probs = 0.025),
                                  upper = apply(iddCurvePost, 1, quantile, probs = 0.975))

plot(iddCurvePostSummary$mean, type = 'l', lwd = 2, 
     main = 'IDD Transmissibility Curve', 
     xlab = 'Days Infectious', ylab = 'IDD transmissibility')
lines(iddCurvePostSummary$lower, lty = 2, lwd = 2)
lines(iddCurvePostSummary$upper, lty = 2, lwd = 2)

The posterior distribution of the IDD curve indicates transmission is high on the first day of the infectious period, then rapidly declines.

IDD transmissibility using logistic decay for the IDD curve

The next model we will implement is the IDD model using the logistic decay function to describe the IDD transmissibility curve. The \(\beta\) and \(\rho_E\) priors are specified identically to the previous models. As individuals infected with EVD are generally infectious for 4 to 10 days, but we set the the mean on the midpoint of the logistic decay curve at 6. Finally, as before, we specify the maximum length of the infectious period to 21 days to be conservative.

initsList <- list(beta = c(1, -1), rateE = 0.1, 
                  iddParams = list(mid = 5, rate = 0.8))

betaPrior <- function(x) {
  dnorm(x[1], mean = 0, sd = 4, log = T) + 
    dnorm(x[2], mean = 0, sd = 10, log = T)
}

rateEPrior <- function(x) {
  dgamma(x, 11, 100, log = T)
}

iddParamsPrior <- function(x) {
  dnorm(x['mid'], 6, 1, log = T) +
    dgamma(x['rate'], 1, 1, log = T)
}


priorList <- list(betaPrior = betaPrior,
                  rateEPrior = rateEPrior,
                  iddParamsPrior = iddParamsPrior)


maxInf <- 21
iddFun <- logitIDD

postResIDDLogit <- mcmcSEIR(dat = ebolaDat, X = X, 
                            inits = initsList, 
                            niter = niter, nburn = nburn,
                            infPeriodSpec = 'IDD',
                            priors = priorList,
                            iddFun = iddFun, maxInf = maxInf,
                            WAIC = T)

Trace plots:

postParamsIDDLogit <- postResIDDLogit$fullPost

par(mfrow = c(2,3))
plot(postParamsIDDLogit$beta1, type = 'l')
plot(postParamsIDDLogit$beta2, type = 'l')
plot(postParamsIDDLogit$rateE, type = 'l')
plot(postParamsIDDLogit$mid, type = 'l')
plot(postParamsIDDLogit$rate, type = 'l')

Posterior distribution of \(R_0(t)\).

postBurninIter <- seq(nburn + 1, niter, 10)

R0Post <- vapply(1:length(postBurninIter), function(i) 
  getR0(infPeriodSpec = 'IDD',
        beta = c(postParamsIDDLogit$beta1[i],
                 postParamsIDDLogit$beta2[i]),
        X = X, N = N,
        infIDDParams = list(iddFun = iddFun,
                            iddParams = list(mid = postParamsIDDLogit$mid[i],
                                             rate = postParamsIDDLogit$rate[i]),
                            maxInf = maxInf)),
  numeric(nrow(X)))


R0PostSummary <- data.frame(mean = rowMeans(R0Post),
                            lower = apply(R0Post, 1, quantile, probs = 0.025),
                            upper = apply(R0Post, 1, quantile, probs = 0.975))

plot(R0PostSummary$mean, type = 'l', lwd = 2, 
     main = 'Reproductive number over time', 
     xlab = 'Epidemic time', ylab = 'R0', ylim = c(0, 2.5))
lines(R0PostSummary$lower, lty = 2, lwd = 2)
lines(R0PostSummary$upper, lty = 2, lwd = 2)

The posterior mean indicates the reproductive number is around 1.6 before going to 0 by the end of the epidemic.

Finally, we may also be interested in the posterior distribution of the IDD transmissibility curve. We can also compute this.

postBurninIter <- seq(nburn + 1, niter, 10)

iddCurvePost <- vapply(1:length(postBurninIter), function(i) 
  do.call(iddFun, args = list(x = 1:maxInf,
                                 params = list(mid = postParamsIDDLogit$mid[i],
                                               rate = postParamsIDDLogit$rate[i]))),
  numeric(maxInf))

iddCurvePostSummary <- data.frame(mean = rowMeans(iddCurvePost),
                                  lower = apply(iddCurvePost, 1, quantile, probs = 0.025),
                                  upper = apply(iddCurvePost, 1, quantile, probs = 0.975))

plot(iddCurvePostSummary$mean, type = 'l', lwd = 2, 
     main = 'IDD Transmissibility Curve', 
     xlab = 'Days Infectious', ylab = 'IDD transmissibility', ylim = c(0, 1))
lines(iddCurvePostSummary$lower, lty = 2, lwd = 2)
lines(iddCurvePostSummary$upper, lty = 2, lwd = 2)

The posterior distribution of the logistic decay IDD curve indicates transmission is high on the first day of the infectious period, then declines until become 0 after about 10 days.

Path-specific infectious periods according to the gamma distribution

Finally, we will fit the path-specific model of Porter and Oleson (2013), using the gamma distribution to describe the duration of time an individual spends in the infectious period. The \(\beta\) and \(\rho_E\) priors are specified identically to the previous models. As before, we set the maximum duration to 21 days. To reflect a mean infectious period of 6 days, the priors on the shape and rate parameters of the infectious distribution are specified to center on 18 and 3, respectively.

initsList <- list(beta = c(1, -1), rateE = 0.1, 
                  psParams = list(shape = 18, rate = 3))

betaPrior <- function(x) {
  dnorm(x[1], mean = 0, sd = 4, log = T) + 
    dnorm(x[2], mean = 0, sd = 10, log = T)
}

rateEPrior <- function(x) {
  dgamma(x, 11, 100, log = T)
}

psParamsPrior <- function(x) {
  dgamma(x['shape'], 180, 10, log = T) +
    dgamma(x['rate'], 30, 10, log = T)
}


priorList <- list(betaPrior = betaPrior,
                  rateEPrior = rateEPrior,
                  psParamsPrior = psParamsPrior)

dist <- 'gamma'
maxInf <- 21

postResPS <- mcmcSEIR(dat = ebolaDat, X = X, 
                      inits = initsList, 
                      niter = niter, nburn = nburn,
                      infPeriodSpec = 'PS',
                      priors = priorList,
                      dist = dist, maxInf = maxInf,
                      WAIC = T)

Trace plots:

postParamsPS <- postResPS$fullPost

par(mfrow = c(2,3))
plot(postParamsPS$beta1, type = 'l')
plot(postParamsPS$beta2, type = 'l')
plot(postParamsPS$rateE, type = 'l')
plot(postParamsPS$shape, type = 'l')
plot(postParamsPS$rate, type = 'l')

Posterior distribution of \(R_0(t)\):

postBurninIter <- seq(nburn + 1, niter, 10)

R0Post <- vapply(1:length(postBurninIter), function(i) 
  getR0(infPeriodSpec = 'PS',
        beta = c(postParamsPS$beta1[i],
                 postParamsPS$beta2[i]),
        X = X, N = N,
        infPSParams = list(dist = dist,
                           psParams = list(shape = postParamsPS$shape[i],
                                           rate = postParamsPS$rate[i]),
                           maxInf = maxInf)),
  numeric(nrow(X)))


R0PostSummary <- data.frame(mean = rowMeans(R0Post),
                            lower = apply(R0Post, 1, quantile, probs = 0.025),
                            upper = apply(R0Post, 1, quantile, probs = 0.975))

plot(R0PostSummary$mean, type = 'l', lwd = 2, 
     main = 'Reproductive number over time', 
     xlab = 'Epidemic time', ylab = 'R0', ylim = c(0, 2.5))
lines(R0PostSummary$lower, lty = 2, lwd = 2)
lines(R0PostSummary$upper, lty = 2, lwd = 2)

The path-specific model estimates the highest reproductive number, with the posterior mean just over 2 at the start of the epidemic.

Compare WAIC

To determine which model fits this data best, we compare the WAIC between the four models.

kable(data.frame(Model = c('Exponential', 'IDD - Gamma PDF', 
                           'IDD - Logistic Decay', 'Path-specific'),
                 WAIC = c(postResExp$WAIC,
                          postResIDDGamma$WAIC,
                          postResIDDLogit$WAIC,
                          postResPS$WAIC)))

Model	WAIC
Exponential	1176.2788
IDD - Gamma PDF	742.1807
IDD - Logistic Decay	778.1733
Path-specific	2183.9635

Lower values of WAIC indicate better fit. WAIC indicates that the IDD model using the gamma PDF fits the data best, followed by the IDD model using the logistic decay function. The path-specific model has the worst fit.