library(knitr)
library(BayesSEIR)
This vignette illustrates the use of the mcmcSEIR
function in the BayesSEIR package to analyze real data from the Ebola
Virus Disease (EVD) outbreak in Kikwit, Democratic Republic of the Congo
(DRC). Four models are fit to the observed data and compared using
WAIC.
Exponential infectious period
Infectious duration-dependent (IDD) transmissibility using the gamma pdf for the IDD curve
IDD transmissibility using logistic decay for the IDD curve
Path-specific infectious periods according to the gamma distribution
The DRC epidemic of 1995 took place in the Bandundu region, with the primarily impacted location being the city of Kikwit. The population size in this region was 5,363,500 people during the outbreak. 316 cases were documented between March and July 1995, with a case fatality rate of 81%. Control measures were introduced immediately after EVD was identified as the cause of the outbreak on May 9. Due to under-reporting, the data only contains records for 291 of the 316 cases identified.
The data are publicly available in the ABSEIR R package. The data provide the daily numbers of newly detected EVD cases during the outbreak.
# devtools::install_git("https://github.com/grantbrown/ABSEIR.git")
library(ABSEIR)
## Loading required package: Rcpp
## Warning: package 'Rcpp' was built under R version 4.1.3
## Loading required package: parallel
## Loading required package: compiler
<- ABSEIR::Kikwit1995
ebola head(ebola)
## Date Count
## 1 1995-03-06 1
## 2 1995-03-07 1
## 3 1995-03-08 1
## 4 1995-03-09 0
## 5 1995-03-10 0
## 6 1995-03-11 0
plot(ebola$Date, ebola$Count, type = 'l', lwd = 2,
main = 'Daily Counts of Ebola in Kikwit',
xlab = 'Date', ylab = 'Count')
abline(v = as.Date('1995-05-10'), col = 'blue', lty = 2, lwd = 2)
legend('topright', 'Date of Intervention', col = 'blue', lty = 2, lwd = 2, bty ='n')
Model fitting is done using the mcmcSEIR
function. The
first argument to this function is a list that contains the important
features of the data -
Istar
- the number of new cases over timeS0
- the number of susceptible individuals at the
beginning of the epidemicE0
- the number of exposed individuals at the beginning
of the epidemicI0
- the number of infectious individuals at the
beginning of the epidemicN
- the population sizeAs three individuals were consecutively reported on the first three days of the epidemic, the initial conditions used are \(N\) = 5,363,500, \(S_0\) = 5,363,497, \(E_0 = 3\), \(I_0 = 0\), and \(R_0 = 0\).
<- 5363500
N <- 3
E0 <- N - E0
S0 <- 0
I0 <- list(Istar = ebola$Count[-c(1:3)],
ebolaDat S0 = S0,
E0 = E0,
I0 = I0,
N = N)
The matrix X
is used to describe the intensity process.
As in Lekone and Finkenstadt (2006), we formulate X
to
describe constant epidemic intensity until the time of an intervention,
after which transmission decays exponentially. We scale the intervention
vector by 100 to improve estimation.
<- length(ebolaDat$Istar)
tau <- which(ebola$Date[-c(1:3)] == as.Date('1995-05-10'))
interventionTime
<- cbind(1, cumsum(1:tau > interventionTime) / 100) X
Finally, we must determine our MCMC specifications:
niter
- the number of iterations to run the MCMC
algorithm fornburn
- the number of burn-in iterations to be
discardedA full analysis should implement multiple chains using different initial values, but we will run only one chain for illustration.
<- 100000
niter <- 20000 nburn
The rest of the parameters are specific to the type of model being fit. For each type of model, we need to specify the initial values, prior distributions, and any model specific parameters.
The model with exponential infectious periods has four parameters: \(\beta_1\), \(\beta_2\), \(\rho_E\), and \(\rho_I\).
Normal priors were used for the \(\boldsymbol{\beta}\) parameters. To allow the potential for the intervention to have a strong effect the prior on \(\beta_2\) has a large variance. An informative prior was used for the \(\rho_E\) parameter, corresponding to a mean length of this period to be between 9 - 10 days. The prior on the \(\rho_I\) parameter represents a mean infectious period of 6 days.
Finally, we specify WAIC = TRUE
to tell the function to
compute the WAIC, so we can compare the model fits later.
<- list(beta = c(1, -1), rateE = 0.1, rateI = 0.3)
initsList
<- function(x) {
betaPrior dnorm(x[1], mean = 0, sd = 4, log = T) +
dnorm(x[2], mean = 0, sd = 10, log = T)
}
<- function(x) {
rateEPrior dgamma(x, 11, 100, log = T)
}
<- function(x) {
rateIPrior dgamma(x, 16, 100, log = T)
}
<- list(betaPrior = betaPrior,
priorList rateEPrior = rateEPrior,
rateIPrior = rateIPrior)
<- mcmcSEIR(dat = ebolaDat, X = X,
postResExp inits = initsList,
niter = niter, nburn = nburn,
infPeriodSpec = 'exp',
priors = priorList,
WAIC = T)
We can look at the trace plots to assess convergence.
<- postResExp$fullPost
postParamsExp
par(mfrow = c(2,2))
plot(postParamsExp$beta1, type = 'l')
plot(postParamsExp$beta2, type = 'l')
plot(postParamsExp$rateE, type = 'l')
plot(postParamsExp$rateI, type = 'l')
We can also calculate the posterior distribution of \(R_0(t)\), and plot the mean and credible intervals. We thin the posterior by taking every 10th iteration to speed computation.
<- seq(nburn + 1, niter, 10)
postBurninIter
<- vapply(1:length(postBurninIter), function(i)
R0Post getR0(infPeriodSpec = 'exp',
beta = c(postParamsExp$beta1[i],
$beta2[i]),
postParamsExpX = X, N = N,
infExpParams = list(rateI = postParamsExp$rateI[i])),
numeric(nrow(X)))
<- data.frame(mean = rowMeans(R0Post),
R0PostSummary lower = apply(R0Post, 1, quantile, probs = 0.025),
upper = apply(R0Post, 1, quantile, probs = 0.975))
plot(R0PostSummary$mean, type = 'l', lwd = 2,
main = 'Reproductive number over time',
xlab = 'Epidemic time', ylab = 'R0', ylim = c(0, 2.5))
lines(R0PostSummary$lower, lty = 2, lwd = 2)
lines(R0PostSummary$upper, lty = 2, lwd = 2)
The exponential model estimates a reproductive number starting around 1.8, which then declines due to the intervention, ultimately going to 0 when the epidemic ends.
The next model we will implement is the infectious duration-dependent (IDD) model of Ward et al. (2022), using the gamma probability density function (PDF) to describe the IDD transmissibility curve. The \(\beta\) and \(\rho_E\) priors are specified identically to the previous model. It is difficult to choose informative priors for the shape and rate parameters of the gamma PDF, so Gamma(1, 1) priors are used for both. Generally, individuals infected with EVD are infectious for 4 to 10 days, but we set the maximum length of the infectious period to 21 days to be conservative.
<- list(beta = c(1, -1), rateE = 0.1,
initsList iddParams = list(shape = 0.2, rate = 0.8))
<- function(x) {
betaPrior dnorm(x[1], mean = 0, sd = 4, log = T) +
dnorm(x[2], mean = 0, sd = 10, log = T)
}
<- function(x) {
rateEPrior dgamma(x, 11, 100, log = T)
}
<- function(x) {
iddParamsPrior dgamma(x['shape'], 1, 1, log = T) +
dgamma(x['rate'], 1, 1, log = T)
}
<- list(betaPrior = betaPrior,
priorList rateEPrior = rateEPrior,
iddParamsPrior = iddParamsPrior)
<- 21
maxInf <- dgammaIDD
iddFun
<- mcmcSEIR(dat = ebolaDat, X = X,
postResIDDGamma inits = initsList,
niter = niter, nburn = nburn,
infPeriodSpec = 'IDD',
priors = priorList,
iddFun = iddFun, maxInf = maxInf,
WAIC = T)
The trace plots for each parameter:
<- postResIDDGamma$fullPost
postParamsIDDGamma
par(mfrow = c(2,3))
plot(postParamsIDDGamma$beta1, type = 'l')
plot(postParamsIDDGamma$beta2, type = 'l')
plot(postParamsIDDGamma$rateE, type = 'l')
plot(postParamsIDDGamma$shape, type = 'l')
plot(postParamsIDDGamma$rate, type = 'l')
Calculating the posterior distribution of \(R_0(t)\) we get the following:
<- seq(nburn + 1, niter, 10)
postBurninIter
<- vapply(1:length(postBurninIter), function(i)
R0Post getR0(infPeriodSpec = 'IDD',
beta = c(postParamsIDDGamma$beta1[i],
$beta2[i]),
postParamsIDDGammaX = X, N = N,
infIDDParams = list(iddFun = iddFun,
iddParams = list(shape = postParamsIDDGamma$shape[i],
rate = postParamsIDDGamma$rate[i]),
maxInf = maxInf)),
numeric(nrow(X)))
<- data.frame(mean = rowMeans(R0Post),
R0PostSummary lower = apply(R0Post, 1, quantile, probs = 0.025),
upper = apply(R0Post, 1, quantile, probs = 0.975))
plot(R0PostSummary$mean, type = 'l', lwd = 2,
main = 'Reproductive number over time',
xlab = 'Epidemic time', ylab = 'R0', ylim = c(0, 2.5))
lines(R0PostSummary$lower, lty = 2, lwd = 2)
lines(R0PostSummary$upper, lty = 2, lwd = 2)
The posterior mean indicates the reproductive number is around 1.5 before going to 0 by the end of the epidemic.
Finally, we may also be interested in the posterior distribution of the IDD transmissibility curve. We can also compute this using the posteriod samples.
<- vapply(1:length(postBurninIter), function(i)
iddCurvePost do.call(iddFun, args = list(x = 1:maxInf,
params = list(shape = postParamsIDDGamma$shape[i],
rate = postParamsIDDGamma$rate[i]))),
numeric(maxInf))
<- data.frame(mean = rowMeans(iddCurvePost),
iddCurvePostSummary lower = apply(iddCurvePost, 1, quantile, probs = 0.025),
upper = apply(iddCurvePost, 1, quantile, probs = 0.975))
plot(iddCurvePostSummary$mean, type = 'l', lwd = 2,
main = 'IDD Transmissibility Curve',
xlab = 'Days Infectious', ylab = 'IDD transmissibility')
lines(iddCurvePostSummary$lower, lty = 2, lwd = 2)
lines(iddCurvePostSummary$upper, lty = 2, lwd = 2)
The posterior distribution of the IDD curve indicates transmission is high on the first day of the infectious period, then rapidly declines.
The next model we will implement is the IDD model using the logistic decay function to describe the IDD transmissibility curve. The \(\beta\) and \(\rho_E\) priors are specified identically to the previous models. As individuals infected with EVD are generally infectious for 4 to 10 days, but we set the the mean on the midpoint of the logistic decay curve at 6. Finally, as before, we specify the maximum length of the infectious period to 21 days to be conservative.
<- list(beta = c(1, -1), rateE = 0.1,
initsList iddParams = list(mid = 5, rate = 0.8))
<- function(x) {
betaPrior dnorm(x[1], mean = 0, sd = 4, log = T) +
dnorm(x[2], mean = 0, sd = 10, log = T)
}
<- function(x) {
rateEPrior dgamma(x, 11, 100, log = T)
}
<- function(x) {
iddParamsPrior dnorm(x['mid'], 6, 1, log = T) +
dgamma(x['rate'], 1, 1, log = T)
}
<- list(betaPrior = betaPrior,
priorList rateEPrior = rateEPrior,
iddParamsPrior = iddParamsPrior)
<- 21
maxInf <- logitIDD
iddFun
<- mcmcSEIR(dat = ebolaDat, X = X,
postResIDDLogit inits = initsList,
niter = niter, nburn = nburn,
infPeriodSpec = 'IDD',
priors = priorList,
iddFun = iddFun, maxInf = maxInf,
WAIC = T)
Trace plots:
<- postResIDDLogit$fullPost
postParamsIDDLogit
par(mfrow = c(2,3))
plot(postParamsIDDLogit$beta1, type = 'l')
plot(postParamsIDDLogit$beta2, type = 'l')
plot(postParamsIDDLogit$rateE, type = 'l')
plot(postParamsIDDLogit$mid, type = 'l')
plot(postParamsIDDLogit$rate, type = 'l')
Posterior distribution of \(R_0(t)\).
<- seq(nburn + 1, niter, 10)
postBurninIter
<- vapply(1:length(postBurninIter), function(i)
R0Post getR0(infPeriodSpec = 'IDD',
beta = c(postParamsIDDLogit$beta1[i],
$beta2[i]),
postParamsIDDLogitX = X, N = N,
infIDDParams = list(iddFun = iddFun,
iddParams = list(mid = postParamsIDDLogit$mid[i],
rate = postParamsIDDLogit$rate[i]),
maxInf = maxInf)),
numeric(nrow(X)))
<- data.frame(mean = rowMeans(R0Post),
R0PostSummary lower = apply(R0Post, 1, quantile, probs = 0.025),
upper = apply(R0Post, 1, quantile, probs = 0.975))
plot(R0PostSummary$mean, type = 'l', lwd = 2,
main = 'Reproductive number over time',
xlab = 'Epidemic time', ylab = 'R0', ylim = c(0, 2.5))
lines(R0PostSummary$lower, lty = 2, lwd = 2)
lines(R0PostSummary$upper, lty = 2, lwd = 2)
The posterior mean indicates the reproductive number is around 1.6 before going to 0 by the end of the epidemic.
Finally, we may also be interested in the posterior distribution of the IDD transmissibility curve. We can also compute this.
<- seq(nburn + 1, niter, 10)
postBurninIter
<- vapply(1:length(postBurninIter), function(i)
iddCurvePost do.call(iddFun, args = list(x = 1:maxInf,
params = list(mid = postParamsIDDLogit$mid[i],
rate = postParamsIDDLogit$rate[i]))),
numeric(maxInf))
<- data.frame(mean = rowMeans(iddCurvePost),
iddCurvePostSummary lower = apply(iddCurvePost, 1, quantile, probs = 0.025),
upper = apply(iddCurvePost, 1, quantile, probs = 0.975))
plot(iddCurvePostSummary$mean, type = 'l', lwd = 2,
main = 'IDD Transmissibility Curve',
xlab = 'Days Infectious', ylab = 'IDD transmissibility', ylim = c(0, 1))
lines(iddCurvePostSummary$lower, lty = 2, lwd = 2)
lines(iddCurvePostSummary$upper, lty = 2, lwd = 2)
The posterior distribution of the logistic decay IDD curve indicates transmission is high on the first day of the infectious period, then declines until become 0 after about 10 days.
Finally, we will fit the path-specific model of Porter and Oleson (2013), using the gamma distribution to describe the duration of time an individual spends in the infectious period. The \(\beta\) and \(\rho_E\) priors are specified identically to the previous models. As before, we set the maximum duration to 21 days. To reflect a mean infectious period of 6 days, the priors on the shape and rate parameters of the infectious distribution are specified to center on 18 and 3, respectively.
<- list(beta = c(1, -1), rateE = 0.1,
initsList psParams = list(shape = 18, rate = 3))
<- function(x) {
betaPrior dnorm(x[1], mean = 0, sd = 4, log = T) +
dnorm(x[2], mean = 0, sd = 10, log = T)
}
<- function(x) {
rateEPrior dgamma(x, 11, 100, log = T)
}
<- function(x) {
psParamsPrior dgamma(x['shape'], 180, 10, log = T) +
dgamma(x['rate'], 30, 10, log = T)
}
<- list(betaPrior = betaPrior,
priorList rateEPrior = rateEPrior,
psParamsPrior = psParamsPrior)
<- 'gamma'
dist <- 21
maxInf
<- mcmcSEIR(dat = ebolaDat, X = X,
postResPS inits = initsList,
niter = niter, nburn = nburn,
infPeriodSpec = 'PS',
priors = priorList,
dist = dist, maxInf = maxInf,
WAIC = T)
Trace plots:
<- postResPS$fullPost
postParamsPS
par(mfrow = c(2,3))
plot(postParamsPS$beta1, type = 'l')
plot(postParamsPS$beta2, type = 'l')
plot(postParamsPS$rateE, type = 'l')
plot(postParamsPS$shape, type = 'l')
plot(postParamsPS$rate, type = 'l')
Posterior distribution of \(R_0(t)\):
<- seq(nburn + 1, niter, 10)
postBurninIter
<- vapply(1:length(postBurninIter), function(i)
R0Post getR0(infPeriodSpec = 'PS',
beta = c(postParamsPS$beta1[i],
$beta2[i]),
postParamsPSX = X, N = N,
infPSParams = list(dist = dist,
psParams = list(shape = postParamsPS$shape[i],
rate = postParamsPS$rate[i]),
maxInf = maxInf)),
numeric(nrow(X)))
<- data.frame(mean = rowMeans(R0Post),
R0PostSummary lower = apply(R0Post, 1, quantile, probs = 0.025),
upper = apply(R0Post, 1, quantile, probs = 0.975))
plot(R0PostSummary$mean, type = 'l', lwd = 2,
main = 'Reproductive number over time',
xlab = 'Epidemic time', ylab = 'R0', ylim = c(0, 2.5))
lines(R0PostSummary$lower, lty = 2, lwd = 2)
lines(R0PostSummary$upper, lty = 2, lwd = 2)
The path-specific model estimates the highest reproductive number, with the posterior mean just over 2 at the start of the epidemic.
To determine which model fits this data best, we compare the WAIC between the four models.
kable(data.frame(Model = c('Exponential', 'IDD - Gamma PDF',
'IDD - Logistic Decay', 'Path-specific'),
WAIC = c(postResExp$WAIC,
$WAIC,
postResIDDGamma$WAIC,
postResIDDLogit$WAIC))) postResPS
Model | WAIC |
---|---|
Exponential | 1176.2788 |
IDD - Gamma PDF | 742.1807 |
IDD - Logistic Decay | 778.1733 |
Path-specific | 2183.9635 |
Lower values of WAIC indicate better fit. WAIC indicates that the IDD model using the gamma PDF fits the data best, followed by the IDD model using the logistic decay function. The path-specific model has the worst fit.
Lekone, P. E., Finkenstadt, B. F., 2006. Statistical inference in a stochastic epidemic SEIR model with control intervention: Ebola as a case study. Biometrics. 62, 1170–1177. DOI: 10.1111/j.1541-0420.2006.00609.x
Porter A. T., Oleson J. J. 2013. A path‐specific SEIR model for use with general latent and infectious time distributions. Biometrics. 69(1), 101-108. DOI: 10.1111/j.1541-0420.2012.01809.x
Ward, C., Brown, G. D., & Oleson, J. J. (2022) Incorporating infectious duration-dependent transmission into Bayesian epidemic models. Biometrical Journal, 1-19. DOI: 10.1002/bimj.202100401