# install.packages("pak")
# Install our packages from GitHub:
pak::pkg_install("cmu-delphi/epidatr")
pak::pkg_install("cmu-delphi/epiprocess")
pak::pkg_install("cmu-delphi/epipredict")
pak::pkg_install("cmu-delphi/epidatasets")
# Other model-fitting packages we use in this book (via epipredict):
pak::pkg_install("poissonreg")
pak::pkg_install("ranger")
pak::pkg_install("xgboost")
# Other data processing, model evaluation, example data, and other packages we
# use in this book:
pak::pkg_install("RcppRoll")
pak::pkg_install("tidyverse")
pak::pkg_install("tidymodels")
pak::pkg_install("broom")
pak::pkg_install("performance")
pak::pkg_install("modeldata")
pak::pkg_install("see")
pak::pkg_install("sessioninfo")
Introduction to Epidemiological Forecasting
Delphi Tools, Data, and Lessons
Preface
This book is still under construction and may not yet be fully self-contained or reproducible. But it hopefully will be!
This book describes some of the functionality of the {epiprocess} and {epipredict} R packages, with an eye toward signal processing and forecasting for epidemiological data. The goal is to be able to load, inspect, process, and forecast, using anything from simple baselines to more elaborate customizations.
Installation
The following commands install the latest versions of the packages we use in this book:
Much of the data used for illustration can be loaded directly from Delphi’s Epidata API, which is built and maintained by the Carnegie Mellon University Delphi research group. We have tried to provide most of the data used in these examples in a separate package, {epidatasets}, but it can also be accessed using {epidatr}, an R interface to the API and the successor to {covidcast}. These are also available from GitHub:
pak::pkg_install("cmu-delphi/epidatasets")
pak::pkg_install("cmu-delphi/epidatr")
Encountering installation issues? Here are some potential solutions.
Linux installation issues: compilation errors or slowness
If you are using Linux and encounter any compilation errors above, or if compilation is taking very long, you might try using the RStudio (now called Posit) Package Manager to install binaries. You can try running this command before installing:
options(
  repos = c(
    # contains binaries for Linux:
    RSPM = "https://packagemanager.rstudio.com/all/latest",
    # backup CRAN mirror of your choice:
    CRAN = "https://cran.rstudio.com/"
  )
)
Reproducibility
The above commands will give you the current versions of the packages used in this book. If you’re having trouble reproducing some of the results, it may be due to package updates that took place after the book was last updated. To match the versions we used to generate this book, you can use the steps below.
First: set up and store a GitHub PAT
If you don’t already have a GitHub PAT, you can use the following helper functions to create one:
# Run this once:
install.packages("usethis")
#> The following package(s) will be installed:
#> - usethis [2.2.3]
#> These packages will be installed into "~/.cache/R/renv/library/delphi-tooling-book-a509243d/R-4.3/x86_64-pc-linux-gnu".
#>
#> # Installing packages -------------------------------------------------------
#> - Installing usethis ... OK [linked from cache]
#> Successfully installed 1 package in 8.9 milliseconds.
usethis::create_github_token(
  scopes = "public_repo",
  description = "For public repo access"
)
This will open a web browser window allowing you to describe and customize settings of the PAT. Scroll to the bottom and click “Generate token”. You’ll see a screen that has ghp_<lots of letters and numbers> with a green background; you can click the two-squares (“copy”) icon to copy this ghp_...... string to the clipboard.
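Before moving on, it can be worth checking that the token has actually reached your R session. The sketch below uses only base R and assumes the token is stored in the GITHUB_PAT environment variable, as in the steps that follow. The helper name looks_like_github_pat is hypothetical (it is not part of {usethis} or {gitcreds}), and it only checks for the ghp_ prefix described above; it does not verify that GitHub accepts the token, and other token types would be flagged as unexpected.

```r
# Hypothetical helper (not from usethis or gitcreds): does a string look like
# a token of the form ghp_<letters and numbers>? Only the shape is checked.
looks_like_github_pat <- function(token) {
  is.character(token) && length(token) == 1L && !is.na(token) &&
    grepl("^ghp_[A-Za-z0-9]+$", token)
}

# Inspect the environment variable without printing the secret itself.
pat <- Sys.getenv("GITHUB_PAT")
if (looks_like_github_pat(pat)) {
  message("GITHUB_PAT is set and has the expected ghp_ prefix.")
} else {
  message("GITHUB_PAT is unset or does not start with ghp_.")
}
```

If the message reports an unexpected format, `gitcreds::gitcreds_get()` may help show which credential R is actually picking up.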
Either A: Download and use the renv.lock
# Run this once:
install.packages(c("renv", "gitcreds"))
download.file("https://raw.githubusercontent.com/cmu-delphi/delphi-tooling-book/main/renv.lock", "delphi-tooling-book.renv.lock")
# Run this in a fresh session each time you'd like to use this set of versions.
# Warning: don't save your GitHub PAT in a file you might share with others;
# look into `gitcreds::gitcreds_set()` or `usethis::edit_r_environ()` instead.
Sys.setenv("GITHUB_PAT" = "ghp_............")
renv::use(lockfile = "delphi-tooling-book.renv.lock")
# If you get 401 errors, you may need to regenerate your GitHub PAT or check if
# `gitcreds::gitcreds_get()` is detecting an old PAT you have saved somewhere.
Or B: Download the book and use its .Rprofile
- Download the book here and unzip it.
- One-time setup: launch R inside the delphi-tooling-book directory (to use its .Rprofile file) and run
# Warning: don't save your GitHub PAT in a file you might share with others;
# look into `gitcreds::gitcreds_set()` or `usethis::edit_r_environ()` instead.
Sys.setenv("GITHUB_PAT" = "ghp_............")
renv::restore() # downloads the appropriate package versions
- To use this set of versions: launch R inside the delphi-tooling-book directory.
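Whichever route you take, one way to confirm that the session is using the intended versions is to compare what is installed against the versions listed under Session Information. The following is a minimal sketch in base R; compare_versions is a hypothetical helper written for this illustration, and the expected versions are copied from the session listing in this book.

```r
# Expected versions, copied from the Session Information listing in this book.
expected <- c(
  epidatasets = "0.0.1",
  epidatr     = "1.1.5",
  epipredict  = "0.0.14",
  epiprocess  = "0.7.7"
)

# Hypothetical helper: look up each installed version and compare it with the
# expected one. Packages that are not installed get NA and matches = FALSE.
compare_versions <- function(expected) {
  installed <- lapply(names(expected), function(pkg) {
    tryCatch(utils::packageVersion(pkg), error = function(e) NULL)
  })
  data.frame(
    package = names(expected),
    expected = unname(expected),
    installed = vapply(
      installed,
      function(v) if (is.null(v)) NA_character_ else as.character(v),
      character(1)
    ),
    matches = mapply(
      function(v, e) !is.null(v) && v == package_version(e),
      installed, expected,
      USE.NAMES = FALSE
    ),
    row.names = NULL
  )
}

compare_versions(expected)
```

Any row with matches equal to FALSE points at a package whose version differs from the one used to build the book, or that is not installed at all.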
Other issues
Please let us know! You can file an issue with the book here, or with one of the individual packages at their own issue pages: epidatr, epiprocess, epipredict.
Documentation
You can view the complete documentation for these packages at
Attribution
This document contains a number of datasets that are a modified part of the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University as republished in the COVIDcast Epidata API. These data are licensed under the terms of the Creative Commons Attribution 4.0 International license by the Johns Hopkins University on behalf of its Center for Systems Science in Engineering. Copyright Johns Hopkins University 2020.
From the COVIDcast Epidata API: These signals are taken directly from the JHU CSSE COVID-19 GitHub repository without changes.
Quick-start example
These packages come with some built-in historical data for illustration, but up-to-date versions could be downloaded with the {epidatr} or {covidcast} packages and processed using {epiprocess}.¹
library(epipredict)
jhu <- case_death_rate_subset
jhu
#> An `epi_df` object, 20,496 x 4 with metadata:
#> * geo_type = state
#> * time_type = day
#> * as_of = 2022-05-31 12:08:25.791826
#>
#> # A tibble: 20,496 × 4
#> geo_value time_value case_rate death_rate
#> * <chr> <date> <dbl> <dbl>
#> 1 ak 2020-12-31 35.9 0.158
#> 2 al 2020-12-31 65.1 0.438
#> 3 ar 2020-12-31 66.0 1.27
#> 4 as 2020-12-31 0 0
#> 5 az 2020-12-31 76.8 1.10
#> 6 ca 2020-12-31 96.0 0.751
#> # ℹ 20,490 more rows
To create and train a simple auto-regressive forecaster to predict the death rate two weeks into the future using past (lagged) deaths and cases, we could use the following function.
two_week_ahead <- arx_forecaster(
  jhu,
  outcome = "death_rate",
  predictors = c("case_rate", "death_rate"),
  args_list = arx_args_list(
    lags = list(case_rate = c(0, 1, 2, 3, 7, 14), death_rate = c(0, 7, 14)),
    ahead = 14
  )
)
In this case, we have used a number of different lags for the case rate, while only using 3 weekly lags for the death rate (as predictors). The result is both a fitted model object, which could be used any time in the future to create different forecasts, and a set of predicted values (and prediction intervals) for each location 14 days after the last available time value in the data.
two_week_ahead$epi_workflow
#>
#> Call:
#> stats::lm(formula = ..y ~ ., data = data)
#>
#> Coefficients:
#> (Intercept) lag_0_case_rate lag_1_case_rate lag_2_case_rate
#> -0.0073358 0.0030365 0.0012467 0.0009536
#> lag_3_case_rate lag_7_case_rate lag_14_case_rate lag_0_death_rate
#> 0.0011425 0.0012481 0.0003041 0.1351769
#> lag_7_death_rate lag_14_death_rate
#> 0.1471127 0.1062473
The fitted model here involved preprocessing the data to appropriately generate lagged predictors, estimating a linear model with stats::lm(), and then postprocessing the results to be meaningful for epidemiological tasks. We can also examine the predictions.
two_week_ahead$predictions
#> # A tibble: 56 × 5
#> geo_value .pred .pred_distn forecast_date target_date
#> <chr> <dbl> <dist> <date> <date>
#> 1 ak 0.449 quantiles(0.45)[2] 2021-12-31 2022-01-14
#> 2 al 0.574 quantiles(0.57)[2] 2021-12-31 2022-01-14
#> 3 ar 0.673 quantiles(0.67)[2] 2021-12-31 2022-01-14
#> 4 as 0 quantiles(0.12)[2] 2021-12-31 2022-01-14
#> 5 az 0.679 quantiles(0.68)[2] 2021-12-31 2022-01-14
#> 6 ca 0.575 quantiles(0.57)[2] 2021-12-31 2022-01-14
#> # ℹ 50 more rows
The results above show a distributional forecast produced using data through the end of 2021 for the 14th of January 2022. A prediction for the death rate per 100K inhabitants is available for every state (geo_value) along with a 90% predictive interval. The figure below displays the forecast for a small handful of states. The vertical black line is the forecast date. The forecast doesn’t appear to be particularly good, but our choices above were intended to be illustrative of the functionality rather than optimized for accuracy.
library(tidyverse) # provides filter(), %>%, and ggplot()

samp_geos <- c("ca", "co", "ny", "pa")

# Last 90 days of observed data for the sampled states
hist <- jhu %>%
  filter(
    geo_value %in% samp_geos,
    time_value >= max(time_value) - 90L
  )
# Forecasts for the same states, with one column per quantile level
preds <- two_week_ahead$predictions %>%
  filter(geo_value %in% samp_geos) %>%
  pivot_quantiles_wider(.pred_distn)

ggplot(hist, aes(color = geo_value)) +
  geom_line(aes(time_value, death_rate)) +
  theme_bw() +
  geom_errorbar(data = preds, aes(x = target_date, ymin = `0.05`, ymax = `0.95`)) +
  geom_point(data = preds, aes(target_date, .pred)) +
  geom_vline(data = preds, aes(xintercept = forecast_date)) +
  scale_colour_viridis_d(name = "") +
  scale_x_date(date_labels = "%b %Y") +
  theme(legend.position = "bottom") +
  labs(x = "", y = "Incident deaths per 100K\n inhabitants")
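To make the idea of lagged predictors more concrete, here is a rough base R sketch of the kind of design matrix an autoregressive forecaster with these lags and this horizon works from. It is not how {epipredict} is implemented: it uses a single made-up series rather than the JHU data, it does not pool across locations, it produces no prediction intervals, and lag_by is a hypothetical helper defined only for this illustration.

```r
# Toy daily series for one location (simulated numbers, not JHU data).
set.seed(1)
n <- 120
case_rate <- 50 + cumsum(rnorm(n))
death_rate <- 0.5 + 0.01 * case_rate + rnorm(n, sd = 0.05)

# Hypothetical helper: the value of x from k steps earlier, NA-padded at the start.
lag_by <- function(x, k) {
  if (k == 0) return(x)
  c(rep(NA, k), head(x, -k))
}

ahead <- 14
case_lags <- c(0, 1, 2, 3, 7, 14)
death_lags <- c(0, 7, 14)

# One column per lagged predictor, named like the coefficients printed earlier.
design <- data.frame(
  setNames(
    lapply(case_lags, function(k) lag_by(case_rate, k)),
    paste0("lag_", case_lags, "_case_rate")
  ),
  setNames(
    lapply(death_lags, function(k) lag_by(death_rate, k)),
    paste0("lag_", death_lags, "_death_rate")
  )
)

# Outcome: the death rate `ahead` days after each row's date.
design$y <- c(tail(death_rate, -ahead), rep(NA, ahead))

# lm() drops rows containing NA, i.e. the first 14 and the last 14 days.
fit <- lm(y ~ ., data = design)

# Point forecast made from the most recent day, targeting 14 days later.
newest <- design[n, setdiff(names(design), "y")]
predict(fit, newdata = newest)
```

The nine lagged columns plus the intercept correspond to the ten coefficients in the fitted model shown earlier; the chapters that follow describe how the packages handle these steps for real data.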
Contents
The remainder of this book examines this software in more detail, illustrating some of the flexibility that is available.
Session Information.
See also Installation.
sessioninfo::session_info()
#> ─ Session info ────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.3.3 (2024-02-29)
#> os Ubuntu 20.04.6 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz America/Los_Angeles
#> date 2024-05-01
#> pandoc 2.5 @ /usr/bin/ (via rmarkdown)
#>
#> ─ Packages ────────────────────────────────────────────────────────────────
#> ! package * version date (UTC) lib source
#> P anytime 0.3.9 2020-08-27 [?] RSPM (R 4.3.0)
#> askpass 1.2.0 2023-09-03 [1] RSPM
#> backports 1.4.1 2021-12-13 [1] RSPM
#> cachem 1.0.8 2023-05-01 [1] RSPM
#> P checkmate 2.3.1 2023-12-04 [?] RSPM (R 4.3.0)
#> P class 7.3-22 2023-05-03 [?] CRAN (R 4.3.1)
#> cli 3.6.2 2023-12-11 [1] RSPM
#> P codetools 0.2-20 2024-03-31 [?] RSPM (R 4.3.0)
#> colorspace 2.1-0 2023-01-23 [1] RSPM
#> crayon 1.5.2 2022-09-29 [1] RSPM
#> data.table 1.15.4 2024-03-30 [1] RSPM
#> digest 0.6.35 2024-03-11 [1] RSPM
#> P distributional 0.4.0 2024-02-07 [?] RSPM
#> dplyr * 1.1.4 2023-11-17 [1] RSPM
#> ellipsis 0.3.2 2021-04-29 [1] RSPM
#> P epidatasets * 0.0.1 2024-05-01 [?] Github (cmu-delphi/epidatasets@ca86f03)
#> P epidatr * 1.1.5 2024-04-03 [?] Github (cmu-delphi/epidatr@626c30b)
#> P epipredict * 0.0.14 2024-05-01 [?] Github (cmu-delphi/epipredict@5e50a5a)
#> P epiprocess * 0.7.7 2024-05-01 [?] Github (cmu-delphi/epiprocess@e61e11a)
#> evaluate 0.23 2023-11-01 [1] RSPM
#> fansi 1.0.6 2023-12-08 [1] RSPM
#> farver 2.1.1 2022-07-06 [1] RSPM
#> fastmap 1.1.1 2023-02-24 [1] RSPM
#> forcats * 1.0.0 2023-01-29 [1] RSPM
#> fs 1.6.4 2024-04-25 [1] RSPM
#> P future 1.33.2 2024-03-26 [?] RSPM (R 4.3.0)
#> P future.apply 1.11.2 2024-03-28 [?] RSPM (R 4.3.0)
#> generics 0.1.3 2022-07-05 [1] RSPM
#> ggplot2 * 3.5.1 2024-04-23 [1] RSPM
#> P globals 0.16.3 2024-03-08 [?] RSPM
#> glue 1.7.0 2024-01-09 [1] RSPM
#> P gower 1.0.1 2022-12-22 [?] RSPM (R 4.3.0)
#> gtable 0.3.5 2024-04-22 [1] RSPM
#> P hardhat 1.3.1 2024-02-02 [?] RSPM
#> hms 1.1.3 2023-03-21 [1] RSPM
#> htmltools 0.5.8.1 2024-04-04 [1] RSPM
#> htmlwidgets 1.6.4 2023-12-06 [1] RSPM
#> httr 1.4.7 2023-08-15 [1] RSPM
#> P ipred 0.9-14 2023-03-09 [?] RSPM (R 4.3.0)
#> jsonlite 1.8.8 2023-12-04 [1] RSPM
#> knitr 1.46 2024-04-06 [1] RSPM
#> labeling 0.4.3 2023-08-29 [1] RSPM
#> P lattice 0.22-6 2024-03-20 [?] RSPM
#> P lava 1.8.0 2024-03-05 [?] RSPM
#> lifecycle 1.0.4 2023-11-07 [1] RSPM
#> P listenv 0.9.1 2024-01-29 [?] RSPM
#> lubridate * 1.9.3 2023-09-27 [1] RSPM
#> magrittr 2.0.3 2022-03-30 [1] RSPM
#> P MASS 7.3-60 2023-05-04 [?] CRAN (R 4.3.1)
#> P Matrix 1.6-5 2024-01-11 [?] CRAN (R 4.3.3)
#> P MatrixModels 0.5-3 2023-11-06 [?] RSPM
#> P MMWRweek 0.1.3 2020-04-22 [?] RSPM (R 4.3.0)
#> munsell 0.5.1 2024-04-01 [1] RSPM
#> P nnet 7.3-19 2023-05-03 [?] CRAN (R 4.3.1)
#> openssl 2.1.2 2024-04-21 [1] RSPM
#> P parallelly 1.37.1 2024-02-29 [?] RSPM
#> P parsnip * 1.2.1 2024-03-22 [?] RSPM (R 4.3.0)
#> pillar 1.9.0 2023-03-22 [1] RSPM
#> pkgconfig 2.0.3 2019-09-22 [1] RSPM
#> P prodlim 2023.08.28 2023-08-28 [?] RSPM (R 4.3.0)
#> purrr * 1.0.2 2023-08-10 [1] RSPM
#> P quantreg 5.97 2023-08-19 [?] RSPM (R 4.3.0)
#> R.cache 0.16.0 2022-07-21 [1] RSPM
#> R.methodsS3 1.8.2 2022-06-13 [1] RSPM
#> R.oo 1.26.0 2024-01-24 [1] RSPM
#> R.utils 2.12.3 2023-11-18 [1] RSPM
#> R6 2.5.1 2021-08-19 [1] RSPM
#> Rcpp 1.0.12 2024-01-09 [1] RSPM
#> readr * 2.1.5 2024-01-10 [1] RSPM
#> P recipes 1.0.10 2024-02-18 [?] RSPM
#> P renv 1.0.7 2024-04-11 [?] RSPM
#> rlang 1.1.3 2024-01-10 [1] RSPM
#> rmarkdown 2.26 2024-03-05 [1] RSPM
#> P rpart 4.1.23 2023-12-05 [?] CRAN (R 4.3.2)
#> scales 1.3.0 2023-11-28 [1] RSPM
#> sessioninfo 1.2.2 2021-12-06 [1] RSPM
#> P slider 0.3.1 2023-10-12 [?] RSPM (R 4.3.0)
#> P smoothqr 0.1.1 2023-09-08 [?] Github (dajmcdon/smoothqr@3def5f0)
#> P SparseM 1.81 2021-02-18 [?] RSPM (R 4.3.0)
#> stringi 1.8.3 2023-12-11 [1] RSPM
#> stringr * 1.5.1 2023-11-14 [1] RSPM
#> styler 1.10.3 2024-04-07 [1] RSPM
#> P survival 3.6-4 2024-04-24 [?] RSPM
#> tibble * 3.2.1 2023-03-20 [1] RSPM
#> tidyr * 1.3.1 2024-01-24 [1] RSPM
#> tidyselect 1.2.1 2024-03-11 [1] RSPM
#> tidyverse * 2.0.0 2023-02-22 [1] RSPM
#> timechange 0.3.0 2024-01-18 [1] RSPM
#> P timeDate 4032.109 2023-12-14 [?] RSPM
#> P tsibble 1.1.4 2024-01-29 [?] RSPM
#> tzdb 0.4.0 2023-05-12 [1] RSPM
#> P usethis 2.2.3 2024-02-19 [?] RSPM (R 4.3.0)
#> utf8 1.2.4 2023-10-22 [1] RSPM
#> vctrs 0.6.5 2023-12-01 [1] RSPM
#> viridisLite 0.4.2 2023-05-02 [1] RSPM
#> P warp 0.2.1 2023-11-02 [?] RSPM
#> withr 3.0.0 2024-01-16 [1] RSPM
#> P workflows 1.1.4 2024-02-19 [?] RSPM
#> xfun 0.43 2024-03-25 [1] RSPM
#> xml2 1.3.6 2023-12-04 [1] RSPM
#> yaml 2.3.8 2023-12-11 [1] RSPM
#>
#> [1] /home/dshemeto/.cache/R/renv/library/delphi-tooling-book-a509243d/R-4.3/x86_64-pc-linux-gnu
#> [2] /home/dshemeto/.cache/R/renv/sandbox/R-4.3/x86_64-pc-linux-gnu/9a444a72
#>
#> P ── Loaded and on-disk path mismatch.
#>
#> ───────────────────────────────────────────────────────────────────────────
¹ COVIDcast data and other epidemiological signals for non-Covid related illnesses are available with {epidatr}, which interfaces directly to Delphi’s Epidata API.