{epidatr} and {epiprocess}
CDC and MIDAS Forecasting Meeting, 21 November 2023
Slides are online at https://cmu-delphi.github.io/midas-cdc-2023-demo
The {epidatr} package is a new R front-end for the Delphi Epidata API
{epidatr} is a complete rewrite of the {covidcast} package and the delphi_epidata.R script, with a focus on speed, reliability, and ease of use
The {covidcast} package and delphi_epidata.R script are deprecated and will no longer be updated
{epidatr} requires a (free) API key for full functionality
Run save_api_key() for help storing the key.
Private endpoints (prefixed with pvt_) require a separate key to be passed as an argument. These endpoints require data use agreements to access.
# A tibble: 152 × 15
signal source geo_type time_type geo_value time_value issue lag value
<chr> <chr> <fct> <fct> <chr> <date> <date> <int> <dbl>
1 confirm… hhs nation day us 2023-01-01 2023-10-03 275 6078
2 confirm… hhs nation day us 2023-01-02 2023-11-17 319 6727
3 confirm… hhs nation day us 2023-01-03 2023-11-17 318 6932
4 confirm… hhs nation day us 2023-01-04 2023-11-11 311 6693
5 confirm… hhs nation day us 2023-01-05 2023-11-17 316 6609
# ℹ 147 more rows
# ℹ 6 more variables: direction <dbl>, missing_value <int>,
# missing_stderr <int>, missing_sample_size <int>, stderr <dbl>,
# sample_size <dbl>
# A tibble: 150 × 15
signal source geo_type time_type geo_value time_value issue lag value
<chr> <chr> <fct> <fct> <chr> <date> <date> <int> <dbl>
1 confirm… hhs nation day us 2023-01-01 2023-05-19 138 6058
2 confirm… hhs nation day us 2023-01-02 2023-06-01 150 6713
3 confirm… hhs nation day us 2023-01-03 2023-06-01 149 6893
4 confirm… hhs nation day us 2023-01-04 2023-06-01 148 6657
5 confirm… hhs nation day us 2023-01-05 2023-06-01 147 6587
# ℹ 145 more rows
# ℹ 6 more variables: direction <dbl>, missing_value <int>,
# missing_stderr <int>, missing_sample_size <int>, stderr <dbl>,
# sample_size <dbl>
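The two tables above report the same national signal (confirmed_admissions_covid_1d from hhs) for the same dates, but they draw on different issues. The first table's rows carry issues as late as 2023-11-17. The second table's rows only use issues up to 2023-06-01, so its 2023-01-01 value is 6058 rather than 6078.

The base-R sketch below illustrates the underlying "as of" idea. For each geo_value and time_value, it picks the latest issue on or before a cutoff date. It is an illustration only, not how the Epidata API or {epidatr} implement versioned queries. The helper as_of_snapshot is hypothetical, and the two input rows are copied from the tables above.

```r
# Two issues of the same national value for 2023-01-01, copied from the
# tables above (hhs, confirmed_admissions_covid_1d).
versions <- data.frame(
  geo_value  = c("us", "us"),
  time_value = as.Date(c("2023-01-01", "2023-01-01")),
  issue      = as.Date(c("2023-05-19", "2023-10-03")),
  value      = c(6058, 6078),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each (geo_value, time_value), keep the row with the
# latest issue on or before `as_of`. Base-R sketch, not the epidatr/API code.
as_of_snapshot <- function(versions, as_of) {
  known <- versions[versions$issue <= as.Date(as_of), , drop = FALSE]
  known <- known[order(known$geo_value, known$time_value, known$issue), , drop = FALSE]
  # After sorting, the last row of each key group has the latest issue.
  key <- paste(known$geo_value, known$time_value)
  known[!duplicated(key, fromLast = TRUE), , drop = FALSE]
}

as_of_snapshot(versions, "2023-06-01")$value  # 6058
as_of_snapshot(versions, "2023-11-19")$value  # 6078
```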
chng from Change Healthcare
doctor-visits from health system partners
fb-survey from (parts of) the COVID-19 Trends and Impact Survey in collaboration with Facebook
See also covidcast_epidata() or the COVIDcast web site for a listing of the other available COVIDcast data.
Using avail_endpoints(), you can find a listing of our other endpoints, which serve a wide variety of public health data. Here we’ve filtered to non-COVID-specific data.
Endpoint Description
1 pub_delphi() Delphi's ILINet forecasts
2 pub_dengue_nowcast() Delphi's PAHO Dengue nowcast
3 pub_ecdc_ili() ECDC ILI data
4 pub_flusurv() FluSurv hospitalization data
5 pub_fluview() FluView ILINet data
6 pub_fluview_clinical() FluView virological data from clinical labs
7 pub_fluview_meta() FluView metadata
8 pub_gft() Google Flu Trends data
9 pub_kcdc_ili() KCDC ILI data
10 pub_meta() API metadata
11 pub_nidss_dengue() NIDSS dengue data
12 pub_nidss_flu() NIDSS flu data
13 pub_nowcast() Delphi's ILI nowcast
14 pub_paho_dengue() PAHO Dengue data
15 pub_wiki() Wikipedia access data
The {epiprocess} package helps work with epidemic datasets
epi_df: a snapshot of epidata in time
Key columns: geo_value and time_value
Metadata: geo_type, time_type, other_keys, as_of
Produce an epi_df from epidatr output like so:
library(epidatr)
library(epiprocess)
library(dplyr) # also provides the %>% pipe used below

tbl <- pub_covidcast(
  source = "hhs",
  signals = "confirmed_admissions_covid_1d",
  geo_type = "state",
  time_type = "day",
  geo_values = "ca,fl,ny,tx",
  time_values = "*"
)
epi_df <- tbl %>%
  dplyr::select(geo_value, time_value, admissions = value) %>%
  # Add NAs to fill gaps, cover same time range for each geo:
  tidyr::complete(geo_value, time_value = tidyr::full_seq(time_value, period = 1L)) %>%
  as_epi_df(
    geo_type = "state",
    time_type = "day",
    as_of = max(tbl$issue)
  )
An `epi_df` object, 5,644 x 3 with metadata:
* geo_type = state
* time_type = day
* as_of = 2023-11-19
# A tibble: 5,644 × 3
geo_value time_value admissions
* <chr> <date> <dbl>
1 ca 2019-12-31 NA
2 ca 2020-01-01 NA
3 ca 2020-01-02 NA
4 ca 2020-01-03 NA
5 ca 2020-01-04 NA
6 ca 2020-01-05 NA
7 ca 2020-01-06 NA
8 ca 2020-01-07 NA
9 ca 2020-01-08 NA
10 ca 2020-01-09 NA
# ℹ 5,634 more rows
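The tidyr::complete() step pads every geo_value to the same daily range. This is why the snapshot starts with NA admissions for ca from 2019-12-31 onward.

As a rough illustration of that padding, the base-R sketch below works on hypothetical toy data. It builds the full geo-by-day grid, then left-joins the observations onto it. It only mirrors the effect of complete() plus full_seq() in this simple case; it is not how {tidyr} implements them.

```r
# Hypothetical sparse input: "ca" is missing 2020-01-02, "tx" only has 2020-01-03.
obs <- data.frame(
  geo_value  = c("ca", "ca", "tx"),
  time_value = as.Date(c("2020-01-01", "2020-01-03", "2020-01-03")),
  admissions = c(5, 7, 2),
  stringsAsFactors = FALSE
)

# Full daily grid: every geo crossed with every day in the overall date range.
geos <- unique(obs$geo_value)
all_days <- seq(min(obs$time_value), max(obs$time_value), by = "day")
grid <- data.frame(
  geo_value  = rep(geos, each = length(all_days)),
  time_value = rep(all_days, times = length(geos)),
  stringsAsFactors = FALSE
)

# Left join: (geo, day) pairs with no observation get NA admissions.
filled <- merge(grid, obs, by = c("geo_value", "time_value"), all.x = TRUE)
filled <- filled[order(filled$geo_value, filled$time_value), ]
filled
```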
epi_archive: a collection of historical epidata
Key columns: geo_value, time_value, version
tbl <- pub_covidcast(
  source = "hhs",
  signals = "confirmed_admissions_covid_1d",
  geo_type = "state",
  time_type = "day",
  geo_values = "ca,fl,ny,tx",
  time_values = "*", # "*" = all time values
  issues = epirange("1234-01-01", "2023-06-01") # start of range must be before data set start
)
epi_archive <- tbl %>%
  dplyr::select(
    geo_value, time_value,
    version = issue, admissions = value
  ) %>%
  # don't try to `complete` here; `complete` after `epix_as_of` or inside `epix_slide` computations
  as_epi_archive(compactify = TRUE)
An `epi_archive` object, with metadata:
* geo_type = state
* time_type = day
----------
* min time value = 2019-12-31
* max time value = 2023-05-30
* first version with update = 2020-11-16
* last version with update = 2023-06-01
* No clobberable versions
* versions end = 2023-06-01
----------
Data archive (stored in DT field): 24625 x 4
Columns in DT: geo_value, time_value, version, admissions
----------
Public R6 methods: initialize, print, as_of, fill_through_version,
truncate_versions_after, merge, group_by, slide, clone
geo_value time_value version admissions
1: ca 2020-02-03 2020-11-16 NA
2: ca 2020-02-04 2020-11-16 NA
3: ca 2020-02-05 2020-11-16 NA
4: ca 2020-02-06 2020-11-16 NA
5: ca 2020-02-07 2020-11-16 NA
---
24621: tx 2023-05-28 2023-05-31 98
24622: tx 2023-05-28 2023-06-01 85
24623: tx 2023-05-29 2023-05-31 19
24624: tx 2023-05-29 2023-06-01 96
24625: tx 2023-05-30 2023-06-01 96
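As we understand it, with compactify = TRUE, as_epi_archive() drops version rows that add no information. These are rows whose value is unchanged from the previous version of the same geo_value and time_value, since carrying the earlier row forward reproduces them.

The base-R sketch below illustrates that rule on a hypothetical version history. compactify_sketch is a made-up helper. The real {epiprocess} implementation works on its data.table and may differ in details, such as which columns it compares.

```r
# Hypothetical version history for a single (geo_value, time_value) key.
versions_tx <- data.frame(
  geo_value  = "tx",
  time_value = as.Date("2023-05-28"),
  version    = as.Date(c("2023-05-29", "2023-05-30", "2023-05-31", "2023-06-01")),
  admissions = c(90, 90, 98, 85),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop a row when its value repeats the previous version's
# value for the same key (last observation carried forward makes it redundant).
compactify_sketch <- function(df, value_col) {
  df <- df[order(df$geo_value, df$time_value, df$version), , drop = FALSE]
  n <- nrow(df)
  if (n < 2) return(df)
  key <- paste(df$geo_value, df$time_value)
  v <- df[[value_col]]
  prev <- v[-n]
  cur <- v[-1]
  # NA-aware equality: two NAs count as equal, NA vs. a number does not.
  same_value <- ifelse(is.na(cur) | is.na(prev), is.na(cur) & is.na(prev), cur == prev)
  redundant <- c(FALSE, key[-1] == key[-n] & same_value)
  df[!redundant, , drop = FALSE]
}

compactify_sketch(versions_tx, "admissions")
```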
epi_slide use cases
… (epiprocess::growth_rate, epipredict::step_lag_difference)
epix_as_of, epix_slide use cases
epi_df and epi_archive utilities
epi_df
group_by() - standard grouped operations
epi_slide() - perform (grouped) time-window computations on an epi_df
epi_cor() - compute correlations between variables in an epi_df
epi_archive
epix_merge() - merge/join two epi_archive objects
epix_as_of() - generate a snapshot epi_df from an epi_archive object
group_by() - standard grouped operations
epix_slide() - perform (grouped) time-windowed computations on several versions
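epi_slide() applies a computation over a trailing time window within each group. Its arguments are not shown here. As a stand-in, the base-R sketch below computes one common example, a per-geo trailing 7-day average, on hypothetical data.

The sketch assumes each geo's rows are consecutive days with no gaps; padding with tidyr::complete(), as done earlier, helps guarantee this. trailing_mean is a hypothetical helper, not part of {epiprocess}.

```r
# Hypothetical daily admissions for two geos over the same 10 days.
daily <- data.frame(
  geo_value  = rep(c("ca", "tx"), each = 10),
  time_value = rep(seq(as.Date("2023-01-01"), by = "day", length.out = 10), times = 2),
  admissions = c(1:10, seq(10, 100, by = 10)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: mean of the current value and the previous (window - 1)
# values; NA until a full window is available. Assumes consecutive, gap-free days.
trailing_mean <- function(x, window = 7) {
  out <- rep(NA_real_, length(x))
  for (i in seq_along(x)) {
    if (i >= window) out[i] <- mean(x[(i - window + 1):i])
  }
  out
}

# Sort within geo by date, then apply the window separately to each geo_value.
daily <- daily[order(daily$geo_value, daily$time_value), ]
daily$admissions_7dav <- ave(daily$admissions, daily$geo_value, FUN = trailing_mean)
daily
```

Keeping the window logic in a plain function makes it easy to swap in another statistic, such as a trailing sum or median.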
Packages for forecasting — cmu-delphi.github.io/midas-cdc-2023-demo