install.packages("epidatr")
pak::pkg_install("epidatr")
renv::install("epidatr")
pak::pkg_install("cmu-delphi/epidatr@dev")
remotes::install_github("cmu-delphi/epidatr", ref = "dev")
renv::install("cmu-delphi/epidatr@dev")
library(epidatr) # provides pub_covidcast() and epirange()
epidata <- pub_covidcast(
  source = "hhs",
  signals = "confirmed_admissions_covid_1d",
  geo_type = "nation",
  time_type = "day",
  geo_values = "us",
  time_values = epirange("2023-01-01", "2023-06-01")
  # (by default, fetches the current version)
)
# `epidata` looks like:
# A tibble: 152 × 15
  signal   source geo_type time_type geo_value time_value issue        lag value
  <chr>    <chr>  <fct>    <fct>     <chr>     <date>     <date>     <int> <dbl>
1 confirm… hhs    nation   day       us        2023-01-01 2023-10-03   275  6078
2 confirm… hhs    nation   day       us        2023-01-02 2023-11-17   319  6727
3 confirm… hhs    nation   day       us        2023-01-03 2023-11-17   318  6932
4 confirm… hhs    nation   day       us        2023-01-04 2023-11-11   311  6693
5 confirm… hhs    nation   day       us        2023-01-05 2023-11-17   316  6609
# ℹ 147 more rows
# ℹ 6 more variables: direction <dbl>, missing_value <int>,
#   missing_stderr <int>, missing_sample_size <int>, stderr <dbl>,
#   sample_size <dbl>
epidata <- pub_covidcast(
  source = "hhs",
  signals = "confirmed_admissions_covid_1d",
  geo_type = "nation",
  time_type = "day",
  geo_values = "us",
  time_values = epirange("2023-01-01", "2023-06-01"),
  as_of = "2023-06-01"
)
# `epidata` looks like:
# A tibble: 150 × 15
  signal   source geo_type time_type geo_value time_value issue        lag value
  <chr>    <chr>  <fct>    <fct>     <chr>     <date>     <date>     <int> <dbl>
1 confirm… hhs    nation   day       us        2023-01-01 2023-05-19   138  6058
2 confirm… hhs    nation   day       us        2023-01-02 2023-06-01   150  6713
3 confirm… hhs    nation   day       us        2023-01-03 2023-06-01   149  6893
4 confirm… hhs    nation   day       us        2023-01-04 2023-06-01   148  6657
5 confirm… hhs    nation   day       us        2023-01-05 2023-06-01   147  6587
# ℹ 145 more rows
# ℹ 6 more variables: direction <dbl>, missing_value <int>,
#   missing_stderr <int>, missing_sample_size <int>, stderr <dbl>,
#   sample_size <dbl>
   Endpoint               Description                                
 1 pub_delphi()           Delphi's ILINet forecasts                  
 2 pub_dengue_nowcast()   Delphi's PAHO Dengue nowcast               
 3 pub_ecdc_ili()         ECDC ILI data                              
 4 pub_flusurv()          FluSurv hospitalization data               
 5 pub_fluview()          FluView ILINet data                        
 6 pub_fluview_clinical() FluView virological data from clinical labs
 7 pub_fluview_meta()     FluView metadata                           
 8 pub_gft()              Google Flu Trends data                     
 9 pub_kcdc_ili()         KCDC ILI data                              
10 pub_meta()             API metadata                               
11 pub_nidss_dengue()     NIDSS dengue data                          
12 pub_nidss_flu()        NIDSS flu data                             
13 pub_nowcast()          Delphi's ILI nowcast                       
14 pub_paho_dengue()      PAHO Dengue data                           
15 pub_wiki()             Wikipedia access data                      
pak::pkg_install("cmu-delphi/epiprocess@main")
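library(epiprocess) # provides as_epi_df() and as_epi_archive()
library(dplyr) # provides the %>% pipe used below
tbl <- pub_covidcast(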
  source = "hhs",
  signals = "confirmed_admissions_covid_1d",
  geo_type = "state",
  time_type = "day",
  geo_values = "ca,fl,ny,tx",
  time_values = "*"
)
epi_df <- tbl %>%
  dplyr::select(geo_value, time_value, admissions = value) %>%
  # Add NAs to fill gaps, cover same time range for each geo:
  tidyr::complete(geo_value, time_value = tidyr::full_seq(time_value, period = 1L)) %>%
  as_epi_df(
    geo_type = "state",
    time_type = "day",
    as_of = max(tbl$issue)
  )
epi_df
An `epi_df` object, 5,644 x 3 with metadata:
* geo_type  = state
* time_type = day
* as_of     = 2023-11-19

# A tibble: 5,644 × 3
   geo_value time_value admissions
 * <chr>     <date>          <dbl>
 1 ca        2019-12-31         NA
 2 ca        2020-01-01         NA
 3 ca        2020-01-02         NA
 4 ca        2020-01-03         NA
 5 ca        2020-01-04         NA
 6 ca        2020-01-05         NA
 7 ca        2020-01-06         NA
 8 ca        2020-01-07         NA
 9 ca        2020-01-08         NA
10 ca        2020-01-09         NA
# ℹ 5,634 more rows
tbl <- pub_covidcast(
  source = "hhs",
  signals = "confirmed_admissions_covid_1d",
  geo_type = "state",
  time_type = "day",
  geo_values = "ca,fl,ny,tx",
  time_values = "*", # "*" = all time values
  issues = epirange("1234-01-01", "2023-06-01") # start of range must be before data set start
)
epi_archive <- tbl %>%
  dplyr::select(
    geo_value, time_value,
    version = issue, admissions = value
  ) %>%
  # don't try to `complete` here; `complete` after `epix_as_of` or inside `epix_slide` computations
  as_epi_archive(compactify = TRUE)
epi_archive
An `epi_archive` object, with metadata:
* geo_type  = state
* time_type = day
----------
* min time value = 2019-12-31
* max time value = 2023-05-30
* first version with update = 2020-11-16
* last version with update = 2023-06-01
* No clobberable versions
* versions end   = 2023-06-01
----------
Data archive (stored in DT field): 24625 x 4
Columns in DT: geo_value, time_value, version, admissions
----------
Public R6 methods: initialize, print, as_of, fill_through_version, 
                   truncate_versions_after, merge, group_by, slide, clone
epi_archive$DT
       geo_value time_value    version admissions
    1:        ca 2020-02-03 2020-11-16         NA
    2:        ca 2020-02-04 2020-11-16         NA
    3:        ca 2020-02-05 2020-11-16         NA
    4:        ca 2020-02-06 2020-11-16         NA
    5:        ca 2020-02-07 2020-11-16         NA
   ---                                           
24621:        tx 2023-05-28 2023-05-31         98
24622:        tx 2023-05-28 2023-06-01         85
24623:        tx 2023-05-29 2023-05-31         19
24624:        tx 2023-05-29 2023-06-01         96
24625:        tx 2023-05-30 2023-06-01         96
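
The archive printed above stores one row per (`geo_value`, `time_value`, `version`), so a snapshot "as of" a given date is the latest version of each observation published on or before that date. The base-R sketch below illustrates that idea on a toy table. The helper name `snapshot_as_of` and the `ca` values are hypothetical, and this is not how `{epiprocess}` implements `epix_as_of`; it only shows the logic under those assumptions.

```r
# Toy version history with illustrative values only (not a real API response).
# Each row is the value reported for (geo_value, time_value) in one version.
history <- data.frame(
  geo_value  = c("ca", "ca", "ca", "tx", "tx"),
  time_value = as.Date(c("2023-05-28", "2023-05-28", "2023-05-29",
                         "2023-05-28", "2023-05-28")),
  version    = as.Date(c("2023-05-30", "2023-06-01", "2023-05-31",
                         "2023-05-31", "2023-06-01")),
  admissions = c(100, 110, 90, 98, 85)
)

# Hypothetical helper: reconstruct the data as it was known on `as_of`.
snapshot_as_of <- function(history, as_of) {
  # Drop versions that had not been published yet.
  known <- history[history$version <= as_of, , drop = FALSE]
  # Sort so the most recent version of each observation comes first.
  ord <- order(known$geo_value, known$time_value, -as.numeric(known$version))
  known <- known[ord, , drop = FALSE]
  # Keep only the first (most recent) row per geo_value/time_value pair.
  latest <- known[!duplicated(known[c("geo_value", "time_value")]), , drop = FALSE]
  rownames(latest) <- NULL
  latest
}

snap_may31 <- snapshot_as_of(history, as.Date("2023-05-31"))
snap_jun01 <- snapshot_as_of(history, as.Date("2023-06-01"))
```

In this toy data, `snap_may31` reports 98 admissions for `tx` on 2023-05-28, while `snap_jun01` reports the revised value of 85. An archive keeps both versions so that this kind of revision stays visible.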

Finding, fetching, and processing epidemiological data with `{epidatr}` and `{epiprocess}`

Delphi Research Group at CMU

Slides: Nat DeFries, Dmitry Shemetov, Logan Brooks, others on Delphi tooling team

The Delphi `{epidatr}` package is a new R front-end for the Delphi Epidata API

Conveniently install in the normal ways

Example: HHS/NHSN hospitalization data

Example: versioned HHS/NHSN hospitalization data

Access other useful data, including Delphi-exclusive sources

Access more than just COVID data!

Consider subscribing to the Delphi API mailing list to be notified of package updates, new data sources, corrections, and other updates

The `{epiprocess}` package helps work with epidemic datasets

`epi_df`: a snapshot of epidata in time

`epi_archive`: a collection of historical epidata

Some `epi_slide` use cases

Some `epix_as_of`, `epix_slide` use cases

`epi_df` and `epi_archive` utilities

Resources
