A recipe is a description of the steps to be applied to a data set in order to prepare it for data analysis. epi_recipe() is a loose wrapper around recipes::recipe() that properly handles the additional columns (such as geo_value and time_value) present in an epi_df.
Usage
epi_recipe(x, ...)
# S3 method for class 'epi_df'
epi_recipe(
  x,
  reference_date = NULL,
  formula = NULL,
  ...,
  vars = NULL,
  roles = NULL
)
# S3 method for class 'formula'
epi_recipe(formula, data, reference_date = NULL, ...)
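R selects one of the two methods above from the class of the first argument. The sketch below calls both; it assumes epipredict is installed and attached (it provides epi_recipe() and, as in the Examples, makes the covid_case_death_rates data set available). The formula follows the rules described under formula: no in-line functions and no minus signs.

```r
library(dplyr)
library(epipredict)

# Template data: an epi_df, filtered as in the Examples
jhu <- covid_case_death_rates %>%
  filter(time_value > as.Date("2021-08-01"))

# epi_df method: geo_value and time_value get their own roles; with
# vars = NULL the other columns start out with the "raw" role
# (compare the printed recipe in the Examples)
r_df <- epi_recipe(jhu)

# formula method: outcome on the left, predictors on the right.
# Assumption: the key columns (geo_value, time_value) are kept automatically.
r_formula <- epi_recipe(death_rate ~ case_rate, data = jhu)

summary(r_formula)
```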
Arguments
- x, data
An epi_df of the template data set (see below).
- ...
Further arguments passed to or from other methods (not currently used).
- reference_date
Either a date of the same class as the time_value column in the epi_df, or NULL. If a date, it gives the date to which all operations are relative. In real-time tasks, this is typically the date that the model is created (and presumably trained). In forecasting, it is often the same as the most recent date of data availability, but when data is "latent" (reported after the date to which it corresponds), or when performing a nowcast, the reference_date may be later than this. Setting reference_date to a value BEFORE the most recent data does not produce a true "forecast", because future data is used to create the model, but this may be reasonable in model building, nowcasting (predicting finalized values from preliminary data), or backcasting. If NULL, it is set to the as_of date of the epi_df.
- formula
A model formula. No in-line functions should be used here (e.g. log(x), x:y, etc.), and minus signs are not allowed. These types of transformations should be enacted using step functions in this package. Dots are allowed, as are simple multivariate outcome terms (i.e. no need for cbind; see Examples).
- vars
A character vector of column names corresponding to variables that will be used in any context (see below).
- roles
A character vector (the same length as vars) that describes a single role that each variable will take. This can be any value, but common roles are "outcome", "predictor", "time_value", and "geo_value".
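The sketch below exercises reference_date, vars, and roles under the same assumptions as the Examples (epipredict attached, covid_case_death_rates available). The three-day offset for reference_date is an arbitrary illustration of latent data, not a recommended value, and the role assignments are only one possible choice.

```r
library(dplyr)
library(epipredict)

jhu <- covid_case_death_rates %>%
  filter(time_value > as.Date("2021-08-01"))

# reference_date must have the same class as time_value (Date here).
# Here it is set later than the most recent data, as for latent data.
latest <- max(jhu$time_value)
r_ref <- epi_recipe(jhu, reference_date = latest + 3)

# Explicit roles: exactly one role per entry of `vars`
r_roles <- epi_recipe(
  jhu,
  vars = c("geo_value", "time_value", "case_rate", "death_rate"),
  roles = c("geo_value", "time_value", "predictor", "outcome")
)
summary(r_roles)
```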
Value
An object of class recipe with sub-objects:
- var_info
A tibble containing information about the original data set columns.
- term_info
A tibble that contains the current set of terms in the data set. This initially defaults to the same data contained in var_info.
- steps
A list of step or check objects that define the sequence of preprocessing operations that will be applied to data. The default value is NULL.
- template
A tibble of the data. This is initialized to be the same as the data given in the data argument, but can be different after the recipe is trained.
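The sub-objects listed above are ordinary list elements, so they can be inspected directly. A minimal sketch, assuming epipredict is attached and covid_case_death_rates is available as in the Examples; step_epi_lag() is used only to show that adding a step populates steps.

```r
library(dplyr)
library(epipredict)

jhu <- covid_case_death_rates %>%
  filter(time_value > as.Date("2021-08-01"))

r <- epi_recipe(jhu)

r$var_info        # one row per original column, including its role
r$term_info       # starts out with the same data as var_info
is.null(r$steps)  # TRUE: no steps have been added yet

# Adding a step appends one object to the steps list
r <- r %>% step_epi_lag(case_rate, lag = 7)
length(r$steps)   # 1
```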
Examples
library(dplyr)
library(recipes)
library(epipredict)
jhu <- covid_case_death_rates %>%
  filter(time_value > as.Date("2021-08-01")) %>%
  arrange(geo_value, time_value)
r <- epi_recipe(jhu) %>%
  step_epi_lag(death_rate, lag = c(0, 7, 14)) %>%
  step_epi_ahead(death_rate, ahead = 7) %>%
  step_epi_lag(case_rate, lag = c(0, 7, 14)) %>%
  step_naomit(all_predictors()) %>%
  # below, `skip` means we don't do this at predict time
  step_naomit(all_outcomes(), skip = TRUE)
r
#>
#> ── Epi Recipe ──────────────────────────────────────────────────────────────────
#>
#> ── Inputs
#> Number of variables by role
#> raw: 2
#> geo_value: 1
#> time_value: 1
#>
#> ── Operations
#> 1. Lagging: death_rate by 0, 7, 14
#> 2. Leading: death_rate by 7
#> 3. Lagging: case_rate by 0, 7, 14
#> 4. • Removing rows with NA values in: all_predictors()
#> 5. • Removing rows with NA values in: all_outcomes()
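The recipe printed above only describes the operations; nothing is computed until it is trained. The sketch below rebuilds the same recipe and trains it with recipes::prep(), then uses recipes::bake() with new_data = NULL, which in recipes returns the processed training data. It assumes that epi_recipe objects follow this recipes behavior; the names trained and train_data are illustrative.

```r
library(dplyr)
library(recipes)
library(epipredict)

jhu <- covid_case_death_rates %>%
  filter(time_value > as.Date("2021-08-01")) %>%
  arrange(geo_value, time_value)

r <- epi_recipe(jhu) %>%
  step_epi_lag(death_rate, lag = c(0, 7, 14)) %>%
  step_epi_ahead(death_rate, ahead = 7) %>%
  step_epi_lag(case_rate, lag = c(0, 7, 14)) %>%
  step_naomit(all_predictors()) %>%
  step_naomit(all_outcomes(), skip = TRUE)

# Estimate every step on the training data
trained <- prep(r, training = jhu)

# new_data = NULL: return the processed training set
train_data <- bake(trained, new_data = NULL)
```

Because the last step_naomit() uses skip = TRUE, it is applied when the training data are processed but skipped when baking new data at prediction time, so rows whose future outcome has not been observed yet are kept for prediction.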