A recipe is a description of the steps to be applied to a data set in order to prepare it for data analysis. This is a loose wrapper around `recipes::recipe()` that properly handles the additional columns present in an `epi_df`, such as `geo_value` and `time_value`.
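To see how those extra columns are treated, you can build a recipe from template data alone and list each variable's role with `summary()`, the standard summary method for recipe objects. This is a sketch that assumes the `covid_case_death_rates` data used in Examples. Judging from the recipe printed there, the key columns receive the roles `geo_value` and `time_value`, and every other column is marked `raw`.

```r
library(epipredict)
library(recipes)

# Template data (an epi_df); per the argument docs, only its first row is kept
jhu <- covid_case_death_rates

r <- epi_recipe(jhu)

# One row per original column, with the role epi_recipe() assigned to it
roles <- summary(r)
roles[, c("variable", "role")]
```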
Usage
epi_recipe(x, ...)
# Default S3 method
epi_recipe(x, ...)
# S3 method for class 'epi_df'
epi_recipe(x, formula = NULL, ..., vars = NULL, roles = NULL)
# S3 method for class 'formula'
epi_recipe(formula, data, ...)
Arguments
- x, data
A data frame, tibble, or epi_df of the template data set (see below). This is always reduced to its first row to avoid memory issues.
- ...
Further arguments passed to or from other methods (not currently used).
- formula
A model formula. No in-line functions should be used here (e.g. `log(x)`, `x:y`, etc.) and minus signs are not allowed. These types of transformations should be enacted using `step` functions in this package. Dots are allowed, as are simple multivariate outcome terms (i.e. no need for `cbind`; see Examples).
- vars
A character vector of column names corresponding to variables that will be used in any context (see below).
- roles
A character vector (the same length as `vars`) that describes a single role that each variable will take. This value could be anything, but common roles are `"outcome"`, `"predictor"`, `"time_value"`, and `"geo_value"`.
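With the `formula` method, outcome and predictor roles come from the two sides of the formula instead of from `vars` and `roles`. The following is a minimal sketch that assumes the `covid_case_death_rates` data from Examples. The comment about the key columns is an assumption about how the `epi_df` keys are carried along; it is not stated above. The `try()` call illustrates the rule that minus signs are rejected.

```r
library(epipredict)
library(recipes)

jhu <- covid_case_death_rates

# death_rate is the outcome, case_rate the predictor.
# Assumption: geo_value and time_value are still added with their own roles,
# even though the formula does not mention them.
r <- epi_recipe(death_rate ~ case_rate, data = jhu)
roles <- summary(r)
roles[, c("variable", "role")]

# Minus signs are not allowed in the formula; use step functions instead
bad <- try(epi_recipe(death_rate ~ . - case_rate, data = jhu), silent = TRUE)
inherits(bad, "try-error")
```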
Value
An object of class `recipe` with sub-objects:
- var_info
A tibble containing information about the original data set columns.
- term_info
A tibble that contains the current set of terms in the data set. This initially defaults to the same data contained in `var_info`.
- steps
A list of `step` or `check` objects that define the sequence of preprocessing operations that will be applied to data. The default value is `NULL`.
- template
A tibble of the data. This is initialized from the data given in the `data` argument (reduced to its first row, as noted under Arguments) but can be different after the recipe is trained.
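The returned object is a list, so you can inspect the sub-objects above directly. The following is a minimal sketch that assumes the `covid_case_death_rates` data from Examples. A new recipe starts with `steps` set to `NULL`, and each added step appends one element to that list.

```r
library(epipredict)
library(dplyr)
library(recipes)

jhu <- covid_case_death_rates

r <- epi_recipe(jhu)

names(r)          # includes var_info, term_info, steps, template
is.null(r$steps)  # no operations have been added yet
r$template        # template rows kept from `jhu`

# Each added operation appends one element to `steps`
r2 <- r %>% step_epi_lag(death_rate, lag = 7)
length(r2$steps)
```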
Examples
library(epipredict)
library(dplyr)
library(recipes)
jhu <- covid_case_death_rates %>%
filter(time_value > "2021-08-01") %>%
arrange(geo_value, time_value)
r <- epi_recipe(jhu) %>%
step_epi_lag(death_rate, lag = c(0, 7, 14)) %>%
step_epi_ahead(death_rate, ahead = 7) %>%
step_epi_lag(case_rate, lag = c(0, 7, 14)) %>%
step_naomit(all_predictors()) %>%
# below, `skip` means we don't do this at predict time
step_naomit(all_outcomes(), skip = TRUE)
r
#>
#> ── Epi Recipe ──────────────────────────────────────────────────────────────────
#>
#> ── Inputs
#> Number of variables by role
#> raw: 2
#> geo_value: 1
#> time_value: 1
#>
#> ── Operations
#> 1. Lagging: death_rate by 0, 7, 14
#> 2. Leading: death_rate by 7
#> 3. Lagging: case_rate by 0, 7, 14
#> 4. • Removing rows with NA values in: all_predictors()
#> 5. • Removing rows with NA values in: all_outcomes()
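A recipe only records operations, and nothing is computed until it is trained. As a sketch of a next step, you can train the recipe from the example with `recipes::prep()` on the full data. You can then retrieve the processed training set with `recipes::bake(new_data = NULL)`. The column names checked here (`lag_0_death_rate`, `ahead_7_death_rate`, and so on) are assumptions based on the default `lag_` and `ahead_` prefixes of `step_epi_lag()` and `step_epi_ahead()`.

```r
library(epipredict)
library(dplyr)
library(recipes)

jhu <- covid_case_death_rates %>%
  filter(time_value > as.Date("2021-08-01")) %>%
  arrange(geo_value, time_value)

r <- epi_recipe(jhu) %>%
  step_epi_lag(death_rate, lag = c(0, 7, 14)) %>%
  step_epi_ahead(death_rate, ahead = 7) %>%
  step_epi_lag(case_rate, lag = c(0, 7, 14)) %>%
  step_naomit(all_predictors()) %>%
  step_naomit(all_outcomes(), skip = TRUE)

# Train on the full data; the recipe's template alone holds only one row
prepped <- prep(r, training = jhu)

# new_data = NULL returns the processed training set, with every step
# (including the skip = TRUE one) already applied
baked <- bake(prepped, new_data = NULL)
head(baked)
```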