
A recipe is a description of the steps to be applied to a data set in order to prepare it for data analysis. epi_recipe() is a loose wrapper around recipes::recipe() that properly handles the additional columns present in an epi_df, such as geo_value and time_value.
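As a quick illustration of what the wrapper adds, the sketch below builds a recipe without a formula and inspects the variable roles with summary(). It assumes the epipredict and dplyr packages are installed and uses the covid_case_death_rates data from the Examples. Judging from the printed recipe in the Examples, the key columns should come back with the geo_value and time_value roles and the remaining columns with the role "raw".

```r
library(dplyr)
library(epipredict)

# Template epi_df: the same data used in the Examples section
jhu <- covid_case_death_rates %>%
  filter(time_value > as.Date("2021-08-01")) %>%
  arrange(geo_value, time_value)

# No formula: epi_recipe() assigns roles to the epi_df key columns itself
r_default <- epi_recipe(jhu)

# summary() on a recipe returns a tibble with one row per variable,
# including its type, role, and source
roles <- summary(r_default)
roles
```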

Usage

epi_recipe(x, ...)

# Default S3 method
epi_recipe(x, ...)

# S3 method for class 'epi_df'
epi_recipe(x, formula = NULL, ..., vars = NULL, roles = NULL)

# S3 method for class 'formula'
epi_recipe(formula, data, ...)
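The formula method is the usual entry point when outcomes and predictors are known up front. In the minimal sketch below (same data assumptions as the Examples), death_rate is marked as the outcome and case_rate as a predictor; the geo_value and time_value key columns are expected to receive their roles without being named in the formula. The second call passes the same formula through the epi_df method's formula argument, on the assumption that both routes produce the same role assignments.

```r
library(dplyr)
library(epipredict)

jhu <- covid_case_death_rates %>%
  filter(time_value > as.Date("2021-08-01")) %>%
  arrange(geo_value, time_value)

# Formula method: S3 dispatch happens on the formula in the first position
r_formula <- epi_recipe(death_rate ~ case_rate, data = jhu)

# epi_df method, supplying the formula through the `formula` argument
r_epi_df <- epi_recipe(jhu, formula = death_rate ~ case_rate)

# Compare the variable/role assignments of the two recipes
summary(r_formula)
summary(r_epi_df)
```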

Arguments

x, data

A data frame, tibble, or epi_df of the template data set (see below). Only its first row is kept, to avoid memory issues.

...

Further arguments passed to or from other methods (not currently used).

formula

A model formula. No in-line functions should be used here (e.g. log(x), x:y, etc.) and minus signs are not allowed. These types of transformations should be carried out with step functions instead. Dots are allowed, as are simple multivariate outcome terms (e.g. y1 + y2 ~ x, with no need for cbind()).
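To make the formula rules concrete, the sketch below (same data assumptions as the Examples) puts a dot on the right-hand side, which should expand to the non-key columns other than the outcome, and then checks that a formula containing a minus sign is rejected. tryCatch() is used only to capture the error for inspection; the exact message depends on the installed package versions, so it is not reproduced here.

```r
library(dplyr)
library(epipredict)

jhu <- covid_case_death_rates %>%
  filter(time_value > as.Date("2021-08-01")) %>%
  arrange(geo_value, time_value)

# Dot on the right-hand side: remaining non-key columns become predictors
r_dot <- epi_recipe(death_rate ~ ., data = jhu)
summary(r_dot)

# Minus signs are rejected; remove columns with a step such as
# recipes::step_rm() instead
bad <- tryCatch(
  epi_recipe(death_rate ~ . - case_rate, data = jhu),
  error = function(e) e
)
conditionMessage(bad)
```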

vars

A character vector of column names corresponding to variables that will be used in any context.

roles

A character vector (the same length as vars) that describes a single role for each variable. Any value can be used, but common roles are "outcome", "predictor", "time_value", and "geo_value".
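When no formula is given, vars and roles can be supplied directly as parallel character vectors, matched element by element. A minimal sketch, assuming the same data as the Examples; listing the key columns with their usual roles is an illustrative choice here, not a documented requirement.

```r
library(dplyr)
library(epipredict)

jhu <- covid_case_death_rates %>%
  filter(time_value > as.Date("2021-08-01")) %>%
  arrange(geo_value, time_value)

# One role per variable, in the same order as `vars`
r_roles <- epi_recipe(
  jhu,
  vars = c("geo_value", "time_value", "case_rate", "death_rate"),
  roles = c("geo_value", "time_value", "predictor", "outcome")
)
summary(r_roles)
```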

Value

An object of class epi_recipe (which also inherits from recipe) with sub-objects:

var_info

A tibble containing information about the original data set columns

term_info

A tibble that contains the current set of terms in the data set. This initially defaults to the same data contained in var_info.

steps

A list of step or check objects that define the sequence of preprocessing operations that will be applied to data. The default value is NULL.

template

A tibble of the data. This is initialized to be the same as the data given in the data argument but can be different after the recipe is trained.
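The components listed above can be inspected directly on the returned list. A minimal sketch under the same data assumptions as the Examples: a freshly created recipe should have no steps, and adding one step_epi_lag() call should append exactly one step object.

```r
library(dplyr)
library(epipredict)

jhu <- covid_case_death_rates %>%
  filter(time_value > as.Date("2021-08-01")) %>%
  arrange(geo_value, time_value)

r <- epi_recipe(jhu)

# The documented sub-objects of the returned recipe
names(r)
r$var_info
is.null(r$steps)  # no steps have been added yet

# Adding a step appends one object to `steps`
r_lag <- step_epi_lag(r, death_rate, lag = 7)
length(r_lag$steps)
```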

Examples

library(dplyr)
library(recipes)
jhu <- covid_case_death_rates %>%
  filter(time_value > "2021-08-01") %>%
  arrange(geo_value, time_value)

r <- epi_recipe(jhu) %>%
  step_epi_lag(death_rate, lag = c(0, 7, 14)) %>%
  step_epi_ahead(death_rate, ahead = 7) %>%
  step_epi_lag(case_rate, lag = c(0, 7, 14)) %>%
  step_naomit(all_predictors()) %>%
  # below, `skip` means we don't do this at predict time
  step_naomit(all_outcomes(), skip = TRUE)

r
#> 
#> ── Epi Recipe ──────────────────────────────────────────────────────────────────
#> 
#> ── Inputs 
#> Number of variables by role
#> raw:        2
#> geo_value:  1
#> time_value: 1
#> 
#> ── Operations 
#> 1. Lagging: death_rate by 0, 7, 14
#> 2. Leading: death_rate by 7
#> 3. Lagging: case_rate by 0, 7, 14
#> 4.  Removing rows with NA values in: all_predictors()
#> 5.  Removing rows with NA values in: all_outcomes()
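The printed recipe only lists the planned operations; nothing has been computed yet. A minimal sketch of the usual next step, assuming the standard recipes workflow of prep() followed by bake(): estimate the recipe on jhu, then extract the processed training data, which should contain the new lagged and ahead columns. The exact column names come from the step functions' defaults, so they are printed rather than assumed.

```r
library(dplyr)
library(epipredict)
library(recipes)

jhu <- covid_case_death_rates %>%
  filter(time_value > as.Date("2021-08-01")) %>%
  arrange(geo_value, time_value)

r <- epi_recipe(jhu) %>%
  step_epi_lag(death_rate, lag = c(0, 7, 14)) %>%
  step_epi_ahead(death_rate, ahead = 7) %>%
  step_epi_lag(case_rate, lag = c(0, 7, 14)) %>%
  step_naomit(all_predictors()) %>%
  step_naomit(all_outcomes(), skip = TRUE)

# Estimate the recipe on the training data
prepped <- prep(r, training = jhu)

# new_data = NULL returns the processed training data (skip steps included)
baked <- bake(prepped, new_data = NULL)
names(baked)
```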