step_training_window() creates a specification of a recipe step that limits the training window to the n_recent most recent observations in time_value within each group, where the groups are formed from the remaining epi_keys (the key columns other than time_value, such as geo_value).

Usage

step_training_window(
  recipe,
  role = NA,
  n_recent = 50,
  epi_keys = NULL,
  id = rand_id("training_window")
)

Arguments

recipe

A recipe object. The step will be added to the sequence of operations for this recipe.

role

For model terms created by this step, what analysis role should they be assigned? By default, lag is a predictor while ahead is an outcome.

n_recent

An integer giving the number of most recent observations to keep in the training window per key. The default value is 50.

epi_keys

An optional character vector specifying the "key" variables to group on. The default, NULL, ensures that every key combination is limited to n_recent observations.

id

A unique identifier for the step.

Value

An updated version of recipe with the new step added to the sequence of any existing operations.

Details

Note that step_epi_ahead() and step_epi_lag() should come after any filtering step.
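As a rough mental model of the windowing described above, the step can be thought of as "sort by time_value, then keep the last n_recent rows within each key group". The base R sketch below illustrates that idea only. keep_recent() and its keys argument are hypothetical names introduced for this illustration, not part of the package, and the package's actual implementation may differ.

```r
# Hypothetical helper, not part of the package: keep the `n_recent` latest
# rows (by `time_value`) within each combination of the `keys` columns.
keep_recent <- function(df, n_recent = 50, keys = "geo_value") {
  # Sort by time so that tail() picks the most recent rows.
  df <- df[order(df$time_value), , drop = FALSE]
  # Split into one data frame per key combination.
  groups <- split(df, df[keys], drop = TRUE)
  # Keep the last `n_recent` rows of each group and stack the pieces.
  out <- do.call(rbind, lapply(groups, tail, n = n_recent))
  rownames(out) <- NULL
  out
}

# Toy data: two geos with five daily observations each.
toy <- data.frame(
  geo_value = rep(c("ca", "hi"), each = 5),
  time_value = rep(seq(as.Date("2020-01-01"), by = 1, length.out = 5), 2),
  x = 1:10
)

recent <- keep_recent(toy, n_recent = 3)
recent
```

With n_recent = 3, each geo_value keeps only its rows dated 2020-01-03 through 2020-01-05, which is the same set of rows the step retains in the examples on this page.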

Examples

# Toy data: two geos ("ca" and "hi") with five daily observations each
tib <- tibble(
  x = 1:10,
  y = 1:10,
  time_value = rep(seq(as.Date("2020-01-01"), by = 1, length.out = 5), 2),
  geo_value = rep(c("ca", "hi"), each = 5)
) %>%
  as_epi_df()

# Keep only the 3 most recent time_values within each geo_value
epi_recipe(y ~ x, data = tib) %>%
  step_training_window(n_recent = 3) %>%
  prep(tib) %>%
  bake(new_data = NULL)
#> An `epi_df` object, 6 x 4 with metadata:
#> * geo_type  = state
#> * time_type = day
#> * as_of     = 2024-12-02 16:11:19.603656
#> 
#> # A tibble: 6 × 4
#>   geo_value time_value     x     y
#> * <chr>     <date>     <int> <int>
#> 1 ca        2020-01-03     3     3
#> 2 ca        2020-01-04     4     4
#> 3 ca        2020-01-05     5     5
#> 4 hi        2020-01-03     8     8
#> 5 hi        2020-01-04     9     9
#> 6 hi        2020-01-05    10    10

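# Same window, applied after step_epi_naomit(); tib has no missing values,
# so the result is unchanged
epi_recipe(y ~ x, data = tib) %>%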
  step_epi_naomit() %>%
  step_training_window(n_recent = 3) %>%
  prep(tib) %>%
  bake(new_data = NULL)
#> An `epi_df` object, 6 x 4 with metadata:
#> * geo_type  = state
#> * time_type = day
#> * as_of     = 2024-12-02 16:11:19.603656
#> 
#> # A tibble: 6 × 4
#>   geo_value time_value     x     y
#> * <chr>     <date>     <int> <int>
#> 1 ca        2020-01-03     3     3
#> 2 ca        2020-01-04     4     4
#> 3 ca        2020-01-05     5     5
#> 4 hi        2020-01-03     8     8
#> 5 hi        2020-01-04     9     9
#> 6 hi        2020-01-05    10    10