Limits the size of the training window to the most recent observations
Source: R/step_training_window.R
step_training_window creates a specification of a recipe step that limits the size of the training window to the n_recent most recent observations in time_value per group, where the groups are formed based on the remaining epi_keys.
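The filtering rule described above can be sketched in base R. This is a conceptual illustration only, not the package's implementation: keep_recent() is a hypothetical helper, and the column names simply mirror the example data used on this page.

```r
# Conceptual sketch: keep the n_recent most recent rows per key group.
# keep_recent() is a hypothetical helper and is NOT part of epipredict.
keep_recent <- function(data, n_recent, keys) {
  # Split row indices by the combination of key columns.
  groups <- split(seq_len(nrow(data)), data[keys], drop = TRUE)
  kept <- lapply(groups, function(idx) {
    # Order the rows of this group by time_value, newest first.
    newest_first <- idx[order(data$time_value[idx], decreasing = TRUE)]
    head(newest_first, n_recent)
  })
  # Return the kept rows in their original order.
  data[sort(unlist(kept, use.names = FALSE)), , drop = FALSE]
}

df <- data.frame(
  geo_value = rep(c("ca", "hi"), each = 5),
  time_value = rep(seq(as.Date("2020-01-01"), by = 1, length.out = 5), 2),
  x = 1:10
)

# Keeps the last three dates for each geo_value.
recent <- keep_recent(df, n_recent = 3, keys = "geo_value")
```

With n_recent = 3 and geo_value as the only key, each location retains its three latest dates, which matches the rows retained in the examples on this page.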
Usage
step_training_window(
  recipe,
  role = NA,
  n_recent = 50,
  epi_keys = NULL,
  id = rand_id("training_window")
)
Arguments
- recipe
A recipe object. The step will be added to the sequence of operations for this recipe.
- role
For model terms created by this step, what analysis role should they be assigned? lag is a predictor by default, while ahead is an outcome.
- n_recent
An integer giving the number of most recent observations to keep in the training window per key. The default value is 50.
- epi_keys
An optional character vector specifying the "key" variables to group on. The default, NULL, ensures that every key combination is limited.
- id
A unique identifier for the step.
Value
An updated version of recipe with the new step added to the sequence of any existing operations.
Details
Note that step_epi_lead() and step_epi_lag() should come after any filtering step.
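A minimal sketch of that ordering, with the window step placed ahead of the lag and lead steps. It rests on several assumptions: that dplyr, epiprocess, and epipredict are installed; that the lead step is exported as step_epi_ahead() in your installed version; and that the lag and ahead values shown are purely illustrative. The recipe is only built here, not prepped, so no output is shown.

```r
library(dplyr)
library(epiprocess)
library(epipredict)

# Same toy data as in the Examples section of this page.
tib <- tibble(
  x = 1:10,
  y = 1:10,
  time_value = rep(seq(as.Date("2020-01-01"), by = 1, length.out = 5), 2),
  geo_value = rep(c("ca", "hi"), each = 5)
) %>%
  as_epi_df()

# Filtering step first, then the lag and lead steps.
# step_epi_ahead() is assumed to be the lead step's exported name.
r <- epi_recipe(y ~ x, data = tib) %>%
  step_training_window(n_recent = 3) %>%
  step_epi_lag(x, lag = 1) %>%
  step_epi_ahead(y, ahead = 1)
```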
Examples
tib <- tibble(
  x = 1:10,
  y = 1:10,
  time_value = rep(seq(as.Date("2020-01-01"), by = 1, length.out = 5), 2),
  geo_value = rep(c("ca", "hi"), each = 5)
) %>%
  as_epi_df()
epi_recipe(y ~ x, data = tib) %>%
  step_training_window(n_recent = 3) %>%
  prep(tib) %>%
  bake(new_data = NULL)
#> An `epi_df` object, 6 x 4 with metadata:
#> * geo_type = state
#> * time_type = day
#> * as_of = 2024-11-12 19:15:10.085706
#>
#> # A tibble: 6 × 4
#>   geo_value time_value     x     y
#> * <chr>     <date>     <int> <int>
#> 1 ca        2020-01-03     3     3
#> 2 ca        2020-01-04     4     4
#> 3 ca        2020-01-05     5     5
#> 4 hi        2020-01-03     8     8
#> 5 hi        2020-01-04     9     9
#> 6 hi        2020-01-05    10    10
epi_recipe(y ~ x, data = tib) %>%
  step_epi_naomit() %>%
  step_training_window(n_recent = 3) %>%
  prep(tib) %>%
  bake(new_data = NULL)
#> An `epi_df` object, 6 x 4 with metadata:
#> * geo_type = state
#> * time_type = day
#> * as_of = 2024-11-12 19:15:10.085706
#>
#> # A tibble: 6 × 4
#>   geo_value time_value     x     y
#> * <chr>     <date>     <int> <int>
#> 1 ca        2020-01-03     3     3
#> 2 ca        2020-01-04     4     4
#> 3 ca        2020-01-05     5     5
#> 4 hi        2020-01-03     8     8
#> 5 hi        2020-01-04     9     9
#> 6 hi        2020-01-05    10    10