Slides an n-timestep data.table::froll or slider::summary-slide function over variables in an epi_df object. See the slide vignette for examples.

Usage

epi_slide_opt(
  .x,
  .col_names,
  .f,
  ...,
  .window_size = 1,
  .align = c("right", "center", "left"),
  .ref_time_values = NULL,
  .all_rows = FALSE
)

Arguments

.x

The epi_df object under consideration, grouped or ungrouped. If ungrouped, all data in .x will be treated as part of a single data group.

.col_names

<tidy-select> An unquoted column name (e.g., cases), multiple column names (e.g., c(cases, deaths)), another tidy-select expression, or a character vector (e.g., c("cases", "deaths")). Variable names can be used as if they were positions in the data frame, so expressions like x:y can be used to select a range of variables.

The tidy-selection renaming interface is not supported, and cannot be used to provide output column names; if you want to customize the output column names, use dplyr::rename after the slide.

.f

Function; together with ... specifies the computation to slide. .f must be one of data.table's rolling functions (frollmean, frollsum, frollapply; see data.table::roll) or one of slider's specialized sliding functions (slide_mean, slide_sum, etc.; see slider::summary-slide).

The optimized data.table and slider functions can't be passed directly as the computation function in epi_slide without careful handling to make sure each computation window is made up of .window_size dates rather than .window_size data points. epi_slide_opt (and the wrapper functions epi_slide_mean and epi_slide_sum) take care of window completion automatically to prevent the associated errors.
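The difference between a window of dates and a window of points can be seen in a small sketch. The toy data frame and the manual date-completion step below are illustrative assumptions only; they are not how epi_slide_opt is implemented internally.

```r
library(data.table)

# Hypothetical toy series: 2020-01-03 is absent from the data.
obs <- data.frame(
  time_value = as.Date(c("2020-01-01", "2020-01-02", "2020-01-04", "2020-01-05")),
  cases = c(1, 2, 4, 5)
)

# Naive approach: a 3-point trailing mean. At 2020-01-04 the window
# silently spans four calendar days (01-01, 01-02, 01-04).
naive <- frollmean(obs$cases, n = 3)

# Date-aware approach: pad the missing date with NA so that every
# window covers exactly 3 consecutive dates, then drop the padded row.
all_dates <- data.frame(
  time_value = seq(min(obs$time_value), max(obs$time_value), by = "day")
)
completed <- merge(all_dates, obs, by = "time_value", all.x = TRUE)
completed$mean_3d <- frollmean(completed$cases, n = 3, na.rm = TRUE)
date_aware <- completed$mean_3d[completed$time_value %in% obs$time_value]

naive       # NA NA 2.333333 3.666667
date_aware  # NA NA 3.0 4.5
```

The two results disagree from 2020-01-04 onward because the naive window reaches back one day too far.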

...

Additional arguments to pass to the slide computation .f, for example, algo or na.rm in data.table functions. You don't need to specify .x, .window_size, or .align (or before/after for slider functions).

.window_size

The size of the sliding window. By default, this is 1, meaning that only the current ref_time_value is included. The accepted values here depend on the time_value column:

  • if time_type is Date and the cadence is daily, then .window_size can be an integer (which will be interpreted in units of days) or a difftime with units "days"

  • if time_type is Date and the cadence is weekly, then .window_size must be a difftime with units "weeks"

  • if time_type is an integer, then .window_size must be an integer
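As a minimal sketch, the accepted forms of .window_size listed above can be constructed in base R as follows (the variable names are illustrative only):

```r
# Daily data: a plain integer (interpreted as days) or a difftime in days.
window_daily_int <- 7
window_daily_difftime <- as.difftime(7, units = "days")

# Weekly data: a difftime in weeks.
window_weekly <- as.difftime(2, units = "weeks")

# Integer time values: a plain integer.
window_integer <- 3

units(window_weekly)                       # "weeks"
as.numeric(window_weekly, units = "days")  # 14
```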

.align

The alignment of the sliding window. If right (default), then the window has its end at the reference time; if center, then the window is centered at the reference time; if left, then the window has its start at the reference time. If the alignment is center and the window size is odd, then the window will have floor(.window_size / 2) points before and after the reference time. If the window size is even, then the window will be asymmetric and have one fewer value on the right side of the reference time (assuming time increases from left to right).

.ref_time_values

Time values for sliding computations; each element of this vector serves as the reference time point for one sliding window. If NULL (the default), all unique time values in the underlying data are used.

.all_rows

If .all_rows = TRUE, then all rows of .x are kept in the output even when .ref_time_values is provided; rows whose time_value falls outside .ref_time_values get a missing value marker in the slide computation output column(s). Otherwise, the output has one row for each row in .x whose time_value is in .ref_time_values. Default is FALSE. The missing value marker is the result of casting NA to the type of the slide computation output with vctrs::vec_cast.
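For illustration, this is what casting NA to a given output type looks like when calling vctrs::vec_cast directly (a sketch of the marker's type only, not of the package internals):

```r
library(vctrs)

# A logical NA cast to the type of a numeric slide output.
na_for_double <- vec_cast(NA, to = double())

# The same cast for a character-typed output.
na_for_character <- vec_cast(NA, to = character())

typeof(na_for_double)     # "double"
typeof(na_for_character)  # "character"
```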

Value

An epi_df object given by appending one or more new columns to .x. Each new column is named slide_value_ followed by the name of the input column (e.g., slide_value_cases for cases).

Details

To "slide" means to apply a function over a rolling window. The .window_size arg determines the width of the window (including the reference time) and the .align arg governs how the window is aligned (see below for examples). The .ref_time_values arg controls which time values to consider for the slide, and .all_rows controls whether rows outside .ref_time_values are kept in the output, filled with NAs.

epi_slide_*() does not require a complete window (such as on the left boundary of the dataset) and will attempt to perform the computation anyway. The issue of what to do with partial computations (those run on incomplete windows) is therefore left up to the user, either through the specified function .f and its arguments, or through post-processing.
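As a rough sketch of the two options, the snippet below calls slider::slide_mean directly on a plain numeric vector (the first seven ca case counts from the examples) rather than going through epi_slide_opt. The complete argument of the slider function decides whether partial windows are computed, and the same masking can be applied afterwards as post-processing.

```r
library(slider)

cases <- c(6, 4, 6, 11, 10, 18, 26)

# 3-point trailing mean; partial windows at the start are still computed.
partial <- slide_mean(cases, before = 2, complete = FALSE)

# Same window, but only complete windows produce a value.
complete_only <- slide_mean(cases, before = 2, complete = TRUE)

# Equivalent post-processing: blank out the first (window size - 1) results.
masked <- partial
masked[seq_len(2)] <- NA

partial[1:2]        # 6 5
complete_only[1:2]  # NA NA
```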

Let's look at some window examples, assuming that the reference time value is "tv". With .align = "right" and .window_size = 3, the window will be:

time_values: tv - 3, tv - 2, tv - 1, tv, tv + 1, tv + 2, tv + 3
window:      tv - 2, tv - 1, tv

With .align = "center" and .window_size = 3, the window will be:

time_values: tv - 3, tv - 2, tv - 1, tv, tv + 1, tv + 2, tv + 3
window:      tv - 1, tv, tv + 1

With .align = "center" and .window_size = 4, the window will be:

time_values: tv - 3, tv - 2, tv - 1, tv, tv + 1, tv + 2, tv + 3
window:      tv - 2, tv - 1, tv, tv + 1

With .align = "left" and .window_size = 3, the window will be:

time_values: tv - 3, tv - 2, tv - 1, tv, tv + 1, tv + 2, tv + 3
window:      tv, tv + 1, tv + 2
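The four windows above can be reproduced with a small helper. window_offsets is a hypothetical function written for this illustration, not part of epiprocess; it returns the offsets from tv that fall inside the window, following the alignment rules described for .align.

```r
# Hypothetical helper (not an epiprocess function): offsets relative to
# the reference time that a window of the given size and alignment covers.
window_offsets <- function(window_size, align = c("right", "center", "left")) {
  align <- match.arg(align)
  # Number of points strictly before the reference time.
  before <- switch(align,
    right = window_size - 1,
    center = floor(window_size / 2),
    left = 0
  )
  # The remaining points, excluding the reference time, fall after it.
  after <- window_size - 1 - before
  seq(-before, after)
}

window_offsets(3, "right")   # -2 -1  0
window_offsets(3, "center")  # -1  0  1
window_offsets(4, "center")  # -2 -1  0  1
window_offsets(3, "left")    #  0  1  2
```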

Examples

# slide a 7-day trailing average on cases. This can also be done with `epi_slide_mean`
jhu_csse_daily_subset %>%
  group_by(geo_value) %>%
  epi_slide_opt(
    cases,
    .f = data.table::frollmean, .window_size = 7
  ) %>%
  # Remove a nonessential var. to ensure new col is printed, and rename new col
  dplyr::select(geo_value, time_value, cases, cases_7dav = slide_value_cases) %>%
  ungroup()
#> An `epi_df` object, 4,026 x 4 with metadata:
#> * geo_type  = state
#> * time_type = day
#> * as_of     = 2024-08-23 02:40:48.296938
#> 
#> # A tibble: 4,026 × 4
#>    geo_value time_value cases cases_7dav
#>  * <chr>     <date>     <dbl>      <dbl>
#>  1 ca        2020-03-01     6       NA  
#>  2 ca        2020-03-02     4       NA  
#>  3 ca        2020-03-03     6       NA  
#>  4 ca        2020-03-04    11       NA  
#>  5 ca        2020-03-05    10       NA  
#>  6 ca        2020-03-06    18       NA  
#>  7 ca        2020-03-07    26       11.6
#>  8 ca        2020-03-08    19       13.4
#>  9 ca        2020-03-09    23       16.1
#> 10 ca        2020-03-10    22       18.4
#> # ℹ 4,016 more rows

# slide a 7-day trailing average on cases. Adjust `frollmean` settings for speed
# and accuracy, and to allow partially-missing windows.
jhu_csse_daily_subset %>%
  group_by(geo_value) %>%
  epi_slide_opt(
    cases,
    .f = data.table::frollmean, .window_size = 7,
    # `frollmean` options
    algo = "exact", hasNA = TRUE, na.rm = TRUE
  ) %>%
  dplyr::select(geo_value, time_value, cases, cases_7dav = slide_value_cases) %>%
  ungroup()
#> An `epi_df` object, 4,026 x 4 with metadata:
#> * geo_type  = state
#> * time_type = day
#> * as_of     = 2024-08-23 02:40:48.296938
#> 
#> # A tibble: 4,026 × 4
#>    geo_value time_value cases cases_7dav
#>  * <chr>     <date>     <dbl>      <dbl>
#>  1 ca        2020-03-01     6       6   
#>  2 ca        2020-03-02     4       5   
#>  3 ca        2020-03-03     6       5.33
#>  4 ca        2020-03-04    11       6.75
#>  5 ca        2020-03-05    10       7.4 
#>  6 ca        2020-03-06    18       9.17
#>  7 ca        2020-03-07    26      11.6 
#>  8 ca        2020-03-08    19      13.4 
#>  9 ca        2020-03-09    23      16.1 
#> 10 ca        2020-03-10    22      18.4 
#> # ℹ 4,016 more rows

# slide a 7-day leading average
jhu_csse_daily_subset %>%
  group_by(geo_value) %>%
  epi_slide_opt(
    cases,
    .f = slider::slide_mean, .window_size = 7, .align = "left"
  ) %>%
  # Remove a nonessential var. to ensure new col is printed
  dplyr::select(geo_value, time_value, cases, cases_7dav = slide_value_cases) %>%
  ungroup()
#> An `epi_df` object, 4,026 x 4 with metadata:
#> * geo_type  = state
#> * time_type = day
#> * as_of     = 2024-08-23 02:40:48.296938
#> 
#> # A tibble: 4,026 × 4
#>    geo_value time_value cases cases_7dav
#>  * <chr>     <date>     <dbl>      <dbl>
#>  1 ca        2020-03-01     6       11.6
#>  2 ca        2020-03-02     4       13.4
#>  3 ca        2020-03-03     6       16.1
#>  4 ca        2020-03-04    11       18.4
#>  5 ca        2020-03-05    10       20.4
#>  6 ca        2020-03-06    18       25.1
#>  7 ca        2020-03-07    26       30.1
#>  8 ca        2020-03-08    19       34.4
#>  9 ca        2020-03-09    23       37.3
#> 10 ca        2020-03-10    22       56.7
#> # ℹ 4,016 more rows

# slide a 6-day center-aligned sum. This can also be done with `epi_slide_sum`
jhu_csse_daily_subset %>%
  group_by(geo_value) %>%
  epi_slide_opt(
    cases,
    .f = data.table::frollsum, .window_size = 6, .align = "center"
  ) %>%
  # Remove a nonessential var. to ensure new col is printed
  dplyr::select(geo_value, time_value, cases, cases_7dav = slide_value_cases) %>%
  ungroup()
#> An `epi_df` object, 4,026 x 4 with metadata:
#> * geo_type  = state
#> * time_type = day
#> * as_of     = 2024-08-23 02:40:48.296938
#> 
#> # A tibble: 4,026 × 4
#>    geo_value time_value cases cases_7dav
#>  * <chr>     <date>     <dbl>      <dbl>
#>  1 ca        2020-03-01     6         NA
#>  2 ca        2020-03-02     4         NA
#>  3 ca        2020-03-03     6         NA
#>  4 ca        2020-03-04    11         55
#>  5 ca        2020-03-05    10         75
#>  6 ca        2020-03-06    18         90
#>  7 ca        2020-03-07    26        107
#>  8 ca        2020-03-08    19        118
#>  9 ca        2020-03-09    23        133
#> 10 ca        2020-03-10    22        158
#> # ℹ 4,016 more rows