epi_slide_opt allows sliding an n-timestep data.table::froll or slider::summary-slide function over variables in an epi_df object. These functions tend to be much faster than epi_slide(). See vignette("epi_df") for more examples.
epi_slide_mean is a wrapper around epi_slide_opt with .f = data.table::frollmean.
epi_slide_sum is a wrapper around epi_slide_opt with .f = data.table::frollsum.
Usage
epi_slide_opt(
  .x,
  .col_names,
  .f,
  ...,
  .window_size = NULL,
  .align = c("right", "center", "left"),
  .ref_time_values = NULL,
  .all_rows = FALSE
)

epi_slide_mean(
  .x,
  .col_names,
  ...,
  .window_size = NULL,
  .align = c("right", "center", "left"),
  .ref_time_values = NULL,
  .all_rows = FALSE
)

epi_slide_sum(
  .x,
  .col_names,
  ...,
  .window_size = NULL,
  .align = c("right", "center", "left"),
  .ref_time_values = NULL,
  .all_rows = FALSE
)
Arguments
- .x
An epi_df object. If ungrouped, we group by geo_value and any columns in other_keys. If grouped, we make sure the grouping is by geo_value and other_keys.
- .col_names
<tidy-select> An unquoted column name (e.g., cases), multiple column names (e.g., c(cases, deaths)), other tidy-select expression, or a vector of characters (e.g., c("cases", "deaths")). Variable names can be used as if they were positions in the data frame, so expressions like x:y can be used to select a range of variables.
The tidy-selection renaming interface is not supported, and cannot be used to provide output column names; if you want to customize the output column names, use dplyr::rename after the slide.
- .f
Function; together with ... specifies the computation to slide. .f must be one of data.table's rolling functions (frollmean, frollsum, frollapply; see data.table::roll) or one of slider's specialized sliding functions (slide_mean, slide_sum, etc.; see slider::summary-slide).
The optimized data.table and slider functions can't be directly passed as the computation function in epi_slide without careful handling to make sure each computation group spans .window_size dates rather than .window_size data points. epi_slide_opt (and the wrapper functions epi_slide_mean and epi_slide_sum) take care of window completion automatically to prevent associated errors.
- ...
Additional arguments to pass to the slide computation .f, for example, algo or na.rm in data.table functions. You don't need to specify .x, .window_size, or .align (or before/after for slider functions).
- .window_size
The size of the sliding window. The accepted values depend on the type of the time_value column in .x:
  - if time type is Date and the cadence is daily, then .window_size can be an integer (which will be interpreted in units of days) or a difftime with units "days"
  - if time type is Date and the cadence is weekly, then .window_size must be a difftime with units "weeks"
  - if time type is a yearmonth or an integer, then .window_size must be an integer
- .align
The alignment of the sliding window.
  - If "right" (default), then the window has its end at the reference time. This is likely the most common use case, e.g. .window_size = 7 and .align = "right" slides over the past week of data.
  - If "left", then the window has its start at the reference time.
  - If "center", then the window is centered at the reference time. If the window size is odd, then the window will have floor(window_size/2) points before and after the reference time; if the window size is even, then the window will be asymmetric and have one more value before the reference time than after.
- .ref_time_values
The time values at which to compute the slide values. By default, this is all the unique time values in .x.
- .all_rows
If .all_rows = FALSE, the default, then the output epi_df will have only the rows that had a time_value in .ref_time_values. Otherwise, all the rows from .x are included, with slide values for rows outside .ref_time_values filled with a missing value marker (typically NA, but more technically the result of vctrs::vec_cast-ing NA to the type of the slide computation output).
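The distinction drawn under .f between a window of .window_size dates and a window of .window_size points can be seen without epiprocess at all. The following is a minimal sketch, assuming only that data.table is installed. The data frame obs, its values, and the column names mean_3d and mean_3d_narm are made up for illustration, and the merge-based gap filling is just one simple way to get date-based windows; it is not claimed to be how epi_slide_opt performs window completion internally.

```r
library(data.table)

# Hypothetical daily series in which 2020-03-03 is missing.
obs <- data.frame(
  time_value = as.Date(c("2020-03-01", "2020-03-02", "2020-03-04", "2020-03-05")),
  cases = c(6, 4, 11, 10)
)

# Naive approach: a window of 3 *rows*. At 2020-03-04 the window silently
# reaches back to 2020-03-01, so it covers 4 calendar days, not 3.
naive <- frollmean(obs$cases, n = 3)

# Date-based approach: insert the missing date with an NA value first,
# so that 3 rows always correspond to 3 consecutive dates.
all_dates <- data.frame(
  time_value = seq(min(obs$time_value), max(obs$time_value), by = "day")
)
completed <- merge(all_dates, obs, by = "time_value", all.x = TRUE)

# With the default na.rm = FALSE, any window touching the gap is missing.
completed$mean_3d <- frollmean(completed$cases, n = 3)

# With na.rm = TRUE, the gap is skipped and the mean uses the values present.
completed$mean_3d_narm <- frollmean(completed$cases, n = 3, na.rm = TRUE)

print(naive)
print(completed)
```

In the naive result, the value at 2020-03-04 averages 6, 4, and 11, even though 6 was observed three days earlier. After completion, the same position either reports a missing value or, with na.rm = TRUE, averages only the observations that fall inside the three-day window.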
See also
epi_slide for the more general slide function
Examples
# Compute a 7-day trailing average on cases.
cases_deaths_subset %>%
  group_by(geo_value) %>%
  epi_slide_opt(cases, .f = data.table::frollmean, .window_size = 7) %>%
  dplyr::select(geo_value, time_value, cases, cases_7dav = slide_value_cases)
#> An `epi_df` object, 4,026 x 4 with metadata:
#> * geo_type = state
#> * time_type = day
#> * as_of = 2024-03-20
#>
#> # A tibble: 4,026 × 4
#> # Groups: geo_value [6]
#> geo_value time_value cases cases_7dav
#> * <chr> <date> <dbl> <dbl>
#> 1 ca 2020-03-01 6 NA
#> 2 ca 2020-03-02 4 NA
#> 3 ca 2020-03-03 6 NA
#> 4 ca 2020-03-04 11 NA
#> 5 ca 2020-03-05 10 NA
#> 6 ca 2020-03-06 18 NA
#> 7 ca 2020-03-07 26 11.6
#> 8 ca 2020-03-08 19 13.4
#> 9 ca 2020-03-09 23 16.1
#> 10 ca 2020-03-10 22 18.4
#> # ℹ 4,016 more rows
# Same as above, but adjust `frollmean` settings for speed, accuracy, and
# to allow partially-missing windows.
cases_deaths_subset %>%
  group_by(geo_value) %>%
  epi_slide_opt(
    cases,
    .f = data.table::frollmean, .window_size = 7,
    algo = "exact", hasNA = TRUE, na.rm = TRUE
  ) %>%
  dplyr::select(geo_value, time_value, cases, cases_7dav = slide_value_cases)
#> An `epi_df` object, 4,026 x 4 with metadata:
#> * geo_type = state
#> * time_type = day
#> * as_of = 2024-03-20
#>
#> # A tibble: 4,026 × 4
#> # Groups: geo_value [6]
#> geo_value time_value cases cases_7dav
#> * <chr> <date> <dbl> <dbl>
#> 1 ca 2020-03-01 6 6
#> 2 ca 2020-03-02 4 5
#> 3 ca 2020-03-03 6 5.33
#> 4 ca 2020-03-04 11 6.75
#> 5 ca 2020-03-05 10 7.4
#> 6 ca 2020-03-06 18 9.17
#> 7 ca 2020-03-07 26 11.6
#> 8 ca 2020-03-08 19 13.4
#> 9 ca 2020-03-09 23 16.1
#> 10 ca 2020-03-10 22 18.4
#> # ℹ 4,016 more rows
# Compute a 7-day trailing average on cases, using the epi_slide_mean wrapper.
cases_deaths_subset %>%
  group_by(geo_value) %>%
  epi_slide_mean(cases, .window_size = 7) %>%
  dplyr::select(geo_value, time_value, cases, cases_7dav = slide_value_cases)
#> An `epi_df` object, 4,026 x 4 with metadata:
#> * geo_type = state
#> * time_type = day
#> * as_of = 2024-03-20
#>
#> # A tibble: 4,026 × 4
#> # Groups: geo_value [6]
#> geo_value time_value cases cases_7dav
#> * <chr> <date> <dbl> <dbl>
#> 1 ca 2020-03-01 6 NA
#> 2 ca 2020-03-02 4 NA
#> 3 ca 2020-03-03 6 NA
#> 4 ca 2020-03-04 11 NA
#> 5 ca 2020-03-05 10 NA
#> 6 ca 2020-03-06 18 NA
#> 7 ca 2020-03-07 26 11.6
#> 8 ca 2020-03-08 19 13.4
#> 9 ca 2020-03-09 23 16.1
#> 10 ca 2020-03-10 22 18.4
#> # ℹ 4,016 more rows
# Same as above, but adjust `frollmean` settings for speed, accuracy, and
# to allow partially-missing windows.
cases_deaths_subset %>%
  group_by(geo_value) %>%
  epi_slide_mean(
    cases,
    .window_size = 7,
    na.rm = TRUE, algo = "exact", hasNA = TRUE
  ) %>%
  dplyr::select(geo_value, time_value, cases, cases_7dav = slide_value_cases)
#> An `epi_df` object, 4,026 x 4 with metadata:
#> * geo_type = state
#> * time_type = day
#> * as_of = 2024-03-20
#>
#> # A tibble: 4,026 × 4
#> # Groups: geo_value [6]
#> geo_value time_value cases cases_7dav
#> * <chr> <date> <dbl> <dbl>
#> 1 ca 2020-03-01 6 6
#> 2 ca 2020-03-02 4 5
#> 3 ca 2020-03-03 6 5.33
#> 4 ca 2020-03-04 11 6.75
#> 5 ca 2020-03-05 10 7.4
#> 6 ca 2020-03-06 18 9.17
#> 7 ca 2020-03-07 26 11.6
#> 8 ca 2020-03-08 19 13.4
#> 9 ca 2020-03-09 23 16.1
#> 10 ca 2020-03-10 22 18.4
#> # ℹ 4,016 more rows
# Compute a 7-day trailing sum on cases.
cases_deaths_subset %>%
  group_by(geo_value) %>%
  epi_slide_sum(cases, .window_size = 7) %>%
  dplyr::select(geo_value, time_value, cases, cases_7dsum = slide_value_cases)
#> An `epi_df` object, 4,026 x 4 with metadata:
#> * geo_type = state
#> * time_type = day
#> * as_of = 2024-03-20
#>
#> # A tibble: 4,026 × 4
#> # Groups: geo_value [6]
#> geo_value time_value cases cases_7dsum
#> * <chr> <date> <dbl> <dbl>
#> 1 ca 2020-03-01 6 NA
#> 2 ca 2020-03-02 4 NA
#> 3 ca 2020-03-03 6 NA
#> 4 ca 2020-03-04 11 NA
#> 5 ca 2020-03-05 10 NA
#> 6 ca 2020-03-06 18 NA
#> 7 ca 2020-03-07 26 81
#> 8 ca 2020-03-08 19 94
#> 9 ca 2020-03-09 23 113
#> 10 ca 2020-03-10 22 129
#> # ℹ 4,016 more rows
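The examples above all use the default .align = "right". As a rough illustration of the three alignments described under .align, the sketch below calls data.table::frollmean directly on a plain numeric vector with an odd window size. The vector x is hypothetical, and the sketch only shows the underlying rolling function's own align argument; it does not show how epi_slide_opt handles .align on an epi_df.

```r
library(data.table)

# Hypothetical series of five equally spaced observations.
x <- c(1, 2, 3, 4, 5)

# Window ends at the reference point (trailing window).
right <- frollmean(x, n = 3, align = "right")

# Window is centered on the reference point: one point before, one after.
center <- frollmean(x, n = 3, align = "center")

# Window starts at the reference point (leading window).
left <- frollmean(x, n = 3, align = "left")

print(right)   # NA NA  2  3  4
print(center)  # NA  2  3  4 NA
print(left)    #  2  3  4 NA NA
```

The same three means (2, 3, 4) appear in each result; only the reference point they are attached to changes, along with which positions lack a complete window.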