Optimized slide function for performing rolling averages on an epi_df object
Source: R/slide.R
epi_slide_mean.Rd
Slides an n-timestep mean over variables in an epi_df object. See the slide vignette for examples.
Usage
epi_slide_mean(
.x,
.col_names,
...,
.window_size = 1,
.align = c("right", "center", "left"),
.ref_time_values = NULL,
.all_rows = FALSE
)
Arguments
- .x
The epi_df object under consideration, grouped or ungrouped. If ungrouped, all data in .x will be treated as part of a single data group.
- .col_names
<tidy-select> An unquoted column name (e.g., cases), multiple column names (e.g., c(cases, deaths)), another tidy-select expression, or a character vector (e.g., c("cases", "deaths")). Variable names can be used as if they were positions in the data frame, so expressions like x:y can be used to select a range of variables.
The tidy-selection renaming interface is not supported and cannot be used to provide output column names; to customize the output column names, use dplyr::rename after the slide.
- ...
Additional arguments to pass to the slide computation .f, for example, algo or na.rm in data.table functions. You don't need to specify .x, .window_size, or .align (or before/after for slider functions).
- .window_size
The size of the sliding window. By default, this is 1, meaning that only the current ref_time_value is included. The accepted values here depend on the time_value column:
  - if time_type is Date and the cadence is daily, then .window_size can be an integer (which will be interpreted in units of days) or a difftime with units "days"
  - if time_type is Date and the cadence is weekly, then .window_size must be a difftime with units "weeks"
  - if time_type is an integer, then .window_size must be an integer
- .align
The alignment of the sliding window. If right (default), then the window has its end at the reference time; if center, then the window is centered at the reference time; if left, then the window has its start at the reference time. If the alignment is center and the window size is odd, then the window will have floor(window_size/2) points before and after the reference time. If the window size is even, then the window will be asymmetric and have one less value on the right side of the reference time (assuming time increases from left to right).
- .ref_time_values
Time values for sliding computations; each element of this vector serves as the reference time point for one sliding window. If NULL (the default), this is set to all unique time values in the underlying data table.
- .all_rows
If .all_rows = TRUE, then all rows of .x will be kept in the output even with .ref_time_values provided, with some type of missing value marker for the slide computation output column(s) for time_values outside .ref_time_values; otherwise, there will be one row for each row in .x that had a time_value in .ref_time_values. Default is FALSE. The missing value marker is the result of vctrs::vec_casting NA to the type of the slide computation output.
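The accepted .window_size forms described above can be built with base R alone. The following is a minimal sketch; the variable names are illustrative and not part of the package.

```r
# Daily data: a number (interpreted as days) or a difftime with units "days".
window_daily_num <- 7
window_daily_dt <- as.difftime(7, units = "days")

# Weekly data: a difftime with units "weeks".
window_weekly <- as.difftime(2, units = "weeks")

# Integer time_type: a plain integer-valued window size.
window_integer <- 3

# Inspect the units carried by the difftime objects.
units(window_daily_dt)
units(window_weekly)
```

Any of these values would then be passed as the .window_size argument, depending on the time_type of the epi_df.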
Value
An epi_df object given by appending one or more new columns to .x, one for each column selected by .col_names. In the examples below, the column computed from cases is named slide_value_cases.
Details
Wrapper around epi_slide_opt with .f = data.table::frollmean.
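To make the wrapped computation concrete, the sketch below calls data.table::frollmean directly on the first ten ca case counts shown in the Examples output. It assumes the data.table package is installed, and it only illustrates the underlying rolling mean on a bare vector, not how epi_slide_mean handles groups or time values internally.

```r
# First ten `cases` values for ca, copied from the Examples output.
cases <- c(6, 4, 6, 11, 10, 18, 26, 19, 23, 22)

# Trailing (right-aligned) 7-point mean; positions without a full
# window are filled with NA.
trailing <- data.table::frollmean(cases, n = 7, align = "right")

# With na.rm = TRUE, NA values inside a window are dropped before averaging.
cases_with_gap <- replace(cases, 5, NA)
trailing_na_rm <- data.table::frollmean(cases_with_gap, n = 7, na.rm = TRUE)

round(trailing, 1)
# NA NA NA NA NA NA 11.6 13.4 16.1 18.4
```

The rounded values agree with the cases_7dav column printed in the first example.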
To "slide" means to apply a function or formula over a rolling window. The .window_size arg determines the width of the window (including the reference time) and the .align arg governs how the window is aligned (see below for examples). The .ref_time_values arg controls which time values to consider for the slide and .all_rows allows you to keep NAs around.
epi_slide_*() does not require a complete window (such as on the left boundary of the dataset) and will attempt to perform the computation anyway. The issue of what to do with partial computations (those run on incomplete windows) is therefore left up to the user, either through the specified function or formula f, or through post-processing.
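One possible post-processing approach is sketched below in base R, using the first ten ca case counts from the Examples output. It computes trailing means on whatever part of the window exists, then blanks out the results that came from incomplete windows. The variable names are illustrative, and this is not how the package computes its output.

```r
# First ten `cases` values for ca, copied from the Examples output.
cases <- c(6, 4, 6, 11, 10, 18, 26, 19, 23, 22)
window_size <- 7

# Trailing means over whatever part of the window exists (base R only).
partial_mean <- vapply(
  seq_along(cases),
  function(i) mean(cases[max(1, i - window_size + 1):i]),
  numeric(1)
)

# Number of observations that were actually available in each window.
n_obs <- pmin(seq_along(cases), window_size)

# Post-processing: keep only results computed on complete windows.
complete_only <- ifelse(n_obs == window_size, partial_mean, NA_real_)
```

Here partial_mean starts 6, 5, 5.33, like the na.rm = TRUE example, while complete_only is NA for the first six positions, like the default example.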
Let's look at some window examples, assuming that the reference time value is "tv". With .align = "right" and .window_size = 3, the window will be:
time_values: tv - 3, tv - 2, tv - 1, tv, tv + 1, tv + 2, tv + 3
window: tv - 2, tv - 1, tv
With .align = "center" and .window_size = 3, the window will be:
time_values: tv - 3, tv - 2, tv - 1, tv, tv + 1, tv + 2, tv + 3
window: tv - 1, tv, tv + 1
With .align = "center" and .window_size = 4, the window will be:
time_values: tv - 3, tv - 2, tv - 1, tv, tv + 1, tv + 2, tv + 3
window: tv - 2, tv - 1, tv, tv + 1
With .align = "left" and .window_size = 3, the window will be:
time_values: tv - 3, tv - 2, tv - 1, tv, tv + 1, tv + 2, tv + 3
window: tv, tv + 1, tv + 2
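The window examples above can be reproduced with a small base R helper. This is a sketch only: window_offsets is a hypothetical function written for illustration, not part of the package, and it returns offsets relative to the reference time value.

```r
# Hypothetical helper (not part of the package): offsets, relative to the
# reference time, covered by a window of a given size and alignment.
window_offsets <- function(window_size, align = c("right", "center", "left")) {
  align <- match.arg(align)
  # Number of points strictly before the reference time.
  before <- switch(align,
    right = window_size - 1,
    center = floor(window_size / 2),
    left = 0
  )
  # The remaining points (excluding the reference time itself) come after.
  after <- window_size - 1 - before
  seq(-before, after)
}

window_offsets(3, "right")   # -2 -1  0
window_offsets(3, "center")  # -1  0  1
window_offsets(4, "center")  # -2 -1  0  1
window_offsets(3, "left")    #  0  1  2
```

For an even window size with center alignment, the helper places the extra point before the reference time, matching the description of .align.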
Examples
# Assumes the epiprocess and dplyr packages are installed
library(epiprocess)
library(dplyr)

# slide a 7-day trailing average on cases
jhu_csse_daily_subset %>%
  group_by(geo_value) %>%
  epi_slide_mean(cases, .window_size = 7) %>%
  # Remove a nonessential var. to ensure new col is printed
  dplyr::select(geo_value, time_value, cases, cases_7dav = slide_value_cases) %>%
  ungroup()
#> An `epi_df` object, 4,026 x 4 with metadata:
#> * geo_type = state
#> * time_type = day
#> * as_of = 2024-08-23 02:40:48.296938
#>
#> # A tibble: 4,026 × 4
#> geo_value time_value cases cases_7dav
#> * <chr> <date> <dbl> <dbl>
#> 1 ca 2020-03-01 6 NA
#> 2 ca 2020-03-02 4 NA
#> 3 ca 2020-03-03 6 NA
#> 4 ca 2020-03-04 11 NA
#> 5 ca 2020-03-05 10 NA
#> 6 ca 2020-03-06 18 NA
#> 7 ca 2020-03-07 26 11.6
#> 8 ca 2020-03-08 19 13.4
#> 9 ca 2020-03-09 23 16.1
#> 10 ca 2020-03-10 22 18.4
#> # ℹ 4,016 more rows
# slide a 7-day trailing average on cases. Adjust `frollmean` settings for speed
# and accuracy, and to allow partially-missing windows.
jhu_csse_daily_subset %>%
  group_by(geo_value) %>%
  epi_slide_mean(
    cases,
    .window_size = 7,
    # `frollmean` options
    na.rm = TRUE, algo = "exact", hasNA = TRUE
  ) %>%
  dplyr::select(geo_value, time_value, cases, cases_7dav = slide_value_cases) %>%
  ungroup()
#> An `epi_df` object, 4,026 x 4 with metadata:
#> * geo_type = state
#> * time_type = day
#> * as_of = 2024-08-23 02:40:48.296938
#>
#> # A tibble: 4,026 × 4
#> geo_value time_value cases cases_7dav
#> * <chr> <date> <dbl> <dbl>
#> 1 ca 2020-03-01 6 6
#> 2 ca 2020-03-02 4 5
#> 3 ca 2020-03-03 6 5.33
#> 4 ca 2020-03-04 11 6.75
#> 5 ca 2020-03-05 10 7.4
#> 6 ca 2020-03-06 18 9.17
#> 7 ca 2020-03-07 26 11.6
#> 8 ca 2020-03-08 19 13.4
#> 9 ca 2020-03-09 23 16.1
#> 10 ca 2020-03-10 22 18.4
#> # ℹ 4,016 more rows
# slide a 7-day trailing average, stating the default alignment explicitly
# (.align = "right"; a leading average would use .align = "left")
jhu_csse_daily_subset %>%
  group_by(geo_value) %>%
  epi_slide_mean(cases, .window_size = 7, .align = "right") %>%
  # Remove a nonessential var. to ensure new col is printed
  dplyr::select(geo_value, time_value, cases, cases_7dav = slide_value_cases) %>%
  ungroup()
#> An `epi_df` object, 4,026 x 4 with metadata:
#> * geo_type = state
#> * time_type = day
#> * as_of = 2024-08-23 02:40:48.296938
#>
#> # A tibble: 4,026 × 4
#> geo_value time_value cases cases_7dav
#> * <chr> <date> <dbl> <dbl>
#> 1 ca 2020-03-01 6 NA
#> 2 ca 2020-03-02 4 NA
#> 3 ca 2020-03-03 6 NA
#> 4 ca 2020-03-04 11 NA
#> 5 ca 2020-03-05 10 NA
#> 6 ca 2020-03-06 18 NA
#> 7 ca 2020-03-07 26 11.6
#> 8 ca 2020-03-08 19 13.4
#> 9 ca 2020-03-09 23 16.1
#> 10 ca 2020-03-10 22 18.4
#> # ℹ 4,016 more rows
# slide a 7-day center-aligned average
jhu_csse_daily_subset %>%
  group_by(geo_value) %>%
  epi_slide_mean(cases, .window_size = 7, .align = "center") %>%
  # Remove a nonessential var. to ensure new col is printed
  dplyr::select(geo_value, time_value, cases, cases_7dav = slide_value_cases) %>%
  ungroup()
#> An `epi_df` object, 4,026 x 4 with metadata:
#> * geo_type = state
#> * time_type = day
#> * as_of = 2024-08-23 02:40:48.296938
#>
#> # A tibble: 4,026 × 4
#> geo_value time_value cases cases_7dav
#> * <chr> <date> <dbl> <dbl>
#> 1 ca 2020-03-01 6 NA
#> 2 ca 2020-03-02 4 NA
#> 3 ca 2020-03-03 6 NA
#> 4 ca 2020-03-04 11 11.6
#> 5 ca 2020-03-05 10 13.4
#> 6 ca 2020-03-06 18 16.1
#> 7 ca 2020-03-07 26 18.4
#> 8 ca 2020-03-08 19 20.4
#> 9 ca 2020-03-09 23 25.1
#> 10 ca 2020-03-10 22 30.1
#> # ℹ 4,016 more rows
# slide a 14-day center-aligned average
jhu_csse_daily_subset %>%
  group_by(geo_value) %>%
  epi_slide_mean(cases, .window_size = 14, .align = "center") %>%
  # Remove a nonessential var. to ensure new col is printed
  dplyr::select(geo_value, time_value, cases, cases_14dav = slide_value_cases) %>%
  ungroup()
#> An `epi_df` object, 4,026 x 4 with metadata:
#> * geo_type = state
#> * time_type = day
#> * as_of = 2024-08-23 02:40:48.296938
#>
#> # A tibble: 4,026 × 4
#> geo_value time_value cases cases_14dav
#> * <chr> <date> <dbl> <dbl>
#> 1 ca 2020-03-01 6 NA
#> 2 ca 2020-03-02 4 NA
#> 3 ca 2020-03-03 6 NA
#> 4 ca 2020-03-04 11 NA
#> 5 ca 2020-03-05 10 NA
#> 6 ca 2020-03-06 18 NA
#> 7 ca 2020-03-07 26 NA
#> 8 ca 2020-03-08 19 23
#> 9 ca 2020-03-09 23 25.4
#> 10 ca 2020-03-10 22 36.4
#> # ℹ 4,016 more rows