dplyr::filter for epi_archives
Usage
# S3 method for class 'epi_archive'
filter(.data, ..., .by = NULL, .format_aware = FALSE)
Arguments
- .data
 an epi_archive
- ...
 as in dplyr::filter; using the version column is not allowed unless you use .format_aware = TRUE; see details.
- .by
 as in dplyr::filter
- .format_aware
 optional, TRUE or FALSE; default FALSE. See details.
Details
By default, filtering on the version column or on measurement columns is
disabled, because it is easy to get unexpected results. First check whether
epix_as_of or epix_slide covers your use case: for version selection, see
the version arg of epix_as_of or the .versions arg of epix_slide; for
measurement column-based filtering, try filtering after epix_as_of or
inside the .f in epix_slide. If they don't cover your use case, then
you can set .format_aware = TRUE to enable usage of these columns, but be
careful to:
- Factor in that .data$DT may have been converted into a compact format based on diffing consecutive versions, and the last version of each observation in .data$DT will always be carried forward to future versions; see details of as_epi_archive.
- Set clobberable_versions_start and versions_end of the result appropriately after the filter call. They will be initialized with the same values as in .data.
dplyr::filter also has an optional argument .preserve, which should not
have an impact on (ungrouped) epi_archives, and grouped_epi_archives do
not currently support dplyr::filter.
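As a rough illustration of the alternative suggested above, measurement-column filtering can be done on a single snapshot rather than on the archive itself. This is a minimal sketch, not the package's prescribed workflow: the version date and the threshold of 10 are arbitrary values chosen for illustration, and it assumes archive_cases_dv_subset is available in your session as it is in the Examples section.

```r
library(epiprocess)
library(dplyr)

# Assumption: `archive_cases_dv_subset` is available as in the Examples
# section (depending on your setup, it may come from the epidatasets package).

# Take the snapshot of the data as it stood at one version; the result is an
# `epi_df` rather than an `epi_archive`, so it carries no version history.
snapshot <- archive_cases_dv_subset %>%
  epix_as_of(as.Date("2021-06-01")) # illustrative version date

# Filtering on a measurement column is unambiguous on a snapshot, so no
# `.format_aware = TRUE` is needed here.
high_rate <- snapshot %>%
  filter(case_rate_7d_av > 10) # illustrative threshold
```

Because the snapshot holds one value per geo_value and time_value, the filter condition applies to the data as it was known at that version, which sidesteps the compact diff-based storage that makes the same filter risky on the archive.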
Examples
# Filter to one location and a particular time range:
archive_cases_dv_subset %>%
  filter(geo_value == "fl", time_value >= as.Date("2020-10-01"))
#> → An `epi_archive` object, with metadata:
#> ℹ Min/max time values: 2020-10-01 / 2021-11-30
#> ℹ First/last version with update: 2020-10-02 / 2021-12-01
#> ℹ Versions end: 2021-12-01
#> ℹ A preview of the table (23619 rows x 5 columns):
#> Key: <geo_value, time_value, version>
#>        geo_value time_value    version percent_cli case_rate_7d_av
#>           <char>     <Date>     <Date>       <num>           <num>
#>     1:        fl 2020-10-01 2020-10-02          NA       10.711424
#>     2:        fl 2020-10-01 2020-10-04    5.751017       10.711424
#>     3:        fl 2020-10-01 2020-10-05    5.721662       10.711424
#>     4:        fl 2020-10-01 2020-10-06    5.465139       10.711424
#>     5:        fl 2020-10-01 2020-10-07    5.278582       10.711424
#>    ---                                                            
#> 23615:        fl 2021-11-26 2021-11-29    1.077506        0.000000
#> 23616:        fl 2021-11-27 2021-11-28          NA        0.000000
#> 23617:        fl 2021-11-28 2021-11-29          NA        0.000000
#> 23618:        fl 2021-11-29 2021-11-30          NA        0.000000
#> 23619:        fl 2021-11-30 2021-12-01          NA        5.844879
# Convert to weekly by taking the Saturday data for each week, so that
# `case_rate_7d_av` represents a Sun--Sat average:
archive_cases_dv_subset %>%
  filter(as.POSIXlt(time_value)$wday == 6L)
#> → An `epi_archive` object, with metadata:
#> ℹ Min/max time values: 2020-06-06 / 2021-11-27
#> ℹ First/last version with update: 2020-06-07 / 2021-11-29
#> ℹ Versions end: 2021-12-01
#> ℹ A preview of the table (18416 rows x 5 columns):
#> Key: <geo_value, time_value, version>
#>        geo_value time_value    version percent_cli case_rate_7d_av
#>           <char>     <Date>     <Date>       <num>           <num>
#>     1:        ca 2020-06-06 2020-06-07          NA        6.760295
#>     2:        ca 2020-06-06 2020-06-10    1.342609        6.760295
#>     3:        ca 2020-06-06 2020-06-11    1.581691        6.760295
#>     4:        ca 2020-06-06 2020-06-12    1.856477        6.760295
#>     5:        ca 2020-06-06 2020-06-13    1.901306        6.760295
#>    ---                                                            
#> 18412:        tx 2021-11-20 2021-11-25    2.114713       11.179158
#> 18413:        tx 2021-11-20 2021-11-26    2.089479       11.179158
#> 18414:        tx 2021-11-20 2021-11-27    2.081636       11.179158
#> 18415:        tx 2021-11-20 2021-11-29    2.031905       11.179158
#> 18416:        tx 2021-11-27 2021-11-28          NA        7.174299
# Filtering involving the `version` column or measurement columns requires
# extra care. See epix_as_of and epix_slide instead for some common
# operations. One semi-common operation that ends up being fairly simple is
# treating observations as finalized after some amount of time, and ignoring
# any revisions that were made after that point:
archive_cases_dv_subset %>%
  filter(
    version <= time_value + as.difftime(60, units = "days"),
    .format_aware = TRUE
  )
#> → An `epi_archive` object, with metadata:
#> ℹ Min/max time values: 2020-06-01 / 2021-11-30
#> ℹ First/last version with update: 2020-06-02 / 2021-12-01
#> ℹ Versions end: 2021-12-01
#> ℹ A preview of the table (104394 rows x 5 columns):
#> Key: <geo_value, time_value, version>
#>         geo_value time_value    version percent_cli case_rate_7d_av
#>            <char>     <Date>     <Date>       <num>           <num>
#>      1:        ca 2020-06-01 2020-06-02          NA        6.628329
#>      2:        ca 2020-06-01 2020-06-06    2.140116        6.628329
#>      3:        ca 2020-06-01 2020-06-07    2.140116        6.628329
#>      4:        ca 2020-06-01 2020-06-08    2.140379        6.628329
#>      5:        ca 2020-06-01 2020-06-09    2.114430        6.628329
#>     ---                                                            
#> 104390:        tx 2021-11-26 2021-11-29    1.858596        7.957657
#> 104391:        tx 2021-11-27 2021-11-28          NA        7.174299
#> 104392:        tx 2021-11-28 2021-11-29          NA        6.834681
#> 104393:        tx 2021-11-29 2021-11-30          NA        8.841247
#> 104394:        tx 2021-11-30 2021-12-01          NA        9.566218