Applies one or more outlier detection methods to a given signal variable, and optionally aggregates the outputs to create a consensus result. See the outliers vignette for examples.
Arguments
- x
Design points corresponding to the signal values y. Default is seq_along(y) (that is, equally spaced points from 1 to the length of y).
- y
Signal values.
- methods
A tibble specifying the method(s) to use for outlier detection, with one row per method, and the following columns:
method: Either "rm" or "stl", or a custom function for outlier detection; see details for further explanation.
args: Named list of arguments that will be passed to the detection method.
abbr: Abbreviation to use in naming output columns with results from this method.
- combiner
String, one of "median", "mean", or "none", specifying how to combine results across outlier detection methods. The combiner is applied to the thresholds that determine whether an observation is classified as an outlier, and to the replacement values for any outliers. If "none", no summarized results are calculated. Note that if the number of methods (number of rows) is odd, then "median" is equivalent to a majority vote for purposes of determining whether a given observation is an outlier.
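The majority-vote remark can be checked on a toy case. The sketch below uses base R only and made-up thresholds from three hypothetical methods for a single observation; it does not call detect_outlr() and assumes the usual situation in which the methods that flag the observation do so on the same side (here, above the upper bound).

```r
# Illustrative values only: bounds from three hypothetical methods
# for one observation y.
y <- 10
lower <- c(2, 3, 4)
upper <- c(8, 9, 12)

# Per-method decision: is y outside that method's [lower, upper] band?
votes <- y < lower | y > upper
majority_vote <- sum(votes) > length(votes) / 2

# "median" combiner: take the median of each threshold across methods,
# then apply the same outside-the-band test.
median_flag <- y < median(lower) || y > median(upper)

print(votes)
print(c(majority_vote = majority_vote, median_flag = median_flag))
```

Two of the three methods flag the observation, and the median thresholds (3 and 9) flag it as well, so both rules agree here.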
Value
A tibble with number of rows equal to length(y) and columns giving the outlier detection thresholds (lower and upper) and replacement values (replacement) from each detection method.
Details
Each outlier detection method, one per row of the passed methods tibble, is a function that must take as its first two arguments x and y, and then any number of additional arguments. The function must return a tibble with the number of rows equal to length(y), and with columns lower, upper, and replacement, representing lower and upper bounds for what would be considered an outlier, and a posited replacement value, respectively.
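As an illustration of this contract, here is a minimal sketch of a custom detection function. The name detect_outlr_iqr, its multiplier argument, and the median-plus-IQR rule are all hypothetical and not part of the package; the sketch only shows the required shape (x and y first, then extra arguments, returning a tibble with lower, upper, and replacement). It assumes the tibble package is installed.

```r
# Hypothetical custom detector (not provided by the package): flags values
# outside median +/- multiplier * IQR, using one global band for the signal.
detect_outlr_iqr <- function(x = seq_along(y), y, multiplier = 1.5) {
  center <- stats::median(y, na.rm = TRUE)
  spread <- stats::IQR(y, na.rm = TRUE)

  # One threshold pair per observation, as the contract requires.
  lower <- rep(center - multiplier * spread, length(y))
  upper <- rep(center + multiplier * spread, length(y))

  is_outlier <- !is.na(y) & (y < lower | y > upper)

  tibble::tibble(
    lower = lower,
    upper = upper,
    # Replace flagged values with the median; keep everything else as-is.
    replacement = ifelse(is_outlier, center, y)
  )
}

res <- detect_outlr_iqr(y = c(1, 2, 3, 2, 50, 3, 2))
print(res)
```

Per the description above, a function of this shape could be supplied in the method column of methods, with any extra arguments such as multiplier passed through args.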
For convenience, the outlier detection method can be specified (in the method column of methods) by a string "rm", shorthand for detect_outlr_rm(), which detects outliers via a rolling median; or by "stl", shorthand for detect_outlr_stl(), which detects outliers via an STL decomposition.
Examples
library(epiprocess)
library(dplyr) # %>%, group_by(), mutate()
library(tidyr) # unnest()

# One row per detection method; args holds that method's named arguments.
detection_methods <- dplyr::bind_rows(
  # Rolling median
  dplyr::tibble(
    method = "rm",
    args = list(list(
      detect_negatives = TRUE,
      detection_multiplier = 2.5
    )),
    abbr = "rm"
  ),
  # STL decomposition with a seasonal period of 7
  dplyr::tibble(
    method = "stl",
    args = list(list(
      detect_negatives = TRUE,
      detection_multiplier = 2.5,
      seasonal_period = 7
    )),
    abbr = "stl_seasonal"
  ),
  # Same STL settings, with seasonal_as_residual = TRUE
  dplyr::tibble(
    method = "stl",
    args = list(list(
      detect_negatives = TRUE,
      detection_multiplier = 2.5,
      seasonal_period = 7,
      seasonal_as_residual = TRUE
    )),
    abbr = "stl_reseasonal"
  )
)

# Run all three methods within each geo_value and combine them by median.
x <- incidence_num_outlier_example %>%
  dplyr::select(geo_value, time_value, cases) %>%
  as_epi_df() %>%
  group_by(geo_value) %>%
  mutate(outlier_info = detect_outlr(
    x = time_value, y = cases,
    methods = detection_methods,
    combiner = "median"
  )) %>%
  unnest(outlier_info)
#> Adding missing grouping variables: `geo_value`
#> Adding missing grouping variables: `geo_value`
#> Adding missing grouping variables: `geo_value`
#> Adding missing grouping variables: `rm_geo_value`
#> Adding missing grouping variables: `rm_geo_value`
#> Adding missing grouping variables: `rm_geo_value`
#> Adding missing grouping variables: `geo_value`
#> Adding missing grouping variables: `geo_value`
#> Adding missing grouping variables: `geo_value`
#> Adding missing grouping variables: `rm_geo_value`
#> Adding missing grouping variables: `rm_geo_value`
#> Adding missing grouping variables: `rm_geo_value`