Applies one or more outlier detection methods to a given signal variable, and optionally aggregates the outputs to create a consensus result. See the outliers vignette for examples.

Usage

detect_outlr(
  x = seq_along(y),
  y,
  methods = tibble::tibble(method = "rm", args = list(list()), abbr = "rm"),
  combiner = c("median", "mean", "none")
)

Arguments

x

Design points corresponding to the signal values y. Default is seq_along(y) (that is, equally spaced points from 1 to the length of y).

y

Signal values.

methods

A tibble specifying the method(s) to use for outlier detection, with one row per method, and the following columns:

  • method: Either "rm" or "stl", or a custom function for outlier detection; see details for further explanation.

  • args: Named list of arguments that will be passed to the detection method.

  • abbr: Abbreviation to use in naming output columns with results from this method.

combiner

String, one of "median", "mean", or "none", specifying how to combine results across outlier detection methods. The combiner is applied to the lower and upper thresholds that determine whether an observation is classified as an outlier, and to the replacement values for any outliers. If "none", no combined results are calculated. If the number of methods (number of rows of methods) is odd, then "median" is equivalent to a majority vote on whether a given observation is an outlier.
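
As a rough illustration of how a "median" combiner acts on per-method thresholds, the sketch below uses made-up bounds from three hypothetical methods and compares the median-combined classification with a majority vote. The matrices lower and upper, the values in y, and all variable names are illustrative assumptions, not output of detect_outlr().

```r
# Hypothetical per-method thresholds for four observations (rows) and three
# detection methods (columns). All numbers are made up for illustration.
y <- c(10, 50, 12, -3)
lower <- cbind(m1 = c(5, 5, 6, 0), m2 = c(4, 6, 5, -5), m3 = c(6, 4, 7, 1))
upper <- cbind(m1 = c(20, 40, 11, 9), m2 = c(22, 45, 15, 8), m3 = c(18, 55, 10, 10))

# Each method flags an observation that falls outside its own bounds.
flags <- (y < lower) | (y > upper)
majority_vote <- rowSums(flags) > ncol(flags) / 2

# A median combiner takes the row-wise median of each threshold.
combined_lower <- apply(lower, 1, median)
combined_upper <- apply(upper, 1, median)
median_flag <- (y < combined_lower) | (y > combined_upper)

print(data.frame(y, combined_lower, combined_upper, median_flag, majority_vote))
```

With an odd number of methods, the two classifications agree for every observation in this example.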

Value

A tibble with number of rows equal to length(y), and columns giving the outlier detection thresholds (lower and upper) and the replacement values (replacement) from each detection method.

Details

Each outlier detection method, one per row of the passed methods tibble, is a function that must take as its first two arguments x and y, and then any number of additional arguments. The function must return a tibble with the number of rows equal to length(y), and with columns lower, upper, and replacement, representing lower and upper bounds for what would be considered an outlier, and a posited replacement value, respectively.
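
The contract described above can be sketched with a small custom method. The function mad_bounds_method() and its multiplier argument are hypothetical names introduced here for illustration; the rule it applies (a global median plus or minus a multiple of the MAD) is an assumption chosen for simplicity and is not one of the package's built-in methods.

```r
# Hypothetical custom detector, not part of the package. It takes x and y as
# its first two arguments (x is unused by this simple rule) and returns one
# row per element of y with columns lower, upper, and replacement.
mad_bounds_method <- function(x, y, multiplier = 3) {
  center <- stats::median(y, na.rm = TRUE)
  spread <- stats::mad(y, center = center, na.rm = TRUE)
  lower <- rep(center - multiplier * spread, length(y))
  upper <- rep(center + multiplier * spread, length(y))
  tibble::tibble(
    lower = lower,
    upper = upper,
    # Replace out-of-bounds values with the median; keep the rest unchanged.
    replacement = ifelse(y < lower | y > upper, center, y)
  )
}

y <- c(10, 12, 11, 13, 95, 12, 11)
res <- mad_bounds_method(x = seq_along(y), y = y)
print(res)

# Assumed usage (not run here): because `method` would hold a function rather
# than a string, the column would presumably need to be a list, e.g.
# methods <- tibble::tibble(method = list(mad_bounds_method),
#                           args = list(list(multiplier = 3)), abbr = "mad")
```

Calling the function directly, as done here, is a quick way to check that a custom method returns the expected columns and row count before passing it in the methods tibble.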

For convenience, the outlier detection method can be specified (in the method column of methods) by a string "rm", shorthand for detect_outlr_rm(), which detects outliers via a rolling median; or by "stl", shorthand for detect_outlr_stl(), which detects outliers via an STL decomposition.

Examples

detection_methods <- dplyr::bind_rows(
  dplyr::tibble(
    method = "rm",
    args = list(list(
      detect_negatives = TRUE,
      detection_multiplier = 2.5
    )),
    abbr = "rm"
  ),
  dplyr::tibble(
    method = "stl",
    args = list(list(
      detect_negatives = TRUE,
      detection_multiplier = 2.5,
      seasonal_period = 7
    )),
    abbr = "stl_seasonal"
  ),
  dplyr::tibble(
    method = "stl",
    args = list(list(
      detect_negatives = TRUE,
      detection_multiplier = 2.5,
      seasonal_period = 7,
      seasonal_as_residual = TRUE
    )),
    abbr = "stl_reseasonal"
  )
)

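# Run all three methods within each geo_value group and combine them with the
# median; the result is a tibble column that is then unnested into columns.
x <- incidence_num_outlier_example %>%
  dplyr::select(geo_value, time_value, cases) %>%
  as_epi_df() %>%
  dplyr::group_by(geo_value) %>%
  dplyr::mutate(outlier_info = detect_outlr(
    x = time_value, y = cases,
    methods = detection_methods,
    combiner = "median"
  )) %>%
  tidyr::unnest(outlier_info)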
#> Adding missing grouping variables: `geo_value`
#> Adding missing grouping variables: `geo_value`
#> Adding missing grouping variables: `geo_value`
#> Adding missing grouping variables: `rm_geo_value`
#> Adding missing grouping variables: `rm_geo_value`
#> Adding missing grouping variables: `rm_geo_value`
#> Adding missing grouping variables: `geo_value`
#> Adding missing grouping variables: `geo_value`
#> Adding missing grouping variables: `geo_value`
#> Adding missing grouping variables: `rm_geo_value`
#> Adding missing grouping variables: `rm_geo_value`
#> Adding missing grouping variables: `rm_geo_value`