
Detects outliers based on a seasonal-trend decomposition using LOESS (STL).

Usage

detect_outlr_stl(
  x = seq_along(y),
  y,
  n_trend = 21,
  n_seasonal = 21,
  n_threshold = 21,
  seasonal_period,
  seasonal_as_residual = FALSE,
  log_transform = FALSE,
  detect_negatives = FALSE,
  detection_multiplier = 2,
  min_radius = 0,
  replacement_multiplier = 0
)

Arguments

x

Design points corresponding to the signal values y. Default is seq_along(y) (that is, equally spaced points from 1 to the length of y).

y

Signal values.

n_trend

Number of time steps to use in the rolling window for the trend. Default is 21.

n_seasonal

Number of time steps to use in the rolling window for seasonality. Default is 21. Can also be the string "periodic". See s.window in stats::stl.

n_threshold

Number of time steps to use in the rolling window for the IQR outlier thresholds. Default is 21.

seasonal_period

Integer specifying the period of seasonality. For example, for daily data, a period of 7 means weekly seasonality. It must be strictly greater than 1. It also affects the size of the low-pass filter window; see l.window in stats::stl.

seasonal_as_residual

Boolean specifying whether the seasonal (e.g., weekly) component should be treated as part of the residual component instead of as part of the predictions. The default, FALSE, treats it as part of the predictions, so large seasonal components will not cause points to be flagged as outliers. TRUE may instead flag the extremes of large seasonal variations as outliers; n_seasonal and seasonal_period still affect the result, though, because they influence the estimation of the trend component.

log_transform

Should a log transform be applied before running outlier detection? Default is FALSE. If TRUE and zeros are present, 1 is added to all values before taking the log (that is, log(y + 1) is used).

detect_negatives

Should negative values automatically count as outliers? Default is FALSE.

detection_multiplier

Value determining how far the outlier detection thresholds are from the rolling median. The thresholds are calculated as (rolling median) +/- (detection multiplier) * (rolling IQR). Default is 2.

min_radius

Minimum distance between the rolling median and each threshold, measured on the transformed scale (that is, after any log transform). Default is 0.

replacement_multiplier

Value determining how far the replacement values are from the rolling median. A value within the detection thresholds is kept as is; a value outside them is replaced by the nearer of (rolling median) - (replacement multiplier) * (rolling IQR) and (rolling median) + (replacement multiplier) * (rolling IQR). Default is 0, which replaces outliers with the rolling median itself.
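The threshold and replacement rules described by n_threshold, detection_multiplier, min_radius, and replacement_multiplier can be sketched in base R as below. This is a minimal illustration under stated assumptions, not the package's implementation: the helper names rolling_iqr(), detection_thresholds(), and replace_outliers() are hypothetical, the rolling window is assumed to be centered and truncated at the edges of the series, and center stands for the rolling median (or, for this function, the STL fitted values).

```r
# Hypothetical helpers illustrating the documented formulas; not package code.

# Centered rolling IQR over n time steps (n_threshold); windows are
# truncated at the start and end of the series (an assumption).
rolling_iqr <- function(z, n = 21) {
  half <- (n - 1) %/% 2
  vapply(seq_along(z), function(i) {
    stats::IQR(z[max(1, i - half):min(length(z), i + half)])
  }, numeric(1))
}

# Thresholds: center +/- max(min_radius, detection_multiplier * IQR)
detection_thresholds <- function(center, iqr, detection_multiplier = 2,
                                 min_radius = 0) {
  radius <- pmax(min_radius, detection_multiplier * iqr)
  data.frame(lower = center - radius, upper = center + radius)
}

# Keep values inside the thresholds; move values below (above) them to
# center - (+) replacement_multiplier * IQR
replace_outliers <- function(y, center, iqr, lower, upper,
                             replacement_multiplier = 0) {
  ifelse(y < lower, center - replacement_multiplier * iqr,
         ifelse(y > upper, center + replacement_multiplier * iqr, y))
}

# min_radius = 3 widens the first pair (radius 3 instead of 2 * 0.5 = 1)
thr <- detection_thresholds(center = c(10, 10), iqr = c(0.5, 4), min_radius = 3)
thr # lower: 7, 2; upper: 13, 18

# With the default replacement_multiplier = 0, outliers become the center value
replace_outliers(c(5, 12, 30), center = 10, iqr = 2, lower = 6, upper = 14)
# 10 12 10
```

Taking the larger of min_radius and the IQR-based radius means a flat stretch of data (rolling IQR near 0) does not produce thresholds so tight that ordinary noise gets flagged.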

Value

A tibble with number of rows equal to length(y) and columns giving the outlier detection thresholds (lower and upper) and the replacement values (replacement).

Details

The STL decomposition is computed using stats::stl(). Once computed, the outlier detection method is analogous to the rolling median method in detect_outlr_rm(), except with the fitted values and residuals from the STL decomposition taking the place of the rolling median and residuals to the rolling median, respectively.
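The first step described above, turning an STL decomposition into fitted values and residuals, can be sketched with stats::stl() directly. This sketch rests on assumptions this page does not confirm: n_trend is mapped to t.window, n_seasonal to s.window (which also accepts "periodic"), seasonal_period to the frequency of the ts object (which in turn sets stl()'s default l.window), and stl_fitted_resid() is a hypothetical helper name. The seasonal_as_residual switch decides whether the seasonal component goes into the fitted values or into the residuals.

```r
# Hypothetical helper, not package code: STL fitted values and residuals.
# ASSUMED mapping: n_trend -> t.window, n_seasonal -> s.window,
# seasonal_period -> ts() frequency (which sets stl()'s default l.window).
stl_fitted_resid <- function(y, seasonal_period, n_trend = 21, n_seasonal = 21,
                             seasonal_as_residual = FALSE) {
  # stl() needs frequency > 1 and more than two full periods of data
  y_ts <- stats::ts(y, frequency = seasonal_period)
  comps <- stats::stl(y_ts, s.window = n_seasonal, t.window = n_trend)$time.series
  trend <- as.numeric(comps[, "trend"])
  seasonal <- as.numeric(comps[, "seasonal"])
  remainder <- as.numeric(comps[, "remainder"])
  if (seasonal_as_residual) {
    # Seasonal swings count as residual, so their extremes can be flagged
    list(fitted = trend, resid = seasonal + remainder)
  } else {
    # Seasonal swings are part of the prediction
    list(fitted = trend + seasonal, resid = remainder)
  }
}

# Illustrative data: linear trend plus a strong weekly pattern and mild noise
set.seed(1)
t_idx <- seq_len(70)
y <- 100 + 0.5 * t_idx + 20 * sin(2 * pi * t_idx / 7) + rnorm(70, sd = 2)

as_pred <- stl_fitted_resid(y, seasonal_period = 7)
as_resid <- stl_fitted_resid(y, seasonal_period = 7, seasonal_as_residual = TRUE)

# The residual spread is far larger when the weekly pattern counts as residual
c(sd(as_pred$resid), sd(as_resid$resid))
```

Under the assumed mapping, passing n_seasonal = "periodic" forwards s.window = "periodic", which makes stl() use the same seasonal shape in every period.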

The remaining arguments, log_transform through replacement_multiplier, behave exactly as in detect_outlr_rm().
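A compact end-to-end sketch can show where the arguments shared with detect_outlr_rm() enter once STL fitted values and residuals replace the rolling median and its residuals. Everything here is an assumption for illustration: the function name stl_outlier_sketch() is hypothetical, the argument-to-stl() mapping is guessed, the rolling IQR is taken over the residuals with edge-truncated centered windows, thresholds computed on the log scale are mapped back with exp(z) - offset, and detect_negatives is omitted.

```r
# Hypothetical end-to-end sketch; not the package's implementation.
stl_outlier_sketch <- function(y, seasonal_period, n_trend = 21, n_seasonal = 21,
                               n_threshold = 21, log_transform = FALSE,
                               detection_multiplier = 2, min_radius = 0,
                               replacement_multiplier = 0) {
  # Optional log transform, padded by 1 only when zeros are present
  offset <- 0
  z <- y
  if (log_transform) {
    offset <- as.integer(any(y == 0))
    z <- log(y + offset)
  }
  # STL fitted values (trend + seasonal) stand in for the rolling median
  comps <- stats::stl(stats::ts(z, frequency = seasonal_period),
                      s.window = n_seasonal, t.window = n_trend)$time.series
  fitted <- as.numeric(comps[, "trend"] + comps[, "seasonal"])
  resid <- z - fitted
  # Centered rolling IQR of the residuals (edge windows truncated)
  half <- (n_threshold - 1) %/% 2
  iqr <- vapply(seq_along(resid), function(i) {
    stats::IQR(resid[max(1, i - half):min(length(resid), i + half)])
  }, numeric(1))
  # Thresholds and replacements on the (possibly transformed) scale
  radius <- pmax(min_radius, detection_multiplier * iqr)
  lower <- fitted - radius
  upper <- fitted + radius
  replacement <- ifelse(z < lower, fitted - replacement_multiplier * iqr,
                        ifelse(z > upper, fitted + replacement_multiplier * iqr, z))
  # Map back to the original scale when a log transform was used
  back <- if (log_transform) function(v) exp(v) - offset else identity
  data.frame(lower = back(lower), upper = back(upper),
             replacement = back(replacement))
}

# Illustrative data: weekly pattern, mild noise, and one injected spike
set.seed(42)
t_idx <- seq_len(84)
y <- 200 + 20 * sin(2 * pi * t_idx / 7) + rnorm(84, sd = 3)
y[40] <- y[40] + 150

out <- stl_outlier_sketch(y, seasonal_period = 7)
which(out$replacement != y) # expected to include the spike at position 40
out_log <- stl_outlier_sketch(y, seasonal_period = 7, log_transform = TRUE)
```

Because min_radius is applied before any back-transform in this sketch, it is a distance on the log scale when log_transform = TRUE, matching the argument description above.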

Examples

# Detects outliers based on a seasonal-trend decomposition using LOESS
library(epiprocess)
library(dplyr)
library(tidyr)

incidence_num_outlier_example %>%
  dplyr::select(geo_value, time_value, cases) %>%
  as_epi_df() %>%
  group_by(geo_value) %>%
  mutate(outlier_info = detect_outlr_stl(
    x = time_value, y = cases,
    seasonal_period = 7 # weekly seasonality for daily data
  )) %>%
  unnest(outlier_info)
#> An `epi_df` object, 730 x 6 with metadata:
#> * geo_type  = state
#> * time_type = day
#> * as_of     = 2022-05-21 22:17:14.962335
#> 
#> # A tibble: 730 × 6
#> # Groups:   geo_value [2]
#>    geo_value time_value cases  lower upper replacement
#>  * <chr>     <date>     <dbl>  <dbl> <dbl>       <dbl>
#>  1 fl        2020-06-01   667 -1193. 1233.         667
#>  2 nj        2020-06-01   486   281.  762.         486
#>  3 fl        2020-06-02   617  -691. 1890.         617
#>  4 nj        2020-06-02   658   317.  891.         658
#>  5 fl        2020-06-03  1317  -144. 2396.        1317
#>  6 nj        2020-06-03   541   292.  809.         541
#>  7 fl        2020-06-04  1419   260. 2696.        1419
#>  8 nj        2020-06-04   478   315.  792.         478
#>  9 fl        2020-06-05  1305   548. 2950.        1305
#> 10 nj        2020-06-05   825   382.  835.         825
#> # ℹ 720 more rows