Detects outliers based on a seasonal-trend decomposition using LOESS (STL).
Usage
detect_outlr_stl(
  x = seq_along(y),
  y,
  n_trend = 21,
  n_seasonal = 21,
  n_threshold = 21,
  seasonal_period,
  seasonal_as_residual = FALSE,
  log_transform = FALSE,
  detect_negatives = FALSE,
  detection_multiplier = 2,
  min_radius = 0,
  replacement_multiplier = 0
)
Arguments
- x
Design points corresponding to the signal values y. Default is seq_along(y) (that is, equally-spaced points from 1 to the length of y).
- y
Signal values.
- n_trend
Number of time steps to use in the rolling window for trend. Default is 21.
- n_seasonal
Number of time steps to use in the rolling window for seasonality. Default is 21. Can also be the string "periodic". See s.window in stats::stl.
- n_threshold
Number of time steps to use in the rolling window for the IQR outlier thresholds. Default is 21.
- seasonal_period
Integer specifying the period of "seasonality". For example, for daily data, a period of 7 means weekly seasonality. It must be strictly larger than 1. Also impacts the size of the low-pass filter window; see l.window in stats::stl.
- seasonal_as_residual
Boolean specifying whether the seasonal(/weekly) component should be treated as part of the residual component instead of as part of the predictions. The default, FALSE, treats them as part of the predictions, so large seasonal(/weekly) components will not lead to flagging points as outliers. TRUE may instead consider the extrema of large seasonal variations to be outliers; n_seasonal and seasonal_period will still have an impact on the result, though, by impacting the estimation of the trend component.
- log_transform
Should a log transform be applied before running outlier detection? Default is FALSE. If TRUE, and zeros are present, then the log transform will be padded by 1.
- detect_negatives
Should negative values automatically count as outliers? Default is FALSE.
- detection_multiplier
Value determining how far the outlier detection thresholds are from the rolling median. The thresholds are calculated as (rolling median) +/- (detection multiplier) * (rolling IQR). Default is 2.
- min_radius
Minimum distance between rolling median and threshold, on transformed scale. Default is 0.
- replacement_multiplier
Value determining how far the replacement values are from the rolling median. The replacement is the original value if it is within the detection thresholds, or otherwise it is rounded to the nearest (rolling median) +/- (replacement multiplier) * (rolling IQR). Default is 0.
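The threshold and replacement arithmetic described for detection_multiplier, min_radius, and replacement_multiplier can be illustrated with a few scalars. This is a minimal sketch rather than the package's implementation: the variable names and numbers are made up for illustration, and treating min_radius as a floor on the distance to the thresholds is an assumption based on its description.

```r
# Hypothetical values at a single time step (not taken from the package).
center <- 100  # rolling median (for detect_outlr_stl, the STL fitted value)
iqr <- 10      # rolling IQR
y <- 135       # observed value

detection_multiplier <- 2
min_radius <- 0
replacement_multiplier <- 0

# Assumption: min_radius acts as a floor on the distance to the thresholds.
radius <- max(min_radius, detection_multiplier * iqr)
lower <- center - radius
upper <- center + radius

# Values inside [lower, upper] are kept as they are; values outside are
# moved to the nearer of center +/- replacement_multiplier * iqr.
replacement <- if (y < lower) {
  center - replacement_multiplier * iqr
} else if (y > upper) {
  center + replacement_multiplier * iqr
} else {
  y
}
```

With the default replacement_multiplier of 0, any flagged value is replaced by the center itself.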
Value
A tibble with number of rows equal to length(y) and columns giving the outlier detection thresholds (lower and upper) and replacement values from each detection method (replacement).
Details
The STL decomposition is computed using stats::stl(). Once computed, the outlier detection method is analogous to the rolling median method in detect_outlr_rm(), except with the fitted values and residuals from the STL decomposition taking the place of the rolling median and residuals to the rolling median, respectively.
The last set of arguments, log_transform through replacement_multiplier, are exactly as in detect_outlr_rm().
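To make the roles of the fitted values and residuals concrete, the following sketch runs stats::stl() on a synthetic daily series with a weekly pattern. It is an illustration under stated assumptions, not the code inside detect_outlr_stl(): the series, the injected spike, the variable names, and the use of robust = TRUE are all choices made for this example.

```r
# Synthetic daily series: linear trend + weekly cycle + noise (made-up data).
set.seed(1)
n <- 140
t_idx <- seq_len(n)
y <- 50 + 0.1 * t_idx + 5 * sin(2 * pi * t_idx / 7) + rnorm(n)
y[70] <- y[70] + 40  # inject one large spike

# stats::stl() requires a ts object with frequency > 1.
# frequency plays the role of seasonal_period; s.window and t.window
# play the roles of n_seasonal and n_trend.
fit <- stats::stl(
  stats::ts(y, frequency = 7),
  s.window = 21,
  t.window = 21,
  robust = TRUE  # assumption for this sketch only
)
parts <- fit$time.series  # columns: "seasonal", "trend", "remainder"

# With seasonal_as_residual = FALSE, the seasonal component counts
# toward the fitted values (the predictions).
fitted_vals <- as.numeric(parts[, "trend"] + parts[, "seasonal"])
residuals_stl <- as.numeric(parts[, "remainder"])

# Position of the largest absolute residual.
which.max(abs(residuals_stl))
```

The fitted values then stand in for the rolling median, and the outlier thresholds are built from the residuals.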
Examples
library(epiprocess)
library(dplyr)
library(tidyr)

# Detects outliers based on a seasonal-trend decomposition using LOESS
incidence_num_outlier_example %>%
  dplyr::select(geo_value, time_value, cases) %>%
  as_epi_df() %>%
  group_by(geo_value) %>%
  mutate(outlier_info = detect_outlr_stl(
    x = time_value, y = cases,
    seasonal_period = 7 # weekly seasonality for daily data
  )) %>%
  unnest(outlier_info)
#> An `epi_df` object, 730 x 6 with metadata:
#> * geo_type = state
#> * time_type = day
#> * as_of = 2022-05-21 22:17:14.962335
#>
#> # A tibble: 730 × 6
#> # Groups: geo_value [2]
#>    geo_value time_value cases  lower upper replacement
#>  * <chr>     <date>     <dbl>  <dbl> <dbl>       <dbl>
#>  1 fl        2020-06-01   667 -1193. 1233.         667
#>  2 nj        2020-06-01   486   281.  762.         486
#>  3 fl        2020-06-02   617  -691. 1890.         617
#>  4 nj        2020-06-02   658   317.  891.         658
#>  5 fl        2020-06-03  1317  -144. 2396.        1317
#>  6 nj        2020-06-03   541   292.  809.         541
#>  7 fl        2020-06-04  1419   260. 2696.        1419
#>  8 nj        2020-06-04   478   315.  792.         478
#>  9 fl        2020-06-05  1305   548. 2950.        1305
#> 10 nj        2020-06-05   825   382.  835.         825
#> # ℹ 720 more rows