Detects outliers based on a distance from the rolling median specified in terms of multiples of the rolling interquartile range (IQR).
Usage
detect_outlr_rm(
  x = seq_along(y),
  y,
  n = 21,
  log_transform = FALSE,
  detect_negatives = FALSE,
  detection_multiplier = 2,
  min_radius = 0,
  replacement_multiplier = 0
)
Arguments
- x
Design points corresponding to the signal values y. Default is seq_along(y) (that is, equally spaced points from 1 to the length of y).
- y
Signal values.
- n
Number of time steps to use in the rolling window. Default is 21. The window is centrally aligned. When n is odd, the rolling window extends from (n-1)/2 time steps before each design point to (n-1)/2 time steps after. When n is even, the rolling window extends from n/2-1 time steps before to n/2 time steps after.
- log_transform
Should a log transform be applied before running outlier detection? Default is FALSE. If TRUE, and zeros are present, then the log transform will be padded by 1.
- detect_negatives
Should negative values automatically count as outliers? Default is FALSE.
- detection_multiplier
Value determining how far the outlier detection thresholds are from the rolling median. The thresholds are calculated as (rolling median) +/- (detection multiplier) * (rolling IQR). Default is 2.
- min_radius
Minimum distance between rolling median and threshold, on transformed scale. Default is 0.
- replacement_multiplier
Value determining how far the replacement values are from the rolling median. The replacement is the original value if it is within the detection thresholds, or otherwise it is rounded to the nearest (rolling median) +/- (replacement multiplier) * (rolling IQR). Default is 0.
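The window, threshold, and replacement rules described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: rolling_thresholds is a hypothetical helper, the design points are assumed to be equally spaced (so x is ignored), windows are assumed to be truncated at the ends of the series, min_radius is read as a floor on the distance between the median and each threshold, and log_transform and detect_negatives are left out.

```r
# Hypothetical helper, NOT part of the package: a base-R sketch of
# (rolling median) +/- multiplier * (rolling IQR) thresholds.
rolling_thresholds <- function(y, n = 21,
                               detection_multiplier = 2,
                               min_radius = 0,
                               replacement_multiplier = 0) {
  # Centered window: (n-1)/2 steps on each side when n is odd,
  # n/2 - 1 steps before and n/2 steps after when n is even.
  before <- if (n %% 2 == 1) (n - 1) / 2 else n / 2 - 1
  after <- if (n %% 2 == 1) (n - 1) / 2 else n / 2

  stats <- vapply(seq_along(y), function(i) {
    # Assumption: the window is truncated at the ends of the series.
    idx <- max(1, i - before):min(length(y), i + after)
    c(median = median(y[idx]), iqr = IQR(y[idx]))
  }, c(median = 0, iqr = 0))

  med <- stats["median", ]
  iqr <- stats["iqr", ]

  # Assumption: min_radius acts as a floor on the threshold distance.
  radius <- pmax(min_radius, detection_multiplier * iqr)
  lower <- med - radius
  upper <- med + radius

  # Values inside the thresholds are kept; values outside are moved to
  # the nearer of median +/- replacement_multiplier * IQR.
  replacement <- ifelse(
    y < lower, med - replacement_multiplier * iqr,
    ifelse(y > upper, med + replacement_multiplier * iqr, y)
  )

  data.frame(lower = lower, upper = upper, replacement = replacement)
}

# Made-up signal with one spike at position 5.
y <- c(10, 11, 9, 10, 100, 10, 11, 9, 10)
res <- rolling_thresholds(y, n = 5)
```

With the default replacement_multiplier of 0, a flagged value is replaced by the rolling median itself; a larger multiplier would instead pull it back only as far as a band around the median.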
Value
A tibble with number of rows equal to length(y) and columns giving the outlier detection thresholds (lower and upper) and replacement values from each detection method (replacement).
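One way the returned columns might be used is to flag and correct outliers alongside the original signal. The sketch below uses a small hand-made data frame in the same shape; the values, the y column, and the is_outlier and y_corrected names are all assumptions for illustration, and "strictly outside the band" is an assumed definition of an outlier.

```r
# Hypothetical data in the shape described above; all values are made up.
out <- data.frame(
  y = c(5, 7, 40, 6),
  lower = c(2, 3, 3, 2),
  upper = c(10, 11, 11, 10),
  replacement = c(5, 7, 7, 6)
)

# Assumption: a value is an outlier when it lies strictly outside
# the [lower, upper] band.
out$is_outlier <- out$y < out$lower | out$y > out$upper

# Corrected series: take the replacement only where a value was flagged.
out$y_corrected <- ifelse(out$is_outlier, out$replacement, out$y)
```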
Examples
# Detect outliers based on a rolling median
incidence_num_outlier_example %>%
  dplyr::select(geo_value, time_value, cases) %>%
  as_epi_df() %>%
  group_by(geo_value) %>%
  mutate(outlier_info = detect_outlr_rm(
    x = time_value, y = cases
  )) %>%
  unnest(outlier_info)
#> An `epi_df` object, 730 x 6 with metadata:
#> * geo_type = state
#> * time_type = day
#> * as_of = 2022-05-21 22:17:14.962335
#>
#> # A tibble: 730 × 6
#> # Groups: geo_value [2]
#>    geo_value time_value cases lower upper replacement
#>  * <chr>     <date>     <dbl> <dbl> <dbl>       <dbl>
#>  1 fl        2020-06-01   667  530  2010          667
#>  2 nj        2020-06-01   486  150.  840.         486
#>  3 fl        2020-06-02   617  582. 1992.         617
#>  4 nj        2020-06-02   658  210.  771.         658
#>  5 fl        2020-06-03  1317  635  1975         1317
#>  6 nj        2020-06-03   541  270   702          541
#>  7 fl        2020-06-04  1419  713  1909         1419
#>  8 nj        2020-06-04   478  174.  790.         478
#>  9 fl        2020-06-05  1305  553  2081         1305
#> 10 nj        2020-06-05   825  118.  838.         825
#> # ℹ 720 more rows