Estimates the growth rate of a signal at given points along the underlying sequence. Several methodologies are available; see the growth rate vignette for examples.
Arguments
- x: Design points corresponding to the signal values y. Default is seq_along(y) (that is, equally-spaced points from 1 to the length of y).
- y: Signal values.
- x0: Points at which we should estimate the growth rate. Must be a subset of x (no extrapolation allowed). Default is x.
- method: Either "rel_change", "linear_reg", "smooth_spline", or "trend_filter", indicating the method to use for the growth rate calculation. The first two are local methods: they are run in a sliding fashion over the sequence (in order to estimate derivatives and hence growth rates); the latter two are global methods: they are run once over the entire sequence. See details for more explanation.
- h: Bandwidth for the sliding window, when method is "rel_change" or "linear_reg". See details for more explanation.
- log_scale: Should growth rates be estimated using the parametrization on the log scale? See details for an explanation. Default is FALSE.
- dup_rm: Should we check and remove duplicates in x (and corresponding elements of y) before the computation? Some methods might handle duplicate x values gracefully, whereas others might fail (either quietly or loudly). Default is FALSE.
- na_rm: Should missing values be removed before the computation? Default is FALSE.
- ...: Additional arguments to pass to the method used to estimate the derivative.
Details
The growth rate of a function f defined over a continuously-valued parameter t is defined as f'(t) / f(t), where f'(t) is the derivative of f at t. To estimate the growth rate of a signal in discrete time (which can be thought of as evaluations or discretizations of an underlying function in continuous time), we can therefore estimate the derivative and divide by the signal value itself (or possibly a smoothed version of the signal value).
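For intuition, here is a minimal base R sketch (not part of the package) of this derivative-divided-by-value estimate on a noiseless exponential signal, whose true growth rate is the constant r:

# Sketch: for f(t) = exp(r * t), the growth rate f'(t) / f(t) equals r everywhere.
r <- 0.1
t <- 1:50
f <- exp(r * t)
deriv_est <- diff(f)                  # first-difference approximation to f'(t)
gr_est <- deriv_est / f[-length(f)]   # divide by the signal value itself
head(gr_est)                          # about 0.105 at every point, close to r = 0.1

The small upward bias (about 0.105 rather than 0.1) comes from the first-difference approximation of the derivative; the log scale sketch further below avoids it for exponential signals.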
The following methods are available for estimating the growth rate:
"rel_change": uses (B/A - 1) / h, where B is the average of
y
over the second half of a sliding window of bandwidth h centered at the reference pointx0
, and A the average over the first half. This can be seen as using a first-difference approximation to the derivative."linear_reg": uses the slope from a linear regression of
y
onx
over a sliding window centered at the reference pointx0
, divided by the fitted value from this linear regression atx0
."smooth_spline": uses the estimated derivative at
x0
from a smoothing spline fit tox
andy
, viastats::smooth.spline()
, divided by the fitted value of the spline atx0
."trend_filter": uses the estimated derivative at
x0
from polynomial trend filtering (a discrete spline) fit tox
andy
, viagenlasso::trendfilter()
, divided by the fitted value of the discrete spline atx0
.
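To make the "rel_change" formula concrete, here is a hand-rolled sketch; the helper rel_change_at is hypothetical (not a package function), and the window convention shown is an assumption for equally-spaced integer x:

# Sketch of (B/A - 1) / h at a single reference point x0, for equally-spaced x.
rel_change_at <- function(x, y, x0, h) {
  in_window <- x > x0 - h & x <= x0 + h   # assumed window convention for this sketch
  xw <- x[in_window]
  yw <- y[in_window]
  A <- mean(yw[xw <= x0])                 # average over the first half of the window
  B <- mean(yw[xw > x0])                  # average over the second half of the window
  (B / A - 1) / h
}
y <- exp(0.1 * (1:30))
rel_change_at(x = 1:30, y = y, x0 = 15, h = 7)
# For this exponential, B/A = exp(0.1 * 7), so the result is (exp(0.7) - 1) / 7,
# roughly 0.145: an overestimate of the true rate 0.1 that shrinks as h decreases.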
Log Scale
An alternative view for the growth rate of a function f in general is given by defining g(t) = log(f(t)), and then observing that g'(t) = f'(t) / f(t). Therefore, any method that estimates the derivative can be simply applied to the log of the signal of interest, and in this light, each method above ("rel_change", "linear_reg", "smooth_spline", and "trend_filter") has a log scale analog, which can be used by setting log_scale = TRUE.
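As a quick check (a minimal sketch on a noiseless exponential signal, not package code), first-differencing the log of the signal recovers the growth rate exactly, whereas the raw-scale first difference in the earlier sketch carried a small bias:

# For f(t) = exp(r * t), g(t) = log(f(t)) = r * t, so g'(t) = r exactly.
r <- 0.1
t <- 1:50
f <- exp(r * t)
g <- log(f)
head(diff(g))   # 0.1 (up to floating point) at every point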
Sliding Windows
For the local methods, "rel_change" and "linear_reg", we use a sliding window centered at the reference point of bandwidth h. In other words, the sliding window consists of all points in x whose distance to the reference point is at most h. Note that the unit for this distance is implicitly defined by the x variable; for example, if x is a vector of Date objects, h = 7, and the reference point is January 7, then the sliding window contains all data in between January 1 and 14 (matching the behavior of epi_slide() with before = h - 1 and after = h).
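A small base R sketch (not package code) of the Date example above, using the before = h - 1 and after = h convention stated in the previous paragraph:

# Which dates fall in the sliding window around January 7 when h = 7?
x <- seq(as.Date("2020-01-01"), as.Date("2020-01-31"), by = "day")
x0 <- as.Date("2020-01-07")
h <- 7
window <- x[x >= x0 - (h - 1) & x <= x0 + h]
range(window)   # "2020-01-01" to "2020-01-14", as described above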
Additional Arguments
For the global methods, "smooth_spline" and "trend_filter", additional arguments can be specified via ... for the underlying estimation function. For the smoothing spline case, these additional arguments are passed directly to stats::smooth.spline() (and the defaults are exactly as in this function). The trend filtering case works a bit differently: here, a custom set of arguments is allowed (which are distributed internally to genlasso::trendfilter() and genlasso::cv.trendfilter()); a usage sketch follows the list below:

- ord: order of piecewise polynomial for the trend filtering fit. Default is 3.
- maxsteps: maximum number of steps to take in the solution path before terminating. Default is 1000.
- cv: should cross-validation be used to choose an effective degrees of freedom for the fit? Default is TRUE.
- k: number of folds if cross-validation is to be used. Default is 3.
- df: desired effective degrees of freedom for the trend filtering fit. If cv = FALSE, then df must be a positive integer; if cv = TRUE, then df must be one of "min" or "1se", indicating the selection rule to use based on the cross-validation error curve: minimum or 1-standard-error rule, respectively. Default is "min" (going along with the default cv = TRUE). Note that if cv = FALSE, then we require df to be set by the user.
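For instance, here is a hedged sketch of passing trend filtering arguments through ...; the output column name cases_gr_tf is made up for illustration, and the call assumes the genlasso package is installed and otherwise follows the arguments documented above:

# Sketch: trend filtering with a fixed effective degrees of freedom (no cross-validation).
# When cv = FALSE, df must be set to a positive integer by the user.
jhu_csse_daily_subset %>%
  group_by(geo_value) %>%
  mutate(cases_gr_tf = growth_rate(
    x = time_value, y = cases,
    method = "trend_filter", cv = FALSE, df = 10
  ))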
Examples
# COVID cases growth rate by state, using the default method ("rel_change")
jhu_csse_daily_subset %>%
group_by(geo_value) %>%
mutate(cases_gr = growth_rate(x = time_value, y = cases))
#> An `epi_df` object, 4,026 x 7 with metadata:
#> * geo_type = state
#> * time_type = day
#> * as_of = 2024-08-23 02:40:48.296938
#>
#> # A tibble: 4,026 × 7
#> # Groups: geo_value [6]
#> geo_value time_value cases cases_7d_av case_rate_7d_av death_rate_7d_av
#> * <chr> <date> <dbl> <dbl> <dbl> <dbl>
#> 1 ca 2020-03-01 6 1.29 0.00327 0
#> 2 ca 2020-03-02 4 1.71 0.00435 0
#> 3 ca 2020-03-03 6 2.43 0.00617 0
#> 4 ca 2020-03-04 11 3.86 0.00980 0.000363
#> 5 ca 2020-03-05 10 5.29 0.0134 0.000363
#> 6 ca 2020-03-06 18 7.86 0.0200 0.000363
#> 7 ca 2020-03-07 26 11.6 0.0294 0.000363
#> 8 ca 2020-03-08 19 13.4 0.0341 0.000363
#> 9 ca 2020-03-09 23 16.1 0.0410 0.000726
#> 10 ca 2020-03-10 22 18.4 0.0468 0.000726
#> # ℹ 4,016 more rows
#> # ℹ 1 more variable: cases_gr <dbl>
# Log scale, degree 4 polynomial and 6-fold cross validation
jhu_csse_daily_subset %>%
group_by(geo_value) %>%
mutate(gr_poly = growth_rate(x = time_value, y = cases, log_scale = TRUE, ord = 4, k = 6))
#> Warning: There were 3 warnings in `mutate()`.
#> The first warning was:
#> ℹ In argument: `gr_poly = growth_rate(...)`.
#> ℹ In group 1: `geo_value = "ca"`.
#> Caused by warning in `log()`:
#> ! NaNs produced
#> ℹ Run `dplyr::last_dplyr_warnings()` to see the 2 remaining warnings.
#> An `epi_df` object, 4,026 x 7 with metadata:
#> * geo_type = state
#> * time_type = day
#> * as_of = 2024-08-23 02:40:48.296938
#>
#> # A tibble: 4,026 × 7
#> # Groups: geo_value [6]
#> geo_value time_value cases cases_7d_av case_rate_7d_av death_rate_7d_av
#> * <chr> <date> <dbl> <dbl> <dbl> <dbl>
#> 1 ca 2020-03-01 6 1.29 0.00327 0
#> 2 ca 2020-03-02 4 1.71 0.00435 0
#> 3 ca 2020-03-03 6 2.43 0.00617 0
#> 4 ca 2020-03-04 11 3.86 0.00980 0.000363
#> 5 ca 2020-03-05 10 5.29 0.0134 0.000363
#> 6 ca 2020-03-06 18 7.86 0.0200 0.000363
#> 7 ca 2020-03-07 26 11.6 0.0294 0.000363
#> 8 ca 2020-03-08 19 13.4 0.0341 0.000363
#> 9 ca 2020-03-09 23 16.1 0.0410 0.000726
#> 10 ca 2020-03-10 22 18.4 0.0468 0.000726
#> # ℹ 4,016 more rows
#> # ℹ 1 more variable: gr_poly <dbl>