Estimates the growth rate of a signal at given points along the underlying sequence. Several methodologies are available; see the growth rate vignette for examples.

## Arguments

- `x`: Design points corresponding to the signal values `y`. Default is `seq_along(y)` (that is, equally-spaced points from 1 to the length of `y`).

- `y`: Signal values.

- `x0`: Points at which we should estimate the growth rate. Must be a subset of `x` (no extrapolation allowed). Default is `x`.

- `method`: Either "rel_change", "linear_reg", "smooth_spline", or "trend_filter", indicating the method to use for the growth rate calculation. The first two are local methods: they are run in a sliding fashion over the sequence (in order to estimate derivatives and hence growth rates); the latter two are global methods: they are run once over the entire sequence. See details for more explanation.

- `h`: Bandwidth for the sliding window, when `method` is "rel_change" or "linear_reg". See details for more explanation.

- `log_scale`: Should growth rates be estimated using the parametrization on the log scale? See details for an explanation. Default is `FALSE`.

- `dup_rm`: Should we check and remove duplicates in `x` (and corresponding elements of `y`) before the computation? Some methods might handle duplicate `x` values gracefully, whereas others might fail (either quietly or loudly). Default is `FALSE`.

- `na_rm`: Should missing values be removed before the computation? Default is `FALSE`.

- `...`: Additional arguments to pass to the method used to estimate the derivative.

## Details

The growth rate of a function f defined over a continuously-valued parameter t is defined as f'(t) / f(t), where f'(t) is the derivative of f at t. To estimate the growth rate of a signal in discrete-time (which can be thought of as evaluations or discretizations of an underlying function in continuous-time), we can therefore estimate the derivative and divide by the signal value itself (or possibly a smoothed version of the signal value).
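As a quick numerical check of this definition (a Python sketch for illustration, not part of the package): for an exponential signal f(t) = exp(r t), the growth rate f'(t) / f(t) equals the constant r, so a finite-difference derivative divided by the signal value should recover r.

```python
import numpy as np

# For f(t) = exp(r * t), the growth rate f'(t) / f(t) is the constant r.
r = 0.1
t = np.linspace(0.0, 10.0, 1001)
y = np.exp(r * t)

# Finite-difference derivative estimate, divided by the signal value itself.
growth = np.gradient(y, t) / y

print(round(float(growth[500]), 4))  # → 0.1, the true rate r
```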

The following methods are available for estimating the growth rate:

- "rel_change": uses (B/A - 1) / h, where B is the average of `y` over the second half of a sliding window of bandwidth h centered at the reference point `x0`, and A the average over the first half. This can be seen as using a first-difference approximation to the derivative.

- "linear_reg": uses the slope from a linear regression of `y` on `x` over a sliding window centered at the reference point `x0`, divided by the fitted value from this linear regression at `x0`.

- "smooth_spline": uses the estimated derivative at `x0` from a smoothing spline fit to `x` and `y`, via `stats::smooth.spline()`, divided by the fitted value of the spline at `x0`.

- "trend_filter": uses the estimated derivative at `x0` from polynomial trend filtering (a discrete spline) fit to `x` and `y`, via `genlasso::trendfilter()`, divided by the fitted value of the discrete spline at `x0`.
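The "rel_change" formula can be sketched numerically as follows (a Python illustration, not the package's implementation; the half-open convention used to split the window into halves is an assumption). For a signal growing geometrically by 5% per step, B/A equals 1.05^h exactly, regardless of where the window is centered.

```python
import numpy as np

# Hypothetical re-implementation of the "rel_change" formula, for
# illustration only: B is the mean of y over the second half of a window of
# bandwidth h centered at x0, A the mean over the first half, and the
# estimate is (B / A - 1) / h.
def rel_change(x, y, x0, h):
    x, y = np.asarray(x, float), np.asarray(y, float)
    first = (x >= x0 - h) & (x < x0)   # first half (assumed convention)
    second = (x >= x0) & (x < x0 + h)  # second half (assumed convention)
    A, B = y[first].mean(), y[second].mean()
    return (B / A - 1) / h

x = np.arange(20)
y = 100 * 1.05 ** x  # 5% growth per step

print(round(rel_change(x, y, x0=10, h=7), 4))  # → 0.0582, i.e. (1.05^7 - 1) / 7
```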

## Log Scale

An alternative view of the growth rate of a function f in general is given by defining g(t) = log(f(t)), and then observing that g'(t) = f'(t) / f(t). Therefore, any method that estimates the derivative can simply be applied to the log of the signal of interest, and in this light, each method above ("rel_change", "linear_reg", "smooth_spline", and "trend_filter") has a log scale analog, which can be used by setting `log_scale = TRUE`.
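A small numerical sketch of this identity (Python, for illustration only): for a signal growing geometrically by a factor g per step, the derivative of log(y) recovers log(g), the continuous-time growth rate, whereas a simple relative change per step would give g - 1.

```python
import numpy as np

# Since d/dt log f(t) = f'(t) / f(t), estimating the derivative of log(y)
# directly yields the growth rate on the log scale.
g = 1.05                 # geometric growth factor per step
t = np.arange(30.0)
y = 100 * g ** t

log_slope = np.gradient(np.log(y), t)  # log(y) is linear in t, slope log(g)

print(round(float(log_slope[15]), 6))  # → 0.04879, i.e. log(1.05)
```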

## Sliding Windows

For the local methods, "rel_change" and "linear_reg", we use a sliding window of bandwidth `h` centered at the reference point. In other words, the sliding window consists of all points in `x` whose distance to the reference point is at most `h`. Note that the unit for this distance is implicitly defined by the `x` variable; for example, if `x` is a vector of `Date` objects, `h = 7`, and the reference point is January 7, then the sliding window contains all data in between January 1 and 14 (matching the behavior of `epi_slide()` with `before = h - 1` and `after = h`).
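The window convention in the example above can be sketched as follows (Python, for illustration only; the endpoints follow the January 1–14 example and the `epi_slide()` analogy, spanning `x0 - (h - 1)` through `x0 + h`):

```python
from datetime import date, timedelta

# Sketch of the sliding-window membership for date-valued design points:
# with bandwidth h, the window at reference point x0 runs from x0 - (h - 1)
# through x0 + h (an assumption based on the epi_slide() analogy above).
h = 7
x0 = date(2021, 1, 7)
window = [x0 + timedelta(days=d) for d in range(-(h - 1), h + 1)]

print(window[0], window[-1], len(window))  # → 2021-01-01 2021-01-14 14
```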

## Additional Arguments

For the global methods, "smooth_spline" and "trend_filter", additional arguments can be specified via `...` for the underlying estimation function. For the smoothing spline case, these additional arguments are passed directly to `stats::smooth.spline()` (and the defaults are exactly as in this function). The trend filtering case works a bit differently: here, a custom set of arguments is allowed (which are distributed internally to `genlasso::trendfilter()` and `genlasso::cv.trendfilter()`):

- `ord`: order of piecewise polynomial for the trend filtering fit. Default is 3.

- `maxsteps`: maximum number of steps to take in the solution path before terminating. Default is 1000.

- `cv`: should cross-validation be used to choose an effective degrees of freedom for the fit? Default is `TRUE`.

- `k`: number of folds if cross-validation is to be used. Default is 3.

- `df`: desired effective degrees of freedom for the trend filtering fit. If `cv = FALSE`, then `df` must be a positive integer; if `cv = TRUE`, then `df` must be one of "min" or "1se", indicating the selection rule to use based on the cross-validation error curve: minimum or 1-standard-error rule, respectively. Default is "min" (going along with the default `cv = TRUE`). Note that if `cv = FALSE`, then we require `df` to be set by the user.

## Examples

```
# COVID cases growth rate by state, using the default method ("rel_change")
jhu_csse_daily_subset %>%
group_by(geo_value) %>%
mutate(cases_gr = growth_rate(x = time_value, y = cases))
#> An `epi_df` object, 4,026 x 7 with metadata:
#> * geo_type = state
#> * time_type = day
#> * as_of = 2024-08-23 02:40:48.296938
#>
#> # A tibble: 4,026 × 7
#> # Groups: geo_value [6]
#> geo_value time_value cases cases_7d_av case_rate_7d_av death_rate_7d_av
#> * <chr> <date> <dbl> <dbl> <dbl> <dbl>
#> 1 ca 2020-03-01 6 1.29 0.00327 0
#> 2 ca 2020-03-02 4 1.71 0.00435 0
#> 3 ca 2020-03-03 6 2.43 0.00617 0
#> 4 ca 2020-03-04 11 3.86 0.00980 0.000363
#> 5 ca 2020-03-05 10 5.29 0.0134 0.000363
#> 6 ca 2020-03-06 18 7.86 0.0200 0.000363
#> 7 ca 2020-03-07 26 11.6 0.0294 0.000363
#> 8 ca 2020-03-08 19 13.4 0.0341 0.000363
#> 9 ca 2020-03-09 23 16.1 0.0410 0.000726
#> 10 ca 2020-03-10 22 18.4 0.0468 0.000726
#> # ℹ 4,016 more rows
#> # ℹ 1 more variable: cases_gr <dbl>
# Log scale, degree 4 polynomial and 6-fold cross validation
jhu_csse_daily_subset %>%
group_by(geo_value) %>%
mutate(gr_poly = growth_rate(x = time_value, y = cases, log_scale = TRUE, ord = 4, k = 6))
#> Warning: There were 3 warnings in `mutate()`.
#> The first warning was:
#> ℹ In argument: `gr_poly = growth_rate(...)`.
#> ℹ In group 1: `geo_value = "ca"`.
#> Caused by warning in `log()`:
#> ! NaNs produced
#> ℹ Run `dplyr::last_dplyr_warnings()` to see the 2 remaining warnings.
#> An `epi_df` object, 4,026 x 7 with metadata:
#> * geo_type = state
#> * time_type = day
#> * as_of = 2024-08-23 02:40:48.296938
#>
#> # A tibble: 4,026 × 7
#> # Groups: geo_value [6]
#> geo_value time_value cases cases_7d_av case_rate_7d_av death_rate_7d_av
#> * <chr> <date> <dbl> <dbl> <dbl> <dbl>
#> 1 ca 2020-03-01 6 1.29 0.00327 0
#> 2 ca 2020-03-02 4 1.71 0.00435 0
#> 3 ca 2020-03-03 6 2.43 0.00617 0
#> 4 ca 2020-03-04 11 3.86 0.00980 0.000363
#> 5 ca 2020-03-05 10 5.29 0.0134 0.000363
#> 6 ca 2020-03-06 18 7.86 0.0200 0.000363
#> 7 ca 2020-03-07 26 11.6 0.0294 0.000363
#> 8 ca 2020-03-08 19 13.4 0.0341 0.000363
#> 9 ca 2020-03-09 23 16.1 0.0410 0.000726
#> 10 ca 2020-03-10 22 18.4 0.0468 0.000726
#> # ℹ 4,016 more rows
#> # ℹ 1 more variable: gr_poly <dbl>
```