Estimates the growth rate of a signal at given points along the underlying sequence. Several methodologies are available; see the growth rate vignette for examples.

Usage

growth_rate(
  x = seq_along(y),
  y,
  x0 = x,
  method = c("rel_change", "linear_reg", "smooth_spline", "trend_filter"),
  h = 7,
  log_scale = FALSE,
  dup_rm = FALSE,
  na_rm = FALSE,
  ...
)

Arguments

x

Design points corresponding to the signal values y. Default is seq_along(y) (that is, equally-spaced points from 1 to the length of y).

y

Signal values.

x0

Points at which we should estimate the growth rate. Must be a subset of x (no extrapolation allowed). Default is x.

method

Either "rel_change", "linear_reg", "smooth_spline", or "trend_filter", indicating the method to use for the growth rate calculation. The first two are local methods: they are run in a sliding fashion over the sequence (in order to estimate derivatives and hence growth rates); the latter two are global methods: they are run once over the entire sequence. See details for more explanation.

h

Bandwidth for the sliding window, when method is "rel_change" or "linear_reg". See details for more explanation.

log_scale

Should growth rates be estimated using the parametrization on the log scale? See details for an explanation. Default is FALSE.

dup_rm

Should we check and remove duplicates in x (and corresponding elements of y) before the computation? Some methods might handle duplicate x values gracefully, whereas others might fail (either quietly or loudly). Default is FALSE.

na_rm

Should missing values be removed before the computation? Default is FALSE.

...

Additional arguments to pass to the method used to estimate the derivative.

Value

Vector of growth rate estimates at the specified points x0.

Details

The growth rate of a function f defined over a continuously-valued parameter t is defined as f'(t) / f(t), where f'(t) is the derivative of f at t. To estimate the growth rate of a signal in discrete time (which can be thought of as evaluations, or a discretization, of an underlying continuous-time function), we can therefore estimate the derivative and divide by the signal value itself (or possibly a smoothed version of the signal value).
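
As a quick sanity check of this definition, consider an exponential signal f(t) = 5 * exp(r * t), whose growth rate f'(t) / f(t) is the constant r. The sketch below uses only base R and a central-difference derivative estimate; the names `r`, `t_grid`, `deriv_est`, and `growth_est` are illustrative and not part of the package.

```r
# Exponential signal f(t) = 5 * exp(r * t); its true growth rate is r.
r <- 0.1
t_grid <- seq(0, 10, by = 0.01)
f <- 5 * exp(r * t_grid)

# Central-difference estimate of f'(t) at the interior grid points.
n <- length(t_grid)
deriv_est <- (f[3:n] - f[1:(n - 2)]) / (t_grid[3:n] - t_grid[1:(n - 2)])

# Divide the derivative estimate by the signal value to get the growth rate.
growth_est <- deriv_est / f[2:(n - 1)]

range(growth_est) # both ends should be very close to r = 0.1
```

On real, noisy data a raw difference like this is unstable, which is why the methods described next average or smooth the signal before differentiating.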

The following methods are available for estimating the growth rate:

  • "rel_change": uses (B/A - 1) / h, where B is the average of y over the second half of a sliding window of bandwidth h centered at the reference point x0, and A the average over the first half. This can be seen as using a first-difference approximation to the derivative.

  • "linear_reg": uses the slope from a linear regression of y on x over a sliding window centered at the reference point x0, divided by the fitted value from this linear regression at x0.

  • "smooth_spline": uses the estimated derivative at x0 from a smoothing spline fit to x and y, via stats::smooth.spline(), divided by the fitted value of the spline at x0.

  • "trend_filter": uses the estimated derivative at x0 from polynomial trend filtering (a discrete spline) fit to x and y, via genlasso::trendfilter(), divided by the fitted value of the discrete spline at x0.
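
To make the two local methods concrete, here is a minimal base R sketch of the calculations described above for a single reference point. It is not the package's own code: `local_growth()` is a hypothetical helper, and the window rule (points strictly after `x_ref - h` and up to `x_ref + h`) is an assumption chosen to match the sliding-window example given later on this page.

```r
# Hypothetical helper (not part of the package): growth rate at one reference
# point x_ref, using the points in the window (x_ref - h, x_ref + h].
local_growth <- function(x, y, x_ref, h = 7,
                         method = c("rel_change", "linear_reg")) {
  method <- match.arg(method)
  in_window <- x > x_ref - h & x <= x_ref + h
  xx <- x[in_window]
  yy <- y[in_window]

  if (method == "rel_change") {
    a <- mean(yy[xx <= x_ref]) # average over the first half of the window
    b <- mean(yy[xx > x_ref])  # average over the second half of the window
    (b / a - 1) / h
  } else {
    # Slope of a local linear fit, divided by the fitted value at x_ref.
    coefs <- unname(stats::coef(stats::lm(yy ~ xx)))
    coefs[2] / (coefs[1] + coefs[2] * x_ref)
  }
}

x <- 1:60
y <- 100 * exp(0.05 * x) # true growth rate is 0.05 everywhere

gr_rel <- local_growth(x, y, x_ref = 30, method = "rel_change")
gr_lin <- local_growth(x, y, x_ref = 30, method = "linear_reg")
c(rel_change = gr_rel, linear_reg = gr_lin)
```

Both estimates should land near the true rate of 0.05, though neither recovers it exactly, because a window of half-width 7 is a coarse approximation to a derivative.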

Log Scale

An alternative view for the growth rate of a function f in general is given by defining g(t) = log(f(t)), and then observing that g'(t) = f'(t) / f(t). Therefore, any method that estimates the derivative can be simply applied to the log of the signal of interest, and in this light, each method above ("rel_change", "linear_reg", "smooth_spline", and "trend_filter") has a log scale analog, which can be used by setting log_scale = TRUE.
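
The identity g'(t) = f'(t) / f(t) can be checked numerically. The base R sketch below compares a first-difference estimate on the log scale with the same estimate on the original scale, for an exponential signal with known growth rate 0.05. The variable names are illustrative, and this is not how the package computes its estimates.

```r
x <- 1:60
y <- 100 * exp(0.05 * x) # true growth rate is 0.05

# First-difference estimate of the derivative of g(t) = log(f(t)).
log_growth <- diff(log(y)) / diff(x)

# First-difference relative change on the original scale.
raw_growth <- (y[-1] / y[-length(y)] - 1) / diff(x)

c(log_scale = log_growth[1], original_scale = raw_growth[1])
```

For this signal the log-scale estimate equals 0.05 up to floating-point error, while the original-scale estimate equals exp(0.05) - 1, which is slightly larger. Keep in mind that `log()` returns `NaN` (with a warning) for negative inputs, so a signal containing negative values will produce missing estimates on the log scale.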

Sliding Windows

For the local methods, "rel_change" and "linear_reg", we use a sliding window of bandwidth h centered at the reference point. In other words, the sliding window consists of the points in x that lie within a distance h of the reference point. Note that the unit for this distance is implicitly defined by the x variable; for example, if x is a vector of Date objects, h = 7, and the reference point is January 7, then the sliding window contains all data between January 1 and January 14, inclusive (matching the behavior of epi_slide() with before = h - 1 and after = h).
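
The date example can be reproduced in base R. This sketch assumes the window rule implied by the `before = h - 1` and `after = h` comparison; the year 2021 and the variable names are chosen only for illustration.

```r
x <- seq(as.Date("2020-12-20"), as.Date("2021-01-31"), by = "day")
x_ref <- as.Date("2021-01-07")
h <- 7

# Assumed window rule: h - 1 units before the reference point through
# h units after it (the unit is days, because x holds Date objects).
in_window <- x >= x_ref - (h - 1) & x <= x_ref + h

window_range <- range(x[in_window])
window_size <- sum(in_window)
window_range # January 1 through January 14
window_size  # 2 * h days when no dates are missing
```

If x were instead a plain integer index, the same h = 7 would mean seven index positions rather than seven days.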

Additional Arguments

For the global methods, "smooth_spline" and "trend_filter", additional arguments can be specified via ... for the underlying estimation function. For the smoothing spline case, these additional arguments are passed directly to stats::smooth.spline() (and the defaults are exactly as in this function). The trend filtering case works a bit differently: here, a custom set of arguments is allowed (which are distributed internally to genlasso::trendfilter() and genlasso::cv.trendfilter()):

  • ord: order of piecewise polynomial for the trend filtering fit. Default is 3.

  • maxsteps: maximum number of steps to take in the solution path before terminating. Default is 1000.

  • cv: should cross-validation be used to choose an effective degrees of freedom for the fit? Default is TRUE.

  • k: number of folds if cross-validation is to be used. Default is 3.

  • df: desired effective degrees of freedom for the trend filtering fit. If cv = FALSE, then df must be a positive integer; if cv = TRUE, then df must be one of "min" or "1se" indicating the selection rule to use based on the cross-validation error curve: minimum or 1-standard-error rule, respectively. Default is "min" (going along with the default cv = TRUE). Note that if cv = FALSE, then we require df to be set by the user.
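
To illustrate what passing extra arguments to the smoothing spline amounts to, the sketch below calls stats::smooth.spline() directly with `df = 8` and forms the growth rate as the estimated derivative divided by the fitted value. It mirrors the description of the "smooth_spline" method but is not the package's implementation; the simulated data, the choice of `df`, and the variable names are assumptions made for the example. The trend filtering case is not sketched here.

```r
set.seed(1)
x <- 1:100
# Noisy exponential signal with true growth rate 0.03.
y <- 100 * exp(0.03 * x) * exp(rnorm(100, sd = 0.02))

# An argument such as `df` is the kind of option that would be supplied
# through `...` and handed to stats::smooth.spline().
fit <- stats::smooth.spline(x, y, df = 8)

x0 <- c(25, 50, 75)
fitted_vals <- stats::predict(fit, x0)$y            # spline value at x0
deriv_vals <- stats::predict(fit, x0, deriv = 1)$y  # first derivative at x0

spline_growth <- deriv_vals / fitted_vals
spline_growth # estimates should be in the neighborhood of 0.03
```

A larger `df` tracks the data more closely at the cost of noisier derivative estimates, so the growth rate can be sensitive to this choice.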

Examples

# COVID case growth rate by state, using the default method (relative change)
jhu_csse_daily_subset %>%
  group_by(geo_value) %>%
  mutate(cases_gr = growth_rate(x = time_value, y = cases))
#> An `epi_df` object, 4,026 x 7 with metadata:
#> * geo_type  = state
#> * time_type = day
#> * as_of     = 2024-08-23 02:40:48.296938
#> 
#> # A tibble: 4,026 × 7
#> # Groups:   geo_value [6]
#>    geo_value time_value cases cases_7d_av case_rate_7d_av death_rate_7d_av
#>  * <chr>     <date>     <dbl>       <dbl>           <dbl>            <dbl>
#>  1 ca        2020-03-01     6        1.29         0.00327         0       
#>  2 ca        2020-03-02     4        1.71         0.00435         0       
#>  3 ca        2020-03-03     6        2.43         0.00617         0       
#>  4 ca        2020-03-04    11        3.86         0.00980         0.000363
#>  5 ca        2020-03-05    10        5.29         0.0134          0.000363
#>  6 ca        2020-03-06    18        7.86         0.0200          0.000363
#>  7 ca        2020-03-07    26       11.6          0.0294          0.000363
#>  8 ca        2020-03-08    19       13.4          0.0341          0.000363
#>  9 ca        2020-03-09    23       16.1          0.0410          0.000726
#> 10 ca        2020-03-10    22       18.4          0.0468          0.000726
#> # ℹ 4,016 more rows
#> # ℹ 1 more variable: cases_gr <dbl>

# Log scale, degree 4 polynomial and 6-fold cross validation
jhu_csse_daily_subset %>%
  group_by(geo_value) %>%
  mutate(gr_poly = growth_rate(x = time_value, y = cases, log_scale = TRUE, ord = 4, k = 6))
#> Warning: There were 3 warnings in `mutate()`.
#> The first warning was:
#>  In argument: `gr_poly = growth_rate(...)`.
#>  In group 1: `geo_value = "ca"`.
#> Caused by warning in `log()`:
#> ! NaNs produced
#>  Run `dplyr::last_dplyr_warnings()` to see the 2 remaining warnings.
#> An `epi_df` object, 4,026 x 7 with metadata:
#> * geo_type  = state
#> * time_type = day
#> * as_of     = 2024-08-23 02:40:48.296938
#> 
#> # A tibble: 4,026 × 7
#> # Groups:   geo_value [6]
#>    geo_value time_value cases cases_7d_av case_rate_7d_av death_rate_7d_av
#>  * <chr>     <date>     <dbl>       <dbl>           <dbl>            <dbl>
#>  1 ca        2020-03-01     6        1.29         0.00327         0       
#>  2 ca        2020-03-02     4        1.71         0.00435         0       
#>  3 ca        2020-03-03     6        2.43         0.00617         0       
#>  4 ca        2020-03-04    11        3.86         0.00980         0.000363
#>  5 ca        2020-03-05    10        5.29         0.0134          0.000363
#>  6 ca        2020-03-06    18        7.86         0.0200          0.000363
#>  7 ca        2020-03-07    26       11.6          0.0294          0.000363
#>  8 ca        2020-03-08    19       13.4          0.0341          0.000363
#>  9 ca        2020-03-09    23       16.1          0.0410          0.000726
#> 10 ca        2020-03-10    22       18.4          0.0468          0.000726
#> # ℹ 4,016 more rows
#> # ℹ 1 more variable: gr_poly <dbl>