Estimate derivatives of values in covidcast_signal data frame

Estimates derivatives of the values in a covidcast_signal data frame, using a local (in time) linear regression, smoothing spline, or trend filtering fit. (When multiple issue dates are present, only the latest issue is considered.) See the estimating derivatives vignette for examples.

estimate_deriv(
  x,
  method = c("lin", "ss", "tf"),
  n = 14,
  col_name = "deriv",
  keep_obj = FALSE,
  deriv = 1,
  ...
)

Arguments

x	The `covidcast_signal` data frame under consideration.
method	One of "lin", "ss", or "tf" indicating the method to use for the derivative calculation. To estimate the derivative at any time point, we run the given method on the last `n` days of data, and use the corresponding predicted derivative (that is, the derivative of the underlying estimated function, linear or spline) at the current time point. Default is "lin". See details below.
n	Size of the local window (in days) to use. For example, if `n = 5`, then to estimate the derivative on November 5, we train the given method on data from November 1 through November 5, inclusive. Default is 14.
col_name	String indicating the name of the new column that will contain the derivative values. Default is "deriv"; note that setting `col_name = "value"` will overwrite the existing "value" column.
keep_obj	Should the fitted object (from linear regression, smoothing spline, or trend filtering) be kept as a separate column? If `TRUE`, then this column name is given by appending "_obj" to `col_name`. Default is `FALSE`.
deriv	Order of derivative to estimate. Only orders 1 and 2 are allowed, with the default being 1. (In some cases, a second-order derivative will return a trivial result. For example, when `method = "lin"`, it will always be zero.)
...	Additional arguments to pass to the function that estimates derivatives. See details below.
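
The trailing-window idea behind `n` and `method = "lin"` can be sketched in base R. This is only an illustration, not the package's internal code: the helper `trailing_slope()` and the toy vectors `t` and `y` are hypothetical, and the sketch assumes `t` is a numeric day index with one observation per day.

```r
# Hypothetical helper, not part of covidcast: slope of a least-squares line
# fit to the observations in the n-day window ending at day t[i].
trailing_slope <- function(t, y, i, n = 14) {
  idx <- which(t > t[i] - n & t <= t[i])   # the last n days, fewer near the start
  if (length(idx) < 2) return(NA_real_)    # a line needs at least two points
  fit <- stats::lsfit(t[idx], y[idx])      # an intercept is included by default
  unname(fit$coefficients[2])              # slope = estimated first derivative
}

t <- 1:30
y <- 3 + 2 * t                             # exactly linear toy signal
slopes <- vapply(seq_along(t),
                 function(i) trailing_slope(t, y, i, n = 5),
                 numeric(1))
```

With `n = 5`, the estimate at day 5 uses days 1 through 5. Because the fitted function is a straight line, its second derivative is zero everywhere, which is why `deriv = 2` is trivial for this method.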

Value

The data frame x with a new column appended, named according to the col_name argument and containing the estimated derivative values.

Details

Derivatives are estimated using:

Linear regression, when method = "lin", via stats::lsfit().
Cubic smoothing spline, when method = "ss", via stats::smooth.spline().
Polynomial trend filtering, when method = "tf", via genlasso::trendfilter().

The second and third cases base the derivative calculation on a nonparametric fit and should typically be used with a larger window n. The third case (trend filtering) is more locally adaptive than the second (smoothing spline) and can work better when there are sharp changes in the smoothness of the underlying values.

In the first and second cases (linear regression and smoothing spline), the additional arguments in ... are directly passed to the underlying estimation function (stats::lsfit() and stats::smooth.spline()).
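
As a rough illustration of how extra arguments could reach the underlying fit, the sketch below forwards `...` to stats::smooth.spline() and reads off the derivative of the fitted spline at the last time point. The helper `spline_deriv_at_end()` and the simulated data are hypothetical and are not taken from the package source.

```r
set.seed(1)
t <- 1:28                                   # a 28-day window of toy data
y <- sin(t / 5) + rnorm(length(t), sd = 0.05)

# Hypothetical helper: fit a cubic smoothing spline, passing any extra
# arguments (for example `df` or `spar`) through to stats::smooth.spline(),
# then evaluate the requested derivative at the most recent time point.
spline_deriv_at_end <- function(t, y, deriv = 1, ...) {
  fit <- stats::smooth.spline(t, y, ...)
  stats::predict(fit, x = max(t), deriv = deriv)$y
}

d1 <- spline_deriv_at_end(t, y, df = 6)             # first derivative
d2 <- spline_deriv_at_end(t, y, deriv = 2, df = 6)  # second derivative
```

Here `df = 6` plays the role of an argument supplied through `...`; any other argument accepted by stats::smooth.spline() could be passed the same way.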

The third case (trend filtering) works a little differently. Here, a custom set of arguments is allowed, and these are distributed internally, as appropriate, to genlasso::trendfilter(), genlasso::cv.trendfilter(), and genlasso::coef.genlasso():

ord: Order of piecewise polynomial for the trend filtering fit. Default is 2.
maxsteps: Maximum number of steps to take in the solution path before terminating. Default is 100.
cv: Boolean indicating whether cross-validation should be used to choose an effective degrees of freedom for the fit. Default is FALSE.
k: Number of folds if cross-validation is to be used. Default is 5.
df: Desired effective degrees of freedom for the trend filtering fit. If cv = FALSE, then df must be an integer; if cv = TRUE, then df should be one of "min" or "1se" indicating the selection rule to use based on the cross-validation error curve (minimum or 1-standard-error rule, respectively). Default is 8 when cv = FALSE, and "1se" when cv = TRUE.
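
The way the default for df depends on cv can be made concrete with a small base R sketch. The helper `resolve_tf_args()` is hypothetical and only mirrors the rules documented above; it is not the package's own argument handling.

```r
# Hypothetical helper (not covidcast's internal code): fill in the trend
# filtering arguments using the defaults documented above.
resolve_tf_args <- function(ord = 2, maxsteps = 100, cv = FALSE, k = 5,
                            df = NULL) {
  # The default for df depends on whether cross-validation is used.
  if (is.null(df)) df <- if (cv) "1se" else 8
  if (cv) {
    # With cross-validation, df names a selection rule.
    df <- match.arg(df, c("min", "1se"))
  } else {
    # Without cross-validation, df must be a single integer value.
    stopifnot(is.numeric(df), length(df) == 1, df == round(df))
  }
  list(ord = ord, maxsteps = maxsteps, cv = cv, k = k, df = df)
}

resolve_tf_args()$df            # 8
resolve_tf_args(cv = TRUE)$df   # "1se"
```

Under these rules, combining `cv = TRUE` with a numeric df (or `cv = FALSE` with "min" or "1se") is rejected in this sketch rather than silently coerced.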