Simple quantile autoregressive forecaster based on quantgen

A simple quantile autoregressive forecaster based on quantgen, intended to be used with evalcast via `evalcast::get_predictions()`. See the quantgen forecast vignette for examples.

quantgen_forecaster(
  df,
  forecast_date,
  signals,
  incidence_period,
  ahead,
  geo_type,
  n = 4 * ifelse(incidence_period == "day", 7, 1),
  lags = 0,
  tau = modeltools::covidhub_probs,
  transform = NULL,
  inv_trans = NULL,
  featurize = NULL,
  noncross = FALSE,
  noncross_points = c("all", "test", "train"),
  cv_type = c("forward", "random"),
  verbose = FALSE,
  ...
)

Arguments

df	Data frame of signal values to use for forecasting, of the format that is returned by `covidcast::covidcast_signals()`.
forecast_date	Date object or string of the form "YYYY-MM-DD", indicating the date on which forecasts will be made. For example, if `forecast_date = "2020-05-11"`, `incidence_period = "day"`, and `ahead = 3`, then forecasts would be made for "2020-05-14".
signals	Tibble with columns `data_source` and `signal` that specifies which variables are being fetched from the COVIDcast API and populated in `df`. Each row of `signals` represents a separate signal, and the first row is taken to be the response. An optional column `start_day` can also be included. This can be a Date object or string in the form "YYYY-MM-DD", indicating the earliest date of data needed from that data source. Importantly, `start_day` can also be a function (represented as a list column) that takes a forecast date and returns a start date for model training (again, a Date object or string in the form "YYYY-MM-DD"). The latter is useful when the start date should be computed dynamically from the forecast date (e.g., when the forecaster only trains on the most recent 4 weeks of data).
incidence_period	One of "day" or "epiweek", indicating the period over which forecasts are being made. Default is "day".
ahead	Vector of ahead values, indicating how many days/epiweeks ahead to forecast. If `incidence_period = "day"`, then `ahead = 1` means the day after the forecast date. If `incidence_period = "epiweek"` and the forecast date falls on a Sunday or Monday, then `ahead = 1` means the epiweek that includes the forecast date; if `forecast_date` falls on a Tuesday through Saturday, then it means the following epiweek.
n	Size of the local training window (in days/weeks, depending on `incidence_period`) to use. For example, if `n = 14` and `incidence_period = "day"`, then to make a 1-day-ahead forecast on December 15, we train on data from December 1 to December 14.
lags	Vector of lag values to use as features in the autoregressive model. For example, when `incidence_period = "day"`, setting `lags = c(0, 7, 14)` means we use the current value of each signal (defined by a row of the `signals` tibble), as well as its values 7 and 14 days ago, as the features. Recall that the response is defined by the first row of the `signals` tibble. Note that `lags` can also be a list of vectors of lag values, this list having the same length as the number of rows of `signals`, in order to apply a different set of shifts to each signal. Default is 0, which means no additional lags (only current values) for each signal.
tau	Vector of quantile levels for the probabilistic forecast. If not specified, defaults to the levels required by the COVID Forecast Hub.
transform, inv_trans	Transformation and inverse transformation to use for the response/features. The former, `transform`, can be a single function, which is applied to every signal, or a list of functions with the same length as the number of rows in the `signals` tibble, which applies a (possibly different) transformation to each signal. These transformations are applied before fitting the quantile model. The latter, `inv_trans`, specifies the inverse transformation to use on the response variable (the inverse of `transform` if it is a function, or of `transform[[1]]` if it is a list), which is applied to the predictions from the quantile model. Several convenience functions for transformations exist as part of the `quantgen` package. Default is `NULL` for both `transform` and `inv_trans`, which means no transformations are applied.
featurize	Function to construct custom features before the quantile model is fit. As input, this function must take a data frame with columns `geo_value`, `time_value`, then the transformed, lagged signal values. This function must return a data frame with columns `geo_value`, `time_value`, then any custom features. The rows of the returned data frame must not be reordered.
noncross	Should noncrossing constraints be applied? These force the predicted quantiles to be properly ordered across all quantile levels being considered. The default is `FALSE`. If `TRUE`, then noncrossing constraints are applied to the estimated quantiles at all points specified by `noncross_points`.
noncross_points	One of "all", "test", or "train", indicating which points to use for the noncrossing constraints. The default, "all", uses the training and testing sets combined, while "test" or "train" uses only the testing set or only the training set, respectively.
cv_type	One of "forward" or "random", indicating the type of cross-validation to perform. If "random", then `nfolds` folds are chosen by dividing training data points randomly (the default being `nfolds = 5`). If "forward", the default, then we instead use a "forward-validation" approach that better reflects the way predictions are made in the current time series forecasting context. Roughly, this works as follows: the data points from the first `n - nfolds` time values are used for model training, and then predictions are made at the earliest possible forecast date after this training period. We march forward one time point at a time and repeat. In either case ("random" or "forward"), the loss function used for computing validation error is quantile regression loss (read the documentation for `quantgen::cv_quantile_lasso()` for more details); and the final quantile model is refit on the full training set using the validation-optimal tuning parameter.
verbose	Should progress be printed out to the console? Default is `FALSE`.
...	Additional arguments. Any parameter accepted by `quantgen::cv_quantile_lasso()` (for model training) or by `quantgen:::predict.cv_quantile_genlasso()` (for model prediction) can be passed here. For example, `nfolds`, for specifying the number of folds used in cross-validation, or `lambda`, for specifying the tuning parameter values over which to perform cross-validation (the default allows `quantgen::cv_quantile_lasso()` to set the lambda sequence itself). Note that fixing a single tuning parameter value (such as `lambda = 0`) effectively disables cross-validation and fits a quantile model at the given tuning parameter value (here unregularized quantile autoregression).
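The date conventions for `forecast_date`, `ahead`, and `n` can be checked with a few lines of base R. This is a minimal sketch and is not modeltools code. The helper names `target_day()`, `target_epiweek_start()`, and `training_window()` are hypothetical. Epiweeks are assumed to start on Sunday, following the MMWR convention. The training window reproduces the documented example only and ignores the extra history that `lags` and `ahead` require.

```r
# Hypothetical helpers (not part of modeltools) mirroring the documented
# conventions for `forecast_date`, `ahead`, and `n`. Scalar dates assumed.

# Daily forecasts: the target is `ahead` days after the forecast date.
target_day <- function(forecast_date, ahead) {
  as.Date(forecast_date) + ahead
}

# Epiweekly forecasts: returns the Sunday that starts the target epiweek.
target_epiweek_start <- function(forecast_date, ahead) {
  d <- as.Date(forecast_date)
  wday <- as.POSIXlt(d)$wday        # 0 = Sunday, 1 = Monday, ..., 6 = Saturday
  week_start <- d - wday            # Sunday beginning the current epiweek
  # Sunday or Monday: ahead = 1 is the current epiweek; otherwise the next one.
  offset <- if (wday <= 1) ahead - 1 else ahead
  week_start + 7 * offset
}

# Daily training window: the n days ending the day before the forecast date.
training_window <- function(forecast_date, n) {
  d <- as.Date(forecast_date)
  seq(d - n, d - 1, by = "day")
}

target_day("2020-05-11", 3)               # "2020-05-14"
target_epiweek_start("2020-05-11", 1)     # a Monday, so "2020-05-10"
target_epiweek_start("2020-05-12", 1)     # a Tuesday, so "2020-05-17"
range(training_window("2020-12-15", 14))  # "2020-12-01" "2020-12-14"
```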
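The sketch below shows a `signals` table whose `start_day` column holds functions. The data source and signal names are examples of COVIDcast identifiers only. The 28-day offset is an assumption and ignores the extra history that `lags` and `ahead` would need. A base R data frame stands in for the documented tibble, and this page does not confirm whether the forecaster accepts one.

```r
# Illustrative signals table. The forecaster documents a tibble; a base R data
# frame with the same columns is used here so the sketch has no dependencies.
signals <- data.frame(
  data_source = c("jhu-csse", "fb-survey"),
  signal = c("deaths_incidence_num", "smoothed_cli")  # first row = response
)

# `start_day` as a list column of functions, each mapping a forecast date to
# the earliest date of data to fetch (here, 28 days before the forecast date).
four_weeks_back <- function(forecast_date) as.Date(forecast_date) - 28
signals$start_day <- list(four_weeks_back, four_weeks_back)

signals$start_day[[1]]("2020-05-11")  # "2020-04-13"
```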
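The sketch below shows what `lags = c(0, 7, 14)` means for a single signal. It assumes one gap-free daily series for one location. The forecaster itself works per `geo_value` and `time_value`, which is not modelled here. The helper `make_lags()` is hypothetical.

```r
# Hypothetical helper: lagged copies of a single, gap-free daily series.
# Column "lag_k" at row t holds x[t - k]; positions with no history are NA.
make_lags <- function(x, lags) {
  n <- length(x)
  cols <- lapply(lags, function(k) {
    k <- min(k, n)
    c(rep(NA, k), x[seq_len(n - k)])
  })
  out <- do.call(cbind, cols)
  colnames(out) <- paste0("lag_", lags)
  out
}

feats <- make_lags(1:20, lags = c(0, 7, 14))
feats[15, ]  # lag_0 = 15, lag_7 = 8, lag_14 = 1

# Per-signal lags: a list with one vector per row of `signals`.
lags_per_signal <- list(c(0, 7, 14), 0)
```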
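The default `tau` is described only as the COVID Forecast Hub levels. The vector below writes out the 23 levels the Hub used for its quantile forecasts. It is an assumption that `modeltools::covidhub_probs` matches it exactly, so inspect that object to confirm. A coarser custom vector can be passed instead.

```r
# Assumed to match modeltools::covidhub_probs; check that object directly.
hub_tau <- c(0.01, 0.025, seq(0.05, 0.95, by = 0.05), 0.975, 0.99)
length(hub_tau)  # 23

# A coarser custom set, e.g. for a quicker experiment: tau = my_tau
my_tau <- c(0.1, 0.25, 0.5, 0.75, 0.9)
```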
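The sketch below pairs `transform` with `inv_trans`. It uses base R's `log1p()` and `expm1()` on nonnegative counts, which is an assumed choice and not one of quantgen's own convenience transforms. The key requirement is that `inv_trans` undoes the transformation applied to the response, which is the first entry when `transform` is a list.

```r
# Assumed example pair for nonnegative counts: log(1 + x) and its inverse.
my_transform <- function(x) log1p(pmax(x, 0))
my_inv_trans <- function(x) expm1(x)

# One function applied to every signal:  transform = my_transform
# One function per row of `signals` (log the response, leave signal 2 as is):
transforms <- list(my_transform, identity)

# `inv_trans` must undo the response's transformation, i.e. transforms[[1]].
y <- c(0, 3, 10, 250)
all.equal(my_inv_trans(transforms[[1]](y)), y)  # TRUE
```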
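A `featurize` function only has to keep `geo_value` and `time_value` first and must not reorder rows. The hypothetical sketch below appends the row-wise mean of the lagged signal columns as one extra feature. The real lagged column names are not documented here, so the function selects those columns by position rather than by name.

```r
# Hypothetical featurize function: keeps the existing features and appends the
# row-wise mean of the lagged signal columns. Row order is left untouched.
featurize_row_mean <- function(df) {
  # Columns 3 onward hold the transformed, lagged signal values.
  lagged <- df[, -(1:2), drop = FALSE]
  df$row_mean <- rowMeans(lagged, na.rm = TRUE)
  df
}

toy <- data.frame(
  geo_value = c("ca", "ny"),
  time_value = as.Date(c("2020-05-10", "2020-05-10")),
  x_lag0 = c(1, 3),
  x_lag7 = c(3, 5)
)
featurize_row_mean(toy)  # row_mean is 2 for "ca" and 4 for "ny"
```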
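With `noncross = FALSE`, a fitted model can return quantiles that "cross": for example, a 0.1 quantile larger than the median. The toy matrix below shows how to detect this. Its values are made up. It also shows post hoc rearrangement by sorting, which is a different and simpler technique than the constrained fit that `noncross = TRUE` performs.

```r
# Toy predicted quantiles: one row per prediction point, one column per level.
tau <- c(0.1, 0.5, 0.9)
pred <- rbind(c(5, 4, 9),   # crossing: the 0.1 quantile exceeds the median
              c(1, 2, 3))   # properly ordered
colnames(pred) <- tau

crossed <- apply(pred, 1, is.unsorted)  # TRUE for rows whose quantiles cross

# Post hoc rearrangement (sorting each row); not what noncross = TRUE does.
fixed <- t(apply(pred, 1, function(q) sort(unname(q))))
colnames(fixed) <- tau
fixed[1, ]  # 4 5 9
```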
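The two `cv_type` schemes and the validation loss can be sketched over time indices `1..n`. This is an illustration under assumptions and not quantgen's implementation. The forward scheme below uses an expanding training window and ignores the gap that `ahead` introduces between the last training point and the forecast target. The loss is the standard quantile (pinball) loss for a single prediction. The helper names are hypothetical.

```r
# Forward validation: fold k trains on the first n - nfolds + k - 1 time points
# and validates on the next one (expanding window, `ahead` gap ignored).
forward_folds <- function(n, nfolds) {
  lapply(seq_len(nfolds), function(k) {
    cutoff <- n - nfolds + k - 1
    list(train = seq_len(cutoff), test = cutoff + 1)
  })
}

# Random folds: assign each of the n points to one of nfolds folds at random.
random_folds <- function(n, nfolds) sample(rep(seq_len(nfolds), length.out = n))

# Quantile (pinball) loss of a predicted quantile q at level tau for outcome y.
quantile_loss <- function(y, q, tau) {
  u <- y - q
  pmax(tau * u, (tau - 1) * u)
}

folds <- forward_folds(n = 10, nfolds = 3)
folds[[1]]$train            # 1:7
folds[[1]]$test             # 8
quantile_loss(10, 8, 0.9)   # 1.8 (under-prediction is penalised by tau)
quantile_loss(10, 12, 0.9)  # 0.2 (over-prediction is penalised by 1 - tau)
```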

Value

Data frame with columns `ahead`, `geo_value`, `quantile`, and `value`. The `quantile` column gives the probability level associated with each quantile forecast for that location and ahead.
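The long format above can be post-processed with base R. The sketch below builds a made-up data frame with the documented columns, so its numbers are not real forecasts. It then extracts the median forecast for each ahead and location and checks that values are nondecreasing in `quantile` within each group.

```r
# Toy values only, laid out like the documented output.
out <- data.frame(
  ahead = rep(1:2, each = 3),
  geo_value = "pa",
  quantile = rep(c(0.1, 0.5, 0.9), times = 2),
  value = c(80, 100, 130, 75, 105, 150)
)

# Median forecast per ahead and location (tolerant float comparison).
medians <- out[abs(out$quantile - 0.5) < 1e-8, ]

# Are quantiles nondecreasing within each ahead/location group?
groups <- split(out, list(out$ahead, out$geo_value), drop = TRUE)
ordered_ok <- sapply(groups, function(g) !is.unsorted(g$value[order(g$quantile)]))
```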