Constructs a list of arguments for arx_classifier().
Usage
arx_class_args_list(
lags = c(0L, 7L, 14L),
ahead = 7L,
n_training = Inf,
forecast_date = NULL,
target_date = NULL,
outcome_transform = c("growth_rate", "lag_difference"),
breaks = 0.25,
horizon = 7L,
method = c("rel_change", "linear_reg"),
log_scale = FALSE,
additional_gr_args = list(),
nafill_buffer = Inf,
check_enough_data_n = NULL,
check_enough_data_epi_keys = NULL,
...
)
Arguments
- lags
Vector or List. Non-negative integers enumerating lags to use in autoregressive-type models (in days). By default, an unnamed list of lags will be set to correspond to the order of the predictors.
- ahead
Integer. Number of time steps ahead (in days) of the forecast date for which forecasts should be produced.
- n_training
Integer. An upper limit for the number of rows per key that are used for training (in the time unit of the epi_df).
- forecast_date
Date. The date on which the forecast is created. The default NULL will attempt to determine this automatically.
- target_date
Date. The date for which the forecast is intended. The default NULL will attempt to determine this automatically.
- outcome_transform
Scalar character. Whether the outcome should be created using growth rates (as the predictors are) or lagged differences. The second case is closer to the requirements for the 2022-23 CDC Flusight Hospitalization Experimental Target. See the Classification Vignette for details of how to create a reasonable baseline for this case. Selecting "growth_rate" (the default) uses epiprocess::growth_rate() to create the outcome using some of the additional arguments below. Choosing "lag_difference" instead simply uses the change from the value at the selected horizon.
- breaks
Vector. A vector of breaks to turn real-valued growth rates into discrete classes. The default gives binary upswing classification as in McDonald, Bien, Green, Hu, et al. This coincides with the default trainer = parsnip::logistic_reg() argument in arx_classifier(). However, multiclass classification is also supported (e.g. with breaks = c(-.2, .25)) provided that trainer = parsnip::multinom_reg() (or another multiclass trainer) is used as well. The breaks will be silently expanded to cover the entire real line (so the default will become breaks = c(-Inf, .25, Inf)) before being used to discretize the response. This differs from the behaviour in recipes::step_cut(), which creates classes that only cover the range of the training data.
- horizon
Scalar integer. This is passed to the h argument of epiprocess::growth_rate(). It determines the amount of data used to calculate the growth rate.
- method
Character. The method used for growth rate calculation: "rel_change" (the default) or "linear_reg".
- log_scale
Scalar logical. Whether to compute growth rates on the log scale.
- additional_gr_args
List. Optional arguments controlling growth rate calculation. See epiprocess::growth_rate() and the related Vignette for more details.
- nafill_buffer
At predict time, recent values of the training data are used to create a forecast. However, these can be NA due to, e.g., data latency issues. By default, any missing values will get filled with less recent data. Setting this value to NULL will result in 1 extra recent row (beyond those required for lag creation) to be used. Note that we require at least min(lags) rows of recent data per geo_value to create a prediction. For this reason, setting nafill_buffer < min(lags) will be treated as additional allowed recent data rather than the total amount of recent data to examine.
- check_enough_data_n
Integer. A lower limit for the number of rows per epi_key that are required for training. If NULL, this check is ignored.
- check_enough_data_epi_keys
Character vector. Column names on which to group the data; the threshold is then checked within each group. Useful if training per group (for example, per geo_value).
- ...
Space to handle future expansions (unused).
Examples
arx_class_args_list()
#> • lags : 0, 7, and 14
#> • ahead : 7
#> • n_training : Inf
#> • breaks : -Inf, 0.25, and Inf
#> • forecast_date : "NULL"
#> • target_date : "NULL"
#> • outcome_transform : "growth_rate"
#> • max_lags : 14
#> • horizon : 7
#> • method : "rel_change"
#> • log_scale : FALSE
#> • additional_gr_args : "_empty_"
#> • nafill_buffer : Inf
#> • check_enough_data_n : "NULL"
#> • check_enough_data_epi_keys : "NULL"
# 3-class classification,
# also needs arx_classifier(trainer = parsnip::multinom_reg())
arx_class_args_list(breaks = c(-.2, .25))
#> • lags : 0, 7, and 14
#> • ahead : 7
#> • n_training : Inf
#> • breaks : -Inf, -0.2, 0.25, and Inf
#> • forecast_date : "NULL"
#> • target_date : "NULL"
#> • outcome_transform : "growth_rate"
#> • max_lags : 14
#> • horizon : 7
#> • method : "rel_change"
#> • log_scale : FALSE
#> • additional_gr_args : "_empty_"
#> • nafill_buffer : Inf
#> • check_enough_data_n : "NULL"
#> • check_enough_data_epi_keys : "NULL"
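# The expansion of breaks described under Arguments can be sketched in base R.
# This is only an illustration, not the package's implementation: expand_breaks()
# is a hypothetical helper, the growth rates are made-up values, and the
# right-closed intervals are simply the default of base::cut(), assumed here.

```r
# Hypothetical helper (not part of epipredict): pad the user-supplied breaks
# with -Inf and Inf so that the classes cover the entire real line.
expand_breaks <- function(breaks) {
  sort(unique(c(-Inf, breaks, Inf)))
}

# Made-up growth rates, for illustration only.
growth <- c(-0.5, -0.1, 0.1, 0.3, 2)

# Default: one break at 0.25 gives two classes (binary upswing classification).
binary_breaks <- expand_breaks(0.25)
binary_class <- cut(growth, breaks = binary_breaks)

# Two breaks give three classes, which calls for a multiclass trainer.
three_breaks <- expand_breaks(c(-0.2, 0.25))
three_class <- cut(growth, breaks = three_breaks)

nlevels(binary_class) # 2
nlevels(three_class)  # 3
```

# Because the outer breaks are infinite, a value outside the range seen in
# training still falls into a class, which is the contrast with
# recipes::step_cut() noted under breaks.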