Evaluates the performance of a forecaster through the following steps:

  1. Takes a prediction card (as created by get_predictions()).

  2. Computes various user-specified error measures.

The result is a "score card" data frame in which each row corresponds to a prediction-result pair; the columns define the prediction task, give the observed value, and give the calculated values of the provided error measures.

evaluate_predictions(
  predictions_cards,
  truth_data,
  err_measures = list(wis = weighted_interval_score, ae = absolute_error, coverage_80 =
    interval_coverage(coverage = 0.8)),
  grp_vars = c("forecaster", intersect(colnames(predictions_cards),
    colnames(truth_data)))
)

Arguments

predictions_cards

tibble of quantile forecasts, which contains at least quantile and value columns, as well as any other prediction task identifiers. For COVID data, a predictions card may be created by the function get_predictions(), downloaded with get_covidhub_predictions(), or created manually.

truth_data

data frame of observed (truth) values, which will be joined to predictions_cards by all shared columns. The observed data column should be named actual.

err_measures

Named list of one or more functions, where each function takes a data frame with three columns, quantile, value, and actual (i.e., observed), and returns a scalar measure of error. NULL or an empty list may be provided if scoring is not desired.
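As a sketch of the expected interface, a custom error measure might look like the following. The name median_ae and the exact column handling are illustrative assumptions, not part of the package; the only documented contract is that the function receives a data frame with quantile, value, and actual columns and returns a scalar.

```r
# Hypothetical custom error measure: absolute error of the median forecast.
# Receives the quantile forecasts for a single prediction task.
median_ae <- function(quantile_forecasts) {
  # Pick the forecast value whose quantile level is closest to 0.5.
  med <- quantile_forecasts$value[
    which.min(abs(quantile_forecasts$quantile - 0.5))
  ]
  # All rows for one task share the same observed value.
  abs(med - quantile_forecasts$actual[1])
}
```

Such a function could then be supplied alongside the built-in measures, e.g. err_measures = list(wis = weighted_interval_score, med_ae = median_ae).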

grp_vars

character vector of column names in predictions_cards whose combination uniquely identifies a (quantile) prediction.

Value

tibble of "score cards". Contains the same information as predictions_cards, with an additional column for each err_measure and for the truth (named actual).
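A minimal sketch of a call, assuming a predictions card is already in hand; the column names geo_value and target_end_date are illustrative assumptions about the shared prediction task identifiers, not requirements documented here.

```r
library(evalcast)  # assumed package providing evaluate_predictions()

# Toy truth data: observed values keyed by the columns shared with the
# predictions card, with the observed column named `actual` as required.
truth_data <- data.frame(
  geo_value = c("ma", "ny"),
  target_end_date = as.Date(c("2021-01-02", "2021-01-02")),
  actual = c(123, 456)
)

# `predictions_cards` is assumed to come from get_predictions() or
# get_covidhub_predictions(); each row is one quantile forecast.
score_cards <- evaluate_predictions(
  predictions_cards,
  truth_data,
  err_measures = list(
    wis = weighted_interval_score,
    ae = absolute_error
  )
)
```

The result has one row per prediction-result pair, with wis and ae columns added alongside actual.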