Create a score card data frame based on covid forecasts — evaluate_covid

Evaluates the performance of a covid forecaster through the following steps:

Takes a prediction card (as created by get_predictions()).
Downloads the latest available data from the COVIDcast API to compute what actually occurred (summing the response over the incidence period).
Computes various user-specified error measures.
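
The "summing the response over the incidence period" step can be illustrated with a minimal base R sketch. The daily counts, the column names `time_value` and `value`, and the year 2020 are assumptions made for illustration only; this is not the package's internal code.

```r
# Hypothetical daily counts for one location (made-up numbers).
daily <- data.frame(
  time_value = seq(as.Date("2020-09-04"), as.Date("2020-09-14"), by = "day"),
  value = c(3, 5, 4, 6, 2, 7, 1, 8, 9, 5, 4)
)

# Incidence period: the epiweek September 6 through 12 (year assumed).
period_start <- as.Date("2020-09-06")
period_end <- as.Date("2020-09-12")

# Keep only the days inside the incidence period and sum the response.
in_period <- daily$time_value >= period_start & daily$time_value <= period_end
actual <- sum(daily$value[in_period])
```

Here `actual` plays the role of the observed value against which a forecast for that epiweek would be scored.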

Backfill refers to the process by which some data sources go back in time and update previously reported values. Suppose it is September 14 and we are evaluating our predictions for the previous epiweek (September 6 through 12). Although we may be able to calculate a value for "actual", we might not trust it, since backfill may occur on September 16 and change what is known about the period September 6 through 12. This phenomenon has two consequences. First, running this function on different dates may yield different estimates of the error. Second, evaluations of periods that are too recent may not be trustworthy. The parameter backfill_buffer specifies how long a buffer period, in days, to enforce. The appropriate length depends on the data source and signal and is left to the user to determine. If backfill is not relevant for the signal you are predicting, set backfill_buffer to 0.
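
As a rough sketch of the idea behind a backfill buffer, the base R snippet below flags which target periods ended long enough ago to be trusted. The dates, the column name `target_end_date`, the 10-day buffer and the `>=` comparison are all assumptions for illustration; how the function itself treats forecasts inside the buffer is not specified here.

```r
# Hypothetical evaluation date and buffer length in days.
today <- as.Date("2020-09-14")
backfill_buffer <- 10

# Hypothetical forecasts, identified by the last day of their target period.
forecasts <- data.frame(
  target_end_date = as.Date(c("2020-08-29", "2020-09-05", "2020-09-12"))
)

# A target period is treated as settled only once at least
# `backfill_buffer` days have passed since it ended.
days_elapsed <- as.integer(today - forecasts$target_end_date)
forecasts$trusted <- days_elapsed >= backfill_buffer
```

With these made-up dates, only the period ending August 29 is old enough to be evaluated. Setting `backfill_buffer <- 0` would mark every period as trusted.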

evaluate_covid_predictions(
  predictions_cards,
  err_measures = list(
    wis = weighted_interval_score,
    ae = absolute_error,
    coverage_80 = interval_coverage(coverage = 0.8)
  ),
  backfill_buffer = 0,
  geo_type = c("county", "hrr", "msa", "dma", "state", "hhs", "nation")
)

Arguments

predictions_cards	tibble of quantile forecasts, which contains at least `quantile` and `value` columns, as well as any other prediction task identifiers. Must be of class "predictions_cards". A covid predictions card may be created by the function `get_predictions()`, downloaded with `get_covidhub_predictions()`, or created manually.
err_measures	Named list of one or more functions, where each function takes a data frame with three columns, `quantile`, `value` and `actual` (i.e., observed), and returns a scalar measure of error. `NULL` or an empty list may be provided if scoring is not desired.
backfill_buffer	Number of days that must pass before the response is deemed trustworthy enough to be taken as correct. See details for more.
geo_type	String indicating the geographical type, such as "county" or "state". See the COVIDcast Geographic Coding documentation for available options.
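
To make the shape of an `err_measures` entry concrete, here is a minimal sketch of a custom error measure written against the description above: a function that takes a data frame with `quantile`, `value` and `actual` columns and returns a scalar. The function name `median_abs_error` and the toy data are hypothetical and are not part of the package.

```r
# Hypothetical error measure: absolute error of the forecast median.
median_abs_error <- function(df) {
  # Use the row whose quantile level is closest to 0.5.
  idx <- which.min(abs(df$quantile - 0.5))
  abs(df$value[idx] - df$actual[idx])
}

# Toy quantile forecast for a single task; `actual` repeats the observed value.
toy <- data.frame(
  quantile = c(0.1, 0.5, 0.9),
  value = c(80, 100, 130),
  actual = c(110, 110, 110)
)

score <- median_abs_error(toy)

# Error measures are passed as a named list; the name labels the score.
err_measures <- list(mae_median = median_abs_error)
```

A list built this way could then be supplied as the `err_measures` argument in place of the defaults.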

Value

tibble of "score cards". Contains the same information as predictions_cards, with an additional column for each element of err_measures and one for the truth (named actual).