Format predictions for submission to FluSight forecast Hub
Source: R/flusight_hub_formatter.R
This function converts predictions from any of the included forecasters into
a format (nearly) ready for submission to the 2023-24
FluSight-forecast-hub.
See there for documentation of the required columns. Currently, only
"quantile" forecasts are supported, but the intention is to support both
"quantile" and "pmf". For this reason, adding the output_type column should
be done via the ... argument. See the examples below. The specific required
format for this forecast task is here.
Usage
flusight_hub_formatter(object, ..., .fcast_period = c("daily", "weekly"))
Arguments
- object
a data.frame of predictions or an object of class canned_epipred, as created by, e.g., arx_forecaster()
- ...
<dynamic-dots> Name = value pairs of constant columns (or mutations) to apply to the results. See examples.
- .fcast_period
Control whether the horizon should represent days or weeks. Depending on whether the forecaster output has target dates from layer_add_target_date() or not, we may need to compute the horizon and/or the target_end_date from the other available columns in the predictions. When both ahead and target_date are available, this is ignored. If only ahead or aheads exists, then the ahead may need to be multiplied (to convert weeks to days) when computing the target date, if the ahead represents weekly forecasts. Alternatively, if only the target_date is available, then the horizon will be in days, unless this argument is "weekly". Note that these can be adjusted later by the ... argument.
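The relationship between ahead, horizon, and target_end_date described for .fcast_period can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper names horizon_and_target() and horizon_from_target() are hypothetical, and a weekly step is assumed to be exactly 7 days.

```r
# Hypothetical helpers, not part of epipredict; a weekly step is assumed
# to be exactly 7 days.

# Case 1: only `ahead` is known, so the target date has to be derived.
horizon_and_target <- function(reference_date, ahead,
                               period = c("daily", "weekly")) {
  period <- match.arg(period)
  days_per_step <- if (period == "weekly") 7L else 1L
  data.frame(
    reference_date = reference_date,
    horizon = as.integer(ahead),
    target_end_date = reference_date + as.integer(ahead) * days_per_step
  )
}

# Case 2: only the target date is known, so the horizon has to be derived.
horizon_from_target <- function(reference_date, target_date,
                                period = c("daily", "weekly")) {
  period <- match.arg(period)
  days <- as.integer(target_date - reference_date)
  if (period == "weekly") days %/% 7L else days
}

ref <- as.Date("2021-12-25")
daily <- horizon_and_target(ref, 1:2, "daily")
weekly <- horizon_and_target(ref, 1:2, "weekly")
print(weekly)

h_days <- horizon_from_target(ref, as.Date("2022-01-08"), "daily")
h_weeks <- horizon_from_target(ref, as.Date("2022-01-08"), "weekly")
```

With a weekly period, an ahead of 1 from the reference date 2021-12-25 lands on 2022-01-01, which matches the first rows of the example output on this page.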
Value
A tibble::tibble. If ... is empty, the result will contain the
columns reference_date, horizon, target_end_date, location,
output_type_id, and value. The ... can perform mutations on any of
these.
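To make the role of ... concrete, the following base R sketch mimics name = value pairs acting on a small table of predictions. It is an analogy only: base::transform() stands in for whatever the formatter does internally, and the toy data frame preds is invented for illustration with a subset of the columns listed above.

```r
# Toy predictions (invented values) with a subset of the documented columns.
preds <- data.frame(
  reference_date = as.Date("2021-12-25"),
  horizon = c(1L, 2L),
  output_type_id = c(0.5, 0.5),
  value = c(-3, 3001)
)

# base::transform() is a stand-in here; it is an assumption that the
# formatter's `...` behaves in a comparable way.
out <- transform(
  preds,
  target = paste(horizon, "wk inc covid deaths"), # computed from a column
  output_type = "quantile",                       # constant, recycled to all rows
  value = pmax(value, 0)                          # mutation of an existing column
)
print(out)
```

New names are appended as columns, while a name that already exists (here value) is overwritten in place.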
Examples
library(dplyr)
library(epiprocess)
library(epipredict) # provides cdc_baseline_forecaster() and flusight_hub_formatter()
# Build Saturday-ending 7-day death sums for five locations.
weekly_deaths <- covid_case_death_rates %>%
  filter(
    time_value >= as.Date("2021-09-01"),
    geo_value %in% c("ca", "ny", "dc", "ga", "vt")
  ) %>%
  select(geo_value, time_value, death_rate) %>%
  left_join(state_census %>% select(pop, abbr), by = c("geo_value" = "abbr")) %>%
  # Scale death rates by population, truncating negative values at zero.
  mutate(deaths = pmax(death_rate / 1e5 * pop * 7, 0)) %>%
  select(-pop, -death_rate) %>%
  group_by(geo_value) %>%
  epi_slide(~ sum(.$deaths), .window_size = 7, .new_col_name = "deaths_7dsum") %>%
  ungroup() %>%
  filter(weekdays(time_value) == "Saturday")
# Fit the baseline forecaster on the weekly sums.
cdc <- cdc_baseline_forecaster(weekly_deaths, "deaths_7dsum")
flusight_hub_formatter(cdc)
#> # A tibble: 575 × 7
#> reference_date horizon target_end_date location output_type_id value .pred
#> <date> <int> <date> <chr> <dbl> <dbl> <dbl>
#> 1 2021-12-25 1 2022-01-01 06 0.01 2147. 3166.
#> 2 2021-12-25 1 2022-01-01 06 0.025 2165. 3166.
#> 3 2021-12-25 1 2022-01-01 06 0.05 2215. 3166.
#> 4 2021-12-25 1 2022-01-01 06 0.1 2277. 3166.
#> 5 2021-12-25 1 2022-01-01 06 0.15 2594. 3166.
#> 6 2021-12-25 1 2022-01-01 06 0.2 2847. 3166.
#> 7 2021-12-25 1 2022-01-01 06 0.25 2886. 3166.
#> 8 2021-12-25 1 2022-01-01 06 0.3 2931. 3166.
#> 9 2021-12-25 1 2022-01-01 06 0.35 2977. 3166.
#> 10 2021-12-25 1 2022-01-01 06 0.4 3001. 3166.
#> # ℹ 565 more rows
flusight_hub_formatter(cdc, target = "wk inc covid deaths")
#> # A tibble: 575 × 8
#> reference_date horizon target_end_date location output_type_id value .pred
#> <date> <int> <date> <chr> <dbl> <dbl> <dbl>
#> 1 2021-12-25 1 2022-01-01 06 0.01 2147. 3166.
#> 2 2021-12-25 1 2022-01-01 06 0.025 2165. 3166.
#> 3 2021-12-25 1 2022-01-01 06 0.05 2215. 3166.
#> 4 2021-12-25 1 2022-01-01 06 0.1 2277. 3166.
#> 5 2021-12-25 1 2022-01-01 06 0.15 2594. 3166.
#> 6 2021-12-25 1 2022-01-01 06 0.2 2847. 3166.
#> 7 2021-12-25 1 2022-01-01 06 0.25 2886. 3166.
#> 8 2021-12-25 1 2022-01-01 06 0.3 2931. 3166.
#> 9 2021-12-25 1 2022-01-01 06 0.35 2977. 3166.
#> 10 2021-12-25 1 2022-01-01 06 0.4 3001. 3166.
#> # ℹ 565 more rows
#> # ℹ 1 more variable: target <chr>
flusight_hub_formatter(cdc, target = paste(horizon, "wk inc covid deaths"))
#> # A tibble: 575 × 8
#> reference_date horizon target_end_date location output_type_id value .pred
#> <date> <int> <date> <chr> <dbl> <dbl> <dbl>
#> 1 2021-12-25 1 2022-01-01 06 0.01 2147. 3166.
#> 2 2021-12-25 1 2022-01-01 06 0.025 2165. 3166.
#> 3 2021-12-25 1 2022-01-01 06 0.05 2215. 3166.
#> 4 2021-12-25 1 2022-01-01 06 0.1 2277. 3166.
#> 5 2021-12-25 1 2022-01-01 06 0.15 2594. 3166.
#> 6 2021-12-25 1 2022-01-01 06 0.2 2847. 3166.
#> 7 2021-12-25 1 2022-01-01 06 0.25 2886. 3166.
#> 8 2021-12-25 1 2022-01-01 06 0.3 2931. 3166.
#> 9 2021-12-25 1 2022-01-01 06 0.35 2977. 3166.
#> 10 2021-12-25 1 2022-01-01 06 0.4 3001. 3166.
#> # ℹ 565 more rows
#> # ℹ 1 more variable: target <chr>
flusight_hub_formatter(cdc, target = "wk inc covid deaths", output_type = "quantile")
#> # A tibble: 575 × 9
#> reference_date horizon target_end_date location output_type_id value .pred
#> <date> <int> <date> <chr> <dbl> <dbl> <dbl>
#> 1 2021-12-25 1 2022-01-01 06 0.01 2147. 3166.
#> 2 2021-12-25 1 2022-01-01 06 0.025 2165. 3166.
#> 3 2021-12-25 1 2022-01-01 06 0.05 2215. 3166.
#> 4 2021-12-25 1 2022-01-01 06 0.1 2277. 3166.
#> 5 2021-12-25 1 2022-01-01 06 0.15 2594. 3166.
#> 6 2021-12-25 1 2022-01-01 06 0.2 2847. 3166.
#> 7 2021-12-25 1 2022-01-01 06 0.25 2886. 3166.
#> 8 2021-12-25 1 2022-01-01 06 0.3 2931. 3166.
#> 9 2021-12-25 1 2022-01-01 06 0.35 2977. 3166.
#> 10 2021-12-25 1 2022-01-01 06 0.4 3001. 3166.
#> # ℹ 565 more rows
#> # ℹ 2 more variables: target <chr>, output_type <chr>