Format predictions for submission to FluSight forecast Hub
Source: R/flusight_hub_formatter.R
This function converts predictions from any of the included forecasters into a format (nearly) ready for submission to the 2023-24 FluSight-forecast-hub. See the hub for documentation of the required columns. Currently, only "quantile" forecasts are supported, but the intention is to support both "quantile" and "pmf". For this reason, the output_type column should be added via the ... argument (see the examples below). The specific required format for this forecast task is described in the hub's documentation.
Usage
flusight_hub_formatter(object, ..., .fcast_period = c("daily", "weekly"))
Arguments
- object
A data.frame of predictions or an object of class canned_epipred, as created by, e.g., arx_forecaster().
- ...
<dynamic-dots> Name = value pairs of constant columns (or mutations) to apply to the results. See examples.
- .fcast_period
Control whether the horizon should represent days or weeks. Depending on whether or not the forecaster output has target dates from layer_add_target_date(), we may need to compute the horizon and/or the target_end_date from the other available columns in the predictions. When both ahead and target_date are available, this argument is ignored. If only ahead or aheads exists, the ahead may need to be multiplied by 7 to obtain the target date when it represents weekly forecasts. Alternatively, if only the target_date is available, the horizon will be in days unless this argument is "weekly". Note that these columns can be adjusted later via the ... argument.
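The date arithmetic described for .fcast_period can be sketched in base R. This is a hypothetical illustration of the two conversions, not the code used inside flusight_hub_formatter(); the helper names days_per_period(), target_from_ahead() and horizon_from_target() are made up for this example, and it assumes a weekly horizon spans exactly 7 days.

```r
# Hypothetical sketch of the conversions described for .fcast_period;
# not the implementation used by flusight_hub_formatter().
days_per_period <- function(fcast_period = c("daily", "weekly")) {
  fcast_period <- match.arg(fcast_period)
  if (fcast_period == "weekly") 7L else 1L
}

# Only the ahead is known: derive the target date from it.
target_from_ahead <- function(reference_date, ahead, fcast_period = "daily") {
  reference_date + ahead * days_per_period(fcast_period)
}

# Only the target date is known: derive the horizon from it.
# Integer division assumes targets fall a whole number of periods later.
horizon_from_target <- function(reference_date, target_end_date, fcast_period = "daily") {
  days <- as.integer(target_end_date - reference_date)
  days %/% days_per_period(fcast_period)
}

reference_date <- as.Date("2021-12-25")
target_from_ahead(reference_date, 1:2, "weekly")
# "2022-01-01" "2022-01-08"
horizon_from_target(reference_date, as.Date(c("2022-01-01", "2022-01-08")))
# 7 14 (days)
horizon_from_target(reference_date, as.Date(c("2022-01-01", "2022-01-08")), "weekly")
# 1 2 (weeks)
```

The same target dates yield a horizon of 7 and 14 under "daily" but 1 and 2 under "weekly", which is why the argument matters only when one of ahead or target_date is missing.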
Value
A tibble::tibble. If ... is empty, the result will contain the columns reference_date, horizon, target_end_date, location, output_type_id, and value. The ... argument can perform mutations on any of these.
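To illustrate the kind of name = value pairs that ... accepts, the base R sketch below adds a constant column and a column computed from an existing one to two rows laid out like the documented result. It uses base transform() as a stand-in for however the function applies the dots internally (an assumption), and the two rows are copied from the first example's output, with values rounded as printed there.

```r
# Two rows in the documented output layout (copied from the example output).
preds <- data.frame(
  reference_date = as.Date("2021-12-25"),
  horizon = 1L,
  target_end_date = as.Date("2022-01-01"),
  location = "06",
  output_type_id = c(0.01, 0.025),
  value = c(2088, 2113)
)

# A constant is recycled to every row, while an expression can refer to
# existing columns such as horizon. transform() stands in for the dots here.
formatted <- transform(
  preds,
  target = paste(horizon, "wk inc covid deaths"),
  output_type = "quantile"
)
formatted$target
# "1 wk inc covid deaths" "1 wk inc covid deaths"
```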
Examples
library(dplyr)
weekly_deaths <- covid_case_death_rates %>%
  filter(
    time_value >= as.Date("2021-09-01"),
    geo_value %in% c("ca", "ny", "dc", "ga", "vt")
  ) %>%
  select(geo_value, time_value, death_rate) %>%
  left_join(state_census %>% select(pop, abbr), by = c("geo_value" = "abbr")) %>%
  mutate(deaths = pmax(death_rate / 1e5 * pop * 7, 0)) %>%
  select(-pop, -death_rate) %>%
  group_by(geo_value) %>%
  epi_slide(~ sum(.$deaths), .window_size = 7, .new_col_name = "deaths_7dsum") %>%
  ungroup() %>%
  filter(weekdays(time_value) == "Saturday")
cdc <- cdc_baseline_forecaster(weekly_deaths, "deaths_7dsum")
flusight_hub_formatter(cdc)
#> # A tibble: 575 × 7
#> reference_date horizon target_end_date location output_type_id value .pred
#> <date> <int> <date> <chr> <dbl> <dbl> <dbl>
#> 1 2021-12-25 1 2022-01-01 06 0.01 2088. 3164.
#> 2 2021-12-25 1 2022-01-01 06 0.025 2113. 3164.
#> 3 2021-12-25 1 2022-01-01 06 0.05 2169. 3164.
#> 4 2021-12-25 1 2022-01-01 06 0.1 2240. 3164.
#> 5 2021-12-25 1 2022-01-01 06 0.15 2514. 3164.
#> 6 2021-12-25 1 2022-01-01 06 0.2 2821. 3164.
#> 7 2021-12-25 1 2022-01-01 06 0.25 2888. 3164.
#> 8 2021-12-25 1 2022-01-01 06 0.3 2922. 3164.
#> 9 2021-12-25 1 2022-01-01 06 0.35 2954. 3164.
#> 10 2021-12-25 1 2022-01-01 06 0.4 3010. 3164.
#> # ℹ 565 more rows
flusight_hub_formatter(cdc, target = "wk inc covid deaths")
#> # A tibble: 575 × 8
#> reference_date horizon target_end_date location output_type_id value .pred
#> <date> <int> <date> <chr> <dbl> <dbl> <dbl>
#> 1 2021-12-25 1 2022-01-01 06 0.01 2088. 3164.
#> 2 2021-12-25 1 2022-01-01 06 0.025 2113. 3164.
#> 3 2021-12-25 1 2022-01-01 06 0.05 2169. 3164.
#> 4 2021-12-25 1 2022-01-01 06 0.1 2240. 3164.
#> 5 2021-12-25 1 2022-01-01 06 0.15 2514. 3164.
#> 6 2021-12-25 1 2022-01-01 06 0.2 2821. 3164.
#> 7 2021-12-25 1 2022-01-01 06 0.25 2888. 3164.
#> 8 2021-12-25 1 2022-01-01 06 0.3 2922. 3164.
#> 9 2021-12-25 1 2022-01-01 06 0.35 2954. 3164.
#> 10 2021-12-25 1 2022-01-01 06 0.4 3010. 3164.
#> # ℹ 565 more rows
#> # ℹ 1 more variable: target <chr>
flusight_hub_formatter(cdc, target = paste(horizon, "wk inc covid deaths"))
#> # A tibble: 575 × 8
#> reference_date horizon target_end_date location output_type_id value .pred
#> <date> <int> <date> <chr> <dbl> <dbl> <dbl>
#> 1 2021-12-25 1 2022-01-01 06 0.01 2088. 3164.
#> 2 2021-12-25 1 2022-01-01 06 0.025 2113. 3164.
#> 3 2021-12-25 1 2022-01-01 06 0.05 2169. 3164.
#> 4 2021-12-25 1 2022-01-01 06 0.1 2240. 3164.
#> 5 2021-12-25 1 2022-01-01 06 0.15 2514. 3164.
#> 6 2021-12-25 1 2022-01-01 06 0.2 2821. 3164.
#> 7 2021-12-25 1 2022-01-01 06 0.25 2888. 3164.
#> 8 2021-12-25 1 2022-01-01 06 0.3 2922. 3164.
#> 9 2021-12-25 1 2022-01-01 06 0.35 2954. 3164.
#> 10 2021-12-25 1 2022-01-01 06 0.4 3010. 3164.
#> # ℹ 565 more rows
#> # ℹ 1 more variable: target <chr>
flusight_hub_formatter(cdc, target = "wk inc covid deaths", output_type = "quantile")
#> # A tibble: 575 × 9
#> reference_date horizon target_end_date location output_type_id value .pred
#> <date> <int> <date> <chr> <dbl> <dbl> <dbl>
#> 1 2021-12-25 1 2022-01-01 06 0.01 2088. 3164.
#> 2 2021-12-25 1 2022-01-01 06 0.025 2113. 3164.
#> 3 2021-12-25 1 2022-01-01 06 0.05 2169. 3164.
#> 4 2021-12-25 1 2022-01-01 06 0.1 2240. 3164.
#> 5 2021-12-25 1 2022-01-01 06 0.15 2514. 3164.
#> 6 2021-12-25 1 2022-01-01 06 0.2 2821. 3164.
#> 7 2021-12-25 1 2022-01-01 06 0.25 2888. 3164.
#> 8 2021-12-25 1 2022-01-01 06 0.3 2922. 3164.
#> 9 2021-12-25 1 2022-01-01 06 0.35 2954. 3164.
#> 10 2021-12-25 1 2022-01-01 06 0.4 3010. 3164.
#> # ℹ 565 more rows
#> # ℹ 2 more variables: target <chr>, output_type <chr>
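The preprocessing in the example (a trailing 7-day sum per location, then keeping only Saturdays) can be approximated in base R. This sketch uses stats::filter() for the rolling sum and ave() for the per-location grouping on a made-up daily series; it is not equivalent to epi_slide() in every respect. In particular, the first six days of each location are NA here because the window is incomplete, and epi_slide() may handle the start of a series differently.

```r
# Made-up daily counts for two locations over two weeks (hypothetical data).
daily <- data.frame(
  geo_value = rep(c("ca", "ny"), each = 14),
  time_value = rep(seq(as.Date("2021-12-12"), by = "day", length.out = 14), times = 2),
  deaths = c(rep(1, 14), rep(2, 14))
)

# Trailing 7-day sum within each location: sides = 1 combines the current
# value with the previous six, so the first six days per location are NA.
trailing_sum_7 <- function(x) as.numeric(stats::filter(x, rep(1, 7), sides = 1))
daily$deaths_7dsum <- ave(daily$deaths, daily$geo_value, FUN = trailing_sum_7)

# Keep Saturdays without depending on the locale (POSIXlt wday: 0 = Sunday, 6 = Saturday).
weekly <- daily[as.POSIXlt(daily$time_value)$wday == 6, ]
weekly$deaths_7dsum
# 7 7 14 14
```

Checking the weekday numerically avoids the locale dependence of weekdays(time_value) == "Saturday", which only matches when day names are printed in English.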