Converts a line list (a data frame where each row represents a case or event)
into an epi_archive object. This function uses "recorded" timestamps and,
optionally, "deleted" timestamps to generate a time series of counts (e.g. daily
hospitalizations) as they would have appeared at different points in time.
Usage
linelist_to_archive(
x,
...,
geo_value = NULL,
other_keys = NULL,
time_value = NULL,
version_recorded = NULL,
version_deleted = NULL,
is_deletion = NULL,
value = NULL,
id = NULL,
clobberable_versions_start = NA,
versions_end = NULL
)
Arguments
- x
A data frame (line list).
- ...
Should be empty.
- geo_value, other_keys, time_value, version_recorded, version_deleted
<tidy-select> Columns in x representing:
  - geo_value: the geographic location of the event.
  - other_keys: (optional) additional key columns (e.g. age group).
  - time_value: the time of the event.
  - version_recorded: the time at which the event became known/recorded.
  - version_deleted: (optional) the time at which the event was removed/deleted. If NULL (default), it is assumed that no events are deleted. Mutually exclusive with is_deletion.
- is_deletion
(optional) <tidy-select> Column in x indicating whether the row is a deletion (TRUE/1) or an entry (FALSE/0). Used for "chart-style" line lists where each row is an update event and version_recorded represents when the update occurred. Mutually exclusive with version_deleted.
- value
Either NULL (default) or a string specifying the name of the output count column. If NULL and other_keys is empty, defaults to "count".
- id
<tidy-select> Optional column identifying unique events/cases.
- clobberable_versions_start
Optional; length-1; either a value of the same class as x$version, or an NA of any class: specifically, either (a) the earliest version that could be subject to "clobbering" (being overwritten with different update data, but using the same version tag as the old update data), or (b) NA, to indicate that no versions are clobberable. There are a variety of reasons why versions could be clobberable under routine circumstances, such as (a) today's version of one/all of the columns being published after initially being filled with NA or LOCF, (b) a buggy version of today's data being published but then fixed and republished later in the day, or (c) data pipeline delays (e.g., publisher uploading, periodic scraping, database syncing, periodic fetching, etc.) that make events (a) or (b) reflected later in the day (or even on a different day) than expected; potential causes vary between different data pipelines. The default value is NA, which doesn't consider any versions to be clobberable. Another setting that may be appropriate for some pipelines is max_version_with_row_in(x).
- versions_end
Optional; as in as_epi_archive, or NULL. If the latest version(s) had no new events recorded or deleted, you can note this using versions_end. NULL, the default, will assume there are no such versions.
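To make the "chart-style" layout described under is_deletion concrete, the sketch below reshapes a small line list with separate recorded/deleted columns into one row per update event. The column name `update_date` and the reshaping steps are illustrative assumptions, not part of the package. The commented-out call at the end only shows how the documented arguments might be supplied; it was not run here, and whether deletion rows need every key column filled in is also an assumption.

```r
library(dplyr)

# Small line list with one "recorded" and one (optional) "deleted" date per event.
wide <- tibble(
  event_id = 1:3,
  geo_value = "ca",
  time_value = as.Date(c("2022-01-01", "2022-01-01", "2022-01-02")),
  report_date = as.Date(c("2022-01-02", "2022-01-02", "2022-01-03")),
  delete_date = as.Date(c("2022-01-04", NA, NA))
)

# One row per entry event (hypothetical column name `update_date`).
entries <- wide %>%
  transmute(event_id, geo_value, time_value,
            update_date = report_date, is_deletion = FALSE)

# One row per deletion event, only for events that were actually deleted.
deletions <- wide %>%
  filter(!is.na(delete_date)) %>%
  transmute(event_id, geo_value, time_value,
            update_date = delete_date, is_deletion = TRUE)

# Chart-style line list: each row is an update, ordered by when it occurred.
chart_style <- bind_rows(entries, deletions) %>%
  arrange(update_date, event_id)

# Possible call using the documented arguments (not run; an assumption):
# archive <- linelist_to_archive(
#   chart_style,
#   geo_value = geo_value,
#   time_value = time_value,
#   version_recorded = update_date,
#   is_deletion = is_deletion,
#   id = event_id
# )
```

In this layout, version_recorded points at the update time of every row, so version_deleted is left unset, consistent with the two arguments being mutually exclusive.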
Value
An epi_archive object.
Examples
library(dplyr)
linelist <- tibble(
event_id = 1:3,
geo_value = c("ca", "ca", "ca"),
time_value = as.Date(c("2022-01-01", "2022-01-01", "2022-01-02")),
report_date = as.Date(c("2022-01-02", "2022-01-02", "2022-01-03")),
delete_date = as.Date(c("2022-01-04", NA, NA))
)
archive <- linelist_to_archive(
linelist,
geo_value = geo_value,
time_value = time_value,
version_recorded = report_date,
version_deleted = delete_date
)
#> ℹ Defaulting to col `event_id` as `id`.
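As a rough illustration of what "counts as they would have appeared at different points in time" means, the sketch below computes per-version snapshots by hand with dplyr for the same line list. The helper `snapshot_counts` is hypothetical and is not how the package computes the archive. It also assumes that an event deleted at version d is already absent as of version d; the package may treat that boundary differently.

```r
library(dplyr)

linelist <- tibble(
  event_id = 1:3,
  geo_value = c("ca", "ca", "ca"),
  time_value = as.Date(c("2022-01-01", "2022-01-01", "2022-01-02")),
  report_date = as.Date(c("2022-01-02", "2022-01-02", "2022-01-03")),
  delete_date = as.Date(c("2022-01-04", NA, NA))
)

# Hypothetical helper: counts per (geo_value, time_value) as known at `version`.
# An event is visible if it was recorded on or before `version` and has not
# yet been deleted (assumption: a deletion takes effect on its delete_date).
snapshot_counts <- function(x, version) {
  x %>%
    filter(
      report_date <= version,
      is.na(delete_date) | delete_date > version
    ) %>%
    count(geo_value, time_value, name = "count")
}

# As of 2022-01-02: two events for 2022-01-01.
snapshot_counts(linelist, as.Date("2022-01-02"))

# As of 2022-01-04: event 1 has been deleted, so one event remains for each day.
snapshot_counts(linelist, as.Date("2022-01-04"))
```

Comparing these hand-computed snapshots with the archive's view at the same versions is one way to check how the deletion boundary is actually handled.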