1 Overview
This package introduces a common data structure for epidemiological data sets measured over space and time, and offers associated utilities to perform basic signal processing tasks.
1.1 epi_df
: snapshot of a data set
The first main data structure in the epiprocess
package is called [epi_df
]. This is simply a tibble with a couple of required columns, geo_value
and time_value
. It can have any other number of columns, which can be seen as measured variables, which we also call signal variables. In brief, an epi_df
object represents a snapshot of a data set that contains the most up-to-date values of the signals variables, as of a given time.
By convention, functions in the epiprocess
package that operate on epi_df
objects begin with epi
. For example:
epi_slide()
, for iteratively applying a custom computation to a variable in anepi_df
object over sliding windows in time;epi_cor()
, for computing lagged correlations between variables in anepi_df
object, (allowing for grouping by geo value, time value, or any other variables).
Functions in the package that operate directly on given variables do not begin with epi
. For example:
growth_rate()
, for estimating the growth rate of a given signal at given time values, using various methodologies;detect_outlr()
, for detecting outliers in a given signal over time, using either built-in or custom methodologies.
1.2 epi_archive
: full version history of a data set
The second main data structure in the package is called [epi_archive
]. This is an S3 class containing a data table that stores the archive (version history) of some signal variables of interest.
By convention, functions in the {epiprocess}
package that operate on epi_archive
objects begin with epix
(the “x” is meant to remind you of “archive”). For example:
epix_as_of()
, for generating a snapshot inepi_df
format from the data archive, which represents the most up-to-date values of the signal variables, as of the specified version;epix_fill_through_version()
, for filling in some fake version data following simple rules, for use when downstream methods expect an archive that is more up-to-date (e.g., if it is a forecasting deadline date and one of our data sources cannot be accessed to provide the latest versions of its data)epix_merge()
, for merging two data archives with each other, with support for various approaches to handling when one of the archives is more up-to-date version-wise than the other;epix_slide()
, for sliding a custom computation to a data archive over local windows in time, much likeepi_slide
for anepi_df
object, but with one key difference: the sliding computation at any given reference time \(t\) is performed only on the data that would have been available as of \(t\).