
This section describes how compactification works internally in an epi_archive(). Depending on your data, compactification can improve code speed or reduce memory usage.

Details

In general, the last version of each observation is carried forward (LOCF) to fill in data between recorded versions, and between the last recorded update and versions_end. One consequence is that the DT doesn't have to contain a full snapshot of every version (although this generally works); it can instead contain only the rows that are new or changed relative to the previous version (see compactify, which does this automatically).

Currently, deletions must be represented as revising a row to a special state (e.g., making the entries NA, or including a special column that flags the data as removed and performing some kind of post-processing); the archive is unaware of what this state is.

Note that NAs can be introduced by epi_archive methods for other reasons: in epix_fill_through_version and epix_merge, if requested, to represent potential update data that we do not yet have access to; or in epix_merge, to represent the "value" of an observation before the version in which it was first released, or when no version of that observation appears in the archive data at all.
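The LOCF idea described above can be illustrated with a small, language-agnostic sketch. It is written in Python purely for illustration and is not the epiprocess implementation: the row layout `(geo, time, version, value)`, the function names `compactify_rows` and `snapshot_as_of`, and the use of exact equality to detect unchanged values are all assumptions made for this example. `None` stands in for NA, here used to mark a deletion.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical row layout for this sketch: (geo, time, version, value).
Row = Tuple[str, str, int, Optional[float]]
ObsKey = Tuple[str, str]


def compactify_rows(rows: List[Row]) -> List[Row]:
    """Keep only rows that are new or changed relative to the previous
    version of the same (geo, time) observation."""
    kept: List[Row] = []
    last_value: Dict[ObsKey, Optional[float]] = {}
    # Sort by observation key, then version, so updates are seen in order.
    for geo, time, version, value in sorted(rows, key=lambda r: (r[0], r[1], r[2])):
        key = (geo, time)
        if key in last_value and last_value[key] == value:
            continue  # LOCF would reproduce this row, so it is redundant.
        last_value[key] = value
        kept.append((geo, time, version, value))
    return kept


def snapshot_as_of(rows: List[Row], version: int) -> Dict[ObsKey, Optional[float]]:
    """Reconstruct a snapshot by carrying forward the latest update
    with version <= the requested version."""
    latest: Dict[ObsKey, Optional[float]] = {}
    for geo, time, ver, value in sorted(rows, key=lambda r: r[2]):
        if ver <= version:
            latest[(geo, time)] = value
    return latest


# Full snapshots of every version, including unchanged repeats.
full: List[Row] = [
    ("ca", "2020-06-01", 1, 10.0),
    ("ca", "2020-06-01", 2, 10.0),  # unchanged: redundant under LOCF
    ("ca", "2020-06-01", 3, 12.0),  # revised
    ("ca", "2020-06-02", 2, 7.0),
    ("ca", "2020-06-02", 3, 7.0),   # unchanged: redundant under LOCF
    ("ca", "2020-06-02", 4, None),  # deletion represented as NA
]

compact = compactify_rows(full)
```

In this sketch, `compact` holds four rows instead of six, yet `snapshot_as_of(compact, v)` matches `snapshot_as_of(full, v)` for every version `v`. Note that the deletion survives compactification only because it is encoded as a change of value; the sketch, like the archive, attaches no special meaning to that state.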