This section describes the internals of how compactification works in an
epi_archive(). Compactification can potentially improve code speed or
memory usage, depending on your data.
Details
In general, the last version of each observation is carried forward (LOCF) to
fill in data between recorded versions, and between the last recorded
update and the versions_end. One consequence is that the DT doesn't
have to contain a full snapshot of every version (although this generally
works), but can instead contain only the rows that are new or changed from
the previous version (see compactify, which does this automatically).
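The LOCF behavior described above can be illustrated with a minimal, language-agnostic sketch in Python. This is not the package's implementation or API: the row layout (geo, time, version, value) and the helpers compactify_rows and snapshot_as_of are hypothetical names chosen for illustration, and the sketch ignores NA handling. It shows that dropping rows whose value repeats the previous version of the same observation leaves every reconstructed snapshot unchanged.

```python
from typing import Dict, List, Tuple

# Hypothetical row layout for illustration: (geo, time, version, value).
Row = Tuple[str, int, int, float]
Key = Tuple[str, int]


def compactify_rows(rows: List[Row]) -> List[Row]:
    """Keep a row only if it is new or changed relative to the previous
    version of the same (geo, time) observation."""
    kept: List[Row] = []
    last_value: Dict[Key, float] = {}
    # Visit each observation's versions in increasing order.
    for geo, time, version, value in sorted(rows, key=lambda r: (r[0], r[1], r[2])):
        key = (geo, time)
        if key not in last_value or last_value[key] != value:
            kept.append((geo, time, version, value))
            last_value[key] = value
    return kept


def snapshot_as_of(rows: List[Row], version: int) -> Dict[Key, float]:
    """Reconstruct the data as of `version`: for each observation, take the
    latest recorded value with version <= `version` (LOCF)."""
    snapshot: Dict[Key, float] = {}
    for geo, time, v, value in sorted(rows, key=lambda r: r[2]):
        if v <= version:
            snapshot[(geo, time)] = value  # later versions overwrite earlier ones
    return snapshot


# Full snapshots for versions 1 to 3; two rows merely repeat the prior value.
full = [
    ("ca", 1, 1, 10.0),
    ("ca", 1, 2, 10.0),  # unchanged from version 1
    ("ca", 1, 3, 12.0),  # revised in version 3
    ("ca", 2, 2, 5.0),
    ("ca", 2, 3, 5.0),   # unchanged from version 2
]
compact = compactify_rows(full)
```

In this sketch, compact holds three rows instead of five, and snapshot_as_of gives the same result for full and compact at every version, including versions later than the last recorded update.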
Currently, deletions must be represented as revising a row to a special
state (e.g., making the entries NA or including a special column that
flags the data as removed and performing some kind of post-processing), and
the archive is unaware of what this state is. Note that NAs can be
introduced by epi_archive methods for other reasons, e.g., in
epix_fill_through_version and epix_merge, if requested, to
represent potential update data that we do not yet have access to; or in
epix_merge to represent the "value" of an observation before the
version in which it was first released, or if no version of that
observation appears in the archive data at all.
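One way to picture the "special column" approach to deletions is the following conceptual Python sketch. It is not the package's API: the deleted flag column, the row layout, and the helpers latest_as_of and drop_deleted are assumptions made for illustration. The archive-like step treats the flag as ordinary data and carries it forward, and a separate post-processing step is what actually removes flagged observations.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical row layout: (geo, time, version, value, deleted).
FlaggedRow = Tuple[str, int, int, Optional[float], bool]
ObsKey = Tuple[str, int]


def latest_as_of(
    rows: List[FlaggedRow], version: int
) -> Dict[ObsKey, Tuple[Optional[float], bool]]:
    """LOCF reconstruction that is unaware of what `deleted` means:
    the flag is carried forward like any other column."""
    latest: Dict[ObsKey, Tuple[Optional[float], bool]] = {}
    for geo, time, v, value, deleted in sorted(rows, key=lambda r: r[2]):
        if v <= version:
            latest[(geo, time)] = (value, deleted)
    return latest


def drop_deleted(
    snapshot: Dict[ObsKey, Tuple[Optional[float], bool]]
) -> Dict[ObsKey, Optional[float]]:
    """User-side post-processing: discard observations flagged as removed."""
    return {key: value for key, (value, deleted) in snapshot.items() if not deleted}


updates = [
    ("ca", 1, 1, 10.0, False),
    ("ca", 2, 1, 7.0, False),
    ("ca", 2, 2, None, True),  # version 2 retracts the observation at time 2
]

raw_v2 = latest_as_of(updates, 2)   # still contains the flagged row
clean_v1 = drop_deleted(latest_as_of(updates, 1))
clean_v2 = drop_deleted(raw_v2)     # flagged observation is gone
```

Because missing values can also arise for the other reasons listed above, an explicit flag like this keeps "deleted" distinguishable from "not yet available", whereas using NA alone for deletions would not.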