Merges two epi_archives that share a common geo_value, time_value, and set of key columns. When they also share a common versions_end, using epix_as_of on the result should be the same as using epix_as_of on x and y individually, then performing a full join of the DTs on the non-version key columns (potentially consolidating multiple warnings about clobberable versions). If the versions_end values differ, the sync parameter controls what is done.
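The snapshot equivalence described above can be illustrated without the package. The following is a minimal base R sketch on a toy update table; snapshot_as_of is a hypothetical helper written only for illustration (it is not epix_as_of), and it ignores details such as clobberable versions and compactification.

```r
# Hypothetical stand-in for taking a snapshot: for each (geo_value, time_value),
# keep the most recent update row whose version is <= v, then drop `version`.
snapshot_as_of <- function(updates, v) {
  kept <- updates[updates$version <= v, , drop = FALSE]
  kept <- kept[order(kept$geo_value, kept$time_value, kept$version), , drop = FALSE]
  is_latest <- !duplicated(kept[c("geo_value", "time_value")], fromLast = TRUE)
  out <- kept[is_latest, setdiff(names(kept), "version"), drop = FALSE]
  rownames(out) <- NULL
  out
}

# Two toy update tables whose latest version is the same (2024-08-02).
x_updates <- data.frame(
  geo_value = "ca",
  time_value = as.Date(c("2024-08-01", "2024-08-01", "2024-08-02")),
  version = as.Date(c("2024-08-01", "2024-08-02", "2024-08-02")),
  signal1 = c(10, 11, 7)
)
y_updates <- data.frame(
  geo_value = "ca",
  time_value = as.Date(c("2024-08-01", "2024-08-02")),
  version = as.Date(c("2024-08-02", "2024-08-02")),
  signal2 = c(2, 3)
)

# Snapshot each input separately, then full-join on the non-version keys.
v <- as.Date("2024-08-02")
joined <- merge(
  snapshot_as_of(x_updates, v),
  snapshot_as_of(y_updates, v),
  by = c("geo_value", "time_value"),
  all = TRUE
)
print(joined)
```

Under the description above, a snapshot of the merged archive at the same version would be expected to contain these same rows and columns.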
Usage
epix_merge(
  x,
  y,
  sync = c("forbid", "na", "locf", "truncate"),
  compactify = TRUE
)
Arguments
- x, y
  Two epi_archive objects to join together.
- sync
  Optional; "forbid", "na", "locf", or "truncate"; determines what to do when x$versions_end doesn't match y$versions_end:
  - "forbid": emit an error;
  - "na": use max(x$versions_end, y$versions_end) as the result's versions_end, but ensure that, if we request a snapshot as of a version after min(x$versions_end, y$versions_end), the observation columns from the less up-to-date archive will be all NAs (i.e., imagine there was an update immediately after its versions_end which revised all observations to be NA);
  - "locf": use max(x$versions_end, y$versions_end) as the result's versions_end, allowing the last version of each observation to be carried forward to extrapolate unavailable versions for the less up-to-date input archive (i.e., imagining that the less up-to-date archive's data set remained unchanged between its actual versions_end and the other archive's versions_end); or
  - "truncate": use min(x$versions_end, y$versions_end) as the result's versions_end, and discard any update rows for later versions.
- compactify
  Optional; TRUE, FALSE, or NULL; should the result be compactified? See as_epi_archive() for an explanation of what this means. Default here is TRUE.
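How each sync option picks the result's versions_end can be sketched in base R. This is an illustration of the rules listed above, not the package's implementation; resolve_versions_end is a hypothetical helper, and it covers only the choice of versions_end, not the NA filling, carrying forward, or row discarding that accompany it.

```r
# Hypothetical helper: pick the merged versions_end from the two inputs'
# versions_end values according to `sync`, following the rules listed above.
resolve_versions_end <- function(x_end, y_end,
                                 sync = c("forbid", "na", "locf", "truncate")) {
  sync <- match.arg(sync)
  if (identical(x_end, y_end)) {
    return(x_end) # nothing to synchronize
  }
  switch(sync,
    forbid = stop("x and y have different versions_end values"),
    na = , # falls through: "na" and "locf" both use the later versions_end
    locf = max(x_end, y_end),
    truncate = min(x_end, y_end)
  )
}

x_end <- as.Date("2024-08-02")
y_end <- as.Date("2024-08-03")

end_locf <- resolve_versions_end(x_end, y_end, sync = "locf")
end_na <- resolve_versions_end(x_end, y_end, sync = "na")
end_truncate <- resolve_versions_end(x_end, y_end, sync = "truncate")
forbid_result <- tryCatch(
  resolve_versions_end(x_end, y_end, sync = "forbid"),
  error = function(e) e
)
```

With these inputs, "na" and "locf" both select 2024-08-03, "truncate" selects 2024-08-02, and "forbid" raises an error. "na" and "locf" differ only in how the less up-to-date archive's observations are filled for the later versions.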
Details
In all cases, clobberable_versions_start will be set to the earliest version that could be clobbered in either input archive.
Examples
library(epiprocess)

# Example 1
# The s1 signal at August 1st gets revised from 10 to 11 on August 2nd
s1 <- tibble::tibble(
  geo_value = c("ca", "ca", "ca"),
  time_value = as.Date(c("2024-08-01", "2024-08-01", "2024-08-02")),
  version = as.Date(c("2024-08-01", "2024-08-02", "2024-08-02")),
  signal1 = c(10, 11, 7)
)
s2 <- tibble::tibble(
  geo_value = c("ca", "ca"),
  time_value = as.Date(c("2024-08-01", "2024-08-02")),
  version = as.Date(c("2024-08-03", "2024-08-03")),
  signal2 = c(2, 3)
)
s1 <- as_epi_archive(s1)
s2 <- as_epi_archive(s2)
# s1 ends at version 2024-08-02, so its values are carried forward to 2024-08-03
merged <- epix_merge(s1, s2, sync = "locf")
merged[["DT"]]
#> Key: <geo_value, time_value, version>
#>    geo_value time_value    version signal1 signal2
#>       <char>     <Date>     <Date>   <num>   <num>
#> 1:        ca 2024-08-01 2024-08-01      10      NA
#> 2:        ca 2024-08-01 2024-08-02      11      NA
#> 3:        ca 2024-08-01 2024-08-03      11       2
#> 4:        ca 2024-08-02 2024-08-02       7      NA
#> 5:        ca 2024-08-02 2024-08-03       7       3
# Example 2
# The s1 signal at August 1st gets revised from 12 to 13 on August 3rd
s1 <- tibble::tibble(
  geo_value = c("ca", "ca", "ca", "ca"),
  time_value = as.Date(c("2024-08-01", "2024-08-01", "2024-08-02", "2024-08-03")),
  version = as.Date(c("2024-08-01", "2024-08-03", "2024-08-03", "2024-08-03")),
  signal1 = c(12, 13, 22, 19)
)
s2 <- tibble::tibble(
  geo_value = c("ca", "ca"),
  time_value = as.Date(c("2024-08-01", "2024-08-02")),
  version = as.Date(c("2024-08-02", "2024-08-02")),
  signal2 = c(4, 5)
)
s1 <- as_epi_archive(s1)
s2 <- as_epi_archive(s2)
merged <- epix_merge(s1, s2, sync = "locf")
merged[["DT"]]
#> Key: <geo_value, time_value, version>
#>    geo_value time_value    version signal1 signal2
#>       <char>     <Date>     <Date>   <num>   <num>
#> 1:        ca 2024-08-01 2024-08-01      12      NA
#> 2:        ca 2024-08-01 2024-08-02      12       4
#> 3:        ca 2024-08-01 2024-08-03      13       4
#> 4:        ca 2024-08-02 2024-08-02      NA       5
#> 5:        ca 2024-08-02 2024-08-03      22       5
#> 6:        ca 2024-08-03 2024-08-03      19      NA
# Example 3
s1 <- tibble::tibble(
  geo_value = c("ca", "ca", "ca"),
  time_value = as.Date(c("2024-08-01", "2024-08-02", "2024-08-03")),
  version = as.Date(c("2024-08-01", "2024-08-02", "2024-08-03")),
  signal1 = c(14, 11, 9)
)
# The s2 signal at August 1st gets revised from 3 to 5 on August 3rd
s2 <- tibble::tibble(
  geo_value = c("ca", "ca", "ca"),
  time_value = as.Date(c("2024-08-01", "2024-08-01", "2024-08-02")),
  version = as.Date(c("2024-08-02", "2024-08-03", "2024-08-03")),
  signal2 = c(3, 5, 2)
)
s1 <- as_epi_archive(s1)
s2 <- as_epi_archive(s2)
# Some LOCF for signal 1 as signal 2 gets updated
merged <- epix_merge(s1, s2, sync = "locf")
merged[["DT"]]
#> Key: <geo_value, time_value, version>
#>    geo_value time_value    version signal1 signal2
#>       <char>     <Date>     <Date>   <num>   <num>
#> 1:        ca 2024-08-01 2024-08-01      14      NA
#> 2:        ca 2024-08-01 2024-08-02      14       3
#> 3:        ca 2024-08-01 2024-08-03      14       5
#> 4:        ca 2024-08-02 2024-08-02      11      NA
#> 5:        ca 2024-08-02 2024-08-03      11       2
#> 6:        ca 2024-08-03 2024-08-03       9      NA