Merges two epi_archives that share a common geo_value, time_value, and set of key columns. When they also share a common versions_end, calling epix_as_of on the result should give the same snapshot as calling epix_as_of on x and y individually and then performing a full join of the DTs on the non-version key columns (potentially consolidating multiple warnings about clobberable versions). If the versions_end values differ, the sync parameter controls how the mismatch is handled.

Usage

epix_merge(
  x,
  y,
  sync = c("forbid", "na", "locf", "truncate"),
  compactify = TRUE
)

Arguments

x, y

Two epi_archive objects to join together.

sync

Optional; "forbid", "na", "locf", or "truncate". Determines what happens when x$versions_end doesn't match y$versions_end:

"forbid": emit an error.

"na": use max(x$versions_end, y$versions_end) as the result's versions_end, but ensure that any snapshot requested as of a version after min(x$versions_end, y$versions_end) has all NAs in the observation columns from the less up-to-date archive (i.e., imagine there was an update immediately after its versions_end that revised all observations to NA).

"locf": use max(x$versions_end, y$versions_end) as the result's versions_end, carrying the last version of each observation forward to extrapolate the unavailable versions of the less up-to-date input archive (i.e., imagine that the less up-to-date archive's data set remained unchanged between its actual versions_end and the other archive's versions_end).

"truncate": use min(x$versions_end, y$versions_end) as the result's versions_end, and discard any update rows for later versions.

compactify

Optional; TRUE, FALSE, or NULL; should the result be compactified? See as_epi_archive() for an explanation of what this means. Default here is TRUE.
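The sync options can be compared side by side on a pair of archives whose versions_end values differ. The following is a minimal sketch rather than an authoritative demonstration: the toy archives x and y and their signal values are made up for illustration, and it assumes that the error raised under "forbid" can be caught with tryCatch().

```r
library(epiprocess)

# Hypothetical toy archives: x has updates through 2024-08-03,
# y only through 2024-08-02, so their versions_end values differ.
x <- as_epi_archive(tibble::tibble(
  geo_value = "ca",
  time_value = as.Date(c("2024-08-01", "2024-08-02")),
  version = as.Date(c("2024-08-02", "2024-08-03")),
  signal1 = c(1, 2)
))
y <- as_epi_archive(tibble::tibble(
  geo_value = "ca",
  time_value = as.Date("2024-08-01"),
  version = as.Date("2024-08-02"),
  signal2 = 10
))

# The default, "forbid", should refuse to merge these archives.
forbid_result <- tryCatch(epix_merge(x, y), error = function(e) "error")

# Record the versions_end chosen by each of the other sync options.
sync_options <- c("na", "locf", "truncate")
ends <- lapply(sync_options, function(s) epix_merge(x, y, sync = s)$versions_end)
names(ends) <- sync_options
ends
```

Per the descriptions above, "na" and "locf" should report the later versions_end and "truncate" the earlier one; the difference between "na" and "locf" only shows up in the signal2 values of snapshots taken after y's original versions_end.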

Value

The resulting epi_archive.

Details

In all cases, clobberable_versions_start will be set to the earliest version that could be clobbered in either input archive.
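The snapshot equivalence stated in the description (epix_as_of on the merged archive versus a full join of the individual snapshots) can be spot-checked when both inputs share a versions_end. This is a sketch under assumptions: the toy data and the helper name check_version are hypothetical, dplyr is assumed to be installed, and tibble::as_tibble() is used only to compare plain column values rather than epi_df metadata.

```r
library(epiprocess)

# Hypothetical toy archives that share versions_end = 2024-08-03.
x <- as_epi_archive(tibble::tibble(
  geo_value = "ca",
  time_value = as.Date(c("2024-08-01", "2024-08-01", "2024-08-02")),
  version = as.Date(c("2024-08-01", "2024-08-03", "2024-08-03")),
  signal1 = c(12, 13, 22)
))
y <- as_epi_archive(tibble::tibble(
  geo_value = "ca",
  time_value = as.Date(c("2024-08-01", "2024-08-02")),
  version = as.Date(c("2024-08-02", "2024-08-03")),
  signal2 = c(4, 5)
))

# The default sync = "forbid" is fine here because versions_end matches.
merged <- epix_merge(x, y)

# Hypothetical helper: compare the merged snapshot with the joined snapshots.
check_version <- function(v) {
  via_merge <- dplyr::arrange(tibble::as_tibble(epix_as_of(merged, v)), geo_value, time_value)
  via_join <- dplyr::arrange(
    dplyr::full_join(
      tibble::as_tibble(epix_as_of(x, v)),
      tibble::as_tibble(epix_as_of(y, v)),
      by = c("geo_value", "time_value")
    ),
    geo_value, time_value
  )
  isTRUE(all.equal(via_merge$signal1, via_join$signal1)) &&
    isTRUE(all.equal(via_merge$signal2, via_join$signal2))
}

snapshots_agree <- vapply(
  as.Date(c("2024-08-02", "2024-08-03")), check_version, logical(1)
)
snapshots_agree

# The merged archive's clobberable_versions_start can be inspected directly.
merged$clobberable_versions_start
```

Neither input sets clobberable_versions_start here, so no clobberable-version warnings are expected from epix_as_of in this sketch.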

Examples

# Example 1
# The s1 signal at August 1st gets revised from 10 to 11 on August 2nd
s1 <- tibble::tibble(
  geo_value = c("ca", "ca", "ca"),
  time_value = as.Date(c("2024-08-01", "2024-08-01", "2024-08-02")),
  version = as.Date(c("2024-08-01", "2024-08-02", "2024-08-02")),
  signal1 = c(10, 11, 7)
)

s2 <- tibble::tibble(
  geo_value = c("ca", "ca"),
  time_value = as.Date(c("2024-08-01", "2024-08-02")),
  version = as.Date(c("2024-08-03", "2024-08-03")),
  signal2 = c(2, 3)
)


s1 <- s1 %>% as_epi_archive()
s2 <- s2 %>% as_epi_archive()

merged <- epix_merge(s1, s2, sync = "locf")
merged[["DT"]]
#> Key: <geo_value, time_value, version>
#>    geo_value time_value    version signal1 signal2
#>       <char>     <Date>     <Date>   <num>   <num>
#> 1:        ca 2024-08-01 2024-08-01      10      NA
#> 2:        ca 2024-08-01 2024-08-02      11      NA
#> 3:        ca 2024-08-01 2024-08-03      11       2
#> 4:        ca 2024-08-02 2024-08-02       7      NA
#> 5:        ca 2024-08-02 2024-08-03       7       3

# Example 2
# The s1 signal at August 1st gets revised from 12 to 13 on August 3rd
s1 <- tibble::tibble(
  geo_value = c("ca", "ca", "ca", "ca"),
  time_value = as.Date(c("2024-08-01", "2024-08-01", "2024-08-02", "2024-08-03")),
  version = as.Date(c("2024-08-01", "2024-08-03", "2024-08-03", "2024-08-03")),
  signal1 = c(12, 13, 22, 19)
)

s2 <- tibble::tibble(
  geo_value = c("ca", "ca"),
  time_value = as.Date(c("2024-08-01", "2024-08-02")),
  version = as.Date(c("2024-08-02", "2024-08-02")),
  signal2 = c(4, 5)
)


s1 <- s1 %>% as_epi_archive()
s2 <- s2 %>% as_epi_archive()

merged <- epix_merge(s1, s2, sync = "locf")
merged[["DT"]]
#> Key: <geo_value, time_value, version>
#>    geo_value time_value    version signal1 signal2
#>       <char>     <Date>     <Date>   <num>   <num>
#> 1:        ca 2024-08-01 2024-08-01      12      NA
#> 2:        ca 2024-08-01 2024-08-02      12       4
#> 3:        ca 2024-08-01 2024-08-03      13       4
#> 4:        ca 2024-08-02 2024-08-02      NA       5
#> 5:        ca 2024-08-02 2024-08-03      22       5
#> 6:        ca 2024-08-03 2024-08-03      19      NA


# Example 3
s1 <- tibble::tibble(
  geo_value = c("ca", "ca", "ca"),
  time_value = as.Date(c("2024-08-01", "2024-08-02", "2024-08-03")),
  version = as.Date(c("2024-08-01", "2024-08-02", "2024-08-03")),
  signal1 = c(14, 11, 9)
)

# The s2 signal at August 1st gets revised from 3 to 5 on August 3rd
s2 <- tibble::tibble(
  geo_value = c("ca", "ca", "ca"),
  time_value = as.Date(c("2024-08-01", "2024-08-01", "2024-08-02")),
  version = as.Date(c("2024-08-02", "2024-08-03", "2024-08-03")),
  signal2 = c(3, 5, 2)
)

s1 <- s1 %>% as_epi_archive()
s2 <- s2 %>% as_epi_archive()

# Some LOCF for signal 1 as signal 2 gets updated
merged <- epix_merge(s1, s2, sync = "locf")
merged[["DT"]]
#> Key: <geo_value, time_value, version>
#>    geo_value time_value    version signal1 signal2
#>       <char>     <Date>     <Date>   <num>   <num>
#> 1:        ca 2024-08-01 2024-08-01      14      NA
#> 2:        ca 2024-08-01 2024-08-02      14       3
#> 3:        ca 2024-08-01 2024-08-03      14       5
#> 4:        ca 2024-08-02 2024-08-02      11      NA
#> 5:        ca 2024-08-02 2024-08-03      11       2
#> 6:        ca 2024-08-03 2024-08-03       9      NA