Computes correlations between two covidcast_signal data frames, either for each geo location (across time) or for each time (across geo locations). (Only the latest issue from each data frame is used for correlations.) See the correlations vignette for examples: vignette("correlation-utils", package = "covidcast").
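The note that only the latest issue is used can be illustrated in base R. This is a minimal sketch, not the package's implementation. It assumes each data frame has an `issue` column alongside `geo_value`, `time_value`, and `value`. The helper `latest_issue()` and the toy data are hypothetical.

```r
# Hypothetical toy data in the long format of a covidcast_signal data frame:
# one row per (geo_value, time_value, issue). Values are made up.
sig <- data.frame(
  geo_value  = c("01000", "01000", "01000", "02000"),
  time_value = as.Date(c("2020-06-01", "2020-06-01", "2020-06-02", "2020-06-01")),
  issue      = as.Date(c("2020-06-02", "2020-06-05", "2020-06-03", "2020-06-02")),
  value      = c(1.0, 1.5, 2.0, 3.0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the most recent issue of each
# (geo_value, time_value) pair.
latest_issue <- function(df) {
  # Sort so that, within each pair, the latest issue comes first
  df <- df[order(df$geo_value, df$time_value, -xtfrm(df$issue)), ]
  # duplicated() flags every row after the first one in each pair
  df <- df[!duplicated(df[, c("geo_value", "time_value")]), ]
  rownames(df) <- NULL
  df
}

latest_issue(sig)
# The row for 01000 on 2020-06-01 now carries value 1.5 (issued 2020-06-05)
```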

covidcast_cor(
  x,
  y,
  dt_x = 0,
  dt_y = 0,
  by = c("geo_value", "time_value"),
  use = "na.or.complete",
  method = c("pearson", "kendall", "spearman")
)

Arguments

x, y

The covidcast_signal data frames to correlate.

dt_x, dt_y

Time shifts (in days) to apply to x and y, respectively, before computing correlations. Default is 0. Negative shifts translate into a lag and positive shifts into a lead; for example, setting dt_y = 2 shifts the values of y earlier (leading) by 2 days before correlation, so values of x are correlated with values of y from two days later.
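The shift semantics can be checked with a small base-R sketch. It assumes a shift of dt days is equivalent to moving each time_value by -dt days and then joining x and y on geo_value and time_value; the helper `shift_signal()` is hypothetical and not part of the covidcast API. The sketch also shows why dt_y = 2 and dt_x = -2 produce the same pairs of values.

```r
# Hypothetical helper: a positive dt pulls later values earlier (a lead),
# a negative dt pushes earlier values later (a lag).
shift_signal <- function(df, dt) {
  df$time_value <- df$time_value - dt
  df
}

# Made-up daily values for one county
x <- data.frame(geo_value = "01000",
                time_value = as.Date("2020-06-01") + 0:4,
                value = c(10, 20, 30, 40, 50))
y <- data.frame(geo_value = "01000",
                time_value = as.Date("2020-06-01") + 0:4,
                value = c(1, 2, 3, 4, 5))

# dt_y = 2: y's value from June 3 is paired with x's value from June 1
pairs_dt_y <- merge(x, shift_signal(y, 2),
                    by = c("geo_value", "time_value"), suffixes = c(".x", ".y"))

# dt_x = -2: the same value pairs, just labelled with later dates
pairs_dt_x <- merge(shift_signal(x, -2), y,
                    by = c("geo_value", "time_value"), suffixes = c(".x", ".y"))

pairs_dt_y[, c("value.x", "value.y")]
pairs_dt_x[, c("value.x", "value.y")]
```

Only the dates attached to each pair differ between the two joins, so the resulting correlations are the same.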

by

If "geo_value", then correlations are computed for each geo location, over all times. Each correlation is measured between the two time series at that location. If "time_value", then correlations are computed for each time, over all geo locations. Each correlation is measured across all locations at that single time. Default is "geo_value".
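The two `by` modes amount to grouping the joined data on one key and correlating within each group. The following base-R sketch assumes x and y have already been reduced to their latest issue and joined into columns `value.x` and `value.y`; the helper `cor_by()` and the toy data are hypothetical, and the real function may be implemented differently.

```r
# Hypothetical helper: one correlation per distinct value of the `by` column
cor_by <- function(joined, by = c("geo_value", "time_value"),
                   use = "na.or.complete", method = "pearson") {
  by <- match.arg(by)
  keys <- sort(unique(joined[[by]]))
  vals <- vapply(seq_along(keys), function(i) {
    g <- joined[joined[[by]] == keys[i], ]   # rows sharing this key
    cor(g$value.x, g$value.y, use = use, method = method)
  }, numeric(1))
  out <- data.frame(key = keys, value = vals)
  names(out)[1] <- by                        # first column is named after `by`
  out
}

# Made-up joined data: two counties observed on three days
joined <- data.frame(
  geo_value  = rep(c("A", "B"), each = 3),
  time_value = rep(as.Date("2020-06-01") + 0:2, times = 2),
  value.x    = c(1, 2, 3, 10, 20, 30),
  value.y    = c(2, 4, 6, 30, 20, 10)
)

cor_by(joined, by = "geo_value")   # one row per county, correlated over time
cor_by(joined, by = "time_value")  # one row per date, correlated over counties
```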

use, method

Arguments to pass to cor(), with "na.or.complete" the default for use (different from cor(), whose default is "everything") and "pearson" the default for method (same as cor()).
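The practical effect of the different use default can be seen with stats::cor() alone. With use = "everything", a single missing value makes the result NA; with "na.or.complete", incomplete pairs are dropped first (and, per the cor() documentation, NA is returned rather than an error when no complete pairs remain). The vectors below are made up for illustration.

```r
a <- c(1, 2, NA, 4)
b <- c(2, 4, 6, 8)

# cor()'s own default, use = "everything": the NA propagates
r_everything <- cor(a, b)

# Drop the incomplete third pair, then correlate (1, 2, 4) with (2, 4, 8)
r_complete <- cor(a, b, use = "na.or.complete")

# Same rows, rank-based method
r_spearman <- cor(a, b, use = "na.or.complete", method = "spearman")

c(everything = r_everything, complete = r_complete, spearman = r_spearman)
```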

Value

A data frame whose first column is geo_value or time_value (matching the by argument) and whose second column, value, gives the correlation.

Examples

if (FALSE) {
# For all these examples, let x and y be two signals measured at the county
# level over several months.

## `by = "geo_value"`
# Correlate each county's time series together, returning one correlation per
# county:
covidcast_cor(x, y, by = "geo_value")

# Correlate x in each county with values of y 14 days later
covidcast_cor(x, y, dt_y = 14, by = "geo_value")

# Equivalently, x can be shifted -14 days:
covidcast_cor(x, y, dt_x = -14, by = "geo_value")

## `by = "time_value"`
# For each date, correlate x's values in every county against y's values in
# the same counties. Returns one correlation per date:
covidcast_cor(x, y, by = "time_value")

# Correlate x values across counties against y values 7 days later
covidcast_cor(x, y, dt_y = 7, by = "time_value")
}