Computes correlations between two covidcast_signal data frames, allowing for slicing by geo location or by time. (Only the latest issue from each data frame is used for correlations.) See the correlations vignette for examples: vignette("correlation-utils", package = "covidcast").
The covidcast_signal data frames to correlate.
Time shifts (in days) to consider for x and y, respectively, before computing correlations. Default is 0. Negative shifts translate into a lag value and positive shifts into a lead value; for example, setting dt_y = 2 results in values of y being shifted earlier (leading) by 2 days before correlation, so values of x are correlated with values of y from two days later.
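The effect of a positive shift can be illustrated with a small base R sketch. This is not the package's implementation: the toy data and the helper shift_lead() are hypothetical, and the sketch only assumes that a lead of dt days amounts to relabelling each observation with a date dt days earlier before the two signals are matched on time_value.

```r
# Hypothetical toy data for a single location: y repeats x two days later.
dates <- as.Date("2020-06-01") + 0:9
x <- data.frame(time_value = dates,
                value = c(1, 3, 2, 5, 4, 6, 8, 7, 9, 10))
y <- data.frame(time_value = dates + 2, value = x$value)

# Hypothetical helper (not part of covidcast): a lead of `dt` days moves
# every observation `dt` days earlier; a negative `dt` gives a lag.
shift_lead <- function(df, dt) {
  df$time_value <- df$time_value - dt
  df
}

# Without a shift, x is compared with y on the same date.
unshifted <- merge(x, y, by = "time_value", suffixes = c("_x", "_y"))
cor_unshifted <- cor(unshifted$value_x, unshifted$value_y)

# With a lead of 2 days on y, x is compared with y from two days later.
shifted <- merge(x, shift_lead(y, 2), by = "time_value",
                 suffixes = c("_x", "_y"))
cor_shifted <- cor(shifted$value_x, shifted$value_y)
```

Because y was constructed as a two-day-delayed copy of x, the shifted correlation is perfect while the unshifted one is not.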
If "geo_value", then correlations are computed for each geo location, over all time. Each correlation is measured between two time series at the same location. If "time_value", then correlations are computed for each time, over all geo locations. Each correlation is measured between all locations at one time. Default is "geo_value".
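The two slicing modes can be sketched in base R as a split-and-correlate over the chosen column. The helper cor_by() and the toy values are hypothetical and are meant only to show the shape of the result, not how covidcast_cor() is implemented.

```r
# Hypothetical toy data: 3 locations observed on 4 dates, with the two
# signals already joined into columns x and y.
df <- expand.grid(geo_value = c("a", "b", "c"),
                  time_value = as.Date("2020-06-01") + 0:3,
                  stringsAsFactors = FALSE)
df$x <- c(1, 4, 2, 3, 1, 5, 2, 6, 3, 5, 2, 4)
df$y <- df$x + c(0.5, -0.5, 0.2, -0.3, 0.4, 0.1,
                 0.2, -0.1, 0.3, -0.4, 0.5, -0.2)

# Hypothetical helper (not part of covidcast): one correlation per
# distinct value of the `by` column.
cor_by <- function(df, by) {
  groups <- split(df, df[[by]])
  out <- data.frame(names(groups),
                    vapply(groups, function(g) cor(g$x, g$y), numeric(1)),
                    row.names = NULL, stringsAsFactors = FALSE)
  names(out) <- c(by, "value")  # note: group keys come back as character
  out
}

by_geo  <- cor_by(df, "geo_value")   # one row per location (3 rows)
by_time <- cor_by(df, "time_value")  # one row per date (4 rows)
```

With by = "geo_value" each correlation is taken over the 4 dates at one location; with by = "time_value" it is taken over the 3 locations on one date.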
Arguments to pass to cor(), with "na.or.complete" the default for use (different than cor()) and "pearson" the default for method (same as cor()).
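The difference between the two use defaults can be seen by calling stats::cor() directly on small vectors containing missing values. The vectors below are made up for illustration.

```r
# Made-up vectors with one missing value in a.
a <- c(1, 2, NA, 4)
b <- c(2, 4, 6, 8)

# cor()'s own default (use = "everything") propagates the NA.
cor_default <- cor(a, b)

# "na.or.complete" drops incomplete pairs and correlates the rest.
cor_complete <- cor(a, b, use = "na.or.complete")

# When no complete pairs remain, "na.or.complete" returns NA, whereas
# "complete.obs" would signal an error.
cor_empty <- cor(c(NA_real_, NA_real_), c(1, 2), use = "na.or.complete")
```

Here cor_default is NA, cor_complete is computed from the three complete pairs, and cor_empty is NA rather than an error. This is convenient when correlating many locations or dates, some of which may have no overlapping observations.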
A data frame with first column geo_value or time_value (matching by), and second column value, which gives the correlation.
if (FALSE) {
# For all these examples, let x and y be two signals measured at the county
# level over several months.
## `by = "geo_value"`
# Correlate each county's time series together, returning one correlation per
# county:
covidcast_cor(x, y, by = "geo_value")
# Correlate x in each county with values of y 14 days later
covidcast_cor(x, y, dt_y = 14, by = "geo_value")
# Equivalently, x can be shifted -14 days:
covidcast_cor(x, y, dt_x = -14, by = "geo_value")
## `by = "time_value"`
# For each date, correlate x's values in every county against y's values in
# the same counties. Returns one correlation per date:
covidcast_cor(x, y, by = "time_value")
# Correlate x values across counties against y values 7 days later
covidcast_cor(x, y, dt_y = 7, by = "time_value")
}