Computes correlations between variables in an `epi_df` object, allowing for
grouping by geo value, time value, or any other variables. See the
correlation vignette for examples.

## Usage

```
epi_cor(
  x,
  var1,
  var2,
  dt1 = 0,
  dt2 = 0,
  shift_by = geo_value,
  cor_by = geo_value,
  use = "na.or.complete",
  method = c("pearson", "kendall", "spearman")
)
```

## Arguments

- x
  The `epi_df` object under consideration.

- var1, var2
  The variables in `x` to correlate.

- dt1, dt2
  Time shifts to consider for the two variables, respectively, before computing correlations. Negative shifts translate into a lag value and positive shifts into a lead value; for example, if `dt = -1`, then the new value on June 2 is the original value on June 1; if `dt = 1`, then the new value on June 2 is the original value on June 3; if `dt = 0`, then the values are left as is. Default is 0 for both `dt1` and `dt2`.

- shift_by
  The variable(s) to group by, for the time shifts. The default is `geo_value`. However, we could also use, for example, `shift_by = c(geo_value, age_group)`, assuming `x` has a column `age_group`, to perform time shifts per geo value and age group. To omit a grouping entirely, use `shift_by = NULL`. Note that the grouping here is always undone *before* the correlation computations.

- cor_by
  The variable(s) to group by, for the correlation computations. If `geo_value`, the default, then correlations are computed for each geo value, over all time; if `time_value`, then correlations are computed for each time, over all geo values. A grouping can also be specified using any number of columns of `x`; for example, we can use `cor_by = c(geo_value, age_group)`, assuming `x` has a column `age_group`, in order to compute correlations for each pair of geo value and age group. To omit a grouping entirely, use `cor_by = NULL`. Note that the grouping here is always done *after* the time shifts.

- use, method
  Arguments to pass to `cor()`, with "na.or.complete" the default for `use` (different from `cor()`) and "pearson" the default for `method` (same as `cor()`).
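The lag and lead convention for `dt1` and `dt2`, and the effect of the `use` default, can be sketched in base R. This is only an illustration under stated assumptions: `shift_values()` is a hypothetical helper written for this example, not a function from the package, and the numbers are made up.

```r
# Hypothetical helper (not part of the package): shift a vector by `dt`
# positions, so that the new value at position i is the original value at
# position i + dt. Positions that fall outside the series become NA.
shift_values <- function(v, dt) {
  n <- length(v)
  idx <- seq_len(n) + dt
  idx[idx < 1 | idx > n] <- NA
  v[idx]
}

# Made-up daily values for June 1 through June 5.
values <- c(10, 20, 15, 30, 25)

lagged <- shift_values(values, -1)  # June 2 now holds the June 1 value
led    <- shift_values(values, 1)   # June 2 now holds the June 3 value

# Shifting introduces NAs at the edges. With cor()'s own default
# (use = "everything") a single NA makes the result NA, whereas
# use = "na.or.complete" drops incomplete pairs first.
cor_everything <- cor(lagged, values)
cor_complete   <- cor(lagged, values, use = "na.or.complete")
```

Here `cor_everything` is `NA`, while `cor_complete` is computed from the four complete pairs, which is one reason a default of "na.or.complete" is convenient when shifts are applied.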

## Value

A tibble with the grouping columns first (`geo_value`, `time_value`,
or possibly others), and then a column `cor`, which gives the correlation.
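To make the shape of the result concrete, the following base R sketch mimics the shift-then-correlate order described under Arguments on a toy data frame. It is not the package's implementation: the data, the `lag_one()` helper, and the column names `cases` and `deaths` are assumptions made for this example, and the result is a plain data frame rather than a tibble.

```r
# Toy data (made-up numbers): two locations, five days each. In both
# locations `deaths` repeats `cases` from one day earlier.
toy <- data.frame(
  geo_value  = rep(c("aa", "bb"), each = 5),
  time_value = rep(as.Date("2020-06-01") + 0:4, times = 2),
  cases      = c(1, 3, 2, 5, 4, 10, 8, 9, 6, 7),
  deaths     = c(2, 1, 3, 2, 5, 1, 10, 8, 9, 6)
)

# Step 1: lag `cases` by one day within each geo_value. This plays the role
# of dt1 = -1 with shift_by = geo_value. Rows are already ordered by time.
lag_one <- function(v) c(NA, v[-length(v)])
toy$cases_shifted <- ave(toy$cases, toy$geo_value, FUN = lag_one)

# Step 2: the shift grouping is no longer needed. Regroup by the cor_by
# variable (here geo_value) and compute one correlation per group.
groups <- split(toy, toy$geo_value)
result <- data.frame(
  geo_value = names(groups),
  cor = vapply(
    groups,
    function(g) cor(g$cases_shifted, g$deaths, use = "na.or.complete"),
    numeric(1)
  ),
  row.names = NULL
)
result
```

The result has one row per group, with the grouping column first and a `cor` column after it. Because the toy deaths trail cases by exactly one day, both lagged correlations equal 1 up to floating-point rounding.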

## Examples

```
# linear association of case and death rates on any given day
epi_cor(
  x = jhu_csse_daily_subset,
  var1 = case_rate_7d_av,
  var2 = death_rate_7d_av,
  cor_by = "time_value"
)
#> Warning: There were 3 warnings in `dplyr::summarize()`.
#> The first warning was:
#> ℹ In argument: `cor = cor(x = .data$var1, y = .data$var2, use = use, method =
#>   method)`.
#> ℹ In group 1: `time_value = 2020-03-01`.
#> Caused by warning in `cor()`:
#> ! the standard deviation is zero
#> ℹ Run `dplyr::last_dplyr_warnings()` to see the 2 remaining warnings.
#> # A tibble: 671 × 2
#>    time_value    cor
#>    <date>      <dbl>
#>  1 2020-03-01 NA
#>  2 2020-03-02 NA
#>  3 2020-03-03 NA
#>  4 2020-03-04  0.746
#>  5 2020-03-05  0.549
#>  6 2020-03-06  0.692
#>  7 2020-03-07  0.277
#>  8 2020-03-08 -0.226
#>  9 2020-03-09 -0.195
#> 10 2020-03-10 -0.227
#> # ℹ 661 more rows
# correlation of death rates and lagged case rates
epi_cor(
  x = jhu_csse_daily_subset,
  var1 = case_rate_7d_av,
  var2 = death_rate_7d_av,
  cor_by = time_value,
  dt1 = -2
)
#> Warning: There was 1 warning in `dplyr::summarize()`.
#> ℹ In argument: `cor = cor(x = .data$var1, y = .data$var2, use = use, method =
#>   method)`.
#> ℹ In group 3: `time_value = 2020-03-03`.
#> Caused by warning in `cor()`:
#> ! the standard deviation is zero
#> # A tibble: 671 × 2
#>    time_value    cor
#>    <date>      <dbl>
#>  1 2020-03-01 NA
#>  2 2020-03-02 NA
#>  3 2020-03-03 NA
#>  4 2020-03-04  0.989
#>  5 2020-03-05  0.907
#>  6 2020-03-06  0.746
#>  7 2020-03-07  0.549
#>  8 2020-03-08 -0.158
#>  9 2020-03-09 -0.126
#> 10 2020-03-10 -0.163
#> # ℹ 661 more rows
# correlation grouped by location
epi_cor(
  x = jhu_csse_daily_subset,
  var1 = case_rate_7d_av,
  var2 = death_rate_7d_av,
  cor_by = geo_value
)
#> # A tibble: 6 × 2
#>   geo_value   cor
#>   <chr>     <dbl>
#> 1 ca        0.573
#> 2 fl        0.488
#> 3 ga        0.465
#> 4 ny        0.285
#> 5 pa        0.708
#> 6 tx        0.750
# correlation grouped by location, using lagged case rates
epi_cor(
  x = jhu_csse_daily_subset,
  var1 = case_rate_7d_av,
  var2 = death_rate_7d_av,
  cor_by = geo_value,
  dt1 = -2
)
#> # A tibble: 6 × 2
#>   geo_value   cor
#>   <chr>     <dbl>
#> 1 ca        0.618
#> 2 fl        0.576
#> 3 ga        0.525
#> 4 ny        0.337
#> 5 pa        0.734
#> 6 tx        0.784
```
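The `NA` rows and the "the standard deviation is zero" warnings in the `time_value` examples come from `cor()` itself, which cannot compute a correlation when one input is constant within a group. The base R sketch below reproduces that behavior on made-up vectors. That the early dates in `jhu_csse_daily_subset` have a constant variable across the six locations is an assumption inferred from the output above, not something checked here.

```r
# Made-up values for six locations on a single day: one variable is constant.
constant <- c(0, 0, 0, 0, 0, 0)
varying  <- c(1, 4, 2, 6, 3, 5)

# Capture the warning text instead of letting it print, then carry on.
warning_text <- NULL
res <- withCallingHandlers(
  cor(constant, varying),
  warning = function(w) {
    warning_text <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

is.na(res)     # TRUE: the correlation is undefined for a constant input
warning_text   # the message raised by cor()
```

Such rows can be dropped afterwards if they are not of interest, for example by filtering on `!is.na(cor)`.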