Estimating latent infections

A retrospective


Ryan J. Tibshirani

Daniel J. McDonald, Rachel Lobay, and CMU’s Delphi Group

CDC Flu – 18 December 2023

Goal

Use reported cases to estimate actual infections

  • For every US state
  • From March 1, 2020 to January 1, 2022
  • Provide an “authoritative” estimate with uncertainty
  • No compartmental models, no sampling or Bayesian methods

Retrospective deconvolution

  • Based on prior work (Jahja et al. 2021)
  • Take reported cases and deconvolve them to find when symptoms began
  • Private CDC linelist to estimate the delay from symptom onset to case report
    • Different delay distribution for every report date and state
  • Combine with literature estimates of the delay from infection to symptom onset
    • Variant specific
    • Prevailing variant mix taken from GISAID
  • Convolve both to get delay distribution from infection to case report
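
The last step can be sketched in a few lines, assuming both delays are discretized to whole days and are independent. The function name `convolve_pmf` and the numbers below are hypothetical placeholders, not values from the linelist or the literature.

```python
def convolve_pmf(p, q):
    """PMF of the sum of two independent delays; list index = delay in days."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out

# Hypothetical illustrative values only.
incubation = [0.0, 0.2, 0.5, 0.3]   # P(infection -> onset takes k days), k = 0..3
onset_to_report = [0.1, 0.6, 0.3]   # P(onset -> report takes k days), k = 0..2

# Delay from infection to case report, k = 0..5 days.
total_delay = convolve_pmf(incubation, onset_to_report)
```

In the setting above this would be repeated for every report date and state, since both input distributions change over time and location.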

Empirical delay distributions

  • Day- and state-specific, using the private CDC linelist
  • Method of moments to fit a gamma density
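
A minimal sketch of a method-of-moments gamma fit: match the sample mean \(m\) and variance \(v\) of the observed delays, giving shape \(m^2/v\) and scale \(v/m\). The function name `gamma_mom` and the sample delays are hypothetical.

```python
from statistics import mean, variance

def gamma_mom(delays):
    """Method-of-moments gamma fit: returns (shape, scale)."""
    m = mean(delays)
    v = variance(delays)  # sample variance (n - 1 denominator)
    return m * m / v, v / m

# Hypothetical onset-to-report delays in days for one state and report date.
delays = [2.0, 4.0, 4.0, 6.0]
shape, scale = gamma_mom(delays)
```

By construction the fitted gamma has mean `shape * scale` equal to the sample mean and variance `shape * scale**2` equal to the sample variance.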

Cases are selectively reported to CDC

  • CDC linelist with both onset and report date
  • Shrink each state's parameters proportionally toward the national estimates (each day)
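
The slides do not state the shrinkage weight. One common choice, assumed here purely for illustration, is a weighted average whose weight on the state estimate grows with the number of linelist records behind it. The function name and the constant `n0` are hypothetical.

```python
def shrink_toward_national(state_value, national_value, n_state, n0=50.0):
    """Weighted average of a state parameter and the national parameter.

    Assumption (not from the slides): weight w = n_state / (n_state + n0),
    where n0 is a hypothetical tuning constant.
    """
    w = n_state / (n_state + n0)
    return w * state_value + (1.0 - w) * national_value

# Hypothetical gamma shape parameters for one day.
no_data = shrink_toward_national(10.0, 4.0, n_state=0)    # falls back to national
some_data = shrink_toward_national(10.0, 4.0, n_state=50) # halfway in between
```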

Variant mix over time - from GISAID

Incubation period

  • Literature estimates for each variant
  • Distribution is a mixture weighted by the day- and state-specific variant mix
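
One way to read the variant weighting, sketched here as an assumption, is a mixture: the incubation PMF for a given day and state is the average of the variant-specific PMFs, weighted by that day's variant proportions. The variant names and all numbers are hypothetical.

```python
def mix_pmfs(pmfs, weights):
    """Mixture of discrete PMFs (dict: name -> list), weighted by proportions."""
    length = max(len(p) for p in pmfs.values())
    out = [0.0] * length
    for name, w in weights.items():
        for k, p_k in enumerate(pmfs[name]):
            out[k] += w * p_k
    return out

# Hypothetical incubation PMFs (index = days) and variant proportions.
pmfs = {"variant_a": [0.0, 0.5, 0.5], "variant_b": [0.0, 1.0]}
weights = {"variant_a": 0.25, "variant_b": 0.75}

mixed = mix_pmfs(pmfs, weights)
```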

Total delay distribution – Infection to case report

From incident cases to incident infection onset

  • Move the date back using the convolved distribution
\[\begin{aligned} \mathop{\mathrm{minimize}}_{x}\ \sum_{t = 1}^n \left ( y_t - \sum_{k = 1}^{d} \hat{p}_t(k)x_{t-k} \right )^2 + \lambda \|D^{(4)}x\|_1. \end{aligned}\]
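  • \(y_t\) is reported cases on day \(t\); \(\hat{p}_t(k)\) is the estimated probability of a \(k\)-day delay from infection to report
  • \(D^{(4)}\) is a 4th-order difference matrix.
  • The minimizer \(\hat{x}_t\) is a smooth estimate of the “deconvolved cases”, denoted \(C_t\) below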
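
To make the objective concrete, the sketch below evaluates it for a candidate \(x\) in plain Python. It does not solve the problem, which would need a convex solver. Two assumptions are made for illustration: terms with \(t-k\) before the start of the series are dropped, and the delay PMF is the same on every day. All names and values are hypothetical.

```python
def forward_convolve(x, p):
    """Fitted reports: sum_{k=1}^{d} p[t][k-1] * x[t-k], dropping terms with t-k < 0."""
    out = []
    for t in range(len(x)):
        total = 0.0
        for k, p_k in enumerate(p[t], start=1):
            if t - k >= 0:
                total += p_k * x[t - k]
        out.append(total)
    return out

def difference(x, order):
    """Apply the first-difference operator `order` times."""
    for _ in range(order):
        x = [b - a for a, b in zip(x, x[1:])]
    return x

def objective(x, y, p, lam):
    """Squared reconstruction error plus lam * L1 norm of 4th differences of x."""
    fitted = forward_convolve(x, p)
    loss = sum((y_t - f_t) ** 2 for y_t, f_t in zip(y, fitted))
    penalty = sum(abs(v) for v in difference(x, 4))
    return loss + lam * penalty

n = 10
p = [[0.5, 0.5]] * n                        # hypothetical delay PMF, d = 2
x_cubic = [float(t ** 3) for t in range(n)]
y = forward_convolve(x_cubic, p)            # noiseless reports from x_cubic

x_bumpy = list(x_cubic)
x_bumpy[5] += 1.0                           # a single non-smooth bump

obj_cubic = objective(x_cubic, y, p, lam=10.0)
obj_bumpy = objective(x_bumpy, y, p, lam=10.0)
```

A cubic polynomial has zero 4th differences, so `x_cubic` incurs no penalty, while the bump in `x_bumpy` is charged by both terms. This is why the penalty favors piecewise-cubic, smooth solutions.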

Deconvolve cases by their delay distribution

From deconvolved cases to circulating infections

  • “Leaky immunity” model

\[I_w = (1-\gamma)I_{w-1} + a_w z_w \sum_{t = w-1}^w C_{t} + \epsilon_w, \quad \epsilon_w \sim \mathrm{N}(0, \sigma^2_\epsilon) \]

  • \(I_w\) is population immunity in week \(w\)
  • \(\gamma\) is the fraction of the immune population that loses immunity between weeks \((w-1)\) and \(w\)
  • \(a_w\) is the inverse reporting ratio
  • \(\sum_{t=w-1}^w C_{t}\) is total deconvolved cases in past week
  • \(z_w\) is the estimated fraction of infections that are first infections (from the literature)
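
The recursion can be sketched with the noise term \(\epsilon_w\) omitted, so it shows only the deterministic part. Here `weekly_cases[w]` stands for \(\sum_t C_t\) in week \(w\), and all names and numbers are hypothetical, chosen for easy arithmetic.

```python
def leaky_immunity(I0, gamma, a, z, weekly_cases):
    """Iterate I_w = (1 - gamma) * I_{w-1} + a_w * z_w * (weekly deconvolved cases)."""
    I = [I0]
    for a_w, z_w, c_w in zip(a, z, weekly_cases):
        I.append((1.0 - gamma) * I[-1] + a_w * z_w * c_w)
    return I

# Hypothetical inputs; gamma is far larger than a realistic value.
trajectory = leaky_immunity(
    I0=100.0,
    gamma=0.5,
    a=[2.0, 2.0],          # inverse reporting ratios
    z=[1.0, 0.5],          # fraction that are first infections
    weekly_cases=[10.0, 10.0],
)
```

Each week a share \(\gamma\) of existing immunity "leaks" away, and new immunity is added in proportion to deconvolved cases scaled up by \(a_w\) and down by \(z_w\).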

Serology data

  • Two sources, noisy realizations of \(I_w\)
  • Lots of missingness

State space model

\[\begin{aligned} s_{w,j} &= I_w + \eta_{w,j}, \quad \eta_{w,j} \sim \mathrm{N}(0, u_{w,j}\sigma_{j}), \quad j=1,2\\ I_w &= (1-\gamma)I_{w-1} + a_w z_w \sum_{t = w-1}^w C_{t} + \epsilon_w \end{aligned}\]
  • Estimate \(\gamma\), \(a_w\), \(I_w\) and noise variances using a state space model
  • Use Kalman filter / smoother, maximize the likelihood
  • Handles missingness automatically
  • Imposes smoothness on \(a_w\) (like a spline)
  • Also gives variance estimates for \(a_w\)
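
A simplified sketch of why missing serology is handled automatically: in the Kalman filter, a week with no measurement runs only the prediction step. This version makes strong assumptions that the slides do not: \(\gamma\), the input \(a_w z_w \sum_t C_t\), and the noise variances are treated as known, whereas the slides estimate them by maximum likelihood. The scaling \(u_{w,j}\) is omitted, and the two sources are assumed to have independent errors. All names and numbers are hypothetical.

```python
def kalman_filter(obs, inputs, gamma, q, r, m0, P0):
    """Scalar Kalman filter for I_w with up to two observations per week.

    obs:    list over weeks of [s_1, s_2]; None marks a missing measurement
    inputs: list over weeks of the known input a_w * z_w * (weekly cases)
    q:      state noise variance; r: observation noise variance per source
    """
    m, P = m0, P0
    means, variances = [], []
    for s_w, u_w in zip(obs, inputs):
        # Predict one week ahead.
        m = (1.0 - gamma) * m + u_w
        P = (1.0 - gamma) ** 2 * P + q
        # Update with each available source in turn; skip missing ones.
        for s, r_j in zip(s_w, r):
            if s is None:
                continue
            K = P / (P + r_j)
            m = m + K * (s - m)
            P = (1.0 - K) * P
        means.append(m)
        variances.append(P)
    return means, variances

# Hypothetical data: week 1 has no serology, week 2 has one source.
means, variances = kalman_filter(
    obs=[[None, None], [4.0, None]],
    inputs=[0.0, 0.0],
    gamma=0.0, q=1.0, r=[1.0, 1.0], m0=0.0, P0=1.0,
)
```

The variance grows in the week without data and shrinks once a measurement arrives, which is the mechanism behind wider uncertainty wherever serology is sparse.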

Estimated population immunity

  • \(\gamma\) estimated to be 0.8% per week

Estimated inverse reporting ratios

Estimated latent infections

Pre-omicron

Omicron (more uncertain, due to serology)

Validation and usefulness

  • Compare to other public estimates
  • Small exercise estimating Infection-Hospitalization Ratio

Thanks:

  • The whole CMU Delphi Team (across many institutions)
  • Optum/UnitedHealthcare, Change Healthcare
  • Google, Facebook, Amazon Web Services
  • Quidel, SafeGraph, Qualtrics
  • Centers for Disease Control and Prevention
  • Council of State and Territorial Epidemiologists