- Source name: ght
- Number of data revisions since 19 May 2020: 0
- Date of last change: Never
- Available for: dma, hrr, msa, state (see geography coding docs)
This data source (ght) is based on Google searches, provided to us by Google
Health Trends. Using this search data, we estimate the volume of COVID-related
searches in a given location, on a given day. This signal is measured in
arbitrary units (its scale is meaningless); larger numbers represent higher
numbers of COVID-related searches.
||Google search volume for COVID-related searches, in arbitrary units that are normalized for population|
||Google search volume for COVID-related searches, in arbitrary units that are normalized for population, smoothed in time as described below|
We query the Google Health Trends API for overall searcher interest in a set of COVID-19-related terms covering the following topics: coronavirus symptoms, coronavirus help, coronavirus test-seeking, and anosmia (lack of smell or taste). The API provides data at the Nielsen Designated Marketing Area (DMA) level and at the state level. The information reported by the API is unitless and pre-normalized for population size; for example, the time series obtained for New York and Wyoming are directly comparable. The public has access to a limited view of this information through Google Trends.
DMA-level data are aggregated to the MSA and HRR level through population-weighted averaging.
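As a rough illustration of population-weighted averaging, the sketch below combines DMA-level values into a single value for a larger region. The function name and the choice of weights (the population of each DMA that falls inside the target MSA or HRR) are assumptions made for this example, not a description of the actual pipeline.

```python
def population_weighted_average(values, populations):
    """Combine DMA-level values into one regional value.

    values:      signal value for each contributing DMA
    populations: population of each DMA that lies inside the target
                 region (assumed weighting scheme, for illustration only)
    """
    if len(values) != len(populations):
        raise ValueError("values and populations must have the same length")
    total_population = sum(populations)
    if total_population == 0:
        raise ValueError("total population must be positive")
    weighted_sum = sum(v * p for v, p in zip(values, populations))
    return weighted_sum / total_population


# Two hypothetical DMAs contributing to one MSA.
msa_value = population_weighted_average([10.0, 20.0], [1, 3])
```

Because the API values are already normalized for population, weighting by population keeps a sparsely populated DMA from dominating the regional estimate.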
The smoothed signal is produced using the following strategy. For each date, we fit a local linear regression, using a Gaussian kernel, with only data on or before that date. (This is equivalent to using a negative half-normal distribution as the kernel.) The bandwidth is chosen such that most of the kernel weight is placed on the preceding seven days. The smoothed estimate for each date is the local linear regression’s prediction for that date.
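The smoothing strategy can be sketched as a weighted least-squares line fit that is repeated for every date. This is a minimal illustration, not the production code: the function name is hypothetical, and the default bandwidth of 3.5 days is an assumed value chosen only so that most of the kernel weight falls on roughly the preceding week.

```python
import math


def left_gaussian_smooth(values, bandwidth=3.5):
    """One-sided local linear smoother (illustrative sketch).

    values:    daily observations, oldest first
    bandwidth: Gaussian kernel standard deviation in days (assumed value)
    """
    smoothed = []
    for t in range(len(values)):
        # Use only data on or before day t; x is the lag relative to day t.
        xs = [s - t for s in range(t + 1)]
        ys = values[: t + 1]
        ws = [math.exp(-0.5 * (x / bandwidth) ** 2) for x in xs]

        # Weighted means of lag and observation.
        sw = sum(ws)
        mx = sum(w * x for w, x in zip(ws, xs)) / sw
        my = sum(w * y for w, y in zip(ws, ys)) / sw

        # Weighted least-squares slope; fall back to the weighted mean
        # when there is only one point and no slope can be estimated.
        sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
        if sxx < 1e-12:
            smoothed.append(my)
            continue
        sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
        slope = sxy / sxx

        # Prediction of the fitted line at lag 0, i.e. at day t itself.
        smoothed.append(my - slope * mx)
    return smoothed
```

A useful sanity check is that a perfectly linear input series is returned unchanged, since a local linear fit reproduces any straight line exactly.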
When query volume in a region is below a certain threshold, set by Google, it is reported as 0. Areas with low query volume hence exhibit jumps and zero-inflation, as small variations in the signal can cause it to be sometimes truncated to 0 and sometimes reported at its actual level.
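The effect of this truncation can be illustrated with a toy example. The threshold value and the underlying volumes below are invented for illustration; the real threshold is set by Google and is not documented here.

```python
def apply_reporting_threshold(volumes, threshold):
    """Report a volume as 0 when it falls below the threshold."""
    return [v if v >= threshold else 0.0 for v in volumes]


# Hypothetical underlying volumes hovering around a hypothetical threshold.
true_volumes = [4.8, 5.1, 4.9, 5.2, 5.0]
reported = apply_reporting_threshold(true_volumes, threshold=5.0)
```

Here the underlying series varies by only a few percent, yet the reported series alternates between 0 and values near 5, which is the jump and zero-inflation pattern described above.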
Google does not describe the units of its reported numbers, so the scale is arbitrary.