Finding data of interest

The Epidata API includes numerous data streams – medical claims data, cases and deaths, mobility, and many others – covering different geographic regions. This can make it a challenge to find the data stream that you are most interested in. This page offers some advice on how to locate data that may be useful to you.

Using the Delphi Epidata API documentation

The Delphi Epidata API documentation lists all the available data sources and signals for COVID-19 and for other diseases. The site also includes a search tool if you have a keyword (e.g. “Taiwan”) in mind. Generally, every endpoint listed in the Delphi Epidata API documentation has an associated function in this client, named after the API endpoint with either a pub_ or pvt_ prefix, e.g. pub_covidcast or pvt_twitter.

Epidata data sources

The parameters available for each data source are documented in the linked source-specific API page. The epidatpy client also expects certain fields, depending on the endpoint; the Delphi Epidata API documentation contains more information about the accepted ranges of values for each field.

A dynamically generated list of all available data sources can be obtained by using the built-in available_endpoints():


from IPython.display import HTML

from epidatpy import available_endpoints

table = available_endpoints()
HTML(table.to_html(index=False))

Endpoint Description
pub_covid_hosp_facility Fetch COVID hospitalization data for specific facilities.
pub_covid_hosp_facility_lookup Lookup COVID hospitalization facility identifiers.
pub_covid_hosp_state_timeseries Fetch COVID hospitalization data.
pub_covidcast Fetch Delphi's COVID-19 Surveillance Streams
pub_covidcast_meta Fetch COVIDcast surveillance stream metadata.
pub_delphi Fetch Delphi's forecast.
pub_dengue_nowcast Fetch Delphi's dengue nowcast.
pub_ecdc_ili Fetch ECDC ILI data.
pub_flusurv Fetch FluSurv data.
pub_fluview None
pub_fluview_clinical Fetch FluView clinical data.
pub_fluview_meta None
pub_gft Fetch Google Flu Trends data.
pub_kcdc_ili Fetch KCDC ILI data.
pub_meta Fetch API metadata.
pub_nidss_dengue Fetch NIDSS dengue data.
pub_nidss_flu Fetch NIDSS flu data.
pub_nowcast Fetch Delphi's wILI nowcast.
pub_paho_dengue Fetch PAHO Dengue data.
pub_wiki Fetch Wikipedia access data.
pvt_cdc Fetch CDC page hits.
pvt_dengue_sensors Fetch Delphi's digital surveillance sensors.
pvt_ght Fetch Google Health Trends data.
pvt_meta_norostat Fetch NoroSTAT metadata.
pvt_norostat Fetch NoroSTAT data (point data, no min/max).
pvt_quidel Fetch Quidel data.
pvt_sensors Fetch Delphi's digital surveillance sensors.
pvt_twitter Fetch HealthTweets data.
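
Because the client function names follow the pub_/pvt_ prefix convention, a listing like the one above can be searched by keyword and narrowed by prefix. The following is a minimal sketch and not part of epidatpy: the helper name find_endpoints is hypothetical, and the rows are a small sample copied from the table above so that the example runs without the client installed. With the real client, the same name and description pairs would come from the table returned by available_endpoints().

```python
# Sample (endpoint, description) pairs copied from the table above.
SAMPLE_ENDPOINTS = [
    ("pub_covidcast", "Fetch Delphi's COVID-19 Surveillance Streams"),
    ("pub_dengue_nowcast", "Fetch Delphi's dengue nowcast."),
    ("pub_fluview", None),  # some endpoints are listed without a description
    ("pub_nidss_dengue", "Fetch NIDSS dengue data."),
    ("pub_paho_dengue", "Fetch PAHO Dengue data."),
    ("pvt_dengue_sensors", "Fetch Delphi's digital surveillance sensors."),
    ("pvt_twitter", "Fetch HealthTweets data."),
]


def find_endpoints(rows, keyword, prefix=None):
    """Return endpoint names whose name or description contains `keyword`.

    Hypothetical helper: matching is case-insensitive, and `prefix`
    ("pub_" or "pvt_") optionally restricts the result to one group.
    """
    keyword = keyword.lower()
    matches = []
    for name, description in rows:
        # Treat a missing description as an empty string.
        text = f"{name} {description or ''}".lower()
        if keyword in text and (prefix is None or name.startswith(prefix)):
            matches.append(name)
    return matches


dengue_all = find_endpoints(SAMPLE_ENDPOINTS, "dengue")
dengue_pub = find_endpoints(SAMPLE_ENDPOINTS, "dengue", prefix="pub_")
print(dengue_all)
print(dengue_pub)
```

Matching on both the name and the description means a keyword is found even when it only appears in one of the two.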

Covidcast source and signal metadata

The CovidcastEpidata class provides a way to access information about the data in the pub_covidcast endpoint directly from within the client. The cell below demonstrates how to access this metadata using the source_df property, which returns a Pandas DataFrame of metadata describing all data streams publicly accessible from the COVIDcast endpoint of the Delphi Epidata API. This mirrors the information found in the COVIDcast signals endpoint.


from epidatpy import CovidcastEpidata

epidata = CovidcastEpidata()
epidata.source_df

source name description reference_signal license dua signals
0 chng Change Healthcare Change Healthcare is a healthcare technology c... smoothed_outpatient_cli CC BY-NC https://cmu.box.com/s/cto4to822zecr3oyq1kkk9xm... smoothed_outpatient_cli,smoothed_adj_outpatien...
1 covid-act-now Covid Act Now (CAN) COVID Act Now (CAN) tracks COVID-19 testing st... pcr_specimen_total_tests CC BY-NC None pcr_specimen_positivity_rate,pcr_specimen_tota...
2 doctor-visits Doctor Visits From Claims Information about outpatient visits, provided ... smoothed_cli CC BY https://cmu.box.com/s/l2tz6kmiws6jyty2azwb43po... smoothed_cli,smoothed_adj_cli
3 fb-survey Delphi US COVID-19 Trends and Impact Survey We conduct the Delphi US COVID-19 Trends and I... smoothed_cli CC BY https://cmu.box.com/s/qfxplcdrcn9retfzx4zniyug... raw_wcli,raw_cli,smoothed_cli,smoothed_wcli,ra...
4 google-symptoms Google Symptoms Search Trends Google's [COVID-19 Search Trends symptoms data... s05_smoothed_search To download or use the data, you must agree to... None ageusia_raw_search,ageusia_smoothed_search,ano...
... ... ... ... ... ... ... ...
13 indicator-combination-nmf Statistical Combination (NMF) This source provides signals which are statist... nmf_day_doc_fbs_ght CC BY None nmf_day_doc_fbc_fbs_ght,nmf_day_doc_fbs_ght
14 safegraph-daily SafeGraph (Daily) [SafeGraph](https://docs.safegraph.com/docs/so... completely_home_prop CC BY https://cmu.box.com/s/m0p1wpet4vuvey7od83n70h0... completely_home_prop,completely_home_prop_7dav...
15 nchs-mortality NCHS Mortality Data This data source of national provisional death... deaths_allcause_incidence_num [NCHS Data Use Agreement](https://www.cdc.gov/... None deaths_allcause_incidence_num,deaths_allcause_...
16 dsew-cpr COVID-19 Community Profile Report This data source is based on the daily report ... confirmed_admissions_covid_1d_7dav [Public Domain US Government](https://www.usa.... None booster_doses_admin_7dav,confirmed_admissions_...
17 nssp National Syndromic Surveillance Program The National Syndromic Surveillance Program (N... pct_ed_visits_covid [Public Domain US Government](https://www.usa.... None pct_ed_visits_covid,pct_ed_visits_influenza,pc...

18 rows × 7 columns

This DataFrame contains the following columns:

  • source - API-internal source name.

  • name - Human-readable source name.

  • description - Description of the data source.

  • reference_signal - Name of the signal that serves as the reference signal for this data source, such as smoothed_cli.

  • license - License under which the data is made available (e.g., CC BY).

  • dua - Link to the Data Use Agreement.

  • signals - Comma-separated list of the signals available from this data source.
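
Since the signals column is displayed as a comma-separated string, it usually needs to be split before it can be used programmatically. The sketch below assumes that format. The helper signals_for_source is hypothetical, and the small DataFrame is a stand-in for epidata.source_df built from two rows of the output above, limited to the columns whose values are shown there in full.

```python
import pandas as pd

# Stand-in for epidata.source_df: two rows copied from the output above,
# restricted to columns whose values are not truncated there.
source_df = pd.DataFrame(
    {
        "source": ["doctor-visits", "indicator-combination-nmf"],
        "reference_signal": ["smoothed_cli", "nmf_day_doc_fbs_ght"],
        "license": ["CC BY", "CC BY"],
        "signals": [
            "smoothed_cli,smoothed_adj_cli",
            "nmf_day_doc_fbc_fbs_ght,nmf_day_doc_fbs_ght",
        ],
    }
)


def signals_for_source(df, source):
    """Return the signal names of one source as a Python list.

    Hypothetical helper; assumes `signals` holds a comma-separated string.
    """
    row = df.loc[df["source"] == source]
    if row.empty:
        raise KeyError(f"unknown source: {source}")
    return row["signals"].iloc[0].split(",")


doctor_visits_signals = signals_for_source(source_df, "doctor-visits")
print(doctor_visits_signals)
```

With the real client, passing epidata.source_df in place of the stand-in should work the same way, provided the column layout matches the output shown above.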

The signal_df DataFrame can also be used to obtain information about the signals that are available - for example, what time range they are available for, and when they have been updated.


epidata.signal_df

source signal name active short_description description time_type time_label value_label format category high_values_are is_smoothed is_weighted is_cumulative has_stderr has_sample_size geo_types
0 chng smoothed_outpatient_cli COVID-Related Doctor Visits False Estimated percentage of outpatient doctor visi... Estimated percentage of outpatient doctor visi... day Date Value raw early bad True False False False False county,hhs,hrr,msa,nation,state
1 chng smoothed_adj_outpatient_cli COVID-Related Doctor Visits (Day-adjusted) False Estimated percentage of outpatient doctor visi... Estimated percentage of outpatient doctor visi... day Date Value raw early bad True False False False False county,hhs,hrr,msa,nation,state
2 chng smoothed_outpatient_covid COVID-Confirmed Doctor Visits False COVID-Confirmed Doctor Visits Estimated percentage of outpatient doctor visi... day Date Value raw early bad True False False False False county,hhs,hrr,msa,nation,state
3 chng smoothed_adj_outpatient_covid COVID-Confirmed Doctor Visits (Day-adjusted) False COVID-Confirmed Doctor Visits Estimated percentage of outpatient doctor visi... day Date Value raw early bad True False False False False county,hhs,hrr,msa,nation,state
4 chng smoothed_outpatient_flu Influenza-Confirmed Doctor Visits False Estimated percentage of outpatient doctor visi... Estimated percentage of outpatient doctor visi... day Day Value raw early bad True False False None None county,hhs,hrr,msa,nation,state
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
446 nssp pct_ed_visits_combined Emergency Department Visits for COVID, Influen... True Percent of ED visits that had a discharge diag... Percent of ED visits that had a discharge diag... week Week Percentage percent other bad False False False False False county,hhs,hrr,msa,nation,state
447 nssp smoothed_pct_ed_visits_covid COVID Emergency Department Visits (Percent of ... True 3-week moving average of percent of ED visits ... 3-week moving average of percent of ED visits ... week Week Percentage percent other bad True False False False False county,hhs,hrr,msa,nation,state
448 nssp smoothed_pct_ed_visits_influenza Influenza Emergency Department Visits (Percent... True 3-week moving average of percent of ED visits ... 3-week moving average of percent of ED visits ... week Week Percentage percent other bad True False False False False county,hhs,hrr,msa,nation,state
449 nssp smoothed_pct_ed_visits_rsv RSV Emergency Department Visits (Percent of to... True 3-week moving average of percent of ED visits ... 3-week moving average of percent of ED visits ... week Week Percentage percent other bad True False False False False county,hhs,hrr,msa,nation,state
450 nssp smoothed_pct_ed_visits_combined Emergency Department Visits for COVID, Influen... True 3-week moving average of percent of ED visits ... 3-week moving average of percent of ED visits ... week Week Percentage percent other bad True False False False False county,hhs,hrr,msa,nation,state

451 rows × 18 columns

This DataFrame contains one row for each available signal, with the following columns:

  • source - Data source name.

  • signal - API-internal signal name.

  • name - Human-readable signal name.

  • active - Whether the signal is currently being updated. Signals may be inactive because the sources have become unavailable, other sources have replaced them, or additional work is required for us to continue updating them.

  • short_description - Brief description of the signal.

  • description - Full description of the signal.

  • geo_types - Spatial resolution of the signal (e.g., county, hrr, msa, dma, state). More detail about all geo_types is given in the geographic coding documentation.

  • time_type - Temporal resolution of the signal (e.g., day, week; see date coding details).

  • time_label - The time label (“Date”, “Week”).

  • value_label - The value label (“Value”, “Percentage”, “Visits”, “Visits per 100,000 people”).

  • format - The value format (“per100k”, “percent”, “fraction”, “count”, “raw”).

  • category - The signal category (“early”, “public”, “late”, “other”).

  • high_values_are - What higher values of the signal indicate (“good”, “bad”, “neutral”).

  • is_smoothed - Whether the signal is smoothed.

  • is_weighted - Whether the signal is weighted.

  • is_cumulative - Whether the signal is cumulative.

  • has_stderr - Whether the signal has a standard error (stderr) statistic.

  • has_sample_size - Whether the signal has a sample_size statistic.

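
These columns make it possible to narrow the list to signals that are still being updated and that are available at a particular geographic level. The following is a minimal sketch under a few assumptions: geo_types is a comma-separated string and active is boolean, as the output above suggests. The helper select_signals is hypothetical, and the small DataFrame is a stand-in for epidata.signal_df built from rows of that output.

```python
import pandas as pd

# Stand-in for epidata.signal_df: a few rows and columns copied from the
# output above.
signal_df = pd.DataFrame(
    {
        "source": ["chng", "chng", "nssp", "nssp"],
        "signal": [
            "smoothed_outpatient_cli",
            "smoothed_outpatient_covid",
            "smoothed_pct_ed_visits_covid",
            "smoothed_pct_ed_visits_rsv",
        ],
        "active": [False, False, True, True],
        "time_type": ["day", "day", "week", "week"],
        "geo_types": ["county,hhs,hrr,msa,nation,state"] * 4,
    }
)


def select_signals(df, geo_type, active_only=True):
    """Return signals available at `geo_type`, optionally only active ones."""
    # Split on commas so that a geo type only matches as a whole entry.
    has_geo = df["geo_types"].str.split(",").apply(
        lambda levels: geo_type in levels
    )
    mask = (has_geo & df["active"]) if active_only else has_geo
    return df.loc[mask, ["source", "signal", "time_type"]]


active_state = select_signals(signal_df, "state")
print(active_state)
```

Splitting geo_types before testing membership avoids accidental substring matches that a plain string search could produce.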