Finding data of interest
The Epidata API includes numerous data streams – medical claims data, cases and deaths, mobility, and many others – covering different geographic regions. This can make it a challenge to find the data stream you are most interested in. This page offers some advice on how to locate data that may be useful to you.
Using the Delphi Epidata API documentation
The Delphi Epidata API documentation lists all the available data sources and signals for COVID-19 and for other diseases. The site also includes a search tool if you have a keyword (e.g. “Taiwan”) in mind. Generally, any endpoint listed in the Delphi Epidata API has an associated function in this client, named by prefixing the API endpoint name with either pub_ or pvt_, e.g. pub_covidcast or pvt_twitter.
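The naming convention can be illustrated with a minimal sketch that strips the prefix from a few client function names to recover the underlying API endpoint names. The names are copied from the endpoint table on this page; split_by_prefix is a hypothetical helper written for this illustration, not part of epidatpy.

```python
# Client function names copied from the endpoint table on this page.
endpoint_names = ["pub_covidcast", "pub_fluview", "pvt_twitter", "pvt_quidel"]


def split_by_prefix(names):
    """Group names by their pub_/pvt_ prefix (hypothetical helper, not in epidatpy)."""
    groups = {"pub": [], "pvt": []}
    for name in names:
        # partition splits on the first underscore only, so endpoint names
        # that themselves contain underscores are kept whole.
        prefix, _, endpoint = name.partition("_")
        if prefix in groups:
            groups[prefix].append(endpoint)
    return groups


groups = split_by_prefix(endpoint_names)
print(groups["pub"])
print(groups["pvt"])
```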
Epidata data sources
The parameters available for each data source are documented on the linked source-specific API pages. The epidatpy client also expects certain fields, depending on the endpoint; the Delphi Epidata API documentation contains more information about the accepted ranges of values for each field.
A dynamically generated list of all available data sources can be obtained by using the built-in available_endpoints() function:
from IPython.display import HTML
from epidatpy import available_endpoints
table = available_endpoints()
HTML(table.to_html(index=False))
Endpoint | Description |
---|---|
pub_covid_hosp_facility | Fetch COVID hospitalization data for specific facilities. |
pub_covid_hosp_facility_lookup | Lookup COVID hospitalization facility identifiers. |
pub_covid_hosp_state_timeseries | Fetch COVID hospitalization data. |
pub_covidcast | Fetch Delphi's COVID-19 Surveillance Streams |
pub_covidcast_meta | Fetch COVIDcast surveillance stream metadata. |
pub_delphi | Fetch Delphi's forecast. |
pub_dengue_nowcast | Fetch Delphi's dengue nowcast. |
pub_ecdc_ili | Fetch ECDC ILI data. |
pub_flusurv | Fetch FluSurv data. |
pub_fluview | None |
pub_fluview_clinical | Fetch FluView clinical data. |
pub_fluview_meta | None |
pub_gft | Fetch Google Flu Trends data. |
pub_kcdc_ili | Fetch KCDC ILI data. |
pub_meta | Fetch API metadata. |
pub_nidss_dengue | Fetch NIDSS dengue data. |
pub_nidss_flu | Fetch NIDSS flu data. |
pub_nowcast | Fetch Delphi's wILI nowcast. |
pub_paho_dengue | Fetch PAHO Dengue data. |
pub_wiki | Fetch Wikipedia access data. |
pvt_cdc | Fetch CDC page hits. |
pvt_dengue_sensors | Fetch Delphi's digital surveillance sensors. |
pvt_ght | Fetch Google Health Trends data. |
pvt_meta_norostat | Fetch NoroSTAT metadata. |
pvt_norostat | Fetch NoroSTAT data (point data, no min/max). |
pvt_quidel | Fetch Quidel data. |
pvt_sensors | Fetch Delphi's digital surveillance sensors. |
pvt_twitter | Fetch HealthTweets data. |
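Once the table is available, a keyword search over it is a quick way to narrow down candidates. The sketch below works on a handful of rows copied from the table above so that it runs without network access. find_endpoints is a hypothetical helper, and the comment about to_dict assumes that available_endpoints() returns a pandas DataFrame, as the to_html call in the cell above suggests.

```python
# A few rows copied from the table above. With the client installed, similar
# records could presumably be obtained with
# available_endpoints().to_dict("records") (assumption: the result is a DataFrame).
endpoints = [
    {"Endpoint": "pub_dengue_nowcast", "Description": "Fetch Delphi's dengue nowcast."},
    {"Endpoint": "pub_nidss_dengue", "Description": "Fetch NIDSS dengue data."},
    {"Endpoint": "pub_nidss_flu", "Description": "Fetch NIDSS flu data."},
    {"Endpoint": "pub_fluview", "Description": None},  # some rows have no description
    {"Endpoint": "pvt_twitter", "Description": "Fetch HealthTweets data."},
]


def find_endpoints(records, keyword):
    """Return endpoint names whose name or description contains the keyword."""
    keyword = keyword.lower()
    matches = []
    for row in records:
        # Treat a missing description as an empty string.
        text = f"{row['Endpoint']} {row['Description'] or ''}".lower()
        if keyword in text:
            matches.append(row["Endpoint"])
    return matches


print(find_endpoints(endpoints, "dengue"))
print(find_endpoints(endpoints, "flu"))
```

The match is case-insensitive and covers both columns, so an endpoint with an empty description can still be found by name.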
Covidcast source and signal metadata
The CovidcastEpidata class provides a way to access information about the data in the pub_covidcast endpoint directly from within the client. The cell below demonstrates how to access this metadata by using the source_df property, which returns a pandas DataFrame of metadata describing all data streams publicly accessible from the COVIDcast endpoint of the Delphi Epidata API. This mirrors the information found in the COVIDcast signals endpoint.
from epidatpy import CovidcastEpidata
epidata = CovidcastEpidata()
epidata.source_df
 | source | name | description | reference_signal | license | dua | signals |
---|---|---|---|---|---|---|---|
0 | chng | Change Healthcare | Change Healthcare is a healthcare technology c... | smoothed_outpatient_cli | CC BY-NC | https://cmu.box.com/s/cto4to822zecr3oyq1kkk9xm... | smoothed_outpatient_cli,smoothed_adj_outpatien... |
1 | covid-act-now | Covid Act Now (CAN) | COVID Act Now (CAN) tracks COVID-19 testing st... | pcr_specimen_total_tests | CC BY-NC | None | pcr_specimen_positivity_rate,pcr_specimen_tota... |
2 | doctor-visits | Doctor Visits From Claims | Information about outpatient visits, provided ... | smoothed_cli | CC BY | https://cmu.box.com/s/l2tz6kmiws6jyty2azwb43po... | smoothed_cli,smoothed_adj_cli |
3 | fb-survey | Delphi US COVID-19 Trends and Impact Survey | We conduct the Delphi US COVID-19 Trends and I... | smoothed_cli | CC BY | https://cmu.box.com/s/qfxplcdrcn9retfzx4zniyug... | raw_wcli,raw_cli,smoothed_cli,smoothed_wcli,ra... |
4 | google-symptoms | Google Symptoms Search Trends | Google's [COVID-19 Search Trends symptoms data... | s05_smoothed_search | To download or use the data, you must agree to... | None | ageusia_raw_search,ageusia_smoothed_search,ano... |
... | ... | ... | ... | ... | ... | ... | ... |
13 | indicator-combination-nmf | Statistical Combination (NMF) | This source provides signals which are statist... | nmf_day_doc_fbs_ght | CC BY | None | nmf_day_doc_fbc_fbs_ght,nmf_day_doc_fbs_ght |
14 | safegraph-daily | SafeGraph (Daily) | [SafeGraph](https://docs.safegraph.com/docs/so... | completely_home_prop | CC BY | https://cmu.box.com/s/m0p1wpet4vuvey7od83n70h0... | completely_home_prop,completely_home_prop_7dav... |
15 | nchs-mortality | NCHS Mortality Data | This data source of national provisional death... | deaths_allcause_incidence_num | [NCHS Data Use Agreement](https://www.cdc.gov/... | None | deaths_allcause_incidence_num,deaths_allcause_... |
16 | dsew-cpr | COVID-19 Community Profile Report | This data source is based on the daily report ... | confirmed_admissions_covid_1d_7dav | [Public Domain US Government](https://www.usa.... | None | booster_doses_admin_7dav,confirmed_admissions_... |
17 | nssp | National Syndromic Surveillance Program | The National Syndromic Surveillance Program (N... | pct_ed_visits_covid | [Public Domain US Government](https://www.usa.... | None | pct_ed_visits_covid,pct_ed_visits_influenza,pc... |
18 rows × 7 columns
This DataFrame contains the following columns:
- source: API-internal source name.
- name: Human-readable source name.
- description: Description of the data source.
- reference_signal: Name of the signal listed as the reference for this source (e.g., smoothed_cli for doctor-visits).
- license: The license under which the data is released.
- dua: Link to the Data Use Agreement, if there is one.
- signals: Comma-separated list of signals available from this data source.
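Because the signals column holds a single comma-separated string per source, it usually needs splitting before individual signals can be looked up. The following is a minimal sketch using two rows whose signals value appears untruncated in the output above; the variable names are illustrative only.

```python
# Two rows copied from the source_df output above (only the columns used here).
sources = [
    {"source": "doctor-visits", "signals": "smoothed_cli,smoothed_adj_cli"},
    {
        "source": "indicator-combination-nmf",
        "signals": "nmf_day_doc_fbc_fbs_ght,nmf_day_doc_fbs_ght",
    },
]

# Map each source to a list of its signal names.
signals_by_source = {row["source"]: row["signals"].split(",") for row in sources}

# Flatten into (source, signal) pairs, the form used to request a data stream.
pairs = [
    (source, signal)
    for source, signals in signals_by_source.items()
    for signal in signals
]

for source, signal in pairs:
    print(source, signal)
```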
The signal_df DataFrame can also be used to obtain information about the signals that are available, for example what time range they are available for and when they have been updated.
epidata.signal_df
 | source | signal | name | active | short_description | description | time_type | time_label | value_label | format | category | high_values_are | is_smoothed | is_weighted | is_cumulative | has_stderr | has_sample_size | geo_types |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | chng | smoothed_outpatient_cli | COVID-Related Doctor Visits | False | Estimated percentage of outpatient doctor visi... | Estimated percentage of outpatient doctor visi... | day | Date | Value | raw | early | bad | True | False | False | False | False | county,hhs,hrr,msa,nation,state |
1 | chng | smoothed_adj_outpatient_cli | COVID-Related Doctor Visits (Day-adjusted) | False | Estimated percentage of outpatient doctor visi... | Estimated percentage of outpatient doctor visi... | day | Date | Value | raw | early | bad | True | False | False | False | False | county,hhs,hrr,msa,nation,state |
2 | chng | smoothed_outpatient_covid | COVID-Confirmed Doctor Visits | False | COVID-Confirmed Doctor Visits | Estimated percentage of outpatient doctor visi... | day | Date | Value | raw | early | bad | True | False | False | False | False | county,hhs,hrr,msa,nation,state |
3 | chng | smoothed_adj_outpatient_covid | COVID-Confirmed Doctor Visits (Day-adjusted) | False | COVID-Confirmed Doctor Visits | Estimated percentage of outpatient doctor visi... | day | Date | Value | raw | early | bad | True | False | False | False | False | county,hhs,hrr,msa,nation,state |
4 | chng | smoothed_outpatient_flu | Influenza-Confirmed Doctor Visits | False | Estimated percentage of outpatient doctor visi... | Estimated percentage of outpatient doctor visi... | day | Day | Value | raw | early | bad | True | False | False | None | None | county,hhs,hrr,msa,nation,state |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
446 | nssp | pct_ed_visits_combined | Emergency Department Visits for COVID, Influen... | True | Percent of ED visits that had a discharge diag... | Percent of ED visits that had a discharge diag... | week | Week | Percentage | percent | other | bad | False | False | False | False | False | county,hhs,hrr,msa,nation,state |
447 | nssp | smoothed_pct_ed_visits_covid | COVID Emergency Department Visits (Percent of ... | True | 3-week moving average of percent of ED visits ... | 3-week moving average of percent of ED visits ... | week | Week | Percentage | percent | other | bad | True | False | False | False | False | county,hhs,hrr,msa,nation,state |
448 | nssp | smoothed_pct_ed_visits_influenza | Influenza Emergency Department Visits (Percent... | True | 3-week moving average of percent of ED visits ... | 3-week moving average of percent of ED visits ... | week | Week | Percentage | percent | other | bad | True | False | False | False | False | county,hhs,hrr,msa,nation,state |
449 | nssp | smoothed_pct_ed_visits_rsv | RSV Emergency Department Visits (Percent of to... | True | 3-week moving average of percent of ED visits ... | 3-week moving average of percent of ED visits ... | week | Week | Percentage | percent | other | bad | True | False | False | False | False | county,hhs,hrr,msa,nation,state |
450 | nssp | smoothed_pct_ed_visits_combined | Emergency Department Visits for COVID, Influen... | True | 3-week moving average of percent of ED visits ... | 3-week moving average of percent of ED visits ... | week | Week | Percentage | percent | other | bad | True | False | False | False | False | county,hhs,hrr,msa,nation,state |
451 rows × 18 columns
This DataFrame contains one row for each available signal, with the following columns:
- source: Data source name.
- signal: API-internal signal name.
- name: Human-readable signal name.
- active: Whether the signal is currently being updated. Signals may be inactive because the sources have become unavailable, other sources have replaced them, or additional work is required for us to continue updating them.
- short_description: Brief description of the signal.
- description: Full description of the signal.
- time_type: Temporal resolution of the signal (e.g., day, week; see date coding details).
- time_label: The time label (“Date”, “Week”).
- value_label: The value label (“Value”, “Percentage”, “Visits”, “Visits per 100,000 people”).
- format: The value format (“per100k”, “percent”, “fraction”, “count”, “raw”).
- category: The signal category (“early”, “public”, “late”, “other”).
- high_values_are: What higher values of the signal indicate (“good”, “bad”, “neutral”).
- is_smoothed: Whether the signal is smoothed.
- is_weighted: Whether the signal is weighted.
- is_cumulative: Whether the signal is cumulative.
- has_stderr: Whether the signal has a stderr statistic.
- has_sample_size: Whether the signal has a sample_size statistic.
- geo_types: Geographic levels for which the signal is available (e.g., county, hrr, msa, dma, state). More detail about all geo_types is given in the geographic coding documentation.
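These columns make it straightforward to narrow the list down, for instance to signals that are still being updated and are available at a given geographic level. The sketch below builds a small stand-in DataFrame from three rows of the output above rather than calling the API. active_signals is a hypothetical helper, and the code assumes that active holds plain booleans and that geo_types is a comma-separated string, as the printed output suggests.

```python
import pandas as pd

# Stand-in for epidata.signal_df: three rows copied from the output above,
# restricted to the columns used in this example.
all_geos = "county,hhs,hrr,msa,nation,state"
signal_df = pd.DataFrame(
    [
        {"source": "chng", "signal": "smoothed_outpatient_cli",
         "active": False, "time_type": "day", "geo_types": all_geos},
        {"source": "nssp", "signal": "pct_ed_visits_combined",
         "active": True, "time_type": "week", "geo_types": all_geos},
        {"source": "nssp", "signal": "smoothed_pct_ed_visits_covid",
         "active": True, "time_type": "week", "geo_types": all_geos},
    ]
)


def active_signals(df, geo_type):
    """Rows that are active and list geo_type among their geo_types (hypothetical helper)."""
    # Split the comma-separated string so that "hhs" cannot match part of another name.
    has_geo = df["geo_types"].str.split(",").apply(lambda levels: geo_type in levels)
    return df[df["active"] & has_geo]


result = active_signals(signal_df, "county")
print(result[["source", "signal", "time_type"]])
```

With the client installed, passing epidata.signal_df in place of the stand-in should work the same way, provided the column types match the assumptions stated above.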