Fetching Data
API keys
- covidcast.use_api_key(key)
Set the API key to use for all subsequent queries.
- Parameters:
key – String containing the API key for you and/or your group.
Anyone may access the Epidata API anonymously without providing an API key. Anonymous access is currently rate-limited, and at most two of the request parameters (signals, dates, issues, regions, etc.) may have multiple selections. To be exempt from these limits, use this function to apply an API key to all subsequent queries. You can register for an API key at <https://api.delphi.cmu.edu/epidata/admin/registration_form>.
Consult the API documentation for details on our API key policies.
Signals
This package provides a key function to obtain any signal of interest as a Pandas data frame. Detailed examples are provided in the usage examples.
- covidcast.signal(data_source, signal, start_day=None, end_day=None, geo_type='county', geo_values='*', as_of=None, issues=None, lag=None, time_type='day')
Download a Pandas data frame for one signal.
Obtains data for selected date ranges for all geographic regions of the United States. Available data sources and signals are documented in the COVIDcast signal documentation. Most (but not all) data sources are available at the county level, but the API can also return data aggregated to metropolitan statistical areas, hospital referral regions, or states, as desired, by using the geo_type argument.
The COVIDcast API tracks updates and changes to its underlying data, and records the first date each observation became available. For example, a data source may report its June 3rd estimate for a specific state on June 5th, once records become available. This data is considered “issued” on June 5th. Later, the data source may update its estimate for June 3rd based on revised data, creating a new issue on June 8th. By default, signal() returns the most recent issue available for every observation. The as_of, issues, and lag parameters allow the user to select specific issues instead, or to see all updates to observations. These options are mutually exclusive; if you specify more than one, as_of will take priority over issues, which will take priority over lag.
Note that the API only tracks the initial value of an estimate and changes to that value. If a value was first issued on June 5th and never updated, asking for data issued on June 6th (using issues or lag) would not return that value, though asking for data as_of June 6th would.
Note also that the API enforces a maximum result row limit; results beyond the limit are truncated. This limit is sufficient to fetch observations in all counties in the United States on one day. This client automatically splits queries for multiple days across multiple API calls. However, if data for one day has been issued many times, using the issues argument may return more results than the query limit. A warning will be issued in this case. To see all results, split your query across multiple calls with different issues arguments.
See the COVIDcast API documentation for more information on available geography types, signals, and data formats, and further discussion of issue dates and data versioning.
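The versioning rules above can be made concrete with a small sketch that mimics them on a hand-made issue history. It does not call the API: the RECORDS list and the helper names select_as_of, select_issues, and select_lag are hypothetical and exist only to illustrate the June 3rd example, not to show how the client or server is implemented.

```python
from datetime import date, timedelta

# Toy issue history as (time_value, issue, value) tuples. Hypothetical data.
RECORDS = [
    (date(2020, 6, 3), date(2020, 6, 5), 1.0),  # first issued June 5th
    (date(2020, 6, 3), date(2020, 6, 8), 1.5),  # revised on June 8th
    (date(2020, 6, 4), date(2020, 6, 5), 2.0),  # issued once, never updated
]


def select_as_of(records, as_of):
    """Latest value of each observation among issues on or before as_of."""
    latest = {}
    for time_value, issue, value in records:
        if issue > as_of:
            continue
        if time_value not in latest or issue > latest[time_value][0]:
            latest[time_value] = (issue, value)
    return {time_value: value for time_value, (_, value) in latest.items()}


def select_issues(records, start, end):
    """Every row issued within [start, end]; an observation may repeat."""
    return [row for row in records if start <= row[1] <= end]


def select_lag(records, lag):
    """Rows issued exactly `lag` days after their time_value."""
    return [row for row in records if row[1] - row[0] == timedelta(days=lag)]


as_of_june_6 = select_as_of(RECORDS, date(2020, 6, 6))
issued_june_6 = select_issues(RECORDS, date(2020, 6, 6), date(2020, 6, 6))
lag_2 = select_lag(RECORDS, 2)
```

In this sketch as_of_june_6 still holds the values first issued on June 5th, while issued_june_6 is empty because nothing was issued on June 6th itself, which matches the distinction drawn in the text.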
- Parameters:
data_source (str) – String identifying the data source to query, such as "fb-survey".
signal (str) – String identifying the signal from that source to query, such as "smoothed_cli".
start_day (date) – Query data beginning on this date. Provided as a datetime.date object. If start_day is None, defaults to the first day data is available for this signal. If time_type == "week", then this is rounded to the epiweek containing the day (i.e. the previous Sunday).
end_day (date) – Query data up to this date, inclusive. Provided as a datetime.date object. If end_day is None, defaults to the most recent day data is available for this signal. If time_type == "week", then this is rounded to the epiweek containing the day (i.e. the previous Sunday).
geo_type (str) – The geography type for which to request this data, such as "county" or "state". Available types are described in the COVIDcast signal documentation. Defaults to "county".
geo_values (Union[str, Iterable[str]]) – The geographies to fetch data for. The default, "*", fetches all geographies. To fetch one geography, specify its ID as a string; multiple geographies can be provided as an iterable (list, tuple, …) of strings.
as_of (date) – Fetch only data that was available on or before this date, provided as a datetime.date object. If None, the default, return the most recent available data. If time_type == "week", then this is rounded to the epiweek containing the day (i.e. the previous Sunday).
issues (Union[date, Tuple[date], List[date]]) – Fetch only data that was published or updated (“issued”) on these dates. Provided as either a single datetime.date object, indicating a single date to fetch data issued on, or a tuple or list specifying (start, end) dates. In the latter case, return all data issued in this range. There may be multiple rows for each observation, indicating several updates to its value. If None, the default, return the most recently issued data. If time_type == "week", then these are rounded to the epiweek containing the day (i.e. the previous Sunday).
lag (int) – Integer. If, for example, lag=3, fetch only data that was published or updated exactly 3 days after the date. For example, a row with time_value of June 3 will only be included in the results if its data was issued or updated on June 6. If None, the default, return the most recently issued data regardless of its lag.
time_type (str) – The temporal resolution at which to request this data. Most signals are available at the “day” resolution (the default); some are only available at the “week” resolution, representing an MMWR week (“epiweek”).
- Return type:
Optional[DataFrame]
- Returns:
A Pandas data frame with matching data, or None if no data is returned. Each row is one observation on one day in one geographic location. Contains the following columns:
geo_value
Identifies the location, such as a state name or county FIPS code. The geographic coding used by COVIDcast is described in the API documentation.
signal
Name of the signal, same as the value of the signal input argument. Used for downstream functions to recognize where this signal is from.
time_value
Contains a pandas Timestamp object identifying the date this estimate is for. For data with time_type = "week", this is the first day of the corresponding epiweek.
issue
Contains a pandas Timestamp object identifying the date this estimate was issued. For example, an estimate with a time_value of June 3 might have been issued on June 5, after the data for June 3rd was collected and ingested into the API.
lag
Integer giving the difference between issue and time_value, in days.
value
The signal quantity requested. For example, in a query for the confirmed_cumulative_num signal from the usa-facts source, this would be the cumulative number of confirmed cases in the area, as of the time_value.
stderr
The value’s standard error, if available.
sample_size
Indicates the sample size available in that geography on that day; sample size may not be available for all signals, due to privacy or other constraints.
geo_type
Geography type for the signal, same as the value of the geo_type input argument. Used for downstream functions to parse geo_value correctly.
data_source
Name of the signal source, same as the value of the data_source input argument. Used for downstream functions to recognize where this signal is from.
Consult the signal documentation for more details on how values and standard errors are calculated for specific signals.
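The epiweek rounding mentioned for start_day, end_day, as_of, and issues can be sketched with the standard library alone. The helper epiweek_start below is hypothetical and not part of the package (the client performs this rounding itself); it only illustrates the rule that a date maps to the Sunday that begins its MMWR week.

```python
from datetime import date, timedelta


def epiweek_start(day: date) -> date:
    """Return the Sunday that starts the MMWR week containing `day`."""
    # date.weekday() is 0 for Monday through 6 for Sunday, so
    # (weekday + 1) % 7 is the number of days since the most recent Sunday.
    return day - timedelta(days=(day.weekday() + 1) % 7)


wednesday = date(2020, 6, 3)
sunday = epiweek_start(wednesday)
```

A date that already falls on a Sunday is returned unchanged, so "the previous Sunday" should be read as "the most recent Sunday on or before the day".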
Sometimes you would like to work with multiple signals – for example, to obtain several signals at every location, as part of building models of features at each location. For convenience, the package provides a function to produce a single data frame containing multiple signals at each location.
- covidcast.aggregate_signals(signals, dt=None, join_type='outer')
Given a list of DataFrames, optionally lag each one and join them into one DataFrame.
This method takes a list of DataFrames containing signal information for geographic regions across time, and outputs a single DataFrame with a column for each signal value for each region/time. The data_source, signal, and index of each DataFrame in signals are appended to the front of each output column name separated by underscores (e.g. source_signal_0_inputcolumn), and the original data_source and signal columns will be dropped. The input DataFrames must all be of the same geography type, and a single geo_type column will be returned in the final DataFrame.
Each signal’s time value can be shifted for analysis on lagged signals using the dt argument, which takes a list of integer days to lag each signal’s date. Lagging a signal by +1 day means that all the dates get shifted forward by 1 day (e.g. Jan 1 becomes Jan 2).
- Parameters:
signals (list) – List of DataFrames to join.
dt (list) – List of lags in days for each of the input DataFrames in signals. Defaults to None. When provided, must be the same length as signals.
join_type (str) – Type of join to be done between the DataFrames in signals. Defaults to "outer", so the output DataFrame contains all region/time combinations at which at least one signal was observed.
- Return type:
DataFrame
- Returns:
DataFrame of aggregated signals.
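The lagging and column-naming behaviour described for aggregate_signals can be illustrated without pandas. The sketch below works on plain tuples and dicts; the helpers lag_and_rename and outer_join and the sample values are hypothetical, and it is not the package's implementation, only a model of the dt shift, the source_signal_index_column naming, and the outer join.

```python
from datetime import date, timedelta


def lag_and_rename(rows, data_source, signal, index, lag_days=0):
    """Shift each row's date by lag_days and prefix the value column name."""
    column = f"{data_source}_{signal}_{index}_value"
    return {
        (geo, day + timedelta(days=lag_days)): {column: value}
        for geo, day, value in rows
    }


def outer_join(*tables):
    """Keep every (geo_value, time_value) key seen in at least one table."""
    joined = {}
    for table in tables:
        for key, columns in table.items():
            joined.setdefault(key, {}).update(columns)
    return joined


# Made-up observations as (geo_value, time_value, value).
cases = [("pa", date(2020, 1, 1), 10.0), ("pa", date(2020, 1, 2), 12.0)]
cli = [("pa", date(2020, 1, 2), 0.5)]

combined = outer_join(
    # Lag of +1 day: Jan 1 becomes Jan 2, Jan 2 becomes Jan 3.
    lag_and_rename(cases, "usa-facts", "confirmed_cumulative_num", 0, lag_days=1),
    lag_and_rename(cli, "fb-survey", "smoothed_cli", 1),
)
```

After the shift, the Jan 1 case count lines up with the Jan 2 survey value, and the Jan 3 row survives the outer join even though only one signal is present there.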
Metadata
Many data sources and signals are available, so one can also obtain a data frame of all signals and their associated metadata:
- covidcast.metadata()
Fetch COVIDcast surveillance stream metadata.
Obtains a data frame of metadata describing all publicly available data streams from the COVIDcast API. See the data source and signals documentation for descriptions of the available sources.
- Return type:
DataFrame
- Returns:
A data frame containing one row per available signal, with the following columns:
data_source
Data source name.
signal
Signal name.
time_type
Temporal resolution at which this signal is reported. “day”, for example, means the signal is reported daily.
geo_type
Geographic level for which this signal is available, such as county, state, msa, hhs, hrr, or nation. Most signals are available at multiple geographic levels and will hence be listed in multiple rows with their own metadata.
min_time
First day for which this signal is available. For weekly signals, will be the first day of the epiweek.
max_time
Most recent day for which this signal is available. For weekly signals, will be the first day of the epiweek.
num_locations
Number of distinct geographic locations available for this signal. For example, if geo_type is county, the number of counties for which this signal has ever been reported.
min_value
The smallest value that has ever been reported.
max_value
The largest value that has ever been reported.
mean_value
The arithmetic mean of all reported values.
stdev_value
The sample standard deviation of all reported values.
last_update
The UTC datetime for when the signal value was last updated.
max_issue
Most recent date data was issued.
min_lag
Smallest lag from observation to issue, in days.
max_lag
Largest lag from observation to issue, in days.
Working with geographic identifiers
The COVIDcast API identifies each geographic region – such as a county or state – using unique codes. For example, counties are identified by their FIPS codes, while states are identified by a two-letter abbreviation; more detail is given in the geographic coding documentation.
When fetching data from the API, you may need to quickly convert between human-readable names and unique identifiers. The following functions are provided for your convenience.
Counties
- covidcast.fips_to_name(code, ignore_case=False, fixed=False, ties_method='first')
Look up county names by FIPS codes with regular expression support.
Given an individual or list of FIPS codes or regular expressions, look up the corresponding county names.
- Parameters:
code (Union[str, Iterable[str]]) – Individual or list of FIPS codes or regular expressions.
ignore_case (bool) – Boolean for whether or not to be case insensitive in the regular expression. If fixed=True, this argument is ignored. Defaults to False.
fixed (bool) – Conduct an exact, case-sensitive match with the input string. Defaults to False.
ties_method (str) – Method for determining how to deal with multiple outputs for a given input. Must be one of "all" or "first". If "first", then only the first match for each code is returned. If "all", then all matches for each code are returned. Defaults to "first".
- Return type:
list
- Returns:
If ties_method="first", returns a list of the first value found for each input key. If ties_method="all", returns a list of dicts, one for each input, with keys corresponding to all matched input keys and values corresponding to the list of county names. The returned list will be the same length as the input, with None or {} if no values are found for ties_method="first" and ties_method="all", respectively.
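The return shapes for the two ties_method settings are easier to see on a toy table. The sketch below is not the package's implementation: the lookup helper and the three-county COUNTIES table are hypothetical, and the use of re.search for non-fixed matching is an assumption made for illustration.

```python
import re

# Hypothetical three-row table; the real package ships its own county data.
COUNTIES = {
    "42001": "Adams County",
    "42003": "Allegheny County",
    "42005": "Armstrong County",
}


def lookup(patterns, table, ignore_case=False, fixed=False, ties_method="first"):
    """Match each pattern against the table keys and collect the values."""
    if ties_method not in ("first", "all"):
        raise ValueError("ties_method must be 'first' or 'all'")
    if isinstance(patterns, str):
        patterns = [patterns]
    # ignore_case has no effect when an exact (fixed) match is requested.
    flags = re.IGNORECASE if (ignore_case and not fixed) else 0
    output = []
    for pattern in patterns:
        if fixed:
            matches = {k: [v] for k, v in table.items() if k == pattern}
        else:
            matches = {k: [v] for k, v in table.items() if re.search(pattern, k, flags)}
        if ties_method == "all":
            output.append(matches)  # {} when nothing matched
        else:
            output.append(next(iter(matches.values()))[0] if matches else None)
    return output


first = lookup(["4200", "99999"], COUNTIES)
everything = lookup("4200[13]", COUNTIES, ties_method="all")
exact = lookup("4200", COUNTIES, fixed=True)
```

Each result has one entry per input: first contains a name and a None, everything contains a single dict keyed by the matched codes, and exact contains None because "4200" is not itself a key.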
- covidcast.name_to_fips(name, ignore_case=False, fixed=False, ties_method='first', state=None)
Look up FIPS codes by county names with regular expression support.
Given an individual or list of county names or regular expressions, look up the corresponding FIPS codes.
- Parameters:
name (Union[str, Iterable[str]]) – Individual or list of county names or regular expressions.
ignore_case (bool) – Boolean for whether or not to be case insensitive in the regular expression. If fixed=True, this argument is ignored. Defaults to False.
fixed (bool) – Conduct an exact, case-sensitive match with the input string. Defaults to False.
ties_method (str) – Method for determining how to deal with multiple outputs for a given input. Must be one of "all" or "first". If "first", then only the first match for each name is returned. If "all", then all matches for each name are returned. Defaults to "first".
state (str) – Two-letter state code, case insensitive, to restrict results to.
- Return type:
list
- Returns:
If ties_method="first", returns a list of the first value found for each input key. If ties_method="all", returns a list of dicts, one for each input, with keys corresponding to all matched input keys and values corresponding to the list of FIPS codes. The returned list will be the same length as the input, with None or {} if no values are found for ties_method="first" and ties_method="all", respectively.
Metropolitan statistical areas
- covidcast.cbsa_to_name(code, ignore_case=False, fixed=False, ties_method='first')
Look up MSA names by codes with regular expression support.
Given an individual or list of CBSA codes or regular expressions, look up the corresponding MSA names.
- Parameters:
code (Union[str, Iterable[str]]) – Individual or list of CBSA codes or regular expressions.
ignore_case (bool) – Boolean for whether or not to be case insensitive in the regular expression. If fixed=True, this argument is ignored. Defaults to False.
fixed (bool) – Conduct an exact, case-sensitive match with the input string. Defaults to False.
ties_method (str) – Method for determining how to deal with multiple outputs for a given input. Must be one of "all" or "first". If "first", then only the first match for each code is returned. If "all", then all matches for each code are returned. Defaults to "first".
- Return type:
list
- Returns:
If ties_method="first", returns a list of the first value found for each input key. If ties_method="all", returns a list of dicts, one for each input, with keys corresponding to all matched input keys and values corresponding to the list of MSA names. The returned list will be the same length as the input, with None or {} if no values are found for ties_method="first" and ties_method="all", respectively.
- covidcast.name_to_cbsa(name, ignore_case=False, fixed=False, ties_method='first', state=None)
Look up MSA codes by names with regular expression support.
Given an individual or list of MSA names or regular expressions, look up the corresponding MSA codes.
- Parameters:
name (Union[str, Iterable[str]]) – Individual or list of MSA names or regular expressions.
ignore_case (bool) – Boolean for whether or not to be case insensitive in the regular expression. If fixed=True, this argument is ignored. Defaults to False.
fixed (bool) – Conduct an exact, case-sensitive match with the input string. Defaults to False.
ties_method (str) – Method for determining how to deal with multiple outputs for a given input. Must be one of "all" or "first". If "first", then only the first match for each name is returned. If "all", then all matches for each name are returned. Defaults to "first".
state (str) – Two-letter state code, case insensitive, to restrict results to.
- Return type:
list
- Returns:
If ties_method="first", returns a list of the first value found for each input key. If ties_method="all", returns a list of dicts, one for each input, with keys corresponding to all matched input keys and values corresponding to the list of MSA codes. The returned list will be the same length as the input, with None or {} if no values are found for ties_method="first" and ties_method="all", respectively.
States
- covidcast.abbr_to_name(abbr, ignore_case=False, fixed=False, ties_method='first')
Look up state name by abbreviation with regular expression support.
Given an individual or list of state abbreviations or regular expressions, look up the corresponding state names.
- Parameters:
abbr (Union[str, Iterable[str]]) – Individual or list of state abbreviations or regular expressions.
ignore_case (bool) – Boolean for whether or not to be case insensitive in the regular expression. If fixed=True, this argument is ignored. Defaults to False.
fixed (bool) – Conduct an exact, case-sensitive match with the input string. Defaults to False.
ties_method (str) – Method for determining how to deal with multiple outputs for a given input. Must be one of "all" or "first". If "first", then only the first match for each abbreviation is returned. If "all", then all matches for each abbreviation are returned. Defaults to "first".
- Return type:
list
- Returns:
If ties_method="first", returns a list of the first value found for each input key. If ties_method="all", returns a list of dicts, one for each input, with keys corresponding to all matched input keys and values corresponding to the list of state names. The returned list will be the same length as the input, with None or {} if no values are found for ties_method="first" and ties_method="all", respectively.
- covidcast.name_to_abbr(name, ignore_case=False, fixed=False, ties_method='first')
Look up state abbreviation by name with regular expression support.
Given an individual or list of state names or regular expressions, look up the corresponding state abbreviations.
- Parameters:
name (Union[str, Iterable[str]]) – Individual or list of state names or regular expressions.
ignore_case (bool) – Boolean for whether or not to be case insensitive in the regular expression. If fixed=True, this argument is ignored. Defaults to False.
fixed (bool) – Conduct an exact, case-sensitive match with the input string. Defaults to False.
ties_method (str) – Method for determining how to deal with multiple outputs for a given input. Must be one of "all" or "first". If "first", then only the first match for each name is returned. If "all", then all matches for each name are returned. Defaults to "first".
- Return type:
list
- Returns:
If ties_method="first", returns a list of the first value found for each input key. If ties_method="all", returns a list of dicts, one for each input, with keys corresponding to all matched input keys and values corresponding to the list of state abbreviations. The returned list will be the same length as the input, with None or {} if no values are found for ties_method="first" and ties_method="all", respectively.
- covidcast.abbr_to_fips(code, ignore_case=False, fixed=False, ties_method='first')
Look up state FIPS codes by abbreviation with regular expression support.
Given an individual or list of state abbreviations or regular expressions, look up the corresponding state FIPS codes. The returned codes are 5 digits: the 2 digit state FIPS with 000 appended to the end.
- Parameters:
code (Union[str, Iterable[str]]) – Individual or list of abbreviations or regular expressions.
ignore_case (bool) – Boolean for whether or not to be case insensitive in the regular expression. If fixed=True, this argument is ignored. Defaults to False.
fixed (bool) – Conduct an exact, case-sensitive match with the input string. Defaults to False.
ties_method (str) – Method for determining how to deal with multiple outputs for a given input. Must be one of "all" or "first". If "first", then only the first match for each abbreviation is returned. If "all", then all matches for each abbreviation are returned. Defaults to "first".
- Return type:
list
- Returns:
If ties_method="first", returns a list of the first value found for each input key. If ties_method="all", returns a list of dicts, one for each input, with keys corresponding to all matched input keys and values corresponding to the list of state FIPS codes. The returned list will be the same length as the input, with None or {} if no values are found for ties_method="first" and ties_method="all", respectively.
- covidcast.fips_to_abbr(code, ignore_case=False, fixed=False, ties_method='first')
Look up state abbreviation by FIPS codes with regular expression support.
Given an individual or list of FIPS codes or regular expressions, look up the corresponding state abbreviation. FIPS codes can be the 2 digit code (covidcast.fips_to_abbr("12")) or the 2 digit code with 000 appended to the end (covidcast.fips_to_abbr("12000")).
- Parameters:
code (Union[str, Iterable[str]]) – Individual or list of FIPS codes or regular expressions.
ignore_case (bool) – Boolean for whether or not to be case insensitive in the regular expression. If fixed=True, this argument is ignored. Defaults to False.
fixed (bool) – Conduct an exact, case-sensitive match with the input string. Defaults to False.
ties_method (str) – Method for determining how to deal with multiple outputs for a given input. Must be one of "all" or "first". If "first", then only the first match for each code is returned. If "all", then all matches for each code are returned. Defaults to "first".
- Return type:
list
- Returns:
If ties_method="first", returns a list of the first value found for each input key. If ties_method="all", returns a list of dicts, one for each input, with keys corresponding to all matched input keys and values corresponding to the list of state abbreviations. The returned list will be the same length as the input, with None or {} if no values are found for ties_method="first" and ties_method="all", respectively.
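The two state FIPS forms mentioned above (the 2 digit code and the same code with 000 appended) differ only by padding. As a small sketch, the hypothetical helper to_five_digit_state_fips below, which is not part of the package, converts either form to the 5 digit one described for abbr_to_fips.

```python
def to_five_digit_state_fips(code: str) -> str:
    """Return the 5-digit state FIPS form (2-digit code followed by 000)."""
    if len(code) == 2 and code.isdigit():
        return code + "000"
    if len(code) == 5 and code.isdigit() and code.endswith("000"):
        return code
    raise ValueError(f"not a state FIPS code: {code!r}")


short_form = to_five_digit_state_fips("12")
long_form = to_five_digit_state_fips("12000")
```

Both calls produce the same string, so either form can be normalized before comparing codes from different sources.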