Fetching Data

Signals

This package provides a key function to obtain any signal of interest as a Pandas data frame. Detailed examples are provided in the usage examples.

covidcast.signal(data_source, signal, start_day=None, end_day=None, geo_type='county', geo_values='*', as_of=None, issues=None, lag=None, time_type='day')

Download a Pandas data frame for one signal.

Obtains data for selected date ranges for all geographic regions of the United States. Available data sources and signals are documented in the COVIDcast signal documentation. Most (but not all) data sources are available at the county level, but the API can also return data aggregated to metropolitan statistical areas, hospital referral regions, or states, as desired, by using the geo_type argument.

The COVIDcast API tracks updates and changes to its underlying data, and records the first date each observation became available. For example, a data source may not report its estimate for a specific state on June 3rd until June 5th, once records become available. This data is considered “issued” on June 5th. Later, the data source may update its estimate for June 3rd based on revised data, creating a new issue on June 8th. By default, signal() returns the most recent issue available for every observation. The as_of, issues, and lag parameters allow the user to select specific issues instead, or to see all updates to observations. These options are mutually exclusive; if you specify more than one, as_of will take priority over issues, which will take priority over lag.

Note that the API only tracks the initial value of an estimate and changes to that value. If a value was first issued on June 5th and never updated, asking for data issued on June 6th (using issues or lag) would not return that value, though asking for data as_of June 6th would.
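The difference between the default behaviour, as_of, issues, and lag can be illustrated on a toy issue history. The sketch below does not call the API; the HISTORY rows and the helper names are invented for illustration, and the helpers only mimic the selection rules described above.

```python
from datetime import date, timedelta

# Toy issue history: (time_value, issue_date, value). Values are invented.
HISTORY = [
    (date(2020, 6, 3), date(2020, 6, 5), 1.0),  # first issue of the June 3 estimate
    (date(2020, 6, 3), date(2020, 6, 8), 1.2),  # later revision of the same estimate
]

def latest(rows):
    """Most recent issue for each time_value (mimics the default behaviour)."""
    best = {}
    for time_value, issue, value in rows:
        if time_value not in best or issue > best[time_value][1]:
            best[time_value] = (time_value, issue, value)
    return sorted(best.values())

def as_of(rows, day):
    """Latest version of each observation that was available on `day`."""
    return latest([r for r in rows if r[1] <= day])

def issued_between(rows, start, end):
    """Every version issued within [start, end]; may be empty."""
    return [r for r in rows if start <= r[1] <= end]

def with_lag(rows, lag):
    """Versions issued exactly `lag` days after their time_value."""
    return [r for r in rows if r[1] - r[0] == timedelta(days=lag)]
```

With this history, as_of(HISTORY, date(2020, 6, 6)) yields the June 5th issue, while issued_between(HISTORY, date(2020, 6, 6), date(2020, 6, 6)) is empty because nothing was issued on June 6th.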

Note also that the API enforces a maximum result row limit; results beyond the maximum limit are truncated. This limit is sufficient to fetch observations in all counties in the United States on one day. This client automatically splits queries for multiple days across multiple API calls. However, if data for one day has been issued many times, using the issues argument may return more results than the query limit. A warning will be issued in this case. To see all results, split your query across multiple calls with different issues arguments.
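One way to split a large issues query is to break the issue date range into smaller (start, end) tuples and make one call per tuple. The helper below is a minimal sketch; split_issue_ranges and its days_per_call parameter are hypothetical names, not part of the package, and a suitable chunk size depends on how often the signal was reissued.

```python
from datetime import date, timedelta

def split_issue_ranges(start, end, days_per_call=7):
    """Split [start, end] into consecutive, non-overlapping (start, end) tuples."""
    if days_per_call < 1:
        raise ValueError("days_per_call must be at least 1")
    ranges = []
    chunk_start = start
    while chunk_start <= end:
        # The last chunk is clipped so it never runs past `end`.
        chunk_end = min(chunk_start + timedelta(days=days_per_call - 1), end)
        ranges.append((chunk_start, chunk_end))
        chunk_start = chunk_end + timedelta(days=1)
    return ranges
```

Each returned tuple has the (start, end) form accepted by the issues argument; combining the per-call results is left to the caller.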

See the COVIDcast API documentation for more information on available geography types, signals, and data formats, and further discussion of issue dates and data versioning.

Parameters
  • data_source (str) – String identifying the data source to query, such as "fb-survey".

  • signal (str) – String identifying the signal from that source to query, such as "smoothed_cli".

  • start_day (Optional[date]) – Query data beginning on this date. Provided as a datetime.date object. If start_day is None, defaults to the first day data is available for this signal. If time_type == "week", then this is rounded to the epiweek containing the day (i.e. the previous Sunday).

  • end_day (Optional[date]) – Query data up to this date, inclusive. Provided as a datetime.date object. If end_day is None, defaults to the most recent day data is available for this signal. If time_type == "week", then this is rounded to the epiweek containing the day (i.e. the previous Sunday).

  • geo_type (str) – The geography type for which to request this data, such as "county" or "state". Available types are described in the COVIDcast signal documentation. Defaults to "county".

  • geo_values (Union[str, Iterable[str]]) – The geographies to fetch data for. The default, "*", fetches all geographies. To fetch one geography, specify its ID as a string; multiple geographies can be provided as an iterable (list, tuple, …) of strings.

  • as_of (Optional[date]) – Fetch only data that was available on or before this date, provided as a datetime.date object. If None, the default, return the most recent available data. If time_type == "week", then this is rounded to the epiweek containing the day (i.e. the previous Sunday).

  • issues (Union[date, Tuple[date], List[date], None]) – Fetch only data that was published or updated (“issued”) on these dates. Provided as either a single datetime.date object, indicating a single date to fetch data issued on, or a tuple or list specifying (start, end) dates. In this case, return all data issued in this range. There may be multiple rows for each observation, indicating several updates to its value. If None, the default, return the most recently issued data. If time_type == "week", then these are rounded to the epiweek containing the day (i.e. the previous Sunday).

  • lag (Optional[int]) – Integer. If, for example, lag=3, fetch only data that was published or updated exactly 3 days after the date. For example, a row with time_value of June 3 will only be included in the results if its data was issued or updated on June 6. If None, the default, return the most recently issued data regardless of its lag.

  • time_type (str) – The temporal resolution to request this data. Most signals are available at the “day” resolution (the default); some are only available at the “week” resolution, representing an MMWR week (“epiweek”).
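The epiweek rounding mentioned for start_day, end_day, as_of, and issues can be pictured as moving each date back to its Sunday. This is a sketch of that arithmetic only, assuming "the previous Sunday" means the Sunday on or before the given day; epiweek_start is a hypothetical helper, and the client may implement the rounding differently.

```python
from datetime import date, timedelta

def epiweek_start(day):
    """Return the Sunday on or before `day`."""
    # date.weekday() numbers Monday as 0 and Sunday as 6.
    days_since_sunday = (day.weekday() + 1) % 7
    return day - timedelta(days=days_since_sunday)
```

For example, Wednesday June 3, 2020 maps to Sunday May 31, 2020, and a date that is already a Sunday is left unchanged.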

Return type

Optional[DataFrame]

Returns

A Pandas data frame with matching data, or None if no data is returned. Each row is one observation on one day in one geographic location. Contains the following columns:

geo_value

Identifies the location, such as a state name or county FIPS code. The geographic coding used by COVIDcast is described in the API documentation.

signal

Name of the signal, same as the value of the signal input argument. Used for downstream functions to recognize where this signal is from.

time_value

Contains a pandas Timestamp object identifying the date this estimate is for. For data with time_type = "week", this is the first day of the corresponding epiweek.

issue

Contains a pandas Timestamp object identifying the date this estimate was issued. For example, an estimate with a time_value of June 3 might have been issued on June 5, after the data for June 3rd was collected and ingested into the API.

lag

Integer giving the difference between issue and time_value, in days.

value

The signal quantity requested. For example, in a query for the confirmed_cumulative_num signal from the usa-facts source, this would be the cumulative number of confirmed cases in the area, as of the time_value.

stderr

The value’s standard error, if available.

sample_size

Indicates the sample size available in that geography on that day; sample size may not be available for all signals, due to privacy or other constraints.

geo_type

Geography type for the signal, same as the value of the geo_type input argument. Used for downstream functions to parse geo_value correctly.

data_source

Name of the signal source, same as the value of the data_source input argument. Used for downstream functions to recognize where this signal is from.

Consult the signal documentation for more details on how values and standard errors are calculated for specific signals.
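Because signal() returns None when no data matches, callers may want to handle that case explicitly. The wrapper below is a hedged sketch: fetch_or_raise, its fetch parameter, and QUERY are hypothetical names introduced here, and the query values (dates and the county FIPS code) are chosen only as an example.

```python
from datetime import date

def fetch_or_raise(fetch=None, **query):
    """Run a signal()-style query and raise LookupError if it returns None."""
    if fetch is None:
        # Imported lazily so this sketch can be loaded without the package installed.
        import covidcast
        fetch = covidcast.signal
    data = fetch(**query)
    if data is None:
        raise LookupError(f"no data returned for query: {query}")
    return data

# Illustrative query using only parameters documented above.
QUERY = dict(
    data_source="fb-survey",
    signal="smoothed_cli",
    start_day=date(2020, 6, 1),
    end_day=date(2020, 6, 7),
    geo_type="county",
    geo_values="42003",  # a county FIPS code, used here purely as an example
)
```

Calling fetch_or_raise(**QUERY) would pass the query to covidcast.signal, which requires the package and network access; passing a different callable as fetch allows the wrapper to be exercised offline.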

Sometimes you would like to work with multiple signals – for example, to obtain several signals at every location, as part of building models of features at each location. For convenience, the package provides a function to produce a single data frame containing multiple signals at each location.

covidcast.aggregate_signals(signals, dt=None, join_type='outer')

Given a list of DataFrames, [optionally] lag each one and join them into one DataFrame.

This method takes a list of DataFrames containing signal information for geographic regions across time, and outputs a single DataFrame with a column for each signal value for each region/time. The data_source, signal, and index of each DataFrame in signals are appended to the front of each output column name separated by underscores (e.g. source_signal_0_inputcolumn), and the original data_source and signal columns will be dropped. The input DataFrames must all be of the same geography type, and a single geo_type column will be returned in the final DataFrame.

Each signal’s time value can be shifted for analysis on lagged signals using the dt argument, which takes a list of integer days to lag each signal’s date. Lagging a signal by +1 day means that all the dates get shifted forward by 1 day (e.g. Jan 1 becomes Jan 2).

Parameters
  • signals (list) – List of DataFrames to join.

  • dt (Optional[list]) – List of lags in days for each of the input DataFrames in signals. Defaults to None. When provided, must be the same length as signals.

  • join_type (str) – Type of join to be done between the DataFrames in signals. Defaults to "outer", so the output DataFrame contains all region/time combinations at which at least one signal was observed.

Return type

DataFrame

Returns

DataFrame of aggregated signals.
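The column naming pattern and the effect of dt can be sketched without any data frames. Both helpers below are hypothetical illustrations of the description above, not functions from the package.

```python
from datetime import date, timedelta

def output_column(data_source, signal, index, column):
    """Build a name in the source_signal_index_column pattern."""
    return f"{data_source}_{signal}_{index}_{column}"

def shift_dates(dates, lag_days):
    """Shift each date forward by lag_days, as a lag of +1 moves Jan 1 to Jan 2."""
    return [d + timedelta(days=lag_days) for d in dates]
```

Under this pattern, the value column of the first input frame for the fb-survey smoothed_cli signal would be named fb-survey_smoothed_cli_0_value.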

Metadata

Many data sources and signals are available, so one can also obtain a data frame of all signals and their associated metadata:

covidcast.metadata()

Fetch COVIDcast surveillance stream metadata.

Obtains a data frame of metadata describing all publicly available data streams from the COVIDcast API. See the data source and signals documentation for descriptions of the available sources.

Return type

DataFrame

Returns

A data frame containing one row per available signal, with the following columns:

data_source

Data source name.

signal

Signal name.

time_type

Temporal resolution at which this signal is reported. “day”, for example, means the signal is reported daily.

geo_type

Geographic level for which this signal is available, such as county, state, msa, hhs, hrr, or nation. Most signals are available at multiple geographic levels and will hence be listed in multiple rows with their own metadata.

min_time

First day for which this signal is available. For weekly signals, will be the first day of the epiweek.

max_time

Most recent day for which this signal is available. For weekly signals, will be the first day of the epiweek.

num_locations

Number of distinct geographic locations available for this signal. For example, if geo_type is county, the number of counties for which this signal has ever been reported.

min_value

The smallest value that has ever been reported.

max_value

The largest value that has ever been reported.

mean_value

The arithmetic mean of all reported values.

stdev_value

The sample standard deviation of all reported values.

last_update

The UTC datetime for when the signal value was last updated.

max_issue

Most recent date data was issued.

min_lag

Smallest lag from observation to issue, in days.

max_lag

Largest lag from observation to issue, in days.
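Since each signal appears once per geographic level, a common first step is to collect the geo_type values listed for one source and signal. The sketch below works on plain dictionaries so it runs without the API; geo_types_for and ROWS are hypothetical, and the rows are hand-written stand-ins rather than real metadata.

```python
def geo_types_for(rows, data_source, signal):
    """Collect the geo_type values listed for one source/signal pair."""
    return sorted({
        row["geo_type"]
        for row in rows
        if row["data_source"] == data_source and row["signal"] == signal
    })

# Hand-written rows standing in for metadata output; contents are invented.
ROWS = [
    {"data_source": "fb-survey", "signal": "smoothed_cli", "geo_type": "county"},
    {"data_source": "fb-survey", "signal": "smoothed_cli", "geo_type": "state"},
    {"data_source": "usa-facts", "signal": "confirmed_cumulative_num", "geo_type": "county"},
]
```

With a real metadata frame, rows in this shape can be obtained with the pandas method to_dict("records").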

Working with geographic identifiers

The COVIDcast API identifies each geographic region – such as a county or state – using unique codes. For example, counties are identified by their FIPS codes, while states are identified by a two-letter abbreviation; more detail is given in the geographic coding documentation.

When fetching data from the API, you may need to quickly convert between human-readable names and unique identifiers. The following functions are provided for your convenience.

Counties

covidcast.fips_to_name(code, ignore_case=False, fixed=False, ties_method='first')

Look up county names by FIPS codes with regular expression support.

Given an individual or list of FIPS codes or regular expressions, look up the corresponding county names.

Parameters
  • code (Union[str, Iterable[str]]) – Individual or list of FIPS codes or regular expressions.

  • ignore_case (bool) – Boolean for whether or not to be case insensitive in the regular expression. If fixed=True, this argument is ignored. Defaults to False.

  • fixed (bool) – Conduct an exact case sensitive match with the input string. Defaults to False.

  • ties_method (str) – Method for determining how to deal with multiple outputs for a given input. Must be one of "all" or "first". If "first", then only the first match for each code is returned. If "all", then all matches for each code are returned. Defaults to "first".

Return type

list

Returns

If ties_method="first", returns a list of the first value found for each input key. If ties_method="all", returns a list of dicts, one for each input, with keys corresponding to all matched input keys and values corresponding to the list of county names. The returned list will be the same length as the input, with None or {} if no values are found for ties_method="first" and ties_method="all", respectively.

covidcast.name_to_fips(name, ignore_case=False, fixed=False, ties_method='first', state=None)

Look up FIPS codes by county names with regular expression support.

Given an individual or list of county names or regular expressions, look up the corresponding FIPS codes.

Parameters
  • name (Union[str, Iterable[str]]) – Individual or list of county names or regular expressions.

  • ignore_case (bool) – Boolean for whether or not to be case insensitive in the regular expression. If fixed=True, this argument is ignored. Defaults to False.

  • fixed (bool) – Conduct an exact case sensitive match with the input string. Defaults to False.

  • ties_method (str) – Method for determining how to deal with multiple outputs for a given input. Must be one of "all" or "first". If "first", then only the first match for each name is returned. If "all", then all matches for each name are returned. Defaults to "first".

  • state (Optional[str]) – Two-letter state code (case insensitive) to which results are restricted.

Return type

list

Returns

If ties_method="first", returns a list of the first value found for each input key. If ties_method="all", returns a list of dicts, one for each input, with keys corresponding to all matched input keys and values corresponding to the list of FIPS. The returned list will be the same length as the input, with None or {} if no values are found for ties_method="first" and ties_method="all", respectively.
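The return shapes for ties_method="first" and ties_method="all" are easier to see on a small table. The following is a toy reimplementation, not the package's code: lookup and COUNTIES are hypothetical, the table holds only three entries, and anchoring patterns at the start of the string (re.match) is an assumption.

```python
import re

# Tiny stand-in table; the package ships its own full lookup data.
COUNTIES = {
    "42001": "Adams County",
    "42003": "Allegheny County",
    "42005": "Armstrong County",
}

def lookup(keys, table, ignore_case=False, fixed=False, ties_method="first"):
    """Toy version of the lookup behaviour described above."""
    if ties_method not in ("first", "all"):
        raise ValueError('ties_method must be "first" or "all"')
    if isinstance(keys, str):
        keys = [keys]
    flags = re.IGNORECASE if ignore_case else 0
    results = []
    for key in keys:
        if fixed:
            # Exact, case-sensitive comparison; ignore_case has no effect here.
            matches = [k for k in table if k == key]
        else:
            # Assumption: patterns are anchored at the start of the string.
            matches = [k for k in table if re.match(key, k, flags)]
        if ties_method == "all":
            results.append({k: [table[k]] for k in matches})
        else:
            results.append(table[matches[0]] if matches else None)
    return results
```

Here lookup("4200", COUNTIES) keeps only the first match, lookup("4200", COUNTIES, ties_method="all") returns a one-element list holding a dict of all three matches, and lookup("4200", COUNTIES, fixed=True) returns [None] because no key equals "4200" exactly.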

Metropolitan statistical areas

covidcast.cbsa_to_name(code, ignore_case=False, fixed=False, ties_method='first')

Look up MSA names by codes with regular expression support.

Given an individual or list of CBSA codes or regular expressions, look up the corresponding MSA names.

Parameters
  • code (Union[str, Iterable[str]]) – Individual or list of CBSA codes or regular expressions.

  • ignore_case (bool) – Boolean for whether or not to be case insensitive in the regular expression. If fixed=True, this argument is ignored. Defaults to False.

  • fixed (bool) – Conduct an exact case sensitive match with the input string. Defaults to False.

  • ties_method (str) – Method for determining how to deal with multiple outputs for a given input. Must be one of "all" or "first". If "first", then only the first match for each code is returned. If "all", then all matches for each code are returned. Defaults to "first".

Return type

list

Returns

If ties_method="first", returns a list of the first value found for each input key. If ties_method="all", returns a list of dicts, one for each input, with keys corresponding to all matched input keys and values corresponding to the list of MSA names. The returned list will be the same length as the input, with None or {} if no values are found for ties_method="first" and ties_method="all", respectively.

covidcast.name_to_cbsa(name, ignore_case=False, fixed=False, ties_method='first', state=None)

Look up MSA codes by names with regular expression support.

Given an individual or list of names or regular expressions, look up the corresponding MSA codes.

Parameters
  • name (Union[str, Iterable[str]]) – Individual or list of MSA names or regular expressions.

  • ignore_case (bool) – Boolean for whether or not to be case insensitive in the regular expression. If fixed=True, this argument is ignored. Defaults to False.

  • fixed (bool) – Conduct an exact case sensitive match with the input string. Defaults to False.

  • ties_method (str) – Method for determining how to deal with multiple outputs for a given input. Must be one of "all" or "first". If "first", then only the first match for each name is returned. If "all", then all matches for each name are returned. Defaults to "first".

  • state (Optional[str]) – Two-letter state code (case insensitive) to which results are restricted.

Return type

list

Returns

If ties_method="first", returns a list of the first value found for each input key. If ties_method="all", returns a list of dicts, one for each input, with keys corresponding to all matched input keys and values corresponding to the list of MSA codes. The returned list will be the same length as the input, with None or {} if no values are found for ties_method="first" and ties_method="all", respectively.

States

covidcast.abbr_to_name(abbr, ignore_case=False, fixed=False, ties_method='first')

Look up state name by abbreviation with regular expression support.

Given an individual or list of state abbreviations or regular expressions, look up the corresponding state names.

Parameters
  • abbr (Union[str, Iterable[str]]) – Individual or list of state abbreviations or regular expressions.

  • ignore_case (bool) – Boolean for whether or not to be case insensitive in the regular expression. If fixed=True, this argument is ignored. Defaults to False.

  • fixed (bool) – Conduct an exact case sensitive match with the input string. Defaults to False.

  • ties_method (str) – Method for determining how to deal with multiple outputs for a given input. Must be one of "all" or "first". If "first", then only the first match for each abbreviation is returned. If "all", then all matches for each abbreviation are returned. Defaults to "first".

Return type

list

Returns

If ties_method="first", returns a list of the first value found for each input key. If ties_method="all", returns a list of dicts, one for each input, with keys corresponding to all matched input keys and values corresponding to the list of state names. The returned list will be the same length as the input, with None or {} if no values are found for ties_method="first" and ties_method="all", respectively.

covidcast.name_to_abbr(name, ignore_case=False, fixed=False, ties_method='first')

Look up state abbreviation by name with regular expression support.

Given an individual or list of state names or regular expressions, look up the corresponding state abbreviations.

Parameters
  • name (Union[str, Iterable[str]]) – Individual or list of state names or regular expressions.

  • ignore_case (bool) – Boolean for whether or not to be case insensitive in the regular expression. If fixed=True, this argument is ignored. Defaults to False.

  • fixed (bool) – Conduct an exact case sensitive match with the input string. Defaults to False.

  • ties_method (str) – Method for determining how to deal with multiple outputs for a given input. Must be one of "all" or "first". If "first", then only the first match for each name is returned. If "all", then all matches for each name are returned. Defaults to "first".

Return type

list

Returns

If ties_method="first", returns a list of the first value found for each input key. If ties_method="all", returns a list of dicts, one for each input, with keys corresponding to all matched input keys and values corresponding to the list of state abbreviations. The returned list will be the same length as the input, with None or {} if no values are found for ties_method="first" and ties_method="all", respectively.

covidcast.abbr_to_fips(code, ignore_case=False, fixed=False, ties_method='first')

Look up state FIPS codes by abbreviation with regular expression support.

Given an individual or list of state abbreviations or regular expressions, look up the corresponding state FIPS codes. The returned codes are 5 digits: the 2 digit state FIPS with 000 appended to the end.

Parameters
  • code (Union[str, Iterable[str]]) – Individual or list of abbreviations or regular expressions.

  • ignore_case (bool) – Boolean for whether or not to be case insensitive in the regular expression. If fixed=True, this argument is ignored. Defaults to False.

  • fixed (bool) – Conduct an exact case sensitive match with the input string. Defaults to False.

  • ties_method (str) – Method for determining how to deal with multiple outputs for a given input. Must be one of "all" or "first". If "first", then only the first match for each code is returned. If "all", then all matches for each code are returned. Defaults to "first".

Return type

list

Returns

If ties_method="first", returns a list of the first value found for each input key. If ties_method="all", returns a list of dicts, one for each input, with keys corresponding to all matched input keys and values corresponding to the list of state FIPS codes. The returned list will be the same length as the input, with None or {} if no values are found for ties_method="first" and ties_method="all", respectively.

covidcast.fips_to_abbr(code, ignore_case=False, fixed=False, ties_method='first')

Look up state abbreviation by FIPS codes with regular expression support.

Given an individual or list of FIPS codes or regular expressions, look up the corresponding state abbreviation. FIPS codes can be the 2 digit code (covidcast.fips_to_abbr("12")) or the 2 digit code with 000 appended to the end (covidcast.fips_to_abbr("12000")).

Parameters
  • code (Union[str, Iterable[str]]) – Individual or list of FIPS codes or regular expressions.

  • ignore_case (bool) – Boolean for whether or not to be case insensitive in the regular expression. If fixed=True, this argument is ignored. Defaults to False.

  • fixed (bool) – Conduct an exact case sensitive match with the input string. Defaults to False.

  • ties_method (str) – Method for determining how to deal with multiple outputs for a given input. Must be one of "all" or "first". If "first", then only the first match for each code is returned. If "all", then all matches for each code are returned. Defaults to "first".

Return type

list

Returns

If ties_method="first", returns a list of the first value found for each input key. If ties_method="all", returns a list of dicts, one for each input, with keys corresponding to all matched input keys and values corresponding to the list of state abbreviations. The returned list will be the same length as the input, with None or {} if no values are found for ties_method="first" and ties_method="all", respectively.
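State FIPS codes appear in two forms in these functions: the 2 digit code and the 5 digit code with 000 appended. When mixing sources, it can help to normalize to one form first. The helper below is a hypothetical sketch of that normalization and is not part of the package.

```python
def to_five_digit_state_fips(code):
    """Return the 5 digit form of a 2 or 5 digit state FIPS code."""
    if len(code) == 2 and code.isdigit():
        return code + "000"
    if len(code) == 5 and code.isdigit() and code.endswith("000"):
        return code
    # Anything else (for example a county code such as "12001") is rejected.
    raise ValueError(f"not a state FIPS code: {code!r}")
```

Both "12" and "12000" normalize to "12000", matching the 5 digit form that abbr_to_fips is documented to return.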