{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Finding data of interest\n", "\n", "The Epidata API includes numerous data streams -- medical claims data, cases and\n", "deaths, mobility, and many others -- covering different geographic regions. This\n", "can make it a challenge to find the data stream that you are most interested in.\n", "This page offers some advice on how to locate data that may be useful to\n", "you.\n", "\n", "## Using the Delphi Epidata API documentation\n", "\n", "The Delphi Epidata API documentation lists all the available data sources and\n", "signals for\n", "[COVID-19](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html)\n", "and for [other\n", "diseases](https://cmu-delphi.github.io/delphi-epidata/api/README.html#source-specific-parameters).\n", "The site also includes a search tool if you have a keyword (e.g. \"Taiwan\") in\n", "mind. Generally, any endpoint listed in the Delphi Epidata API has an associated\n", "function in this client, named by prefixing the API endpoint name with either\n", "`pub_` or `pvt_`, e.g. `pub_covidcast` or `pvt_twitter`.\n", "\n", "## Epidata data sources\n", "\n", "The parameters available for each data source are documented on each linked\n", "source-specific API page. The epidatpy client also expects certain fields,\n", "depending on the endpoint; the Delphi Epidata API documentation\n", "contains more information about the accepted ranges of values for each field. 
\n", "\n", "A dynamically generated list of all available data sources can be obtained by\n", "using the built-in `available_endpoints()` function:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "nbsphinx": "hidden" }, "outputs": [], "source": [ "# Hidden cell (set in the metadata for this cell)\n", "import pandas as pd\n", "\n", "# Set common options and context\n", "pd.set_option(\"display.max_columns\", None)\n", "pd.set_option(\"display.max_rows\", 10)\n", "pd.set_option(\"display.width\", 1000)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from IPython.display import HTML\n", "\n", "from epidatpy import available_endpoints\n", "\n", "table = available_endpoints()\n", "HTML(table.to_html(index=False))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Covidcast source and signal metadata\n", "\n", "The `CovidcastEpidata` class provides a way to access information about the data\n", "in the `pub_covidcast` endpoint directly from within the client. The cell below\n", "demonstrates how to access this metadata by using the `source_df` property, which\n", "returns a Pandas DataFrame of metadata describing all data streams publicly\n", "accessible from the COVIDcast endpoint of the Delphi Epidata API. This mirrors\n", "the information found in the [COVIDcast signals\n", "endpoint](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from epidatpy import CovidcastEpidata\n", "\n", "epidata = CovidcastEpidata()\n", "epidata.source_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This DataFrame contains the following columns:\n", "\n", "- `source` - API-internal source name.\n", "- `name` - Human-readable source name.\n", "- `description` - Description of the data source.\n", "- `reference_signal` - Name of the signal that serves as the reference signal for this data source.\n", "- `license` - The license.\n", "- `dua` - Link to the Data Use Agreement.\n", "- `signals` - List of signals available from this data source.\n", "\n", "The `signal_df` DataFrame can also be used to obtain information about the signals\n", "that are available, for example, what time range they are available for\n", "and when they have been updated." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "epidata.signal_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This DataFrame contains one row for each available signal, with the following columns:\n", "\n", "- `source` - Data source name.\n", "- `signal` - API-internal signal name.\n", "- `name` - Human-readable signal name.\n", "- `active` - Whether the signal is currently being updated. Signals may be inactive because the sources have become unavailable, other sources have replaced them, or additional work is required for us to continue updating them.\n", "- `short_description` - Brief description of the signal.\n", "- `description` - Full description of the signal.\n", "- `geo_types` - Spatial resolution of the signal (e.g., `county`, `hrr`, `msa`, `dma`, `state`). 
More detail about all `geo_types` is given in the [geographic coding documentation](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_geography.html).\n", "- `time_type` - Temporal resolution of the signal (e.g., day, week; see [date coding details](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_times.html)).\n", "- `time_label` - The time label (\"Date\", \"Week\").\n", "- `value_label` - The value label (\"Value\", \"Percentage\", \"Visits\", \"Visits per 100,000 people\").\n", "- `format` - The value format (\"per100k\", \"percent\", \"fraction\", \"count\", \"raw\").\n", "- `category` - The signal category (\"early\", \"public\", \"late\", \"other\").\n", "- `high_values_are` - What higher values of the signal indicate (\"good\", \"bad\", \"neutral\").\n", "- `is_smoothed` - Whether the signal is smoothed.\n", "- `is_weighted` - Whether the signal is weighted.\n", "- `is_cumulative` - Whether the signal is cumulative.\n", "- `has_stderr` - Whether the signal has a `stderr` statistic.\n", "- `has_sample_size` - Whether the signal has a `sample_size` statistic.\n" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 2 }