Currently in development, launching early 2021.
Global.health is a not-for-profit data science initiative and international consortium of researchers, engineers, designers, and volunteers tracking, standardizing, and organizing open access epidemiological data for the public health community. You can find out more about our team, funders, and partners on our About page.
Our mission is to enable the rapid sharing of trusted and open public health data to advance the response to infectious disease outbreaks. Our initial focus area is providing a centralized resource of line-list COVID-19 case data from around the world.
The data currently available on Global.health are detailed COVID-19 line-list case data from over 100 sources organized into a standardized schema including up to 40 fields of metadata where available. As of publication, our database includes over 100 million cases from over 100 countries. You can explore the Global.health database using our Data or Map web applications.
Data on Global.health come from a variety of carefully vetted sources, including official government public health agencies, trusted media outlets, and established open-access repositories. Each entry in the database has an associated source name and URL visible when clicking on a record or downloading the data. All data source acknowledgements can be found here.
All data on the Global.health platform go through a rigorous curation, ingestion, and verification process managed by a team of researchers, graduate students, and volunteers at universities across the world. We’re always looking for new team members to join us, and encourage you to get involved if you’re interested in contributing to the future of open public health data!
Most data available on the Global.health platform are detailed, anonymized, line-list data. The data are standardized using a computational pipeline that also adds geocodes to each entry in the database. This differs from other public health data repositories and websites, which often present data only in aggregate form. Global.health also provides open source code so the platform can be replicated and adapted as the pandemic progresses, new challenges arise, and new data types become available.
“Line-list” data capture granular case details such as when individuals became symptomatic, how long it took for a test result to be returned, or whether a person had recent travel history. These data are critically important for characterizing the natural history of a disease, assessing the turnaround time for testing, and understanding how imported cases trigger new outbreaks. Line-list data also include demographic information, which helps identify which age groups may be most at risk of becoming infected or infecting others.
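To make the contrast with aggregate data concrete, here is a minimal Python sketch. The field names (`onset_date`, `test_date`, `result_date`, `travel_history`, `age`) and the three records are invented for illustration and are not the Global.health schema; please refer to the data dictionary on GitHub for the real fields.

```python
from datetime import date
from statistics import median

# Hypothetical line-list records: one row per case. Field names and values
# are illustrative assumptions, not the actual Global.health schema.
cases = [
    {"age": 34, "onset_date": date(2020, 3, 1), "test_date": date(2020, 3, 3),
     "result_date": date(2020, 3, 5), "travel_history": True},
    {"age": 61, "onset_date": date(2020, 3, 2), "test_date": date(2020, 3, 6),
     "result_date": date(2020, 3, 7), "travel_history": False},
    {"age": 8, "onset_date": None, "test_date": date(2020, 3, 4),
     "result_date": date(2020, 3, 8), "travel_history": False},
]


def days_between(start, end):
    """Return whole days from start to end, or None if either date is missing."""
    if start is None or end is None:
        return None
    return (end - start).days


# Testing turnaround: days from the test date to the result date.
turnaround = [days_between(c["test_date"], c["result_date"]) for c in cases]
turnaround = [d for d in turnaround if d is not None]
median_turnaround = median(turnaround)

# Onset-to-test delay, skipping cases with no recorded symptom onset.
onset_to_test = [days_between(c["onset_date"], c["test_date"]) for c in cases]
onset_to_test = [d for d in onset_to_test if d is not None]

# Share of cases reporting recent travel.
travel_share = sum(c["travel_history"] for c in cases) / len(cases)

# An aggregate feed would typically expose only a count like this one.
total_cases = len(cases)
```

None of the per-case delays or the travel share can be recovered from `total_cases` alone, which is why line-list records matter for the analyses described above.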
Global.health was developed by an interdisciplinary group of researchers (epidemiologists and computational scientists) and enabled by a philanthropic contribution from Google.org, including a group of Fellows and contributors. We invite you to get to know our team on our About page!
Global.health is funded by grants from Google.org, The Rockefeller Foundation, and the Oxford Martin School, as well as in-kind support from our participating institutions and partners. All code from the platform is openly available on our GitHub.
Please visit our Press section for more information on Global.health’s origin story and interviews with our founders.
We have a standard schema for line-list entries based on research with numerous public health agencies and subject matter experts. Each record contains up to 40 fields of metadata. You can access our data dictionary on GitHub, and a full description of our process is given in our peer-reviewed publication in Scientific Data.
Please cite the data by first citing the underlying data descriptor:
Xu, B., Gutierrez, B., Mekaru, S. et al. Epidemiological data from the COVID-19 outbreak, real-time case information. Sci Data 7, 106 (2020). https://doi.org/10.1038/s41597-020-0448-0
Then cite the Global.health data themselves, including the date you accessed them:
Global.health contributors. Detailed Epidemiological Data from the COVID-19 Pandemic. Accessed on YYYY-MM-DD from https://global.health. (2021)
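If you generate citations programmatically, the access date can be filled in as follows. This is only a convenience sketch: the `access_citation` helper is hypothetical, and the wording simply mirrors the template above with `YYYY-MM-DD` replaced by an ISO 8601 date.

```python
from datetime import date


def access_citation(accessed: date) -> str:
    """Fill the access date (YYYY-MM-DD) into the citation template above."""
    return (
        "Global.health contributors. Detailed Epidemiological Data from the "
        f"COVID-19 Pandemic. Accessed on {accessed.isoformat()} from "
        "https://global.health. (2021)"
    )


# Example: cite a download made today.
citation = access_citation(date.today())
```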
Today our data are aggregated to a reasonably coarse spatial resolution (administrative level 3), and only openly available data are ingested. Further, any unique identifiers (such as names or addresses) are removed. In the near future we are looking to expand our data aggregation process by including differential privacy algorithms.
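As a rough illustration of the two safeguards described above (dropping direct identifiers and keeping location only down to administrative level 3), here is a Python sketch. The field names and the `deidentify` helper are hypothetical and do not reflect the actual Global.health pipeline.

```python
# Hypothetical field names chosen for illustration; not the real pipeline.
DIRECT_IDENTIFIERS = {"name", "address"}
# Keep location only down to administrative level 3.
ALLOWED_LOCATION_KEYS = ("country", "admin1", "admin2", "admin3")


def deidentify(record):
    """Return a copy of record without direct identifiers or fine-grained location."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    location = record.get("location", {})
    cleaned["location"] = {
        k: location[k] for k in ALLOWED_LOCATION_KEYS if k in location
    }
    return cleaned


# Made-up input record with identifiers and precise coordinates.
raw = {
    "name": "Jane Doe",
    "address": "1 Example Street",
    "age_range": "30-39",
    "location": {
        "country": "Exampleland",
        "admin1": "North Region",
        "admin2": "Lake County",
        "admin3": "Harbor District",
        "latitude": 12.3456,
        "longitude": 65.4321,
    },
}

public_record = deidentify(raw)
```

This covers only identifier removal and spatial coarsening. Differential privacy, mentioned above as future work, is a separate technique and is not shown here.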
Protecting privacy is central to our values and mission. In addition to our own standard of care in ensuring all data on our platform have been anonymized, we performed a Data Protection Impact Assessment (DPIA), which is common practice when dealing with health data. We are happy to provide a more detailed description of the DPIA upon request and plan to publish a summary in due course.
We welcome contributions of line-list case data from research groups, public health laboratories, and accredited individual practitioners. Please get in touch describing the data you’ve collected, including the time frame and the country or countries of origin. While our current focus is COVID-19 and its Variants of Concern (VOCs), we look forward to facilitating the sharing of datasets for other infectious diseases.
You can explore the latest research enabled by Global.health data here.
Please visit our GitHub account to report a bug or suggest a new feature. All code is openly accessible, and you can submit pull requests too. You’re also welcome to reach out to us via email.
We’re always looking for new additions to our growing team of collaborators and contributors. We embrace a diverse range of skill sets, nationalities, and personalities and encourage you to get involved!
Please see our API documentation and instructions on GitHub.
While our primary focus is on providing comprehensive COVID-19 line-list data, our vision for Global.health is an open access data sharing platform for the international public health community to better understand, anticipate, and scale health data efforts for all infectious diseases. Our hope is to expand the capabilities of our platform to integrate diverse data sets and help develop new standards for open access public health data.
We’d love to hear from you and find a way to collaborate! Please send us an email and a member of our team will follow up.