Data Commons is an open knowledge repository hosted by Google that provides a unified view across multiple public datasets, combining economic, scientific and other open datasets into an integrated data graph.[1] The site was launched in May 2018 with an initial dataset consisting of fact-checking data published in "ClaimReview" format by several fact checkers from the International Fact-Checking Network.[2][3] Google has worked with partners including the United States Census, the World Bank, and the US Bureau of Labor Statistics to populate the repository,[4] which also hosts data from Wikipedia, the National Oceanic and Atmospheric Administration and the Federal Bureau of Investigation.[5] The service expanded during 2019 to include an RDF-style Knowledge Graph populated from a number of largely statistical open datasets. The service was announced to a wider audience in 2019.[6] In 2020 the service improved its coverage of non-US datasets, while also increasing its coverage of bioinformatics and coronavirus data.[7]

Features

Data Commons places more emphasis on statistical data than is common for Linked Data and knowledge graph initiatives. It includes geographical, demographic, weather and real estate data alongside other categories,[1] describing states, Congressional districts, and cities in the United States as well as biological specimens, power plants, and elements of the human genome via the Encyclopedia of DNA Elements (ENCODE) project.[5] It represents data as semantic triples, each of which can have its own provenance.[1] It centers on the entity-oriented integration of statistical observations from a variety of public datasets. Although it supports a subset of the W3C SPARQL query language,[8] its APIs[9] also include tools oriented towards data science, statistics and data visualization, such as a Pandas dataframe interface. Data Commons is integrative, meaning that, rather than providing a hosting platform for diverse datasets, it attempts to consolidate much of the information the datasets provide into a single data graph.
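The idea of triples that each carry their own provenance, grouped around shared entities, can be illustrated with a minimal Python sketch. This is not the project's API: the `Triple` class, the `integrate` helper, and all identifiers and values below are hypothetical placeholders chosen only to show the shape of the data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One edge of the graph: subject, predicate, object, plus its source."""
    subject: str
    predicate: str
    obj: str
    provenance: str  # hypothetical label for the dataset the triple came from


def integrate(triples):
    """Group triples by subject so each entity gathers facts from all sources."""
    by_entity = defaultdict(list)
    for t in triples:
        by_entity[t.subject].append(t)
    return dict(by_entity)


# Placeholder identifiers and values, invented purely for illustration.
triples = [
    Triple("place/A", "name", "Example City", "dataset-1"),
    Triple("place/A", "population", "1000", "dataset-2"),
    Triple("place/A", "meanTemperature", "15", "dataset-3"),
    Triple("place/B", "name", "Example Town", "dataset-1"),
]

graph = integrate(triples)

# A single entity now carries facts contributed by several datasets.
sources_for_a = sorted({t.provenance for t in graph["place/A"]})
print(sources_for_a)
```

In this sketch the entity `place/A` ends up with facts from three different sources while each fact still records where it came from, which is the sense in which the integration is entity-oriented rather than dataset-oriented.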

Technology

Data Commons is built on a graph data model. The graph can be accessed through a browser interface and several APIs,[1][5] and is expanded by loading data (typically CSV and MCF-based templates).[10] The graph can also be accessed by natural language queries in Google Search.[11] The data vocabulary used to define the graph is based upon Schema.org.[1] In particular, the terms StatisticalPopulation[12] and Observation[13] were proposed to Schema.org to support datacommons-like use cases.[14]
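As a rough illustration of how tabular data paired with a template might be turned into graph edges, the following Python sketch maps each CSV row to one observation node. It is a simplified stand-in and does not reproduce the project's MCF template syntax: the `TEMPLATE` structure, the predicate names, the `csv_to_triples` helper, and the CSV rows are all assumptions made for this example.

```python
import csv
import io

# Hypothetical template: each CSV row becomes one observation node, and each
# listed column becomes a predicate on that node. Not the real MCF syntax.
TEMPLATE = {
    "node_prefix": "obs",
    "columns": {
        "place_id": "observedNode",
        "year": "observationDate",
        "population": "value",
    },
}

# Placeholder rows invented for illustration.
CSV_TEXT = """place_id,year,population
place/A,2019,1000
place/B,2019,250
"""


def csv_to_triples(text, template):
    """Turn each CSV row into (subject, predicate, object) triples."""
    triples = []
    for index, row in enumerate(csv.DictReader(io.StringIO(text))):
        node = f"{template['node_prefix']}/{index}"
        for column, predicate in template["columns"].items():
            triples.append((node, predicate, row[column]))
    return triples


triples = csv_to_triples(CSV_TEXT, TEMPLATE)
for triple in triples:
    print(triple)
```

Keeping the column-to-predicate mapping in a separate template, rather than hard-coding it, means the same loader can be reused for differently shaped tables by swapping only the template.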

Software from the project is available on GitHub under the Apache 2 license.[15]


